  1. Dec 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript by Bai et al concerns the expression of Scleraxis (Scx) by muscle satellite cells (SCs) and the role of that gene in regenerative myogenesis. The authors report the expression of this gene associated with tendon development in satellite cells. Genetic deletion of Scx in SCs impairs muscle regeneration, and the authors provide evidence that SCs deficient in Scx are impaired in terms of population growth and cellular differentiation. Overall, this report provides evidence of the role of this gene, unexpectedly, in SC function and adult regenerative myogenesis.

      We appreciate the comments and thank her/him for the support.

      There are a few minor points of concern.

      (1) From the data in Figure 1, it appears that all of the SCs, assessed both in vitro and in vivo, express Scx. The authors refer to a scRNA-seq dataset from their lab and one report from mdx mouse muscle that also reveals this unexpected gene expression pattern. Has this been observed in many other scRNA-seq datasets? If not, it would be important to discuss potential explanations as to why this has not been reported previously.

      Thanks for this question regarding data in Fig.1. We did initially use immunofluorescence staining of Pax7 and GFP on muscle sections and primary myoblast cultures prepared from Tg-ScxGFP mice to conclude that Scx was expressed in satellite cells (SCs). In addition to the cited mdx RNA-seq data, we have included a re-analysis of a published scRNA-seq data set in Fig.2E (Dell'Orso et al., Development, 2019), and our own scRNA-seq data (Fig.S5D, F). We have now re-examined an additional scRNA-seq data set of TA muscles at various regeneration time points (De Micheli et al., Cell Rep. 2020), in which Scx expression was detected in MuSC progenitors and mature muscle cells. We have added the De Micheli et al. reference and the re-analysis of that scRNA-seq data set for Scx expression as an additional panel in Fig. 2E, with accompanying text (p. 7, ln. 4-6). Thus, our immunostaining results are consistent with scRNA-seq data from our and two other independent scRNA-seq data sets.

      We think that Scx expression in the adult myogenic lineage was not previously reported mainly because its expression level was low, and might be dismissed as spurious detection. Additionally, detecting such low expression levels requires sophisticated detection methods with high capture efficiency. Previous studies have noted limitations in transcript capture or transcription factor dropout in 10x Genomics-based datasets (Lambert et al., Cell, 2018; Pokhilko et al., Genome Res., 2021). The most likely and straightforward reason is that Scx was simply not a focus in prior studies amid so many other genes of interest. We have now added this last explanation in the text (p.7, ln. 8-9), following the re-analyses of Scx expression in published scRNA-seq data sets.

      (2) A major point of the paper, as illustrated in Fig. 3, is that Scx-neg SCs fail to produce normal myofibers and renewed SCs following injury/regeneration. They mention in the text that there was no increased PCD by Caspase staining at 5 DPI. A failure of cell survival during the process of SC activation, proliferation, and cell fate determination (differentiation versus self-renewal) would explain most of the in vivo data. As such, this conclusion would seem to warrant a more detailed analysis in terms of at least one or two other time points and an independent method for detecting dead/dying cells (the in vitro data in Fig. 4F is also based on an assessment of activated Caspase to assess cell death). The in vitro data presented later in Fig. S4G, H do suggest an increase in cell loss during proliferative expansion of Scx-neg SCs. To what extent does cell loss (by whatever mechanism of cell death) explain both the in vivo findings of impaired regeneration and even the in vitro studies showing slower population expansion in the absence of Scx?

      We appreciate these constructive suggestions. Given the number of available control and cKO animals, we were limited to one additional time point, 3 dpi, to assess PCD by TUNEL in vivo. We were again disappointed to find no appreciable PCD at 3 dpi by TUNEL (new Fig. S4I), so no quantification was included. We also repeated the in vitro experiment using purified SCs and monitored PCD by staining for cleaved Caspase-3 with a validated antibody aliquot (positive staining after 6 h of treatment of control and ScxcKO cells with 1 mM staurosporine; included as new Fig. S4J and legend). We were pleased to find more cleaved Caspase-3-stained cells, i.e. more PCD, among ScxcKO SCs than among control SCs at day 4 in culture. We have now replaced the old Fig. 4F with new Fig. 4F and 4G to document PCD. We also provided new text/legend for these new data (p. 10, ln. 2-10; new legend for Fig. 4F and 4G).

      (3) I'm not sure I understand the description of the data or the conclusions in the section titled "Basement membrane-myofiber interaction in control and Scx cKO mice". Is there something specific to the regeneration from Scx-neg myogenic progenitors, or would these findings be expected in any experimental condition in which myogenesis was significantly delayed, with much smaller fibers in the experimental group at 5 DPI?

      We very much appreciate this comment. We agree that there is unlikely anything specific about the regeneration from Scx-negative myogenic progenitors. Unfilled or empty ghost fibers (basement membrane remnant) are expected due to small fiber and poor regeneration in the ScxcKO mice at 5 dpi. We have removed the subtitle and changed the content to an expected consequence rather than something special (p. 8, ln. 19-22).

      (4) The data presented in Fig. 4B showing differences in the purity of SC populations isolated by FACS depending on the reporter used are interesting and important for the field. The authors offer the explanation of exosomal transfer of Tdt from SCs to non-SCs. The data are consistent with this explanation, but no data are presented to support this. Are there any other explanations that the authors have considered and that could be readily tested?

      Thanks for highlighting this phenomenon. We struggled with the SC purity issue for a long time. The project started with the R26RtdT reporter because tdT’s strong fluorescence resists paraformaldehyde fixation, which aids visualization in vivo. Later, when we used the tdT signal to purify SCs by FACS, we found that only 80% of sorted tdT+ cells were Pax7+. We then switched to the R26RYFP reporter, with which we achieved much higher purity (95%) of SCs (Pax7+) by FACS. As such, we also repeated and confirmed many in vivo experimental results using the R26RYFP reporter (included in the manuscript). Due to the low purity of tdT+ SCs by FACS, we discontinued that mouse colony after we confirmed the superior utility of the R26RYFP reporter for SC isolation.

      We sincerely apologize for not being able to conduct further experiments to test this intriguing phenomenon. However, this issue has since been addressed and published by Murach et al., iScience, (2021). As in our experience, they found non-satellite mononuclear cells with tdT fluorescence after TMX treatment when SCs were isolated via FACS. To rule out off-target recombination or a technical artifact from tissue processing, they conducted extensive analyses. They found that the tdT+ mononuclear cells included fibrogenic cells (fibroblasts and FAPs), immune cells/macrophages, and endothelial cells. Additionally, they confirmed the significant potential of extracellular vesicle (EV)-mediated cargo transfer, which facilitates the transfer of full-length tdT transcript from lineage-marked Pax7+ cells to those mononuclear cells. We have modified the text to emphasize and acknowledge their contribution to this important point, and explained the difference between YFP and tdT reporter alleles in more detail (p. 9, ln. 11-17).

      (5) The Cut&Run data of Fig. 6 certainly provide evidence of direct Scx targets, especially since the authors used a novel knock-in strain for analyses. The enrichment of E-box motifs provides support for the 207 intersecting genes (scRNA-seq and Cut&Run) being direct targets. However, the rationale elaborated in the final paragraph of the Results section proposing how 4 of these genes account for the phenotypes on the Scx-neg cells and tissues is just speculation, however reasonable. These are not data, and these considerations would be more appropriate in the Discussion in the absence of any validation studies.

      We agree with this comment and have moved speculations into the Discussion (p. 15, ln. 4-15, and from p. 18, ln. 4 to p. 19, ln. 4).

      Reviewer #2 (Public Review):

      Summary:

      Scx is a well-established marker for tenocytes, but the expression in myogenic-lineage cells was unexplored. In this study, the authors performed lineage-trace and scRNA-seq analyses and demonstrated that Scx is expressed in activated SCs. Further, the authors showed that Scx is essential for muscle regeneration using conditional KO mice and identified the target genes of Scx in myogenic cells, which differ from those of tendons.

      Strengths:

      Sometimes, lineage-trace experiments cause mis-expression and do not reflect the endogenous expression of the target gene. In this study, the authors carefully analyzed the unexpected expression of Scx in myogenic cells using some mouse lines and scRNA-seq data.

      We appreciate the comments and thank her/him for noting the strengths of our manuscript.

      Weaknesses:

      Scx protein expression has not been verified.

      We are aware of this weakness. We had previously performed Western blotting (WB) on cultured SCs from control and ScxcKO mice, but did not detect endogenous Scx protein even in the control. In response to this comment, we have repeated several WB experiments using new lysates from control and ScxcKO SCs and two commercial antibodies: anti-Scx antibody 1 from Abcam (ab58655) and anti-Scx antibody 2 from Invitrogen (PA5-23943). These antibodies have been reported to detect endogenous Scx protein in tendon cells in Spang et al., BMC Musculoskelet Disord (2016) and Bochon et al., Int J Stem Cells (2021). Despite our best efforts, we were not able to detect a reliable Scx band. We have also conducted immunofluorescence using these two antibodies. Still, we failed to detect a difference in staining signal between control and cKO SCs. Lastly, we conducted immunofluorescence using the ScxTy1 myoblasts and did not find the staining signal coinciding with the Ty1 signal (by double staining). We have been very frustrated by not knowing what caused this technical difficulty in our hands. Given that these were negative data, we did not include them. However, we do hope that the combined data from scRNA-seq, ScxCreERT2 lineage-tracing, Tg-ScxGFP expression, and ScxTy1 knock-in are together deemed sufficient to make up for the lack of data on endogenous Scx protein in regenerative myogenic cells.

      Response to Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      p. 8: The text refers to Fig. 3I, but this should be Fig. 3H.

      We apologize for the confusion. Please note that, to keep all 14 dpi data in the same row, we placed Fig. 3I at an unconventional/unexpected position, i.e., next to 3D & 3E, and above 3F-H. We were aware that this unconventional placement could cause confusion, and it did. We have now re-arranged the subfigures (same data content) so that the updated Fig. 3 contains subfigures in the expected and proper spatial order. We double-checked the figure reference in the text (p. 8, ln. 16-17) and the text is correct; the original Fig. 3I should have been at the original Fig. 3H position, and that is now corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) Given that Scx binds to the E-box and regulates gene expression, it is of interest to know the relevance between MyoD and Scx. If possible, the reviewer recommends to include some discussions.

      Thanks for the comment. MyoD1 is a well-known transcription factor regulating myogenesis, whereas Scx is primarily studied in tenocytes and other connective tissues. We agree that our new findings deserve a discussion regarding the relevance between MyoD1 and Scx. We have added a description of their differences in the discussion and two new references (p. 19, ln. 7-17).

      (2) Considering that Scx is a transcriptional factor, it is interesting that Scx-GFP was not detected in the nuclei of regenerated myofibers. Could the subcellular localization of Scx-GFP provide some insights into the function of Scx as a transcription factor during muscle regeneration?

      Tg-ScxGFP is a transgenic line generated by random insertion into the genome (Pryce et al., 2007; cited). The plasmid used for transgenesis was constructed by replacing most of Scx’s first exon with GFP, and including ~9 kb of flanking regulatory sequences. As such, ScxGFP is not a fusion gene; rather, GFP expression is regulated by the Scx promoter and enhancer(s). This GFP reporter lacks a nuclear localization signal (NLS), hence it is mainly detected in the cytoplasm; some nuclear signal is detected, presumably due to GFP’s small size permitting passive diffusion into the nucleus. Thus, the GFP signal is used as a reporter for Scx expression, but GFP subcellular localization does not provide insight into Scx function per se. Conversely, ScxTy1/Ty1 is a knock-in allele created by fusing a triple-Ty1 tag (3XTy1) to the C-terminus of Scx, and we observed by immunofluorescent staining that Ty1 is located in the nucleus. We used the Ty1 epitope to carry out CUT&RUN experiments to gain insight into the function of Scx as a transcription factor.

      (3) Fig1D The number of arrows in the Merge image is not matched with others. In addition, the star mark in the Pax7 image is likely an error.

      Apologies. We have now corrected these errors in the revised Fig.1D.

      (4) FigS1A Is there only one myofiber shown in the dashed line in this image? It is unclear why only this myofiber is surrounded by the dashed line.

      The dashed line encircles a single fiber because it was not visible in the provided image. However, there are 3 fibers in this image. Because we did not immuno-stain for myofibers here, we circled one fiber for illustration. For clarity, we brightened the background (of the entire original images) so the background signals from myofiber boundaries are discernable without outlines.

      (5) FigS1B There was no overlapped DAPI staining in the Myogenin+ cell. DAPI-staining should be present in Myogenin+ cells because myogenin is located in the nucleus.

      Fig. S1B is immuno-staining for MyoD, and we marked one MyoD+DAPI+GFP+ cell/nucleus. Fig. S1C is immuno-staining for Myogenin, and we also marked one (cell/nucleus) that is triple positive.

      (6) The position of the asterisk for the ScxGFP in FigS1D is misaligned. In addition, the position is not matched with Fig1C. Because all myofibers are Scx-positive, it is strange that only one myofiber has an asterisk. The reviewer suggests removing the mark.

      Thank you for pointing out these errors. We have now corrected the misalignment and removed the unnecessary asterisk.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This study presents valuable experimental and numerical results on the motility of a magnetotactic bacterium living in sedimentary environments, particularly in environments of varying magnetic field strengths. The evidence supporting the claims of the authors is solid, although the statistical significance comparing experiments with the numerical work is weak. The study will be of interest to biophysicists interested in bacterial motility. 

      We thank the reviewers and editors for their careful reading and the constructive comments. With respect to the statement about weak statistical significance, we think that this statement mixes two separate issues: the significance of the difference between experiments at 0 and 50 µT, and the comparison of experiments with simulations. We have amended our manuscript to address both points as described below. The difference between the experiments at 0 and 50 µT is indeed significant, and the discrepancy between experiments and simulations can be explained by unavoidable differences in the way we quantify bacterial throughput.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors present experimental and numerical results on the motility of Magnetospirillum gryphiswaldense MSR-1, a magnetotactic bacterium living in sedimentary environments. The authors manufactured microfluidic chips containing three-dimensional obstacles of irregular shape that match the statistical features of the grains observed in the sediment via micro-computer tomography. The bacteria are furthermore subject to an external magnetic field, whose intensity can be varied. The key quantity measured in the experiments is the throughput ratio, defined as the ratio between the number of bacteria that reach the end of the microfluidic channel and the number of bacteria entering it. The main result is that the throughput ratio is non-monotonic and exhibits a maximum at a magnetic field strength comparable with Earth's magnetic field. The authors rationalize the throughput suppression at large magnetic fields by quantifying the number of bacteria trapped in corners between grains.

      Strengths: 

      While magnetotactic bacteria's general motility in bulk has been characterized, we know much less about their dynamics in a realistic setting, such as a disordered porous material. The micro-computer tomography of sediments and their artificial reconstruction in a microfluidic channel is a powerful method that establishes the rigorous methodology of this work. This technique can give access to further characterization of microbial motility. The coupling of experiments and computer simulations lends considerable strength to the claims of the authors, because the model parameters (with one exception) are directly measured in the experiments. 

      Weaknesses: 

      The main weakness of the manuscript pertains to the discussion of the statistical significance of the experimental throughput ratio, especially when comparing results at zero and 50 microtesla. The simulations seem to predict a stronger effect than seen in the experiments. The authors do not address this discrepancy.

      We thank the reviewer for their positive assessment and the detailed constructive remarks. 

      The increase in bacterial throughput between 0 and 50 µT is indeed more pronounced in the simulations than in the experiments, partly because there is considerably more variability in the experimental data. We did two things to address this issue: (1) We performed an additional statistical test addressing the difference between the experimental results at 0 and 50 µT. Indeed, the difference is only weakly significant (in contrast to the difference of either to 500 µT). The increase is, however, consistent with the observation in the absence of obstacles in the channel, where we see a monotonic increase from 0 to 500 µT (Supp. Figure S5). We have added the test results in the caption of Fig. 3. (2) To address the difference between simulations and experiments, we added a section in Methods on how we determine the throughput and a short discussion in the Results section. The key points are that the initial condition is different in simulations and experiments and that the throughput is therefore quantified differently. This difference is due to experimental limitations: we cannot track bacteria through the whole channel, and we wanted to avoid pushing them into the channel with fluid flow so that flow would not affect the results. As a consequence, bacteria continue to enter the IN region of the channel from the inlet during the experiment, while in the simulation, they all start at the beginning of the channel simultaneously. We expect this to mostly affect the case with diffusive transport (B=0).

      Reviewer #2 (Public Review): 

      Summary: 

      simulation study of magnetotactic bacteria in microfluidic channels containing sediment-mimicking obstacles. The obstacles were produced based on micro-computer tomography reconstructions of bacteria-rich sediment samples. The swimming of bacteria through these channels is found experimentally to display the highest throughput for physiological magnetic fields. Computer simulations of active Brownian particles, parameterized based on experimental trajectories are used to quantify the swimming throughput in detail. Similar behavior as in experiments is obtained, but also considerable variability between different channel geometries. Swimming at strong field is impeded by the trapping of bacteria in corners, while at weak fields the direction of motion is almost random. The trapping effect is confirmed in the experiments, as well as the escape of bacteria with reducing field strength. 

      Strengths: 

      This is a very careful and detailed study, which draws its main strength from the fruitful combination of the construction of novel microfluidic devices, their use in motility experiments, and simulations of active Brownian particles adapted to the experiment. Based on their results, the authors hypothesize that magnetotactic bacteria may have evolved to produce magnetic properties that are adapted to the geomagnetic field in order to balance movement and orientation in such crowded environments. They provide strong arguments in favor of such a hypothesis. 

      Weaknesses: 

      Some of the issues touched upon here have been studied also in other articles. It would be good to extend the list of references accordingly and discuss the relation briefly in the text. 

      We thank the reviewer for the constructive comments. We address the point concerning previous literature in the response to the recommendations below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Here follows a list of points the authors should address. 

      (1) Are additional experiments feasible to decrease the statistical noise present in Fig. 3c? At the very least, the authors should discuss the statistical significance of the results at 50 µT vis-a-vis 0 T.

      See our response to Strengths/Weaknesses above

      (2) The experimental setup is not immediately clear. I think that adding a panel from Fig. S1 (or a sketch thereof) would help clarify, especially in relation to the entry zone and end zone. 

      We are not sure what you mean. Fig. 3A already contains exactly such a panel. We have, however, added another supplementary figure that shows an additional detailed view of the setup (Fig. S3). In addition, we revised several figures: we have replaced Fig. S1 with a better version and exchanged the schematic view of the obstacle channel in Fig. 1, removing the additional inlets that were not used in this study (also in Fig. 3A). Instead, we added a comment in Methods explaining their presence. Hopefully this makes the setup clear.

      (3) It should be also stated that there is no external flow imposed on the channel. 

      We have added such a statement in the description of the experiment (in section 2.2, Swimming of magnetotactic bacteria through sediment-mimicking obstacle channels).

      (4) Fig. 3c and Fig. 6c are seemingly showing the same quantity (or closely related ones). The authors should use the same symbol and give an explicit mathematical definition. 

      The two quantities are not exactly the same, as we cannot directly quantify the flux of bacteria through the channel in our experiments. On the one hand, we cannot track bacteria through the whole channel, on the other hand, the initial conditions are not exactly the same as in the simulations. In the simulations all bacteria start at the same time at the entrance to the channel. In the experiments, they enter from the inlet and do so at different times (pushing them in with fluid flow would be possible, but carries the risk of perturbing the results due to induced flow through the channel). We have added a new section in the Methods section that explains this difference and describes the procedure used to obtain the throughput from the experiments in detail. We have also added a corresponding comment in the Result section, where the simulations are compared with the experiments. 
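
      The simulation-side definition described above (all bacteria start at the channel entrance at the same time) can be illustrated with a short sketch. This is a minimal example in Python, not the authors' simulation code. It assumes two-dimensional active Brownian particles in an obstacle-free channel with a reflecting entrance, a field that aligns the swimming direction along the channel axis, and purely illustrative parameter values; the names `align_rate`, `d_rot`, `speed`, `channel_length`, and `t_max` are hypothetical.

```python
import math
import random


def simulate_throughput(n_particles, align_rate, d_rot, speed=1.0,
                        channel_length=20.0, t_max=40.0, dt=0.01, seed=0):
    """Fraction of particles that reach x >= channel_length within t_max.

    Assumptions (illustrative, not from the manuscript):
    - 2D active Brownian particles, no obstacles, no fluid flow;
    - align_rate stands in for (magnetic moment * B / rotational friction)
      and turns the swimming direction toward +x;
    - d_rot is the rotational diffusion coefficient;
    - all particles start at x = 0 at t = 0 with a random orientation.
    """
    rng = random.Random(seed)
    n_steps = int(t_max / dt)
    noise_amplitude = math.sqrt(2.0 * d_rot * dt)
    arrived = 0
    for _ in range(n_particles):
        x = 0.0
        theta = rng.uniform(-math.pi, math.pi)
        for _ in range(n_steps):
            # Euler-Maruyama step: field alignment plus rotational noise.
            theta += (-align_rate * math.sin(theta) * dt
                      + noise_amplitude * rng.gauss(0.0, 1.0))
            x += speed * math.cos(theta) * dt
            if x < 0.0:
                x = 0.0  # reflecting wall at the channel entrance
            if x >= channel_length:
                arrived += 1
                break
    return arrived / n_particles


if __name__ == "__main__":
    for rate in (0.0, 5.0):
        print(rate, simulate_throughput(100, align_rate=rate, d_rot=0.1))
```

      Because this sketch contains no obstacles, it cannot reproduce the drop in throughput at strong fields that is attributed to trapping in corners. It only shows how a simultaneous start at the entrance defines the simulated throughput, in contrast to the experiments, where bacteria keep entering from the inlet.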

      Minor issues: 

      - Figures have different styles that should be unified. For example, the panel labels sometimes have round brackets and sometimes they don't.

      See above

      - Page 6, (muCT) should have the Greek letter mu 

      Thanks, corrected.

      - Fig. 3a is not very clear; see my point 2 above. 

      See above

      Reviewer #2 (Recommendations For The Authors): 

      I have only a few comments and questions, which the authors should address: 

      (1) The observed exponential dependence of decay time on the "well" depth could be related to the exponential density distribution of active particles in a gravitational field, which has been derived previously. Might be interesting to discuss such a possible connection. 

      Thank you for the suggestion. The two cases are indeed somewhat analogous, with behaviors reminiscent of thermal processes with an effective temperature. Such a description is, however, not generally possible (even for sedimentation, only some features are described). We plan to address in future work whether it can be made more quantitative in our case of escape from the corner traps. We have included a short discussion of the analogy in the section on trapping and escape.
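
      The analogy can be written schematically as follows. This is only a sketch of the generic functional forms; the symbols $\lambda$ (sedimentation length), $\Delta U$ (effective depth of the trap), $\tau_0$ (attempt time), and $T_\mathrm{eff}$ (effective temperature) are assumptions introduced here for illustration and are not taken from the manuscript.

```latex
% Exponential density profile of active particles under a constant force
% (sedimentation-type profile with decay length \lambda):
\rho(z) \propto \exp\!\left(-\frac{z}{\lambda}\right)

% Thermal-like (Arrhenius-type) escape time from a trap of depth \Delta U:
\tau_\mathrm{esc} \approx \tau_0 \exp\!\left(\frac{\Delta U}{k_\mathrm{B} T_\mathrm{eff}}\right)
```

      As noted in the response, an effective-temperature description of this kind is not generally valid for active particles, so both expressions should be read as an analogy, not as a derived result.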

      (2) The authors should consider the following relevant references, and discuss them briefly in their manuscript:

      - Sedimentation, trapping, and rectification of dilute bacteria J Tailleur, ME Cates EPL 86, 60002 (2009) 

      - Human spermatozoa migration in microchannels reveals boundary-following navigation P Denissenko, V Kantsler, DJ Smith, J Kirkman-Brown Proc. Natl. Acad. Sci. USA 109, 8007-8010 (2012) 

      - Wall accumulation of self-propelled spheres J Elgeti, G Gompper Europhysics Letters 101, 48003 (2013) 

      - Wall entrapment of peritrichous bacteria: a mesoscale hydrodynamics simulation study SM Mousavi, G Gompper, RG Winkler Soft Matter 16 (20), 4866-4875 (2020) 

      - A Geometric Criterion for the Optimal Spreading of Active Polymers in Porous Media C Kurzthaler, S Mandal, T Bhattacharjee, H Löwen, SS Datta, HA Stone Nat. Commun. 12, 7088 (2021) 

      - Run-to-Tumble Variability Controls the Surface Residence Times of E. coli Bacteria G Junot, T Darnige, A Lindner, VA Martinez, J Arlt, A Dawson, WCK Poon, H Auradou, E Clement Phys. Rev. Lett. 128, 248101 (2022) 

      - Dynamics and phase separation of active Brownian particles on curved surfaces and in porous media P Iyer, RG Winkler, DA Fedosov, G Gompper Phys. Rev. Research 5, 033054 (2023) 

      We agree that there is a lot of literature on these aspects, specifically interaction of self-propelled objects with walls and motion of swimmers through porous media. We have slightly extended our overview of previous literature in the introduction and included most of these references.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1: 

      (1) Their results with human macrophages suggest that there are differences between murine and human macrophages in inflammasome-mediated restriction of STm growth. For example, Thurston et al. showed that in murine macrophages inflammasome activation controls the replication of mutant STm that aberrantly invades the cytosol, but only slightly limits replication of WT STm. In contrast, here the authors found that primed human macrophages rely on caspase-1, gasdermin D and ninjurin-1 to restrict WT STm. I wonder if the priming of the human macrophages in this study could account for the differences in these studies. Along those lines, do the authors see the same results presented in this study in the absence of priming the macrophages with Pam3CSK4? I think that determining whether the control of intracellular STm replication is dependent on priming is very important.

      We thank the Reviewer for their careful attention to our manuscript and for their thoughtful comments. We have addressed this question about the impact of priming by repeating the bacterial intracellular burden assays in unprimed WT and CASP1-/- THP-1 cells. We have added additional figures to the manuscript to address this: Figure 1 – Figure Supplement 3. Under unprimed conditions, CASP1-/- cells still harbored significantly higher bacterial burdens at 6 hpi and a significant fold-increase in bacterial CFUs compared to WT cells. These results suggest that the caspase-1-mediated restriction of intracellular Salmonella replication in human macrophages is independent of priming. 

      (2) Another difference with the Thurston et al. paper is the way that the STm inoculum was prepared - stationary phase bacteria that were opsonized. Could this also account for differences between the two studies rather than differences between murine and human macrophages in inflammasome-dependent control of STm?

      We thank the Reviewer for this excellent suggestion. To address this possibility, we repeated the bacterial intracellular burden assays in WT and CASP1-/- THP-1 cells using stationary phase bacteria. We infected WT and CASP1-/- THP-1 cells with stationary phase Salmonella, and we subsequently assayed for intracellular bacterial burdens. These data have now been added to the manuscript in Figure 1 – Figure Supplement 4. Interestingly, we did not observe any fold-change in the bacterial colony forming units in either the WT or the CASP1-/- THP-1 cells for the stationary phase Salmonella. These data indicate that by 6 hours post-infection, Salmonella do not replicate efficiently in human macrophages unless grown under SPI-1-inducing conditions. Furthermore, these results suggest that differences in how the Salmonella inoculum is prepared may contribute to the discrepancies between our study and previous studies, as noted by the Reviewer. 

      (3) The authors show that the pore-forming proteins GSDMD and Ninj1 contribute to control of STm replication in human macrophages. Is it possible that leakage of gentamicin from the media contributes to this control?

      We thank the Reviewer for their insightful comment. We have addressed this question on the impact of gentamicin by repeating the bacterial intracellular burden assays using a lower concentration of gentamicin in combination with extensively washing the cells with RPMI media to remove the gentamicin. WT and CASP1-/- THP-1 cells were infected with WT Salmonella. Then, at 30 minutes post-infection, cells were treated with 25 μg/ml of gentamicin to kill any extracellular bacteria. At 1 hour post-infection (hpi), the cells were washed a total of five times with fresh RPMI to remove the gentamicin, and then the media was replaced with fresh media containing no gentamicin. In parallel, we also treated cells with 100 μg/ml of gentamicin at 30 minutes post-infection, washed the cells five times with fresh RPMI at 1 hpi to remove the gentamicin, and then replaced the media with fresh media containing 10 μg/ml of gentamicin. These data have now been included in the manuscript as Figure 1 – Figure Supplement 5. We observed similar levels of intracellular bacterial burdens at 1 hpi and 6 hpi and a fold-increase in bacterial colony forming units in CASP1-/- cells compared to WT cells across both gentamicin conditions, suggesting that gentamicin does not contribute to the intracellular control of Salmonella replication in human macrophages. Of note, we also tried repeating the bacterial intracellular burden assays without gentamicin, using only washes to remove extracellular bacteria at 1 hpi; however, under these experimental conditions, we observed high levels of extracellular Salmonella. Therefore, we relied on using a lower concentration of gentamicin to kill extracellular Salmonella in conjunction with extensive washing to remove the gentamicin for the remainder of the infection. 
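
      The fold-increase in colony forming units referred to above can be illustrated with a short calculation. This is a minimal sketch that assumes the fold change is the ratio of the mean CFU at the later time point (6 hpi) to the mean CFU at the earlier time point (1 hpi) across replicate wells; the function name and the replicate counts are hypothetical and are not data from the study.

```python
from statistics import mean


def cfu_fold_change(cfu_early, cfu_late):
    """Fold change in mean intracellular CFU between two time points.

    cfu_early: replicate CFU counts at the early time point (e.g. 1 hpi)
    cfu_late:  replicate CFU counts at the late time point (e.g. 6 hpi)
    """
    early_mean = mean(cfu_early)
    if early_mean <= 0:
        raise ValueError("mean CFU at the early time point must be positive")
    return mean(cfu_late) / early_mean


if __name__ == "__main__":
    # Hypothetical replicate counts, for illustration only.
    print(cfu_fold_change([100, 100], [400, 600]))
```

      A value above 1 indicates net intracellular replication between the two time points; comparing this ratio between genotypes normalizes for differences in initial uptake.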

      (4) One major question that remains to be answered is whether casp-1 plays a direct role in the intracellular localization of STm. If the authors quantify the percentage of vacuolar vs. cytosolic bacteria at early time points in WT and casp-1 KO macrophages, would that be the same in the presence and absence of casp-1? If so, then this would suggest that there is a basal level of bacterial-dependent lysis of the SCV and in WT macrophages the presence of cytosolic PAMPS trigger cell death and bacteria can't replicate in the cytosol. However, in the inflammasome KO macrophages, the host cell remains alive and bacteria can replicate in the cytosol.

      We thank this Reviewer for raising this important point. We have addressed this experimentally by quantifying the percentage of vacuolar vs. cytosolic Salmonella at 2 hpi in WT, NAIP-/-, and CASP1-/- THP-1 cells using a chloroquine (CHQ) resistance assay. These data have now been included in the manuscript in the new Figure 5A. The original subfigures of Figure 5 have consequently been rearranged. We did not observe any significant differences in vacuolar and cytosolic bacterial burdens at this early time point in WT, NAIP-/-, and CASP1-/- THP-1 cells. As noted by the Reviewer, these results suggest that the basal level of bacterial-dependent lysis of the SCV in human macrophages is not dependent on caspase-1 or NAIP.
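      The arithmetic behind a CHQ resistance assay can be sketched as below. This is a hedged illustration, not the authors' analysis: it assumes the common convention that chloroquine accumulates in vacuoles and kills vacuolar bacteria, so CFU recovered from CHQ plus gentamicin wells estimate the cytosolic population, while gentamicin-only wells estimate the total intracellular population. The function name and the numbers are hypothetical.

```python
def cytosolic_fraction(cfu_chq_gent, cfu_gent_only):
    """Estimate the fraction of intracellular bacteria that are cytosolic.

    cfu_chq_gent:  CFU recovered after CHQ + gentamicin (CHQ-resistant, assumed cytosolic)
    cfu_gent_only: CFU recovered after gentamicin alone (assumed total intracellular)
    """
    if cfu_gent_only <= 0:
        raise ValueError("total intracellular CFU must be positive")
    if cfu_chq_gent < 0:
        raise ValueError("CFU counts cannot be negative")
    # Plating noise can push the raw ratio above 1, so cap it at 1.
    return min(cfu_chq_gent / cfu_gent_only, 1.0)


# Hypothetical counts for illustration only (not data from the study).
cytosolic = cytosolic_fraction(cfu_chq_gent=25.0, cfu_gent_only=100.0)
vacuolar = 1.0 - cytosolic
print(f"cytosolic: {cytosolic:.0%}, vacuolar: {vacuolar:.0%}")
```

      Comparing this fraction across WT, NAIP-/-, and CASP1-/- cells at the same time point is the kind of comparison the response describes.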

      Reviewer #3: 

      (1) The main weaknesses of the study are the inherent limitations of tissue culture models. For example, to study interaction of Salmonella with host cells in vitro, it is necessary to kill extracellular bacteria using gentamicin. However, since Salmonella-induced macrophage cell death damages the cytosolic membrane, gentamicin can reach intracellular bacteria and contribute to changes in CFU observed in tissue culture models (major point 1). This can result in tissue culture "artefacts" (i.e., observations/conclusions that cannot be recapitulated in vivo). For example, intracellular replication of Salmonella in murine macrophages requires T3SS-2 in vitro, but T3SS-2 is dispensable for replication in macrophages of the spleen in vivo (Grant et al., 2012).  

      We thank the Reviewer for their helpful comments and insightful suggestions. We have addressed some of the concerns about gentamicin in our response to Reviewer #1 above. To address the Reviewer’s concerns further, we have included language to acknowledge the limitations of our study based on the artefacts of tissue culture models in our Discussion section: “In this study, we utilized tissue culture models to examine intracellular Salmonella replication in human macrophages. These in vitro systems allow for precise control of experimental conditions and, therefore, serve as powerful tools to interrogate the molecular mechanisms underlying inflammasome responses and Salmonella replication in both immortalized and primary human cells. Still, there are limitations of tissue culture models, as they lack the inherent complexity of tissues and organs in vivo. To assess whether our findings reflect Salmonella dynamics in the mammalian host, it will be important to complement our studies and extend the implications of our work using approaches that model more complex systems, such as organoids or organ explant models co-cultured with immune cells, and in vivo techniques, such as humanized mouse models.”

      (2) In Figure 1: are increased CFU in WT vs CASP1-deficient THP-1 cells due to Caspase 1 restricting intracellular replication or due to Caspase-1 causing pore formation to allow gentamicin to enter the cytosol thereby restricting bacterial replication? The same question arises about Caspase-4 in Figure 2, where differences in CFU are observed only at 24h when differences in cell death also become apparent. The idea that gentamicin entering the cytosol through pores is responsible for controlling intracellular Salmonella replication is also consistent with the finding that GSDMD-mediated pore formation is required for restricting intracellular Salmonella replication (Figure 3). Similarly, the finding that inflammasome responses primarily control Salmonella replication in the cytosol could be explained by an intact SCV membrane protecting Salmonella from gentamicin (Figure 5). 

      We thank the Reviewer for highlighting this important point regarding gentamicin.

      We have addressed this question in our response above to Reviewer #1 and in Figure 1 – Figure Supplement 5. We observed caspase-1-mediated restriction of Salmonella in human macrophages even when cells were treated with a lower concentration of gentamicin (25 μg/ml) for 30 minutes and then extensively washed with RPMI media to remove any gentamicin for the remainder of the infection. These data suggest that gentamicin is likely not responsible for controlling intracellular Salmonella in human macrophages.

    1. Author response:

      We thank all three reviewers and the editors for their detailed comments on our manuscript.  The two main themes of this feedback concern the paper’s generality and its presentation.  Reviewers #2 and #3 raise questions about how the discrepancies in fitness statistics we report will be realized across organisms, environments, and in models with interactions beyond resource competition (e.g., toxicity or cross-feeding).  All reviewers and the editors have also expressed the need for the presentation to be improved, including a broader introduction to the concept of fitness (Reviewer #1), a clearer explanation of our model (Reviewer #1), better explanations of how quantifying fitness answers key biological questions (Reviewer #3), and improvements to the most technical sections to ensure accessibility to experimentalists (Reviewer #3).

      In light of these comments, we wish to clarify that the goal of this paper is to provide a proof-of-principle for how different choices in quantifying fitness can lead to different analysis outcomes.  Since the focus of this paper is on the theoretical concepts, we focus on a few example data sets and a simple model to demonstrate the existence of these discrepancies.  While other organisms and environments, especially with more complex growth dynamics and interactions, could certainly have additional or different discrepancies in fitness statistics, we believe the simplicity of our approach is valuable because it demonstrates that even basic features of microbial growth (common across systems) with realistic parameter values are sufficient to cause significant differences in fitness depending on these quantification choices.  We agree with the reviewers that a systematic documentation of how these fitness discrepancies are empirically realized is important, but we believe that question is best explored in separate future works that can focus fully on this empirical rather than theoretical question.

      We plan to revise the manuscript in several ways, following the suggestions of the three reviewers and the editor.  First, we will better articulate the main goal and conclusions of this manuscript, especially its generality and limitations.  Second, we will work to streamline and clarify several points in the main text identified by the reviewers to make it more accessible and useful to a broader audience, especially experimentalists who routinely measure fitness in their work.  We are grateful to the reviewers and the editor for their time and effort in assessing the manuscript, and we look forward to providing an updated version that addresses these concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      Casas-Tinto et al. present convincing data that injury of the adult Drosophila CNS triggers transdifferentiation of glial cells and even the generation of neurons from glial cells. This observation opens up the possibility of getting a handle on the molecular basis of neuronal and glial generation in the vertebrate CNS after traumatic injury caused by stroke or crush injury. The authors use an array of sophisticated tools to follow the development of glial cells at the injury site in very young and mature adults. The results in mature adults reveal a remarkable plasticity in the fly CNS and dispel the notion that repair after injury may be only possible in nerve cords which are still developing. The observation of so-called VC cells which do not express the glial marker repo could point to the generation of neurons by former glial cells.

      Conclusion:

      The authors present an interesting story that is technically sound and could form the basis for an in-depth analysis of the molecular mechanism driving repair after brain injury in Drosophila and vertebrates.

      Strengths:

      The evidence for transdifferentiation of glial cells is convincing. In addition, the injury to the adult CNS shows an inherent plasticity of the mature ventral nerve cord which is unexpected.

      Weaknesses:

      Traumatic brain injury in Drosophila has been previously reported to trigger mitosis of glial cells and generation of neural stem cells in the larval CNS and the adult brain hemispheres. Therefore this report adds to but does not significantly change our current understanding. The origin and identity of VC cells is unclear.

      The Reviewer correctly points out that traumatic brain injury has been reported to trigger the generation of neural stem cells. However, according to previous reports, those cells were quiescent Dpn+ neuroblasts. We now report that already differentiated adult neuropil glia transdifferentiate into neurons, which is a new mechanism not previously reported.

      We agree with the reviewer regarding the identity of VC neurons, although according to the results of the G-TRACE experiments their origin is clear: they originate from neuropil glia (i.e., astrocyte-like glia and ensheathing glia). We have used a battery of antibodies previously reported to identify specific subtypes of neurons to identify these newly generated neurons (Figure S1). We did not find any neuronal marker other than Elav that co-localizes with VC cells.

      Reviewer #2:

      Summary:

      Casas-Tinto et al. provide new insight into glial plasticity using a crush injury paradigm in the ventral nerve cord (VNC) of adult Drosophila. The authors find that both astrocyte-like glia (ALG) and ensheathing glia (EG) divide under homeostatic conditions in the adult VNC and identify ALG as the glial population that specifically ramps up proliferation in response to injury, whereas the number of EGs decreases following the insult. Using lineage-tracing tools, the authors interestingly observe the interconversion of glial subtypes, especially of EGs into ALGs, which occurs independent of injury and is dependent on the availability of the transcription factor Prospero in EGs, adding to the plasticity observed in the system. Finally, when tracing the progeny of differentiated glia, Casas-Tinto and colleagues detect cells of neuronal identity and provide evidence that such glia-derived neurogenesis is specifically favored following ventral nerve cord injury, which puts forward a remarkable way in which glia can respond to neuronal damage.

      Numerous experiments have been carried out in 7-day-old flies, showing that the observed plasticity is not due to residual developmental remodeling or a still immature VNC.

      By elegantly combining different genetic tools, the authors show glial divisions with mitotic-dependent tracing and find that the number of generated glia is refined by apoptosis later on.

      The work identifies Prospero in glia as an important coordinator of glial cell fate, from development to the adult context, which draws further attention to the upstream regulatory mechanisms.

      We express our gratitude to the reviewer for their keen appreciation of our efforts and their enthusiasm for the outcomes of this research.

      Weaknesses:

      Although the authors do use a variety of methods to show glial proliferation, the EdU data (Figure 1B) could be more informative by displaying images of non-injured animals and providing quantifications or the mention of these numbers based on results previously acquired in the system.

      We appreciate the Reviewer’s comment. We believe that adding images of non-injured animals would not add new information, as we already quantified the increase in glial proliferation upon injury in Losada-Perez et al. 2021. Moreover, the purpose of this experiment was to determine whether the dividing cells were astrocyte-like glia, rather than to count the dividing cells. Comparing independent experiments can be tricky, but if we compare the quantifications of G2-M glia (repo>fly-Fucci) in Losada-Perez et al. 2021 (Fig. 1C) with the quantifications of G2-M neuropil glia in this work (Fig. 1C), the numbers are comparable.

      The experiments relying on the FUCCI cell cycle reporter suggested considerable baseline proliferation for EGs and ALGs, but when using an independent method (Twin Spot MARCM), mitotic marking was only detected for ALGs. This discrepancy could be addressed by assessing the co-localization of the different glia subsets using the identified driver lines with mitotic markers such as PH3.

      In our understanding, this discrepancy could be explained by the magnitude of proliferation. The lower proliferation rate of EG (as indicated by the fly-FUCCI experiments), combined with the incomplete efficiency of MARCM clone induction, considerably reduces the chances of finding EG MARCM clones. PH3 is a mitotic marker, but it is also found in apoptotic cells (Kim and Park 2012. DOI: 10.1371/journal.pone.0044307). Nevertheless, we stained injured VNCs with anti-PH3 and found ALG cells positive for PH3 (Author response image 1).

      Author response image 1.

       

      The data in Figure 1C would be more convincing in combination with images of the FUCCI Reporter as it can provide further information on the location and proportion of glia that enter the cell cycle versus the fraction that remains quiescent.

      We added a Figure 1 V2 (version 2) with the suggested images (1-C’).

      The analyses of inter-glia conversion in Figure 3 are complicated by the fact that Prospero RNAi is both used to suppress EG - to ALG conversion and as a marker to establish ALG nature. Clarifications if the GFP+ cells still expressed Pros or were classified as NP-like GFP cells are required here.

      As described in the text, Pros is a marker for ALG, and the results suggest that Prospero expression is required for the EG to ALG transition. We clarified these concepts in the text accordingly. In Figure 3 we show images of NP-like cells originating from EG that are Prospero+, which supports the transdifferentiation from EG to ALG.

      The conclusion that ALG and EG glial cells can give rise to cells of neuronal lineage is based on glial lineage information (GFP+ cells from glial G-trace) and staining for the neuronal marker Elav. The use of other neuronal markers apart from Elav or morphological features would provide a more compelling case that GFP+ cells are mature neurons.

      We completely agree with the reviewer's observation regarding the identity of VC neurons. We have used a battery of antibodies previously reported to identify specific subtypes of neurons to identify these newly generated neurons (Figure S1). We did not find any neuronal marker other than Elav that co-localizes with VC cells.

      Although the text discusses in which contexts, glial plasticity is observed or increased upon injury, the figures are less clear regarding this aspect. A more systematic comparison of injured VNCs versus homeostatic conditions, combined with clear labelling of the injury area would facilitate the understanding of the panels.

      We appreciate the Reviewer’s observation. We have carefully checked all figures and labelled them as “Injured” or “Not Injured”. We added a Figure 2-V2 and a Figure 4-V2.

      Context/Discussion

      The study finds that glia in the ventral cord of flies have latent neurogenic potential. Such observations have not been made regarding glia in the fly brain, where injury is reported to drive glial divisions or the proliferation of undifferentiated progenitor cells with neurogenic potential.

      Discussing this different strategy for cell replacement adopted by glia in the VNC and pointing out differences to other modes seems fascinating. Highlighting differences in the reactiveness of glia in the VNC compared to the brain also seems highly relevant as they may point to different properties to repair damage.

      Based on the assays employed, the study points to a significant amount of glial "identity" changes or interconversions, which is surprising under homeostatic conditions. The significance of this "baseline" plasticity remains undiscussed, although glia unarguably show extensive adaptations during nervous system development.

      It would be interesting to know if the "interconversion" of glia is determined by the needs in the tissue or would shift in the context of selective ablation/suppression of a glial type.

      We deeply appreciate the Reviewer’s enthusiasm for this subject; it is indeed fascinating. We kept the discussion brief to fit the eLife Short Report requirements, but the specific conditions that trigger glial interconversion are of great interest to us. Compromising EG or ALG viability and evaluating the behaviour of glial cells is of great interest for developmental biology and regeneration, but the precise scenario in which to perform these experiments is not well defined. In this report, we aim to reproduce an injury in the Drosophila brain, and this model should serve to analyze cellular behaviours. A scenario in which we deplete one specific subpopulation of glial cells is conceptually attractive, but far beyond the scope of this report.

      Reviewer #3:

      In this manuscript, Casas-Tintó et al. explore the role of glial cells in the response to a neurodegenerative injury in the adult brain. They used Drosophila melanogaster as a model organism and found that glial cells are able to generate new neurons through the mechanism of transdifferentiation in response to injury.

      This paper provides a new mechanism in regeneration and gives an understanding of the role of glial cells in the process.

    1. Author response:

      Reviewer #1 (Public review):

      Li et al. investigate Ca2+ signaling in T. gondii and argue that Ca2+ tunnels through the ER to other organelles to fuel multiple aspects of T. gondii biology. They focus in particular on TgSERCA as the presumed primary mechanism for ER Ca2+ filling. However, when TgSERCA was knocked out, a Ca2+ release in response to TG was still present.

      Note that we did not knock out SERCA: it is an essential gene, so it would not be possible to isolate parasites that do not express it. We created conditional mutants that downregulate the expression of SERCA, and some activity is still present in the mutant after 24 h of ATc treatment.

      Overall, the Ca2+ signaling data do not support the conclusion of Ca2+ tunneling through the ER to other organelles; in fact, they argue for direct Ca2+ uptake from the cytosol.

      The authors show EM membrane contact sites between the ER and other organelles, so Ca2+ released by the ER could presumably be taken up by other organelles but that is not ER Ca2+ tunneling.

      They clearly show that SERCA is required for T. gondii function.

      Overall, the data presented do not fully support the conclusions reached.

      We agree that the data does not support Ca2+ tunneling as defined and characterized in mammalian cells. In response to this comment, we modified the title and the text accordingly.

      However, we think that the study shows far more than just the role of SERCA in T. gondii functions. We argue that the study shows that the ER (through the activity of the SERCA pump) sequesters and re-distributes calcium to other organelles following influx through the PM. The experiments show that the ER is able to take up calcium from the cytosol through SERCA activity as it enters the parasite, and this activity is important for the transition of the parasite between various extracellular calcium exposures. We believe that the role of the ER in redistributing calcium following exposure to physiological levels of extracellular calcium is demonstrated in the experiments shown in Figs 1H-I, 4G-H, and 5G-K. There are no previous T. gondii studies that address the question of how intracellular stores are filled with calcium; these stores are essential for the continuation of the lytic cycle, and therefore for the parasitism of T. gondii.

      Data argue for direct Ca2+ uptake from the cytosol

      The ER most likely takes up calcium from the cytosol following its entry through the PM and redistributes it to the other organelles. We will delete the word “tunneling” and replace it with “transfer” and “re-distribution”, as these terms better represent our results.

      What we interpret as re-distribution is shown in Figures 1H and 1I, in which the calcium release triggered by GPN and nigericin is enhanced after TG addition. Of note, there is no experimental evidence that supports the regulation of calcium entry by store depletion (PMID: 24867952), and we do not think that the enhanced response is due to calcium entry.

      Figures 4G and 4H show that knocking down SERCA significantly reduces the response to GPN. Fig 5I shows that mitochondrial calcium uptake is reduced after the addition of GPN in the knockdown mutant. Fig 2B shows that SERCA can take up calcium at 55 nM calcium, while mitochondrial uptake needs higher concentrations (Fig 5B-C). However, higher calcium concentrations could be reached at the microdomains formed around MCS between the ER and mitochondrion. Figure 5E shows that the mitochondrion is not responsive to an increase of cytosolic calcium. This is also shown for the apicoplast in Fig. 7E and F of the Li et al., Nat Commun 2021 paper.

      Reviewer #2 (Public review):

      The role of the endoplasmic reticulum (ER) calcium pump TgSERCA in sequestering and redistributing calcium to other intracellular organelles following influx at the plasma membrane.

      The fact that T. gondii transitions through life cycle stages within and exterior to the host cells, with very different exposures to calcium, adds significance to the current investigation of the role of the ER in redistributing calcium following exposure to physiological levels of extracellular calcium.

      They also use a conditional knockout of TgSERCA to investigate its role in ER calcium store-filling and the ability of other subcellular organelles to sequester and release calcium. These knockout experiments provide important evidence that ER calcium uptake plays a significant role in maintaining the filling state of other intracellular compartments.

      We thank the reviewer.

      While it is clearly demonstrated, and not surprising, that the addition of 1.8 mM extracellular CaCl2 to intact T. gondii parasites preincubated with EGTA leads to an increase in cytosolic calcium and subsequent enhanced loading of the ER and other intracellular compartments, there is a caveat to the quantitation of these increases in calcium loading. The authors rely on the amplitude of cytosolic free calcium increases in response to thapsigargin, GPN, nigericin, and CCCP, all measured with fura2. This likely overestimates the changes in calcium pool sizes because the buffering of free calcium in the cytosol is nonlinear, and fura2 (with a Kd of 100-200 nM) is a substantial, if not predominant, cytosolic calcium buffer. Indeed, the increases in signal noise at higher cytosolic calcium levels (e.g. peak calcium in Figure 1C) are indicative of fura2 ratio calculations approaching saturation of the indicator dye.

      We agree about the limitations of using Fura2, but according to the literature (PMID: 3838314, fig. 3) Fura2 is suitable for measurements between 100 nM and 1 mM calcium. The responses in our experiments were within its linear range, and the experiments with the SERCA mutant and mitochondrial GCaMPs support the conclusions of our work.

      We agree that the experiment shown in Fig 1C shows a response close to the limit of the linear range of Fura2 and we can provide a more representative trace in the final article. We can include new quantifications and comparisons.
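      The saturation concern raised by the reviewer can be illustrated with the standard ratiometric calibration equation for Fura2, [Ca2+] = Kd × β × (R − Rmin) / (Rmax − R). The sketch below is illustrative only: the calibration constants (Rmin, Rmax, Kd, β) are hypothetical placeholders, not values from the study, and would have to come from an actual calibration of the instrument and dye.

```python
def fura2_calcium_nm(ratio, r_min, r_max, kd_nm, beta):
    """Convert a Fura2 340/380 ratio to free calcium (nM) with the
    standard ratiometric calibration equation.

    ratio: measured fluorescence ratio R
    r_min: ratio at zero calcium
    r_max: ratio at saturating calcium
    kd_nm: dissociation constant of the dye, in nM
    beta:  380 nm fluorescence of free dye divided by that of bound dye
    """
    if not (r_min < ratio < r_max):
        raise ValueError("ratio must lie strictly between r_min and r_max")
    return kd_nm * beta * (ratio - r_min) / (r_max - ratio)


# Hypothetical calibration constants for illustration only.
CAL = dict(r_min=0.5, r_max=10.5, kd_nm=200.0, beta=5.0)

# The same 0.5 step in ratio maps to very different calcium changes
# depending on how close the signal is to r_max.
mid_step = fura2_calcium_nm(6.0, **CAL) - fura2_calcium_nm(5.5, **CAL)
high_step = fura2_calcium_nm(10.0, **CAL) - fura2_calcium_nm(9.5, **CAL)
print(f"step near mid-range: {mid_step:.0f} nM")
print(f"step near saturation: {high_step:.0f} nM")
```

      Because the denominator (Rmax − R) shrinks as the ratio approaches Rmax, small fluctuations in the measured ratio translate into large swings in the computed concentration, which is consistent with the increased noise the reviewer notes at high cytosolic calcium.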

      Another caveat, not addressed, is that loading of fura2/AM can result in compartmentalized fura2, which might modify free calcium levels and calcium storage capacity in intracellular organelles.

      We are aware of this issue, and because of it we have modified our protocol to minimize compartmentalization. We load cells for 26 min at room temperature, keep them on ice, and do not use them for longer than 2-3 hours, because beyond that we do see evidence of compartmentalization. One indication of compartmentalization is an increase in the resting calcium concentration.

      The finding that the SERCA inhibitor cyclopiazonic acid (CPA) only mobilizes a fraction of the thapsigargin-sensitive calcium stores in T. gondii coincides with previously published work in another apicomplexan parasite, P. falciparum, showing that thapsigargin mobilizes calcium from both CPA-sensitive and CPA-insensitive calcium pools (Borges-Pereira et al., 2020, DOI: 10.1074/jbc.RA120.014906). It would be valuable to determine whether this reflects the off-target effects of thapsigargin or the differential sensitivity of TgSERCA to the two inhibitors.

      This is an interesting observation, and we will discuss the result considering the Plasmodium study and include the citation. We will add inhibition curves using the MagFluo protocol and compare CPA and TG.

      Figure S1 suggests differential sensitivity, and it shows that thapsigargin mobilizes calcium from both CPA-sensitive and CPA-insensitive calcium pools in T. gondii. Also important is that we used 1 µM TG as we are aware that TG has shown off-target effects at higher concentrations. 

      The authors interpret the residual calcium mobilization response to Zaprinast observed after ATc knockdown of TgSERCA (Figures 4E, 4F) as indicative of a target calcium pool in addition to the ER. While this may well be correct, it appears from the description of this experiment that it was carried out using the same conditions as Figure 4A where TgSERCA activity was only reduced by about 50%.

      We partially agree. As pointed out by the reviewer, knockdown of TgSERCA by only 50% means that the ER could still be targeted by Zaprinast, and that there is no evidence of another target calcium pool. The MagFluo4 experiment shows that there is SERCA activity after 24 h of ATc treatment (although we are aware that MagFluo4 fluorescence is not linear with calcium). However, when adding Zaprinast after TG, we see a significant release of calcium in both the wild type and the conditional knockdown. Because of this result, we proposed that there could be a large neutral calcium pool other than the one mobilized by TG. We will address these possibilities in the discussion and interpretation of the result.

      The data in Figures 4A vs 4G and Figures 4B vs 4H indicate that the size of the response to GPN is similar to that with thapsigargin in both the presence and absence of extracellular calcium. This raises the question of whether GPN is only releasing calcium from acidic compartments or whether it acts on the ER calcium stores, as previously suggested by Atakpa et al. 2019 DOI: 10.1242/jcs.223883. Nonetheless, Figure 1H shows that there is a robust calcium response to GPN after the addition of thapsigargin.

      The results of the experiments do not exclude the possibility that GPN can also mobilize some calcium from the ER in addition to acidic organelles. However, we do not have any evidence that GPN mobilizes calcium from the ER either. Based on our unpublished work, we think GPN mainly releases calcium from the PLVAC. We will include the mentioned citation and discuss the result considering the possibility that GPN may be acting on the ER.

      An important advance in the current work is the use of state-of-the-art approaches with targeted genetically encoded calcium indicators (GECIs) to monitor calcium in important subcellular compartments. The authors have previously done this with the apicoplast, but now add the mitochondria to their repertoire. Despite the absence of a canonical mitochondrial calcium uniporter (MCU) in the Toxoplasma genome, the authors demonstrate the ability of T. gondii mitochondria to accumulate calcium, albeit at high calcium concentrations. Although the calcium concentrations here are higher than needed for mammalian mitochondrial calcium uptake, there too calcium uptake requires calcium levels higher than those typically attained in the bulk cytosolic compartment. And just like in mammalian mitochondria, the current work shows that ER calcium release can elicit mitochondrial calcium loading even when other sources of elevated cytosolic calcium are ineffective, suggesting a role for ER-mitochondrial membrane contact sites. With these new tools in hand, it will be of great value to elucidate the bioenergetics and transport pathways associated with mitochondrial calcium accumulation in T. gondii.

      We thank this reviewer for his/her positive comment. Studies of the bioenergetics and transport pathways associated with mitochondrial calcium accumulation are part of our future plans.

      The current studies of calcium pools and their interactions with the ER and dependence on SERCA activity in T. gondii are complemented by super-resolution microscopy and electron microscopy that do indeed demonstrate the presence of close appositions between the ER and other organelles (see also videos). Thus, the work presented provides good evidence for the ER acting as the orchestrating organelle delivering calcium to other subcellular compartments through contact sites in T. gondii, as has become increasingly clear from work in other organisms.

      Thank you

      Reviewer #3 (Public review):

      This manuscript describes an investigation of how intracellular calcium stores are regulated and provides evidence that is in line with the role of the SERCA-Ca2+-ATPase in this important homeostasis pathway. Calcium uptake by mitochondria is further investigated and the authors suggest that ER-mitochondria membrane contact sites may be involved in mediating this, as demonstrated in other organisms.

      The significance of the findings is in shedding light on key elements within the mechanism of calcium storage and regulation/homeostasis in the medically important parasite Toxoplasma gondii whose ability to infect and cause disease critically relies on calcium signalling. An important strength is that despite its importance, calcium homeostasis in Toxoplasma is understudied and not well understood.

      We agree with the reviewer. Thank you

      A difficulty in the field, and a weakness of the work, is that following calcium in the cell is technically challenging and thus requires reliance on artificial conditions. In this context, the main weakness of the manuscript is the extrapolation of data. The language used could be more careful, especially considering that the way to measure the ER calcium is highly artificial - for example utilising permeabilization and over-loading the experiment with calcium. Measures are also indirect - for example, when the response to ionomycin treatment was not fully in line with the suggested model the authors hypothesise that the result is likely affected by other storage, but there is no direct support for that.

      The MagFluo protocol has been widely used in mammalian cells, DT40 cells and other cells for the characterization of the IP3 receptor response to IP3. We will include and discuss more citations in the revised article. The scheme at the top of the figure shows the protocol used. There is no overloading with calcium because the cells are permeabilized and the concentrations of calcium used are physiological: all experiments were performed at 220 nM calcium, which is within the cytosolic levels tolerated by cells. The experiment was done with permeabilized cells because permeabilization allows the indicator to become diluted, allows the substrate MgATP to reach the membrane of the ER, and allows exposure to precise concentrations of calcium. MagFluo4 loading is intended for its compartmentalization to all intracellular compartments, and the uptake stimulated by MgATP occurs exclusively in the compartment occupied by SERCA. Ionomycin (IO) is an ionophore that causes calcium release from other stores in addition to the ER, and it is expected that it will result in a larger release. We must clarify that the experiment shown in Fig. 2 was done to characterize the activity of SERCA and was not aimed at the characterization of the role of SERCA in the parasite. We will explain this result better in the revised version of the article.

      Below we provide some suggestions to improve controls, however, even with those included, we would still be in favour of revising the language and trying to avoid making strong and definitive conclusions. For example, in the discussion perhaps replace "showed" with "provide evidence that are consistent with..."; replace or remove words like "efficiently" and "impressive"; revise the definitive language used in the last few lines of the abstract (lines 13-17); etc. Importantly we recommend reconsidering whether the data is sufficiently direct and unambiguous to justify the model proposed in Figure 7 (we are in favour of removing this figure at this early point of our understanding of the calcium dynamic between organelles in Toxoplasma).

      We thank the reviewer for the suggestions and will modify the language as suggested.

      Fig. 7 is only a model and, like all models, could be incorrect. However, considering this reviewer’s criticism, we will replace the model with a simpler one that is less speculative.

      Another important weakness is poor referencing of previous work in the field. Lines 248-250 read almost as if the authors originally hypothesised the idea that calcium is shuttled between ER and mitochondria via membrane contact sites (MCS) - but there is extensive literature on other eukaryotes which should be first cited and discussed in this context. Likewise, the discussion of MCS in Toxoplasma does not include the body of work already published on this parasite by several groups. It is informative to discuss observations in light of what is already known.

      We added a citation following the sentence mentioned by the reviewer in lines 248-250 (corrected preprint) and will include more in the revised article. We cite several pertinent articles that describe MCS in Toxoplasma (lines 378-380, very few actually). We will make sure not to miss any new articles that could have been recently published. Note that our work is not about describing the presence of MCSs. We are showing transfer of calcium between the ER and mitochondria and we present evidence that supports that it happens through MCSs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      - Summary: 

      Recordings were made from the dentate nucleus of two monkeys during a decision-making task. Correlates of stimulus position and stimulus information were found to varying degrees in the neuronal activities. 

      We agree with this summary.

      - Strengths: 

      A difficult decision-making task was examined in two monkeys.

      We agree with this statement.

      - Weaknesses: 

      One of the monkeys did not fully learn the task. The manuscript lacked a coherent hypothesis to be tested, and no attempt was made to consider the possibility that this part of the brain may have little to do with the task that was being studied. 

      We understand the reviewer's concern. It is correct that one of the monkeys (Mi) did not perform at a high level, but it should be noted that both monkeys learned significantly above chance level. Therefore, we would argue that both monkeys in fact did learn the task but Mi’s performance was suboptimal. This difference in the performance levels gave us a rare opportunity to dive deeper into the reasons why some animals perform better than others, and we show that Mi (the lower performing monkey) paid more attention to the outcome of the previous trial; this is evident from our behavioural and decoding models.

      We tested the overall hypothesis that neurons of the dentate nucleus can dynamically modulate their activity during a visual attention task, comprising not only sensorimotor but also cognitive attentional components. Many neurons in the dentate are multimodal (Figure 3C-D), as had been theorized. One of the specific hypotheses that we tested is that the dentate cells can be direction-selective for both the sensorimotor and cognitive component. Given that many of the recorded cells showed direction-selectivity in their firing rate modulation for gap directions and/or stimulus directions, we provide strong evidence that this hypothesis is correct. We have now spelled out this hypothesis more explicitly in the introduction of the revised version. We now also explain better why we tested this specific hypothesis. Indeed, earlier studies in primates such as those by Herzfeld and colleagues (2018, Nat Neurosci) and van Es and colleagues (2019, Curr Biol) have indicated that direction-selectivity of cerebellar activity may occur in various sensorimotor domains.

      We also appreciate the comment of this Reviewer that in our original submission we did not show our attempt to consider the possibility that this part of the brain may have little to do with the task that was being studied. We in fact did consider this possibility in that we successfully injected 3 ml of muscimol (5 μg/ml, Sigma Aldrich) into the dentate nucleus in vivo in one of the monkeys (Mo). This application resulted in a reduction of more than 10% in correct responses of the covert attention task after 45 minutes, whereas the performance remained the same following saline injections. Unfortunately, due to the timing of the experiments and Covid19-related laboratory restrictions we were unable to perform these experiments in the other monkey or repeat them in Mo. We aim to replicate this in future experiments and publish it when we have full datasets of at least two monkeys available. For this paper we have prioritized our tracing experiments, highlighting the connections of the dentate nucleus with attention related areas in brainstem and cortex in both monkeys, following perfusion.

      - Perhaps the large differences in performance between the two subjects can be used as a way to interpret the neural data's relationship to behavior, as it provided a source of variance. This is what we would hypothesize if we believed that this area of the brain is playing a significant role in the task. If one animal learns much more poorly, and this region of the brain is important for that behavior, then shouldn't there be clear, interpretable differences in the neural data? 

      We thank the Reviewer for this comment. We have added a new Supplementary Figure 2, in which we present the data for both monkeys separately in the revised manuscript. Comparing the two datasets however, we see more commonalities related to the significant learning in both monkeys than differences that might be related to their different levels of learning. We have therefore decided to show the different datasets transparently in the new Supplementary Figure 2, but to stay on the conservative side in our interpretations.

      - How should we look for these differences? A number of recent papers in mice have uncovered a large body of data showing that during the deliberation period, when the animal is interpreting a sensory stimulus (often using the whisker system), there is ramping activity in a principal component space among neurons that contribute to the decision. This ramping activity is present (in the PCA space) in the motor areas of the cortex, as well as in the medial and lateral cerebellar nuclei. Perhaps a similar computational approach would benefit the current manuscript. 

      We also appreciate this point. We have done the principal component analysis accordingly, and we indeed do find the ramping activity in several components of the dentate activity of both monkeys (Mi and Mo). We have now added a new Supplementary Figure 3 with the first three components of both correct and incorrect trials for Mi and Mo, highlighting their potential contribution.
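      As an illustration of the kind of analysis described above, the following is a minimal sketch, not the authors' actual pipeline, of how a leading principal component and its time course could be extracted from trial-averaged firing rates. The function name, the toy firing rates, and the use of power iteration in place of a library PCA routine are all assumptions made for this example.

```python
import math


def first_pc_timecourse(rates, n_iter=200):
    """Return (loadings, scores) of the first principal component.

    rates: list of neurons, each a list of trial-averaged firing rates
           per time bin (hypothetical input layout).
    loadings: one weight per neuron; scores: PC1 value per time bin.
    """
    n_neurons = len(rates)
    n_bins = len(rates[0])
    # Center each neuron's rate around its own mean across time bins.
    centered = [[r - sum(row) / n_bins for r in row] for row in rates]
    # Neuron-by-neuron covariance matrix across time bins.
    cov = [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (n_bins - 1)
            for j in range(n_neurons)
        ]
        for i in range(n_neurons)
    ]
    # Power iteration for the dominant eigenvector. The uniform start
    # vector is an assumption; it fails if it is orthogonal to PC1.
    v = [1.0 / math.sqrt(n_neurons)] * n_neurons
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(n_neurons)) for i in range(n_neurons)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Sign convention: make the largest-magnitude loading positive.
    if max(v, key=abs) < 0:
        v = [-x for x in v]
    scores = [
        sum(v[i] * centered[i][t] for i in range(n_neurons)) for t in range(n_bins)
    ]
    return v, scores


# Toy data: three hypothetical neurons that ramp at different rates.
ramp = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_rates = [
    [2.0 * r + 5.0 for r in ramp],
    [1.0 * r + 1.0 for r in ramp],
    [0.5 * r for r in ramp],
]
loadings, scores = first_pc_timecourse(toy_rates)
```

      With ramping toy data like this, the PC1 time course rises monotonically across bins; in real recordings the same projection could be computed separately for correct and incorrect trials and compared.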

      - What is the hypothesis that is being tested? That is, what do you think might be the function of this region of the cerebellum in this task? It seems to me that we are not entirely in the dark, as previous literature on mice decision-making tasks has produced a reasonable framework: the deliberation period coincides with ramping activity in many regions of the frontal lobe and the cerebellum. Indeed, the ramp in the cerebellum appears to be a necessary condition for the ramp to be present in the frontal lobe. Thus, we should see such ramping activity in this task in the dentate. When the monkey makes the wrong choice, the ramp should predict it. If you don't see the ramping activity, then it is possible that the hypothesis is wrong, or that you are not recording from the right place. 

      It is indeed one of our specific hypotheses that the dentate cells can be direction-selective for the preparing cognitive component and/or sensorimotor response. We provide evidence that this hypothesis may be correct when we analyze the regular time response curves (see Figure 2 and the new Supplementary Figure 2 where the data of both monkeys are now presented separately). Moreover, we have now verified this by analysing the ramping curves of PCA space (new Supplementary Figure 3) and firing frequency of DN neurons that modulated upon presentation of the C-stimulus (new Supplementary Figure 4). These figures and findings are now referred to in the main text.

      - As this is a difficult task that depends on the ability of the animals to understand the meaning of the cues, it is quite concerning that one of the monkeys performed poorly, particularly in the early sessions. Notably, the disparity between the two subjects is rather large: one monkey at the start of the recordings achieved a performance that was much better than the second monkey did at the end of the recording sessions. You highlighted the differences in performance in Figure 1D and mentioned that you started recording once the animals reached 60% performance. However, this did not make sense to me as the performance of Mi even after the final day of recording did not reach the performance of Mo on the first day of recording. Thus, in contrast to Mo, Mi appeared to be not ready for the task when the recording began.

      We understand this point. However, please note that the learning performance of the monkeys concerned retraining sessions after they had had several weeks of vacation. So, even though it is correct that one of the two monkeys had a very good consolidation and started already at a relatively high level on the first retraining session, the other one also started and ended at a level above chance level (the y-axis starts at 0.5). We now highlight this point better in the Results section.

      - One objective of having two monkeys is to illustrate that what is true in one animal is also true in the other. In some figures, you show that the neural data are significantly different, while in others you combine them into one. Thus, are you confident that the neural data across the animals should be combined, as you have done in Figure 2? Perhaps you can use the large differences in performance as a source of variance to find meaning in the neural data. 

      This is a valid question; as highlighted above, we have now addressed this point in the new Supplementary Figure 2, where the data for both monkeys are presented separately. Given the sample sizes and level of variances, it is in general difficult to draw conclusions about the potential differences and contributions, but the data are sufficiently transparent to observe common trends. With regard to linking differences in the neural data to the differences in performance level, please also consider Figure 4, the new Supplementary Figure 3 (with the ramping PCA component) and new Supplementary Figure 4 (with the additional analysis of the ramping activity of DN neurons that modulated upon presentation of the C-stimulus), which suggests that the ramping stage of Mo starts before that of Mi. This difference highlights the possibility that injecting accelerations of the simple spike modulations of Purkinje cells in the cerebellar hemispheres into the complex of cerebellar nuclei may be instrumental in improving the performance of responses to covert attention, akin to what has been shown for the impact of Purkinje cells of the vestibulocerebellum on eye movement responses to vestibular stimulation (De Zeeuw et al. 1995, J Neurophysiol). This possibility is now also raised in the Discussion.

      - How do we know that these neurons, or even this region of the brain, contribute to this task? When a new task is introduced, the contributions of the region of the brain that is being studied are usually established via some form of manipulation. This question is particularly relevant here because the two subjects differed markedly in their performance, yet in Figure 3 you find that a similar percentage of neurons are responding to the various elements of the task.

      We appreciate this question. As highlighted above, we are refraining from showing our muscimol manipulation (3 ml of 5 μg/ml muscimol, Sigma Aldrich), as it only concerns 1 successful dataset and 1 control experiment. We hope to replicate this reversible lesion experiment in the future and publish it when we have full new datasets of at least two monkeys available. As explained above, for this paper we have sacrificed both monkeys following a timed perfusion, so as to have similar survival times for the transport of the neuro-anatomical tracer involved.  

      - Behavior in both animals was better when the gap direction was up/down vs. left/right. Is this difference in behavior encoded during the time that the animal is making a decision? Are the dentate neurons better at differentiating the direction of the cue when the gap direction is up/down vs. left/right? 

      These data have now been included in the new Supplementary Figure 2; we did not observe any significant differences in this respect.

      Reviewer #2:

      - The authors trained monkeys to discriminate peripheral visual cues and associate them with planning future saccades of an indicated direction. At the same time, the authors recorded single-unit neural activity in the cerebellar dentate nucleus. They demonstrated that substantial fractions of DN cells exhibited sustained modulation of spike rates spanning task epochs and carrying information about stimulus, response, and trial outcome. Finally, tracer injections demonstrated this region of the DN projects to a large number of targets including several known to interconnect the visual attention network. The data compellingly demonstrate the authors' central claims, and the analyses are well-suited to support the conclusions. Importantly, the study demonstrates that DN cells convey many motor and nonmotor variables related to task execution, event sequencing, visual attention, and arguably decision-making/working memory. 

      We thank the Reviewer for this positive and constructive feedback.

      - The study is solid and I do not have major concerns, but only points for possible improvement. 

      We thank the Reviewer for this positive feedback.

      - A key feature of this data is the extended changes/ramps in DN output across epochs (Figure 2). Crudely, this presents a challenge for the view that DN output mainly drives motor effectors, as the saccade itself lasts only a tiny fraction of the overall task. Some discussion of this dichotomy in thinking about the function(s) of the cerebellum, vis a vis the multifarious DN targets the authors demonstrate here, etc., would be helpful. 

      We agree with the Reviewer and we have expanded our Discussion on this point, also now highlighting the outcome of the new PCA analysis recommended by Reviewer 1 (see the new Supplementary Figure 3).

      - A high-level suggestion on the data: the presentation of the data focuses (sensibly) on the representation of the stimulus and response epochs (Figures 2-3). Yet, the authors then show that from decoding, it is, in fact, a trial outcome that is best represented in the population (Figure 4). While there is nothing 'wrong' with this, it reads slightly incongruously, and the reader does a bit of a "double take" back to the previous figures to see if they missed examples of the trial-outcome signals, but the previous presentations only show correct trials. Consider adding somewhere in the first 3 main figures some neural data showing comparisons with incorrect trials. This way, the reader develops prior expectations for the outcome decoding result and frame of reference for interpreting it. On a related note, the text contains an earlier introduction of this issue (p24 last sentence) and p25 paragraph 1 cites Figure 3D and 3E for signals "related to the absence of reward" - but the caption says this includes only correct trials? 

      We thank the Reviewer for bringing up these points. We have addressed the textual suggestions. Moreover, we have done the PCA analysis suggested by Reviewer 1 for both the correct and incorrect trials (see Supplementary material).

      - P29: The discrepancy in retrograde labeling between monkeys (2 orders of magnitude): I realize the authors can't really do anything about this, but the difference is large enough to warrant concerns in the interpretation (how did the tracer spread over the drastically larger area? Isotropically? Could it cross more "hard boundaries" and incorporate qualitatively different inputs/outputs?). A small discussion of possible caveats in interpreting the outcomes would be helpful. 

      We fully agree with this comment. As highlighted in the text, in both monkeys we first identified the optimal points for injection in the dentate nucleus electrophysiologically and we used the same pump with the same settings to carry out the injections, but even so the differences are substantial. We suspect that the larger injection might have been caused by an air bubble trapped in the syringe or a deviation in the stock solution, but we can never be sure of that. We have added a potential explanation for the caveat that might have played a role.

      - And a list of quick points: 

      We have addressed all points listed below; we want to thank the Reviewer for bringing them up.

      P3 paragraph 2 needs comma "in daily life,". 

      P4 paragraph 2 "C-gap" terminology not previously defined. 

      P4 paragraph 2 "animals employed different behavioral strategies". Grammatically, you should probably say "each animal employed a different behavioral strategy," but also scientifically the paragraph doesn't connect this claim to anything about the DN (whereas, e.g., the abstract does make this connection clear). 

      P5 paragraph 1 "theca" should be "the". 

      P6 paragraph 1 problem with ignashenkova citation insert. 

      P10 paragraph 1 I think the spike rate "difference between highest and lowest" is not exactly the same as "variance," you might want to change the terminology. 

      P10 paragraph 1 should probably say "To determine if a cell preferentially modulated". 

      P10 paragraph 1 last sentence the last clause could be clearer. 

      P17 paragraph 2 should be something like "as well as those by Carpenter and..."? 

      P20 caption: consider "...directionality in the task: only one C-stim...". 

      P20 caption: consider "to the left and right in the [L/R] task...to the top/bottom in the [U/D] task". 

      Fig1E and S1 - is there a physical meaning of the "weight" unit, and if none, can this be transformed into a more meaningful unit? 

      P21 paragraph 1 consider "activity was recorded for 304 DN neurons...". 

      P21 paragraph 1 "correlations with the temporal windows" it's not clear how activity can "correlate" with a time window, consider rephrasing (activity levels changed during these time epochs, depending on stimulus identity). 

      P21 paragraph 1 should be "by comparing the number of spikes in a bin...". 

      P22 paragraph 2 "when we aligned the neurons to the time of maximum change" needs clarification. The maximum change of what? And per neuron? Across the population? 

      P22 paragraph 2 "than that of the facilitating" should be "than did the facilitating units". 

      P24 paragraph 1 needs a comma and rewording "Within each direction, trials are sorted by the time of saccade onset". 

      P24 paragraph 1 should probably say "Same as in G, but for suppressed cells". 

      P24 paragraph 2 should say "more than one task event" not "events". 

      P24 paragraph 2 needs a comma "To fully characterize the neural responses, we fitted". 

      P25 paragraph 1 should probably say "we sampled from similar populations of DN". 

      P34 paragraph 3 consider rephrasing the sentence that contains both "dissociation" and "dissociate". 

      P37 last line: consider "coordination of cerebellum and cerebral cortex *in* higher order mental..."? 

      P38 paragraph 1 citation needed for "kinematics of goal-directed hand actions of others"? 

      P38 paragraph 1 commas probably not needed "map visual input, from high-level visual regions, onto..." 

      References

      - Herzfeld DJ, Kojima Y, Soetedjo R, Shadmehr R (2018) Encoding of error and learning to correct that error by the Purkinje cells of the cerebellum. Nat Neurosci 21:736–743.

      - van Es DM, van der Zwaag W, Knapen T (2019) Topographic maps of visual space in the human cerebellum. Curr Biol 29(10):1689–1694.e3.

      - De Zeeuw CI, Wylie DR, Stahl JS, Simpson JI (1995) Phase relations of Purkinje cells in the rabbit flocculus during compensatory eye movements. J Neurophysiol 74(5):2051–2064. doi: 10.1152/jn.1995.74.5.2051.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors bring together implanted radiofrequency coils, high-field MRI imaging, awake animal imaging, and sensory stimulation methods in a technological demonstration. The results are very detailed descriptions of the sensory systems under investigation.

      Strengths:

      - The maps are qualitatively excellent for rodent whole-brain imaging.

      - The design of the holder and the coil is pretty clever.

      Weaknesses:

      - Some unexpected regions appear on the whole brain maps, and the discussion of these regions is succinct.

      - The authors do not make the work and effort to train the animals and average the data from several hundred trials apparent enough. This is important for any reader who would like to consider implementing this technology.

      - The data is not available. This does not let the readers make their own assessment of the results.

      Thank you for the comments on this manuscript. We have provided a more detailed discussion of the unexpected regions (page 18, lines 491-494) and of the training procedures (pages 7-9, lines 172-236). We have also uploaded the datasets to OpenNeuro and Zenodo:

      Whisker (OpenNeuro): https://doi.org/10.18112/openneuro.ds005496.v1.0.1

      Visual (OpenNeuro): https://doi.org/10.18112/openneuro.ds005497.v1.0.0

      SNR Line Profile Data & Data Processing Scripts (Zenodo): https://zenodo.org/doi/10.5281/zenodo.13821455

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Hike et al. entitled 'High-resolution awake mouse fMRI at 14 Tesla' describes the implementation of awake mouse BOLD-fMRI at high field. This work is timely as the field of mouse fMRI is working toward collecting high-quality data from awake animals. Imaging awake subjects offers opportunities to study brain function that are otherwise not possible under the more common anesthetized conditions. Not to mention the confounding effects that anesthesia has on neurovascular coupling. What has made progress in this area slow (relative to other imaging approaches like optical imaging) is the environment within the MRI scanner (high acoustic noise) - as well as the intolerance of head and body motion. This work adds to a relatively small, but quickly growing literature on awake mouse fMRI. The findings in the study include testing of an implanted head-coil (for MRI data reception). Two designs are described and the SNR of these units at 9.4T and 14T are reported. Further, responses to visual as well as whisker stimulation recorded in acclimated awake mice are shown. The most interesting finding, and most novel, is the observation that mice seem to learn to anticipate the presentation of the stimulus - as demonstrated by activations evident ~6 seconds prior to the presentation of the stimulus when stimuli are delivered at regular intervals (but not when stimuli are presented at random intervals). These kinds of studies are very challenging to do. The surgical preparation and length of time invested into training animals are grueling. I also see this work as a step in the right direction and evidence of the foundations for lots of interesting future work. However, I also found a few shortcomings listed below.

      Weaknesses:

      (1) The surface coil, although offering a great SNR boost at the surface, ultimately comes at a cost of lower SNR in deeper more removed brain regions in comparison to commercially available Bruker coils (at room temperature). This should be quantified. A rough comparison in SNR is drawn between the implanted coils and the Bruker Cryoprobe - this should be a quantitative comparison (if possible) - including any differences in SNR in deeper brain structures. There are drawbacks to the Cryoprobe, which can be discussed, but a more thorough comparison between the implanted coils, and other existing options should be provided (the Cryoprobe has been used previously in awake mouse experiments (Sensory evoked fMRI paradigms in awake mice - Chen, Physiological effects of a habituation procedure for functional MRI in awake mice using a cryogenic radiofrequency probe – Yoshida, PREVIOUS REFERENCE)). Further, the details of how to build the implanted coils should be provided (shared) - this should include a parts list as well as detailed instructions on how to build the units. Also, how expensive are they? And can they be reused?

      Thank you for the comment. We did not use a Bruker Cryoprobe for this work but rather a Bruker 4-array surface coil. We are unable to compare to a cryoprobe since we do not have access to one for our system. A comparison to previously published data using different scanners could be possible, but it would require the sequences to use identical parameters to avoid introducing an uncontrolled variable. We are planning to recruit different laboratories to test the implanted RF coils with their existing cryoprobes in a future study. 

      We have included an updated figure comparing SNR at different depths across the Bruker 4-array coil and the implanted RF coils. As shown in Supplementary Figure 7B, there is significant SNR enhancement up to 4 mm cortical depth for both single loop and Figure 8 implanted RF coils in comparison to the Bruker 4-array coil.

      Author response image 1.

      Comparison between implanted and commercial coils. A shows representative coils in the single loop (left) and figure 8 (right) styles. Supplementary Table 1 provides a parts list and cost for making these coils and Supplementary Figure 1 provides a circuit diagram for assembly. B presents the SNR line profile values as a function of distance from the pia mater for each coil tested at 9.4T: commercial phased array surface coil (4 Array), implanted single loop, and implanted figure 8. SNR values were calculated by dividing the signal by the standard deviation of the noise. C-E show a representative FLASH image with the line profile of SNR measurements from each of the coils used to create the graph seen in B. Clear visual improvement in SNR can be seen in C-E. C – Commercial phased array. D – Single loop at 9.4T. E – Figure 8 at 9.4T. (N = 6 for the 4 array, N = 5 for the single loop, N = 5 for the figure 8)
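      The SNR definition stated in the caption (signal divided by the standard deviation of the noise) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' processing script: the function name, the toy signal and noise values, and the choice of the population standard deviation are all hypothetical.

```python
from statistics import pstdev


def snr_line_profile(signal_by_depth, noise_samples):
    """Return the SNR at each depth along a line profile.

    signal_by_depth: signal intensities sampled along the profile
                     (hypothetical values, e.g. from a FLASH image).
    noise_samples:   intensities from a signal-free background region.
    """
    # Assumption: population standard deviation of the background region.
    noise_sd = pstdev(noise_samples)
    if noise_sd == 0:
        raise ValueError("noise standard deviation is zero")
    return [s / noise_sd for s in signal_by_depth]


# Toy example: signal falling off with depth, constant noise level.
signal = [200.0, 150.0, 100.0, 50.0]
noise = [-2.0, 2.0, -2.0, 2.0]
profile = snr_line_profile(signal, noise)
```

      Computing the same profile for each coil with an identical background region would put the curves in panel B on a common footing.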

      Additionally, we have added a supplementary figure (Supp Fig 1) showing a circuit diagram, in an effort to disseminate the prototype design of the coils to other laboratories. We have included a detailed parts list with the cost of constructing the coils configured for our scanner (Supp Table 1). These specifics would, however, need to be adjusted to the precise field strength, bore size, and animal for which the coil is being built. As for reusability, the copper wire is cemented to the animal skull, so the implantable coil should be considered a consumable for awake mouse experiments, although the PCB parts can be retrieved.  

      (2) In the introduction, the authors state that "Awake mouse fMRI has been well investigated". I disagree with this statement and others in the manuscript that gives the reader the impression that awake experiments are not a challenging and unresolved approach to fMRI experiments in mice (or rodents). Although there are multiple labs (maybe 15 worldwide) that have conducted awake mouse experiments (with varying degrees of success/thoroughness), we are far from a standardized approach. This is a strength of the current work and should be highlighted as such. I encourage the authors to read the recent systematic review that was published on this topic in Cerebral Cortex by Mandino et al. There are several elements in there that should influence the tone of this piece including awake mouse implementations with the Bruker Cryoprobe, prevalence of surgical preparations, and evaluations of stress.

      Thank you for the comment. We agree with the reviewer that the current state of awake mouse fMRI studies leaves room for improvement. We have revised the Introduction to highlight the state of the art of awake mouse fMRI (page 4, lines 81-88). 

      (3) The authors also comment on implanted coils reducing animal stress - I don't know where this comment is coming from, as this has not been reported in the literature (to my knowledge) and the authors don't appear to have evaluated stress in their mice. 

      Since questions 3 and 4 are both closely related to the acclimation procedures, we will answer the two questions together.   

      (4) Following on the above point, measures of motion, stress, and more details on the acclimation procedure that was implemented in this study should be included.

      We thank the reviewer for raising the animal training issues.  

      During animal training, we measured both pupil dynamics and eye motion features across training sessions; the detailed procedure is described in the Methods (pages 7-9, lines 172-236). 

      The training procedure is carried out over a total of 5 weeks with four phases of training: i. Holding animal in hands, ii. Head-fixation and pupillometry, iii. Head-fixation and pupillometry with mock-MRI acoustic exposure, iv. Head-fixation and pupillometry with Echo-Planar-Imaging (EPI) in the MR scanner.

      Author response table 1.

      As shown in Supp Fig 2B, the spectral power of pupil dynamics (<0.02 Hz) and eye movements gradually increased as a function of training time for head-fixed mice exposed to the mock MRI acoustic environment during phase 3. In phase 4, when head-fixed mice were put into the scanner for the first time, both eye movements and pupil dynamics were initially reduced during scanning but recovered to an acclimated state on Day 2, similar to the level on Day 8 of phase 3. These behavioral readouts could provide an alternative way to monitor the stress levels of the mice.

      Author response image 2.

      The eye movements (A) and power spectra of pupil dynamics (<0.02Hz) (B) change during different training phases.
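
      As a rough illustration of the low-frequency pupil measure described above, the following Python sketch sums periodogram power below a cutoff such as 0.02 Hz. It is not the authors' analysis code: the function name `band_power`, the 1 Hz sampling rate, and the synthetic traces are assumptions made only for this example.

```python
import math


def band_power(signal, fs, f_max):
    """Sum periodogram power at frequencies 0 < f < f_max (in Hz).

    `signal` is a list of samples and `fs` the sampling rate in Hz.
    A plain DFT is used so the sketch needs only the standard library.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]  # remove the mean so only fluctuations count
    power = 0.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n  # frequency of DFT bin k
        if f >= f_max:
            break
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power += (re * re + im * im) / (n * n)
    return power


# Hypothetical traces: 400 s sampled at 1 Hz (values chosen for illustration only).
fs = 1.0
n = 400
slow = [math.sin(2 * math.pi * 0.01 * t / fs) for t in range(n)]  # 0.01 Hz drift
fast = [math.sin(2 * math.pi * 0.10 * t / fs) for t in range(n)]  # 0.10 Hz fluctuation
mixed = [s + f for s, f in zip(slow, fast)]

ultra_slow_power = band_power(mixed, fs, 0.02)
fast_only_power = band_power(fast, fs, 0.02)
```

      In this synthetic case only the 0.01 Hz component falls below the 0.02 Hz cutoff, so the measure ignores the faster 0.1 Hz fluctuation.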

      It should be noted that, in human subjects, stress may be associated with an increased frequency of eye blinking or twitching movements (1–3). In head-fixed mice, by contrast, the eyeblink has been used for behavioral conditioning to investigate motor learning in normally behaving mice (4–6). Importantly, studies have shown that eye movements are significantly reduced in head-fixed mice compared with freely moving mice (7). The increased eye movement during the acclimation process would therefore indicate an alleviated stress level in our head-fixed mice. Meanwhile, stress-related pupillary dilation could dominate the pupil dynamics in the early phase of training (8). We observed a gradual increase in the power of pupil dynamics at ultra-slow frequencies during phase 3, suggesting that stress-related pupil dilation was alleviated and that pupil dynamics recovered their sensitivity to other factors seen in normally behaving mice, including arousal, locomotion, and startles. Although the training procedure of the present work is extensive in comparison with existing awake mouse fMRI studies (the training strategies and overall training durations of existing studies have been reviewed by Mandino et al. (9)), stress remains a confounding factor for functional brain mapping in head-fixed mice. In particular, a recent study (10) shows that the corticosterone concentration in blood samples of head-fixed mice is significantly reduced on Day 25 following the training but remains higher than in control mice. We have discussed the potential issues of stress-related confounding factors for awake mouse fMRI studies in the Discussion (Page 16 – lines 436-458).

      (1) A. Marcos-Ramiro, D. Pizarro-Perez, M. Marron-Romera, D. Gatica-Perez, Automatic blinking detection towards stress discovery. ICMI 2014 - Proceedings of the 2014 International Conference on Multimodal Interaction 307–310 (2014). https://doi.org/10.1145/2663204.2663239/SUPPL_FILE/ICMI1520.MP4.

      (2) M. Haak, S. Bos, S. Panic, L. Rothkrantz, DETECTING STRESS USING EYE BLINKS AND BRAIN ACTIVITY FROM EEG SIGNALS. Lance 21, 76 (2009).

      (3) E. Del Carretto Di Ponti E Sessam, Exploring the impact of Stress and Cognitive Workload on Eye Movements: A Preliminary Study. (2023).

      (4) S. A. Heiney, M. P. Wohl, S. N. Chettih, L. I. Ruffolo, J. F. Medina, Cerebellar-dependent expression of motor learning during eyeblink conditioning in head-fixed mice. J Neurosci 34, 14845–14853 (2014).

      (5) S. N. Chettih, S. D. Mcdougle, L. I. Ruffolo, J. F. Medina, Adaptive timing of motor output in the mouse: The role of movement oscillations in eyelid conditioning. Front Integr Neurosci 5, 12996 (2011).

      (6) J. J. Siegel, et al., Trace Eyeblink Conditioning in Mice Is Dependent upon the Dorsal Medial Prefrontal Cortex, Cerebellum, and Amygdala: Behavioral Characterization and Functional Circuitry. eNeuro 2, 51–65 (2015).

      (7) A. F. Meyer, J. O’Keefe, J. Poort, Two Distinct Types of Eye-Head Coupling in Freely Moving Mice. Current Biology 30, 2116-2130.e6 (2020).

      (8) H. Zeng, Y. Jiang, S. Beer-Hammer, X. Yu, Awake Mouse fMRI and Pupillary Recordings in the Ultra-High Magnetic Field. Front Neurosci 16, 886709 (2022).

      (9) F. Mandino, S. Vujic, J. Grandjean, E. M. R. Lake, Where do we stand on fMRI in awake mice? Cereb Cortex 34 (2024).

      (10) K. Juczewski, J. A. Koussa, A. J. Kesner, J. O. Lee, D. M. Lovinger, Stress and behavioral correlates in the head-fixed method: stress measurements, habituation dynamics, locomotion, and motor-skill learning in mice. Scientific Reports 2020 10:1 10, 1–19 (2020).

      (5) It wasn't clear to me at what times the loop versus "Figure 8" coil was being used, nor how many mice (or how much data) were included in each experiment/plot. There is also no mention of biological sex.

      Thank you for the comment. We have clarified the sex and number of animals. The figure 8 coil was used only during development, to show the improvement of the coil design for cortical measurements. The details are described in Methods (Page 6 – lines 127-129 & Page 10 – lines 269-270). Additionally, animal numbers have been included in the figure captions.

      (6) Building on the points above, the manuscript overall lacks experimental detail (especially since the format has the results prior to the methods).

      Thank you for the comment. We have modified the manuscript to increase the experimental detail and moved the methods section before the results.

      (7) An observation is made in the manuscript that there is an appreciable amount of negative BOLD signal. The authors speculate that this may come from astrocyte-mediated BOLD during brain state changes (and cite anesthetized rat and non-human primate experiments). This is very strange to me. First, the negative BOLD signal is not plotted (please do this), further, there are studies in awake mice that measure astrocyte activation eliciting positive BOLD responses (see Takata et al. in Glia, 2017).

      We thank the reviewer for raising the negative BOLD fMRI issue. We added a subplot of the negative BOLD signal changes to the revised Figure 4. These negative BOLD signals across cortical areas could be coupled with brain state changes upon air-puff-induced startle responses. Our future studies will focus on elucidating the brain-wide activity changes of awake mice with fMRI. We also provide a detailed discussion of the potential mechanism underlying the negative BOLD fMRI signals. First, as reported in the paper suggested by the reviewer, astrocytic Ca2+ transients coincide with positive BOLD responses in the activated cortical areas, which aligns with the neurovascular coupling (NVC) mechanism. However, there is emerging evidence that astrocytic Ca2+ transients are coupled with both positive and negative BOLD responses in anesthetized rats (11) and awake mice (12). An intriguing observation is that cortex-wide negative BOLD signals coupled with spontaneous astrocytic Ca2+ transients could co-exist with the positive BOLD signal detected in the activated cortex. Studies have shown that astrocytes are involved in regulating brain state changes (13), in particular during locomotion (14) and startle responses (15). These brain state-dependent global negative BOLD responses are also related to arousal changes in both non-human primates (16) and human subjects (17). The established awake mouse fMRI platform with ultra-high spatial resolution will enable brain-wide activity mapping of the functional nuclei contributing to the brain state changes of head-fixed awake mice in future studies. (Page 17-18 – Lines 478-490)

      (11) M. Wang, Y. He, T. J. Sejnowski, X. Yu, Brain-state dependent astrocytic Ca2+ signals are coupled to both positive and negative BOLD-fMRI signals. Proc Natl Acad Sci U S A 115, E1647–E1656 (2018).

      (12) C. Tong, Y. Zou, Y. Xia, W. Li, Z. Liang, Astrocytic calcium signal bidirectionally regulated BOLD-fMRI signals in awake mice in Proc. Intl. Soc. Mag. Reson. Med. 32, (2024).

      (13) K. E. Poskanzer, R. Yuste, Astrocytes regulate cortical state switching in vivo. Proc Natl Acad Sci U S A 113, E2675–E2684 (2016).

      (14) M. Paukert, et al., Norepinephrine controls astroglial responsiveness to local circuit activity. Neuron 82, 1263–1270 (2014).

      (15) R. Srinivasan, et al., Ca2+ signaling in astrocytes from IP3R2−/− mice in brain slices and during startle responses in vivo. Nat Neurosci 18, 708 (2015).

      (16) C. Chang, et al., Tracking brain arousal fluctuations with fMRI. Proc Natl Acad Sci U S A 113, 4518–4523 (2016).

      (17) B. Setzer, et al., A temporal sequence of thalamic activity unfolds at transitions in behavioral arousal state. Nat Commun 13 (2022).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I really enjoyed this work. The maps shown are among the best-quality maps out there. Here are suggestions to the authors.

      (1) Both the ACA and VRA are rather unexpected. The authors explain these briefly as being part of the associative cortical areas. Both the ACA and VRA are not canonical associative areas (or at least not to us). This warrants a stronger discussion.

      To verify both ACA and VRA as associative areas, we provide connectivity map projections from the Allen Brain Atlas (shown below). These projections are derived from Cre-dependent AAV tracing of axonal projections. We have included an explanation of this in the introduction.

      Author response image 3.

      Representative images are shown indicating connections between the barrel cortex and retrosplenial area from an injection in the barrel cortex (Left panel) as well as the visual cortex and cingulate connection from an injection in the visual cortex (Right panel). Images are of connectivity map projections from the Allen Brain Atlas derived from Cre-dependent AAV tracing of axonal projections.

      (2) This is a lot of work. But looking at the figures, this is not obvious. We read in the caption that several hundred trials were used. It would be good to also specify how many mice. It would be clearer to represent this info in the figure as well to support the fact that this is not a trivial acquisition.

      We thank the reviewer for raising the issue of effort. We have edited the figure to include this information and included the numbers in the text as well.

      (3) The training protocol is seemingly extensive, but this is only visible by following another reference. Including a description in this work would help the reader make sense of the effort that went into this work.

      We thank the reviewer for raising the training protocol issue. We have described the training method used for this study more thoroughly (page 7-9 – lines 172-236).

      (4) I really would love to see that dataset made freely available - this should be the norm.

      The datasets have been uploaded to OpenNeuro and Zenodo (page 21 – lines 573-579):

      Whisker (OpenNeuro): https://doi.org/10.18112/openneuro.ds005496.v1.0.1

      Visual (OpenNeuro): https://doi.org/10.18112/openneuro.ds005497.v1.0.0

      SNR Line Profile Data & Data Processing Scripts (Zenodo): https://zenodo.org/doi/10.5281/zenodo.13821455

      Reviewer #2 (Recommendations For The Authors):

      (1) I'm a little confused about the stimulation paradigm and the effect of it causing an effective 2-second TR (which is on the long side) - please elaborate (a figure might be helpful). The paradigm for visual stimulation also seems elaborate, can you please explain the logic and how it was developed?

      Thank you for raising the stimulation paradigm issue. The stimulation paradigm is independent of, and does not interfere with, the effective 2-second TR. The 2-second TR results from the use of 2-segment EPI, with a TR of 1 second per segment. The 2-segment acquisition enables an echo spacing of 0.52 ms with an effective image bandwidth of 3858 Hz, which reduces image distortion. The stimulation paradigm was defined by an “8 s on, 32 s off” epoch to elicit a strong BOLD response and could be used with any reasonable TR duration.

      We have included a figure outlining the stimulation paradigm (Supp Fig. 3)
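
      The relationship between the segmented acquisition and the stimulation epochs can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the response above: the function names and the number of epochs are hypothetical, and the sketch assumes that stimulus onsets align with volume boundaries.

```python
def effective_tr(n_segments, tr_per_segment_s):
    """Time needed to acquire one full volume with segmented EPI, in seconds."""
    return n_segments * tr_per_segment_s


def block_design(on_s, off_s, tr_s, n_epochs):
    """Return one 0/1 entry per volume: 1 while the stimulus is on, 0 otherwise.

    The epoch length (on_s + off_s) is set by the paradigm alone; the TR only
    determines how many volumes sample each epoch.
    """
    vols_on = round(on_s / tr_s)
    vols_off = round(off_s / tr_s)
    return ([1] * vols_on + [0] * vols_off) * n_epochs


# Values quoted in the response: 2 segments, 1 s per segment, 8 s on / 32 s off.
tr = effective_tr(n_segments=2, tr_per_segment_s=1.0)
design = block_design(on_s=8, off_s=32, tr_s=tr, n_epochs=3)  # 3 epochs is arbitrary

# The same paradigm sampled with a hypothetical 1 s TR gives twice as many volumes.
design_1s = block_design(on_s=8, off_s=32, tr_s=1.0, n_epochs=3)
```

      With the 2-second effective TR, each 40-second epoch spans 20 volumes, 4 of them during stimulation; changing the TR changes only the number of volumes per epoch, not the epoch timing.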

      (2) I had difficulties viewing the movies (on my MAC).

      Thank you for this note. We have re-uploaded the videos in .mov format.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The pituitary gonadotropins, FSH and LH, are critical regulators of reproduction. In mammals, synthesis and secretion of FSH and LH by gonadotrope cells are controlled by the hypothalamic peptide, GnRH. As FSH and LH are made in the same cells in mammals, variation in the nature of GnRH secretion is thought to contribute to the differential regulation of the two hormones. In contrast, in fish, FSH and LH are produced in distinct gonadotrope populations and may be less (or differently) dependent on GnRH than in mammals. In the present manuscript, the authors endeavored to determine whether FSH may be independently controlled by a distinct peptide, cholecystokinin (CCK), in zebrafish.

      Strengths:

      The authors demonstrated that the CCK receptor is enriched in FSH-producing relative to LH-producing gonadotropes, and that genetic deletion of the receptor leads to dramatic decreases in gonadotropin production and gonadal development in zebrafish. Also, using innovative in vivo and ex vivo calcium imaging approaches, they show that LH- and FSH-producing gonadotropes preferentially respond to GnRH and CCK, respectively. Exogenous CCK also preferentially stimulated FSH secretion ex vivo and in vivo.

      Weaknesses:

      The concept that there may be a distinct FSH-releasing hormone (FSHRH) has been debated for decades. As the authors suggest that CCK is the long-sought FSHRH (at least in fish), they must provide data that convincingly leads to such a conclusion. In my estimation, they have not yet met this burden. In particular, they show that CCK is sufficient to activate FSH-producing cells, but have not yet demonstrated its necessity. Their one attempt to do so was using fish in which they inactivated the CCK receptor using CRISPR-Cas9. While this manipulation led to a reduction in FSH, LH was affected to a similar extent. As a result, they have not shown that CCK is a selective regulator of FSH.

      Our conclusion regarding the necessity of CCK signaling for FSH secretion is based on the following evidence:

      (1) CCK-like receptors are expressed in the pituitary gland predominantly on FSH cells.

      (2) Application of CCK to pituitaries elicits FSH cell activation and to a much lesser degree activation of LH cells.  (calcium imaging assays)

      (3) Application of CCK to pituitaries and by injections in-vivo significantly increased only FSH release.

      (4) Mutating the FSH-specific CCK receptor in a different species of fish (medaka) also causes a complete shutdown of FSH production and phenocopies a fsh-mutant phenotype (Uehara, Nishiike et al. 2023).

      Taken together, we believe that these data strongly support the conclusion that CCK is necessary for FSH production and release from the fish pituitary. Admittedly, the overlapping effects of CCK on both FSH and LH cells in zebrafish (evident in both our calcium imaging experiments and especially in the KO phenotype) complicate the interpretation of the phenotype. We speculate that the effect of CCK on LH cells in zebrafish can be caused either by paracrine signaling within the gland or by the effects of CCK on GnRH neurons, which were shown to express CCK receptors.

      In the current version, we emphasize that CCK also induces LH secretion. Although it does not affect LH to the same extent as FSH, an overlap does exist. This is mentioned in the abstract and discussion.

      Moreover, they do not yet demonstrate that the effects observed reflect the loss of the receptor's function in gonadotropes, as opposed to other cell types.

      Although there is evidence for the expression of the CCK receptor in other tissues, we do show a direct decrease of FSH and LH expression in the gonadotrophs of the pituitary of the mutant fish. Taken together with the receptor's significant expression in FSH cells compared to the rest of the pituitary cells in the cell-specific transcriptome, loss of receptor function in gonadotrophs is the most reasonable explanation for the mutant phenotype.

      Unfortunately, unlike in mice, technologies for conditional knockout of genes in specific cell types are not yet available for our model and cell types. The tissue distribution of the three CCK receptor types was added in supplementary figure 1. This distribution shows that in the pituitary only CCKBRA (our identified CCK receptor) is expressed, while in other tissues it is either not expressed or expressed together with the other CCK receptors, which can compensate for its activity.

      It also is not clear whether the phenotypes of the fish reflect perturbations in pituitary development vs. a loss of CCK receptor function in the pituitary later in life. Ideally, the authors would attempt to block CCK signaling in adult fish that develop normally. For example, if CCK receptor antagonists are available, they could be used to treat fish and see whether and how this affects FSH vs. LH secretion.

      While the observed gonadal phenotype of the KO (sex-reversed fish) should have a developmental origin, since it requires a long time to manifest, the effect of the KO on FSH and LH cells is probably more acute. Unfortunately, a specific antagonist that affects only CCKRBA and not the other CCK receptors has not yet been identified.

      In the Discussion, the authors suggest that CCK, as a satiety factor, may provide a link between metabolism and reproduction. This is an interesting idea, but it is not supported by the data presented. That is, none of the results shown link metabolic state to CCK regulation of FSH and fertility. Absent such data, the lengthy Discussion of the link is speculative and not fully merited.

      In the revised manuscript, we provide data linking cck with metabolic status in supplementary figure 1 and have modified the discussion to tone down the link between metabolic status and reproductive state.

      Also in the Discussion, the authors argue that "CCK directly controls FSH cells by innervating the pituitary gland and binding to specific receptors that are particularly abundant in FSH gonadotrophs." However, their imaging does not demonstrate innervation of FSH cells by CCK terminals (e.g., at the EM level).

      Innervation of the fish pituitary does not imply a synaptic-like connection between axon terminals and endocrine cells. In fact, such connections are extremely rare, and their functionality is unclear. Instead, the mode of regulation between hypothalamic terminals and endocrine cells in the fish pituitary is more similar to "volume transmission" in the CNS, i.e. peptides are released into the tissue and carried to their endocrine cell targets by the circulation or via diffusion. A short explanation was added in lines 395-398 of the discussion.

      Moreover, they have not demonstrated the binding of CCK to these cells. Indeed, no CCK receptor protein data are shown.

      Our revised manuscript includes detailed experiments showing the activation of the receptor by its homologous ligand: supplementary Figure 1 includes a transactivation assay of the receptor by CCK and the effect of the different mutants on the activation of the receptor. Unfortunately, no antibody is available against this fish-specific receptor (one of the caveats of working with fish models); therefore, we cannot present receptor protein data.

      The calcium responses of FSH cells to exogenous CCK certainly suggest the presence of functional CCK receptors therein; but, the nature of the preparations (with all pituitary cell types present) does not demonstrate that CCK is acting directly in these cells.

      We agree with the reviewer that there are some disadvantages in choosing to work with a whole-tissue preparation. However, we believe that the advantages of working in a more physiological context far outweigh the drawbacks as it reflects the natural dynamics more precisely. Since our transcriptome data, as well as our ISH staining, show that the CCK receptor is exclusively expressed in FSH cells, it is improbable that the observed calcium response is mediated via a different pituitary cell type.

      Indeed, the asynchrony in responses of individual FSH cells to CCK (Figure 4) suggests that not all cells may be activated in the same way. Contrast the response of LH cells to GnRH, where the onset of calcium signaling is similar across cells (Figure 3).

      The difference between the synchronization levels of LH and FSH cell activity stems from the gap-junction-mediated coupling between LH cells, which does not exist between FSH cells (Golan, Martin et al. 2016). Therefore, the onset of the calcium response in FSH cells depends on the irregular diffusion rate of the peptide within the preparation, whereas the tight homotypic coupling between LH cells generates a strong and synchronized calcium rise that propagates quickly throughout the entire population.

      The differences in connectivity between LH and FSH cells are mentioned in lines 194-195.

      Finally, as the authors note in the Discussion, the data presented do not enable them to conclude that the endogenous CCK regulating FSH (assuming it does) is from the brain as opposed to other sources (e.g., the gut).

      We agree with the reviewer that, for now, we are unable to determine whether hypothalamic or peripheral CCK is the main driver of FSH cells. While the strong innervation of the gland by CCK-secreting hypothalamic neurons strengthens the notion of a hypothalamic-releasing hormone and also fits with the dogma of the neural control of the pituitary gland in fish (Ball 1981), more experiments are required to resolve this question.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript builds on previous work suggesting that the CCK peptide is the releasing hormone for FSH in fishes, which is different than that observed in mammals where both LH and FSH release are under the control of GnRH. Based on data using calcium imaging as a readout for stimulation of the gonadotrophs, the researchers present data supporting the hypothesis that CCK stimulates FSH-containing cells in the pituitary. In contrast, LH-containing cells show a weak and variable response to CCK but are highly responsive to GnRH. Data are presented that support the role of CCK in the release of FSH. Researchers also state that functional overlap exists in the potency of GnRH to activate FSH cells, thus the two signalling pathways are not separate. The results are of interest to the field because for many years the assumption has been that fishes use the same signalling mechanism. These data present an intriguing variation where a hormone involved in satiation acts in the control of reproduction.

      Strengths:

      The strengths of the manuscript are that researchers have shed light on different pathways controlling reproduction in fishes.

      Weaknesses:

      Weaknesses are that it is not clear if multiple ligand/receptors are involved (more than one CCK and more than one receptor?). The imaging of the CCK terminals and CCK receptors needs to be reinforced.

      Reviewer consultation summary: 

      The data presented establish sufficiency, but not necessity of CCK in FSH regulation. The paper did not show that CCK endogenously regulates FSH in fish. This has not been established yet.

      This is a very important comment, also raised by reviewer 1. To avoid repetition, please see our detailed response to the comment above.

      The paper presents the pharmacological effects of CCK on ex vivo preparations but does not establish the in vivo physiological function of the peptide. The current evidence for a novel physiological regulatory mechanism is incomplete and would require further physiological experiments. These could include the use of a CCK receptor antagonist in adult fish to see the effects on FSH and LH release, the generation of a CCK knockout, or cell-specific genetic manipulations.

      As detailed in the responses to the first reviewer, we cannot conduct conditional, cell-specific gene knockout in our model. However, we did generate a KO and show its direct effect on FSH and LH secretion, together with a physiological characterisation of the mutant.

      Zebrafish have two CCK ligands: ccka, cckb and also multiple receptors: cckar, cckbra and cckbrb. There is ambiguity about which CCK receptor and ligand are expressed and which gene was knocked out.

      In the revised manuscript, we clarified which of the receptors is expressed (CCKRBA) and which receptor is targeted. We also provided data showing the specificity of the receptors (both WT and mutant) to the ligands. Supplementary Figure 1 shows receptor cross-activation. The Methods also specify the exact NCBI ID numbers of the targeted receptor and the antibody used for the immunostaining.

      Blocking CCK action in fish (with receptor KO) affects FSH and LH. Therefore, the work did not demonstrate a selective role for CCK in FSH regulation in vivo and any claims to have discovered FSHRH need to be more conservative.

      We agree with the reviewer that the overlap in the effect of CCK measured in the calcium activation of cells and in the KO model does not allow us to conclude selectivity. In this context, it is crucial to highlight that CCKRBA exhibits high expression on FSH cells but not on LH cells. Therefore, the effect of CCK on LH cells is likely paracrine or through GnRH neurons that were shown to express CCK receptors. In the current version, we emphasize that CCK also induces LH secretion. Although it does not affect LH to the same extent as FSH, an overlap does exist. This is mentioned in the abstract and discussion.

      The labelling of the terminals with anti-CCK looks a lot like the background and the authors did not show a specificity control (e.g. anti-CCK antibody pre-absorbed with the peptide or anti-CCK in morphant/KO animals).

      Figure colours have been updated to better visualise the specific staining of the antibody. Also, the same antibody has previously been used to mark CCK-positive cells in the gut of the red drum fish (Webb, Khan et al. 2010), where a control experiment (antibody pre-absorbed with the peptide) was conducted.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract:

      The authors have not yet established that CCK is the primary regulator of FSH in vivo.

      In the new version, we highlight the leading effect of CCK on the reproductive axis, which includes FSH and LH.

      Introduction:

      The authors need to make clear earlier in the Introduction that fish have two types of gonadotropes. This information comes too late (last paragraph) currently.

      Added in line 42

      They should discuss relevant data on the differential regulation of FSH and LH in fish, as a rationale for looking for different releasing factors.

      This has been discussed in the first paragraph of the introduction

      In the last sentence of the penultimate paragraph, the authors assume that it must be a hypothalamic factor that regulates FSH. Why is this necessarily the case? Are there data indicating that a hypothalamic factor is required for FSH production in fish?

      This is addressed in the discussion: we do not deny that circulating CCK or CCK from other brain areas might affect FSH secretion in the pituitary (lines 402-404). However, as the hypothalamus serves as the main gateway from the brain to the pituitary and contains hypophysiotropic CCK neurons, it is the most reasonable assumption.

      Results:

      In the first paragraph, the authors reference three types of CCK receptors, only one of which is expressed in the pituitary. The specific receptor should be named here.

      The receptor name and NCBI ID have been added to this paragraph.

      Figure 1: What specificity controls were used for the ISH in Figure 1?

      HCR, the method used to identify RNA expression, developed by Molecular Instruments (https://www.molecularinstruments.com/hcr-rnafish-protocols), does not require the specificity controls used with older ISH methods. The use of multiple short probes ensures specificity for the target RNA. Moreover, the expression is specific to the targeted cells.

      In Figure 1D, the red square is missing in the KO fish (at low magnification).

      This was fixed in the updated version.

      In Figure 1G, the number of dots does not correspond to the number of animals described in the figure legend. Does each point represent an animal?

      Each dot represents a fish. The order of the numbers in the legend did not match the order in the graph; this has been fixed in the latest version.

      Figure 2A: It is not clear that all FSH (GFP) cells are double-labeled. Should all double-labeled cells appear white? Many appear as green. Some quantification of the proportion of co-labeling is needed. Also, the scale bars are too small to read. Perhaps add the size of the scale bars to the legend.

      They are all double-labeled, as can be seen in the single-color images. Since GFP fluorescence is stronger than RCaMP fluorescence, double-labeled cells may appear green. A scale bar was added.

      Figure 2C: Is the synchronous activity of LH cells here dependent on endogenous GnRH? Can these events be blocked with a GnRH receptor antagonist?

      We currently do not have enough data to support this hypothesis, and the in vivo two-photon system is not optimal for answering these questions, since these are spontaneous events that are difficult to predict. This is the main reason we moved to an ex vivo system. The similar response we observe when applying GnRH in the ex vivo system supports the idea that these events reflect GnRH activation.

      Figure 4C: As some LH cells respond to CCK, can the authors really claim that CCK is a selective regulator of FSH? What explains the heterogeneity in the response of LH cells to CCK?

      In this version, we highlight that CCK directly activates FSH cells but also affects LH cells to some extent. However, it is clear that the effect on FSH cells is more significant.

      Figures 5A and B: With larger Ns, some of the trends might be significant (e.g., GnRH stimulated FSH release and CCK stimulated LH release).

      Though there is a trend, the values on the Y axis reveal that the response of FSH to GnRH and of LH to CCK is lower than the distribution of the basal response (the "before" values) in all of the graphs. Hence, we do not believe a larger N would affect those results. We added the range of the secreted hormone concentrations to the results description to emphasize the difference in values.

      Figures 5C and D: What explains the lack of an increase in LH secretion following GnRH treatment?

      We did not measure LH secretion in the plasma, as we did not have enough blood; we do see an increase in LH transcription (see supplementary figure 5 – figure supplement 1).

      Also, as mRNA levels were measured (in C), reference should be made to expression rather than transcription. Not all changes in mRNA levels reflect changes in transcription. Also, remove transcription from the legend. Reference to supplementary Figure 4 in the legend should be supplementary Figure 6. Finally, in C and D, distinguish males from females (as in 5A and B).

      Modifications have been made according to the reviewer's suggestions.

      Figure legends:

      The figure legends are very long. One way to shorten them is to remove descriptions of the results. The legends should indicate what is in each figure, not the results of the experiments.

      Modifications have been made according to the reviewer's suggestions.

      Sample sizes should be spelled out in the legends, as they are not in the M&M.

      We made sure all sample sizes are mentioned in the legends.

      Materials and Methods:

      Section 1.1 can be removed as it repeats content presented elsewhere.

      This section was removed

      Section 1.5: It is unclear what this means: "blinding was not applied to ensure tractability" Please clarify.

      This section was removed

      Reviewer #2 (Recommendations For The Authors):

      It appears that zebrafish have two ligands: ccka, cckb. Also multiple receptors: cckar, cckbra and cckbrb. Authors need to discuss this and clearly state which ligand and which receptor they are referring to in the manuscript.

      We discussed the receptor type in the first paragraph of the results, the exact synthetic peptide used is described in the methods. The 8 amino acids of the mature CCK peptide are the same between CCKa and CCKb. A sentence regarding the specificity of the antibody to the mature CCK peptide was added in line 101.

      "to GnRH puff application (300 μl of 30 μg/μl)"; (250 μl of 30 μg/ml CCK)

      Please give the final concentration to make it easy on the readers of the data.

      The molarity of the final concentration was added.

      (2.4) Differential calcium response underlies differential hormone. This section is a bit confusing to read, for example:

      "For that, we collected the medium perfused through our ex vivo system (Fig. 2a) and measured LH and FSH levels using a specific ELISA validated for zebrafish [31] while monitoring the calcium activity of the cells."

      So the authors did the ELISA while monitoring the activity (?). This sentence does not make sense: please rewrite it.

      We modified this sentence in lines 308-311.

      To functionally validate the importance of CCK signalling we used CRISPR-cas9 to generate loss-of-function (LOF) mutations in the pituitary- CCK receptor gene.

      The authors need to clearly state WHICH gene they inactivated: Zebrafish have three CCK-receptors, so "the pituitary receptor gene" needs to be defined.

      This was added again in line 107 and is mentioned in the methods.

      Figure 3 is a crucial figure!

      Figure 3B: The data are not very convincing. Please state how thick the sections are in the figure legend (assuming these are adult pituitaries),

      Added in the legend (figure 1C in the new version), slice thickness and adult fish.

      Please show at least the merged image a high magnification view of the co-localization of the receptor with the cells.

      This is figure 1 in the new revision, a magnified figure was added

      Please give the scale bar size for 3B.

      Scales for all images were added

      Figure 3C: the co-localization of the terminals of the CCK and FSH cells shows very few cells expressing close to terminals.

      Important: Because the labelling of the terminals with anti-CCK looks a lot like the background, it is very important to show the control (anti-CCK antibody pre-absorbed with the peptide). The authors should have these data. The photo needs to have been taken at the same gain (contrast) and the photo showing the terminals.

      This is a commercial antibody that has been previously validated for CCK in fish. The co-localization pattern resembles GnRH innervation in the pituitary. In fish, when hypothalamic neurons innervate the pituitary, they do not innervate all the cells; as this is an endocrine system, the peptide can travel to neighbouring cells via diffusion or aided by blood flow (Golan, Zelinger et al. 2015). The images reveal the direct innervation of CCK in the pituitary and its proximity to FSH cells.

      Figure 4c, on right. The text seems to be stretched as if the photo was adjusted without locking the aspect ratio. Please check the original images.

      This has been fixed

      Can the authors use different pseudo colours? Differentiating a double label of white versus yellow is very difficult, and thus the photo is not very convincing.

      This has been changed to green and magenta.

      What is meant by "CCK-AB" antibody? Perhaps anti-CCK would be a better label

      This has been fixed

      Figure 5A: increase the magnification of the insets; the structure of the gonads is very difficult to see with clarity in these low mag images. The most obvious way to improve this figure is to reduce or eliminate the pie graph (not really necessary) and show a high magnification (and larger) image of the gonadal structure.

      This is figure 1 in the new version, with magnification of the gonad next to each body section.

      Discussion:

      "Moreover, in the zebrafish, as well as in other species, the functional overlap in gonadotropin signalling pathways is not limited to the pituitary but is also present in the gonad, through the promiscuity of the two gonadotropin receptors" The reasoning of this sentence is not clear: zebrafish do not use GnRH to control reproduction: they lack GnRH1 through genomic rearrangement (see Whitlock, Postlethwait and Ewer 2019) and KO of GnRH2/GnRH3 does not affect reproduction.

      While GnRH KO models indicate a redundancy of GnRH in this axis in zebrafish, there is also ample evidence for its importance in regulating reproduction, such as its effect on gonadotropins (Golan, Martin et al. 2016) and its use in spawning induction in fish (Mizrahi and Levavi-Sivan 2023). We believe it is currently too soon to conclude that GnRH signalling is completely irrelevant to reproduction in cyprinids.

      Reviewing Editor (Recommendations For The Authors):

      It would be interesting to see calcium imaging experiments in the CCKR receptor mutants to establish a more direct connection between peptide action and activity.

      We added a receptor assay that reflects the non-activation of the mutated receptors by CCK (supplementary figure 1), and compared it to the wild type, which is activated. This shows that: 1) CCK directly activates our identified receptor in FSH cells; 2) the mutated receptors are non-active.

      "all homozygous fish (CCKR+12/+7/-1/ CCKR+12/+7/-1, n=12)"

      It may be better to write the genotype of fish separately as CCKR+12/+12, CCKR+7/+7 and CCKR-1/-1, n=12) otherwise it seems as if all alleles occurred together in the same fish.

      Modified according to the reviewer request

      In Figure 1 scale bar legends are very small. 

      Descriptions of the scale bars were added to all the legends.

      Figure 1 legend "On the top right of each panel is the gender distribution" - fish have no gender but sex.

      Modified according to the reviewer request

      The authors should endeavour to improve the presentation of the figures. They should use a sans-serif font and check that text is not cut at the edge of figure panels, that scale bars are uniform and clearly labelled and fonts are of similar size and clearly legible. E.g. labels of the fish brain of Fig3A are very small.

      We modified all the figures to adapt the font and the scales, we increased the size of the image in Figure 3a to make the labels clearer.

      Please use the elife format to name supplementary figures, as Figure X - Figure Supplement Y (each supplement associated with one of the main figures).

      Fixed

      Peptide concentrations in the ex vivo experiments should also be given as molar concentrations not only as '250 μl of 30 μg/ml CCK'.

      Fixed

      "In contrast, FSH cells responded with a very low calcium rise in hormonal secretion in response to GnRH" - a very low rise in hormonal secretion

      Fixed

      Please clarify why you used a GnRH synthetic agonist and not the native peptide.

      It is commonly used for spawning induction in fish (line 245); it has also been shown to directly affect the secretion of LH and FSH (Biran, Golan et al. 2014, Biran, Golan et al. 2014, Mizrahi, Gilon et al. 2019), added to line 245.

      References

      Ball, J. (1981). "Hypothalamic control of the pars distalis in fishes, amphibians, and reptiles." General and comparative endocrinology 44(2): 135-170.

      Biran, J., M. Golan, N. Mizrahi, S. Ogawa, I. S. Parhar and B. Levavi-Sivan (2014). "Direct regulation of gonadotropin release by neurokinin B in tilapia (Oreochromis niloticus)." Endocrinology 155(12): 4831-4842.

      Biran, J., M. Golan, N. Mizrahi, S. Ogawa, I. S. Parhar and B. Levavi-Sivan (2014). "LPXRFa, the Piscine Ortholog of GnIH, and LPXRF Receptor Positively Regulate Gonadotropin Secretion in Tilapia (Oreochromis niloticus)." Endocrinology 155(11): 4391-4401.

      Golan, M., A. O. Martin, P. Mollard and B. Levavi-Sivan (2016). "Anatomical and functional gonadotrope networks in the teleost pituitary." Scientific Reports 6: 23777.

      Golan, M., E. Zelinger, Y. Zohar and B. Levavi-Sivan (2015). "Architecture of GnRH-Gonadotrope-Vasculature Reveals a Dual Mode of Gonadotropin Regulation in Fish." Endocrinology 156(11): 4163-4173.

      Mizrahi, N., C. Gilon, I. Atre, S. Ogawa, I. S. Parhar and B. Levavi-Sivan (2019). "Deciphering Direct and Indirect Effects of Neurokinin B and GnRH in the Brain-Pituitary Axis of Tilapia." Front Endocrinol (Lausanne) 10: 469.

      Mizrahi, N. and B. Levavi-Sivan (2023). "A novel agent for induced spawning using a combination of GnRH analog and an FDA-approved dopamine receptor antagonist." Aquaculture 565: 739095.

      Uehara, S. K., Y. Nishiike, K. Maeda, T. Karigo, S. Kuraku, K. Okubo and S. Kanda (2023). "Cholecystokinin is the follicle-stimulating hormone (FSH)-releasing hormone." bioRxiv: 2023.05.26.542428.

      Webb, K. A., Jr., I. A. Khan, B. S. Nunez, I. Rønnestad and G. J. Holt (2010). "Cholecystokinin: molecular cloning and immunohistochemical localization in the gastrointestinal tract of larval red drum, Sciaenops ocellatus (L.)." Gen Comp Endocrinol 166(1): 152-159.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      The authors introduce a computational model that simulates the dendrites of developing neurons in a 2D plane, subject to constraints inspired by known biological mechanisms such as diffusing trophic factors, trafficked resources, and an activity-dependent pruning rule. The resulting arbors are analyzed in terms of their structure, dynamics, and responses to certain manipulations. The authors conclude that 1) their model recapitulates a stereotyped timecourse of neuronal development: outgrowth, overshoot, and pruning 2) Neurons achieve near-optimal wiring lengths, and Such models can be useful to test proposed biological mechanisms- for example, to ask whether a given set of growth rules can explain a given observed phenomenon - as developmental neuroscientists are working to understand the factors that give rise to the intricate structures and functions of the many cell types of our nervous system. 

      Overall, my reaction to this work is that this is just one instantiation of many models that the author could have built, given their stated goals. Would other models behave similarly? This question is not well explored, and as a result, claims about interpreting these models and using them to make experimental predictions should be taken warily. I give more detailed and specific comments below.  

      We thank the reviewer for the summary of the work. But the criticism “that this is one instantiation of many models [we] could have built” is unfair as it can apply to any model. We chose one of the most minimalistic models which implements known biological mechanisms including activity-independent and -dependent phases of dendritic growth, and constrained parameters based on experimental data. We compare the proposed model to other alternatives in the Discussion section. In the revised manuscript, we additionally investigate the sensitivity of model output to variations of specific parameters, as explained below.

      Point 1.1. Line 109. After reading the rest of the manuscript, I worry about the conclusion voiced here, which implies that the model will extrapolate well to manipulations of all the model components. How were the values of model parameters selected? The text implies that these were selected to be biologically plausible, but many seem far off. The density of potential synapses, for example, seems very low in the simulations compared to the density of axons/boutons in the cortex; what constitutes a potential synapse? The perfect correlations between synapses in the activity groups is flawed, even for synapses belonging to the same presynaptic cell. The density of postsynaptic cells is also orders of magnitude off, etc. Ideally, every claim made about the model's output should be supported by a parameter sensitivity study. The authors performed few explorations of parameter sensitivity and many of the choices made seem ad hoc.

      We have performed a detailed sensitivity analysis on the model parameters mentioned by the reviewer, including (I) the density of postsynaptic cells (somata), (II) the density of potential synapses, and (III) the level of correlations between synapses.

      (I) While the density of postsynaptic cells in our baseline model seems a bit low, at least when compared to densities observed in adulthood (Keller et al., 2018), we explored how altering this value affects the model dynamics. We found that the postsynaptic cell density does not affect the timing of dendritic outgrowth, overshoot and synaptic pruning. It only changes the final size of the dendritic arbor and the resulting number of connected synapses. This analysis is now included in Supplementary Figure 3-2.

      (II) The density of potential synapses and the density of connected synapses that we used in the manuscript are already in the range of densities that can be found in the literature (Leighton et al., 2024; Ultanir et al., 2007; Glynn et al., 2011; Yang et al., 2014), some of which we already cited in the original submission.

      A potential concern might be that the rapid slowing down of growth in the model could be due to a depletion of potential synapses. To illustrate that this is not the case, we showed that the number of available potential synapses over the time course of the simulations remains high (Figure 3, new panel e). Therefore, the initial density of potential synapses is sufficient and does not affect the final density of connected synapses.

      To further illustrate the robustness of our model dynamics to longer simulation times, we added a new supplementary figure (Supplementary Figure 3-1).

      These new figure additions (Figure 3e, Supplementary Figure 3-1, and Supplementary Figure 3-2) and their implications for the model dynamics are discussed in the Results section of the revised paper:

      p.9 line 198, “After the initial overshoot and pruning, dendritic branches in the model stay stable, with mainly small subbranches continuing to be refined (Figure 3-Figure Supplement 1). This stability in the model is achieved despite the number of potential synaptic partners remaining high (Figure 3e), indicating a balance between activity-independent and activity-dependent mechanisms. The dendritic growth and synaptic refinement dynamics are independent of the postsynaptic somata densities used in our simulations (Figure 3-Figure Supplement 2). Only the final arbor size and the number of connected synapses decrease with an increase in the density of the somata, while the timing of synaptic growth, overshoot and pruning remains the same (Figure 3-Figure Supplement 2).”

      We also added more details to the description of our model in the Methods section:

      p.24 line 615, “For all simulations in this study, we distributed nine postsynaptic somata at regular distances in a grid formation on a 2-dimensional 185 × 185 pixel area, representing a cortical sheet (where 1 pixel = 1 micron, Figure 4). This yields a density of around 300 neurons per mm² (translating to around 5,000 per mm³, where for 25 neurons in Figure 3-Figure Supplement 2 this would be around 750 neurons per mm² or 20,000 per mm³). The explored densities are a bit lower than neuron densities observed in adulthood (Keller et al., 2018). In the same grid, we randomly distributed 1,500 potential synapses, yielding an initial density of 0.044 potential synapses per μm² (Figure 3e). At the end of the simulation time, around 1,000 potential synapses remain, showing that the density of potential synapses is sufficient and does not significantly affect the final density of connected synapses. Thus, the rapid slowing down of growth in our model is not due to a depletion of potential synaptic partners. The resulting density of stably connected synapses is approximately 0.015 synapses per μm² (around 60 synapses stabilized per dendritic tree, Figure 3b). This density compares well to experimental findings, where, especially during early development, synaptic densities are described to be within a range similar to the one observed in our model (Leighton et al., 2024; Ultanir et al., 2007; Glynn et al., 2011; Yang et al., 2014; Koshimizu et al., 2009; Tyler and Pozzo-Miller, 2001).”
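
      The areal densities quoted above follow directly from the grid dimensions. As a quick arithmetic check (a sketch only; the helper name `area_density` is illustrative and is not taken from the published model code), counts on the 185 × 185 micron sheet can be converted to densities like this:

```python
def area_density(count, side_um):
    """Return (items per µm², items per mm²) for `count` items
    placed on a square sheet with side length `side_um` microns."""
    area_um2 = side_um ** 2          # 185 × 185 µm = 34,225 µm²
    per_um2 = count / area_um2
    per_mm2 = per_um2 * 1e6          # 1 mm² = 1e6 µm²
    return per_um2, per_mm2

SIDE_UM = 185  # 1 pixel = 1 micron, as stated in the Methods excerpt

# Nine somata (baseline) and 25 somata (higher-density simulations).
_, somata9_mm2 = area_density(9, SIDE_UM)
_, somata25_mm2 = area_density(25, SIDE_UM)

# 1,500 potential synapses distributed on the same sheet.
potential_um2, _ = area_density(1500, SIDE_UM)

print(round(somata9_mm2), round(somata25_mm2), round(potential_um2, 3))
```

      With these inputs, nine somata give roughly 263 per mm² (reported as "around 300"), 25 somata give roughly 730 per mm² (reported as "around 750"), and 1,500 potential synapses give roughly 0.044 per μm².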

      (III) Lastly, we investigated how the correlation between synapses of the same activity group might affect our conclusions. As correlations in our model mainly arise from patterns of spontaneous activity which are abundant in early postnatal development (retinal waves (Ackman et al., 2012) or endogenous activity in the form of highly synchronized events involving a large fraction of the cells (Siegel et al., 2012)), we explored varying the correlations within each activity group, across activity groups and combinations of both. While this analysis supported our previously described intuition on how competition between synaptic activities should drive activity-dependent refinement, recently a study found direct evidence for such subcellular refinement of synaptic inputs specifically dependent on spontaneous activity between retinal ganglion cell axons and retinal waves in the superior colliculus (Matsumoto et al., 2024). The new analysis confirmed our earlier results that the competition between activity groups leads to activity-dependent refinement and yielded further insight into how the studied activity correlations can affect the competition. Those results are presented in a completely new figure (new Figure 5, supported by the Supplementary Figure 5-1 and 5-2) and discussed in the Results section:

      p.11 line 249, “Group activity correlations shape synaptic overshoot and selectivity competition across synaptic groups.

      Since correlations between synapses emerge from correlated patterns of spontaneous activity abundant during postnatal development (Ackman et al., 2012; Siegel et al., 2012), we explored a wide range of within-group correlations in our model (Figure 5a). Although a change in correlations within the group has only a minor effect on the resulting dendritic lengths (Figure 5b) and overall dynamics, it can change the density of connected synapses and thus also affect the number of connected synapses to which each dendrite converges throughout the simulations (Figure 5c,e). This is due to the change in specific selectivity of each dendrite which is a result of the change in within-group correlations (Figure 5d). While it is easier for perfectly correlated activity groups to coexist within one dendrite (Figure 5-Figure Supplement 1a, 100%), decreasing within-group correlations increases the competition between groups, producing dendrites that are selective for one specific activity group (60%, Figure 5d, Figure 5-Figure Supplement 1a). This selectivity for a particular activity group is maximized at intermediate (approximately 60%) within-group correlations, while the contribution of the second most abundant group generally remains just above random chance levels (Figure 5-Figure Supplement 1a). Further reducing within-group correlations (20%, Figure 5a) causes dendrites to lose their selectivity for specific activity groups due to the increased noise in the activity patterns (20%, Figure 5a). Overall, reducing within-group correlations increases synapse pruning (Figure 5f, bottom), also found experimentally (Matsumoto et al., 2024) as dendrites require an extended period to fine-tune connections aligned with their selectivity biases. This phenomenon accounts for the observed reduction in both the density and number of synapses connected to each dendrite.

      In addition to the within-group correlations, developmental spontaneous activity patterns can also change correlations between groups as for example retinal waves propagated in different domains (Feller et al., 1997) (Figure 5-Figure Supplement 2). An increase in between-group correlations in our model intuitively decreases competition between the groups since fully correlated global events synchronize the activity of all groups (Figure 5-Figure Supplement 2). The reduction in competition reduces pruning in the model, which can be recovered by combining cross-group correlations with decreased within-group correlations (Figure 5-Figure Supplement 2). Our simulations show that altering the correlations within activity groups increases competition (by lowering the within-group correlations) or decreases competition (by raising the across-group correlations). Hence, in our model, competition between activity groups due to non-trivially structured correlations is necessary to generate realistic dynamics between activity-independent growth and activity-dependent refinement or pruning.

      In sum, our simulations demonstrate that our model can operate under various correlations in the spike trains. We find that the level of competition between synaptic groups is crucial for the activity-dependent mechanisms to either potentiate or depress synapses and is fully consistent with recent experimental evidence showing that the correlation between spontaneous activity in retinal ganglion cells axons and retinal waves in the superior colliculus governs branch addition vs. elimination (Matsumoto et al., 2024).”

      Precise details on the implementation of the changed activity correlations were added to the Methods section:

      p. 25 line 638, “Within-group and across-group activity correlations. For the decreased within-group correlations, we generated parent spike trains for each individual group with the firing rate r_in = r_total × P_in (e.g., P_in = 100%; 60%; 20%, Figure 5). All the synapses of the same group share the same parent spike train and the remaining spikes for each synapse are uniquely generated with the firing rate r_rest = r_total × (1 − P_in) (e.g., (1 − P_in) = 0%; 40%; 80%), resulting in the desired firing rate r_total (see Table 1). For the increase in across-group correlations, we generated one master spike train with the firing rate r_cross = r_total × P_cross for all the synapses of all groups (e.g., P_cross = 5%; 10%; 20%, Figure 5-Figure Supplement 2). This master spike train is shared across all groups and then filled up according to the within-group correlation (if not specified differently P_in = 1 − P_cross to maintain the rate r_total). In all the cases, also in those where the change in across-group correlations is combined with the change in within-group correlations, the remaining spikes for each synapse are generated with a firing rate r_rest = r_total × (1 − P_in − P_cross) to obtain an overall desired firing rate of r_total.”
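
      The rate bookkeeping described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published implementation: spike trains are assumed to be homogeneous Poisson processes, the function names and the value of r_total are hypothetical, and only the Python standard library is used. Because independent Poisson processes superimpose, each synapse ends up with a total rate of r_total × (P_cross + P_in + (1 − P_in − P_cross)) = r_total.

```python
import random

def poisson_train(rate, duration, rng):
    """Spike times of a homogeneous Poisson process (assumed spike model).
    `rate` in spikes per unit time; non-positive rates give an empty train."""
    times, t = [], 0.0
    if rate <= 0:
        return times
    while True:
        t += rng.expovariate(rate)   # exponential inter-spike interval
        if t >= duration:
            return times
        times.append(t)

def group_spike_trains(r_total, p_in, p_cross, n_groups, n_syn, duration, seed=0):
    """Return trains[group][synapse] -> sorted list of spike times.

    master  : shared by all synapses of all groups, rate r_total * p_cross
    parent  : shared by all synapses of one group,  rate r_total * p_in
    private : unique to each synapse, rate r_total * (1 - p_in - p_cross)
    """
    rng = random.Random(seed)
    master = poisson_train(r_total * p_cross, duration, rng)
    trains = []
    for _ in range(n_groups):
        parent = poisson_train(r_total * p_in, duration, rng)
        group = []
        for _ in range(n_syn):
            private = poisson_train(r_total * (1 - p_in - p_cross), duration, rng)
            group.append(sorted(master + parent + private))
        trains.append(group)
    return trains

# Hypothetical parameters: 60% within-group correlation, no across-group sharing.
partial = group_spike_trains(1.0, 0.6, 0.0, n_groups=2, n_syn=3, duration=1000.0)
# Fully correlated groups: every synapse in a group fires identically.
full = group_spike_trains(1.0, 1.0, 0.0, n_groups=2, n_syn=3, duration=200.0)
```

      Setting p_in = 1 and p_cross = 0 reproduces the perfectly correlated baseline, in which all synapses of a group share one spike train, while lowering p_in shifts spikes from the shared parent train to the private trains without changing the per-synapse rate.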

      Point 1.2. Many potentially important phenomena seem to be excluded. I realize that no model can be complete, but the choice of which phenomena to include or exclude from this model could bias studies that make use of it and is worth serious discussion. The development of axons is concurrent with dendrite outgrowth, is highly dynamic, and perhaps better understood mechanistically. In this model, the inputs are essentially static. Growing dendrites acquire and lose growth cones that are associated with rapid extension, but these do not seem to be modeled. Postsynaptic firing does not appear to be modeled, which may be critical to activity-dependent plasticity. For example, changes in firing are a potential explanation for the global changes in dendritic pruning that occur following the outgrowth phase.  

      Thanks to the reviewer for bringing up these important considerations. We do indeed write in the Introduction (e.g. lines 36-76) which phenomena we include in the model and why. The Discussion also compares our model to others (lines 433-490), pointing out that most models either focus on activity-independent or activity-dependent phases. We include both, combining the influence of both molecular gradients and growth factors as well as activity-dependent connectivity refinements instructed by spontaneous activity. We consider our model a tractable, minimalist mechanistic model which includes both activity-independent and activity-dependent aspects. 

      Regarding postsynaptic firing, this is indeed super relevant and an important point to consider. In one of our recent publications (Kirchner and Gjorgjieva, 2021), we studied only an activity-dependent model for the organization of synaptic inputs on non-growing dendrites which have a fixed length. There, we considered the effect of postsynaptic firing (via a back-propagating action potential) and demonstrated that it plays an important role in establishing a global organization of synapses on the entire dendritic tree of the neuron. For example, we showed that it could lead to the emergence of retinotopic maps on the dendritic tree which have been found experimentally (Iacaruso et al., 2017). Since we use the same activity-dependent plasticity model in this paper, we expect that the somatic firing will have the same effect on establishing synaptic distributions on the entire dendritic tree. This is now also discussed in the Discussion section of the revised manuscript:

      p. 21 line 491, “Although we did not explicitly model postsynaptic firing, our previous work with static dendrites has shown that it can play an important role in establishing a global organization of synapses on the entire dendritic tree of the neuron (Kirchner and Gjorgjieva, 2021). For example, we showed that it could lead to the emergence of retinotopic maps on the dendritic tree which have been found experimentally (Iacaruso et al., 2017). Since we use the same activity-dependent plasticity model in this paper, we expect that the somatic firing will have the same effect on establishing synaptic distributions on the entire dendritic tree.”

      Including the concurrent development of axons in the model is indeed very interesting. In fact, a recent tour-de-force techniques paper found dynamics similar to what we assume: Hebbian activity-dependent dynamics of axonal branches of retinal ganglion cells experiencing spontaneous activity in relation to retinal waves in the superior colliculus (Matsumoto et al., 2024). New branches tend to be added at the locations where spontaneous activity of individual branches is more correlated with retinal waves, whereas asynchronous activity is associated with branch elimination. We suspect the same Hebbian activity-dependent dynamics to apply also to dendritic growth.

      To account for axons that develop dynamically alongside our growing dendrites, in the revised version of the manuscript, we included a simplified form of axonal dynamics by allowing changes in the lifetime and location of potential synapses, which come from axons of presynaptic partners. We explored different median lifetimes of synapses in combination with several distances with which a synapse can move in the simulated space (new Supplementary Figure 3-3). Our results show that dynamically moving synapses only affect the dynamics and stability of our model when the rate of moving synapses combined with the distance of moving synapses is faster than the dendritic growth. In scenarios in which synapses can move across large distances, dendrites get further destabilized due to synapses transferring from one dendrite to another, perturbing the attractor fields of the potential synapses even in late phases of the simulations. Apart from such non-biological scenarios, dynamically moving synapses have little effect on the model dynamics: they mostly add noise and variability to the growth and pruning without changing the timing and amplitude of the dynamics. These results are discussed in the Results section of the revised manuscript:

      p.9 line 207, “The development of axons is concurrent with dendritic growth and highly dynamic (Matsumoto et al., 2024). To address the impact of simultaneously growing axons, we implemented a simple form of axonal dynamics by allowing changes in the lifetime and location of potential synapses, originating from the axons of presynaptic partners (Figure 3-Figure Supplement 3). When potential synapses can move rapidly (median lifetime of 1.8 hours), the model dynamics are perturbed quite substantially, making it difficult for the dendrites to stabilize completely (Figure 3–Figure Supplement 3c). However, slowly moving potential synapses (median lifetime of 18 hours) still yield comparable results (Figure 3-Figure Supplement 3). The distance of movement significantly influenced results only when potential synaptic lifetimes were short. For extended lifetimes, the moving distance had a minor impact on the dynamics, predominantly affecting the time required for dendrites to stabilize. This was the result of synapses being able to transfer from one dendrite to another, potentially forming new long-lasting connections even at advanced stages of synaptic refinement. In sum, our results show that potential axonal dynamics only affect the stability of our model when these dynamics are much faster than dendritic growth.”

      Precise details on the implementation of the dynamically moving synapses and their synaptic lifetimes are now in the Methods section:

      p. 25 line 650, “Dynamically moving synapses. For the moving synapses we introduced lifetimes for each synapse, randomly sampled from a log-normal distribution with median 1.8h (for when they move frequently), 4.5h or 18h (for when they move rarely) and variance equal to 1 (Figure 3-Figure Supplement 3b). The lifetime of a synapse decreases only when the synapse is not connected to any of the dendrites (i.e., is a potential synapse). When the lifetime of a synapse expires, the synapse moves to a new location with a new lifetime sampled from the same log-normal distribution. This enables synapses to move multiple times throughout a simulation. The exact locations and distances to which each synapse can move are determined by a binary matrix (dimensions: pixel distance × pixel distance, d × d) representing a ring (annulus) with the inner radius d/4 and outer radius d/2, where the synapse location is at the center of the matrix. All the locations of the matrix within the ring boundaries (between the inner radius and outer radius) are potential locations to which the synapse can move. The synapse then moves randomly to one of the possible locations where no other synapse or dendrite is located. For the movement distances, we chose the ring dimensions 3 × 3, 25 × 25 and 101 × 101, yielding the moving distances (radii) of 1 pixel per movement, 12 pixels per movement and 50 pixels per movement (r = (d − 1)/2). These pixel distances represent small movements, as much as a dendrite can grow in one step (1 micron), and larger movements which are far enough so that the synapse will not attract the same branches again (12 microns) or far enough so that it might attract a completely different dendrite (50 microns, Figure 3-Figure Supplement 3a).”
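
      The ring-shaped movement rule and the lifetime sampling described above can be sketched as follows. This is an illustration under explicit assumptions rather than the published code: all function names are hypothetical, distances are Euclidean and measured from the central cell of the d × d matrix, the ring boundaries are treated as inclusive, and "variance equal to 1" is read as a unit variance of the underlying normal distribution (sigma = 1), so that the median of the log-normal is exp(mu).

```python
import math
import random

def ring_mask(d):
    """d × d binary matrix marking cells whose distance from the centre
    lies between the inner radius d/4 and the outer radius d/2."""
    c = (d - 1) / 2                      # index of the central cell
    inner, outer = d / 4, d / 2
    return [[1 if inner <= math.hypot(i - c, j - c) <= outer else 0
             for j in range(d)]
            for i in range(d)]

def sample_lifetime(median_h, rng, sigma=1.0):
    """Log-normal lifetime in hours; median = exp(mu), so mu = ln(median)."""
    return rng.lognormvariate(math.log(median_h), sigma)

def candidate_moves(pos, d, occupied, size):
    """Free grid locations inside the ring centred on `pos`.
    `occupied` is a set of (x, y) cells taken by synapses or dendrites."""
    r = (d - 1) // 2
    mask = ring_mask(d)
    x0, y0 = pos
    moves = []
    for i in range(d):
        for j in range(d):
            x, y = x0 + i - r, y0 + j - r
            if mask[i][j] and 0 <= x < size and 0 <= y < size \
                    and (x, y) not in occupied:
                moves.append((x, y))
    return moves

rng = random.Random(1)
# A synapse at (10, 10) on the 185 × 185 grid with one neighbouring cell occupied.
moves = candidate_moves((10, 10), d=3, occupied={(10, 11)}, size=185)
new_position = rng.choice(moves)                 # random free location in the ring
new_lifetime = sample_lifetime(18.0, rng)        # slowly moving synapses
```

      For d = 3 the ring contains the eight neighbours of the central cell and excludes the centre itself, which matches the stated movement of 1 pixel per step; the larger matrices (25 × 25, 101 × 101) work the same way with maximal axis-aligned offsets of 12 and 50 pixels.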

      Point 1.3. Line 167. There are many ways to include activity -independent and -dependent components into a model and not every such model shows stability. A key feature seems to be that larger arbors result in reduced growth and/or increased retraction, but this could be achieved in many ways (whether activity dependent or not). It's not clear that this result is due to the combination of activity-dependent and independent components in the model, or conceptually why that should be the case.

      We never argued for model uniqueness. There are always going to be many different models (at different spatial and temporal scales, at different levels of abstraction). We can never study all of them and like any modeling study in systems neuroscience we have chosen one model approach and investigated this approach. We do compare the current model to others in the Discussion. If the reviewers have a specific implementation that we should compare our model to as an alternative, we could try, but not if this means doing a completely separate project.

      Point 1.4. Line 183. The explanation of overshoot in terms of the different timescales of synaptic additions versus activity-dependent retractions was not something I had previously encountered and is an interesting proposal. Have these timescales been measured experimentally? To what extent is this a result of fine-tuning of simulation parameters?  

      We found that varying the amount of BDNF controls the timescale of the activity-dependent plasticity (see our Figure 6c). Hence, changing the balance between synaptic additions vs. retractions is already explored in Figure 6e and f. Here we show that the overshoot and retraction do not have to be fine-tuned but may be abolished if there is too much activity-dependent plasticity.

      Regarding the relative timescales of synaptic additions vs. retractions: since the first is mainly due to activity-independent factors, and the second due to activity-dependent plasticity, the question is really about the timescales of the latter two. As we write in the Introduction (lines 61-63), manipulating activity-dependent synaptic transmission has been found to not affect morphology but rather the density and specificity of synaptic connections (Ultanir et al. 2007), supporting the sequential model we have (although we do not impose the sequence, as both activity-independent and activity-dependent mechanisms are always “on”; but note that activity-dependent plasticity can only operate on synapses that have already formed).

      The described results are robust to parameter variations (performed on the postsynaptic density, potential synapse density, and within- and across-group correlations) as described in the reply to reviewer #1 point 1.1.

      Point 1.5. Line 203. This result seems at odds with results that show only a very weak bias in the tuning distribution of inputs to strongly tuned cortical neurons (e.g. work by Arthur Konnerth's group). This discrepancy should be discussed.  

      First, we note that the correlated activity experienced by our modeled synapses (and resulting synaptic organization) does not necessarily correspond to visual orientation, or any stimulus feature, for that matter, but is rather a property of correlated spontaneous activity. 

      Nonetheless, there is some variability in what the experimental data show. Many studies have shown that synapses on dendrites are organized into functional synaptic clusters: across brain regions, developmental ages and diverse species from rodent to primate (Kleindienst et al., 2011; Takahashi et al., 2012; Winnubst et al., 2015; Gökçe et al., 2016; Wilson et al., 2016; Iacaruso et al., 2017; Scholl et al., 2017; Niculescu et al., 2018; Kerlin et al., 2019; Ju et al., 2020, Hedrick et al., 2022, Hedrick et al., 2024). Interestingly, some in vivo studies have reported lack of fine-scale synaptic organization (Varga et al., 2011; X. Chen et al., 2011; T.-W. Chen et al., 2013; Jia et al., 2010; Jia et al., 2014), while others reported clustering for different stimulus features in different species. For example, dendritic branches in the ferret visual cortex exhibit local clustering of orientation selectivity but do not exhibit global organization of inputs according to spatial location and receptive field properties (Wilson et al. 2016; Scholl et al., 2017). In contrast, synaptic inputs in mouse visual cortex do not cluster locally by orientation, but only by receptive field overlap, and exhibit a global retinotopic organization along the proximal-distal axis (Iacaruso et al., 2017). We proposed a theoretical framework to reconcile these data: combining activity-dependent plasticity similar to the BDNF-proBDNF model that we used in the current work, and a receptive field model for the different species (Kirchner and Gjorgjieva, 2021). This is now also discussed in the Discussion section of the revised manuscript:

      p. 20 line 471, “The correlated activity experienced by our modeled synapses (and resulting synaptic organization) does not necessarily correspond to visual orientation, or any stimulus feature, for that matter, but is rather a property of spontaneous activity. Nonetheless, there is some variability in what the experimental data show. Many have shown that synapses on dendrites are organized into functional synaptic clusters: across brain regions, developmental ages and diverse species from rodent to primate (Kleindienst et al., 2011; Winnubst et al., 2015; Iacaruso et al., 2017; Scholl et al., 2017; Niculescu et al., 2018; Takahashi et al., 2012; Gökçe et al., 2016; Wilson et al., 2016; Kerlin et al., 2019; Ju et al., 2020; Hedrick et al., 2022, 2024). Other studies have reported lack of fine-scale synaptic organization (Chen et al., 2013; Varga et al., 2011; Chen et al., 2011; Jia et al., 2010, 2014). Interestingly, some of these discrepancies might be explained by different species showing clustering with respect to different stimulus features (orientation or receptive field overlap) (Scholl et al., 2017; Wilson et al., 2016; Iacaruso et al., 2017). Our prior work proposed a theoretical framework to reconcile these data: combining activity-dependent plasticity as we used in the current work, and a receptive field model for the different species (Kirchner and Gjorgjieva, 2021).”

      Point 1.6. Line 268. How does the large variability in the size of the simulated arbors relate to the relatively consistent size of arbors of cortical cells of a given cell type? This variability suggests to me that these simulations could be sensitive to small changes in parameters (e.g. to the density or layout of presynapses).  

      We again thank the reviewer for the detailed explanation and feedback on parameters that should be tested in more detail. We have explored several of the suggested model parameters and believe that we have managed to explain and illustrate their effects on the model's dynamics clearly. The precise changes are explained in the reply to point 1.1 and are now available in the revised version of the manuscript.

      Point 1.7. The modeling of dendrites as two-dimensional will likely limit the usefulness of this model. Many phenomena- such as diffusion, random walks, topological properties, etc - fundamentally differ between two and three dimensions.  

      Indeed, there are many differences between two and three dimensions. We have ongoing work that extends the current model to 3D, but this is beyond the scope of the current paper. In systems neuroscience, people have found very interesting results making such simplified geometric assumptions about networks; for instance, the one-dimensional ring model has been used to uncover fundamental insights about computations even though it is highly simplified and abstracted. We are convinced that our model, especially with the new sensitivity analysis, makes interesting and novel contributions and predictions.

      Point 1.8. The description of wiring lengths as 'approximately optimal' in this text is problematic. The plotted data show that the wiring lengths are several deviations away from optimal, and the random model is not a valid instantiation of the 2D non-overlapping constraints the authors imposed. A more appropriate null should be considered.  

      We appreciate the reviewer’s feedback regarding the use of the term “approximately optimal” in describing wiring lengths. We acknowledge that our initial terminology was imprecise and could be misleading. We had previously referred to the minimal wiring length as the optimal wiring length, which does not fully capture the nuances of neuronal wiring optimization. As noted in prior literature, such as the work by Hermann Cuntz (Cuntz et al., 2010 & 2012), neurons can optimize their wiring beyond simply minimizing dendritic length.

      To address this issue and better capture the balance between wiring minimization and functional constraints, such as conduction delays, we have developed a new modeling approach based on minimum spanning trees with a balancing factor (Cuntz et al., 2010 & 2012). This factor modulates the trade-off between minimizing wiring length and accounting for conduction delays from synapses to the soma. Specifically, the model assumes a balance between minimizing the total dendritic length and minimizing the tree distance between synapses and the site of input integration, typically the soma. This balance is illustrated in Figure 8 (Figure 7 in the original manuscript), where we demonstrate that the deviation from the theoretical minimum length arises because direct paths to synapses often require longer dendrites in our models.

      Together with the new result, which we added as the new panels f, g and h to Figure 8 (originally Figure 7), we also adjusted panel a of Figure 8 to illustrate the difference between random wiring, minimal wiring and minimal conduction delay. The updated Figure 8 and its new findings are discussed in the results section of the revised manuscript:

      p.17 line 387, “This deviation is expected given that real dendrites need to balance their growth processes between minimizing wire and reducing conduction delays. The interplay between these two factors emerges from the need to reduce conduction delays, which requires a direct path length from a given synapse to the soma, consequently increasing the total length of the dendritic cable (Cuntz et al., 2010, 2012; Ferreira Castro et al., 2020).

      To investigate this further, we compared the scaling relations of the final morphologies of our models with other synthetic dendritic morphologies generated using a previously described minimum spanning tree (MST) based model. The MST model balances the minimization of total dendritic length and the minimization of conduction delays between synapses and the soma. This balance results in deviations from the theoretical minimum length because direct paths to synapses often require longer dendrites (Cuntz et al., 2008, 2010). The balance in the model is modulated by a balancing factor (𝑏𝑓). If 𝑏𝑓 is zero, dendritic trees minimize the cable only, and if 𝑏𝑓 is one, they will try to minimize the conduction delays as much as possible. It is important to note that the MST model does not simulate the developmental process of dendritic growth; it is a phenomenological model designed to generate static morphologies that resemble real cells.

      To facilitate the comparison of total lengths between our simulated and MST morphologies, we generated MST models under the same initial conditions (synaptic spatial distribution) as our models and simulated them to match several morphometrics (total length, number of terminals, and surface area) of our grown morphologies. This allowed us to create a corresponding MST tree for each of our synthetic trees. Consequently, we could evaluate whether the branching structures of our models were accurately predicted by minimum spanning trees based on optimal wiring constraints. We found that the best match occurred with a trade-off parameter 𝑏𝑓 = 0.9250 (Figure 8f). Using the morphologies generated by the MST model with the specified trade-off parameter (𝑏𝑓), we showed that the square root of the synapse count and the total length (𝐿) in both our model-generated trees and the MST trees exhibit a linear scaling relationship (Figure 8g; 𝑅² = 0.65). The same linear relationship can be observed for the square root of the surface area and the total length 𝐿 of our model trees and the MST trees (Figure 8h; 𝑅² = 0.73). Overall, these results indicate that our model-generated trees are well fitted by the MST model and follow wire optimization constraints.

      We acknowledge that the value of the balancing factor 𝑏𝑓 in our model is higher than the range of balancing factors that is typically observed in the biological dendritic counterparts, which generally ranges between 0.2 and 0.4 (Cuntz et al., 2012; Ferreira Castro et al., 2020; Baltruschat et al., 2020). However, it is still remarkable that our model, which does not explicitly address these two conservation laws, achieves approximately optimal wiring. Why do we observe such a high 𝑏𝑓 value? We reason that two factors may contribute to this. First, in our models, local branches grow directly to the nearest potential synapse, potentially taking longer routes instead of optimally branching to minimize wiring length (Wen and Chklovskii, 2008). Second, the growth process in our models does not explicitly address the tortuosity of the branches, which can increase the total length of the branches used to connect synapses. In the future, it will be interesting to add constraints that take these factors into account. Taken together, combining activity-independent and -dependent dendrite growth produces morphologies that approximate optimal wiring.”

      Further details on the fitted MST model and the corresponding analysis were added to the methods section:

      p.26 line 669, “Comparison with wiring optimization MST models. To evaluate the wire minimization properties of our model morphologies (n=288), we examined whether the number of connected synapses (N), total length (L), and surface area of the spanning field (S) conformed to the scaling law 𝐿 ≈ 𝜋^(−1/2) ⋅ 𝑆^(1/2) ⋅ 𝑁^(1/2) (Cuntz et al., 2012). Furthermore, to validate that our model dendritic morphologies scale according to optimal wiring principles, we created simplified models of dendritic trees using the MST algorithm with a balancing factor (bf). This balancing factor adjusts between minimizing the total dendritic length and minimizing the tree distance between synapses and the soma (Cost = 𝐿 + 𝑏𝑓 ⋅ 𝑃𝐿; MST_tree; best bf = 0.925; Cuntz et al., 2010; TREES Toolbox, http://www.treestoolbox.org).

      Initially, we generated MSTs to connect the same distributed synapses as our models. We performed MST simulations that vary the balancing factor between 𝑏𝑓 = 0 and 𝑏𝑓 = 1 in steps of 0.025 while calculating the morphometric agreement by computing the error (Euclidean distance) between the morphologies of our models and those generated by the MST models. The morphometrics used were total length, number of terminals, and surface area occupied by the synthetic morphologies.”
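
      The balancing-factor procedure described in this Methods passage can be sketched as follows. This is a simplified illustration rather than the TREES Toolbox `MST_tree` routine: it is a greedy, Prim-style construction in which attaching a free point costs the new segment length plus 𝑏𝑓 times the resulting path length to the root, which is one assumed reading of Cost = 𝐿 + 𝑏𝑓 ⋅ 𝑃𝐿. All function names are hypothetical, and the fit below compares only total length and number of terminals (unnormalized), whereas the text also uses surface area.

```python
import math


def mst_with_balancing_factor(root, points, bf):
    """Greedy tree connecting `points` to `root`.

    Attaching point p to tree node q costs seg + bf * (path_len[q] + seg),
    where seg is the Euclidean distance between p and q.
    bf = 0 reduces to Prim's minimum spanning tree.
    """
    nodes, parent, path_len = [root], [-1], [0.0]
    remaining = list(points)
    while remaining:
        best = None
        for j, p in enumerate(remaining):
            for i, q in enumerate(nodes):
                seg = math.dist(p, q)
                cost = seg + bf * (path_len[i] + seg)
                if best is None or cost < best[0]:
                    best = (cost, i, j, seg)
        _, i, j, seg = best
        nodes.append(remaining.pop(j))
        parent.append(i)
        path_len.append(path_len[i] + seg)
    return nodes, parent, path_len


def total_length(nodes, parent):
    """Sum of all segment lengths in the tree."""
    return sum(math.dist(nodes[k], nodes[parent[k]]) for k in range(1, len(nodes)))


def n_terminals(parent):
    """Number of non-root nodes without children."""
    has_child = set(parent)
    return sum(1 for k in range(1, len(parent)) if k not in has_child)


def scaling_law_length(area, n_synapses):
    """Predicted total length L = pi^(-1/2) * S^(1/2) * N^(1/2)."""
    return math.sqrt(area * n_synapses / math.pi)


def fit_bf(root, points, target, step=0.025):
    """Sweep bf from 0 to 1 and return the value whose tree is closest
    (Euclidean distance) to target = (total length, number of terminals)."""
    best_bf, best_err = None, math.inf
    for k in range(int(round(1 / step)) + 1):
        bf = k * step
        nodes, parent, _ = mst_with_balancing_factor(root, points, bf)
        err = math.dist((total_length(nodes, parent), n_terminals(parent)), target)
        if err < best_err:
            best_bf, best_err = bf, err
    return best_bf
```

      For a root at (0, 0) and points (1, 0) and (1, 1), 𝑏𝑓 = 0 chains the two points (total length 2), whereas 𝑏𝑓 = 1 connects both directly to the root: the cable becomes longer, but the path from (1, 1) to the root becomes shorter. This is the trade-off that the balancing factor controls.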

      Point 1.9. It's not clear to me what the authors are trying to convey by repeatedly labeling this model as 'mechanistic'. The mechanisms implemented in the model are inspired by biological phenomena, but the implementations have little resemblance to the underlying biophysical mechanisms. Overall my impression is that this is a phenomenological model intended to show under what conditions particular patterns are possible. Line 363, describing another model as computational but not mechanistic, was especially unclear to me in this context.  

      What we mean by mechanistic is that we implement equations that model specific mechanisms, i.e., we have a set of equations that implement the activity-independent attraction to potential synapses (with parameters such as the density of synapses, their spatial influence, etc.) and the activity-dependent refinement of synapses (with parameters such as the ratio of BDNF and proBDNF to induce potentiation vs. depression, the activity-dependent conversion of one factor to the other, etc.). This is a bottom-up approach where we combine multiple elements together to get to neuronal growth and synaptic organization. This approach is in stark contrast to the so-called top-down or normative approaches where the method would involve defining an objective function (e.g. minimal dendritic length) which depends on a set of parameters and then applying a gradient descent or other mathematical optimization technique to get at the parameters that optimize the objective function. This latter approach we would not call mechanistic because it involves an abstract objective function (who could say what a neuron or a circuit should be trying to optimize?) and a mathematical technique for how to optimize the function (we don’t know if neurons can compute gradients of abstract objective functions).

      Hence our model is mechanistic, but it does operate at a particular level of abstraction/simplification. We don’t model individual ion channels, or biophysics of synaptic plasticity (opening and closing of NMDA channels, accumulation of proteins at synapses, protein synthesis). We do, however, provide a biophysical implementation of the plasticity mechanism through the BDNF/proBDNF model which is more than most models of plasticity achieve, because they typically model a phenomenological STDP or Hebbian rule that just uses activity patterns to potentiate or depress synaptic weights, disregarding how it could be implemented. To the best of our understanding, this is what is normally considered mechanistic in the field (in contrast to, for example, biophysical).

      Reviewer #2 (Public Review): 

      This work combines a model of two-dimensional dendritic growth with attraction and stabilisation by synaptic activity. The authors find that constraining growth models with competition for synaptic inputs produces artificial dendrites that match some key features of real neurons both over development and in terms of final structure. In particular, incorporating distance-dependent competition between synapses of the same dendrite naturally produces distinct phases of dendritic growth (overshoot, pruning, and stabilisation) that are observed biologically and leads to local synaptic organisation with functional relevance. The approach is elegant and well-explained, but makes some significant modelling assumptions that might impact the biological relevance of the results. 

      Strengths: 

      The main strength of the work is the general concept of combining morphological models of growth with synaptic plasticity and stabilisation. This is an interesting way to bridge two distinct areas of neuroscience in a manner that leads to findings that could be significant for both. The modelling of both dendritic growth and distance-dependent synaptic competition is carefully done, constrained by reasonable biological mechanisms, and well-described in the text. The paper also links its findings, for example in terms of phases of dendritic growth or final morphological structure, to known data well. 

      Weaknesses: 

      The major weaknesses of the paper are the simplifying modelling assumptions that are likely to have an impact on the results. These assumptions are not discussed in enough detail in the current version of the paper. 

      (1) Axonal dynamics. 

      A major, and lightly acknowledged, assumption of this paper is that potential synapses, which must come from axons, are fixed in space. This is not realistic for many neural systems, as multiple undifferentiated neurites typically grow from the soma before an axon is specified (Polleux & Snider, 2010). Further, axons are also dynamic structures in early development and, at least in some systems, undergo activity-dependent morphological changes too (O'Leary, 1987; Hall 2000). This paper does not consider the implications of joint pre- and post-synaptic growth and stabilisation.  

      We thank the reviewer for the summary of the strengths and weaknesses of the work. While we feel that including a full model of axonal dynamics is beyond the scope of the current manuscript, some aspects of axonal dynamics can be included and are now implemented and tested in the revised manuscript. Since this feedback covers similar aspects of the model that were also pointed out by reviewer #1, we refer here to our detailed reply to their comments 1.1 and 1.2, where we list and discuss all the analyses performed to address the raised issues.

      (2) Activity correlations 

      On a related note, the synapses in the manuscript display correlated activity, but there is no relationship between the distance between synapses and their correlation. In reality, nearby synapses are far more likely to share the same axon and so display correlated activity. If the input activity is spatially correlated and synaptic plasticity displays distance-dependent competition in the dendrites, there is likely to be a non-trivial interaction between these two features with a major impact on the organisation of synaptic contacts onto each neuron.  

      We have explored the amount of correlation (between and within correlated groups) in the revised manuscript (see also our reply to reviewer comment 1.1).

      However, previous experimental work (e.g., Kleindienst et al., 2011) has provided anatomical and functional analyses indicating that the functional synaptic clustering on dendritic branches is unlikely to be the result of individual axons making more than one synapse (see pg. 1019).

      (3) BDNF dynamics 

      The models are quite sensitive to the ratio of BDNF to proBDNF (eg Figure 5c). This ratio is also activity-dependent as synaptic activation converts proBDNF into BDNF. The models assume a fixed ratio that is not affected by synaptic activity. There should at least be more justification for this assumption, as there is likely to be a positive feedback relationship between levels of BDNF and synaptic activation.  

      The reviewer is correct. We used the BDNF-proBDNF model for synaptic plasticity based on our previous work (Kirchner and Gjorgjieva, 2021).  

      There, we explored only the emergence of functionally clustered synapses on static dendrites which do not grow. In the Methods section (Parameters and data fitting) we justify the choice of the ratio of BDNF to proBDNF from published experimental work. We also performed sensitivity analysis (Supplementary Fig. 1) and perturbation simulations (Supplementary Fig. 3), which showed that the ratio is crucial in regulating the overall amount of potentiation and depression of synaptic efficacy, and therefore has a strong impact on the emergence and maintenance of synaptic organization. Since we already performed all this analysis, we expect that the same results will also apply to the current model which includes dendritic growth, as it involves the same activity-dependent mechanism.

      A further weakness is in the discussion of how the final morphologies conform to principles of optimal wiring, which is quite imprecise. 'Optimal wiring' in the sense of dendrites and axons (Cajal, 1895; Chklovskii, 2004; Cuntz et al, 2007, Budd et al, 2010) is not usually synonymous with 'shortest wiring' as implied here. Instead, there is assumed to be a balance between minimising total dendritic length and minimising the tree distance (ie Figure 4c here) between synapses and the site of input integration, typically the soma. The level of this balance gives the deviation from the theoretical minimum length as direct paths to synapses typically require longer dendrites. In the model this is generated by the guidance of dendritic growth directly towards the synaptic targets. The interpretation of the deviation in this results section discussing optimal wiring, with hampered diffusion of signalling molecules, does not seem to be correct. 

      We agree with this comment. We had wrongly used the term “optimal wiring” as neurons can optimize their wiring not only by minimizing their dendritic length but other factors as noted by the reviewer. In the revised manuscript we replaced the term “optimal wiring” with “minimal wiring” wherever it was incorrectly used. On top of that, we performed further analysis and discussed these differences, as pointed out in the reply to reviewer #1 point 1.8.

      To summarize, we want to again thank the reviewer for their in-depth review and all the suggestions that helped us improve the analysis and implementation of our model.

      Reviewer #3 (Public Review): 

      The authors propose a mechanistic model of how the interplay between activity-independent growth and an activity-dependent synaptic strengthening/weakening model influences the dendrite shape, complexity and distribution of synapses. The authors focus on a model for stellate cells, which have multiple dendrites emerging from a soma. The activity-independent component is provided by a random pool of presynaptic sites that represent potential synapses and that release a diffusible signal that promotes dendritic growth. Then a spontaneous activity pattern with some correlation structure is imposed at those presynaptic sites. The strength of these synapses follows a learning rule previously proposed by the lab: synapses strengthen when there is correlated firing across multiple sites, and synapses weaken if there is uncorrelated firing, with the relative strength of these processes controlled by available levels of BDNF/proBDNF. Once a synapse is weakened below a threshold, the dendrite branch at that site retracts and loses its sensitivity to the growth signal.

      The authors run the simulation and map out how dendrites and synapses evolve and stabilize. They show that dendritic trees grow rapidly and then stabilize by balancing growth and retraction (Figure 2). They also show that there is an initial bout of synaptogenesis followed by loss of synapses, reflecting the longer amount of time it takes to weaken a synapse (Figure 3). They analyze how this evolution of dendrites and synapses depends on the correlated firing of synapses (i.e. defined as being in the same "activity group"). They show that in the stabilized phase, synapses that remain connected to a given dendritic branch are likely to be from the same activity group (Figure 4). The authors systematically alter the learning rule by changing the available concentration of BDNF, which alters the relative amount of synaptic strengthening, which in turn affects stabilization, density of synapses and, interestingly, how selective for an activity group one dendrite is (Figure 5). In addition, the authors look at how altering the activity-independent factors influences outgrowth (Figure 6). Finally, one of the interesting outcomes is that the resulting dendritic trees represent "optimal wiring" solutions in the sense that dendrites use the shortest distance given the distribution of synapses. They compare this distribution to published data to see how the model compares to what has been observed experimentally.

      There are many strengths to this study. The consequence of adding the activity-dependent contribution to models of synapto- and dendritogenesis is novel. There is some exploration of parameters space with the motivation of keeping the parameters as well as the generated outcomes close to anatomical data of real dendrites. The paper is also scholarly in its comparison of this approach to previous generative models. This work represented an important advance to our understanding of how learning rules can contribute to dendrite morphogenesis.

      We thank the reviewer for the positive evaluation of the work and the suggestions below.

      To improve the clarity of the manuscript, we adjusted and fixed some figures and corresponding paragraphs as follows:

      (1) We increased the number of ticks and their corresponding numbers in all the figures to make them easier to read and interpret.

      (2) In Figure 3 panel d, showing the evolution of synaptic weight, we corrected the upper limit of the y-axis to 1 (previously 2).

      (3) Due to a typo in the implementation of the BDNF concentration, we had to correct the used BDNF concentrations from 49%, 45% and 40%, to 49%, 46.5% and 43% respectively.

      (4) The y-axis labels of Figure 6 (old Figure 5) panel e and f were changed to make the plots clearer (e: “morphology change explained (%)” to "effect on morphology (%)", and f: “synapse connection explained (%)” to "effect on connected synapses (%)").

      (5) The values for eta and tau-w in the supplementary Table were corrected. Previously, tau-w was incorrectly listed as 6000 time steps and has been corrected to 3000 time steps; eta was 45% and is now 46.5%.

      We believe that all the changes to the manuscript will address the reviewer’s concerns and enhance the clarity and accuracy of the findings described in the manuscript.

    1. Author response:

      We thank the reviewers for their thoughtful comments. We are working to revise our manuscript and address each of the reviewers’ comments. A summary of our planned revisions and responses to some of the reviewers’ major concerns is included below.

      Cultivation Density: Reviewers #1 and #2 suggested that additional studies testing the effects of varying bacterial density during animal development (cultivation) would strengthen our findings. While we agree with the reviewers that this is a very interesting experiment, it is not feasible. Indeed, we attempted this experiment but found it nontrivial to maintain stable bacterial density conditions over long timescales as this requires matching the rate of bacterial growth with the rate of bacterial consumption. Despite our best efforts, we have not been able to identify conditions that satisfy these requirements. We will focus our revised manuscript to include only assertions about the effects of recent experiences.

      Transfer Method: Reviewers #1 and #2 expressed concern that the stress of transferring animals to a new plate may have resulted in an increased arousal state and thus a greater probability of rejecting patches. We thank the reviewers for this thoughtful remark and plan to conduct additional analyses to address this hypothesis. We did, however, anticipate this possibility and, to mitigate the stress of moving, we used an agar plug method where animals were transferred using the flat surface of small cylinders of agar. Importantly, the use of agar as a medium to transfer animals provides minimal disruption to their environment as all physical properties (e.g. temperature, humidity, surface tension) are maintained. Qualitatively, we observe no marked change in behavior from before to after transfer with the agar plug method, especially as compared to the often drastic changes observed when using a metal or eyelash pick.

      Time Parameter: Related to the transfer method, Reviewer #1 expressed concern that the simplest time parameter (time since start of the assay) might better predict animal behavior. We thank the reviewer for pointing out the need to specifically test whether the time-dependent change in explore-exploit decision-making corresponds better with satiety (time off patch) or arousal (time since transfer/start of assay) state. We will conduct additional analyses to address these alternative hypotheses.

      Parameter Initialization: Reviewer #1 pointed out an oversight in our methods section regarding the model parameter values used for the first encounter. We plan to clarify the initialization of parameters in the manuscript. In short, for the first patch encounter where k = 1:

      ρk is the relative density of the first patch.

      τs is the duration of time spent off food since the beginning of the recorded experiment. For the first patch, this is equivalent to the total time elapsed.

      ρh is the approximated relative density of the bacterial patch on the acclimation plates (see Assay preparation and recording in Methods). Acclimation plates contained one large 200 µL patch seeded with OD600 = 1 and grown for a total of ~48 hours. As with all patches, the relative density was estimated from experiments using fluorescent bacteria OP50-GFP as described in Bacterial patch density estimation in Methods.

      ρe is equivalent to ρh.

      Sensing vs. non-sensing: Reviewer #3 suggested that the term “non-sensing” may not be ethologically accurate. We thank the reviewer for their comment and agree that we do not know for certain whether the animals sensed these patches or were merely non-responsive to them. We are, however, confident that these encounters lack evidence of sensing. Specifically, we note that our analyses used to classify events as sensing or non-sensing examined whether an animal’s slow-down upon patch entry could be distinguished from either that of events where animals exploited or that of encounters with patches lacking bacteria. We found that “non-sensing” encounters are indeed indistinguishable from encounters with bacteria-free patches where there are no bacteria to be sensed (see Figure 2 - Supplement 7C-D and Patch encounter classification as sensing or non-sensing in Methods). Regardless, we agree with the reviewer that all that can be asserted for certain about these events is that animals do not respond to the bacterial patch in any way that we measured. Therefore, we will replace the term “non-sensing” with “non-responding” to better indicate the ethological interpretation of these events.

      Time-dependent changes in sensing vs. non-sensing: Reviewer #1 remarked that the sensation of dilute patches increases with time. We agree with the reviewer that we observe increased responsiveness to dilute patches with time. Although this is interesting, our primary focus was on what decision an animal made given that they clearly sensed the presence of the bacterial patch. Nonetheless, we will add this observation to the discussion as an area of future work to investigate the sensory mechanisms behind this effect.

      Classification of sensing vs. non-sensing: Reviewers #2 and #3 expressed concerns about the validity of the two clusters identified using the semi-supervised QDA approach described. We are grateful to the reviewers for pointing out the difficulty in visualizing the clusters and the need for additional clarity in explaining the supervised labeling. We will use additional visualizations and methods to validate the clusters we have discovered. Specifically, we aim to provide additional evidence that the sensing vs. non-sensing data is bimodal (i.e. a two-cluster classification method fits best). Further, it seems that there may be some confusion as to how we arrived at 3 encounter types (i.e. search, sample, exploit), which we plan to clarify in the manuscript. Specifically, it is important to note that two methods were used on two different (albeit related) sets of parameters. We first used a two-cluster GMM to classify encounters as explore or exploit. We then used a two-cluster semi-supervised QDA to classify encounters as sensing or non-sensing (to be changed to “non-responding”, see above response) using a different set of parameters. We thus separated the explore cluster into two (sensing and non-sensing exploratory events), resulting in three total encounter types: exploit, sample (explore/sensing), and search (explore/non-sensing). We will clarify this in the text. Additionally, we will clarify the labelling used for “supervising” QDA. Specifically, we made two simple assumptions: 1) animals must have sensed the patch if they exploited it and 2) animals must not have sensed the patch if there were no bacteria to sense. Thus, we labeled encounters as sensing if they were found to be exploitatory, as we assume that sensation is prerequisite to exploitation, and we labeled encounters as non-sensing for events where animals encountered patches lacking bacteria (OD600 = 0). All other points were non-labeled prior to learning the model. In this way, our labels were based on the experimental design and results of the GMM, an unsupervised method, rather than any expectations we had about what sensing should look like. The semi-supervised QDA method then used these initial labels to iteratively fit a paraboloid that best separated these clusters, by minimizing the posterior variance of classification.
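
      The seed-labeling rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function name, the label constants, and the choice to let the bacteria-free rule take precedence if both conditions were ever met are hypothetical.

```python
from typing import Optional

# Hypothetical label constants for the two classes.
SENSING = 1
NON_SENSING = 0


def initial_label(is_exploit: bool, od600: float) -> Optional[int]:
    """Return the seed label for one patch encounter, or None if unlabeled.

    is_exploit: True if the (unsupervised) GMM step classified the
                encounter as exploitation.
    od600:      optical density used to seed the patch (0 means no bacteria).
    """
    if od600 == 0:
        # Assumption 2: with no bacteria present there was nothing to sense.
        # Checking this first is an illustrative tie-breaking choice.
        return NON_SENSING
    if is_exploit:
        # Assumption 1: exploitation requires that the patch was sensed.
        return SENSING
    # Every other encounter stays unlabeled before the model is fitted.
    return None
```

      Encounters for which the function returns None correspond to the unlabeled points that the semi-supervised classifier would subsequently assign to one of the two classes.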

      Accept-reject vs. stay-switch: Reviewers #1 and #2 ask for additional discussion on how the accept-reject decision-making framework differs from the stay-switch framework. We thank the reviewers for alerting us to this gap in our discussion. We intend to clarify that these frameworks ask two different types of questions (i.e. “Do you want to eat it?” versus “If so, how long do you want to eat it for?”). These concepts are well described in canonical foraging theory literature (see Pyke, Pulliam & Charnov 1977 for a review on the subject) and are easily distinguishable for animals that forage using the following framework: 1) search for prey, 2) encounter prey from a distance, 3) identify prey type, 4) decide to pursue (accept-reject decision), 5) pursue and capture the prey, 6) exploit prey, and 7) decide to stop exploiting and start searching again (stay-switch decision). In this case, it is easy to see the distinction between accept-reject and stay-switch decisions. However, in some scenarios, animals must physically encounter prey prior to identification and then must make an accept-reject decision. In these cases where pursuit and capture are not visualized, it is harder to distinguish between accept-reject and stay-switch decisions. In our experiments, we find significant bimodality in encounter duration (see Figure 2H) where short duration (exploratory) encounters appear to represent a lower bound where animals spend the minimum amount of time possible on a patch (less than 2 minutes), which we interpret as a rejection of the patch. On the other hand, exploitatory encounters span a large range of durations from 2 to 60+ minutes, which we interpret as an initial acceptance of the patch followed by a series of stay-switch decisions which determine the overall duration of the encounter. While one could certainly model our data using only stay-switch decision-making, we maintain that an encounter of minimal duration is better interpreted ethologically as a rejection than as an immediate switch decision. We will revise the text to elaborate on our point of view on this somewhat philosophical distinction and what it predicts about C. elegans behavior.

      Sensory mutant behavior: Reviewers #1 and #3 ask for further speculation on the observed behavior of osm-6 and mec-4 animals. We will further elaborate on our findings, how they relate to previous studies, and what they suggest about the mechanisms behind these foraging decisions.

      Model design: Reviewer #3 suggested several alterations to the behavioral model. While the proposed model seems entirely reasonable and could aid in elucidating the time component of how prior experience affects decision-making, we chose the present model based on our experience with model selection using these data. Indeed, as the reviewer suggested, we did a great number of analyses involving model selection including model selection criteria (AIC, BIC) and optimization with regularization techniques (LASSO and elastic nets). We found that the problem of model selection was compounded by the enormous array of highly correlated variables we had to choose from. Additionally, we found that both interaction terms and non-linear terms of our task variables could be predictive of accept-reject decisions but that the precise set of terms selected depended sensitively on which model selection technique was used and generally made rather small contributions to prediction. The diverse array of results and combinatorial number of predictors to possibly include failed to add anything of interpretable value. We therefore chose to take a different approach to this problem. Rather than trying to determine what the “best” model was, we instead asked whether a minimal model could be used to answer a set of core questions. Indeed, our goal was not maximal predictive performance but rather to distinguish between the effects of different influences enough to determine if encounter history had a significant, independent effect on decision making. We thus chose to only include task variables that spanned the most basic components of behavioral mechanisms to ask very specific questions. For example, we selected a time variable that we thought best encapsulated satiety. While we could have included many additional terms, or made different choices about which terms to include, based on our analyses these choices would not have qualitatively changed our results. Further, we sought to validate the parameters we chose with additional studies (i.e. food-deprived and sensory mutant animals). We regard our study as an initial foray into demonstrating accept-reject decision-making in nematodes. The exact mechanisms and, consequently, the best model design are therefore beyond the scope of this study. Lastly, Reviewer #3 criticized the use of only sensed patches in the model. While we acknowledge that we are not certain as to whether the “non-sensing” encounters are truly not sensed, we find qualitatively similar results when including all exploratory patches in our analyses. In fact, when all encounters are used, we find stronger correlations between our task variables and the accept-reject decision. However, we take the position that sensation is necessary for decision-making and thus believe that while our model’s predictive performance may be better using all encounters, the interpretation of our findings is stronger when we only include sensing events.
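
      For reference, the two selection criteria named above (AIC and BIC) have simple closed forms. The sketch below uses their standard definitions; the function names are illustrative, and any numbers passed to them would be placeholders rather than values from the analyses described here.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood
```

      Lower values indicate a preferred model under either criterion. BIC penalizes each added parameter more heavily than AIC once the number of observations exceeds about 7 (when ln(n) > 2), which is one reason the two criteria can select different sets of terms.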

    1. Author response:

      First of all, I'd like to express my heartfelt thanks to you for your meticulous and professional review comments. Your feedback is very important to our work. It not only helps us identify the shortcomings in the paper, but also provides valuable guidance for improving the quality of the paper.

      We carefully read every suggestion you made and were deeply inspired. Please rest assured that we will carefully consider and revise each opinion to ensure that our research work is more rigorous and clear. We promise to revise the manuscript accordingly to meet the standards of the journal and enhance the credibility and influence of the research.

      The main modifications include: a Mid1 supplementation experiment in Mid1 knockout mice; detection of kinases such as CaMKII, PKA and ERK1/2; additional references; a novel object recognition behavioral experiment; electrophysiological measurement of LTP; neuron-specific immunohistochemical staining; additional information on the knockout mice used in the study; and revisions to the language of the article and to the small number of figures.

      Thank you again for your valuable time and professional advice. We look forward to submitting the revised manuscript to you for further review.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Cesar, Santos & Cogni use a meta-analysis to report on the direction and magnitude of three fundamental fitness components in defensive symbioses. Specifically, the work focuses on interactions between three arthropod host families (Aphididae, Culicidae, Drosophilidae, and others) and common bacterial endosymbionts (Wolbachia, Serratia, Hamiltonella, Spiroplasma, Rickettsia, Regiella X-type and Arsenophonus). The results of the overall analysis confirm common assumptions and previous work on such fitness components, showing that defensive symbionts provide strong protection to hosts and cause detectable costs to both hosts and the enemy. The analysis provides insight into the extent of the cost/benefit tradeoff for hosts, reporting that the cost is six times lower than the protective effect. The confirmation that natural enemies attacking hosts infected with symbionts have a reduction in their fitness is also an interesting one, as this shows that the majority of defensive symbionts provide protection by resisting enemy infection, as opposed to tolerating it. This finding has important consequences for evolutionary counter-responses in the enemy species. Of course, this result has less relevance for certain types of enemies (such as parasitoids) where successful infection is dependent upon host killing.

      Interesting results also emerge from the subgroup analysis. For the full dataset, both natural and introduced symbionts were similarly effective in positively influencing the fitness of hosts. However, in the Wolbachia-specific analysis, the artificially introduced symbionts caused costs to the hosts where the natural strain did not. These findings have potentially important ramifications for schemes that use endosymbionts for biocontrol or vector competence, suggesting that (in some cases) natural strains may be the more stable choice for deploying (as they are associated with lower costs).

      The analysis draws from an impressively large dataset, but the interpretation of the full impact of the results would be helped by greater detail on the species/strain level systems included, the data extraction approach, and inclusion criteria. Accounting for phylogenetic nonindependence and alternative coding of one of the moderator variables could also strengthen the biological relevance of the models. Suggestions and thoughts are outlined below.

      We sincerely thank Reviewer #1 for the time and effort dedicated to reviewing our manuscript. The suggestions provided are highly constructive and will greatly assist us in improving both our analyses and the manuscript overall.

      Strengths & Potential Improvements:

      An impressively large number of effect sizes (3000) from only 226 studies is collected, robustly confirming common assumptions on the magnitude of fundamental fitness components. However the paper would benefit from a clear breakdown in the main text of the specificities of each system included (e.g. a table at the host species/symbiont strain level, where it is possible). Currently, there is not enough detail for those who want a deep dive to understand what data was extracted for the analysis from these 226 studies, or those who want to understand the underlying diversity in the dataset.

      We thank the reviewer for the suggestion, and we will add this information to our revised manuscript.

      Currently, when the 'natural enemy group' is tested as a moderator it is coded broadly by type of organism (e.g. virus, bacterium, fungi, parasitoid). But this doesn't adequately capture the mode of killing/fitness reduction by the enemy, which would be the much more biologically relevant categorisation for your questions. For example, parasitoid infection is dependent upon host death (thus host fecundity is not relevant, because the host either survived or did not). Among bacterial and viral pathogens antagonists there is scope for both fecundity and survival to be affected. This in turn may be a very influential factor for the outcome. You could consider recoding this enemy moderator.

      We agree, and we will implement this in the analysis for our revised manuscript.

      The analysis is restricted to arthropod hosts and defensive symbionts that are also classed as endosymbionts. This focus should be made clear early on in the paper, as there are many systems (that are classed by many as defensive symbioses) that are not part of the analysis.

      We agree, and we will implement this in our revised manuscript.

      There is fairly minimalistic testing of moderators/sub-groups (which probably has its statistical strengths) but perhaps there are also some missed opportunities for testing other ecological contributors to variance, including coinfection (although perhaps limited by power) and other approaches to coding enemy group (as detailed above).

      We agree, and we will implement this in the analysis for our revised manuscript.

      Looking at the overview of systems included, there's likely a high degree of phylogenetic non-independence in the dataset. Where it is possible, using phylogenetically controlled models could strengthen this analysis.

      We thank the reviewer for the suggestion. We will explore the possibility of using phylogenetically controlled models in our analyses, although we recognize the challenges associated with their implementation, particularly in the case of the natural enemies, given the great diversity of distantly related groups included in our study: viruses, bacteria, fungi, protozoans, nematodes and parasitoid wasps.

      Looking at your included systems (Table S5), you might be able to test the effect of coinfection on the 3 variables of interest. For example, it would be particularly important to see if the effects of two symbionts are additive or not.

      We agree, and we will implement this in the analysis for our revised manuscript.

      No code for the analysis is provided for review at this stage and full details of the dataset are also not available. This slightly limits the ability to assess the full scope and robustness of the study. It would be helpful to have an extensive table in the supplementary detailing (minimum) the reference, study, experiment, host species, symbiont strain, and a description of the exact data extraction source (e.g. table/figure/in text), and method of extraction.

      The code for the analysis and the full raw data with the suggested information are available at https://github.com/cassiasqr/MetaSymbiont (The link is available at the end of the manuscript).

      Reviewer #2 (Public review):

      Summary:

      In this exciting study, Cesar and co-authors perform a meta-analysis on the influence of arthropod symbionts on the fitness of their hosts when they are exposed or not to natural enemies. These so-called defensive symbionts are increasingly recognized as key elements in arthropod survival against natural enemies, with effects that ripple through entire terrestrial ecosystems. The topic is timely, the approach is sound, and the manuscript is well-written. I believe this manuscript will attract the attention of entomologists and of microbiologists interested in symbiosis. This study builds on a previous meta-analysis that I was involved in, which was based on phloem-feeding insects. This novel data set is much larger and includes flies (including the model system Drosophila) and mosquitoes (a group of high medical interest). While the previous meta-analysis considered only parasitoids as natural enemies, this study also includes fungi, bacteria, and viruses.

      Strengths:

      The authors compile a very large dataset and provide a broad quantitative overview of the effects of defensive symbionts in insects. By measuring symbiont effects in the presence and absence of natural enemies, the authors are able to infer whether a trade-off between defense and the costs of mutualism in the absence of enemy pressure exists. Defensive symbioses are an important research topic that had its initial "momentum" a decade ago, so the timing for such a systematic review is very appropriate.

      We sincerely thank Reviewer #2 for dedicating their time and effort to reviewing our manuscript. The suggestions are very insightful and will significantly contribute to improving our manuscript.

      Weaknesses:

      I think the manuscript could be improved by clarifying several sections, particularly the introduction and methods. The introduction section is too specific and heavily reliant on particular examples. In my view, the theoretical background of the study could be made clearer, and the knowledge gap identified more explicitly. A focus on how widespread defensive symbioses are, along with a brief, up-to-date review of the groups possessing such symbionts, would help. This lack of focus is also observed in the methods section, where more details are needed in many instances to better understand how data was collected and analyzed. Regarding the analyses, the multi-level analysis contains many moderators, but it's unclear why these moderators were included. While this may seem a minor issue, it highlights a disconnection between the analyses, the conceptual background, and the hypotheses tested. 

      We thank the reviewer for the suggestions, and we will try to make the introduction and the methods section clearer. 

      Another important weakness is that the analyses are too general, and much-hidden information is not immediately apparent. For instance, readers cannot easily identify which species of symbionts are studied (and the effects they have), or which natural enemies are involved. Although this information is found in the supplementary material, including it in the main body would significantly improve the manuscript.

      We agree, and we will implement this in our revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In this revision, the authors significantly improved the manuscript. They now address some of my concerns. Specifically, they show the contribution of end-effects on spreading the inputs between dendrites. This analysis reveals greater applicability of their findings to cortical cells, with long, unbranching dendrites than other neuronal types, such as Purkinje cells in the cerebellum.

      They now explain better the interactions between calcium and voltage signals, which I believe improve the take-away message of their manuscript. They modified and added new figures that helped to provide more information about their simulations.

      However, some of my points remain valid. Figure 6 shows depolarization of ~5mV from -75. This weak depolarization would not effectively recruit nonlinear activation of NMDARs. In their paper, Branco and Hausser (2010) showed depolarizations of ~10-15mV.

      More importantly, the signature of NMDAR activation is the prolonged plateau potential and activation at more depolarized resting membrane potentials (their Figure 4). Thus, despite including NMDARs in the simulation, the authors do not model functional recruitment of these channels. Their simulation is thus equivalent to AMPA only drive, which can indeed summate somewhat nonlinearly.

      In the current study, we used short sequences of 5 inputs, since the convergence of longer sequences is extremely unlikely in the network configurations we have examined. This resulted in smaller EPSP amplitudes of ~5 mV (Figure 6 - Supplement 2A, B). Longer sequences containing 9 inputs resulted in larger somatic depolarizations of ~10 mV (Figure 6 - Supplement 2E, F). Although we had modified the model of Branco, Clark, and Häusser (2010) to remove the jitter in the timing of arrival of inputs and made slight modifications to the location of stimulus delivery on the dendrite, we saw similar amplitudes when we tested a 9-length sequence using the published code of Branco, Clark, and Häusser (2010) (Figure 6 - Supplement 2I, J). In all the cases we tested (5-input sequence, 9-input sequence, and 9-input sequence with the Branco, Clark, and Häusser (2010) code repository), removal of NMDA synapses lowered both the somatic EPSPs (Figure 6 - Supplement 2C, D, G, H, K, L) and the selectivity, measured as the difference between the EPSPs generated for inward and outward stimulus delivery (Figure 6 - Supplement 2M, N, O). Further, monitoring the voltage along the dendrite for a sequence of 5 inputs showed dendritic EPSPs in the range of 20-45 mV (Figure 6 - Supplement 2P, Q), which came down notably (10-25 mV) when NMDA synapses were abolished (Figure 6 - Supplement 2R, S). Thus, even sequences containing as few as 5 inputs were capable of engaging the NMDA-mediated nonlinearity to show sequence selectivity, although the selectivity was not as strong as in the case of 9 inputs.

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      Figure 8, what does the scale in A represent? I assume it is voltage, but there are no units. Figure 8, C, E, G, these are unconventional units for synaptic weights, usually, these are given in nS / per input.

      We have corrected these. The scalebar in 8A represents membrane potential in mV. The units of 8C,E,G are now in nS.

      Reviewer #2 (Public Review):

      Summary:

      If synaptic input is functionally clustered on dendrites, nonlinear integration could increase the computational power of neural networks. But this requires the right synapses to be located in the right places. This paper aims to address the question of whether such synaptic arrangements could arise by chance (i.e. without special rules for axon guidance or structural plasticity), and could therefore be exploited even in randomly connected networks. This is important, particularly for the dendrites and biological computation communities, where there is a pressing need to integrate decades of work at the single-neuron level with contemporary ideas about network function.

      Using an abstract model where ensembles of neurons project randomly to a postsynaptic population, back-of-envelope calculations are presented that predict the probability of finding clustered synapses and spatiotemporal sequences. Using data-constrained parameters, the authors conclude that clustering and sequences are indeed likely to occur by chance (for large enough ensembles), but require strong dendritic nonlinearities and low background noise to be useful.

      Strengths:

      (1) The back-of-envelope reasoning presented can provide fast and valuable intuition. The authors have also made the effort to connect the model parameters with measured values. Even an approximate understanding of cluster probability can direct theory and experiments towards promising directions, or away from lost causes.

      (2) I found the general approach to be refreshingly transparent and objective. Assumptions are stated clearly about the model and statistics of different circuits. Along with some positive results, many of the computed cluster probabilities are vanishingly small, and noise is found to be quite detrimental in several cases. This is important to know, and I was happy to see the authors take a balanced look at conditions that help/hinder clustering, rather than to just focus on a particular regime that works.

      (3) This paper is also a timely reminder that synaptic clusters and sequences can exist on multiple spatial and temporal scales. The authors present results pertaining to the standard `electrical' regime (~50-100 µm, <50 ms), as well as two modes of chemical signaling (~10 µm, 100-1000 ms). The senior author is indeed an authority on the latter, and the simulations in Figure 5, extending those from Bhalla (2017), are unique in this area. In my view, the role of chemical signaling in neural computation is understudied theoretically, but research will be increasingly important as experimental technologies continue to develop.

      Weaknesses:

      (1) The paper is mostly let down by the presentation. In the current form, some patience is needed to grasp the main questions and results, and it is hard to keep track of the many abbreviations and definitions. A paper like this can be impactful, but the writing needs to be crisp, and the logic of the derivation accessible to non-experts. See, for instance, Stepanyants, Hof & Chklovskii (2002) for a relevant example.

      It would be good to see a restructure that communicates the main points clearly and concisely, perhaps leaving other observations to an optional appendix. For the interested but time-pressed reader, I recommend starting with the last paragraph of the introduction, working through the main derivation on page 7, and writing out the full expression with key parameters exposed. Next, look at Table 1 and Figure 2J to see where different circuits and mechanisms fit in this scheme. Beyond this, the sequence derivation on page 15 and biophysical simulations in Figures 5 and 6 are also highlights.

      We appreciate the reviewer's suggestions. We have tightened the flow of the introduction. We understand that the abbreviations and definitions are challenging and have therefore provided intuitions and summaries of the equations discussed in the main text.

      Clusters calculations

      Our approach is to ask how likely it is that a given set of inputs lands on a short segment of dendrite, and then scale it up to all segments on the entire dendritic length of the cell.

      Thus, the probability of occurrence of groups that receive connections from each of the M ensembles (PcFMG) is a function of the connection probability (p) between the two layers, the number of neurons in an ensemble (N), the relative zone-length with respect to the total dendritic arbor (Z/L) and the number of ensembles (M).
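
      The dependence on p, N, Z/L and M can be illustrated with a short numerical sketch. It is an approximation written for this summary, not necessarily the exact expression in the manuscript: it assumes that synapses from each ensemble fall on the dendrite as independent Poisson events and that the L/Z zones can be treated as independent. The function and argument names are hypothetical.

```python
import math


def p_group_in_zone(p: float, n: int, z: float, length: float, m: int) -> float:
    """Probability that one zone of length z receives at least one synapse
    from each of m ensembles (Poisson approximation)."""
    # Expected number of synapses from a single ensemble landing in the zone.
    expected = p * n * z / length
    # Probability that the zone receives at least one such synapse.
    p_at_least_one = 1.0 - math.exp(-expected)
    # All m ensembles must be represented; ensembles are assumed independent.
    return p_at_least_one ** m


def p_group_on_cell(p: float, n: int, z: float, length: float, m: int) -> float:
    """Probability that at least one of the length / z zones on a cell
    receives inputs from all m ensembles (zones assumed independent)."""
    n_zones = length / z
    return 1.0 - (1.0 - p_group_in_zone(p, n, z, length, m)) ** n_zones
```

      Because the per-zone probability is raised to the power M, each additional ensemble required in a group reduces the per-cell probability steeply whenever the expected number of synapses per zone, pNZ/L, is small.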

      Sequence calculations

      Here we estimate the likelihood of the first ensemble input arriving anywhere on the dendrite, and ask how likely it is that succeeding inputs of the sequence would arrive within a set spacing.

      Thus, the probability of occurrence of sequences that receive sequential connections (PcPOSS) from each of the M ensembles is a function of the connection probability (p) between the two layers, the number of neurons in an ensemble (N), the relative window size with respect to the total dendritic arbor (Δ/L) and the number of ensembles (M).
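
      The same reasoning can be sketched numerically for sequences. As before, this is an illustrative approximation under a Poisson assumption rather than the manuscript's exact expression: the first ensemble may land anywhere on the dendrite, and each of the remaining M - 1 ensembles must place at least one synapse within a window Δ of the preceding input. The function and argument names are hypothetical.

```python
import math


def expected_sequences(p: float, n: int, delta: float, length: float, m: int) -> float:
    """Expected number of ordered m-input sequences on one cell."""
    # Expected number of first-ensemble synapses anywhere on the dendrite.
    first_inputs = p * n
    # Probability that the next ensemble has at least one synapse inside
    # the spacing window of size delta that follows the previous input.
    p_next = 1.0 - math.exp(-p * n * delta / length)
    # The m - 1 succeeding ensembles must each satisfy the spacing rule.
    return first_inputs * p_next ** (m - 1)


def p_sequence_on_cell(p: float, n: int, delta: float, length: float, m: int) -> float:
    """Probability that a cell carries at least one such sequence,
    treating the number of sequences as Poisson distributed."""
    return 1.0 - math.exp(-expected_sequences(p, n, delta, length, m))
```

      In this sketch only the M - 1 succeeding inputs are constrained by the window, so the probability scales with (Δ/L) to the power M - 1 when pNΔ/L is small, whereas in the grouped case all M ensembles are constrained to the zone.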

      (2) I wonder if the authors are being overly conservative at times. The result highlighted in the abstract is that 10/100000 postsynaptic neurons are expected to exhibit synaptic clustering. This seems like a very small number, especially if circuits are to rely on such a mechanism. However, this figure assumes the convergence of 3-5 distinct ensembles. Convergence of inputs from just 2 ensembles would be much more prevalent, but still advantageous computationally. There has been excitement in the field about experiments showing the clustering of synapses encoding even a single feature.

      We agree that short clusters of two inputs would be far more likely. We focused our analysis on clusters with three or more ensembles for the following reasons:

      (1) The signal-to-noise ratio in two-ensemble clusters is very poor, as the likelihood of noise clusters is high.

      (2) It is difficult to trigger nonlinearities with very few synaptic inputs.

      (3) At the ensemble sizes we considered (100 for clusters, 1000 for sequences), clusters arising from just two ensembles would occur with high probability on all neurons in a network (~50% in cortex; see PcFMG in the figures below). Such dense neural representations are difficult for downstream networks to decode (Foldiak 2003).

      However, in the presence of ensembles containing fewer neurons or when the connection probability between the layers is low, short clusters can result in sparse representations (Figure 2 - Supplement 2). Arguments 1 and 2 hold for short sequences as well.

      (3) The analysis supporting the claim that strong nonlinearities are needed for cluster/sequence detection is unconvincing. In the analysis, different synapse distributions on a single long dendrite are convolved with a sigmoid function and then the sum is taken to reflect the somatic response. In reality, dendritic nonlinearities influence the soma in a complex and dynamic manner. It may be that the abstract approach the authors use captures some of this, but it needs to be validated with simulations to be trusted (in line with previous work, e.g. Poirazi, Brannon & Mel, (2003)).

      We agree that multiple factors might affect the influence of nonlinearities on the soma. The key goal of our study was to understand the role played by random connectivity in giving rise to clustered computation. Since simulating a wide range of connectivity and activity patterns in a detailed biophysical model was computationally expensive, we analyzed the exemplar detailed models for nonlinearity separately (Figures 5, 6, and new Figure 8), and then used our abstract models as a proxy for understanding population dynamics. A complete analysis of the role played by morphology, channel kinetics and the effect of branching requires an in-depth study of its own, and some of these questions have already been tackled (Poirazi, Brannon, and Mel 2003; Branco, Clark, and Häusser 2010; Bhalla 2017). However, in the revision, we have implemented a single model which incorporates the range of ion-channel, synaptic and biochemical signaling nonlinearities which we discuss in the paper (Figure 8, and Figure 8 Supplement 1, 2, 3). We use this to demonstrate all three forms of sequence and grouped computation we use in the study, where the only difference is in the stimulus pattern and the separation of time-scales inherent in the stimuli.
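
      The kind of abstract model under discussion can be sketched as follows. This is a simplified illustration, not the implementation used in the manuscript: it bins synapses into non-overlapping windows rather than performing a sliding convolution, and the function names, threshold, steepness, and synapse positions are all hypothetical.

```python
import math


def sigmoid(x, threshold, steepness):
    """Saturating nonlinearity applied to the local synaptic drive."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - threshold)))


def somatic_proxy(synapse_positions_um, dendrite_um, window_um, threshold, steepness):
    """Sum of sigmoid-transformed synapse counts over non-overlapping windows."""
    n_windows = int(dendrite_um // window_um)
    counts = [0] * n_windows
    for pos in synapse_positions_um:
        # Clamp to the last window in case a position sits exactly at the end.
        idx = min(int(pos // window_um), n_windows - 1)
        counts[idx] += 1
    return sum(sigmoid(c, threshold, steepness) for c in counts)


# Hypothetical inputs: four synapses packed together versus spread out.
clustered = [500.0, 501.0, 502.0, 503.0]
dispersed = [100.0, 300.0, 600.0, 900.0]

for steepness in (5.0, 0.1):  # strong versus weak nonlinearity
    c = somatic_proxy(clustered, 1000.0, 10.0, threshold=3.0, steepness=steepness)
    d = somatic_proxy(dispersed, 1000.0, 10.0, threshold=3.0, steepness=steepness)
    print(f"steepness={steepness}: clustered/dispersed = {c / d:.4g}")
```

      In this toy setting, a steep sigmoid separates the clustered arrangement from the dispersed one by a large factor, whereas a shallow sigmoid yields nearly identical summed outputs. Whether this carries over to a real dendrite is exactly the question that the detailed simulations are meant to address.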

      (4) It is unclear whether some of the conclusions would hold in the presence of learning. In the signal-to-noise analysis, all synaptic strengths are assumed equal. But if synapses involved in salient clusters or sequences were potentiated, presumably detection would become easier? Similarly, if presynaptic tuning and/or timing were reorganized through learning, the conditions for synaptic arrangements to be useful could be relaxed. Answering these questions is beyond the scope of the study, but there is a caveat there nonetheless.

      We agree with the reviewer. If synapses receiving connectivity from ensembles had stronger weights, this would make detection easier. Dendritic spikes arising from clustered inputs have been implicated in local cooperative plasticity (Golding, Staff, and Spruston 2002; Losonczy, Makara, and Magee 2008). Further, plasticity-related proteins synthesized at a synapse undergoing L-LTP can diffuse to neighboring weakly co-active synapses, and thereby mediate cooperative plasticity (Harvey et al. 2008; Govindarajan, Kelleher, and Tonegawa 2006; Govindarajan et al. 2011). Thus, if clusters of synapses were likely to be co-active, they could further engage these local plasticity mechanisms, which could potentiate them while not potentiating synapses that are activated by background activity. This would depend on the activity correlation between synapses receiving ensemble inputs within a cluster vs those activated by background activity. We have mentioned some of these ideas in a published opinion paper (Pulikkottil, Somashekar, and Bhalla 2021). In the current study, we wanted to understand whether even in the absence of specialized connection rules, interesting computations could still emerge. Thus, we focused on asking whether clustered or sequential convergence could arise even in a purely randomly connected network, with the most basic set of assumptions. We agree that an analysis of how selectivity evolves with learning would be an interesting topic for further work.

      References

      Bhalla, Upinder S. 2017. “Synaptic Input Sequence Discrimination on Behavioral Timescales Mediated by Reaction-Diffusion Chemistry in Dendrites.” Edited by Frances K Skinner. eLife 6 (April):e25827. https://doi.org/10.7554/eLife.25827.

      Branco, Tiago, Beverley A. Clark, and Michael Häusser. 2010. “Dendritic Discrimination of Temporal Input Sequences in Cortical Neurons.” Science (New York, N.Y.) 329 (5999): 1671–75. https://doi.org/10.1126/science.1189664.

      Foldiak, Peter. 2003. “Sparse Coding in the Primate Cortex.” The Handbook of Brain Theory and Neural Networks. https://research-repository.st-andrews.ac.uk/bitstream/handle/10023/2994/FoldiakSparseHBTNN2e02.pdf?sequence=1.

      Golding, Nace L., Nathan P. Staff, and Nelson Spruston. 2002. “Dendritic Spikes as a Mechanism for Cooperative Long-Term Potentiation.” Nature 418 (6895): 326–31. https://doi.org/10.1038/nature00854.

      Govindarajan, Arvind, Inbal Israely, Shu-Ying Huang, and Susumu Tonegawa. 2011. “The Dendritic Branch Is the Preferred Integrative Unit for Protein Synthesis-Dependent LTP.” Neuron 69 (1): 132–46. https://doi.org/10.1016/j.neuron.2010.12.008.

      Govindarajan, Arvind, Raymond J. Kelleher, and Susumu Tonegawa. 2006. “A Clustered Plasticity Model of Long-Term Memory Engrams.” Nature Reviews Neuroscience 7 (7): 575–83. https://doi.org/10.1038/nrn1937.

      Harvey, Christopher D., Ryohei Yasuda, Haining Zhong, and Karel Svoboda. 2008. “The Spread of Ras Activity Triggered by Activation of a Single Dendritic Spine.” Science (New York, N.Y.) 321 (5885): 136–40. https://doi.org/10.1126/science.1159675.

      Losonczy, Attila, Judit K. Makara, and Jeffrey C. Magee. 2008. “Compartmentalized Dendritic Plasticity and Input Feature Storage in Neurons.” Nature 452 (7186): 436–41. https://doi.org/10.1038/nature06725.

      Poirazi, Panayiota, Terrence Brannon, and Bartlett W. Mel. 2003. “Pyramidal Neuron as Two-Layer Neural Network.” Neuron 37 (6): 989–99. https://doi.org/10.1016/S0896-6273(03)00149-1.

      Pulikkottil, Vinu Varghese, Bhanu Priya Somashekar, and Upinder S. Bhalla. 2021. “Computation, Wiring, and Plasticity in Synaptic Clusters.” Current Opinion in Neurobiology, Computational Neuroscience, 70 (October):101–12. https://doi.org/10.1016/j.conb.2021.08.001.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The technology requires a halo-tagged derivative of the active compound, and the linkage position will have a huge impact on the potential "target hits" of the molecules. Given that most active molecules lack structure-activity relationship information, it is very challenging to identify the optimal position for the halo tag linkage.

      We appreciate your insightful comment. While finding the optimal position to attach a chemical linker to a small molecule of interest is indeed a challenging but necessary step, this is a common difficulty across all target-ID methods, except for those that are modification-free, as we described in Discussion. However, modification-free approaches such as DARTS, CETSA, and TPP have their own limitations, such as low sensitivity and a high false-positive rate. Additionally, DARTS and SPROX are limited to use with cell lysates. Please refer to the introduction in our manuscript for more details on these approaches. On the other hand, synthesizing HTL derivatives is relatively straightforward compared to other modifications, and we provide helpful guidelines for chemical linker design, provided the optimal chemical moiety has been identified, which is crucial for target identification. We selected dasatinib and HCQ/CQ as model compounds because previous studies offered insights into their derivative synthesis. Our data also show that DH5 retains strong kinase inhibitory activity (Figure 4—figure supplement 2), and DC661-H1 demonstrates potent inhibition of autophagy (Figure 6—figure supplement 1). For novel compounds, conducting a thorough structure-activity relationship (SAR) study is essential to determine the optimal position for HTL derivative synthesis.

      (2) Although POST-IT works in zebrafish embryos, there is still a long way to go for the broad application of the technology in other animal models.

      Thank you for your constructive comment. Yes, there is still a long way to go in developing the POST-IT system for broader applications in other animal models, especially in mice. However, we hope that our study provides valuable insights and inspiration to scientists and experts for applying the POST-IT system in various models. We are also committed to further improving its applicability.

      (3) The authors identified SEPHS2 as a new potential target of dasatinib and further validated the direct binding of dasatinib with this protein. However, considering the super strong activity of dasatinib against c-Src (sub nanomolar IC50 value), it is hard to conclude the contribution of SEPHS2 binding (micromolar potency) to its antitumor activity.

      Thank you for your insightful comment. We agree that the anticancer activity of dasatinib primarily results from inhibiting tyrosine kinases such as SRC and ABL. However, SEPHS2 contains an “opal” termination codon, UGA, at the 60th amino acid residue, which codes for selenocysteine. Due to the technical challenge of expressing selenoproteins in E. coli, we mutated it to cysteine for expression in E. coli to avoid premature translation termination, as described in the Materials and Methods section. Although the purified recombinant SEPHS2 shows a Kd of about 10 µM for dasatinib, the binding affinity to endogenous SEPHS2 may be higher since selenocysteine is larger and more electronegative than cysteine. This presents an interesting area for future investigation. Furthermore, our study of dasatinib’s binding to SEPHS2 could help facilitate the development of new SEPHS2 inhibitors, potentially targeting the active site of SEPHS2.

      Reviewer #3 (Public review):

      (1) Target Specificity: It is crucial for the authors to differentiate between the primary targets of the POST-IT system and those identified as side effects. This distinction is essential for assessing the specificity and utility of the technology.

      Thank you for your insightful comment. Drugs inevitably bind to various proteins with differing affinities, which can contribute to both side effects and beneficial outcomes. Typically, the primary targets exhibit high affinities. In this manuscript, we ranked the identified protein targets of DH5 based on affinity from mass spectrometry and p-values (Fig. 5A), and for DC661-H1, we used the SILAC ratio (Fig. 6A). We also individually assessed many drug-protein binding affinities using the MST assay, as well as in vitro and in cellulo assays, demonstrating their specificity. Moreover, we believe it is essential to identify as many protein targets as possible at physiological drug concentrations to better understand the drug’s side effects. Of course, further investigation is required to assess the roles and effects of these target proteins.

      (2) In Vivo Target Identification: The manuscript lacks detailed clarity on which specific targets were successfully identified in the in vivo experiments. Expanding on this information would provide a clearer view of the system's effectiveness and scope in complex biological settings.

      Thank you for your insightful comment regarding in vivo target identification. In this manuscript, we utilized a cell line as the primary method for in vivo target identification and validation after optimizing our system in test tubes. We successfully validated many of the targets identified using our POST-IT system (Figure 6—figure supplement 3). To demonstrate the proof of principle for in vivo application, we employed zebrafish embryos as an in vivo model, showing that endogenous SRC can be effectively pulled down by DH5 treatment (Fig. 7). While we could have explored the entire proteome to identify endogenous target proteins in zebrafish that bind to DH5 or dasatinib, we felt this would extend beyond our original scope, given that we have already demonstrated POST-IT’s ability to identify target proteins for dasatinib. Specific target identification and validation are crucial when using zebrafish for drug discovery. Additionally, we acknowledge that drugs likely interact with a range of protein targets in living organisms and may undergo metabolism and interactions within the circulatory system, which we address in our discussion.

      (3) Reproducibility and Scalability: Discussion on the reproducibility of the POST-IT system across various experimental setups and biological models, as well as its scalability for larger-scale drug discovery programs, would be beneficial.

      Thank you for the suggestion. While our system has shown high reproducibility in our experiments, further improving both reproducibility and scalability would be advantageous. One potential approach to address this is through the generation of stable-expressing cell lines and transgenic zebrafish lines, which we have discussed in the revised manuscript. Establishing stable cell lines with robust POST-IT expression could enhance scalability for drug discovery applications.

      (4) Quantitative Analysis: A more detailed quantitative analysis of the protein interactions identified by POST-IT, including statistical significance and comparative data against other technologies, would enhance the manuscript.

      Thank you for your suggestion. In our assessment of drug-protein affinity, we included Kd values as quantitative measures using MST assays. The protein targets of dasatinib identified through mass spectrometry are also accompanied by p-values for quantitative analysis (Fig. 5A), and the detailed procedures are described in the Materials and Methods section. While it is challenging to provide direct comparative data against other technologies, our system successfully identified many known target proteins for dasatinib, as well as SEPHS2 and VPS37C as new targets for dasatinib and for HCQ/CQ, respectively, which were not detected by other methods.

      (5) Technological Limitations: The authors should discuss any limitations or potential pitfalls of the POST-IT system, which would be crucial for future users and for guiding subsequent improvements.

      Thank you for your insightful suggestion. We agree that clearly defining the technological limitations is important. Therefore, we have expanded our original discussion on the limitations of our POST-IT system (Discussion section, paragraph 6).

      (6) Long-Term Stability and Activity: Information on the long-term stability and activity of the POST-IT components in different biological environments would ensure the reliability of the system in prolonged experiments.

      Yes, this is an important question. We did not notice any stability or toxicity issues with Halo-PafA and Pup substrates in HEK293T cells or zebrafish, which is an important factor for stable cell lines and transgenic zebrafish lines. However, HTL derivatives of the drug could be toxic or unstable due to the nature of the drug or its metabolism, which needs to be taken into account when designing experiments, and we have included this in the Discussion.

      (7) Comparison with Existing Technologies: A detailed comparison with existing proximity tagging and target identification technologies would help position POST-IT within the current landscape, highlighting its unique advantages and potential drawbacks.

      We appreciate your valuable feedback and agree that such comparisons are crucial. We have included a detailed overview and comparison of existing proximity-tagging systems and their related target identification technologies in the Introduction (lines 78-100) and Discussion (lines 391-412), highlighting their respective pros and cons. Additionally, we have expanded the discussion to further compare these technologies with our POST-IT system, addressing its advantages and limitations (lines 378-390, lines 448-467). We hope this provides sufficient context and information to effectively position POST-IT among the landscape of proximity-tagging target identification technologies.

      (8) Concerns Regarding Overexposed Bands: Several figures in the manuscript, specifically Figure 3A, 3B, 3C, 3F, 3G, Figure 4D, and the second panels in Figure 7C as well as some figures in the supplementary file, exhibit overexposed bands.

      We appreciate your astute observation regarding the overexposed bands and apologize for any confusion. The “overexposed” bands represent the unpupylated proteins, while the bands above them correspond to the pupylated proteins. We intended to clearly show both pupylated and unpupylated bands, although the latter are generally much weaker. We are currently working on further improving our POST-IT system to enhance pupylation efficiency.

      (9) Innovation Concern: There is a previous paper describing a similar approach: Liu Q, Zheng J, Sun W, Huo Y, Zhang L, Hao P, Wang H, Zhuang M. A proximity-tagging system to identify membrane protein-protein interactions. Nat Methods. 2018 Sep;15(9):715-722. doi: 10.1038/s41592-018-0100-5. Epub 2018 Aug 13. PMID: 30104635. It is crucial to explicitly address the novel aspects of POST-IT in contrast to this earlier work.

      Thank you for bringing this to our attention. Proximity-tagging systems like BioID, TurboID, NEDDylator, and PafA (Liu Q et al., Nat Methods 2018) were initially developed to study protein-protein interactions or identify protein interactomes, as these applications are of broader interest and generally easier to implement. However, applying proximity-tagging systems for small molecule target identification requires significant optimization. As described in the introduction (lines 78-100), target protein identification systems have since been developed using TurboID and NEDDylator (Tao AJ et al., Nat Commun 2023; Hill ZB et al., J Am Chem Soc 2016). It is conceivable that a PafA-based proximity-tagging system could also be adapted for target-ID, and other groups may pursue this approach in the future. Although the PafA-Pup system shows great promise for target-ID applications, extensive optimization was needed to enable its use for this purpose. Finally, we demonstrate that POST-IT offers distinct advantages over other proximity-tagging-based target-ID systems. For more details, please refer to the introduction and discussion sections.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1- Figure Supplement 1A: The Pup substrate "HB-Pup" is mentioned, but the main text or figure legend provides no introduction or description.

      We appreciate your astute observation. We have added a description in the main text and figure legend as follows: “…and used HB-Pup as a control, which contains 6×His and BCCP at the N terminus of Pup” in the main text (line 142) and “HB, TS, and SBP refer to 6×His and BCCP, twin-STII (Strep-tag II), and streptavidin binding peptide, respectively.” in the legend of Figure 1-figure supplement 1A.

      (2) Figure 1 - Figure Supplement 3B: The authors used TS-sPupK61R as a substrate but did not explain why. The main text mentions that mutating sPup alone did not affect polypupylation, raising the question of why TS-sPupK61R was used in this figure. Furthermore, while the authors state that polypupylation becomes evident after 1 hour of incubation (more pronounced after 2 or 3 hours), the reactions here were conducted for only 30 minutes.

      Thank you for your question. The experiment in Figure 1 - Figure Supplement 3B was conducted to test self-pupylation levels in the different Halo-PafA derivatives. For this purpose, we could have used any Pup substrate, such as SBP-sPup or SBPK4R-sPupK61R, instead of TS-sPup and TS-sPupK61R, as they do not show any differences in pupylation activity. Similarly, we did not need to incubate the reaction for a longer time to detect polypupylation, as our intention was to test “self-pupylation”. We demonstrated in Figure 1 – figure supplement 2 that polypupylation is dependent on the number or position of lysine residues in the Pup substrate or tags. The results clearly showed that self-pupylation was almost completely abolished by the Halo8KR mutation. To clarify this, we added the following description in lines 168-169: “TS-sPup and TS-sPupK61R were chosen as sPup substrates for this experiment, although any Pup substrates could have been used. The levels of self-pupylation were assessed.”

      (3) Line 156: The statement that "the TS-tag completely abolished polypupylation in TS-sPup" is inaccurate. Using TSK8R-sPupK61R as the substrate, several bands appear, which likely represent Halo-PafA with varying degrees of polypupylation. Some bands also appear to correspond to those seen when using TS-sPup as a substrate. The authors should clarify how they distinguish between multipupylation and polypupylation in this case.

      We sincerely appreciate your insight into clarifying the distinction between multipupylation and polypupylation. Polypupylation refers to the addition of a new Pup onto a previously linked Pup on the target protein, akin to polyubiquitination. In contrast, multipupylation involves multiple single pupylations at different positions on the target proteins. Since pupylation occurs exclusively at lysine residues in tag-Pup substrates, mutating all lysine residues to arginine, as in TSK48R-sPupK61R, prevents the mutant tag-Pup from linking to another Pup. This means that only single pupylation can proceed with this type of mutant Pup substrate. If multiple pupylated bands are observed with this mutant substrate, it indicates “multipupylation” rather than “polypupylation”, as shown in Figure 1-figure supplement 2D. The same applies to the pupylation bands in Figure 1-figure supplement 2E and F, as sSBP-sPupK61R and SBPK4R-sPupK61R lack lysine residues. By comparing these multipupylation bands, it is also possible to distinguish them from polypupylation bands, which are marked by yellow arrows. However, after 2-3 pupylation bands, higher-order bands become increasingly difficult to distinguish.

      To clarify the mutation in the TS-tag, we revised the sentence in line 156 from “However, further mutations within the TS-tag completely abolished polypupylation in TS-sPup” to “However, further mutations of two lysine residues within the TS-tag, creating TSK8R-sPupK61R, completely abolished polypupylation in TS-sPup”. Additionally, we have inserted sentences in line 152 to define polypupylation and multipupylation, as described here.

      (4) Line 160: Similar to the above concern about line 156, the claim that SBPK4R and sSBP completely prevented polypupylation is unconvincing and requires more supporting evidence.

      Thank you for raising this concern. As mentioned above, both SBPK4R and sSBP lack lysine residues required for pupylation. As a result, these mutants can only undergo multiple single pupylations on the lysine residues of the target protein, which leads to “multipupylation”. In Figure 1-figure supplement 2E and F, pupylation bands by sSBP-sPupK61R or SBPK4R-sPupK61R do not display doublet bands (one from multipupylation and the other from polypupylation), as seen with SBP-sPup, marked by yellow arrows. Notably, Halo-PafA containing polypupylated branches migrates more slowly than one with an equal number of multipupylation events. To clarify this point, we have added the phrase “as shown in sSBP-sPupK61R and SBPK4R-sPupK61R” at the end of the sentence in line 160.

      (5) Lines 176-177: The authors claim that PafAS126A exhibited reduced polypupylation compared to PafA, but given that PafAS126A may reduce depupylase activity, how could it reduce polypupylation levels? Moreover, it is hard to find any data supporting this conclusion in Figure 1 - Figure Supplement 3B.

      We appreciate your insightful comment. At this point, we do not fully understand how the mutation that reduces depupylase activity also decreases polypupylation. It is possible that PafAS126A has a lower preference for pupylated Pup as a prey, which is required for polypupylation, since depupylase activity depends on recognizing pupylated Pup as a prey to remove it. Nonetheless, Halo-PafAS126A shows reduced levels of higher molecular weight bands compared to Halo-PafA, as shown in Figure 1-figure supplement 3B, while exhibiting increased pupylation in lower molecular weight bands, which represent either multipupylation or low-degree polypupylation. Since higher molecular weight bands (> 150 kD) are likely due to polypupylation, this result suggests reduced polypupylation and increased multipupylation in Halo-PafAS126A. To clarify this in the main text, we have added the following description in line 177: “as evidenced by the decreased levels of high molecular weight bands and an increase in low molecular weight bands”

      (6) POST-IT system in cellulo validation: The system was developed using the Halo-tag, yet the in-cell validation uses FRB and FKBP instead, without explaining this switch. This inconsistency makes the logic of the experiment unclear.

      We appreciate your insightful comment. The interaction between rapamycin and FRB or FKBP is known to be highly specific and robust, making this system useful in various biological contexts. Due to this property, rapamycin can induce interaction between two proteins when one is fused with FRB and the other with FKBP. Before testing or optimizing the POST-IT system in cells, we hypothesized that using the rapamycin-induced interaction between FRB and FKBP could introduce pupylation of the target protein, provided that PafA is fused with FRB or FKBP and the target protein is fused with the other. The results demonstrate that PafA can introduce pupylation of the target protein in a proximity-dependent manner via this chemically induced interaction. To further clarify this in the main text, we modified the original sentence in lines 214-216 as follows: “To mimic drug-target interaction-induced pupylation in live cells and assess the potential of PafA as a proximity-tagging system for target-ID, we incorporated the rapamycin-induced interaction between FRB and FKBP into our PL system, as this interaction between a small molecule and a protein is known to be highly specific and robust (Figure 3—figure supplement 1A).”

      (7) Line 209: The authors decided to use the SBP-tag for further studies due to better performance, but in Figure 3 - Figure supplement 1, they still used the unintroduced HB-Pup as the substrate, which is confusing and lacks explanation.

      Thank you for raising your question. The SBP-tag is not superior to the TS-tag in terms of pupylation activity. However, the TSK8R mutant cannot bind to Strep-Tactin beads, while the SBP mutants, SBPK4R and sSBP, can bind to streptavidin. Therefore, we chose the SBP-tag instead of the TS-tag for further studies as a Pup substrate in the POST-IT system, as we needed to pull down the target proteins. HB-Pup is consistently used as a control throughout various experiments, as it is the original Pup substrate. In Figure 3-figure supplement 1B and C, HB-Pup was used to test chemically induced pupylation by PafA. In these cases, the choice of Pup substrate was not critical. Furthermore, we compared HB-Pup and different SBP-sPup substrates in Figure 3-figure supplement 1D, where HB-Pup was used as a control or for comparison. Although pupylation bands with HB-Pup appear more robust, this substrate contains multiple lysine residues, leading to high levels of polypupylation. To make this clear, we modified the sentence in line 209 to “Therefore, we decided to use the SBP-tag as a Pup substrate in the POST-IT system for further studies.”

      (8) Line 220: Both SBP-sPup and SBPK4R-sPupK61R are described as exhibiting efficient pupylation, but the data show mostly self-pupylation and little to no pupylation of the target protein.

      Thank you for your concern. However, pupylation of the target protein is actually quite substantial, as the intensities of the free form and pupylated proteins are relatively similar, as shown in the upper panel of Figure 3-figure supplement 1D. Self-pupylation is always much higher than target pupylation, because PafA constantly pupylates itself, whereas pupylation of the target protein occurs only through interaction. Furthermore, V5-FRB-mKate2-PafA contains many lysine residues, which increases the levels of self-pupylation.

      (9) Lines 222-224: The authors chose SBPK4R-sPupK61R to avoid polypupylation, although SBP-sPup did not cause detectable polypupylation. Neither substrate caused pupylation of the target protein, so the rationale behind this choice is unclear.

      Thank you for raising your question. Similar to the above comment (#8), please refer to the pupylation bands of the target protein, as shown in the upper panel of Figure 3-figure supplement 1D. The pupylation band of the target protein is quite remarkable, as the intensities of the free form and pupylated proteins are comparable. Additionally, there are no multiple pupylation bands in either case, except for one additional weak multipupylation band, indicating no polypupylation by SBP-sPup, which does not have K-to-R mutations. Of course, SBPK4R-sPupK61R can only undergo single pupylation, as it does not contain lysine residues. Although we did not observe polypupylation by SBP-sPup in this experimental condition, it is possible that SBP-sPup may cause polypupylation under different experimental conditions or with other target proteins. Since SBPK4R-sPupK61R exhibits pupylation of the target protein comparable to that of SBP-sPup, at least in this experimental setting, we selected SBPK4R-sPupK61R as the Pup substrate for the POST-IT system to avoid any potential polypupylation that could be caused by SBP-sPup in other cases. We believe that polypupylation can introduce bias into the analysis and hinder the comprehensive discovery of additional target proteins for small molecules.

      (10) Line 224: The authors conclude that rapamycin greatly reduced self-pupylation, but the supporting data are unclear.

      Thank you for your constructive comments on our manuscript. Please refer to the lower panel of Figure 3-figure supplement 1D. When using either SBPK4R-sPupK61R or SBP-sPup, rapamycin treatment results in reduced levels of self-pupylation compared to the no-treatment control. However, we did not observe this reduction with HB-Pup and do not know the reason. To clarify this in the main text, we added the following description to the end of the sentence: “when using either SBPK4R-sPupK61R or SBP-sPup, as shown in the lower panel of Figure 3—figure supplement 1D”

      (11) Line 234: The authors selected an 18-amino acid linker, but given that linkers longer than 10 amino acids enhance labeling, this choice should be explained.

      Thank you for raising your question. In fact, any linker of 10 amino acids (aa) or longer is likely to behave similarly. We chose an 18 aa linker instead of a 40 aa linker primarily for the convenience of cloning and to reduce the potential for DNA sequence recombination associated with longer repeats. Additionally, a longer, flexible linker may behave like an intrinsically disordered protein (Harmon et al., 2017), which can lead to unwanted protein-protein interactions or phase separation. To elaborate on this, we added the following sentences after the sentence in lines 233-235: “We chose the 18-amino acid linker instead of the 40-amino acid linker for easier cloning and to lower the risk of DNA recombination from longer repeats. Additionally, a longer, flexible linker may behave like an intrinsically disordered protein (Harmon et al., 2017), an unwanted feature for target-ID.”

      (12) S126A and K172R mutations: The authors claim that these mutations additively enhanced pupylation under cellular conditions, but in Figure 3B, the band intensities appear similar for the wild-type and mutant versions.

      Thank you for raising your concern. Although a single pupylation band appears similar among the three different Halo-PafA proteins, multipupylation bands are slightly but noticeably increased by the S126A and K172R mutations compared to Halo8KR-PafA. Since we used SBPK4R-sPupK61R as a Pup substrate, all higher molecular weight bands result from multipupylation rather than polypupylation. This illustrates why it is preferable to use SBPK4R-sPupK61R over SBP-sPup, as the pupylation bands with SBP-sPup are mixtures of poly- and multipupylation, making it difficult to assess levels of target labeling. To clarify this in the main text, we added the following description after the sentence in line 236: “as the higher molecular weight multipupylation bands are slightly but noticeably increased with these mutations compared to Halo8KR-PafA”

      (13) Line 263: The authors selected DH5 for further experiments due to its efficiency, but the data suggest that the performance of DH1 to DH5 is similar.

      We appreciate your question about the different dasatinib HTL derivatives. However, our data clearly show that DH2-5 derivatives bind significantly more effectively to Halo-PafA in vitro and in live cells compared to DH1 (Figure 4A and B). Additionally, the DH2-5 derivatives result in dramatically increased pupylation of the target protein in vitro and noticeable enhancement in live cells (Figure 4C and D). Among DH2 to DH5, there is no obvious difference in binding to Halo-PafA or pupylation of the target protein. Therefore, we chose DH5, as we believe that the longer linker in DH5 may facilitate the binding of a more diverse range of target proteins to dasatinib, enabling the discovery of additional target proteins.

      (14) Line 309: The authors introduce HCQ and CQ as important drugs but then investigate the mechanism using DC661 without introducing or justifying the choice of this compound.

      Thank you for your point. We explained the reason for choosing DC661, a dimer form of CQ, instead of CQ for the synthesis of an HTL derivative in line 310: “assuming that a dimer would enhance binding affinity as previously described.” Dimer forms of drugs or small molecules, such as testosterone dimers, estrogen dimers, and numerous anticancer drug dimers, have often been developed to enhance drug effects (Paquin A et al., Molecules 2021). Similarly, dimer forms of HCQ/CQ have been introduced and shown to be more potent (Hrycyna CA et al., ACS Chem Biol 2014; Rebecca VW et al., Cancer Discovery 2019). We expected that using a dimer form might offer a higher probability of identifying target proteins for HCQ/CQ.

      (15) The authors suggest that multipupylation levels were enhanced but do not explain whether this might benefit the system or introduce other issues. Clarifying this point would provide valuable insight for potential users of this system.

      Thank you for your thoughtful suggestion. Polypupylation likely leads to biased enrichment of a limited set of target proteins, and its levels may not correlate with the binding affinity of target proteins to the small molecule of interest, features that can negatively impact target-ID. In contrast, multipupylation may be correlated with binding affinity or interaction frequency, as we observed increased levels of multipupylation with higher Pup concentrations and longer incubation times. This suggests that target proteins with multiple lysines in proximity to PafA can be sequentially pupylated, starting with the most accessible lysine. However, if a target protein has only one accessible lysine, pupylation will occur only once, regardless of the protein’s affinity to the small molecule. In summary, while polypupylation may be a drawback for target-ID, multipupylation could be useful for both target-ID and understanding binding mode. To elaborate on this, we added the following additional explanation after the sentence in line 152: “, whereas multipupylation is more likely correlated with binding affinity or interaction frequency.”

      (16) The author should address whether the Halotag ligand modification of the drug alters the binding properties between the drug and targets. That may be causing artifact binding of the drug and other proteins.

      Thank you for your insightful comment. Yes, it is true that chemical modifications of the small molecule of interest, such as linker derivatization (e.g., HTL) or photo-affinity labeling, generally lead to reduced activity or affinity compared to the original molecule. Synthesizing a derivative is a common challenge across all target-ID methods, except for modification-free approaches, as we mentioned in the Discussion. However, modification-free methods like DARTS, CETSA, and TPP have their own limitations, including low sensitivity or high false positive rates. Identifying the optimal position for chemical modification on the small molecule of interest is critical. We chose dasatinib and HCQ/CQ as model compounds, because previous studies provided insights into their derivative synthesis. In addition, our data show that DH5 retains robust kinase inhibitory activity (Figure 4-figure supplement 2), and DC661-H1 exhibits potent autophagy inhibition (Figure 6-figure supplement 1). For novel compounds, a thorough structure-activity relationship study is essential to identify the optimal position for HTL derivative synthesis.

      (17) The author stated there is no observable toxicity in zebrafish without providing a detailed analysis or enough data. Further analysis of the expression of Halo-PafA and its substrate sPup influence on toxicity or side effects to the living cells or animals would be needed. It is important for in vivo applications.

      Thank you for your constructive suggestion. We have now included additional experimental data in Figure 7-figure supplement 1, showing no toxicity in zebrafish embryos expressing the POST-IT system. We assessed toxicity in two ways: by injecting the POST-IT DNA plasmid into one-cell-stage embryos for acute expression, and by using embryos from transgenic zebrafish expressing POST-IT under a heat-shock inducible promoter. Neither the injection nor the heat-shock activation of POST-IT expression resulted in any noticeable toxicity.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This important work presents two studies on predictive processes in subjects with and without tinnitus. The evidence supporting the authors' claims is compelling, as their second study serves as an independent replication of the first. Rigorous matching between study groups was performed, especially in the second study, increasing the probability that the identified differences in predictive processing can truly be attributed to the presence of tinnitus. This work will be of interest to researchers, especially neuroscientists, in the tinnitus field.

      We thank the editors at eLife very much for their favorable assessment of our manuscript. Based upon the comments of the reviewer, we aimed to further improve our manuscript to be a valuable addition to the tinnitus research field.

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      This study aimed to test experimentally a theoretical framework that aims to explain the perception of tinnitus, i.e., the perception of a phantom sound in the absence of external stimuli, through differences in auditory predictive coding patterns. To this aim, the researchers compared the neural activity preceding and following the perception of a sound using MEG in two different studies. The sounds could be highly predictable or random, depending on the experimental condition. They revealed that individuals with tinnitus and controls had different anticipatory predictions. This finding is a major step in characterizing the top-down mechanisms underlying sound perception in individuals with tinnitus.

      Strengths:

      This article uses an elegant, well-constructed paradigm to assess the neural dynamics underlying auditory prediction. The findings presented in the first experiment were partially replicated in the second experiment, which included 80 participants. This large number of participants for an MEG study ensures very good statistical power and a strong level of evidence. The authors used advanced analysis techniques - Multivariate Pattern Analysis (MVPA) and classifier weights projection - to determine the neural patterns underlying the anticipation and perception of a sound for individuals with or without tinnitus. The authors evidenced different auditory prediction patterns associated with tinnitus. Overall, the conclusions of this paper are well supported, and the limitations of the study are clearly addressed and discussed.

      Weaknesses:

      Even though the authors took care of matching the participants in age and sex, the control could be more precise. Tinnitus is associated with various comorbidities, such as hearing loss, anxiety, depression, or sleep disorders. The authors assessed individuals' hearing thresholds with a pure tone audiogram, but they did not take into account the high frequencies (6 kHz to 16 kHz) in the patient/control matching. Moreover, other hearing dysfunctions, such as speech-in-noise deficits or hyperacusis, could have been taken into account to reinforce their claim that the observed predictive pattern was not linked to hearing deficits. Mental health and sleep disorders could also have been considered more precisely, as they were accounted for only indirectly with the score of the 10-item mini-TQ questionnaire evaluating tinnitus distress. Lastly, testing the links between the individuals' scores in auditory prediction and tinnitus characteristics, such as pitch, loudness, duration, and occurrence (how often it is perceived during the day), would have been highly informative.

      Thank you very much for your careful evaluation of our manuscript. We agree with you that our study design has some limitations such as the assessment of higher frequencies, comorbidities, and tinnitus characteristics. In our discussion, we aimed to acknowledge these issues for future research to improve this study design and gain more insights into neural tinnitus processes.

      See e.g.:

      Line 946-949:

      “Additionally, we rigorously controlled for hearing loss in Study 2, however, pure-tone audiometric testing was solely performed up to 8kHz and we were therefore not able to draw conclusions regarding hearing impairments in higher frequencies and their influence on the effects.”

      Line 949-954:

      “Moreover, we did not screen our participants for hyperacusis. This hypersensitivity to mild sounds is widely correlated with the sensation of tinnitus and underlying neural mechanisms are potentially intertwined with tinnitus processes (Schilling et al., 2023; Yukhnovich et al., 2023; Zheng, 2020). Screening for hyperacusis in future work can therefore reveal more details on participant characteristics influencing predictive processing.”

      Line 955-958:

      “In both studies, tinnitus distress was not correlated with the reported prediction effects. Nevertheless, tinnitus can also be characterized by other features such as its loudness, pitch or duration which were not included in the experimental assessment.”

      Line 958-963:

      “Additionally, we solely used a short version of the Mini-TQ (Goebel and Hiller, 1992) in Study 2, which did not allow us to relate prediction scores to subscales like sleep disturbances which potentially influence cognitive functioning and thus predictive processing. Next to sleeping disorders and distress, tinnitus is often also accompanied by psychological comorbidities such as depression or anxiety (Langguth, 2011) which are potential confounds of the results.”

      Comments on revisions:

      Thank you for your responses. There are a few remaining points that, if addressed, could further enhance the manuscript:

      - While the manuscript acknowledges the limitation of not matching groups on hearing thresholds in Study 1, a deeper analysis of participants' hearing abilities and their impact on MEG results, similar to that conducted in Study 2, would be valuable. Specifically, including a linear model that considers all frequencies, group membership, and their interactions could highlight differences across groups. Additionally, examining the effect of high-frequency hearing loss on prediction scores, as performed in Study 2, would strengthen the analysis, particularly given the trend noted (line 719). Such an addition could make a significant contribution to the literature by exploring how hearing abilities may influence prediction patterns.

      We appreciate your feedback and agree with you that it is a crucial question how hearing abilities influence prediction patterns in tinnitus. However, as hearing status was not assessed in the control group in study 1, we are unfortunately not able to include linear models to investigate differences across groups in this sample. This led us to the implementation of study 2 with a comprehensive hearing assessment to investigate group differences. We highlighted this issue in our methods section.

      Line 170-172:

      “As pure-tone audiometric testing was not included for the control subjects, group comparisons between hearing thresholds were not feasible.”

      - The connection with the hippocampal regions (line 864) remains somewhat unclear. While the inclusion of the Paquette reference appropriately links temporal region activity with tinnitus, it does not fully support the statement: "An increased focus on hippocampal regions, e.g., in fMRI, patient, or animal studies, could be a worthwhile complement to our MEG work, given the outstanding relevance of medial temporal areas in the formation of associations in statistical learning paradigms"

      Thank you for your constructive input. This section is purely speculative, and we do not aim to provide strong claims or expected results but solely point out potential future research directions.

      - Authors should add a comparison of participants' mini-TQ scores in both studies

      We appreciate your input and added a comparison of mini-TQ scores between samples. For study 1, all subscales were included; however, we computed the comparison solely based on the items of the mini-TQ to increase comparability. The results were not significant, i.e., tinnitus distress values did not differ between studies.

      Line 629-632:

      “We additionally compared tinnitus distress values assessed by the mini-TQ (Goebel and Hiller, 1992) between study 1 and study 2 to detect potential differences between the samples, however, results of the Welch’s t-test were not significant with t(30.7)=1.27, p=.214.”
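
      The non-integer degrees of freedom reported above, t(30.7), are characteristic of Welch's t-test, which does not assume equal variances. The following is a minimal sketch of how the statistic and the Welch-Satterthwaite degrees of freedom are obtained. The two score lists are invented for illustration and are not the study data; the function name `welch_t` is likewise hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    # Squared standard error of each sample mean (sample variance / n)
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation; usually not an integer
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical distress scores for two samples (NOT the study data)
scores_a = [1, 2, 3, 4, 5]
scores_b = [2, 4, 6, 8, 10, 12]
t, df = welch_t(scores_a, scores_b)
print(f"t({df:.1f}) = {t:.2f}")
```

      The p-value would then be read from a t distribution with `df` degrees of freedom, which requires a statistics library and is omitted here.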

      - Authors should add significance levels on Fig 6.B as in Fig 3.C, and an n.s. on Fig 6.D

      Thank you very much for your input, we added significance levels and a n.s. to the Figures 6B and 6D.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary of the work: In this work, Fruchard et al. study the enzyme Tgt and how it modifies guanine in tRNAs to queuosine (Q), essential for Vibrio cholerae's growth under aminoglycoside stress. Q's role in codon decoding efficiency and its proteomic effects during antibiotic exposure is examined, revealing Q modification impacts tyrosine codon decoding and influences RsxA translation, affecting the SoxR oxidative stress response. The research proposes Q modification's regulation under environmental cues reprograms the translation of genes with tyrosine codon bias, including DNA repair factors, crucial for bacterial antibiotic response.

      The experiments are well-designed and conducted and the conclusions, for the most part, are well supported by the data. However, a few clarifications will significantly strengthen the manuscript.

      Thank you.

      Major:

      Figure S4 A-D. These growth curves are important data and should be presented in the main figures. Moreover, given that it is not possible to make a rsxA mutant, I wonder if it would be possible to connect rsx and tgt using the following experiment: expression of tgt results in resistance to TOB (in B), while expression of only rsx lower resistance to TOB (in D). Then simultaneous overexpression of both tgt/rsx in the WT strain should have either no effect on TOB resistance or increased resistance, relative to the WT. Perhaps the authors have done this, and if so, the data should be included as it will significantly strengthen their model.

      We thank the reviewer for this suggestion. We have tried to overexpress both tgt and rsxA simultaneously. However, this appears to be toxic, as cells form small colonies and cannot grow well in liquid. We think that the presence of 2 plasmids and the corresponding selection antibiotics amplifies the toxicity of overexpressing rsxA, and even tgt. In fact, it can be seen that tgt overexpression in WT is already slightly deleterious in the absence of tobramycin (figure 1B).

      Figure S4 - Is there a rationale for why it is possible to make rsx mutants in E. coli, but not in V. cholerae? For example, does E. coli have a second gene/protein that is redundant in function to rsxA, while V. cholerae does not? I think your data hint at this, since in the right panel growth data, your double mutant does not fully rescue back to rsx single mutant levels, suggesting another factor in tgt mutant also acts to lower resistance to TOB. If so, perhaps a line or two in text will be helpful for readers.

      This point raised by the referee is an interesting one that we have also asked ourselves on multiple occasions. In fact, the Rsx operon is linked with oxidative stress and respiration. Vibrio cholerae and E. coli differ in genes involved in these pathways. V. cholerae lacks the cyo/nuo respiratory complex genes and does not encode a Suf operon. Moreover, deletion of the anaerobic respiration Frd pathway leads to a strong decrease of V. cholerae growth even in aerobic conditions (10.1128/spectrum.01730-23). We have also previously seen general differences between the 2 species in their response to stress (10.1128/AAC.01549-10) and in the way they deal with ROS (10.1371/journal.pgen.1003421). Therefore, we think that the fact that rsx is essential in V. cholerae and not in E. coli could be due either to the presence of an additional redundant pathway in E. coli, as suggested by the referee, or to more general differences in respiration and treatment of ROS. We thank the referee for highlighting this and we have now included a comment about this in the manuscript.

      - For growth curves in Figure 2 and relative comparisons like in Figure 5D and Figure S4 (and others in the paper), statistics and error bars, along with replicate information should be provided.

      We had mentioned this in the methods section; we have now also added the specific information to the figure legends.

      - Figure 6A - Is the transcript fold change in linear or log? If linear, then tgt expression should not be classified as being upregulated in TOB. It is barely up by ~2-fold with TOB- 0.6....which is a mild phenotype, at best.

      We think that a 2-fold change of tgt expression can be sufficient to lead to changes in tRNA modification levels. We agree that this is a mild induction; we have thus changed “increase” to “mildly increase” in the results.

      - Line 779- 780: "This indicates that sub-MIC TOB possibly induces tgt expression through the stringent response activation." To me, the data presented in this figure, do not support this statement. The experiment is indirect.

      We agree, we rephrased: “Tobramycin may induce tgt expression through stringent response activation or through an independent pathway.”

      - Figure 3B and D. - These samples only have tobramycin, correct? The legend says both carbenicillin and tobramycin.

      The legend is correct, samples also have carbenicillin because we are testing here the growth with 2 synonymous beta-lactamase genes in presence of beta-lactams.

      - Figure 5. The color schemes in bars do not match up with the color scheme in cartoons below panels B and C. That makes it confusing to read. Please fix.

      Fixed.

      - A lot of abbreviations have been used. This makes reading a bit cumbersome. Ideally, less abbreviations will be used.

      Fixed

      Reviewer #2 (Public Review):

      Fruchard et al. investigate the role of the queuosine (Q) modification of the tRNA (Q-tRNA) in the human pathogen Vibrio cholerae. First, the authors state that the absence of Q-modified tRNAs (tgt mutant) increases the translation of TAT codons and proteins with a high TAT codon bias. Second, the absence of Q increases rsxA translation, because rsxA gene has a high TAT codon bias. Third, increased RsxA in the absence of Q inhibits SoxR response, reducing resistance towards the antibiotic tobramycin (TOB). Authors also predict in silico which genes harbor a higher TAT bias and found that among them are some involved in DNA repair, experimentally observing that a tgt mutant is more resistant to UV than the wt strain. It is worth noting that authors employ a wide variety of techniques, both experimental and bioinformatic. However, some aspects of the work need to be clarified or reevaluated.

      (1) The statement that the absence of Q increases the translation of TAT codons and proteins encoded by TAT-enriched genes presents the following problems that should be addressed:

      (1.1) The increase in TAT codon translation in the absence of Q is not supported by proteomics, since there was no detected statistical difference for TAT codon usage in proteins differentially expressed. Furthermore, there are some problems regarding the statistics of proteomics. Some proteins shown in Table S1 have adjusted p-values higher than their p-values, which makes no sense. Maybe there is a mistake in the adjusted p-value calculation.

      We appreciate the reviewer’s thorough examination of our findings. In our study, we employed an adaptive Benjamini-Hochberg (BH) procedure to control the false discovery rate in our list of selected proteins, as explained in the Data Analysis part of the Proteomics MS and analysis section of our Materials and Methods. The classical BH procedure (10.1111/j.2517-6161.1995.tb02031.x) calculates the adjusted p-value for the i-th ranked p-value as $\min_{j \geq i} \left( m \times p_{(j)} / j \right)$, where $p_{(j)}$ is the j-th ranked p-value and $m$ is the number of tests (e.g., the number of proteins) (see 10.1021/acs.jproteome.7b00170 for details). Since $m/j \geq 1$ and $p_{(j)} \geq p_{(i)}$ for $j \geq i$, it follows that $m \times p_{(j)} / j \geq p_{(i)}$ for $j \geq i$, resulting in adjusted p-values being higher than or equal to the original p-values. Therefore, contrary to the reviewer's comment, it is a mathematical property that the adjusted p-value is greater than or equal to the original p-value when using the classical Benjamini-Hochberg procedure.

      However, we want to underline that we used an “adaptive” BH procedure, which calculates the adjusted p-value for the i-th ranked p-value as $\min_{j \geq i} \left( \pi_0 \times m \times p_{(j)} / j \right)$, where $\pi_0$ is an estimate of the proportion of true null hypotheses (see 10.1021/acs.jproteome.7b00170 for details). Indeed, the classical BH procedure makes the assumption that $\pi_0 = 1$, which is a strong assumption in the MS-based proteomics context. Consequently, the mathematical property that the adjusted p-value is greater than or equal to the original p-value does not always hold true in our approach (it also depends on the $\pi_0$ parameter).
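
      The difference between the two procedures can be illustrated with a short sketch. This is not the analysis code used for the manuscript: the function name `bh_adjust`, the example p-values, and the value of `pi0` are assumptions chosen for illustration, and `pi0` is supplied by hand rather than estimated from the data. Capping the result at 1 is common practice and is also an assumption here.

```python
def bh_adjust(pvalues, pi0=1.0):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    pi0 = 1.0 gives the classical procedure; pi0 < 1.0 gives the adaptive
    variant, where pi0 is an estimate of the proportion of true nulls.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    # Walk from rank m down to rank 1 so that running_min is the
    # minimum of pi0 * m * p_(j) / j over all ranks j >= i
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[idx] / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values (NOT taken from Table S1)
raw = [0.01, 0.02, 0.03, 0.04, 0.05]
classical = bh_adjust(raw)           # never below the raw p-values
adaptive = bh_adjust(raw, pi0=0.5)   # can fall below the raw p-values
for p, c, a in zip(raw, classical, adaptive):
    print(f"raw={p:.3f}  classical={c:.3f}  adaptive={a:.3f}")
```

      With the classical setting every adjusted value is at least as large as its raw p-value, whereas with `pi0` below 1 some adjusted values can drop below the raw ones, which is the situation the reviewer noticed.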

      In addition, it is not common to assume that proteins that are quantitatively present in one condition and absent in another are differentially abundant proteins. Proteomics data software typically addresses this issue and applies some corrections. It would be advisable to review that.

      We thank the reviewer for highlighting this point. Indeed, some software imputes a random small value to replace missing values and then produces statistics based on this imputed data (10.1038/nmeth.3901). However, the validity and relevance of generating statistics in the absence of actual data is questionable.

      There are no universally accepted guidelines for handling this situation, and we believe it is more logical to set these values aside as potential interesting proteins. It is well-established that intensity values are often missing due to the detection limits of the spectrometer, suggesting that the missing values observed in several replicates of a condition are actually due to low values (see 10.1093/bioinformatics/btp362 and 10.1093/bioinformatics/bts193 for instance). It is thus logical to consider the associated proteins as potentially differentially abundant when comparing their complete absence in all replicates of one condition to their presence in several replicates of another condition.

      (1.2) Problems with the interpretation of Ribo-seq data (Figure 4D). On the one hand, the Ribo-seq data should be corrected (normalized) with the RNA-seq data in each of the conditions to obtain ribosome profiling data, since some genes could have more transcription in some of the conditions studied. In other articles in which this technique is used (such as in Tuorto et al., EMBO J. 2018; doi: 10.15252/embj.201899777), it is interpreted that those positions in which the ribosome moves most slowly (and which are therefore less efficiently translated) are the most abundant. Assuming this interpretation, according to the hypothesis proposed in this work, the fragments enriched in TAT codons should have been less abundant in the absence of Q-tRNA (tgt mutant) in the Ribo-seq experiment. However, what is observed is that TAT-enriched fragments are more abundant in the tgt mutant, and yet the Ribo-seq results are interpreted as RNA-seq, stating that this is because the genes corresponding to those sequences have greater expression in the absence of Q.

      As recommended by the reviewer, we normalized the RiboSeq data with the RNAseq data to account for potential RNA variations. The updated Figure 4 demonstrates that this normalization does not alter our findings, confirming that variations at the RNAseq level do not contradict changes at the translational level. 
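
      One common way to express such a normalization is as a change in translation efficiency, i.e., the ratio of ribosome footprints to mRNA abundance for each gene. The sketch below illustrates the idea only; it is not the pipeline used for Figure 4. The function names, the pseudocount parameter, and the example values are assumptions, and the inputs are assumed to be already normalized for library size.

```python
import math

def log2_te(ribo, rna, pseudocount=1.0):
    """log2 translation efficiency: ribosome footprints relative to mRNA."""
    # The pseudocount avoids division by zero for unexpressed genes
    return math.log2((ribo + pseudocount) / (rna + pseudocount))

def delta_te(ribo_mut, rna_mut, ribo_wt, rna_wt, pseudocount=1.0):
    """Change in log2 translation efficiency, mutant relative to WT."""
    return (log2_te(ribo_mut, rna_mut, pseudocount)
            - log2_te(ribo_wt, rna_wt, pseudocount))

# Hypothetical gene A: footprints and mRNA both double in the mutant,
# so the change is transcriptional and translation efficiency is unchanged
gene_a = delta_te(200, 100, 100, 50, pseudocount=0.0)
# Hypothetical gene B: footprints double while mRNA stays constant,
# so translation efficiency increases
gene_b = delta_te(200, 50, 100, 50, pseudocount=0.0)
print(gene_a, gene_b)
```

      A gene whose footprint increase is fully explained by its mRNA increase gives a value near zero, while a gene with more footprints per transcript gives a positive value.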

      The reviewer's observation that pauses at TAT codons would lead to ribosome accumulation and subsequent categorization as "up" genes is accurate. We must emphasize, however, that this category of “up genes” is probably quite diverse. The effect of ribosome stalling at TAT codons on total mRNA ribosome occupancy is likely highly variable, depending on the location of the TAT codon(s) within the CDS and the gene's expression level. We therefore think that genes in the "Up" category mainly correspond to genes that are more translated because the impact of pausing at TAT codons is probably not strong enough. Note that unlike what is usually done in bacterial riboseq experiments, we did not use any antibiotics to artificially freeze the ribosomes.

      On the other hand, it would be interesting to calculate the mean of the protein levels encoded by the transcripts with high and low ribosome profiling data.

      While this is a common request, we believe that comparing RiboSeq and proteomics data is not particularly informative. RiboSeq data directly measures translation, while proteomics provides information about protein abundance at steady state, reflecting the balance between protein synthesis and degradation. Furthermore, the number of proteins detectable by mass spectrometry is significantly smaller than the number of genes quantified by RiboSeq. Given these factors, there is often a low correlation between translation and protein abundance, making a direct comparison less relevant.

      (1.3) This statement is contrary to most previously reported studies on this topic in eukaryotes and bacteria, in which ribosome profiling experiments, among others, indicate that translation of TAT codons is slower (or unaffected) than translation of the TAC codons, and the same phenomenon is observed for the rest of the NAC/T codons. This is completely opposed to the results showed in Figure 4. However, the results of these studies are either not mentioned or not discussed in this work. Some examples of articles that should be discussed in this work:

      - "Queuosine-modified tRNAs confer nutritional control of protein translation" (Tuorto et al., 2018; 10.15252/embj.201899777)

      - "Preferential import of queuosine-modified tRNAs into Trypanosoma brucei mitochondrion is critical for organellar protein synthesis" (Kulkarni et al., 2021; doi:10.1093/nar/gkab567)

      - "Queuosine-tRNA promotes sex-dependent learning and memory formation by maintaining codon-biased translation elongation speed" (Cirzi et al., 2023; 10.15252/embj.2022112507)

      - "Glycosylated queuosines in tRNAs optimize translational rate and post-embryonic growth" (Zhao et al., 2023; 10.1016/j.cell.2023.10.026)

      - "tRNA queuosine modification is involved in biofilm formation and virulence in bacteria" (Diaz-Rullo and Gonzalez-Pastor, 2023; doi: 10.1093/nar/gkad667). In this work, the authors indicate that Q-tRNA increases NAT codon translation in most bacterial species. Could the regulation of TAT codon-enriched proteins by Q-tRNAs in V. cholerae be an exception? In addition, authors use a bioinformatic method to identify genes enriched in NAT codons similar to the one used in this work, and to find in which biological process are involved the genes whose expression is affected by Q-tRNAs (as discussed for the phenotype of UV resistance). It will be worth discussing all of this.

      Thank you for detailed suggestions, we agree that this discussion was missing and this comment gives us a chance to address that in the revised version of the manuscript.

      Regarding the references suggested above by the referee, 4 of these papers were not mentioned in our manuscript. They were published while our manuscript was previously in review, and we realize we had not cited them in the latest version of our manuscript. We thank the referee for highlighting this. We have now included a discussion about this.

      We included the following in the discussion:

      “However, the opposite codon preference was shown in E. coli (Diaz-Rullo and Gonzalez-Pastor, 2023). In eukaryotes also, several recent studies indicate slower translation of U-ending codons in the absence of Q34 (Cirzi et al., 2023; Kulkarni et al., 2021; Tuorto et al., 2018). It is important to note here that in V. cholerae ∆tgt, increased decoding of U-ending codons is observed only with tyrosine, and not with the other three NAC/U codons (Histidine, Aspartate, Asparagine). This is interesting because it suggests that what we observe with tyrosine may not adhere to a general rule about the decoding efficiency of U- or C-ending codons, but instead seems to be specific to Tyr tRNAs, at least in the context of V. cholerae. Exceptions may also exist in other organisms. For example, in human cells, queuosine increases efficiency of decoding for U-ending codons and slows decoding of C-ending codons except for AAC (Zhao et al., 2023). In this case, the exception is for tRNA Asparagine. Moreover, in mammalian cells (Tuorto et al., 2018), ribosome pausing at U-ending codons is strongly seen for Asp, His and Asn, but less with Tyr. In Trypanosoma (Kulkarni et al., 2021), reporters with a combination of the 4 NAC/NAU codons for Asp, Asn, Tyr, His have been tested, showing slow translation at the U-ending version of the reporter in the absence of Q, but the effect on individual codons (e.g. Tyr only) is not tested. In mice (Cirzi et al., 2023), ribosome slowdown is seen for the Asn, Asp, His U-ending codons but not for the Tyr U-ending codon. In summary, Q generally increases efficiency of U-ending codons in multiple organisms, but there appear to be additional unknown parameters which affect tyrosine UAU decoding, at least in V. cholerae. Additional factors such as mRNA secondary structures or mistranslation may also contribute to the better translation of UAU versions of tested genes. Mistranslation could be an important factor. If codon decoding fidelity impacts decoding speed, then mistranslation could also contribute to decoding efficiency of Tyr UAU/UAC codons and proteome composition.”

      (1.4) It is proposed that the stress produced by the TOB antibiotic causes greater translation of genes enriched in TAT codons. 

      Actually, it’s the opposite because in presence of TOB, in the wt, tgt would be induced leading to more Q on tRNA-Tyr and less translation of TAT.

      On the one hand, it is shown that the GFP-TAT version (gene enriched in TAT codons) and the RsxATAT-GFP protein (native gene naturally enriched in TAT) are expressed more, compared to their versions enriched in TAC, in a tgt mutant than in a wt, in the presence of TOB (Fig. 5C).

      Figure 5C shows relative fluorescence, i.e. changes of fluorescence in delta-tgt compared to WT. So it is not necessarily more expressed but “more increased”.

      However, in the absence of TOB, and in a wt context, although the two versions of GFP have a similar expression level (Fig. 3SD), the same does not occur with RsxA, whose RsxA-TAT form (the native one) is expressed significantly more than the RsxA-TAC version (Fig. 3SA). How can it be explained that in a wt context, in which there is also tRNA Q-modification, a gene naturally enriched in TAT is translated better than the same gene enriched in TAC?

      We thank the referee for this question based on careful assessment of our data. We agree, there appears to be significantly more RsxA-TAT in WT than RsxA-TAC. This could be due to other effects such as secondary structure formation on mRNA when the wt RsxA is recoded with TAC codons. This does not hinder the conclusion that the translation of the TAT version is increased in delta-tgt compared to WT.  

      It would be expected that in the presence of Q-tRNAs the two versions would be translated equally (as happens with GFP) or even the TAT version would be less translated. On the other hand, in the presence of TOB the fluorescence of WT GFP(TAT) is higher than the fluorescence of WT GFP(TAC) (Figure S3E) (mean fluorescence data for RsxA-GFP version in the presence of TOB is not shown). These results may indicate that the apparent better translation of TAT versions could be due to indirect effects rather than from TAT codon translation.

      This is now mentioned in the manuscript:

      “We cannot exclude, however, that additional factors such as mRNA secondary structures also contribute to the better translation of UAU versions of tested genes.”

      (2) Another problem is related to the already known role of Q in prevention of stop codon readthrough, which is not discussed at all in the work. In the absence of Q, stop codon readthrough is increased. In addition, it is known that aminoglycosides (such as tobramycin) also increase stop codon readthrough ("Stop codon context influences genome-wide stimulation of termination codon readthrough by aminoglycosides"; Wangen and Green, 2020; 10.7554/eLife.52611). Absence of Q and presence of aminoglycosides can be synergistic, producing devastating increases in stop codon readthrough and a large alteration of global gene expression. All of this needs to be discussed in the work. Moreover, it is known that stop codon readthrough can alter gene expression, and that mRNA sequence context influences the likelihood of stop codon readthrough. Thus, this process could also affect the expression of recoded GFP and RsxA versions.

      We included the following in the revised version of the manuscript (results):

      “Q modification impacts decoding fidelity in V. cholerae.

      To test whether a defect in Q34 modification influences the fidelity of translation in the presence and absence of tobramycin, previously developed reporter tools (Fabret & Namy, 2021) were used to measure stop codon readthrough in V. cholerae ∆tgt and wild-type strains. The system consists of vectors containing readthrough promoting signals inserted between the lacZ and luc sequences, encoding β-galactosidase and luciferase, respectively. Luciferase activity reflects the readthrough efficiency, while β-galactosidase activity serves as an internal control of expression level, integrating a number of possible sources of variability (plasmid copy number, transcriptional activity, mRNA stability, and translation rate). We found increased readthrough at stop codons UAA and to a lesser extent at UAG for ∆tgt, and this increase was amplified for UAG in the presence of tobramycin (Fig. S2, stop readthrough). In the case of UAA, tobramycin appears to decrease readthrough; this may be artefactual, due to the toxic effect of tobramycin on ∆tgt.

      Mistranslation at specific codons can also impact protein synthesis. To further investigate mistranslation levels by tRNATyr in WT and ∆tgt, we designed a set of gfp mutants where the codon for the catalytic tyrosine required for fluorescence (TAT at position 66) was substituted by near-cognate codons (Fig. S2). Results suggest that in this sequence context, particularly in the presence of tobramycin, non-modified tRNATyr mistakenly decodes Asp GAC, His CAC and also Ser UCC, Ala GCU, Gly GGU, Leu CUU and Val GUC codons, suggesting that Q34 increases the fidelity of tRNATyr.

      In parallel, we replaced Tyr103 of the β-lactamase described above with Asp codons GAT or GAC. The expression of the resulting mutant β-lactamase is expected to yield a carbenicillin sensitive phenotype. In this system, increased tyrosine misincorporation (more mistakes) by tRNATyr at the mutated Asp codon will lead to increased synthesis of active β-lactamase, which can be evaluated by carbenicillin tolerance tests. As such, amino-acid misincorporation leads here to phenotypic (transient) tolerance, while genetic reversion mutations result in resistance (growth on carbenicillin). The rationale is summarized in Fig. 3C. When the Tyr103 codon was replaced with either Asp codon, we observe increased β-lactamase tolerance (Fig. 3D, left), suggesting increased misincorporation of tyrosine by tRNATyr at Asp codons in the absence of Q, again suggesting that Q34 prevents misdecoding of Asp codons by tRNATyr.

      In order to test any effect on an additional tRNA modified by Tgt, namely tRNAAsp, we mutated the Asp129 (GAT) codon of the β-lactamase. When Asp129 was mutated to Tyr TAT (Fig. 3D, right), we observe reduced tolerance in ∆tgt, but not when it was mutated to Tyr TAC, suggesting less misincorporation of aspartate by tRNAAsp at the Tyr UAU codon in the absence of Q. In summary, absence of Q34 increases misdecoding by tRNATyr at Asp codons, but decreases misdecoding by tRNAAsp at Tyr UAU. 

      This supports the fact that tRNA Q34 modification is involved in translation fidelity during antibiotic stress, and that the effects can be different on different tRNAs, e.g. tRNATyr and tRNAAsp tested here.”

      Added figures: Figure S2, Figure 3CD

      (3) The statement about that the TOB resistance depends on RsxA translation, which is related to the presence of Q, also presents some problems:

      (3.1) It is observed that the absence of tgt produces a growth defect in V. cholerae when exposed to TOB (Figure 1A), and it is stated that this is mediated by an increase in the translation of RsxA, because its gene is TAT enriched. However, in Figure S4F, it is shown that the same phenotype is observed in E. coli, but its rsxA gene is not enriched in TAT codons. Therefore, the growth defect observed in the tgt mutant in the presence of TOB may not be due to the increase in the translation of TAT codons of the rsxA gene in the absence of Q. This phenotype is very interesting, but it may be related to another molecular process regulated by Q. Maybe the role of Q in preventing stop codon readthrough is important in this process, reducing cellular stress in the presence of TOB and growing better.

      Fig. S4F (now Figure 5D) shows that rsxA can be toxic during growth in the presence of tobramycin, but it does not show that rsxA translation is increased in E. coli in delta-tgt. However, we agree with the referee that there are probably additional processes regulated by Q which are also involved in the response to TOB stress. We had already mentioned this briefly in the discussion (“Note that, our results do not exclude the involvement of additional Q-regulated MoTTs in the response to sub-MIC TOB, since Q modification leads to reprogramming of the whole proteome.”), and we have now discussed it further as follows:

      “As a consequence, transcripts with tyrosine codon usage bias are differentially translated. One such transcript codes for RsxA, an anti-SoxR factor. SoxR controls a regulon involved in oxidative stress response and sub-MIC aminoglycosides trigger oxidative stress in V. cholerae{Baharoglu, 2013 #720}, pointing to an involvement of oxidative stress response in the response to sub-MIC tobramycin stress.

      A link between Q34 and oxidative stress has also been previously found in eukaryotic organisms {Nagaraja, 2021 #1466}. Note that our results do not exclude the involvement of additional Q-regulated translation of other transcripts in the response to tobramycin. Q34 modification leads to reprogramming of the whole proteome, not only for other transcripts with codon usage bias, but also through an impact on the levels of stop codon readthrough and mistranslation at specific codons, as supported by our data.”

      (3.2) All experiments related to the effect of Q on the translation of TAT codons have been performed with the tgt mutant strain. Considering that the authors have a pSEVA-tgt plasmid to overexpress this gene, they would have to show whether tgt overexpression in a wt strain produces a decrease in the translation of proteins encoded by TAT-enriched genes such as RsxA. This experiment would allow them to conclude that Q reduces RsxA levels, increasing resistance to TOB.

      We agree that this would be interesting to test. However, as can be seen in Figure 1B, delta-tgt pSEVA-tgt (complemented strain) grows better than WT pSEVA-tgt (tgt overexpression). In fact, overexpression of tgt negatively impacts cell growth and yields smaller colonies, especially when cells carry a second plasmid (e.g. with gfp constructs). We have also seen this with other RNA modification gene overexpressions in the lab (unpublished). We believe that the expression of tgt is tuned, and since overexpression affects fitness, it is generally difficult to conduct experiments with overexpression plasmids for RNA modifications. Nevertheless, we have done the experiment (with slow growing bacteria), and when we normalize expression of gfp in the presence of the tgt overexpressing plasmid to the condition with no plasmid, we see little (1.5 fold) or no effect of tgt overexpression on fluorescence (see graph below). This is probably due to a toxic effect of overexpression and we do not believe these results are biologically relevant.

      Author response image 1.

      (3.3) On the other hand, Fig. 1B shows that when the wt and tgt strains compete, both overexpressing tgt, the tgt mutant strain grows better in the presence of TOB. This result is not very well understood, since according to the hypothesis proposed, the absence of modification by Q of the tRNA would increase the translation of genes enriched in TAT, therefore, a strain with a higher proportion of Q-modified tRNAs as in the case of the wt strain overexpressing tgt would express the rsxA gene less than the tgt strain overexpressing tgt and would therefore grow better in the presence of TOB. For all these reasons, it would be necessary to evaluate the effect of tgt overexpression on the translation of RsxA.

      See our answer above about negative effect of tgt overexpression.

      (3.4) According to Figure 1I, the overexpression of tRNA-Tyr(GUA) caused a better growth of tgt mutant in comparison to WT. If the growth defect observed in tgt mutant in the presence of TOB is due to a better translation of the TAT codons of rsxA gene, the overexpression of tRNA-Tyr(GUA) in the tgt mutant should have resulted in even better RsxA translation and worse growth, but not the opposite result.

      We agree; we think that rsxA is not the only factor responsible for the growth defect of tgt in the presence of TOB (as now further discussed in the discussion). Overexpression of tRNA-Tyr possibly changes the equilibrium between the decoding of TAC vs TAT and may restore translation of TAC enriched genes. As also suggested by Reviewer 3, we have measured decoding reporters for TAT/TAC while overexpressing tRNA-Tyr. This is now added to the results in fig S2C and the following:

      “We also tested decoding reporters for TAT/TAC in WT and ∆tgt overexpressing tRNATyr in trans (Fig. S1C). The presence of the plasmid (empty p0) amplified differences between the two strains with decreased decoding of TAC (and increased TAT, as expected) in ∆tgt compared to WT. Overexpression of tRNATyrGUA did not significantly impact decoding of TAT and increased decoding of TAC, as expected. Since overexpression of tRNATyrGUA rescues ∆tgt in tobramycin (Fig. 1I) and facilitates TAC decoding, this suggests that issues with TAC codon decoding contribute to the fitness defect observed in ∆tgt upon growth with tobramycin. Overexpression of tRNATyrAUA increased decoding of TAT in WT but did not change it in ∆tgt where it is already high. Unexpectedly, overexpression of tRNATyrAUA also increased decoding of TAC in WT. Thus, overexpression of tRNATyrAUA possibly changes the equilibrium between the decoding of TAC vs TAT and may restore translation of TAC enriched transcripts.” 

      Added figure: figure S1C

      (4) It cannot be stated that DNA repair is more efficient in the tgt mutant of V. cholerae, as indicated in the text of the article and in Fig 7. The authors only observe that the tgt mutant is more resistant to UV radiation and it is suggested that the reason may be TAT bias of DNA repair genes. To validate the hypothesis that UV resistance is increased because DNA repair genes are TAT biased, it would be necessary to check if DNA repair is affected by Q. UV not only produces DNA damage, but also oxidative stress. Therefore, maybe this phenotype is due to the increase in proteins related to oxidative stress controlled by RsxA, such as the superoxide dismutase encoded by sodA. It is also stated that these repair genes were found up for the tgt mutant in the Ribo-seq data, with unchanged transcription levels. Again, it is necessary to clarify this interpretation of the Ribo-seq data, since the fact that they are more represented in a tgt mutant perhaps means that translation is slower in those transcripts. Has it been observed in proteomics (wt vs tgt in the absence of TOB) whether these proteins involved in repair are more expressed in a tgt mutant?

      We agree that our results do not directly show that DNA repair is more efficient, but that delta-tgt responds better to UV. This has been modified in the manuscript. About oxidative stress, we did not see a better or worse response to H2O2 of delta-tgt. Moreover, since we see a better response of delta-tgt to UV only in V. cholerae and not in E. coli, we did not favor the hypothesis of a response to oxidative stress. In proteomics, we do not detect changes for DNA repair genes except for RuvA, which is more abundant in delta-tgt. We have toned down the statement about DNA repair in the paper.

      (5) The authors demonstrate that in E. coli the tgt mutant does not show greater resistance to UV radiation (Fig. 7D), unlike what happens in V. cholerae. It should be discussed that in previous works it has been observed that overexpression in E. coli of the tgt gene or the queF gene (Q biosynthesis) is involved in greater resistance to UV radiation (Morgante et al., Environ Microbiol, 2015 doi: 10.1111/1462-2920.12505; and Díaz-Rullo et al., Front Microbiol. 2021 doi: 10.3389/fmicb.2021.723874). As an explanation, it was proposed (Diaz-Rullo and Gonzalez-Pastor, NAR 2023 doi: 10.1093/nar/gkad667) that the observed increase in the capacity to form biofilms in strains that overexpress genes related to Q modification of tRNA would be related to this greater resistance to UV radiation.

      We now mention the previous observations suggesting a link between tgt and UV. We thank the referee for the reference which we had overlooked. Note that in the case of our experiments, all cultures are in planktonic form and are not allowed to form biofilms. We thus prefer not to discuss biofilm-linked processes in this study.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript the authors begin with the interesting phenotype of sub-inhibitory concentrations of the aminoglycoside tobramycin proving toxic to a knockout of the tRNA-guanine transglycosylase (Tgt) of the important human pathogen, Vibrio cholerae. Tgt is important for incorporating queuosine (Q) in place of guanosine at the wobble position of GUN anticodons. The authors go on to define a mechanism of action where environmental stressors control expression of tgt to control translational decoding of particularly tyrosine codons, skewing the balance from TAC towards TAT decoding in the absence of the enzyme. The authors use advanced proteomics and ribosome profiling to reveal that the loss of tgt results in increased translation of proteins like RsxA and a cohort of DNA repair factors, whose genes harbor an excess of TAT codons in many cases. These findings are bolstered by a series of molecular reporters, mass spectrometry, and tRNA overexpression strains to provide support for a model where Tgt serves as a molecular pivot point to reprogram translational output in response to stress.

      Strengths:

      The manuscript has many strengths. The authors use a variety of strains, assays, and advanced techniques to discover a mechanism of action for Tgt in mediating tolerance to sub-inhibitory concentrations of tobramycin. They observe a clear phenotype for a tRNA modification in facilitating reprogramming of the translational response, and the manuscript certainly has value in defining how microbes tolerate antibiotics.

      We thank the referee for their time and comments. 

      Weaknesses:

      The conclusions of the manuscript are mostly very well-supported by the data, but in some places control experiments or peripheral findings cloud precise conclusions. Some additional clarification, discussion, or even experimental extension could be useful in strengthening these areas.

      (1) The authors have created and used a variety of relevant molecular tools. In some cases, using these tools in additional assays as controls would be helpful. For example, testing for compensation of the observed phenotypes by overexpression of the Tyrosine tRNA(GUA) in Figure 2A with the 6xTAT strain, Figure 5C with the rsxA-GFP fusion, and/or Figure 7B with UV stress would provide additional information on the ability of tRNA overexpression to compensate for the defect in these situations.

      Thank you for the suggestions. Since overexpression of tRNA-Tyr is not expected to decrease decoding of TAT, we do not necessarily expect any effect for UV and rsxA expression. Overexpression of tRNA_GUA restores fitness of delta-tgt in TOB, but this is probably independent of RsxA. As Reviewer 2 also suggested above, we included in the discussion that the effect seen in delta-tgt with TOB is not only due to RsxA expression but also to additional processes. However, these suggestions are interesting and we performed the following experiments in order to answer these questions:

      - “testing for compensation of the observed phenotypes by overexpression of the Tyrosine tRNA(GUA) in Figure 2A with the 6xTAT strain”: 

      This is now included in figure S2C and results as follows: 

      “We also tested decoding reporters for TAT/TAC in WT and ∆tgt overexpressing tRNA-Tyr in trans (Fig. S1C). The presence of the plasmid amplified differences between the two strains with decreased decoding of TAC (and increased TAT, as expected) in ∆tgt with empty plasmid compared to WT. Overexpression of tRNA_TyrGUA did not significantly impact decoding of TAT and increased decoding of TAC as expected. Since overexpression of tRNA_TyrGUA rescues ∆tgt in tobramycin (Fig. 1I) and facilitates TAC decoding, this suggests that issues with TAC codon decoding contribute to the fitness defect observed in ∆tgt upon growth with tobramycin. Overexpression of tRNA_TyrAUA increased decoding of TAT in WT but did not change it in ∆tgt where it is already high. Interestingly, overexpression of tRNA_TyrAUA also increased decoding of TAC in WT. Thus, overexpression of tRNA_TyrAUA possibly changes the equilibrium between the decoding of TAC vs TAT and may restore translation of TAC enriched transcripts.”

      - Figure 5C with the rsxA-GFP fusion:

      When we overexpress tRNA_GUA, rsxA fluorescence is 2-fold higher in delta-tgt compared to wt. However, the fluorescence is strongly decreased compared to the condition with no tRNA overexpression. While we are not sure whether this apparent decrease is a technical issue or not (e.g. due to the presence of an additional plasmid), we prefer not to further explore this in this manuscript. Note that we could not obtain a delta-tgt strain carrying both plasmids expressing tRNA_GUA and rsxA, suggesting toxic overproduction of rsxA in this context.

      Author response image 2.

      - Figure 7B with UV stress: 

      Here again, delta-tgt overexpressing tRNA_GUA is still more UV resistant than WT overexpressing tRNA_GUA.

      Author response image 3.

      (2) The authors present a clear story with a reprogramming towards TAT codons in the knockout strain, particularly regarding tobramycin treatment. The control experiments often hint at other codons also contributing to the observed phenotypes (e.g., His or Asp), yet these effects are mostly ignored in the discussion. It would be helpful to discuss these findings at a minimum in the discussion section, or possibly experimentally address the role of His or Asp by overexpression of these tRNAs together with Tyrosine tRNA(GUA) in an experiment like that of Figure 1I to see if a more "wild type" phenotype would present. In fact, the synergy of Tyr, His, and/or Asp codons likely helps to explain the effects observed with the DNA repair genes in later experiments.

      We thank the referee for the suggestion. We agree that there could be synergies between these codons, and that is probably why the proteomics data do not clearly reflect tyrosine codon usage bias. This is now further discussed in the Ideas and Speculation section.

      Moreover, we have added Figure S3G and the following result:

      “Since not all TAT biased proteins are found to be enriched in ∆tgt proteomics data, the sequence context surrounding TAT codons could affect their decoding. To illustrate this, we inserted after the gfp start codon various tyrosine containing sequences displayed by rsxA (Fig. S3G). The native tyrosines were all TAT codons; our synthetic constructs were either TAT or TAC, while keeping the remaining sequence unchanged. We observe that the production of GFP carrying the TEYTATLLL sequence from RsxA is increased in Δtgt compared to WT, while it is unchanged with TEYTACLLL. However, production of the GFP with the sequences LYTATRLL/LYTACRLL and EYTATLR/EYTACLR was not affected (or even decreased for the latter) by the absence of tgt. Overall, our results demonstrate that RsxA is upregulated in the ∆tgt strain at the translational level, and that proteins with a codon usage bias towards tyrosine TAT are prone to be more efficiently translated in the absence of Q modification, but this is also dependent on the sequence context.”

      (3) Regarding Figure 6D, the APB northern blot feels like an afterthought. It was loaded with different amounts of RNA as input and some samples are repeated three times, but Δcrp only once. Collectively, it makes this experiment very difficult to assess.

      A different amount of RNA was used only for ∆tgt in which we have only one band because of the absence of modification. For all the other conditions, the same amount of RNA was used (0.9 µg). Additional replicates of crp were in an additional gel but only a representative gel was shown in the manuscript. This is now specified in the legend.

      We also attach below the picture of the gel with total RNA (SYBR Gold labelling of total RNA), where it can be seen that the lanes contain an equivalent quantity of RNA, except for ∆tgt.

      Author response image 4.

      Minor Points:

      (3) Fig S2B, do the authors have a hypothesis why the Asp and Phe tRNAs lead to a growth decrease in the untreated samples? It appears like Phe(GAA) partially compensates for the defect.

      Yes we agree, at this stage we do not have any satisfactory answer for this unfortunately. This would be interesting to study further but this is beyond the scope of the present study.

      (5) Lines 655 to 660 seem more appropriate as speculation in the discussion rather than as a conclusion in the results, where no direct experiments are performed. The authors might take advantage of the "Ideas and Speculation" section that eLife allows.

      Thank you very much for this suggestion, we added this section to the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor.

      - Figure 6 - Fonts on several mutants is different size/type. fixed

      - What is the Pm promoter. Please expand and give enough details so reader can follow. Especially as it is less used in V. cholerae (typical being pBAD or pTAC promoters). done

      - Spacing where references are inserted should be checked. done

      - Line 860-863 - "V. cholerae's response to sub-MIC antibiotic stress is transposable to other Gram-negative pathogens". This reads awkward. Consider rephrasing. done

      - Figure 7 - Text in A and C is very small and is very hard to read. Font for tgt is different.

      Fixed. Tgt is in italics.

      Reviewer #2 (Recommendations For The Authors):

      As specified in the public review, more evidence would be necessary to affirm that tRNAs not modified by Q have a greater preference for translating TAT codons, since there are several previous studies in which it is shown that Q-tRNAs have a greater preference for NAT codons (including TAT). For example, it is suggested to explore what happens with other recoded genes (enriched in TAT or TAC) if there is a high level of Q-tRNAs (overexpression of tgt in a wt context). It is also necessary to clarify how to interpret the Ribo-seq results, which apparently is different from how they have been interpreted in other studies.

      Please see above our responses and changes made to the manuscript.

      Minor corrections

      In Figure 8, replace "Epitranscriptomic adapation to stress" with "Epitranscriptomic adaptation to stress".

      Fixed, thank you for noticing!

      Reviewer #3 (Recommendations For The Authors):

      (1) Lines 48-50, and 110 to 112, the authors have a nice mechanism and story, yet the lines mentioned feel very qualified (e.g., "possibly", "plausibly") and lead to the abstract hiding the value and major conclusions of the study. The authors could consider to revise or even remove these lines to focus on the take-home message in the abstract and end of introduction/discussion. 

      Thank you for this comment, we modified the text.  

      (2) Additional description for the samples in the results section for Figure 1 would be helpful to the reader.

      Done

      (3) Figure S1, the line of experiments with rluF is interesting, but in the end the choice seems a little random. Have the authors assessed knockouts of other modifications on the ASL for effects? Since the modification is not well characterized in V. cholerae according to the authors, it might make sense to save this for a future paper.

      We removed S1, as we agree that this experiment does not really add anything to the paper.

      (4) Line 334 and 353 are redundant.

      Fixed

      (5) It is likely beyond the scope of the study, but it would strengthen the paper to repeat Figure 3 with His and/or Asp based on the findings of 2C and 4E to better understand the contribution of His and Asp to Q biology.

      We repeated Figure 3 with Asp. Based on Fig. 2C (less efficient decoding of GAC in delta-tgt in TOB) and 4E (positive GAT codon bias in proteins up in Ribo-seq in delta-tgt TOB), we would expect that beta-lactamase with Asp GAC would be less efficiently decoded than GAT in delta-tgt.

      This was added to the manuscript:

      “Like Tyr103, Asp129 was shown to be important for resistance to β-lactams (Doucet et al., 2004; Escobar et al., 1994; Jacob et al., 1990). When we replaced the native Asp129 GAT with the synonymous codon Asp129 GAC, the GAC version did not appear to produce functional β-lactamase in ∆tgt (Fig. 3B), suggesting increased mistranslation or inefficient decoding of the GAC codon by tRNAAsp in the absence of Q. Decoding of GAT codon was also affected in ∆tgt in the presence of tobramycin.”

      Added figure: Figure 3B

      (6) The authors could consider replacing 5D with S4A-D, which is easier to understand in our opinion.

      Done

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This article identifies ADGRA3 as a candidate GPCR for mediating beige fat development. The authors use human expression data from the Human Protein Atlas and GTEx databases and combine this with experiments performed in mice and a murine cell line. They refer to a GPCR bioactivity screening tool, PRESTO-Salsa, with which it was found that Hesperetin activates ADGRA3. From their experiments, the authors conclude that Hesperetin activates ADGRA3, inducing a Gs-PKA-CREB axis resulting in adipose thermogenesis.

      Strengths:

      The authors analyze human data from public databases and perform functional studies in mouse models. They identify a new GPCR with a role in thermogenic activation of adipocytes.

      Considerations:

      Selection of ADGRA3 as a candidate GPCR relevant for mediating beiging in humans:

      The authors identify GPCRs that are expressed more highly in murine iBAT compared to iWAT in response to cold and assess which of these GPCRs are expressed in human subcutaneous or visceral adipocytes. Although this strategy will identify GPCRs that are expressed at higher levels in brown fat compared to beige and thus possibly more active in thermogenic function, the relevance in choosing GPCRs that also are expressed in unstimulated human white adipocytes should be considered. Thermogenic activity is not normally present in human white adipocytes. It would have strengthened the GPCR selection if the authors instead had assessed the intersection with human brown adipocytes that were activated with norepinephrine.

      We appreciate your constructive feedback and believe that by adopting this refined strategy, we will strengthen our selection of GPCRs related to adipose thermogenesis in other ongoing studies. We look forward to continuing our research in this area and contributing to the understanding of adipose thermogenesis and its potential therapeutic applications. Thank you once again for your valuable input. 

      Strategy to investigate the role of ADGRA3 in WAT beiging:

      Having identified ADGRA3 as their candidate receptor, the authors investigated the receptor in mouse models, the murine inguinal adipocyte cell line 3T3 and in human subcutaneous adipose progenitors (HAdsc) differentiated in vitro. Calling the human cells "beige" is a stretch as these cells are derived from a white adipose depot. The authors do observe regulation in UCP1 and abundance of mitochondria following modification of ADGRA3 in the cells. However, in future studies, it should be considered if the receptor rather plays a role in differentiation per se, and perhaps not specifically in thermogenic differentiation/activity.

      Regarding the reviewer's suggestion to consider whether ADGRA3 plays a role in differentiation per se, rather than specifically in thermogenic differentiation/activity, we acknowledge that this is an important consideration. Our current studies have focused on the role of ADGRA3 in regulating UCP1 expression and mitochondrial abundance, which are hallmarks of adipose thermogenic activity. However, we recognize that ADGRA3 may also have broader roles in adipocyte differentiation and function that are not limited to thermogenesis.

      To address this point, in future studies, we plan to conduct additional experiments to investigate the potential role of ADGRA3 in adipocyte differentiation, including its effects on the expression of markers of adipocyte differentiation and its impact on adipocyte metabolism and function. These studies will provide further insights into the mechanisms by which ADGRA3 regulates adipocyte biology.

      According to the Human Protein Atlas and GTEx databases, ADGRA3 is not only expressed in adipocytes, but also in other tissues and cell types. The authors address this by measuring the expression in a panel of these tissues, demonstrating a knockdown not only in the adipose tissue, but also in the liver and less pronounced in the muscle (Figure S2). It should thus be emphasized that the decreased TG levels in serum and liver in the mice might in fact depend on Adgra3 overexpression in the liver. Even though this might not have been the purpose of the experiment, it is important to highlight this as it could serve as hypothesis building for future studies of the function of this receptor.

      Thank you for your thoughtful comments and feedback. We appreciate the insight provided by the Human Protein Atlas and GTEx databases regarding the tissue distribution of ADGRA3. We fully acknowledge that the decreased TG levels observed in both the serum and liver of the mice might be linked to the overexpression of Adgra3 in the liver.

      Although this was not the primary objective of our experiment, we agree that this observation is worth highlighting as it could serve as a basis for future hypothesis-driven research on the functional role of ADGRA3 in different tissues. In light of your comments, we emphasized this potential link between Adgra3 overexpression in the liver and reduced TG levels in discussion, as follows.

      “…the precise mechanisms underlying the influence of ADGRA3 on adipose thermogenesis. Furthermore, it is crucial to highlight that the observed decrease in TG levels in both serum and liver (Figure 4-figure supplement 2C-D) might be attributed to the significant increase in Adgra3 expression in the liver, which is a consequence of the nanoparticle-mediated overexpression of Adgra3. While the exact mechanism remains to be fully elucidated, this correlation suggests a potential link between Adgra3 overexpression in the liver and reduced TG levels in the serum. We will employ more sophisticated models in subsequent studies to further…”

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Zhao et al. explored the function of adhesion G protein-coupled receptor A3 (ADGRA3) in thermogenic fat biology.

      Strengths:

      Through both in vivo and in vitro studies, the authors found that gain of function of ADGRA3 leads to browning of white fat and ameliorates insulin resistance.

      Weaknesses:

      There are several methodological weaknesses, such as the use of 3T3-L1 adipocytes and intraperitoneal (i.p.) injection of virus. Moreover, as the authors stated that ADGRA3 is constitutively active, how could the authors then identify a chemical ligand?

      Comments on revised version:

      The revised manuscript by Zhao et al. has limited improvement. The authors refused to perform revised experiments using primary cultures even though two reviewers pointed out the same weakness (3T3-L1 adipocytes are unsuitable). Using infrared thermography to measure body temperature is also problematic.

      Thanks for your comments. We regret that human adipocytes induced from human adipose-derived stem cells (hADSCs) were not recognized as primary cultures by multiple reviewers. Therefore, we have included relevant experimental results of mouse primary adipocytes induced from stromal vascular fraction (SVF) in Figure 8E-H as a supplement. The thermal imaging device was used to measure the temperature of BAT, while the body temperature was measured at 9:00 using a rectal probe connected to a digital thermometer.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This paper presents a data processing pipeline to discover causal interactions from time-lapse imaging data, and convincingly illustrates it on a challenging application for the analysis of tumor-on-chip ecosystem data. The core of the discovery module is the authors' original tMIIC method, which is shown in supplementary material to compare favourably to two state-of-the-art methods on synthetic temporal data on a 15-node network.

      Strengths:

      This paper tackles the problem of learning causal interactions from temporal data, which is an open problem in the presence of latent variables. The core of the authors' tMIIC method is nicely presented in connection to Granger-Schreiber causality and to the novel graphical conditions used to infer latent variables, which are based on a theorem about transfer entropy. tMIIC compares favourably to PC and PCMCI+ methods using different kernels on synthetic datasets generated from a network of 15 nodes. A full application to tumor-on-chip cellular ecosystem data, including cancer cells, immune cells, cancer-associated fibroblasts, endothelial cells and anti-cancer drugs, is presented, with convincing inference results with respect to both known and novel effects between those components and their contact.

      The code and dataset are available online for the reproducibility of the results.

      We thank Reviewer #1 for highlighting the main results and strengths of our paper, as well as for his/her recommendations below to further improve the manuscript.

      Weaknesses:

      The references to “state-of-the-art methods” concerning the inference of causal networks should be made more precise by giving citations in the main text, and should be better discussed in general terms, both in the first section and in the section presenting CausalXtract. It is only in the legends of the supplementary figures that we get this information. Of course, comparison on the authors' own synthetic datasets can always be criticized, but this is rather due to the absence of a common benchmark, and I would recommend that the authors explicitly propose their datasets as a benchmark to the community.

      Following Reviewer #1’s suggestion, we now compare tMIIC’s performance to other state-of-the-art causal discovery methods for time series data in the main text and in a new Figure 2. This Figure 2 also highlights the relation between graph-based causal discovery methods for time series data and Granger-Schreiber temporal causality, as discussed in more detail in Methods (Theorem 1).

      We also agree about the importance of sharing benchmark datasets with the community. This is the reason why we provide the dynamical equations of the 15-node benchmarks in Supplementary Tables 1 & 2, so that anyone can generate equivalent time series datasets of any desired length.

      Reviewer #2 (Public review):

      Summary:

      The authors propose a methodology to perform causal (temporal) discovery. The approach appears to be robust and is tested in two different scenarios: one related to live-cell imaging data, and another using synthetic (mathematically defined) time series data. They compare the performance of their method against another well-known method using metrics such as F-score, precision and recall.

      Strengths:

      Performance and robustness. The text is clear and concise. The authors provide the code for review.

      We thank Reviewer #2 for his/her positive assessment of our work and the suggestions below to improve the manuscript.

      Weaknesses:

      One concern could be the applicability of the method in other areas such as climate or economics. For those areas, public data are available, and it might be interesting to test how the method performs with this kind of data.

      While our main expertise concerns the analysis of biological and biomedical data, we agree that tMIIC (which is included in the MIIC R package) could in principle be applied to other areas, such as climate or economics.

      We have not included benchmarks on such diverse types of datasets in the present manuscript, which focuses on CausalXtract’s pipeline for the analysis and causal interpretation of live-cell time-lapse imaging data from complex cellular systems.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      We thank Reviewer 1 for their helpful comments and hope that the changes made to the revised manuscript have addressed their points.

      This study presents a novel application of the inverted encoding (i.e., decoding) approach to detect the correlates of crossmodal integration in the human EEG (electrophysiological) signal. The method is successfully applied to data from a group of 41 participants, performing a spatial localization task on auditory, visual, and audiovisual events. The analyses clearly show a behavioural superiority for audio-visual localization. Like previous studies, the results when using traditional univariate ERP analyses were inconclusive, showing once more the need for alternative, more sophisticated approaches. Instead, the principal approach of this study, harnessing the multivariate nature of the signal, captured clear signs of super-additive responses, considered by many as the hallmark of multisensory integration. Unfortunately, the manuscript lacks many important details in the descriptions of the methodology and analytical pipeline. Although some of these details can eventually be retrieved from the scripts that accompany this paper, the main text should be self-contained and sufficient to gain a clear understanding of what was done. (A list of some of these is included in the comments to the authors). Nevertheless, I believe the main weakness of this work is that the positive results obtained and reported in the results section are conditioned upon eye movements. When artifacts due to eye movements are removed, then the outcomes are no longer significant. 

      Therefore, whether the authors finally achieved the aims and showed that this method of analysis is truly a reliable way to assess crossmodal integration, does not stand on firm ground. The worst-case scenario is that the results are entirely accounted for by patterns of eye movements in the different conditions. In the best-case scenario, the method might truly work, but further experiments (and/or analyses) would be required to confirm the claims in a conclusive fashion.

      One first step toward this goal would be, perhaps, to facilitate the understanding of results in context by reporting both the uncorrected and corrected analyses in the main results section. Second, one could try to support the argument given in the discussion, pointing out the origin of the super-additive effects in posterior electrode sites, by also modelling frontal electrode clusters and showing they aren't informative as to the effect of interest.

      We performed several additional analyses to address concerns that our main result was caused by different eye movement patterns between conditions. We re-ran our key analyses using activity exclusively from frontal electrodes, which revealed poorer decoding performance than that from posterior electrodes. If eye movements were driving the non-linear enhancement in the audiovisual condition, we would expect stronger decoding using sensors closer to the source, i.e., the extraocular muscles. We also computed the correlations between average eye position and stimulus position for each condition to evaluate whether participants made larger eye movements in the audiovisual condition, which might have contributed to better decoding results. Though we did find evidence for eye movements toward stimuli, the degree of movement did not significantly differ between conditions.

      Furthermore, we note that the analysis using a stricter eye movement criterion, acknowledged in the Discussion section of the original manuscript, resulted in very similar results to the original analysis. There was significantly better decoding in the AV condition (as measured by d') than the MLE prediction, but this difference did not survive cluster correction. The most likely explanation for this is that the strict eye movement criterion combined with our conservative measure of (mass-based) cluster correction led to reduced power to detect true differences between conditions. Taken together with the additional analyses described in the revised manuscript and supplementary materials, the results show that eye movements are unlikely to account for differences between the multisensory and unisensory conditions. Instead, our decoding results likely reflect nonlinear neural integration between audio and visual sensory information.

      “Any experimental design that varies stimulus location needs to consider the potential contribution of eye movements. We computed correlations between participants’ average eye position and each stimulus position between the three sensory conditions (auditory, visual and audiovisual; Figure S1) and found evidence that participants made eye movements toward stimuli. A re-analysis of the data with a very strict eye-movement criterion (i.e., removing trials with eye movements >1.875º) revealed that the super-additive enhancement in decoding accuracy no longer survived cluster correction, suggesting that our results may be impacted by the consistent motor activity of saccades towards presented stimuli. Further investigation, however, suggests this is unlikely. Though the correlations were significantly different from 0, they were not significantly different from each other. If consistent saccades to audiovisual stimuli were responsible for the nonlinear multisensory benefit we observed, we would expect to find a higher positive correlation between horizontal eye position and stimulus location in the audiovisual condition than in the auditory or visual conditions. Interestingly, eye movements corresponded more to stimulus location in the auditory and audiovisual conditions than in the visual condition, indicating that it was the presence of a sound, rather than a visual stimulus, that drove small eye movements. This could indicate that participants inadvertently moved their eyes when localising the origin of sounds. We also re-ran our analyses using the activity measured from the frontal electrodes alone (Figure S2). If the source of the nonlinear decoding accuracy in the audiovisual condition was due to muscular activity produced by eye movements, there should be better decoding accuracy from sensors closer to the source. 
Instead, we found that decoding accuracy of stimulus location from the frontal electrodes (peak d' = 0.08) was less than half that of decoding accuracy from the more posterior electrodes (peak d' = 0.18). These results suggest that the source of neural activity containing information about stimulus position was located over occipito-parietal areas, consistent with our topographical analyses (inset of Figure 3).” 

      The univariate ERP analyses use an outdated contrast, AV <> A + V, to capture multisensory integration. A number of authors have pointed out the potential problem of double baseline subtraction when using this contrast, and have recommended a number of solutions, experimental and analytical. See for example: [1] and [2]. 

      (1) Teder-Salejarvi, W. A., McDonald, J. J., Di Russo, F., & Hillyard, S. A. (2002). Cognitive Brain Research, 14, 106-114. 

      (2) Talsma, D., & Woldorff, M. G. (2005). Journal of cognitive neuroscience, 17(7), 1098-1114.

      We thank the reviewer for raising this point. Comparing ERPs across different sensory conditions requires careful analytic choices to discern genuine sensory interactions within the signal. The AV <> (A + V) contrast has often been used to detect multisensory integration, though any non-signal related activity (i.e. anticipatory waves; Talsma & Woldorff, 2005) or pre-processing manipulation (e.g. baseline subtraction; Teder-Sälejärvi et al., 2002) will be doubled in (A + V) but not in AV. Critically, we did not apply a baseline correction during preprocessing and thus our results are not at risk of double-baseline subtraction in (A + V). Additionally, we temporally jittered the presentation of our stimuli to mitigate the potential influence of consistent overlapping ERP waves (Talsma & Woldorff, 2005). 

      The results section should provide the neurometric curve/s used to extract the slopes of the sensitivity plot (Figure 2B). 

      We thank the reviewer for raising this point of clarification. The sensitivity plots for Figures 2B and 2C were extracted from the behavioural performance of the behavioural and EEG tasks, respectively. The sensitivity plot for Figure 2B was extracted from individual psychometric curves, whereas the d’ values for Figure 2C were calculated from the behavioural data for the EEG task. This information has been clarified in the manuscript.

      “Figure 1. Behavioural performance is improved for audiovisual stimuli. A) Average accuracy of responses across participants in the behavioural session at each stimulus location for each stimulus condition, fitted to a psychometric curve. Steeper curves indicate greater sensitivity in identifying stimulus location. B) Average sensitivity across participants in the behavioural task, estimated from psychometric curves, for each stimulus condition. The red cross indicates estimated performance assuming optimal (MLE) integration of unisensory cues. C) Average behavioural sensitivity across participants in the EEG session for each stimulus condition. Error bars indicate ±1 SEM.”

      The encoding model was fitted for each electrode individually; I wonder if important information contained as combinations of (individually non-significant) electrodes was then lost in this process and if the authors consider that this is relevant. 

      Although the encoding model was fitted for each electrode individually for the topographic maps (Figure 4B), in all other analyses the encoding model was fitted across a selection of electrodes (see final inset of Figure 3). As this electrode set was used for all other neural analyses, our model would allow for the detection of important information contained in the neural patterns across electrodes. This information has been clarified in the manuscript.

      “Thus, for all subsequent analyses we only included signals from the central-temporal, parietal-occipital, occipital and inion sensors for computing the inverse model (see final inset of Figure 2). As the model was fitted for multiple electrodes, subtle patterns of neural information contained within combinations of sensors could be detected.”

      Neurobehavioral correlations could benefit from outlier rejection and the use of robust correlation statistics. 

      We thank the reviewer for raising this issue. Note, however, that the correlations we report are resistant to the influence of outliers because we used Spearman’s rho [1] (as opposed to Pearson’s). This information has been communicated in the manuscript.

      (1) Wilcox, R.R. (2016), Comparing dependent robust correlations. British Journal of Mathematical & Statistical Psychology, 69(3), 215-224. https://doi.org/10.1111/bmsp.12069

      “Neurobehavioural correlations. As behavioural and neural data violated assumptions of normality, we calculated rank-order correlations (Spearman’s rho) between the average decoding sensitivity for each participant from 150-250 ms poststimulus onset and behavioural performance on the EEG task. As Spearman’s rho is resistant to outliers (Wilcox, 2016), we did not perform outlier rejection.”

      “Wilcox, R.R. (2016), Comparing dependent robust correlations. British Journal of Mathematical & Statistical Psychology, 69(3), 215-224. https://doi.org/10.1111/bmsp.12069”
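
      The rank-based reasoning above can be illustrated with a minimal sketch. This is not the authors' analysis code: the helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the data values are hypothetical, chosen only to show that Spearman's rho depends on the ordering of the values rather than their magnitudes, so a single extreme score changes it far less than it changes Pearson's r.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical per-participant values; the last behavioural score is an outlier.
neural = [0.10, 0.12, 0.15, 0.18, 0.20, 0.22]
behaviour = [0.50, 0.55, 0.60, 0.70, 0.75, 5.00]

rho = spearman_rho(neural, behaviour)  # depends only on the ordering
r = pearson_r(neural, behaviour)       # pulled around by the extreme value
```

      In this toy example the relationship is perfectly monotonic, so rho stays at its maximum while r falls well below it because of the one extreme score.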

      Many details that are important for the reader to evaluate the evidence and to understand the methods and analyses aren't given; this is a non-exhaustive list:  

      We thank the reviewer for highlighting these missing details. We have updated the manuscript where necessary to ensure the methods and analyses are fully detailed and replicable.

      - specific parameters of the stimuli and performance levels. Just saying "similarly difficult" or "marginally higher volume" is not enough to understand exactly what was done.  

      “The perceived source location of auditory stimuli was manipulated via changes to interaural level and timing (Whitworth & Jeffress, 1961; Wightman & Kistler, 1992). The precise timing of when each speaker delivered an auditory stimulus was calculated from the following formula:

      where x and z are the horizontal and forward distances in metres between the ears and the source of the sound on the display, respectively, r is the head radius, and s is the speed of sound. We used a constant approximate head radius of 8 cm for all participants. r was added to x for the left speaker and subtracted for the right speaker to produce the interaural time difference. For ±15° source locations, interaural timing difference was 1.7 ms. To simulate the decrease in sound intensity as a function of distance, we calculated interaural level differences for the left and right speakers by dividing the sounds by the left and right distance vectors. Finally, we resampled the sound using linear interpolation based on the calculations of the interaural level and timing differences. This process was used to calculate the soundwaves played by the left and right speakers for each of the possible stimulus locations on the display. The maximum interaural level difference between speakers was 0.14 A for ±15° auditory locations, and 0.07 A for ±7.5°.”

      - were stimulus parameters adjusted individually or as a group? Which method was followed?  

      To clarify, stimulus parameters (frequency, size, luminance, volume, location, etc.) were manipulated throughout pilot testing only. Parameters were adjusted to achieve similar pilot behavioural results between the auditory and visual conditions. For the experiment proper, parameters remained constant for both tasks and were the same for all participants.

      “During pilot testing, stimulus features (size, luminance, volume, frequency etc.) were manipulated to make visual and auditory stimuli similarly difficult to spatially localize. These values were held constant in the main experiment.”

      - specify which response buttons were used.

      “Participants were presented with two consecutive stimuli and tasked with indicating, via button press, whether the first (‘1’ number-pad key) or second (‘2’ number-pad key) interval contained the more leftward stimulus.”

      “At the end of each sequence, participants were tasked with indicating, via button press, whether more presentations appeared on the right (‘right’ arrow key) or the left (‘left’ arrow key) of the display.”

      - no information is given as to how many trials per condition remained on average, for analysis.  

      The average number of remaining trials per condition after eye-movement analysis is now included in the Methods section of the revised manuscript.

      “We removed trials with substantial eye movements (>3.75° away from fixation) from the analyses. After the removal of eye movements, on average 2365 (SD = 56.94), 2346 (SD = 152.87) and 2350 (SD = 132.47) trials remained for auditory, visual and audiovisual conditions, respectively, from the original 2400 per condition.”

      - no information is given on the specifics of participant exclusion criteria. (even if the attrition rate was surprisingly high, for such an easy task).  

      The behavioural session also served as a screening task. Although the task instructions were straightforward, perceptual discrimination was not easy due to the ambiguity of the stimuli. Auditory localization is not very precise, and the visual stimuli were brief, dim, and diffuse. The behavioural results reflect the difficulty of the task. The attrition rate was high because participants who scored below 60% correct in any condition were deemed unable to accurately perform the task, were not invited to complete the subsequent EEG session, and were omitted from the analyses. We have included the specific criteria in the manuscript.

      “Participants were first required to complete a behavioural session with above 60% accuracy in all conditions to qualify for the EEG session (see Behavioural session for details).”

      - EEG pre-processing: what filter was used? How was artifact rejection done? (no parameters are reported); How were bad channels interpolated?  

      We used a 0.25 Hz high-pass filter to remove baseline drifts, but no low-pass filter. In line with recent studies on the undesirable influence of EEG preprocessing on ERPs [1], we opted to avoid channel interpolation and artifact rejection. This was erroneously reported in the manuscript and has now been clarified. For the sake of clarity, here we demonstrate that a reanalysis of data using channel interpolation and artifact rejection returned the same pattern of results. 

      (1) Delorme, A. (2023). EEG is better left alone. Scientific Reports, 13, 2372. https://doi.org/10.1038/s41598-023-27528-0

      - specific electrode locations must be given or shown in a plot (just "primarily represented in posterior electrodes" is not sufficiently informative).  

      A diagram of the electrodes used in all analyses is included within Figure 3, and we have drawn readers’ attention to this in the revised manuscript.

      “Thus, for all subsequent analyses we only included signals from the central-temporal, parietal-occipital, occipital and inion sensors for computing the inverse model (see final inset of Figure 2).” 

      - ERP analysis: which channels were used? What is the specific cluster correction method?

      We used a conservative mass-based cluster correction from Pernet et al. (2015) - this information has been clarified in the manuscript.

      “A conservative mass-based cluster correction was applied to account for spurious differences across time (Pernet et al., 2015).” 

      “Pernet, C. R., Latinus, M., Nichols, T. E., & Rousselet, G. A. (2015). Cluster-based computational methods for mass univariate analyses of event-related brain potentials/fields: A simulation study. Journal of Neuroscience Methods, 250, 85-93. https://doi.org/10.1016/j.jneumeth.2014.08.003” 
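
      As a rough illustration of what a mass-based cluster statistic involves, the sketch below sums a test statistic over runs of consecutive supra-threshold time points and compares the observed mass with a null distribution of maximum masses. It is a simplified sketch under stated assumptions, not the procedure of Pernet et al. (2015) or the authors' code: the function names, the threshold, and the toy numbers are hypothetical, positive and negative effects are not separated, and the null masses are assumed to come from a permutation step that is not shown.

```python
def cluster_masses(stats, threshold):
    """Find runs of consecutive time points with |stat| > threshold.

    Returns a list of (start, end, mass) tuples, where end is exclusive and
    mass is the sum of |stat| within the run. Simplification: the sign of
    the statistic is ignored when forming clusters.
    """
    clusters = []
    start = None
    mass = 0.0
    for i, value in enumerate(stats):
        if abs(value) > threshold:
            if start is None:
                start, mass = i, 0.0  # open a new cluster
            mass += abs(value)
        elif start is not None:
            clusters.append((start, i, mass))  # close the current cluster
            start = None
    if start is not None:
        clusters.append((start, len(stats), mass))
    return clusters


def corrected_p(observed_mass, null_max_masses):
    """Proportion of null maximum cluster masses at least as large as observed."""
    exceed = sum(1 for m in null_max_masses if m >= observed_mass)
    return exceed / len(null_max_masses)


# Hypothetical statistic per time point and hypothetical null distribution.
t_values = [0.5, 2.5, 3.0, 1.0, 2.2, 0.1]
observed = cluster_masses(t_values, threshold=2.0)
p_first = corrected_p(observed[0][2], null_max_masses=[1.0, 6.0, 2.0, 3.0])
```

      Because significance is judged on whole clusters, a short or weak run of supra-threshold time points can fail correction even when individual time points differ, which is the kind of power loss described in the response above.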

      - results: descriptive stats on performance must be given (instead of saying "participants performed well").  

      The mean and standard deviation of participants’ performance for each condition in the behavioural and EEG experiments are now explicitly mentioned in the manuscript.

      “A quantification of the behavioural sensitivity (i.e., steepness of the curves) revealed significantly higher sensitivity for the audiovisual stimuli (M = .04, SD = .02) than for the auditory stimuli alone (M = .03, SD = .01; Z = -3.09, p = .002), and than for the visual stimuli alone (M = .02, SD = .01; Z = -5.28, p = 1.288e-7; Figure 1B). Sensitivity for auditory stimuli was also significantly higher than sensitivity for visual stimuli (Z = 2.02, p = .044).” 

      “We found a similar pattern of results to those in the behavioural session; sensitivity for audiovisual stimuli (M = .85, SD = .33) was significantly higher than for auditory (M = .69, SD = .41; Z = -2.27, p = .023) and visual stimuli alone (M = .61, SD = .29; Z = -3.52, p = 4.345e-4), but not significantly different from the MLE prediction (Z = -1.07, p = .285).” 

      - sensitivity in the behavioural and EEG sessions is said to be different, but no comparison is given. It is not even the same stimulus set across the two tasks...  

      This relationship was noted as a potential explanation for the higher sensitivities obtained in the EEG task, and was not intended to stand up to statistical scrutiny. We agree it makes little sense to compare statistically between the EEG and behavioural results as they were obtained from different tasks. We would like to clarify, however, that the stimuli used in the two tasks were the same, with the exception that in the EEG task the stimuli were presented from 5 locations versus 8 in the behavioural task. To avoid potential confusion, we have removed the offending sentence from the manuscript:

      Reviewer 2:

      Their measure of neural responses is derived from the decoder responses, and this takes account of the reliability of the sensory representations - the d' statistics - which is an excellent thing. It also means if I understand their analysis correctly (it could bear clarifying - see below), that they can generate from it a prediction of the performance expected if an optimal decision is made combining the neural signals from the individual modalities. I believe this is the familiar root sum of squares d' calculation (or very similar). Their decoding of the audiovisual responses comfortably exceeds this prediction and forms part of the evidence for their claims. 

      Yet, superadditivity - including that in evidence in the principle of inverse effectiveness - more typically quantifies the excess over the sum of proportions correct in each modality. Their MLE d' statistic can already predict this form of superadditivity. Therefore, the superadditivity they report here is not the same form of superadditivity that is usually referred to in behavioural studies. It is in fact a stiffer definition. What their analysis tests is that decoding performance exceeds what would be expected from an optimally weighted linear integration of the unisensory information. As this is not the common definition, it is difficult to relate to the behavioural superadditivity (of percentage correct) reported in much of the literature. This distinction is not at all clear from the manuscript. 

      But the real puzzle is here: The behavioural data for this task do not exceed the optimal statistical decision predicted by signal detection theory (the MLE d'). Yet, the EEG data would suggest that the neural processing is exceeding it. So why, if the neural processing is there to yield better performance, is it not reflected in the behaviour? I cannot explain this, but it strikes me that the behaviour and neural signals are for some reason not reflecting the same processing. 

      Be explicit and discuss this mismatch they observe between behaviour and neural responses. 

      Thank you, we agree that it is worth expanding on the observed disconnect between MSI in behaviour and neural signals. We have included an additional paragraph in the Discussion of the revised manuscript. Despite the mismatch, we believe the behavioural and neural responses still reflect the same underlying processing, but at different levels of sensitivity. The behavioural result likely reflects a coarse down-sampling of the precision in location representation, and is thus less likely to reflect subtle MSI enhancements.

      “An interesting aspect of our results is the apparent mismatch between the behavioural and neural responses. While the behavioural results meet the optimal statistical threshold predicted by MLE, the decoding analyses suggest that the neural response exceeds it. Though non-linear neural responses and statistically optimal behavioural responses are reliable phenomena in multisensory integration (Alais & Burr, 2004; Ernst & Banks, 2002; Stanford & Stein, 2007), the question remains – if neural super-additivity exists to improve behavioural performance, why is it not reflected in behavioural responses? A possible explanation for this neurobehavioural discrepancy is the large difference in timing between sensory processing and behavioural responses. A motor response would typically occur some time after the neural response to a sensory stimulus (e.g., 70-200 ms), with subsequent neural processes between perception and action that introduce noise (Heekeren et al., 2008) and may obscure super-additive perceptual sensitivity. In the current experiment, participants reported either the distribution of 20 serially presented stimuli (EEG session) or compared the positions of two stimuli (behavioural session), whereas the decoder attempts to recover the location of every presented stimulus. While stimulus location could be represented with higher fidelity in multisensory relative to unisensory conditions, this would not necessarily result in better performance on a binary behavioural task in which multiple temporally separated stimuli are compared. One must also consider the inherent differences in how super-additivity is measured at the neural and behavioural levels. Neural super-additivity should manifest in responses to each individual stimulus. In contrast, behavioural super-additivity is often reported as proportion correct, which can only emerge between conditions after being averaged across multiple trials. The former is a biological phenomenon, while the latter is an analytical construct. In our experiment, we recorded neural responses for every presentation of a stimulus, but behavioural responses were only obtained after multiple stimulus presentations. Thus, the failure to find super-additivity in behavioural responses might be due to their operationalisation, with between-condition comparisons lacking sufficient sensitivity to detect super-additive sensory improvements. Future work should focus on experimental designs that can reveal super-additive responses in behaviour.”

      Re-work the introduction to explain more clearly the relationship between the behavioural superadditivities they review, the MLE model, and the superadditivity it actually tests. 

      We agree it is worth discussing how super-additivity is operationalised across neural and behavioural measures. However, we do not believe the behavioural studies we reviewed claimed super-additive behavioural enhancements. While MLE is often used as a behavioural marker of successful integration, it is not necessarily used as evidence for super-additivity within the behavioural response, as it relies on linear operations. 

      “It is important to consider the differences in how super-additivity is classified between neural and behavioural measures. At the level of single neurons, superadditivity is defined as a non-linear response enhancement, with the multisensory response exceeding the sum of the unisensory responses. In behaviour, meanwhile, it has been observed that the performance improvement from combining two senses is close to what is expected from optimal integration of information across the senses (Alais & Burr, 2004; Stanford & Stein, 2007). Critically, behavioural enhancement of this kind does not require non-linearity in the neural response, but can arise from a reliability-weighted average of sensory information. In short, behavioural performance that conforms to MLE is not necessarily indicative of neural super-additivity, and the MLE model can be considered a linear baseline for multisensory integration.”
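      The reliability-weighted average mentioned in the passage above can be illustrated with a minimal sketch. It assumes the standard formulation in which each cue is weighted by its reliability (the inverse of its variance); the function name and the numerical values are hypothetical and are not taken from the manuscript.

```python
def mle_combine(est_a, var_a, est_v, var_v):
    """Combine two cue estimates by a reliability-weighted average.

    Reliability is taken to be 1 / variance. The combination is linear in
    the two estimates, which is why it can serve as a linear baseline.
    """
    rel_a = 1.0 / var_a
    rel_v = 1.0 / var_v
    w_a = rel_a / (rel_a + rel_v)       # weight given to the auditory cue
    w_v = 1.0 - w_a                     # weight given to the visual cue
    combined_estimate = w_a * est_a + w_v * est_v
    combined_variance = 1.0 / (rel_a + rel_v)
    return combined_estimate, combined_variance


# Hypothetical example: auditory estimate of -2.0 deg (variance 4.0)
# and visual estimate of +1.0 deg (variance 1.0).
estimate, variance = mle_combine(-2.0, 4.0, 1.0, 1.0)
```

      In this sketch the combined variance is never larger than the smaller of the two unisensory variances, so behavioural precision can improve without any non-linear operation on the inputs.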

      Regarding the auditory stimulus, this reviewer notes that interaural time differences are unlikely to survive free field presentation.

      Despite the free field presentation, in both the pilot test and the study proper participants were able to localize auditory stimuli significantly above chance. 

      "However, other studies have found super-additive enhancements to the amplitude of sensory event-related potentials (ERPs) for audiovisual stimuli (Molholm et al., 2002; Talsma et al., 2007), especially when considering the influence of stimulus intensity (Senkowski et al., 2011)." - this makes it obvious that there are some studies which show superadditivity. It would have been good to provide a little more depth here - as to what distinguished those studies that reported positive effects from those that did not.

      We have provided further detail on how super-additivity appears to manifest in neural measures.

      “In EEG, meanwhile, the evoked response to an audiovisual stimulus typically conforms to a sub-additive principle (Cappe et al., 2010; Fort et al., 2002; Giard & Peronnet, 1999; Murray et al., 2016; Puce et al., 2007; Stekelenburg & Vroomen, 2007; Teder-Sälejärvi et al., 2002; Vroomen & Stekelenburg, 2010). However, when the principle of inverse effectiveness is considered and relatively weak stimuli are presented together, there has been some evidence for super-additive responses (Senkowski et al., 2011).”

      “While behavioural outcomes for multisensory stimuli can be predicted by MLE, and single neuron responses follow the principles of inverse effectiveness and super-additivity, among others (Rideaux et al., 2021), how audiovisual super-additivity manifests within populations of neurons is comparatively unclear given the mixed findings from relevant fMRI and EEG studies. This uncertainty may be due to biophysical limitations of human neuroimaging techniques, but it may also be related to the analytic approaches used to study these recordings. For instance, superadditive responses to audiovisual stimuli in EEG studies are often reported from very small electrode clusters (Molholm et al., 2002; Senkowski et al., 2011; Talsma et al., 2007), suggesting that neural super-additivity in humans may be highly specific. However, information encoded by the brain can be represented as increased activity in some areas, accompanied by decreased activity in others, so simplifying complex neural responses to the average rise and fall of activity in specific sensors may obscure relevant multivariate patterns of activity evoked by a stimulus.”

      P9. "(25-75 W, 6 Ω)." This is not important, but it is a strange way to cite the power handling of a loudspeaker. 

      “The loudspeakers had a power handling capacity of 25-75 W and a nominal impedance of 6 Ω.” 

      I am struggling to understand the auditory stimulus: 

      "Auditory stimuli were 100 ms clicks". Is this a 100-ms long train of clicks? A single pulse which is 100ms long would not sound like a click, but two clicks once filtered by the loudspeaker. Perhaps they mean 100us. 

      "..with a flat 850 Hz tone embedded within a decay envelope". Does this mean the tone is gated - i.e. turns on and off slowly? Or is it constant?

      We thank the reviewer for catching this. ‘Click’ may not be the most apt way of defining the auditory stimulus. It was a 100 ms square wave tone with decay, i.e., with an onset at maximal volume before fading gradually. Given that the length of the stimulus was 100 ms, the decay occurs quickly and provides a more ‘click-like’ percept than a pure tone. We have provided a representation of the sound below for further clarification. This represents the amplitude from the L and R speakers for maximally-left and maximally-right stimuli. We have added this clarification in the revised manuscript. 

      Author response image 1.

      “Auditory stimuli were 100 ms, 850 Hz tones with a decay function (sample rate = 44,100 Hz; volume = 60 dBA SPL, as measured at the ears).”

      P10. "Stimulus modality was either auditory, visual, or audiovisual. Trials were blocked with short (~2 min) breaks between conditions".

      Presumably the blocks were randomised across participants.

      Condition order was not randomised across participants, but counterbalanced. This has been clarified in the manuscript.

      “Stimulus modality was auditory, visual or audiovisual, presented in separate blocks with short breaks (~2 min) between conditions (see Figure 6A for an example trial). The order of conditions was counterbalanced across participants.” 

      P15. Feels like there is a step not described here: "The d' of the auditory and visual conditions can be used to estimate the predicted 'optimal' sensitivity of audiovisual signals as calculated through MLE." Do they mean sqrt[ (d'A)^2 + (d'V)^2] ? If it is so simple then it may as well be made explicit here. A quick calculation from eyeballing Figures 2B and 2C suggests this is the case.

      We thank the reviewer for raising this point of clarification. Yes, the ‘optimal’ audiovisual sensitivity was calculated as the hypotenuse of the auditory and visual sensitivities. This calculation has been made explicit in the revised manuscript.

      The d’ from the auditory and visual conditions can be used to estimate the predicted ‘optimal’ sensitivity to audiovisual signals as calculated through the following formula:

      $$d'_{AV} = \sqrt{(d'_A)^2 + (d'_V)^2}$$
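      The hypotenuse rule confirmed in the reply above can be sketched numerically as follows. The function names and the example d' values are hypothetical; only the rule itself (the square root of the summed squared unisensory sensitivities) comes from the exchange with the reviewer.

```python
import math


def predicted_av_dprime(d_auditory, d_visual):
    """MLE-predicted audiovisual d' as the hypotenuse of the unisensory d' values."""
    return math.hypot(d_auditory, d_visual)


def exceeds_mle_prediction(d_audiovisual, d_auditory, d_visual):
    """True when the observed audiovisual d' is above the MLE prediction."""
    return d_audiovisual > predicted_av_dprime(d_auditory, d_visual)


# Hypothetical sensitivities, chosen only to make the arithmetic easy to follow.
prediction = predicted_av_dprime(1.2, 0.9)
```

      An observed audiovisual d' at or near the prediction is consistent with optimal linear integration, whereas a value reliably above it is what the responses refer to as super-additive.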

      "The perceived source location of auditory stimuli was manipulated via changes to interaural intensity and timing (Whitworth & Jeffress, 1961; Wightman & Kistler, 1992)." The stimuli were delivered by a pair of loudspeakers, and the incident sound at each ear would be a product of both speakers. And - if there were a time delay between the two speakers, then both ears could potentially receive separate pulses one after the other at different delays. Did they record this audio stimulus with manikin? If not, it would be very difficult to know what it was at the ears. I don't doubt that if they altered the relative volume of the loudspeakers then some directionality would be perceived but I cannot see how the interaural level and timing differences could be matched - as if the sound were from a single source. I doubt that this invalidates their results, but to present this as if it provided matched spatial and timing cues is wrong, and I cannot work out how they can attribute an azimuthal location to the sound. For replication purposes, it would be useful to know how far apart the loudspeakers were and what the timing and level differences actually were.

      The behavioural tasks each had evenly distributed ‘source locations’ on the horizontal azimuth of the computer display (8 for the behavioural session, 5 for the EEG session). We manipulated the perceived location of auditory stimuli through interaural time delays and interaural level differences. By first measuring the forward (z) and horizontal (x) distance of each source location to each ear, the method worked by calculating what the time-course of a sound wave should be at the location of the ear given the sound wave at the source. Then, for each source location, we can calculate the time delay between speakers given the vectors of x and z, the speed of sound and the width of the head.  As the intensity of sound drops inversely with the square of the distance, we can divide the sound wave by the distance for each source location to provide the interaural level difference. Though we did not record the auditory stimulus with a manikin, our behavioural analyses show that participants were able to detect the directions of auditory stimuli from our manipulations, even to a degree that significantly exceeded the localisation accuracy for visual stimuli (for the behavioural session task). This information has been clarified in the manuscript.

      “Auditory stimuli were played through two loudspeakers placed either side of the display (80 cm apart for the behavioural session, 58 cm apart for the EEG session).” 

      “The perceived source location of auditory stimuli was manipulated via changes to interaural level and timing (Whitworth & Jeffress, 1961; Wightman & Kistler, 1992). The precise timing of when each speaker delivered an auditory stimulus was calculated from the following formula:

      where x and z are the horizontal and forward distances in metres between the ears and the source of the sound on the display, respectively, r is the head radius, and s is the speed of sound. We used a constant approximate head radius of 8 cm for all participants. r was added to x for the left speaker and subtracted for the right speaker to produce the interaural time difference. For ±15° source locations, the interaural timing difference was 1.7 ms. To simulate the decrease in sound intensity as a function of distance, we calculated interaural level differences for the left and right speakers by dividing the sounds by the left and right distance vectors. Finally, we resampled the sound using linear interpolation based on the calculations of the interaural level and timing differences. This process was used to calculate the soundwaves played by the left and right speakers for each of the possible stimulus locations on the display. The maximum interaural level difference between speakers was 0.14 A for ±15° auditory locations, and 0.07 A for ±7.5°.”
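      The timing formula itself is not reproduced in this response, so the following is only a sketch of one reading that is consistent with the variable definitions above: the travel time is the straight-line distance divided by the speed of sound, with r added to x for the left channel and subtracted for the right. That expression, the speed of sound of 343 m/s, the sign convention (x positive to the right) and all function names are assumptions. The numbers it yields are illustrative and not calibrated against the values reported above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not stated in the text
HEAD_RADIUS = 0.08      # m; the constant head radius given in the text


def path(x, z, ear_offset, s=SPEED_OF_SOUND):
    """Return (distance in m, travel time in s) from a source to one ear.

    x is the horizontal and z the forward distance of the source;
    ear_offset is +r for the left channel and -r for the right channel.
    """
    distance = math.hypot(x + ear_offset, z)
    return distance, distance / s


def channel_parameters(x, z, r=HEAD_RADIUS):
    """Delay and 1/distance amplitude scaling for the left and right channels."""
    d_left, t_left = path(x, z, +r)
    d_right, t_right = path(x, z, -r)
    return {
        "delay_left": t_left,
        "delay_right": t_right,
        "gain_left": 1.0 / d_left,    # amplitude divided by distance
        "gain_right": 1.0 / d_right,
        "timing_difference": t_left - t_right,
    }


# Hypothetical geometry: source 0.15 m to the right, 0.60 m in front.
right_source = channel_parameters(0.15, 0.60)
centre_source = channel_parameters(0.0, 0.60)
```

      Under these assumptions a central source produces identical left and right channels, while a source on the right reaches the right channel earlier and at a higher amplitude.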

      I am confused about this statement: "A quantification of the behavioural sensitivity (i.e., steepness of the curves) revealed significantly greater sensitivity for the audiovisual stimuli than for the auditory stimuli alone (Z = -3.09, p = .002)," It is not clear from the methods how they attributed sound source angle to the sounds. Conceivably they know the angle of the loudspeakers, and this would provide an outer bound on the perceived location of the sound for extreme interaural level differences (although free field interaural timing cues can create a wider sound field). 

      Our analysis of behavioural sensitivity was dependent on the set ‘source locations’ that were used to calculate the position of auditory and audiovisual stimuli.  In the behavioural task, participants judged the position of the target stimulus relative to a central stimulus. Thus, for each source location, we recorded how often participants correctly discriminated between presentations. The quoted analysis acknowledges that participants were more sensitive to audiovisual stimuli than auditory stimuli in the context of this task. A full explanation of how source location was implemented for auditory stimuli has been clarified in the manuscript. 

      It would be very nice to see some of the "channel" activity - to get a feel for the representation used by the decoder. 

      We have included responses for the five channels as a Supplemental Figure.

      Figure 6 appears to show that there is some agreement between behaviour and neural responses - for the audiovisual case alone. The positive correlation of behavioural and decoding sensitivity appears to be driven by one outlier - who could not perform the audiovisual task (and indeed presumably any of them). Furthermore, if we were to simply Bonferroni-correct for the three comparisons, this would become non-significant. It is also puzzling why the unisensory behaviour and EEG do not correlate - which seems to again suggest a poor correspondence between them, opposite to the claim made.

      We understand the reviewer’s concern here. We would like to note, however, that each correlation used unique data sets – that is, the behavioural and neural data for each separate condition. In this case, we believe a Bonferroni correction for multiple comparisons is too conservative, as no data set was compared more than once. Neither the behavioural nor the neural data were normally distributed, and both contained outliers. Rather than reduce power through outlier rejection, we opted to test correlations using Spearman’s rho, which is resistant to outliers (1). It is also worth noting that, without outlier rejection, the audiovisual correlation (p = .003) would survive a Bonferroni correction for 3 comparisons. The nonsignificant correlation in the auditory and visual conditions might be due to the weaker responses elicited by unisensory stimuli, with the reduced signal-to-noise ratio obscuring potential correlations. Audiovisual stimuli elicited more precise responses both behaviourally and neurally, increasing the power to detect a correlation. 

      (1) Wilcox, R.R. (2016), Comparing dependent robust correlations. British Journal of Mathematical & Statistical Psychology, 69(3), 215-224. https://doi.org/10.1111/bmsp.12069
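      The two statistical points in this reply, the Bonferroni threshold for three comparisons and the rank-based nature of Spearman’s rho, can be made concrete with a small sketch. It uses only the Python standard library; the helper names and the toy data are hypothetical, and only the p value of .003 and the three comparisons come from the text.

```python
def survives_bonferroni(p_value, n_comparisons, alpha=0.05):
    """True when p is below the Bonferroni-adjusted threshold alpha / n."""
    return p_value < alpha / n_comparisons


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# The reported audiovisual p value against a threshold of .05 / 3.
audiovisual_survives = survives_bonferroni(0.003, 3)

# Toy data with one extreme value: the ranks ignore its magnitude.
rho_with_outlier = spearman_rho([1, 2, 3, 4, 100], [2, 4, 6, 8, 10])
```

      Because the coefficient is computed on ranks, an extreme score contributes only its ordinal position, which is the sense in which the measure is described as resistant to outliers.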

      “We also found a significant positive correlation between participants’ behavioural judgements in the EEG session and decoding sensitivity for audiovisual stimuli. This result suggests that participants who were better at identifying stimulus location also had more reliably distinct patterns of neural activity. The lack of neurobehavioural correlation in the unisensory conditions might suggest a poor correspondence between the different tasks, perhaps indicative of the differences between behavioural and neural measures explained previously. However, multisensory stimuli have consistently been found to elicit stronger neural responses than unisensory stimuli (Meredith & Stein, 1983; Puce et al., 2007; Senkowski et al., 2011; Vroomen & Stekelenburg, 2010), which has been associated with behavioural performance (Frens & Van Opstal, 1998; Wang et al., 2008). Thus, the weaker signal-to-noise ratio in unisensory conditions may prevent correlations from being detected.”

      Further changes:

      (1)   To improve clarity, we shifted the Methods section to after the Discussion. This change included updating the figure numbers to match the new order (Figure 1 becomes Figure 6, Figure 2 becomes Figure 1, and so on).

      (2)   We also resolved an error on Figure 2 (previously Figure 3). The final graph (Difference between AV and A + V) displayed incorrect values on the Y axis.

      This has now been remedied.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to elucidate the diversity and gene expression patterns of marine plankton using innovative collection and sequencing methodologies. Their work investigates the taxonomic and functional profiles of planktonic communities, providing insights into their ecological roles and responses to environmental changes.

      Strengths:

      The methodology utilized in this study, particularly the combination of single-cell sequencing and advanced bioinformatics techniques, represents a significant advancement in the field of plankton research. The application of the Smart-seq2 protocol for cDNA synthesis, followed by rigorous quality control measures, ensures high-quality data generation. This comprehensive approach not only enhances the resolution of the obtained genetic information but also allows for a more detailed exploration of the diversity and functional potential of the phytoplankton community.

      One of the major strengths of this study is the rigorous methodological approach, including precise sampling techniques and robust data analysis protocols, which enhance the reliability of the results. The use of advanced sequencing technologies allows for a comprehensive assessment of gene expression, significantly contributing to our understanding of plankton diversity and its implications for marine ecosystems.

      Weaknesses:

      While the evidence presented is solid, there are areas where the analysis could be expanded. The authors could further explore the ecological interactions within plankton communities, which would provide a more holistic view of their functional roles. Additionally, a broader discussion of the implications of their findings for marine conservation efforts could enhance the manuscript's impact.

      The choice of both the plankton net and filter pore size during the plankton collection process is critical, as these factors directly impact the types of phytoplankton collected. The use of a 25 μm filter paper, in particular, may result in the omission of many eukaryotic phytoplankton species. This limitation, combined with the characteristics of the plankton net, could affect the comprehensiveness and accuracy of the results, potentially influencing the study's conclusions regarding phytoplankton diversity.

      The timing of fixation is crucial, as it directly affects whether the measured transcriptome accurately represents the organisms' actual transcriptional state in their native water environment. If fixation occurred a significant time after sample collection, the transcriptomic data may not reflect their true in situ transcriptional activity, which greatly reduces the relevance of this method.

      Thank you for your time, effort, and expertise.

      We agree that additional analyses could improve our understanding of the plankton communities sampled. We have conducted an array of alternative analyses that were not included in the current manuscript and plan to perform new analyses over the next few months as part of a deeper revision of the manuscript. We are especially interested in “providing a more holistic view of the functions” of individual plankton within the community.

      As for the protocol details, the pore size of the filter paper was chosen to focus on ~100 micron-sized organisms as a starting point: they are likely to contain more RNA than smaller organisms, making them well suited for an initial proof of concept of the methodology. That choice, however, is not tightly constrained, and smaller plankton could therefore be captured as well. This is supported by the lack of correlation, in our data, between organismal size and number of detected sequencing reads.

      Timing to cell death/fixation is a common question we receive, not just for this manuscript but for any RNA-Seq study of primary samples. In this case, plankton were seen swimming until picking, and after picking each organism was deposited within two seconds into a lysis buffer for fixation. Therefore, we do not have reason to believe that the transcriptional activity sampled in the sequencing reads differs in any major way from the one in living plankton. Nonetheless, a study specifically testing the effect of time between ocean sampling and reverse transcription would provide more quantitative information on this point.

      Reviewer #2 (Public review):

      Summary:

      The paper introduces Ukiyo-e-Seq, a novel method integrating microscopy with single-cell transcriptomics to study individual, uncultured eukaryotic plankton cells. By combining microscopic imaging with transcriptomic analysis, the approach links plankton morphology to gene expression, enabling taxonomic identification and functional protein exploration. Ukiyo-e-Seq was tested on 66 microbial eukaryotic cells, revealing taxonomic diversity across four superkingdoms and allowing analysis of protein complexes and developmental genes in individual species. According to the authors, this method has the potential to advance single-cell marine biodiversity studies by addressing limitations in traditional taxonomy and metatranscriptomics, especially for rare or uncultured organisms.

      However, the study's conclusions are often weakly supported by data, particularly given that this is not the first study to combine microscopy and single-cell transcriptomics of eukaryotic plankton using Smart-seq2.

      Strengths:

      A notable strength is the authors' generation of several single-cell transcriptomes for the diatom Chaetoceros, which could benefit from greater focus rather than broadly addressing eukaryotic single cells.

      Weaknesses:

      The study lacks comparison with other single-cell transcriptomics studies and it was presented as the first study that combines imaging and single-cell transcriptomics (smart-seq2) of eukaryotic plankton while in fact it is not. The sampling methodology is not replicable as the authors used a tea strainer instead of standard plankton collection equipment to filter larger cells. Terminology throughout the paper is unconventional, such as "public and private contigs," "single-organism genomics," "highly expressed contigs," and "optical methods." Additionally, the authors did not specify which database was used for taxonomic assignments. These issues may stem from the authors' limited background in microbial ecology. Overall, the study has many drawbacks and it could benefit from complete rewriting and focusing mainly on single-cell transcriptomics of diatoms.

      Thank you for your time, effort, and expertise.

      There might be a bit of confusion between single-cell and single-organism sequencing, likely due to lack of clarity in our initial submission. In particular, in this manuscript no effort was spent trying to dissociate oligocellular plankton into individual cells before sequencing. While probably feasible, we expect that to be technically much harder than single-organism sequencing as performed here. The reviewer does not reference a published paper where combined imaging and RNA-Seq of individual uncultured plankton has been achieved, and we were unable to find one in the scientific literature. As stated in the manuscript, others have already performed some work on cultured plankton and single-organism sequencing (without matching images) of uncultured environmental microorganisms.

      The suggestion to focus on a smaller biological niche such as diatoms and adopt language more familiar to that specific community is well received. Indeed, given that organisms as diverse as fish larvae and diatoms could be profiled with Ukiyo-e-Seq, future studies could use the same method to address specific questions with a deeper and more narrow scope. However, this manuscript is demonstrating the feasibility of Ukiyo-e-Seq and its ability to produce usable data for a broad spectrum of organisms: part of the scientific audience might not have a specific interest in diatoms.

      The tea strainer was used for coarse pre-filtering: the exact pore size, geometry and factory tolerance on those measurements are inconsequential because each organism is later chosen (or not) based on a high-resolution microscopy image (or multiple, if fluorescence is considered). This really is a strength of Ukiyo-e-Seq over FACS or droplet-based sorters, which can only collect coarse optical information from each organism for (typically) less than 1 millisecond. In Ukiyo-e-Seq, while the actual decision to pick an individual is currently manual (by the operator of the picker), it can be automated in principle. For instance, one could build a machine learning model of plankton taxonomy based on a large collection of labelled images and use predictions from such a model to automatically drive the picker (e.g. focussing on diatoms), increasing throughput. Even in that case, however, the initial filtering stages using tea strainers, plankton nets, filter paper etc. would not be critical for the final selection of individuals as long as they are not too restrictive.

      The database used for taxonomic assignment was the NCBI non-redundant nucleotide database, accessed through the reference library provided by Kraken2 (nt).

      Reviewer #3 (Public review):

      Gatt et al. present a novel take on single-cell RNA-sequencing from complex planktonic samples, introducing an approach they aptly named Ukiyo-e-Seq. This work combines environmental sampling with cell picking, microscopic imaging, and Smart-seq2 single-cell RNA sequencing to profile uncultured eukaryotic plankton. Developing single-cell approaches for such ecosystems is critical, given the poor representation of many planktonic species in cultures and reference databases. This work could help bridge existing technological gaps between morphological and molecular studies of aquatic microeukaryotes.

      The authors argue that microscopy does not provide information on the biochemistry of species under consideration. At best, it provides taxonomic labeling of species within a sample, yet imaging fails to assess their metabolic state or to disentangle cryptic species. In a standard metatranscriptomic setup, the sequence pool is described by aligning assembled contigs with reference databases to obtain functional and taxonomic information. This complex community-level data is impossible to parse at the single-organism level. Moreover, by relying on reference datasets, a lot of potential information can be missed. The aim of the approach is to combine the strengths of both methods, generating single-cell transcriptomic data linked to individual plankton images.

      Strengths:

      Ukiyo-e-Seq generated a valuable dataset by combining imaging and transcriptomics for individual planktonic organisms from environmental samples. This multimodal approach has the potential to improve taxonomic predictions and functional insights at the single-organism level. This manuscript demonstrates the technical feasibility of such an approach. Data of this type is rare and thus represents a valuable resource to further advance single-cell sequencing of planktonic species from environmental samples.

      Weaknesses:

      (1) The merge-split strategy, where single-cell reads are pooled prior to assembly, is counterintuitive. Pooling obscures the single-organism resolution that single-cell methods aim to achieve. The approach might be useful for assembling low-coverage contigs, but risks masking unique expression profiles for transcripts unique to a given well. As an alternative, the authors could assemble each well independently to obtain well-specific transcriptomic bins. Assemblies could then be clustered based on sequence similarity, thereby imposing strict clustering parameters to maintain resolution, to create a common reference for downstream analysis if needed. In my opinion, better results would be obtained by implementing a per-well assembly and read mapping.

      (2) The focus on the top five most expressed contigs throughout the manuscript's data analysis is a limiting choice, as it excludes most contigs. In the preprint, we are presented with a very narrow view of the data. Visualising the entire range of assembled contigs would provide a better picture of the transcriptomic composition and diversity per well. It would be interesting to assess if the full information could be used to preliminarily bin transcriptomic sequences from individual wells, for example, by gathering all 'private' contigs with high read coverage in a single well. Does such a set represent a single complete eukaryotic transcriptome?

      (3) I missed a verification with (broad-scale) taxonomic assessments based on the associated microscopic images. In their goals, the authors state that a joint approach has the potential to discover new taxonomic biodiversity. I agree, and to me, this is what is exciting about the preprint, yet I miss an example or the right bioinformatic implementation to drive home this claim. Are there organisms in wells where poor taxonomic annotations, based on alignment to a reference database or the LCA approach implemented in Kraken2, would usually result in ignoring the species in classic metatranscriptomics? Can you advance the taxonomic annotation by referring back to the organisms' picture? Can manual assessment of taxonomy advance the results from the LCA approach?

      (4) The current use of AlphaFold to predict protein structures does not convincingly add to the study's core objectives.

      Overall, Ukiyo-e-Seq presents a promising method for studying single-cell diversity in environmental samples, though the bioinformatic pipeline requires refinement to support some of the claims made by the authors. Additionally, the manuscript would benefit from clarity and additional details in its methods and a more consistent approach to presenting results and summary statistics across all assembled contigs and all sampled wells, rather than focusing on selected wells.

      Thank you for your time and effort, and for your expertise on the matter.

      The suggestions to conduct additional bioinformatic analyses to explore more fully the criticality and potential of various design choices (e.g. meta-assembly) are well received. We have tried some of those ideas already (e.g. assembling individual wells) and we have considered but not yet conducted or polished others (e.g. a more thorough taxonomic verification). We will endeavour to carry out as many of those analyses as possible during the deeper revision process in the coming months.

      The use of AlphaFold 3 was intended to demonstrate the ability to investigate protein-protein interactions from individual species. When two peptide sequences are detected within the same well, they are more likely to be potential interacting partners than in a metatranscriptomic study, because the compartmentalisation of reads into tens or hundreds of wells greatly reduces the search space of potential interaction partners (which has a baseline runtime complexity of n squared, where n is the number of peptide sequences identified).
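      The reduction in search space described above can be illustrated by counting candidate pairs. The sketch below assumes that every unordered pair of distinct peptides is one candidate interaction; the well and peptide counts are hypothetical and are not taken from the manuscript.

```python
from math import comb


def candidate_pairs(n_peptides):
    """Number of unordered peptide pairs, which grows roughly as n squared."""
    return comb(n_peptides, 2)


def pooled_vs_per_well(peptides_per_well):
    """Compare the pair count for a pooled sample with the per-well total."""
    pooled = candidate_pairs(sum(peptides_per_well))
    per_well = sum(candidate_pairs(n) for n in peptides_per_well)
    return pooled, per_well


# Hypothetical experiment: 50 wells, each yielding 200 peptide sequences.
pooled, per_well = pooled_vs_per_well([200] * 50)
```

      With these hypothetical numbers the pooled search covers 49,995,000 pairs and the per-well search 995,000, a reduction of roughly fifty-fold, which matches the number of wells.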

      ----------

    1. Author response:

      The following is the authors’ response to the original reviews.

      We performed multiple new experiments and analyses in response to the reviewers’ concerns, and incorporated the results of these analyses in the main text, and in multiple substantially revised or new figures. Before embarking on a point-by-point reply to the reviewers’ concerns, we here briefly summarize our most important revisions.

      First, we addressed a concern shared by Reviewers #1-3 about a lack of information about our DNA sequences. To this end, we redesigned multiple figures (Figures 3, 4, 5, S8, S9, S10, S11, and S12) to include the DNA sequences of each tested promoter, the specific mutations that occurred in it, the resulting changes in position-weight-matrix (PWM) scores, and the spacing between promoter motifs. Second, Reviewers #1 and #2 raised concerns about a lack of validation of our computational predictions and the resulting incompleteness of the manuscript. To address this issue, we engineered 27 reporter constructs harboring specific mutations, and experimentally validated our computational predictions with them. Third, we expanded our analysis to study how a more complete repertoire of other sigma 70 promoter motifs such as the UP-element and the extended -10 / TGn motif affects gene expression driven by the promoters we study. Fourth, we addressed concerns by Reviewer #3 about the role of the Histone-like nucleoid-structuring protein (H-NS) in promoter emergence and evolution. We did this by performing both experiments and computational analyses, which are now shown in the newly added Figure 5. Fifth, to satisfy Reviewer #3’s concerns about missing details in the Discussion, we have rewritten this section, adding additional details and references. 

      We next describe these and many other changes in a point-by-point reply to each reviewer’s comments. In addition, we append a detailed list of changes to each section and figure to the end of this document.

      Reviewer #1 (Public Review):

      Summary:

      This study by Fuqua et al. studies the emergence of sigma70 promoters in bacterial genomes. While there have been several studies to explore how mutations lead to promoter activity, this is the first to explore this phenomenon in a wide variety of backgrounds, which notably contain a diverse assortment of local sigma70 motifs in variable configurations. By exploring how mutations affect promoter activity in such diverse backgrounds, they are able to identify a variety of anecdotal examples of gain/loss of promoter activity and propose several mechanisms for how these mutations interact within the local motif landscape. Ultimately, they show how different sequences have different probabilities of gaining/losing promoter activity and may do so through a variety of mechanisms.

      We thank Reviewer #1 for taking the time to read and provide critical feedback on our manuscript. Their summary is fundamentally correct.

      Major strengths and weaknesses of the methods and results:

      This study uses Sort-Seq to characterize promoter activity, which has been adopted by multiple groups and shown to be robust. Furthermore, they use a slightly altered protocol that allows measurements of bi-directional promoter activity. This combined with their pooling strategy allows them to characterize expressions of many different backgrounds in both directions in extremely high throughput which is impressive! A second key approach this study relies on is the identification of promoter motifs using position weight matrices (PWMs). While these methods are prone to false positives, the authors implement a systematic approach which is standard in the field. However, drawing these types of binary definitions (is this a motif? yes/no) should always come with the caveat that gene expression is a quantitative trait that we oversimplify when drawing boundaries.

      The point is well-taken. To clarify this and other issues, we have added a section on the limitations of our work to the Discussion. Within this section we include the following sentences (lines 675-680):

      “Additionally, future studies will be necessary to address the limitations of our own work. First, we use binary thresholding to determine i) the presence or absence of a motif, ii) whether a sequence has promoter activity or not, and iii) whether a part of a sequence is a hotspot or not. While chosen systematically, the thresholds we use for these decisions may cause us to miss subtle but important aspects of promoter evolution and emergence.”

      Their approach to randomly mutagenizing promoters allowed them to find many anecdotal examples of different types of evolutions that may occur to increase or decrease promoter activity. However, the lack of validation of these phenomena in more controlled backgrounds may require us to further scrutinize their results. That is, their explanations for why certain mutations lead or obviate promoter activity may be due to interactions with other elements in the 'messy' backgrounds, rather than what is proposed.

      Thank you for raising this important point. To address it, we have conducted extensive new validation experiments for the newest version of this manuscript. For the “anecdotal” examples you described, we created 27 reporter constructs harboring the precise mutation that leads to the loss or gain of gene expression, and validated its ability to drive gene expression. The results from these experiments are in Figures 3, 4, 5, and Supplemental Figures S8-S11, and are labeled with a ′ (prime) symbol.

      These experiments not only confirm the increases and decreases in fluorescence that our analysis had predicted. They also demonstrate, with the exception of two (out of 27) false-positive discoveries, that background mutations do not confound our analysis. We mention these two exceptions (lines 364-367):

      “In two of these hotspots, our validation experiments revealed no substantial difference in gene expression as a result of the hotspot mutation (Fig S8F′ and Fig S8J′). In both of these false positives, new -10 boxes emerge in locations without an upstream -35 box.”

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The authors express a key finding that the specific landscape of promoter motifs in a sequence affects the likelihood that local mutations create or destroy regulatory elements. The authors have described many examples, including several that are non-obvious, and show convincingly that different sequence backgrounds have different probabilities for gaining or losing promoter activity. While this overarching conclusion is supported by the manuscript, the proposed mechanisms for explaining changes in promoter activity are not sufficiently validated to be taken for absolute truth. There is not sufficient description of the strength of emergent promoter motifs or their specific spacings from existing motifs within the sequence. Furthermore, they do not define a systematic process by which mutations are assigned to different categories (e.g. box shifting, tandem motifs, etc.) which may imply that the specific examples are assigned based on which is most convenient for the narrative.

      To summarize, Reviewer #1 criticizes the following three aspects of our work in this comment. 1) The mechanisms we proposed are not sufficiently validated. 2) The description of motifs, spacing, and PWM scores are not shown. 3) How mutations are classified into different categories (i.e. box-shifting, tandem motifs, etc.) is not systematically defined. 

      These are all valid criticisms. In response, we performed an extensive set of follow-up experiments and analyses, and redesigned the majority of the figures. Here is a more detailed response to each criticism:

      (1) Proposed mechanisms for explaining changes in promoter activity are not sufficiently validated. We engineered 27 reporter constructs harboring the specific mutations in the parents that we had predicted to change promoter activity. For each, we compared their fluorescence levels with their wild-type counterpart. The results from these experiments are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12, and are labeled with a ′ (prime) symbol.

      (2) No sufficient description of the strength of emergent promoter motifs or their specific spacings. We redesigned the figures to include the DNA sequences of the parent sequences, as well as the degenerate consensus sequences for each mutation. We additionally now highlight the specific motif sequences, their respective PWM scores, and by how much the score changes upon mutation. Finally, we annotated the spacing of motifs. These changes are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12.

      We note that in many cases, high-scoring PWM hits for the same motif can overlap (i.e. two -10 motifs or two -35 motifs overlap). Additionally, the proximity of a -35 and -10 box does not guarantee that the two boxes are interacting. Together, these two facts can result in an ambiguity of the spacer size between two boxes. To avoid any reporting bias, we thus often report spacer sizes as a range (see Figure panels 4F, S8D, S8F-L, S9A, S9H, S10A, and S10E). The smallest spacer we annotate is in Figure 4F with 10 bp, and the largest is in Figure S8D with 26 bp. More “extreme” distances are not annotated, and it is left to the reader to decide whether an interaction is present.

      (3) No systematic process by which mutations are assigned to different categories such as box shifting, tandem motifs, etc. We opted to reformulate these categories completely, because the phenotypic effect of a previously mentioned “tandem motif” was actually a byproduct of H-NS repression (see the newly added Figure S12).

      We also agree that the categories were ambiguous. We now introduce two terms: homo-gain and hetero-gain of -10 and -35 boxes. The manuscript now clearly defines these terms, and the relevant passage now reads as follows (lines 430-435): 

      “We found that these mutations frequently create new boxes overlapping those we had identified as part of a promoter (Fig S9). This occurs when mutations create a -10 box overlapping a -10 box, a -35 box overlapping a -35 box, a -10 box overlapping a -35 box, or a -35 box overlapping a -10 box. We call the resulting event a “homo-gain” when the new box is of the same type as the one it overlaps, and otherwise a “hetero-gain”. In either case, the creation of the new box does not always destroy the original box.”
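
      The homo-gain / hetero-gain definition quoted above can be sketched as a small classifier. This is an illustrative reading of the definition, not the authors' implementation: the `Box` type, the half-open coordinate convention, and the choice to return every label that applies when a new box overlaps several existing boxes are all assumptions.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    kind: str   # "-10" or "-35"
    start: int  # 0-based start position (inclusive)
    end: int    # end position (exclusive)


def overlaps(a: Box, b: Box) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a.start < b.end and b.start < a.end


def classify_gain(new_box: Box, promoter_boxes: List[Box]) -> List[str]:
    """Label a newly created box relative to the boxes it overlaps.

    Returns a sorted list containing "homo-gain" and/or "hetero-gain",
    or an empty list if the new box overlaps no existing box.
    """
    labels = set()
    for box in promoter_boxes:
        if overlaps(new_box, box):
            labels.add("homo-gain" if box.kind == new_box.kind else "hetero-gain")
    return sorted(labels)


# Hypothetical promoter with a -35 box at 10-15 and a -10 box at 33-38.
promoter = [Box("-35", 10, 16), Box("-10", 33, 39)]
print(classify_gain(Box("-10", 35, 41), promoter))  # ['homo-gain']
print(classify_gain(Box("-10", 12, 18), promoter))  # ['hetero-gain']
```

      Note that the classifier says nothing about whether the original box survives the mutation, which the quoted passage treats as a separate question.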

      Impact of the work on the field, and the utility of the methods and data to the community:

      From this study, we are more aware of different types of ways promoters can evolve and devolve, but do not have a better ability to predict when mutations will lead to these effects. Recent work in the field of bacterial gene regulation has raised interest in bidirectional promoter regions. While the authors do not discuss how mutations that raise expression in one direction may affect another, they have created an expansive dataset that may enable other groups to study this interesting phenomenon. Also, their variation of the Sort-Seq protocol will be a valuable example for other groups who may be interested in studying bidirectional expression. Lastly, this study may be of interest to groups studying eukaryotic regulation as it can inform how the evolution of transcription factor binding sites influences short-range interactions with local regulator elements.

      Any additional context to understand the significance of the work:

      The task of computationally predicting whether a sequence drives promoter activity is difficult. By learning what types of mutations create or destroy promoters from this study, we are better equipped for this task.

      We thank Reviewer #1 again for their time and their thoughtful comments.

      Reviewer #2 (Public Review):

      Summary:

      Fuqua et al investigated the relationship between prokaryotic box motifs and the activation of promoter activity using a mutagenesis sequencing approach. From generating thousands of mutant daughter sequences from both active and non-active promoter sequences they were able to produce a fantastic dataset to investigate potential mechanisms for promoter activation. From these large numbers of mutated sequences, they were able to generate mutual information with gene expression to identify key mutations relating to the activation of promoter island sequences.

      We thank Reviewer #2 for reading and providing a thorough review of our manuscript. 

      Strengths:

      The data generated from this paper is an important resource to address this question of promoter activation. Being able to link the activation of gene expression to mutational changes in previously nonactive promoter regions is exciting and allows the potential to investigate evolutionary processes relating to gene regulation in a statistically robust manner. Alongside this, the method of identifying key mutations using mutual information in this paper is well done and should be standard in future studies for identifying regions of interest.

      Thank you for your kind words.

      Weaknesses:

      While the generation of the data is superb the focus only on these mutational hotspots removes a lot of the information available to the authors to generate robust conclusions. For instance.

      (1) The linear regression in S5 used to demonstrate that the number of mutational hotspots correlates with the likelihood of a mutation causing promoter activation is driven by three extreme points.

      A fair criticism. In response, we have chosen to remove the analysis of this trend from the manuscript entirely. (Additionally, Pnew and mutual information calculations both relied on the fluorescence scores of daughter sequences, so the finding was circular in its logic.)

      (2) Many of the arguments also rely on the number of mutational hotspots being located near box motifs. The context-dependent likelihood of this occurring is not taken into account given that these sequences are inherently box motif rich. So, something like an enrichment test to identify how likely these hot spots are to form in or next to motifs.

      Another good point. To address it, we carried out a computational analysis where we randomly scrambled the nucleotides of each parent sequence while maintaining the coordinates for each mutual information “hotspot.” This scrambling results in significantly less overlap with hotspots and boxes. This analysis is now depicted in Figure 2C and described in lines 272-296.
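
      A scrambling control of this kind can be sketched as follows. This is a minimal illustration and not the analysis pipeline used for Figure 2C: motif detection is replaced by a crude stand-in (any hexamer within one mismatch of TATAAT or TTGACA), and the sequence, hotspot coordinates, and number of shuffles are invented for the example.

```python
import random

# Stand-in for PWM-based motif calls (assumption): near-consensus hexamers.
CONSENSUS = ("TATAAT", "TTGACA")


def motif_positions(seq, max_mismatches=1):
    """Set of positions covered by any hexamer close to a consensus box."""
    covered = set()
    for i in range(len(seq) - 5):
        window = seq[i:i + 6]
        for consensus in CONSENSUS:
            if sum(a != b for a, b in zip(window, consensus)) <= max_mismatches:
                covered.update(range(i, i + 6))
                break
    return covered


def hotspot_overlap(seq, hotspots, max_mismatches=1):
    """Number of hotspot positions that fall inside a called motif."""
    return len(set(hotspots) & motif_positions(seq, max_mismatches))


def scramble_test(seq, hotspots, n_shuffles=1000, seed=0):
    """Shuffle the bases, keep hotspot coordinates fixed, and return
    (observed overlap, empirical one-sided p-value)."""
    rng = random.Random(seed)
    observed = hotspot_overlap(seq, hotspots)
    bases = list(seq)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        if hotspot_overlap("".join(bases), hotspots) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_shuffles + 1)


# Hypothetical parent: a -35-like and a -10-like box in a GC-rich background.
parent = "GCGCGC" + "TTGACA" + "GCGCGCGCGCGCGCGCG" + "TATAAT" + "GCGCGC"
hotspots = list(range(6, 12)) + list(range(29, 35))
observed, p_value = scramble_test(parent, hotspots)
print(observed, p_value)
```

      Shuffling within a sequence preserves its base composition, so the comparison asks whether hotspots coincide with motifs more often than expected for a sequence that is equally AT-rich.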

      (3) The link between changes in expression and mutations in surrounding motifs is assessed with two-sided Mann Whitney U tests. This method assumes that the sequence motifs are independent of one another, but the hotspots of interest occur either in 0, 3, 4, or 5s in sequences. There is therefore no sequence where these hotspots can be independent and the correlation causation argument for motif change on expression is weakened.

      This is a fair criticism and a limitation of the MWU test. To better support our reasoning, we engineered 27 reporter constructs harboring the specific mutations in the parents that we had predicted to change promoter activity. For each, we compared their fluorescence levels with their wild-type counterpart. The results from these experiments are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12 and are labeled with a ′ (prime) symbol.

      These experiments not only confirm the increases and decreases in fluorescence that our analysis had predicted. They also demonstrate, with the exception of two (out of 27) false-positive discoveries, that background mutations do not confound our analysis. We mention these two exceptions (lines 364-367):

      “In two of these hotspots, our validation experiments revealed no substantial difference in gene expression as a result of the hotspot mutation (Fig S8F′ and Fig S8J′). In both of these false positives, new -10 boxes emerge in locations without an upstream -35 box.”

      (4) The distance between -10 and -35 was mentioned briefly but not taken into account in the analysis.

      We have now included these spacer distances where appropriate. These changes are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12.

      We note that in many cases, high-scoring PWM hits for the same motif can overlap (i.e. two -10 motifs or two -35 motifs overlap). Additionally, the proximity of a -35 and -10 box does not guarantee that the two boxes are interacting. Together, these two facts can result in an ambiguity of the spacer size between two boxes. To avoid any reporting bias, we thus often report spacer sizes as a range (see Figure panels 4F, S8D, S8F-L, S9A, S9H, S10A, and S10E). The smallest spacer we annotate is in Figure 4F with 10 bp, and the largest is in Figure S8D with 26 bp. More “extreme” distances are not annotated, and it is left to the reader to decide whether an interaction is present.
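
      To make the spacer ambiguity concrete, the sketch below reports a spacer range when several overlapping hits are possible partners. It is illustrative only: the function name, the 6-bp box length, and the use of the 10 bp and 26 bp annotation extremes as default bounds are assumptions for the example rather than a described rule.

```python
def spacer_range(minus35_starts, minus10_starts, box_len=6,
                 min_spacer=10, max_spacer=26):
    """Return (smallest, largest) spacer in bp over all -35 / -10 pairings,
    or None if no pairing falls within the allowed bounds.

    The spacer is the number of bases between the end of the -35 box
    and the start of the -10 box.
    """
    spacers = []
    for s35 in minus35_starts:
        for s10 in minus10_starts:
            spacer = s10 - (s35 + box_len)
            if min_spacer <= spacer <= max_spacer:
                spacers.append(spacer)
    if not spacers:
        return None
    return min(spacers), max(spacers)


# Two overlapping -35 hits (starting at 10 and 12) and one -10 hit at 34.
print(spacer_range([10, 12], [34]))  # (16, 18)
# A pairing beyond the upper bound is not reported.
print(spacer_range([10], [60]))      # None
```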

      The authors propose mechanisms of promoter activation based on a few observations that are treated independently but occur concurrently. To address this using complementary approaches such as analysis focusing on identifying important motifs, using something like a glm lasso regression to identify significant motifs, and then combining with mutational hotspot information would be more robust.

      This is a great idea, and we pursued it as part of the revision. For each parent sequence, we mapped the locations of all -10 and -35 box motifs in the daughters, then reduced each sequence to a binary representation, either encoding or not encoding these motifs, also referred to as a “hot-encoded matrix.” We subsequently performed a Lasso regression between the hot-encoded matrices and the fluorescence scores of each daughter sequence. The regression then assigns a “weight” to each of the motifs in the daughters. The larger a motif’s weight is, the more the motif influences promoter activity. Author response image 1 describes our workflow.

      Author response image 1.
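
      The general shape of such a regression can be sketched with a small coordinate-descent Lasso on a binary motif matrix. This is a generic textbook formulation written for illustration, not the code behind Author response image 1; the toy design matrix, the fluorescence values, and the value of `lam` (standing in for λ) are invented.

```python
from itertools import product


def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the Lasso proximal step)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - b0 - Xw||^2 + lam*||w||_1 by coordinate descent.

    X is a list of rows of 0/1 motif indicators; y holds fluorescence scores.
    Returns (intercept, weights). The intercept is not penalised.
    """
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual for w = 0
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            norm = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if norm == 0.0:  # motif present in all or no daughters
                continue
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / norm
            for i in range(n):
                resid[i] += Xc[i][j] * (w[j] - new_w)
            w[j] = new_w
    intercept = y_mean - sum(w[j] * col_mean[j] for j in range(p))
    return intercept, w


# Toy data: three motifs, only the first one changes fluorescence.
X = [list(row) for row in product([0, 1], repeat=3)]
y = [1.0 + 2.0 * row[0] for row in X]
intercept, weights = lasso_fit(X, y, lam=0.1)
print(intercept, weights)
```

      In this toy case the penalty shrinks the weight of the causal motif slightly below its true effect and sets the two irrelevant motifs to exactly zero. With real daughter sequences, motifs that co-occur are correlated columns, which is one reason the resulting weights can be unstable.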

      We really wanted this analysis to work, but unfortunately, the computational model does not behave robustly, even when testing multiple values for the hyperparameter lambda (λ), which controls the trade-off between model bias and variance.

      The regression assigns strong weights almost exclusively to -10 boxes, and assigns weak to even negative weights to -35 boxes. While initially exciting, these weights do not consistently align with the results from the 27 constructs with individual mutations that we tested experimentally. This ultimately suggests that the regression is overfitting the data.

      We do think a LASSO-regression approach can be applied to explore how individual motifs contribute to promoter activity. However, effectively implementing such a method would require a substantially more complex analysis. We respectfully believe that such an approach would distract from the current narrative, and would be more appropriate for a computational journal in a future study. 

      Because this analysis was inconclusive, we have not made it part of the revised manuscript. However, we hope that our 27 experimentally validated new constructs with individual mutations are sufficient to address the reviewer’s concerns regarding independent verification of our computational predictions.

      Other elements known to be involved in promoter activation including TGn or UP elements were not investigated or discussed.

      Thank you for highlighting this potentially important oversight. In response, we have performed two independent analyses to explore the role of TGn in promoter emergence and evolution. First, we computationally searched for -10 boxes with the bases TGn immediately upstream of them in the parent sequences, and found 18 of these “extended -10 boxes” in the parents (lines 143-145):

      “On average, each parent sequence contains ~5.32 -10 boxes and ~7.04 -35 boxes (Fig S1). 18 of these -10 boxes also include the TGn motif upstream of the hexamer.”
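
      The check for an extended -10 box reduces to looking at the three bases immediately upstream of a called hexamer. The sketch below is illustrative and assumes the -10 box start positions are already known (for example from a PWM scan); the function names and example sequences are hypothetical.

```python
def has_tgn_upstream(seq: str, box_start: int) -> bool:
    """True if the three bases immediately upstream of the hexamer read TGn,
    i.e. T at box_start - 3, G at box_start - 2, and any base at box_start - 1."""
    if box_start < 3:
        return False  # not enough upstream sequence to decide
    return seq[box_start - 3:box_start - 1].upper() == "TG"


def count_extended_minus10(seq: str, minus10_starts) -> int:
    """Count the called -10 boxes that carry a TGn motif upstream."""
    return sum(has_tgn_upstream(seq, start) for start in minus10_starts)


# Hypothetical examples with a TATAAT hexamer starting at index 5.
print(has_tgn_upstream("CCTGCTATAATGG", 5))  # True
print(has_tgn_upstream("CCGGCTATAATGG", 5))  # False
```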

      However, only 20% of these boxes were found in parents with promoter activity (lines 182-185):

      “We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP).”

      Second, we computationally searched through all of the daughter sequences to identify new -10 boxes with TGn immediately upstream. We found 114 -10 boxes with the bases TGn upstream. However, only 5 new -10 boxes (2 with TGn) were associated with increasing fluorescence (lines 338-345):

      “On average, 39.5 and 39.4 new -10 and -35 boxes emerged at unique positions within the daughter sequences of each mutagenized parent (Fig 3A,B), with 1’562 and 1’576 new locations for -10 boxes and -35 boxes, respectively. ~22% (684/3’138) of these new boxes are spaced 15-20 bp away from their cognate box, and ~7.3% (114/1’562) of the new -10 boxes have the TGn motif upstream of them. However, only a mere five of the new -10 boxes and four of the new -35 boxes are significantly associated with increasing fluorescence by more than +0.5 a.u. (Fig 3C,D).”

      In addition, we now study the role of UP elements. This analysis showed that the UP element plays a negligible role in promoter emergence within our dataset. It is discussed in a new subsection of the results (lines 591-608).

      Collectively, these additional analyses suggest that the presence of TGn plus a -10 box is insufficient to create promoter activity, and that the UP element does not play a significant role in promoter emergence or evolution.

      Reviewer #3 (Public Review):

      Summary:

      Like many papers in the last 5-10 years, this work brings a computational approach to the study of promoters and transcription, but unfortunately disregards or misrepresents much of the existing literature and makes unwarranted claims of novelty. My main concerns with the current paper are outlined below although the problems are deeply embedded.

      We thank Reviewer #3 for taking the time to review this manuscript. We have made extensive changes to address their concerns about our work.

      Strengths:

      The data could be useful if interpreted properly, taking into account i) the role of translation ii) other promoter elements, and iii) the relevant literature.

      Weaknesses:

      (1) Incorrect assumptions and oversimplification of promoters.

      - There is a critical error on line 68 and Figure 1A. It is well established that the -35 element consensus is TTGACA but the authors state TTGAAA, which is also the sequence represented by the sequence logo shown and so presumably the PWM used. It is essential that the authors use the correct -35 motif/PWM/consensus. Likely, the authors have made this mistake because they have looked at DNA sequence logos generated from promoter alignments anchored by either the position of the -10 element or transcription start site (TSS), most likely the latter. The distance between the TSS and -10 varies. Fewer than half of E. coli promoters have the optimal 7 bp separation with distances of 8, 6, and 5 bp not being uncommon (PMID: 35241653). Furthermore, the distance between the -10 and -35 elements is also variable (16, 17, and 18 bp spacings are all frequently found, PMID: 6310517). This means that alignments, used to generate sequence logos, have misaligned -35 hexamers. Consequently, the true consensus is not represented. If the alignment discrepancies are corrected, the true consensus emerges. This problem seems to permeate the whole study since this obviously incorrect consensus/motif has been used throughout to identify sequences that resemble -35 hexamers.

      We respectfully but strongly disagree that our analysis has misrepresented the true nature of -35 boxes. First, accounting for more A’s at position 5 in the PWM is not going to lead to a “critical error.” This is because positions 4-6 of the motif barely have any information content (bits) compared to positions 1-3 (see Fig 1A). This assertion is not just based on our own PWM, but based on ample precedent in the literature. In PMID 14529615, TTG is present in 38% of all -35 boxes, but ACA only in 8%. In PMID 29388765, with the -10 instance TATAAT, the -35 instance TTGCAA yields stronger promoters compared to the -35 instance TTGACA (See their Figure 3B).

      In PMID 29745856 (Figure 2), the most information content lies in positions 1-3, with the A and C at position 5 both nearly equally represented, as in our PWM. In PMID 33958766 (Figure 1) an experimentally-derived -35 box is even reduced to a “partial” -35 box which only includes positions 1 and 2, with consensus: TTnnnn.

      In addition, we did not derive the PWMs as the reviewer describes. The PWMs we use are based on computational predictions that are in excellent agreement with experimental results. Specifically, the PWMs we use are from PMID 29728462, which acquired 145 -10 and -35 box sequences from the top 3.3% of computationally predicted boxes from Regulon DB. See PMID 14529615 for the computational pipeline that was used to derive the PWMs, which independently aligns the -10 and -35 boxes to create the consensus sequences. The -35 PWM correlates significantly and strongly with an experimentally derived -35 box (see Figure S4 in the Supporting Information of Belliveau et al., PNAS 2017; Pearson correlation coefficient = 0.89). Within the 145 -35 boxes, the exact consensus sequence (TTGACA) that Reviewer #3 is concerned about is present 6 times in our matrix, and has a PWM score above the significance threshold. In other words, TTGACA is classified to be a -35 box in our dataset.
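
      The two quantities at issue here, per-position information content and the PWM score of a candidate hexamer, can be sketched as below. The six aligned sites are invented for illustration and are not the 145 boxes used in the study; the pseudocount and uniform background are likewise assumptions, and no significance threshold is implied.

```python
import math

# Hypothetical aligned -35-like hexamers (toy data, not the study's matrix).
TOY_SITES = ["TTGACA", "TTGAAA", "TTGCAA", "TTGTCT", "TTGACT", "TTGGAA"]


def information_content(sites):
    """Bits of information per position: 2 minus the Shannon entropy."""
    n = len(sites)
    bits = []
    for pos in range(len(sites[0])):
        counts = {base: 0 for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values() if c)
        bits.append(2.0 - entropy)
    return bits


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix with a pseudocount per base."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in "ACGT"})
    return pwm


def pwm_score(pwm, hexamer):
    """Sum of log-odds scores for each base of the hexamer."""
    return sum(column[base] for column, base in zip(pwm, hexamer))


bits = information_content(TOY_SITES)
pwm = build_pwm(TOY_SITES)
print([round(b, 2) for b in bits])
print(pwm_score(pwm, "TTGACA") > pwm_score(pwm, "GGCTGT"))  # True
```

      In this toy alignment the first three positions are invariant and carry the maximum 2 bits each, whereas positions 4 to 6 are variable and contribute far less to the score, which mirrors the shape of the argument about positions 1-3 versus 4-6.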

      We now provide DNA sequences for each of the figures to improve accessibility and reproducibility. A reader can now use any PWM or method they wish to interpret the data.

      - An uninformed person reading this paper would be led to believe that prokaryotic promoters have only two sequence elements: the -10 and -35 hexamers. This is because the authors completely ignore the role of the TG motif, UP element, and spacer region sequence. All of these can compensate for the lack of a strong -35 hexamer and it's known that appending such elements to a lone -10 sequence can create an active promoter (e.g. PMIDs 15118087, 21398630, 12907708, 16626282, 32297955). Very likely, some of the mutations, classified as not corresponding to a -10 or -35 element in Figure 2, target some of these other promoter motifs.

      Thank you for bringing this oversight to our attention. We have performed two independent analyses to explore the role of TGn in promoter emergence and evolution. First, we computationally searched for -10 boxes with the bases TGn immediately upstream of them in the parent sequences, and found 18 of these “extended -10 boxes” in the parents (lines 143-145):

      “On average, each parent sequence contains ~5.32 -10 boxes and ~7.04 -35 boxes (Fig S1). 18 of these -10 boxes also include the TGn motif upstream of the hexamer.”

      However, only 20% of these boxes were found in parents with promoter activity (lines 182-185):

      “We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP).”

      Second, we computationally searched through all of the daughter sequences to identify new -10 boxes with TGn immediately upstream. We found 114 -10 boxes with the bases TGn upstream. However, only 5 new -10 boxes (2 with TGn) were associated with increasing fluorescence (lines 338-345):

      “On average, 39.5 and 39.4 new -10 and -35 boxes emerged at unique positions within the daughter sequences of each mutagenized parent (Fig 3A,B), with 1’562 and 1’576 new locations for -10 boxes and -35 boxes, respectively. ~22% (684/3’138) of these new boxes are spaced 15-20 bp away from their cognate box, and ~7.3% (114/1’562) of the new -10 boxes have the TGn motif upstream of them. However, only a mere five of the new -10 boxes and four of the new -35 boxes are significantly associated with increasing fluorescence by more than +0.5 a.u. (Fig 3C,D).”

      In addition, we now study the role of UP elements. This analysis showed that the UP element plays a negligible role in promoter emergence within our dataset. It is discussed in a new subsection of the results (lines 591-608) and in the newly added Figure S13.

      Collectively, these additional analyses suggest that the presence of TGn plus a -10 box is insufficient to create promoter activity, and that the UP element does not play a significant role in promoter emergence or evolution.

      - The model in Figure 4C is highly unlikely. There is no evidence in the literature that RNAP can hang on with one "arm" in this way. In particular, structural work has shown that sequence-specific interactions with the -10 element can only occur after the DNA has been unwound (PMID: 22136875). Further, -10 elements alone, even if a perfect match to the consensus, are non-functional for transcription. This is because RNAP needs to be directed to the -10 by other promoter elements, or transcription factors. Only once correctly positioned, can RNAP stabilise DNA opening and make sequence-specific contacts with the -10 hexamer. This makes the notion that RNAP may interact with the -10 alone, using only domain 2 of sigma, extremely unlikely.

      This is a valid criticism, and we thank the reviewer for catching this problem. In response, we have removed the model and pertinent figures throughout the entire manuscript.

      (2) Reinventing the language used to describe promoters and binding sites for regulators.

      - The authors needlessly complicate the narrative by using non-standard language. For example, on page 1 they define a motif as "a DNA sequence computationally predicted to be compatible with TF binding". They distinguish this from a binding site "because binding sites refer to a location where a TF binds the genome, rather than a DNA sequence". First, these definitions are needlessly complicated, why not just say "putative binding sites" and "known binding sites" respectively? Second, there is an obvious problem with the definitions; many "motifs" will also be "binding sites". In fact, by the time the authors state their definitions, they have already fallen foul of this conflation; in the prior paragraph they stated: "controlled by DNA sequences that encode motifs for TFs to bind". The same issue reappears throughout the paper.

      We agree that this was needlessly complicated. We now just refer to every sequence we study as a motif. A -10 box is a motif, a -35 box is a motif, a putative H-NS binding site is an H-NS motif, etc. The word “binding site” no longer occurs in the manuscript.

      - The authors also use the terms "regulatory" and "non-regulatory" DNA. These terms are not defined by the authors and make little sense. For instance, I assume the authors would describe promoter islands lacking transcriptional activity (itself an incorrect assumption, see below) as non-regulatory. However, as horizontally acquired sections of AT-rich DNA these will all be bound by H-NS and subject to gene silencing, both promoters for mRNA synthesis and spurious promoters inside genes that create untranslated RNAs. Hence, regulation is occurring.

      Another fair point. We have thus changed the terminology throughout to “promoter” and “non-promoter.”

      - Line 63: "In prokaryotes, the primary regulatory sequences are called promoters". Promoters are not generally considered regulatory. Rather, it is adjacent or overlapping sites for TFs that are regulatory. There is a good discussion of the topic here (PMID: 32665585). 

      We have rewritten this. The sentence now reads (lines 67-69):

      “A canonical prokaryotic promoter recruits the RNA polymerase subunit σ70 to transcribe downstream sequences (Burgess et al., 1969; Huerta and Collado-Vides, 2003; Paget and Helmann, 2003; van Hijum et al., 2009).”

      (3) The authors ignore the role of translation.

      - The authors' assay does not measure promoter activity alone, this can only be tested by measuring the amount of RNA produced. Rather, the assay used measures the combined outputs of transcription and translation. If the DNA fragments they have cloned contain promoters with no appropriately positioned Shine-Dalgarno sequence then the authors will not detect GFP or RFP production, even though the promoter could be making an RNA (likely to be prematurely terminated by Rho, due to a lack of translation). This is known for promoters in promoter islands (e.g. Figure 1 in PMID: 33958766).

      We agree that this is definitely a limitation of our study, which we had not discussed sufficiently. In response, we now discuss this limitation in a new section of the discussion (lines 680-686):

      “Second, we measure protein expression through fluorescence as a readout for promoter activity. This readout combines transcription and translation. This means that we cannot differentiate between transcriptional and post-transcriptional regulation, including phenomena such as premature RNA termination (Song et al., 2022; Uptain and Chamberlin, 1997), post-transcriptional modifications (Mohanty and Kushner, 2006), and RNA-folding from riboswitch-like sequences (Mandal and Breaker, 2004).”

      - In Figure S6 it appears that there is a strong bias for mutations resulting in RFP expression to be close to the 3' end of the fragment. Very likely, this occurs because this places the promoter closer to RFP and there are fewer opportunities for premature termination by Rho.

      The reviewer raises a very interesting possibility. To validate it, we have performed the following analysis. We took the RFP expression values from the 9,934 daughters with single mutations in all 25 parent sequences (P1-RFP, P2-RFP, … P25-RFP), and plotted the location of the single mutation (horizontal axis) against RFP expression (vertical axis) in Author response image 2.

      Author response image 2.

      The distribution is uniform across the sequences, showing that distance from the RBS is not likely the reason for this observation. Since this analysis was uninformative with respect to distance from the RBS, we chose not to include it in the manuscript.

      (4) Ignoring or misrepresenting the literature.

      - As alluded to above, promoter islands are large sections of horizontally acquired, high AT-content, DNA. It is well known that such sequences are i) packed with promoters driving the expression of RNAs that aren't translated ii) silenced, albeit incompletely, by H-NS and iii) targeted by Rho which terminates untranslated RNA synthesis (PMIDs: 24449106, 28067866, 18487194). None of this is taken into account anywhere in the paper and it is highly likely that most, if not all, of the DNA sequences the authors have used contain promoters generating untranslated RNAs.

      Thank you for pointing out that our original submission was incomplete in this regard. We address these concerns by new analyses, including some new experiments. First, Rho-dependent termination is associated with the RUT motif, which is very rich in cytosines (PMID: 30845912). Given that our sequences have an AT-content of 65%-78%, canonical Rho-dependent termination is unlikely. Nevertheless, we computationally searched for Rho-dependent terminators using the available code from PMID: 30845912, but the algorithm did not identify any putative RUTs. Because this analysis was not informative, we did not include it in the paper.

      We analyzed the role of H-NS on promoter emergence and evolution within our dataset using both experimental and computational approaches. These additional analyses are now shown in the newly-added Figure 5 and the newly-added Figure S12. We found that H-NS represses P22-GFP and P12-RFP and affects the bidirectionality of P20. More specifically, to analyze the effects of H-NS, we first compared the fluorescence levels of parent sequences in a Δhns background vs the wild-type (DH5α) background in Figure 5A. We found 6 candidate H-NS targets, with P22-GFP and P12-RFP exhibiting the largest changes in fluorescence (lines 496-506):

      “We plot the fluorescence changes in Fig 5A as distributions for the 50 parents, where positive and negative values correspond to an increase or decrease in fluorescence in the Δhns background, respectively. Based on the null hypothesis that the parents are not regulated by H-NS, we classified outliers in these distributions (1.5 × the interquartile range) as H-NS-target candidates. We refer to these outliers as “candidates” because the fluorescence changes could also result from indirect trans-effects from the knockout (Mattioli et al., 2020; Metzger et al., 2016). This approach identified 6 candidates for H-NS targets (P2-GFP, P19-GFP, P20-GFP, P22-GFP, P12-RFP, and P20-RFP). For GFP, the largest change occurs in P22-GFP, increasing fluorescence ~1.6-fold in the mutant background (two-tailed t-test, p=1.16×10⁻⁸) (Fig 5B). For RFP, the largest change occurs in P12-RFP, increasing fluorescence ~0.5-fold in the mutant background (two-tailed t-test, p=4.33×10⁻¹⁰) (Fig 5B).”
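
      The outlier rule quoted above can be sketched in a few lines. This is only an illustration, assuming the common Tukey convention (values more than 1.5 × IQR below the first quartile or above the third quartile); the quartile method, the function name `iqr_outliers`, and the sample values are hypothetical and are not taken from the manuscript.

```python
import statistics

def iqr_outliers(changes, k=1.5):
    """Return labels whose value lies more than k * IQR outside the Q1..Q3 range.

    `changes` maps a parent label to its fluorescence change (mutant minus wild type).
    The quartile convention ("inclusive") is an assumption made for this sketch.
    """
    values = list(changes.values())
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(name for name, v in changes.items() if v < low or v > high)

# Hypothetical fluorescence changes (a.u.) for seven parents; not real data.
example = {"A": 0.01, "B": -0.02, "C": 0.00, "D": 0.03,
           "E": -0.01, "F": 0.02, "G": 0.60}
candidates = iqr_outliers(example)
```

      Outliers found this way are only candidates: as the quoted text notes, a shift in the knockout background can also arise from indirect trans-effects.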

      We also observed that the Δhns background affected the bidirectionality of P20 (lines 507-511):

      “We note that for template P20, which is a bidirectional promoter, GFP expression increases ~2.6-fold in the Δhns background (two-tailed t-test, p=1.59×10⁻⁶). Simultaneously, RFP expression decreases ~0.42-fold in the Δhns background (two-tailed t-test, p=4.77×10⁻⁴) (Fig S12A). These findings suggest that H-NS also modulates the directionality of P20’s bidirectional promoter through either cis- or trans-effects.”

      We then searched for regions where losing H-NS motifs in hotspots significantly changed fluorescence. We identified 3 motifs in P12-RFP and P22-GFP (lines 522-528):

      “For P22-GFP, a H-NS motif lies 77 bp upstream of the mapped promoter. Mutations which destroy this motif significantly increase fluorescence by +0.52 a.u. (two-tailed MWU test, q=1.07×10⁻³) (Fig 5E). For P12-RFP, one H-NS motif lies upstream of the mapped promoter’s -35 box, and the other upstream of the mapped promoter’s -10 box. Mutations that destroy these H-NS motifs significantly increase fluorescence by +0.53 and +0.51 a.u., respectively (two-tailed MWU test, q=3.28×10⁻⁴⁰ and q=4.42×10⁻⁵⁰) (Fig 5F,G). Based on these findings, we conclude that these motifs are bound by H-NS.”

      We are grateful for the suggestion to look at the role of H-NS in our dataset. Our analysis revealed a more plausible explanation for what we formerly referred to as a “Tandem Motif” in the original submission. Previously, we had shown that in P12-RFP, expression decreases when a -35 box is created next to the promoter’s -35 box, or a -10 box next to the promoter’s -10 box. These new -10 and -35 boxes, however, also overlap with the two H-NS motifs in P12-RFP. We tested these exact point mutations in reporter plasmids and in the Δhns background, and found that the Δhns background rescues this loss in expression (see Figure S12). This analysis is in the newly added subsection: “The binding of H-NS changes when new -10 and -35 boxes are gained” and can be found at lines 529-563. We summarize the findings in a final paragraph of the section (lines 556-563):

      “To summarize, we present evidence that H-NS represses both P22-GFP and P12-RFP in cis. H-NS also modulates the bidirectionality of P20-GFP/RFP in cis or trans. In P22-GFP, the strongest H-NS motif lies upstream of the promoter. In P12-RFP, the strongest H-NS motifs lie upstream of the -10 and -35 boxes of the promoter. We note that there are 16 additional H-NS motifs surrounding the promoter in P12-RFP that may also regulate P12-RFP (Fig S12G). Mutations in two of these H-NS motifs can create additional -10 and -35 boxes that appear to lower expression. However, the effects of these mutations are insignificant in the absence of H-NS, suggesting that these mutations actually modulate H-NS binding.”

      We also agree that the majority of these sequences are likely driving the expression of many untranslated RNAs (see Purtov et al., 2014). We thus now define a promoter more carefully as follows (lines 113-119):

      “In this study, we define a promoter as a DNA sequence that drives the expression of a (fluorescent) protein whose expression level, measured by its fluorescence, is greater than a defined threshold. We use a threshold of 1.5 arbitrary units (a.u.) of fluorescence. This definition does not distinguish between transcription and translation. We chose it because protein expression is usually more important than RNA expression whenever natural selection acts on gene expression, because it is the primary phenotype visible to natural selection (Jiang et al., 2023).” 

      We also state this as a limitation of our study in the Discussion (lines 680-686):

      “Second, we measure protein expression through fluorescence as a readout for promoter activity. This readout combines transcription and translation. This means that we cannot differentiate between transcriptional and post-transcriptional regulation, including phenomena such as premature RNA termination (Song et al., 2022; Uptain and Chamberlin, 1997), post-transcriptional modifications (Mohanty and Kushner, 2006), and RNA-folding from riboswitch-like sequences (Mandal and Breaker, 2004).”

      - The authors state that GC content does not correlate with the emergence of new promoters. It is known that GC content does correlate to the emergence of new promoters because promoters are themselves AT-rich DNA sequences (e.g. see Figure 1 of PMID: 32297955). There are two reasons the authors see no correlation in this work. First, the DNA sequences they have used are already very AT-rich (between 65 % and 78 % AT-content). Second, they have only examined a small range of different AT-content DNA (i.e. between 65 % and 78 %). The effect of AT-content on promoter emergence is most clearly seen between AT-content of between around 40 % and 60 %. Above that level, the strong positive correlation plateaus.

      We respectfully disagree that this point is pertinent. The reviewer is referring to the likelihood that a sequence is a promoter, which indeed increases with AT content, whereas we focus on the likelihood that a sequence becomes a promoter through DNA mutation. We note that if a DNA sequence is more AT-rich, then it is more likely to have -10 and -35 boxes, because their consensus sequences are also AT-rich. However, H-NS and other transcriptional repressors also bind to AT-rich sequences. This could also explain the saturation observed above 60% AT-content in PMID 32297955. Perhaps we can address this trend in future work.

      - Once these authors better include and connect their results to the previous literature, they can also add some discussion of how previous papers in recent years may have also missed some of this important context.

      We apologize for this oversight. We have rewritten the Discussion section to include the following points below. Many of the newly added references come from the group of David Grainger, who works on H-NS repression, bidirectional promoters, promoter emergence, promoter motifs, and spurious transcription in E. coli. More specifically:

      (1) The role of pervasive transcription and the likelihood of promoter emergence (lines 614-621):

      “Instead, we present evidence that promoter emergence is best predicted by the level of background transcription each non-promoter parent produces, a phenomenon also referred to as “pervasive transcription” (Kapranov et al., 2007).

      From an evolutionary perspective, this would suggest that sequences that produce such pervasive transcripts – including the promoter islands (Panyukov and Ozoline, 2013) and the antisense strand of existing promoters (Dornenburg et al., 2010; Warman et al., 2021), may have a proclivity for evolving de-novo promoters compared to other sequences (Kapranov et al., 2007; Wade and Grainger, 2014).”

      (2) How our results contradict the findings from Bykov et al., 2020 (lines 622-640):

      “A previous study randomly mutagenized the appY promoter island upstream of a GFP reporter, and isolated variants with increased and decreased GFP expression. The authors found that variants with higher GFP expression acquired mutations that 1) improve a -10 box to better match its consensus, and simultaneously 2) destroy other -10 and -35 boxes (Bykov et al., 2020). The authors concluded that additional -10 and -35 boxes repress expression driven by promoter islands. Our data challenge this conclusion in several ways. 

      First, we find that only ~13% of -10 and -35 boxes in promoter islands actually contribute to promoter activity. Extrapolating this percentage to the appY promoter island, ~87% (100% - 13%) of the motifs would not be contributing to its activity. Assuming the appY promoter island is not an outlier, this would insinuate that during random mutagenesis, these inert motifs might have accumulated mutations that do not change fluorescence. Indeed, Bykov et al. (Bykov et al., 2020) also found that a similar frequency of -10 and -35 boxes were destroyed in variants selected for lower GFP expression, which supports this argument. Second, we find no evidence that creating a -10 or -35 box lowers promoter activity in any of our 50 parent sequences. Third, we also find no evidence that destruction of a -10 or -35 box increases promoter activity without plausible alternative explanations, i.e. overlap of the destroyed box with a H-NS site, destruction of the promoter, or simultaneous creation of another motif as a result of the destruction. In sum, -10 and -35 boxes are not likely to repress promoter activity.”

      (3) How other sequence features besides the -10 and -35 boxes may influence promoter emergence and activity (lines 661-671):

      “These findings suggest that we are still underestimating the complexity of promoters. For instance, the -10 and -35 boxes, extended -10, and the UP-element may be one of many components underlying promoter architecture. Other components may include flanking sequences (Mitchell et al., 2003), which have been observed to play an important role in eukaryotic transcriptional regulation (Afek et al., 2014; Chiu et al., 2022; Farley et al., 2015; Gordân et al., 2013). Recent studies on E. coli promoters even characterize an AT-rich motif within the spacer sequence (Warman et al., 2020), and other studies use longer -10 and -35 box consensus sequences (Lagator et al., 2022). Another possibility is that there is much more transcriptional repression in the genome than anticipated (Singh et al., 2014). This would also coincide with the observed repression of H-NS in P22-GFP and P12-RFP, and accounts of H-NS repression in the full promoter island sequences (Purtov et al., 2014).”

      (4) The limits of our experimental methodology (lines 675-686):

      “Additionally, future studies will be necessary to address the limitations of our own work. First, we use binary thresholding to determine i) the presence or absence of a motif, ii) whether a sequence has promoter activity or not, and iii) whether a part of a sequence is a hotspot or not. While chosen systematically, the thresholds we use for these decisions may cause us to miss subtle but important aspects of promoter evolution and emergence. Second, we measure protein expression through fluorescence as a readout for promoter activity. This readout combines transcription and translation. This means that we cannot differentiate between transcriptional and post-transcriptional regulation, including phenomena such as premature RNA termination (Song et al., 2022; Uptain and Chamberlin, 1997), post-transcriptional modifications (Mohanty and Kushner, 2006), and RNA-folding from riboswitch-like sequences (Mandal and Breaker, 2004).”

      (5) An updated take-home message (lines 687-694):

      “Overall, our study demonstrates that -10 and -35 boxes neither prevent existing promoters from driving expression, nor do they prevent new promoters from emerging by mutation. It shows how mutations can create new -10 and -35 boxes near or on top of preexisting ones to modulate expression. However, randomly creating a new -10 or -35 box will rarely create a new promoter, even if the new box is appropriately spaced upstream or downstream of a cognate box. Ultimately our study demonstrates that promoter models need to be further scrutinized, and that using mutagenesis to create de-novo promoters can provide new insights into promoter regulatory logic.”

      (5) Lack of information about sequences used and mutations.

      - To properly assess the work any reader will need access to the sequences cloned at the start of the work, where known TSSs are within these sequences (ideally +/- H-NS, which will silence transcription in the chromosomal context but may not when the sequences are removed from their natural context and placed in a plasmid). Without this information, it is impossible to assess the validity of the authors' work.

      Thank you for raising this point. Please see Data S1 for the 25 template sequences (P1-P25) used in this study, and Data S2 for all of the daughter sequences.

      For brevity, we have addressed the reviewer’s request to look at the role of H-NS in their comment (4) “Ignoring or misrepresenting the literature.”

      We do not have information about the predicted transcription start sites (TSS) for the parent sequences because the program which identified them (Platprom) is no longer available. Regardless, having TSS coordinates would not validate or invalidate our findings, since we already know that the promoter islands produce short transcripts throughout their sequences, and we are primarily interested in promoters which can produce complete transcripts.

      - The authors do not account for the possibility that DNA sequences in the plasmid, on either side of the cloned DNA fragment, could resemble promoter elements. If this is the case, then mutations in the cloned DNA will create promoters by "pairing up" with the plasmid sequences. There is insufficient information about the DNA sequences cloned, the mutations identified, or the plasmid, to determine if this is the case. It is possible that this also accounts for mutational hotspots described in the paper.

      We agree that these are important points. To address the criticism that we provided insufficient information, we now redesigned all our figures to provide this information. Specifically, the figures now include the DNA sequences, their PWM predictions, and the exact mutations that lead to promoter activity. The figures with these changes are Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12. We now also provide more details about pMR1 in a new section of the methods (lines 740-748):

      “Plasmid MR1 (pMR1)

      The plasmid MR1 (pMR1) is a variant of the plasmid RV2 (pRV2) in which the kan resistance gene has been swapped with the cm resistance gene (Guazzaroni and Silva-Rocha, 2014). Plasmid pMR1 encodes the BBa_J34801 ribosomal binding site (RBS, AAAGAGGAGAAA) 6 bp upstream of the start codon for GFP(LVA). The plasmid also encodes a putative RBS (AAGGGAGG) (Cazemier et al., 1999) 5 bp upstream of the start codon for mCherry on the opposite strand.

      The plasmid additionally contains the low-to-medium copy number origin of replication p15A (Westmann et al., 2018).

      A map of the plasmid is available on the GitHub repository: https://github.com/tfuqua95/promoter_islands”

      The reviewer also makes a valid point about promoter elements of the plasmid itself. We addressed it with the following new analyses. First we re-examined each of the examples where new -10 and -35 boxes are gained or lost, to see if any of these hotspots occur on the flanking ends of the parent sequences. We looked specifically at the ends because they could potentially interact with -10 and -35 box-like sequences on the plasmid to form a promoter. 

      Only one of these hotspots (out of 27) occurred at the end of the cloned sequences, and is thus a candidate for the phenomenon the reviewer hypothesized. This hotspot occurs in P9-GFP, where gaining a -10 box at the left flank increases expression (see Figure S8E-F’). There is indeed a -35 box 22-23 bp upstream of this -10 box on the plasmid, which could potentially affect promoter activity. 

      We tested the GFP expression of a construct harboring the point mutation which creates this -10 box on the left flank of P9-GFP. However, there was no significant difference in fluorescence between this construct and the wild-type P9-GFP (see Figure S8E-F’). Thus, this -35 box on pMR1 is not likely creating a new promoter.

      (6) Overselling the conclusions.

      Line 420: The paper claims to have generated important new insights into promoters. At the same time, the main conclusion is that "Our study demonstrates that mutations to -10 and -35 boxes motifs are the primary paths to create new promoters and to modulate the activity of existing promoters". This isn't new or unexpected. People have been doing experiments showing this for decades. Of course, mutations that make or destroy promoter elements create and destroy promoters. How could it be any other way?

      In hindsight, we agree that the original conclusion was not very novel. Our new conclusion is that -10 and -35 boxes do not repress transcription, and that our current promoter models, even with the additional motifs like the UP-element and the extended -10, are insufficient to understand promoters (lines 687-694):

      “Overall, our study demonstrates that -10 and -35 boxes neither prevent existing promoters from driving expression, nor do they prevent new promoters from emerging by mutation. It shows how mutations can create new -10 and -35 boxes near or on top of preexisting ones to modulate expression. However, randomly creating a new -10 or -35 box will rarely create a new promoter, even if the new box is appropriately spaced upstream or downstream of a cognate box. Ultimately our study demonstrates that promoter models need to be further scrutinized, and that using mutagenesis to create de-novo promoters can provide new insights into promoter regulatory logic.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I would like to start by thanking the authors for presenting an interesting and well-written article for review. This paper is a welcome addition to the field, addressing modern questions in the longstanding area of bacterial gene regulation. It is both enlightening and inspiring. While I do have suggestions, I hope these are not perceived as a lack of optimism for the work.

      Thank you for your kind words and suggestions, and for providing an astute and constructive review. We feel that the manuscript has greatly improved with your suggested changes.

      ABSTRACT:

      Line 11: The sentence, "It is possible that these motifs influence..." Could be rewritten to be clearer as it is the most important point of the manuscript. It is not obvious that you're talking about how the local landscape of motifs affects the probability of promoters evolving/devolving in this location.

      We have changed the sentence to read, “Here, we ask whether the presence of such motifs in different genetic sequences influences promoter evolution and emergence.”

      INTRODUCTION:

      Line 68: Is the -35 consensus motif not TTGACA? Here it is listed as TTGAAA.

      Corrected from TTGAAA to TTGACA

      RESULTS:

      Line 92-94. In finding that the. The main takeaway from this work is that different sequences have different likelihoods of mutations creating promoters and so I believe this claim could be explored deeper with more quantitative information. Could the authors supplement this claim by including? Could you look at whether there is a correlation between the baseline expression of a parent sequence and Pnew? I expect even the inactive sequences to have some variability in measured expression.

      Thank you for this great idea. We followed up on it by plotting the baseline parent sequence fluorescence scores against Pnew. You are indeed correct, i.e., Pnew increases with baseline expression following a sigmoid function, and is now shown in Figure 1D. To report our new observations, we have added the following section to the Results (lines 219-232):

      “Although mutating each of the 40 non-promoter parent sequences could create promoter activity, the likelihood Pnew that a mutant has promoter activity, varies dramatically among parents. For each non-promoter parent, Fig 1D shows the percentage of active daughter sequences. The median Pnew is 0.046 (std. ± 0.078), meaning that ~4.6% of all mutants have promoter activity. The lowest Pnew is 0.002 (P25-GFP) and the highest 0.41 (P8-RFP), a 205-fold difference.

      We hypothesized that these large differences in Pnew could be explained by minute differences in the fluorescence scores of each parent, particularly if its score was below 1.5 a.u. Plotting the fluorescence scores of each parent (N=50) and their respective Pnew values as a scatterplot (Fig 1E), we can fit these values to a sigmoid curve (see methods). This finding helps to explain why P8-RFP has a high Pnew (0.41) and P25-GFP a low Pnew (0.002), as their fluorescence scores are 1.380 and 1.009 a.u., respectively. The fact that the inflection point of the fitted curve is at 1.51 a.u. further justifies our use of 1.5 a.u. as a cutoff for promoter and non-promoter activity.”
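
      The relationship described above, Pnew rising with baseline fluorescence along a sigmoid, can be illustrated with a small self-contained fit. This is a sketch only: the actual fitting procedure is described in the manuscript's methods and is not reproduced here, so a brute-force grid search stands in for it, and the three-parameter logistic form, the parameter grids, and the synthetic data points are all assumptions.

```python
import math

def sigmoid(x, top, midpoint, slope):
    """Logistic curve rising from 0 to `top`, with its inflection point at `midpoint`."""
    return top / (1.0 + math.exp(-slope * (x - midpoint)))

def fit_sigmoid(xs, ys, tops, midpoints, slopes):
    """Least-squares fit by exhaustive search over small parameter grids.

    A stand-in for a proper optimizer, chosen to keep the sketch dependency-free.
    Returns the (top, midpoint, slope) combination with the smallest squared error.
    """
    best = None
    for top in tops:
        for mid in midpoints:
            for slope in slopes:
                err = sum((sigmoid(x, top, mid, slope) - y) ** 2
                          for x, y in zip(xs, ys))
                if best is None or err < best[0]:
                    best = (err, top, mid, slope)
    return best[1:]

# Synthetic data generated from a known curve (hypothetical values, not measurements).
xs = [1.0 + 0.05 * i for i in range(15)]            # baseline fluorescence, a.u.
ys = [sigmoid(x, 1.0, 1.5, 20.0) for x in xs]       # corresponding "Pnew" values

fitted = fit_sigmoid(xs, ys,
                     tops=[0.5, 1.0],
                     midpoints=[1.4, 1.5, 1.6],
                     slopes=[10.0, 20.0, 30.0])
```

      The fitted `midpoint` is the inflection point of the curve, the quantity the authors compare against their 1.5 a.u. cutoff.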

      Another potentially interesting analysis would be to see if k-mer content is correlated with Pnew. That is, determine the abundance of all hexamers in the sequence and see if Pnew is correlated with the number of hexamers present that is one nucleotide distance away from the consensus motifs (such as TcGACA or TAcAAT).

      We performed the suggested analysis by searching for k-mers that correlate with Pnew and found that no k-mer significantly correlates with Pnew (lines 240-248):

      “We then asked whether any k-mers ranging from 1-6 bp correlated with the non-promoter Pnew values (5,460 possible k-mers). 718 of these 1-6 bp k-mers are present 3 or more times in at least one non-promoter parent. We calculated a linear regression between the frequency of these 718 k-mers and each Pnew value, and adjusted the p-values to respective q-values (Benjamini-Hochberg correction, FDR=0.05). This analysis revealed six k-mers: CTTC, GTTG, ACTTC, GTTGA, AACTTC, TAACTT which correlate with Pnew. However, these correlations are heavily influenced by an outlying Pnew value of 0.41 (P8-RFP) (Fig S5C-H), and upon removing P8-RFP from the analysis, no k-mer significantly correlates with Pnew (data not shown).”
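
      Two steps of the k-mer analysis quoted above lend themselves to a short sketch: enumerating all 1-6 bp k-mers (which reproduces the count of 5,460) and converting p-values to Benjamini-Hochberg q-values. The code below is an illustration rather than the authors' pipeline; counting overlapping occurrences is an assumption, and the function names and example p-values are hypothetical.

```python
from itertools import product

def all_kmers(max_k=6, alphabet="ACGT"):
    """Enumerate every k-mer of length 1..max_k over the DNA alphabet."""
    return ["".join(p)
            for k in range(1, max_k + 1)
            for p in product(alphabet, repeat=k)]

def count_kmer(seq, kmer):
    """Count occurrences of `kmer` in `seq`, allowing overlaps (an assumption)."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

n_kmers = len(all_kmers())                       # 4 + 16 + 64 + 256 + 1024 + 4096
hits = count_kmer("AACTTCTTC", "CTTC")           # two overlapping-free hits here
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])  # hypothetical p-values
```

      A k-mer would then be called significant at FDR = 0.05 if its q-value is at most 0.05.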

      Line 152-157: How did you define the thresholds for 'active' or 'inactive'? It is not clear in the methods how this distinction was made.

      We have more clearly defined these thresholds in the text. A sequence with promoter activity has a fluorescence score greater than 1.5 a.u. (lines 168-172):

      “We declared a daughter sequence to have promoter activity or to be a promoter if its score was greater than or equal to 1.5 a.u., as this score lies at the boundary between no fluorescence and weak fluorescence based on the sort-seq bins (methods). Otherwise, we refer to a daughter sequence as having no promoter activity or being a non-promoter.”
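
      The binary definition quoted above, and the Pnew statistic built on it, amount to a threshold and a proportion. A minimal sketch follows, assuming the "greater than or equal to 1.5 a.u." wording of the quoted passage; the function names and the example scores are hypothetical.

```python
THRESHOLD_AU = 1.5  # fluorescence cutoff in arbitrary units, as stated in the quoted text

def has_promoter_activity(score, threshold=THRESHOLD_AU):
    """A sequence counts as a promoter if its fluorescence score reaches the threshold."""
    return score >= threshold

def p_new(daughter_scores, threshold=THRESHOLD_AU):
    """Fraction of a non-promoter parent's daughters that show promoter activity."""
    if not daughter_scores:
        raise ValueError("at least one daughter score is required")
    active = sum(1 for s in daughter_scores if has_promoter_activity(s, threshold))
    return active / len(daughter_scores)

# Hypothetical daughter scores (a.u.) for one parent; not real measurements.
example_pnew = p_new([1.0, 1.2, 1.5, 2.0])
```

      Because the classification is binary, parents scoring just below the cutoff (such as 1.49 a.u.) are counted as non-promoters, a limitation the authors acknowledge in their Discussion.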

      Lines: 152-157: In trying to find the parent expression levels, no figure was available showing the distribution of parent expression levels. Furthermore, In looking at Data S2 & filtering out for sequences with distance 0 from the parent, I found the most active sequences did not match up with the sequences described as active in this section (e.g. p19 and p20 have a higher topstrand mean over P22, yet are not listed as active top strand sequences).

      We really appreciate you taking the time to examine the supplemental data. We previously listed the parents that had only GFP activity but no RFP activity (P22), and only RFP activity but no GFP activity (P6, P12, P13, P18, P21). We then said that P19 and P20 were bidirectional promoters, because they showed both GFP and RFP activity. In hindsight, we realize that our wording was confusing. We thus rewrote the affected paragraph, such that the bidirectional promoters are now in both lists of GFP/RFP active parents. We also now make the distinction between “templates” which comprise our 25 promoter island fragments, and “parents”, where we treat both strands separately (50 parents total). The paragraph in question now reads (lines 173-187):

      “Because some sequences in our library are unmutated parent sequences, we determined that 10/50 of the parent sequences already encode promoter activity before mutagenesis. Specifically, three parents drove expression on the top strand (P19-GFP, P20-GFP, P22-GFP), and seven did on the bottom strand (P6-RFP, P12-RFP, P13-RFP, P18-RFP, P19-RFP, P20-RFP, P21-RFP). Two templates harbor bidirectional promoters (P19 and P20). The remaining 40 parent sequences are non-promoters, with an average fluorescence score of 1.39 a.u. We note that some of these parents have a fluorescence score higher than 1.39 a.u., but less than 1.50 a.u. such as P8-RFP (1.38 a.u.), P16-RFP (1.39 a.u.), P9-GFP (1.49 a.u.), and P1-GFP (1.47 a.u.). Whether these are truly “promoters” or not, is based solely on our threshold value of 1.5 a.u. We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP). See Fig S4 for fluorescence score distributions for each parent and its daughters, and Data S2 for all daughter sequence fluorescence scores.”

      Please include a supplementary figure showing the different parent expression levels (GFP mean +/- sd). Also, please explain the discrepancy in the 'active sequences' compared to Data S2 or correct my misunderstanding.

      We have added this plot to Figure S4B. The discrepancy arose because we listed the parents that had only GFP activity but no RFP activity (P22), and only RFP activity but no GFP activity (P6, P12, P13, P18, P21). We then said that P19 and P20 were bidirectional promoters, because they showed both GFP and RFP activity. Please see our previous response regarding the ambiguity.

      Line 182: I do not see 'Fuqua and Wagner 2023' in the references (though I am familiar with the preprint).

      We have added Fuqua and Wagner, bioRxiv 2023 to the references.

      Lines 197 - 200: The distribution of hotspot locations should be compared to the distribution of mutations in the library. e.g. It is not notable that 17% of mutations are in -10 motifs if 17% of all mutations are in -10 motifs.

      Thank you for raising this point. To address it, we carried out a computational analysis where we randomly scrambled the nucleotides of each parent sequence while maintaining the coordinates for each mutual information “hotspot.” This scrambling results in significantly less overlap with hotspots and boxes. This analysis is now depicted in Figure 2C and written in lines 272-296.
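
      The scrambling control described above can be sketched as a shuffle test: keep the hotspot coordinates fixed, shuffle the nucleotides of the parent, and count how many motif hits still overlap a hotspot. The sketch below is not the authors' implementation. It replaces their PWM scoring with a simple consensus match that tolerates a fixed number of mismatches, and the function names, the half-open interval convention, and the example sequence are assumptions.

```python
import random

def motif_hits(seq, consensus, max_mismatch=1):
    """Start positions where `seq` matches `consensus` with at most `max_mismatch` mismatches.

    A simplified stand-in for PWM-based motif prediction.
    """
    k = len(consensus)
    return [i for i in range(len(seq) - k + 1)
            if sum(a != b for a, b in zip(seq[i:i + k], consensus)) <= max_mismatch]

def overlaps_hotspot(hits, k, hotspots):
    """Number of motif hits (each spanning k bp) overlapping any half-open [start, end) hotspot."""
    return sum(1 for h in hits
               if any(h < end and h + k > start for start, end in hotspots))

def scrambled_overlaps(seq, consensus, hotspots, n_shuffles=1000, seed=0):
    """Overlap counts for shuffled versions of `seq`, with hotspot coordinates held fixed."""
    rng = random.Random(seed)
    k = len(consensus)
    counts = []
    for _ in range(n_shuffles):
        letters = list(seq)
        rng.shuffle(letters)
        shuffled = "".join(letters)
        counts.append(overlaps_hotspot(motif_hits(shuffled, consensus), k, hotspots))
    return counts

# Hypothetical 10-bp sequence containing the -10 consensus TATAAT at position 2.
observed = overlaps_hotspot(motif_hits("GGTATAATGG", "TATAAT", 0), 6, [(5, 9)])
null_counts = scrambled_overlaps("GGTATAATGG", "TATAAT", [(5, 9)], n_shuffles=100)
```

      Comparing the observed overlap with the distribution of `null_counts` indicates whether hotspots coincide with boxes more often than base composition alone would predict.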

      Lines 253-264: Examples 3B, 3D, and 3F should indicate the spacing between the new and existing motifs. Are these close to the 15-19 bp spacer lengths preferred by sigma70?

      Point well taken. We now annotate the spacing of motifs in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, and S11. We note that in many cases, high-scoring PWM hits for the same motif can overlap (i.e. two -10 motifs or two -35 motifs overlap). Additionally, the proximity of a -35 and -10 box does not guarantee that the two boxes are interacting. Together, these two facts can result in an ambiguity of the spacer size between two boxes. To avoid any reporting bias, we thus often report spacer sizes as a range (see Figure panels 4F, S8D, S8F-L, S9A, S9H, S10A, and S10E). The smallest spacer we annotate is in Figure 4F with 10 bp, and the largest is in Figure S8D with 26 bp. Any more “extreme” distances are not annotated, and it is left to the reader to decide whether an interaction is present.

      Line 255: While fun, I am concerned about the 'Shiko' analogy. My understanding is the prevailing theory is that -35 recognition occurs before -10 recognition (https://doi.org/10.1073/pnas.94.17.9022, 10.1101/sqb.1998.63.141). Given this, the 'Shiko -35' concept in 3H is a bit awkward as it suggests that sigma70 stops at -10 motifs before planting down on the -35. Considering the cited paper is still in the preprint stages (and did not observe these Shiko -35 emergences), I am concerned about how this particular example will be received by the community. Perhaps more care could be done to verify that this example is consistent with generally accepted mechanisms of promoter recognition or a short clarification could be added to clarify the extent of the analogy.

      Thank you for raising this point. We decided to remove the Shiko analogy, because several readers assumed that it relates to the physical binding of RNA polymerase, rather than being an evolutionary mechanism of mutations forming complementary motifs in a stepwise manner.

      Lines 323-326: It would be helpful to describe a more systematic approach to defining emergence events into different categories. A clear definition of each category in the methods or main text would help others consistently refer to these concepts in the future. This could be helped by showing the actual parent vs daughter sequences as a supplementary figure to figures 4B, 4D, & 4G.

      We agree this could have been more clearly communicated. We have addressed this by 1) simplifying the nomenclature of these categories, 2) clearly defining these categories, and 3) showing the actual parent vs daughter sequences in Figure 4, and Supplemental Figures S9, S10, S11, and S12. More specifically:

      (1) Simplifying the nomenclature. We highlight events where gaining new -10 and -35 boxes can modify the promoter activity of parent sequences with promoter activity. This occurs when a new -10 or -35 box appears that partially overlaps with the -10 or -35 box of the actual promoter. Thus, we now use two terms, hetero-gain and homo-gain, shown in Figure 4B.

      (2) We clearly define these categories (lines 430-435):

      “We found that these mutations frequently create new boxes overlapping those we had identified as part of a promoter (Fig S9). This occurs when mutations create a -10 box overlapping a -10 box, a -35 box overlapping a -35 box, a -10 box overlapping a -35 box, or a -35 box overlapping a -10 box. We call the resulting event a “homo-gain” when the new box is of the same type as the one it overlaps, and otherwise a “hetero-gain”. In either case, the creation of the new box does not always destroy the original box.”
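      The definition above can be expressed as a small classification rule. The sketch below is illustrative only: the tuple representation of a box, the half-open coordinates, and the function names are assumptions, not the pipeline used in the manuscript.

```python
def overlaps(a, b):
    """True if two half-open (start, end) intervals share a position."""
    return a[0] < b[1] and b[0] < a[1]

def classify_gain(new_box, promoter_boxes):
    """Classify a newly created box against the boxes of an existing promoter.

    new_box and each promoter box are (kind, start, end) tuples, where kind
    is "-10" or "-35". Returns the set of categories that apply: "homo-gain"
    when the new box overlaps a box of the same kind, "hetero-gain" when it
    overlaps a box of the other kind, and an empty set when nothing overlaps.
    """
    new_kind, new_start, new_end = new_box
    labels = set()
    for kind, start, end in promoter_boxes:
        if overlaps((new_start, new_end), (start, end)):
            labels.add("homo-gain" if kind == new_kind else "hetero-gain")
    return labels

# Hypothetical promoter with a -35 box at 30-36 and a -10 box at 53-59.
promoter = [("-35", 30, 36), ("-10", 53, 59)]
print(classify_gain(("-10", 50, 56), promoter))
print(classify_gain(("-35", 50, 56), promoter))
```

      Returning a set rather than a single label keeps the rule well defined in the hypothetical case where one new box overlaps both boxes of a promoter.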

      In the original manuscript, there was an additional third category, where gaining a -35 box upstream of the promoter’s -35 box, and gaining a -10 box upstream of the promoter’s -10 box decreased expression. We referred to this as a “tandem motif” and it can be found in Figure S12C,D. However, in response to comment “(4) Ignoring or misrepresenting the literature” from Reviewer #3, we carried out an analysis of the binding of H-NS (see Figure 5 and Figure S12). This analysis revealed that this “tandem motif” phenomenon was actually the result of changing the affinity of H-NS to these regions. Thus, the “tandem motif” is probably spurious.

      DISCUSSION:

      Line 378-379: Since hotspots are essentially areas where promoters appear, wouldn't it be obvious that having more hotspots (i.e. areas where more promoters appear) would equate to a higher probability of new promoters? It would be helpful to clarify why this isn't obvious. This could be resolved by adding more complexity to the statement, such as showing that the level of mutual information found in a hotspot or across all hotspots in a sequence is correlated with Pnew.

      A fair criticism. In response, we have chosen to remove the analysis of this trend from the manuscript entirely. (Additionally, Pnew and mutual information calculations both relied on the fluorescence scores of daughter sequences, so the finding was circular in its logic.)

      Line 394-396: This comparison of findings to Bykov et al should include a bit more justification for the proposed mechanism and how it specifically was observed in this paper. What did they observe and how do these findings relate?

      We gladly followed this suggestion, and added the following two paragraphs to the discussion (lines 622-640).

      “A previous study randomly mutagenized the appY promoter island upstream of a GFP reporter, and isolated variants with increased and decreased GFP expression. The authors found that variants with higher GFP expression acquired mutations that 1) improve a -10 box to better match its consensus, and simultaneously 2) destroy other -10 and -35 boxes (Bykov et al., 2020). The authors concluded that additional -10 and -35 boxes repress expression driven by promoter islands. Our data challenge this conclusion in several ways. 

      First, we find that only ~13% of -10 and -35 boxes in promoter islands actually contribute to promoter activity. Extrapolating this percentage to the appY promoter island, ~87% (100% - 13%) of the motifs would not be contributing to its activity. Assuming the appY promoter island is not an outlier, this would insinuate that during random mutagenesis, these inert motifs might have accumulated mutations that do not change fluorescence. Indeed, Bykov et al. (Bykov et al., 2020) also found that a similar frequency of -10 and -35 boxes were destroyed in variants selected for lower GFP expression, which supports this argument. Second, we find no evidence that creating a -10 or -35 box lowers promoter activity in any of our 50 parent sequences. Third, we also find no evidence that destruction of a -10 or -35 box increases promoter activity without plausible alternative explanations, i.e. overlap of the destroyed box with a H-NS site, destruction of the promoter, or simultaneous creation of another motif as a result of the destruction. In sum, -10 and -35 boxes are not likely to repress promoter activity.”

      METHODS:

      Line 500: Could you provide more details on PMR1 (e.g. size, copy number, RBS strength) or a reference? I could not find this easily.

      Thank you for pointing out this oversight. In response, we have added the following subsection to the methods (lines 740-748):

      “Plasmid MR1 (pMR1)

      The plasmid MR1 (pMR1) is a variant of the plasmid RV2 (pRV2) in which the kan resistance gene has been swapped with the cm resistance gene (Guazzaroni and Silva-Rocha, 2014). Plasmid pMR1 encodes the BBa_J34801 ribosomal binding site (RBS, AAAGAGGAGAAA) 6 bp upstream of the start codon for GFP(LVA). The plasmid also encodes a putative RBS (AAGGGAGG) (Cazemier et al., 1999) 5 bp upstream of the start codon for mCherry on the opposite strand.

      The plasmid additionally contains the low-to-medium copy number origin of replication p15A (Westmann et al., 2018).

      A map of the plasmid is available on the Github repository: https://github.com/tfuqua95/promoter_islands.”

      Line 581: What was the sequencing instrument &/or depth?

      We now report this information as follows (Methods, lines 918-922):

      “Illumina sequencing

      The amplicon pool was sequenced by Eurofins Genomics (Eurofins GmbH, Germany) using a NovaSeq 6000 (Illumina, USA) sequencer, with an S4 flow cell, and a PE150 (Paired-end 150 bp) run. In total, 282’843’000 reads and 84’852’900’000 bases were sequenced. Raw sequencing reads can be found here: https://www.ncbi.nlm.nih.gov/bioproject/1071572.”
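      As a quick consistency check on the two reported totals, the base count divides evenly by the read count. The interpretation in the comments (each counted read contributing 2 × 150 bp from a PE150 run) is an assumption and is not stated in the text.

```python
# Totals as reported in the Methods excerpt above.
reads = 282_843_000
bases = 84_852_900_000

# Integer division avoids any floating-point rounding.
bases_per_read, remainder = divmod(bases, reads)
print(bases_per_read, remainder)

# Assumption: 300 bases per counted read corresponds to one 2 x 150 bp pair.
assert bases_per_read == 2 * 150
```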

      SUPPLEMENT:

      Supplementary Figure 2: Why does the GFP control produce a bimodal distribution?

      The GFP+ culture was inoculated directly from a glycerol stock. The bimodal distribution probably results from a subset of the bacteria having lost the GFP-coding insert, because the left-most peak coincides with the negative control.

      Reviewer #2 (Recommendations For The Authors):

      This paper would benefit from a clear definition of what constitutes an active promoter as this is only mentioned as justification for the use of arbitrary values for fluorescence.

      Good point. To clarify, we now include this new paragraph in the introduction (lines 112-119):

      “In this study, we define a promoter as a DNA sequence that drives the expression of a (fluorescent) protein whose expression level, measured by its fluorescence, is greater than a defined threshold. We use a threshold of 1.5 arbitrary units (a.u.) of fluorescence. This definition does not distinguish between transcription and translation. We chose it because protein expression is usually more important than RNA expression whenever natural selection acts on gene expression, because it is the primary phenotype visible to natural selection (Jiang et al., 2023).”
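      The operational definition above amounts to a single strict comparison against the threshold. The sketch below applies it to the four near-threshold parents whose scores are quoted earlier in this response; the function and variable names are assumptions made for illustration.

```python
THRESHOLD_AU = 1.5  # threshold in arbitrary units, as defined in the text

def is_promoter(score, threshold=THRESHOLD_AU):
    """A sequence counts as a promoter only if its score exceeds the threshold."""
    return score > threshold

# Fluorescence scores quoted earlier in this response for near-threshold parents.
near_threshold = {"P8-RFP": 1.38, "P16-RFP": 1.39, "P9-GFP": 1.49, "P1-GFP": 1.47}

calls = {name: is_promoter(score) for name, score in near_threshold.items()}
for name, call in calls.items():
    print(name, "promoter" if call else "non-promoter")
```

      All four parents fall below 1.5 a.u. and are therefore classified as non-promoters, which is why their status depends entirely on the chosen threshold.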

      There needs to be a clear distinction in the use of the word "sequences", as the text often uses it interchangeably for the 25 parent sequences and for the 50 possible sequence directions in which the promoter can act. It is confusing going from one to the other.

      We agree that this distinction is important. To make it clearer, we now introduce an additional term (lines 119-130). Our experiments start from 25 promoter island fragments (P1-P25), which we now call template sequences. Each template sequence comprises both DNA strands. The parent sequences are the top and bottom strands of each template sequence. Therefore, there are now 50 parent sequences (P1-GFP, P1-RFP, P2-GFP…, P25-RFP). By treating each strand as its own sequence, we no longer have to refer to the strand, avoiding the earlier confusion.
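      The relationship between the 25 template sequences and the 50 parent sequences can be sketched as follows. The naming scheme (P1-GFP, P1-RFP, …) and the assignment of GFP to the top strand and RFP to the bottom strand follow the text; representing the bottom-strand parent as the reverse complement of the top strand, and the function names, are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def parents_from_template(name, top_strand):
    """Split one double-stranded template into its two parent sequences."""
    return {
        f"{name}-GFP": top_strand,                      # top strand
        f"{name}-RFP": reverse_complement(top_strand),  # bottom strand
    }

# 25 templates (P1-P25) give 50 parent names.
template_names = [f"P{i}" for i in range(1, 26)]
parent_names = [f"{t}-{reporter}" for t in template_names for reporter in ("GFP", "RFP")]
print(len(parent_names), parent_names[:3], parent_names[-1])
```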

      The description of the hotspots is often unclear and trying to determine if 3 out of 9 hotspots come from one parent sequence or multiple is not possible. A table denoting this information would be most helpful.

      We agree, and now provide this information in Data S3.

      Finally, the description of the proposed mechanism of promoter activation via mutation of motifs should not be in the results but in the discussion, as it has insufficient evidence and would require further experimental validation.

      We remedied this problem by providing experimental validation of the proposed mechanisms. Specifically, we created the precise mutations that caused a loss or gain of a -10 or a -35 box, and measured the level of gene expression they drive with a plate reader. Because we chose to provide this experimental validation, we opted to leave the mechanisms of promoter activation in the results section.

      The (Fuqua and Wagner 2023) paper is not in the references.

      We have added Fuqua and Wagner, bioRxiv 2023 to the references.

      I enjoyed the paper and wish the authors the best for their future work.

      Thank you for taking the time to review our manuscript!

      Reviewer #3 (Recommendations For The Authors):

      The paper has major flaws. For example:

      The data need to be analysed with correct promoter sequence element sequences (TTGACA for the -35 element).

      The discrepancy lies in the frequency of A’s vs C’s at position #5 of the PWM. Our PWM was built with more A’s than C’s at this position, but also includes C’s in this position. However, we respectfully disagree that using a different -35 box PWM is going to change the outcomes of our study. First, positions 4-6 of the PWM barely have any information content (bits) compared to positions 1-3 (see Fig 1A). This assertion is not just based on our own PWM, but based on ample precedent in the literature. In PMID 14529615, TTG is present in 38% of all -35 boxes, but ACA only 8%. In PMID 29388765, with the -10 instance TATAAT, the -35 instance TTGCAA yields stronger promoters compared to the -35 instance TTGACA (See their Figure 3B). In PMID 29745856 (Figure 2), the most information content lies in positions 1-3, with the A and C at position 5 both nearly equally represented, as in our PWM. In PMID 33958766 (Figure 1) an experimentally-derived -35 box is even reduced to a “partial” -35 box which only includes positions 1 and 2, with consensus: TTnnnn. Additionally, the -35 box PWM that we used significantly and strongly correlates with an experimentally derived -35 box (see Supporting Information from Figure S4 of Belliveau et al., PNAS 2017. Pearson correlation coefficient = 0.89). We now provide DNA sequences for each of the figures to improve accessibility and reproducibility. A reader can now use any PWM or method they wish to interpret the data.

      The data need to be analysed taking into account the role of other promoter elements and sequences for translation.

      Point well taken. 

      Thank you for bringing this oversight to our attention. We have performed two independent analyses to explore the role of TGn in promoter emergence in evolution. First, we computationally searched for -10 boxes with the bases TGn immediately upstream of them in the parent sequences, and found 18 of these “extended -10 boxes” in the parents (lines 143-145):

      “On average, each parent sequence contains ~5.32 -10 boxes and ~7.04 -35 boxes (Fig S1). 18 of these -10 boxes also include the TGn motif upstream of the hexamer.”

      However, only 20% of these boxes were found in parents with promoter activity (lines 182-185):

      “We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP).”

      Second, we computationally searched through all of the daughter sequences to identify new -10 boxes with TGn immediately upstream. We found 114 -10 boxes with the bases TGn upstream. However, only 5 new -10 boxes (2 with TGn) were associated with increasing fluorescence (lines 338-345):

      “Mutations indeed created many new -10 and -35 boxes in our daughter sequences. On average, 39.5 and 39.4 new -10 and -35 boxes emerged at unique positions within the daughter sequences of each mutagenized parent (Fig 3A,B), with 1’562 and 1’576 new locations for -10 boxes and -35 boxes, respectively. ~22% (684/3’138) of these new boxes are spaced 15-20 bp away from their cognate box, and ~7.3% (114/1’562) of the new -10 boxes have the TGn motif upstream of them. However, only a mere five of the new -10 boxes and four of the new -35 boxes are significantly associated with increasing fluorescence by more than +0.5 a.u. (Fig 3C,D).”
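      The extended -10 check and the percentages quoted above can be reproduced with a few lines. This is a sketch under stated assumptions: the function name and the example sequences are hypothetical, and the position of the -10 hexamer is taken as given rather than found with a PWM.

```python
def has_tgn_upstream(seq, box_start):
    """True if the three bases immediately upstream of a -10 hexamer that
    starts at index box_start read T, G, then any base (TGn)."""
    if box_start < 3:
        return False  # not enough upstream sequence to hold TGn
    return seq[box_start - 3:box_start - 1] == "TG"

# Hypothetical examples: the hexamer TATAAT starts at index 5 in both.
print(has_tgn_upstream("CCTGATATAATCC", 5))  # TGA precedes the hexamer
print(has_tgn_upstream("CCAGATATAATCC", 5))  # AGA precedes the hexamer

# Counts quoted in the passage above.
new_minus10 = 1562
new_minus35 = 1576
total_new = new_minus10 + new_minus35

pct_spaced = 100 * 684 / total_new    # boxes spaced 15-20 bp from a cognate box
pct_tgn = 100 * 114 / new_minus10     # new -10 boxes with TGn upstream
print(total_new, round(pct_spaced), round(pct_tgn, 1))
```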

      In addition, we now study the role of UP elements. This analysis showed that the UP element plays a negligible role in promoter emergence within our dataset.  It is discussed in a new subsection of the results (lines 591-608).

      “The UP-element does not strongly influence promoter activity in our dataset.

      The UP element is an additional AT-rich promoter motif that can lie upstream of a -35 box in a promoter sequence (Estrem et al., 1998; Ross et al., 1993). We asked whether the creation of UP-elements also creates or modulates promoter activity in our dataset. To this end, we first identified a previously characterized position-weight matrix for the UP element (NNAAAWWTWTTTTNNWAAASYM, PWM threshold score = 19.2 bits) (Estrem et al., 1998) (Fig S13A). We then computationally searched for UP-element-specific hotspots within the parent sequences, i.e., locations in which mutations that gain or lose UP-elements lead to significant fluorescence increases (Mann-Whitney U-test, Fig S7 and methods. See Data S8 for the coordinates, fluorescence changes, and significance). The analysis did not identify any UP elements whose mutation significantly changes fluorescence.

      We then repeated the analysis with a less stringent PWM threshold of 4.8 bits (1/4th of the PWM threshold score). This time, we identified 74 “UP-like” elements that are created or destroyed at unique positions within the parents. 23 of these motifs significantly change fluorescence when created or destroyed. However, even with this liberal threshold, none of these UP-like elements increase fluorescence by more than 0.5 a.u. when gained, or decrease fluorescence by more than 0.5 a.u. when lost (Fig S13B). This finding ultimately suggests that the UP element plays a negligible role in promoter emergence within our dataset.”
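      Scanning a sequence with a PWM and a score threshold in bits, as in the two UP-element analyses quoted above, can be sketched as follows. The three-position matrix is a hypothetical toy and is not the UP-element PWM; the uniform background of 0.25 per base and the function names are likewise assumptions. Only the two thresholds (19.2 bits and one quarter of it) come from the text.

```python
import math

def pwm_score(site, pwm, background=0.25):
    """Log-odds score in bits: sum of log2(p / background) over positions.
    pwm is a list of dicts mapping each base to its probability."""
    return sum(math.log2(column[base] / background) for base, column in zip(site, pwm))

def scan(seq, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    k = len(pwm)
    hits = []
    for i in range(len(seq) - k + 1):
        score = pwm_score(seq[i:i + k], pwm)
        if score >= threshold:
            hits.append((i, score))
    return hits

# Hypothetical toy matrix favoring the site AAT (not the UP-element PWM).
toy_pwm = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]

strict_threshold = 19.2                   # bits, as stated in the text
relaxed_threshold = strict_threshold / 4  # 4.8 bits, the less stringent scan

print(scan("GGAATCC", toy_pwm, relaxed_threshold))
print(scan("GGAATCC", toy_pwm, strict_threshold))
```

      With the toy matrix, the relaxed threshold reports the single AAT window and the strict threshold reports nothing, which mirrors how lowering the threshold admits weaker, "UP-like" matches.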

      Collectively, these additional analyses suggest that the presence of TGn plus a -10 box is insufficient to create promoter activity, and that the UP element does not play a significant role in promoter emergence or evolution.

      The full sequences used need to be provided and mutations resulting in new promoters need to be shown.

      To Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12, we have added the sequences that created or destroyed the promoters, and their PWM scores.

      The paper needs to be rewritten to take into account the relevant literature on i) promoter islands (i.e. sections of horizontally acquired AT-rich DNA) ii) generation and loss of promoters by mutation.

      We have rewritten the introduction. The majority of these points are now addressed in the following two new paragraphs (lines 92-112):

      “Recent work shows that mutations can help new promoters to emerge from promoter motifs or from sequences adjacent to such motifs (Bykov et al., 2020; Fuqua and Wagner, 2023; Yona et al., 2018). However, encoding -10 and -35 boxes is insufficient to drive complete transcription of a gene coding sequence. For instance, the E. coli genome contains clusters of -10 and -35 boxes that are bound by RNA polymerase and produce short oligonucleotide fragments, but rarely create complete transcripts. Such clusters are called promoter islands, and are strongly associated with horizontally-transferred DNA (Bykov et al., 2020; Panyukov and Ozoline, 2013; Purtov et al., 2014; Shavkunov et al., 2009). 

      There are two proposed explanations for why promoter islands do not create full transcripts. First, the TF H-NS may repress promoter activity in promoter islands. This is because in a Δhns background, transcript levels from the promoter islands increase (Purtov et al., 2014). However, mutagenizing a specific promoter island (appY) until it transcribes a GFP reporter reveals that in-vitro H-NS binding does not significantly change when GFP levels increase (Bykov et al., 2020). Thus, it is not clear whether H-NS actually represses the complete transcription of these sequences. The second proposed explanation is that excessive promoter motifs silence transcription. The aforementioned study found that promoter activity increases when mutations improve a -10 box to better match its consensus (TAAAAAT→TATACT), while simultaneously destroying surrounding -10 and -35 boxes (Bykov et al., 2020). However, we note that if these surrounding motifs never contributed to GFP fluorescence to begin with, then mutations could also simply have accumulated in them during random mutagenesis without affecting promoter activity.”

      In closing, we would like to thank all three reviewers again for taking the time to engage with this manuscript.

      Summary of specific changes that we have made to each section of the manuscript 

      • Abstract

      - We updated the abstract to include the finding that more than 1’500 new -10s and -35s are created in our dataset, but only ~0.3% of them actually create de-novo promoter activity.

      - We no longer highlight the conclusion that the majority of promoters emerge and evolve from -10 and -35 boxes.

      • Introduction

      - We have added more background information about the UP-element and the TGn motif.

      - We better describe the promoter islands and the results identified by Bykov et al., 2020.

      • Results: Promoter island sequences are enriched with motifs for -10 and -35 boxes.

      - We clarify how the -10 and -35 PWMs we use were derived.

      - We refer to the 25 promoter island fragments as “Template sequences” (P1-P25). The “parent sequences” now correspond to the top and bottom strands of each template (N=50, P1-GFP, P1-RFP, P2-GFP, …, P25-RFP).

      - We elaborate that ~7% of the -10 boxes in the template sequences have the TGn motif.

      - In the previous version of the manuscript, if there were overlapping -10 boxes or overlapping -35 boxes, we counted these to be a single -10 box or a single -35 box, respectively. In the new version of the manuscript, we now treat each motif as an independent box. Because of this, the number of -10 and -35 boxes per parent has slightly increased.

      • Results: Non-promoters vary widely in their potential to become promoters.

      - We make a clear distinction between promoters and non-promoters, and define the parent sequences.

      - We note that only 20% of parents with an “extended -10 box” have promoter activity.

      • Results: Promoter emergence correlates with minute differences in background promoter levels.

      - We added an analysis where we compare Pnew to the parent fluorescence levels, even if they are below 1.5 a.u. We find that the distribution of Pnew matches a sigmoid function.

      • Results: Promoter emergence does not correlate with simple sequence features

      - We added an analysis comparing k-mer counts to Pnew.

      - We updated the way we count -10 and -35 boxes, and recalculated the correlation with Pnew. The P and R2 values have changed, but Pnew still does not significantly correlate with -10 or -35 box counts.

      • Results: Promoters emerge and evolve only from specific subsets of -10 and -35 boxes

      - We have added an analysis where we computationally scramble the wild-type parent sequences while maintaining the coordinates of the mutual information hotspots. This reveals that the overlap with -10 and -35 motifs is not a coincidence of dense promoter motif encoding.

      - We found a computational error in our analysis and updated the percent overlap between -10 boxes and -35 boxes with mutual information hotspots. The results are similar.

      - 14% of -10 boxes overlap with hotspots with our new way of defining -10 and -35 boxes.

      • Results: New -10 and -35 boxes readily emerge, but rarely lead to de-novo promoter activity

      - We quantify how often a new -10 and -35 box is created at a unique position within our collection of promoter fragments, how often this results in a -10 and -35 box being appropriately spaced, and how often this actually leads to de-novo promoter activity.

      - We quantify how often a TGn sequence lies upstream of a new -10 box.

      • Results: Promoters can emerge when mutations create motifs but not by destroying them.

      - For each example, we added the DNA sequences of the wild-type region of interest and the mutant region of interest that results in the gain of promoter activity, and their respective PWM scores. 

      - We created constructs to validate each example by testing their fluorescence on a plate reader.

      - We removed the P1-GFP example from the main figure, as it was a false-positive in the dataset. It is now in Fig S8.

      - We removed the Shiko Emergence metaphor because it could be confused with a binding mechanism for RNA polymerase.

      • Results – Gaining new motifs over existing motifs increases and decreases promoter activity.

      - We removed the “Tandem motif” because it is more likely caused by H-NS binding.

      - We renamed the mechanisms to be “hetero-gain” and “homo-gain” for simplicity, and clearly define how we classified each sequence into each category.

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the predicted point mutations.

      • Results – Histone-like nucleoid-structuring protein (H-NS) represses P12-RFP and P22-GFP.

      - This is a new analysis, which explores the role of the TF H-NS in repressing the parent sequences. 

      - We identified putative H-NS motifs in P12-RFP and P22-GFP.

      - We show experimentally that in a H-NS null background, a bidirectional promoter (P20) becomes unidirectional, even though P20 does not contain an obvious H-NS motif.

      - In the original version of the manuscript, we describe a phenomenon where gaining a -35 box upstream of a promoter’s -35 box, or a -10 box upstream of a promoter’s -10 box significantly decreases expression. We called this phenomenon a “tandem motif.” However, in the newest version of the manuscript, we find that these fluorescence decreases are rescued in a H-NS null background, suggesting the finding was actually due to H-NS binding modulation and not -10 and -35 boxes.

      • Results – The UP-element does not strongly influence promoter activity in our dataset.

      - We used a PWM for the UP element to test whether gaining or losing UP motifs was significantly associated with increasing or decreasing expression. Even with a liberal PWM threshold, the analysis did not find any UP-like elements that change fluorescence by more than 0.5 a.u.

      • Discussion

      - We rewrote the discussion to account for the new analyses and the results on H-NS, the UP-element, and the extended -10.

      - We better explain how our results clash with the results from the Bykov paper.

      - We fit our results into the context of David Grainger’s papers.

      • Methods

      - Added an explanation about pMR1.

      - Added methods describing how we created the point mutation constructs.

      - Added the methods for the plate reader.

      - Added the methods for Illumina sequencing.

      - Added the methods for the sigmoid curve-fitting.

      • Figure 1

      - Panel E compares how Pnew (the probability of a daughter sequence having a fluorescence score greater than 1.5 a.u.) associates with the fluorescence scores of each parent sequence.

      - Panel F was originally in Figure S5. In the originally submitted version of the manuscript, if there were overlapping -10s or overlapping -35s, we counted these to be a single -10 or a single -35, respectively. In the new version of the manuscript, we now treat each motif as an independent box. Because of this, the r2 and p values have changed, but the conclusions have not (Pnew still does not significantly correlate with -10 or -35 box counts).

      • Figure 2

      - Panel C now includes a stacked barplot showing the percentage of -10 and -35 boxes that overlap with mutual information hotspots when the parent sequences are randomly scrambled computationally.

      • Figure 3

      - Panels A-C were added to explain how we define a new -10/-35 box and how many such new boxes each parent has. These panels also illustrate how we associate the presence or absence of a motif with significant changes in fluorescence scores of the daughter sequences.

      - We moved the example of P1-GFP to Figure S8 because when we tested the specific mutation which leads to gaining the -10 box, fluorescence did not change.

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from reporter constructs harboring the point mutations predicted by our computational analyses.

      - Cartoons of RNA polymerase have been removed.

      • Figure 4

      - The tandem-motif has been removed from the figure.

      - Cartoons of RNA polymerase have been removed.

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure 5

      - This is a new figure analyzing the role of H-NS in promoter evolution and emergence.

      • Figure S4

      - Panel B now shows the wild-type parent scores and their standard deviations from the sort-seq experiment.

      • Figure S5

      - Panels with -10 and -35 box counts moved to Figure 1.

      - The panel comparing Pnew to hotspot counts was removed.

      - Correlations between different k-mers and Pnew are added to panels C-H.

      • Figure S8

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure S9

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure S10

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure S11

      - Added DNA sequences and PWM scores.

      • Figure S12

      - A new figure with further insights about H-NS.

      • Figure S13

      - A new figure regarding the UP-element analysis.

      • Figure S14

      - Added Panel D to show how we created mutant reporter constructs for validation.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Liu et al. present CROWN-seq, a technique that simultaneously identifies transcription-start nucleotides and quantifies N6,2'-O-dimethyladenosine (m6Am) stoichiometry. This method is derived from ReCappable-seq and GLORI, a chemical deamination approach that differentiates A and N6-methylated A. Using ReCappable-seq and CROWN-seq, the authors found that genes frequently utilize multiple transcription start sites, and isoforms beginning with an Am are almost always N6-methylated. These findings are consistently observed across nine cell lines. Unlike prior reports that associated m6Am with mRNA stability and expression, the authors suggest here that m6Am may increase transcription when combined with specific promoter sequences and initiation mechanisms. Additionally, they report intriguing insights on m6Am in snRNA and snoRNA and its regulation by FTO. Overall, the manuscript presents a strong body of work that will significantly advance m6Am research.

      Strengths:

      The technology development part of the work is exceptionally strong, with thoughtful controls and well-supported conclusions.

      We appreciate the reviewer for the very positive assessment of the study. We have addressed the concerns below.

      Weaknesses:

      Given the high stoichiometry of m6Am, further association with upstream and downstream sequences (or promoter sequences) does not appear to yield strong signals. As such, transcription initiation regulation by m6Am, suggested by the current work, warrants further investigation.

      We thank the reviewer for the insightful comments. We have softened the language related to m6Am and transcription regulation. We totally agree with the reviewer that future investigation is required to determine the molecular mechanism behind m6Am and transcription regulation.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript "Decoding m6Am by simultaneous transcription-start mapping and methylation quantification" Liu and co-workers describe the development and application of CROWN-Seq, a new specialized library preparation and sequencing technique designed to detect the presence of cap-adjacent N6,2'-O-dimethyladenosine (m6Am) with single nucleotide resolution. Such a technique was a key need in the field since prior attempts to get accurate positional or quantitative measurements of m6Am positioning yielded starkly different results and failed to generate a consistent set of targets. As noted in the strengths section below the authors have developed a robust assay that moves the field forward.

      Furthermore, their results show that most mRNAs whose transcription start nucleotide (TSN) is an 'A' are in fact m6Am (85%+ for most cell lines). They also show that snRNAs and snoRNAs have a substantially lower prevalence of m6Am TSNs.

      Strengths:

      Critically, the authors spent substantial time and effort to validate and benchmark the new technique with spike-in standards during development, cross-comparison with prior techniques, and validation of the technique's performance using a genetic PCIF1 knockout. Finally, they assayed nine different cell lines to cross-validate their results. The outcome of their work (a reliable and accurate method to catalog cap-adjacent m6Am) is a particularly notable achievement and is a needed advance for the field.

      Weaknesses:

      No major concerns were identified by this reviewer.

      We thank the reviewer for the positive assessment of the method and dataset. We have addressed the concerns below.

      Mid-level Concerns:

      (1) In Lines 625 and 626, the authors state that “our data suggest that mRNAs initate (mis-spelled by authors) with either Gm, Cm, Um, or m6Am.” This reviewer took those words to mean that for A-initiated mRNAs, m6Am was the ‘default’ TSN. This contradicts their later premise that promoter sequences play a role in whether m6Am is deposited.

      We thank the reviewer for the comment. We have changed this sentence to “Instead, our data suggest that mRNAs initiate with either Gm, Cm, Um, or Am, where Am are mostly m6Am modified.” The revised sentence separates the processes of transcription initiation and m6Am deposition, which should avoid confusing the reader.

      (2) Further, the following paragraph (lines 633-641) uses fairly definitive language that is unsupported by their data. For example in lines 637 and 638 they state “We found that these differences are often due to the specific TSS motif.” Simply, using ‘due to’ implies a causative relationship between the promoter sequences and m6Am has been demonstrated. The authors do not show causation, rather they demonstrate a correlation between the promoter sequences and an m6Am TSN. Finally, despite claiming a causal relationship, the authors do not put forth any conceptual framework or possible mechanism to explain the link between the promoter sequences and transcripts initiating with an m6Am.

      (3) The authors need to soften the language concerning these data and their interpretation to reflect the correlative nature of the data presented to link m6Am and transcription initiation.

      For (2) and (3): we have softened the language in the revised manuscript. Specifically, for lines 633-641 of the original manuscript, we have changed “are often due to” to “are often related to”, which claims a correlation rather than causation.

      Reviewer #3 (Public review):

      Summary:

      m6Am is an abundant mRNA modification present on the TSN. Unlike the structurally similar and abundant internal mRNA modification m6A, m6Am’s function has been controversial. One way to resolve controversies surrounding mRNA modification functions has been to develop new ways to better profile said mRNA modification. Here, Liu et al. developed a new method (based on GLORI-seq for m6A-sequencing), for antibody-independent sequencing of m6Am (CROWN-seq). Using appropriate spike-in controls and knockout cell lines, Liu et al. clearly demonstrated CROWN-seq’s precision and quantitative accuracy for profiling transcriptome-wide m6Am. Subsequently, the authors used CROWN-seq to greatly expand the number of known m6Am sites in various cell lines and also determine m6Am stoichiometry to generally be high for most genes. CROWN-seq identified gene promoter motifs that correlate best with high stoichiometry m6Am sites, thereby identifying new determinants of m6Am stoichiometry. CROWN-seq also helped reveal that m6Am does not regulate mRNA stability or translation (as opposed to past reported functions). Rather, m6Am stoichiometry correlates well with transcription levels. Finally, Liu et al. reaffirmed that FTO mainly demethylates m6Am, not of mRNA but of snRNAs and snoRNAs.

      Strengths:

      This is a well-written manuscript that describes and validates a new m6Am-sequencing method: CROWN-seq as the first m6Am-sequencing method that can both quantify m6Am stoichiometry and profile m6Am at single-base resolution. These advantages facilitated Liu et al. to uncover new potential findings related to m6Am regulation and function. I am confident that CROWN-seq will likely be the gold standard for m6Am-sequencing henceforth.

      Weaknesses:

      Though the authors have uncovered a potentially new function for m6Am, they need to be clear that without identifying a mechanism, their data might only be demonstrating a correlation between the presence of m6Am and transcriptional regulation rather than causality.

      We thank the reviewer for the very positive assessment of the CROWN-seq method. We have softened the language regarding the relationship between m6Am and transcription regulation.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      In this work, Qiu and colleagues examined the effects of preovulatory (i.e., proestrous or late follicular phase) levels of circulating estradiol on multiple calcium and potassium channel conductances in arcuate nucleus kisspeptin neurons. Although these cells are strongly linked to a role as the "GnRH pulse generator," the goal here was to examine the physiological properties of these cells in a hormonal milieu mimicking late proestrus, the time of the preovulatory GnRH-LH surge. Computational modeling is used to manipulate multiple conductances simultaneously and support a role for certain calcium channels in facilitating a switch in firing mode from tonic to bursting. CRISPR knockdown of the TRPC5 channel reduced overall excitability, but this was only examined in cells from ovariectomized mice without estradiol treatment.

      Comments to address most recent author response:

      The concern regarding the CRISPR experiments being confined to OVX mice is that the results can only suggest that CRISPR-mediated knockdown of TRPC5 can, at best, phenocopy the OVX+E condition. A reciprocal experiment in the opposite direction (for example, that returning TRPC5 to OVX levels in OVX+E mice prevents the changes in firing activity and pattern typical of the OVX+E2 condition) would strengthen the indication that E2-sensitive changes in TRPC5 expression and function are critically important to surge function. Acknowledging this as a limitation of the studies would help to better contextualize the value of the CRISPR experiments to an understanding of surge mechanisms when done only in OVX conditions.

      We have noted in the manuscript that “It would be of interest in future experiments to do the reciprocal experiment to see if overexpressing Trpc5 channels in Kiss1ARH neurons from OVX + E2 females restores the RMP and “rescues” the synchronization phenotype.”

      The nature of the confusion regarding the consideration of OVX+E2 conditions in the computational model primarily arises from the methods description in the supplemental file: "The effect of E2 on ionic currents is modelled as a change in the maximum conductance parameter. For currents IM,IT, ICa and ITRPC5 this change is inferred from the qPCR data assuming that the conductance is directly proportional to the mRNA expression." If these were instead based on the whole-cell recordings as the authors now indicate in their response, then this description needs to be edited and clarified accordingly. Furthermore, the section states, "For ISK, IBK, Ileak, the OVX and OVX+E2 conductances are obtained from current-voltage relationships recorded from Kiss1ARH neurons in the absence/presence of iberiotoxin (BK blocker) and apamin (SK blocker). All other currents were assumed to be unaffected by E2." This section thus does not directly indicate that the recordings in the stated figures were used in the model, and moreover suggests that currents besides ISK, IBK, and Ileak were not different in OVX+E2 conditions.

      The prior evidence stated for correlation of mRNA and channel conductance is not explicitly cited in the manuscript. It is well known that post-translational modifications, physiological modulation of individual channel biophysical properties, and many other factors can influence the end output of a membrane conductance. Therefore, the authors should, at minimum, provide a literature citation supporting the assumption used here.

      We have rewritten the paragraph on “Modelling the effects of E2” in the Supplemental Information (now Appendix 1) to clarify that the modeling was based on a combination of electrophysiological recordings and the qPCR data presented in this and previous publications. The statement that “all other currents were assumed to be unaffected by E2” was a misstatement and has been deleted. As per the reviewer’s request, we have listed seven publications that document the correlation between mRNA expression and channel conductance for the various channels. We thank the reviewer for the suggestion.
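      The proportionality assumption discussed in this exchange (maximum conductance scaling with relative mRNA expression) can be illustrated with a short sketch. The current names follow the quoted methods text, but the conductance values, fold changes, and function name are hypothetical and are not the parameters of the authors' model.

```python
def scale_conductance(g_ovx: float, mrna_fold_change: float) -> float:
    """Return an OVX+E2 maximum conductance, assuming it is directly
    proportional to the mRNA fold change relative to OVX."""
    if g_ovx < 0 or mrna_fold_change < 0:
        raise ValueError("conductance and fold change must be non-negative")
    return g_ovx * mrna_fold_change


# Hypothetical OVX conductances (arbitrary units) and E2 fold changes.
g_ovx = {"I_T": 0.5, "I_TRPC5": 2.0}
fold_change = {"I_T": 2.0, "I_TRPC5": 0.25}

# Apply the proportionality assumption to each current.
g_e2 = {name: scale_conductance(g_ovx[name], fold_change[name]) for name in g_ovx}
```

      A sketch like this makes the reviewer's point concrete: any post-translational effect that breaks the proportionality would not be captured unless the scaled values are checked against recordings.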

      Reviewer #2 (Public review):

      Summary:

      Kisspeptin neurons of the arcuate nucleus (ARC) are thought to be responsible for the pulsatile GnRH secretory pattern and to mediate feedback regulation of GnRH secretion by estradiol (E2). Evidence in the literature, including the work of the authors, indicates that ARC kisspeptin coordinate their activity through reciprocal synaptic interactions and the release of glutamate and of neuropeptide neurokinin B (NKB), which they co-express. The authors show here that E2 regulates the expression of genes encoding different voltage-dependent calcium channels, calcium-dependent potassium channels and canonical transient receptor potential (TRPC5) channels and of the corresponding ionic currents in ARC kisspeptin neurons. Using computer simulations of the electrical activity of ARC kisspeptin neurons, the authors also provide evidence of what these changes translate into in terms of these cells' firing patterns. The experiments reveal that E2 upregulates various voltage-gated calcium currents as well as 2 subtypes of calcium-dependent potassium currents, while decreasing TRPC5 expression (an ion channel downstream of NKB receptor activation), the slow excitatory synaptic potentials (slow EPSP) elicited in ARC kisspeptin neurons by NKB release and expression of the G protein-associated inward-rectifying potassium channel (GIRK). Based on these results, and on those of computer simulations, the authors propose that E2 promotes a functional transition of ARC kisspeptin neurons from neuropeptide-mediated sustained firing that supports coordinated activity for pulsatile GnRH secretion to a less intense burst-like firing pattern that could favor glutamate release from ARC kisspeptin. The authors suggest that the latter might be important for the generation of the preovulatory surge in females.

      Strengths:

      The authors combined multiple approaches in vitro and in silico to gain insights into the impact of E2 on the electrical activity of ARC kisspeptin neurons. These include patch-clamp electrophysiology combined with selective optogenetic stimulation of ARC kisspeptin neurons, reverse transcriptase quantitative PCR, pharmacology and CRISPR-Cas9-mediated knockdown of the Trpc5 gene. The addition of computer simulations for understanding the impact of E2 on the electrical activity of ARC kisspeptin cells is also a strength.

      The authors add interesting information on the complement of ionic currents in ARC kisspeptin neurons and on their regulation by E2 to what was already known in the literature. Pharmacological and electrophysiological experiments appear of the highest standards and robust statistical analyses are provided throughout. The impact of E2 replacement on calcium and potassium currents is compelling. Likewise, the results of Trpc5 gene knockdown do provide good evidence that the TRPC5 channel plays a key role in mediating the NKB-mediated slow EPSP. Surprisingly, this also revealed an unsuspected role for this channel in regulating the membrane potential and excitability of ARC kisspeptin neurons.

      Weaknesses:

      The manuscript also has weaknesses that obscure some of the conclusions drawn by the authors.

      One is that the authors compare here two conditions, OVX versus OVX replaced with high E2, that may not reflect the physiological conditions under which the proposed transition between neuropeptide-dependent sustained firing and less intense burst firing might take place (i.e. the diestrous [low E2] and proestrous [high E2] stages of the estrous cycle). This is an important caveat to keep in mind when interpreting the authors' findings. Indeed, that E2 alters certain ionic currents when added back to OVX females, does not mean that the magnitude of all of these ionic currents will vary during the estrous cycle.

      We do know that the slow EPSP, which is generated by TRPC5 channels, tracks beautifully with the steroid state of female mice. Using our E2 treatment paradigm that generates an LH surge in OVX females (left panel in Author response image 1), there is no difference in the amplitude of the slow EPSP in proestrous versus OVX + E2 females (right panel in Author response image 1).

      Author response image 1.

      In addition, although the computational modeling indicates a role of the various E2-modulated conductances in causing a transition in ARC kisspeptin neuron firing pattern, their role is not directly tested in physiological recordings, weakening the link between these changes and the shift in firing patterns.

      In future experiments we will test directly the physiological contribution of the other E2-modulated conductances in causing the transition in the firing pattern of arcuate Kiss1 neurons using CRISPR/SaCas9 technology as we have documented for the TRPC5 channel (e.g., Figures 11 and 12).

      Overall, the manuscript provides interesting information about the effects of E2 on specific ionic currents in ARC kisspeptin neurons and some insights into the functional impact of these changes. However, some of the conclusions of the work, with regard, in particular, to the role of these changes in ion channels and to their implications for the LH surge, are not fully supported by the findings.

      ---------

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, Qiu and colleagues examined the effects of preovulatory (i.e., proestrous or late follicular phase) levels of circulating estradiol on multiple calcium and potassium channel conductances in arcuate nucleus kisspeptin neurons. Although these cells are strongly linked to a role as the "GnRH pulse generator," the goal here was to examine the physiological properties of these cells in a hormonal milieu mimicking late proestrus, the time of the preovulatory GnRH-LH surge. Computational modeling is used to manipulate multiple conductances simultaneously and support a role for certain calcium channels in facilitating a switch in firing mode from tonic to bursting. CRISPR knockdown of the TRPC5 channel reduced overall excitability, but this was only examined in cells from ovariectomized mice without estradiol treatment. The manuscript has been substantially improved from the initial version by the addition of new experiments and clarification of important figures. Importantly, the overlap of data with previous reports from the same group has been corrected.

      Strengths:

      (1) Examination of multiple types of calcium and potassium currents, both through electrophysiology and molecular biology.

      (2) Focus on arcuate kisspeptin neurons during the surge is relatively conceptually novel as the anteroventral periventricular nucleus (AVPV) kisspeptin neurons have received much more attention as the "surge generator" population.

      (3) The modeling studies allow for direct examination of manipulation of single and multiple conductances, whereas the electrophysiology studies necessarily require examination of each current in isolation. Construction of an arcuate kisspeptin neuron model promises to be of value to the reproductive neuroendocrinology field.

      Weaknesses:

      A remaining weakness in this revised version of the manuscript is that the relevance of the CRISPR experiments is still rather tenuous given that the goal is to understand what happens in the estrogen-treatment condition, and these experiments were performed only in OVX mice. Similar concerns reflect that the computational model examining the effect of E2 infers multiple conductances based on qPCR data and an assumption that the conductances are directionally proportional to the level of gene expression, and then tunes these to the current recordings obtained from OVX mice, without a direct confirmation in OVX+E2 conditions that the model parameters accurately reflect the properties of these currents in the presence of estrogen.

      We remain puzzled by the Reviewer’s concerns about performing the CRISPR mutagenesis of Trpc5 in OVX+E2 females. Trpc5 channel expression is significantly reduced with E2 treatment (Figure 10E), which we know translates into a minimal slow EPSP (Figure 2, Qiu eLife 2016) that is essentially equivalent to the slow EPSP amplitude following Trpc5 mutagenesis in ovariectomized females (Figure 12). TRPC5 channel conductance is already at “rock bottom.” The modeling informs us that such a low TRPC5 conductance will not support a long-lasting slow EPSP and sustained firing (Figure 13A).

      Also, we respectfully point out that we have published a score of papers over the past 20 years showing that channel conductance does correlate with mRNA expression (e.g., Qiu et al., eLife 2018). In addition, the model does take into consideration the OVX + E2 conditions (Figure 13B,C), which is based on the extensive whole-cell recordings presented in Figures 4, 5, 6, 7, 8 and 9.

      Reviewer #2 (Public Review):

      Summary:

      Kisspeptin neurons of the arcuate nucleus (ARC) are thought to be responsible for the pulsatile GnRH secretory pattern and to mediate feedback regulation of GnRH secretion by estradiol (E2). Evidence in the literature, including the work of the authors, indicates that ARC kisspeptin coordinate their activity through reciprocal synaptic interactions and the release of glutamate and of neuropeptide neurokinin B (NKB), which they co-express. The authors show here that E2 regulates the expression of genes encoding different voltage-dependent calcium channels, calcium-dependent potassium channels and canonical transient receptor potential (TRPC5) channels and of the corresponding ionic currents in ARC kisspeptin neurons. Using computer simulations of the electrical activity of ARC kisspeptin neurons, the authors also provide evidence of what these changes translate into in terms of these cells' firing patterns. The experiments reveal that E2 upregulates various voltage-gated calcium currents as well as 2 subtypes of calcium-dependent potassium currents while decreasing TRPC5 expression (an ion channel downstream of NKB receptor activation), the slow excitatory synaptic potentials (slow EPSP) elicited in ARC kisspeptin neurons by NKB release and expression of the G protein-associated inward-rectifying potassium channel (GIRK). Based on these results, and on those of computer simulations, the authors propose that E2 promotes a functional transition of ARC kisspeptin neurons from neuropeptide-mediated sustained firing that supports coordinated activity for pulsatile GnRH secretion to a less intense burst-like firing pattern that could favor glutamate release from ARC kisspeptin. The authors suggest that the latter might be important for the generation of the preovulatory surge in females.

      Strengths:

      The authors combined multiple approaches in vitro and in silico to gain insights into the impact of E2 on the electrical activity of ARC kisspeptin neurons. These include patch-clamp electrophysiology combined with selective optogenetic stimulation of ARC kisspeptin neurons, reverse transcriptase quantitative PCR, pharmacology and CRISPR-Cas9-mediated knockdown of the Trpc5 gene. The addition of computer simulations for understanding the impact of E2 on the electrical activity of ARC kisspeptin cells is also a strength.

      The authors add interesting information on the complement of ionic currents in ARC kisspeptin neurons and on their regulation by E2 to what was already known in the literature. Pharmacological and electrophysiological experiments appear of the highest standards and robust statistical analyses are provided throughout. The impact of E2 replacement on calcium and potassium currents is compelling. Likewise, the results of Trpc5 gene knockdown do provide good evidence that the TRPC5 channel plays a key role in mediating the NKB-mediated slow EPSP. Surprisingly, this also revealed an unsuspected role for this channel in regulating the membrane potential and excitability of ARC kisspeptin neurons.

      Weaknesses:

      The manuscript also has weaknesses that obscure some of the conclusions drawn by the authors.

      One is that the authors compare here two conditions, OVX versus OVX replaced with high E2, that may not reflect the physiological conditions under which the proposed transition between neuropeptide-dependent sustained firing and less intense burst firing might take place (i.e. the diestrous [low E2] and proestrous [high E2] stages of the estrous cycle). This is an important caveat to keep in mind when interpreting the authors' findings. Indeed, that E2 alters certain ionic currents when added back to OVX females, does not mean that the magnitude of all of these ionic currents will vary during the estrous cycle.

      Unfortunately, mice are a poor reproductive model since female mice do not have a clear follicular (estradiol-driven) phase distinct from the luteal (progesterone-driven) phase. Had we utilized a “proestrous” female, we could not with certainty distinguish between the effects of estradiol versus progesterone on the expression of the calcium and potassium channels that were the focus of this study. Therefore, using our physiological model we can state with confidence that “estradiol elicits distinct firing patterns in arcuate nucleus kisspeptin neurons….”

      Overall, the manuscript provides interesting information about the effects of E2 on specific ionic currents in ARC kisspeptin neurons and some insights into the functional impact of these changes. However, some of the conclusions of the work, with regard, in particular, to the role of these changes in ion channels and their implications for the LH surge, are not fully supported by the findings.

      As we pointed out in the Discussion, the O’Byrne lab has clearly shown the relevance of Kiss1ARH neuronal burst firing and the release of glutamate to its effects on the LH surge:

      “Rather, we postulate that glutamate neurotransmission is more important for excitation of Kiss1AVPV/PeN neurons and facilitating the GnRH (LH) surge with high circulating levels of E2 when peptide neurotransmitters are at a nadir and glutamate levels are high in female Kiss1ARH neurons. Indeed, low frequency (5 Hz) optogenetic stimulation of Kiss1ARH neurons, which only releases glutamate in E2-treated, ovariectomized females (Qiu J. et al., 2016), generates a surge-like increase in LH release during periods of optical stimulation (Lin et al., 2021; Voliotis et al., 2021). In a subsequent study optical stimulation of Kiss1ARH neuron terminals in the AVPV at 20 Hz, a frequency commonly used for terminal stimulation in vivo, generated a similar surge of LH (Shen et al., 2022). Additionally, intra-AVPV infusion of glutamate antagonists, AP5+CNQX, completely blocked the LH surge induced by Kiss1ARH terminal photostimulation in the AVPV (Shen et al., 2022).”

      Recommendations for the authors:

      Reviewer #2 (Recommendations for The Authors):

      The reviewer noted the following in the revised manuscript:

      - page 6, the authors may consider adding that presynaptic effects of blocking calcium channels on the slow EPSP cannot be fully ruled out. Indeed, the added experiments do indicate that some of the effects can be explained by impaired regulation of TRPC5 channels by calcium influx through calcium channels; however, the senktide-induced current is not fully blocked by the broad-spectrum calcium channel inhibitor cadmium, suggesting that the effect of blocking these channels on the slow EPSP may involve other mechanisms, such as presynaptic effects.

      Optogenetic stimulation of all Kiss1ARH neurons induces the release of NKB at “physiological” concentrations, which in turn generates a slow EPSP in the recorded Kiss1ARH neuron. Blocking voltage-gated calcium channels can inhibit the NKB release from presynaptic Kiss1ARH neurons, thereby reducing the amplitude of the slow EPSP. However, in whole-cell recordings of synaptically isolated Kiss1ARH neurons, senktide directly induces a large inward current (Figure 3F), which is generated by the opening of TRPC5 channels (Qiu et al. J. Neurosci 2021). Voltage-gated calcium channels are coupled to the activation of TRPC5 channels (Blair, Kaczmarek and Clapham, J. Gen Physiol 2009), so by blocking voltage-gated calcium channels, cadmium effectively abrogates the facilitating effects of these channels on TRPC5 channel activation and significantly reduces but does not abolish the inward (excitatory) current (Figures 3F-H). We have clarified in the Results (page 6) that the Kiss1ARH neurons were synaptically isolated as depicted in Figures 3F,G.

      - page 8, bottom, the mean value given for the apamin-sensitive current amplitude in E2 treated females does not match that plotted on the I/V graph in Figure 7F.

      Thank you for pointing out this typographical error, which we have corrected.

    1. Author response:

      Reviewer 1 (Public Review)

      (1) The proposed design is not sufficient to answer the research question. The rationale of the study proposed in the introduction is that auditory stimulation may explain the analgesic effects of RPMS. To answer this question, the authors should have used a factorial design using 4 groups (active RPMS + sound; active RPMS + no sound; sham RPMS + sound; sham RPMS + no sound). Using this design, it would have been possible to determine if the sound, the afferent stimulation, or both are necessary to produce analgesia. Rather, they tested two types of RPMS (iTBS, cTBS) without real rationale, one electrical stimulation and a placebo.

      We will clarify that the study was originally designed to determine whether iTBS or cTBS would be more effective at reducing pain. We included TENS as a positive control, and sham as a negative control. We were indeed surprised by the findings, and present them herein. Future RCTs should be performed to reproduce these findings.

      (2) There are multiple ways that the current design could have introduced biases. The study was not randomized but pseudo-randomised. What does that mean? Was there allocation concealment? Were the assessor and data analyst blinded to group allocation? Were intention-to-treat analyses performed? Were the participants adequately blinded (was it measured)?

      This study was not designed as an RCT, but rather as an experimental study. The study was pseudo-randomized to ensure that the groups had equal allocation and distribution of sexes.

      The groups were blinded to the other stimulations (they were not informed of the various arms of the study, through different consent forms).

      It was not possible to blind the experimenter as the iTBS and cTBS protocols are very different (iTBS has multiple bursts separated by brief intervals, whereas cTBS is continuous). The data were masked for analysis, and only unblinded at the final stage. We will update the manuscript to reflect these changes.

      (3) The TENS parameters used were not optimal and are not those commonly used in clinical practice. This could have explained the lack of TENS effects. The lack of TENS effects has not been discussed and it is concerning. If TENS had been effective (as expected), the story about the auditory effects would not have been presented as the primary mechanisms underlying the current results.

      We acknowledge that this is a limitation of the study. A future study should address this. However, for transparency, we will retain the TENS arm.

      (4) No primary outcome has been identified. It is important to mention that the interpretation of results is based on the presence of only one statistically significant result. Pain intensity and pain unpleasantness are not affected. This was not properly addressed in the Discussion. What does that mean that secondary hyperalgesia is affected but not pain?

      We reiterate that this study was not designed as an RCT, but rather as an experimental study. The primary outcome measures were measures of pain sensitivity (pain intensity NRS, pain unpleasantness NRS, and secondary hyperalgesia). We will clarify this in the revised manuscript.

      We will now include discussion of the effects being solely on secondary hyperalgesia, and not on pain intensity and unpleasantness.

      (5a) The use of secondary hyperalgesia variable is concerning. How is it possible to measure secondary hyperalgesia if there is no lesioned tissue?

      Secondary hyperalgesia refers to hyperalgesia assessed in an area adjacent to or remote from the site of stimulation. In general, it is not required to lesion a tissue to activate the nociceptive system or to induce pain. We have cited other studies that have employed secondary hyperalgesia as a pain outcome measure without inducing a lesion.

      Hyperalgesia reflects increased pain on suprathreshold stimulation. To assess it, one measures the subjective response to a painful (i.e. suprathreshold) stimulation, then applies a conditioning stimulation (e.g. heat), and measures the subjective response to the same original stimulus. If the response after conditioning is higher than the baseline measure, hyperalgesia has been induced.
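      The before/after comparison described in this response can be sketched in a few lines. This is only an illustration under the assumption that each response is a single numeric rating to the same test stimulus; the function names and example ratings are hypothetical and do not reflect the study's analysis.

```python
def rating_change(baseline_rating: float, post_rating: float) -> float:
    """Change in the subjective rating of the same stimulus after conditioning."""
    return post_rating - baseline_rating


def hyperalgesia_induced(baseline_rating: float, post_rating: float) -> bool:
    """True if the post-conditioning rating exceeds the baseline rating."""
    return rating_change(baseline_rating, post_rating) > 0


# Hypothetical ratings to the same suprathreshold stimulus,
# before and after a conditioning (e.g. heat) stimulation.
induced = hyperalgesia_induced(baseline_rating=4.0, post_rating=6.0)
```

      In practice a statistical comparison across participants would replace the single-pair check shown here.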

      (5b) If heat creates secondary hyperalgesia without lesion, what does that mean physiologically?

      Secondary hyperalgesia is normally interpreted as a perceptual correlate of central sensitization.

      (5c) Is it a valid and reliable "pain" variable?

      Yes and yes. A noxious heat stimulus can reliably elicit secondary hyperalgesia (see section 3.2 from Quesada et al. 2021). We also cite several studies that have used secondary hyperalgesia as an outcome measure of central sensitization in pain.

      (6) The follow-up study has been designed to cover the RPMS sound using pink noise. However, the pink noise was also present during the PHP measurement. How can we determine whether the absence of change is due to the pink noise during the RPMS or the presence of pink noise during PHP? I don't think this is possible to discriminate.

      We will add a third study that performs the control analysis with the sound of the rPMS masked, but no pink noise otherwise. The study will be performed in two groups: one with pink noise, and one without pink noise.

      Appraisal

      (7) Despite all these potential issues, authors interpret their data with high confidence and with several overstatements in the Title, Abstract, and Discussion. The results do not support their conclusions. The fact that auditory stimulation may produce an analgesic effect is a hypothesis, but the current study cannot ascertain it.

      We believe that the chief concern with the interpretation stems from the second study. The proposed third experiment will address this concern.

      Reviewer 2 (Public Review):

      (1) My biggest concern in this paper is that the stimulation protocols are not applied after pain was induced in the subjects, but before. This is not bad in itself, but as the paper presents the stimulations as potential "treatments" it generates a severe mismatch between the objective, context (introduction), and impact (discussion) presented for the experiments, and how they are actually designed. This adds to the fact that healthy volunteers are used here to generate a study with low translational capability, that aims to be translational and provide an indication for clinics (maybe this is why the reduction in pain intensity caused by PMS when applied in patients, reported in references [29, 35 and 39], is not observed here).

      We will reframe these as prophylaxis, rather than treatment. This study was an experimental study originally designed to determine which stimulation parameters (cTBS or iTBS) would be better suited to modulate pain. We performed the study in healthy individuals undergoing acute pain, akin to a person undergoing a painful procedure, which could lead to central sensitization and pain persistence (e.g., post-surgical pain). However, before testing this in individuals undergoing actual procedures, it is essential to determine efficacy in people before translation.

      Khan et al [29] is a case study with neuropathic pain, whereas our study uses a nociceptive pain model. Lim et al [35] employed 10 sessions of rPMS stimulation in patients with acute low back pain. Similar to our study, the change in VAS driven by rPMS was no different than the sham stimulation. We notice that there is no reference 39, and will correct this.

      (2) TENS treatment duration is simply too short (90s) to be considered a therapeutic TENS intervention. I get that this duration was chosen to match the one of PMS, but TENS is never applied like this in the clinics, in which the duration varies from 10 minutes to an hour (or more). This specific study comparing different durations recommends 40 minutes for knee osteoarthritis pain relief (PMID: 12691335). Under these conditions, this stimulation is more similar to a sham TENS than to a real TENS treatment: I would suggest interpreting it as such. As the paper is right now, it could give the impression that PMS could produce clinical effects not observed in TENS, but while the PMS application resembles a clinical one, the TENS application does not (due to its extremely short duration). As an example, giving paracetamol at a dose 10 times below its effective dose is a placebo, not a paracetamol treatment.

      We acknowledge that this is a limitation, and will address this in the Discussion of the revised manuscript.

      (3) This study measured pain, not central sensitization. Specifically, the effects refer to the area of secondary hyperalgesia. The IASP definition for central sensitization is "Increased responsiveness of nociceptive neurons in the central nervous system to their normal or subthreshold afferent input." (PMID: 32694387). No neuronal results are reported in this article. Therefore, central sensitization is not measured here, and we do not know if it is reduced by sound. This frontally clashes with the title of the article and with many interpretations of the results. For a deep review on this topic, I recommend PMID: 39278607 and the short article PMID: 30416715.

      It is widely accepted that central sensitization is the neurophysiological basis of secondary hyperalgesia (see PMID: 11313449; PMID: 10581220).

      The reviewer is conflating secondary hyperalgesia due to central sensitization and chronic pain. Whether chronic pain is driven or maintained by central sensitization is not the goal of our study. However, there is ample evidence that nociceptive drive can induce plasticity in the CNS, which alters pain sensitivity, and that these changes facilitate pain.

      (4a) There is no mention of blinding/masking/concealing in this manuscript. Was the therapist blind to whether they applied one protocol, another, or a placebo? Were the evaluators blind, as this can heavily influence their measurements? And the volunteers? Was allocation concealed? Was this blinding measured afterwards? Blinding is, together with randomization, the most important methodological feature for those interventional studies. For example, not introducing blinding and concealing directly makes a study lose 4 out of 10 points in the PEDro scale, failing to fulfill criteria 3, 5, 6, and 7 (https://pedro.org.au/english/resources/pedro-scale/).

      This study was not designed as an RCT, but rather as an experimental study. The study was pseudo-randomized to ensure that the groups had equal allocation and distribution of sexes.

      The groups were blinded to the other stimulations (they were not informed of the various arms of the study, through different consent forms). However, blinding was not measured afterwards (again, this was not meant to be an RCT).

      It was not possible to blind the experimenter, as the iTBS and cTBS protocols are very different: iTBS has multiple bursts separated by brief intervals, whereas cTBS is continuous. The data were masked for analysis, and only unblinded at the final stage. We will update the manuscript to reflect these changes.

      (4b) Continuing with methodological considerations, the dropout percentage is high (18% for the first and 25% for the second study), both above the 15% cutoff for criterion 8 of the PEDro, losing another point.

      In Study 1, only 2 participants withdrew after feeling the heat, 2 were lost to follow-up, and 2 had incomplete data. That totals 6/123 in Study 1. In Study 2, all of the participants who met the inclusion/exclusion criteria and were ‘allocated’ to the study were included (0% dropout/data loss).

      We are unsure how to address this point, as we had clear inclusion/exclusion criteria, and these could only be measured after consenting. As this is an experimental study performed on healthy individuals in a university setting, we are not able to collect any study related data prior to consent.

      We openly reported individuals who did not meet the criteria, and thus were excluded. These criteria are a combination of what is required to collect good quality data, and what we are ethically permitted to do. We understand that in an interventional trial, >15% dropout due to intolerance or adverse events would indeed be concerning.

      (5) Data reporting and statistical treatment can be improved, as only differences are reported and regression to the mean is not accounted for in this study. Moreover, baseline levels for the dependent variables (control session) are not accessible for evaluation and they are not compared statistically, making it impossible to know if the groups were similar at baseline. This will imply failing criterion 3 of the PEDro, for a total of 2/10 points.

      This only concerns Study 1, as Study 2 uses a within-subject design. Study 1 provides the raw data in Figure 4. We will provide the raw data for each of the primary outcome measures in a supplemental table in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Park et al. conducted various analyses attempting to elucidate the biological significance of SARS-CoV-2 mutations. However, the study lacks a clear objective. The specific goals of the analyses in each subsection are unclear, as is how the results from these subsections are interconnected. Compiling results from unrelated analyses into a single paper can be confusing for readers. Clarifying the objective and narrowing down the topics would make the paper's purpose clearer.

      The logic of the study is also unclear. For instance, the authors developed an evaluation score, APESS, for analyzing viral sequences. Although they state that the APESS score correlates with viral infectivity, there is no explanation in the results section about why this is the case.

      The structure of the paper should be reconsidered.

      Thank you for your feedback. We have heeded the input that the study lacks a clear objective and made sure that the overall goal of the study is reflected in the Abstract, Results, and Discussion.

      We have revised the Results section so that the specific goals of each subsection are clearer, and we have elaborated on how the components of our study connect to each other. We have addressed these points in more detail in the ‘Recommendations for the authors’ section.

      Thank you for the feedback on APESS, our evaluation model. APESS was created based on properties of SARS-CoV-2 that we discovered in our study. When applying our evaluation model, high APESS scores indicated high infectivity. APESS is calculated from a comprehensive evaluation of SARS-CoV-2 at the nucleotide, amino acid, and protein structure levels.

      The explanations and exact calculations of APESS are given in the Materials and Methods section in line 571, but we should have been more detailed in the Results section as well. We have made sure to properly indicate this in the Results section in line 284.

      Overall, we have made edits to the manuscript that accurately explain our research by amending terms, restructuring arguments, and providing more clarity on the interconnectivity of the research.

      Reviewer #2 (Public review):

      Summary:

      The authors have developed a machine learning tool AIVE to predict the infectivity of SARS-CoV-2 variants and also a scoring metric to measure infectivity. A large number of virus sequences were used with a very detailed analysis that incorporates hydrophobic, hydrophilic, acid, and alkaline characteristics. The protein structures were also considered to measure infectivity and search for core mutations. The study especially focused on the S protein of SARS-CoV-2. The contents of this study would be of interest to many researchers related to this area and the web service would be helpful to easily analyze such data without in-depth bioinformatics expertise.

      Strengths:

      - Analysis of large-scale data.

      - Experimental validation on a partial set of searched mutations.

      - A user-friendly web-based analysis platform that is made public.

      Weaknesses:

      - Complexity of the research.

      Thank you for your kind feedback. Our study explored a wide range of topics including biochemical properties, machine learning, and viral infectivity.

      In presenting our research, we recognize that our comprehensive analysis may have slightly obscured the specific aims and overall objective of the study. We investigated properties in the viral sequences of SARS-CoV-2 and examined big data, clinical data, and expression data to elucidate their effect on viral infectivity. We then used evaluation modeling and in silico and in vitro validation.

      We have clarified the aims of our research and improved upon the flow of the manuscript by adding sentences that outline the goals of our research in the appropriate sub sections of the Results and Discussion sections.

      Reviewer #1 (Recommendations for the authors):

      The abstract should clearly state the backgrounds, objectives, strategies, and findings of this study in an orderly manner.

      Thank you for your feedback. We have restructured the Abstract to better reflect the goals and methods of our study. We start the Abstract by introducing the background of the study ‘An unprecedented amount of SARS-CoV-2 data has been accumulated compared with previous infectious diseases, enabling insights into its evolutionary process and more thorough analyses.’ in line 48. Then we more clearly stated the overall objectives of our research in line 50 as ‘This study investigates SARS-CoV-2 features as it evolves to evaluate its infectivity.’ Then, we clearly defined our specific discoveries in the virus, the purpose of our evaluation model, and how we validated our findings.

      In the Introduction, the message of each paragraph is unclear. Please clearly state the objectives of the study and what was done to achieve these objectives.

      Thank you for the feedback. We have updated the Introduction section to more clearly state the objectives of the study.

      To increase clarity, we have moved ‘Furthermore, hydrophobic properties in the amino acid sequence affect protein folding. Coronavirus hydrophobicity has significant effects on amino acid properties and protein folding.’ to line 127.

      In line 130, we rephrased the first sentence of the paragraph to ‘For these prior approaches to virus analysis and prediction, expertise with the relevant fields is required for a full understanding.’ to better establish the link between the background information and aims of the study. Then in line 134, we added ‘elucidate properties about the virus’ to clarify the aims of the study.

      In line 141, we have improved the clarity of the sentence to better present the scope and objectives of the study.

      The relationship between the sections in the Results is unclear. Clarify why each section is necessary and how they are interconnected.

      We investigated properties in the viral sequences of SARS-CoV-2 that highlighted amino acid substitutions or changes in polarity (Figure 1). In VOCs, we noted trends or absences of amino acid substitutions at specific positions (Figure 2). We examined epidemiological and clinical data to determine the infectivity, severity, and symptomaticity of lineages. Looking at expression data and binding affinity further illuminated the effect of amino acid substitutions (Figure 3). We created APESS, an evaluation model that is comprehensively calculated from the nucleotide, amino acid, and protein structure levels of the virus. Evaluation of lineages revealed that higher APESS scores were associated with higher infectivity (Figure 4). We used in silico and in vitro validation to reinforce our findings, then used machine learning to make predictions on future developments (Figure 5). We created candidate sequences for evaluation and utilized machine learning in predictions (Figure 6).

      We have added explanations to each section in Results that elucidate the objective of each section and how they connect with each other in the wider study.

      In line 157, we have added ‘We examined the amino acid sequences of SARS-CoV-2 to make discoveries about biochemical properties.’ to clearly outline the objective of the subsection.

      In line 207, we have improved the phrasing of the sentence.

      In line 278, we stressed that ‘We developed APESS, an evaluation model to analyze viral sequences based on the nucleotide, amino acid, and protein structure properties.’ to properly define the purpose and background of APESS.

      Please define abbreviations when they first appear.

      We have added the full terms for the stated abbreviations in the relevant sections of the manuscript.

      In line 107, we have added the proper abbreviation for Our World in Data (OWID).

      In lines 143, 175, and 489 we have added the full term for Variants of Concern (VOCs).

      In line 160, we have added the full term for Receptor Binding Motif (RBM).

      Reviewer #2 (Recommendations for the authors):

      (1) pg 9, line 51, full name of RBM should be declared.

      We have added the full name of Receptor Binding Motif (RBM) to the appropriate section in the Abstract.

      (2) How are the Variants of Concern (VOCs) defined?

      Thank you for the comment and we apologize for the confusion. Variants of Concern as defined by the World Health Organization are specified in the Materials and Methods section. We have also added the full name for Variants of Concern (VOCs) when they are first mentioned in the Introduction and Results sections.

      (3) pg 17, line 297. The purpose of using AI/ML to predict amino acid substitutions at specific locations is not clear. The VOCs and related mutation loci were already searched, so the AA substitution prediction step seems a little repetitive. Is it to create customized sequences? Also, if prediction (or probability) was made, some performance evaluation would be helpful.

      Thank you for this feedback. The purpose of utilizing machine learning to make predictions about amino acid substitutions is to assess the possibility of amino acid substitutions occurring at specific locations. These potential amino acid substitutions were evaluated by APESS to have high scores, linking them to high infectivity. As the feedback suggests, amino acid substitutions in VOCs are researched but our prediction sought to ascertain the likelihood of amino acid substitutions that our evaluation model associated with infectivity. In the Results section in line 330, we assessed the probability of amino acid substitutions N460K and Q493R that the study found to be significant. The datasets that we utilized for these predictions are detailed in the Materials and Methods section in line 677.

      The models we trained with machine learning predicted the probability of mutations based on samples in each group and their performance was evaluated by comparing the presence of mutations in the clades they diverged from. We have added the following sentences to line 330: “We used Accuracy, Precision, Recall, and F1 score to evaluate performance. All models showed high performance scores above 0.95 in Precision, Recall, and F1 score. For accuracy, XGBoost, scored above 0.89, exhibiting relatively high performance while LightGBM scored above 0.78.”
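
      For readers unfamiliar with these four metrics, the following is a minimal sketch of how they are conventionally computed from binary predictions. The function name and the toy labels are hypothetical and are not taken from the study's data or code.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = mutation present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

      Reporting all four together is useful because accuracy alone can be misleading when one class is much more frequent than the other.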

      (4) pg 17, line 289. The objective of creating candidate lineages is not clear and would be helpful for the readers if its purpose is elaborated on. Since there are enough SARS-CoV-2 sequences, wouldn't it be more realistic and accurate to use those real sequences instead of creating them? Furthermore, the candidate lineages should be defined but they were missing in this section. This part made it a little difficult to follow the overall paper's logic.

      The manuscript should have been clearer on what ‘candidate lineages’ signified; we apologize for the confusion. In line 314, we included the following sentences for clarity: ‘We introduced amino acid substitutions at specific locations in the SARS-CoV-2 backbone for the wildtype and VOCs. The amino acid substitutions were lysine (K), arginine (R), asparagine (N), serine (S), tyrosine (Y), and glycine (G). We then evaluated the infectivity of these candidate lineages with our evaluation model APESS.’

      The purpose of creating candidate lineages in our study was to assess the effect of specific amino acid substitutions on the virus’ infectivity. The amino acid substitutions we evaluated were lysine (K), arginine (R), asparagine (N), serine (S), tyrosine (Y), and glycine (G). We determined that examining the introduction of specific amino acid substitutions to SARS-CoV-2 sequences would highlight the significance they had on infectivity. We have revised the paragraph in line 314 of the Results section to convey what we were doing.
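
      The construction of candidate sequences described above can be sketched as follows. This is an illustrative assumption about the procedure, not the study's code: the function name, the toy backbone, and the chosen positions are hypothetical, and scoring with APESS is not shown.

```python
# The six substituting residues named in the response.
SUBSTITUTIONS = ("K", "R", "N", "S", "Y", "G")


def candidate_sequences(backbone, positions, substitutions=SUBSTITUTIONS):
    """Yield (label, sequence) for each single substitution at 1-based positions.

    Labels follow the usual notation, e.g. N460K = N at position 460 replaced by K.
    """
    for pos in positions:
        original = backbone[pos - 1]
        for aa in substitutions:
            if aa == original:
                continue  # skip substitutions that leave the sequence unchanged
            mutated = backbone[:pos - 1] + aa + backbone[pos:]
            yield f"{original}{pos}{aa}", mutated


# Hypothetical six-residue backbone, for illustration only.
toy_backbone = "MFVNLQ"
candidates = dict(candidate_sequences(toy_backbone, positions=[4, 6]))
for label, seq in candidates.items():
    print(label, seq)
```

      Each generated sequence would then be passed to the evaluation model, so that the score of a candidate can be compared with that of its unmodified backbone.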

      (5) This study covers very detailed contents regarding lineages, mutations, and their effect on infectivity. It would be more readable if subsections could be added per group of investigation, especially in the results and discussion section.

      In the Results section, we have emphasized the objective of each subsection and how they connect with one another for the overall goals of our study.

      In line 157, we have added ‘We examined the amino acid sequences of SARS-CoV-2 to make discoveries about biochemical properties.’ to clearly outline the objective of the subsection.

      In line 207, we have improved the phrasing of the sentence.

      In line 278, we stressed that ‘We developed APESS, an evaluation model to analyze viral sequences based on the nucleotide, amino acid, and protein structure properties.’ to properly define the purpose and background of APESS.

      We have made edits to the Discussion section to more clearly indicate subsections.

      In line 389, we have added ‘In our investigation of various viruses’ to clearly indicate the background on other viruses.

      In line 409, we added the sentence ‘We made discoveries on specific amino acid substitutions at positions.’ to indicate the subsection talking about N437R, N460K, and D467 mutations.

      In line 471, we added the sentence ‘We created AIVE to feature our findings and analyses on an online platform.’ and modified the following sentence to better explain AIVE.

      (6) pg 26, line 557. The criteria for the SCPSi scores were set to 0.9 and 0.1 by the proportion of the Omicron and Delta variants. How do other criteria affect the performance of the method?

      Thank you for the question and check point. We used 0.9/0.1 for our initial criteria in our SCPS calculation. To determine how that affected performance, we have used 0.8/0.2 and 0.7/0.3 as the criteria.

      After calculating APESS with different SCPS weights (0.9/0.1, 0.8/0.2, 0.7/0.3), we used a Gaussian Mixture Model (GMM) to compare how the groups were divided based on APESS. All three groups with different SCPS weights were determined to accurately reflect data patterns when they had four components.

      When comparing parameter values, the group that used the original weights of 0.9 and 0.1 for SCPS showed the lowest values for variance and standard error across all four components. This indicates that each component was stable and clearly distinguishable from one another.

      The group where the weights were adjusted to 0.7 and 0.3 for SCPS showed significantly higher variance and a large error for the G2 component. The distribution of each component was more widespread, signifying that the stability and reliability were lower.

      The group where the weights were adjusted to 0.8 and 0.2 for SCPS was positioned between the two previous groups for finer data classification and reliability. However, the group notably lacked reliability when it came to the SE values for the G4 component.

      Thus, the original model with 0.9 and 0.1 weight is the most reliable.

      When the Gaussian Density for each group was plotted, the group with 0.9/0.1 SCPS weights showed the highest peak near 2 (G1), with a value of approximately 2. For the group with SCPS 0.8/0.2 weights, the highest peak appeared near 4.2 (G3), showing a high value around 14. For the group with SCPS 0.7/0.3 weights, the highest peak appeared near 3.7 (G3) showing a value around 5. The group with 0.9/0.1 SCPS weights exhibited a more uniform Gaussian distribution compared to the other two.
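
      A "superposition of Gaussian densities" is, in general, the weighted sum of the component densities of the mixture. The sketch below shows how such a curve and its highest peak can be evaluated; the four component weights, means, and standard deviations are hypothetical and are not the fitted values reported in the tables that follow.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(x, components):
    """Weighted sum of Gaussian densities; components = [(weight, mean, sd), ...]."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


# Hypothetical four-component mixture (G1..G4); weights sum to 1.
components = [(0.4, 2.0, 0.2), (0.2, 3.0, 0.3), (0.3, 4.2, 0.25), (0.1, 5.5, 0.4)]

step = 0.01
grid = [i * step for i in range(0, 801)]  # score values from 0 to 8
densities = [mixture_density(x, components) for x in grid]

# Location and height of the highest peak of the superposed curve.
peak_index = max(range(len(grid)), key=densities.__getitem__)
peak_x, peak_height = grid[peak_index], densities[peak_index]

# Riemann-sum check that the mixture integrates to roughly 1 over the grid.
area = sum(densities) * step
print(peak_x, peak_height, area)
```

      Narrow components produce tall peaks, so peak height reflects both the weight and the variance of a component rather than the weight alone.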

      Author response image 1.

      Superposition of Gaussian Densities for SCPS weight 0.9/0.1

      Author response table 1.

      Statistical values of the Superposition of Gaussian Densities for SCPS weight 0.9/0.1

      Author response image 2.

      Superposition of Gaussian Densities for SCPS weight 0.8/0.2

      Author response table 2.

      Statistical values of the Superposition of Gaussian Densities for SCPS weight 0.8/0.2

      Author response image 3.

      Superposition of Gaussian Densities for SCPS weight 0.7/0.3

      Author response table 3.

      Statistical values of the Superposition of Gaussian Densities for SCPS weight 0.7/0.3

      (7) Overall, the approach is very detailed and realistic. Just curious if this approach would be also applicable to other viruses such as influenza.

      We appreciate the insightful comments from the reviewer, and this is a direction we hope to take our research in the future. Our study focused on SARS-CoV-2 and the properties we discovered from the virus’ spike protein interacting with the host’s ACE2 receptor. In our investigation of other coronaviruses such as MERS-CoV and SARS-CoV-1, we found that SARS-CoV-2 possesses a different structure and properties than these viruses, as illustrated in Supplementary Figure 24. We had provided explanations about our investigation of other viruses in the Discussion section. In line 389, we have added ‘In our investigation of various viruses’ to better signpost this section.

    1. Author response:

      In this initial response to the public review, we outline our plan to address the major concerns raised. Below, we provide a general categorization of the suggestions and our corresponding responses.

      Weakness #1: Statistical Concerns - using the number of seizures (rather than the number of animals) may identify small effects that could be insignificant. Effect size should be taken into consideration.

      Reviewer 1:

      “While the data generally supports the authors' conclusions, a weakness of this manuscript lies in their analytical approach where EEG feature-space comparisons used the number of spontaneous or evoked seizures as their replicates as opposed to the number of IHK mice; these large data sets tend to identify relatively small effects of uncertain biological significance as being highly statistically significant.”

      Reviewer 2:

      “In several sections of the paper, the authors argue that two different groups are similar on the basis that no statistical difference was found between the two groups (i.e., p > 0.05); however, the failure to find a statistically significant difference, particularly with relatively small sample sizes, is not rigorous evidence that the two groups are actually similar - they are just "not significantly different.”

      Reviewer 3:

      “(3) The utility of increasing the number of seizures for enhancing statistical power is limited unless the sample size under evaluation is the number of seizures. However, the standard practice is for the sample size to be the number of mice.”

      Reviewer 3:

      “(1) Evaluation of seizure similarity using the SVM modeling and clustering is not sufficiently explained to show if there are meaningful differences between induced and spontaneous seizures. SVM modeling did not include analysis to assess the overfitting of each classifier since mice were modeled individually for classification.”

      We understand the reviewers’ concerns. In this work, we used a linear mixed-effects model to address two levels of variability: between animals and within animals. The interactive linear mixed-effects model shows that most (~90%) of the variability in our data comes from within animals (Residual), the random effect that the model accounts for, rather than between animals. Since variability between animals is low, the model identifies common changes in seizure propagation across animals, while accounting for the variability in seizures within each animal. Therefore, the results we find are of changes that happen across animals, not of individual seizures. We will make text edits to enhance understanding of the linear mixed-effects model.
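
      For readers less familiar with this class of model, a generic random-intercept formulation is sketched below. The symbols and the single fixed-effect term are illustrative assumptions; the model fitted in the manuscript may include further terms (for example, interactions).

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
```

      Here $y_{ij}$ is an EEG feature for seizure $j$ in animal $i$, $x_{ij}$ codes the condition (e.g. induced versus spontaneous), $u_i$ is the animal-level random intercept, and $\varepsilon_{ij}$ is the within-animal residual. Under this two-component assumption, a residual share of roughly 90% corresponds to an intraclass correlation of about 0.1.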

      To address the point raised about similarity, we will explain how the SVM classifier was trained. The purpose of the SVM is not to identify meaningful differences between induced and spontaneous seizures. Rather, it is to classify EEG sections as “seizures” or non-seizures, demonstrating the gross similarity between induced and spontaneous seizures despite minor differences. We will make text clarifications for the SVM model.

      Weakness #2: Clinical and biological significance is unclear.

      Reviewer 1:

      “Furthermore, the clinical relevance of similarly small differences in EEG feature space measurements between seizure-naïve and epileptic mice is also uncertain.”

      Reviewer 2:

      “While the paper may be relevant for the ETSP and contract research organizations (CROs), the paper was not written to attract the interest of biological scientists, even those in this specific area of epilepsy research. It may be of low interest to other neuroscientists… The key issue the authors aim to address is the 30-40% of patients with DRE, but the real problem with DRE patients is not that these people have seizures with no effect of the ASDs; rather, although ASD may reduce seizure burden, these patients continue to have some remaining seizures even after high doses of ASDs, which often leads to adverse effects from the particular ASDs… It remains unclear that the optogenetically induced seizures in this model are better than similarly induced seizures in a naïve animal, and there is no evidence that the model will be useful for finding new ASDs to treat DRE.”

      Reviewer 3:

      “(6) Human epilepsy is extensively heterogeneous in both etiology and individual phenotype, and it may be hard to generalize the approach.”

      Reviewer 2:

      “The authors state that this approach should be used to test for and discover new ASDs for DRE, and also used for various open/closed loop protocols with deep-brain stimulation; however, the paper does not actually discuss rigorously or critically the background literature on other published studies in these areas or how this approach will improve future research for a broader audience than the ETSP and CROs. Thus, it is not clear whether the utility will apply more widely and how extensive a readership will be attracted to this work.”

      We appreciate the reviewer’s concerns. We will revise the manuscript to better emphasize the potential significance of our approach. The on-demand seizure model can be applied to address biologically and clinically relevant questions beyond its utility in drug screening. For example, crossing the Thy1-ChR2 mouse line with genetic epilepsy models, such as Scn1a mutants, could reveal how optogenetic stimulation differentially induces seizures in mutant versus non-mutant mice, providing insights into seizure generation and propagation in Dravet Syndrome. Due to the cellular specificity of optogenetics, we also envision this approach being used to study circuit-specific mechanisms of seizure generation and propagation. Regarding drug-resistant epilepsy (DRE) and anti-seizure drug (ASD) screening, we agree with the reviewer that probing new classes of ASDs for DRE represents the critical goal. However, we believe a full exploration of additional ASD classes and/or modeling DRE lies outside the scope of this manuscript.

      Weakness #3: Definition of Seizure is unclear

      Reviewer 2:

      “Although the figures provide excellent examples of individual electrographic seizures and compare induced seizures in epileptic and naïve animals, it is unclear which criteria were used to identify an actual seizure induced by the optogenetic stimulus, versus a hippocampal paroxysmal discharge (HPD), an "afterdischarge", an "electrophysiological epileptiform event" (EEE, Ref #36, D'Ambrosio et al., 2010 Epilepsy Currents), or a so-called "spike-wave-discharge" (SWD). Were HPDs or these other non-seizure events ever induced using stimulation in animals with IH-KA? A critical issue is that these other electrical events are not actual seizures, and it is unclear whether they were included in the column showing data on "electrographic afterdischarges" in Figure 5 for the studies on ASDs”

      Reviewer 3:

      “(2) The difference between seizures and epileptiform discharges or trains of spikes (which are not seizures) is not made clear.”

      Reviewer 2:

      “The differences between the optogenetically evoked seizures in IH-KA vs naïve mice are interpreted to be due to the "epileptogenesis" that had occurred, but the lesion from the KA-induced injury would be expected to cause differences in the electrically and behaviorally recorded seizures - even if epileptogenesis had not occurred. This is not adequately addressed.”

      Thank you for pointing out the unclear definition of the seizures analyzed. We agree and will revise the text to clarify this issue. In this manuscript, we focused on tonic-clonic seizures. We analyzed animal behavior during evoked events, and a high percentage of induced electrographic events were accompanied by behavioral seizures with a Racine scale of three or above. Regarding epileptogenesis, our model is based on the IHK model, in which spontaneous tonic-clonic seizures occur a few to several days after KA injection. These mice are, by definition, epileptogenic. We will further clarify this methodology in the text.

      Weakness #4: Similarity/Difference with Kindling Not Clear

      Reviewer 2:

      “The authors did not test whether an apparent "kindling" effect, apparently seen in naïve controls, also occurred in animals micro-injected with kainic acid (KA). This effect could cause model instability that might result in variability in response to ASDs. It is not clear whether the number of optogenetically induced seizures in epileptic animals would affect the response to drugs. It is also unclear how much of an improvement the animal model in the present work is over other similar models of TLE, where electrically triggered seizures could simply be applied to one of them.”

      Reviewer 3:

      “(5) It is unlikely that long-term adaptation to CA1-stimulated seizure induction is absent in these mice. A duration of evaluation longer than 16 days is warranted in light of the downward slope at days 13-16 for induced seizures in Figure 4C.”

      We appreciate the reviewer’s comments regarding the “kindling effect” as well as its similarity to the kindling model. We will carefully assess the data and address this in the revised manuscript. In electrical kindling, the activated cellular population is non-specific, including both excitatory and inhibitory neurons. In our model, we specifically activate predominantly excitatory neurons (Thy1-positive neurons), which we observed to participate in convulsant-induced seizures (as demonstrated in Thy1-GCaMP experiments). We consider this specificity an improvement over the kindling model, making our approach more biologically relevant.

      Weakness #5: Time needed to generate model is significant. Unclear if animals were pre-selected

      Reviewer 1:

      “Finally, the multiple surgeries and long timetable to generate these mice may limit the value compared to existing models in drug-testing paradigms.”

      Reviewer 2:

      “The authors offer little mention of other research using animal models of TLE to screen ASDs, of which there are many published studies - many of them with other strengths and/or weaknesses. For example, although Grabenstatter and Dudek (2019, Epilepsia) used a version of the systemic KA model to obtain dose-response data on the effects of carbamazepine on spontaneous seizures, that work required use of KA-treated rats selected to have very high rates of spontaneous seizures, which requires careful and tedious selection of animals. The ETSP has published studies with an intra-amygdala kainic acid (IA-KA) model (West et al., 2022, Exp Neurol), where the authors claim that they can use spontaneous seizures to identify ASDs for DRE; however, their lack of a drug effect of carbamazepine may have been a false negative secondary to low seizure rates. The approach described in this paper may help with confounds caused by low or variable seizure rates. These types of issues should be discussed, along with others.”

      We appreciate the reviewer’s insights. In an existing model investigating spontaneous tonic-clonic seizures (such as the intra-amygdala kainate injection model), the time investment is back-loaded, requiring two to three weeks per condition while counting spontaneous seizures, which may occur only once a day. In contrast, our model requires a front-loaded time investment. Once the animals are set up, we can test multiple drugs within a few weeks, providing significant time savings. Additionally, we did not pre-screen animals in our study. Existing models often pre-select mice with high rates of spontaneous seizures, whereas in our model, seizures can be induced even in animals with few spontaneous seizures. We believe that bypassing the need for pre-screening is a key advantage of our induced seizure model.

      Reviewer 3:

      “(7) No mention or assessment of mouse sex as a biological variable.”

      Thank you for pointing this out. Both female and male animals were included in this study. Epileptic cohort: 7 males, 3 females; naïve cohort: 3 males, 4 females.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Wilson's Disease (WD) is an inherited rare pathological condition due to a mutation in ATP7B that alters mitochondrial structure and dysfunction. Additionally, WD results in dysregulated copper metabolism in patients. These metabolic abnormalities affect the functions of the liver and can result in cholecystitis. Understanding the immune component and its contribution to WD and cholecystitis has been challenging. In this work, the authors have performed single-cell RNA sequencing of mesenchymal tissue from three WD patients and three liver hemangioma patients.

      Strengths:

      The authors describe the transcriptomic alterations in myeloid and lymphoid compartments.

      Weaknesses:

      In brief, this manuscript lacks a clear focus, and the writing needs vast improvement. Figures lack details (or are misrepresented), the results section only catalogs observations, and the discussion needs to focus on their findings' mechanistic and functional relevance. The major weakness of this manuscript is that the authors do not provide a mechanistic link between the absence of ATP7B and NK cells' impaired/altered functions. While the work is of high clinical relevance, there are various areas that could be improved.

      In this study, we reported for the first time that ATP7B mutation and the resulting metabolic abnormalities in hepatocytes cause functional alteration of immune cells in WD patients. We dissected the transcriptional profiles of liver mesenchymal cells and delineated the functional differences of main immune cells in WD patients through scRNA-seq. The NK cell exhaustion and its clinical significance were further demonstrated.

      The underlying mechanism is a central concern of ours. Given that the ATP7B mutation is hepatocyte-specific, its effect on immune cells is most probably mediated by intercellular communication rather than by the direct action of the ATP7B protein. How the ATP7B mutation disturbs metabolic homeostasis in hepatocytes, how metabolic pathways regulate the release of signaling substances, and how those substances act on NK cells all remain to be explained. Covering these questions together with the present findings is beyond the scope of a single article, so this manuscript concentrates on the novel observations.

      We sincerely appreciate the comments. We have improved the manuscript based on your valuable suggestions. The mechanistic study is the subject of our subsequent research. We are actively pursuing it and have found that the ATP7B mutation rewires a certain metabolic pathway in hepatocytes, and that a critical metabolite acts as the mediator causing NK cell exhaustion.

      Reviewer #2 (Public Review):

      Summary:

      Wilson's disease is a rare genetic disorder caused by mutations in the ATP7B gene. Previous studies have documented that ATP7B mutations can disrupt copper metabolism, affecting brain and liver function. In this paper, the authors performed a retrospective clinical study and found that Wilson's disease has a high incidence of cholecystitis. Single-cell RNA-seq analysis revealed changes in the immune microenvironment, including the activation of immune responses and the exhaustion of natural killer cells.

      Strengths:

      A key finding of this study is that the predominant ATP7B gene mutation in the Chinese population is the 2333G>T (p. R778L) mutation. The authors reported associations between Wilson's disease and cholecystitis, as well as the exhaustion of natural killer cells.

      Weaknesses:

      The underlying mechanisms linking ATP7B mutations to cholecystitis and natural killer cell exhaustion remain unclear. Specifically, it is not yet determined whether copper metabolism alterations directly cause cholecystitis and natural killer cell exhaustion, or if these effects are secondary to liver dysfunction.

      In this study, we reported for the first time that ATP7B mutation and the resulting metabolic abnormalities in hepatocytes cause functional alteration of immune cells in WD patients. We dissected the transcriptional profiles of liver mesenchymal cells and delineated the functional differences of main immune cells in WD patients through scRNA-seq, focusing on the NK cell exhaustion and its clinical significance.

      The underlying mechanism is a central concern of ours. Given that the ATP7B mutation is hepatocyte-specific, its effect on immune cells is most probably mediated by intercellular communication, so we are prioritizing the study of this aspect. How the ATP7B mutation disturbs metabolic homeostasis in hepatocytes, how metabolic pathways regulate the release of signaling substances, and how those substances act on NK cells all remain to be explained. Covering these questions together with the present findings is beyond the scope of a single article, so this manuscript concentrates on the novel observations.

      We sincerely appreciate the comments. The mechanistic study is the topic of our follow-up work. We are actively pursuing this research and have found that the ATP7B mutation rewires a certain metabolic pathway in hepatocytes, and that a critical metabolite acts as the mediator causing NK cell exhaustion.

      Reviewer #1 (Recommendations For The Authors):

      Major:

      (1) Abstract. A major portion of this manuscript focuses on non-NK cells. Data that describes NK cell exhaustion is only minimal. Therefore, the authors should modify the abstract.

      Thank you for your valuable suggestion. We have supplemented the description of functional changes in other immune cells, and have modified the abstract (line 31-35).

      (2) Introduction. There are three paragraphs. The first paragraph discusses cholecystitis. However, there are too many repetitions, and the information is unclear. In the second part, the authors discuss NK cells and their exhaustion. The authors do not establish a clear rationale or logic linking NK cells to WD or cholecystitis. In the last paragraph, the authors describe their findings. Their correlation between NK cell exhaustion and the poor healing process of cholecystitis has no direct experimental proof.

      Thank you for your comments. We have deleted the repetitions and rephrased some sentences (line 72-74). Briefly, in the first paragraph, we proposed the significant prognostic value of immune cell dysfunction for cholecystitis. In the second paragraph, we introduced NK cell exhaustion and its potential to predict prognosis of certain diseases. In the third paragraph, we introduced that the liver is a central organ involved in metabolism and immunity, holding a large number of NK cells. Liver pathologies commonly impact the development and outcome of inflammation-associated diseases such as cholecystitis. WD was selected as a research model. In the last paragraph, we introduced our findings from clinical study, scRNA-seq, clinical samples, and bioinformatics analysis, and concluded at the end.

      (3) Results. Overall, the results section lacks clarity and a clear focus. Figure legends need to be significantly detailed. The authors make too many broad statements without any support. The authors also make too many overstatements.

      Thank you for your valuable suggestion. We have corrected the inaccurate statements and refined the figure legends in detail. All the changes are marked in the manuscript, and the related responses are described below.

      Figure 1: No information is provided about the functional impairment of ATP7B protein due to the mutation found in the cohort of Chinese patients. What does 'immune abnormalities' (line 127) mean? What is the relevance of showing liver fibrosis and copper accumulation in the eye in Figure 1c and d, respectively? Total cholesterol concentrations are still within the range in the plasma of WD patients, but the authors call it higher. ECAR has not changed in WD patients, but the authors claim it has (line 117).

      (1) All these gene mutations in WD disable the protein function and cause the same outcome. (2) We have deleted the inappropriate statement. (3) In clinical observation, we found that WD not only causes copper accumulation in hepatocytes, but also leads to a variety of conditions, including liver fibrosis, Kayser-Fleischer rings, and a lower risk of hyperglycemia. We showed these together with the data on cholecystitis incidence. We think these might suggest the significance of intercellular communication between hepatocytes and other cells in the microenvironment. (4) We have deleted the inappropriate statement (line 108-110, 112-113).

      Figure 2: Did the authors use the liver mesenchymal tissue or mesenchymal cells? Figure 2 states that they used mesenchymal cells, different from liver mesenchymal tissue. Numbers within Figure 2b UMAP are not visible. Were the initial T and NK cells annotated as indicated in Figure S2 (CD3D, CD3E, CD3G)? If so, that does not include NK cells.

      (1) The liver mesenchymal cells were used for scRNA-seq. (2) It is possible that the image resolution was reduced when the submission system compressed the files during the merging process. We confirm that the image resolution of all figures meets publishing requirements, and that all characters on the figures are visible. You can download the figure files to view the details. (3) It was an oversight on our part that incomplete cell markers were shown in Figure S2. We have updated the markers (CD3D, CD3E, NKG7), references (Ref #53, #55, and #56), and related figures (Figure 2e, and Figure S2c).

      Figure 3: The authors should change 'Case' to 'WD patients' both in the text and figures. DEGs in Figure 3C indicate a transcriptomic alteration in the B cell compartment, which the authors do not delineate. Also, the rationale and explanation for the CellChat analyses are minimal. Concluding that a change occurred within the TME with minimal data and explanations is unfair.

      Thank you for your comments. (1) We apologize for the confusion caused by the nomenclature and abbreviations in the text and figures. In all scRNA-seq data analysis, presentation, and description, we used specific terms (CASE and CON) to refer to the groups of WD patients and controls, as well as their cell populations. We have now unified the nomenclature throughout the text and defined the terms at first use (line 126-127), avoiding the lowercase form to prevent confusion. (2) We have now compared the expression of key B cell genes between the two groups in the next section “The dysfunction of main immune cells in WD patients” (line 230-235, Figure 4e, Figure S4e). (3) We have described the results of cellular communication in more detail (line 188-194). (4) We have modified the conclusion and all related statements throughout the text (line 29-31, 82-84, 149, 194-195).

      Figure 4: This section deals with multiple cell types with minimal explanations. This section discusses various cell types, but it lacks focus. In particular, the T cell section should be separated and elaborated more in detail.

      (1) In this section, we intended to compare the functions of the main immune cell types, which account for a considerable proportion of cells, instead of just showing differentially expressed genes that provide minimal information. Evaluating functional signatures, which integrate the expression of multiple genes, allows a direct understanding of the final outcome of transcriptional changes. (2) Given that the main functions of T cells did not change significantly and that the more significant changes occurred in innate immunity, the T cell section is relatively short and unsuitable as a separate section.

      Figure 5: What are the distinct subsets of NK cells authors have found in the WD patients and controls? How do these subsets differ between the two groups in numbers and their transcriptomes? The presentation and labeling of Figure 5 and Supplementary Figure 5 need to be vastly improved. The pseudotime presentation in Figure 5b should be presented separately for the patients and the controls. Are the changes in gene expression presented in Figure 5a due to the change in the subset compositions? Figure 5c immuno-staining is not at all visible. A clear explanation should be given for the differences between Figure 5c and Figure 5e, where NKG2A expressions are shown. A better explanation for Figure 5d is required. Did the authors use all the antibodies with the same fluorochrome? If so, what color is that? Can the authors include the individual samples in the bar diagram in Figure 5e? Again, the data in Figure 5 is insufficient to conclude that NK cells are exhausted in WD patients. While the role of changes in the expression of T-BET and EOMES can be related to dysfunction and cellular exhaustion of NK cells, the statement made by the authors needs to be toned down as they do not test with independent experiments.

      (1) The NK cell subsets were clustered by gene expression profile and labeled by their characteristically expressed genes, using a standard algorithm as part of the routine procedure. They cannot be distinguished in clinical samples by one or several genes or by other sorting methods. Thus, we were not able to analyze these subsets in clinical samples. (2) We have supplemented the comparison of numbers and transcriptomes of the three NK subtypes between the two groups (line 268-273). (3) We have checked the figures and confirmed that all characters on the figures are visible. (4) We have separately presented the plot in Figure S5d. (5) We compared the expression levels of the genes presented in Figure 5a between the two groups in the three NK subtypes and supplemented this part (line 264-268). The results were very consistent across the three subtypes, suggesting that the results in the total NK population were contributed by all three subtypes and not driven by a single subset. (6) KLRC1 is also known as NKG2A. We are sorry for not making this clear, and we now use only KLRC1 throughout the text to avoid confusion. We have provided a clearer and more detailed description of Figure 5c, 5d, and 5e (now labeled as Figure 5b, 5c, and 5d), and have included the fluorochrome in Figure 5d (now labeled as Figure 5c) and the individual values in Figure 5e (now labeled as Figure 5d) (line 293-299). (7) In this section, the scRNA-seq data showed upregulated expression of inhibitory receptors, downregulated expression of effector molecules, and impaired NK cell-mediated cytotoxicity in the NK cells of WD patients. We then validated the findings in clinical liver sections and clinical blood samples by mIHC and flow cytometry, respectively. According to recent articles, exhausted NK cells are characterized by decreased production of effector cytokines (e.g., IFNγ), impaired cytolytic activity, downregulated expression of certain activating receptors, and upregulated expression of inhibitory receptors (e.g., 10.3389/fimmu.2017.00760, 10.1038/s41590-018-0132-0, 10.1038/s41467-019-09212-y, 10.1080/2162402X.2016.1264562). Therefore, we concluded that NK cells are exhausted in WD patients. (8) In the part about transcription factors, we kept the description of the objective data and deleted the statement about the contribution of transcription factors to NK exhaustion.

      Figure 6: Data presented in Figure 6 and the conclusion made in this manuscript are predictive. There is no direct testing of ATP7B in NK cells to show the functions of this gene. Extension of this to patient survival is purely speculative. As long as authors state these facts clearly in their text, it can be acceptable. However, they do not extend their conclusions to similar liver diseases.

      ATP7B mutation is hepatocyte-specific, and it does not occur in any immune cells. The function of ATP7B in NK cells was not studied. We found NK exhaustion and a poor prognosis of cholecystitis in WD patients. Given that previous studies have demonstrated that NK exhaustion is correlated with poor liver cancer prognosis, we hypothesized that NK exhaustion contributes to the poor prognosis of cholecystitis. Bioinformatics analyses confirmed our hypothesis and supported the extension of this result to other inflammatory diseases. We have no experimental data, but the result was reliable by bioinformatics methods.

      (4) Discussion: While the authors analyzed multiple cell types, the discussion is primarily focused on NK cells. There is no clear link between copper utilization, NK cell function, and exhaustion that the authors articulate.

      Thank you for your comments. The focus of our study is NK cell exhaustion, which is experimentally proven, so we discussed this aspect. We prioritize the effect of intercellular communication and metabolic alteration on NK cell exhaustion in our follow-up study. Excess copper is released into the circulation in some circumstances in WD patients, but they generally receive long-term de-coppering therapy to maintain intracellular copper at a non-lethal level. Thus, we are not inclined to consider copper a critical factor in this study. In the original manuscript, we mentioned cuproptosis and its potential as a novel target. This was likely to cause ambiguity and misunderstanding, so we deleted this part to state our point of view clearly.

      (5) Supplementary Figures: The presentation and labeling of these figures need to be changed.

      Thank you for your suggestions. We have modified the figures and confirmed that all characters on the figures are visible.

      Reviewer #2 (Recommendations For The Authors):

      It is better to test whether ATP7B mutation can directly affect immune functions.

      Thank you for your suggestions. Given that the ATP7B mutation is hepatocyte-specific, its effect on immune cells is most probably mediated by intercellular communication. Thus, we prioritize the effect of intercellular communication on NK cell exhaustion, and we are actively pursuing this research.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer 1

      We would like to express our gratitude to Reviewer 1 for providing a thorough summary of our work and highlighting its strengths. With regard to the weaknesses, we are committed to improving the manuscript by making the necessary changes. First, we will specify the exact p-value in all cases.

      Regarding the discussion section, we acknowledge the feedback that it may be confusing. In line with the reviewer's suggestion, we will shorten the literature review and highlight our findings.

      Finally, for the preprint we did not include confounders such as HIV infection and ethnicity, as our study population did not exhibit viral infections and comprised only Hispanic individuals. We will provide a more thorough description of the study population and address these characteristics explicitly in both the methods section and the initial part of the results.

      Reviewer 2

      We thank Reviewer 2 for the comments. Although it is true that several papers have described the role of the microbiome in COVID-19 severity, we firmly believe that our current work stands out. There is little information on this association in Mediterranean countries, especially in the south of Spain. In addition, most studies describe microbiota composition in stool or nasopharyngeal samples separately, without investigating potential relationships between them as we do.

      (1) We agree with the reviewer that the sample size is limited. We faced the challenge of collecting the samples during the peak of the COVID-19 pandemic, when doctors and nurses were overwhelmed and not always available to recruit patients according to the inclusion criteria. Despite these constraints, we ensured that all included samples met our specified inclusion criteria and were from subjects with confirmed symptomatology.

      In addition, our main goal was to identify whether severity of the disease could be assessed through microbiota composition. Therefore we did not include a healthy group. Despite not having a large N, our results should be reproducible as they are supported by statistical analysis.

      (2) We thank the reviewer for this comment. Since our original sentence may have lacked clarity, we intend to modify it so that it conveys the intended meaning more effectively.

      Nonetheless, we remain confident in the significance of our findings. Not only have we found a correlation between microbiota and COVID severity, but we have also described how specific bacteria from each condition are associated with key biochemical parameters of clinical COVID infection.

      (3) We appreciate the feedback provided by the reviewer. In this case, we performed 16S analysis because it is more cost-effective than metagenomic approaches. Furthermore, 16S analysis has undergone refinements that ensure comprehensive coverage and depth, along with standardized analysis protocols. Unlike 16S, metagenomic approaches lack software tools such as QIIME that facilitate standardization of analysis, which reduces the reproducibility of results.

      (4) We sincerely appreciate this insightful suggestion. Because simply listing associations between both microbiomes and COVID-19 severity may not be enough, we intend to discuss in our discussion how microbiota composition may be linked to the mechanisms underlying COVID-19 pathogenesis.

      (5) We are grateful for the constructive criticism and intend to rewrite our abstract to enhance clarity. Additionally, we will thoroughly review all figures and their descriptions to ensure accuracy and comprehensibility.

      Reviewer 3

      We acknowledge the comments made by Reviewer 3 and are committed to addressing all identified weaknesses to enhance the quality of our work. We plan to modify the methods section and figures to make them easier to understand.

      Specifically, in the case of Figure 1, we recognize an error in the description of the Bray-Curtis test. We appreciate the comment and will make the necessary changes. The reviewer also made a further observation about the description of Figure 1, which we will revise to make it more accurate.

      For Figure 2, we plan to add a supplementary table showing the abundance of the detected genera. In addition, we will update the manuscript text to clarify how we obtained this result.

      Regarding the clarification about "1% abundance," we want to emphasize that we are referring to relative abundance, where 1 represents 100%. To avoid confusion, we will explicitly state this in both the methods section and figure descriptions. In addition, it is true that the statistical test employed for the analysis is not mentioned in the figure description, and we recognize that the image may be difficult to interpret. Therefore, we will modify the text and add a supplementary table displaying the abundance and p values.
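      The relative-abundance convention described above (values on a 0 to 1 scale, where 1 represents 100%) can be illustrated with a minimal sketch. The genus labels and read counts below are hypothetical placeholders, not data from the study, and the function name is an assumption made for illustration only.

```python
def relative_abundance(counts):
    """Convert raw read counts per genus into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize a sample with zero total reads")
    return {genus: count / total for genus, count in counts.items()}


# Hypothetical read counts for one sample (illustrative values only).
counts = {"Genus_A": 600, "Genus_B": 390, "Genus_C": 10}

fractions = relative_abundance(counts)                      # 0-1 scale, 1 == 100%
percent = {genus: 100 * f for genus, f in fractions.items()}  # 0-100 scale

for genus in counts:
    print(f"{genus}: fraction={fractions[genus]:.3f}, percent={percent[genus]:.1f}%")
```

      On this scale, a genus reported at 0.01 corresponds to 1% of the reads in a sample, which is why stating the scale explicitly in the methods and figure descriptions avoids ambiguity.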

      Furthermore, we agree with the reviewer's suggestion to investigate whether the bacteria identified as potential biomarkers for each condition are specific to their respective severity index or if there is a threshold. Thus, we will reanalyze the data and include a supplementary table with the abundance of each biomarker for each condition. We will also place greater emphasis on these results in our discussion.

      Finally, in response to the reviewer's suggestion, we will revisit the nasopharyngeal-fecal axis section of the discussion. It is well described that COVID-19 induces dysbiosis in both microbiomes. Consequently, we understand that the ratio we have described could be an interesting tool for assessing the development of COVID severity, as it considers alterations in both environments. However, we acknowledge that there may be room for improvement in clarifying the significance of this intriguing finding and its implications.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript from Schwintek and coworkers describes a system in which gas flow across a small channel (10^-4-10^-3 m scale) enables the accumulation of reactants and convective flow. The authors go on to show that this can be used to perform PCR as a model of prebiotic replication.

      Strengths:

      The manuscript nicely extends the authors' prior work in thermophoresis and convection to gas flows. The demonstration of nucleic acid replication is an exciting one, and an enzyme-catalyzed proof-of-concept is a great first step towards a novel geochemical scenario for prebiotic replication reactions and other prebiotic chemistry.

      The manuscript nicely combines theory and experiment, which generally agree well with one another, and it convincingly shows that accumulation can be achieved with gas flows and that it can also be utilized in the same system for what one hopes is a precursor to a model prebiotic reaction. This continues efforts from Braun and Mast over the last 10-15 years extending a phenomenon that was appreciated by physicists and perhaps underappreciated in prebiotic chemistry to increasingly chemically relevant systems and, here, a pilot experiment with a simple biochemical system as a prebiotic model.

      I think this is exciting work and will be of broad interest to the prebiotic chemistry community.

      Weaknesses:

      The manuscript states: "The micro scale gas-water evaporation interface consisted of a 1.5 mm wide and 250 µm thick channel that carried an upward pure water flow of 4 nl/s ≈ 10 µm/s perpendicular to an air flow of about 250 ml/min ≈ 10 m/s." This was a bit confusing on first read because Figure 2 appears to show a larger channel - based on the scale bar, it appears to be about 2 mm across on the short axis and 5 mm across on the long axis. From reading the methods, one understands the thickness is associated with the Teflon, but the 1.5 mm dimension is still a bit confusing (and what is the dimension in the long axis?) It is a little hard to tell which portion (perhaps all?) of the image is the channel. This is because discontinuities are present on the left and right sides of the experimental panels (consistent with the image showing material beyond the channel), but not the simulated panels. Based on the authors' description of the apparatus (sapphire/CNC machined Teflon/sapphire) it sounds like the geometry is well-known to them. Clarifying what is going on here (and perhaps supplying the source images for the machined Teflon) would be helpful.

      We understand. We will update the figures to better show the dimensions of the experimental chamber. We will also add a more complete figure to the supplementary information. Part of the complexity of the chamber stems from the fact that the same design has also been used to create defined temperature gradients. These are not needed here, so the chamber is more complex than this experiment requires.

      We added the scheme of the whole PTFE chip to the top left corner of Figure 2, indicating the ROI shown in the fluorescence micrographs. Additionally, the channel walls are now clearly indicated by white dotted lines. The dimensions of the setup are now shown more clearly by indicating the total width of the channel, its height up to the gas flux channel, and its depth. We changed the figure caption accordingly, and it now reads: “[…] The PTFE chip cutout in the top left corner shows the ROI used for the micrographs. The color scale is equal for both simulation and experiment and Channel dimensions are 4 x 1.5 x 0.25 mm as indicated. Dotted lines visualize the location of the channel walls. […]”

      The data shown in Figure 2d nicely shows nonrandom residuals (for experimental values vs. simulated) that are most pronounced at t~12 m and t~40-60m. It seems like this is (1) because some symmetry-breaking occurs that isn't accounted for by the model, and perhaps (2) because of the fact that these data are n=1. I think discussing what's going on with (1) would greatly improve the paper, and performing additional replicates to address (2) would be very informative and enhance the paper. Perhaps the negative and positive residuals would change sign in some, but not all, additional replicates?

      To address this, we will show two more replicates of the experiment and include them in Figure 2.

      We are seeing two effects when we compare fluorescence measurements of the experiments.

      Firstly, degassing of water causes the formation of air-bubbles, which are then transported upwards to the interface, disrupting fluorescence measurements. This, however, mostly occurs in experiments with elevated temperatures for PCR reactions, such as displayed in Figure 4.

      Secondly, due to the high surface tension of water, the interface is quite flexible. As the inflow and evaporation work to balance each other, the shape of the interface adjusts, leading to alterations in the circular flow fields below.

      Thus the conditions, while overall being in steady state, show some fluctuations. The strong dependence on interface shape is also seen in the simulation. However, modeling a dynamic interface shape is not so easy to accomplish, so we had to stick to one geometry setting. Again here, the added movies of two more experiments should clarify this issue.

      We performed three more replicates of the experiment and included the averaged data points together with their respective standard deviation as error bars in Figure 2d. Additionally, the videos of each individual repeat are now added to the supplementary files for the reader to better understand where the strong fluctuations around half an hour come from. The figure caption was adjusted to: “[…] The maximum relative concentration of DNA increased within an hour to ~30 X the initial concentration, with the trend following the simulation. Error bars are the standard deviation from four independent measurements. […]”

      The main text was also changed to better explain how the fluctuations impact the measurements: […] Water continuously evaporated at the interface, but nucleic acids remained in the aqueous phase accumulating near the interface. They could only escape downward either by diffusion or by the vortex induced by the gas flowing across the interface, pushing the molecules back deeper into the bulk (See the flow lines in Fig2(b) taken from the simulation).  As the gas flow continuously removed excess vapor, the evaporation rate remained constant. Thus, except for fluctuations, a stable interface shape should be expected. However, due to the high surface tension of water, the interface is very flexible. As the inflow and evaporation work to balance each other, the shape of the interface adjusts, likely in response to small fluctuations in gas pressure and spatial variations in water surface tension. This is leading to alterations in the circular flow fields below (Supplementary Movie 2).

      As these fluctuations are difficult to simulate, we decided to stick with one interface shape, matching evaporation and inflow speeds. The evaporation rate at the interface was therefore set to be proportional to the vapor concentration gradient and varied spatially along the interface between 5 and 10.5 µm/s (See Suppl. Fig. VI.1(d)). Using the known diffusion coefficient of 95 µm²/s for the 63mer [9], the simulation closely matched the experimental results. In both cases, DNA accumulated in regions with circular flow patterns driven by the gas flux (Fig.2(b), right panel).

      5 minutes after starting the experiment, the maximum DNA accumulation was 3-fold, while after one hour of evaporation, around 30-fold accumulation was observed. Due to molecules residing in very shallow volumes when directly at the interface, the fluorescence signal can vary drastically compared to measurements deeper in the bulk. This can be seen in the fluctuations between independent measurements (See Supplementary Movies 2b,2b,2c), especially around 0.5 h shown in Figure 2(d). The simulated maximum accumulation followed the experimental results and starts saturating after about one hour (Fig.2(d)). […]”
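
The competition between evaporation-driven inflow and back-diffusion described above can be illustrated with a rough length-scale estimate. This is a minimal sketch, not the authors' simulation: it assumes a one-dimensional quasi-steady balance between advection toward the interface and diffusion away from it, which gives an exponential concentration profile with decay length D/v, and it ignores the gas-driven vortex entirely. The function name is hypothetical, and the numbers are the diffusion coefficient and evaporation speeds quoted in the text.

```python
# 1D estimate of how far the accumulated DNA layer extends below the interface.
# Assumption: advective flux toward the interface (speed v) is balanced by
# diffusion (coefficient D), so the concentration decays as exp(-z * v / D)
# with distance z from the interface. The gas-driven vortex is neglected.

D_UM2_PER_S = 95.0             # diffusion coefficient of the 63mer (from the text)
EVAPORATION_SPEEDS = (5.0, 10.5)  # um/s, range along the interface (from the text)

def decay_length_um(diffusion_um2_per_s, speed_um_per_s):
    """Characteristic thickness D/v of the accumulation layer in micrometres."""
    return diffusion_um2_per_s / speed_um_per_s

layer_thickness = [decay_length_um(D_UM2_PER_S, v) for v in EVAPORATION_SPEEDS]
for v, length in zip(EVAPORATION_SPEEDS, layer_thickness):
    print(f"v = {v} um/s -> D/v = {length:.1f} um")
```

Under these simplifying assumptions the layer would be only about 9 to 19 µm thick, which is consistent with the remark that molecules directly at the interface reside in very shallow volumes.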

      The authors will most likely be familiar with the work of Victor Ugaz and colleagues, in which they demonstrated Rayleigh-Bénard-driven PCR in convection cells (10.1126/science.298.5594.793, 10.1002/anie.200700306). Not including some discussion of this work is an unfortunate oversight, and addressing it would significantly improve the manuscript and provide some valuable context to readers. Something of particular interest would be their observation that wide circular cells gave chaotic temperature profiles relative to narrow ones and that these improved PCR amplification (10.1002/anie.201004217). I think contextualizing the results shown here in light of this paper would be helpful.

      Thanks for pointing this out and reminding us. We apologize. We agree that the chaotic trajectories within Rayleigh-Bénard convection cells lead to temperature oscillations similar to the salt variations in our gas-flux system. Although the convection-driven PCR in Rayleigh-Bénard is not isothermal like our system, it provides a useful point of comparison and context for understanding environments that can support full replication cycles. We will add a section comparing approaches and giving some comparison into the history of convective PCR and how these relate to the new isothermal implementation.

      We added a main text paragraph after the last paragraph in section “Strand Separation Dynamics”: “[…] Rayleigh-Bénard convection cells generate similar patterns to those seen in Fig. 3(c). The oscillations in salt concentration resemble the temperature fluctuations observed in convection-based PCR reactions from earlier studies [32,33], which showed that chaotic temperature variations, compared to periodic ones, enhanced the efficiency of the PCR reaction. […]”

      Again, it appears n=1 is shown for Figure 4a-c - the source of the title claim of the paper - and showing some replicates and perhaps discussing them in the context of prior work would enhance the manuscript.

      We appreciate the reviewer for bringing this to our attention. We will now include the two additional repeats for the data shown in Figure 4c, while the repeats of the PAGE measurements are already displayed in Supplementary Fig. IX.2. Initially, we chose not to show the repeats in Figure 4c due to the dynamic and variable nature of the system. These variations are primarily caused by differences at the water-air interface, attributed to the high surface tension of water. Additionally, the stochastic formation of air bubbles in the inflow—despite our best efforts to avoid them—led to fluctuations in the fluorescence measurements across experiments. These bubbles cause a significant drop in fluorescence in a region of interest (ROI) until the area is refilled with the sample.

      Unlike our RNA-focused experiments, PCR requires high temperatures and degassing a PCR master mix effectively is challenging in this context. While we believe our chamber design is sufficiently gas-tight to prevent air from diffusing in, the high surface-to-volume ratio in microfluidics makes degassing highly effective, particularly at elevated temperatures. We anticipate that switching to RNA experiments at lower temperatures will mitigate this issue, which is also relevant in a prebiotic context.

      The reviewer’s comments are valid and prompt us to fully display these aspects of the system. We will now include these repeats in Figure 4c to give readers a deeper understanding of the experiment's dynamics. Additionally, we will provide videos of all three repeats, allowing readers to better grasp the nature of the fluctuations in SYBR Green fluorescence depicted in Figure 4c.

      The data from the triplicates are now added to Figure 4c, showing how air bubbles, forming through degassing at the high temperatures required for Taq polymerase, disrupt the measurement, as they momentarily dry off the channel and stop the reaction until the channel fills again. Figure caption has been adapted and now reads: “[…] Dotted lines show the data from independent repeats. Air bubbles formed through degassing can momentarily disrupt the reaction. […]”

      We additionally changed the main text to explain the experimental difficulties to the reader: “[…] In other repetitions of the reaction, this increase was sometimes even observed earlier, around the one-hour mark (dotted lines). However, air bubbles nucleated by degassing events rise and temporarily dry out the channel, interrupting the reaction until the liquid refills the channel (Supplementary Movies 4,4b,4c & 5). Despite our best efforts, we were unable to fully prevent this, especially given the high temperatures required for Taq polymerase activity. In an identical setting when the gas- and water flux were switched off, no fluorescence increase was found (See Fig. 4(c) red lines). Fluorescence variations are additionally caused by fluctuations in the position of the gas-water interface, as discussed earlier. […]”

      I think some caution is warranted in interpreting the PCR results because a primer-dimer would be of essentially the same length as the product. It appears as though the experiment has worked as described, but it's very difficult to be certain of this given this limitation. Doing the PCR with a significantly longer amplicon would be ideal, or alternately discussing this possible limitation would be helpful to the readers in managing expectations.

      This is a good point and should be discussed more in the manuscript. Our gel electrophoresis is capable of distinguishing between replicate and primer dimers. We know this since we were optimizing the primers and template sequences to minimize primer dimers, making it distinguishable from the desired 61mer product. That said, all of the experiments performed without a template strand added did not show any band in the vicinity of the product band after 4h of reaction, in contrast to the experiments with template, presenting a strong argument against the presence of primer dimers.

      We added a main text section explaining this to the reader: “[…]Suppl. Fig. IX.2 shows all independent repeats of the corresponding experiments. No product was detected in any of these cases, ruling out reaction limitations such as primer dimer formation. Primer dimers would form even in the absence of a template strand and would be identifiable through gel electrophoresis. As Taq polymerase requires a significant overlap between the two dimers to bind, this would result in a shorter product compared to the 61mer used here.  […]”

      Reviewer #2 (Public review):

      Schwintek et al. investigated whether a geological setting of a rock pore with water inflow on one end and gas passing over the opening of the pore on the other end could create a non-equilibrium system that sustains nucleic acid reactions under mild conditions. The evaporation of water as the gas passes over it concentrates the solutes at the boundary of evaporation, while the gas flux induces momentum transfer that creates currents in the water that push the concentrated molecules back into the bulk solution. This leads to the creation of steady-state regions of differential salt and macromolecule concentrations that can be used to manipulate nucleic acids. First, the authors showed that fluorescent bead behavior in this system closely matched their fluid dynamic simulations. With that validation in hand, the authors next showed that fluorescently labeled DNA behaved according to their theory as well. Using these insights, the authors performed a FRET experiment that clearly demonstrated the hybridization of two DNA strands as they passed through the high Mg++ concentration zone, and, conversely, the dissociation of the strands as they passed through the low Mg++ concentration zone. This isothermal hybridization and dissociation of DNA strands allowed the authors to perform an isothermal DNA amplification using a DNA polymerase enzyme. Crucially, the isothermal DNA amplification required the presence of the gas flux and could not be recapitulated using a system that was at equilibrium. These experiments advance our understanding of the geological settings that could support nucleic acid reactions that were key to the origin of life.

      The presented data compellingly supports the conclusions made by the authors. To increase the relevance of the work for the origin of life field, the following experiments are suggested:

      (1) While the central premise of this work is that RNA degradation presents a risk for strand separation strategies relying on elevated temperatures, all of the work is performed using DNA as the nucleic acid model. I understand the convenience of using DNA, especially in the latter replication experiment, but I think that at least the FRET experiments could be performed using RNA instead of DNA.

      We understand the request only partially. The modification introduced by the two dye molecules in the FRET probe, which allows us to probe salt concentrations by melting, is of course much larger than the change of the backbone from RNA to DNA. For this reason we used the much more stable DNA construct, which is also manufactured at a lower cost and in much higher purity, including the modifications. We think the melting temperature characteristics of RNA and DNA in this range are sufficiently well known that we can use DNA instead of RNA for probing the salt concentration in our flow cycling.

      Only under extreme conditions of pH and salt, and especially under alkaline conditions, is RNA degradation through transesterification several orders of magnitude faster than the spontaneous degradative mechanisms acting upon DNA [Li, Y., & Breaker, R. R. (1999). Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2′-hydroxyl group. Journal of the American Chemical Society, 121(23), 5364-5372.]. The work presented in this article is however focussed on hybridization dynamics of nucleic acids. Here, RNA and DNA share similar properties regarding the formation of double strands and their respective melting temperatures. While RNA has been shown to form more stable duplex structures exhibiting higher melting temperatures compared to DNA [Dimitrov, R. A., & Zuker, M. (2004). Prediction of hybridization and melting for double-stranded nucleic acids. Biophysical Journal, 87(1), 215-226.], the general impact of changes in salt, temperature and pH [Mariani, A., Bonfio, C., Johnson, C. M., & Sutherland, J. D. (2018). pH-Driven RNA strand separation under prebiotically plausible conditions. Biochemistry, 57(45), 6382-6386.] on respective melting temperatures follows the same trend for both nucleic acid types. The diffusive properties of RNA and DNA are also very similar [Baaske, P., Weinert, F. M., Duhr, S., Lemke, K. H., Russell, M. J., & Braun, D. (2007). Extreme accumulation of nucleotides in simulated hydrothermal pore systems. Proceedings of the National Academy of Sciences, 104(22), 9346-9351.].

      Since this work is a proof of principle for the discussed environment being able to host nucleic acid replication, we aimed to avoid second order effects such as degradation by hydrolysis by using DNA as a proxy polymer. This enabled us to focus on the physical effects of the environment on local salt and nucleic acid concentration. The experiments performed with FRET are used to visualize local salt concentration changes and their impact on the melting temperature of dissolved nucleic acids.  While performing these experiments with RNA would without doubt cover a broader application within the field of origin of life, we aimed at a step-by-step / proof of principle approach, especially since the environmental phenomena studied here have not been previously investigated in the OOL context. Incorporating RNA-related complexity into this system should however be addressed in future studies. This will likely require modifications to the experimental boundary conditions, such as adjusting pH, temperature, and salt concentration, to account for the greater duplex stability of RNA. For instance, lowering the pH would reduce the RNA melting temperature [Ianeselli, A., Atienza, M., Kudella, P. W., Gerland, U., Mast, C. B., & Braun, D. (2022). Water cycles in a Hadean CO2 atmosphere drive the evolution of long DNA. Nature Physics, 18(5), 579-585.].

      (2) Additionally, showing that RNA does not degrade under the conditions employed by the authors (I am particularly worried about the high Mg++ zones created by the flux) would further strengthen the already very strong and compelling work.

      Based on literature values for hydrolysis rates of RNA [Li, Y., & Breaker, R. R. (1999). Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2′-hydroxyl group. Journal of the American Chemical Society, 121(23), 5364-5372.], we estimate RNA to have a half-life of multiple months under the deployed conditions in the FRET experiment (high concentration zones contain <1 mM of Mg2+). Additionally, dsRNA is multiple orders of magnitude more stable than ssRNA with regards to degradation through hydrolysis [Zhang, K., Hodge, J., Chatterjee, A., Moon, T. S., & Parker, K. M. (2021). Duplex structure of double-stranded RNA provides stability against hydrolysis relative to single-stranded RNA. Environmental Science & Technology, 55(12), 8045-8053.], improving RNA stability especially in zones of high FRET signal. Furthermore, at the neutral pH deployed in this work, RNA does not readily degrade. In previous work from our lab [Salditt, A., Karr, L., Salibi, E., Le Vay, K., Braun, D., & Mutschler, H. (2023). Ribozyme-mediated RNA synthesis and replication in a model Hadean microenvironment. Nature Communications, 14(1), 1495.], we showed that the lifetime of RNA under conditions reaching 40 mM Mg2+ at the air-water interface at 45°C was sufficient to support ribozymatically mediated ligation reactions in experiments lasting multiple hours.

      With that in mind, gaining insight into the median Mg2+ concentration across multiple averaged nucleic acid trajectories in our system (see Fig. 3c&d) and numerically convoluting this with hydrolysis dynamics from literature would be highly valuable. We anticipate that longer residence times in trajectories distant from the interface will improve RNA stability compared to a system with uniformly high Mg2+ concentrations.

      We added a new Supplementary section for this. We used the trace from Figure 3(c) and calculated the hydrolysis rate for each timestep using literature values for RNA [Li, Y., & Breaker, R. R. (1999). Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2′-hydroxyl group. Journal of the American Chemical Society, 121(23), 5364-5372.]. We conclude that the conditions deployed for the experiment are not harsh on RNA, with hydrolysis rates on the order of 10⁻⁶ min⁻¹. The figure below (also now in the supplementary information) shows the hydrolysis of RNA under the conditions of the experiment in Figure 3. RNA is not expected to hydrolyze under these conditions on the timescales in which a replication reaction would occur. With a half-life of around 83 days, even a prebiotically plausible – very slow – replication reaction would not be constrained by hydrolysis in this scenario.

      We now reference this supplementary section in the main text: […] In the experimental conditions used here, RNA would also not readily degrade, even if the strand enters the high salt regimes (See Suppl. Sec. IX). Using literature values for hydrolysis rates under the deployed conditions, we estimate dissolved RNA to have a half life of around 83 days. […]
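
The relation between the quoted half-life and the quoted rate regime can be checked with first-order kinetics. The sketch below is only a consistency check under stated assumptions: it treats hydrolysis as a single effective first-order process, the function names are hypothetical, and the 4 h window is borrowed from the reaction time mentioned for the PCR experiments. The per-timestep rates from the actual trajectory in Figure 3(c) are not reproduced here; a constant rate stands in for them.

```python
import math

# First-order decay: N(t) = N0 * exp(-k * t), so half-life = ln(2) / k.
# Assumption: one effective rate constant describes hydrolysis of the strand.

MINUTES_PER_DAY = 24 * 60

def rate_from_half_life(half_life_days):
    """Effective first-order rate constant (1/min) for a given half-life."""
    return math.log(2) / (half_life_days * MINUTES_PER_DAY)

def surviving_fraction(rates_per_min, dt_min):
    """Intact fraction after stepping through (possibly varying) rates.

    Each entry in rates_per_min applies for one timestep of length dt_min,
    mirroring a per-timestep evaluation along a trajectory.
    """
    return math.exp(-sum(k * dt_min for k in rates_per_min))

k_eff = rate_from_half_life(83.0)                 # half-life quoted in the text
intact_after_4h = surviving_fraction([k_eff] * 240, 1.0)

print(f"k = {k_eff:.2e} 1/min")
print(f"intact after 4 h: {intact_after_4h:.4f}")
```

A half-life of 83 days corresponds to about 5.8 x 10⁻⁶ min⁻¹, which agrees with the stated rate regime. At that rate, more than 99.8% of the RNA would remain intact over a 4 h experiment.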

      (3) Finally, I am curious whether the authors have considered designing a simulation or experiment that uses the imidazole- or 2′,3′-cyclic phosphate-activated ribonucleotides. For instance, a fully paired RNA duplex and a fluorescently-labeled primer could be incubated in the presence of activated ribonucleotides +/- flux and subsequently analyzed by gel electrophoresis to determine how much primer extension has occurred. The reason for this suggestion is that, due to the slow kinetics of chemical primer extension, the reannealing of the fully complementary strands as they pass through the high Mg++ zone, which is required for primer extension, may outcompete the primer extension reaction. In the case of the DNA polymerase, the enzymatic catalysis likely outcompetes the reannealing, but this may not recapitulate the uncatalyzed chemical reaction.

      This is certainly on our to-do list for future experiments in this setting. Our current focus is on templated ligation rather than templated polymerization and we are working hard to implement RNA-only enzyme-free ligation chain reaction, based on more optimized parameters for the templated ligation from 2’3’-cyclic phosphate activation that was just published [High-Fidelity RNA Copying via 2′,3′-Cyclic Phosphate Ligation, Adriana C. Serrão, Sreekar Wunnava, Avinash V. Dass, Lennard Ufer, Philipp Schwintek, Christof B. Mast, and Dieter Braun, JACS doi.org/10.1021/jacs.3c10813 (2024)]. But we first would try this at an air-water interface which was shown to work with RNA in a temperature gradient [Ribozyme-mediated RNA synthesis and replication in a model Hadean microenvironment, Annalena Salditt, Leonie Karr, Elia Salibi, Kristian Le Vay, Dieter Braun & Hannes Mutschler, Nature Communications doi.org/10.1038/s41467-023-37206-4 (2023)] before making the jump to the isothermal setting we describe here. So we can understand the question, but it was good practice also in the past to first get to know the setting with PCR, then jump to RNA.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Could the authors comment on the likelihood of the geological environments where the water inflow velocity equals the evaporation velocity?

      This is an important point to mention in the manuscript, thank you for pointing that out. To produce a defined experiment, we pushed the water out with a syringe pump, regulated so that our flow rate matched the evaporation. We imagine that a real system will self-regulate the inflow of the water column through a more complex geometry of the gas flow, automatically matching the evaporation with the reflow of water. The interface would either move closer to the gas flux or recede, depending on whether the inflow exceeds or falls short of the evaporation rate. As the interface moves closer, evaporation speeds up, while moving away slows it down. This dynamic process stabilizes the system, with surface tension ultimately fixing the interface in place.

      We have already seen a bit of this dynamic in the experiments, but so far could not find a good geometry within our 2-dimensional, constant-thickness design to make it work for a longer time. A 3-dimensional reservoir of water with less frictional forces would very likely be able to do this, but this would require a full redesign towards multi-thickness microfluidics. The more we think about it, the more we envisage implementing the next version of the experiment with a real porous volcanic rock inside a humidity chamber that simulates a full 6h prebiotic day. We would then lose much of the reproducibility of the experiment, but would likely gain a way for recondensation of water by dew on a cold morning to refill the water reservoirs in the rocks. Sorry that I am digressing towards future experiments.

      We added a paragraph after the second paragraph in Results and Discussion.

      It now reads: […] For a real early Earth environment we envision a system that self-regulates the water column's inflow by automatically balancing evaporation with capillary flows. The interface adjusts its position relative to the gas flux, moving closer if the inflow is less than the evaporation rate, or receding if it exceeds it. When the interface nears the gas flux, evaporation accelerates, while moving it away slows evaporation. This dynamic process stabilizes the system, with surface tension ultimately fixing the interface's position. […]

      (2) Could the authors speculate on using gases other than ambient air to provide the flux and possibly even chemical energy? For example, using carbonyl sulfide or vaporized methyl isocyanide could drive amino acid and nucleotide activation, respectively, at the gas-water interface.

      This is an interesting prospect for future work with this system. We thought also about introducing ammonia for pH control and possible reactions. We were amazed in the past that having CO2 instead of air had a profound impact on the replication and the strand separation [Water cycles in a Hadean CO2 atmosphere drive the evolution of long DNA, Alan Ianeselli, Miguel Atienza, Patrick Kudella, Ulrich Gerland, Christof Mast & Dieter Braun, Nature Physics doi.org/10.1038/s41567-022-01516-z (2022)]. So going more in this direction absolutely makes sense and as it acts mostly on the length-selectively accumulated molecules at the interface, only the selected molecules will be affected, which adds to the selection pressure of early evolutionary scenarios.

      Of course, in the manuscript, we use ambient air as a proxy for any gas, focusing primarily on the energy introduced through momentum transfer and evaporation. We speculate that soluble gasses could establish chemical gradients, such as pH or redox potential, from the bulk solution to the interface, similar to the Mg2+ accumulation shown in Figure 3c. The nature of these gradients would depend on each gas's solubility and diffusivity. We have already observed such effects in thermal gradients [Keil, L. M., Möller, F. M., Kieß, M., Kudella, P. W., & Mast, C. B. (2017). Proton gradients and pH oscillations emerge from heat flow at the microscale. Nature communications, 8(1), 1897.] and finding similar behavior in an isothermal environment would be a significant discovery.

      Added a paragraph in the Conclusion to showcase this: [… ] Furthermore we expect that other gases, such as CO2, could establish chemical gradients in this environment. Such gradients have been observed in thermal gradients before [23] and finding similar behaviour in an isothermal environment would be a significant discovery.[…]

      (3) Line 162: Instead of "risk," I suggest using "rate".

      Thanks for pointing this out! Will be changed.

      Fixed.

      (4) Using FRET of a DNA duplex as an indicator of salt concentration is a decent proxy, but a more direct measurement of salt concentration would provide further merit to the explicit statement that it is the salt concentration that is changing in the system and not another hidden parameter.

      Directly observing salt concentration using microscopy is a difficult task. While there are dyes that change their fluorescence depending on the local Na+ or Mg2+ concentration, they are not operating differentially, i.e. by making a ratio between two color channels. Only then we are not running into artifacts from the dye molecules being accumulated by the non-equilibrium settings. We were able to do this for pH in the past, but did not find comparable optical salt sensors. This is the reason we ended up with a FRET pair, with the advantage that we actually probe the strand separation that we are interested in anyhow. Using such a dye in future work would however without a doubt enhance the understanding of not only this system, but also our thermal gradient environments.

      (5) Figure 3a: Could the authors add information on "Dried DNA" to the caption? I am assuming this is the DNA that dried off on the sides of the vessel but cannot be sure.

      Thanks to the reviewer for pointing this out. This is correct and we will describe this better in the revised manuscript.

      Added a sentence in the caption to address this: […] Fluctuations in interface position can dry and redissolve DNA repeatedly (see “Dried DNA” in right panel). […]

      (6) Figure 4b and c: How reproducible is this data? Have the authors performed this reaction multiple independent times? If so, this data should be added to the manuscript.

      The gel electrophoresis was performed in triplicate and is shown in full in the supplementary information. The data in c is hard to reproduce, as the interface is not static and thus ROI measurements are difficult to average over repeats. Including the data from the independent repeats will however give the reader insight into some of the experimental difficulties, such as air bubbles, which form from degassing as the liquid heats up and travel upwards to the interface, disrupting the ongoing fluorescence measurements.

      This was also pointed out by reviewer 1 and addressed there.

      (7) Line 256: "shielding from harmful UV" statement only applies to RNA oligomers as UV light may actually be beneficial for earlier steps during ribonucleoside synthesis. I suggest rephrasing to "shielding nucleic acid oligomers from UV damage.".

      Will be adjusted as mentioned.

      Fixed.

      (8) The final paragraph in the Results and Discussion section would flow better if placed in the Conclusion section.

      This is a good point and we will merge results and discussion closer together.

      Fixed.

      (9) Line 262, "...of early Life" is slightly overstating the conclusions of the study. I suggest rephrasing to "...of nucleic acids that could have supported early life."

      This is a fair comment. We thank the reviewer for his detailed analysis of the manuscript!

      Changed the phrase to: […]In this work we investigated a prebiotically plausible and abundant geological environment to support the replication of nucleic acids. […]

      (10) In references, some of the journal names are in sentence case while others are in title case (see references 23 and 26 for example).

      Thanks - this will be fixed.

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This study provides compelling evidence that RAR, rather than its obligate dimerization partner RXR, is functionally limiting for chromatin binding. This manuscript provides a paradigm for how to dissect the complicated regulatory networks formed by dimerizing transcription factor families.

      Dahal and colleagues use advanced SMT techniques to revisit the role of RXR in DNA-binding of the type-2 nuclear receptor (T2NR) RAR. The dominant consensus model for regulated DNA binding of T2NRs posits that they compete for a limited pool of RXR to form an obligate T2NR-RXR dimer. Using advanced SMT and proximity-assisted photoactivation technologies, Dahal et al. now test the effect of manipulating the endogenous pool size of RAR and RXR on heterodimerization and DNA-binding in live U2OS cells. Surprisingly, it turns out that RAR, rather than RXR, is functionally limiting for heterodimerization and chromatin binding. By inference, the relative pool size of various T2NRs expressed in a given cell, rather than RXR, is likely to determine chromatin binding and transcriptional output.

      The conclusions of this study are well supported by the experimental results and provide unexpected novel insights into the functioning of the clinically important class of T2NR TFs. Moreover, the presented results show how the use of novel technologies can put long-standing theories on how transcription factors work upside down. This manuscript provides a paradigm for how to further dissect the complicated regulatory networks formed by T2NRs or other dimerizing TFs. I found this to be a complete story that does not require additional experimental work. However, I do have some suggestions for the authors to consider.

      Reviewer #1 (Recommendations For The Authors):

      (1) Does the increased chromatin binding measured when the RAR levels are increased reflect a higher occupancy of a similar set of loci, or are additional loci bound? The authors could discuss this issue in the context of the published literature. Obviously, this could be addressed experimentally by ChIP-seq or a similar analysis, but this would extend beyond the main topic of this manuscript.

      We attempted to explore this experimentally using ChIP-seq with multiple RAR- and RXR-specific antibodies. Unfortunately, our results were inconclusive, as the antibody enrichment relative to the IgG control was insufficient for reliable interpretation. Specifically, our ChIP-seq enrichment levels were only around 1.5-fold, while the accepted standard for meaningful ChIP enrichment is typically at least 2-fold. Due to these technical limitations, we decided to defer these experiments for now.

      However, we agree with the reviewer that understanding whether the increased chromatin binding of RAR reflects higher occupancy at the same set of loci or binding to additional loci is a key question. In similar experiments involving the transcription factor TFEB (Esbin et al., 2024, Genes Dev, doi: 10.1101/gad.351633.124), in which the SMT bound fraction increased, both scenarios (higher occupancy at known loci and binding to additional loci) were observed by ChIP-seq. Addressing this intriguing possibility in future studies focused on RAR and RXR would therefore be interesting.

      (2) The results presented suggest convincingly that endogenous RXR is normally in excess to its binding partners (in U2OS cells). This point could be strengthened further by reducing RXR levels, e.g., by knocking out 1 allele or the use of shRNAs (although the latter method might be too hard to control). Overexpression of another T2NR might also help determine the buffer capacity of RXR.

      We appreciate the reviewer’s acknowledgment that our results convincingly demonstrate that endogenous RXR is typically in excess relative to its binding partners in U2OS cells. We agree that this conclusion could be further reinforced by experiments such as overexpression of another T2NR to test RXR's buffering capacity. We are actively pursuing follow-up experiments involving overexpression of additional T2NRs to address this question in more detail. These studies are ongoing, and we plan to explore the buffer capacity of RXR more extensively in a future manuscript.

      (3) The ~10% difference in fbound of RAR and RXR (in Figs 1 and 2), while they should be 1:1 dimers, is explained by invoking the expression of RXR isoforms. Can the authors be more specific concerning the nature of these isoforms?

      We have provided detailed information about different T2NRs expressed in U2OS cells according to the Expression Atlas and the Human Protein Atlas Database in Supplementary Table S1. Table S1 specifically shows that both isoforms of RXRα and RXRβ are expressed in U2OS cells. Additionally, the caption of Table S1 explicitly notes the presence of isoform RXRβ in U2OS cells. In the main text, we reference Table S1 when discussing the 10% difference in fbound between RARα and RXRα, and we have now suggested that the expression of RXRβ likely accounts for the observed discrepancy.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript "Surprising Features of Nuclear Receptor Interaction Networks Revealed by Live Cell Single Molecule Imaging", Dahal et al combine fast single molecule tracking (SMT) with proximity-assisted photoactivation (PAPA) to study the interaction between RARa and RXRa. The prevalent model in the nuclear receptor field suggests that type II nuclear receptors compete for a limiting pool of their partner RXRa. Contrary to this, the authors find that over-expression of RARa but not RXRa increases the fraction of RXRa molecules bound to chromatin, which leads them to conclude that the limiting factor is the abundance of RARa and not RXRa. The authors also perform experiments with a known RARa agonist, all trans retinoic acid (atRA) which has little effect on the bound fraction. Using PAPA, they show that chromatin binding increases upon dimerization of RARa and RXRa.

      Strengths:

      In my view, the biggest strength of this study is the use of endogenously tagged RARa and RXRa cell lines. As the authors point out, most previous studies used either in vitro assays or over-expression. I commend the authors on the generation of single-cell clones of knock-in RARa-Halo and Halo-RXRa. The authors then carefully measure the abundance of each protein using FACS, which is very helpful when comparing across conditions. The manuscript is generally well written and figures are easy to follow. The consistent color-scheme used throughout the manuscript is very helpful.

      Weaknesses:

      (1) Agonist treatment:

      The authors test the effect of all trans retinoic acid (atRA) on the bound fraction of RARa and RXRa and find that "These results are consistent with the classic model in which dimerization and chromatin binding of T2NRs are ligand independent." However, all the agonist treatments are done in media containing FBS. FBS is not chemically defined and has been found to have between 10 and 50 nM atRA (see references in PMID 32359651 for example). The addition of 1 nM or 100 nM atRA is unlikely to result in a strong effect since the medium already contains comparable or higher levels of agonist. To test their hypothesis of ligand-independent dimerization, the authors should deplete the media of atRA by growing the cells in a medium containing charcoal-stripped FBS for at least 24 hours before adding agonist.

      We acknowledge the reviewer's concern regarding the presence of atRA in FBS and agree that it may introduce baseline levels of agonist. However, in our experiments, both the 1 nM and 100 nM atRA treatments resulted in observable changes in RAR expression levels (Figure S3C). Additionally, the luciferase assays demonstrated that 100 nM atRA significantly increased retinoic acid-responsive promoter activity (Figure S1C). Given these clear responses to atRA, we believe the observed lack of effect on the chromatin-bound fraction cannot be attributed to the presence of comparable or higher levels of atRA in the FBS, as the reviewer suggests. Moreover, since our results align with the established literature and do not impact the core findings of our study, we decided not to pursue the suggested experiments with charcoal-stripped FBS in this manuscript.  

      (2) Photobleaching and its effect on bound fraction measurements:

      The authors discard the first 500 to 1000 frames due to the high localization density in the initial frames. This will preferentially discard bound molecules that will bleach in the initial frames of the movie and lead to an over-estimation of the unbound fraction.

      For experiments with over-expression of RAR-Halo and Halo-RXR, the authors state that the cells were pre-bleached and that these frames were used to calculate the mean intensity of the nuclei. When pre-bleaching, bound molecules will preferentially bleach before the diffusing population. This will again lead to an over-representation of the unbound fraction since this is the population that will remain relatively unaffected by the pre-bleaching. Indeed, the bound fraction for over-expressed RARa and RXRa is significantly lower than that for the corresponding knock in lines. To confirm whether this is a biological result, I suggest that the authors either reduce the amount of dye they use so that this pre-bleaching is not necessary or use the direct reactivation strategy they use for their PAPA experiments to eliminate the pre-bleaching step.

      As for the measurement of the nuclear intensity, since the authors have access to multiple HaloTag dyes, they can saturate the HaloTagged proteins with a high concentration of JF646 or JFX650 to measure the mean intensity of the protein while still using the PA-JFX549 for SMT. Together, these will eliminate the need to prebleach or discard any frames.

      The Janelia Fluor dyes used in our experiments are known for their high photostability (Grimm et al., 2021, JACS Au, doi: 10.1021/jacsau.1c00006). During the initial 80 ms imaging to calculate the mean nuclear intensity, the laser power was kept at very low intensity (~3%) for a brief duration (~10 seconds), in contrast to the high-intensity (~100%) used during the tracking experiments, which span around 3 minutes. This low-power illumination does not induce significant photobleaching but merely puts the dyes in a temporary dark state. Therefore, this pre-bleaching step closely resembles the direct reactivation strategy employed in our PAPA experiments.

      To further address the reviewer's concern, we performed a frame cut-off analysis for our SMT movies of endogenous RARα-Halo and over-expressed RARα-Halo (Figure S9B). The analysis shows no significant change in the bound fraction of either endogenous or over-expressed RARα-Halo when discarding the initial 1000 frames. Based on these results, we conclude that the pre-bleaching does not lead to an overestimation of the unbound fraction, and that our experimental approach is robust.

      (3) Heterogeneous expression of the SNAP fusion proteins:

      The cell lines expressing SNAP tagged transgenes shown in Fig S6 have very heterogeneous expression of the SNAP proteins. While the bulk measurements done by Western blotting are useful, while doing single-cell experiments (especially with small numbers - ~20 - of cells), it is important to control for expression levels. Since these transgenic stable lines were not FACS sorted, it would be helpful for the reader to know the spread in the distribution of mean intensities of the SNAP proteins for the cells that the SMT data are presented for. This step is crucial while claiming the absence of an effect upon over-expression and can easily be done with a SNAPTag ligand such as SF650 using the procedure outlined for the over-expressed HaloTag proteins.

      We agree with the reviewer that there is heterogeneity in SNAP protein expression across the transgenic lines. In response to the reviewer’s suggestion, we performed the proposed experiment to assess the distribution of mean intensities for two key experimental conditions: Halo-RXRα with overexpressed RARα-SNAP and Halo-RXRα with overexpressed RARαRR-SNAP. These results again confirm that the increase in chromatin-bound fraction of Halo-RXRα is observed only in the presence of RARα capable of heterodimerizing with RXRα, supporting our main conclusion (Figure S9).

      For these experiments, we followed the same labelling procedure described in the methods section for tracking endogenous Halo-tagged proteins alongside transgenic SNAP proteins. As shown in Figure S9, for ~ 70 cell nuclei, the distribution of mean intensities is similar for both conditions, with the bound fraction of Halo-RXRα significantly increasing in the presence of RARα-SNAP compared to RARαRR-SNAP. This analysis underscores that the observed effects are indeed due to the functional differences between the two RARα variants rather than variability in expression levels.

      (4) Definition of bound molecules:

      The authors state that molecules with a diffusion coefficient less than 0.15 µm²/s are considered bound and those between 1-15 µm²/s are considered unbound. Clarification is needed on how this threshold was determined. In previous publications using saSPT, the authors have used a cutoff of 0.1 µm²/s (for example, PMID 36066004, 36322456). Do the results rely on a specific cutoff? A diffusion coefficient by itself is only a useful measure of normal diffusion. Bound molecules are unlikely to be undergoing Brownian motion, but the state array method implemented here does not seem to account for non-normal diffusive modes. How valid is this assumption here?

      We acknowledge the inconsistency in the diffusion coefficient thresholds for defining the chromatin-bound fraction used across our group’s publications. The choice of threshold or cutoff (0.1 µm²/s vs 0.15 µm²/s) is largely arbitrary and does not significantly impact the results. To validate this, we tested the effect of different cutoffs on fbound (%) for endogenously expressed Halo-tagged RARα and RXRα (Figure S10). As shown in Figure S10, there was no substantial difference in fbound (%) calculated using a 0.1 µm²/s versus 0.15 µm²/s cutoff (e.g., RARα clone c156: 47±1% vs 49±1%; RXRα clone D6: 34±1% vs 35±1%). 

      Since we have consistently applied the 0.15 µm²/s cutoff throughout this manuscript across all experimental conditions, the comparative analysis of fbound (%) remains valid. While we agree that a Brownian diffusion model may not fully capture the motion of bound molecules, our state array model accounts for localization error, which likely incorporates some of the chromatin motion features. Moreover, the distinction between bound (<0.15 µm²/s) and unbound (1-15 µm²/s) populations is sufficiently large that using a normal diffusion model is reasonable for our analysis.
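
      The cutoff comparison described above can be illustrated with a minimal sketch. It assumes the analysis yields a set of occupation weights over a grid of diffusion coefficients; the function name `bound_fraction` and the two-population toy spectrum are hypothetical and are not the saSPT interface or the data in Figure S10. When the slow and fast populations are well separated, moving the cutoff from 0.1 to 0.15 µm²/s barely changes the summed weight below it.

```python
import numpy as np

def bound_fraction(diff_coefs, occupations, cutoff):
    """Return the fraction of total occupation below `cutoff` (in µm²/s)."""
    diff_coefs = np.asarray(diff_coefs, dtype=float)
    occupations = np.asarray(occupations, dtype=float)
    occupations = occupations / occupations.sum()  # normalize to 1
    return float(occupations[diff_coefs < cutoff].sum())

# Hypothetical toy spectrum (NOT measured data): a log-spaced grid with a
# slow peak near 0.03 µm²/s (45% of molecules) and a fast peak near 4 µm²/s.
grid = np.logspace(-2, 2, 100)
log_d = np.log10(grid)
slow = np.exp(-0.5 * ((log_d - np.log10(0.03)) / 0.15) ** 2)
fast = np.exp(-0.5 * ((log_d - np.log10(4.0)) / 0.25) ** 2)
spectrum = 0.45 * slow / slow.sum() + 0.55 * fast / fast.sum()

for cutoff in (0.10, 0.15):
    f = bound_fraction(grid, spectrum, cutoff)
    print(f"cutoff {cutoff:.2f} µm²/s -> f_bound = {100 * f:.1f}%")
```

      In this toy case the two cutoffs give nearly identical values because almost no occupation lies between 0.1 and 0.15 µm²/s; a spectrum with substantial weight in that interval would be more sensitive to the choice.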

      (5) Movies:

      Since this is an imaging manuscript, I request the authors to provide representative movies for all the presented conditions. This is an essential component for a reader to evaluate the data and for them to benchmark their own images if they are to try to reproduce these findings.

      We have now included representative movies for all the SMT experimental conditions presented in the manuscript. Please see data availability section of the manuscript.

      (6) Definition of an ROI:

      The authors state that "ROI of random size but with maximum possible area was selected to fit into the interior of the nuclei" while imaging. However, the readout speed of the Andor iXon Ultra 897 depends on the size of the defined ROI. If the ROI was variable for every movie, how do the authors ensure the same sampling rate?

      We used the frame transfer mode on the Andor iXon Ultra 897 camera for our acquisitions, which allows for fast frame rate measurements without altering the exposure time between frames. Additionally, we verified the metadata of all our movies to ensure a consistent frame interval of 7.4 ms across all conditions. This confirms that the sampling rate was maintained uniformly, despite the variability in ROI size. 
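
      A metadata check of this kind could be sketched as follows. The sketch assumes that per-frame timestamps (in milliseconds) have already been read from each movie's metadata into an array; how they are read depends on the acquisition software and is not shown. The function name and the tolerance value are hypothetical choices for illustration.

```python
import numpy as np

def frame_interval_is_consistent(timestamps_ms, expected_ms=7.4, tol_ms=0.05):
    """Check that every consecutive frame interval is within tol_ms of expected_ms."""
    timestamps_ms = np.asarray(timestamps_ms, dtype=float)
    if timestamps_ms.size < 2:
        raise ValueError("need at least two timestamps to compute an interval")
    intervals = np.diff(timestamps_ms)  # time between consecutive frames
    return bool(np.all(np.abs(intervals - expected_ms) <= tol_ms))

# Hypothetical timestamps: one regular movie, one with a single slow frame.
regular = np.arange(1000) * 7.4
irregular = regular.copy()
irregular[500:] += 2.6  # frame 500 arrives 2.6 ms late

print(frame_interval_is_consistent(regular))    # regular spacing
print(frame_interval_is_consistent(irregular))  # one 10 ms interval
```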

      Reviewer #2 (Recommendations For The Authors):

      (1) 'Hoechst' is mis-spelled.

      We have now corrected this typo in the manuscript.

      (2) Cos7 appears in several places throughout the text. I assume this is a typo. If so, please correct it. If not, please explain if some experiments were done in Cos7 cells and kindly provide a justification for that.

      The use of Cos7 cells is intentional and not a typo. Cos7 cells have been previously utilized in studies investigating the interaction between T2NRs (Kliewer et al., 1992, Nature, doi: 10.1038/355446a0). In our study, due to technical issues with antibodies for coIP in U2OS cells, we initially used Cos7 cells for control experiments to verify that Halo-tagging of RARα and RXRα did not disrupt their interaction, by transiently expressing the constructs in Cos7 cells. Following these control experiments, we confirmed the direct interaction of endogenously expressed RAR and RXR in U2OS cells with their respective binding partners using the SMT-PAPA assay. Since these results confirmed that Halo-tagging did not interfere with RAR-RXR interactions, we chose not to repeat the coIP experiments in U2OS cells.

      Reviewer #3 (Public Review):

      Summary:

      This study aims to investigate the stoichiometric effect between core factors and partners forming the heterodimeric transcription factor network in living cells at endogenous expression levels. Using state-of-the-art single-molecule analysis techniques, the authors tracked individual RARα and RXRα molecules labeled by HALO-tag knock-in. They discovered an asymmetric response to the overexpression of counter-partners. Specifically, the fact that an increase in RARα did not lead to an increase in RXRα chromatin binding is incompatible with the previous competitive core model. Furthermore, by using a technique that visualizes only molecules proximal to partners, they directly linked transcription factor heterodimerization to chromatin binding.

      Strengths:

      The carefully designed experiments, from knock-in cell constructions to single-molecule imaging analysis, strengthen the evidence of the stoichiometric perturbation response of endogenous proteins. The novel finding that RXR, previously thought to be a target of competition among partners, is in excess provides new insight into key factors in dimerization network regulation. By combining the cutting-edge single-molecule imaging analysis with the technique for detecting interactions developed by the authors' group, they have directly illustrated the relationship between the physical interactions of dimeric transcription factors and chromatin binding. This has enabled interaction analysis in live cells that was challenging in single-molecule imaging, proving it is a powerful tool for studying endogenous proteins.

      Weaknesses:

      As the authors have mentioned, they have not investigated the effects of other T2NRs or RXR isoforms. These invisible factors leave room for interpretation regarding the origin of chromatin binding of endogenous proteins (Recommendations 4). In the PAPA experiments, overexpressed factors are visualized, but changes in chromatin binding of endogenous proteins due to interactions with the overexpressed proteins have not been investigated. This might be tested by reversing the fluorescent ligands for the Sender and Receiver. Additionally, the PAPA experiments are likely to be strengthened by control experiments (Recommendations 5).

      We agree that this would be an interesting experiment. However, there are three technical challenges that complicate its implementation: First, as demonstrated in our original PAPA paper, dark state formation is less efficient when dyes are conjugated to Halo compared to SNAPf, making the reverse configuration less optimal. Second, SNAPf-tagged proteins have slower labeling kinetics than Halo-tagged proteins, often resulting in under-labeling of SNAPf. Third, our SNAPf transgenes were integrated polyclonally. Since background PAPA scales with the concentration of the sender-labeled protein, variable concentrations of the sender-labeled SNAPf proteins would introduce significant variability, complicating the interpretation of the background PAPA signal. Due to these concerns, we believe that performing reciprocal measurements with reversed fluorescent ligands may not yield reliable results.

      Reviewer #3 (Recommendations For The Authors):

      (1) The term "Surprising features" in the title is ambiguous and may force readers to search for what it specifically refers to. Including a word that evokes specific features might be helpful.

      Our findings contradict previous work, which suggested that chromatin binding of T2NRs is regulated by competition for a limited pool of RXR. In contrast, we found that RAR expression can limit RXR chromatin binding, but not the other way around, which challenges the existing model. This unexpected result is what we refer to as a "surprising feature" in our title, and we believe it accurately reflects the novel insights our study provides. We also think that this is clearly conveyed in our manuscript abstract, supporting the use of "Surprising features" in the title. 

      (2) p.3, line 11 - The threshold of 0.15 μm² s⁻¹ seems to be a crucial value directly linked to the value of fbound. What is the rationale for choosing this specific value? If consistent conclusions can be obtained using threshold values that are similar but different, it would strengthen the robustness of the results.

      Please refer to our response to Reviewer #2’s Public Review point 4. The threshold choice is arbitrary and doesn’t affect the overall conclusions. To test this, we compared fbound (%) values calculated using both 0.1 μm² s⁻¹ and 0.15 μm² s⁻¹ cutoffs. For example, with endogenously expressed Halo-tagged RARα (clone c156), we observed fbound values of 47±1% vs 49±1%, and for RXRα (clone D6), 34±1% vs 35±1%, respectively (Figure S10). Since we have consistently applied the 0.15 μm² s⁻¹ cutoff across all experimental conditions in this manuscript, the comparisons of fbound (%) between different conditions are robust and valid.

      (3) p.4, line 13 - "the fbound of endogenous RARα-Halo (47±1%) was largely unchanged upon expression of SNAP (47±1%)" part of the sentence is not surprising. It would make more sense if it were expressed as "the fbound of endogenous RARα-Halo (47±1%) was largely unchanged upon expression of RXRα-SNAP (49±1%), consistent with the control SNAP (47±1%)."

      We understand how the original phrasing may be confusing to the readers and have restructured the sentence as suggested by the reviewer for clarity.

      (4) p.6, line 26 - The discussion that "most chromatin binding of endogenous RXRα in U2OS cells depends on heterodimerization partners other than RARα" seems to contradict the top right figure in Figure 4. If that's the case, the binding partner for the bound red molecule might be yellow rather than blue. Given a decrease in the number of RARα molecules with an unchanged binding ratio, the total number of binding molecules has decreased. Could it be interpreted that the potential reduction in RXRα chromatin binding, accompanying the decrease in binding RARα, is compensated for by other partners?

      We agree with the reviewer that both the yellow and blue molecules in Figure 4 represent T2NRs that can heterodimerize with RXR. For simplicity, we chose to omit the depiction of RXR dimerization with other T2NRs (represented in yellow) in Figure 4. We have now included a note in the figure caption to clarify this. We plan to follow up on the buffer capacity of RXR with other T2NRs in a separate manuscript and will discuss this aspect in more detail once we have data from those experiments.

      (5) Fig. 3 - I expected that DR localizations always appear more frequently than PAPA localizations by the difference in the number of distal molecules. Why does the linear line for SNAP-RXRα in Fig. 3 B have a slope exceeding 1? Also, although the sublinearity is attributed to binding saturation, is there any possibility that this sublinearity originates from the PAPA system like the saturation of PAPA reactivation? Control samples like Halo-SNAPf-3xNLS might address these concerns.

      The number of DR and PAPA localizations depends on the arbitrarily chosen intensity and duration of green and violet light pulses. For any given protein pair, different experimental settings can result in PAPA localizations being greater than, less than, or equal to the number of DR localizations. Therefore, the informative metric is not the absolute number of DR and PAPA localizations, but rather how the ratio of PAPA to DR localizations changes between different conditions—such as between interacting pairs and non-interacting controls.

      Regarding the sublinearity, we agree that it is essential to consider whether the observed sublinearity might stem from saturation of the PAPA signal. We know of two ways in which this could occur:

      First, PAPA can be saturated as the duration of the green light pulse increases and dark-state complexes are depleted. However, this cannot explain the nonlinearity that we observe, because the duration of the green light pulse is constant, and thus the probability that a given complex is reactivated by PAPA is also constant. Likewise, holding the violet pulse duration constant yields a constant probability that a given molecule is reactivated by DR. PAPA localizations are expected to scale linearly with the number of complexes, while DR localizations are expected to scale linearly with the total number of molecules. Sublinear scaling of PAPA localizations with DR localizations thus implies that the number of complexes scales sublinearly with the total concentration of the protein.

      Second, saturation could occur if PAPA localizations are undercounted compared to DR localizations. While this is a valid concern, we consider it unlikely in this case because 1) our localization density is below the level at which our tracking algorithm typically undercounts localizations, and 2) we observe sublinearity for RXR → RAR PAPA even though the number of PAPA localizations is lower than the DR localizations; undercounting due to excessive localization density would be expected to introduce the opposite bias in this case.
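
      The link between binding saturation and sublinear scaling can be illustrated with a minimal sketch. It assumes a textbook 1:1 binding equilibrium between a titrated protein A and a partner B held at fixed total concentration, with an arbitrary dissociation constant; these concentrations and the function name are hypothetical, and this is not the analysis performed in the manuscript. If PAPA localizations are proportional to the number of complexes and DR localizations to the total amount of A, then the PAPA-to-DR ratio tracks the complexed fraction of A, which falls as A increases and B becomes limiting.

```python
import math

def complex_concentration(a_total, b_total, kd):
    """Equilibrium concentration of AB for A + B <-> AB (1:1 binding).

    Solves C**2 - (a_total + b_total + kd) * C + a_total * b_total = 0
    and returns the physically meaningful (smaller) root.
    """
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0

# Hypothetical values in arbitrary concentration units (NOT measured data).
b_total = 1.0  # partner held at a fixed total level
kd = 0.1       # assumed dissociation constant

for a_total in (0.1, 0.3, 1.0, 3.0, 10.0):
    c = complex_concentration(a_total, b_total, kd)
    # c / a_total plays the role of the PAPA-to-DR ratio under the
    # proportionality assumptions stated above.
    print(f"A_total = {a_total:5.1f}  complexes = {c:.3f}  fraction = {c / a_total:.3f}")
```

      Under these assumptions the number of complexes plateaus near the total amount of B, so complexes plotted against total A follow a curve that bends below a straight line even though the reactivation probabilities themselves are constant.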

      (6) Fig. 4 - The differences between A, B, and C on the right side of the model are subtle, making it difficult to discern where to see. Emphasizing the difference in molecule numbers or grouping free molecules at the top might help clarify these distinctions.

      We appreciate the reviewer’s feedback. In response, we have revised Figure 4 by grouping the free molecules on the top right side for panels A, B and C, as suggested.

      (7) While the main results are obtained through single-molecule imaging, no single-molecule fluorescence images or trajectory plots are provided. Even just for representative conditions, these could serve as a guide for readers trying to reproduce the experiments with different custom-built microscope setups. Also, considering data availability, depositing the source data might be necessary, at least for the diffusion spectra.

      We have now included representative movies for all the presented SMT conditions as source data. Please see data availability section of the manuscript.

      (8) Tick lines are not visible on many of the graph axes. 

      We have revised the figures to ensure that the tick lines are now clearly visible on all graph axes.

      (9) Inconsistencies in the formatting are present in the methods, such as "hrs" vs. "hours", spacing between numbers and units, and "MgCl2". "u" should be "μ" and "x" should be "×". 

      We have corrected the formatting errors.

      (10) Table S4, rows 16 and 17 - Are "RAR"s typos for "RXR"s? 

      We have corrected this in the manuscript.

      (11) p.10~12 - Are three "Hoestch"s typos for "Hoechst"s? 

      This is now corrected in the manuscript.

      (12) p.11, line 17 - According to the referenced paper, the abbreviation should be "HILO" in all capital letters, not "HiLO". 

      This is now corrected in the manuscript.

      (13) "%" on p.3, line 18, and "." on p.6, line 27 are missing. 

      The missing “%” and “.” have now been added.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Yao S. and colleagues aims to monitor the potential autosomal regulatory role of the master regulator of X chromosome inactivation, the Xist long non-coding RNA. It has recently become apparent that in the human system, Xist RNA can not only spread in cis on the future inactive X chromosome but also reach some autosomal regions where it recruits transcriptional repression and Polycomb marking. Previous work has also reported that Xist RNA can show a diffused signal in some biological contexts in FISH experiments.

      In this study, the authors investigate whether Xist represses autosomal loci in differentiating female mouse embryonic stem cells (ESCs) and somatic mouse embryonic fibroblasts (MEFs). They perform a time course of ESC differentiation followed by Capture Hybridization of Associated RNA Targets (CHART) on both female and male ESCs, as well as pulldowns with sense oligos for Xist. The authors also examine transcriptional activity through RNA-seq and integrate this data with prior ChIP-seq experiments. Additional experiments were conducted in MEFs and Xist-ΔB repeat mutants, the latter of which fail to recruit Polycomb repressors.

      Based on this experimental design, the authors make several bold claims:

      (1) Xist binds to about a hundred specific autosomal regions.

      (2) This binding is specific to promoter regions rather than broad spreading.

      (3) Xist autosomal signal is inversely correlated with PRC1/2 marks but positively correlated with transcription.

      (4) Xist targeting results in the attenuation of transcription at autosomal regions.

      (5) The B-repeat region is important for autosomal Xist binding and gene repression.

      (6) Xist binding to autosomal regions also occurs in somatic cells but does not lead to gene repression.

      Together, these claims suggest that Xist might play a role in modulating the expression of autosomal genes in specific developmental and cellular contexts in mice.

      Strengths:

      This paper deals with an interesting hypothesis that Xist ncRNA can also function at autosomal loci.

      Weaknesses:

      The claims reported in this paper are largely unsubstantiated by the data, with multiple misinterpretations, lacking controls, and inadequate statistics. Fundamental flaws in the experimental design/analysis preclude the validity of the findings. Major concerns are listed below:

      (1) The entire paper is based on the CHART observation that Xist is specifically targeted to autosomal promoters. Overall, the data analysis is flawed and does not support such conclusions. Importantly the sense WT and the 0h controls are not used, nor are the biological replicates.

      We respectfully disagree with Rev1 but nevertheless thank the reviewer for making some suggestions that helped to strengthen our manuscript.  We have provided new experiments and analyses in the revised manuscript. Please see responses below.

      Rev1 seems to have missed or misunderstood some key experiments. In fact, the sense WT and 0h controls were shown. Furthermore, we included at least two biological replicates for each experiment.

      We used both male ES cells (which do not express Xist) and sense probes as key negative controls, as outlined in Figure S1. Crucially, we only analyzed peaks that were reproducible between biological replicates. The Xist CHART peaks in differentiating female ES cells were significantly enriched above the “background” defined by the sense probe and male controls. Specifically, in comparison to undifferentiated female ES cells (day 0) where both X chromosomes are active and Xist is not induced, Xist CHART robustly pulled down the X chromosome during cell differentiation (day 4, day 7, and day 14). In contrast, male ES cells showed no significant pull-down of the X chromosome, and the sense group also exhibited markedly reduced binding (new Figure S1B). Furthermore, Principal Component Analysis (PCA) of CHART-seq reads (day 4 as an example), including Xist, sense, and input samples from WT and ΔRepB female cells, confirmed that the sense probe CHART was clearly distinguishable from Xist CHART signals. Please see revised Figure S1C. Together, these findings underscore the specificity and robustness of our CHART results.

      Data is typically visualized without quantification, and when quantified, control loci/gene sets are erroneously selected. Firstly, CHART validation on the X in FigS1 is misleading and not based on any quantifications (e.g., see the scale on Kdm6a (0-190) compared to Cdkl5 (0-40)). If scaled appropriately, there is Xist signal on the escapee. 

      Rev1 may have misread the presented data. In the example raised by Rev1, Fig. S1 is inherently quantitative: e.g., a ratio is a number in Fig. S1A (now Fig. S1B) and all gene tracks in Fig. 1B-E are shown with scales. We showed X-linked genes in Fig. S1 (now Fig. S2) as a control to demonstrate that the CHART worked and that Xist accumulated over time from day 0 to day 14. Our new Figure 1B demonstrates the Xist accumulation in graph format. 

      Our paper focuses on Xist autosomal binding sites. Thus, the X-linked examples were placed in the supplement. Escapee genes do in fact accumulate Xist at their promoter regions and this finding is consistent with data published by Simon et al. (2013, Nature). It was therefore not desirable in this paper to reanalyze X-linked genes, including escapees. Nevertheless, to address the reviewer’s concerns, we present new data in new Figure S3A. Here we analyzed the density of Xist binding across X-linked genes, including both active and inactive genes, as well as escapee genes. From this quantitative analysis, it should be clear that escapees do bind Xist. However, from the metagene plots in Figure S3B, we confirm the previous conclusion that escapees bind Xist at high levels just upstream of the promoter and that there is a depletion of Xist in the escapee gene body, consistent with a barrier preventing Xist from moving into the active gene. 

      All X-linked loci should have been quantified and classified based on escape status; sense control should also be quantified, and biological replicates should be shown separately. 

      Please see above response.

      Additionally, in the revised manuscript, we examined the Irreproducible Discovery Rate (IDR) to validate the reproducibility of peaks between the two replicates, and we included a representative example from female WT ES cells at day 4 (revised Figure S4A). The results showed a strong correlation between the replicates, with an IDR threshold of 0.05 (red point > 0.05). As described in the Methods section, to ensure reliable and robust peak identification, we performed peak calling (MACS2) separately on each replicate, and then used bedtools intersect to identify peaks that overlapped between the two replicates. This stringent process, including strict q-value settings in MACS2, ensures the reliability and reproducibility of the peaks presented in this study.
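
      The replicate-overlap step described above can be illustrated with a minimal sketch. It is not the pipeline used in the study (which relied on MACS2 and bedtools intersect, with options not stated here); the function name `reproducible_peaks` and the toy coordinates are hypothetical, and an overlap of at least 1 bp on half-open BED-style intervals is an assumption.

```python
from typing import Dict, List, Tuple

# A peak is (chromosome, start, end) with half-open, BED-style coordinates.
Peak = Tuple[str, int, int]


def reproducible_peaks(rep1: List[Peak], rep2: List[Peak]) -> List[Peak]:
    """Keep each replicate-1 peak that overlaps any replicate-2 peak by >= 1 bp.

    Illustrative only: a simple quadratic scan per chromosome, not an
    optimized interval tree and not the authors' actual implementation.
    """
    # Group replicate-2 intervals by chromosome for lookup.
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in rep2:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept: List[Peak] = []
    for chrom, start, end in rep1:
        for s2, e2 in by_chrom.get(chrom, []):
            # Half-open intervals overlap when each starts before the other ends.
            if start < e2 and s2 < end:
                kept.append((chrom, start, end))
                break  # report each replicate-1 peak at most once
    return kept


# Hypothetical toy peaks (not data from the study).
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
rep2 = [("chr1", 150, 250), ("chr2", 80, 120), ("chr3", 10, 20)]
print(reproducible_peaks(rep1, rep2))
```

      In this toy input, the chr2 peak is dropped because intervals that merely touch (end 80, start 80) share no base under half-open coordinates. A stricter rule, such as a minimum reciprocal overlap fraction, would discard more peaks.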

      Secondly, and most importantly, Figure 1 does not convincingly show specific Xist autosomal binding. Panel A quantification is on extremely variable y-scales and actually shows that Xist is recruited globally to nearly all autosomal genes, likely indicating an unspecific signal. Again, the sense and 0h controls should have been quantified along with biological replicates. 

      Figure 1 shows heatmaps and corresponding metagenes for d0, d4, d7, and d14 female ES cells. Two biological replicates are analyzed. In our revised manuscript, we have used Pearson and Spearman correlation coefficients to measure the strength and direction of the relationship between the two biological replicates and shown that the two replicates are highly reproducible (new Figure S1A). On d0, the Xist coverage on autosomes and the X chromosome is low, but there is a clear increase on d4, d7, and d14, particularly at the TSS of autosomal genes, as shown by the metagene plots in Figure 1A-B and the CHART density maps in new Figure 1E-F. We also show relative depletion of Xist signals in the male and sense negative controls.
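
      For readers less familiar with the two correlation measures mentioned above, the following standard-library sketch shows how they differ. It assumes that each replicate has been summarized as a list of per-bin coverage values, which is an assumption about the input rather than a description of the study's analysis; the numbers are made up.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-bin coverage for two replicates (not data from the study).
rep1 = [1.0, 2.0, 4.0, 8.0, 16.0]
rep2 = [1.1, 2.3, 3.9, 9.0, 30.0]
print(round(pearson(rep1, rep2), 3), round(spearman(rep1, rep2), 3))
```

      Because the toy replicates are perfectly monotonic but not perfectly linear, the Spearman value is 1 while the Pearson value is slightly lower. Reporting both, as described above, guards against a few high-coverage bins dominating the comparison.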

      Upon inspecting genome browser tracks of all regions reported in the manuscript (Rbm14, Srp9, Brf1, Cand2, Thra, Kmt2c, Kmt2e, Stau2, and Bcl7b), the signal is unspecific on all sites with the possible exception of Kmt2e. On all other loci, there is either a strong signal in the 0h ESC controls or more signal in some of the sense controls. This implies that peak calling is picking up false positive regions. How many peaks would have been picked up if the sense or the 0h controls were used for peak calling? It is likely that there would be a lot since there are also possible "peaks" (e.g., Fzd9) in control tracks. 

      The analysis cannot be performed by visual inspection. A statistical analysis must be performed to call signal above noise. This is why we performed peak-calling on two biological replicates and identified overlapping peaks using bedtools intersect to improve reliability. Significant peaks are noted as black bars under each track. As mentioned above, for our analysis, we focused on the top 100 peaks based on peak scores to ensure robustness. Xist has significantly higher signal compared to the sense probe in the Xist-autosomal peak regions (revised Figure 1E-F). Additionally, we conducted peak calling on undifferentiated ES cells (d0) and detected a significantly higher number of peaks (~600) compared to the differentiated states (d4 or d7) (~100).

      Single-cell sequencing studies have shown that about 2% of undifferentiated mESCs express detectable Xist (Pacini et al., Nat Commun, 2021). The Xist peaks in “day 0” cells may be due to the differentiating population.

      Further inspection of the data was not possible as the authors did not provide access to the raw fastq files. When inspecting results from past published experiments (Engreitz et al., 2013), the reported regions were not bound by Xist.

      On the contrary, we deposited the raw data files to GEO prior to the submission of the paper and included the reviewer link to access them. As of August 24, 2024, GEO publicly released these files, allowing for full inspection of the data. 

      Regarding the Engreitz publication, it is not recommended to compare our current study to their analysis for the crucial reason that the Engreitz study was not conducted under physiological conditions. The authors overexpressed the Xist gene in male ES cells. Because Xist RNA can silence genes in male cells as well, this ectopic overexpression normally leads to cell death — thus forcing examination of effects in a narrow time window before Xist can fully spread and act across the genome. Comparing our experiments (endogenous Xist expression in female ES cells) to the ectopic overexpression in male ES cells of Engreitz et al. should therefore not be undertaken.

      Thirdly, contrary to the authors' claim, deleting the B repeat does not lead to a loss of autosomal signal. Indeed, comparing Fig1A and Fig2B side by side clearly shows no difference in the autosomal signal, likely because the autosomal signal is CHART background. Properly quantifying the signal with separate replicates as well as the sense and 0h controls is vital. Overall current data together with published results indicate that CHART peak calling on autosomes is due to technical noise or artefacts.

      In our revised manuscript, we have included the quantitative results as mentioned above in the main and supplementary figure (new Figure 1E-F, Figure 2E-F, and S3A). The data clearly show an enrichment in the Xist CHART samples in differentiating female ES cells.

      We believe the reviewer may be comparing the original Figure 1A and Figure 2A (not Figure 2B). As mentioned above, the analysis cannot be performed by visual inspection. Please see new Figure 2E and 2F. From these data, it should be clear that deleting RepB causes a decrease in Xist targeting to autosomal loci.

      (2) The RNA-seq analysis is also flawed and precludes strong statements. Firstly, the analysis frequently lacks statistical analysis (Fig3B, FigS2B-C) and is often based on visualizations (Fig 3D-G) without quantifications. Day 4 B-repeat deletion does not lead to a significant change in the expression of genes close to Xist signal (Fig3H, d14 does not fully show). 

      Please see new revised Figure 3B and Figures S2B-C (now revised as Figures S6A and S6B). 

      Secondly, for all transcriptional analysis, it is important to show autosomal non-target genes, which is not always done. 

      In the revised manuscript, we included non-target genes for each analysis (new Figure 4E-F, 5D and 5F, 7C and 7E, S7F, S8).

      Indeed, both males and B repeat deletion will lead to transcriptional changes on autosomes as a secondary effect from different X inactivation status. The control set, if used, is inappropriate as it compares one randomly selected set of ~100 genes. This introduces sampling error and compares different classes of genes. Since Xist signal targets more active genes, it is important to always compare autosomal target genes to all other autosomal genes with similar basal expression patterns.

      Please see new Figure S8. We included 100 randomly selected non-target sites on autosomes for this comparative analysis. For consistency, we applied the same flanking regions (10 kb) in the analysis of both target and non-target genes. We believe that this selection method for non-targets is appropriate for two reasons: first, it allows us to control for Xist binding and non-binding; second, it ensures a similar number of genes in both groups, providing a robust foundation for statistical analysis.

      (3) The ChIP-seq analysis also has some problems. The authors claim that there is no positive correlation between genes close to Xist autosomal binding (10kb) compared to those 50kb away (Fig 3C, S2D); however, this analysis is based entirely on metagene visualization. Signal within the Xist binding sites should be quantified (not genes close by) and compared to other types of genomic loci and promoters. Focusing on the 50kb group only as controls is misleading.

      We believe the reviewer may have misunderstood our conclusions. As stated in the paper, we observed lower coverage of the histone marks H3K27me3 and H2AK119ub, associated with PRC2 and PRC1, respectively. Our conclusions regarding PRC1/2 support the RNA-seq results, indicating that Xist tends to bind to actively expressed genes. In other words, these genes exhibit lower levels of PRC-mediated silencing signals. This observation underscores the relationship between Xist binding and gene activity, highlighting that Xist preferentially associates with regions that are less subject to silencing by polycomb repressive complexes.

      Secondly, the authors only look at PRC mark signal upon differentiation; what about the 0h timepoint, i.e., is there pre-marking? 

      Day 0 is not an appropriate timepoint for this analysis because Xist is not yet induced. There is also a small fraction of cells (<5%) that spontaneously differentiate and start to undergo XCI. For these reasons, the day 0 timepoint is considered somewhat heterogeneous, and it would be difficult to draw conclusions regarding Xist peaks in these samples.

      Most worryingly, the data analysis is not consistent between figures (see Fig3C vs 5H-I). In Fig5, the group of Xist targets was chosen as those within 100kb of Xist binding, which would encompass all the control regions from Fig3C. In this analysis, the authors report that there is Xist-dependent H3K27me3 deposition, and in fact, here the Xist autosomal targets have more of it than the controls. Overall, all of this analysis is misleading, and clear conclusions cannot be made.

      We believe that the reviewer may have also misunderstood the analysis in Figure 5. Figure 5 shows the effect of the Xist inhibitor, X1, on H3K27me3 and gene expression. X1 reduces PRC2 targeting and gene silencing, consistent with X1’s effect on RepA as published in Aguilar et al. 2022.

      All in all, because the fundamental observation is not robust (see point 1), all subsequent analyses are also affected. There are also multiple other inconsistencies within the analysis; however, they have not been included here for brevity.

      We again respectfully disagree with Rev1 but thank the reviewer for making suggestions that helped to strengthen our manuscript.  We believe that the revised manuscript with new analyses is improved in part because of the reviewer’s critical comments.

      Reviewer #2 (Public review):

      Summary:

      To follow up on recent reports of Xist-autosome interaction the authors examine female (and male transgenic) mESCs and MEFs by CHARTseq. Upon finding that only 10% of reads map to X, they sought to identify reproducible alternative sites of Xist-binding, and identify ~100 autosomal Xist-binding sites and show a transient impact on expression.

      Strengths:

      The authors address a topical and interesting question with a series of models including developmental timepoints and utilize unbiased approaches (CHARTseq, RNAseq). For the CHARTseq they have controls of both sense probes and male cells; and indeed do detect considerable background with their controls. The use of deletions emphasizes that intact functional Xist is involved. The use of 'metagene' plots provides a visual summation of genic impact.

      Reviewer 2 has made some excellent suggestions. We have revised the manuscript accordingly and are grateful to the reviewer for the recommendations.

      Weaknesses:

      Overall, the result presentation has many 'sample' gene presentations (in contrast to the stronger 'metagene' summation of all genes). The manuscript often relies on discussion of prior X chromosomal studies, while the data generated would allow assessment of the X within this study to confirm concordance with prior results using the current methodology/cell lines. 

      Many of the 'follow-up' analyses are in fact reprocessing and comparison of published datasets. The figure legends are limited, and sample size and/or source of control is not always clear. While similar numbers of autosomal Xist-binding sites were often observed, the presented data did not clarify how many were consistent across time-points/cell types. While there were multiple time points/lines assessed, only 2 replicates were generally done.

      We apologize for the deficiencies in the legend.  The revised manuscript has corrected them.

      We generated many new datasets with deep sequencing, with at least two biological replicates for each. Such experiments are extremely expensive by nature. Thus, two biological replicates are typically considered acceptable.

      Additionally, we performed reanalysis of published datasets to test whether, in the hands of other investigators, cell lines expressing Xist also supported autosomal targeting. Figure 4 is a case in point. Here we examined Tg1 and Tg2, which respond to doxycycline to overexpress Xist from an ectopic site. Transcriptomic analysis showed significant downregulation of autosomal Xist targets, as exemplified by Rbm14 and Bcl7b (new Figure 4C, S9B). In contrast, non-targets of Xist such as Stau1 did not demonstrate significant changes in gene expression (new Figure 4E and 4G). Looking across all autosomal target genes, we observed a significant decrease in mean expression in the Xist overexpressing cell lines (new Figure 4D). The fact that the autosomal changes were also observed in datasets generated by other investigators greatly strengthens our conclusions.

      Aim achievement:

      The authors do identify autosomal sites with enrichment of chromatin marks and evidence of silencing. More details regarding sample size and controls (both treatment, and most importantly choice of 'non-targets' - discussed in comments to authors) are required to determine if the results support the conclusions.

      Specific scenarios for which I am concerned about the strength of evidence underlying the conclusion:

      I found the conclusion "Thus, RepB is required not only for Xist to localize to the X-chromosome but also for its localization to the ~100 autosomal genes" (p5) in contrast to the statement 2 lines prior: "A similar number of Xist peaks across autosomes in ΔRepB cells was observed and the autosomal targets remained similar". Some quantitative statistics would assist in determining impact, both on autosomes and also X; perhaps similar to the quintile analysis done for expression.

      We have added the Xist coverage panel for day 4 and 7 in the identified Xist-autosomal peak regions (new Figure 1E-F, Figure 2E-F), as mentioned above. The results clearly demonstrate that the deletion of RepB decreases Xist binding to autosomes. Also, we showed that ΔRepB increased X-linked gene expression in our revised Figure 3D.

      It is stated that there is a significant suppression of X-linked genes with the autosomal transgenes; however, only an example is shown in Figure 4B. To support this statement, a full X chromosomal geneset should be shown in panels F and G, which should also list the number of replicates. 

      Please see new Figure 4B.

      As these are hybrid cells, perhaps allelic suppression could be monitored? Is Med14 usually subject to X inactivation in the Ctrl cells, and is the expression reduced from both X chromosomes or preferentially the active (or inactive) X chromosome?

      If Rev2 is referring to Figure 4, the dataset used in Figure 4 comes from another research group and was previously published (Loda, A. et al. Nat Commun, 2017).

      If Rev2 is referring to our ES cells, they are N2 cell lines.  The X chromosomes are fully hybridized (Cas/Mus), but the autosomes are not fully hybridized (Ogawa et al., Science, 2008). Med14 is subject to XCI and is expressed from the Xa, silenced on the Xi. 

      The expression change for autosomes after transgene induction is barely significant; and it was not clear what was used as the Ctrl? This is a critical comparator as doxycycline alone can change expression patterns.

      We agree that there was a modest change in expression after transgene induction, but it is a significant change. Again, the dataset is from a published study where the authors generated doxycycline-responsive Xist transgenes (see above). The control in this case is Dox-treated wildtype cells. We now clarify these points.

      In the discussion there is the statement. "Genetic analysis coupled to transcriptomic analysis showed that Xist down-regulates the target autosomal genes without silencing them. This effect leads to clear sex difference - where female cells express the ~100 or so autosomal genes at a lower level than male cells (Figure 7H)." This sweeping statement fails to include that in MEFs there is no significant expression difference, in transgenics only borderline significance, and at d14 no significant expression difference. The down-regulation overall seems to be transient during development while targeting is ongoing?

      Indeed, the Xist effects on autosomes seem to occur during cell differentiation in ES cells. While there is no apparent effect in MEFs, we cannot exclude effects on other somatic cells. Regardless of whether the effects are in early development or throughout life, the sex differences may have life-long effects in mammals. The study conducted in human cells by the Plath lab also concluded that the differences primarily affect stem cells.

      Finally, I would have liked to see discussion of the consistency of the identified genes to support the conclusion that the autosomal sites are not merely the results of Xist diffusion.

      We address this in the third paragraph of the Discussion. Our main argument is that if autosomal binding were caused by diffusion, then RepB deletion or X1 treatment would have led to increased binding at autosomal sites, as Xist would bind less to the X chromosome. However, as demonstrated in our study, both treatments resulted in reduced Xist binding on both the X chromosome and autosomes. This finding suggests that the binding is specific and reliant on Xist's RepA and RepB domains, rather than being a passive diffusion process.

      To examine overlap between the conditions (days of differentiation and WT/RepB cells), we generated Venn Diagrams as now shown in Figure S4E.

      The impact of Xist on autosomes is important for consideration of impact of changes in Xist expression with disease (notably cancers). Knowing the targets (if consistent) would enable assessment of such impact.

      We thank Rev2 for the very helpful review and for the forward-looking experiments. Indeed, the physiological changes brought on by autosomal targeting will be of future interest.

      Reviewer #3 (Public review):

      Summary:

      Yao et al use CHART to identify chromatin associated with Xist in female mouse ESCs, and, as control, male ESCs at various timepoints of differentiation. Besides binding of Xist to X chromosome regions they found significant binding to autosomes, concentrating mostly on promoter regions of around 100 autosomal genes, as elucidated by MACS. The authors went on to show that the RepB repeat is mostly responsible for these autosomal interactions using a female ESC line in which RepB is deleted. Evidence is provided that Xist interacts with active autosomal genes containing lower coverage of repressive marks H3K27me3 and H2AK119ub and that RepB dependent Xist binding leads to dampening of expression, but not silencing of autosomal genes. These results were confirmed by overexpression studies using transgenic ESCs with doxycycline-inducible Xist as well as via a small molecule inhibitor of Xist (X1), inducing/inhibiting the dampening of autosomal genes, respectively. Finally, using MEFs and Xist mutants RepB or RepE the authors provide evidence that Xist is bound to autosomal genes in cells after the XCI process but appears not to affect gene expression. The data presented appear generally clear and consistent and indicate some differences between human and mouse autosomal regulation by Xist. Thus, these results are timely and should be published.

      We thank Rev3 for the positive remarks and great suggestions.  We have amended the manuscript per below. 

      Strengths:

      Regulation of autosomal gene expression by Xist is a "big deal" as misregulation of this lncRNA causes developmental defects and human disease. Moreover, this finding may explain sex-specific developmental differences. The results in this manuscript identify specific mouse autosomal genes bound by Xist and decipher critical Xist regions that mediate this binding and gene dampening. The methods used in this study are appropriate, and the overall data presented appear convincing and are consistent, indicating some differences between human and mouse autosomal regulation by Xist.

      Weaknesses:

      (1) The figure legends and/or descriptions of data are often very short and lack detail, which unnecessarily impedes the reading of the manuscript. In particular, the figures would benefit from more detailed descriptions/explanations not only of what has been done but also of what is shown.

      We have included more detailed descriptions in the figure legends and throughout the manuscript.

      This will facilitate the reading and overall comprehension by the reader. One out of many examples: In Fig S1B in the CHART data at d4 and d7 there is not only signal in female WT Xist antisense but also in female sense control. For a reader that is not an expert in XCI it would be helpful to point out in the legend that this signal corresponds to the lncRNA Tsix (I suppose), that is transcribed on the other strand.

      We thank the reviewer for this excellent point.  We have amended the Results section accordingly.

      (2) Different scales are used in the lower panels of Figures 1A and 2A, which makes it difficult to directly compare signals between the different differentiation stages.

      We have included a figure combining all timepoints (d0, d4, d7, and d14 WT female Xist CHART signals) on the X chromosome and autosomes to support our thesis. Please see new Figure 1B.

      (3) In this study some of the findings on mouse cells contrast previously published results in human ESCs: 1) Xist binding occurs preferentially to promoters in mice, not in human. 2) Binding of Xist is mostly detected in polycomb-depleted regions in mice but there is a positive correlation between Xist RNA and PRC2 marks in human ESCs. These differences are surprising but may be very interesting and relevant. While I am aware that this might be a difficult task, it would be helpful to experimentally address this issue in order to distinguish whether species specific and/or methodological differences between the studies are responsible for these differences.

      Indeed, our findings in mouse cells contrast with those observed in humans. As discussed in the manuscript, this discrepancy may be attributed to factors such as cell type, differentiation methods, and the Xist pull-down technique employed (our CHART method utilizes a 20 nt oligo library, whereas RAP uses long oligos). We agree that future work should investigate the underlying causes of these differences between mouse and human systems.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      For Figure 2: labelling ∆B on the panel A timeline (e.g. d0-∆B) would make the results clearer for the audience. Panel B makes most sense beside panel E of Figure 1, so combine here and skip in Figure 1?

      We have modified Figure 2A and thank Rev2 for this suggestion. As for the embedded tables: since we performed peak calling for WT and ∆B separately, we believe that showing both the peak numbers and their corresponding peak patterns provides a clearer representation of the data.

      I agree that at day 7 there appears to be a difference in X; but by day 14 this looks much more minimal - is it just time-shifted rather than altered? Perhaps this could be discussed. Autosomal binding sites show no change in number.

      Day 7 exhibits the strongest Xist binding on the X chromosome, consistent with the de novo establishment phase of XCI when Xist is expressed at the highest levels (300 copies/cell during de novo XCI versus ~100 copies/cell during maintenance; Sunwoo et al., 2015, as cited). Per our RNA-seq analysis here, we also observed the highest Xist expression on day 7 and reduced levels on day 14 (Fig. S5A). This expression difference explains the reduced Xist CHART levels on day 14 compared to day 7.

      While the X has previously been examined, it would seem beneficial to conduct the same expression analyses (Figure 3) for the X (perhaps supplemental), as the authors have the data 'in hand'. I feel comparison to X in the main figure for panels A and B would fit, while a similar analysis for the X for panel C could be supplemental, presumably supporting the published data to which this data is currently compared. 

      This is a good suggestion. Please find the new data in Figures 2E-F and 3D, which demonstrate that the RepB deletion inhibits Xist binding on the X chromosome, resulting in increased X-linked gene expression, as previously mentioned. Since Xist binds across the X chromosome, we did not perform peak calling as we did for the autosomes. Therefore, applying a similar analysis as in Figures 3A-B may not be appropriate in this case.

      Such a direct comparison to X-data from the same study would be important. For panel H: How many replicates (2)? This should be in the legend. What is the change in median expression? Again, a supplemental figure showing impact on X-linked targets would be useful. Do male and female ESCs show an expression difference prior to differentiation (ie d0)? The data underlying this Figure should be in one of the supplementary tables, showing the full statistical tests and average change. The supplementary tables 8-12 list the WT target genes, not expression differences with the deletion. Again, given that the difference appears transient, might the ∆B cells be altered in rate of differentiation?

      Panel H (revised Figure 3G) includes two replicates, and this has been added to the legends. We have provided a figure demonstrating that RepB deletion increases the expression levels of X-linked genes on days 4, 7, and 14 (revised Figure 3D). Male and female ESCs show differences in the expression of X-linked genes, as both X chromosomes are active in females at this stage prior to differentiation (revised Figure S5C).

      A supplementary table with statistical tests and average change information has been included in our revised version (Table S11).

      On the other hand, these Xist-autosomal target genes displayed no significant differences between WT male, female, or ∆B female cells on day 0 — prior to onset of XCI and Xist expression. Please see new Figure 3H. 

      As for whether ∆B cells are altered in their rate of differentiation, the analysis by Colognori et al. 2019 indicates that ∆B cells differentiate similarly to WT cells. (In Figure 6 of Colognori et al. 2019, autosomal genes are expressed similarly in WT and ∆B cells, whereas XCI is affected only in ∆B cells.)

      We have also modified the legends for our supplementary tables.

      Why were the transgene lines examined upon neuronal differentiation rather than the same approach as in Figures 1-3? I would have thought neuronal differentiation might be more similar to d14, where limited changes remain? Could the authors clarify and discuss?

      We apologize for the confusion. The Tg lines in Figure 4 came from a previously published study. We performed reanalysis of published datasets because we wanted to test whether, in the hands of other investigators, cell lines expressing Xist also supported autosomal targeting. Here we examined Tg1 and Tg2, which respond to doxycycline to overexpress Xist from an ectopic site. Transcriptomic analysis showed significant downregulation of autosomal Xist targets, as exemplified by Bcl7b and Rbm14 (Figure 4C and S9B). In contrast, non-targets of Xist such as Stau1 did not demonstrate significant changes in gene expression (Figure 4E and 4F). Looking across all autosomal target genes, we observed a significant decrease in mean expression in the Xist overexpressing cell lines (Figure 4D). The fact that the autosomal changes were also observed in datasets generated by other investigators greatly strengthens our conclusions. We have clarified this in the Results section.

      Figure 5 - the legend should specify the number of replicates and clarify the blue/green (intuitive, but not specified). Are the 'target' / 'non-target' genes from d4 Chart (but the RNA from d5)? How are 'non-targets' defined - do they match the 'targets' in certain criteria (expression level, chromatin features, GC content)? Do they change per differentiation protocol?

      We have modified the legends to clarify that the 'target' and 'non-target' genes are derived from the day 4 CHART-seq data, while the RNA data is from day 5, as that study sequenced day 5 and not day 4. Non-targets were randomly chosen based on (i) the absence of Xist binding and (ii) similar expression levels. Please see revised Figure S8.
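
      The two selection criteria above (no Xist binding, similar expression) can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the procedure used in the study: the function name `pick_matched_non_targets`, the `tolerance` cutoff, and the toy gene names and expression values are all hypothetical.

```python
import random


def pick_matched_non_targets(expression, bound_genes, targets, tolerance=0.25, seed=0):
    """For each target, randomly pick one unbound gene with similar expression.

    expression:  dict of gene -> expression value (e.g. log2 TPM); hypothetical input.
    bound_genes: set of genes with any Xist binding; these are never chosen.
    tolerance:   maximum allowed absolute expression difference (assumed cutoff).
    """
    rng = random.Random(seed)  # fixed seed so the random draw is reproducible
    excluded = set(bound_genes) | set(targets)
    used = set()
    pairs = []
    for target in targets:
        # Candidates: unbound, not already used, and close in expression to the target.
        candidates = sorted(
            gene
            for gene, value in expression.items()
            if gene not in excluded
            and gene not in used
            and abs(value - expression[target]) <= tolerance
        )
        if not candidates:
            continue  # no matched control is available for this target
        pick = rng.choice(candidates)
        used.add(pick)
        pairs.append((target, pick))
    return pairs


# Toy values invented purely for illustration.
expression = {"target1": 5.0, "target2": 6.1, "geneA": 5.1, "geneB": 6.0, "geneC": 9.0}
bound = {"target1", "target2"}
pairs = pick_matched_non_targets(expression, bound, ["target1", "target2"])
print(pairs)
```

      Matching each non-target to a specific target keeps the two expression distributions comparable; other matching criteria (for example GC content or chromatin features) would need additional filters.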

      It would be helpful to compare Xist expression levels across the various models, and the MEF model could be better described - are they polyploid as often happens?

      We have included the Xist expression levels of ES cells and MEF cells in the revised version (revised Figure S5A, 6D). The transformed MEFs are indeed tetraploid, as is typical.

      For 6A to be informative, one needs to know % mapping to X in ES timeline, which is in supplemental, so perhaps 6A should also be supplemental?

      We have moved 6A to the supplemental figure.

      It is odd that ∆B seems to have had more impact in MEFs, and I would like more discussion - but I also think I am missing something: "We observed that Xist signals were more substantially reduced on both the Xi and autosomal regions in ΔRepE MEFs compared to ΔRepB cells", yet in lower panel 6 G it looks like ∆B is LOWER than ∆E? Am I misinterpreting?

      We apologize for the confusing writing.  The revised text now reads:  “To investigate, we utilized a deletion of Xist’s Repeat E (∆RepE), which was previously demonstrated to severely abrogate localization of Xist to the Xi 41,42. We reasoned that the severe loss of Xist binding might unmask a transcriptomic difference. As expected, we observed that Xist signals were somewhat more reduced on the Xi in ΔRepE MEFs compared to ΔRepB cells (Figure 6E-6F). Despite this reduction, peak coverages in autosomal target genes did not increase in ΔRepE MEFs (Figure 6E-6F). However, there was an overall decrease in the number of significant autosomal peaks in ∆RepE MEFs relative to WT cells (Figure 6A). Regardless, we observed no significant transcriptomic differences in ∆RepE MEFs relative to WT MEFs (Figure 7A-7E). Additionally, further examination of RNA sequencing data from male and female MEF cells in two published studies 43,44 corroborated that the expression levels of these autosomal Xist targets did not exhibit significant changes (Figure 7F and 7G). Altogether, the analysis in MEFs demonstrates that Xist continues to bind autosomal genes in post-XCI somatic cells. However, autosomal binding of Xist in post-XCI cells does not overtly impact expression of the associated autosomal genes. Nonetheless, we cannot exclude more subtle changes that do not meet the significance cut-off.”

      Overall, I would like to see how consistent these autosomal peaks are - I shudder to suggest Venn diagrams, but something to show whether there are day/lineage specific peaks and/or ∆repeat B/E resistant peaks. 

      We now present Venn diagrams comparing MEF, ES_d4, and ES_d7, showing approximately 50% overlap between MEF and ES cells (revised Figure S10B). This may be expected, as each timepoint is a different developmental stage of XCI, with expected gene expression differences.
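
      For readers who want to quantify this kind of overlap, a minimal sketch follows. It assumes that peaks have already been assigned to genes and that overlap is counted on gene sets; the function name `overlap_summary` and the toy gene identifiers are hypothetical. Because a figure such as "50% overlap" depends on which set serves as the denominator, the sketch reports several versions.

```python
def overlap_summary(set_a, set_b):
    """Summarize the overlap between two sets of peak-associated genes."""
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": len(shared),
        # Fraction of each set that is also found in the other set.
        "fraction_of_a": len(shared) / len(set_a) if set_a else 0.0,
        "fraction_of_b": len(shared) / len(set_b) if set_b else 0.0,
        # Jaccard index: shared genes relative to all genes in either set.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Toy gene sets invented purely for illustration.
mef_genes = {"g1", "g2", "g3", "g4"}
es_d4_genes = {"g3", "g4", "g5", "g6", "g7", "g8"}
summary = overlap_summary(mef_genes, es_d4_genes)
print(summary)
```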

      Very minor comments:

      It would be easier if the supplemental tables were tabs in 1 file!

      We will defer to the editor on how best to format the supplemental tables.

      Similar to the text, could gene names be included in the supplemental?

      We have provided gene names in the supplemental files.

      Figure 3 legend: should 'representing' be representative?

      We have modified it.

      "Xist patterns identified in human cells" p 5; it is challenging to follow human versus mouse, so specify or ensure correct use of XIST/Xist

      Indeed, we edited the manuscript accordingly.

      Gene names should be italicized.

      We have italicized gene names in our manuscript.

      Ref. 38 lacks details (...).

      We have updated the reference.

      Peak-like characters - perhaps characteristics? P8

      We have modified this.

      Reviewer #3 (Recommendations for the authors):

      On page 6, the 6th sentence in the first paragraph needs correction. "Consistent with Xist's behavior on the X chromosome."

      We have modified the sentence. Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study by Longhurst et al. investigates the mechanisms of chemoresistance and chemosensitivity towards three compounds that inhibit cell cycle progression: camptothecin, colchicine, and palbociclib. Genome-wide genetic screens were conducted using the HAP1 Cas9 cell line, revealing compound-specific and shared pathways of resistance and sensitivity. The researchers then focused on novel mechanisms that confer resistance to palbociclib, identifying PRC2.1. Genetic and pharmacological disruption of PRC2.1 function, but not related PRC2.2, leads to resistance to palbociclib. The researchers then show that disruption of PRC2.1 function (for example, by MTF2 deletion), results in locus-specific changes in H3K27 methylation and increases in D-type cyclin expression. It is suggested that increased expression of D-type cyclins results in palbociclib resistance.

      Strengths:

      The results of this study are interesting and contribute insights into the molecular mechanisms of CDK4/6 inhibitors. Importantly, while CDK4/6 inhibitors are effective in the clinic, tumour recurrence is very high due to acquired resistance.

      Weaknesses:

      A key resistance mechanism is Rb loss, so it is important to understand if resistance conferred by PRC2.1 loss is mediated by Rb, and whether restoration of PRC2.1 function in Rb-deplete cells results in renewed palbociclib sensitivity. It is also important to understand the clinical implications of the results presented. The inclusion of these data would significantly improve the paper. However, besides some presentation issues and typos as described below, it is my opinion that the results are robust and of broad interest.

      Major questions:

      (1) Is the resistance to CDK4/6 inhibition conferred by mutation of MTF2 mediated by Rb?

      (2) Are mutations in PRC2.1 found in genetic analyses of tumour samples in patients with acquired resistance?

      We thank the reviewer for their editing and experimental suggestions, and have integrated their responses into our re-submitted manuscript.

      We also agree that understanding the role of RB1 in the proposed palbociclib resistance mechanism is of particular interest. However, as there are three RB proteins expressed in human cells, this is a technically difficult question to probe genetically. Despite this technical challenge, we have provided multiple lines of evidence in our resubmitted manuscript that the resistance to palbociclib observed in our PRC2.1-deficient cells is mediated through the canonical CDK4/6-RB1 pathway. First, disruption of RB1 in HAP1 cells results in palbociclib resistance to a level comparable to that of PRC2.1 disruption (Fig. 4E). Second, inactivation of SUZ12 or MTF2 increases the number of cells entering S-phase in palbociclib treatment (Fig. 4G) with no increase in basal rates of apoptosis (Fig. S2D), suggesting that any proliferation advantage observed in PRC2.1-defective cells is due to resistance to palbociclib-induced cell cycle arrest. Third, we show that overexpression of CCND1 and CCND2 is sufficient to drive resistance to palbociclib in wild-type HAP1 cells (Fig. S5F). And finally, the increased levels of CCND1 and CCND2 observed in cells lacking PRC2.1 activity result in higher CDK4/6 activity as measured by RB1 phosphorylation, despite palbociclib blockade (Fig. 6F). All these lines of evidence strongly suggest that MTF2-containing PRC2.1 regulates G1 progression through the canonical CDK4/6-RB1 pathway by repressing CCND1 and CCND2 expression.

      Whether or not MTF2 deletion leads to palbociclib resistance in clinical samples is also a question of particular interest. Currently, we are unaware of any reports that specifically mention MTF2 deletion as leading to palbociclib resistance, and we were unable to find another example in our own cancer database review. However, we have included references to other examples of MTF2 mutation resulting in chemotherapeutic resistance in our discussion. Additionally, although MTF2 is rarely observed to be mutated in cancers (Ngubo et al. 2023), it is highly differentially expressed, and investigating decreased MTF2 transcription in palbociclib-resistant tumors, though challenging, might prove fruitful. However, as mechanisms of palbociclib resistance are an area of active investigation, we speculate that future studies might uncover additional examples of MTF2 mediating resistance to this clinically important chemotherapeutic.

      Reviewer #2 (Public Review):

      Summary:

      Longhurst et al. assessed cell cycle regulators using a chemogenetic CRISPR-Cas9 screen in haploid human cell line HAP1. Besides known cell cycle regulators they identified the PRC2.1 subcomplex to be specifically involved in G1 progression, given that the absence of members of the complex makes the cells resistant to Palbociclib. They further showed that in HAP1 cells the PRC2.1, but not the PRC2.2 complex is important to repress the cyclins CCND1 and CCND2. This can explain the enhanced resistance to Palbociclib, a CDK4/6 inhibitor, after PRC2.1 deletion.

      Strengths:

      The initial CRISPR screen is very interesting because it uses three distinct chemicals that disturb the cell cycle at various stages. This screen mostly identified known cell cycle regulators, which demonstrates the validity of the approach. The results can be used as a resource for future research.

      The most interesting outcome of the experiment is the finding that knockouts of the PRC2.1 complex make the cell resistant to Palbociclib. In a further experiment, the authors focused on MTF2 and JARID2 as the main components of PRC2.1 and PRC2.2, respectively. Via extensive analyses, including genome-wide experiments, they confirmed that MTF2 is particularly important to repress the cyclins CCND1 and CCND2. The absence of MTF2 therefore leads to increased expression of these genes, sufficient to make the cell resistant to palbociclib. This result will likely be of wide interest to the community.

      Weaknesses:

      The main weakness of the manuscript is that the experiments were performed in only one cell line. To draw more general conclusions, it would be essential to confirm some of the results in other cell lines.

      In addition, some of the findings, such as the results from the CRISPR screen as well as the stronger impact of the MTF2 KO on H3K27me3 and gene expression (compared to JARID2 KO), are not unexpected, given that similar results were already obtained before by other labs.

      We thank the reviewer for their suggestions, and we believe that we have addressed their main concern about the generality of the MTF2 regulation of D-type cyclin expression in our resubmitted manuscript. We have now shown through shRNA knockdown that MTF2 represses CCND1 in two additional cell lines, the breast cancer line MDA-MB-231 and the immortalized monkey COS7 cell line (Fig. 6E). However, it is important to note that MTF2 did not control CCND1 expression in every cell line tested (Fig. 6D), underscoring the context-dependent nature of this regulation. Future studies will illuminate the cell or tumor types in which this regulation is observed.

      Additionally, while MTF2 has previously been shown to exert a greater effect on H3K27me3 levels in some circumstances (Loh et al. 2021, Rothberg et al. 2018), a number of notable reports in ES cell lines have concluded that PRC2 localization and H3K27me3 at the majority of genomic sites are dependent on both PRC2.1 and PRC2.2 activity (Healy et al. 2019, Højfeldt et al. 2019, Perino et al. 2020, Oksuz et al. 2018). Therefore, we think it is important to highlight the greater dependence on MTF2 for promoter proximal H3K27me3 levels in our transformed cell line context.  

      Reviewer #3 (Public Review):

      This study begins with a chemogenetic screen to discover previously unrecognized regulators of the cell cycle. Using a CRISPR-Cas9 library in HAP1 cells and an assay that scores cell fitness, the authors identify genes that sensitize or desensitize cells to the presence of palbociclib, colchicine, and camptothecin. These three drugs inhibit proliferation through different mechanisms, and with each treatment, expected and unexpected pathways were found to affect drug sensitivity. The authors focus the rest of the experiments and analysis on the polycomb complex PRC2, as the deletion of several of its subunits in the screen conferred palbociclib resistance. The authors find that PRC2, specifically a complex dependent on the MTF2 subunit, methylates histone 3 lysine 27 (H3K27) in promoters of genes associated with various processes including cell-cycle control. Further experiments demonstrate that Cyclin D expression increases upon loss of PRC2 subunits, providing a potential mechanism for palbociclib resistance.

      The strengths of the paper are the design and execution of the chemogenetic screen, which provides a wealth of potentially useful information. The data convincingly demonstrate in the HAP1 cell line that the MTF2-PRC2 complex sustains the effects of palbociclib (Figure 4), methylates H3K27 in CpG-rich promoters (Figure 5), and represses Cyclin D expression (Figure 6). These results could be of great interest to those studying cell-cycle control, resistance mechanisms to therapeutic cell-cycle inhibitors, and chromatin regulation and gene expression.

      There are several weaknesses that limit the overall quality and potential impact of the study. First, none of the results from the colchicine and camptothecin screens (Figures 1 and 2) are experimentally validated, which lessens the rigor of those data and conclusions. Second, all experiments validating and further exploring results from the palbociclib screen are restricted to the Hap1 cell line, so the reproducibility and generality of the results are not established. While it is reasonable to perform the initial screen to generate hypotheses in the Hap1 line, other cancer and non-transformed lines should be used to test further the validity of conclusions from data in Figures 4-6. Third, conclusions drawn from data in Figures 3D and 4D are not fully supported by the experimental design or results. Finally, there have been other similar chemogenetic screens performed with palbociclib, most notably the study described by Chaikovsky et al. (PMID: 33854239). Results here should be compared and contrasted to other similar studies.

      We thank the reviewer for their suggestions regarding our manuscript. While the genes recovered as mediating cellular responses to camptothecin and colchicine were never confirmed following our chemogenetic screens, we felt our primary findings were in the area of palbociclib resistance and decided to focus our follow-up investigations on those genes. We included the results of the camptothecin and colchicine chemogenetic screens as confirmation of the specificity of PRC2 mutation resulting in resistance to palbociclib (Fig. 4C) and for others in the community to use as a resource for future investigations. We have also clarified our results for Figure 3D and 4D in our revised manuscript, as well as included additional plots of these results (Fig. S1D-S1F). And, with our resubmitted manuscript, we believe we have addressed their concern about the generality of our results by demonstrating our primary finding, that MTF2 regulates D-type cyclins, in additional cell lines other than HAP1. We feel these results indicate that, while not “general”, there are additional cellular contexts in which our main result holds true. In line with this, and to address how our chemogenetic screens fit into the landscape of previous studies, including Chaikovsky et al., we have added the following lines to our discussion: “Additionally, other chemogenetic screens utilizing palbociclib and have not identified that inactivation of PRC2 components as either enhancing or reducing palbociclib-induced proliferation defects, suggesting that PRC2 mutation is neutral in the cell lines studied. These observations not only underscore the context-dependent ramifications of mutation of these PRC2 complex members, but also may help inform the context in which CDK4/6 inhibitors are most efficacious.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) "We found that only thirteen and twenty genes resulted in sensitivity or resistance, respectively, in every conditions tested and were deemed non-specific and excluded from any further analysis (see Table S2)." It's unclear to me why these genes were deemed 'nonspecific'. Are these genes functionally important for the general exclusion of xenobiotic molecules?

      By this, we simply meant that these effects were not specific to one condition. Such genes could affect drug half-life or a general stress response, but are less likely to have functions directly tied to the pathway targeted by a drug than are genes whose loss affects only one condition.  

      (2) "Given that increased CCND1 levels is sufficient to drive increased CDK4/6 kinase activity, upregulation of these D-type cyclins is likely to be a significant contributor to the palbociclib resistance in MTF2∆ cells." It's unclear to me what is the basis for this statement. This is only true if there is free CDK4/6. If CDK4/6 is already fully occupied by D-type cyclins, then increased CCND1 levels would not be expected to have an effect. 

      While we anticipated that increased levels of CCND1 would result in more CDK4/6-D-type cyclin association, we now demonstrate in the new Figure S5F that there is more CCND1 in complex with CDK6 in both SUZ12∆ and MTF2∆ cell lines. Furthermore, we are able to show in Figure S5G that overexpression of D-type cyclins results in resistance to palbociclib-induced proliferation defects in HAP1 cells.

      (3) The description of the results is very confusing in places, especially regarding "resistance" versus "sensitivity" genes. For example: "CCNE1, CDK6, CDK2, CCND2 and CCND1, all of which are integral to promoting the G1/S phase transition, ranked as the 2nd, 24th, 27th, 29th and 46th most important genes for palbociclib resistance, respectively (Figures 1F and 1G). CCND1 and CCND2 bind either CDK4 or CDK6, the molecular targets of palbociclib, whereas CDK2 and CCNE1 form a related CDK kinase that promotes the G1/S transition.

      Similarly, cells with sgRNAs targeting RB1, whose phosphorylation by CDK4/6 is a critical step in G1 progression, displayed substantial resistance to palbociclib." My reading of this paragraph suggests that disruption of the CDK6 locus is associated with palbociclib resistance - surely this is a typo and instead should have been sensitivity? Please explain.

      We thank the reviewer for pointing this out and have corrected this typo.

      (4) "Sensitivity to palbociclib was enhanced in cells expressing sgRNAs targeting H4 acetylation, positive regulators of Pol II transcription, and regulators of the DNA Damage Response pathway (Figures 3A and 3B), although this sensitivity was much weaker than that seen with DNA damaging agents. This observation is consistent with long-term treatment with palbociclib inducing DNA damage, as has been suggested by a number of recent publications 65,66." This is also consistent with recent work on Cdk7 inhibitors (Wilson et al. Mol Cell 2023), as Cdk7 inhibition is expected to affect both CDK1/2/4/6 activities and Pol II transcription.

      We thank the reviewer for bringing this observation to our attention and we have added this citation to this passage in our manuscript.

      (5) Figure 3D - would it not make sense to plot the data such that palbo concentration is on the x-axis? It is also difficult to interpret since the data are normalized to starting "% proliferation" at the indicated palbo treatment, when it is likely that % proliferation changes significantly with palbo concentration. Indeed, this is the graphing format used for a later figure (Figure 4D). The data with rotenone suggests palbo antagonizes rotenone-mediated reduction in proliferation. But it's unclear to me whether the graph shows the converse - that rotenone treatment modulates palbo-induced cell cycle arrest.

      The reviewer is correct that increasing doses of palbociclib in the absence of oxidative phosphorylation inhibitors do indeed have an effect on proliferation. However, it is helpful to normalize proliferation values to each initial dose of palbociclib and then compare this to the different oxidative phosphorylation inhibitor treatment combinations. To illustrate that the oxidative phosphorylation inhibitors do indeed antagonize palbociclib-induced proliferation defects, we have now included the data graphed as each oxidative phosphorylation inhibitor vs palbociclib as Supplemental Figures S1D-S1F.
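
      The normalization described in this response can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis code behind the figures: the function name `normalize_to_palbociclib_only`, the nested-dictionary layout, and the toy readings are all hypothetical.

```python
def normalize_to_palbociclib_only(proliferation):
    """Express each reading as a percentage of the palbociclib-only reading.

    proliferation: dict of palbociclib dose -> dict of inhibitor dose -> raw value.
    An inhibitor dose of 0.0 is taken to be palbociclib alone (assumed layout).
    """
    normalized = {}
    for palbo_dose, by_inhibitor in proliferation.items():
        baseline = by_inhibitor[0.0]  # proliferation with palbociclib alone
        normalized[palbo_dose] = {
            inhibitor_dose: 100.0 * value / baseline
            for inhibitor_dose, value in by_inhibitor.items()
        }
    return normalized


# Toy readings invented purely for illustration (arbitrary units).
raw = {
    0.0: {0.0: 100.0, 1.0: 40.0},  # no palbociclib: inhibitor leaves 40% of baseline
    1.0: {0.0: 50.0, 1.0: 30.0},   # with palbociclib: inhibitor leaves 60% of baseline
}
normalized = normalize_to_palbociclib_only(raw)
print(normalized)
```

      In these toy numbers the inhibitor removes a smaller fraction of proliferation when palbociclib is present, which is the pattern that normalization to each starting dose is meant to expose. The absolute effect of palbociclib alone is hidden by this normalization, which is why plotting the raw dose response is a useful complement.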

      • The highest concentration of GSK126 tested (5µM) does not appear to confer resistance, but perhaps this is due to off-target effects or cytotoxicity?

      We agree with the reviewer that the highest dose of GSK126 does not appear to confer resistance at low doses of palbociclib. However, it does appear to do so at higher doses of palbociclib. We have included a statement in our results section to address this reviewer’s observations.

      • Disruption of Emi1 leads to resistance (Figure 1F, FZR1), yet overexpression induces resistance (Mouery et al. bioRxiv 2023). Explain.

      We do not understand why EMI1 responds in this way, and therefore we cannot comment on this in the text. 

      Typos/stylistic comments:

      • Typo "However, the net result of these opposing effects on cell cycle progression, and the contribution of the individual subcomplexes to this regulation, rained unclear."

      We thank the reviewer for pointing this out, and we have corrected it.  

      • Use of the word "growth" - I think the authors should be more precise. Is "proliferation" meant here?

      We thank the reviewer for pointing this out, and we have corrected it.

      • In Figure 4G, two of the panels have 8.42%. Is this correct, or may it be a copy/paste error?

      This was an error, but is no longer relevant as we have reconducted and reanalyzed this experiment.

      Reviewer #2 (Recommendations For The Authors):

      Major Points

      (1) Some of the conclusions should be confirmed in additional cell lines. I would suggest testing the resistance to Palbociclib in several additional cell lines, where MTF2 and JARID2 are deleted. If the conclusion can be generalized, one would expect that the differential role of MTF2 versus JARID2 can be confirmed in more cell lines.

      While the PRC2.1-dependent repression of D-type cyclins does not appear to be general, we have now demonstrated in Figures S5E and 6F that there are multiple different cellular contexts in which our observations are consistent. Specifically, we demonstrate that GSK126 causes upregulation of CCND1 in both immortalized non-tumor cells (COS7 cells) and in the breast cancer cell line MDA-MB-231. Moreover, in both cases we showed that this effect is PRC2.1-dependent, as shRNA knockdown of MTF2 increases expression of CCND1.

      (2) In addition, it may be attractive to make use of publicly available RNA-seq data of MTF2 and JARID2 knockout/down cells, to investigate the generality of the finding that PRC2.1 regulates CCND1 and CCND2.

      While it would be useful to address this issue, Figure S5E demonstrates that the repression of D-type cyclin expression by PRC2.1 is context dependent. Furthermore, prior to identifying the lines shown in Figures 6F and S5E, we were not aware of which lines to focus our investigations on. However, we have now demonstrated a few cellular contexts in which either chemical inhibition of PRC2 or knockdown of MTF2 results in de-repression of CCND1 expression.

      (3) At a bare minimum the authors should strongly discuss the limitations of the study, and tone down the conclusions.

      We would agree with this based upon the data in the originally submitted manuscript; however, now that we have shown that this effect is more general, this is less critical. That said, we do not see this effect in all cell lines, and we have made this apparent in the final version of the manuscript.

      Minor point

      (1) In my view, Figures 1-3 should be shortened to the most essential points, and some data/figures should be moved to the supplementary figures. Especially the STING gene-network graphs are in my view not particularly meaningful.

      While we understand the opinion of this reviewer, we feel that these data will be of significant interest to some readers.  

      (2) Figure 6E and 6F/G appear to be largely redundant. This can perhaps be made more concise.

      This has been addressed in the new version of Figure 6.

      (3) Figure 5D should be enlarged. 

      We thank the reviewer for this suggestion and have enlarged the image.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript could be edited to improve clarity. In several places, the scientific logic motivating an experiment is confusing, and there are several hypotheses and conclusions that seem opposite from what the data are suggesting. Some aspects of the figures were also unclear. Specific examples include the following:

      (1) Last sentence of abstract : "Our results demonstrate a role for PRC2.1, but not PRC2.2, in promoting G1 progression." Data show that knockout of PRC2.1 components promotes G1 progression through upregulation of CycD, so the conclusion here is the opposite.

      We thank the reviewer for catching this error. We have now changed this to “in antagonizing G1 progression”.

      (2) In the second paragraph of the results, CCNE1, CDK2, etc are described as scoring high for palbociclib resistance, but those genes scored as sensitizing. Also, in that paragraph, it is described that a drug is sensitizing cells to loss of a gene, which seems like incorrect logic. It should be clarified that knock-out of a gene either sensitizes or desensitizes cells to the drug.

      We thank the reviewer for catching this error. We have now corrected it.  

      (3) In the motivation for the experiment in Figure 3D, it is written: "we asked whether chemical inhibition of oxidative phosphorylation could rescue sensitivity to palbociclib". Considering that knock-out of genes that mediate oxidative phosphorylation confer resistance to palbociclib, it is confusing why it was expected that chemical inhibitors would restore sensitivity.

      We are sorry if the original wording was confusing. We have now changed this to “combined inhibition of oxidative phosphorylation and CDK4/6 activity mutually rescue the proliferation defect imposed by agents targeting the other process”.  

      (4) If the intention of Figure 3D is to test the hypothesis that chemical inhibition of oxidative phosphorylation modulates sensitivity to palbociclib, the clarity of Figure 3D would be improved if data were shown such that palbociclib concentration is on the x-axis and the different curves are different drug concentrations.

      It appears that there is some mutual suppression, in which inhibition of each process partly rescues cells from inhibition of the other. In fact, with these drugs the stronger of the two effects is the rescue from mitochondrial poisons by palbociclib. We have now discussed this in the text.

      (5) The authors should check the units on the x-axis in Figure 4D, should they be log[uM Palbo] or log [nM Palbo]?

      We thank the reviewer for catching this error. We have now corrected it.

      (6) It should be clarified which data are summarized in the graph to the right in Figure 4G, are these experiments with palbociclib?

      This is currently included in the figure legends.

      (7) The text suggests that the control CCNE1 knockout is shown in Figure 4E, but those data are missing.

      This has been corrected in Figure 4E.

      Several conclusions are not well supported by the data and should be revised or more data and analysis should be added.

      (1) The titular conclusion that the "PRC2.1 Subcomplex Opposes G1 Progression through Regulation of CCND1 and CCND2" has only been demonstrated in the context of a Cdk4/6 inhibitor in HAP1 cells. There is little evidence that this claim is broadly applicable. For example, data in Figure 4G show small and not demonstrably significant differences in G1 and S phase populations in the mock experiments. Also, experiments in other cells are needed to support the rigor and generality of the conclusion.

      Our chemogenetic screen and competitive proliferation assay data in Figure 4A, 4C and 4E support the conclusion that PRC2.1 and PRC2.2 play opposing roles in G1 progression. Furthermore, we have repeated the initial BrdU incorporation experiments shown in Figure 4G and have been able to demonstrate that JARID2∆ cells do indeed display a significant decrease in cells entering S-phase when treated with palbociclib. Most importantly, in Figures 6D and 6E we show additional cell lines where this is the case. Therefore, we feel that this title is valid in the current version of the manuscript, where we have shown it to be the case in multiple tumor-derived human cell lines as well as immortalized non-human primate cells.

      (2) It is unclear how the data in Figure 3D support the conclusion that the administered inhibitors of oxidative phosphorylation influence response to palbociclib.

      As noted in the response to point 4, we have now discussed this mutual rescue more thoroughly in the text.  

      (3) In Figure 4D, the IC50 values should be calculated and statistical significance based on biological replicates should be determined. Also, the conclusion that "increasing doses of GSK126 withstood palbociclib-induced growth suppression" is overstated, as ultimately all drug conditions succumb to palbociclib suppression of proliferation, although there may be differences in sensitivity.

      We have now included a statistical analysis of each data point in Figure 4D.

      Editorial comments:

      (1) The title does not seem to optimally capture the content of the paper. Please consider changing it, e.g. focusing on palbociclib resistance. 

      While we used this particular drug to make the original observation, we feel it is more general to discuss the underlying biology (cyclin gene control) than the pharmacological methodology. Moreover, we have now extended our findings about the regulation of D-type cyclins by PRC2.1 to several cell lines, derived from both cancers and primary cells, reinforcing the fact that this effect is observed more broadly.

      (2) Please indicate the biological system (haploid human HAP1 cells) in either title or abstract.

      The abstract now indicates that we have observed this in CML, breast cancer and immortalized primary cells.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to investigate the relationship between low estrogen levels, postmenopausal hypertension, and the potential role of the molecule L-AABA as a biomarker for hypertension. By employing metabolomic analysis and various statistical methods, the study seeks to understand how estrogen deficiency affects blood pressure and identify key metabolites involved in this process, with a particular focus on L-AABA.

      Strengths:

      The study addresses a relevant and understudied area: the role of estrogen and metabolites in postmenopausal hypertension. It presents a novel hypothesis that L-AABA may serve as a protective factor against hypertension, which could have significant clinical implications if proven.

      We appreciate the acknowledgment of our study’s focus on an important and understudied area. Our hypothesis regarding L-AABA’s role as a possible protective factor against hypertension indeed holds promise for advancing clinical implications.

      Weaknesses:

      The evidence linking L-AABA to hypertension is largely correlative, lacking experimental validation or mechanistic proof. Key limitations, such as the inadequacy of the ovariectomy model in replicating human menopause, are acknowledged but not addressed with alternative approaches. In summary, while the study offers an intriguing hypothesis, its conclusions are premature and require further experimental validation and human data to substantiate the claims.

      We recognize the limitations regarding the correlative nature of our findings and the inadequacy of the OVX model in replicating human menopause. Future research will prioritize experimental validation and incorporate human studies to solidify our conclusions.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Dr. Yao Li et al. documented the metabolomic profile of the aorta from OVX rats and that from OVX plus E2. These conditions mimic post-menopause hypertension and hormonal replacement therapy.

      Strengths:

      The authors state that this is probably the first study to examine the metabolic changes in the aorta of post-menopause hypertension.

      As pointed out by the reviewer, our study may be the first to investigate changes in aortic metabolism in postmenopausal hypertension. As an exploratory study, our goal is to depict the overall characteristics and explore possible research directions.

      Weaknesses:

      There are several weaknesses, and a few of them are quite serious.

      (1) The aorta is not a resistant artery and has little to do with hypertension. The authors should have used resistant arteries for this study. The expression of several adrenergic receptors and cholinergic receptors in the aorta and resistant arteries are different. It is unknown whether the aorta metabolomic profile has any relevance to BP and whether they are similar to that of the resistant arteries. I understand the logistics issue of obtaining enough tissues from resistant arteries. At least, once some leads are discovered in the aorta, the authors should validate it in resistant arteries. This should be feasible.

      We acknowledge the limitation of using the aorta and will aim to include studies on resistant arteries to validate our metabolomic findings.

      (2) The aorta and all the arteries have three layers. It is critically important to know whether the metabolic changes occur in the intima or in the media, while the adventitia probably has little to do with vasoconstriction and hypertension. If the authors want to use the aorta to conduct the preliminary study, they should completely remove the adventitia and then use samples with and without their endothelium stripped and then assess their metabolomic profiles. After the leads are obtained from this preliminary profiling, they should be validated in endothelium and smooth muscles of the resistant artery. The current experiments are not appropriately designed.

      Future studies will involve detailed profiling of specific arterial layers, focusing on the intima and media to enhance the relevance of our findings related to hypertension.

      (3) The tail-cuff BP measurement is a technique of the last century. The current gold standard of BP measurement is by telemetry. The tail-cuff method is particularly problematic in this study because the 1-2 h restraining of the rats for more than 10 BP measurements will cause significant stress in the animal, and their stress hormone secretion might cause biased metabolomic profiles in the OVX versus sham-operated mice. The problem can be totally avoided by using telemetry.

      We appreciate the suggestion and will consider telemetry for more accurate blood pressure measurements in future experiments to minimize stress-related bias.

      (4) Although the L-AABA showed a high p-value (10^-4) of a decrease in the OVX rats, the fold change is small (2-3 folds). Such a small change should be validated using a different method to be convincing.

      We plan to employ additional methods to validate the observed changes in L-AABA levels in the following research, ensuring robustness of our findings.

      (5) The authors claim (or hypothesize) that the reduced AABA level in OVX can cause vascular remodeling. This can be easily validated by the histology of the OVX-resistant artery, and they should do that during the revision. The authors should also examine the M1 macrophage function from the OVX mice to validate their claimed link of AABA to M1.

      We intend to conduct histological analyses and examine M1 macrophage function in OVX-resistant arteries to validate our hypothesis in the following research.

      (6) As mentioned above, the authors need to pinpoint the changes of AABA to target cells, i.e., endothelial cells, SMC, or M1, and then use in vitro or in vivo cell biology approaches to assess whether these cells in the OVX rat indeed have an abnormality in function and, indeed, such functional changes are responsible for the BP phenotype.

      Addressing these points, we aim to pinpoint specific cell types affected by AABA variations and conduct in vitro and in vivo studies to examine their physiological impacts in the following research.

      (7) The results of the current study can be condensed into 1 or 2 figures that can serve as a base or a starting point for a deeper scientific study.

      Thank you for your suggestion. As an omics study, our research approach may differ from that of traditional mechanistic studies.

      Summary

      The experimental design of this manuscript is inappropriate, and the methods are not up to the current standards. The whole study is descriptive and rudimentary. It lacks validation and mechanism. The data from this manuscript might be of some value and can serve as the first step for more investigation of the mechanism of post-menopause hypertension.

      Reviewer #3 (Public review):

      Summary:

      The decrease in estrogen levels is strongly associated with postmenopausal hypertension. Dr. Yao Li and colleagues aimed to investigate the metabolomic mechanisms underlying postmenopausal hypertension using OVX and OVX+E2 rat models. They successfully established a correlation between reduced estrogen levels and the development of hypertension in rats. They identified L-alpha-aminobutyric acid (AABA) as a potential marker for postmenopausal hypertension. The research explored the metabolic alterations in aortic tissues and proposed several potential mechanisms contributing to postmenopausal hypertension.

      Strengths:

      The group performed a comprehensive enrichment analysis and various statistical analyses of the metabolomics data.

      As summarized by the reviewer, our current study conducted a comprehensive analysis of metabolomics data. It also provides a reliable foundation for further mechanistic research.

      Weaknesses:

      (1) The manuscript is descriptive in nature, although they mentioned their primary objective is to explore the potential mechanisms linking low estrogen levels with postmenopausal hypertension. No mechanism insights have been interrogated in this study, which has been mentioned by the authors in the discussion. The connection between E2, AABA, and macrophage needs to be validated in endothelial cells, vascular smooth muscle cells, and other aortic tissue cells. Without such verification, the manuscript predominantly raises hypotheses only based on metabolomic data.

      We have proposed research hypotheses based on detailed omics data. Further research on the mechanisms involving endothelial and vascular smooth muscle cells to validate the pathway connections between E2, AABA, and macrophages is undoubtedly the future direction of this study.

      (2) The serum contains three forms of estrogen: Estradiol, Estrone, and Estriol. The authors used the Rat E2 ELISA kit. Ideally, all three forms of estrogen should be measured.

      Future assays will aim to measure Estradiol, Estrone, and Estriol to capture a more comprehensive picture of estrogen’s role in postmenopausal hypertension.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study reports on the discovery of an antimicrobial agent that kills Neisseria gonorrhoeae. Sensitivity is attributed to a combination of DedA-assisted uptake of oxydifficidin into the cytoplasm and the presence of an oxydifficidin-sensitive RplL ribosomal protein. Due to the narrow scope, the broader antibacterial spectrum remains unclear and therefore the evidence supporting the conclusions is incomplete with key methods and data lacking. This work will be of interest to microbiologists and synthetic biologists.

      General comment about narrow scope: The broader antibacterial spectrum of oxydifficidin has been reported previously (S B Zimmerman et al., 1987). The main focus of this study is on its previously unreported potent anti-gonococcal activity and mode of action. While it is true that broad-spectrum antibiotics have historically played a role in effectively controlling a wide range of infections, we and others believe that narrow-spectrum antibiotics have an overlooked importance in addressing bacterial infections. Their advantage lies in their ability to target specific pathogens without markedly disrupting the human microbiota.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kan et al. report the serendipitous discovery of a Bacillus amyloliquefaciens strain that kills N. gonorrhoeae. They use TnSeq to identify that the anti-gonococcal agent is oxydifficidin and show that it acts at the ribosome and that one of the dedA gene products in N. gonorrhoeae MS11 is important for moving the oxydifficidin across the membrane.

      Strengths:

      This is an impressive amount of work, moving from a serendipitous observation through TnSeq to characterize the mechanism by which Oxydifficidin works.

      Weaknesses:

      (1) There are important gaps in the manuscript's methods.

      The requested additions to the method describing bacterial sequencing and anti-gonococcal activity screening will be made. However, we do not think the absence of these generic methods reduces the significance of our findings.

      (2) The work should evaluate antibiotics relevant to N. gonorrhoeae.

      (1) It is not clear to us why reevaluating the activity of well characterized antibiotics against known gonorrhoeae clinical strains would add value to this manuscript. The activity of clinically relevant antibiotics against antibiotic-resistant N. gonorrhoeae clinical isolates is well described in the literature. Our use of antibiotics in this study was intended to aid in the identification of oxydifficidin’s mode of action. This is true for both Tables 1 and 2.

      (2) If the reviewer insists, we would be happy to include MIC data for the following clinically relevant antibiotics: ceftriaxone (cephalosporin/beta-lactam), gentamicin (aminoglycoside), azithromycin (macrolide), and ciprofloxacin (fluoroquinolone).

      (3) The genetic diversity of dedA and rplL in N. gonorrhoeae is not clear, neither is it clear whether oxydifficidin is active against more relevant strains and species than tested so far.

      (1) We thank the reviewer for this suggestion. We aligned the DedA sequence from strain MS11 with DedA proteins from 220 N. gonorrhoeae strains that have high-quality assemblies in NCBI. The result showed that there are no amino acid changes in this protein. Using the same method, we observed several single amino acid changes in RplL. This included changes at A64, G25 and S82 in 4 strains with one change per strain. These sites differ from R76 and K84, where we identified changes that provide resistance to oxydifficidin. Notably, in a similar search of representative Escherichia, Chlamydia, Vibrio, and Pseudomonas NCBI deposited genomes, we did not identify changes in RplL at position R76 or K84.

      (2) While the usefulness of screening more clinically relevant antibiotics against clinical isolates as suggested in comment 2 was not clear to us, we agree that screening these strains for oxydifficidin activity would be beneficial. We have ordered Neisseria gonorrhoeae strain AR1280, AR1281 (CDC), and Neisseria meningitidis ATCC 13090. They will be tested when they arrive.

      Reviewer #2 (Public Review):

      Summary:

      Kan et al. present the discovery of oxydifficidin as a potential antimicrobial against N. gonorrhoeae, including multi-drug resistant strains. The authors show the role of DedA flippase-assisted uptake and the specificity of RplL in the mechanism of action for oxydifficidin. This novel mode of action could potentially offer a new therapeutic avenue, providing a critical addition to the limited arsenal of antibiotics effective against gonorrhea.

      Strengths:

      This study underscores the potential of revisiting natural products for antibiotic discovery of modern-day-concerning pathogens and highlights a new target mechanism that could inform future drug development. Indeed there is a recent growing body of research utilizing AI and predictive computational informatics to revisit potential antimicrobial agents and metabolites from cultured bacterial species. The discovery of oxydifficidin interaction with RplL and its DedA-assisted uptake mechanism opens new research directions in understanding and combating antibiotic-resistant N. gonorrhoeae. Methodologically, the study is rigorous employing various experimental techniques such as genome sequencing, bioassay-guided fractionation, LCMS, NMR, and Tn-mutagenesis.

      Weaknesses:

      The scope is somewhat narrow, focusing primarily on N. gonorrhoeae. This limits the generalizability of the findings and leaves questions about its broader antibacterial spectrum. Moreover, while the study demonstrates the in vitro effectiveness of oxydifficidin, there is a lack of in vivo validation (i.e., animal models) for assessing pre-clinical potential of oxydifficidin. Potential SNPs within dedA or RplL raise concerns about how quickly resistance could emerge in clinical settings.

      (1) Spectrum/narrow scope: The broader antibacterial spectrum of oxydifficidin has been reported previously (S B Zimmerman et al., 1987). The focus of this study is on its previously unreported potent anti-gonococcal activity and its mode of action. While it is true that broad-spectrum antibiotics have historically played a role in effectively controlling a wide range of infections, we and others believe that narrow-spectrum antibiotics have an overlooked importance in addressing bacterial infections. Their advantage lies in their ability to target specific pathogens without markedly disrupting the human microbiota.

      (2) Animal models: We acknowledge the reviewer’s insight regarding the importance of in vivo validation to enhance oxydifficidin’s pre-clinical potential. However, due to the labor-intensive process needed to isolate oxydifficidin, obtaining a sufficient quantity for animal studies is beyond the scope of this study. Our future work will focus on optimizing the yield of oxydifficidin and developing a topical mouse model for subsequent investigations.

      (3) Potential SNPs: Please see our response to Reviewer #1’s comment 3. We acknowledge that potential SNPs within dedA and rplL raise concerns regarding clinical resistance, which is a common issue for protein-targeting antibiotics. Yet, as pointed out in the manuscript, obtaining mutants in the lab was a very low yield endeavor.

      Reviewer #3 (Public Review):

      Summary:

      The authors have shown that oxydifficidin is a potent inhibitor of Neisseria gonorrhoeae. They were able to identify the target of action to rplL and showed that resistance could occur via mutation in the DedA flippase and RplL.

      Strengths:

      This was a very thorough and clearly argued set of experiments that supported their conclusions.

      Weaknesses:

      There was no obvious weakness in the experimental design. Although it is promising that the DedA mutations resulted in attenuation of fitness, it remains an open question whether secondary rounds of mutation could overcome this selective disadvantage which was untried in this study.

      We thank the reviewer for the positive comment. We agree that investigating factors that could compensate for the fitness attenuation caused by DedA mutation would enhance our understanding of the role of DedA.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The use of the term "N. gonorrhoeae wildtype" should not be used. It is uninformative, as the species contains a large amount of diversity. Instead, please name the strain. From Figure 1, it looks like the authors used MS11. Since MS11 is a longstanding lab strain and likely does not reflect circulating N. gonorrhoeae, and since H041 is no longer in circulation, the authors should ideally test the compound against more representative strains of N. gonorrhoeae. This includes panels of isolates available through the CDC, for example (https://www.cdc.gov/drugresistance/resistance-bank/index.html). I encourage the authors to include FC428 or another recently identified isolate with the penA 60 allele to demonstrate oxydifficidin's activity against contemporary concerning isolates/lineages.

      (1) “N. gonorrhoeae MS11” is now used instead of “N. gonorrhoeae WT” in this manuscript.

      (2) In our revised manuscript, we have added MIC data for recently identified Neisseria gonorrhoeae isolates AR#1280 and AR#1281 which contain the penA 60 allele (Table 1). The data shows oxydifficidin maintains its potent activity against these multidrug-resistant strains. We also added a description of this data to the results section as shown below.

      Original text: “Oxydifficidin was more potent against N. gonorrhoeae MS11 than almost all other antibiotics we tested. In fact, it was only slightly less active than the highly optimized third-generation cephalosporin, ceftazidime.([18]) However, unlike third-generation cephalosporins, oxydifficidin retained activity against the multidrug resistant H041 clinical isolate (Table 1).([4]) H041 is resistant to the “standard of care” cephalosporin ceftriaxone (2 µg/mL) as well as a number of other antibiotics that are normally active against N. gonorrhoeae (penicillin G, 4 µg/mL; cefixime, 8 µg/mL; levofloxacin, 32 µg/mL).”

      Changed to: “Oxydifficidin was more potent against N. gonorrhoeae MS11 than most other antibiotics we tested. Notably, unlike clinically used antibiotics such as ceftriaxone, azithromycin, and ciprofloxacin, oxydifficidin retained activity against all multidrug-resistant clinical isolates we examined (Table 1).” (Line 77-79)

      (2) Does oxydifficidin have activity against N. meningitidis? It is the species most closely related to N. gonorrhoeae and the other pathogenic Neisseria.

      Oxydifficidin has potent activity against N. meningitidis ATCC 13090. In our revised manuscript, we have included its MIC data in Figure 1c.

      (3) Given claims that oxydifficidin activity in N. gonorrhoeae as compared to other Neisseria reflects N. gonorrhoeae's dedA and sensitive rplL, it would be good to assess the allelic diversity of these genes in N. gonorrhoeae. There are over 20,000 genomes from clinical isolates of N. gonorrhoeae in databases. It should be straightforward to check whether dedA and rplL allelic variants already exist in the population. Should variants be observed, oxydifficidin should be tested against the associated strains of N. gonorrhoeae.

      Response: We thank the reviewer for this suggestion. We aligned the DedA sequence from strain MS11 with DedA proteins from 220 N. gonorrhoeae strains that have high-quality assemblies in NCBI. The result showed that there are no amino acid changes in this protein. Using the same method, we observed several single amino acid changes in RplL. This included changes at A64, G25 and S82 in 4 strains with one change per strain. These sites differ from R76 and K84, where we identified changes that provide resistance to oxydifficidin. Notably, in a similar search of representative Escherichia, Chlamydia, Vibrio, and Pseudomonas NCBI deposited genomes, we did not identify changes in RplL at position R76 or K84.

      New text: “A survey of 220 N. gonorrhoeae strains with high-quality assemblies in NCBI found no mutations in the DedA protein.” (Line 104-105)

      “These two mutations were not found in the survey of the same collection of N. gonorrhoeae strains used to look for DedA mutations.” (Line 143-144)
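
      The survey described in this response amounts to comparing each strain's protein sequence with the MS11 reference and listing the positions that differ. The sketch below illustrates that comparison only; it is not the authors' pipeline. It assumes the sequences are already aligned to the same length, and the peptide strings, strain names, and watched position are hypothetical toy values rather than real DedA or RplL data.

```python
def substitutions(reference, query):
    """List amino acid changes (e.g. 'A4G') between two pre-aligned sequences.

    Positions are 1-based. Columns containing an alignment gap ('-') are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, query), start=1)
        if ref != alt and "-" not in (ref, alt)
    ]


def survey(reference, strains, watched_positions):
    """Return (all changes per strain, changes falling on watched positions)."""
    all_changes = {}
    watched_hits = {}
    for name, seq in strains.items():
        changes = substitutions(reference, seq)
        if changes:
            all_changes[name] = changes
        hits = [c for c in changes if int(c[1:-1]) in watched_positions]
        if hits:
            watched_hits[name] = hits
    return all_changes, watched_hits


# Hypothetical toy data: a 10-residue "reference" and three "strains".
REFERENCE = "MKTAYIAKQR"
STRAINS = {
    "strain_a": "MKTAYIAKQR",  # identical to the reference
    "strain_b": "MKTGYIAKQR",  # one change at position 4
    "strain_c": "MKTAYIAKQK",  # one change at position 10
}

if __name__ == "__main__":
    changes, hits = survey(REFERENCE, STRAINS, watched_positions={10})
    print("all changes:", changes)
    print("changes at watched positions:", hits)
```

      In a real survey the watched positions would be the resistance-associated sites (R76 and K84 in RplL), and the input would come from a multiple sequence alignment rather than hand-written strings.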

      (4) Clinically relevant antibiotics for N. gonorrhoeae are penicillin, tetracycline, spectinomycin, gentamicin, ciprofloxacin, azithromycin, ceftriaxone; moreover, zoliflodacin and gepotidacin have reportedly successfully completed phase 3 trials. The authors should redo their MIC testing with these antibiotics (e.g., for Figures 1 and 2 and Tables 1 and 2), both because this will enable direct comparison with the many clinical isolates that have undergone testing and because these are the drugs most pertinent to clinical practice. Ampicillin, ceftazidime, chloramphenicol, bacitracin, and daptomycin are not relevant. Could the authors explain why they tested vancomycin, polymyxin B, irgasan, melittin, avilamycin, and thiostrepton?

      Our use of antibiotics with diverse modes of action (e.g. vancomycin, polymyxin B, irgasan, melittin, avilamycin, and thiostrepton) in this study was intended to aid in the identification of oxydifficidin’s mode of action. This is true for both Tables 1 and 2.

      To address the reviewer’s concern, in our revised manuscript, we have added MIC data for the following clinically relevant antibiotics: ceftriaxone (cephalosporin/beta-lactam), gentamicin (aminoglycoside), azithromycin (macrolide), and ciprofloxacin (fluoroquinolone) to Table 1.

      (5) Please describe the characteristics of the transposon library (finding four transposons in a single strain does seem unexpected, given how most transposon libraries aim for one transposon insertion per strain).

      We understand that one transposon insertion per strain is ideal for transposon libraries. This Bacillus strain proved to be recalcitrant to genetic manipulation. In the rare cases where we obtained resistant colonies upon electroporation with the transposon, all colonies contained multiple (≥ 4) transposon insertions. This made it impractical to build a library with one transposon insertion per library member.

      We assumed that the anti-N. gonorrhoeae activity most likely originated from a natural product BGC, which typically range from 10-100 kb in size.

      Based on the average of 50 kb per BGC, ~80 transposon insertions would be required to fully search the 4.2 Mb genome of Bacillus amyloliquefaciens BK for a BGC. At 4 mutations per transformant, 1x coverage of the genome would require only 20 library members.

      After extensive electroporation of the transposon into Bacillus amyloliquefaciens BK, we were able to obtain a library of 50 members, including one mutant (Tn5-3) that lacked anti-N. gonorrhoeae activity.
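
      The coverage estimate above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in this response (4.2 Mb genome, 50 kb average BGC, at least 4 insertions per transformant, 50 library members). The last function adds an assumption that is not in the response: that insertions land uniformly and independently across the genome, which real transposons may not do.

```python
import math

GENOME_KB = 4200            # 4.2 Mb genome, as stated in the response
MEAN_BGC_KB = 50            # average BGC size assumed in the response
INSERTIONS_PER_MEMBER = 4   # lower bound observed per transformant
LIBRARY_SIZE = 50           # library members obtained


def insertions_for_1x(genome_kb, bgc_kb):
    """BGC-sized windows needed to tile the genome once."""
    return math.ceil(genome_kb / bgc_kb)


def members_for_1x(genome_kb, bgc_kb, per_member):
    """Library members needed for nominal 1x coverage."""
    return math.ceil(insertions_for_1x(genome_kb, bgc_kb) / per_member)


def nominal_fold_coverage(members, per_member, bgc_kb, genome_kb):
    """Total insertions times window size, divided by genome size."""
    return members * per_member * bgc_kb / genome_kb


def prob_bgc_hit(members, per_member, bgc_kb, genome_kb):
    """Chance that one given BGC receives at least one insertion.

    Assumes uniform, independent insertion sites (an illustrative assumption).
    """
    total_insertions = members * per_member
    return 1 - (1 - bgc_kb / genome_kb) ** total_insertions


if __name__ == "__main__":
    print(insertions_for_1x(GENOME_KB, MEAN_BGC_KB))
    print(members_for_1x(GENOME_KB, MEAN_BGC_KB, INSERTIONS_PER_MEMBER))
    print(nominal_fold_coverage(LIBRARY_SIZE, INSERTIONS_PER_MEMBER,
                                MEAN_BGC_KB, GENOME_KB))
    print(prob_bgc_hit(LIBRARY_SIZE, INSERTIONS_PER_MEMBER,
                       MEAN_BGC_KB, GENOME_KB))
```

      Without rounding, the calculation gives 84 insertions and 21 members, which the response rounds to ~80 and 20. On the same assumptions, a 50-member library corresponds to a little over 2x nominal coverage, and under the uniform-insertion assumption the chance of hitting any single 50 kb cluster at least once is about 0.9.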

      New text added to the methods section:

      “A library containing 50 transposon mutants was obtained. In the mutants examined, each strain contained ≥4 transposon insertions” (Line 337-339)

      (6) Please describe in the methods how you sequenced and annotated the genome of Bacillus amyloliquefaciens BK.

      The sequencing method is now described in the “Genomic Sequencing and annotation of Bacillus amyloliquefaciens” section. The genome of Bacillus amyloliquefaciens BK was not fully annotated. Mutations were identified as described in the updated methods section below.

      New text:

      “Genomic Sequencing and annotation of Bacillus amyloliquefaciens

      Genomic DNA from Bacillus amyloliquefaciens BK WT and transposon mutant Tn5-3 was isolated using PureLink Microbiome DNA purification kit (Invitrogen) according to the manufacturer’s instructions.

      The Bacillus amyloliquefaciens BK WT genome was assembled by mapping its sequencing data onto the annotated genome of Bacillus amyloliquefaciens FZB42 using Geneious Prime. Differences in the mutant strain Tn5-3 were identified by mapping its sequencing data onto the assembled Bacillus amyloliquefaciens BK WT genome. The mutated genes were then annotated using NCBI BLAST. The oxydifficidin BGC was annotated using the antiSMASH online server.” (Line 253-260)

      (7) Please describe in the methods how you screened the library for strains that lacked anti-gonococcal activity.

      The method has been added to our revised manuscript as the section “Screening of Bacillus Strains Lacking Anti-N. gonorrhoeae Activity”.

      New text:

      “Screening of Bacillus Strains Lacking Anti-N. gonorrhoeae Activity

      The transposon mutants of Bacillus amyloliquefaciens BK were grown overnight in LB medium at 30 °C. Each overnight culture was then diluted 1:5000, and 1 μl of the diluted culture was spotted onto a GCB agar plate swabbed with N. gonorrhoeae cells. The plate was then incubated overnight at 37 °C with 5% CO2. The mutant strain (Tn5-3) lacking anti-N. gonorrhoeae activity was identified due to its failure to produce a zone of growth inhibition in the resulting N. gonorrhoeae lawn.” (Line 341-346)

      (8) Was only one strain found that was a 'non-producer' of anti-N. gonorrhoeae activity? Line 68 suggests that this was only one of multiple non-producers. Is that correct? If so, did you work up the others, and did they also have disruptions in the same biosynthetic gene cluster?

      Only one strain was identified as a “non-producer” of anti-N. gonorrhoeae activity. We have modified the text to clarify this point.

      Original text: “The sequencing of one non-producer strain revealed that it surprisingly contained four transposon insertions and one frame shift mutation.”

      Changed to: “The sequencing of the non-producer strain revealed that it surprisingly contained four transposon insertions and one frame shift mutation.” (Line 53-54)

      (9) All sequences (including Bacillus amyloliquefaciens BK) must be deposited in a public database (e.g., NCBI) and the accession numbers reported in the manuscript.

      Genomic sequence data for Bacillus amyloliquefaciens BK have been deposited in GenBank, and the accession number (GCA_019093835.1) now appears in the legend of Figure S1a.

      Figure S1a legend:

      “Genome-based phylogenetic tree containing Bacillus amyloliquefaciens BK and closely related Bacillus spp. The tree was built by Genome Clustering of MicroScope using neighbor-joining method. The NCBI accession numbers of Bacillus strains used in the tree are GCA_000196735.1, GCA_000204275.1, GCA_000015785.2, GCA_019093835.1, GCA_000009045.1, GCA_000011645.1, GCA_000172815.1, GCA_000008005.1, and GCA_000007845.1 (from top to bottom).”

      Minor

      (10) Statements in the article would benefit from fact-checking. For example:

      - gonorrhea is not the second most prevalent sexually transmitted infection worldwide; it is the second most reported bacterial sexually transmitted infection.

      - Treatment is ceftriaxone 500mg IM x1 in the US, but 1g IM x1 in the UK and Europe. The UK guidelines also permit ciprofloxacin, should sequencing indicate gyrA 91S. I suggest reviewing / specifying which treatment guidelines you're referring to.

      We appreciate the reviewer’s corrections. The word “prevalent” is now changed to “reported”.

      Original text: “Gonorrhea, which is caused by Neisseria gonorrhoeae, is the second most prevalent sexually transmitted infection worldwide.”

      Changed to: “Gonorrhea, which is caused by Neisseria gonorrhoeae, is the second most reported sexually transmitted infection worldwide.” (Line 2-3)

      Original text: “Gonorrhea is the second most prevalent sexually transmitted infection worldwide, its causative agent is the bacterium Neisseria gonorrhoeae.”

      Changed to: “Gonorrhea is the second most reported sexually transmitted infection worldwide, its causative agent is the bacterium Neisseria gonorrhoeae.” (Line 18-19)

      “In the USA” has been added to the sentence describing gonorrhea treatment.

      Original text: “The high dose (500 mg) of the cephalosporin ceftriaxone is currently the only recommended therapy for treating gonorrhea infections.”

      Changed to: “The high dose (500 mg) of the cephalosporin ceftriaxone is currently the only recommended therapy for treating gonorrhea infections in the USA.” (Line 20-22)

      (11) Please make sure all results are in the results section. The report of cell morphology, for example, should be in the results, not the discussion.

      In our revised manuscript, we have included the cell morphology data in the results section with the text changes below.

      Original text: “Interestingly, not only was dedA deficient N. gonorrhoeae less susceptible to oxydifficidin, oxydifficidin also kills this mutant more slowly (Figure 2b) than WT N. gonorrhoeae MS11.”

      Changed to: “Interestingly, not only was dedA deficient N. gonorrhoeae less susceptible to oxydifficidin, oxydifficidin also kills this mutant more slowly (Figure 2b) than WT N. gonorrhoeae MS11. The dedA deletion mutant also showed an altered cell morphology with reduced membrane integrity and lower formation of micro-colonies (Figure S4).” (Line 100-104)

      Original text: “The dedA deletion mutant also showed an altered cell morphology with reduced membrane integrity and lower formation of micro-colonies (Figure S4), indicating that it should show reduced pathogenesis and fitness, and, as a result, not accumulate in a clinical setting, which adds to the therapeutic appeal of oxydifficidin.”

      Changed to: “The dedA deletion mutant exhibited altered cell morphology, characterized by diminished membrane integrity and reduced micro-colony formation, indicating that it should show reduced pathogenesis and fitness, and, as a result, not accumulate in a clinical setting, which adds to the therapeutic appeal of oxydifficidin” (Line 206-210)

      (12) Tables 1 and 2 should be combined and should address the most relevant antibiotics

      The MIC data of additional relevant antibiotics are now included in Table 1. However, we still believe that keeping Tables 1 and 2 separate enhances the clarity of the manuscript. Table 2 specifically focuses on diverse ribosomal targeting antibiotics, which highlights the unique binding site of oxydifficidin.

      (13) Supplemental Figure 1a. The tree could be better resolved, and there are four entries with the identical listing of "Bacillus amyloliquefaciens subsp. plantarum" on different branches. In the methods or the legend, please indicate the accession numbers for these genomes. Also please specify how this tree was made. Is it a maximum likelihood tree? Something else?

      The tree is now better resolved and includes new entries. The requested information regarding accession numbers and tree construction method has been included in the figure legend.

      New supplemental Figure 1a legend:

      “a. Genome-based phylogenetic tree containing Bacillus amyloliquefaciens BK and closely related Bacillus spp. The tree was built by Genome Clustering of MicroScope using neighbor-joining method. The NCBI accession numbers of Bacillus strains used in the tree are GCA_000196735.1, GCA_000204275.1, GCA_000015785.2, GCA_019093835.1, GCA_000009045.1, GCA_000011645.1, GCA_000172815.1, GCA_000008005.1, and GCA_000007845.1 (from top to bottom).”

      Reviewer #2 (Recommendations For The Authors):

      The conclusions drawn in the manuscript are well-supported by the experimental data presented.

      I have the below minor comments:

      (1) "serendipitously identified" - I feel this wording should be avoided throughout the manuscript. The point of a research paper is to communicate methodology and experimental detail, and this language portrays the opposite.

      While we agree that methodology and experimental procedures are paramount in scientific reporting, we believe it is equally important to convey, particularly to younger generations, that a part of the scientific process is often unplanned and can benefit from chance observations. Therefore, we would like to keep this wording.

      (2) The introduction should include the biological roles/function of DedA proteins in bacteria.

      DedA proteins perform a wide array of biological roles and functions in bacteria. In the results section (Line 107-116), we have described the most well-established of these functions, particularly the flippase activity, which appears to be directly related to oxydifficidin sensitivity. We believe that introducing this information in the results section enhances the manuscript’s clarity and flow.

      (3) "When we screened this contaminant for antibacterial activity against lawns of other Gram-negative bacteria it did not produce a zone of growth of inhibition against any of the bacteria we tested (e.g., Escherichia coli, Vibrio cholerae, Caulobacter crescentus)." Can these data Figures be included in the Supplements?

      This result was recorded in the lead author’s notebook, but no image was saved.

      (4) Line 52: Was any base analyses performed on the Tn-mutants i.e., how many insertion-sites? Depth of mutants? Was a library constructed in this study or previously? Why were only BGC assessed?

      Please see our response to Reviewer #1’s comment (5). We focused on BGCs because we believed the anti-N. gonorrhoeae activity most likely resulted from a molecule encoded by a natural product BGC.

      (5) Line 98: Do the other 2 predicted DedA-like proteins also have a role in uptake of oxydifficidin? Is there some redundancy in uptake?

      We generated knockout mutants for two other predicted DedA-like proteins in N. gonorrhoeae MS11, and the MIC of oxydifficidin for these mutants remained the same as for the N. gonorrhoeae MS11 wild type strain. Therefore, we believe that the DedA protein discussed in this manuscript is the primary transporter of oxydifficidin. However, we cannot completely rule out the possibility of redundancy in oxydifficidin uptake by other DedA-like proteins.

      New text: “We also generated deletion mutants for two other predicted dedA-like genes, and the MIC of oxydifficidin for these mutants remained the same as for the N. gonorrhoeae MS11 wild type strain.” (Line 98-100)

      Reviewer #3 (Recommendations For The Authors):

      This is a well presented manuscript and I could not immediately see any issues with it.

      We appreciate the reviewer’s positive feedback.

    1. Author response:

      We are submitting a revised manuscript with major additions that address the main concerns in the initial reviews. At the highest level, this revision provides i) orthogonal biochemical measurements that yield concrete evidence of lysosomal protein aggregates, and ii) a plausible mechanism linking lysosomal lipid handling and protein aggregation through disruption of ESCRT function. We believe these additions significantly improve the completeness of this study and the conclusions that can be drawn from the data.

      Below are more specific highlights on the addition in this revision:

      -       We included orthogonal techniques (thioflavin-T staining and Lyso-IP followed by differential extraction) and confirmed the accumulation of RIPA-insoluble protein aggregates at the lysosomes in cells under lipid perturbation (Figure 3).

      -       We performed TMT-Proteomics and identified accumulation of insoluble ESCRT components at the lysosomes under lipid perturbation (Figure 4). Two new authors involved in this effort have been added to the manuscript.

      -       The ESCRT result prompted us to revisit lysosomal membrane integrity. With improved imaging conditions and analysis we were able to see increased membrane permeabilization under lipid perturbation. VPS4A overexpression partially rescued this phenotype, suggesting that lipid accumulation impairs ESCRT disassembly (Figure 5).

      -       Together, the results suggest that lipid perturbation impairs ESCRT function, compromising both lysosomal membrane repair and microautophagy, resulting in the accumulation of endogenous protein aggregates at the lysosomes (Graphical Abstract).

      Reviewer #1 (Recommendations For The Authors):

      (1) Perhaps the most prominent limitation of this work is the unilateral focus on native cells (i.e. cells under no endogenous or exogenous stress) as the model for protein aggregate formation. Furthermore, although the ProteoStat stain has been utilized by many investigators before, the sole reliance on this stain as the read-out for their assays is concerning. To compound the concern, the ProteoStat-positive puncta co-localize with lysosomal markers which was surprising even to the authors. All in all, it behooves the authors to test proteostasis in multiple parallel ways to actually define what they are studying. How is it possible that protein aggregates under native conditions are only co-localized with lysosomes? Are we really studying protein aggregates which should predominantly be cytoplasmic insoluble aggregates?

      (a) They need to get away from a simple stain like ProteoStat and conduct co-stainings with other markers such as poly-ubiquitin antibodies and other chaperones to define what and where else exactly are these aggregates.

      Co-staining with poly-ubiquitin was included in the original manuscript. We added orthogonal staining with another widely used amyloid dye, Thioflavin-T, and provided fine-grained quantification of lysosomal vs cytosolic localization of various signals (Figures S4A-C & 3A-B).

      (b) They need to do Immunoblots with and without triton insolubility to see if these aggregates are insoluble as most would predict. They can do lysosomal isolation vs cytoplasmic to see if the insoluble aggregates are really lysosomal.

      We performed Lyso-IP followed by differential detergent extraction to confirm the accumulation of insoluble proteins at the lysosomes (Figure 3C). Proteomic analysis identified some of these insoluble proteins as ESCRT subunits (Figure 4).

      (c) They should compare aggregate formation in the native state versus cells with lysosomal inhibition via Bafilomycin or chloroquine versus cells with proteasomal inhibition. The lysosomal inhibition experiments are particularly informative given the lysosomal relevance they have uncovered.

      We included other small-molecule inhibitors at different time points to compare the effects of different modes of proteostasis challenge (Figure S4A-D). Together with the ESCRT finding, our results suggest a role for microautophagy in our system, and provide a model of how ProteoStat- and/or ubiquitin-positive substrates become partitioned between the cytoplasm and lysosomes under different perturbations.

      (d) Many protein aggregates which are too bulky for proteasome degradation will traditionally be dealt with by aggrephagy. Why is this not observed?

      Knockdown of core macroautophagy components did not impact ProteoStat intensity in our CRISPRi screen, suggesting that basal macroautophagy plays a negligible role in clearing endogenous amyloid-like structures in our experimental system. We provide an alternative model in which these aggregates instead arrive at the lysosomes via microautophagy.

      (2) After addressing #1, they can validate if the genes they identified by CRISPR screens are also important in modulation of protein aggregate burden in other systems. For example, if they inhibit lysosomes by Bafilo or Chloroquine to obtain protein aggregates and then Knockdown the identified genes in the CRISPR screens, will they get the same results?

      We addressed the effect of different modes of proteostasis challenge as recommended above. Deacidifying the lysosomes alone causes intense protein aggregation (Figure S4A-D) and eventually cell death, and was thus not combined with other perturbations.

      (3) They identify lysosomal lipid metabolism genes/pathways as the culprit for inducing proteostasis. In particular sphingolipid and cholesteryl ester species appear to be operational here. However, there are no specific lipids species or specific lipid metabolism gene that is causative. Rather, you have to knockdown entire processes to have an effect. This suggests that the focus on lysosome health (i.e. permeability, proteolysis, etc) is rudimentary. When you have to knockdown entire classes of lipids, this would indicate more broad effects on cellular lipids (including membrane lipids beyond the lysosome) and related cellular health?

      We included data on the effect of knocking down MYLIP, PSAP, and as a comparison PSMD2 on the growth rate of K562 cells (Figure S5A). MYLIP and PSAP KDs, which cause predominantly an accumulation of lipids, do not impede cell growth. Increasing lipid uptake by MYLIP KD increases cell proliferation under our culture conditions, suggesting a general negative impact on cell health was not required for the association between lipid levels and protein aggregates.

      (a) They conduct a superficial methyl-beta-cyclodextrin experiment with equivocal results. The use of MBCD for different time-courses to deplete various membrane cholesterol pools including the plasma membrane pool is important to ascertain what aspect of the cellular cholesterol is affecting proteostasis. MBCD +/- cholesterol reintroduction time-courses for rescue will also be key to determine the culprit cellular cholesterol pool.

      The MBCD / Filipin experiment helped us determine that ProteoStat doesn’t directly stain cholesterol, nor any major plasma membrane components. Free cholesterol was implicated in neither the screen nor the lipidomics and was not the subject of targeted experiments.

      (b) The same concept can be applied to sphingolipids. There are sphingolipids in abundance in multiple membrane compartments. Which ones are causal here? More nuanced evaluation of this with sphingolipid staining/tracking can be conducted.

      We attempted experiments in which sphingolipids were added back to cells grown in FBS-depleted media. However, we were not able to deliver these lipid species consistently, and doing so while ensuring the correct subcellular localization at physiologically relevant levels would require substantial methods development.

      (c) As part of this, are lipid rafts and/or caveolae being affected by the perturbations in cholesterol and sphingolipids? Lipid rafts are highly enriched in these 2 lipids which could link to their proteostasis observation.

      Indeed, ceramides released from SM hydrolysis are proposed to self-assemble into microdomains with negative curvature that can promote the formation of intralumenal vesicles (Alonso and Goñi, 2018; Niekamp et al., 2022). We propose that SM accumulation may hinder this process by counteracting the negative membrane curvature, thereby impeding microautophagy.

      (d) How about ER membrane lipids? The UPR and subsequent effects on proteostasis are intricately involved with ER lipid bilayer composition.

      We did not perform lipidomics on ER membranes in this study, though we note that at steady state, sphingolipids and cholesterol esters are not expected to be enriched at the ER (Ikonen and Zhou, 2021). We checked whether lipid-related genetic perturbations induced the UPR in published perturb-seq data in K562 cells. Neither MYLIP nor PSAP knockdown induced a UPR.

      In conclusion, the manuscript is interesting but the excitement over a link between lysosome-related lipid metabolism and proteostasis needs to be tamped until a more robust experimental approach is employed to generate supportive and corroborating results.

      Reviewer #2 (Recommendations For The Authors):

      - The paper has a number of grammatically awkward sentences. Editing these would enhance clarity.

      - It is important to show the co-localization of aggregates with the lysosome. This is shown in supplements but should be in a main figure. Here the authors cite previous work indicating that ProteoStat puncta co-localize with ubiquitinated proteins and state that they do not see this, then essentially just move on. Is there an explanation for this discrepancy and can it be resolved? What do they think is really going on? What happens to levels of ubiquitinated proteins when lipid metabolism is perturbed as in these experiments?

      We have included the lipid-induced lysosomal protein aggregation data in the main text (Figure 3A-B), and provided fine-grained quantification of the cytosolic-vs-lysosomal ProteoStat / Ub / ThT signals under different aggregate-inducing conditions (Figure S4A-D). We discuss these results in the main text and propose a model involving ESCRT-mediated microautophagy. This is supported further by the Lyso-IP proteomics and LMP analysis.

      - Please add an indicator of amino acid numbers to Fig. 3C.

      These annotations are now included (now Figure S3C).

      - The legend for 3D is mislabelled.

      We have corrected the legend (now Figure S3D).

      Reviewer #3 (Recommendations For The Authors):

      Protein homeostasis and lipid homeostasis are both important for maintaining cellular functions. However, the crosstalk remains largely unknown. The manuscript entitled "Impairment of lipid homoeostasis causes accumulation of protein aggregates in the lysosome" deals with this interesting topic. An important link between lysosomal protein aggregation and sphingolipid/cholesterol ester metabolism was discovered. The topic, belonging to the Cell Biology domain, also falls into the aims and scope of eLife. Here are the revisions I recommend:

      (1) From lipidomics analysis, a remarkable correlation between levels of sphingomyelin and cholesterol ester and ProteoStat staining was found. Could the authors explain how sphingomyelin and cholesterol ester are quantified? The two lipids are not included as internal standards from the lipidomics experiment.

      Sphingomyelin and cholesterol ester internal standards are included in the Avanti 330707 SPLASH® LIPIDOMIX® Mass Spec Standard, which was supplied at 3% v/v to the MeOH/H2O cell lysis buffer. We have amended the Methods section to clarify this.

      (2) Could the authors perhaps delete Figure 1B and show it on Figure 2A only? There is no need to show the same figure two times. The thresholds of both False Discovery Rate and Median Enrichment need to be added. From Figure 2A, the lysosomal hydrolases (GBA, LIPA, GALC) seem to be located in a statistically insignificant region. Based on previous studies, GBA could have an effect on sphingolipid levels, so how can it be explained that sphingomyelin was highly correlated with ProteoStat staining?

      We have combined the two volcano plots into a single figure (now Figure 1D), and added a line to help visualize the gene effects while considering the combined contribution of FDR and enrichment. Individual lysosomal hydrolases indeed have insignificant effects on ProteoStat and this is discussed in the main text as having relatively constrained impacts on the general lipidome. For example, while GBA and GALC KDs can lead to accumulation of their immediate substrates (glucosylceramide and galactosylceramide, respectively), they do not directly impinge on sphingomyelin.

      (3) The authors show the correlation between ProteoStat staining and different lipids/lipid classes in Figure 3B and Figure S3A. It is not necessary to show the correlation with individual lipids (such as sphingomyelin(d18:1/24:0) and cholesterol ester(18:2)). The correlation with the full collection of lipid classes would be more representative, which is only listed in Figure 3B and Figure S3A. It is suggested to add the information of how many individual lipids in each class are used for the correlation analysis. Replacing Figure 3A with Figure S3A, and moving Figure 3A to the supplementary figures, is suggested.

      We decided to retain the correlation of two individual lipids (a sphingomyelin and a cholesterol ester species) with ProteoStat as examples to illustrate with clarity how we obtained the class-wide comparison. The number of individual lipids included in each class for correlation analysis is now included in Figures 2F and S3A.
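
      The step from per-lipid to class-wide correlations described above can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the function names, the use of Pearson correlation, the median as the class summary, and all numbers are assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def class_correlations(lipid_levels, lipid_class, proteostat):
    """Correlate each lipid with ProteoStat, then summarize per lipid class.

    Returns {class: (number_of_lipids, median_correlation)}.
    """
    per_class = defaultdict(list)
    for lipid, levels in lipid_levels.items():
        per_class[lipid_class[lipid]].append(pearson(levels, proteostat))
    return {cls: (len(rs), median(rs)) for cls, rs in per_class.items()}


# Hypothetical values: one ProteoStat reading per perturbation condition,
# and the abundance of each lipid species in the same conditions.
proteostat = [1.0, 2.0, 3.0, 4.0]
lipid_levels = {
    "SM(d18:1/24:0)": [10.0, 20.0, 30.0, 40.0],
    "SM(d18:1/16:0)": [5.0, 7.0, 9.0, 11.0],
    "PC(34:1)": [4.0, 3.0, 2.0, 1.0],
}
lipid_class = {
    "SM(d18:1/24:0)": "SM",
    "SM(d18:1/16:0)": "SM",
    "PC(34:1)": "PC",
}

summary = class_correlations(lipid_levels, lipid_class, proteostat)
for cls, (n_lipids, r) in summary.items():
    print(f"{cls}: n={n_lipids}, median r={r:.2f}")
```

      Reporting the number of lipids alongside each class summary mirrors the reviewer's request to state how many individual species contribute to each class-level value.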

      (4) The authors state that lipid uptake and metabolism modulate proteostasis. However, only cholesterol and LDL were tested. It would be more precise to state as cholesterol uptake and metabolism modulate proteostasis. In addition, sphingolipids and cholesterol esters accumulate with increased lysosomal protein aggregation. It would be interesting to see the effects of sphingolipids uptake, since sphingolipids are correlated with proteostasis better than cholesterol.

      We attempted to add back specific sphingolipids to assess sufficiency. However, we found it challenging to ensure that these lipids were distributed to the correct subcellular locations at physiologically relevant levels. Without this crucial information, it was difficult to draw any conclusions about the sufficiency of the sphingolipids we tested to impair proteostasis.

      Alonso A, Goñi FM. 2018. The Physical Properties of Ceramides in Membranes. Annu Rev Biophys 47:633–654. doi:10.1146/annurev-biophys-070317-033309

      Ikonen E, Zhou X. 2021. Cholesterol transport between cellular membranes: A balancing act between interconnected lipid fluxes. Dev Cell 56:1430–1436. doi:10.1016/j.devcel.2021.04.025

      Niekamp P, Scharte F, Sokoya T, Vittadello L, Kim Y, Deng Y, Südhoff E, Hilderink A, Imlau M, Clarke CJ, Hensel M, Burd CG, Holthuis JCM. 2022. Ca2+-activated sphingomyelin scrambling and turnover mediate ESCRT-independent lysosomal repair. Nat Commun 13:1875. doi:10.1038/s41467-022-29481-4

    1. Author response:

      We thank the editors and reviewers for their thorough evaluation of our manuscript. We appreciate the constructive feedback and insights provided. 

      We acknowledge that some of our conclusions would benefit from more measured statements and additional computational controls. We will revise the manuscript to better reflect the scope and limitations of our analytical approach. While we cannot add new experimental validations at this stage, we will strengthen our computational analyses and clarify our methodology.

      Below, we outline our planned revisions to address the major points raised in the public reviews:

      Clarification of Terms and Definitions:

      (1) We will state more clearly in the manuscript that we reuse the same raw datasets from our previous study, as described in Calendrilli et al., 2023, and that there is no modification to the experimental methods or data.

      (2) We will provide clear definitions for:

      - "Non-differentially expressed" genes

      - "Ctrl specific" RNA sets

      - The composition of control populations in different analyses

      (3) We will revise the use of "non-diffusive RNA-chromatin interactome" and “RNase-resistant” terminology to better reflect our actual findings.

      (4) We will also improve clarity regarding:

      - The rationale for focusing on specific genomic regions

      - The interpretation of evolutionary conservation data

      (5) We will provide additional rationale on the exclusion of short-range interactions.

      Figure Revisions:

      (1) Figure 3a: We will correct any discrepancy between text references and figure content.

      (2) Figure 4: We will standardize the color scheme between control and RNase-treated samples.

      (3) We will follow the reviewer's suggestion to move figure 1g to the supplementary file. 

      Additional Computational Analyses:

      (1) We will consider adding controls for RNA length effects and integrating any existing knowledge of how the extent of protection varies across different RBPs.

      Discussions:

      (1) We will carefully rephrase our conclusions to more accurately reflect the scope and limitations of our computational findings, ensuring we do not overstate the implications.

      (2) We will expand the discussion of limitations, including:

      - The focus on RNase-resistant interactions only

      - The cell-type specificity of our findings

      - The lack of functional validation

      - The limited ability to discern and study the transient or weak RNA-chromatin interactions using the current dataset

      (3) Regarding the recent papers from Jenner and Davidovich groups about RNase treatment effects on chromatin solubility:

      - We will discuss these findings in our revised manuscript

      - We will address potential limitations this may impose on our interpretations

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work examines the binding of several phosphonate compounds to a membrane-bound pyrophosphatase using several different approaches, including crystallography, electron paramagnetic resonance spectroscopy, and functional measurements of ion pumping and pyrophosphatase activity. The work attempts to synthesize these different approaches into a model of inhibition by phosphonates in which the two subunits of the functional dimer interact differently with the phosphonate.

      Strengths:

      This study integrates a variety of approaches, including structural biology, spectroscopic measurements of protein dynamics, and functional measurements. Overall, data analysis was thoughtful, with careful analysis of the substrate binding sites (for example calculation of polder omit maps).

      Weaknesses:

      Unfortunately, the protein did not crystallize with the more potent phosphonate inhibitors. Instead, structures were solved with two compounds with weak inhibitory constants >200 micromolar, which limits the molecular insight into compounds that could possibly be developed into small molecule inhibitors. Likewise, the authors choose to focus the spectroscopy experiments on these weaker binders, missing an opportunity to provide insight into the interaction between more potent binders and the protein.

      We acknowledge the reviewer's concern regarding the choice of weaker inhibitors. We attempted co-crystallization with all available inhibitors, including those with higher potency. However, despite numerous efforts, these potent inhibitors yielded low-resolution crystals, making them unsuitable for detailed structural analysis. Therefore, we chose to focus on the weaker binders, as we were able to obtain high-quality crystal structures for these compounds. This allowed us to perform DEER spectroscopy with the added advantage of accurately analyzing the data against structural models derived from X-ray crystallography. Using these weaker inhibitors enabled a more precise interpretation of the DEER data, thus providing reliable insights into the conformational dynamics and inhibition mechanism. However, as suggested by the reviewer, in the revised version, we will perform DEER analysis on the more potent inhibitors to provide additional insight into their interactions.

      In general, the manuscript falls short of providing any major new insight into membrane-bound pyrophosphatases, which are a very well-studied system. Subtle changes in the structures and ensemble distance distributions suggest that the molecular conformations might change a little bit under different conditions, but this isn't a very surprising outcome. It's not clear whether these changes are functionally important, or just part of the normal experimental/protein ensemble variation.

      We respectfully disagree with the reviewer. The scale of motions seen in this study corresponds to that seen in the full panoply of crystal structures of mPPases. Some proteins, such as the rotary ATPase, undergo very large conformational changes during catalysis. This one does not, meaning that the precise motions we describe are likely to be relevant. Conformational changes in the ensemble, whether large or small, represent essential protein motions which underlie key mPPase catalytic function. Our DEER spectroscopy data demonstrate the sensitivity and resolution necessary to monitor these subtle changes in equilibria, even if these are only a few Angstroms. For several of the conditions we investigated by DEER in solution, corresponding X-ray structures have been solved, with the derived distances agreeing well with the DEER distributions. This further validates the biological relevance of the structures, including serial time-resolved ones that indicate asymmetry.

      The ZLD-bound crystal structure doesn't predict the DEER distances, and the conformation of Na+ binding site sidechains in the ZLD structure doesn't predict whether sodium currents occur. This might suggest that the ZLD structure captures a conformation that does not recapitulate what is happening in solution/ a membrane.

      We agree with the reviewer that the ZLD-bound crystal structure does not predict the DEER distances. However, we believe this discrepancy arises from the bulkiness of the ZLD inhibitor, which prevents the closure of the hydrolytic centre. Additionally, the absence of Na+ at the ion gate in the ZLD-bound structure suggests that Na+ transport does not occur, a conclusion further supported by our electrometric measurements. We agree with the reviewer that the distances observed in the DEER experiments might represent a potential new conformation in solution, which may not be captured by the static X-ray structure, thereby offering insights into the dynamic nature of the protein under physiological conditions. Finally, the static X-ray structures have not captured the asymmetric conformations that must exist to explain half-of-the-sites reactivity.

      Reviewer #2 (Public review):

      Summary:

      Crystallographic analysis revealed the asymmetric conformation of the dimer in the inhibitor-bound state. Based on this result, which is consistent with previous time-resolved analysis, the authors verified the dynamics and the distance between introduced spin labels by DEER spectroscopy in solution and predicted possible patterns of the asymmetric dimer.

      Strengths:

      Crystal structures with inhibitor bound provide detailed coordination in the binding pocket thus useful information for the PPase field and maybe for drug development.

      Weaknesses:

      The distance information measured by DEER is advantageous for verifying the dynamics and structure of membrane protein in solution. However, regarding T211 data, which, as the authors themselves stated, lacks measurement precision, it is unclear for readers how confident one can judge the conclusion leading from these data for the cytoplasmic side.

      We thank the reviewer for acknowledging the advantageous use of the DEER methodology for identifying dynamic states of membrane proteins in solution. We used two sites in our analysis: S525 (periplasm) and T211 (cytoplasm). As we clearly stated in the original manuscript, S525R1 yielded high-quality DEER data, while T211R1 yielded weak (or no) visual oscillations, leading to broad, though different, distributions for the several conditions tested. Our main conclusions are based on the S525R1 data. We included the T211R1 data because, although it does not provide definitive evidence, it is consistent with our proposed model and offers additional insights into biologically relevant conditions. Furthermore, the shifts in the centre of mass (Fig EV8D) of the broad T211R1 distributions show a trend that is consistent with our model; although this does not prove the model, it does not exclude it either. Lastly, these data confirm an important structural feature of mPPase in solution, namely the intrinsically highly dynamic state of loop5-6, where T211 is located; this is consistent with our previous (Kellosalo et al., Science, 2012; Li et al., Nat. Commun, 2016; Vidilaseris et al., Sci. Adv., 2019; Strauss et al., EMBO Rep., 2024) and current X-ray crystallography data.

      The distance information for the luminal site, which the authors claim is more accurate, does not indicate either the possibility or the basis for why it is the ensemble of two components and not simply a structure with a shorter distance than the crystal structure.

      We thank the reviewer for pointing out this possibility and alternative interpretation of our DEER data. In the revised version, we will show that our DEER data are consistent with (and do not exclude) asymmetry and rephrase to be inclusive of other possibilities. Importantly, this additional possibility does not affect the current interpretation of the data in our manuscript.

      Reviewer #3 (Public review):

      Summary:

      Membrane-bound pyrophosphatases (mPPases) are homodimeric proteins that hydrolyze pyrophosphate and pump H+/Na+ across membranes. They are attractive drug targets against protist pathogens. Non-hydrolysable PPi analogue bisphosphonates such as risedronate (RSD) and pamidronate (PMD) serve as primary drugs currently used. Bisphosphonates have a P-C-P bond whose central carbon can accommodate up to two substituents, allowing a large compound variability. Here the authors solved two TmPPase structures in complex with the bisphosphonates etidronate (ETD) and zoledronate (ZLD) and monitored their conformational ensemble using DEER spectroscopy in solution. These results reveal the inhibition mechanism of these compounds, which is crucial for developing future small molecule inhibitors.

      Strengths:

      The authors show that seven different bisphosphonates can inhibit TmPPase with IC50 values in the micromolar range. Branched aliphatic and aromatic modifications showed weaker inhibition.

      High-resolution structures for TmPPase with ETD (3.2 Å) and ZLD (3.3 Å) are determined. These structures reveal the binding mode and shed light on the inhibition mechanism. The nature of modification on the bisphosphonate alters the conformation of the binding pocket.

      The conformational heterogeneity is further investigated using DEER spectroscopy under several conditions.

      Weaknesses:

      The authors observed asymmetry in the TmPPase-ETD structure above the hydrolytic center. The structural asymmetry arises due to differences in the orientation of ETD within each monomer at the active site. As a result, loop5-6 of the two monomers is oriented differently, resulting in the observed asymmetry. The authors attempt to further establish this asymmetry using DEER spectroscopy experiments. However, the (over)interpretation of these data leads to more confusion than any further understanding. DEER data suggest that the asymmetry observed in the TmPPase-ETD structure in this region might be funneled from the broad conformational space under the crystallization conditions.

      We respectfully disagree with the reviewer (see also the response below). The asymmetry was previously established using time-resolved crystallography (Strauss et al., EMBO Rep, 2024) and biochemical assays (e.g. Malinen et al., Prot. Sci., 2022; Artukka et al., Biochem J, 2018; Luoto et al., PNAS, 2013) and also partially seen in one static structure (Vidilaseris et al., Sci Adv 2019). DEER data only show that the previously proposed asymmetry could also be present within the conformational ensemble in solution conditions. Indeed, our data do not (and cannot) exclude this possibility.

      DEER data for position T211R1 at the enzyme entrance reveal a highly flexible conformation of loop5-6 (and do not provide any direct evidence for asymmetry, Figure EV8).

      Please see the relevant response above. We acknowledge that T211 is indeed situated on a highly dynamic loop, which is important for gating, and our DEER data confirm its high flexibility. Because we did not observe oscillations for this site, which results in broad distributions, we stated in the original manuscript that we will not establish the presence of any asymmetry in solution on the basis of T211. We rely instead on the S525 site, for which we acquired high-quality DEER data, as was also pointed out by all reviewers.

      Similarly, data for position S525R1 near the exit channel do not directly support the proposed asymmetry for ETD.

      The reviewer appears to suggest that we hold the S525R1 DEER data as direct proof of asymmetry; this is combative on the grounds that to directly prove asymmetry would require time-resolved DEER measurements, far beyond the scope of this work. Rather, we have applied DEER measurements to explore whether asymmetry (observed previously via time-resolved X-ray crystallography) is also present (or indeed a possibility) in solution. We simply state that the DEER data are consistent with asymmetry (i.e., that the mean distance increases in the presence of ETD compared to the apo-state). This is a restrained interpretation of the data.

      Despite the high quality of the data, they reveal a very similar distance distribution. The reported changes in distances are very small (+/- 0.3 nm), which can be accommodated by a change of spin label rotamer distribution alone. Further, these spin labels are located on a flexible loop, thereby making it difficult to directly relate any distance changes to the global conformation.

      We thank the reviewer for recognising the high quality of our DEER data for S525R1. Visible oscillations in the raw traces, as in our case, reportedly lead to highly accurate and reliable distributions that can separate (in fortuitous cases) helical movements of only a few Angstroms. The ability of DEER/PELDOR to offer near-Angstrom resolution was previously demonstrated by the acquisition and solution of high-resolution multi-subunit spin-labelled membrane protein structures (Pliotas et al., PNAS, 2012; Pliotas et al., Nat Struct Mol Biol, 2015; Pliotas, Methods Enzymol, 2017), as well as by its ability to detect small conformational changes (of similar magnitude to those in mPPase) in different integral membrane protein systems (Kapsalis et al., Nature Comms, 2019; Kubatova et al., PNAS, 2023; Schmidt et al., JACS, 2024; Lane et al., Structure, 2024; Hett et al., JACS, 2021; Zhao et al., Nature, 2024), occurring under different conditions and/or stimuli in solution and/or lipid environments. The changes here are not very small for DEER’s proven detection sensitivity (e.g. ~7 Angstroms between the two mean distance extremes, Ca vs IDP), and all other conditions show changes between those extremes.

      These changes are relatively small, but they are expected for membrane ion pumps. Indeed, none of the mPPase structures show helical movements of greater than half a turn, and that only in helices 6 and 12. There appear to be larger-scale loop-closing motions of the 5-6 loop that includes T211, due to the presence of E217, which binds to one of the Mg2+ ions that coordinate the leaving group phosphate. (This is, inter alia, the reason that this loop is so flexible: it cannot order before substrate is bound.) Here we have the resolution to detect such subtle differences by DEER, given that there are clear shifts in our time-domain data and these are reflected in the mean distances of the distributions. Therefore, our study demonstrates the sensitivity and resolution that DEER offers in detecting subtle conformational transitions, which are key in membrane protein pathways. To further belabour this point, we do not quantify the DEER data (for instance through parametric fitting) to extract populations of different conformational states, and we appreciate that to do so would be highly prone to error; however, we do (and can, we feel, without overinterpretation) assert that the mean distances shift.

      The interpretations listed below are not supported by the data presented:

      (1) 'In the presence of Ca2+, the distance distribution shifts towards shorter distances, suggesting that the two monomers come closer at the periplasmic side, and consistent with the predicted distances derived from the TmPPase:Ca structure.' Problem: This is a far-stretched interpretation of a tiny change, which is not reliable for the reasons described in the paragraph above.

      While the authors overall agree with the reviewer's assessment that ±0.3 nm is a small (not a minor) change, there are literature examples quantifying (or using for quantification) distribution peaks separated by similar Δr (Kubatova et al., PNAS, 2023; Schmidt et al., JACS, 2024; Hett et al., JACS, 2021; Zhao et al., Nature, 2024). In particular, none of the mPPase structures show helical movements of greater than half a turn (in helices 6 and 12 in particular). There appear to be larger-scale loop-closing motions of the 5-6 loop that includes T211, due to the presence of E217, which binds to one of the Mg2+ ions that coordinate the leaving group phosphate. (This is, inter alia, the reason that this loop is so flexible: it cannot order before substrate is bound.)

      Importantly, we have fitted Gaussians to the experimental distance distributions of 525R1 output by the Comparative DEER Analyzer 2.0 and observed a change in the distribution width in the presence of Ca2+, implying that the rotameric freedom of the spin label is restricted. However, the CW-EPR spectra for 525R1 indicate that the rotational correlation time of the spin label is highly consistent between conditions (the spectra are almost identical); this cannot be explained simply by rotameric preference of the spin label (as asserted by reviewer 3), as there is no (further) immobilisation observed from the CW-EPR of the apo-state (Figure EV9) to that in the presence of Ca2+. Furthermore, in the absence of conformational changes, it is reasonable to assume (and demonstrable from the CW-EPR data) that the rotamer cloud should not significantly change between conditions. However, Gaussian fits of the two extreme cases yielding the longest (i.e., in the presence of IDP) and shortest (in the presence of ZLD) mean distances for the 525R1 DEER data indicated significant (i.e., above the noise floor after Tikhonov validation) probability density for the IDP condition at 50 Å (P(r) = 0.18). This occurs at four standard deviations above the mean of the ZLD condition, which by random chance should occur with <0.007% probability. Indeed, one can say that to observe 18% probability density at four standard deviations above the mean by random chance would occur on the order of one in 4 x 10^6.
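The "four standard deviations" figure quoted in this response can be checked with a few lines of arithmetic. The sketch below is not taken from the authors' analysis; it simply assumes an ideal Gaussian distribution and uses a hypothetical helper, `gaussian_tail`, to compute the probability of a value falling at least four standard deviations from the mean, which is the quantity the <0.007% statement appears to refer to.

```python
import math


def gaussian_tail(k_sigma: float, two_sided: bool = True) -> float:
    """Probability that a Gaussian variable lies at least k_sigma
    standard deviations from its mean (hypothetical helper)."""
    # One-sided tail of the standard normal: Q(k) = 0.5 * erfc(k / sqrt(2))
    one_sided = 0.5 * math.erfc(k_sigma / math.sqrt(2.0))
    return 2.0 * one_sided if two_sided else one_sided


p_two = gaussian_tail(4.0)                    # both tails
p_one = gaussian_tail(4.0, two_sided=False)   # upper tail only

print(f"two-sided tail at 4 sigma: {p_two:.3e} ({100 * p_two:.4f} %)")
print(f"one-sided tail at 4 sigma: {p_one:.3e} ({100 * p_one:.4f} %)")
```

Under this assumption the two-sided tail is about 0.006%, consistent with the bound quoted above; the one-sided (upper-tail) value is half of that. The calculation says nothing about whether the experimental distributions are actually Gaussian.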

      As in the previous response, the method can detect changes of such magnitude, which are not small but physiologically relevant and expected for integral membrane proteins such as mPPases. Indeed, even in equally (or more) complex systems such as heptameric mechanosensitive channel proteins, DEER provided sub-Angstrom accuracy when a spin-labelled high-resolution XRC structure was solved (Pliotas et al., PNAS, 2012; Pliotas et al., Nat Struct Mol Biol, 2015). Although this is an ideal and not very common case, in which DEER accuracy was experimentally validated by another high-resolution structural method on a modified membrane protein, it demonstrates the power of the method: especially when strong oscillations are present in the raw DEER data (as here for mPPase 525R1), Angstrom resolution is achievable in such challenging protein classes, even when multiple distances are present.

      (2) 'Based on the DEER data on the IDP-bound TmPPase, we observed significant deviations between the experimental and the in silico distances derived from the TmPPase:IDP X-ray structure for both cytoplasmic- (T211R1) and periplasmic-end (S525R1) sites (Figure 4D and Figure EV8D). This deviation could be explained by the dimer adopting an asymmetric conformation under the physiological conditions used for DEER, with one monomer in a closed state and the other in an open state.'

      Problem: The authors are trying to establish asymmetry using the DEER data. Unfortunately, no significant difference is observed (between simulation and experiment) for position 525 as the authors claim (Figure 4D bottom panel). The observed difference for position 211 must be accounted for by the flexibility and the data provide no direct evidence for any asymmetry.

      Reviewer 3 is wrong in suggesting that we are trying to prove asymmetry through the DEER data. That is a well-known fact in the literature (e.g. Vidilaseris et al., Sci Adv, 2019, where we show (1) that the exit channel inhibitor ATC (i.e., close to 525) binds better in solution to the TmPPase:PPi complex than the TmPPase:PPi2 complex, and (2) that ATC binds in an asymmetric fashion to the TmPPase:IDP2 complex with just one ATC dimer on one of the exit channels). We merely use the DEER data to support this well-established fact.

      However, we agree that the DEER data in the presence of IDP do not provide direct proof for asymmetry; in particular, mutant T211R1 yields in silico distributions too short for measurement by DEER. It is possible that the deviations observed (and particularly likely for T211R1) arise from conformational heterogeneity in solution. We will rephrase this paragraph accordingly: “Owing to the broad nature of the T211R1 (cytoplasmic site) distance distributions, we refrain from interpreting shifts in these data. For 525R1 (periplasmic site), for which we obtained data of high quality (as also pointed out by reviewers 2 and 3), we observed deviations between the experimental and the in silico distances derived from the TmPPase:IDP X-ray structure. While this deviation is less pronounced than for the +ZLD condition, the deviation is consistent with an asymmetric conformation in solution.”

      (3) 'Our new structures, together with DEER distance measurements that monitor the conformational ensemble equilibrium of TmPPase in solution, provide further solid experimental evidence of asymmetry in gating and transitional changes upon substrate/inhibitor binding.'

      Problem: See above. The DEER data do not support any asymmetry.

      We feel that the reviewer comments here are somewhat unfounded. The DEER data (and we will limit discussion only to the 525R1 mutant in this regard) satisfy relevant criteria of the white paper (Schiemann et al., 2021, JACS) from the EPR community (signal-to-noise ratio w.r.t modulation depth of > 20 in all cases; replicates have been performed and will be added into the main-text or supplementary; near quantitative labelling efficiency (evidenced by lack of free spin label signal in the CW-EPR spectra); analysed using the CDA (now Figure EV10, this data we will promote to the main-text) to avoid confirmation bias).

      While the DEER data do not prove asymmetry, we do not claim proof of asymmetry in the above sentence. We agree to rephrase the offending sentence above as: “Our new structures, together with DEER distance measurements that monitor the conformational ensemble of TmPPase in solution, do not exclude asymmetry in gating and transitional changes upon substrate/inhibitor binding and are consistent with our proposed model.” We feel that this reframed conjecture of asymmetry is well founded; indeed, comparing the experimental apo-state 525R1 distance distribution with in silico modelling performed on the hybridised asymmetric structure (i.e., comprised of one monomer bound to Ca2+ and another bound to IDP) yields an overlap coefficient (Islam and Roux, JPC B, 2015) of >0.97. This implies the envelope of the modelled distance distribution is quantitatively inside the envelope of the experimental distance distribution. Thus, the DEER data do not exclude asymmetry (previously observed by time-resolved XRC) in solution. While we appreciate that ideally one would measure time-resolved DEER to directly correlate the kinetics of conformational changes within the ensemble to the catalytic cycle of mPPase (and this is something we aim to do in the future), it is beyond the scope of this study.
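To make the idea of comparing a modelled and an experimental distance distribution concrete, here is a minimal sketch of one common overlap measure: the integral of the pointwise minimum of two normalised distributions, which is 1 for identical curves and 0 for non-overlapping ones. This definition is an assumption chosen for illustration and may differ from the coefficient defined by Islam and Roux that the authors cite; the function names and the Gaussian test distributions are hypothetical and are not the authors' data.

```python
import math


def normalise(density, dr):
    """Scale a sampled distribution so that it integrates to 1."""
    area = sum(density) * dr
    return [value / area for value in density]


def overlap(p, q, dr):
    """Integral of min(p, q) for two distributions sampled on the same axis.

    Assumed definition for illustration only; returns a value in [0, 1].
    """
    p_norm = normalise(p, dr)
    q_norm = normalise(q, dr)
    return sum(min(a, b) for a, b in zip(p_norm, q_norm)) * dr


def gaussian(r, mean, sd):
    """Unnormalised Gaussian used to build hypothetical test distributions."""
    return math.exp(-0.5 * ((r - mean) / sd) ** 2)


dr = 0.01                                        # grid spacing in nm
r_axis = [2.0 + dr * i for i in range(601)]      # 2.0 to 8.0 nm

# Hypothetical distributions, NOT the published DEER data
experimental = [gaussian(r, 4.5, 0.50) for r in r_axis]
modelled = [gaussian(r, 4.6, 0.40) for r in r_axis]

print(f"overlap = {overlap(experimental, modelled, dr):.3f}")
```

A measure of this kind is symmetric in its two arguments, so on its own it does not show which distribution lies inside the other; that would need a separate check.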

      Indeed, half-of-the-sites reactivity has been demonstrated in at least the following papers (Vidilaseris et al., Sci Adv, 2019; Strauss et al., EMBO Rep, 2024; Malinen et al., Prot Sci, 2022; Artukka et al., Biochem J, 2018; Luoto et al., PNAS, 2013). Half-of-the-sites activity requires asymmetry in the mechanism, and therefore asymmetric motions in the active site (viz 211) and exit channel (viz 525). As mentioned above, we have demonstrated this for other inhibitors (Vidilaseris et al 2019) and as part of a time-resolved experiment (Strauss et al 2024). In fact, given the wealth of evidence showing that the symmetrical crystal structures sample a non- or less-productive conformation of the protein, it would be quixotic to propose the DEER experiments - in solution - do not generate asymmetric conformations. It certainly doesn’t obey Occam’s razor of choosing the simplest possible explanation that covers the data.

      (4) Based on these observations, and the DEER data for +IDP, which is consistent with an asymmetric conformation of TmPPase being present in solution, we propose five distinct models of TmPPase (Figure 7).

      Problem: Again, the DEER data do not support any asymmetry and the authors may revisit the proposed models.

      We respectfully disagree with the reviewer. Please see our detailed response above. However, in the revised version, we will clarify that the proposed models are not solely based on the DEER data but are grounded in both current and previously solved structures, with the DEER data providing additional consistency with these models.

      (5) 'In model 2 (Figure 7), one active site is semi-closed, while the other remains open. This is supported by the distance distributions for S525R1 and T211R1 for +Ca/ETD informed by DEER, which agrees with the in silico distance predictions generated by the asymmetric TmPPase:ETD X-ray structure'

      Problem: Neither convincing nor supported by the data

      We respectfully disagree with the reviewer. However, owing to the conformational heterogeneity of T211R1, we will exclude it from the above sentence in the revised version. Please see our detailed response above.

    1. Author Response:

      Thank you for your interest in our paper. We would also like to thank the anonymous reviewers for their critical and constructive comments. Although the reviewers found our work interesting, they raised several important concerns about our study. To address these concerns, we will mainly perform the following new experiments:

      1. Examine whether the antioxidant NAC can block SFN-induced TFEB nuclear translocation in NPC cells.

      2. Examine whether calcineurin inhibitors (FK506+CsA) or a Ca2+ chelator (BAPTA-AM) can block SFN-induced TFEB nuclear translocation in NPC cells.

      3. Investigate whether cholesterol is cleared in tissues in vivo upon activation of TFEB by SFN.

      4. Investigate whether SFN-evoked lysosomal exocytosis is TFEB-dependent by using TFEB-KO cells.

      5. Examine the effect of NPC1 deficiency on dextran trafficking by studying the localization of CF-dex and Lamp1.

      6. Perform cytotoxicity experiments to examine whether SFN, as used in this study, is cytotoxic in various cell lines.

      In addition, according to the reviewers’ suggestions, we will make clarifications and corrections wherever appropriate in the manuscript. Below please find our point-by-point responses and plans to the reviewers’ comments.

      Reviewer #1 (Public review):

      Summary:

      The authors are trying to determine if SFN treatment results in dephosphorylation of TFEB, subsequent activation of autophagy-related genes, exocytosis of lysosomes, and reduction in lysosomal cholesterol levels in models of NPC disease.

      Strengths:

      (1) Clear evidence that SFN results in translocation of TFEB to the nucleus.

      (2) In vivo data demonstrating that SFN can rescue Purkinje neuron number and weight in NPC1-/- animals.

      Thank you for the support!

      Weaknesses:

      (1) Lack of molecular details regarding how SFN results in dephosphorylation of TFEB leading to activation of the aforementioned pathways. Currently, datasets represent correlations.

      Thank you for this constructive comment. The reviewer is right that the molecular mechanism of SFN-activated TFEB has not been discussed in detail in this manuscript. This is because we have previously shown that SFN induces TFEB nuclear translocation via a Ca2+-dependent but MTOR (mechanistic target of rapamycin kinase)-independent mechanism through a moderate increase in reactive oxygen species (ROS), and that calcineurin-mediated TFEB dephosphorylation underlies SFN-induced TFEB activation. These data were published in Autophagy in 2021 (Li, Shao et al. 2021). Therefore, we did not cover this part in this study. We will add the molecular mechanism of TFEB activation by SFN to the discussion. To further confirm this mechanism in NPC cells, we will also perform experiments to 1) examine whether the antioxidant NAC can block SFN-induced TFEB nuclear translocation in NPC cells; 2) examine whether calcineurin inhibitors (FK506+CsA) can block SFN-induced TFEB nuclear translocation in NPC cells.

      (2) Based on the manuscript narrative, discussion, and data it is unclear exactly how steady-state cholesterol would change in models of NPC disease following SFN treatment. Yes, there is good evidence that lysosomal flux to (and presumably across) the plasma membrane increases with SFN. However, lysosomal biogenesis genes also seem to be increasing. Given that NPC inhibition, NPC1 knockout, or NPC1 disease mutations are constitutively present and the cell models of NPC disease contain lysosomes (even with SFN) how could a simple increase in lysosomal flux decrease cholesterol levels? It would seem important to quantify the number of lysosomes per cell in each condition to begin to disentangle differences in steady state number of lysosomes, number of new lysosomes, and number of lysosomes being exocytosed.

      Thank you for the suggestion. It is important to define the three states: 1) original number of lysosomes, 2) number of new lysosomes, and 3) number of lysosomes being exocytosed. However, we have checked the literature, and so far there seems to be no good method that can clearly differentiate these three states of lysosomes.

      (3) Lack of evidence supporting the authors' premise that "SFN could be a good therapeutic candidate for neuropathology in NPC disease".

      Suggestion was taken! We will investigate whether cholesterol is reduced by SFN-mediated activation of TFEB in vivo to strengthen the point that SFN could be a potential therapeutic compound for NPC treatment. And to avoid confusion, we have removed this sentence.

      Reviewer #2 (Public review):

      Summary:

      This study presents a valuable finding that the activation of TFEB by sulforaphane (SFN) could promote lysosomal exocytosis and biogenesis in NPC, suggesting a potential mechanism by SFN for the removal of cholesterol accumulation, which may contribute to the development of new therapeutic approaches for NPC treatment.

      Strengths:

      The cell-based assays are convincing, utilizing appropriate and validated methodologies to support the conclusion that SFN facilitates the removal of lysosomal cholesterol via TFEB activation.

      Weaknesses:

      (1) The in vivo experiments demonstrate the therapeutic potential of SFN for NPC. A clear dose-response analysis would further strengthen the proposed therapeutic mechanism of SFN. Additional data supporting the activation of TFEB by SFN for cholesterol clearance in vivo would strengthen the overall impact of the study.

      We understand the reviewer’s point. We examined two doses of SFN, 30 and 50 mg/kg. As shown in Fig.6, SFN at 50 mg/kg, but not at 30 mg/kg, prevents a degree of Purkinje cell loss in lobule IV/V of the cerebellum, suggesting a dose-correlated preventive effect of SFN. In vivo experiments with higher concentrations of SFN and an optimized dosage form of SFN are planned for a future study but will not be included in this study.

      We will investigate whether cholesterol was cleared by activation of TFEB by SFN in vivo.

      (2) In Figure 4, the authors demonstrate increased lysosomal exocytosis and biogenesis by SFN in NPC cells. Including a TFEB-KO/KD in this assay would provide additional validation of whether these effects are TFEB-dependent.

      Thank you for this valuable suggestion. We will investigate whether SFN-evoked lysosomal exocytosis is TFEB-dependent by using TFEB-KO cells.

      (3) For lysosomal pH measurement, the combination of pHrodo-dex and CF-dex enables ratiometric pH measurement. However, the pKa of pHrodo red-dex (according to Invitrogen) is ~6.8, while lysosomal pH is typically around 4.7. This discrepancy may account for the lack of observed lysosomal pH changes between WT and U18666A-treated cells. Notably, previous studies (PMID: 28742019) have reported an increase in lysosomal pH in U18666A-treated cells.

      We understand the reviewer’s point. However, we used pHrodo™ Green-Dextran (P35368, Invitrogen), not pHrodo red-dex, to measure the lysosomal luminal acidity. According to the product information from Invitrogen, pHrodo Green-dex conjugates are non-fluorescent at neutral pH but fluoresce bright green at acidic pH (pH range 4-9), such as in endosomes and lysosomes. Therefore, pHrodo Green-dex can be used to monitor the acidity of lysosomes (Hu, Li et al. 2022). We also used LysoTracker Red DND-99 (Thermo Scientific, L7528) to measure lysosomal pH (Fig. 4G, H), and the results are consistent with those of the pHrodo Green/CF measurement. Overall, in our hands, we have not detected a pH change in lysosomes in U18666A-treated NPC1 cell models.
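The pKa argument raised in point (3) can be made concrete with the Henderson-Hasselbalch relation. The sketch below is illustrative only: it assumes a single protonation site and a fluorescence signal proportional to the protonated fraction of the probe, neither of which is stated in the reviews. The pKa of ~6.8 and the lysosomal pH of ~4.7 are the values quoted by the reviewer for pHrodo red-dex; the pKa of the green probe used by the authors is not given in the text and is not assumed here. The function name `protonated_fraction` is hypothetical.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Fraction of a single-site probe in its protonated form
    (Henderson-Hasselbalch; simplifying assumption for illustration)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


PKA_QUOTED = 6.8     # pKa quoted by the reviewer for pHrodo red-dex
PH_LYSOSOME = 4.7    # typical lysosomal pH quoted by the reviewer

# Compare the signal change for a one-unit rise in lysosomal pH
# with a probe whose pKa is far from, or matched to, lysosomal pH.
for pka in (PKA_QUOTED, PH_LYSOSOME):
    low = protonated_fraction(PH_LYSOSOME, pka)
    high = protonated_fraction(PH_LYSOSOME + 1.0, pka)
    print(f"pKa {pka}: fraction at pH 4.7 = {low:.3f}, "
          f"at pH 5.7 = {high:.3f}, change = {low - high:.3f}")
```

Under these assumptions, a probe with a pKa near 6.8 is almost fully protonated at pH 4.7 and changes little for a one-unit pH shift, whereas a probe with a pKa matched to lysosomal pH changes much more. That is the sensitivity concern the reviewer describes.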

      (4) The authors are also encouraged to perform colocalization studies between CF-dex and a lysosomal marker, as some researchers may be concerned that NPC1 deficiency could reduce or block the trafficking of dextran along endocytosis.

      Suggestion was taken! We will examine the effect of NPC1 deficiency on dextran trafficking by studying the localization of CF-dex and Lamp1.

      (5) In vivo data supporting the activation of TFEB by SFN for cholesterol clearance would significantly enhance the impact of the study. For example, measuring whole-animal or brain cholesterol levels would provide stronger evidence of SFN's therapeutic potential.

      We really appreciate the reviewer’s suggestions. We will investigate whether cholesterol was cleared by activation of TFEB by SFN in vivo.

      Reviewer #3 (Public review):

      Summary:

      The authors demonstrate that activation of TFEB facilitates cholesterol clearance in cell models of Niemann-Pick type C (NPC). This is done through a variety of approaches including activation of TFEB by sulforaphane (SFN), a naturally occurring small-molecule TFEB agonist. SFN induces TFEB nuclear translocation and promotes lysosomal exocytosis. In an NPC mouse model, SFN dephosphorylates/activates TFEB in the brain and rescues the loss of Purkinje cells.

      Strengths:

      NPC is a severe disease and there is little in the way of treatment. The manuscript points towards some treatment options. However, the title "Small-molecule activation of TFEB Alleviates Niemann-Pick Disease..." is far too strong and should be changed.

      Weaknesses:

      (1) The manuscript is extremely hard to read due to the writing; it needs careful editing for grammar and English.

      We will thoroughly check grammar to improve the manuscript.

      (2) There are a number of important technical issues that need to be addressed.

      We will address the technical issues raised below.

      (3) The TFEB influence on filipin staining in Figure 1A is somewhat subtle. In the mCherry alone panels there is a transfected cell with no filipin staining and the mCherry-TFEBS211A cells still show some filipin staining.

      We understand the reviewer’s point. We will investigate whether cholesterol is cleared by activation of TFEB by SFN in vivo.

      (4) Figure 1C is impressive for the upregulation of filipin with U18666A treatment. However, SFN is used at 15 microM. This must be hitting multiple pathways. Vauzour et al (PMID: 20166144) use SFN at 10 nM to 1microM. Other manuscripts use it in the low microM range. The authors should repeat at least some key experiments using SFN at a range of concentrations from perhaps 100 nM to 5 microM. The use of 15 microM throughout is an overall concern.

      We understand the reviewer’s point. As noted in RESPONSE #1, we have previously shown that SFN (10–15 μM, 2–9 h) induces robust TFEB nuclear translocation in a dose- and time-dependent manner in HeLa GFP-TFEB stable cells as well as in other human cell lines without cytotoxicity (Li, Shao et al. 2021). Based on these previous results, we chose SFN (15 μM) in this study to examine its effect on cholesterol clearance. We will add this information to the discussion. We will also perform dose-response TFEB nuclear translocation experiments in NPC model cells, as well as cytotoxicity experiments to examine whether the concentrations of SFN used are toxic in various cell lines.

      References:

      Hu, M. Q., P. Li, C. Wang, X. H. Feng, Q. Geng, W. Chen, M. Marthi, W. L. Zhang, C. L. Gao, W. Reid, J. Swanson, W. L. Du, R. Hume and H. X. Xu (2022). “Parkinson's disease-risk protein TMEM175 is a proton-activated proton channel in lysosomes.” Cell 185(13): 2292-+.

      Li, D., R. Shao, N. Wang, N. Zhou, K. Du, J. Shi, Y. Wang, Z. Zhao, X. Ye, X. Zhang and H. Xu (2021). “Sulforaphane Activates a lysosome-dependent transcriptional program to mitigate oxidative stress.” Autophagy 17(4): 872-887.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The work from Petazzi et al. aimed at identifying novel factors supporting the differentiation of human hematopoietic progenitors from induced pluripotent stem cells (iPSCs). The authors developed an inducible CRISPR-mediated activation strategy (iCRISPRa) to test the impact of newly identified candidate factors on the generation of hematopoietic progenitors in vitro. They first compared previously published transcriptomic data of iPSC-derived hemato-endothelial populations with cells isolated ex vivo from the aorta-gonad-mesonephros (AGM) region of the human embryo and they identified 9 transcription factors expressed in the aortic hemogenic endothelium that were poorly expressed in the in vitro differentiated cells. They then tested the activation of these candidate factors in an iPSC-based culture system supporting the differentiation of hematopoietic progenitors in vitro. They found that the IGF binding protein 2 (IGFBP2) was the most upregulated gene in arterial endothelium after activation and they demonstrated that IGFBP2 promotes the generation of functional hematopoietic progenitors in vitro.

      Strengths:

      The authors developed an extremely useful doxycycline-inducible system to activate the expression of specific candidate genes in human iPSC. This approach allows us to simultaneously test the impact of 9 different transcription factors on in vitro differentiation of hematopoietic cells, and the system appears to be very versatile and applicable to a broad variety of studies.

      The system was extensively validated for the expression of 1 transcription factor (RUNX1) in both HeLa cells and human iPSC, and a detailed characterization of this test experiment was provided.

      The authors exhaustively demonstrated the role of IGFBP2 in promoting the generation of functional hematopoietic progenitors in vitro from iPSCs. Even though the use of IGFBP2-interacting proteins IGF1 and IGF2 has been previously reported in human iPSC-derived hematopoietic differentiation in vitro (Ditadi and Sturgeon, Methods 2016; Ng et al., Nature Biotechnology 2016), and IGFBP-2 itself has been shown to promote adult HSC expansion ex vivo (Zhang et al., Blood 2008), its role in supporting in vitro hematopoiesis was demonstrated here for the first time.

      Weaknesses:

      Although the authors performed a very thorough characterization of the system in proof-of-principle experiments activating a single transcription factor, the data provided when 9 independent factors were used are not sufficient to fully validate the experimental strategy. Indeed, in the current version of the manuscript, it is not clear whether the results presented in both the scRNAseq analysis and the functional assays are the consequence of the simultaneous activation of all 9 TF or just a subset of them. This is essential to establish whether all the proposed factors play a role during embryonic hematopoiesis, and a more complete analysis of the scRNAseq dataset could help clarify this aspect.

      Similarly, the data presented in the manuscript are not sufficient to clarify at what stage of the endothelial-to-hematopoietic transition (EHT) the TF activation has an impact. Indeed, even though the overall increase of functional hematopoietic progenitors is fully demonstrated, the assays proposed in the manuscript do not clarify whether this is due to a specific effect at the endothelial level or to an increased proliferation rate of the generated hematopoietic progenitors. Similar conclusions can be applied to the functional validation of IGFBP2 in vitro.

      The overall conclusions are sometimes vague and not always supported by the data. For instance, the authors state that the CRISPR activation strategy resulted in transcriptional remodeling and a steer in cell identity, but they do not specify which cell types are involved and at what level of the EHT process this is happening. In the discussion, the authors also claim that they provided evidence to support that RUNX1T1 could regulate IGFBP2 expression. However, this is exclusively based on the enrichment of RUNX1T1 gRNA in cells expressing higher levels of IGFBP2 and it does not demonstrate any direct or indirect association of the two factors.

      We thank the reviewer for the positive comments about the importance of our work and have now addressed the points raised as weaknesses by performing additional analysis and experiments, adding a new schematic of the mechanism, and rewording our claims.

      We have clarified the different effects mediated by the activation and the IGFBP2 addition in a summary section at the end of the results and added Figure 6, showing this in visual form. We have also clearly stated the limitations related to the correlation between RUNX1T1 and IGFBP2 in the discussion and toned down our claims regarding this throughout the entire paper. We have also reworded the text to clarify the specific cell types identified in the sequencing data that we refer to.

      Reviewer #2 (Public Review):

      To enable robust production of hematopoietic progenitors in-vitro, Petazzi et al examined the role of transcription factors in the arterial hemogenic endothelium. They use IGFBP2 as a candidate gene to increase the directed differentiation of iPSCs into hematopoietic progenitors. They have established a novel induced-CRISPR mediated activation strategy to drive the expression of multiple endogenous transcription factors and show enhanced production of hematopoietic progenitors through expansion of the arterial endothelial cells. Further, upregulation of IGFBP2 in the arterial cells facilitates the metabolic switch from glycolysis to oxidative phosphorylation, inducing hematopoietic differentiation. While the overall study and resources generated are good, assertions in the manuscript are not entirely supported by the experimental data and some claims need further experimental validation.

      We thank the reviewer for the positive comments. We have provided new data and analysis to ensure that all our assertions are clearly supported, and we have reworded those where the reviewers identified limitations.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The assessment could change from "incomplete" to "solid" if the authors: (i) improve data analysis (for both scRNAseq and functional assays) by providing additional information that could strengthen their conclusions, as suggested in the specific comments by both reviewers; (ii) either provide new functional evidence supporting their mechanistic conclusion or alternatively tone down the claims that are not fully supported by data and acknowledge the limitations raised by reviewers in the discussion; and (iii) address the issue of paracrine signaling to expand only hematopoietic progenitors.

      We have now improved the data analysis and provided additional functional tests to strengthen our conclusions. We have toned down the conclusions that the reviewers identified as insufficiently supported and included a discussion of these limitations. We have also reworded the text on paracrine signaling throughout the paper.

      Reviewer #1 (Recommendations For The Authors):

      Figure 1 contains exclusively published data. It might be more appropriate to use it as a supplementary figure or as part of a more exhaustive figure (maybe combining Figures 1 and 2 together?).

      Figure 1 contains novel bioinformatic analyses that form the basis of our research, and its content and focus differ from those of Figure 2, which is already a large figure. We therefore believe it is better to keep it as a separate figure, which now also contains a new panel.

      It seems there is an issue with Figure S3 labelling:

      • In line 112, Figure S2A-B does not display genomic PCR and sequencing results;

      • In line 123, Figure S3D-E does not show viability and proliferation data;

      • In line 127, Figure S3G does not show mCherry expression in response to DOX;

      We apologise for the confusion with the numbering; we have now labelled the figures correctly.

      It would be more informative to include gates and frequency on flow cytometry plots in Figure S3, to be able to evaluate the extent of the reduction in mCherry expression.

      We have now included the gating and frequency of mCherry-expressing cells in Supplementary Figure 3D.

      It is not clear from the text and figures whether the SB treatment was maintained throughout the hematopoietic differentiation protocol (line 122):

      • If so, it would be important to confirm that HDAC treatment does not affect EHT cultures

      • If not, can the authors provide some evidence that transgene silencing is not occurring during hematopoietic differentiation?

      We have clarified that we decided to treat the cells with SB exclusively in maintenance conditions because HDACs have been shown to be essential for the EHT (lines 138-142). We have now also included additional data showing the high expression of the mCherry tag reporting the iSAM expression on day 8 (Supplementary Figure 4F).

      Can the authors provide a simple diagram summarizing the experimental strategy for each differentiation experiment in the respective supplementary figure? For instance, at what stage of the protocol was DOX added in Figure 3? Or at what stage IGFBP2 was added in Figure 5? It would be a very useful addition to the interpretation of the results.

      We have now included three schematics for all the experiments in the manuscript in Supplementary Figure 4A-C.

      In Figure 3, the authors should provide more detailed information about the data filtering of the scRNAseq experiment, and more specifically:

      • How many cells were included in the analysis for each library after QC and filtering?

      • How "cells in which the gRNAs expression was detected" were selected? Do they include only cells showing expression of gRNAs for all 9 TF?

      This information is now included in the methods section (lines 773-781); the detailed code is available via the GitHub link provided in the same section. We have filtered for cells expressing one gRNA for the non-targeting gRNA (iSAM_NT) control and more than one for the iSAM_AGM sample.
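
      The filtering rule described in this response can be illustrated with a short sketch. This is not the authors' code (which they state is available through the GitHub link in their methods); it is a minimal illustration that assumes a per-cell dictionary of gRNA UMI counts, a detection threshold of one UMI, and that "one gRNA" means exactly one detected gRNA. The function names, the `min_umis` parameter and the gRNA labels are hypothetical.

```python
def count_detected_grnas(grna_umis, min_umis=1):
    """Number of distinct gRNAs detected in one cell (assumed threshold: >= min_umis)."""
    return sum(1 for n in grna_umis.values() if n >= min_umis)


def keep_cell(sample, grna_umis, min_umis=1):
    """Apply the sample-specific rule: one gRNA for iSAM_NT, more than one for iSAM_AGM."""
    n_detected = count_detected_grnas(grna_umis, min_umis)
    if sample == "iSAM_NT":
        # Control library: assumed to mean exactly one detected gRNA (the non-targeting one).
        return n_detected == 1
    if sample == "iSAM_AGM":
        # Activation library: cells carrying more than one detected gRNA.
        return n_detected > 1
    raise ValueError(f"unknown sample: {sample}")


# Toy cells with hypothetical gRNA labels.
cells = [
    ("iSAM_NT", {"NT": 4, "gRNA_A": 0}),
    ("iSAM_AGM", {"gRNA_A": 3, "gRNA_B": 1}),
    ("iSAM_AGM", {"gRNA_A": 2, "gRNA_B": 0}),
]
kept = [(sample, umis) for sample, umis in cells if keep_cell(sample, umis)]
```

      In this toy input the first two cells pass the rule and the third (a single detected gRNA in the iSAM_AGM library) does not.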

      In Figure 3A, it is not clear whether the expression of the 9 factors is consistently detected in all cells or just a subset of them, and the heatmap in Figure 3A does not provide this information. It would be more accurate to provide expression on a per-cell basis, for instance, as a violin plot displaying single dots representing each cell. 

      We have now included this violin plot in Supplementary Figure 4G as requested. However, this visualisation is difficult to interpret because the expression of some of the target genes seems variable in both experimental and control conditions. We had anticipated this, which is why we included the three different controls. For this reason, we chose to show the normalised expression, which takes all the different variables into account (Figure 3A).

      In Figure 3B-C, it seems that clusters EHT1 and EHT2 do not express endothelial markers anymore. Are these fully differentiated hematopoietic cells rather than cells undergoing EHT? In general, it would be quite important to provide evidence of expressed marker genes characterizing each cluster (eg. heatmap summarizing top DEG in the supplementary figure?). 

      We have now provided a spreadsheet containing the cluster markers that we used (Supplementary Table 1) and a heatmap in Figure 3E. Furthermore, we have now edited Figure 3C to include pan-endothelial markers (PECAM1 and CDH5). These data show that the EHT1 and EHT2 clusters both express endothelial markers, which are progressively downregulated, as expected during the endothelial-to-hematopoietic transition. We have also included and discussed this in the manuscript (lines 192-195) and added a schematic of the mechanism in Figure 6.

      In Figure 3E, displaying the proportion of clusters within each sample/library would be a more accurate way of comparing the cell types present in each library (removing potential bias introduced by loading different numbers of cells in each sample).

      We have now included the requested data in Supplementary Figure 4I and it confirms again the expansion of arterial cells in the activated cells.    
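
      The per-library normalisation requested by the reviewer can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: it assumes each cell is represented by a (library, cluster) pair, and the function name and toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_proportions(cells):
    """Return, for each library, the fraction of its cells assigned to each cluster."""
    counts = defaultdict(Counter)
    for library, cluster in cells:
        counts[library][cluster] += 1
    return {
        library: {cluster: n / sum(per_cluster.values()) for cluster, n in per_cluster.items()}
        for library, per_cluster in counts.items()
    }


# Toy data: the libraries differ in size, so raw counts would be misleading.
cells = (
    [("iSAM_NT", "arterial")] * 2 + [("iSAM_NT", "EHT1")] * 6
    + [("iSAM_AGM", "arterial")] * 2 + [("iSAM_AGM", "EHT1")] * 2
)
props = cluster_proportions(cells)
```

      Here both libraries contain two arterial cells, but the arterial fraction is 0.25 in the larger library and 0.5 in the smaller one, which is the bias that plotting proportions rather than counts removes.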

      In Figure 3G, by plating 20,000 total CD34+, the assay does not account for potential differences in sample composition. It is then hard to discriminate between the increased number of progenitors in the input or an enhanced ability of HE to undergo EHT. This is an important aspect to consider to precisely identify at what level the activation of the 9 factors is acting. A proper quantification of flow cytometry data summarizing the % of progenitors, arterial cells, etc. would be useful to interpret these results.

      Lines 204-205 have been reworded. We are very much aware that the CD34+ cell population consists of a range of cells across the EHT process, and this is precisely why we carried out the single-cell sequencing analyses. We purposely tested the effect of the observed changes in composition by colony assays.

      In Figure 3G, it seems that NT cells w/o DOX have very little CFU potential (if any). Can the authors provide an explanation for this?

      We think that the limited CFU potential is due to the extensive genetic manipulation and selection that the cells underwent for the derivation of all the iSAM lines, but this did not prevent us from observing an effect of gene activation on CFU numbers. This is one of the primary reasons that we then validated our overall findings using the parental iPSC line in control conditions and with the addition of IGFBP2. We show that the parental iPSC line gives rise to hematopoietic progenitors, both immunophenotypically (Figure 4D) and functionally, at expected levels (Figure 4B, left column).

      Figure 4A shows an upregulation of IGFBP2 in arterial cells as a result of TF activation. However, from the data presented here, it is not possible to evaluate whether this is specific to the arterial cluster, or it is a common effect shared by all cell types regardless of their identity. 

      Data has now been included in Supplementary Figure 4H, which shows that all the cells show an increase in IGFBP2, but arterial cells show the highest increase. We have now edited the text to reflect this, in lines 228-230.

      In Figure 5A-B only a minority of arterial cells express RUNX1 in response to IGFBP2 treatment. Is this sufficient to explain the very significant increase in the generation of functional hematopoietic progenitors described in Figure 4? Quantification and statistical analysis of RUNX1 upregulation would strengthen this conclusion.

      We have now provided the statistical analysis showing significant upregulation of RUNX1 upon IGFBP2 addition. The p values are now provided in the figure 5 legend.

      In Figure 5 the authors conclude that IGFBP2 remodels the metabolic profile of endothelial cells. However, it is not clear which cell types and clusters were included in the analysis of Figure 5C-G. Is the switch from Glycolysis to Oxidative Phosphorylation specific to endothelial cells? Or it is a more general effect on the entire culture, including hematopoietic cells? 

      We based this conclusion on the fact that the single-cell RNAseq allows us to verify that the metabolic differences are observed in the endothelial cells. Given that we sorted the adherent cells, the majority of these are endothelial cells, as shown in Figure 5A. The Seahorse pipeline includes a number of washing steps, so the analyses are performed on the adherent compartment, which we know consists primarily of endothelial cells. We cannot exclude some contamination from non-endothelial cells, but we note that the metabolic changes were initially identified in endothelial cells in the single-cell sequencing data. Taken together, we believe that this implies that the metabolic changes are specific to this population. We have clarified this in line 317.

      In the discussion, the authors conclude that they "provide evidence to support the hypothesis that RUNX1T1 could regulate IGFBP2 expression". To further support this conclusion, the authors could provide a correlation analysis of the expression of the two genes in the cell type of interest. 

      Following the observation of high IGFBP2 expression across clusters, we have now reworded this sentence in lines 382-385. We tried to perform the correlation analysis, but we believe it is not appropriate given the detection level of the gRNA. We have now included this as a limitation in the discussion (lines 416-427) and have also toned down the conclusions we drew about RUNX1T1 throughout the manuscript.

      As mentioned by the authors, IGFBP2 binds IGF1 and IGF2 modulating their function. Both IGF1 (http://dx.doi.org/10.1016/j.ymeth.2015.10.001) and IGF2 (doi:10.1038/nbt.3702) have been used in iPSC differentiation into definitive hematopoietic cells. It would be relevant to discuss/reference this in the discussion.

      We have now included the suggested reference in the section where we discuss the role of IGFBP2 in binding IGF1 and IGF2.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1 compares the transcriptome of human AGM and in-vitro derived hemogenic endothelial cells (HECs). It is not clear why only the genes downregulated in the latter were chosen. Are there any significantly upregulated genes, knockdown/knockout which could also serve a similar purpose? Single-cell transcriptome database analysis is very preliminary. A detailed panel with differences in cluster properties of HECs between the two systems should be provided. A heatmap of all differentially expressed genes between the two samples must be generated, along with a logical explanation for choosing the given set of genes. 

      We have now included another panel in figure 1 to better clarify the logic behind the strategy used to identify our target genes (Figure 1A).

      (2) Figure 2 - a panel describing the workflow of gRNA design and targeting for the 9 candidate genes, along with lentiviral packaging and transduction would make it easier to follow. 

      We have now included three schematics for all the experiments in the manuscript in supplementary figure 4 A-C. 

      (3) Figure 3- to assess the effect of arterial cell expansion on the emergence of hematopoietic progenitors, CD34+ Dll4+ cells should be sorted for OP9 co-culture assay.

      Using only CD34+ cells does not answer the question raised. Also, the CFU assay performed does not fully support the claim of enhanced hematopoietic differentiation since only CFU-E and CFU-GM colonies are increased in Dox-treated samples, with no effect on other colony types. OP9 co-culture assay with these cells would be required to strengthen this claim. 

      We wanted to clarify that the effect on the methylcellulose coming from the activated cells was not limited to CFU-E, as the reviewer reported; instead, it also affected CFU-GM and CFU-M. 

      We have now performed additional experiments where we sorted the CD34+ compartment into DLL4- and DLL4+ in Supplementary Figure 5D-E, which we discussed in lines 250-258. 

      (4) In Figure 3F, there appears to be a lot of variation in the DLL4% fold change values for DOX treated iSAM_AGM sample, which weakens the claim of increased arterial expansion. Can the authors explain the probable reason? It is suggested that the two other controls (iSAM_+DOX and iSAM_-DOX) should be included in this analysis. It is imperative to also show % populations rather than just fold change to gain confidence.

      We agree that there is a lot of variability. That is because differentiation happens in 3D in embryoid bodies, which contain many different cell types that differentiate in different proportions across independent experiments. We have now included the raw data in Supplementary Figure 4D, with additional statistical analysis to show the expansion of arterial cells, including the suggested additional controls.

      (5) How does activation of these target genes cause increased arterialization? Is the emergence of non-HE populations suppressed? Or is it specific to the HE? The data on this should be clarified and also discussed.

      We have provided additional data clarifying the connection between increased arterialisation and hemogenic potential. We showed that the activation induces increased arterialisation and that IGFBP2 acts by supporting the acquisition of hemogenic potential. We have discussed this in lines 326-348 and provided a new figure to explain this in detail (Figure 6).

      (6) Considering that IGFBP2 was chosen from the activated target gene(s) cluster, can the authors explain why the reduced CFU-M phenomenon observed in Figure 3G does not appear in the MethoCult assay for IGFBP2 treated cells (Figure 4B)?

      The difference could be explained by the fact that in Figure 3G, the cells underwent activation of multiple genes, while in Figure 4B, they were only exposed to IGFBP2. Our results show that IGFBP2 could at least partially explain the phenotype that we see with the activation, but we believe that during the activation experiments, there might be other signals available that might not be induced by IGFBP2 alone. We have also added a summary section and a figure to clarify the different mechanisms of action of the gene activation and IGFBP2.

      (7) Figure 4- while the experiments conducted support the role of IGFBP2 in increasing hematopoietic output, there is no experimental evidence to prove its function through paracrine signalling in HECs. The authors need to provide some evidence of how IGFBP2 supplementation specifically expands only the hematopoietic progenitors. Experimental strategies involving specifically targeting IGFBP2 in hemogenic/arterial endothelial cells are required to prove its cell type specific function. Additionally, assessing the in vivo functional potential of the hematopoietic cells generated in the presence of IGFBP2, by bone-marrow transplantation of CD34+ CD43+ cells, is essential. 

      The role of IGFBP2 in the context of HSC production and expansion was not the topic of our research, and we have not claimed that IGFBP2 affects the long-term repopulating capacity of HSPCs. Therefore, we believe that the requested experiments are not required to support the specific claims that we do make. We have now provided more experiments and bioinformatic analysis that support the role of IGFBP2 in inducing the progression of EHT from arterial cells to hemogenic endothelium, and to avoid misunderstandings, we have toned down our claims by editing the text regarding its paracrine effects.

      (8) Figure 4C-D - It is recommended to plot % populations along with fold change value. As this is a key finding, it is important to perform flow cytometry for additional hematopoietic markers (CD144, CD235a and CD41a) to demonstrate whether this strategy can also expand erythroid-megakaryocyte progenitors.

      Figure 4C already shows the percentage values; we have now added the percentage for Figure 4D in SF5C. We have also performed additional analysis as requested and added the data obtained to Supplementary Figure 5D.

      (9) In Figure 5, analysis showing the frequency of cells constituting different clusters, between untreated and IGFBP2-treated samples in the single-cell transcriptome analysis is essential. Additional experiments are required to validate the function of IGFBP2 through modulation of metabolic activity. Inhibition of oxidative phosphorylation in the IGFBP2-treated cells should reduce the hematopoietic output. Authors should consider doing these experiments to provide a stronger mechanistic insight into IGFBP2-mediated regulation of hematopoietic emergence.

      We have now included the requested cluster composition in Supplementary Figure 5F. We decided not to include further tests on the metabolic profile of IGFBP2-treated cells, as we have already discussed other papers that showed, using selective inhibitors, that the EHT coincides with a switch from glycolysis to OxPhos.

      (10) It is very striking to see that IGFBP2 supplementation changes the transcriptional profile of developing hematopoietic cells by increasing transcription of OXPHOS-related genes with concomitant reduction of glycolytic signatures, particularly at Day 13. However, the mitochondrial ATP rate measurements do not seem convincing. The bioenergetic profiles show that when mitochondrial inhibitors are added, both groups exhibit decreased OCR values and, on the other hand, higher ECAR. This indicates that both groups have the capability to utilize OXPHOS or glycolysis and may only differ in their basal respiration rates.

      Differences in proliferation rate can cause basal respiration to change. There is no information on how the bioenergetic profile was normalized (cell no./protein amount). Given that IGFBP2 has been shown to increase proliferation, it is very likely that the cells treated with IGFBP2 proliferated faster and therefore have higher OCR. The data needs to be normalized appropriately to negate this possibility.

      We had previously tested whether IGFBP2 causes an increase in proliferation by analysing the cell cycle of cells treated with it, as we initially thought this could be a mechanism of action. We have now provided the quantification of the cell cycle in the cells treated with IGFBP2, which shows no effect on the cell cycle (Supplementary Figure 4E). Following this analysis, we decided to plate the same number of cells and check their density under the microscope before running the experiment; each experiment was done in triplicate for each condition. We have now added this information to the methods section (lines 806-813). We did not comment on the basal difference, which we agree might be due to several factors; we only compared the difference in response to the inhibitors, which is not affected by the basal level but exclusively by the Δ values. We have also included the formulas used to calculate the ATP production rate.

      Overall, it appears that IGFBP2 does not seem to primarily cause metabolic changes, but simply accelerates the metabolic dependency on OXPHOS. Hence, the term 'metabolic remodelling' must be avoided unless IGFBP2 depletion/loss of function analysis is shown.

      We thank the reviewer for suggesting how to interpret the data about the dependency on OXPHOS. We have now changed the conclusions and claims about the effect of IGFBP2. We have also included a cell cycle analysis of the hematopoietic cells derived upon IGFBP2 addition to show that they do not show differences in proliferation that could cause the increase in colony formation we observed. Regarding the assay, we plated the same number of cells for each group to make sure we were comparing the same number of cells, which we also assessed under the microscope before the test, and we eliminated the suspension cells during the washes that preceded the measurement. The reviewer is correct in indicating that there is a basal difference in the value of OCR and ECAR where the IGFBP2 is lower at the start and not higher, which would not conceal higher proliferation. Finally, the ATP production rate is calculated on the variation of OCR and ECAR upon the addition of inhibitors, which normalizes for the basal differences.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Summary:

      In this manuscript, the molecular mechanism of interaction of daptomycin (DAP) with bacterial membrane phospholipids has been explored by fluorescence and CD spectroscopy, mass spectrometry, and RP-HPLC. The mechanism of binding was found to be a two-step process. A fast reversible step of binding to the surface and a slow irreversible step of membrane insertion. Fluorescence-based titrations were performed and analysed to infer that daptomycin bound simultaneously two molecules of PG with nanomolar affinity in the presence of calcium. Conformational change but not membrane insertion was observed for DAP in the presence of cardiolipin and calcium.

      Strengths:

      The strength of the study is skillful execution of biophysical experiments, especially stopped-flow kinetics that capture the first surface binding event, and careful delineation of the stoichiometry.

      Weaknesses:

      The weakness of the study is that it does not add substantially to the previously known information and fails to provide additional molecular details. The current study provides incremental information on DAP-PG-calcium association but fails to capture the complex in mass spectrometry. The ITC and NMR studies with G3P are inconclusive. There are no structural models presented. Another aspect missing from the study is the reconciliation between PG in the monomer, micellar, and membrane forms.

      Besides the two-stage process, another important finding in the current work is the stable complex that plays a critical role in the drug uptake both in vitro and in B. subtilis. This complex has been shown to be a stable species in HPLC and its binding stoichiometry and affinity have been quantitatively characterized. The complex may not be stable enough in gas phase to be detected in the MS analysis, which was designed to detect the phospholipid and Dap components, not the complex itself. The structural model of this complex is clearly proposed and presented in Figure 6. 

      The NMR and ITC studies have a very clear conclusion that Dap has a weak interaction with the PG headgroup alone, which is unable to account for the Dap-PG interaction observed in the fluorescence studies. Thus, the whole PG molecule has to be involved in the interaction, leading to the discovery of the stable complex.  

      Reviewer #2 (Recommendations For The Authors):

      (1) I appreciate and agree with the comment that there are stages of daptomycin insertion, and these might involve the formation of different complexes with different binding partners (e.g. pre-insertion vs quaternary vs bactericidal). However, it seems like lipid II is an apparent participant in daptomycin membrane dynamics (Grein et al. Nature Communications 2020). It's not clear why this was excluded from analysis by the authors, or what basis there is for the discussion statement that the quaternary complex can shift into the bactericidal complex by exchanging 1 PG for lipid II. 

      We agree that lipid II and other isoprenyl lipids may be involved in the uptake and insertion of daptomycin into membrane according to the results of the Nat. Comm. paper. However, these isoprenyl lipids are very small components of the membrane in comparison to PG and their contribution to the drug uptake is thus expected to be much less significant. Nonetheless, we included farnesyl pyrophosphate (FPP) as an analog of bactoprenol pyrophosphate (C55PP), which was reported to have the same promoting effect as lipid II in the previous study, in our study but found no promoting effect in the fluorescence assay (Fig. 2B). In addition, no complex was formed when FPP replaced PG in our preparation and analysis of the drug-lipid complex. In consideration of these negative results and the expected small contribution, other isoprenyl lipids or their analogs were not included in the study.

      The statement about forming the proposed bactericidal complex from the identified complex is speculation and is possible only if lipid II has a higher affinity for Dap than a PG ligand. To avoid confusion, we deleted the sentence in the revision.

      (2) The detailed examination of daptomycin dynamics, particularly on the millisecond scale, in this paper is ideal for characterizing the effect of lipid II on daptomycin insertion. It would be helpful to either include lipid II in some analyses (micelle binding, fluorescence shifts, CD) or at least address why it was excluded from the scope of this work.

      As mentioned in the response to the first comment, we did not exclude isoprenyl lipids in our study but used some of their analogs in the fluorescence assay. Besides FPP mentioned above, we also tested geranyl pyrophosphate and geranyl monophosphate but obtained the same negative results. Lipid II was not directly used because it is one of the three isoprenyl lipids reported to have the same promoting effects in the Nat. Comm. paper and also because its preparation is not easy. Even if lipid II were different from other isoprenyl lipids in promoting membrane binding, its contribution is likely negligible at the reversible stage compared to the phospholipids because of its minuscule content in bacterial membrane. This is the main reason we did not use the isoprenyl lipids in the fast kinetic study (this stage only involves reversible binding, not insertion). 

      (3) Grein et al. 2020 saw that PG did not have a strong effect on daptomycin interaction with membranes. I believe this discrepancy is more likely due to the complex physical parameters of supported bilayers versus micelles/vesicles or some other methodological variable, but if the authors have more insight on this, it would be valuable commentary in the discussion.

      We totally agree that the discrepancy is likely due to the different conditions in the assays. It is hard to tell exactly what causes the difference. Thus, we did not attempt to comment on the cause of this difference in the discussion.

      (4) Isolation of the daptomycin complex from B. subtilis cells clearly had different traces from the in vitro complex; is it possible that lipid II is present in the B. subtilis complex? If not, a time-course extraction could be useful to support the model that different complexes have different activities. Isolates from early-stage incubation with daptomycin may lack lipid II but isolates from longer incubations may have lipid II present as the complex shifts from insertion to bactericidal.

      From the day we isolated the complex from B. subtilis, we have been looking for evidence for the previously proposed lipid complexes containing lipid II or other isoprenyl lipids but have not been successful. We did not see any sign of lipid II or other isoprenyl lipids in the MALDI or ESI mass spectroscopic data. The minute peaks in the HPLC traces are not the expected complexes in separate LC-MS analysis. However, this does not mean that such complexes are not present in the isolated PG-containing complex because: (1) the amount of such complexes may be too small to be detected due to the low content of the isoprenyl lipids; (2) the isoprenyl lipids, particularly lipid II, are not easily ionizable due to their size and unique structure for detection in mass spectrometry. 

      We don’t think the drug treatment time is the reason for the failure in detecting lipid II or other isoprenyl lipids. In our reported experiment, the cells were treated with a very high dose of Dap for 2 hours before extraction. In a separate experiment done recently, we treated B. subtilis at 1/3 of the used dose under the same condition and found all treated cells were dead after 1 hour in a titration assay, consistent with the results from reported time-killing assays in the literature. From this result, the proposed bactericidal lipid-containing complex should have been formed in the treated cells used in our extraction and isolated along with the PG-containing complex. It was not detected likely due to the reasons discussed above. To avoid the interference of the PG-containing complex, a large amount of bacterial cells might have to be treated at a low dose to isolate enough amount of the lipid II-containing complex for identification. However, isolation or identification of the lipid II-containing complex is outside the scope of the current investigation and is therefore not pursued. 

      (5) Part of the daptomycin mechanism of interacting with bacterial membranes involves the flipping of daptomycin from one leaflet to another. There was some mentioned work on the consistency of results between micelles and vesicles, but the dynamics or existence of a flipping complex in the bilayer system wasn't addressed at all in this paper.

      The current investigation makes no attempt to solve all problems in the daptomycin mode of action and is limited to the uptake of the drug, up to the point when Dap is inserted into the membrane. Within this scope, flipping of the complex is not yet involved and is thus irrelevant to the study. How the complex is flipped and used to kill the bacteria is what should be investigated next.  

      (6) The authors mention data with phosphatidylethanolamine in the text, but I could not find the data in the main or supplemental figures. I recommend including it in at least one of the figures.

      We much appreciate that this error was identified. The POPE data were lost when the graphic (Fig. 2B) was assembled in Adobe to create Figure 2. We have redrawn the graphic and reassembled the figure to solve this problem. Fig. 2B has also been modified to use micromolar for the concentration of the lipids.

      (7) Readability point: I'd suggest some consistency in the concentrations mentioned. Making the concentrations either all molar-based or all percentage-based would make comparison across figures easier.

      As suggested, we have changed the % into micromolar concentrations in Fig. 2B and also in Fig. 3A. 

      (8) The model figure is quite difficult to interpret, particularly the final stage of the tail unfolding. I recommend the authors use a zoomed-in inset for this stage, or at least simplify the diagram by removing the non-participating lipid structures. The figure legend for the model figure should also have a brief description of the events and what the arrows mean, particularly the POPS PG arrow in the final panel of the figure. I am assuming here the authors are implying that daptomycin can transiently interact with one lipid species and move to another, but the arrow here suggests that daptomycin is moving through the lipid headgroup space.

      We really appreciate the suggestions. As suggested, we put an inset to show the preinsertion complex more clearly. In addition, we have removed the green arrows originally intended to show the re-organization/movement of the phospholipids. Moreover, the legend is changed to ‘Proposed mechanism for the two-phased uptake of Dap into bacterial membrane. In the first phase, Dap reversibly binds to negative phospholipids with a hidden tail in the headgroup region, where it combines with two PG molecules to form a pre-insertion complex. In the second phase, the hidden tail unfolds and irreversibly inserts into the membrane. The inset shows the headgroup of the pre-insertion complex with the broad arrow showing the direction for the unfolding of the hidden tail. The red dots denote Ca2+.’  

      (9) The authors listed the Kd for daptomycin and 2 PG as 7.2 × 10⁻¹⁵ M². Is this correct? This is an affinity in the femtomolar range.

      Please note that this Kd is for the simultaneous binding of two PG molecules, not for the binding of a single ligand that we usually refer to. Assuming that each PG contributes equally to this interaction, the binding affinity for each ligand is then the square root of 7.2 × 10⁻¹⁵ M², which equals 8.5 × 10⁻⁸ M. This is equivalent to a nanomolar affinity for PG and is a reasonably high affinity.
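
      The arithmetic above can be checked with a short sketch. It assumes, as the response does, that the two PG molecules contribute equally, so the per-ligand constant is the n-th root of the overall constant; the function name `per_ligand_kd` is hypothetical and introduced only for illustration.

```python
def per_ligand_kd(kd_overall: float, n_ligands: int = 2) -> float:
    """Per-ligand dissociation constant, assuming n equivalent ligands.

    kd_overall has units of M**n_ligands; the result has units of M.
    """
    return kd_overall ** (1.0 / n_ligands)


# Overall Kd for Dap + 2 PG as quoted in the response (units: M^2).
kd_overall = 7.2e-15
kd_each = per_ligand_kd(kd_overall, n_ligands=2)

print(f"{kd_each:.2e} M")  # prints 8.49e-08 M, i.e. roughly 85 nM
```

      The equal-contribution assumption is a simplification; if the two binding steps differ in affinity, only their product is fixed by the overall constant.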

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors reported an increase in daptomycin intensity with the increasing amount of negatively charged DMPG. A similar observation has been reported for GUVs, however, the authors did not refer to this paper in their manuscript: E. Krok, M. Stephan, R. Dimova, L. Piatkowski, Tunable biomimetic bacterial membranes from binary and ternary lipid mixtures and their application in antimicrobial testing, Biochim. Biophys. Acta - Biomembr. 1865 (2023) [1]. This paper is also consistent with the authors' observation that there is negligible fluorescence detected for the membranes composed of PC lipids upon exposure to the Dap treatment.

      As suggested, this paper is cited as ref. 29 in the revision by adding the following sentence at the end of the section ‘Dependence of Dap uptake on phosphatidylglycerol.’: ‘PG-dependent increase of the steady-state fluorescence was also observed in giant unilamellar vesicles (GUVs).29’. The numbering is changed accordingly for the remaining references.  

      (2) Please include the plot of the steady-state Kyn fluorescence vs the content of POPA (Figure 2C shows traces for DMPG, CL, and POPS). Both POPA and POPS lipids are negatively charged, however, POPS seems to interact with Dap, while POPA does not. In my opinion, this observation is really interesting and might deserve a more thorough discussion. The authors might want to describe what could be the mechanism behind this lipid-specific mode of binding.

      As suggested, a plot is now added for POPA in Fig. 2C, which is basically a flat line without a significant increase in the Kyn fluorescence. Indeed, the different effects of the negative phospholipids are very interesting, indicating that the reversible binding of Dap to the lipid surface depends not only on the Ca2+-mediated ionic interaction but also on the structure of the headgroup. In other words, Dap recognizes the phospholipids at the surface binding stage. Considering this headgroup specificity, the last sentence in the second paragraph in ‘Discussion’ is changed from ‘In addition, due to the low lipid specificity, this reversible binding likely involves Ca2+-mediated ionic interaction between Dap and the phosphoryl moiety of the headgroups.’ to ‘In addition, due to the specificity for negative phospholipids (Fig. 2B and 2C), this reversible binding of Dap likely involves both a nonspecific Ca2+-mediated ionic interaction and a specific interaction with the remaining part of the headgroups.’

      (3) The authors write that they propose a novel mechanism for the Ca2+-dependent insertion of Dap to the bacterial membrane, however, they rather ignored the already published findings and hypotheses regarding this process. In fact the role of Ca2+, as well as the proposed conformational changes of Dap, which allow its deeper insertion into the membrane are well known:

      The role of Ca2+ ions in the mechanism of binding is actually three-fold: (i) neutralization of daptomycin charge [2], (ii) creating the connection between lipids and daptomycin and (iii) inducing two daptomycin conformational changes. It should be noted that the interactions between calcium ions and daptomycin are 2-3 orders of magnitude stronger than between daptomycin and PG lipids [3,4]. Thus, upon the addition of CaCl2 to the solution, the divalent cations of calcium bind preferentially to the daptomycin, rather than to the negatively charged PG lipids, which results in the decrease of daptomycin net negative charge but also leads to its first conformational change [4]. Upon binding between calcium ions and two aspartate residues, the area of the hydrophobic surface increases, which allows the daptomycin to interact with the negatively charged membrane. In the next step, Ca2+ acts as a bridge connecting daptomycin with the anionic lipids. This event leads to the second conformational change, which enables deeper insertion of daptomycin into the lipid membrane and enables its fluorescence [4]. The overall mechanism has a sequential character, where the binding of daptomycin-Ca2+ complex to the negatively charged PG (or CA) occurs at the end.

      The authors should focus on emphasizing the novelty of their manuscript, keeping in mind the already published paper.

      We agree with the comments on the three general roles of calcium ion in the Dap interaction with membrane. The current investigation does not ignore the previous findings, which involve many more works than mentioned above, but takes these findings as common knowledge. Actually, the role of calcium ion is not the focus of current work. Instead, the current work focuses on how the drug is taken up and inserted into the membrane in the presence of the ion and how its structure changes in this process. With the known roles of calcium ion in mind, we propose an uptake mechanism (Fig. 6) that shows no conflict with the common knowledge.

      We would like to point out that the ‘deeper insertion into the membrane’ in the comment is different from the membrane insertion referred to in our manuscript. This ‘deeper insertion’ still remains in the reversible stage of binding to the membrane surface because all negative phospholipids can do this (causing a conformational change and fluorescence increase, as quantified in Fig.2C) but now we know that only PG can enable irreversible membrane insertion because of our work. In addition, the comment that calcium binding to daptomycin causes first conformational change is not supported by our finding that no conformational change is found for Dap in the presence of calcium in a lipid-free environment (Fig. 3B). One important aspect of novelty and contribution of our work is to clear up some of these ambiguities in the literature. Another contribution of our work is to demonstrate the formation of a stable complex between Dap and PG with a defined stoichiometry and its crucial role in the drug uptake. 

      (4) One paragraph in the section "Ca2+- dependent interaction between Dap and DMPG" is devoted to a discussion of the formation of precipitate upon extraction of DMPG-containing micelles, exposed to Dap in the calcium-rich environment. In contrast, in the absence of Dap, no precipitate was detected. The authors did not provide any visual proof for their statement. Please include proper photographs in the supplementary information.

      The precipitate formed upon extraction of the DMPG-containing micelles was too small to be visually identifiable but could be collected by centrifugation and detected by fluorescence or HPLC after dissolving in DMSO. For visualization, we show below the precipitate formed using higher amounts of Dap and DMPG. The Dap-DMPG-Ca2+ complex (left tube) was formed by mixing 1 mM Dap, 2 mM DMPG and 1 mM Ca2+ and the control (right tube) was a mixture of 2 mM DMPG and 1 mM Ca2+. This is now added as Fig. S7 in the supplementary information (the index is modified accordingly) and cited in the main text.

      (5) The authors wrote that it is not clear how many calcium ions are bound to Dap-2PG complex (page 11, Discussion section). There are already reports discussing this issue. I recommend citing the paper discussing that exactly two Ca2+ ions bind to a single Dap molecule: R. Taylor, K. Butt, B. Scott, T. Zhang, J.K. Muraih, E. Mintzer, S. Taylor, M. Palmer, Two successive calcium-dependent transitions mediate membrane binding and oligomerization of daptomycin and the related antibiotic A54145, Biochim. Biophys. Acta - Biomembr. 1858, (2016) 1999-2005 [5]

      We were aware of the cited work that shows binding of two Ca2+ but also noted that there are more works showing one Ca2+ in the binding, such as the paper in [Ho, S. W., Jung, D., Calhoun, J. R., Lear, J. D., Okon, M., Scott, W. R. P., Hancock, R. E. W., & Straus, S. K. (2008), Effect of divalent cations on the structure of the antibiotic daptomycin. European Biophysics Journal, 37(4), 421–433.]. That was the reason we said ‘it is not clear how many calcium ions are bound to Dap-2PG complex’. Now, both papers are cited (as Ref. #33, 34) to support this statement.

      (6) The authors wrote two contradictory statements:

      -  PG cannot be found in mammalian cell membranes:

      "Moreover, the complete dependence of the membrane insertion on PG also explains why Dap selectively attacks Gram-positive bacteria without affecting mammalian cells, because PG is present only in bacterial membrane but not in mammalian membrane. " (Page 10, Discussion section, last sentence of the first paragraph)

      "However, Dap absorbed on bacterial surface is continuously inserted into the acyl layer via formation of complex with PG in a time scale of minutes, whereas no irreversible insertion of Dap occurs on mammalian membrane due to the absence of PG while the bound Dap is continuously released to the circulation as the drug is depleted by the bacteria." (Page 13, Discussion section)

      -  PG in trace amounts is present in mammalian membranes:

      "The proposed requirement of the pre-insertion quaternary complex increases the threshold of PG content for the membrane insertion to happen and thus makes it impossible on the surface of mammalian cells even if their plasma membrane contains a trace amount of PG." (Page 13, Discussion section).

      In fact, phosphatidylglycerol comprises 1-2 mol% of the mammalian cell membranes. Please, correct this information, which in this form is misleading to the readers.

      We appreciate the comments about the PG content in mammalian cells. Changes are made as listed below:

      (1) p10, the sentence is changed to ‘Moreover, the complete dependence of the membrane insertion on PG also explains why Dap selectively attacks Gram-positive bacteria without affecting mammalian cells, because PG is a major phospholipid in bacterial membrane but is a minor component in mammalian membrane.’ 

      (2) p13, the sentence is changed to ‘However, Dap absorbed on bacterial surface is continuously inserted into the acyl layer via formation of complex with PG in a time scale of minutes, whereas little irreversible insertion of Dap occurs on mammalian membrane due to the low content of PG while the bound Dap is continuously released to the circulation as the drug is depleted by the bacteria.’

      (3) p13, another sentence is modified to ‘The proposed requirement of the pre-insertion quaternary complex increases the threshold of PG content for the membrane insertion to happen and thus makes it less likely on the surface of mammalian cells that contain PG at a low level in the membrane.’ 

      (7) Please include information that Dap is effective only against Gram-positive bacteria and does not show antimicrobial properties against Gram-negative strains. The authors focused on emphasizing that Dap does not affect mammalian membranes, most likely due to the low PG content, however even membranes of Gram-negative bacteria are not susceptible to the Dap, despite the relatively high content of negatively charged PG in the inner membrane (e.g. inner cell membrane of E. coli has ~20% PG).

      The requested information is already included in ‘Introduction’. In this part, Dap is introduced as being active only against Gram-positive bacteria, implying that it is not active against Gram-negative bacteria. Dap is inactive against E. coli and other Gram-negative bacteria because the outer membrane prevents the antibiotic from accessing the PG in the inner membrane to cause any harm. When the outer membrane is removed, Dap will also attack the plasma membrane of Gram-negative bacteria.

      Literature cited in the comments:

      (1) E. Krok, M. Stephan, R. Dimova, L. Piatkowski, Tunable biomimetic bacterial membranes from binary and ternary lipid mixtures and their application in antimicrobial testing, Biochim. Biophys. Acta - Biomembr. 1865 (2023). https://doi.org/10.1101/2023.02.12.528174.

      (2) S.W. Ho, D. Jung, J.R. Calhoun, J.D. Lear, M. Okon, W.R.P. Scott, R.E.W. Hancock, S.K. Straus, Effect of divalent cations on the structure of the antibiotic daptomycin, Eur. Biophys. J. 37 (2008) 421-433. https://doi.org/10.1007/S00249-007-0227-2.

      (3) A. Pokorny, P.F. Almeida, The Antibiotic Peptide Daptomycin Functions by Reorganizing the Membrane, J. Membr. Biol. 254 (2021) 97-108. https://doi.org/10.1007/s00232-021-00175-0.

      (4) L. Robbel, M.A. Marahiel, Daptomycin, a bacterial lipopeptide synthesized by a nonribosomal machinery, J. Biol. Chem. 285 (2010) 27501-27508. https://doi.org/10.1074/JBC.R110.128181.

      (5) R. Taylor, K. Butt, B. Scott, T. Zhang, J.K. Muraih, E. Mintzer, S. Taylor, M. Palmer, Two successive calcium-dependent transitions mediate membrane binding and oligomerization of daptomycin and the related antibiotic A54145, Biochim. Biophys. Acta - Biomembr. 1858 (2016) 1999-2005. https://doi.org/10.1016/J.BBAMEM.2016.05.020.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work used a comprehensive dataset to compare the effects of species diversity and genetic diversity within each trophic level and across three trophic levels. The results showed that species diversity had negative effects on ecosystem functions, while genetic diversity had positive effects. These effects were observed only within each trophic level and not across the three trophic levels studied. Although the effects of biodiversity, especially genetic diversity across multi-trophic levels, have been shown to be important, there are still very few empirical studies on this topic due to the complex relationships and difficulty in obtaining data. This study collected an excellent dataset to address this question, enhancing our understanding of genetic diversity effects in aquatic ecosystems.

      Strengths:

      The study collected an extensive dataset that includes species diversity of primary producers (riparian trees), primary consumers (macroinvertebrate shredders), and secondary consumers (fish). It also includes the genetic diversity of the dominant species at each trophic level, biomass production, decomposition rates, and environmental data.

      The conclusions of this paper are mostly well supported by the data and the writing is logical and easy to follow.

      Weaknesses:

      (1) While the dataset is impressive, the authors conducted analyses more akin to a "meta-analysis," leaving out important basic information about the raw data in the manuscript. Given the complexity of the relationships between different trophic levels and ecosystem functions, it would be beneficial for the authors to show the results of each SEM (structural equation model).

      We understand the point raised by the reviewer. We now provide individual SEMs (Figure 3), although we limit causal relationships to those for which the p-value was below 0.2 for the sake of graphical clarity. We also provide the percentage of explained variance for each ecosystem function. We detail the graph in the Results section (see l. 317-328) and discuss it (see l. 387-398). Note that we do not detail each function separately as this would (in our opinion) result in a long descriptive paragraph from which it might be difficult to extract key information. Rather, we summarize the percentage of explained variance for each function and discuss the strength of environmental vs biodiversity effects for some examples. In the Discussion, we explain why environmental effects (on functions and biodiversity) are relatively weak. We mainly attribute this to the sampling scheme, which follows an East-West gradient (weak altitudinal range) rather than an upstream-downstream gradient as is traditionally done in rivers. The reasoning behind this sampling scheme is explained in our companion paper (Fargeot et al. Oikos 2023), to which we now refer more explicitly in the MS. Briefly, using an upstream-downstream gradient would certainly have increased the effects of the environment, but it would have made the inference of biodiversity effects extremely complex due to strong collinearity among environmental and biodiversity parameters.

      (2) The main results presented in the manuscript are derived from a "metadata" analysis of effect sizes. However, the methods used to obtain these effect sizes are not sufficiently clarified. By analyzing the effect sizes of species diversity and genetic diversity on these ecosystem functions, the results showed that species diversity had negative effects, while genetic diversity had positive effects on ecosystem functions. The negative effects of species diversity contradict many studies conducted in biodiversity experiments. The authors argue that their study is more relevant because it is based on a natural system, which is closer to reality, but they also acknowledge that natural systems make it harder to detect underlying mechanisms. Providing more results based on the raw data and offering more explanations of the possible mechanisms in the introduction and discussion might help readers understand why and in what context species diversity could have negative effects.

      We now provide more details. However, we are unfortunately not sure that this helped us reach a stronger explanation of the underlying mechanisms. To be frank, we did not succeed in improving mechanistic inferences based on the outputs of the SEM models. We visually explored some additional relationships (e.g. relationships between the biomass of the focal species and that of other species in the assemblage) that we now discuss a bit more, but again, this did not really help in better understanding processes. We realize this is a limitation of our study and that this can be frustrating for readers. Nonetheless, as said in the Discussion, field-based studies must be taken for what they are: observational studies forming the basis for future mechanistic studies. Although we failed to explain mechanisms, we still think that we provide important field-based evidence for the importance of biodiversity (as a whole) for ecosystem functions.

      (3) Environmental variation was included in the analyses to test if the environment would modulate the effects of biodiversity on ecosystem functions. However, the main results and conclusions did not sufficiently address this aspect.

      This is now addressed; see our response to your first comment. We now explain (Results section) and discuss environmental effects. As explained in the MS, environmental effects are similar in strength to those of biodiversity and are not that high, which is partly explained by the sampling scheme (see Fargeot et al. 2023). This is a choice we made at the outset of the study, as we wanted to focus on biodiversity effects and avoid the strong collinearity that is generally the case in rivers (which impedes any proper and strong statistical inference).

      Reviewer #2 (Public review):

      Summary:

      Fargeot et al. investigated the relative importance of genetic and species diversity on ecosystem function and examined whether this relationship varies within or between trophic-level responses. To do so, they conducted a well-designed field survey measuring species diversity at 3 trophic levels (primary producers [trees], primary consumers [macroinvertebrate shredders], and secondary consumers [fishes]), genetic diversity in a dominant species within each of these 3 trophic levels and 7 ecosystem functions across 52 riverine sites in southern France. They show that the effect of genetic and species diversity on ecosystem functions are similar in magnitude, but when examining within-trophic level responses, operate in different directions: genetic diversity having a positive effect and species diversity a negative one. This data adds to growing evidence from manipulated experiments that both species and genetic diversity can impact ecosystem function and builds upon this by showing these effects can be observed in nature.

      Strengths:

      The study design has resulted in a robust dataset to ask questions about the relative importance of genetic and species diversity of ecosystem function across and within trophic levels.

      Overall, their data supports their conclusions - at least within the system that they are studying - but as mentioned below, it is unclear from this study how general these conclusions would be.

      Weaknesses:

      (4) While a robust dataset, the authors only show the data output from the SEM (i.e., effect size for each individual diversity type per trophic level (6) on each ecosystem function (7)), instead of showing much of the individual data. Although the summary SEM results are interesting and informative, I find that a weakness of this approach is that it is unclear how environmental factors (which were included but not discussed in the results) nor levels of diversity were correlated across sites. As species and genetic diversity are often correlated but also can have reciprocal feedbacks on each other (e.g., Vellend 2005), there may be constraints that underpin why the authors observed positive effects of one type of diversity (genetic) when negative effects of the other (species). It may have also been informative to run SEM with links between levels of diversity. By focusing only on the summary of SEM data, the authors may be reducing the strength of their field dataset and ability to draw inferences from multiple questions and understand specific study-system responses.

      We have addressed this remark, and we ask the reviewers and readers to refer to our response to comment 1 from reviewer 1. Regarding co-variation among biodiversity estimates (SGDCs according to Vellend’s framework), we have addressed these issues in a companion paper that we now cite and expand on further in the MS (Fargeot et al. Oikos, 2023). Given the size of the dataset and its complexity (and associated analyses), we decided to focus on patterns of species and genetic biodiversity in a first paper (the Oikos paper) and then on the link between biodiversity and functions (this paper). As can be read in the Oikos paper, there is no co-variation among biodiversity estimates: species diversity is not correlated with genetic diversity, and within each facet there is no co-variation among species. In addition, environmental predictors are highly estimate-specific (i.e. environmental predictors sustaining species and genetic estimates are idiosyncratic). As a result (see the new Figure 3), environmental effects are relatively weak (the same intensity as those of biodiversity) and collinearity among parameters is relatively weak. The second point is important, as it permits better inference of model parameters and allows us to discuss direct relationships (as observed in Figure 3, indirect environmental effects are relatively rare). We provide in the Discussion a bit more explanation about the absence of co-variation among biodiversity estimates (see l. 433-440).

      (5) My understanding of SEM is it gives outputs of the strength/significance of each pathway/relationship and if so, it isn't clear why this wasn't used and instead, confidence intervals of Z scores to determine which individual BEFs were significant. In addition, an inclusion of the 7 SEM pathway outputs would have been useful to include in an appendix.

      We now provide p-values (Table S2) and the seven models (Figure 3).
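
      The significance criterion the reviewer describes (a confidence interval on a Z score that excludes zero) can be sketched as follows. This is only an illustration under stated assumptions: it treats each effect size as a Fisher z-transformed standardized coefficient `r` estimated from `n` sites, which may differ from the authors' actual procedure. The values of `r` are hypothetical; `n = 52` is the number of sites mentioned in the reviews.

```python
import math


def fisher_z_ci(r: float, n: int, z_crit: float = 1.96):
    """Fisher z-transform of a coefficient r with an approximate 95% CI.

    Assumes r lies in (-1, 1) and n > 3; the standard error is 1/sqrt(n - 3).
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return z, z - z_crit * se, z + z_crit * se


def is_significant(lower: float, upper: float) -> bool:
    """An effect is called significant when its CI excludes zero."""
    return lower > 0 or upper < 0


n_sites = 52  # number of riverine sites, from the reviews
for r in (0.30, 0.20):  # hypothetical effect sizes, for illustration only
    z, lower, upper = fisher_z_ci(r, n_sites)
    print(f"r={r:.2f}  z={z:.3f}  CI=({lower:.3f}, {upper:.3f})  "
          f"significant={is_significant(lower, upper)}")
```

      With 52 sites the interval half-width is 1.96/7 = 0.28 on the z scale, so under these assumptions only coefficients with |z| above roughly 0.28 would be flagged.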

      (6) I don't fully agree with the authors calling this a meta-analysis, as this is a single study of multiple sites within a single region and a specific time point, and not a collection of multiple studies or ecosystems conducted by multiple authors. Rather, the authors are using meta-analysis summary metrics to evaluate their data. The authors tend to focus on these patterns as general trends, but as the data is all from this riverine system, this study could have benefited from focusing on what was going on in this system to underpin these patterns. I'd argue more data is needed to know whether across sites and ecosystems, species diversity and genetic diversity have opposite effects on ecosystem function within trophic levels.

      We agree. “Meta-regression” would perhaps be more adequate than “meta-analyses”. We changed the formulation.

      Reviewer #3 (Public review):

      The manuscript by Fargeot and colleagues assesses the relative effects of species and genetic diversity on ecosystem functioning. This study is very well written and examines the interesting question of whether within-species or among-species diversity correlates with ecosystem functioning, and whether these effects are consistent across trophic levels. The main findings are that genetic diversity appears to have a stronger positive effect on function than species diversity (which appears negative). These results are interesting and have value.

      However, I do have some concerns that could influence the interpretation.

      (7) Scale: the different measures of diversity and function for the different trophic levels are measured over very different spatial scales, for example, trees along 200 m transects and 15 cm traps. It is not clear whether trees 200 m away are having an effect on small-scale function.

      Tree identification and invertebrate (and fish) sampling are done at the same scale. Trees are spread along the river so that their leaves fall directly into the river. Traps were installed all along the same transect in various micro-habitats. Diversity has been measured at exactly the same scale for all organisms. We have modified the MS to make this clear.

      (8) Size of diversity gradients: More information is needed on the actual diversity gradients. One of the issues with surveys of natural systems is that they are of species that have already gone through selection filters from a regional pool, and theoretically, if the environments are similar, you should get similar sets of species, without monocultures. So, if the species diversity gradients range from say, 6 to 8 species, but genetic diversity gradients span an order of magnitude more, you can explain much more variance with genetic diversity. Related to this, species diversity effects on function are often asymptotic at high diversity and so if you are only sampling at the high diversity range, we should expect a strong effect.

      Fish species number varies from 1 to 11, invertebrate family number varies from 15 to 42 and the tree species number varies from 7 to 20 (see Fargeot et al. 2023 for details). We have added this information in the M&M. The gradients are hence relatively large and do not cover a restricted set of values. There is a variance in species number among sites, even if sites are collected along a relatively weak altitudinal gradient. This is obviously complex to compare to SNP (genomic) diversity. Genetic and species effects are similar in effect sizes (percentage of explained variance), so it does not seem we have biased one of the two gradients of biodiversity.

      (9) Ecosystem functions: The functions are largely biomass estimates (except decomposition), and I fail to see how the biomass of a single species can be construed as an ecosystem function. Aren't you just estimating a selection effect in this case?

      The biomass estimated for a certain area represents an estimate of productivity, whatever the number of species being considered. Obviously, productivity of a species can be due to environmental constraints; the biomass is expected to be lower at the niche margin (selection effect). But if these environmental effects are taken into account (which is the case in the SEMs), then the residual variation can be explained by biodiversity effects. We provide an explanation (l. 217-219).

      (10) Note that the article claims to be one of the only studies to look at function across trophic levels, but there are several others out there, for example:

      Thanks, we now cite some of these studies (Li et al 2020, Moi et al. 2021, Seibold et al. 2018).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Introduction:

      The introduction of the manuscript is generally well-structured, and the scientific questions are clearly presented. However, in each paragraph where specific aspects are introduced, the authors do not focus sufficiently on the given points. The current introduction discusses the weaknesses of previous studies extensively but lacks detailed explanations of mechanisms and a clear anticipation of this study's contributions.

      For example:

      L72-77: The authors mention that "genetic diversity may functionally compensate for a species loss," but this point is not highly relevant to the main analyses of this study, which focus on comparing the relative effects of species diversity and genetic diversity.

      Yes true, we understand the point made by the reviewers. We deleted this part of the sentence.

      L87-95: As previously noted, "whether environmental variation decreases or enhances the relative influence of genetic and species diversity on ecosystem functions" was not addressed in this study. Additionally, the last sentence seems unnecessary here, as it does not relate to "environmental variation." The phrase "generate insightful knowledge for future mechanistic models" is vague. It would be helpful to specify what kind of knowledge and what types of future mechanistic models are being referred to.

      We modified these two sentences. We now posit the prediction that what has been observed under controlled conditions (that genetic and species diversity have effects of similar magnitude) might not be the norm under fluctuating environments (because it has been shown that environmental variation modulates the strength of interspecific BEFs and creates large variance).

      L96-116: The use of "for instance" three times in this paragraph makes the structure seem scattered, as only examples are provided. Improving the transition words can help the text focus better on the main point.

      We have modified some parts of this section to better reflect predictions.

      L115-116: Again, it would be beneficial to specify what kind of insightful information can be provided.

      We have modified this sentence by making more explicit some of the information that may be gained.

      L117-134: Stating clear expectations can help the introduction focus on the mechanisms and assist readers in following the results.

      We now provide some predictions. We were reluctant to make predictions in the first version of the MS because we feel that predictions can go in very different directions depending on how we set the scene. We therefore stick to the predictions that we think are the most logical (the simplest ones). This illustrates the lack of theoretical papers on these issues.

      Methods:

      L287-293: The method for estimating the standard effect size is unclear. I assume it was derived from the SEM models? This needs further clarification.

      Yes, it is derived from the standardized estimate from each pSEM. This is now explained in the MS.

      Results:

      As mentioned in the public review, it is very important to show the results of analyzing raw data.

      Done, see Figure 3 and Results section.

      Table 1: The font and format of the PCA table are different from other tables and appear vague, resembling a picture rather than a table.

      Changed.

      Table 2 (and supplementary table): "D.f." is not explained in the table legend. Is 1 the numerator df and 30 the denominator df? Is the denominator the residual? Additionally, the table legend mentions "magnitude and direction." ANOVA only tests if the biodiversity effects are significantly different between species or genetic diversity, but not the magnitude. For example, -0.5 and 0.5 are very different, but their effect magnitudes are the same.

      This is a mistake, sorry: the format of the table was from a previous version of the MS in which we used linear models rather than linear mixed models (both lead to the same results). The ANOVA used to test the significance of fixed terms in linear mixed models is based on Wald chi-square tests; it should have read “Chi-value” rather than “F-value” in both tables, and the only degree of freedom in this test is the one at the numerator. This has been changed. We have also changed the caption of the table (“ANOVA table for the linear mixed model testing whether the relationships between biodiversity and ecosystem functions measured in a riverine trophic chain differ between the biodiversity facets (species or genetic diversity) and the types of BEF (within- or between-trophic levels)”).
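
      To illustrate why a Wald chi-square test on a single fixed-effect term reports only one degree of freedom, here is a minimal sketch. It assumes a term described by a single coefficient; the estimate and standard error are hypothetical values chosen for illustration and are not taken from the manuscript.

```python
import math


def wald_chi2_test(estimate: float, std_error: float) -> tuple[float, float]:
    """Wald chi-square statistic and p-value for one coefficient (1 d.f.)."""
    # The Wald statistic is the squared ratio of the estimate to its standard error.
    w = (estimate / std_error) ** 2
    # With 1 d.f., the upper tail of the chi-square distribution is erfc(sqrt(w / 2)).
    p = math.erfc(math.sqrt(w / 2.0))
    return w, p


# Hypothetical coefficient and standard error (assumptions, not study results).
chi2_value, p_value = wald_chi2_test(estimate=0.30, std_error=0.12)
print(f"Chi-value = {chi2_value:.2f}, d.f. = 1, p = {p_value:.4f}")
```

      There is no denominator (residual) degree of freedom in this statistic, unlike in an F-test. A factor with more than two levels would need the multi-parameter form of the Wald test, which this sketch does not cover.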

      Minor:

      There should always be a space between a number and a unit. In the manuscript, spaces are inconsistently used between numbers and units.

      Corrected

      Reviewer #2 (Recommendations for the authors):

      (1) In the introduction, the authors could focus more and build out what they predicted/hypothesized as well as what has been found in the manipulated experiments that examined the role of species and genetic diversity. That would enhance the background information for a more general audience, and highlight expected results and why.

      We modified the Introduction according to comments made by reviewer 1 and clarified the predictions as best as we can.

      (2) Similarly, the discussion is fairly big picture, but this dataset focused exclusively on this 3-trophic interaction in a riverine system. It could be beneficial to dig into the ecology to find out why the opposite effects of species and genetic diversity are seen within trophic levels in this system.

      We have added some explanations based on the specific pSEM (see our responses to the public reviews for details). But as said in the responses to the public reviews, even with more detailed models, it is hard to tease apart mechanisms. One important point is that genetic and species diversity do not correlate with each other (they do not co-vary over space), which means the effect of one facet is independent of the other. However, apart from that, we cannot really say more without more mechanistic approaches. We understand this is frustrating, but this is the nature of field-based data. This does not mean they are useless. On the contrary, they confirm and expand patterns found under controlled conditions (which for ecologists is quite important as nature is our playground), but they are limited in inferences of mechanisms.

      (3) It would also be informative if the authors specified what positive and negative Z scores mean. It seems counterintuitive in Figure 3. For example, in the upper left, it's denoted as a larger intraspecific effect - which I'd assume is higher genetic (within species) diversity - but is this not where species diversity effects are higher? In theory this figure could be similar to Figure 1 from Des Roches et al. 2018 - where showing the 1:1 line of where species and genetic diversity effects are similar and then how some are more impacted by SD or GD as that links to the overall question, right?

      For example: Figure 3 makes it seem that GD effects are stronger (more positive) for within trophic responses (which is reflected in the text), but in that quadrant, it states that the interspecific effect is larger?

      Yes, you are right: Figure 3 (now Figure 4) is not ideal. We added an explicit explanation for interpreting Zr in the main text. In addition, we modified the text in the quadrants, as it was not correct. Note that the figure cannot be directly compared to that of Des Roches et al. In Des Roches et al., there is a single effect size (ES) per situation (which is roughly expressed as “ES = effect of species - effect of genotypes”). Here, there are two ES per situation, one for the species effect and the other for the genetic effect, which makes the biplot more complex (as species and genetic effects can be similar in magnitude but opposite in direction, e.g., 0.5 and -0.5). We could have done as Des Roches et al. did (“ES = effect of species - effect of genotypes”), but as we do not have absolute ES (as in Des Roches et al.), the resulting signs of the ES are nonsensical. It was not easy for us to find a clever solution (or, said differently, we were not clever enough to find an easy solution). Nonetheless, we tried another visualization by including “sub-quadrants” within the four main quadrants. We hope this will be clearer.
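
      The point about signed effect sizes can be illustrated with a short sketch. It assumes that Zr denotes Fisher's z-transformed correlation coefficient, as is common in meta-analysis; the response itself does not define Zr, so this is an assumption. The values 0.5 and -0.5 are the ones used as an example in the response above.

```python
import math


def fisher_zr(r: float) -> float:
    """Fisher's z-transformation of a correlation-type effect size r."""
    return math.atanh(r)  # identical to 0.5 * ln((1 + r) / (1 - r))


# Example values from the response: same magnitude, opposite direction.
species_r, genetic_r = 0.5, -0.5
zr_species = fisher_zr(species_r)
zr_genetic = fisher_zr(genetic_r)

same_magnitude = math.isclose(abs(zr_species), abs(zr_genetic))
opposite_direction = zr_species * zr_genetic < 0

# A signed difference mixes direction and magnitude ...
signed_difference = zr_species - zr_genetic
# ... whereas a difference of absolute values compares magnitude only.
magnitude_difference = abs(zr_species) - abs(zr_genetic)

print(same_magnitude, opposite_direction)
print(f"signed: {signed_difference:.3f}, magnitude only: {magnitude_difference:.3f}")
```

      With signed effect sizes, a single "species minus genetic" value is large here even though the two effects are equally strong, which is one way to read the difficulty described in the response.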

      (4) It's unclear why authors included both a simplified linear mixed model with diversity type and biodiversity facet as fixed factors, and then a second linear model that included trophic level (with those other 2 factors and interactions), but only showed results of trophic level from that more complex model. It is unclear why they include two models when the more complex one would have evaluated all aspects of their research question and shown the same patterns.

      You are right, the more complex model evaluates both aspects. Nonetheless, as the hypotheses were strictly separated, we thought it simpler to associate one model with one hypothesis. We agree that this duplicates information, but we would like to keep the two models to make the text more gradual.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript reports the substrate-bound structure of SiaQM from F. nucleatum, which is the membrane component of a Neu5Ac-specific Tripartite ATP-independent Periplasmic (TRAP) transporter. Until recently, there was no experimentally derived structural information regarding the membrane components of the TRAP transporter, limiting our understanding of the transport mechanism. Since 2022, there have been 3 different studies reporting the structures of the membrane components of Neu5Ac-specific TRAP transporters. While it was possible to narrow down the binding site location by comparing the structures to proteins of the same fold, a structure with substrate bound has been missing. In this work, the authors report the Na+-bound state and the Na+ plus Neu5Ac state of FnSiaQM, revealing information regarding substrate coordination. In previous studies, 2 Na+ ion sites were identified. Here, the authors also tentatively assign a 3rd Na+ site. The authors reconstitute the transporter to assess the effects of mutating the binding site residues they identified in their structures. Of the 2 positions tested, only one of them appears to be critical to substrate binding.

      Strengths:

      The main strength of this work is the capture of the substrate-bound state of SiaQM, which provides insight into an important part of the transport cycle.

      Weaknesses:

      The main weakness is the lack of experimental validation of the structural findings. The authors identified the Neu5Ac binding site, but only tested 2 residues for their involvement in substrate interactions, which was very limited. The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not tested for its contribution to Na+ dependent transport, and the authors themselves report that the structural evidence is not wholly convincing. This lack of experimental validation undermines the confidence of the findings. However, the reporting of these new data is important as it will facilitate follow-up studies by the authors or other researchers.

      The main concern, also mentioned by other reviewers, is the lack of mutational data and functional studies on the identified binding sites. Two other structures of TRAP transporters have been determined, one from Haemophilus influenzae (Hi) and the other from Photobacterium profundum (Pp). We will refer to the references in this paper as [1], Peter et al. as [2], and Davies et al. as [3]. The table below lists all the mutations made in the Neu5Ac binding site, including direct polar interactions between Neu5Ac and the side chains, as well as the newly identified metal sites.

      The structure of Fusobacterium nucleatum (Fn) that we have reported shows a significant sequence identity with the previously reported Hi structure. When we superimpose the Pp and Fn structures, we observe that nearly all the residues that bind to the Neu5Ac and the third metal site are conserved. This suggests that mutagenesis and functional studies from other research can be related to the structure presented in our work.

      The table below shows that all three residues that directly interact with Neu5Ac have been tested by site-directed mutagenesis for their role in Neu5Ac transport. Both D521 and S300 are critical for transport, while S345 is not. We do not believe that a mutation of D521A in Fn, followed by transport studies, will provide any new information.

      However, Peter et al. have mutated only one of the 5 residues near the newly identified metal binding site, which resulted in no transport. The rest of the residues have not been functionally tested. We propose to mutate these residues into Ala, express and purify the proteins, and then carry out transport assays on those that show expression. We will include this information in the revised manuscript.

      Author response table 1.

      Reviewer #2 (Public Review):

      In this exciting new paper from the Ramaswamy group at Purdue, the authors provide a new structure of the membrane domains of a tripartite ATP-independent periplasmic (TRAP) transporter for the important sugar acid, N-acetylneuraminic acid or sialic acid (Neu5Ac). While there have been a number of other structures in the last couple of years (the first for any TRAP-T) this is the first to trap the structure with Neu5Ac bound to the membrane domains. This is an important breakthrough as in this system the ligand is delivered by a substrate-binding protein (SBP), in this case, called SiaP, where Neu5Ac binding is well studied but the 'hand over' to the membrane component is not clear. The structure of the membrane domains, SiaQM, revealed strong similarities to other SBP-independent Na+-dependent carriers that use an elevator mechanism and have defined Na+ and ligand binding sites. Here they solve the cryo-EM structure of the protein from the bacterial oral pathogen Fusobacterium nucleatum and identify a potential third (and theoretically predicted) Na+ binding site but also locate for the first time the Neu5Ac binding site. While this sits in a region of the protein that one might expect it to sit, based on comparison to other transporters like VcINDY, it provides the first molecular details of the binding site architecture and identifies a key role for Ser300 in the transport process, which their structure suggests coordinates the carboxylate group of Neu5Ac. The work also uses biochemical methods to confirm the transporter from F. nucleatum is active and similar to those used by selected other human and animal pathogens and now provides a framework for the design of inhibitors of these systems.

      The strengths of the paper lie in the locating of Neu5Ac bound to SiaQM, providing important new information on how TRAP transporters function. The complementary biochemical analysis also confirms that this is not an atypical system and that the results are likely true for all sialic acid-specific TRAP systems.

      The main weakness is the lack of follow-up on the identified binding site in terms of structure-function analysis. While Ser300 is shown to be important, only one other residue is mutated and a much more extensive analysis of the newly identified binding site would have been useful.

      Please see the comments above.

      Reviewer #3 (Public Review):

      The manuscript by Goyal et al reports substrate-bound and substrate-free structures of a tripartite ATP-independent periplasmic (TRAP) transporter from a previously uncharacterized homolog, F. nucleatum. This is one of the most mechanistically fascinating transporter families, by means of its QM domain (the domain reported in this manuscript) operating as a monomeric 'elevator', and its P domain functioning as a substrate-binding 'operator' that is required to deliver the substrate to the QM domain; together, this is termed an 'elevator with an operator' mechanism. Remarkably, previous structures had not demonstrated the substrate Neu5Ac bound. In addition, they confirm the previously reported Na+ binding sites and report a new metal binding site in the transporter, which seems to be mechanistically relevant. Finally, they mutate the substrate binding site and use proteoliposomal uptake assays to show the mechanistic relevance of the proposed substrate binding residues.

      The structures are of good quality, the functional data is robust, the text is well-written, and the authors are appropriately careful with their interpretations. Determination of a substrate-bound structure is an important achievement and fills an important gap in the 'elevator with an operator' mechanism. Nevertheless, I have concerns with the data presentation, which in its current state does not intuitively demonstrate the discussed findings. Furthermore, the structural analysis appears limited, and even slight improvements in data processing and resulting resolution would greatly improve the authors' claims. I have several suggestions to hopefully improve the clarity and quality of the manuscript.

      We appreciate your feedback and will make the necessary modifications to the manuscript incorporating most of the suggestions. We will submit the revised version once the experiments are completed. We are also working on improving the quality of the figures and have made several attempts to enhance the resolution using CryoSPARC or RELION, but without success. We will continue to explore newer methods in an effort to achieve higher resolution and to model more lipids, particularly in the binding pocket.

      Reviewing Editor (Recommendations for the Authors):

      After discussing the reviews, the reviewers and reviewing editor have agreed on a list of the most important suggested revisions for the authors, which, if satisfactorily addressed, would improve the assessment of the work. These suggested revisions are listed below. We also include the full Recommendations For The Authors from each of the individual reviewers.

      (1) The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not tested for its contribution to Na+ dependent transport, and the authors themselves report that the structural evidence is not wholly convincing. Additional mutagenesis and activity experiments to test the contribution of this site to transport would strengthen the manuscript. Measuring Na+ concentration-response relations and calculating Hill slopes in WT vs. an M site mutant would be a good experiment. Given the lack of functional data and poor density, it does not seem appropriate to build the M site sodium in the PDB model.

      The density is sufficiently well defined to suggest a bound metal (waters would not be clearly defined at this resolution). While our modeling of the site as Na+ is arbitrary, this was done to satisfy the refinement programs, which need a known scatterer to be modeled. We could model this density with other metals, but unlike crystallographic refinement, real-space refinement of cryo-EM maps does not produce a difference map, which might have allowed us to identify the metal, though not conclusively. The density of the maps is good (we have added better figures to demonstrate this). We tried making multiple mutations to test for activity; unfortunately, we are still struggling to express proteins with mutations in this site in sufficient quantities to carry out transport assays.

      In the absence of being able to do the experiments, we did MD simulations (carried out by Senwei Quan and Jane Allison at University of Auckland).  Our results are shown below – we are not certain without further studies that these should be included in the current paper (we will add them as authors if the editor feels that this evidence is critical).

      Author response table 2.

      We are showing this for review to suggest that K+, Ca2+, and Na+ were tried, and only Na+ stays stably in the binding pocket. The rest of the results will also have to be explained, which would change the focus of the paper.

      We also provided the sequence to AlphaFold3 and asked it to identify the possible metal binding sites; when the input was Na+, it found all three binding sites.

      Summary: Both our experimental data and computational studies suggest that the observed metal binding site is real, but at the moment it is not possible to refine the structure with an unidentified metal. Computational studies suggest that this is a high-probability Na+ site.

      Demonstrating cooperativity between the Na+ site and transport requires carrying out these experiments with mutations in these sites in a concentration-dependent manner. Unfortunately, we were unable to produce well-expressed, purified proteins with these mutations in a short time frame.
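
      As a sketch of the concentration-response analysis suggested by the editor (Hill slopes in WT vs. a metal-site mutant), the following estimates a Hill coefficient from a linearized Hill plot. All concentrations, rates, and parameters are synthetic and hypothetical; they are generated from the Hill equation itself and are not experimental results. The idea that a mutant would show a lower slope is likewise an assumption made only for illustration.

```python
import math


def hill_rate(na_mm: float, vmax: float, k_half: float, n: float) -> float:
    """Hill equation: transport rate as a function of Na+ concentration."""
    return vmax * na_mm**n / (k_half**n + na_mm**n)


def hill_slope(concs: list[float], rates: list[float], vmax: float) -> float:
    """Least-squares slope of log10(v / (vmax - v)) against log10([Na+])."""
    xs = [math.log10(c) for c in concs]
    ys = [math.log10(v / (vmax - v)) for v in rates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, noise-free data (hypothetical parameters, in mM and arbitrary rate units).
concs = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
wt_rates = [hill_rate(c, vmax=1.0, k_half=10.0, n=2.0) for c in concs]
mutant_rates = [hill_rate(c, vmax=1.0, k_half=10.0, n=1.0) for c in concs]

wt_n = hill_slope(concs, wt_rates, vmax=1.0)
mutant_n = hill_slope(concs, mutant_rates, vmax=1.0)
print(f"Hill slope WT: {wt_n:.2f}, mutant: {mutant_n:.2f}")
```

      The linearized plot requires a known Vmax and rates strictly below it; with real, noisy data a nonlinear fit of the Hill equation would usually be preferable.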

      (2) The authors identified the Neu5Ac binding site but only tested 2 residues for their involvement in substrate interactions, which was very limited. Given that the major highlight of this paper is the identification of the Neu5Ac binding site, it would strengthen the manuscript if the authors provided a more extensive series of mutagenesis experiments - testing at least the effect of D521A would be important. One inconsistency is Ser345 mutagenesis not affecting transport, and the authors should further discuss in the text why they think that is.

      D521A has been tested in H. influenzae, and this mutation results in loss of transport.  This residue is highly conserved and occupies the same position. We expect the result to remain the same. 

      We have added a few extra lines to discuss Serine 345: “Ser 345 OG is 3.5Å away from the C1-carboxylate oxygen – a distance that would result in a weak interaction between the two groups. It is, therefore, not surprising that the mutation into Ala did not affect transport. The space created by the mutation can be occupied by a water molecule.”

      (3) The purification and assessment of the stability of the protein are described in text alone with no accompanying data. It would be beneficial to include these data (e.g. in the Supplementary info) as it allows the reader to evaluate the protein quality.

      This is now added as Supplementary Figure 2.

      (4) The structural figures throughout the paper could benefit from more clarity to better support the conclusions. Specific critiques are listed below:

      - Figure 1: since the unbound map has a similar reported resolution, displaying the unbound structure's substrate binding site with the same contour would clearly demonstrate that the appearance of this density is substrate-dependent.

      - Figure 1: the atomic fit of the ligand to the density, and the suggested coordination by side chain and backbone residues, would be useful in this figure.

      - Figure 1: I think it would be more intuitive to compare apo and bound structures with the same local resolution scale.

      We have remade Figure 1: “Architecture of FnSiaQM with nanobody. (A and B) Cryo-EM maps of FnSiaQM unliganded and sialic acid bound at 3.2 and 3.17 Å, respectively. The TM domain of FnSiaQM is colored using the rainbow model (N-terminus in blue and C-terminus in red). The nanobody density is colored purely in red. The density for modeled lipids is colored in tan and the unmodelled density in gray. The figures were made with Chimera at thresholds of 1.2 and 1.3 for the unliganded and sialic acid-bound maps. (C and D) The cytoplasmic view of apo and sialic acid bound FnSiaQM, respectively. Color coding is the same as in panels A and B. The density corresponding to sialic acid and sodium ions is in purple. The substrate binding sites of apo and sialic acid bound FnSiaQM are shown with key residues labeled. The density (blue mesh) around these atoms was made in Pymol with 2 and 1.5 σ for the apo and the sialic acid-bound structures, respectively, with a carve radius of 2 Å.”

      The local resolution maps have been moved to Supplementary Figure 3.

      - Figure 3, Figure 5a: The mesh structures throughout the manuscript are blocky and very difficult to look at and interpret, especially for the ion binding sites, which are currently suggestive of but not definitively ion densities. Either using transparent surfaces, higher triangle counts, or smoothing the surface might help this.

      We have made Figure 3 again with higher triangle counts.  We tried all three suggestions and this provided the best figure. We have replaced Figure 5A with density for Neu5Ac and residues around it.

      - Figure 5A: It would be important to show the densities of the entire binding pocket, especially coordinating side chains, to show the reader what is and isn't demonstrated by this structure.

      - It's not clear how Figure 5D is supposed to show that the cavity can accommodate Neu5Gc, as suggested by the text - please make the discussed cavity clearer in the Figure.

      We have now marked with an arrow the methyl carbon where the hydroxyl group is added, and we have mentioned this in the legend. It is open to the periplasmic side of the cavity.

      - Supplementary Figure 4: Please label coordinating residue sites.

      Labels have been added to Supplementary Figure 6 which was earlier Supplementary Figure 4.

      (5) Intro section: the authors should introduce the work on HiSiaP around the role of the R147 residue in high-affinity Neu5Ac binding, which coordinates the carboxylate of Neu5Ac, and which is a generally conserved mechanism for organic acid binding in other TRAP transporters. This context will help magnify their discovery later that in the membrane domains, it is a key serine and not an arginine that coordinates the carboxylate group (probably as the local concentration of Neu5Ac is high and tight binding site is not desirable for rapid transport, which is mentioned in the discussion).

      Thank you for pointing this out. We have added a new sentence to the introduction.

      “All the SiaP structures show the presence of a conserved Arginine that binds to the C1-carboxylate of Neu5Ac, and this Arg residue is critical as the high electrostatic affinity may be important to have a strong binding affinity that sequesters the small amounts that reach the bacterial periplasmic space  (Glaenzer et al., 2017).”

      (6) TRAP transporters exist for many organic compounds and not just sialic acid, which might be nice to make the reader aware of.

      We initially did not do this as this is an advance paper and this was discussed in the earlier paper (Currie et al., 2024). However, we have now added a sentence to the introduction: “Additionally, amino acids, C4-dicarboxylates, aromatic substrates and alpha-keto acids are also transported by TRAP transporters (Vetting et al., 2015).”

      (7) On p. 12, the authors describe the Neu5Ac binding site as a large solvent-exposed vestibule, having previously described the substrate-bound state as occluded. These descriptions should be adjusted to make clear which structure is being referenced. The clarity of this would be substantially improved if the authors included a figure that showed this occlusion - currently none of the structure figures clearly demonstrate what the authors are referring to. There are several conspicuous unmodeled densities proximal to the substrate, reminiscent of lipids (in between transport and scaffold domain) and possibly waters/ions. Given this, it is really surprising that the substrate binding site is described as "solvent-exposed" since the larger molecules seem to occlude the pocket. The authors should further process their dataset and discuss the implications of these surrounding densities.

      We have processed the data sets carefully with both CryoSPARC and RELION, and the resolution described here is the same with both software packages, with the CryoSPARC maps slightly better in terms of interpretability of the peripheral helices; these are the maps described in the manuscript. The current sample (FnTRAP) with the nanobody is relatively stable (in our experience with other similar proteins), as is evident from the number of images and particles needed to achieve a decent resolution, and thus the workflow is straightforward and simple. There are a number of non-protein densities, which in principle can be modelled, but we have chosen a conservative approach and have not modelled these extra densities (except for the two lipids and a few ions) because of the limited resolution. It is possible that increasing the number of particles will result in an increase in resolution, but given the estimated B-factors (125 and 135 Å² for unliganded and liganded, respectively), this would certainly require many more images with no guarantee of increased resolution.

      The question of outward open vs. outward occluded is a valid point. We have now modified this in the manuscript: “The Neu5Ac binding site has a large solvent-exposed vestibule towards the cytoplasmic side, while its periplasmic side is sealed off. Cryo-EM map shows the presence of multiple densities that could be modeled as lipids, possibly preventing the substrate from leaving the transporter. However, the densities are not well defined to model them as specific lipids, hence they have not been modeled. We describe this as the “inward-facing open state” with the substrate-bound.”

      (8) On p.15, the activity of FnSiaPQM in liposomes is reported, although the impetus for this study is not clear. Presumably, the reason for its inclusion is to ensure that the structurally characterized protein is active. It would be useful to say this at the start of the section if this is the case. This study nicely shows that the energetics and requirements of transport are identical to all the previous studies on Neu5Ac TRAP transporters - it would be good to acknowledge this somewhere in this section as well.

      These changes have been incorporated. We have added a line to say why we did this, and added as the last line that this is similar to other characterized SiaPQMs.

      (9) Figure 5C. The authors show the transport activity with and without valinomycin. The authors do not explain the rationale for testing and reporting both conditions for these mutants; an explanation is required, or the data should be simplified. The expected membrane potential induced by valinomycin should be mentioned in the legend.

      We have simplified Figure 5C and added the expected membrane potential value.

      (10) The authors state that the S300A mutant is inactive. However, unless the authors also measured the background binding/transport of radiolabelled substrate in the absence of protein, then the accuracy of this statement is not clear because Figure 5C does indicate some activity for S300A, albeit much lower than WT. This is an important point in light of the authors' suggestion that the membrane protein does not need a binding site of high affinity or stringent selectivity.

      We thank the reviewer for pointing this out. We have now added a line in the experimental protocols: “The experimental values were corrected by subtracting the control, i.e. the radioactivity taken up in liposomes reconstituted in the absence of protein. The radioactivity associated with the control samples, i.e. empty liposomes was less than 10% with respect to proteoliposomes.”
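
      The correction described in the added protocol line amounts to a simple subtraction, sketched below. The counts are hypothetical values chosen only for illustration (the response reports only that the empty-liposome control was below 10% of the proteoliposome signal), and the function and variable names are not from the manuscript.

```python
def background_corrected(sample_cpm: float, empty_liposome_cpm: float) -> float:
    """Subtract the empty-liposome control from a proteoliposome measurement."""
    return sample_cpm - empty_liposome_cpm


# Hypothetical counts per minute (assumptions, not experimental data).
empty_cpm = 80.0     # liposomes reconstituted without protein
wt_cpm = 1000.0      # wild-type proteoliposomes
mutant_cpm = 120.0   # a low-activity mutant

wt_uptake = background_corrected(wt_cpm, empty_cpm)
mutant_uptake = background_corrected(mutant_cpm, empty_cpm)

# Fraction of the wild-type signal that is accounted for by the control.
control_fraction = empty_cpm / wt_cpm
# Residual mutant activity after correction, relative to wild type.
mutant_percent_of_wt = 100.0 * mutant_uptake / wt_uptake

print(f"control fraction: {control_fraction:.2f}")
print(f"mutant activity: {mutant_percent_of_wt:.1f}% of WT")
```

      Reporting the corrected mutant value relative to wild type makes it possible to distinguish a fully inactive mutant from one with low residual activity.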

      (11) There are several issues and important omissions in the work cited:

      - It is not normal practice to cite a reference in the abstract and the citation is only to the second structure of HiSiaQM, which does not fairly reflect previous work in the field by only referring to their own work. Also throughout the article, it is normal practice with in-text citations to order them chronologically, i.e. earliest first. Please update this.

      This article was submitted as a “Research advance article”. The instructions specifically say that “Research advance article should cite the article in eLife this paper advances.” Hence the citation of the “second structure of HiSiaQM”. In fact, in the manuscript we explicitly say “The first structure of HiSiaQM (4.7 Å resolution) demonstrated that it is composed of 15 transmembrane helices and two helical hairpins.” We are following the policy laid out.

      Zotero organizes multiple references in alphabetical order; we did not choose to do it that way, and the suggestion of bias is not true. The final version of the accepted paper will have numbered references, so this issue will automatically be resolved.

      - Intro: please cite the primary papers discovering other families of sialic acid transporters.

      - Intro: When introducing information on the binding site, dissociation constant of Neu5Ac, and thermodynamics of ligand binding to SiaP, the authors should also include references to the work done by others in addition to their own work.

      The Setty et al. paper was the first to demonstrate that the two-component systems are distinct, and that the binding protein of the TRAP system binds enthalpically while the binding protein of the ABC system binds entropically (SiaP vs SatA). As the reviewer points out, this is significant because it highlights how the Arg binding to the carboxylate, which is the enthalpic driver in this case, contributes to the difference between sugar binding to SiaP and SatA. Many studies have published binding affinities of molecules to SiaP, but this paper offers valuable insight into the differences between these systems. We have cited a number of the SiaP papers from other groups, including acknowledging the first structure of SiaP from H. influenzae by Muller et al., in 2006.

      - p.5 "TRAP transporters are postulated to employ an elevator-type mechanism...". This postulation has been experimentally tested and published, so should be discussed and referenced (Peter et al. 2024. https://doi.org/10.1038/s41467-023-44327-3).

      We have now corrected this error. We removed “are postulated to” and added the reference.

      - p.5 "Notably, the transport of Neu5Ac by TRAP transporters requires at least two sodium ions (Davies et al., 2023)." The requirement for at least 2 Na+ ions for Neu5Ac transport was first demonstrated in Mulligan et al. PNAS 2009, so should also be cited (for completion, so should Mulligan et al. JBC. 2012 and Currie et al. elife 2023, which have also shown this requirement is a commonality amongst all Neu5Ac TRAP transporters).

      Added.

      - P.12, Mulligan et al, JBC, 2012 should be added to the citations in the first sentence.

      Added.

      - p.19 "Interestingly, even the dicarboxylate transporter from V. cholerae (VcINDY) binds to its ligand via electrostatic interactions with both carboxylate groups". Other references are more appropriate than the one used to support this statement.

      Also added references for Mancusso et al., 2012, Nie et al., 2017, and Sauer et al., 2022 here.

      - p.19. "The structure of the protein in the outward-facing conformation is unknown". The authors do not discuss the mechanistic findings from Peter et al 2024 Nat Comm here. The work described in that paper revealed an experimentally verified model of the OFS of HiSiaQM, so really needs to be included.

      This is not an experimentally determined 3D structure. They have shown the possible existence of this by microscopy, but the structure is not determined. The work mentioned is a wonderful piece of work, but it does not report the three-dimensional structure of the protein in the outward-facing conformation to allow us to understand the nature of the molecular interactions. 

      - The reference to Kinz-Thompson et al 2022 on p. 6 is not appropriate - neither the HiSiaQM papers nor the PpSiaQM paper makes reference to this work when identifying the binding site. More suitable references are used, for example, Mancusso et al 2012, Nie et al 2017 and Sauer et al 2022; this should be reported accurately.

      Added the suggested references. We think the paper (Kinz-Thompson et al., 2022) is relevant and have also kept that reference.

      - Garaeva et al report the opposite of what the authors mention - "In the human neutral amino acid transporter (ASCT2), which also uses the elevator mechanism, the HP1 and HP2 loops have been proposed to undergo conformational changes to enable substrate binding and release (Garaeva et al., 2019)." In fact, this paper suggested a one-gate model of transport (HP2), where HP1 seems uninvolved in gating.

      The Reviewer is correct.  We were wrong and not clear.  The entire paragraph has been rewritten.

      “While, both the HP1 and HP2 loops have been hypothesized to be involved in gating, in the human neutral amino acid transporter (ASCT2), (which also uses the elevator mechanism), only the HP2 loops have been shown to undergo conformational changes to enable substrate binding and release (Garaeva et al., 2019). Hence, it is suggested that there is a single gate that controls substrate binding. Superposition of the _Pp_SiaQM and _Hi_SiaQM structures do not reveal any change in these loop structures upon substrate binding. For TRAP transporters, the substrate is delivered to the QM protein by the P protein; hence, these loop changes may not play a role in ligand binding or release. This may support the idea that there is minimal substrate specificity within SiaQM and that it will transport the cargo delivered by SiaP, which is more selective.”

      - p.19 "suggesting that SSS transporters have probably evolved to transport nine-carbon sugars such as Neu5Ac (Wahlgren et al, 2018)." Surely this goes without saying since Wahlgren et al 2018 demonstrated that SiaT, an SSS, could transport sialic acid? It's unclear why this was included here - perhaps it needs to be rewritten to make the point more clearly, but as it stands, this statement appears self-evident. Furthermore, these proteins can transport all kinds of molecules (see TCDB 2.A.21). This statement needs to be clarified. 

      This was a comparison to other Neu5Ac binding sites in other Neu5Ac transporters. We have modified the sentence: “The polar groups bind to both the C1-carboxylate side of the molecule and the C8-C9 carbonyls, suggesting that Proteus mirabilis Neu5Ac transporter (SSS type) evolved specifically to transport nine-carbon sugars such as Neu5Ac (Wahlgren et al., 2018)”. These were arguments we were making to suggest that the lack of tight binding could also mean reduced specificity.

      - The authors reconstitute the FnSiaQM and measure transport with SiaP, which resembles closely what is known for both HiSiaPQM, VcSiaPQM, which is not cited (https://doi.org/10.1074/jbc.M111.281030).

      - Regarding lipids between transport and scaffold domains: there is precedent for such lipids in the elevator transporter GltPh, Wang, and Boudker (eLife 2020) proposed similar displacements during transport and would be appropriate to cite here.

      We have now cited the reference to the Mulligan et al., 2012 paper.  We also added a sentence on the findings of GltPh paper by Wang and Boudker.  Thank you for pointing this out.

      (12) p.9 "TRAP transporters, as their name suggests, comprise three units: a substrate-binding protein (SiaP) and two membrane-embedded transporter units (SiaQ and SiaM) (Severi et al., 2007)." This is somewhat odd phrasing because the existence of fused membrane components has been well-documented for a long time. The addition of "Many" at the start of the sentence fixes this.

      Added Many.

      (13) On p.12 the authors compare the ligand-induced conformational changes of FnSiaQM with ASCT2, citing Garaeva et al, 2019. This comparison does not make sense considering TRAP transporters and ASCT2 do not share a common fold. A far superior comparison is with DASS transporters, which actually do have the same fold as TRAP transporters. And, importantly, the Na+ and substrate-induced conformational changes have been investigated for DASS transporters revealing a unique mechanism likely shared by TRAP transporters (Sauer et al, Nat Comm, 2022). The text on p.12 should be adjusted to replace the ASCT comparison with a VcINDY comparison.

      The purpose of citing the ASCT2 paper was only to address the HP1 and HP2 gates. Garaeva et al. show that only HP2 changes conformation. Comparing the two FnSiaQM structures, with and without ligand, we see no change in either the HP1 or the HP2 loops. On page 17, when we describe the structure, we do specifically mention that the overall architecture is similar to VcINDY and the DASS transporters.

      (14) p.12 "For TRAP transporters, the substrate is delivered to the QM protein by the SiaP protein;" "SiaP protein" should be "P protein".

      Corrected.

      (15) p.18. "periplasmic membrane" should be "cytoplasmic membrane".

      Corrected.

      (16) p.19. "This prevents Neu5Ac from binding..." There is no evidence for this so this needs to be softened, e.g. "This likely prevents Neu5Ac from...".

      Agree – Modified.

      (17) Figure 2B is rather small, cramped, and difficult to see. We suggest that the authors make that panel larger, or include it as a stand-alone supplementary figure.

      We have moved this figure into a supplementary figure as suggested by the reviewer.

      (18) The authors describe the Neu5Ac binding site in SiaQM. It would be helpful if the authors provided a figure in support of the statement that the Neu5Ac binding site architecture is similar to dicarboxylate in VcINDY (especially as Neu5Ac is a monocarboxylate).

      The Neu5Ac binding site is NOT similar to the VcINDY binding site. But, we understand the origin of the comment. We have now changed the sentence: “The overall architecture of the Neu5Ac binding site is similar to that of citrate/malate/fumarate in the di/tricarboxylate transporter of V. cholerae (_Vc_INDY), but the residues involved in providing specificity are different (Kinz-Thompson et al., 2022; Mancusso et al., 2012; Nie et al., 2017; Sauer et al., 2022). Neu5Ac binds to the transport domain without direct interactions with the residues in the scaffold domain. The majority of the interactions are with residues in the HP1 and HP2 loops of the transport domain (Figure 5B). Asp521 (HP2), Ser300 (HP1), and Ser345 (helix 5) interact with the substrate through their side chains, except for one interaction between the main chain amino group of residue 301 and the C1-carboxylate oxygen of Neu5Ac. Mutation of the residue equivalent to Asp521 has been shown to result in loss of transport (Peter et al., 2022). To evaluate the role of residues Ser-300 and Ser-345, we mutated them to alanine and performed the transport assays.”

      (19) When comparing the binding modes of Neu5Ac to different proteins in Figure 6, it would be helpful to include the structure in this paper as well.

      The Neu5Ac binding site is present in figure 5. We would prefer not to show it again in Figure 6.

      Additionally, there is a clear binding mode of Neu5Ac in Figure 1 as well.

      (20) The manuscript would benefit from a more detailed comparison between Na+-bound (described as apo) and Na+/Neu5Ac structures, especially the prospective gates. If this transporter behaves anything like the archetypical ion-coupled glutamate transporters, some structural changes in the gates might be expected to facilitate transport domain movement when the substrate is loaded, but not when only Na+ is bound. It would be important to discuss and visualize these changes.

      We have described in the manuscript that there is NO change in the HP1 and HP2 gates between the unliganded structure and the Neu5Ac bound structure. The major difference we observe is the ordering of the third metal binding site.

      A figure comparing the substrate binding pockets between the different high-resolution structures would also be informative. Do the bonding distances between ligands and side chains significantly change between homologs?

      This is the only Neu5Ac-bound structure. Since the specificity for the substrate comes from the variability of the residues that interact with it, we do not believe that this figure would add much value.

      (21) A supplementary figure (or an inset to Figure 2) showing pairwise percent identity between different characterized QM transporters would be useful.

      We have now added a Supplementary Figure 4 showing the comparison of the three QM sequences whose structures have been determined.

      (22) There is relatively minimal EM processing. More rigorous processing would require relatively little effort and could boost resolution, making this a vastly improved manuscript with a much more confident interpretation of structures.

      We described the overall workflow. The processing was rigorous. After obtaining the first maps, we created templates from the structure and performed template-based picking. We then carried out several rounds of 2D classification, followed by homogeneous refinement and non-uniform refinement. We then made masks and carried out local refinement. We took the best maps, performed 3D classification, refined the 3D classes independently, and then regrouped them based on how similar they were. We then went back and picked particles again (we used different methods of particle picking, but template-based picking resulted in the final set of particles used) and went through the whole process again. At the end of the refinement, we carried out global and local CTF refinement followed by reference-based motion correction. The final refinement was then done with the Bayesian polished particles. The final refinement was a local refinement with a mask over only the transporter and the nanobody. After the reviews came, we tried multi-body refinement in Relion5. It did not improve the resolution. We have expanded the legend to Supplementary Figure 2 (without listing all the different things we tried). The best resolution we obtained for the structure was 3.1 Å. However, it is important to note that the local resolution of the map around the ligand is good.

      We realized this is not easy to depict in a local resolution map. So, we wrote a script that takes every atom, collects all the local resolution values within a sphere of radius 5 Å around it (again, we tried different radii and used the optimal one; we are preparing a manuscript to describe this), averages them, and assigns the average as the B-factor of that atom. We have moved the local resolution map figure to the supplement and replaced Figure 1 with a cartoon in which the color represents the local resolution around each atom.
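
      The per-atom averaging described above could be sketched roughly as follows. This is not the authors' script, which is unpublished; it is a minimal illustration under stated assumptions: the local resolution map is already loaded as a 3D NumPy array indexed as [x, y, z] with cubic voxels, atom coordinates are given in Å in the same frame as the map, and the function name `mean_local_resolution` and its parameters are hypothetical.

```python
import numpy as np


def mean_local_resolution(atom_xyz, locres, voxel_size, origin=(0.0, 0.0, 0.0), radius=5.0):
    """Average the local-resolution values found within `radius` Å of each atom.

    Assumptions (not taken from the source): `locres` is indexed [x, y, z],
    voxels are cubic with edge `voxel_size` Å, and `origin` is the Cartesian
    position of voxel (0, 0, 0). Atoms outside the map get NaN.
    """
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    origin = np.asarray(origin, dtype=float)
    shape = np.array(locres.shape)
    # Number of voxels to scan on each side of the atom.
    reach = int(np.ceil(radius / voxel_size))
    averages = np.full(len(atom_xyz), np.nan)

    for i, xyz in enumerate(atom_xyz):
        centre = np.rint((xyz - origin) / voxel_size).astype(int)
        lo = np.maximum(centre - reach, 0)
        hi = np.minimum(centre + reach + 1, shape)
        if np.any(lo >= hi):
            continue  # the search box does not overlap the map

        # Cartesian coordinates of every voxel centre in the search box.
        axes = [np.arange(start, stop) for start, stop in zip(lo, hi)]
        index_grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
        voxel_xyz = index_grid * voxel_size + origin

        # Keep only voxels whose centres fall inside the sphere.
        inside = np.linalg.norm(voxel_xyz - xyz, axis=-1) <= radius
        values = locres[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]][inside]
        if values.size:
            averages[i] = values.mean()

    return averages
```

      The returned values would then be written into the B-factor column of the coordinate file with whatever structure library is in use, so that standard "color by B-factor" rendering displays local resolution. Note that maps read from MRC files are commonly stored in [z, y, x] order and may need their axes transposed before use here.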

      (23) Calling the structure without Neu5Ac bound an "apo" structure is confusing since it indeed has the ligand Na+ present and bound. "Na+" and "Na+/Neu5Ac" structures would be more appropriate.

      Changed all “apo” to “unliganded”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      - The manuscript needs comprehensive proofreading for language and formatting. In many instances, spaces are missing or not required.

      Thank you for your comments. The manuscript has been thoroughly proofread for errors in language and formatting.

      - Could the authors explore correlation network analyses to get additional insights into the structure of different clusters? 

      We have added a co-occurrence analysis (at species taxonomic level) based on SparCC to the manuscript (Figure 2).

      This is described on Page 9, lines 141-148.

      - The GitHub link is not correct. 

      The GitHub repository has now been made public.

      - It is not possible to access the dataset on ENA. 

      We have changed the ENA study PRJEB57401 status to open.

      - Add the graphs obtained with decontam analysis as a supplementary figure. 

      We have added the outputs of decontam (.csv files with feature lists of ASVs that were filtered based on the prevalence and frequency tests) to the GitHub repository.

      - There is nothing about the RPL group in the results section, while the authors discuss this issue in the introduction. What about the controls with proven fertility? 

      Thank you. We have amended the manuscript to compare characteristics between the RPL, unexplained subfertility and control groups.

      Line 1279-130 page 8:  

      “The study group represented 85% of samples with high sperm DNA fragmentation, 85% of samples with elevated ROS and 79% of samples with oligospermia. Rates of abnormal seminal parameters including low sperm concentration, reduced progressive motility and ROS concentrations were found to be highest in the MFI group (Supplementary Figure 1). Baseline characteristics between the RPL, unexplained subfertility and control groups were similar.”

      Line 150-154 Page 9: 

      “Bacterial richness, diversity and load were similar between all patient groups examined in the study (Supplementary Figure 4).”

      - While correctly stated in the title, the term microbiota should be used throughout the manuscript instead of "microbiome" 

      Thank you. This misnomer has been amended throughout the manuscript.

      Minor corrections:

      Line 25: provoke is not a good term here. 

      Thank you. The term ‘provoke’ has been removed.

      Line 26: why does semen culture have a limited scope? 

      Thank you. Line 40-41 Page 3 has been amended:

      “It is therefore plausible that asymptomatic seminal infections may be associated with impaired reproductive function in some men. Since semen culture has a limited scope for studying the seminal microbiota due to its inability to identify all present microbiota, next generation sequencing (NGS) approaches have been reported recently by a growing number of investigators (13, 14, 15, 16, 17, 18, 19)”.

      Line 68: write μl correctly

      Thank you. This has been corrected

      Line 131: several organisms at the genus level. 

      Thank you. This has been corrected

      Line 136: what are the relative abundances of these genera? Is this relevant? 

      The mean relative abundances for the key taxa mentioned in each cluster are all above 20%. This information has been added to the manuscript text on page 9, line 153.

      Line 173: Molina et al. 

      Thank you. This has been corrected

      Line 173: the contaminations are referred to the low biomass nature of testicular samples. If present, bacteria of accessory gland secretions are an integral part of the seminal microbiota itself. Please review these sentences. 

      Thank you. This has been reworked to highlight the importance of urethral contamination, which relates to a limitation of our study that you later allude to: the lack of paired urine and semen samples.

      Page 11 line 194-196

      “Molina et al report that 50%-70% of detected bacterial reads may be environmental contaminants in a sample from extracted testicular spermatozoa (35); with the addition of passage along the urethra it is likely that contamination of ejaculated semen would be much higher.”

      Table 1: remove results interpretation from table caption. 

      Thank you, this has been acted upon.

      Table 1: why in some cases, like in DNA fragmentation index, the total is not equal to n=223? 

      This is due to missing data: analysis was not possible for some men because DNA fragmentation testing requires a minimum number of sperm in the ejaculate.

      Table 1: "frag" is not defined. 

      Thank you, this has been amended

      Tables 2, 3 & 4: bacterial genera in italics. 

      Thank you, this has been amended

      Figure 1A: add the fertility status information above the cluster colors. 

      Thank you, this has been amended in Figure 1.

      Figure 1C: the color code is confusing. Use different colors for each cluster. 

      Figure 1 legend: bacterial genera in italics. 

      Figures 1 & 2: the authors should use similar chart formatting in the two tables. 

      Thank you, this has been amended

      Reviewer 2:

      (1) The patient groups have different diagnoses and should be handled as different groups, and not fused into one 'patient' group in analyses. Why are the data in tables presented as controls and cases? I would consider men from couples with recurrent pregnancy loss, unexplained infertility, and male factor infertility to have different seminal parameters (not to fuse them into one group). This means, that the statistical analyses should be performed considering each group separately, and not to fuse 3 different infertility diagnoses into one patient group.

      We have conducted the detailed analyses requested by the reviewer, comparing seminal DNA, ROS and microbiota characteristics between the individual patient groups (Supplemental Figures 1 and 4). No specific taxa (at either genus or species level) were found to differ in relative abundance between the diagnostic groups. However, we expect associations between parameters such as reactive oxygen species, or DNA fragmentation, and relative abundance of bacterial species, to be general and not restricted to or specific to each diagnostic group. Therefore, we also conducted further analyses aggregating data from all patient groups to investigate relationships common to these different forms of male reproductive dysfunction.

      (2) Were any covariables included in the statistical analyses, e.g. age, BMI, smoking, time of sexual abstinence, etc? 

      Covariates were not included in the statistical analyses. This has been added in the manuscript to the limitations.

      Page 14 line 267-268

      “Additionally, we did not have other covariables such as smoking status with which to include in further analyses”.

      (3) Furthermore, it is known that 16S rRNA gene analysis does not provide sensitive enough detection of bacteria on the species level. How much do the authors trust their results on the species level? 

      The limitations of taxonomic assignment using 16S rRNA gene metataxonomics are well documented. However, the capacity to assign sequence amplicons at species level depends on the sequence variability of the 16S rRNA gene for each of the taxa reported and the specific gene region chosen. In this study, amplification of the V1-V2 region was performed using a mixed 28f primer set (see methods for details) that enables resolution and assignment of several bacterial species highly relevant to the reproductive tract including Lactobacillus spp., such as L. crispatus and L. iners, (e.g. https://doi.org/10.3389/fcell.2021.641921, https://doi.org/10.1128/msystems.01039-23, https://doi.org/10.1186/s12915-023-01702-2). In this study, we report the presence of L. iners, but not L. crispatus in semen samples, and we have also identified a specific association/co-occurrence between Gardnerella vaginalis and Lactobacillus iners, similar to that observed in vaginal bacterial communities.

      (4) Were the analyses of bacterial genera and species abundances with seminal quality parameters controlled for diagnosis and other confounders? 

      As stated in point 2, no adjustment was made for co-variates. No differences in microbiome composition were observed among the three diagnostic groups, so no adjustments were made to our analysis.

      (5) The authors stress that their study is the biggest on the microbiome in semen. However, when considering that the study consists of 4 groups (with n=46-63), it does not stand out from previous studies. 

      Our study is overall the largest investigating interactions between the seminal microbiome and male reproductive dysfunction. Other studies have included greater numbers of men with infertility.

      (6) Weaknesses: There is a lack of paired seminal/urinal samples. 

      Thank you. This limitation has been added.

      Page 14 line 266-267

      “A further limitation of this study, and others, is the lack of reciprocal genital tract microbiota testing of the female partners, or paired seminal and urinary samples from male participants”.

      Recommendation for authors to consider:

      Including previous classical reviews in the introduction: DOI: 10.1097/MOU.0000000000000742; DOI: 10.1038/s41585-019-0250-y

      Thank you. This has been added.

      Mentioning in the M&M section that there is a supplementary text with a more detailed M&M part. 

      Thank you. This has been added. Further methodological detail can be found in supplementary text.

      Revising the use of 'microbiota' and 'microbiome', they are not synonyms. When talking of 16S rRNA gene analysis, we consider 'microbiome' analysis. 

      Thank you. This misnomer has been amended throughout the manuscript.

      Revising the text, there are several erratas (e.g. verb missing, etc). 

      Thank you for your comments. The manuscript has been thoroughly proofread for errors in language and formatting.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Summary: 

      In the manuscript entitled "Magnesium modulates phospholipid metabolism to promote bacterial phenotypic resistance to antibiotics", Li et al demonstrated the role of magnesium in promoting phenotypic resistance in V. alginolyticus. Using standard microbiological and metabolomic techniques, the authors have shown the significance of fatty acid biosynthesis pathway behind the resistance mechanism. This study is significant as it sheds light on the role of an exogenous factor in altering membrane composition, polarization, and fluidity which ultimately leads to antimicrobial resistance. 

      Strengths: 

      (1) The experiments were carried out methodically and logically. 

      (2) An adequate number of replicates were used for the experiments. 

      Weaknesses: 

      (1) The introduction section needs to be more informative and to the point.  

      Thank you so much for your suggestion. We have revised the introduction to make it more informative and to the point as follows:

      “Non-inheritable antibiotic or phenotypic resistance represents a serious challenge for treating bacterial infections. Phenotypic resistance does not involve genetic mutations and is transient, allowing bacteria to resume normal growth. Biofilm and bacterial persisters are two phenotypic resistance types that have been extensively studied (Brandis et al., 2023; Corona & Martinez, 2013). Biofilms have complex structures, containing elements that impede antibiotic diffusion, sequestering and inhibiting their activity (Ciofu et al., 2022). Biofilm-forming bacteria and persisters also have distinct metabolic states that significantly reduce their antibiotic susceptibility (Yan & Bassler, 2019). These two types of phenotypic resistance share the common feature of retarded or even ceased growth in the presence of antibiotics (Corona & Martinez, 2013). However, specific factors that promote phenotypic resistance and allow bacteria to proliferate in the presence of antibiotics remain poorly defined.

      Metal ions have a diverse impact on the chemical, physical, and physiological processes of antibiotic resistance  (Booth et al, 2011; Lu et al, 2020; Poole, 2017). This includes genetic elements that confer resistance to metals and antibiotics (Poole, 2017) and metal cations that directly hinder (or enhance) the activity of specific antibiotic drugs (Zhang et al., 2014). The metabolic environment can also impact the sensitivity of bacteria to antibiotics (Jiang et al., 2023; Lee & Collins, 2012; Peng et al., 2015; Zhang et al., 2020; Zhao et al., 2021). Light metal ions, such as magnesium, sodium, and potassium, can behave as cofactors for different enzymes (Du et al., 2016) and influence drug efficacy. Heavy metal ions, including Cu2+ and Zn2+, confer resistance to antibiotics (Yazdankhah et al., 2014; Zhang et al., 2018). Recent reports suggest that sodium negatively regulates redox states to promote the antibiotic resistance of Vibrio alginolyticus (Yang et al., 2018), while actively growing Bacillus subtilis cope with ribosome-targeting antibiotics by modulating ion flux (Lee et al, 2019). In Gram-negative bacteria, by contrast, zinc enhances antibiotic efficacy by potentiating carbapenem, fluoroquinolone, and β-lactam-mediated killing (Isaei et al., 2016; Zhang et al., 2014). Magnesium influences bacterial structure, cell motility, enzyme function, cell signaling, and pathogenesis (Wang et al., 2019). This mineral also modulates microbiota to harvest energy from the diet (Garcia-Legorreta et al., 2020), allowing Bacillus subtilis to cope with ribosome-targeting antibiotics by modulating ion flux (Lee et al., 2019). However, the role of magnesium in promoting phenotypic resistance is less well understood.

      Vibrios inhabit seawater, estuaries, bays, and coastal waters, regions full of metal ions such as magnesium (Kumarage et al., 2022). Magnesium is the second most dissolved element in seawater after sodium. At a seawater salinity of 3.5%, the magnesium concentration is about 54 mM (Potis, 1968), and in deep seawater, can be as high as 2,500 mM (Wang et al., 2024). Vibrio parahaemolyticus and V. alginolyticus are two representative Vibrio pathogens that infect humans and aquatic animals, resulting in illness and economic loss, respectively (Grimes, 2020). (Fluoro)quinolones such as balofloxacin are used to treat Vibrio infection; however, resistance has emerged due to overuse (Suyamud et al., 2024). Indeed, (fluoro)quinolones are one of China's two primary residual chemicals associated with aquaculture (Liu et al., 2017). Vibrio can develop quinolone resistance through mutations in the DNA gyrase gene or through plasmid-mediated mechanisms (Dutta et al., 2021). Thus, the use of V. parahaemolyticus and V. alginolyticus as bacterial representatives, and balofloxacin as a quinolone-based antibacterial representative, can help to define novel magnesium-dependent phenotypic resistance mechanisms of pathogenic Vibrio species.

      The current study evaluated whether magnesium induces phenotypic resistance in Vibrio species and defined the molecular/genetic basis for this resistance. Genetic approaches, GC-MS analysis of metabolite and membrane remodeling upon antibiotic exposure, membrane physiology, and extensive antimicrobial susceptibility testing were used for the evaluations.”

      (2) The weakest point of this paper is in the logistics through the results section. The way authors represented the figures and interpreted them in the results section (or the figure legends) does not match. The figures are difficult to interpret and are not at all self-explanatory. 

      Thank you so much for your suggestion. Following it, we have checked that the text of the Results section matches the figures and figure legends, and have revised them accordingly.

      (3) There are too many mislabeling of the figure panels in the main text which makes it difficult to find out which figures the authors are explaining. There should be more explanation on why and how they did the experiments and how the results were interpreted. 

      Thank you so much for your suggestion. We have checked the figures and main text to ensure that every figure panel is clearly and correctly referenced.

      Reviewer #2 (Public Review): 

      Summary: 

      In this study, the authors aimed to identify if and how magnesium affects the ability of two particular bacteria species to resist the action of antibiotics. In my view, the authors succeeded in their goals and presented a compelling study that will have important implications for the antibiotic resistance research community. Since metals like magnesium are present in all lab media compositions and are present in the host, the data presented in this study certainly will inspire additional research by the community. These could include research into whether other types of metals also induce multi-drug resistance, whether this phenomenon can be observed in other bacterial species, especially pathogenic species that cause clinical disease, and whether the underlying molecular determinants (i.e. enzymes) of metal-induced phenotypic resistance could be new antimicrobial drug targets themselves. 

      Strengths: 

      This study's strengths include that the authors used a variety of methodologies, all of which point to a clear effect of exogenous Mg2+ on drug resistance in the targeted species. I also commend the authors for carrying out a comprehensive study, spanning evaluation of whole cell phenotypes, metabolic pathways, genetic manipulation, to enzyme activity level evaluation. The fact that the authors uncovered a molecular mechanism underlying Mg2+-induced phenotypic resistance is particularly important as the key proteins should be studied further.

      Weaknesses: 

      I believe there are weaknesses in the manuscript, however. The authors take for granted that the reader is familiar with all the assays utilized, and do not properly explain some experiments, and thus I highly suggest that the authors add a brief statement in each situation describing the rationale for each selected methodology (more details are in the private review to the authors). The Results section is also quite long and bogs down at times, and I suggest that the authors reduce its length by 10 to 20%. In contrast, the Introduction is sparse and lacks key aspects, for example, there should be mention of the study's main purpose and approaches, plus an introduction to the authors' choice of species and their known drug resistance properties, as well as the drug of choice (balofloxacin). Another notable weakness is that the authors evaluated Mg2+-induced phenotypic resistance only against two closely related species, and thus the generalizability of this mechanism of drug resistance is not known. The paper would be strengthened if the authors could demonstrate this type of phenotypic resistance in at least one more Gram-negative species and at least one Gram-positive species (antimicrobial susceptibility evaluations would suffice), each of which should be pathogenic to humans. Demonstrating magnesium-induced phenotypic drug resistance in the WHO Priority Bacterial Pathogens would be particularly important. 

      In general, the conclusions drawn by the authors are justified by the data, except for the interpretation of some experiments. Importantly, this paper has discovered new antimicrobial resistance mechanisms and has also pointed to potential new targets for antimicrobials. 

      Thank you so much for your suggestions! We followed them and revised the manuscript as follows:

      (1) We added a brief statement for each experiment explaining the result and the methodology, as suggested in the private review.

      (2) To streamline the narrative, we moved the whole second Results section to the Supplementary Text and a Supplementary Figure.

      (3) We revised the Introduction, adding information to make it more informative and to the point, as follows:

      “Non-inheritable antibiotic or phenotypic resistance represents a serious challenge for treating bacterial infections. Phenotypic resistance does not involve genetic mutations and is transient, allowing bacteria to resume normal growth. Biofilm and bacterial persisters are two phenotypic resistance types that have been extensively studied (Brandis et al., 2023; Corona & Martinez, 2013). Biofilms have complex structures, containing elements that impede antibiotic diffusion, sequestering and inhibiting their activity (Ciofu et al., 2022). Biofilm-forming bacteria and persisters also have distinct metabolic states that significantly reduce their antibiotic susceptibility (Yan & Bassler, 2019). These two types of phenotypic resistance share the common feature of retarded or even arrested growth in the presence of antibiotics (Corona & Martinez, 2013). However, specific factors that promote phenotypic resistance and allow bacteria to proliferate in the presence of antibiotics remain poorly defined.

      Metal ions have a diverse impact on the chemical, physical, and physiological processes of antibiotic resistance  (Booth et al, 2011; Lu et al, 2020; Poole, 2017). This includes genetic elements that confer resistance to metals and antibiotics (Poole, 2017) and metal cations that directly hinder (or enhance) the activity of specific antibiotic drugs (Zhang et al., 2014). The metabolic environment can also impact the sensitivity of bacteria to antibiotics (Jiang et al., 2023; Lee & Collins, 2012; Peng et al., 2015; Zhang et al., 2020; Zhao et al., 2021). Light metal ions, such as magnesium, sodium, and potassium, can behave as cofactors for different enzymes (Du et al., 2016) and influence drug efficacy. Heavy metal ions, including Cu2+ and Zn2+, confer resistance to antibiotics (Yazdankhah et al., 2014; Zhang et al., 2018). Recent reports suggest that sodium negatively regulates redox states to promote the antibiotic resistance of Vibrio alginolyticus (Yang et al., 2018), while actively growing Bacillus subtilis cope with ribosome-targeting antibiotics by modulating ion flux (Lee et al, 2019). In Gram-negative bacteria, by contrast, zinc enhances antibiotic efficacy by potentiating carbapenem, fluoroquinolone, and β-lactam-mediated killing (Isaei et al., 2016; Zhang et al., 2014). Magnesium influences bacterial structure, cell motility, enzyme function, cell signaling, and pathogenesis (Wang et al., 2019). This mineral also modulates microbiota to harvest energy from the diet (Garcia-Legorreta et al., 2020), allowing Bacillus subtilis to cope with ribosome-targeting antibiotics by modulating ion flux (Lee et al., 2019). However, the role of magnesium in promoting phenotypic resistance is less well understood.

      Vibrios inhabit seawater, estuaries, bays, and coastal waters, regions full of metal ions such as magnesium (Kumarage et al., 2022). Magnesium is the second most dissolved element in seawater after sodium. At a seawater salinity of 3.5%, the magnesium concentration is about 54 mM (Potis, 1968), and in deep seawater it can be as high as 2,500 mM (Wang et al., 2024). Vibrio parahaemolyticus and V. alginolyticus are two representative Vibrio pathogens that infect humans and aquatic animals, resulting in illness and economic loss, respectively (Grimes, 2020). (Fluoro)quinolones such as balofloxacin are used to treat Vibrio infection; however, resistance has emerged due to overuse (Suyamud et al., 2024). Indeed, (fluoro)quinolones are one of China's two primary residual chemicals associated with aquaculture (Liu et al., 2017). Vibrio can develop quinolone resistance through mutations in the DNA gyrase gene or through plasmid-mediated mechanisms (Dutta et al., 2021). Thus, the use of V. parahaemolyticus and V. alginolyticus as bacterial representatives, and balofloxacin as a quinolone-based antibacterial representative, can help to define novel magnesium-dependent phenotypic resistance mechanisms of pathogenic Vibrio species.

      The current study evaluated whether magnesium induces phenotypic resistance in Vibrio species and defined the molecular/genetic basis for this resistance. Genetic approaches, GC-MS analysis of metabolite and membrane remodeling upon antibiotic exposure, membrane physiology, and extensive antimicrobial susceptibility testing were used for the evaluations.”

      (4) We examined the effect of magnesium on WHO-listed priority pathogens, which confirmed the results, as follows:

      “Importantly, exogenous MgCl2 also increased MICs of clinical isolates, carbapenem-resistant Escherichia coli, carbapenem-resistant Klebsiella pneumoniae, carbapenem-resistant Pseudomonas aeruginosa and carbapenem-resistant Acinetobacter baumannii to balofloxacin (Fig 1G).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) There are many grammatical mistakes to point out. The manuscript needs proofreading and editing.

      We appreciate this comment! The manuscript has been revised by a native speaker.

      (2) The introduction could be more informative. A little more description of magnesium - such as what it does to antibiotics and how it's known to affect the microbiome - might be helpful for the general readers. The question remains why out of all the metal ions that might affect antibiotic resistance (many of them are less explored), authors particularly decided to work on the effect of magnesium. The introduction should cover the rationale of their hypothesis. Also, the authors might want to briefly talk about the model organisms (V. alginolyticus and V. parahaemolyticus) describing how threatening they are and how they are becoming resistant to antibiotics.

      We appreciate this comment! We revised the introduction by providing additional information as follows:

      “In Gram-negative bacteria, by contrast, zinc enhances antibiotic efficacy by potentiating carbapenem, fluoroquinolone, and β-lactam-mediated killing (Isaei et al., 2016; Zhang et al., 2014). Magnesium influences bacterial structure, cell motility, enzyme function, cell signaling, and pathogenesis (Wang et al., 2019). This mineral also modulates microbiota to harvest energy from the diet (Garcia-Legorreta et al., 2020), allowing Bacillus subtilis to cope with ribosome-targeting antibiotics by modulating ion flux (Lee et al., 2019). However, the role of magnesium in promoting phenotypic resistance is less well understood.

      Vibrios inhabit seawater, estuaries, bays, and coastal waters, regions full of metal ions such as magnesium (Kumarage et al., 2022). Magnesium is the second most dissolved element in seawater after sodium. At a seawater salinity of 3.5%, the magnesium concentration is about 54 mM (Potis, 1968), and in deep seawater it can be as high as 2,500 mM (Wang et al., 2024). Vibrio parahaemolyticus and V. alginolyticus are two representative Vibrio pathogens that infect humans and aquatic animals, resulting in illness and economic loss, respectively (Grimes, 2020). (Fluoro)quinolones such as balofloxacin are used to treat Vibrio infection; however, resistance has emerged due to overuse (Suyamud et al., 2024). Indeed, (fluoro)quinolones are one of China's two primary residual chemicals associated with aquaculture (Liu et al., 2017). Vibrio can develop quinolone resistance through mutations in the DNA gyrase gene or through plasmid-mediated mechanisms (Dutta et al., 2021). Thus, the use of V. parahaemolyticus and V. alginolyticus as bacterial representatives, and balofloxacin as a quinolone-based antibacterial representative, can help to define novel magnesium-dependent phenotypic resistance mechanisms of pathogenic Vibrio species.

      The current study evaluated whether magnesium induces phenotypic resistance in Vibrio species and defined the molecular/genetic basis for this resistance. Genetic approaches, GC-MS analysis of metabolite and membrane remodeling upon antibiotic exposure, membrane physiology, and extensive antimicrobial susceptibility testing were used for the evaluations.”

      (3) Figure 1C is mislabeled as 1B (line 100). Line 101: The sentence is not clear and very confusing. What is meant by 15.6mM - 62.4 mM? Are they talking about the concentration of BLFX (though in the figure the concentration was shown in µg)? Please rewrite the sentence in a simplified way. Also, the zone of inhibition was decreased with increasing MgCl2, not increased. 

      We appreciate this comment! These points have been revised; the citation of Fig 1B has been corrected to Fig 1C. The sentence at Line 101 (now Line 122) was revised as follows:

      “At balofloxacin doses of 1.56, 3.125, 6.25, and 12.5 µg, the zone of inhibition decreased with increasing MgCl2 (Fig 1D)”

      (4) In the western blot images, it would be nice to indicate the MW of the protein bands shown. The loading control used for the experiments should be clearly mentioned in the figure legends. 

      We appreciate this comment! The MWs are indicated in the western-blot image throughout the manuscript. 

      The loading control is clearly stated in the figure legend as follows:

      “Whole cell lysates resolved by SDS-PAGE were stained with Coomassie brilliant blue as loading control.”

      (5) Figures 2 B and C: the figure legend does not explain what the authors wanted to show. It's not clear how they plotted the inhibitory curve, or the binding efficacy. These panels need an explanation of how the analysis was done.

      We appreciate this comment! Figure 2 has been moved to Suppl. Fig 2, and its description has been moved to the Suppl. Text. The revised description of the result, now in the Suppl. Text, reads as follows:

      “Prior studies suggest that the chelation of antibiotics by magnesium ions inhibits antibiotic uptake (Deitchman et al., 2018; Lunestad and Goksøyr, 1990). To investigate whether magnesium binds to balofloxacin, balofloxacin was pre-incubated with magnesium, and zone of inhibition (ZOI) analysis was conducted. Six different concentrations of balofloxacin were separately incubated with six different concentrations of MgCl2, and then spotted on filter paper so that a defined amount of balofloxacin could be used for ZOI. While lower concentrations of MgCl2, (0.78, 3.125, or 12.5 mM) did not alter the ZOI, higher concentrations, including 50 and 200 mM MgCl2, decreased the ZOI (Suppl. Fig 2A), suggesting that even high doses of magnesium had only a partial effect on balofloxacin through direct binding. For example, at 200 mM MgCl2 and 5 or 10 μg/mL balofloxacin, the balofloxacin ZOI was 53.2 and 70.3% of the ZOI at 0 mM MgCl2, suggesting that  50% of the antibiotics were still functional. Intracellular BLFX also decreased with increasing MgCl2 (Suppl. Fig 2B), while exogenous Mg2+ increased intracellular Mg2+ levels in a dose-dependent manner. For example, exogenous 50 and 200 mM MgCl2 increased intracellular Mg2+ levels to 1.21 and 1.31 mM, respectively (Suppl. Fig 2C). The relationship between TolC, an efflux pump that transports quinolones from bacterial cells, and Mg2+ was also assessed (Kobylka et al., 2020; Song et al., 2020). The expression of TolC/tolC was unaffected by Mg2+ (Suppl. Fig 2D). Magnesium is critical for LPS stability. LPS levels increased at 200 mM Mg2+ (Suppl. Fig 2E), however, the loss of waaF, lpxA, and lpxC, three key genes involved in LPS biosynthesis, did not influence balofloxacin sensitivity/resistance in the presence of Mg2+ (Suppl. Fig 2F). These findings suggest that magnesium-induced LPS biosynthesis does not contribute directly to BLFX resistance and demonstrate that Mg2+ influx is involved in balofloxacin resistance.”

      (6) For the metabolomics results, it will help immensely if the authors provide a volcano plot of the identified metabolites and plot the heat map according to the -log2 metabolite intensities. In Figure 3A, it's not clear what information is conveyed through Euclidean distance calculations of the heat map. In Figure 3 B, the authors mentioned that the OPLS-DA test was conducted, although the figure shows a PCA plot, so it's not clear how these two are connected. Figure 3 E: the figure legend says scattered plot, but the panel represents color-coded numerical values, not a scattered plot. Also, it's not clear how they got those values. 

      We appreciate this comment! We agree that the differential metabolites could be shown as volcano plots. However, we did not adopt volcano plots in this study because the metabolomes are magnesium concentration-dependent and comprise 6 groups in parallel; volcano plots would give an overly complex view of the comparisons among the groups. We also tried plotting the heat map according to the -log2 metabolite intensities. Although this analysis clustered the 200 mM and 50 mM groups better, the data for the low magnesium concentrations were not consistent, which may be due to the minor metabolic changes at low magnesium concentrations. Thank you for your understanding.

      For the Euclidean distance calculations, we added the following explanation to the figure legend:

      “Euclidean distance calculations were used to generate a heatmap that shows clustering of the biological and technical replicates of each treatment.” 
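
      As a purely illustrative sketch of what such a distance calculation involves (not the authors' actual pipeline), the following Python snippet computes pairwise Euclidean distances between sample profiles and reports each sample's nearest neighbor. The sample names, the intensity values, and the helper functions are hypothetical and were chosen only to show that replicates of the same treatment end up closest to each other.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length intensity profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(profiles):
    """Pairwise Euclidean distances between all profiles (rows)."""
    return [[euclidean_distance(p, q) for q in profiles] for p in profiles]

def nearest_neighbors(matrix):
    """Index of the closest other sample for each row of a distance matrix."""
    nearest = []
    for i, row in enumerate(matrix):
        candidates = [(d, j) for j, d in enumerate(row) if j != i]
        nearest.append(min(candidates)[1])
    return nearest

# Hypothetical metabolite intensities: one row per sample, one column per metabolite.
samples = ["0mM_rep1", "0mM_rep2", "200mM_rep1", "200mM_rep2"]
intensities = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 4.0],
    [4.0, 6.0, 3.0],
    [4.0, 6.0, 5.0],
]

distances = distance_matrix(intensities)
neighbors = nearest_neighbors(distances)
for name, j in zip(samples, neighbors):
    print(f"{name} is closest to {samples[j]}")
```

      In a real heatmap the distance matrix would normally be passed to a hierarchical clustering routine; the nearest-neighbor step here is a simplified stand-in for that.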

      Figure 2B (Figure 3B in the previous version) has been replaced with the OPLS-DA analysis in the revised version.

      The legend of Figure 2E (Figure 3E in the previous version) has been revised as follows:

      “E. Areas of the peaks of palmitic acid and stearic acid generated by GC-MS analysis.” 

      (7) In Figure 4, the figure legends (as well as the in the text) are not properly referred to. Please make sure to refer to the correct panel. 

      We appreciate this comment! The figure legends have been corrected to match the panel and text. 

      Figure 4F: how was the synergy analysis done? In the methods section, the authors described the antibiotic bactericidal assay protocol, but there was no clear indication of how they generated the isobologram. 

      We appreciate this comment! We provide additional information in the legend of Figure 3F (Figure 4F in the previous version) as follows:

      “Synergy analysis for BLFX with palmitic acid for V. alginolyticus. Synergy was performed by comparing the dose needed for 50% inhibition of the synergistic agents (white) and non-synergistic (i.e., additive) agents (purple).”
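
      The legend does not specify how the isobologram was computed. One common way to read such a comparison is the Loewe additivity (combination index) calculation, sketched below in Python. This is an assumption made for illustration, not necessarily the authors' method; the doses, the single-agent 50% inhibitory doses, and the tolerance band are all hypothetical.

```python
def combination_index(dose_a, dose_b, ic50_a, ic50_b):
    """Loewe combination index for a dose pair that jointly gives 50% inhibition.

    ic50_a and ic50_b are the doses of each agent alone that give 50% inhibition.
    A value of 1 lies on the additive line of the isobologram.
    """
    return dose_a / ic50_a + dose_b / ic50_b

def classify(ci, tolerance=0.1):
    """Label an index; the tolerance band around 1 is an arbitrary illustrative choice."""
    if ci < 1 - tolerance:
        return "synergy"
    if ci > 1 + tolerance:
        return "antagonism"
    return "additive"

# Hypothetical single-agent doses giving 50% inhibition (arbitrary units).
IC50_A = 8.0
IC50_B = 4.0

# Hypothetical dose pairs that each give 50% inhibition in combination.
for dose_a, dose_b in [(2.0, 1.0), (4.0, 2.0), (8.0, 4.0)]:
    ci = combination_index(dose_a, dose_b, IC50_A, IC50_B)
    print(dose_a, dose_b, ci, classify(ci))
```

      On an isobologram, points below the straight line joining the two single-agent doses correspond to an index below 1, and points above it to an index above 1.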

      (8) Figure 5 A: the scatter plot is plotted according to the area along the Y axis: which "area" is represented here? There is absolutely no explanation, neither in the results nor in the figure legends. Using box plots might be a better option than using a scattered plot.

      We appreciate this comment! “Area” is now defined in the revised manuscript as follows:

      “The area indicates the area of the peak of the metabolite in total ion chromatography of GC-MS.” 

      (9) In Figure 6 A, the heat map is plotted according to the column Z scores. What is meant by "column Z score"? The corresponding figure legend says, "heat map showing differential abundance of lipid". Z scores do not represent an abundance of a variable, so the conclusion might not be appropriate here. 

      We appreciate this comment! In Figure 5A (Figure 6A in the previous version), the column Z score reflects the scaled abundance of the metabolites analyzed and is generated automatically during the heat map analysis. The legend has been revised as follows:

      “Heatmap showing changes in differential lipid levels at the indicated concentration of MgCl2.”  
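
      To make the distinction between a Z score and an abundance concrete, here is a minimal Python sketch of column-wise Z scoring. It assumes that each column holds one lipid measured across treatments and that the sample standard deviation is used; the heat map software used in the study may scale differently, and all values are hypothetical.

```python
import statistics

def column_z_scores(column):
    """Standardize one column: subtract its mean and divide by its sample SD."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(value - mean) / sd for value in column]

# Hypothetical peak areas for two lipids across three MgCl2 treatments.
low_abundance_lipid = [1.0, 2.0, 3.0]
high_abundance_lipid = [1000.0, 2000.0, 3000.0]

z_low = column_z_scores(low_abundance_lipid)
z_high = column_z_scores(high_abundance_lipid)
print(z_low)
print(z_high)
```

      Both lipids give the same Z scores even though their raw areas differ a thousand-fold, which is why a Z-scored heat map shows relative change across treatments rather than absolute abundance.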

      (10) Line 313-314: it should be Figure EV6C.  

      We appreciate this comment! The citation has been corrected.

      (11) The authors have shown that Mg+2 does not alter the LPS transport system, however, there was some significant increase in LPS expression at 200mM MgCl2. It would be interesting if the authors could also check if Mg+2 has any effect on the outer membrane protein (OMP) integrity (by checking OMP components BamA and LptD).  

      We appreciate this comment! We have carefully examined membrane permeability in Figure 7 and therefore did not perform additional experiments to assess changes in BamA and LptD. Thank you very much for your understanding.

      (12) I wonder if the authors could check the effect of extracellular Mg+2 during the co-treatment of palmitic acid, linoleic acid, and balofloxacin. Will there still be the antagonistic effect or the presence of Mg+2 could change the phenotype? 

      We appreciate this comment! Additional experiments were performed, with the following result:

      “Furthermore, magnesium had a minimal effect on the antagonistic effect of palmitic acid, linolenic acid, and balofloxacin (Fig 4G), suggesting that this mineral functions through lipid metabolism.” 

      Reviewer #2 (Recommendations For The Authors)

      (1) As mentioned in the Public Review, I strongly believe that the impact of this study will be more significant if magnesium-induced phenotypic drug resistance could be demonstrated in at least one other Gram-negative and one other Gram-positive species, both of which should be human pathogens. The full suite of experiments would not be necessary for this suggestion; evaluation of the effect of Mg concentration in growth media on the drug resistance of other species, testing the different antibiotic types used in this study, would be sufficient.

      We appreciate this comment! Additional experiments were performed to test this idea. Mg2+ had a similar effect on carbapenem-resistant Escherichia coli, carbapenem-resistant Klebsiella pneumoniae, carbapenem-resistant Pseudomonas aeruginosa and carbapenem-resistant Acinetobacter baumannii as on the Vibrio species, as shown in Figure 1G. These results are described as follows:

      “Importantly, exogenous MgCl2 also increased MICs of clinical isolates, carbapenem-resistant Escherichia coli, carbapenem-resistant Klebsiella pneumoniae, carbapenem-resistant Pseudomonas aeruginosa and carbapenem-resistant Acinetobacter baumannii to balofloxacin (Fig 1G).”

      (2) I recommend that the Introduction section be expanded. I recommend one or two sentences introducing the two Vibrio species selected for study. I.e. why did the authors choose these two species? What is known about their phenotypic drug resistance in the literature? Why did the authors select balofloxacin for their studies, is it a common antimicrobial used vs Vibrios? As well, the end of the Introduction section ends abruptly with no transition to the present study itself. The end of the introduction should include one or two sentences introducing the main purpose of the study, its approach, and the techniques undertaken. For example, "In this study, we evaluated whether magnesium induces phenotypic resistance in Vibrio species and the molecular/genetic basis for such resistance. We used genetic approaches, GC-MS analysis of metabolite and membrane remodeling upon antibiotic exposure, membrane physiology, and extensive antimicrobial susceptibility evaluations." 

      We appreciate this comment! We revised the introduction by providing additional information as follows:

      “In Gram-negative bacteria, by contrast, zinc enhances antibiotic efficacy by potentiating carbapenem, fluoroquinolone, and β-lactam-mediated killing (Isaei et al., 2016; Zhang et al., 2014). Magnesium influences bacterial structure, cell motility, enzyme function, cell signaling, and pathogenesis (Wang et al., 2019). This mineral also modulates microbiota to harvest energy from the diet (Garcia-Legorreta et al., 2020), allowing Bacillus subtilis to cope with ribosome-targeting antibiotics by modulating ion flux (Lee et al., 2019). However, the role of magnesium in promoting phenotypic resistance is less well understood.

      Vibrios inhabit seawater, estuaries, bays, and coastal waters, regions full of metal ions such as magnesium (Kumarage et al., 2022). Magnesium is the second most dissolved element in seawater after sodium. At a seawater salinity of 3.5%, the magnesium concentration is about 54 mM (Potis, 1968), and in deep seawater it can be as high as 2,500 mM (Wang et al., 2024). Vibrio parahaemolyticus and V. alginolyticus are two representative Vibrio pathogens that infect humans and aquatic animals, resulting in illness and economic loss, respectively (Grimes, 2020). (Fluoro)quinolones such as balofloxacin are used to treat Vibrio infection; however, resistance has emerged due to overuse (Suyamud et al., 2024). Indeed, (fluoro)quinolones are one of China's two primary residual chemicals associated with aquaculture (Liu et al., 2017). Vibrio can develop quinolone resistance through mutations in the DNA gyrase gene or through plasmid-mediated mechanisms (Dutta et al., 2021). Thus, the use of V. parahaemolyticus and V. alginolyticus as bacterial representatives, and balofloxacin as a quinolone-based antibacterial representative, can help to define novel magnesium-dependent phenotypic resistance mechanisms of pathogenic Vibrio species.

      The current study evaluated whether magnesium induces phenotypic resistance in Vibrio species and defined the molecular/genetic basis for this resistance. Genetic approaches, GC-MS analysis of metabolite and membrane remodeling upon antibiotic exposure, membrane physiology, and extensive antimicrobial susceptibility testing were used for the evaluations.”

      (3) The authors introduce the acronym AWST but never use it again in the paper, instead they use SWT. The authors should introduce SWT only for consistency. 

      We appreciate this comment! We have corrected all instances of “SWT” to “ASWT”.

      (4) Line 76 is not clear: what is meant by "some of which could influence drug efficacy" - the enzymes that utilize light metal ions are co-factors? Or the metals directly?  

      We appreciate this comment! We intended to convey that light metal ions can serve as cofactors for enzymes that catalyze biochemical reactions, and that such reactions could alter drug efficacy; for example, Fe-S clusters are metallocofactors of proteins that regulate redox chemistry, including antibiotic-induced redox changes. However, this information is not appropriate for this manuscript, so we have deleted the sentence.

      (5) Line 90: add a reference corroborating that this chemical composition is a mimic of marine water. The NaCl concentration used in particular looks quite low. 

      We appreciate this comment! This was a typographical error. The NaCl concentration was 210 mM, as shown in Suppl. Table 1. We also provide details of the chemical composition of the marine water as follows:

      “Marine environments and agriculture, where antibiotics are commonly used, are rich in magnesium. To investigate whether this mineral impacts antibiotic activity, the minimal inhibitory concentration (MIC) of V. alginolyticus ATCC33787 and V. parahaemolyticus VP01, which we referred as ATCC33787 and VP01 afterwards, isolated from marine aquaculture, to balofloxacin (BLFX) in Luria-Bertani medium (LB medium) plus 3% NaCl as LBS medium and “artificial seawater” (ASWT) medium that included the major ion species in marine water (Wilson, 1975) (LB medium plus 210 mM NaCl, 35 mM Mg2SO4, 7 mM KCl, and 7 mM CaCl2) were assessed”

      (6) Line 98 and Figure 1B. M9 is indicated in the text but does not appear in the figure, the figure only shows SWT. This should be checked. Line 99: based on Figure 1C, the authors are adding MgCl2 to SWT, SWT should be mentioned in this line. Line 100: I believe this is referring to Figure 1C, which should be checked. 

      We appreciate this comment! 

      Line 98, which is now Line 118: We have corrected M9 to ASWT as follows:

      “However, the MIC for BLFX was higher in ASWT medium supplemented with Mg2SO4 or MgCl2 than in LB medium (Fig 1B).”

      Line 99, which is now Line 133: the sentence has been corrected as follows:

      “The MIC for BLFX increased at higher concentrations of MgCl2 in ASWT”

      Line 100, which is now Line 135: we have corrected Fig 1B to Fig. 1C.

      (7) Line 101: text and Figure 1D are not consistent, as Figure 1D does not show this level of precision in added MgCl2 as indicated in the text (15.6 - 62.4 mM).  

      We appreciate this comment! The sentence has been corrected as follows: “At balofloxacin doses of 1.56, 3.125, 6.25, and 12.5 µg, the zone of inhibition decreased with increasing MgCl2 (Fig 1D).”

      (8) MgCl2 clearly induces increasing levels of BLFX resistance, and to high levels, but not for every antibiotic. For example, the level of increased resistance to β-lactams is low (ceftriaxone) and plateaus (ceftazidime). As well, resistance to gentamicin plateaus at a lower level than the other aminoglycosides. These observations do not take away from the conclusion that Mg induces multi-drug resistance, but since the behaviour of the MICs for these drugs is different than the other drugs, they should be mentioned. Also, Figure 1F - tetracyclines (plural) is used for vertical axis label - does this refer to the tetracycline itself or the class itself, and if the class, which one was tested?

      We appreciate this comment! We revised the description as follows: “Notably, magnesium had a smaller effect on ceftriaxone and gentamicin than on other antibiotics.”

      The “tetracyclines” axis label has been changed to “Oxytetracycline” in the revised manuscript.

      - The magnesium chelation experiments presented in Figure 2 are not clear. The authors should briefly mention how this was done around line 128, and what data underlies the values in Figure 2C. Figure 2B is also not clear to me at all. Similarly, how the authors measured intracellular balofloxacin and Mg2+ is not clear and should be mentioned briefly around lines 130-132. 

      We appreciate this comment! These passages have been rewritten as follows: “To investigate whether magnesium binds to balofloxacin, balofloxacin was pre-incubated with magnesium, and zone of inhibition (ZOI) analysis was conducted. Six different concentrations of balofloxacin were separately incubated with six different concentrations of MgCl2, and then spotted on filter paper so that a defined amount of balofloxacin could be used for ZOI. While lower concentrations of MgCl2 (0.78, 3.125, or 12.5 mM) did not alter the ZOI, higher concentrations, including 50 and 200 mM MgCl2, decreased the ZOI (Suppl. Fig 2A), suggesting that even high doses of magnesium had only a partial effect on balofloxacin through direct binding. For example, at 200 mM MgCl2 and 5 or 10 μg/mL balofloxacin, the balofloxacin ZOI was 53.2 and 70.3% of the ZOI at 0 mM MgCl2, suggesting that  50% of the antibiotics were still functional. Intracellular BLFX also decreased with increasing MgCl2 (Suppl. Fig 2B), while exogenous Mg2+ increased intracellular Mg2+ levels in a dose-dependent manner. For example, exogenous 50 and 200 mM MgCl2 increased intracellular Mg2+ levels to 1.21 and 1.31 mM, respectively (Suppl. Fig 2C). The relationship between TolC, an efflux pump that transports quinolones from bacterial cells, and Mg2+ was also assessed (Kobylka et al., 2020; Song et al., 2020). The expression of TolC/tolC was unaffected by Mg2+ (Suppl. Fig 2D). Magnesium is critical for LPS stability. LPS levels increased at 200 mM Mg2+ (Suppl. Fig 2E), however, the loss of waaF, lpxA, and lpxC, three key genes involved in LPS biosynthesis, did not influence balofloxacin sensitivity/resistance in the presence of Mg2+ (Suppl. Fig 2F). These findings suggest that magnesium-induced LPS biosynthesis does not contribute directly to BLFX resistance and demonstrate that Mg2+ influx is involved in balofloxacin resistance.”

      - Line 135: LPS cannot be "expressed", as the authors word it here. This should be corrected. Also, the inspection of Figure 2G actually shows the levels of LPS increase with increased Mg2+. The authors should re-evaluate these results and change their description around this area of the Results. 

      We appreciate this comment! We have moved the whole of Figure 2 to the Supplementary Text and Supplementary Figure 2. We rewrote this part as follows: “The relationship between TolC, an efflux pump that transports quinolones from bacterial cells, and Mg2+ was also assessed (Kobylka et al., 2020; Song et al., 2020). The expression of TolC/tolC was unaffected by Mg2+ (Suppl. Fig 2D). Magnesium is critical for LPS stability. LPS levels increased at 200 mM Mg2+ (Suppl. Fig 2E), however, the loss of waaF, lpxA, and lpxC, three key genes involved in LPS biosynthesis, did not influence balofloxacin sensitivity/resistance in the presence of Mg2+ (Suppl. Fig 2F). These findings suggest that magnesium-induced LPS biosynthesis does not contribute directly to BLFX resistance and demonstrate that Mg2+ influx is involved in balofloxacin resistance.”

      - Section: MgCl2 affects bacterial metabolism. Authors switched to M9 medium - why? This contrasts with other sections using SWT and should be explained. Also, I cannot evaluate whether the statistical analysis of the data here was performed correctly and was appropriate for this type of experiment. I advise the authors to move the details in lines 166-169 to the Materials and Methods and replace this section instead with a more accessible description of the statistical analysis that a non-expert would be able to appreciate. Furthermore, analysis of Figure 3A indicates that the levels of asparagine, 4-hydroxybutyric acid, uracil, cystathionine, fumaric acid, and aminoethanol have significantly changed at high MgCl2, but these are not mentioned in the text. I suggest the authors mention these if they are relevant to the 12 enriched pathways, especially the biosynthesis of fatty acids. 

      We appreciate this comment! 

      We added the following phrase to explain why M9 medium was used:

      “To better understand how magnesium affects bacterial metabolism”

      The information in lines 166-169 has been moved to the Materials and Methods.

      We have carefully examined the abundance of the metabolites and the enriched pathways. Among the listed metabolites, only fumarate is within the enriched pathways. We mention this point in our revised manuscript as follows:

      “The increase in fatty acid biosynthesis could be partially explained by an imbalanced pyruvate cycle/TCA cycle, in which fumarate levels increased at higher Mg2+ while succinate levels increased at lower Mg2+ (Suppl. Fig 5B). These findings indicated that glycolysis fluxes into fatty acid biosynthesis rather than the pyruvate cycle/TCA cycle. The relevance of fatty acids and BLFX was demonstrated by the observation that exogenous palmitic acid increased bacterial resistance to balofloxacin (Fig 2F). These results suggest that fatty acid metabolism may be critical to magnesium-based phenotypic resistance.”

      - Line 211 appears to refer to Figure 4F and should be checked. Similarly in line 216 - appears this should be Figure 4H, and line 218 should be Figure 4H. Line 226: add a reference to Fig 4I (after arcA was decreased). Line 227: what are genes N646_1004 and N646_1885? Based on Fig 4J these are crp - authors should add to line 227. Line 228 appears to refer to Figure 4J, not Figure 4I. Line 229 - should be Figure 4K, not Figure 4I. Line 231 - should be 4L, not 4K. Line 239 - should be 4M.

      We appreciate this comment! The text and figure references now match.

      - Line 312: the descriptions of "11 lipids, 32 lipids, and 53", and then "26 lipids, 52 lipids, and 107 lipids" are not clear at all and should be corrected. 

      We appreciate this comment! The sentence is revised as follows:

      “The abundance of 11, 32, and 53 lipids was increased in 3.125, 50, and 200 mM MgCl2-treated bacteria, respectively, while the abundance of 26, 52, and 107 lipids was decreased in 3.125, 50, and 200 mM MgCl2-treated bacteria, respectively (Suppl. Fig 7C)”

      - Line 340. What is the assay the authors are using to measure the levels of the PGS and PSS enzymes? This is not mentioned or clear in this part of the Results.  

      We appreciate this comment! We provide the information in the manuscript as follows:

      “Levels of PGS and PSS were quantified by ELISA kits according to the manufacturer’s instructions (Shanghai Fusheng Industrial Co., Ltd., China)”

      - Line 372: What is the assay for measuring membrane depolarization? This is not mentioned and I suggest it should be. Line 374: Figure 7B does not show time dependence, only dose dependence, this should be corrected, it is assumed the authors are referring to Fig 7C for the time dependence data. 

      We appreciate this comment! We provide the information in the Results as follows:

      “The voltage-sensitive dye, DiBAC4(3) showed that 12.5–200 mM MgCl2 promoted membrane depolarization in a dose-dependent manner (Fig 6A)”

      We also explain how DiBAC4(3) can be used to measure membrane depolarization in the Materials and Methods section as follows:

      “DiBAC4(3) is a voltage-sensitive probe that penetrates depolarized cells, binding intracellular proteins or membranes and exhibiting enhanced fluorescence and a red spectral shift.”

      To make the specific figure clear, we revised the sentence as follows:

      “Meanwhile, MgCl2 had a dose-dependent (Fig 6B) and time-dependent (Fig 6C) effect on proton motive force (PMF).”

      - Line 384: mention how FM5-95 measures membrane permeability. The authors should also clarify how this reagent is used to measure membrane fluidity, and it is not clear if the data for this is presented in Figure 7 - please clarify. Regarding SYTO9 dye experiment: the authors should briefly explain the experimental design - how SYTO9 dye operates and why FACS was chosen. What is labeled with FITC?  

      We appreciate this comment! We clarify the reason we use FM5-95 in the Materials and Methods section as follows:

      “Measurement of fluidity by fluorescence microscopy

      Measurement of membrane fluidity was performed as previously described (Wen et al., 2022). Briefly, ATCC33787 was cultured in medium with the indicated concentrations of MgCl2, collected, and then adjusted to OD 0.6. A 100 μL aliquot of bacterial cells from each sample was diluted to 1 mL, and 10 μL (10 mg/mL) FM5-95 (Thermo Fisher Scientific, USA) was added. FM5-95 is a lipophilic styryl dye that inserts into the outer leaflet of the bacterial membrane and becomes fluorescent. This dye preferentially binds to microdomains with high membrane fluidity (Wen et al., 2022). After incubation for 20 min at 30 ℃ with shaking in the dark, the sample was centrifuged for 10 min at 12,000 rpm. The pellets were resuspended in 20 μL of 3% NaCl. A 2 μL aliquot of the sample was dropped onto an agarose slide and imaged under an inverted fluorescence microscope.”

      This data is presented as micrographs in Fig. 6D, which shows the decreased FM5-95 staining with increasing concentrations of MgCl2. We make this description clear with the following revision:

      “FM5-95 staining decreased with increasing concentrations of Mg2+, and no staining was observed in the presence of 200 mM Mg2+ (Fig 6D).”

      We explain the reason for using SYTO9 as follows:

      “SYTO9, a green fluorescent dye that binds to nucleic acid, enters and stains bacterial cells when there is an increase in membrane permeability (Lehtinen et al., 2004; McGoverin et al., 2020). Staining decreased with increasing MgCl2, indicating that bacterial membrane permeability declined in an Mg2+ dose-dependent manner (Fig 6E).”

      We did not use FACS for cell sorting in this study; we only analyzed the fluorescence distribution with the flow cytometer. To make this clear, we revised the sentence as follows:

      “After incubation for 15 min at 30 ℃ with shaking in the dark, the mixtures were filtered and measured by flow cytometry (BD FACSCalibur, USA).”

      - Lines 391-397. The statement that palmitic acid shifts the peaks in Figure 7F is not supported by the data. There is essentially no change in the major peak position within each MgCl2 concentration set with increasing palmitic acid. For the linolenic acid data, it is clear that linolenic acid increases permeability only at 50 mM MgCl2; this should be mentioned in the text.

      We appreciate this comment! We revised the sentence as follows:

      “Exogenous palmitic acid also shifted the fluorescence signal peaks to the left in an MgCl2-dependent manner while palmitic acid only slightly shifted the peaks (Fig 6F). In contrast, exogenous linolenic acid shifted the peak to the right in a dose-dependent manner at 50 mM MgCl2 (Fig 6G).” 

      - Line 404-405 - as mentioned earlier, the assay for the uptake of BLFX should be mentioned (if it is done so earlier in the text, then it does not need to be here).

      We appreciate this comment! It has been mentioned in the introduction.  

      - Discussion: CpxA/R-OmprF pathway is mentioned here for the first time. Is this one of the pathways modified by MgCl2 as determined during the course of the study? If so, this should be reworded to mention that. If not, the relevance of this particular pathway as it relates to light metals and phenotypic resistance should be discussed.

      We appreciate this comment! Since it is not relevant to the discussion of Mg2+ and fatty acid biosynthesis, we deleted this sentence in the revised manuscript.

      -The following grammatical errors should be corrected:

      -line 55 change to: "genetic mutations; instead, this type of resistance is transient, and bacteria resume normal growth"

      -line 57: change to "resistance types are biofilm" 

      -line 61: change to "states that significantly" 

      -line 63: change to "resistance share the common feature in they retard or even cease in the presence" 

      -line 65: change to "resistance that allow bacteria to proliferate" 

      -line 81: change "But whether" to "Whether" 

      -line 178: change to "may be critical to the Mg-based phenotypic resistance"

      -line 86: change to "Marine environments and agriculture are rich in magnesium, where..." 

      -line 93: change in to vs

      -line 154: insert space after metabolism 

      -line 158: change "identified" to "focused on the levels of"

      -line 160: change "The levels of forty-one metabolites" 

      -line 198: change shared to share 

      -line 310: increased is duplicated, delete one 

      -line 451: add "the" before ratio 

      -line 453: gram should be capitalized 

      -line 462: "the regulation" should be reworded to "More importantly, the effect of exogenous MgCl targets the..." 

      -line 469: add dash between Mg2+ and limited

      -line 478: change "the crucial" to "a crucial" 

      -there are numerous locations in the manuscript where the word "magnetism" is used when clearly the word is supposed to be magnesium - this should be corrected

      We appreciate this comment! These have been corrected or revised. 

      Editors comments:

      Page 2 line 27; Page 25 line number 426; page 27 line number 481: In the abstract and discussion, only Vibrio alginolyticus was mentioned, even though two Vibrio species were used in the study. It would be helpful to understand the rationale behind the focus on this particular species.

      We appreciate this comment! We have revised the introduction to provide additional information as follows:

      “Vibrios inhabit seawater, estuaries, bays, and coastal waters, regions full of metal ions such as magnesium (Kumarage et al., 2022). Magnesium is the second most dissolved element in seawater after sodium. In seawater at a salinity of 3.5%, the magnesium concentration is about 54 mM (Potis, 1968), and in deep seawater, it can be as high as 2,500 mM (Wang et al., 2024). Vibrio parahaemolyticus and V. alginolyticus are two representative Vibrio pathogens that infect humans and aquatic animals, resulting in illness and economic loss, respectively (Grimes, 2020). (Fluoro)quinolones such as balofloxacin are used to treat Vibrio infection; however, resistance has emerged due to overuse (Suyamud et al., 2024). Indeed, (fluoro)quinolones are one of China's two primary residual chemicals associated with aquaculture (Liu et al., 2017). Vibrio can develop quinolone resistance through mutations in the DNA gyrase gene or through plasmid-mediated mechanisms (Dutta et al., 2021). Thus, the use of V. parahaemolyticus and V. alginolyticus as bacterial representatives, and balofloxacin as a quinolone-based antibacterial representative, can help to define novel magnesium-dependent phenotypic resistance mechanisms of pathogenic Vibrio species.”

      On Page 2, line 34: The abstract contains some undefined abbreviations, such as 'PE' and 'PG', which should be explained. 

      We appreciate this comment! We define PE and PG in the revised abstract as follows:

      “phosphatidylethanolamine (PE) biosynthesis is reduced and phosphatidylglycerol (PG)”

      On Page 2, line 31-32: For the statement "Exogenous supplementation of fatty acids confirm the role of fatty acids in antibiotic resistance…" it would be beneficial to specify whether the fatty acids were saturated or unsaturated. 

      We appreciate this comment! We revised the sentence as follows:

      “Exogenous supplementation of unsaturated and saturated fatty acids increased and decreased bacterial susceptibility to antibiotics, respectively, confirming the role of fatty acids in antibiotic resistance.”

      The potential effects of the specific ions (SO4 and Cl2) present in the MgSO4 and MgCl2 compounds used in the study were not discussed. It would be useful to understand if these ions had any influence on the observed outcomes.

      We appreciate this comment! We revised the sentence as follows:

      “However, the MIC for BLFX was higher in ASWT medium supplemented with MgSO4 or MgCl2 than in LB medium (Fig 1B). MgSO4 and MgCl2 did not differ in their effect on the MIC, suggesting that Mg2+, rather than the other ions, contributes to the MIC change.”

      On Page 8, line 141: The heading of Figure 2, "Mg2+ elevates intracellular Mg2+," seems redundant and could be revised for clarity or modified. 

      We appreciate this comment! Figure 2 has been moved to the supplementary figures as Suppl. Fig 2. The title is revised as follows:

      “Figure 2. Mg2+ decreases balofloxacin uptake.”

      On Page 4, line 91: some terms/abbreviations, such as 'LB' and 'M9,' require expansion or definition to ensure the reader's understanding.

      We appreciate this comment! We include the expansions for LB and M9 in the revised manuscript as follows:

      “Luria-Bertani medium (LB medium)” and “M9 minimal medium (M9 medium)”

      Page 4, line 92: The real seawater composition used in the experiments should be supported by a reference.

      We appreciate this comment! We provide the reference in the revised manuscript as follows:

      ““artificial seawater” (ASWT) medium that included the major ion species in marine water (Wilson, 1975) (LB medium plus 210 mM NaCl, 35 mM MgSO4, 7 mM KCl, and 7 mM CaCl2)”

      Page 4, line number 93: the full names of the bacterial strains (e.g., ATCC33787 and VP01) should be provided instead of just the strain numbers.

      We appreciate this comment! We revised the sentence as follows:

      “To investigate whether this mineral impacts antibiotic activity, the minimal inhibitory concentration (MIC) of V. alginolyticus ATCC33787 and V. parahaemolyticus VP01, which we refer to as ATCC33787 and VP01 hereafter,”

      Finally, there appears to be a potential contradiction between the statements on page 12, lines 211-212 and 214-216, regarding the effects of Mg2+ on the synthesis of unsaturated fatty acids. Further explanation may be needed to reconcile these seemingly contradictory points.

      We appreciate this comment! Lines 221-226 (previously lines 211-212) concern the expression of genes for fatty acid biosynthesis, while lines 228 and 233 (previously lines 214-216) concern the expression of genes for fatty acid degradation. We agree that the previous description was somewhat confusing. We revised the sentence to emphasize that it focuses on fatty acid degradation so that readers can tell the two apart.

      In the text, we revised it as follows:

      “In addition, we also quantified gene expression during fatty acid degradation to determine whether Mg2+ affects this process”

      In the figure legend, we also indicate that:

      “H. qRT-PCR for the expression of genes encoding fatty acid degradation in the absence or presence of the indicated concentrations of MgCl2”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, Homan et al. used mouse models of Metabolic Dysfunction-Associated Steatotic Liver Disease and different cell-specific targeted deletions to rule out a role for Complement 3a Receptor 1 in the pathogenesis of the disease. They provided limited evidence and only descriptive results showing that, although C3aR is relevant in different contexts of inflammation, this did not hold true here.

      Weaknesses:

      (1) The results are based on readouts showing that C3aR is not involved in the pathogenesis of liver metabolic disease.

      (2) The description of the mouse models they used to validate their findings is not clear. Lysm-cre mice - which are claimed to delete C3aR in (?) macrophages are not specific for these cells, and the genetic strategy to delete C3aR in Kupffer cells is not clear.

      (3) Taking this into account, it is very challenging to determine the validity of these data, also considering that they are merely descriptive and correlative.

      We generated 2 different cohorts of mice using LysM-Cre (Jackson Strain #004781) to drive deletion in all macrophages and Clec4f-Cre (Jackson Strain #033296) to specifically ablate C3ar1 in Kupffer cells. These experimental models have been clearly defined in the revised manuscript on pages 5 and 7 and in the methods section (page 10). The reviewer’s point is well taken that the LysM-Cre transgene can also be active in granulocytes and some dendritic cells. Even so, despite deletion of C3ar1 in macrophages and granulocytes, we do not see a major effect on hepatic steatosis and fibrosis in this GAN diet-induced model of MASLD/MASH. This was a somewhat surprising finding. We do not agree that our findings are correlative. We specifically ablated C3aR1 in macrophages or Kupffer cells and found no significant differences in the major readouts of steatosis and fibrosis for MASLD/MASH between control and knockout mice. It is possible that in other models of liver injury that we did not test (e.g., short-term treatment with a hepatotoxin such as carbon tetrachloride), there may be differences in liver injury in mice lacking C3ar1 in macrophages, but the GAN diet model has been shown to better parallel the gene expression changes in human MAFLD/MASH. This has been added to the discussion (page 9).

      Reviewer #2 (Public review):

      Summary:

      Homan et al. examined the effect of macrophage- or Kupffer cell-specific C3aR1 KO on MASLD/MASH-related metabolic or liver phenotypes.

      Strengths:

      Established macrophage- or Kupffer cell-specific C3aR1 KO mice.

      Weaknesses:

      Lack of in-depth study; flaws in comparisons between KC-specific C3aR1KO and WT in the context of MASLD/MASH, because MASLD/MASH WT mice likely have a low abundance of C3aR1 on KCs.

      Homan et al. reported a set of observation data from macrophage or Kupffer cell-specific C3aR1KO mice. Several questions and concerns as follows could challenge the conclusions of this study:

      (1) As C3aR1 is robustly repressed in MASLD or MASH liver, GAN feeding likely reduced C3aR1 abundance in the liver of WT mice. Thus, it is not surprising that there were no significant differences in liver phenotypes between WT vs. C3aR1KO mice after prolonged GAN diet feeding. It would give more significance to the study if restoring C3aR1 abundance in KCs in the context of MASLD/MASH.

      GAN diet feeding resulted in higher liver C3ar1 compared to regular diet (Figure 1H). This thus became an impetus for studying the effects of C3ar1 deletion in macrophages or Kupffer cells, which are responsible for the majority of liver C3ar1 expression, in MASLD/MASH (Figures 2B and 3H). This point has been added to the text on page 5.

      (2) Would C3aR1KO mice develop liver abnormalities after a short period of GAN diet feeding?

      We did not assess whether short-term GAN diet feeding resulted in significant differences in liver abnormalities in the C3ar1 macrophage or Kupffer cell knockout mice. Perhaps the reviewer’s point is that with shorter periods of GAN diet feeding there may be a phenotype in the KO mice. We agree that this is entirely possible, though with shorter feeding timeframes what is typically seen is hepatic steatosis without fibrosis. Nevertheless, in our opinion the most important element of a disease-preventing or disease-modifying model lies with the longer-term GAN diet feeding. With long-term GAN diet feeding, which has previously been shown to model human MASLD/MASH, we did not observe significant differences in liver abnormalities in the KO mice. This has been added to the discussion (page 8).

      (3) What would be the liver macrophage phenotypes in WT vs C3aR1KO mice after GAN feeding?

      Similar to the above point, given the lack of a major MASLD/MASH phenotype in hepatic steatosis and fibrosis, we did not further characterize the liver macrophage profiles of the macrophage or Kupffer cell C3ar1 KO mice with GAN feeding.

      (4) In Fig 1D, >25wks GAN feeding had minimal effects on female body weight gain. Do these GAN-fed female mice also develop MASLD/MASH liver abnormalities?

      We thank the reviewer for this question. In general, female GAN-fed mice develop milder MASLD/MASH abnormalities. We have included additional data in the revised manuscript in Figure S4. These results show no to minimal development of a MASLD/MASH gene signature.

      (5) Would C3aR1KO result in differences in liver phenotypes, including macrophage population/activation, liver inflammation, lipogenesis, in lean mice?

      We have provided additional data further characterizing liver inflammation, lipogenesis and macrophages in macrophage C3ar1 KO mice under lean/regular diet conditions in Figure 2K. These results show a potential trend but no substantial development of a MASLD/MASH gene signature.

      (6) The authors should provide more information regarding the generation of KC-specific C3aR1KO. Which Cre mice were used to breed with C3aR1 flox mice?

      Clec4f-Cre transgenic mice were used to generate Kupffer cell specific KO of C3ar1. This has been clarified and explicitly stated in the revised manuscript on page 7 and in the methods section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      These data should be repeated using a more established model of Kupffer cell target deletion via Clec4-F mice.

      Our Kupffer cell C3ar1 deletion experiments were indeed performed with Clec4f-Cre transgenic mice. This has been clarified in the revised manuscript on page 7 and in the methods section.

      Reviewer #2 (Recommendations for the authors):

      (1) Typo: "iver" in the abstract

      (2) Line 97, "GAN diet I" should be "GAN diet"?

      These points have been corrected in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      Recent years have seen spectacular and controversial claims that loss of function of the RNA splicing factor Ptbp1 can efficiently reprogram astrocytes into functional neurons that can rescue motor defects seen in 6-hydroxydopamine (6-OHDA)-induced mouse models of Parkinson's disease (PD). This latest study is one of a series that fails to reproduce these observations, but remarkably also reports that neuronal-specific loss of function of Ptbp1 both induces expression of dopaminergic neuronal markers in striatal neurons and rescues motor defects seen in 6-OHDA-treated mice. The claims, if replicated, are remarkable and identify a straightforward and potentially translationally relevant mechanism for treating motor defects seen in PD models. However, while the reported behavioral effects are strong and were collected without sample exclusion, other claims made here are less convincing. In particular, no evidence that Ptbp1 loss of function actually occurs in striatal neurons is provided, and the immunostaining data used to claim that dopaminergic markers are induced in striatal neurons is not convincing. Furthermore, no characterization of the molecular identity of Ptbp1-deficient striatal neurons is provided using single-cell RNA-Seq or spatial transcriptomics, making it difficult to conclude that these cells are indeed adopting a dopaminergic phenotype. 

      Overall, while the claims of behavioral rescue of 6-OHDA-treated mice appear compelling, it is essential that these be independently replicated as soon as possible before further studies on this topic are carried out. Insights into the molecular mechanisms by which neuronal-specific loss of function of Ptbp1 induces behavioral rescue are lacking, however. Moreover, the claims of induction of neuronal identity in striatal neurons by Ptbp1 require considerable additional work to be convincing.

      We thank the reviewer for the detailed analysis of our study. Please find our answers to the points raised by the reviewer below in blue.

      Strengths of the study: 

      (1) The effect size of the behavioral rescue in the stepping and cylinder tests is strong and significant, essentially restoring 6-OHDA-lesioned mice to control levels.

      (2) Since the neurotoxic effects of 6-OHDA treatment are highly variable, the fact that all behavioral data was collected blinded and that no samples were excluded from analysis increases confidence in the accuracy of the results reported here. 

      We appreciate the reviewer’s feedback and acknowledgement of the strengths of our study. We undertook several optimization steps in the surgery, post-operative care, and handling of the animals for behavior experiments to ensure high reproducibility of our experiments.

      Weaknesses of the study:  

      (1) Neurons express relatively little Ptbp1. Indeed, cellular expression levels as measured by scRNA-Seq are substantially below those of astrocytes and other non-neuronal cell types, and Ptbp1 immunoreactivity has not been observed in either striatal or midbrain neurons (e.g. Hoang, et al. Nature 2023). This raises the question of whether any recovery of Th expression is indeed mediated by the loss of function of Ptbp1 rather than by off-target effects. AAV-mediated rescue of Ptbp1 expression could help clarify this.

      In the original manuscript, we delivered control vectors that only express the ABE to 6-OHDA-lesioned mice (labeled as AAV-ctrl) and did not detect TH positive cells in the midbrain or striatum of control mice or rescue of spontaneous motor skills. We can therefore exclude that the delivery procedure, AAV-PHP.eB capsid, or ABE expression caused adverse effects leading to induction of TH expression and functional rescue of spontaneous motor behaviors in PD mice. To further exclude that these effects were caused by off-target editing, we experimentally determined off-target binding sites of our sgRNA (sgRNA-ex3) using GUIDE-seq and subsequently analyzed these sites in treated animals by NGS (Figure 3 – supplement 3). While two off-target sites were identified, it is unlikely that base editing at these sites caused the observed phenotypes. One off-target site was identified in the myopalladin (Mypn) gene, which encodes a muscle-specific protein that plays a role in regulating the structure and growth of skeletal and cardiac muscle (Filomena et al., 2021, 2020). The other site is not located in a coding region, but in an intron of the ankyrin-1 (Ank1) gene, which encodes an adaptor protein linking membrane proteins to the underlying cytoskeleton (Cunha and Mohler, 2009). Even though this gene is also expressed in neurons, base editing within this intronic region did not lead to changes in transcript levels (Figure 3 – supplement 3). Thus, the induction of TH expression upon adenine base editing with sgRNA-ex3 is likely a direct consequence of PTBP1 downregulation.

      Further supporting this conclusion, in the revised manuscript we additionally show PTBP1 downregulation at the RNA and protein level in the SNc and striatum after base editor treatment (Figure 2 – figure supplement 5; figure 3 – supplement 2).

      (2) It is not clear why dopaminergic neurons, which are not normally found in the striatum, are observed following Ptbp1 knockout. This is very similar to the now-debunked claims made in Zhou, et al. Cell 2020, but here performed using the hSyn rather than GFAP mini promoter to control AAV expression. While this is the most dramatic and potentially translationally relevant claim of the study, this claim is extremely surprising and lacks any clear mechanistic explanation for why it might happen in the first place.  

      We agree with the reviewer that our study does not provide mechanistic insights into how Ptbp1 downregulation in neurons leads to the induction of dopaminergic markers in the striatum. As we believe that this is not within the scope of a revision, we discuss potential follow-up experiments in the discussion section of the revised manuscript.

      This observation is even more surprising in light of reports that antisense oligonucleotide-mediated knockdown of Ptbp1, which should have affected both neuronal and glial Ptbp1 expression, failed to induce expression of dopaminergic neuronal markers in the striatum (Chen, et al. eLife 2022). Selective loss of function of Ptbp1 in striatal and midbrain astrocytes likewise results in only modest changes in gene expression.

      Using 6-OHDA-lesioned Aldh1l1-CreERT2;Rpl22lsl-HA mice, the Chen et al. study (eLife 2022) assessed potential astrocyte-to-neuron conversion by quantifying the presence of HA-labeled neurons after ASO-mediated knockdown of Ptbp1. Even though they did not detect HA-positive neurons in the SNc, suggesting absence of astrocyte-to-neuron conversion, the images in Figure 4D reveal TH positive cells in the lesioned hemisphere, similar to our observations in Figure 2B-D. While it cannot be excluded that these TH positive cells are remnants from an incomplete 6-OHDA lesion, they could also be endogenous neurons with induced expression of dopaminergic markers after ASO-mediated knockdown of Ptbp1. Furthermore, Chen et al. performed the apomorphine test to assess changes in motor skills, which did not reveal an improvement in our study either.

      It is critically important that this claim be independently replicated, and that additional data be provided to conclusively show that striatal neurons are indeed expressing dopaminergic markers.

      Our behavior and immunofluorescence experiments involving mice injected into the striatum were performed with two independently generated cohorts of 6-OHDA mice. Specifically, the 6-OHDA mice were generated by two independent surgeons from different labs (>6 months between experiments of these cohorts), leading to comparable behavioral outcomes before and after treatment. Subsequent behavior and immunofluorescence experiments with each cohort were performed and analyzed by two independent and blinded researchers, showing comparable results.

      (3) More generally, since multiple spectacular and irreproducible claims of single-step glial-to-neuron reprogramming have appeared in high-profile journals in recent years, a consensus has emerged that it is essential to comprehensively characterize the identity of "transformed" cells using either single-cell RNA-Seq or spatial transcriptomics (e.g. Qian, et al. FEBS J 2021; Wang and Zhang, Dev Neurobiol 2022). These concerns apply equally to claims of neuronal subtype conversion such as those advanced here, and it is essential to provide these same datasets.

      In the revised version, we have analyzed the expression of additional neuronal markers in TH positive cells of the striatum using 4i imaging. Briefly, our results showed that the vast majority of TH-expressing cells also expressed the markers DAT and NEUN, further corroborating the neuronal and dopaminergic identity of these cells. Additional analysis revealed that this TH/DAT/NEUN-expressing cell population expressed markers of GABAergic neurons, either of medium spiny neurons (~50%) or of various types of interneurons (~50%). While our 4i analysis has allowed us to broadly classify these TH-expressing populations, we agree that detailed transcriptional analysis at the single cell level is required to understand the molecular mechanisms underlying the generation of TH positive cells. These analyses are, however, not within the scope of a revision and would require a thorough dedicated study. We have added these results and discussion points to the revised manuscript.

      (4) Low-power images are generally lacking for immunohistochemical data shown in Figures 3 and 4, which makes interpretation difficult. DAPI images in Figure 3C do not appear nuclear. Immunostaining for Th, DAT, and Dcx in Figure 4 shows a high background and is difficult to interpret. 

      We thank the reviewer for closely evaluating these images and suggestions for improvement. In the revised manuscript, we provide low power images and higher magnification insets as requested to allow for easier interpretation.

      (5) Insights into the mechanism by which neuronal-specific loss of Ptbp1 function induces either functional recovery, or dopaminergic markers in striatal neurons, is lacking.

      In the revised manuscript, we provide a more detailed discussion of mechanisms that could potentially be involved in the functional recovery or expression of dopaminergic markers. However, deciphering the exact molecular mechanisms underlying these observations requires thorough transcriptional analysis at the single cell level, which is beyond the scope of this revision.

      Reviewer #2 (Public Review):

      Summary: 

      The manuscript by Bock and colleagues describes the generation of an AAV-delivered adenine base editing strategy to knockdown PTBP1 and the behavioral and neurorestorative effects of specifically knocking down striatal or nigral PTBP1 in astrocytes or neurons in a mouse model of Parkinson's disease. The authors found that knocking down PTBP1 in neurons, but not astrocytes, and in striatum, but not nigra, results in the phenotypic reorganization of neurons to TH+ cells sufficient to rescue motor phenotypes, though insufficient to normalize responses to dopaminomimetic drugs.

      Strengths: 

      The manuscript is generally well-written and adds to the growing literature challenging previous findings by Qian et al., 2020 and Zhou et al., 2020 indicating that astrocytic downregulation of PTBP1 can induce conversion to dopaminergic neurons in the midbrain and improve parkinsonian symptoms. The base editing approach is interesting and potentially more therapeutically relevant than previous approaches.

      Weaknesses: 

      The manuscript has several weaknesses in approach and interpretation. In terms of approach, the animal model utilized, the 6-OHDA model, though useful to examine dopaminergic cell loss, exhibits accelerated neurodegeneration and none of the typical pathological hallmarks (synucleinopathy, Lewy bodies, etc.) compared to the typical etiology of Parkinson's disease, limiting its translational interpretation. 

      We thank the reviewer for the detailed assessment of our study and pinpointing its current weaknesses. Please find our answers to all comments below in blue.

      We agree with the reviewer that the 6-OHDA model lacks the typical pathological hallmarks of PD. Nevertheless, we chose this model for two reasons:

      i) The 6-OHDA model was used by both Qian et al. (2020) and Zhou et al. (2020). To allow comparison of our results to these studies, it was crucial to use the same model. Notably, the 6-OHDA model was also used by Chen et al. (2022) and Hoang et al. (2023) for comparison to the two studies from 2020.

      ii) The 6-OHDA model is straightforward to generate and displays robust motor impairments for evaluation of potential therapeutic effects of neuroregeneration treatment approaches. We therefore believe that the model is well-suited to analyze the cellular and behavioral effects (specifically motor skills) of PTBP1 downregulation. 

      In future studies, it would be critical to include models that also display typical pathological hallmarks of the disease to further evaluate the therapeutic effect of this base editing approach. These experiments are, however, not within the scope of this study, which was aimed to focus on the cellular and behavioral effects of PTBP1 downregulation. 

      In addition, there is no confirmation of a neuronal or astrocytic knockdown of PTBP1 in vivo; all base editing validation experiments were completed in cell lines. 

      In the revised manuscript, we assessed in vivo base editing efficiencies at the Ptbp1 target site in the SNc (AAV-hsyn, 15.6%) and striatum (AAV-hsyn, 21.1%). Furthermore, we assessed in vivo Ptbp1 downregulation at the RNA and protein level to complement our in vitro data (Figure 2 – figure supplement 5; figure 3 – supplement 2).

      Finally, it is unclear why the base editing approach was used to induce loss-of-function rather than a cell-type specific knockout, if the goal is to assess the effects of PTBP1 loss in specific neurons. 

      We expressed base editors under cell-type-specific promoters to induce a reliable loss-of-function mutation at the Ptbp1 exon-intron junction in neurons or astrocytes. Performing these mutations with Cas9 nucleases instead would have had potential limitations and risks, including i) indel mutations do not always lead to a frameshift and loss-of-function despite high indel formation at the targeted site, ii) nucleases induce DNA double strand breaks, which can have serious side effects (e.g. chromosomal rearrangements or translocations), and iii) mosaicism, as edited cells contain different indel mutations, which may result in different effects and thus complicate analysis of the downstream effects. We discuss these points in the revised manuscript.

      In terms of interpretation, the conclusion by the authors that PTBP1 knockdown has little likelihood to be therapeutically relevant seems overstated, particularly since they did observe a beneficial effect on motor behavior. We know that in PD, patients often display negligible symptoms until 50-70% of dopaminergic input to the striatum is lost, due to compensatory activity of remaining dopaminergic cells. Presumably, a small recovery of dopaminergic neurons would have an outsized effect on motor ability and may improve the efficacy of dopaminergic drugs, particularly levodopa, at lower doses, averting many problematic side effects. Since striatal dopamine was assessed by whole-tissue analysis, which is not necessarily reflective of synaptic dopamine availability, it is difficult to assess whether the ~10% increase in TH+ cells in the striatum was sufficient to improve dopamine function. However, the improvement in motor activity suggests that it was.

      As pointed out by the reviewer, it is difficult to estimate the therapeutic effect and importance of a ~10% increase in TH+ cells for PD patients. Guided by the reviewer’s suggestion, we have included a more in-depth discussion of our results and their potential therapeutic value as well as outstanding questions for future studies in the revised manuscript.

      Reviewer #3 (Public Review):

      This study explores the use of an adenine base editing strategy to knock down PTBP1 in astrocytes and neurons of a Parkinson's disease mouse model, as a potential AAV-BE therapy. The results indicate that editing Ptbp1 in neurons, but not astrocytes, leads to the formation of tyrosine hydroxylase (TH)+ cells, rescuing some motor symptoms.

      Several aspects of the manuscript stand out positively. Firstly, the clarity of the presentation. The authors communicate their ideas and findings in a clear and understandable manner, making it easier for readers to follow. 

      The Materials and methods section is well-elaborated, providing sufficient detail for reproducibility. 

      The logical flow of the manuscript makes sense, with each section building upon the previous one coherently.

      The ABE strategy employed by the authors appears sound, and the manuscript presents a coherent and well-supported argument.

      Positively, some of the data in this study effectively counteracts previous work in line with more recent publications, demonstrating the authors' ability to contribute to the ongoing conversation in the field.

      We thank the reviewer for appreciating the effort we have put into this study. Please find below a point-by-point reply to the weaknesses raised by the reviewer. 

      However, while the in vitro data yields promising results, it may have been overly optimistic to assume that the efficiencies observed in dividing cells will directly translate to in vivo conditions. This consideration is important given the added complexities of vector optimization, different cell types targeted in vitro versus in vivo, as well as unknown intrinsic limitations of the base editing technology. 

      We agree with the reviewer that in vitro base editing efficiencies might not directly translate to in vivo editing outcomes. We therefore assessed in vivo base editing efficiencies at the Ptbp1 locus and PTBP1 downregulation in the striatum and midbrain. Our data revealed that in vivo base editing activity was lower than in our in vitro setting (in vitro: Figure 1; figure 1 – figure supplement 2; in vivo: figure 2 – figure supplement 5; figure 3 – supplement 2). However, we believe that these rates are slightly underestimated since we sequenced DNA isolated from the whole tissue (striatum or SNc) and not from purified astrocytes or neurons. Moreover, we could demonstrate that editing led to a reduction of Ptbp1 transcript and PTBP1 protein level (Figure 2 – figure supplement 5; figure 3 – supplement 2).
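      The underestimation argument above can be made concrete with a short calculation. The sketch below is illustrative only: it assumes that editing occurs exclusively in the promoter-targeted cell type and that every cell contributes equally to the isolated DNA. The bulk rates (15.6% in the SNc, 21.1% in the striatum) are taken from this response, whereas the target-cell fraction of 0.5 is a purely hypothetical placeholder and not a value measured in the study.

```python
def cell_type_editing_rate(bulk_rate: float, target_fraction: float) -> float:
    """Estimate the editing rate within the targeted cell type.

    bulk_rate: fraction of edited alleles measured in whole-tissue DNA (0..1).
    target_fraction: assumed fraction of cells in the tissue that belong to
        the targeted cell type (hypothetical; must be in (0, 1]).

    Assumes edits arise only in the targeted cell type, so that
    bulk_rate = target_fraction * cell_type_rate.
    """
    if not 0.0 <= bulk_rate <= 1.0:
        raise ValueError("bulk_rate must be between 0 and 1")
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    # Cap at 1.0 because a rate above 100% would mean the assumptions are violated.
    return min(bulk_rate / target_fraction, 1.0)


if __name__ == "__main__":
    # Bulk rates reported in the response; 0.5 is a hypothetical neuronal fraction.
    for tissue, bulk in (("SNc", 0.156), ("striatum", 0.211)):
        estimate = cell_type_editing_rate(bulk, target_fraction=0.5)
        print(f"{tissue}: bulk {bulk:.1%} -> estimated within-cell-type {estimate:.1%}")
```

      Under these assumptions the within-cell-type rate scales inversely with the targeted fraction, so the smaller the share of targeted cells in the dissected tissue, the more the bulk measurement understates editing in those cells.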

      In addition, certain aspects of the manuscript would benefit from a more in-depth and comprehensive discussion rather than being only briefly touched upon. Such a discussion would enhance the relevance of the obtained results and provide the foundation for improvement when using similar approaches.

      Following the reviewer’s suggestion, we included a more in-depth discussion of our results in the revised manuscript.

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the Authors):

      A summary of key recommendations that might improve the eLife assessment in a subsequent submission are provided below, as a guide to help the authors focus on changes that might enhance the strength of evidence (e.g., from "incomplete" to "solid").

      (1) Provide further explanation of the mechanistic relationship between the downregulation of Ptbp1 and TH+ dopaminergic neuron reprogramming. Additional discussion of this topic should also be included.

      (2) Demonstrate proof of editing in the intended targeted cells in vitro and/or in vivo.

      (3) Show evidence of successful Base Editor delivery in vivo.

      (4) Perform a deeper characterization of TH+ cells in vivo and provide a more thorough discussion of the identity of the targeted cells. This may include an exploration of whether TH+ cells detected are TH+ interneurons and/or establish their identity based on transcriptomics or a similar approach.

      (5) Provide better-quality representative images supporting the quantitative data.

      (6) Please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      In the revised manuscript, we provided 1) suggestions of the mechanistic relationship between Ptbp1 knockdown, dopamine synthesis, and the functional rescue of spontaneous behaviors, 2) proof of in vivo base editing and successful base editor delivery, 3) deeper characterization of TH-expressing cells in vivo using 4i imaging, 4) better quality images, and 5) full statistical reporting.  

      Individual Reviewer recommendations for the authors are included below.

      Reviewer #1 (Recommendations For The Authors):

      Confirm loss of Ptbp1 function in infected striatal neurons. Single-cell RNA-Seq or spatial transcriptomic analysis must be performed to characterize the identity of the edited striatal neurons. The quality of the immunostaining in Figures 3 and 4 needs to be improved, and low-power images provided. Were eLife a conventional journal, I would have insisted on all these being included prior to publication. Please also arrange for independent replication of the behavioral rescue and induction of dopaminergic marker gene expression in the striatum.

      In the revised manuscript, we confirmed Ptbp1 downregulation at the tissue level in the SNc and striatum by RT-qPCR and western blot and included low-power images for easier interpretation. Additionally, we assessed expression of additional neuronal markers on striatal sections using 4i imaging and found that TH/DAT/NEUN positive populations either expressed markers of medium spiny neurons or interneurons. We have included these results in the revised manuscript.

      Our behavioral and imaging experiments involving mice injected into the striatum were in fact performed with two independently generated cohorts of 6-OHDA mice. In detail, the 6-OHDA mice were generated by two independent surgeons from different labs (>6 months between experiments of these two cohorts), leading to comparable behavioral outcomes before and after treatment. The experiments with each cohort were performed and analyzed by two independent and blinded researchers, yielding comparable results.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the introduction, lines 43-45: This statement is inaccurate. Current treatment strategies do not focus on slowing or halting disease progression. There is currently no accepted therapy that does this. Dopaminergic therapies and deep brain stimulation can compensate for circuitry dysfunction as a result of dopamine cell loss but do not slow the disease. The referenced paper used is older and does not refer to new treatments for PD and is a summary article for a special issue of the Disease Models and Mechanisms journal. Please ensure that all references used are appropriate for the statement they are attached to.

      We thank the reviewer for pointing this out. We have rephrased this statement accordingly and provided an appropriate reference describing current treatment strategies.

      (2) The number of TH+ cells in the intact nigra seems low compared to published data. Suggest a stereological approach may be better than the Abercrombie method.

      Following the reviewer’s suggestion, we re-quantified the number of TH positive cells using a stereological approach (Nv:Vref method). We have included these results in the revised manuscript. 

      (3) Have the authors considered that the striatal TH+ cells could be TH+ striatal interneurons? 

      In the revised manuscript, we performed additional 4i imaging experiments to further analyze the identity of the TH positive cells in the striatum. Briefly, we found that TH/DAT/NEUN positive populations either expressed markers of GABAergic medium spiny neurons or interneurons. We have added these results to the revised manuscript (Figure 4). 

      (4) The Western blot shown in Figure 1 C for C8-D1A has some abnormalities and makes it difficult to judge the bands. Also, for 1B, the legends are difficult to see.

      In the revised manuscript, we have repeated the respective western blot to make interpretation of the bands easier, and adapted the legends in Figure 1B for better visibility.

      (5) Figure 2: Please show representative images for the GFAP-targeted editing.

      Representative images of the GFAP-targeted groups can be found in Figure 2 – figure supplement 3.

      (6) Figure 2, Supplement 3: Please include quantification.

      The quantifications for these images can be found in Figure 2D and 2F. 

      (7) Figure 1, Supplement 2: The gene name in A is misspelled.

      Thank you for pointing this out. In the revised manuscript, we added the correct gene name.

      (8) Line 267-276: As previously indicated, the statement here is overstated based on the data provided. In addition, the citation provided to justify this claim (Kannari et al., 2000) is an odd choice as the dosage of L-DOPA utilized was not therapeutically relevant (50 mg/kg). A better indication of efficacy would be the return to basal, unaffected levels rather than the fold increase in dopamine levels. A better comparison would be Lindgren et al., 2010, who showed that animals treated with a physiologically relevant dose of L-DOPA (6 mg/kg) that did not induce dyskinesia showed a return to basal, non-lesioned dopamine levels in the striatum after L-DOPA, as measured by microdialysis. To really support this claim, the authors would need to use an approach that could measure synaptic dopamine availability, rather than whole-tissue dopamine levels, such as microdialysis, fiber photometry, or an equivalent.

      Following the reviewer’s suggestions, we replaced this reference with Lindgren et al. (2010) and provide a more detailed interpretation of our results and remaining questions for future studies.  

      Reviewer #3 (Recommendations For The Authors):

      Major and minor issues are discussed below by section.

      INTRODUCTION and AIM - Lines 36-73

      - The authors effectively contextualize the aim of their study by providing comprehensive background information on previous research regarding cell 'reprogramming' into dopaminergic neurons in the SNc. However, the introduction lacks contextualization of TH+ cells and PD. For readers who may not be well-versed in the Parkinson's field, understanding the importance of TH (Tyrosine Hydroxylase) may be challenging, since the term "TH+ cells" is mentioned only once by the end of the introduction (line 71), to then become a key element in the entire study.

      - Providing a brief explanation of the role of Tyrosine Hydroxylase in the synthesis of L-DOPA would facilitate the reader's comprehension of why the presence of TH+ cells following Base Editing treatment is relevant.

      - Further elaboration on the relationship between the downregulation of the general RNA binding protein, PTBP1, and the specific dopaminergic-related readout, TH, would improve coherence and strengthen the linkage between the introductory section and the results.

      We thank the reviewer for the constructive suggestions. In the introduction of the revised manuscript, we describe the meaning and importance of TH in the context of dopamine synthesis and PD. Likewise, we briefly outlined the importance of the PTBP1/nPTBP regulatory loops during neuronal differentiation and maturation. 

      RESULTS 

      Result Section 1 - Line 75-109

      - Thorough screening of sgRNAs targeting splice junctions across the Ptbp1 gene in HEPA cells shows the achievement of high levels of editing (80-90%) with sgRNA-ex3 and sgRNA-ex7.

      - The data also indicates that editing translates into significant reductions in ptbp1 expression, along with an increase in the expression of genes repressed by PTBP1.

      - Despite obtaining lower percentages of editing events in N2a neuroblastoma cells and the C8-D1A astroglial cell line, the differential expression levels of ptbp1 and the readout genes remain significant. However, the gRNA screening assay is performed in immortalized, dividing cells. 

      - Providing proof that Adenosine Base Editing of Ptbp1 is successful in non-dividing cells (such as SNc and/or striatal primary neurons) would strengthen the case for the potential therapy in the intended cell type.

      Following the reviewer’s comment, we show in vivo base editing rates in the SNc and striatum of treated PD mice in the revised manuscript (Figure 2 – figure supplement 5; figure 3 – supplement 2).

      - Moreover, assessing the expression levels of tyrosine hydroxylase by qPCR after Ptbp1 base editing in vitro could help contextualize the use of TH+ detection as an in vivo readout and may help explain why the total number of TH+ cells is low after ABE treatment in vivo - as shown in following sections.

      In the revised manuscript, we now provide quantifications of in vivo base editing efficiencies in the SNc (~15%) and striatum (~20%). As expected from these lower in vivo base editing rates, downregulation of Ptbp1 at the transcript and protein level was less pronounced compared to our in vitro experiments. It seems likely that higher base editing efficiency and more pronounced downregulation of Ptbp1 could lead to a larger population of TH expressing cells. We have added these results and interpretations to the revised manuscript.

      - Furthermore, although ABEs are less prone to generating bystander and other nucleotide changes compared to CBEs, it is still possible. Figures 1 (line 811) and 1-supplement 2 (line 842) only show a brief window of the Sanger sequencing trace. Updating these figures to display a wider view of the sequencing trace would enhance transparency. If unwanted edits are detected, while they may not significantly alter the relevance, impact, or structure of the paper, they may become an important aspect of the discussion. 

      Indeed, ABEs can induce bystander edits and we also detected such edits at the Ptbp1 target site. However, since our base editing strategy was designed to yield a loss of Ptbp1 function, bystander editing at the splice site was not a primary focus in our analysis. Nevertheless, we included CRISPResso output images showing the specific editing outcomes in a wider analysis window in the revised manuscript (Figure 3 – figure supplement 2). 

      Result Section 2 - Lines 110-159

      A split intein system is used in vivo with sgRNA-ex3, after updating the promoter to make it cell-specific: hSyn to restrict expression to neurons and GFAP to restrict expression to astrocytes. 

      However, no other assay is performed to assess whether a) the promoter change and/or b) splitting Cas9 may affect the editing efficiency compared to their initial in vitro approach.

      In the revised manuscript, we assessed the performance of the in vivo AAV vectors encoding the split intein ABE with sgRNA-ex3 in vitro in N2a and C8-D1A cells. Our results show that all vectors are functional and result in base editing at the target locus.

      -  Addressing whether this is the case may explain the low number of TH+ cells observed in vivo. 

      - The authors could also consider staining for Cas9 to address whether the low number of TH+ cells could be attributed to poor Cas9 delivery.

      To confirm successful in vivo base editor delivery, we quantified in vivo base editing efficiencies in the SNc and striatum of PD mice. Our analysis revealed in vivo base editing efficiencies at both tissue sites, confirming that base editors were successfully delivered. Editing efficiencies were, however, substantially lower (Figure 2 – figure supplement 5; figure 3 – supplement 2) than in our in vitro cell line setting (Figure 1; figure 1 – figure supplement 2). Even though tissue editing rates likely underestimate the cell type-specific editing rates in astrocytes or neurons, higher base editing rates would have likely resulted in a higher number of TH positive cells. We have added these results and their implications to the revised manuscript.

      -  Moreover, despite the presence of TH, in Figure 2 E,F authors examine the striatal innervation from newly generated TH+ cells in the SNc by Fluorescence Intensity (FI) to conclude that the edited cells do not form projections towards the striatum. Considering the low levels of TH+ positive cells obtained, the accumulation of gross FI might not be the most accurate way to assess the presence or absence of cell projections.

      - Using another marker that stains the projections rather than the cell soma, and that is a marker of dopaminergic neurons, might be a better way to address this.

      To address the reviewer’s comment, we analyzed the presence of potential dopaminergic fibers in the mfb, where projections are more concentrated (around the injection coordinates of 6-OHDA), using the dopaminergic marker DAT. In line with our previous observations in the striatum, we did not detect an increase in DAT fluorescence intensity upon treatment on the lesioned hemisphere (Figure 2 – figure supplement 4).  

      Result Section 3 - Line 160-182

      Minor issue

      - The same dual split intein system is used in the striatum. However, in Figure 3 - Figure Supplement 1 - line 958 and in Figure 3 - Figure Supplement 4 - line 1000, the authors show the injection of 2x the viral genomes indicated throughout the manuscript. In previous experiments in the SNc, 2x10^8 vg/animal was used, whereas this figure shows 4x10^8 vg/animal injected in the striatum.

      - The authors should clarify if the vg injected in the striatum was different from what they previously indicated.

      Compared to injection in the SNc, the volume of vector injected in the striatum was doubled since the region is significantly larger. We clarified that the injected vector genomes were different between striatum and SNc in the revised manuscript.

      Result Section 4- Line 183-220

      In this section, the authors thoroughly examine the neuronal nature of TH+ cells through NeuN co-staining and iterative immunofluorescence imaging (4i). BrdU experiments are conducted to determine the origin of these cells, leading to the conclusion that TH+ cells derive from nondividing cells and express the neuronal marker DAT, characteristic of dopamine-producing neurons (DANs). The shape of the TH+ cells in the striatum and SNc is also evaluated by measuring their Feret's diameter and their cell surface. The authors conclude that there is heterogeneity in the TH+ cell population due to the presence of TH+/NeuN- cells as well as differences in cell shape.

      However, their explanation of this heterogeneity is solely attributed to differences in the microenvironment and lacks further elaboration. Similarly, their observation that almost half the number of TH+ striatal cells after treatment express CTIP2 (Line 213 and Figure 4B), a marker for GABAergic medium spiny neurons, which they state as "interesting" (line 213) is not developed further. Delving deeper into these topics could strengthen the discussion.

      In the revised manuscript, we provided a more in-depth discussion of the 4i imaging results and potential therapeutic implications. Additionally, we suggest follow-up experiments to analyze the identity, function, and molecular mechanisms underlying the expression of TH upon PTBP1 downregulation in future studies. 

      Result Section 5- Line 221-243

      Two drug-free and two drug-induced behavioral tests are conducted in control and treated animals to evaluate the restoration of motor functions following treatment. Consistent with their previous findings, only the treatment targeted to neurons resulted in the restoration of motor functions in drug-free behavioral tests. The rationale behind each test and its evaluation is clearly explained.

      DISCUSSION 

      - In the discussion section, the authors effectively re-examine their results contextualizing their data with previous studies in the field. However, it would be helpful at this point in the manuscript to reconsider the use of the term 'cell reprogramming,' as this study does not involve actual cell reprogramming. The concept "reprograming" entails the process of transforming adult cells into a stem cell-like state, to then differentiate them into a different cell type. As proven in section 4 by a BrdU proliferation assay, the targeted cells are differentiated neurons. Considering BrdU is administered 5 days after ABE treatment, if true cell reprogramming was taking place, there should be evidence of BrdU incorporation. Cell reprogramming or reprograming is mentioned 4 times in the manuscript (line 34, line 54, line 265, line 277). Therefore, using another terminology would be more accurate.

      Following the reviewer’s suggestion, we removed the term “cell reprograming” from the manuscript and rather describe it as induction of TH expression in endogenous neurons.

      - As noted in the comments of section 4, a more thorough discussion about the various possibilities for heterogeneity would enhance the manuscript's contribution to the PD field.

      In the revised manuscript, we provided a more in-depth discussion of the 4i imaging results and potential therapeutic implications. 

      - Despite observing low numbers of TH+ cells, no significant rescue of drug-induced behaviors, and low levels of released dopamine, the authors merely state that these results make the therapy non-viable, but there is no further exploration or discussion. Whether the limitations lie in the ABE strategy itself, such as its efficiency in targeting and editing of differentiated neurons; or if the issues lie on the injection and delivery, is never discussed. A deeper argumentation on the possible underlying reasons for these challenges would greatly enhance the manuscript and contribute to the advancement of ABE therapies in the brain.

      We believe that the efficacy of our base editing approach could be significantly enhanced by optimizing the delivery. Currently, we are using a dual AAV approach to deliver intein-split ABEs. Since this approach relies on the delivery of higher AAV doses to achieve cotransduction of a cell by two different AAVs, the efficiency could be significantly enhanced by using smaller Cas9 orthologues that can be delivered as a single AAV. Furthermore, in this study we performed a single injection into the dorsal striatum to deliver ABE-expressing AAVs. Performing multiple injections into the rostral, medial, and caudal regions of the striatum might allow us to transduce more cells and induce TH expression in a larger population of striatal neurons. We have included these points in the revised manuscript.

      - While drug-induced behaviors are not recovered, the data demonstrates a rescue of spontaneous behaviors. Further discussion on the potential differences in circuitry underlying these variations in behavioral rescue would also enrich the manuscript's discussion.

      In the revised manuscript, we provide suggestions for potential mechanisms involved in the rescue of spontaneous behavior vs. absence of rescue of drug-induced behaviors. 

      FIGURES AND FIGURE SUPPLEMENTS

      General minor issue - low magnification images in the following figures make it difficult to visualize positive cells in tissue sections: Figure 2; Figure 2 - supplement 1; Figure 2 - supplement 3; Figure 3 - supplement 1. Adding higher magnification images of positive cells in tissue sections of SNc and striatum might help with the visualization.

      As suggested by the reviewer, we included higher magnification images in the corresponding figures to improve interpretation of our results.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The manuscript involves 11 research vignettes that interrogate key aspects of GnRH pulse generator in two established mouse models of PCOS (peripubertal and prenatal androgenisation; PPA and PNA) (9 of the vignettes focus on the latter model).

      A key message of this paper is that the oft-quoted idea of rapid GnRH/LH pulses associated with PCOS is in fact not readily demonstrable in PNA and PPA mice. This is an important message to make known, but when established dogmas are being challenged, the experiments behind them need to be robust. In this case, underpowered experiments and one or two other issues greatly limit the overall robustness of the study.

      General critiques

      (1) My main concern is that many/most of the experiments were limited to 4-5 mice per group (PPA experiments 1 and 2, PNA experiments 3, 5, 6, 8, and 9). This seems very underpowered for trying to disprove established dogmas (sometimes falling back on "non-significant trends" - lines 105 and 239).

      For the key characterization of GnRH pulse generator activity and LH pulsatility in intact PNA mice (Fig.3, 4, 6), we used 6-8 animals in each experiment which we believe to be sufficient. 

      It is pertinent to explore the “established dogma”. While there is every expectation that the PNA model should have increased LH pulsatility, in fact there is only a single study (Moore, Prescott et al. 2015) that has shown this. The two other reports that have examined this issue find no change in LH pulse frequency (McCarthy, Dischino et al. 2021 and ours). Hence, we would suggest that expectations rather than evidence presently maintain the PNA “dogma”. For the PPA model, there is in fact not a single paper reporting increased LH pulse frequency.

      (2) Page 133-142: it is concerning that the PNA mice didn't have elevated testosterone levels, and this clearly isn't the fault of the assay as this was re-tested in the laboratory of Prof Handelsman, an expert in the field, using LCMS. The point (clearly made in lines 315-336 of the Discussion) that elevated testosterone in PNA mice has been shown in some but not other publications is an important concern to describe for the field. However, the fact remains that it IS elevated in numerous studies, and in the current study it is not so, yet the authors go on to present GnRH pulse generator data as characteristic of the PNA model. Perhaps a demonstration of elevated testosterone levels (by LCMS?) should become a standard model validation prerequisite for publishing any PNA model data.

      We provide a Table below showing the huge inconsistencies in testosterone levels reported in the PNA mouse model. If anything, these inconsistencies might be explained by age, although again this is very variable between studies. Much the same as the “dogma” related to LH pulsatility in the PNA model, we would question whether there is any robust increase in testosterone levels in this model. There is no question that women with PCOS have elevated testosterone but whether the PNA mouse is a good model for this is debatable. We have noted this caution and the need for further LC-MS studies in the Discussion.

      Author response table 1.

      *Same ELISA used in the current study.

      (3) Line 191-196: the lack of a significant increase in LH pulse frequency in PNA mice is based on measurements using reasonable group sizes (7-8), although the sampling frequency is low for this type of analysis (10-minute intervals; 6-minute intervals would seem safer for not missing some pulses). The significance of the LH pulse frequency results is not stated (looks like about p=0.01). The authors note that LH concentration IS elevated (approximately doubled), and this clearly is not caused by an increase in amplitude (Figure 4 G, H, I). These things are worth commenting on in the discussion.

      We have included the p-value of the LH pulse frequency results and included the relevant discussion.

      (4) An interesting observation is that PNA mice appear to continue to have cyclical patterns of GnRH pulse generator activity despite reproductive acyclicity as determined by vaginal cytology (lines 209-241). This finding was used to analyse the frequency of GnRH pulse generator SEs in the machine-learning-identified diestrous-like stage of PNA mice and compare it to diestrous control mice (as identified by vaginal cytology?) (lines 245-254). The idea of a cycle stage-specific comparison is good, but surely the only valid comparison would be to use machine-learning to identify the diestrous-like stage in both groups of mice. Why use machine learning for one and vaginal cytology for the other?

      As “machine learning-defined” diestrus is based on the control vaginal cytology information, the diestrous mice are in fact defined by the same machine learning parameters. We have now noted this.

      Specific points

      (5) With regard to point 2 above, it would be helpful to note the age at which the testosterone samples were taken.

      We have included the age in the Methods.

      (6) Lines 198-205 and 258-266: I think these are repeated-measures ANOVA data? If so, report the main relevant effect before the post hoc test result.

      We have included the relevant main effect in the manuscript.

      (7) Line 415: I don't think the word "although" works in this sentence.

      We have changed the wording accordingly.

      (8) Lines 514-518: what are the limits of hormone detection in the LCMS assay?

      These were originally stated in the figure legend but have now been included in the Methods.

      Reviewer #2 (Public Review):

      Summary

      The authors aimed to investigate the functionality of the GnRH (gonadotropin-releasing hormone) pulse generator in different mouse models to understand its role in reproductive physiology and its implications for conditions like polycystic ovary syndrome (PCOS). They compared the GnRH pulse generator activity in control mice, peripubertal androgen (PPA) treated mice, and prenatal androgen (PNA) exposed mice. The study sought to elucidate how androgen exposure affects the GnRH pulse generator and subsequent LH (luteinizing hormone) secretion, contributing to the pathophysiology of PCOS.

      Strengths

      (1) Comprehensive Model Selection: The use of both PPA and PNA mouse models allows for a comparative analysis that can distinguish the effects of different timings of androgen exposure.

      (2) Detailed Methodology: The methods employed, such as photometry recordings and serial blood sampling, are robust and allow for precise measurement of GnRH pulse generator activity and LH secretion.

      (3) Clear Results Presentation: The experimental results are well-documented with appropriate statistical analyses, ensuring the findings are reliable and reproducible.

      (4) Relevance to PCOS: The study addresses a significant gap in understanding the neuroendocrine mechanisms underlying PCOS, making the findings relevant to both basic science and potentially clinical research.

      Weaknesses

      (1) Model Limitations: While the PNA mouse model is suggested as the most appropriate for studying PCOS, the authors acknowledge that it does not completely replicate the human condition, particularly the elevated LH response seen in women with PCOS.

      We agree.

      (2) Complex Data Interpretation: The reduced progesterone feedback and its effects on the GnRH pulse generator in PNA mice add complexity to data interpretation, making it challenging to draw straightforward conclusions.

      We agree.

      (3) Machine Learning (ML) Selection and Validation: While k-means clustering is a useful tool for pattern recognition, the manuscript lacks detailed justification for choosing this specific algorithm over other potential methods. The robustness of clustering results has not been validated.

      Please see below.

      (4) Biological Interpretability: Although the machine learning approach identified cyclical patterns, the biological interpretation of these clusters in the context of PCOS is not thoroughly discussed. A deeper exploration of how these clusters correlate with physiological and pathological states could enhance the study's impact.

      It is presently difficult to ascribe specific functions of the various pulse generator states to physiological impact. While it is reasonable to suggest that Cluster_0 activity (representing very infrequent SEs) is responsible for the estrous/luteal-phase pause in pulsatility, we remain unclear on the physiological impact of multi-peak SEs on LH secretion, even in normal mice (see Vas et al., Endo 2024). Thus, for the moment, it is most appropriate to simply state that pulse generator activity remains cyclical in PNA mice without any unfounded speculation.

      (5) Sample Size: The study uses a relatively small number of animals (n=4-7 per group), which may limit the generalisability of the findings. Larger sample sizes could provide more robust and statistically significant results.

      For the key characterization of GnRH pulse generator activity and LH pulsatility in intact PNA mice (Fig.3, 4, 6), we used 6-8 animals in each experiment which we believe to be sufficient. Some of the subsequent experiments do have smaller N numbers and we are particularly aware of the progesterone treatment study that only has N=3 for the PNA group. However, as this was sufficient to show a statistical difference we did not generate more mice.

      (6) Scope of Application: The findings, while interesting, are primarily applicable to mouse models. The translation to human physiology requires cautious interpretation and further validation.

      We agree.

      Reviewer #2 (Recommendations For The Authors):

      (1) The validation of clustering results through additional metrics or comparison with other algorithms would strengthen the methodology. Specifically, the authors selected k=5 for k-means clustering without providing an explicit rationale or evidence of exploratory data analysis (EDA) to support this choice. They refer to their previous publication (Vas, Wall et al. 2024), which does not provide any EDA regarding the choice of a number of clusters nor their robustness. The arbitrary selection of "k" without justification can undermine confidence in the clustering results since clustering results heavily depend on "k". The authors also choose to use Euclidean distance as the "numerical measure" setting in the RapidMiner Studio's software without justification given the chosen features used for clustering and their properties. The lack of exploratory analysis to determine the optimal number of clusters, "k", to be considered means that the authors might have missed identifying the true structure of the data. Common cluster robustness methods, like the elbow method or silhouette analysis, are crucial for justifying the number of clusters. An inappropriate choice could lead to incorrect conclusions about the synchronisation patterns of ARN kisspeptin neurons and their implications for the study's hypotheses. Including EDA and other validation techniques (e.g., silhouette scores, elbow method) would have strengthened the manuscript by providing empirical support for the chosen algorithm and settings.

      It is important to clarify that we did not start this exercise with an unknown or uncharacterised data set and that the objective of the clustering was not to provide any initial pattern to the data. Rather, our aim was to develop an unsupervised approach that would automatically detect the onset and existence of the key features of pulse generator cyclicity that were apparent by eye, e.g. the estrous stage slowing and the presence of multi-peak SEs in metestrus. As such, our optimization was driven by the data as well as observation while retaining the unsupervised nature of k-means clustering. We started by assessing 10 variables describing all possible features of the recordings and through a process of elimination found that just 5 were sufficient to describe the key stages of the cycle. While we appreciate that the use of multiple different algorithms would progressively increase the robustness of the machine learning approach, it is evident that the current k-means approach with k=5 is already very effective at reporting the estrous cyclicity of the pulse generator in normal mice (Vas et al., Endo 2024). Having validated this approach, we have now used it here to compare the cyclical patterns of activity of PNA- and vehicle-treated mice.
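
      For readers unfamiliar with the checks the reviewer mentions, the following is a minimal sketch of an elbow (inertia) and silhouette comparison across candidate values of k. It is not the analysis used in the study, which was run in RapidMiner Studio: the data are synthetic two-dimensional points, the helper names (`farthest_point_init`, `kmeans`, `silhouette`) are hypothetical, and the deterministic farthest-point seeding is an assumption chosen only to make the sketch reproducible.

```python
import math
import random


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first point, then repeatedly add the point farthest from all chosen seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, inertia)."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])  # empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, inertia


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (singletons score 0)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic data: three well-separated groups of 40 points (not real recordings).
rng = random.Random(0)
centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centres for _ in range(40)]

results = {}
for k in range(2, 7):
    labels, inertia = kmeans(points, k)
    results[k] = (inertia, silhouette(points, labels))
    print(f"k={k}  inertia={inertia:8.1f}  silhouette={results[k][1]:.3f}")

best_k = max(results, key=lambda k: results[k][1])
print("k with the highest mean silhouette:", best_k)
```

      In the elbow method one looks for the k beyond which inertia stops falling sharply, whereas the silhouette score favours the k that gives the most compact and well-separated clusters. On real photometry features the two criteria need not agree, which is one reason such checks are usually reported alongside the chosen k.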

      (2) The data and methods presented in this study could be valuable for the research community studying reproductive endocrinology and neuroendocrine disorders provided the authors address my comments above regarding the application of ML methods. The insights gained from this work could potentially inform clinical research aiming to develop better diagnostic and therapeutic strategies for PCOS.

      Reviewer #3 (Public Review):

      Summary:

      Zhou and colleagues elegantly used pre-clinical mouse models to understand the nature of abnormally high GnRH/LH pulse secretion in polycystic ovary syndrome (PCOS), a major endocrine disorder affecting female fertility worldwide. This work brings a fundamental question of how altered gonadotropin secretion takes place upstream within the GnRH pulse generator core, which is defined by arcuate nucleus kisspeptin neurons.

      Strengths:

      The authors use state-of-the-art in vivo calcium imaging with fiber photometry and important physiological manipulations and measurements to dissect the possible neuronal mechanisms underlying such neuroendocrine derangements in PCOS. The additional use of unsupervised k-means clustering analysis for the evaluation of calcium synchronous events greatly enhances the quality of their evidence. The authors nicely propose that neuroendocrine dysfunction in PCOS might involve different setpoints through the hypothalamic-pituitary-gonadal (HPG) axis, and beyond kisspeptin neurons, which importantly pushes our field forward toward future investigations.

      Weaknesses:

      Although the authors provide important evidence, additional efforts are required to improve the quality of the manuscript and back up their claims. For instance, animal experiments failed to detect high testosterone levels in PNA female mice, a well-established PCOS mouse model. Considering that androgen excess is a hallmark of PCOS, this highly influences the subsequent evaluation of calcium synchronous events in arcuate kisspeptin neurons and the implications for neuroendocrine derangements.

      Please see our response to Reviewer 1. It will be important to establish a robust PCOS mouse model in the future that has elevated pulse generator activity in the presence of elevated testosterone concentrations.

      Authors also may need to provide LH data from another mouse model used in their work, the peripubertal androgen (PPA) model. Their claims seem to fall short without the pairing evidence of calcium synchronous events in arcuate kisspeptin neurons and LH pulse secretion.

      We have demonstrated that ARN-KISS neuron SEs are perfectly correlated with pulsatile LH secretion in intact and gonadectomized male and female mice on many occasions. Given that the pulse generator frequency slows by 50% in PPA mice, it is very hard to imagine how this could result in an elevated LH pulse frequency. While we were undertaking these studies the first paper (to our knowledge) looking at pulsatile LH secretion in the PPA model was published; no change was found.

      Another aspect that requires review is further exploration of their calcium synchronous events data and an increase in animal numbers in some of their experiments.

      Please see below.

      Reviewer #3 (Recommendations For The Authors):

      The reviewer believes that this work will greatly contribute to the field and, to provide better manuscript quality, there might be only a few minor and major revisions to be included in the future version.

      Minor:

      (1) Line 17: I would change the sentence to "One in ten women in their reproductive age suffer from PCOS" to adapt to more accurate prevalence studies.

      We have revised the sentence as recommended.

      (2) Line 18 and 19: Although the evidence indeed points to a high LH pulse secretion in PCOS, I would change it to "with increased LH secretion" as most studies show mean values and not LH pulse release data.

      While we agree that most human studies show a mean increase in LH, when assessed with sufficient temporal resolution, this results from elevated LH pulse frequency. As such, and to keep the manuscript focussed on the pulse generator, we would like to retain the present wording.

      (3) Line 47: Please correct "polycystic ovaries" to polycystic-like ovarian morphology to adapt to the current AEPCOS guidelines.

      We have revised the sentence as recommended.

      (4) Line 231: Authors stated that "These PNA mice exhibited a cyclical pattern of activity similar to that of control mice" (Figure 5C and D). Please, include the statistical tests here for this claim. Although they say there aren't differences, the colored fields do not reflect this and seem quite different. Could the authors re-evaluate these claims or provide better examples in the figure?

      We used Sidak’s multiple comparisons tests for this analysis (as stated in Results). The key data for assessing overall cyclical activity in PNA and control mice is Fig 5B, which suggests very little difference. We accept that the individual traces of activity (Fig.5D) do not look identical to controls and, indeed, they are representative of the data set. The key point is they remain cyclical in an acyclic mouse. We have made sure that this is clear in the text.

      (5) Subheadings 6 and & of the result section: It sounds confusing to read the foremost claims of the absence of SE differences and next have a clear SE frequency difference in Figures 6 C and D. The reviewer suggests that authors could reorganize the text and figures to make their rationale flow better for future readers.

      We have considered this point carefully but find that re-organization creates its own problems with having to use the machine learning algorithm before describing it. It will always be problematic to incorporate this type of data re-analysis in an original paper, but we think the present sequence is the best that can be achieved.

      (6) Discussion: If PNA female mice did not have elevated testosterone levels, how can the authors compare their results to the current literature? Could this be the case for lacking a more robust ARNKISS neuronal activity output in their experiments? The reviewer recommends a better discussion concerning these aspects.

      Please refer to our response to Reviewer #1 comment (2).

      (7) Discussion: the authors claim that diestrous PNA mice exhibited highly variable patterns of ARNKISS neuron activity. Would these differences be due to different circulating sex steroid levels or intrinsic properties? Would the inclusion of future in vitro calcium imaging (brain slices) studies contribute to their research question and conclusions? The reviewer recommends a better discussion concerning these aspects.

      We have tried to clarify that the highly variable patterns of activity in “diestrous” PNA mice come from the fact that we are actually randomly recording from ARN-KISS neurons at metestrus, diestrus, proestrus and estrus. The pulse generator is cycling but we only have the acyclic “diestrous” smear to go by. This also makes brain slice studies difficult as we would never know the actual cycle stage.

      Major:

      (1) Results section: The reviewer strongly recommends that the LH pulse secretion data for the PPA group be included in the manuscript. If the SEs represent the central mechanism of pulse generation, would the LH pulse frequency match those events? If not, could a mismatch be explained by androgen-mediated negative feedback at the pituitary level? What is the pituitary LH response to exogenous GnRH (i.p. injection) in the PPA group?

      Our initial observation showed the frequency of ARNKISS neuron SEs was halved in PPA mice compared to controls. Additionally, one study reported pulsatile LH secretion to be unchanged in this animal model (Coyle, Prescott et al. 2022). Both pieces of evidence clearly indicate that the PPA mouse does not provide an appropriate PCOS model of elevated pulse generator activity. Therefore, we do not see the value of pursuing further experiments in this animal model.

      (2) Although the evaluation of relative frequency and normalized amplitude indicate the dynamic over time, the authors should include the average amplitudes and frequencies of events within the recording session. For instance, looking at Figures 1 A and B and Figures 3 A and B, a reader can observe differences in the amplitude due to different scaling axes. Perhaps, using a Python toolbox such as GuPPy or any preferred analysis pipeline might help authors include these parameters.

      The amplitude of recorded SEs for each mouse depends primarily on the fiber position. As such, it has only ever been possible to assess SE amplitude changes within the same mouse. It is not possible to assess differences in SE amplitude between mice.
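
      The reasoning behind restricting amplitude comparisons to the same animal can be illustrated with a small sketch. The numbers below are invented for illustration only, and the session names and the `normalise_within_mouse` helper are hypothetical. Two mice show the same proportional change, yet their raw amplitudes differ four-fold, as could happen purely because of fiber placement.

```python
from statistics import mean

# Invented SE amplitudes (arbitrary units) for two hypothetical mice.
# Mouse B's signal is four times larger throughout, e.g. due to fiber position.
raw = {
    "mouse_A": {"baseline": [2.0, 2.2, 1.8], "treatment": [1.0, 1.1, 0.9]},
    "mouse_B": {"baseline": [8.0, 8.8, 7.2], "treatment": [4.0, 4.4, 3.6]},
}


def normalise_within_mouse(sessions, reference="baseline"):
    """Express every amplitude as a fraction of the mouse's own reference mean."""
    ref = mean(sessions[reference])
    return {name: [amp / ref for amp in amps] for name, amps in sessions.items()}


normalised = {mouse: normalise_within_mouse(sessions) for mouse, sessions in raw.items()}

for mouse in raw:
    raw_mean = mean(raw[mouse]["treatment"])
    norm_mean = mean(normalised[mouse]["treatment"])
    print(f"{mouse}: raw treatment mean = {raw_mean:.2f}, normalised = {norm_mean:.2f}")
```

      After normalisation each mouse serves as its own control, so the within-mouse change is comparable across animals even though the raw amplitudes are not.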

      (3) Line 144-156: (Immunoreactivity results): Authors should proceed with caution when describing these results and clearly state that results show a software-based measurement of immunoreactive signal intensity. In addition, the small sample size of the PNA group (N = 4) compared to controls (N = 6-7) seems to mask possible differences. Could the authors increase the N of the PNA group and re-evaluate these results?

      We have clarified that the immunoreactive signal intensity is based on software-based measurement. The N number for PNA mice in these studies varies from 4 to 6 depending on brain section availability for the different immunohistochemistry runs. The scatter of data is such that any new data points would need to be at the extreme of the distributions to likely have any impact on statistical significance. As a minor part of the paper, we did not feel that the use of further mice was warranted.

      (4) Considering the great variability of PNA's number of SE/hr, the reviewer suggests increasing the N in this group so that the authors can re-evaluate their findings and draw better analyses/conclusions.

      We have n=6 for the PNA group in the study. As noted above, the variability in SE/hr in Figure 3 comes from assessing the pulse generator at random times within the estrous cycle. Once we separate the “diestrous-like” stage for the PNA animals, the variability is decreased as shown in Figure 6.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Summary of reviewers’ comments and our revisions: 

      We thank the reviewers for their thoughtful feedback. This feedback has motivated multiple revisions and additions that, in our view, have greatly improved the manuscript. This is especially true with regard to a major goal of this study: clearly defining existing scientific perspectives and delineating their decoding implications. In addition to building on this conceptual goal, we have expanded existing analyses and have added a new analysis of generalization using a newly collected dataset. We expect the manuscript will be of very broad interest, both to those interested in BCI development and to those interested in fundamental properties of neural population activity and its relationship with behavior.

      Importantly, all reviewers were convinced that MINT provided excellent performance, when benchmarked against existing methods, across a broad range of standard tasks:

      “their method shows impressive performance compared to more traditional decoding approaches” (R1) 

      “The paper was thorough in considering multiple datasets across a variety of behaviors, as well as existing decoding methods, to benchmark the MINT approach. This provided a valuable comparison to validate the method.” (R2) 

      “The fact that performance on stereotyped tasks is high is interesting and informative…” (R3)

      This is important. It is challenging to design a decoder that performs consistently across multiple domains and across multiple situations (including both decoding and neural state estimation). MINT does so. MINT consistently outperformed existing lightweight ‘interpretable’ decoders, despite being a lightweight interpretable decoder itself. MINT was very competitive with expressive machine-learning methods, yet has advantages in flexibility and simplicity that more ‘brute force’ methods do not. We made a great many comparisons, and MINT was consistently a strong performer. Of the many comparisons we made, there was only one where MINT was at a modest disadvantage, and it was for a dataset where all methods performed poorly. No other method we tested was as consistent. For example, although the GRU and the feedforward network were often competitive with MINT (and better than MINT in the one case mentioned above), there were multiple other situations where they performed less well and a few situations where they performed poorly. Moreover, no other existing decoder naturally estimates the neural state while also readily decoding, without retraining, a broad range of behavioral variables.

      R1 and R2 were very positive about the broader impacts of the study. They stressed its impact both on decoder design, and on how our field thinks, scientifically, about the population response in motor areas: 

      “This paper presents an innovative decoding approach for brain-computer interfaces” (R1)

      “presents a substantial shift in methodology, potentially revolutionizing the way BCIs interpret and predict neural behaviour” (R1)

      “the paper's strengths, particularly its emphasis on a trajectory-centric approach and the simplicity of MINT, provide a compelling contribution to the field” (R1)

      “The authors made strong arguments, supported by evidence and literature, for potentially high-dimensional neural states and thus the need for approaches that do not rely on an assumption of low dimensionality” (R2)

      “This work is motivated by brain-computer interfaces applications, which it will surely impact in terms of neural decoder design.” (R2)

      “this work is also broadly impactful for neuroscientific analysis... Thus, MINT will likely impact neuroscience research generally.” (R2)

      We agree with these assessments, and have made multiple revisions to further play into these strengths. As one example, the addition of Figure 1b (and 6b) makes this the first study, to our knowledge, to fully and concretely illustrate this emerging scientific perspective and its decoding implications. This is important, because multiple observations convince us that the field is likely to move away from the traditional perspective in Figure 1a, and towards that in Figure 1b. We also agree with the handful of weaknesses R1 and R2 noted. The manuscript has been revised accordingly. The major weakness noted by R1 was the need to be explicit regarding when we suspect MINT would (and wouldn’t) work well in other brain areas. In non-motor areas, the structure of the data may be poorly matched with MINT’s assumptions. We agree that this is likely to be true, and thus agree with the importance of clarifying this topic for the reader. The revision now does so. R1 also wished to know whether existing methods might benefit from including trial-averaged data during training, something we now explore and document (see detailed responses below). R2 noted two weaknesses: 1) The need to better support (with expanded analysis) the statement that neural and behavioral trajectories are non-isometric, and 2) The need to more rigorously define the ‘mesh’. We agree entirely with both suggestions, and the revision has been strengthened by following them (see detailed responses below).

      R3 also saw strengths to the work, stating that:

      “This paper is well-structured and its main idea is clear.” 

      “The fact that performance on stereotyped tasks is high is interesting and informative, showing that these stereotyped tasks create stereotyped neural trajectories.” 

      “The task-specific comparisons include various measures and a variety of common decoding approaches, which is a strength.”

      However, R3 also expressed two sizable concerns. The first is that MINT might have onerous memory requirements. The manuscript now clarifies that MINT has modest memory requirements. These do not scale unfavorably as the reviewer was concerned they might. The second concern is that MINT is: 

      “essentially a table-lookup rather than a model.”

      Although we don’t agree, the concern makes sense and may be shared by many readers, especially those who take a particular scientific perspective. Pondering this concern thus gave us the opportunity to modify the manuscript in ways that support its broader impact. Our revisions had two goals: 1) clarify the ways in which MINT is far more flexible than a lookup-table, and 2) better describe the dominant scientific perspectives and their decoding implications.

      The heart of R3’s concern is the opinion that MINT is an effective but unprincipled hack suitable for situations where movements are reasonably stereotyped. Of course, many tasks involve stereotyped movements (e.g. handwriting characters), so MINT would still be useful. Nevertheless, if MINT is not principled, other decoding methods would often be preferable because they could (unlike MINT in R3’s opinion) gain flexibility by leveraging an accurate model. Most of R3’s comments flow from this fundamental concern:

      “This is again due to MINT being a lookup table with a library of stereotyped trajectories rather than a model.”

      “MINT models task-dependent neural trajectories, so the trained decoder is very task-dependent and cannot generalize to other tasks.”

      “Unlike MINT, these works can achieve generalization because they model the neural subspace and its association to movement.”

      “given that MINT tabulates task-specific trajectories, it will not generalize to tasks that are not seen in the training data even when these tasks cover the exact same space (e.g., the same 2D computer screen and associated neural space).”

      “For proper training, the training data should explore the whole movement space and the associated neural space, but this does not mean all kinds of tasks performed in that space must be included in the training set (something MINT likely needs while modeling-based approaches do not).”

      The manuscript has been revised to clarify that MINT is considerably more flexible than a lookup table, even though a lookup table is used as a first step. Yet, on its own, this does not fully address R3’s concern. The quotes above highlight that R3 is making a standard assumption in our field: that there exists a “movement space and associated neural space”. Under this perspective, one should, as R3 argues, fully explore the movement space. This would perforce fully explore the associated neural subspace. One can then “model the neural subspace and its association to movement”. MINT does not use a model of this type, and thus (from R3’s perspective) does not appear to use a model at all. A major goal of our study is to question this traditional perspective. We have thus added a new figure to highlight the contrast between the traditional (Figure 1a) and new (Figure 1b) scientific perspectives, and to clarify their decoding implications.

      While we favor the new perspective (Figure 1b), we concede that R3 may not share our view. This is fine. Part of the reason we believe this study is timely, and will be broadly read, is that it raises a topic of emerging interest where there is definitely room for debate. If we are misguided – i.e. if Figure 1a is the correct perspective – then many of R3’s concerns would be on target: MINT could still be useful, but traditional methods that make the traditional assumptions in Figure 1a would often be preferable. However, if the emerging perspective in Figure 1b is more accurate, then MINT’s assumptions would be better aligned with the data than those of traditional methods, making it a more (not less) principled choice.

      Our study provides new evidence in support of Figure 1b, while also synthesizing existing evidence from other recent studies. In addition to Figure 2, the new analysis of generalization further supports Figure 1b. Also supporting Figure 1b is the analysis in which MINT’s decoding advantage, over a traditional decoder, disappears when simulated data approximate the traditional perspective in Figure 1a.

      That said, we agree that the present study cannot fully resolve whether Figure 1a or 1b is more accurate. Doing so will take multiple studies with different approaches (indeed we are currently preparing other manuscripts on this topic). Yet we still have an informed scientific opinion, derived from past, present and yet-to-be-published observations. Our opinion is that Figure 1b is the more accurate perspective. This possibility makes it reasonable to explore the potential virtues of a decoding method whose assumptions are well-aligned with that perspective. MINT is such a method. As expected under Figure 1b, MINT outperforms traditional interpretable decoders in every single case we studied. 

      As noted above, we have added a new generalization-focused analysis (Figure 6) based on a newly collected dataset. We did so because R3’s comments highlight a deep point: which scientific perspective one takes has strong implications regarding decoder generalization. These implications are now illustrated in the new Figure 6a and 6b. Under Figure 6a, it is possible, as R3 suggests, to explore “the whole movement space and associated neural space” during training. However, under Figure 6b, expectations are very different. Generalization will be ‘easy’ when new trajectories are near the training-set trajectories. In this case, MINT should generalize well as should other methods. In contrast, generalization will be ‘hard’ when new neural trajectories have novel shapes and occupy previously unseen regions / dimensions. In this case, all current methods, including MINT, are likely to fail. R3 points out that traditional decoders have sometimes generalized well to new tasks (e.g. from center-out to ‘pinball’) when cursor movements occur in the same physical workspace. These findings could be taken to support Figure 6a, but are equally consistent with ‘easy’ generalization in Figure 6b. To explore this topic, the new analysis in Figure 6c-g considers conditions that are intended to span the range from easy to hard. Results are consistent with the predictions of Figure 6b. 

      We believe the manuscript has been significantly improved by these additions. The revisions help the manuscript achieve its twin goals: 1) introduce a novel class of decoder that performs very well despite being very simple, and 2) describe properties of motor-cortex activity that will matter for decoders of all varieties.

      Reviewer #1: 

      Summary: 

      This paper presents an innovative decoding approach for brain-computer interfaces (BCIs), introducing a new method named MINT. The authors develop a trajectory-centric approach to decode behaviors across several different datasets, including eight empirical datasets from the Neural Latents Benchmark. Overall, the paper is well written and their method shows impressive performance compared to more traditional decoding approaches that use a simpler approach. While there are some concerns (see below), the paper's strengths, particularly its emphasis on a trajectory-centric approach and the simplicity of MINT, provide a compelling contribution to the field. 

      We thank the reviewer for these comments. We share their enthusiasm for the trajectory-centric approach, and we are in complete agreement that this perspective has both scientific and decoding implications. The revision expands upon these strengths.

      Strengths: 

      The adoption of a trajectory-centric approach that utilizes statistical constraints presents a substantial shift in methodology, potentially revolutionizing the way BCIs interpret and predict neural behaviour. This is one of the strongest aspects of the paper. 

      Again, thank you. We also expect the trajectory-centric perspective to have a broad impact, given its relevance to both decoding and to thinking about manifolds.

      The thorough evaluation of the method across various datasets serves as an assurance that the superior performance of MINT is not a result of overfitting. The comparative simplicity of the method in contrast to many neural network approaches is refreshing and should facilitate broader applicability. 

      Thank you. We were similarly pleased to see such a simple method perform so well. We also agree that, while neural-network approaches will always be important, it is desirable to also possess simple ‘interpretable’ alternatives.

      Weaknesses:  

      Comment 1) Scope: Despite the impressive performance of MINT across multiple datasets, it seems predominantly applicable to M1/S1 data. Only one of the eight empirical datasets comes from an area outside the motor/somatosensory cortex. It would be beneficial if the authors could expand further on how the method might perform with other brain regions that do not exhibit low tangling or do not have a clear trial structure (e.g. decoding of position or head direction from hippocampus) 

      We agree entirely. Population activity in many brain areas (especially outside the motor system) presumably will often not have the properties upon which MINT’s assumptions are built. This doesn’t necessarily mean that MINT would perform badly. Using simulated data, we have found that MINT can perform surprisingly well even when some of its assumptions are violated. Yet at the same time, when MINT’s assumptions don’t apply, one would likely prefer to use other methods. This is, after all, one of the broader themes of the present study: it is beneficial to match decoding assumptions to empirical properties. We have thus added a section on this topic early in the Discussion: 

      “In contrast, MINT and the Kalman filter performed comparably on simulated data that better approximated the assumptions in Figure 1a. Thus, MINT is not a ‘better’ algorithm – simply better aligned with the empirical properties of motor cortex data. This highlights an important caveat. Although MINT performs well when decoding from motor areas, its assumptions may be a poor match in other areas (e.g. the hippocampus). MINT performed well on two non-motor-cortex datasets – Area2_Bump (S1) and DMFC_RSG (dorsomedial frontal cortex) – yet there will presumably be other brain areas and/or contexts where one would prefer a different method that makes assumptions appropriate for that area.”

      Comment 2) When comparing methods, the neural trajectories of MINT are based on averaged trials, while the comparison methods are trained on single trials. An additional analysis might help in disentangling the effect of the trial averaging. For this, the authors could average the input across trials for all decoders, establishing a baseline for averaged trials. Note that inference should still be done on single trials. Performance can then be visualized across different values of N, which denotes the number of averaged trials used for training. 

      We explored this question and found that the non-MINT decoders are harmed, not helped, by the inclusion of trial-averaged responses in the training set. This is presumably because the statistics of trial-averaged responses don’t resemble what will be observed during decoding. This statistical mismatch, between training and decoding, hurts most methods. It doesn’t hurt MINT, because MINT doesn’t ‘train’ in the normal way. It simply needs to know rates, and trial-averaging is a natural way to obtain them. To describe the new analysis, we have added the following to the text.

      “We also investigated the possibility that MINT gained its performance advantage simply by having access to trial-averaged neural trajectories during training, while all other methods were trained on single-trial data. This difference arises from the fundamental requirements of the decoder architectures: MINT needs to estimate typical trajectories while other methods don’t. Yet it might still be the case that other methods would benefit from including trial-averaged data in the training set, in addition to single-trial data. Alternatively, this might harm performance by creating a mismatch, between training and decoding, in the statistics of decoder inputs. We found that the latter was indeed the case: all non-MINT methods performed better when trained purely on single-trial data.”

      Reviewer #2:

      Summary: 

      The goal of this paper is to present a new method, termed MINT, for decoding behavioral states from neural spiking data. MINT is a statistical method which, in addition to outputting a decoded behavioral state, also provides soft information regarding the likelihood of that behavioral state based on the neural data. The innovation in this approach is neural states are assumed to come from sparsely distributed neural trajectories with low tangling, meaning that neural trajectories (time sequences of neural states) are sparse in the high-dimensional space of neural spiking activity and that two dissimilar neural trajectories tend to correspond to dissimilar behavioral trajectories. The authors support these assumptions through analysis of previously collected data, and then validate the performance of their method by comparing it to a suite of alternative approaches. The authors attribute the typically improved decoding performance by MINT to its assumptions being more faithfully aligned to the properties of neural spiking data relative to assumptions made by the alternatives. 

      We thank the reviewer for this accurate summary, and for highlighting the subtle but important fact that MINT provides information regarding likelihoods. The revision includes a new analysis (Figure 6e) illustrating one potential way to leverage knowledge of likelihoods.

      Strengths:  

      The paper did an excellent job critically evaluating common assumptions made by neural analytical methods, such as neural state being low-dimensional relative to the number of recorded neurons. The authors made strong arguments, supported by evidence and literature, for potentially high-dimensional neural states and thus the need for approaches that do not rely on an assumption of low dimensionality. 

      Thank you. We also hope that the shift in perspective is the most important contribution of the study. This shift matters both scientifically and for decoder design. The revision expands on this strength. The scientific alternatives are now more clearly and concretely illustrated (especially see Figure 1a,b and Figure 6a,b). We also further explore their decoding implications with new data (Figure 6c-g).

      The paper was thorough in considering multiple datasets across a variety of behaviors, as well as existing decoding methods, to benchmark the MINT approach. This provided a valuable comparison to validate the method. The authors also provided nice intuition regarding why MINT may offer performance improvement in some cases and in which instances MINT may not perform as well. 

      Thank you. We were pleased to be able to provide comparisons across so many datasets (we are grateful to the Neural Latents Benchmark for making this possible).

      In addition to providing a philosophical discussion as to the advantages of MINT and benchmarking against alternatives, the authors also provided a detailed description of practical considerations. This included training time, amount of training data, robustness to data loss or changes in the data, and interpretability. These considerations not only provided objective evaluation of practical aspects but also provided insights to the flexibility and robustness of the method as they relate back to the underlying assumptions and construction of the approach. 

      Thank you. We are glad that these sections were appreciated. MINT’s simplicity and interpretability are indeed helpful in multiple ways, and afford opportunities for interesting future extensions. One potential benefit of interpretability is now explored in the newly added Figure 6e. 

      Impact: 

      This work is motivated by brain-computer interfaces applications, which it will surely impact in terms of neural decoder design. However, this work is also broadly impactful for neuroscientific analysis to relate neural spiking activity to observable behavioral features. Thus, MINT will likely impact neuroscience research generally. The methods are made publicly available, and the datasets used are all in public repositories, which facilitates adoption and validation of this method within the greater scientific community. 

      Again, thank you. We have similar hopes for this study.

      Weaknesses (1 & 2 are related, and we have switched their order in addressing them): 

      Comment 2) With regards to the idea of neural and behavioral trajectories having different geometries, this is dependent on what behavioral variables are selected. In the example for Fig 2a, the behavior is reach position. The geometry of the behavioral trajectory of interest would look different if instead the behavior of interest was reach velocity. The paper would be strengthened by acknowledgement that geometries of trajectories are shaped by extrinsic choices rather than (or as much as they are) intrinsic properties of the data. 

      We agree. Indeed, we almost added a section to the original manuscript on this exact topic. We have now done so:

      “A potential concern regarding the analyses in Figure 2c,d is that they require explicit choices of behavioral variables: muscle population activity in Figure 2c and angular phase and velocity in Figure 2d. Perhaps these choices were misguided. Might neural and behavioral geometries become similar if one chooses ‘the right’ set of behavioral variables? This concern relates to the venerable search for movement parameters that are reliably encoded by motor cortex activity [69, 92–95]. If one chooses the wrong set of parameters (e.g. chooses muscle activity when one should have chosen joint angles) then of course neural and behavioral geometries will appear non-isometric. There are two reasons why this ‘wrong parameter choice’ explanation is unlikely to account for the results in Figure 2c,d. First, consider the implications of the left-hand side of Figure 2d. A small kinematic distance implies that angular position and velocity are nearly identical for the two moments being compared. Yet the corresponding pair of neural states can be quite distant. Under the concern above, this distance would be due to other encoded behavioral variables – perhaps joint angle and joint velocity – differing between those two moments. However, there are not enough degrees of freedom in this task to make this plausible. The shoulder remains at a fixed position (because the head is fixed) and the wrist has limited mobility due to the pedal design [60]. Thus, shoulder and elbow angles are almost completely determined by cycle phase. More generally, ‘external variables’ (positions, angles, and their derivatives) are unlikely to differ more than slightly when phase and angular velocity are matched. Muscle activity could be different because many muscles act on each joint, creating redundancy. However, as illustrated in Figure 2c, the key effect is just as clear when analyzing muscle activity. Thus, the above concern seems unlikely even if it can’t be ruled out entirely. 
      A broader reason to doubt the ‘wrong parameter choice’ proposition is that it provides a vague explanation for a phenomenon that already has a straightforward explanation. A lack of isometry between the neural population response and behavior is expected when neural-trajectory tangling is low and output-null factors are plentiful [55, 60]. For example, in networks that generate muscle activity, neural and muscle-activity trajectories are far from isometric [52, 58, 60]. Given this straightforward explanation, and given repeated failures over decades to find the ‘correct’ parameters (muscle activity, movement direction, etc.) that create neural-behavior isometry, it seems reasonable to conclude that no such isometry exists.”

      Comment 1) The authors posit that neural and behavioral trajectories are non-isometric. To support this point, they look at distances between neural states and distances between the corresponding behavioral states, in order to demonstrate that there are differences in these distances in each respective space. This supports the idea that neural states and behavioral states are non-isometric but does not directly address their point. In order to say the trajectories are non-isometric, it would be better to look at pairs of distances between corresponding trajectories in each space. 

      We like this idea and have added such an analysis. To be clear, we like the original analysis too: isometry predicts that neural and behavioral distances (for corresponding pairs of points) should be strongly correlated, and that small behavioral distances should not be associated with large neural distances. These predictions are not true, providing a strong argument against isometry. However, we also like the reviewer’s suggestion, and have added such an analysis. It makes the same larger point, and also reveals some additional facts (e.g. it reveals that muscle-geometry is more related to neural-geometry than is kinematic-geometry). The new analysis is described in the following section:

      “We further explored the topic of isometry by considering pairs of distances. To do so, we chose two random neural states and computed their distance, yielding dneural1. We repeated this process, yielding dneural2. We then computed the corresponding pair of distances in muscle space (dmuscle1 and dmuscle2) and kinematic space (dkin1 and dkin2). We considered cases where dneural1 was meaningfully larger than (or smaller than) dneural2, and asked whether the behavioral variables had the same relationship; e.g. was dmuscle1 also larger than dmuscle2? For kinematics, this relationship was weak: across 100,000 comparisons, the sign of dkin1 − dkin2 agreed with dneural1 − dneural2 only 67.3% of the time (with 50% being chance). The relationship was much stronger for muscles: the sign of dmuscle1 − dmuscle2 agreed with dneural1 − dneural2 79.2% of the time, which is far more than expected by chance yet also far from what is expected given isometry (e.g. the sign agrees 99.7% of the time for the truly isometric control data in Figure 2e). Indeed there were multiple moments during this task when dneural1 was much larger than dneural2, yet dmuscle1 was smaller than dmuscle2. These observations are consistent with the proposal that neural trajectories resemble muscle trajectories in some dimensions, but with additional output-null dimensions that break the isometry [60].”
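      The pairwise-distance comparison quoted above can be sketched roughly as follows. This is an illustration on synthetic data, not the authors’ code: the function name `sign_agreement`, the relative `margin` that stands in for ‘meaningfully larger’, and the random arrays are all assumptions introduced here.

```python
import numpy as np

def sign_agreement(neural, behavior, n_pairs=100_000, margin=0.1, seed=0):
    """Fraction of comparisons where sign(d_neural1 - d_neural2) matches
    sign(d_behavior1 - d_behavior2).

    neural, behavior: arrays of shape (n_timepoints, n_dims) whose rows are
    corresponding states. `margin` is a hypothetical relative threshold
    standing in for 'meaningfully larger'.
    """
    rng = np.random.default_rng(seed)
    n = neural.shape[0]
    # Four random time indices per comparison: (a, b) form pair 1, (c, d) pair 2.
    a, b, c, d = rng.integers(0, n, size=(4, n_pairs))

    def dist(x, i, j):
        return np.linalg.norm(x[i] - x[j], axis=1)

    dn1, dn2 = dist(neural, a, b), dist(neural, c, d)
    db1, db2 = dist(behavior, a, b), dist(behavior, c, d)

    # Keep only comparisons where the two neural distances clearly differ.
    keep = np.abs(dn1 - dn2) > margin * np.maximum(dn1, dn2)
    agree = np.sign(dn1 - dn2)[keep] == np.sign(db1 - db2)[keep]
    return agree.mean()

# Synthetic check (assumed data, not from the study).
rng = np.random.default_rng(1)
neural = rng.normal(size=(500, 20))
q, _ = np.linalg.qr(rng.normal(size=(20, 20)))   # random orthogonal matrix
isometric_behavior = neural @ q                  # distances are preserved
unrelated_behavior = rng.normal(size=(500, 3))   # no relation to `neural`

iso_score = sign_agreement(neural, isometric_behavior, n_pairs=20_000)
rand_score = sign_agreement(neural, unrelated_behavior, n_pairs=20_000)
```

      In this toy setting an isometric mapping yields near-perfect agreement and unrelated data yields agreement near chance, which brackets the intermediate values reported for kinematics and muscles.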

      Comment 3) The approach is built up on the idea of creating a "mesh" structure of possible states. In the body of the paper the definition of the mesh was not entirely clear and I could not find in the methods a more rigorous explicit definition. Since the mesh is integral to the approach, the paper would be improved with more description of this component. 

      This is a fair criticism. Although MINT’s actual operations were well-documented, how those operations mapped onto the term ‘mesh’ was, we agree, a bit vague. The definition of the mesh is a bit subtle because it only emerges during decoding rather than being precomputed. This is part of what gives MINT much more flexibility than a lookup table. We have added the following to the manuscript.

      “We use the term ‘mesh’ to describe the scaffolding created by the training-set trajectories and the interpolated states that arise at runtime. The term mesh is apt because, if MINT’s assumptions are correct, interpolation will almost always be local. If so, the set of decodable states will resemble a mesh, created by line segments connecting nearby training-set trajectories. However, this mesh-like structure is not enforced by MINT’s operations.

      Interpolation could, in principle, create state-distributions that depart from the assumption of a sparse manifold. For example, interpolation could fill in the center of the green tube in Figure 1b, resulting in a solid manifold rather than a mesh around its outer surface. However, this would occur only if spiking observations argued for it. As will be documented below, we find that essentially all interpolation is local.”

      We have also added Figure 4d. This new analysis documents the fact that decoded states are near training-set trajectories, which is why the term ‘mesh’ is appropriate.

      Reviewer #3:

      Summary:  

      This manuscript develops a new method termed MINT for decoding of behavior. The method is essentially a table-lookup rather than a model. Within a given stereotyped task, MINT tabulates averaged firing rate trajectories of neurons (neural states) and corresponding averaged behavioral trajectories as stereotypes to construct a library. For a test trial with a realized neural trajectory, it then finds the closest neural trajectory to it in the table and declares the associated behavior trajectory in the table as the decoded behavior. The method can also interpolate between these tabulated trajectories. The authors mention that the method is based on three key assumptions: (1) Neural states may not be embedded in a low-dimensional subspace, but rather in a high-dimensional space. (2) Neural trajectories are sparsely distributed under different behavioral conditions. (3) These neural states traverse trajectories in a stereotyped order.

      The authors conducted multiple analyses to validate MINT, demonstrating its decoding of behavioral trajectories in simulations and datasets (Figures 3, 4). The main behavior decoding comparison is shown in Figure 4. In stereotyped tasks, decoding performance is comparable (M_Cycle, MC_Maze) or better (Area 2_Bump) than other linear/nonlinear algorithms (Figure 4). However, MINT underperforms for the MC_RTT task, which is less stereotyped (Figure 4).

      This paper is well-structured and its main idea is clear. The fact that performance on stereotyped tasks is high is interesting and informative, showing that these stereotyped tasks create stereotyped neural trajectories. The task-specific comparisons include various measures and a variety of common decoding approaches, which is a strength. However, I have several major concerns. I believe several of the conclusions in the paper, which are also emphasized in the abstract, are not accurate or supported, especially about generalization, computational scalability, and utility for BCIs. MINT is essentially a table-lookup algorithm based on stereotyped task-dependent trajectories and involves the tabulation of extensive data to build a vast library without modeling. These aspects will limit MINT's utility for real-world BCIs and tasks. These properties will also limit MINT's generalizability from task to task, which is important for BCIs and thus is commonly demonstrated in BCI experiments with other decoders without any retraining. Furthermore, MINT's computational and memory requirements can be prohibitive it seems. Finally, as MINT is based on tabulating data without learning models of data, I am unclear how it will be useful in basic investigations of neural computations. I expand on these concerns below.  

      We thank the reviewer for pointing out weaknesses in our framing and presentation. The comments above made us realize that we needed to 1) better document the ways in which MINT is far more flexible than a lookup-table, and 2) better explain the competing scientific perspectives at play. R3’s comments also motivated us to add an additional analysis of generalization. In our view the manuscript is greatly improved by these additions. Specifically, these additions directly support the broader impact that we hope the study will have.

      For simplicity and readability, we first group and summarize R3’s main concerns in order to better address them. (These main concerns are all raised above, in addition to recurring in the specific comments below. Responses to each individual specific comment are provided after these summaries.)

      (1) R3 raises concerns about ‘computational scalability.’ The concern is that “MINT's computational and memory requirements can be prohibitive.” This point was expanded upon in a specific comment, reproduced below:

      I also find the statement in the abstract and paper that "computations are simple, scalable" to be inaccurate. The authors state that MINT's computational cost is O(NC) only, but it seems this is achieved at a high memory cost as well as computational cost in training. The process is described in section "Lookup table of log-likelihoods" on line [978-990]. The idea is to precompute the log-likelihoods for any combination of all neurons with discretization x all delay/history segments x all conditions and to build a large lookup table for decoding. Basically, the computational cost of precomputing this table is O(V^{Nτ} x TC) and the table requires a memory of O(V^{Nτ}), where V is the number of discretization points for the neural firing rates, N is the number of neurons, τ is the history length, T is the trial length, and C is the number of conditions. This is a very large burden, especially the V^{Nτ} term. This cost is currently not mentioned in the manuscript and should be clarified in the main text. Accordingly, computation claims should be modified including in the abstract.

      The revised manuscript clarifies that our statement (that computations are simple and scalable) is absolutely accurate. There is no need to compute, or store, a massive lookup table. There are three tables: two of modest size and one that is tiny. This is now better explained:

      “Thus, the log-likelihood of the observed spike counts, for a particular current neural state, is simply the sum of many individual log-likelihoods (one per neuron and time-bin). Each individual log-likelihood depends on only two numbers: the firing rate at that moment and the spike count in that bin. To simplify online computation, one can precompute the log-likelihood, under a Poisson model, for every plausible combination of rate and spike-count. For example, a lookup table of size 2001 × 21 is sufficient when considering rates that span 0-200 spikes/s in increments of 0.1 spikes/s, and considering 20 ms bins that contain at most 20 spikes (only one lookup table is ever needed, so long as its firing-rate range exceeds that of the most-active neuron at the most active moment in Ω). Now suppose we are observing a population of 200 neurons, with a 200 ms history divided into ten 20 ms bins. For each library state, the log-likelihood of the observed spike-counts is simply the sum of 200 × 10 = 2000 individual log-likelihoods, each retrieved from the lookup table. In practice, computation is even simpler because many terms can be reused from the last time bin using a recursive solution (Methods). This procedure is lightweight and amenable to real-time applications.”
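      The precomputed Poisson table described in the passage above might look like the following minimal sketch. It is not the published MINT implementation: the names `build_loglik_table` and `state_loglik`, the small floor on the rate (used to avoid taking the log of zero), and the example inputs are assumptions made for illustration, and the recursive reuse of terms mentioned in the text is omitted.

```python
import math
import numpy as np

BIN_S = 0.020                           # 20 ms bins, as in the text
RATE_STEP = 0.1                         # rate resolution in spikes/s
RATES = np.arange(2001) * RATE_STEP     # 0 to 200 spikes/s
COUNTS = np.arange(21)                  # 0 to 20 spikes per bin

def build_loglik_table(rates, counts, bin_s, min_rate=1e-3):
    """Poisson log-likelihood for every (rate, count) combination.

    `min_rate` is an assumed floor so that a zero rate does not give log(0).
    """
    lam = np.maximum(rates, min_rate) * bin_s              # expected count per bin
    log_fact = np.array([math.lgamma(k + 1) for k in counts])
    # log P(k | lam) = k * log(lam) - lam - log(k!)
    return counts[None, :] * np.log(lam)[:, None] - lam[:, None] - log_fact[None, :]

TABLE = build_loglik_table(RATES, COUNTS, BIN_S)           # shape (2001, 21)

def state_loglik(rate_history, spike_counts):
    """Sum of per-neuron, per-bin log-likelihoods for one candidate state.

    rate_history: (n_neurons, n_bins) rates in spikes/s for the candidate state.
    spike_counts: (n_neurons, n_bins) observed integer counts.
    """
    r_idx = np.clip(np.rint(rate_history / RATE_STEP).astype(int), 0, len(RATES) - 1)
    c_idx = np.clip(spike_counts, 0, len(COUNTS) - 1)
    return TABLE[r_idx, c_idx].sum()                       # pure table lookups

# Example from the text: 200 neurons, ten 20 ms bins (2000 lookups).
example_rates = np.full((200, 10), 50.0)                   # assumed rates
example_counts = np.zeros((200, 10), dtype=int)            # assumed observations
example_ll = state_loglik(example_rates, example_counts)
```

      Because each term depends only on a quantized rate and a spike count, a single table serves every neuron and condition, which is the point the passage makes about memory and computation.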

      In summary, the first table simply needs to contain the firing rate of each neuron, for each condition, and each time in that condition. This table consumes relatively little memory. Assuming 100 one-second-long conditions (rates sampled every 20 ms) and 200 neurons, the table would contain 100 x 50 x 200 = 1,000,000 numbers. These numbers are typically stored as 16-bit integers (because rates are quantized), which amounts to about 2 MB. This is modest, given that most computers have (at least) tens of GB of RAM. A second table would contain the values for each behavioral variable, for each condition, and each time in that condition. This table might contain behavioral variables at a finer resolution (e.g. every millisecond) to enable decoding to update in between 20 ms bins (1 ms granularity is not needed for most BCI applications, but is the resolution used in this study). The number of behavioral variables of interest for a particular BCI application is likely to be small, often 1-2, but let’s assume for this example it is 10 (e.g. x-, y-, and z-position, velocity, and acceleration of a limb, plus one other variable). This table would thus contain 100 x 1000 x 10 = 1,000,000 floating point numbers, i.e. an 8 MB table. The third table is used to store the probability of s spikes being observed given a particular quantized firing rate (e.g. it may contain probabilities associated with firing rates ranging from 0 – 200 spikes/s in 0.1 spikes/s increments). This table is not necessary, but saves some computation time by precomputing numbers that will be used repeatedly. This is a very small table (typically ~2000 x 20, i.e. 320 KB). It does not need to be repeated for different neurons or conditions, because Poisson probabilities depend on only rate and count.
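      The memory estimates above can be checked with a few lines of arithmetic. The variable names are illustrative, and the 8-byte width for the floating-point tables is an assumption inferred from the stated totals (8 MB and 320 KB) rather than something the text specifies.

```python
# Sizes taken from the worked example in the text.
n_conditions = 100
duration_s = 1.0
n_neurons = 200
rate_bin_s = 0.020        # rates sampled every 20 ms
behavior_bin_s = 0.001    # behavior sampled every 1 ms
n_behavior_vars = 10

# Table 1: firing rate per neuron, condition, and time point (16-bit integers).
rate_entries = n_conditions * round(duration_s / rate_bin_s) * n_neurons
rate_bytes = rate_entries * 2

# Table 2: behavioral variables per condition and time point
# (8-byte floats assumed).
behavior_entries = n_conditions * round(duration_s / behavior_bin_s) * n_behavior_vars
behavior_bytes = behavior_entries * 8

# Table 3: Poisson probabilities per (quantized rate, spike count)
# (8-byte floats assumed).
likelihood_entries = 2000 * 20
likelihood_bytes = likelihood_entries * 8

total_mb = (rate_bytes + behavior_bytes + likelihood_bytes) / 1e6
print(f"rates: {rate_bytes / 1e6:.2f} MB, behavior: {behavior_bytes / 1e6:.2f} MB, "
      f"likelihoods: {likelihood_bytes / 1e3:.0f} KB, total: {total_mb:.2f} MB")
```

      Under these assumptions the three tables together occupy roughly 10 MB, consistent with the claim that storage is modest.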

      (2) R3 raises a concern that MINT “is essentially a table-lookup rather than a model.” R3 states that MINT

      “is essentially a table-lookup algorithm based on stereotyped task-dependent trajectories and involves the tabulation of extensive data to build a vast library without modeling.”

      and that,

      “as MINT is based on tabulating data without learning models of data, I am unclear how it will be useful in basic investigations of neural computations.”

      This concern is central to most subsequent concerns. The manuscript has been heavily revised to address it. The revisions clarify that MINT is much more flexible than a lookup table, even though MINT uses a lookup table as its first step. Because R3’s concern is intertwined with one’s scientific assumptions, we have also added the new Figure 1 to explicitly illustrate the two key scientific perspectives and their decoding implications. 

      Under the perspective in Figure 1a, R3 would be correct in saying that there exist traditional interpretable decoders (e.g. a Kalman filter) whose assumptions better model the data. Under this perspective, MINT might still be an excellent choice in many cases, but other methods would be expected to gain the advantage when situations demand more flexibility. This is R3’s central concern, and essentially all other concerns flow from it. It makes sense that R3 has this concern, because their comments repeatedly stress a foundational assumption of the perspective in Figure 1a: the assumption of a fixed low-dimensional neural subspace where activity has a reliable relationship to behavior that can be modeled and leveraged during decoding. The phrases below accord with that view:

      “Unlike MINT, these works can achieve generalization because they model the neural subspace and its association to movement.”

      “it will not generalize… even when these tasks cover the exact same space (e.g., the same 2D computer screen and associated neural space).”

      “For proper training, the training data should explore the whole movement space and the associated neural space”

      “I also believe the authors should clarify the logic behind developing MINT better. From a scientific standpoint, we seek to gain insights into neural computations by making various assumptions and building models that parsimoniously describe the vast amount of neural data rather than simply tabulating the data. For instance, low-dimensional assumptions have led to the development of numerous dimensionality reduction algorithms and these models have led to important interpretations about the underlying dynamics”

      Thus, R3 prefers a model that 1) assumes a low-dimensional subspace that is fixed across tasks and 2) assumes a consistent ‘association’ between neural activity and kinematics. Because R3 believes this is the correct model of the data, they believe that decoders should leverage it. Traditional interpretable methods do, and MINT doesn’t, which is why they find MINT to be unprincipled. This is a reasonable view, but it is not our view. We have heavily revised the manuscript to clarify that a major goal of our study is to explore the implications of a different, less-traditional scientific perspective.

      The new Figure 1a illustrates the traditional perspective. Under this perspective, one would agree with R3’s claim that other methods have the opportunity to model the data better. For example, suppose there exists a consistent neural subspace – conserved across tasks – where three neural dimensions encode 3D hand position and three additional neural dimensions encode 3D hand velocity. A traditional method such as a Kalman filter would be a very appropriate choice to model these aspects of the data.

      Figure 1b illustrates the alternative scientific perspective. This perspective arises from recent, present, and to-be-published observations. MINT’s assumptions are well-aligned with this perspective. In contrast, the assumptions of traditional methods (e.g. the Kalman filter) are not well-aligned with the properties of the data under this perspective. This does not mean traditional methods are not useful. Yet under Figure 1b, it is traditional methods, such as the Kalman filter, that lack an accurate model of the data. Of course, the reviewer may disagree with our scientific perspective. We would certainly concede that there is room for debate. However, we find the evidence for Figure 1b to be sufficiently strong that it is worth exploring the utility of methods that align with this scientific perspective. MINT is such a method. As we document, it performs very well.

      Thus, in our view, MINT is quite principled because its assumptions are well aligned with the data. It is true that the features of the data that MINT models are a bit different from those that are traditionally modeled. For example, R3 is quite correct that MINT does not attempt to use a biomimetic model of the true transformation from neural activity, to muscle activity, and thence to kinematics. We see this as a strength, and the manuscript has been revised accordingly (see paragraph beginning with “We leveraged this simulated data to compare MINT with a biomimetic decoder”).

      (3) R3 raises concerns that MINT cannot generalize. This was a major concern of R3 and is intimately related to concern #2 above. The concern is that, if MINT is “essentially a lookup table” that simply selects pre-defined trajectories, then MINT will not be able to generalize. R3 is quite correct that MINT generalizes rather differently than existing methods. Whether this is good or bad depends on one’s scientific perspective. Under Figure 1a, MINT’s generalization would indeed be limiting because other methods could achieve greater flexibility. Under Figure 1b, all methods will have serious limits regarding generalization. Thus, MINT’s method for generalizing may approximate the best one can presently do. To address this concern, we have made three major changes, numbered i-iii below:

      i) Large sections of the manuscript have been restructured to underscore the ways in which MINT can generalize. A major goal was to counter the impression, stated by R3 above, that: 

      “for a test trial with a realized neural trajectory, [MINT] then finds the closest neural trajectory to it in the table and declares the associated behavior trajectory in the table as the decoded behavior”.

      This description is a reasonable way to initially understand how MINT works, and we concede that we may have over-used this intuition. Unfortunately, it can leave the misimpression that MINT decodes by selecting whole trajectories, each corresponding to ‘a behavior’. This can happen, but it needn’t and typically doesn’t. As an example, consider the cycling task. Suppose that the library consists of stereotyped trajectories, each four cycles long, at five fixed speeds from 0.5-2.5 Hz. If the spiking observations argued for it, MINT could decode something close to one of these five stereotyped trajectories. Yet it needn’t. Decoded trajectories will typically resemble library trajectories locally, but may be very different globally. For example, a decoded trajectory could be thirty cycles long (or two, or five hundred) perhaps speeding up and slowing down multiple times across those cycles.

      Thus, the library of trajectories shouldn’t be thought of as specifying a limited set of whole movements that can be ‘selected from’. Rather, trajectories define a scaffolding that outlines where the neural state is likely to live and how it is likely to be changing over time. When we introduce the idea of library trajectories, we are now careful to stress that they don’t function as a set from which one trajectory is ‘declared’ to be the right one:

      “We thus designed MINT to approximate that manifold using the trajectories themselves, rather than their covariance matrix or corresponding subspace. Unlike a covariance matrix, neural trajectories indicate not only which states are likely, but also which state-derivatives are likely. If a neural state is near previously observed states, it should be moving in a similar direction. MINT leverages this directionality.

      Training-set trajectories can take various forms, depending on what is convenient to collect. Most simply, training data might include one trajectory per condition, with each condition corresponding to a discrete movement. Alternatively, one might instead employ one long trajectory spanning many movements. Another option is to employ many sub-trajectories, each briefer than a whole movement. The goal is simply for training-set trajectories to act as a scaffolding, outlining the manifold that might be occupied during decoding and the directions in which decoded trajectories are likely to be traveling.”

      Later in that same section we stress that decoded trajectories can move along the ‘mesh’ in non-stereotyped ways:

      “Although the mesh is formed of stereotyped trajectories, decoded trajectories can move along the mesh in non-stereotyped ways as long as they generally obey the flow-field implied by the training data. This flexibility supports many types of generalization, including generalization that is compositional in nature. Other types of generalization – e.g. from the green trajectories to the orange trajectories in Figure 1b – are unavailable when using MINT and are expected to be challenging for any method (as will be documented in a later section).”

      The section “Training and decoding using MINT” has been revised to clarify the ways in which interpolation is flexible, allowing decoded movements to be globally very different from any library trajectory.

      “To decode stereotyped trajectories, one could simply obtain the maximum-likelihood neural state from the library, then render a behavioral decode based on the behavioral state with the same values of c and k. This would be appropriate for applications in which conditions are categorical, such as typing or handwriting. Yet in most cases we wish for the trajectory library to serve not as an exhaustive set of possible states, but as a scaffolding for the mesh of possible states. MINT’s operations are thus designed to estimate any neural trajectory – and any corresponding behavioral trajectory – that moves along the mesh in a manner generally consistent with the trajectories in Ω.”

      “…interpolation allows considerable flexibility. Not only is one not ‘stuck’ on a trajectory from Φ, one is also not stuck on trajectories created by weighted averaging of trajectories in Φ. For example, if cycling speed increases, the decoded neural state could move steadily up a scaffolding like that illustrated in Figure 1b (green). In such cases, the decoded trajectory might be very different in duration from any of the library trajectories. Thus, one should not think of the library as a set of possible trajectories that are selected from, but rather as providing a mesh-like scaffolding that defines where future neural states are likely to live and the likely direction of their local motion. The decoded trajectory may differ considerably from any trajectory within Ω.”
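      The two decoding modes described in the quoted passages (a pure lookup that returns the behavior sharing the same indices c and k, and interpolation between library states) can be illustrated with a minimal sketch. This is not our implementation: it assumes independent Poisson spiking, uses a single time bin rather than a spike-count history, and replaces the actual interpolation procedure with a simple grid search over a mixing weight. The function names and array layouts are hypothetical.

```python
import numpy as np

def poisson_loglik(counts, rates):
    """Poisson log-likelihood of counts under rates, summed over neurons.

    The log(k!) term is dropped because it does not depend on the candidate
    state and therefore does not affect the argmax.
    """
    rates = np.maximum(rates, 1e-9)  # avoid log(0)
    return np.sum(counts * np.log(rates) - rates, axis=-1)

def decode_lookup(counts, neural_lib, behavior_lib):
    """Pick the maximum-likelihood library state and return its behavior.

    neural_lib:   (C, K, N) expected spike counts per condition c and time index k
    behavior_lib: (C, K, M) behavioral state stored at the same (c, k)
    """
    ll = poisson_loglik(counts, neural_lib)           # shape (C, K)
    c, k = np.unravel_index(np.argmax(ll), ll.shape)
    return c, k, behavior_lib[c, k]

def interpolate_states(counts, x1, x2, z1, z2, n_grid=101):
    """Interpolate between two candidate neural states (x1, x2).

    Assumption for illustration: the mixing weight is found by grid search,
    and the same weight is applied to the behavioral states (z1, z2).
    """
    alphas = np.linspace(0.0, 1.0, n_grid)
    candidates = (1 - alphas)[:, None] * x1 + alphas[:, None] * x2
    alpha = alphas[np.argmax(poisson_loglik(counts, candidates))]
    return alpha, (1 - alpha) * z1 + alpha * z2
```

      In this sketch the interpolated state generally lies between library trajectories rather than on any one of them, which is the sense in which the library acts as a scaffolding rather than a menu.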

      This flexibility is indeed used during movement. One empirical example is described in detail:

      “During movement… angular phase was decoded with effectively no net drift over time. This is noteworthy because angular velocity on test trials never perfectly matched any of the trajectories in Φ. Thus, if decoding were restricted to a library trajectory, one would expect growing phase discrepancies. Yet decoded trajectories only need to locally (and approximately) follow the flow-field defined by the library trajectories. Based on incoming spiking observations, decoded trajectories speed up or slow down (within limits).

      This decoding flexibility presumably relates to the fact that the decoded neural state is allowed to differ from the nearest state in Ω. To explore… [the text goes on to describe the new analysis in Figure 4d, which shows that the decoded state is typically not on any trajectory, though it is typically close to a trajectory].”

      Thus, MINT’s operations allow considerable flexibility, including generalization that is compositional in nature. Yet R3 is still correct that there are other forms of generalization that are unavailable to MINT. This is now stressed at multiple points in the revision. However, under the perspective in Figure 1b, these forms of generalization are unavailable to any current method. Hence we made a second major change in response to this concern…

      ii) We explicitly illustrate how the structure of the data determines when generalization is or isn’t possible. The new Figure 1a,b introduces the two perspectives, and the new Figure 6a,b lays out their implications for generalization. Under the perspective in Figure 6a, the reviewer is quite right: other methods can generalize in ways that MINT cannot. Under the perspective in Figure 6b, expectations are very different. Those expectations make testable predictions. Hence the third major change…

      iii) We have added an analysis of generalization, using a newly collected dataset. This dataset was collected using Neuropixels Probes during our Pac-Man force-tracking task. This dataset was chosen because it is unusually well-suited to distinguishing the predictions in Figure 6a versus Figure 6b. Finding a dataset that can do so is not simple. Consider R3’s point that training data should “explore the whole movement space and the associated neural space”. The physical simplicity of the Pac-Man task makes it unusually easy to confirm that the behavioral workspace has been fully explored. Importantly, under Figure 6b, this does not mean that the neural workspace has been fully explored, which is exactly what we wish to test when testing generalization. We do so, and compare MINT with a Wiener filter. A Wiener filter is an ideal comparison because it is simple, performs very well on this task, and should be able to generalize well under Figure 1a. Additionally, the Wiener filter (unlike the Kalman Filter) doesn’t leverage the assumption that neural activity reflects the derivative of force. This matters because we find that neural activity does not reflect dforce/dt in this task. The Wiener filter is thus the most natural choice of the interpretable methods whose assumptions match Figure 1a.
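      For readers less familiar with the comparison method, a Wiener filter in this setting amounts to linear regression from a short history of binned spike counts onto the decoded variable. The sketch below is a generic version under that assumption; the ridge penalty, the function names, and the array shapes are illustrative choices and not the exact settings used in the manuscript.

```python
import numpy as np

def lagged_design(spikes, n_lags):
    """Stack the current bin and the previous n_lags - 1 bins into one row.

    spikes: (T, N) binned spike counts.
    Returns a (T - n_lags + 1, N * n_lags + 1) matrix with a trailing bias column.
    """
    T, _ = spikes.shape
    rows = [spikes[t - n_lags + 1 : t + 1].ravel() for t in range(n_lags - 1, T)]
    X = np.asarray(rows)
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_wiener(spikes, behavior, n_lags, ridge=1e-3):
    """Ridge-regularized least squares from lagged spikes to behavior."""
    X = lagged_design(spikes, n_lags)
    Y = behavior[n_lags - 1 :]          # align behavior with the last bin of each row
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)

def predict_wiener(spikes, W, n_lags):
    """Apply fitted weights to new spike counts."""
    return lagged_design(spikes, n_lags) @ W
```

      Because the mapping is a fixed linear readout, it can extrapolate to new trajectories only insofar as the same neural dimensions carry the decoded variable, which is the assumption that distinguishes Figure 6a from Figure 6b.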

      The new analysis is described in Figure 6c-g and accompanying text. Results are consistent with the predictions of Figure 6b. We are pleased to have been motivated to add this analysis for two reasons. First, it provides an additional way of evaluating the predictions of the two competing scientific perspectives that are at the heart of our study. Second, this analysis illustrates an underappreciated way in which generalization is likely to be challenging for any decode method. It can be tempting to think that the main challenge regarding generalization is to fully explore the relevant behavioral space. This makes sense if a behavioral space has “an associated neural space”. However, we are increasingly of the opinion that it doesn’t. Different tasks often involve different neural subspaces, even when behavioral subspaces overlap. We have even seen situations where motor output is identical but neural subspaces are quite different. These facts are relevant to any decoder, something highlighted in the revised Introduction:

      “MINT’s performance confirms that there are gains to be made by building decoders whose assumptions match a different, possibly more accurate view of population activity. At the same time, our results suggest fundamental limits on decoder generalization. Under the assumptions in Figure 1b, it will sometimes be difficult or impossible for decoders to generalize to not-yet-seen tasks. We found that this was true regardless of whether one uses MINT or a more traditional method. This finding has implications regarding when and how generalization should be attempted.”

      We have also added an analysis (Figure 6e) illustrating how MINT’s ability to compute likelihoods can be useful in detecting situations that may strain generalization (for any method). MINT is unusual in being able to compute and use likelihoods in this way.
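      The general idea of using likelihoods as a warning signal can be sketched as follows. This is a simplified illustration, not the analysis in Figure 6e: it assumes independent Poisson spiking, a flattened library of states, and a percentile threshold derived from reference data. All of these choices, and the function names, are hypothetical.

```python
import math
import numpy as np

def max_loglik(counts, neural_lib):
    """Best full Poisson log-likelihood over library states, per observation.

    counts:     (T, N) spike counts
    neural_lib: (S, N) expected counts for each library state
    The log(k!) term is kept because values are compared across observations.
    """
    rates = np.maximum(neural_lib, 1e-9)
    ll = np.sum(counts[:, None, :] * np.log(rates)[None] - rates[None], axis=-1)
    log_fact = np.vectorize(math.lgamma)(counts + 1.0).sum(axis=-1)
    return (ll - log_fact[:, None]).max(axis=1)

def flag_low_likelihood(test_counts, ref_counts, neural_lib, pct=1.0):
    """Flag observations that the library explains worse than nearly all reference data."""
    threshold = np.percentile(max_loglik(ref_counts, neural_lib), pct)
    return max_loglik(test_counts, neural_lib) < threshold
```

      Observations flagged in this way are ones for which no library state provides a plausible account of the spikes, which is the situation in which any decoder's output should be treated with caution.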

      Detailed responses to R3: we reproduce each of R3’s specific concerns below, but concentrate our responses on issues not already covered above.

      Main comments: 

      Comment 1. MINT does not generalize to different tasks, which is a main limitation for BCI utility compared with prior BCI decoders that have shown this generalizability as I review below. Specifically, given that MINT tabulates task-specific trajectories, it will not generalize to tasks that are not seen in the training data even when these tasks cover the exact same space (e.g., the same 2D computer screen and associated neural space). 

      First, the authors provide a section on generalization, which is inaccurate because it mixes up two fundamentally different concepts: 1) collecting informative training data and 2) generalizing from task to task. The former is critical for any algorithm, but it does not imply the latter. For example, removing one direction of cycling from the training set as the authors do here is an example of generating poor training data because the two behavioral (and neural) directions are non-overlapping and/or orthogonal while being in the same space. As such, it is fully expected that all methods will fail. For proper training, the training data should explore the whole movement space and the associated neural space, but this does not mean all kinds of tasks performed in that space must be included in the training set (something MINT likely needs while modeling-based approaches do not). Many BCI studies have indeed shown this generalization ability using a model. For example, in Weiss et al. 2019, center-out reaching tasks are used for training and then the same trained decoder is used for typing on a keyboard or drawing on the 2D screen. In Gilja et al. 2012, training is on a center-out task but the same trained decoder generalizes to a completely different pinball task (hit four consecutive targets) and tasks requiring the avoidance of obstacles and curved movements. There are many more BCI studies, such as Jarosiewicz et al. 2015, that also show generalization to complex real-world tasks not included in the training set. Unlike MINT, these works can achieve generalization because they model the neural subspace and its association to movement. On the contrary, MINT models task-dependent neural trajectories, so the trained decoder is very task-dependent and cannot generalize to other tasks. So, unlike these prior BCI methods, MINT will likely actually need to include every task in its library, which is not practical. 

      I suggest the authors remove claims of generalization and modify their arguments throughout the text and abstract. The generalization section needs to be substantially edited to clarify the above points. Please also provide the BCI citations and discuss the above limitation of MINT for BCIs. 

      As discussed above, R3’s concerns are accurate under the view in Figure 1a (and the corresponding Figure 6a). Under this view, a method such as that in Gilja et al. or Jarosiewicz et al. can find the correct subspace, model the correct neuron-behavior correlations, and generalize to any task that uses “the same 2D computer screen and associated neural space”, just as the reviewer argues. Under Figure 1b things are quite different.

      This topic – and the changes we have made to address it – is covered at length above. Here we simply want to highlight an empirical finding: sometimes two tasks use the same neural subspace and sometimes they don’t. We have seen both in recent data, and it can be very non-obvious which will occur based just on behavior. It does not simply relate to whether one is using the same physical workspace. We have even seen situations where the patterns of muscle activity in two tasks are nearly identical, but the neural subspaces are fairly different. When a new task uses a new subspace, neither of the methods noted above (Gilja et al. or Jarosiewicz et al.) will generalize (nor will MINT). Generalizing to a new subspace is basically impossible without some yet-to-be-invented approach. On the other hand, there are many other pairs of tasks (center-out-reaching versus some other 2D cursor control) where subspaces are likely to be similar, especially if the frequency content of the behavior is similar (in our recent experience this is often critical). When subspaces are shared, most methods will generalize, and that is presumably why generalization worked well in the studies noted above.

      Although MINT can also generalize in such circumstances, R3 is correct that, under the perspective in Figure 1a, MINT will be more limited than other methods. This is now carefully illustrated in Figure 6a. In this traditional perspective, MINT will fail to generalize in cases where new trajectories are near previously observed states, yet move in very different ways from library trajectories. The reason we don’t view this as a shortcoming is that we expect it to occur rarely (else tangling would be high). We thus anticipate the scenario in Figure 6b.

      This is worth stressing because R3 states that our discussion of generalization “is inaccurate because it mixes up two fundamentally different concepts: 1) collecting informative training data and 2) generalizing from task to task.” We have heavily revised this section and improved it. However, it was never inaccurate. Under Figure 6b, these two concepts absolutely are mixed up. If different tasks use different neural subspaces, then this requires collecting different “informative training data” for each. One cannot simply count on having explored the physical workspace.

      Comment 2. MINT is shown to achieve competitive/high performance in highly stereotyped datasets with structured trials, but worse performance on MC_RTT, which is not based on repeated trials and is less stereotyped. This shows that MINT is valuable for decoding in repetitive stereotyped use-cases. However, it also highlights a limitation of MINT for BCIs, which is that MINT may not work well for real-world and/or less-constrained setups such as typing, moving a robotic arm in 3D space, etc. This is again due to MINT being a lookup table with a library of stereotyped trajectories rather than a model. Indeed, the authors acknowledge that the lower performance on MC_RTT (Figure 4) may be caused by the lack of repeated trials of the same type. However, real-world BCI decoding scenarios will also not have such stereotyped trial structure and will be less/un-constrained, in which MINT underperforms. Thus, the claim in the abstract or lines 480-481 that MINT is an "excellent" candidate for clinical BCI applications is not accurate and needs to be qualified. The authors should revise their statements accordingly and discuss this issue. They should also make the use-case of MINT on BCI decoding clearer and more convincing. 

      We discussed, above, multiple changes and additions to the revision that were made to address these concerns. Here we briefly expand on the comment that MINT achieves “worse performance on MC_RTT, which is not based on repeated trials and is less stereotyped”. All decoders performed poorly on this task. MINT still outperformed the two traditional methods, but this was the only dataset where MINT did not also perform better (overall) than the expressive GRU and feedforward network. There are probably multiple reasons why. We agree with R3 that one likely reason is that this dataset is straining generalization, and MINT may have felt this strain more than the two machine-learning-based methods. Another potential reason is the structure of the training data, which made it more challenging to obtain library trajectories in the first place. Importantly, these observations do not support the view in Figure 1a. MINT still outperformed the Kalman and Wiener filters (whose assumptions align with Fig. 1a). To make these points we have added the following:

      “Decoding was acceptable, but noticeably worse, for the MC_RTT dataset… As will be discussed below, every decode method achieved its worst estimates of velocity for the MC_RTT dataset. In addition to the impact of slower reaches, MINT was likely impacted by training data that made it challenging to accurately estimate library trajectories. Due to the lack of repeated trials, MINT used AutoLFADS to estimate the neural state during training. In principle this should work well. In practice AutoLFADS may have been limited by having only 10 minutes of training data. Because the random-target task involved more variable reaches, it may also have stressed the ability of all methods to generalize, perhaps for the reasons illustrated in Figure 1b.

      The only dataset where MINT did not perform the best overall was the MC_RTT dataset, where it was outperformed by the feedforward network and GRU. As noted above, this may relate to the need for MINT to learn neural trajectories from training data that lacked repeated trials of the same movement (a design choice one might wish to avoid). Alternatively, the less-structured MC_RTT dataset may strain the capacity to generalize; all methods experienced a drop in velocity-decoding R^2 for this dataset compared to the others. MINT generalizes somewhat differently than other methods, and may have been at a modest disadvantage for this dataset. A strong version of this possibility is that perhaps the perspective in Figure 1a is correct, in which case MINT might struggle because it cannot use forms of generalization that are available to other methods (e.g. generalization based on neuron-velocity correlations). This strong version seems unlikely; MINT continued to significantly outperform the Wiener and Kalman filters, which make assumptions aligned with Figure 1a.”

      Comment 3. Related to 2, it may also be that MINT achieves competitive performance in offline and trial-based stereotyped decoding by overfitting to the trial structure in a given task, and thus may not generalize well to online performance due to overfitting. For example, a recent work showed that offline decoding performance may be overfitted to the task structure and may not represent online performance (Deo et al. 2023). Please discuss. 

      We agree that a limitation of our study is that we do not test online performance. There are sensible reasons for this decision:

      “By necessity and desire, all comparisons were made offline, enabling benchmarked performance across a variety of tasks and decoded variables, where each decoder had access to the exact same data and recording conditions.”

      We recently reported excellent online performance in the cycling task with a different algorithm (Schroeder et al. 2022). In the course of that study, we consistently found that improvements in our offline decoding translated to improvements in our online decoding. We thus believe that MINT (which improves on the offline performance of our older algorithm) is a good candidate to work very well online. Yet we agree this still remains to be seen. We have added the following to the Discussion:

      “With that goal in mind, there exist three important practical considerations. First, some decode algorithms experience a performance drop when used online. One presumed reason is that, when decoding is imperfect, the participant alters their strategy which in turn alters the neural responses upon which decoding is based. Because MINT produces particularly accurate decoding, this effect may be minimized, but this cannot be known in advance. If a performance drop does indeed occur, one could adapt the known solution of retraining using data collected during online decoding [13]. Another presumed reason (for a gap between offline and online decoding) is that offline decoders can overfit the temporal structure in training data [107]. This concern is somewhat mitigated by MINT’s use of a short spike-count history, but MINT may nevertheless benefit from data augmentation strategies such as including time-dilated versions of learned trajectories in the libraries.”
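      One simple way such an augmentation could be realized is to resample each library trajectory onto a stretched or compressed time base. The sketch below is only an illustration of that idea, using linear interpolation; the function name is hypothetical and we have not evaluated this particular procedure.

```python
import numpy as np

def time_dilate(traj, factor):
    """Resample a trajectory so that it lasts `factor` times as many samples.

    traj:   (K, D) array, K time samples of a D-dimensional state
    factor: > 1 slows the trajectory down, < 1 speeds it up
    Note: any velocity-like behavioral variables would additionally need to be
    rescaled by 1 / factor; that step is omitted here.
    """
    K, D = traj.shape
    new_K = max(2, int(round(K * factor)))
    old_t = np.linspace(0.0, 1.0, K)
    new_t = np.linspace(0.0, 1.0, new_K)
    return np.stack([np.interp(new_t, old_t, traj[:, d]) for d in range(D)], axis=1)
```

      Dilated copies could then be appended to the library alongside the original trajectories, for example with factors of 0.8 and 1.25.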

      Comment 4. Related to 2, since MINT requires firing rates to generate the library and simple averaging does not work for this purpose in the MC_RTT dataset (that does not have repeated trials), the authors needed to use AutoLFADS to infer the underlying firing rates. The fact that MINT requires the usage of another model to be constructed first and that this model can be computationally complex, will also be a limiting factor and should be clarified. 

      This concern relates to the computational complexity of computing firing-rate trajectories during training. Usually, rates are estimated via trial-averaging, which makes MINT very fast to train. This was quite noticeable during the Neural Latents Benchmark competition. As one example, for the “MC_Scaling 5 ms Phase”, MINT took 28 seconds to train while GPFA took 30 minutes, the transformer baseline (NDT) took 3.5 hours, and the switching nonlinear dynamical system took 4.5 hours.

      However, the reviewer is quite correct that MINT’s efficiency depends on the method used to construct the library of trajectories. As we note, “MINT is a method for leveraging a trajectory library, not a method for constructing it”. One can use trial-averaging, which is very fast. One can also use fancier, slower methods to compute the trajectories. We don’t view this as a negative – it simply provides options. Usually one would choose trial-averaging, but one does not have to. In the case of MC_RTT, one has a choice between LFADS and grouping into pseudo-conditions and averaging (which is fast). LFADS produces higher performance at the cost of being slower. The operator can choose which they prefer. This is discussed in the following section:

      “For MINT, ‘training’ simply means computation of standard quantities (e.g. firing rates) rather than parameter optimization. MINT is thus typically very fast to train (Table 1), on the order of seconds using generic hardware (no GPUs). This speed reflects the simple operations involved in constructing the library of neural-state trajectories: filtering of spikes and averaging across trials. At the same time we stress that MINT is a method for leveraging a trajectory library, not a method for constructing it. One may sometimes wish to use alternatives to trial-averaging, either of necessity or because they improve trajectory estimates. For example, for the MC_RTT task we used AutoLFADS to infer the library. Training was consequently much slower (hours rather than seconds) because of the time taken to estimate rates. Training time could be reduced back to seconds using a different approach – grouping into pseudo-conditions and averaging – but performance was reduced. Thus, training will typically be very fast, but one may choose time-consuming methods when appropriate.”
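      To make concrete why this form of training is fast, the two operations named in the quoted passage (filtering of spikes and averaging across trials) can be sketched in a few lines. The Gaussian kernel, its width, and the array layout are assumptions made for illustration rather than the exact settings used in the manuscript.

```python
import numpy as np

def estimate_rates(spikes, sigma_bins):
    """Trial-average binned spikes, then smooth over time with a Gaussian kernel.

    spikes:     (trials, T, N) binned spike counts for one condition
    sigma_bins: kernel standard deviation, in bins
    Returns a (T, N) array of smoothed expected counts per bin.
    T should exceed the kernel length (about 8 * sigma_bins + 1 bins).
    """
    mean_counts = spikes.mean(axis=0)                  # average across trials
    half = int(np.ceil(4 * sigma_bins))
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma_bins) ** 2)
    kernel /= kernel.sum()                             # unit area preserves mean rate

    def smooth(x):
        return np.convolve(x, kernel, mode="same")     # zero-padded at the edges

    return np.apply_along_axis(smooth, 0, mean_counts)
```

      No parameters are optimized: the cost is one average and one convolution per neuron and condition, which is consistent with training times on the order of seconds.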

      Comment 5. I also find the statement in the abstract and paper that "computations are simple, scalable" to be inaccurate. The authors state that MINT's computational cost is O(NC) only, but it seems this is achieved at a high memory cost as well as computational cost in training. The process is described in section "Lookup table of log-likelihoods" on line [978-990]. The idea is to precompute the log-likelihoods for any combination of all neurons with discretization x all delay/history segments x all conditions and to build a large lookup table for decoding. Basically, the computational cost of precomputing this table is O(V^{Nτ} x TC) and the table requires a memory of O(V^{Nτ}), where V is the number of discretization points for the neural firing rates, N is the number of neurons, τ is the history length, T is the trial length, and C is the number of conditions. This is a very large burden, especially the V^{Nτ} term. This cost is currently not mentioned in the manuscript and should be clarified in the main text. Accordingly, computation claims should be modified including in the abstract. 

      As discussed above, the manuscript has been revised to clarify that our statement was accurate.

      Comment 6. In addition to the above technical concerns, I also believe the authors should clarify the logic behind developing MINT better. From a scientific standpoint, we seek to gain insights into neural computations by making various assumptions and building models that parsimoniously describe the vast amount of neural data rather than simply tabulating the data. For instance, low-dimensional assumptions have led to the development of numerous dimensionality reduction algorithms and these models have led to important interpretations about the underlying dynamics (e.g., fixed points/limit cycles). While it is of course valid and even insightful to propose different assumptions from existing models as the authors do here, they do not actually translate these assumptions into a new model. Without a model and by just tabulating the data, I don't believe we can provide interpretation or advance the understanding of the fundamentals behind neural computations. As such, I am not clear as to how this library building approach can advance neuroscience or how these assumptions are useful. I think the authors should clarify and discuss this point. 

      As requested, a major goal of the revision has been to clarify the scientific motivations underlying MINT’s design. In addition to many textual changes, we have added figures (Figures 1a,b and 6a,b) to outline the two competing scientific perspectives that presently exist. This topic is also addressed by extensions of existing analyses and by new analyses (e.g. Figure 6c-g). 

      In our view these additions have dramatically improved the manuscript. This is especially true because we think R3’s concerns, expressed above, are reasonable. If the perspective in Figure 1a is correct, then R3 is right and MINT is essentially a hack that fails to model the data. MINT would still be effective in many circumstances (as we show), but it would be unprincipled. This would create limitations, just as the reviewer argues. On the other hand, if the perspective in Figure 1b is correct, then MINT is quite principled relative to traditional approaches. Traditional approaches make assumptions (a fixed subspace, consistent neuron-kinematic correlations) that are not correct under Figure 1b.

      We don’t expect R3 to agree with our scientific perspective at this time (though we hope to eventually convince them). To us, the key is that we agree with R3 that the manuscript needs to lay out the different perspectives and their implications, so that readers have a good sense of the possibilities they should be considering. The revised manuscript is greatly improved in this regard.

      Comment 7. Related to 6, there seems to be a logical inconsistency between the operations of MINT and one of its three assumptions, namely, sparsity. The authors state that neural states are sparsely distributed in some neural dimensions (Figure 1a, bottom). If this is the case, then why does MINT extend its decoding scope by interpolating known neural states (and behavior) in the training library? This interpolation suggests that the neural states are dense on the manifold rather than sparse, thus being contradictory to the assumption made. If interpolation-based dense meshes/manifolds underlie the data, then why not model the neural states through the subspace or manifold representations? I think the authors should address this logical inconsistency in MINT, especially since this sparsity assumption also questions the low-dimensional subspace/manifold assumption that is commonly made. 

      We agree this is an important issue, and have added an analysis on this topic (Figure 4d). The key question is simple and empirical: during decoding, does interpolation cause MINT to violate the assumption of sparsity? R3 is quite right that in principle it could. If spiking observations argue for it, MINT’s interpolation could create a dense manifold during decoding rather than a sparse one. The short answer is that empirically this does not happen, in agreement with expectations under Figure 1b. Rather than interpolating between distant states and filling in large ‘voids’, interpolation is consistently local. This is a feature of the data, not of the decoder (MINT doesn’t insist upon sparsity, even though it is designed to work best in situations where the manifold is sparse).

      In addition to adding Figure 4d, we added the following (in an earlier section):

      “The term mesh is apt because, if MINT’s assumptions are correct, interpolation will almost always be local. If so, the set of decodable states will resemble a mesh, created by line segments connecting nearby training-set trajectories. However, this mesh-like structure is not enforced by MINT’s operations. Interpolation could, in principle, create state-distributions that depart from the assumption of a sparse manifold. For example, interpolation could fill in the center of the green tube in Figure 1b, resulting in a solid manifold rather than a mesh around its outer surface. However, this would occur only if spiking observations argued for it. As will be documented below, we find that essentially all interpolation is local.”
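
      To make the notion of local interpolation concrete, the following is a minimal sketch and not MINT's actual implementation. It assumes two nearby candidate states given as expected spike counts (the hypothetical names `rates_a` and `rates_b`), considers only states on the line segment between them, and picks the interpolation weight by a grid search over Poisson log-likelihood. The grid search, the function names, and the toy numbers are all illustrative assumptions.

```python
import math

def poisson_loglik(counts, rates):
    """Poisson log-likelihood of spike counts given expected counts (all rates > 0)."""
    return sum(c * math.log(r) - r - math.lgamma(c + 1) for c, r in zip(counts, rates))

def interpolate(rates_a, rates_b, alpha):
    """State on the line segment between two states (alpha=0 gives a, alpha=1 gives b)."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(rates_a, rates_b)]

def best_alpha(counts, rates_a, rates_b, n_grid=101):
    """Grid search (an illustrative choice) for the most likely interpolated state."""
    alphas = [i / (n_grid - 1) for i in range(n_grid)]
    return max(alphas, key=lambda a: poisson_loglik(counts, interpolate(rates_a, rates_b, a)))

# Toy example: three neurons, two neighbouring candidate states (made-up values).
rates_a = [2.0, 8.0, 4.0]
rates_b = [6.0, 4.0, 4.0]
counts = [4, 6, 4]  # observed spike counts lying between the two states
alpha = best_alpha(counts, rates_a, rates_b)
```

      In this sketch the decodable states form a line segment between two library states, so connecting many nearby pairs in this way would yield a mesh-like set rather than a filled volume.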

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I appreciate the detailed methods section, however, more specifics should be integrated into the main text. For example on Line 238, it should additionally be stated how many minutes were used for training and metrics like the MAE which is used later should be reported here.

      Thank you for this suggestion. We now report the duration of training data in the main text:

      “Decoding R^2 was .968 over ~7.1 minutes of test trials based on ~4.4 minutes of training data.”

      We have also added similar specifics throughout the manuscript, e.g. in the Fig. 5 legend:

      “Results are based on the following numbers of training / test trials: MC_Cycle (174 train, 99 test), MC_Maze (1721 train, 574 test), Area2_Bump (272 train, 92 test), MC_RTT (810 train, 268 test).”

      Similar additions were made to the legends for Fig. 6 and 8. Regarding the request to add MAE for the multitask network, we did not do so for the simple reason that the decoded variable (muscle activity) has arbitrary units. The raw MAE is thus not meaningful. We could of course have normalized, but at this point the MAE is largely redundant with the correlation. In contrast, the MAE is useful when comparing across the MC_Maze, Area2_Bump, and MC_RTT datasets, because they all involve the same scale (cm/s).

      Regarding the MC_RTT task, AutoLFADS was used to obtain robust spike rates, as reported in the methods. However, the rationale for splitting the neural trajectories after AutoLFADS is unclear. If the trajectories were split based on random recording gaps, this might lead to suboptimal performance? It might be advantageous to split them based on a common behavioural state? 

      When learning neural trajectories via AutoLFADS, spiking data is broken into short (but overlapping) segments, rates are estimated for each segment via AutoLFADS, and these rates are then stitched together across segments into long neural trajectories. If there had been no recording gaps, these rates could have been stitched into a single neural trajectory for this dataset. However, the presence of recording gaps left us no choice but to stitch together these rates into more than one trajectory. Fortunately, recording gaps were rare: for the decoding analysis of MC_RTT there were only two recording gaps and therefore three neural trajectories, each ~2.7 minutes in duration. 

      We agree that in general it is desirable to learn neural trajectories that begin and end at behaviorally-relevant moments (e.g. in between movements). However, having these trajectories potentially end mid-movement is not an issue in and of itself. During decoding, MINT is never stuck on a trajectory. Thus, if MINT were decoding states near the end of a trajectory that was cut short due to a recording gap, it would simply begin decoding states from other trajectories or elsewhere along the same trajectory in subsequent moments. We could have further trimmed the three neural trajectories to begin and end at behaviorally-relevant moments, but chose not to as this would have only removed a handful of potentially useful states from the library.

      We now describe this in the Methods:

      “Although one might prefer trajectory boundaries to begin and end at behaviorally relevant moments (e.g. a stationary state), rather than at recording gaps, the exact boundary points are unlikely to be consequential for trajectories of this length that span multiple movements. If MINT estimates a state near the end of a long trajectory, its estimate will simply jump to another likely state on a different trajectory (or earlier along the same trajectory) in subsequent moments. Clipping the end of each trajectory to an earlier behaviorally-relevant moment would only remove potentially useful states from the libraries.”

      Are the training and execution times in Table 1 based on pure Matlab functions or Mex files? If it's Mex files as suggested by the code, it would be good to mention this in the Table caption.

      They are based on a combination of MATLAB and MEX files. This is now clarified in the table caption:

      “Timing measurements taken on a Macbook Pro (on CPU) with 32GB RAM and a 2.3 GHz 8-Core Intel Core i9 processor. Training and execution code used for measurements was written in MATLAB (with the core recursion implemented as a MEX file).”

      As the method most closely resembles a Bayesian decoder it would be good to compare performance against a Naive Bayes decoder. 

      We agree and have now done so. The following has been added to the text:

      “A natural question is thus whether a simpler Bayesian decoder would have yielded similar results. We explored this possibility by testing a Naïve Bayes regression decoder [85] using the MC_Maze dataset. This decoder performed poorly, especially when decoding velocity (R^2 = .688 and .093 for hand position and velocity, respectively), indicating that the specific modeling assumptions that differentiate MINT from a naive Bayesian decoder are important drivers of MINT’s performance.”

      Line 199 Typo: The assumption of stereotypy trajectory also enables neural states (and decoded behaviors) to be updated in between time bins. 

      Fixed

      Table 3: It's unclear why the Gaussian binning varies significantly across different datasets. Could the authors explain why this is the case and what its implications might be? 

      We have added the following description in the “Filtering, extracting, and warping data on each trial” subsection of the Methods to discuss how 𝜎 may vary due to the number of trials available for training and how noisy the neural data for those trials is:

      “First, spiking activity for each neuron on each trial was temporally filtered with a Gaussian to yield single-trial rates. Table 3 reports the Gaussian standard deviations σ (in milliseconds) used for each dataset. Larger values of σ utilize broader windows of spiking activity when estimating rates and therefore reduce variability in those rate estimates. However, large σ values also yield neural trajectories with less fine-grained temporal structure. Thus, the optimal σ for a dataset depends on how variable the rate estimates otherwise are.”
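
      The trade-off described above can be illustrated with a short sketch. It is not the authors' MATLAB code; it assumes 1 ms bins, a Gaussian kernel truncated at four standard deviations, and zero-padding at the trial edges, all of which are illustrative choices. The function names are hypothetical.

```python
import math

def gaussian_kernel(sigma_ms, bin_ms=1.0, n_sd=4):
    """Discrete Gaussian kernel, truncated at n_sd standard deviations and normalized to sum to 1."""
    half = int(math.ceil(n_sd * sigma_ms / bin_ms))
    weights = [math.exp(-0.5 * (k * bin_ms / sigma_ms) ** 2) for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_spikes(spikes, sigma_ms, bin_ms=1.0):
    """Convolve binned spike counts with a Gaussian; returns single-trial rates in spikes/s."""
    kernel = gaussian_kernel(sigma_ms, bin_ms)
    half = len(kernel) // 2
    n = len(spikes)
    rates = []
    for t in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + k - half
            if 0 <= idx < n:  # bins beyond the trial edges are treated as zero
                acc += w * spikes[idx]
        rates.append(acc / (bin_ms / 1000.0))
    return rates

# A single spike in the middle of a 200 ms trial, smoothed with two kernel widths.
spikes = [0] * 200
spikes[100] = 1
narrow = smooth_spikes(spikes, sigma_ms=5)
broad = smooth_spikes(spikes, sigma_ms=25)
```

      With the larger σ the same spike is spread over a wider window, so the rate estimate is less variable but also has a lower peak and coarser temporal structure.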

      An implementation of the method in an open-source programming language could further enhance the widespread use of the tool. 

      We agree this would be useful, but have not yet implemented the method in any other programming language. Implementation in Python is still a future goal.

      Reviewer #2 (Recommendations For The Authors): 

      - Figures 4 and 5 should show the error bars on the horizontal axis rather than portraying them vertically. 

      [Note that these are now Figures 5 and 6]

      The figure legend of Figure 5 now clarifies that the vertical ticks are simply to aid visibility when symbols have very similar means and thus overlap visually. We don’t include error bars (for this analysis) because they are very small and would mostly be smaller than the symbol sizes. Instead, to indicate certainty regarding MINT’s performance measurements, the revised text now gives error ranges for the correlations and MAE values in the context of Figure 4c. These error ranges were computed as the standard deviation of the sampling distribution (computed via resampling of trials) and are thus equivalent to SEMs. The error ranges are all very small; e.g. for the MC_Maze dataset the MAE for x-velocity is 4.5 +/- 0.1 cm/s (error ranges on the correlations are smaller still).

      Thus, for a given dataset, we can be quite certain of how well MINT performs (within ~2% in the above case). This is reassuring, but we also don’t want to overemphasize this accuracy. The main sources of variability one should be concerned about are: 1) different methods can perform differentially well for different brain areas and tasks, 2) methods can decode some behavioral variables better than others, and 3) performance depends on factors like neuron-count and the number of training trials, in ways that can differ across decode methods. For this reason, the study examines multiple datasets, across tasks and brain areas, and measures performance for a range of decoded variables. We also examine the impact of training-set-size (Figure 8a) and population size (solid traces in Fig. 8b, see R2’s next comment below). 

      There is one other source of variance one might be concerned about, but it is specific to the neural-network approaches: different weight initializations might result in different performance. For this reason, each neural-network approach was trained ten times, with the average performance computed. The variability around this average was very small, and this is now stated in the Methods.

      “For the neural networks, the training/testing procedure was repeated 10 times with different random seeds. For most behavioral variables, there was very little variability in performance across repetitions. However, there were a few outliers for which variability was larger. Reported performance for each behavioral group is the average performance across the 10 repetitions to ensure results were not sensitive to any specific random initialization of each network.”
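
      The resampling-based error ranges mentioned in this response (the standard deviation of the sampling distribution, obtained by resampling trials) can be sketched as follows. This is a generic bootstrap under stated assumptions, not the authors' code: the per-trial errors are synthetic, and the number of resamples and the function name `bootstrap_sd` are hypothetical choices.

```python
import random
import statistics

def bootstrap_sd(per_trial_values, n_resamples=1000, seed=0):
    """SD of the sampling distribution of the mean, via resampling trials with replacement."""
    rng = random.Random(seed)
    n = len(per_trial_values)
    means = []
    for _ in range(n_resamples):
        sample = [per_trial_values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

# Synthetic per-trial absolute errors in cm/s (illustrative only, not data from the study).
rng = random.Random(1)
errors = [abs(rng.gauss(4.5, 1.0)) for _ in range(500)]

sd_boot = bootstrap_sd(errors)
sem_analytic = statistics.stdev(errors) / len(errors) ** 0.5
```

      The bootstrap estimate should land close to the analytic standard error of the mean, which is why such error ranges can be read as SEMs.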

      - For Figure 6, it is unclear whether the neuron-dropping process was repeated multiple times. If not, it should be since the results will be sensitive to which particular subsets of neurons were "dropped". In this case, the results presented in Figure 6 should include error bars to describe the variability in the model performance for each decoder considered. 

      A good point. The results in Figure 8 (previously Figure 6) were computed by averaging over the removal of different random subsets of neurons (50 subsets per neuron count), just as the reviewer requests. The figure has been modified to include the standard deviation of performance across these 50 subsets. The legend clarifies how this was done.

      Reviewer #3 (Recommendations For The Authors): 

      Other comments: 

      (1) [Line 185-188] The authors argue that in a 100-dimensional space with 10 possible discretized values, 10^100 potential neural states need to be computed. But I am not clear on this. This argument seems to hold only in the absence of a model (as in MINT). For a model, e.g., Kalman filter or AutoLFADS, information is encoded in the latent state. For example, a simple Kalman filter for a linear model can be used for efficient inference. This 10^100 computation isn't a general problem but seems MINT-specific, please clarify. 

      We agree this section was potentially confusing. It has been rewritten. We were simply attempting to illustrate why maximum likelihood computations are challenging without constraints. MINT simplifies this problem by adding constraints, which is why it can readily provide data likelihoods (and can do so using a Poisson model). The rewritten section is below:

      “Even with 1000 samples for each of the neural trajectories in Figure 3, there are only 4000 possible neural states for which log-likelihoods must be computed (in practice it is fewer still, see Methods). This is far fewer than if one were to naively consider all possible neural states in a typical rate- or factor-based subspace. It thus becomes tractable to compute log-likelihoods using a Poisson observation model. A Poisson observation model is usually considered desirable, yet can pose tractability challenges for methods that utilize a continuous model of neural states. For example, when using a Kalman filter, one is often restricted to assuming a Gaussian observation model to maintain computational tractability.”
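
      As a rough illustration of why a finite library makes a Poisson observation model tractable, the sketch below evaluates the Poisson log-likelihood of one vector of spike counts at every candidate state and returns the best one. It is a brute-force toy and not MINT's algorithm (which, per the response above, evaluates fewer states and uses a recursion). The library contents, trajectory names, and function names are hypothetical.

```python
import math

def poisson_loglik(counts, rates):
    """Poisson log-likelihood of spike counts given expected counts per neuron (rates > 0)."""
    return sum(c * math.log(r) - r - math.lgamma(c + 1) for c, r in zip(counts, rates))

def most_likely_state(counts, library):
    """library maps (trajectory_id, time_index) to expected spike counts per neuron."""
    return max(library, key=lambda key: poisson_loglik(counts, library[key]))

# Toy library: two trajectories, two samples each, three neurons (made-up values).
library = {
    ("reach_left", 0): [1.0, 5.0, 2.0],
    ("reach_left", 1): [2.0, 6.0, 1.0],
    ("reach_right", 0): [5.0, 1.0, 2.0],
    ("reach_right", 1): [7.0, 0.5, 3.0],
}
observed = [7, 0, 3]
best = most_likely_state(observed, library)
```

      The cost of this search grows linearly with the number of library states, so a few thousand states remain cheap to evaluate exhaustively.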

      (2) [Figure 6b] Why do the authors set the dropped neurons to zero in the "zeroed" results of the robustness analysis? Why not disregard the dropped neurons during the decoding process? 

      We agree the terminology we had used in this section was confusing. We have altered the figure and rewritten the text. The following, now at the beginning of that section, addresses the reviewer’s query: 

      “It is desirable for a decoder to be robust to the unexpected loss of the ability to detect spikes from some neurons. Such loss might occur while decoding, without being immediately detected. Additionally, one desires robustness to a known loss of neurons / recording channels. For example, there may have been channels that were active one morning but are no longer active that afternoon. At least in principle, MINT makes it very easy to handle this second situation: there is no need to retrain the decoder, one simply ignores the lost neurons when computing likelihoods. This is in contrast to nearly all other methods, which require retraining because the loss of one neuron alters the optimal parameters associated with every other neuron.”

      The figure has been relabeled accordingly; instead of the label ‘zeroed’, we use the label ‘undetected neuron loss’.
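
      The difference between an undetected and a known loss of neurons can be sketched with a toy Poisson likelihood. This is an illustration under stated assumptions, not the authors' implementation: the two-state library, the spike counts, and the `keep` argument are hypothetical.

```python
import math

def poisson_loglik(counts, rates, keep=None):
    """Poisson log-likelihood, summed only over the neuron indices in `keep` (all if None)."""
    idx = range(len(counts)) if keep is None else keep
    return sum(counts[i] * math.log(rates[i]) - rates[i] - math.lgamma(counts[i] + 1) for i in idx)

def decode(counts, library, keep=None):
    """Return the name of the library state with the highest log-likelihood."""
    return max(library, key=lambda name: poisson_loglik(counts, library[name], keep))

# Two candidate states over three neurons (made-up expected spike counts).
library = {"state_A": [8.0, 2.0, 3.0], "state_B": [1.0, 4.0, 5.0]}

# Activity consistent with state_A, except that neuron 0 has been lost and reports no spikes.
lost_counts = [0, 2, 3]

undetected = decode(lost_counts, library)           # the zero count is treated as real data
known = decode(lost_counts, library, keep=[1, 2])   # the lost neuron is simply ignored
```

      In this toy case the undetected loss pulls the estimate toward the state in which neuron 0 is nearly silent, whereas ignoring the lost neuron recovers the intended state without changing any other parameter.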

      (3) Authors should provide statistical significance on their results, which they already did for Fig. S3a,b,c but missing on some other figures/places. 

      We have added error bars in some key places, including in the text when quantifying MINT’s performance in the context of Figure 4. Importantly, error bars are only as meaningful as the source of error they assess, and there are reasons to be careful given this. The standard method for putting error bars on performance is to resample trials, which is indeed what we now report. These error bars are very small. For example, when decoding horizontal velocity for the MC_Maze dataset, the correlation between MINT’s decode and the true velocity had a mean and SD of the sampling distribution of 0.963 +/- 0.001. This means that, for a given dataset and target variable, we have enough trials/data that we can be quite certain of how well MINT performs. However, we want to be careful not to overstate this certainty. What one really wants to know is how well MINT performs across a variety of datasets, brain areas, target variables, neuron counts, etc. It is for this reason that we make multiple such comparisons, which provides a more valuable view of performance variability.

      For Figure 7, error bars are unavailable. Because this was a benchmark, there was exactly one test-set that was never seen before. This is thus not something that could be resampled many times (that would have revealed the test data and thus invalidated the benchmark, not to mention that some of these methods take days to train). We could, in principle, have added resampling to Figure 5. In our view it would not be helpful and could be misleading for the reasons noted above. If we computed standard errors using different train/test partitions, they would be very tight (mostly smaller than the symbol sizes), which would give the impression that one can be quite certain of a given R^2 value. Yet variability in the train/test partition is not the variability one is concerned about in practice. In practice, one is concerned about whether one would get a similar R^2 for a different dataset, or brain area, or task, or choice of decoded variable. Our analysis thus concentrated on showing results across a broad range of situations. In our view this is a far more relevant way of illustrating the degree of meaningful variability (which is quite large) than resampling, which produces reassuringly small but (mostly) irrelevant standard errors.

      Error bars are supplied in Figure 8b. These error bars give a sense of variability across re-samplings of the neural population. While this is not typically the source of variability one is most concerned about, for this analysis it becomes appropriate to show resampling-based standard errors because a natural concern is that results may depend on which neurons were dropped. So here it is both straightforward, and desirable, to compute standard errors. (The fact that MINT and the Wiener filter can be retrained many times swiftly was also key – this isn’t true of the more expressive methods). Figure S1 also uses resampling-based confidence intervals for similar reasons.

      (4) [Line 431-437] Authors state that MINT outperforms other methods with the PSTH R^2 metric (trial-averaged smoothed spikes for each condition). However, I think this measure may not provide a fair comparison and is confounded because MINT's library is built using PSTH (i.e., averaged firing rate) but other methods do not use the PSTH. The author should clarify this. 

      The PSTH R^2 metric was not created by us; it was part of the Neural Latents Benchmark. They chose it because it ensures that a method cannot ‘cheat’ (on the Bits/Spike measure) by reproducing fine features of spiking while estimating rates badly. We agree with the reviewer’s point: MINT’s design does give it a potential advantage in this particular performance metric. This isn’t a confound though, just a feature. Importantly, MINT will score well on this metric only if MINT’s neural state estimate is accurate (including accuracy in time). Without accurate estimation of the neural state at each time, it wouldn’t matter that the library trajectory is based on PSTHs. This is now explicitly stated:

      “This is in some ways unsurprising: MINT estimates neural states that tend to resemble (at least locally) trajectories ‘built’ from training-set-derived rates, which presumably resemble test-set rates. Yet strong performance is not a trivial consequence of MINT’s design. MINT does not ‘select’ whole library trajectories; PSTH R^2 will be high only if condition (c), index (k), and the interpolation parameter (α) are accurately estimated for most moments.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      In the presented manuscript, the authors investigate how neural networks can learn to replay presented sequences of activity. Their focus lies on the stochastic replay according to learned transition probabilities. They show that based on error-based excitatory and balance-based inhibitory plasticity networks can self-organize towards this goal. Finally, they demonstrate that these learning rules can recover experimental observations from song-bird song learning experiments. 

      Overall, the study appears well-executed and coherent, and the presentation is very clear and helpful. However, it remains somewhat vague regarding the novelty. The authors could elaborate on the experimental and theoretical impact of the study, and also discuss how their results relate to those of Kappel et al, and others (e.g., Kappel et al (doi.org/10.1371/journal.pcbi.1003511)). 

      We agree with the reviewer that our previous manuscript lacked comparison with previously published similar works. While Kappel et al. demonstrated that STDP in winner-take-all circuits can approximate online learning of hidden Markov models (HMMs), a key distinction from our model is that their neural representations acquire deterministic sequential activations, rather than exhibiting stochastic transitions governing Markovian dynamics. Specifically, in their model, the neural representation of state B would be different in the sequences ABC and CBA, resulting in distinct deterministic representations like ABC and C'B'A', where ‘A’ and ‘A'’ are represented by different neural states (e.g., activations of different cell assemblies). In contrast, our network learns to generate stochastically transitioning cell assemblies which replay Markovian trajectories of spontaneous activity obeying the learned transition probabilities between neural representations of states. For example, starting from reactivation of assembly ‘A’, there may be an 80% probability to transition to assembly ‘B’ and 20% to ‘C’. Although Kappel et al.'s model successfully solves HMMs, their neural representations do not themselves stochastically transition between states according to the learned model. Like Kappel et al.'s model, the models proposed in Barber (2002) and Barber and Agakov (2002) learn the Markovian statistics, but they learned only static spatiotemporal input patterns, and how assemblies of neurons show stochastic transitions in spontaneous activity has remained unclear. In contrast with these models, our model captures the probabilistic neural state trajectories, allowing spontaneous replay of experienced sequences with stochastic dynamics matching the learned environmental statistics.

      We have included new sentences to explain these points in ll. 509-533 in the revised manuscript.
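
      The kind of stochastic state transition described in this response can be illustrated with a plain first-order Markov chain. This sketch is not the spiking network model; it only shows what it means for replayed transitions to match learned probabilities. The 80%/20% split out of ‘A’ is the example given above, while the return transitions from ‘B’ and ‘C’ to ‘A’, the number of steps, and the function names are assumptions made for illustration.

```python
import random
from collections import Counter

def simulate(transitions, start, n_steps, seed=0):
    """Sample a state sequence from a first-order Markov chain."""
    rng = random.Random(seed)
    state, path = start, [start]
    for _ in range(n_steps):
        next_states, probs = zip(*transitions[state].items())
        state = rng.choices(next_states, weights=probs)[0]
        path.append(state)
    return path

def empirical_transitions(path):
    """Estimate P(next | current) from the observed consecutive pairs."""
    pair_counts = Counter(zip(path, path[1:]))
    from_counts = Counter(path[:-1])
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}

transitions = {
    "A": {"B": 0.8, "C": 0.2},  # example probabilities from the text
    "B": {"A": 1.0},            # assumed for illustration
    "C": {"A": 1.0},            # assumed for illustration
}
path = simulate(transitions, "A", 20000)
est = empirical_transitions(path)
```

      Comparing `est` with `transitions` is the same kind of check as comparing the transition statistics of replayed assemblies against the true transition probabilities.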

      Overall, the work could benefit if there was either (A) a formal analysis or derivation of the plasticity rules involved and a formal justification of the usefulness of the resulting (learned) neural dynamics; 

      We have included a derivation of our plasticity rules in ll. 630-670 in the revised manuscript. Consistent with our claim that excitatory plasticity updates the excitatory synapse to predict output firing rates, we have shown that the corresponding cost function measures the discrepancy between the recurrent prediction and the output firing rate. Similarly, for inhibitory plasticity, we defined the cost function that evaluates the difference between the excitatory and inhibitory potential within each neuron. We showed that the resulting inhibitory plasticity rule updates the inhibitory synapses to maintain the excitation-inhibition balance.

      and/or (B) a clear connection of the employed plasticity rules to biological plasticity and clear testable experimental predictions. Thus, overall, this is a good work with some room for improvement. 

      Our proposed plasticity mechanism could be implemented through somatodendritic interactions. Analogous to previous computational works (Urbanczik and Senn, 2014; Asabuki and Fukai, 2020; Asabuki et al., 2022), our model suggests that somatic responses may encode the stimulus-evoked neural activity states, while dendrites encode predictions based on recurrent dynamics that aim to minimize the discrepancy between somatic and dendritic activity. To directly test this hypothesis, future experimental studies could simultaneously record from both somatic and dendritic compartments to investigate how they encode evoked responses and predictive signals during learning (Francioni et al., 2022).

      We have included new sentences to explain these points in ll. 476-484 in the revised manuscript.

      Reviewer #2 (Public Review): 

      Summary: 

      This work proposes a synaptic plasticity rule that explains the generation of learned stochastic dynamics during spontaneous activity. The proposed plasticity rule assumes that excitatory synapses seek to minimize the difference between the internal predicted activity and stimulus-evoked activity, and inhibitory synapses try to maintain the E-I balance by matching the excitatory activity. By implementing this plasticity rule in a spiking recurrent neural network, the authors show that the state-transition statistics of spontaneous excitatory activity agree with that of the learned stimulus patterns, which are reflected in the learned excitatory synaptic weights. The authors further demonstrate that inhibitory connections contribute to well-defined state transitions matching the transition patterns evoked by the stimulus. Finally, they show that this mechanism can be expanded to more complex state-transition structures including songbird neural data. 

      Strengths: 

      This study makes an important contribution to computational neuroscience, by proposing a possible synaptic plasticity mechanism underlying spontaneous generations of learned stochastic state-switching dynamics that are experimentally observed in the visual cortex and hippocampus. This work is also very clearly presented and well-written, and the authors conducted comprehensive simulations testing multiple hypotheses. Overall, I believe this is a well-conducted study providing interesting and novel aspects of the capacity of recurrent spiking neural networks with local synaptic plasticity. 

      Weaknesses: 

      This study is very well-thought-out and theoretically valuable to the neuroscience community, and I think the main weaknesses are in regard to how much biological realism is taken into account. For example, the proposed model assumes that only synapses targeting excitatory neurons are plastic, and uses an equal number of excitatory and inhibitory neurons. 

      We agree with the reviewer. The network shown in the previous manuscript consists of an equal number of excitatory and inhibitory neurons, which seems to lack biological plausibility. Therefore, we first tested whether a biologically plausible scenario would affect learning performance by setting the ratio of excitatory to inhibitory neurons to 80% and 20% (Supplementary Figure 7a; left). Even in such a scenario, the network still showed structured spontaneous activity (Supplementary Figure 7a; center), with transition statistics of replayed events matching the true transition probabilities (Supplementary Figure 7a; right). We then asked whether the model with our plasticity rule applied to all synapses would reproduce the corresponding stochastic transitions. We found that the network can learn transition statistics but only under certain conditions. The network showed only weak replay and failed to reproduce the appropriate transition (Supplementary Fig. 7b) if the inhibitory neurons were no longer driven by the synaptic currents reflecting the stimulus, due to a tight balance of excitatory and inhibitory currents on the inhibitory neurons. We then tested whether the network with all synapses plastic can learn transition statistics if the external inputs project to the inhibitory neurons as well. We found that, when each stimulus pattern activates a non-overlapping subset of neurons, the network does not exhibit the correct stochastic transition of assembly reactivation (Supplementary Fig. 7c). Interestingly, when each neuron's activity is triggered by multiple stimuli and has mixed selectivity, the reactivation reproduced the appropriate stochastic transitions (Supplementary Fig. 7d).

      We have included these new results as new Supplementary Figure 7 and they are explained in ll.215-230 in the revised manuscript.

      The model also assumes Markovian state dynamics while biological systems can depend more on history. This limitation, however, is acknowledged in the Discussion. 

      We have included the following sentence to provide a possible solution to this limitation: “Therefore, to learn higher-order stochastic transitions, recurrent neural networks like ours may need to integrate higher-order inputs with longer time scales.” in ll.557-559 in the revised manuscript. 

      Finally, to simulate spontaneous activity, the authors use a constant input of 0.3 throughout the study. Different amplitudes of constant input may correspond to different internal states, so it will be more convincing if the authors test the model with varying amplitudes of constant inputs. 

      We thank the reviewer for pointing this out. In the revised manuscript, we have tested constant input with three different strengths. If the strength is moderate, the network shows accurate encoding of transition statistics in the spontaneous activity, as we have seen in Fig. 2. We have additionally shown that weaker background input causes spontaneous activity with a lower replay rate, which in turn leads to high variance of the encoded transitions, while stronger inputs make assembly replay transitions more uniform. We have included these new results as new Supplementary Figure 6 and they are explained in ll. 211-214 in the revised manuscript.

      Reviewer #3 (Public Review): 

      Summary: 

      Asabuki and Clopath study stochastic sequence learning in recurrent networks of Poisson spiking neurons that obey Dale's law. Inspired by previous modeling studies, they introduce two distinct learning rules, to adapt excitatory-to-excitatory and inhibitory-to-excitatory synaptic connections. Through a series of computer experiments, the authors demonstrate that their networks can learn to generate stochastic sequential patterns, where states correspond to non-overlapping sets of neurons (cell assemblies) and the state-transition conditional probabilities are first-order Markov, i.e., the transition to a given next state only depends on the current state. Finally, the authors use their model to reproduce certain experimental songbird data involving highly-predictable and highly-uncertain transitions between song syllables. 

      Strengths: 

      This is an easy-to-follow, well-written paper, whose results are likely easy to reproduce. The experiments are clear and well-explained. The study of songbird experimental data is a good feature of this paper; finches are classical model animals for understanding sequence learning in the brain. I also liked the study of rapid task-switching, it's a good-to-know type of result that is not very common in sequence learning papers. 

      Weaknesses: 

      While the general subject of this paper is very interesting, I missed a clear main result. The paper focuses on a simple family of sequence learning problems that are well-understood, namely first-order Markov sequences and fully visible (nohidden-neuron) networks, studied extensively in prior work, including with spiking neurons. Thus, because the main results can be roughly summarized as examples of success, it is not entirely clear what the main point of the authors is. 

      We apologize to the reviewer that our main claim was not clear. While various computational studies have suggested possible plasticity mechanisms for embedding evoked activity patterns or their probability structures into spontaneous activity (Litwin-Kumar et al., Nat. Commun. 2014; Asabuki and Fukai, bioRxiv 2023), how the transition statistics of the environment are learned in spontaneous activity is still poorly understood. Furthermore, while several network models have been proposed to learn Markovian dynamics via synaptic plasticity (Brea et al. (2013); Pfister et al. (2004); Kappel et al. (2014)), they have been limited in the sense that the learned network does not show stochastic transitions in a neural state space. For instance, while Kappel et al. demonstrated that STDP in winner-take-all circuits can approximate online learning of hidden Markov models (HMMs), a key distinction from our model is that their neural representations acquire deterministic sequential activations, rather than exhibiting stochastic transitions governing Markovian dynamics. Specifically, in their model, the neural representation of state B would be different in the sequences ABC and CBA, resulting in distinct deterministic representations like ABC and C'B'A', where ‘A’ and ‘A'’ are represented by different neural states (e.g., activations of different cell assemblies). In contrast, our network learns to generate stochastically transitioning cell assemblies that replay Markovian trajectories of spontaneous activity obeying the learned transition probabilities between neural representations of states. For example, starting from a reactivation of assembly ‘A’, there may be an 80% probability of transitioning to assembly ‘B’ and 20% to ‘C’. Although Kappel et al.'s model successfully solves HMMs, their neural representations do not themselves stochastically transition between states according to the learned model.
Similar to Kappel et al.'s model, the models proposed in Barber (2002) and Barber and Agakov (2002) learn Markovian statistics, but they learned only static spatiotemporal input patterns, and how assemblies of neurons show stochastic transitions in spontaneous activity has remained unclear. In contrast to these models, our model captures the probabilistic neural state trajectories, allowing spontaneous replay of experienced sequences with stochastic dynamics matching the learned environmental statistics.
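      To make the notion of "stochastic transitions obeying learned transition probabilities" concrete, the following is a minimal sketch of a first-order Markov chain over abstract states, together with the way transition probabilities can be re-estimated from a sampled trajectory. It is an illustration only, not the network model: the 0.8/0.2 split out of state A mirrors the example above, while the other rows of the table and all function names are assumptions made for this sketch.

```python
import random
from collections import Counter

# Hypothetical transition table (each row sums to 1). Only the A row follows
# the 80%/20% example in the text; the B and C rows are invented.
TRANSITIONS = {
    "A": {"B": 0.8, "C": 0.2},
    "B": {"A": 0.5, "C": 0.5},
    "C": {"A": 1.0},
}


def sample_trajectory(start, n_steps, rng):
    """Draw a state sequence in which each step depends only on the current state."""
    state, path = start, [start]
    for _ in range(n_steps):
        targets = list(TRANSITIONS[state])
        weights = [TRANSITIONS[state][t] for t in targets]
        state = rng.choices(targets, weights=weights, k=1)[0]
        path.append(state)
    return path


def empirical_transitions(path):
    """Estimate P(next | current) from consecutive pairs in a sequence."""
    pair_counts = Counter(zip(path, path[1:]))
    source_counts = Counter(path[:-1])
    return {(a, b): n / source_counts[a] for (a, b), n in pair_counts.items()}


rng = random.Random(0)  # fixed seed so the sketch is reproducible
path = sample_trajectory("A", 20000, rng)
estimate = empirical_transitions(path)
```

      Comparing `estimate` with `TRANSITIONS` is the abstract analogue of comparing replayed assembly transitions with the transition statistics of the evoked activity.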

      We have explained this point in ll.509-533 in the revised manuscript.

      Going into more detail, the first major weakness I see in this paper is the heuristic choice of learning rules. The paper studies Poisson spiking neurons (I return to this point below), for which learning rules can be derived from a statistical objective, typically maximum likelihood. For fully-visible networks, these rules take a simple form, similar in many ways to the E-to-E rule introduced by the authors. This more principled route provides quite a lot of additional understanding on what is to be expected from the learning process. 

      We thank the reviewer for pointing this out. To better demonstrate the function of our plasticity rules, we have included the derivation of the rules of synaptic plasticity in ll. 630-670 in the revised manuscript. Consistent with our claim that excitatory plasticity updates the excitatory synapse to predict output firing rates, we have shown that the corresponding cost function measures the discrepancy between the recurrent prediction and the output firing rate. Similarly, for inhibitory plasticity, we defined the cost function that evaluates the difference between the excitatory and inhibitory potential within each neuron. We showed that the resulting inhibitory plasticity rule updates the inhibitory synapses to maintain the excitation-inhibition balance.

      For instance, should maximum likelihood learning succeed, it is not surprising that the statistics of the training sequence distribution are reproduced. Moreover, given that the networks are fully visible, I think that the maximum likelihood objective is a convex function of the weights, which then gives hope that the learning rule does succeed. And so on. This sort of learning rule has been studied in a series of papers by David Barber and colleagues [refs. 1, 2 below], who applied them to essentially the same problem of reproducing sequence statistics in recurrent fully-visible nets. It seems to me that one key difference is that the authors consider separate E and I populations, and find the need to introduce a balancing I-to-E learning rule. 

      The reviewer’s understanding that inhibitory plasticity to maintain EI balance is one of the critical differences from previous works is correct. However, we believe that the most striking point of our study is that we have shown numerically that predictive plasticity rules enable recurrent networks to learn and replay the assembly activations whose transition statistics match those of the evoked activity. Please see our reply above.

      Because the rules here are heuristic, a number of questions come to mind. Why these rules and not others - especially, as the authors do not discuss in detail how they could be implemented through biophysical mechanisms? When does learning succeed or fail? What is the main point being conveyed, and what is the contribution on top of the work of e.g. Barber, Brea, et al. (2013), or Pfister et al. (2004)? 

      Our proposed plasticity mechanism could be implemented through somatodendritic interactions. Analogous to previous computational works (Senn, Asabuki), our model suggests that somatic responses may encode the stimulus-evoked neural activity states, while dendrites encode predictions based on recurrent dynamics that aim to minimize the discrepancy between somatic and dendritic activity. To directly test this hypothesis, future experimental studies could simultaneously record from both somatic and dendritic compartments to investigate how they encode evoked responses and predictive signals during learning.

      To address the reviewer's point, we conducted additional simulations to test where the model fails. We found that the model with our plasticity rule applied to all synapses showed only faint replays and failed to replay the appropriate transitions (Supplementary Fig. 7b). This result is reasonable because the inhibitory neurons were no longer driven by the synaptic currents reflecting the stimulus, due to a tight balance of excitatory and inhibitory currents on the inhibitory neurons. Our model predicts that mixed selectivity in the inhibitory population is crucial for learning appropriate transition statistics (Supplementary Fig. 7d). Future work should clarify the role of synaptic plasticity on inhibitory neurons, especially plasticity at I-to-I synapses. We have presented this result as a new Supplementary Figure 7 in the revised manuscript.

      The use of a Poisson spiking neuron model is the second major weakness of the study. A chief challenge in much of the cited work is to generate stochastic transitions from recurrent networks of deterministic neurons. The task the authors set out to do is much easier with stochastic neurons; it is reasonable that the network succeeds in reproducing Markovian sequences, given an appropriate learning rule. I believe that the main point comes from mapping abstract Markov states to assemblies of neurons. If I am right, I missed more analyses on this point, for instance on the impact that varying cell assembly size would have on the findings reported by the authors.

      The reviewer’s understanding is correct. Our main point comes from mapping Markov statistics to replays of cell assemblies. In the revised manuscript, we performed additional simulations to ask whether varying the size of the cell assemblies would affect learning. We ran simulations with two different configurations in the task shown in Figure 2. The first configuration used three assemblies with a size ratio of 1:1.5:2. After training, these assemblies exhibited transition statistics that closely matched those of the evoked activity (Supplementary Fig.4a,b). In contrast, the second configuration, which used a size ratio of 1:2:3, showed worse performance compared to the 1:1.5:2 case (Supplementary Fig.4c,d). These results suggest that the model can learn appropriate transition statistics as long as the size ratio of the assemblies is not drastically varied.

      Finally, it was not entirely clear to me what the main fundamental point in the HVC data section was. Can the findings be roughly explained as follows: if we map syllables to cell assemblies, for high-uncertainty syllable-to-syllable transitions, it becomes harder to predict future neural activity? In other words, is the main point that the HVC encodes syllables by cell assemblies? 

      The reviewer's understanding is correct. We wanted to show that if the HVC learns transition statistics as a replay of cell assemblies, a high-uncertainty syllable-to-syllable transition would make predicting future reactivations more difficult, since trial-averaged activities (i.e., poststimulus activities; PSAs) marginalized all possible transitions in the transition diagram.

      (1) Learning in Spiking Neural Assemblies, David Barber, 2002. URL: https://proceedings.neurips.cc/paper/2002/file/619205da514e83f869515c782a328d3c-Paper.pdf  

      (2) Correlated sequence learning in a network of spiking neurons using maximum likelihood, David Barber, Felix Agakov, 2002. URL: http://web4.cs.ucl.ac.uk/staff/D.Barber/publications/barber-agakovTR0149.pdf

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      In more detail: 

      A) Theoretical analysis 

      The plasticity rules in the study are introduced with a vague reference to previous theoretical studies of others. Doing this, one does not provide any formal insight as to why these plasticity rules should enable one to learn to solve the intended task, and whether they are optimal in some respect. This becomes noticeable, especially in the discussion of the importance of inhibitory balance, which does not go into any detail, but rather only states that it is required, both in the results and discussion sections. Another unclarity appears when error-based learning is discussed and compared to Hebbian plasticity, which, as you state, "alone is insufficient to learn transition probabilities". It is not evident how this claim is warranted, nor why error-based plasticity in comparison should be able to perform this (other than referring to the simulation results). Please either clarify formally (or at least intuitively) how plasticity rules result in the mentioned behavior, or alternatively acknowledge explicitly the (current) lack of intuition.

      The lack of formal discussion is a relevant shortcoming compared to previous research that showed very similar results with formally more rigorous and principled approaches. In particular, Kappel et al derived explicitly how neural networks can learn to sample from HMMs using STDP and winner-take-all dynamics. Even though this study has limitations, the relation with respect to that work should be made very clear; potentially the claims of novelty of some results (sampling) should be adjusted accordingly. See also Yanping Huang, Rajesh PN Rao (NIPS 2014), and possibly other publications. While it might be difficult to formally justify the learning rules post-hoc, it would be very helpful to the field if you very clearly related your work to that of others, where learning rules have been formally justified, and elaborate on the intuition of how the employed rules operate and interact (especially for inhibition). 

      Lastly, while the importance of sampling learned transition probabilities is discussed, the discussion again remains on a vague level, characterized by the lack of references in the relevant paragraphs. Ideally, there should be a proof of concept or a formal understanding of how the learned behaviour enables to solve a problem that is not solved by deterministic networks. Please incorporate also the relation to the literature on neural sampling/planning/RL etc. and substantiate the claims with citations. 

      We have included sentences in ll. 691-696 in the revised manuscript to explain that for Poisson spiking neurons, the derived learning rule is equivalent to the one that minimizes the Kullback-Leibler divergence between the distributions of output firing and the dendritic prediction, in our case, the recurrent prediction (Asabuki and Fukai; 2020). Thus, the rule suggests that the recurrent prediction learns the statistical model of the evoked activity, which in turn allows the network to reproduce the learned transition statistics.
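      As a hedged illustration of the Kullback-Leibler argument above, the sketch below uses the standard closed form for the divergence between two Poisson distributions, KL(Poisson(a) || Poisson(b)) = a ln(a/b) - a + b, and shows that gradient descent on the predicted rate drives it toward the output rate. It treats a single pair of scalar rates; it is not the plasticity rule derived in the manuscript, and the function names, learning rate, and step count are assumptions chosen for this example.

```python
import math


def poisson_kl(rate_out, rate_pred):
    """KL( Poisson(rate_out) || Poisson(rate_pred) ) for strictly positive rates."""
    return rate_out * math.log(rate_out / rate_pred) - rate_out + rate_pred


def fit_prediction(rate_out, rate_pred, lr=0.1, n_steps=500):
    """Gradient descent on the KL divergence with respect to the predicted rate."""
    for _ in range(n_steps):
        grad = 1.0 - rate_out / rate_pred  # d KL / d rate_pred
        rate_pred -= lr * grad
    return rate_pred


# Illustrative numbers only: the prediction starts at 1.0 and approaches 5.0.
fitted = fit_prediction(rate_out=5.0, rate_pred=1.0)
```

      The gradient vanishes only when the predicted rate equals the output rate, which is the scalar version of the statement that the recurrent prediction comes to match the statistics of the evoked activity.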

      We have also added a paragraph to discuss the differences between previously published similar models (e.g., Kappel et al.). Please see our response above.

      B) Connection to biology 

      The plasticity rules in the study are introduced with a vague reference to previous theoretical studies of others. Please discuss in more detail if these rules (especially the error-based learning rule) could be implemented biologically and how this could be achieved. Are there connections to biologically observed plasticity? E.g., error-based plasticity has been discussed in the original publication by Urbanczik and Senn, or more recently by Mikulasch et al (TINS 2023). The biological plausibility of inhibitory balance has been discussed many times before, e.g. by Vogels and others, and a citation would acknowledge that earlier work. This also leaves the question of how neurons in the songbird experiment could adapt and if the model does capture this well (i.e., do they exhibit E-I balance? etc), which might be discussed as well.

      Last, please provide some testable experimental predictions. By proposing an interesting experimental prediction, the model could become considerably more relevant to experimentalists. Also, are there potentially alternative models of stochastic sequence learning (e.g., Kappel et al)? How could they be distinguished? (especially, again, why not Hebbian/STDP learning?) 

      We have cited the Vogels paper to acknowledge the earlier work. We have also included additional paragraphs to discuss a possible biologically plausible implementation of our model and how our model differs from similar models proposed previously (e.g., Kappel et al.). Please see our response above.

      Other comments 

      As mentioned, a derivation of recurrent plasticity rules is missing, and parameters are chosen ad-hoc. This leaves the question of how much the results rely on the specific choice of parameters, and how robust they are to perturbations. As a robustness check, please clarify how the duration of the Markov states influences performance. It can be expected that this interacts with the timescale of recurrent connections, so having longer or shorter Markov states, as it would be in reality, should make a difference in learning that should be tested and discussed.

      We thank the reviewer for pointing this out. To address this point, we performed new simulations and asked to what extent the duration of Markov states affects performance. Interestingly, even when the network was trained with input states of half the duration, the distributions of the durations of assembly reactivations remained almost identical to those in the original case (Supplementary Figure 3a). Furthermore, the transition probabilities in the replay were still consistent with the true transition probabilities (Supplementary Figure 3b). We have also included the derivation of our plasticity rule in ll. 630-670 in the revised manuscript.

      Similarly, inhibitory plasticity operates with the same plasticity timescale parameter as excitatory plasticity, but, as the authors discuss, lags behind excitatory plasticity in simulation as in experiment. Is this required or was the parameter chosen such that this behaviour emerges? Please clarify this in the methods section; moreover, it would be good to test if the same results appear with fast inhibitory plasticity. 

      We have performed a new simulation and showed that even when the learning rate of inhibitory plasticity was larger than that of excitatory plasticity, inhibitory plasticity still occurred on a slower timescale than excitatory plasticity. We have included this result in a new Supplementary Figure 2 in the revised manuscript.

      What is the justification (biologically and theoretically) for the memory trace h and its impact on neural spiking? Is it required for the results or can it be left away? Since this seems to be an important and unconventional component of the model, please discuss it in more detail. 

      In the model, it is assumed that each stimulus presentation drives a specific subset of network neurons with a fixed input strength, which avoids convergence to trivial solutions. Nevertheless, we choose to add this dynamic sigmoid function to facilitate stable replay by regulating neuron activity to prevent saturation. We have explained this point in ll.605-611 in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      I noticed a couple of minor typos: 

      Page 3 "underly"->"underlie" 

      Page 7 "assemblies decreased settled"->"assemblies decreased and settled"

      We have modified the text. We thank the reviewer for their careful review.

      I think Figure 1C is rather confusing and not intuitive. 

      We apologize that Figure 1C was confusing. In the revised figure, we have emphasized the flow of excitatory and inhibitory error for updating synapses.

      Reviewer #3 (Recommendations For The Authors): 

      One possible path to improve the paper would be to establish a relationship between the proposed learning rules and e.g. the ones derived by Barber. 

      When reading the paper, I was left with a number of more detailed questions I omitted from the public review: 

      (1) The authors introduce a dynamic sigmoidal function for excitatory neurons, Eq. 3. This point requires more discussion and analysis. How does this impact the results? 

      In the model, it is assumed that each stimulus presentation drives a specific subset of network neurons with a fixed input strength, which avoids convergence to trivial solutions. Nevertheless, we choose to add this dynamic sigmoid function to facilitate stable replay by regulating neuron activity to prevent saturation. We have explained this point in ll.605-611 in the revised manuscript.

      (2) For Poisson spiking neurons, it would be great to understand what cell assemblies bring (apart from biological realism, i.e., reproducing data where assemblies can be found), compared to self-connected single neurons. For example, how do the results shown in Figure 2 depend on assembly size? 

      We have varied the cell assembly size ratio and shown how it affects learning performance in a new Supplementary Figure 4. Please see our reply above.

      (3) The authors focus on modeling spontaneous transitions, corresponding to a highly stochastic generative model (with most transition probabilities far from 1). A complementary question is that of learning to produce a set of stereotypical sequences, with probabilities close to 1. I wondered whether the learning rules and architecture of the model (in particular under the I-to-E rule) would also work in such a scenario. 

      We thank the reviewer for pointing this out. In fact, we had the same question, so we considered a situation in which the setting in Figure 2 includes both cases where the transition matrix is highly stochastic (prob=0.5) and nearly deterministic (prob=0.9).

      (4) An analysis of what controls the time so that the network stays in a certain state would be welcome. 

      We trained the network model in two cases, one with a fast speed of plasticity and one with a slow speed of plasticity. As a result, we found that the duration of assembly becomes longer in the slow learning case than in the fast case. We have included these results as Supplementary Figure 5 in the revised manuscript.

      Regarding the presentation, given that this is a computational modeling paper, I wonder whether *all* the formulas belong in the Methods section. I found myself skipping back and forth to understand what the main text meant, mainly because I missed a few key equations. I understand that this is a style issue that is very much community-dependent, but I think readability would improve drastically if the main model and learning rule equations could be introduced in the main text, as they start being discussed. 

      We thank the reviewer for the suggestion. To cater to a wider audience, we try to explain the principle of the paper without using mathematical formulas as much as possible in the main text.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to quantify feral pig interactions in eastern Australia to inform disease transmission networks. They used GPS tracking data from 146 feral pigs across multiple locations to construct proximity-based social networks and analyze contact rates within and between pig social units.

      Strengths:

      (1) Addresses a critical knowledge gap in feral pig social dynamics in Australia.

      (2) Uses robust methodology combining GPS tracking and network analysis.

      (3) Provides valuable insights into sex-based and seasonal variations in contact rates.

      (4) Effectively contextualizes findings for disease transmission modeling and management.

      (5) Includes comprehensive ethical approval for animal research.

      (6) Utilizes data from multiple locations across eastern Australia, enhancing generalizability.

      Weaknesses:

      (1) Limited discussion of potential biases from varying sample sizes across populations

      This is a really good comment, and we will address this in the discussion as one of the limitations of the study.

      (2) Some key figures are in supplementary materials rather than the main text.

      We will move some of our supplementary material to the main text as suggested.

      (3) Economic impact figures are from the US rather than Australia-specific data.

      We included the impact figures that are available for Australia (for FMD), and we will include the estimated impact of ASF in Australia in the introduction.

      (4) Rationale for spatial and temporal thresholds for defining contacts could be clearer.

      We will improve the explanation of why we chose the spatial and temporal thresholds based on literature, the size of animals and GPS errors.
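      For readers unfamiliar with proximity-based contact definitions, the sketch below shows one common way a contact can be counted from GPS fixes: two fixes from different animals count as a contact when they fall within a distance threshold and a time window of each other. The threshold values, the tuple layout, and the function name are placeholders chosen for illustration and are not the values or code used in the study.

```python
import math
from collections import Counter
from itertools import combinations

# Placeholder thresholds (assumptions for this sketch, not the study's values).
MAX_DISTANCE_M = 5.0
MAX_TIME_GAP_S = 300.0


def count_contacts(fixes, max_distance=MAX_DISTANCE_M, max_gap=MAX_TIME_GAP_S):
    """Count pairwise contacts between animals.

    `fixes` is a list of (animal_id, time_s, x_m, y_m) tuples with coordinates
    in a projected (metric) reference system. The pairwise loop is quadratic
    in the number of fixes, which is fine for a sketch but slow for real data.
    """
    contacts = Counter()
    for a, b in combinations(fixes, 2):
        if a[0] == b[0]:
            continue  # same animal, not a contact
        close_in_time = abs(a[1] - b[1]) <= max_gap
        close_in_space = math.hypot(a[2] - b[2], a[3] - b[3]) <= max_distance
        if close_in_time and close_in_space:
            contacts[tuple(sorted((a[0], b[0])))] += 1
    return contacts


# Made-up fixes for three animals.
fixes = [
    ("p1", 0, 0.0, 0.0),
    ("p2", 60, 3.0, 4.0),    # 5 m from p1's first fix, 60 s later
    ("p3", 60, 50.0, 0.0),   # too far from everyone at this time
    ("p1", 1800, 0.0, 0.0),
    ("p3", 1900, 1.0, 0.0),  # 1 m from p1's second fix, 100 s later
]
contacts = count_contacts(fixes)
```

      The resulting pair counts can serve as edge weights of a contact network; the choice of thresholds directly changes which edges exist, which is why their justification matters.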

      (5) Limited discussion of ethical considerations beyond basic animal ethics approval.

      This research was conducted under an ethics committee's approval for collaring the feral pigs. This research is part of an ongoing pest management activity, and all the ethics approvals have been highlighted in the main manuscript.

      The authors largely achieved their aims, with the results supporting their conclusions about the importance of sex and seasonality in feral pig contact networks. This work is likely to have a significant impact on feral pig management and disease control strategies in Australia, providing crucial data for refining disease transmission models.

      Reviewer #2 (Public review):

      Summary:

      The paper attempts to elucidate how feral (wild) pigs cause distortion of the environment in over 54 countries of the world, particularly Australia.

      The paper displays proof that over $120 billion worth of facilities were destroyed annually in the United States of America.

      The authors have tried to infer that the findings of their work were important and possess a convincing strength of evidence.

      Strengths:

      (1) Clearly stating feral (wild) pigs as a problem in the environment.

      (2) Stating how 54 countries were affected by the feral pigs.

      (3) Mentioning how $120 billion was lost in the US, annually, as a result of the activities of the feral pigs.

      (4) Amplifying the fact that 14 species of animals were being driven into extinction by the feral pigs.

      (5) Feral pigs possessing zoonotic abilities.

      (6) Feral pigs acting as reservoirs for endemic diseases like brucellosis and leptospirosis.

      (7) Understanding disease patterns by the social dynamics of feral pig interactions.

      (8) The use of 146 GPS-monitored feral pigs to establish their social interaction among themselves.

      Weaknesses:

      (1) Unclear explanation of the association of either the female or male feral pigs with each other, seasonally.

      This will be better explained in the methods.

      (2) The "abstract paragraph" was not justified.

      We have justified the abstract paragraph as requested by the reviewer.

      (3) Typographical errors in the abstract.

      Typographical errors have been corrected in the Abstract.

      Reviewer #3 (Public review):

      Summary:

      The authors sought to understand social interactions both within and between groups of feral pigs, with the intent of applying their findings to models of disease transmission. The authors analyzed GPS tracking data from across various populations to determine patterns of contact that could support the transmission of a range of zoonotic and livestock diseases. The analysis then focused on the effects of sex, group dynamics, and seasonal changes on contact rates that could be used to base targeted disease control strategies that would prioritize the removal of adult males for reducing intergroup disease transmission.

      Strengths:

      It utilized GPS tracking data from 146 feral pigs over several years, effectively capturing seasonal and spatial variation in the social behaviors of interest. Using proximity-based social network analysis, this work provides a highly resolved snapshot of contact rates and interactions both within and between groups, substantially improving research in wildlife disease transmission. Results were highly useful and provided practical guidance for disease management, showing that control targeted at adult males could reduce intergroup disease transmission, hence providing an approach for the control of zoonotic and livestock diseases.

      Weaknesses:

      Despite their reliability, populations can be skewed by small sample sizes and limited generalizability due to specific environmental and demographic characteristics. Further validation is needed to account for additional environmental factors influencing social dynamics and contact rates

      This is a good point, and we thank the reviewer for pointing out this issue. We will discuss the potential biases due to sample size in our discussion. We agree that environmental factors need to be incorporated and tested for their influence on social dynamics, and this will be added to the discussion, as we plan to expand this research and conduct the analysis to determine whether environmental factors are influencing social dynamics.

    1. Author response:

      Reviewer #1:

      (1) This concern is addressed in the ESM6, and partly in the ESM1. Indeed, many of the concerns raised by the reviewer later are already addressed in the multiple supplementary materials provided, so we kindly ask the reviewer to read them before moving forward into the discussion.

      (2) This concern is reasonable, but its solution is not "extremely easy", as the reviewer states. The reviewer points to the use of captive-based versus non-captive-based sources, highlighting maximum lifespan, the main variable that is clearly expected to be systematically biased by the source of the data. Nevertheless, except for the ZIMS database, which includes only captive individuals, and some sources, such as the CNRS databases and EURING, which exclusively include wild populations, the remaining databases, from which the vast majority of the data were collected (i.e. Amniotes database, Birds of the World and AnAge), do not make any distinction. This means that they include just the maximum lifespan of the species as known by the authors of such databases' entries, regardless of provenance, which is also not usually made explicit by the database. Therefore, correcting for this would imply checking all the primary sources. Considering that these databases sometimes cite not the primary source but a secondary one, that on several occasions such a source is a specialized book that is not easily accessible, and that these referenced datasets may still not indicate the source of the data, tracing all of this information becomes an arduous task that would even render the usage of databases themselves useless. We will include some details about the concerns of database usage in the discussion to address this.

      Furthermore, it is worth noting that what we discuss later about the possible effects of captivity concerns our use of animals that come from both sources, not the provenance of the literature-extracted data used (i.e. captive or wild maximum lifespan, for example), which is an independent matter. We can test for the former in the next submission, but we could hardly test for the latter (which the reviewer seems to be pointing to). In any case, as we never have the same species from both a captive and a wild source, it would be difficult to determine whether the effect tested comes from captivity or from species-specific differences.

      (3) We will add data on the replicability of the glycation measurement in the next manuscript version. The CV for several individuals of different species measured repeatedly is quite low (always below 2%).
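      For reference, the coefficient of variation (CV) mentioned here is the standard deviation of repeated measurements expressed as a percentage of their mean. The sketch below computes it for one individual; the values are made up for illustration and are not glycation data from the study, and using the sample standard deviation is an assumption of this example.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Made-up repeated measurements of one sample (arbitrary units).
repeats = [2.00, 2.02, 1.98, 2.01]
cv = coefficient_of_variation(repeats)
```

      A CV computed per individual and then inspected across individuals and species is one simple way to summarize measurement repeatability.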

      (4) The reviewer's remarks reported here are already addressed in the supplementary material (ESM6), given the lack of space in the main manuscript. We therefore kindly ask the reviewer to read the supplementary material added to the submission. If the editors agree, all or a considerable part of this could be transferred to the main text for clarity, but this would severely extend the length of a text that the reviewer already considered very long.

      Reviewer #2:

      Thanks for spotting this issue with the coefficient, as it is indeed a drafting mistake. It is a remnant of a previous version of the manuscript in which a log-log relation was fitted instead. Previous reviewers raised concerns about the use of a log transformation for glycation, this variable being (theoretically) a proportion (although we argue that it does not behave as such), which they considered should not be log-transformed. After this, we finally decided not to transform this variable. In this regard, the transformations of variables were generally decided by preliminary data exploration. In this particular case, both approaches lead to the same conclusion of higher glycation resistance in the species with higher glucose. Nevertheless, we will consider comparing the different versions for the resubmission.

      Regarding the issue related to handling time, this variable is not available, for the reasons already given in the answer to the other reviewer. Moreover, the Kruskal-Wallis test, by its nature, does not determine differences in medians between groups per se, as the reviewer claims, but just differences in rank sums. It can be equivalently used for that purpose when the groups' distributions are similar, but not when they differ, as we see here with a difference in variance. What a significant outcome in a Kruskal-Wallis test tells us, thus, is just that the groups differ (in their rank sums), which here is plausibly caused by the higher variance in the stressed individuals. Even if we conclude that the average is higher in those groups, mere comparisons of averages for groups with very different variances lead to different interpretations than when homoscedasticity is met, particularly when the distributions of the groups overlap. For example, in a case like this, where the data are left-censored (glucose levels cannot be lower than 0), most of this higher variance is related to many values in the stressed groups lying above all the baseline values. This, of course, would increase the average, but such a parameter would not mean the same as if the distributions did not overlap.

      Regarding the GVIFs, it is not clear why the values are above 1.6, but we do not consider this a major concern, as the values are never above 2.2, the level usually considered more worrying. We will include a brief explanation of this in the results section. Also, we explicitly calculated life history variables adjusted for body mass, which should eliminate their otherwise strong correlation. There are other biological and interpretational reasons, justified in the ESM6, for using the residuals in the models instead of the raw values, despite previously raised concerns.

      Given the reviewer's assertion that credible intervals should not be used for post hoc comparisons, and since these are what the whiskers shown in Figure 4B represent, the claim that this graph suggests any difference between groups remains doubtful. New comparisons have now been made with the function HPDinterval() applied to the differences between each diet category calculated from the posterior values of each group, confirming that no significant differences exist.

      We do not understand the suggestion made in relation to the model shown in Table 2. Removing glucose from the model could have two results, as the reviewer indicates: 1. Maximum lifespan (ML) relates to glycation, potentially spuriously through the effect of glucose (in this case not included) on both; 2. ML does not relate to glycation, and therefore "high glycation levels do not preclude the evolution of long lifespans", which is what we are already showing with the current model, which also controls for glucose, in an attempt to determine whether not just raw glycation values, but glycation resistance, relates to longevity. This is intended to assess whether long-lived species may show mechanisms that avoid glycation, by showing levels lower than expected for a non-enzymatic reaction.
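      The comparison described above can be sketched in a language-neutral way: take posterior draws for two groups, form the draw-by-draw difference, and compute the highest posterior density (HPD) interval of that difference, here taken as the shortest interval containing the requested share of the draws. The code below is an illustrative Python sketch, not a port of the R function named in the response, and the simulated "posterior" draws and group names are invented for the example.

```python
import math
import random


def hpd_interval(samples, prob=0.95):
    """Shortest interval that contains about `prob` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    width = max(1, math.ceil(prob * n))  # number of samples inside the interval
    start = min(
        range(n - width + 1),
        key=lambda i: ordered[i + width - 1] - ordered[i],
    )
    return ordered[start], ordered[start + width - 1]


# Simulated draws standing in for two diet categories (made-up numbers).
rng = random.Random(1)
group_a = [rng.gauss(1.0, 0.5) for _ in range(4000)]
group_b = [rng.gauss(1.1, 0.5) for _ in range(4000)]

# Difference computed draw by draw, then summarized by its HPD interval.
difference = [a - b for a, b in zip(group_a, group_b)]
lower, upper = hpd_interval(difference)
```

      Under the usual reading, an interval for the difference that contains zero gives no support for a difference between the two groups, which is the sense in which the new comparisons are reported as non-significant.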

    1. Author response:

      In this manuscript, we have addressed one of the possible modes of recruitment of Swi6 to the putative heterochromatin loci.

      Our investigation was guided by earlier work showing the ability of HP1α to bind to a class of RNAs and the role of this binding in the recruitment of HP1α to heterochromatin loci in mouse cells (Muchardt et al). While there has been no clarity about the mechanism of Swi6 recruitment given the multiple pathways involved, the issue is compounded by the overall lack of understanding as to how Swi6 recruitment occurs only at the repeat regions. At the same time, various observations suggested a causal role of RNAi in Swi6 recruitment.

      Thus, guided by the work of Muchardt et al, we developed a heuristic approach to explore a possibly direct link between Swi6 and heterochromatin through the RNAi pathway. Interestingly, we found that the lysine triplet found in the hinge domain of HP1, which influences its recruitment to heterochromatin in mouse cells, is also present in the hinge domain of Swi6, although we were cautious, keeping in mind the findings of Keller et al showing another role of Swi6 in binding to RNAs and channeling them to the exosome pathway.

      Accordingly, we envisaged that a mode of recruitment of Swi6 through binding to siRNAs to cognate sites in the dg-dh repeats shared among mating type, centromere and telomere loci could explain specific recruitment as well as inheritance following DNA replication. We therefore framed the main questions as follows: i) whether Swi6 binds specifically and with high affinity to the siRNAs and the cognate siRNA-DNA hybrids and whether the Swi63K-3A mutant is defective in this binding, ii) whether this lack of binding of Swi63K-3A affects its localization to heterochromatin, iii) whether this specificity is validated by binding of Swi6 but not Swi63K-3A to siRNAs and siRNA-DNA hybrids in vivo and iv) whether the binding mode is qualitatively and quantitatively different from that of Cen100 RNA or random RNAs, like GFP RNA.

      We think that our data provides answers to these lines of inquiry to support a model wherein the Swi6-siRNA mediated recruitment can explain a cis-controlled nucleation of heterochromatin at the cognate sites in the genome. We have also partially addressed the points raised by the study by Keller et al by invoking a dynamic balance between different modes of binding of Swi6 to different classes of RNA to exercise heterochromatin formation by Swi6 under normal conditions and RNA degradation under other conditions.

      While we stand by our hypothesis, we do acknowledge the need for more detailed investigation both to buttress our hypothesis and to address the dynamics of siRNA binding and recruitment of Swi6 and how Swi6 functions fit in the context of other components of heterochromatin assembly, like the HDACs and Clr4 on one hand and the exosome pathway on the other. Our future studies will attempt to address these issues.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript explores the RNA binding activities of the fission yeast Swi6 (HP1) protein and proposes a new role for Swi6 in RNAi-mediated heterochromatin establishment. The authors claim that Swi6 has a specific and high affinity for short interfering RNAs (siRNAs) and recruits the Clr4 (Suv39h) H3K9 methyltransferases to siRNA-DNA hybrids to initiate heterochromatin formation. These claims are not in any way supported by the incomplete and preliminary RNA binding or the in vivo experiments that the authors present. The proposed model also lacks any mechanistic basis as it remains unclear (and unexplored) how Swi6 might bind to specific small RNA sequences or RNA-DNA hybrids. Work by several other groups in the field has led to a model in which siRNAs produced by the RNAi pathway load onto the Ago1-containing RITS complex, which then binds to nascent transcripts at pericentromeric DNA repeats and recruits Clr4 to initiate heterochromatin formation. Swi6 facilitates this process by promoting the recruitment of the RNA-dependent RNA polymerase leading to siRNA amplification.

      Weaknesses:

      (1) a) The claims that Swi6 binds to specific small RNAs or to RNA-DNA hybrids are not supported by the evidence that the authors present. Their experiments do not rule out non-specific charged-based interactions.

      We disagree. We have used synthetic siRNAs of 20-22 nt length to do EMSA assay, as mentioned in the manuscript. Further, we have sequenced the small RNAs obtained after RIP experiments to validate the enrichment of siRNA in Swi6 bound fraction as compared to the mutant Swi6-bound fraction. These results are internally consistent regardless of the mode of binding. In any case the binding occurs primarily through the chromodomain although it is influenced by the hinge domain (see below).

      Furthermore, we have carried out EMSA experiments using Swi6 mutants carrying all three possible double mutations of the K residues in the KKK triplet and found that there was no difference in the binding pattern as compared to the wt Swi6: only the triple mutant “3K-3A” showed the effect. These results suggest that the binding is not completely dependent on the basic residues. These results will be included in the revised version.

      We also have some preliminary data from a SAXS study showing that the CD of wt Swi6 shows a change in its structure upon binding to the siRNA, while the “3K-3A” mutant of Swi6 has a compact, folded structure that occludes the binding site of Swi6 in the chromodomain. We propose to mention this preliminary finding in the revised version as unpublished data.

      b) Claims about different affinities of Swi6 for RNAs of different sizes are based on a comparison of KD values derived by the authors for a handful of S. pombe siRNAs with previous studies from the Buhler lab on Swi6 RNA binding. The authors need to compare binding affinities under identical conditions in their assays.

      Thus, the EMSA data do suggest sequence specificity in binding of Swi6 to specific siRNA sequences (Figure S5) and imply that specific residues in Swi6 are responsible for it. Identification of the residues in the CD of Swi6 involved in siRNA binding would therefore definitely be interesting, as would experimental confirmation of the consensus siRNA sequence. It may, however, be noted that whereas the binding of Swi6 to siRNAs occurs through the CD, that of Cen100 or GFP RNA was shown by Keller et al to occur through the hinge domain.

      The estimation of Kd by the Buhler group was based on an NMR study, which we are not in a position to perform in the near future. Nonetheless, we did carry out an EMSA study using the ‘Cen100’ RNA, the same as the one used in the Keller et al study. Surprisingly, in contrast with the EMSA result in agarose gel showing binding of Swi6 to “Cen100” RNA as reported by Keller et al, we fail to observe any binding in EMSA done in acrylamide gel. (The same is true of the RevCen 100). While this raises the question of why Keller et al chose to do EMSA in agarose gel instead of the conventional approach of using acrylamide gel, it does lend support to our claim of stronger binding of Swi6 to siRNAs. Another relevant observation is the lack of binding of Swi6 to the “RevCen” precursor RNA but a detectable binding to the siRNAs denoted as VI-IX (as measured by competition experiments; Figure S4 and S7), which are derived by Dcr1 cleavage of the “RevCen” RNA.

      We also disagree that we carried out EMSA with a small bunch of siRNAs. As indicated in Figure 1 and S1, we synthesized nearly 12 siRNAs representing the dg-dh repeats at Cen, mat and tel loci and measured their specificity of binding to Swi6 using EMSA assay by labeling the ones labelled “D”, “E” and “V” directly and those of the remaining ones by the latter’s ability to compete against the binding (Figure 1, S4). These results point to presence of a consensus sequence in siRNAs that shows highly specific and strong binding to Swi6 in the low micromolar range.

      Further, our claim of binding of Swi6 and not Swi63K>3A to siRNA in vivo is validated by RIP experiments, as shown in Fig 2 and S9.

      c) The regions of Swi6 that bind to siRNAs need to be identified and evidence must be provided that Swi6 binds to RNAs of a specific length, 20-22 mers, to support the claim that Swi6 binds to siRNAs. This is critical for all the subsequent experiments and claims in the study.

      We have provided in vitro data, which are validated in vivo by RIP experiments, as mentioned above. We agree that it would be very interesting to identify the residues in the Swi6 chromodomain responsible for binding to siRNA. However, such an investigation is beyond the scope of the present study.

      (2) a) The in vivo results do not validate Swi6 binding to specific RNAs, as stated by the authors. Swi6 pulldowns have been shown to be enriched for all heterochromatic proteins including the RITS complex. The sRNA binding observed by the authors is therefore likely to be mediated by Ago1/RITS.

      We disagree with the first comment. Our RIP experiments do validate the in vitro results (Fig 1, 2, S4 and S9), as argued above. The observation alluded to by the reviewer “Swi6 pulldowns have been shown to be enriched for all heterochromatic proteins including the RITS complex” is not inconsistent with our observation; it is possible that the siRNA may be released from the RITS complex and transferred to Swi6, possibly due to its higher affinity.

      Thus, we would like to suggest that the role of Swi6 is likely to be coincidental or subsequent to that of Ago1/RITS (see below). We think that the binding of Swi6 to the siRNA and the siRNA-DNA hybrid could also be carried out in cis at the level of siRNA-DNA hybrids.

      This point needs to be addressed in future studies.

      b) Most of the binding in Figure S8C seems to be non-specific.

      We would like to point out that the result in Figure S8C needs to be examined together with the Figure S8B, which shows RNA bound by Swi6 but not Swi63K-3A to hybridize with dg, dh and dh-k probes.

      c) In Figure S8D, the authors' data shows that Swi6 deletion does not derepress the rev dh transcript while dcr1 delete cells do, which is consistent with previous reports but does not relate to the authors' conclusions.

      The purpose of results shown in Figure S8D is just to compare the results of Swi6 with that of Swi63K-3A.

      d) Previous results have shown that swi6 delete cells have 20-fold fewer dg and dh siRNAs than swi6+ cells due to decreased RNA-dependent RNA polymerase complex recruitment and reduced siRNA amplification.

      This result is consistent with our results invoking a role of Swi6 in binding to, protecting and recruiting siRNAs to homologous sites.

      To find out whether the overall production of siRNA is compromised in the swi6 3K->3A mutant, we i) calculated the RIP-Seq read counts for swi6 3K->3A, swi6+ and vector control in 200 bp genomic bins, ii) divided the Swi6 3K->3A and swi6+ signals by that of the control, iii) removed the background using the criterion of signal value < 25% of max signal, and iv) counted the total reads (in excess of control) in all peak regions in both samples. This revealed a total count of 10878 and 8994 for the Swi6 3K->3A and swi6+ samples, respectively, possibly implying that the overall siRNA production is not compromised in the Swi6 3K->3A mutant.
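
      The four steps above can be sketched roughly as follows. This is a hedged illustration, not the pipeline actually used: the function names, the pseudocount added before division (to avoid dividing by empty control bins), the reading of "in excess of control" as sample minus control counts, and the toy counts are all assumptions made for the example.

```python
BIN_SIZE = 200  # bin width in bp, as stated in the response


def bin_counts(read_starts, genome_length, bin_size=BIN_SIZE):
    """Step i: count reads per fixed-width genomic bin from read start positions."""
    counts = [0] * ((genome_length + bin_size - 1) // bin_size)
    for position in read_starts:
        counts[position // bin_size] += 1
    return counts


def excess_reads_in_peaks(sample, control, background_fraction=0.25, pseudocount=1.0):
    """Steps ii-iv: normalise to control, drop background bins, sum excess reads."""
    # Step ii: sample-to-control ratio per bin (the pseudocount is an assumption).
    ratios = [(s + pseudocount) / (c + pseudocount) for s, c in zip(sample, control)]
    # Step iii: bins below 25% of the maximum ratio are treated as background.
    threshold = background_fraction * max(ratios)
    total = 0
    for s, c, ratio in zip(sample, control, ratios):
        if ratio >= threshold:
            # Step iv: reads in excess of control within retained (peak) bins.
            total += max(s - c, 0)
    return total


if __name__ == "__main__":
    # Invented per-bin counts for one sample and the vector control.
    sample_bins = [0, 5, 40, 38, 2, 0]
    control_bins = [0, 4, 2, 1, 2, 0]
    print(excess_reads_in_peaks(sample_bins, control_bins))
```

      Applying the same function to the mutant and wild-type tracks against a shared control would give two totals that can be compared in the way the response describes.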

      (3) a) The RIP-seq data are difficult to interpret as presented. The size distribution of bound small RNAs, and where they map along the genome should be shown as for example presented in previous Ago1 sRNA-seq experiments.

      Please see the response to 2(d).

      b) It is also unclear whether the defects in sRNA binding observed by the authors represent direct sRNA binding to Swi6 or co-precipitation of Ago1-bound sRNAs.

      The correspondence between our in vivo and in vitro results suggests that the binding to Swi6 would be direct. We do not observe a complete correspondence between the Swi6- and Ago-bound siRNAs. We think Swi6 binding may be coincident with or following RITS complex formation.

      This point will be discussed in the Revision.

      The authors should also sequence total sRNAs to test whether Swi6-3A affects sRNA synthesis, as is the case in swi6 delete cells.

      Please see response to 2(d) above.

      (4) The authors examine the effects of Swi6-3A mutant by overexpression from the strong nmt1 promoter. Heterochromatin formation is sensitive to the dosage of Swi6. These experiments should be performed by introducing the 3A mutations at the endogenous Swi6 locus and effects on Swi6 protein levels should be tested.

      Although we agree, we think that heterochromatin formation is occurring in the presence of nmt1-driven Swi6 but not Swi63K>3A, as indicated by the phenotype and Swi6 enrichment at otr1R::ade6, imr1::ura4 and his3-telo (Figure 3) and mating type (Fig. S10). Furthermore, both GFP-Swi6 and GFP-Swi63K>3A are expressed at a similar level (Fig. S8A).

      (5) The authors' data indicate an impairment of silencing in Swi6-3A mutant cells but whether this is due to a general lower affinity for nucleosomes, DNA, RNA, or as claimed by the authors, siRNAs is unclear. These experiments are consistent with previous findings suggesting an important role for basic residues in the HP1 hinge region in gene silencing but do not reveal how the hinge region enhances silencing.

      Our study aims to correlate the binding of Swi6 but not Swi63K-3A to siRNA with its localization to heterochromatin. A similar difference in binding of Swi6 but not Swi63K-3A to siRNA-DNA hybrid, together with sensitivity of silencing and Swi6 localization to heterochromatin to RNaseH support the above correlations as being causally connected.

      In terms of mechanism of binding, we need to clarify that the primary mode of binding is through the CD and not the hinge domain, although the hinge domain does influence this binding. This result is different from those of Keller et al.

      We have some structural data from a preliminary SAXS experiment supporting binding of siRNA to the CD and the influence of the hinge domain on this binding. However, this line of investigation needs to be extended and will be the subject of future investigations.

      (6) RNase H1 overexpression may affect Swi6 localization and silencing indirectly as it would lead to a general reduction in R loops and RNA-DNA hybrids across the genome. RNaseH1 OE may also release chromatin-bound RNAs that act as scaffolds for siRNA-Ago1/RITS complexes that recruit Clr4 and ultimately Swi6.

      These are formal possibilities. However, the correlation between Swi6 binding to the siRNA-DNA hybrid and its delocalization upon RNase H1 treatment argues for a more direct link.

      (7) Examples of inaccurate presentation of the literature.

      a) The authors state that "RNA binding by the murine HP1 through its hinge domains is required for heterochromatin assembly (Muchardt et al, 2002)." The cited reference provides no evidence that HP1 RNA binding is required for heterochromatin assembly. Only the hinge region of bacterially produced HP1 contributes to its localization to DAPI-stained heterochromatic regions in fixed NIH 3T3 cells.

      Noted. Statement will be corrected.

      b) "... This scenario is consistent with the loss of heterochromatin recruitment of Swi6 as well as siRNA generation in rnai mutants (Volpe et al, 2002)." Volpe et al. did not examine changes in siRNA levels in swi6 mutant cells. In fact, no siRNA analysis of any kind was reported in Volpe et al., 2002.

      Correct. We only say that Swi6 recruitment is reduced in rnai mutants and correlate it with the ability of Swi6 to bind to siRNA generated by RNAi and subsequently to the siRNA-DNA hybrid.

      Reviewer #2 (Public review):

      The aim of this study is to investigate the role of Swi6 binding to RNA in heterochromatin assembly in fission yeast. Using in vitro protein-RNA binding assays (EMSA) they showed that Swi6/HP1 binds centromere-derived siRNA (identified by Reinhardt and Bartel in 2002) via the chromodomain and hinge domains. They demonstrate that this binding is regulated by a lysine triplet in the conserved region of the Swi6 hinge domain and that wild-type Swi6 favours binding to DNA-RNA hybrids and siRNA, which then facilitates, rather than competes with, binding to H3K9me2 and to a lesser extent H3K9me3.

      However, the majority of the experiments are carried out in swi6 null cells overexpressing wild-type Swi6 or Swi63K-3A mutant from a very strong promoter (nmt1). Both swi6 null cells and overexpression of Swi6 are well known to exhibit phenotypes, some of which interfere with heterochromatin assembly. This is not made clear in the text.

      We think that the argument is not valid, as we show that Swi6 but not Swi63K-3A could restore silencing at imr1::ura4, otr1::ade6 and his3-telo (Fig 3) and mating type (Fig. S10), when transformed into a swi6Δ strain.

      Whilst the RNA binding experiments show that Swi6 can indeed bind RNA and that binding is decreased by Swi63K-3A mutation in vitro (confusingly, they only much later in the text explained that these 3 bands represent differential binding and that II is likely an isotherm). The gels showing these data are of poor quality and it is unclear which bands are used to calculate the Kd.

      We disagree with the comment about the quality of EMSA data. We think it is of similar quality or better than that of Keller et al, except in some cases, like Fig 1D, a shorter exposure shown to distinguish the slowest shifted band has caused the remaining bands to look fainter.

      RNA-seq data shows that overall fewer siRNAs are produced from regions of heterochromatin in the Swi63K-3A mutant so it is unsurprising that analysis of siRNA-associated motifs also shows lower enrichment (or indeed that they share some similarities, given that they originate from repeat regions).

      Please see response to comment 2(d) of the first reviewer above.

      It is not clear which bands are being alluded to. However, we will rectify any gaps in information in the revision.

      The experiments are seemingly linked yet fail to substantiate their overall conclusions. For instance, the authors show that the Swi63K-3A mutant displays reduced siRNA binding in vitro (Figure 1D) and that H3K9me2 levels at heterochromatin loci are reduced in vivo (Figure 3C-D). They conclude that Swi6 siRNA binding is important for Swi6 heterochromatin localization, whilst it remains entirely possible that heterochromatin integrity is impaired by the Swi63K-3A mutation and hence fewer siRNAs are produced and available to bind. Their interpretation of the data is really confusing.

      Our argument is that the lack of binding by Swi63K>3A to siRNA can explain the loss of recruitment to heterochromatin loci and thus affect the integrity of heterochromatin; the recruitment of Swi6 can possibly occur by binding initially to siRNA and thereafter to the siRNA-DNA hybrid. However, the overall level of siRNAs is not affected, as in 2(d) above. This interpretation is supported by the results of the ChIP assay and confocal experiments, as also by the effect of RNaseH1 on the recruitment of Swi6.

      The authors go on to show that Swi63K-3A cells have impaired silencing at all regions tested and the mutant protein itself has less association with regions of heterochromatin. They perform DNA-RNA hybrid IPs and show that Swi63K-3A cells which also overexpress RNAseH/rnh1 have reduced levels of dh DNA-RNA hybrids than wild-type Swi6 cells. They interpret this to mean that Swi6 binds and protects DNA-RNA hybrids, presumably to facilitate binding to H3K9me2. The final piece of data is an EMSA assay showing that "high-affinity binding of Swi6 to a dg-dh specific RNA/DNA hybrid facilitates the binding to Me2-K9-H3 rather than competing against it." This EMSA gel shown is of very poor quality, and this casts doubt on their overall conclusion.

      We do agree with the reviewer about the quality of EMSA (Fig. 5B). However, as may be noticed in the EMSA for siRNA-DNA hybrid binding (Fig 4A), the bands of Swi6-bound siRNA-DNA hybrid are extremely retarded. Hence the EMSA for subsequent binding by H3-K9-Me peptides required a longer electrophoretic run, which led to a reduction in the sharpness of the bands. Nevertheless, the data do indicate binding efficiency in the order H3-K9-Me2 > H3-K9-Me3 > H3-K9-Me0. Having said that, we plan to repeat the EMSA or address the question by other methods, like SPR.

      Unfortunately, the manuscript is generally poorly written and difficult to comprehend. The experimental setups and interpretations of the data are not fully explained, or, are explained in the wrong order leading to a lack of clarity. An example of this is the reasoning behind the use of the cid14 mutant which is not explained until the discussion of Figure 5C, but it is utilised at the outset in Figure 5A.

      We agree to some extent and will attempt to submit a revised version with greater clarity, including an earlier explanation of the experiment with the cid14Δ strain.

      Another example of this lack of clarity/confusion is that the abstract states "Here we provide evidence in support of RNAi-independent recruitment of Swi6". Yet it then states "We show that...Swi6/HP1 displays a hierarchy of increasing binding affinity through its chromodomain to the siRNAs corresponding to specific dg-dh repeats, and even stronger binding to the cognate siRNA-DNA hybrids than to the siRNA precursors or general RNAs." RNAi is required to produce siRNAs, so their message is very unclear. Moreover, an entire section is titled "Heterochromatin recruitment of Swi6-HP1 depends on siRNA generation" so what is the author's message?

      The reviewer has correctly pointed out the error. Indeed, our results actually indicate an RNAi-dependent rather than independent mode of recruitment. Rather, we would like to suggest an H3-K9-Me2-independent recruitment of Swi6. We will rectify this error in our revised manuscript.

      The data presented, whilst sound in some parts is generally overinterpreted and does not fully support the author's confusing conclusions. The authors essentially characterise an overexpressed Swi6 mutant protein with a few other experiments on the side, that do not entirely support their conclusions. They make the point several times that the KD for their binding experiments is far higher than that previously reported (Keller et al Mol Cell 2012) but unfortunately the data provided here are of an inferior quality and thus their conclusions are neither fully supported nor convincing.

      We have used the method of Heffler et al (2012) to compute the Kd from EMSA data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      (1) This work investigates numerically the propagation of subthreshold waves in a model neural network that is derived from the C. elegans connectome. Using a scattering formalism and tight-binding description of the network -- approximations which are commonplace in condensed matter physics -- this work attempts to show the relevance of interference phenomena, such as wavenumber-dependent propagation, for the dynamics of subthreshold waves propagating in a network of electrical synapses.

      (2) The primary strength of the work is in trying to use theoretical tools from a far-away corner of fundamental physics to shed light on the properties of a real neural system. While a system composed of neurons and synapses is classical in nature, there are occasions in which interference or localization effects are useful for understanding wave propagation in complex media [review, van Rossum & Nieuwenhuizen, 1999]. However, it is expected that localization effects only have an impact in some parameter regimes and with low phase dissipation. The authors should have addressed the existence of this validity regime in detail prior to assuming that interference effects are important.

      The theoretical concept and tool used in this study are not situated in a far-away corner of fundamental physics but hold one of the central positions in condensed matter physics and statistical physics. In fact, the non-scientific statement about where the theoretical concept and tool employed by the researchers are positioned within the realm of fundamental physics is irrelevant. The fundamental physics governs the foundations of all natural phenomena, and thus it provides indispensable principles for interpreting not only neural systems but also all life phenomena. One such principle explored in our study is the interference and localization of waves.

      Specifically, in the third paragraph of the Introduction, we introduced that the interference effect of subthreshold oscillating waves, beyond being a theoretical possibility, is a phenomenon actually observed in neural tissue (Chiang and Durand, 2023; Gupta et al., 2016). Moreover, according to Devor and Yarom (2002), the propagation of subthreshold oscillations observed in the inferior olivary nucleus extended beyond a distance of 0.2 mm. Therefore, considering the propagation of subthreshold waves and the resulting interference in the connectome of C. elegans, which has a total body length of less than 1 mm, a diameter of about 0.08 mm, and most neurons distributed in the ring structure near its neck, provides sufficient validity for the initiation of theoretical and computational studies.

      The primary objective of our study is to investigate which regimes of signal transmission/localization and interference phenomena are valid within the network of electrical synapses in C. elegans, the only system for which the neural connectome structure is perfectly known. As the Reviewer rightly pointed out in the question, this is exactly the issue that the Reviewer is curious about. Therefore, the existence of this validity regime cannot be addressed prior to conducting the study but can only be identified as a result of performing the research. And we have conducted such a study.

      (3) An additional approximation that was made without adequate justification is the use of a tight-binding Hamiltonian. This can be a reasonable approximation, even for classical waves, in particular in the presence of high-quality-factor resonators, where most of the wave amplitude is concentrated on the nodes of the network, and nodes are coupled evanescently with each other. Neither of these conditions were verified for this study.

      The tight-binding Anderson Hamiltonian we used in this study originally consisted of the on-site energy at each node and the hopping matrix between nodes. When the on-site energy is relatively much more stable (i.e., has a large negative value) compared to the hopping matrix, most of the wave amplitude becomes concentrated on the nodes as the Reviewer mentioned. However, as is well-known from reference papers (Anderson, 1958; Chang et al., 1995; Meir et al., 1989; Shapir et al., 1982; Thomas and Nakanishi, 2016), in this study, we also removed the on-site energy to prevent the waves from being concentrated on the nodes. Therefore, the tight-binding Hamiltonian we used in this study ensures that waves propagate through edges in the network where the values of the hopping matrix exist.
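
      For reference, the generic form of such a Hamiltonian can be written as below, first with the on-site term and then with it removed, as described above. The notation is assumed here for illustration and is not quoted from the manuscript: \varepsilon_i denotes the on-site energy of node i, t_{ij} a real, symmetric hopping amplitude along the edge between nodes i and j, and |i\rangle the state localized on node i.

```latex
H = \sum_{i} \varepsilon_i \, |i\rangle\langle i|
  + \sum_{\langle i,j \rangle} t_{ij} \left( |i\rangle\langle j| + |j\rangle\langle i| \right)
\quad \xrightarrow{\;\varepsilon_i = 0\;} \quad
H = \sum_{\langle i,j \rangle} t_{ij} \left( |i\rangle\langle j| + |j\rangle\langle i| \right)
```

      In the second form the only nonzero matrix elements connect distinct nodes joined by an edge, which is the sense in which the wave is carried by the edges of the network.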

      To assist the Reviewer in better understanding the model used in this study, we provide additional explanations as follows. In the manuscript, we have already provided detailed descriptions of the setup using the tight-binding Anderson Hamiltonian in the Method section under “Construction of our circuit model” and the explanation of Figure 1. In the model we used, the edges represented by solid lines are perfect conductors, while the dotted lines representing gap junctions act as potential barriers (Fig. 1B). Therefore, when electric signals propagate, we are dealing with the phenomenon where signals transmitted through the edges encounter potential barriers, causing scattering or attenuation. The model described by the Reviewer is indeed a commonly used model in condensed matter physics, but we did not use the exact model mentioned by the Reviewer. Instead, as is common in well-known reference papers, we modified it to suit our purposes. We hope this explanation helps the Reviewer gain a better understanding.

      (4) The motivation for this work is to understand the basic mechanisms underlying subthreshold intrinsic oscillations in the inferior olive, but detailed connectivity patterns in this brain area are not available. The connectome is known for C elegans, but sub-threshold oscillations have not been observed there, and the implications of this work for C elegans neuroscience remain unclear. The authors should also give more evidence for the claim that their study may give a mechanism for synchronized rhythmic activity in the mammalian inferior olive nucleus, or refrain from making this conclusion.

      We agree with the Reviewer's point. In this study, we do not provide additional analysis on the mammalian inferior olive nucleus beyond what is already known from previous research. What we intended to discuss in the Discussion section was to suggest that within our model, there is a “possibility” that a group of cells exchanging wave signals of a specific wavenumber with high transmittance may show synchronized rhythmic activity. Therefore, to avoid any misunderstanding for the reader, we have revised the corresponding sentence in the Discussion as follows.

      In the Discussion, “The plausible possibility according to our model study is that the constructive interference of subthreshold membrane potential waves with a specific wavenumber may generate the synchronized rhythmic activation.”

      (5) In the same vein, since the work emphasizes the dependence on the wavenumber for the propagation of subthreshold oscillations, they should make an attempt at estimating the wavenumber of subthreshold oscillations in C elegans if they were to exist and be observed. Next, the presence of two "mobility edges" in the transmission coefficient calculated in this work is unmistakably due to the discrete nature of the system, coming from the tight-binding approximation, and it is unclear if this approximation is justified in the current system.

      In this study, we modeled the propagation of subthreshold waves on the electrical synapse network of C. elegans, but we did not explain the generation of subthreshold oscillations themselves. Here, we simply injected wave signals with various wavenumber values into the network using a hypothetical device called an "Injector." As the Reviewer pointed out, estimating the wavenumbers of subthreshold oscillations that may exist or be observed in C. elegans would require a comprehensive investigation of the membrane potential dynamics occurring in the membranes of individual neurons. However, this is beyond the scope of this study and would require considerable effort to accomplish.

      As for the use of the tight-binding Hamiltonian, we have addressed that in our response to the third paragraph in the Joint Public Review above.

      (6) Similarly, it is possible that the wavenumber-dependent transmission observed depends strongly on the addition of a large number of virtual nodes (VNs) in the network, which the authors give little to no motivation for. As these nodes are not present in the C elegans connectome, the authors should explain the motivation for their inclusion in the model and should discuss their consequences on the transmission properties of the network.

      As mentioned in our response to the third paragraph in the Joint Public Review above, in our model, a node is simply a pathway for waves to pass through. Therefore, inserting virtual nodes between two neurons that are connected in the C. elegans connectome does not alter the actual connection structure. In other words, virtual nodes do not create new connections between cells that didn’t exist in the connectome. The virtual nodes we introduced are merely a way to divide the sections—axon, gap junction, dendrite—through which the wave passes when it is transmitted between two neurons. As we have already explained in Fig. 1B, the edge connected by two virtual nodes, represented by a dotted line, is motivated to depict the gap junction acting as a potential barrier. We hope this explanation helps the Reviewer better understand the model used in this study.

      (7) As it stands, the work would only have a very limited impact on the understanding of subthreshold oscillations in the rat or in C elegans. Indeed, the preprint falls short of relating its numerical results to any phenomena which could be observed in the lab.

      In this study, we proposed a minimalistic model built using the currently available but limited C. elegans connectome information. Specifically, our model is not a phenomenological one that adjusts parameters to accurately predict experimental measurements, but rather an attempt at a novel conceptual approach to theoretically possible scenarios. While the model may not be satisfactory enough to explain experimental phenomena at present, it is a theoretical/computational study that someone needs to undertake. We believe this is the path of scientific progress. Therefore, as the Reviewer has expressed concern, it is entirely understandable that reproducing the numerical results measured in actual experiments is difficult in this study. Nevertheless, we believe that this study makes a basic contribution to the conceptual understanding of subthreshold signal propagation in C. elegans’ electric synapses.

      Rather than offering a stretched opinion, we maintain a positive hope that future researchers in this field will improve the model by incorporating more detailed and extensive biological data through follow-up studies, allowing us to get closer to describing real phenomena.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The word "Sensory" was misspelled in Figures 2, 4 and 5.

      We appreciate the feedback from Reviewer #1. We have corrected the mentioned typos in Figures 2, 4, and 5 of the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      What neurophysiological changes support the learning of new sensorimotor transformations is a key question in neuroscience. Many studies have attempted to answer this question at the neuronal population level - with varying degrees of success - but few, if any, have studied the change in activity of the apical dendrites of layer 5 cortical neurons. Neurons in layer 5 of the sensory cortex appear to play a key role in sensorimotor transformations, showing important decision and reward-related signals, and being the main source of cortical and subcortical projections from the cortex. In particular, pyramidal tract (PT) neurons project directly to subcortical regions related to motor activity, such as the striatum and brainstem, and could initiate rapid motor action in response to given sensory inputs. Additionally, layer 5 cortical neurons have large apical dendrites that extend to layer 1 where different neuromodulatory and long-range inputs converge, providing motor and contextual information that could be used to modulate layer 5 neurons output and/or to establish the synaptic plasticity required for learning a new association.

      In this study, the authors aimed to test whether the learning of a new sensorimotor transformation could be supported by a change in the evoked response of the apical dendrites of layer 5 neurons in the mouse whisker primary somatosensory cortex. To do this, they performed longitudinal functional calcium imaging of the apical dendrites of layer 5 neurons while mice learned to discriminate between two multi-whisker stimuli. The authors used a simple conditioning task in which one whisker stimulus (upward or backward air puff, CS+) is associated with a reward after a short delay, while the other whisker stimulus (CS-) is not. They found that task learning (measured by the probability of anticipatory licking just after the CS+) was not associated with a significant change in the average population response evoked by the CS+ or the CS-, nor a change in the average population selectivity. However, when considering individual dendritic tufts, they found interesting changes in selectivity, with approximately equal numbers of dendrites becoming more selective for CS+ and dendrites becoming more selective for CS-.

      One of the major challenges when assessing changes in neural representation during the learning of such Go/NoGo tasks is that the movements and rewards themselves may elicit strong neural responses that may be a confounding factor, that is, inexperienced mice do not lick in response to the CS+, while trained mice do. In this study, the authors addressed this issue in three ways: first, they carefully monitored the orofacial movements of mice and showed that task learning is not associated with changes in evoked whisker movements. Second, they show that whisking or licking evokes very little activity in the dendritic tufts compared to whisker stimuli (CS+ and CS-). Finally, the authors introduced into the design of their task a post-conditioning session after the last conditioning session during which the CS+ and the CS- are presented but no reward is delivered. During this post-session, the mice gradually stopped licking in response to the CS+. A better design might have been to perform the pre-conditioning and post-conditioning sessions in nonwater-restricted, unmotivated mice to completely exclude any lick response, but the fact that the change in selectivity persists after the mice stopped licking in the last blocks of the post-conditioning session (in mice relying only on their whiskers to perform the task) is convincing. 

      The clever task design and careful data analysis provide compelling evidence that learning this whisker discrimination task does not result in a massive change in sensory representation in the apical dendritic tufts of layer 5 neurons in the primary somatosensory cortex on average. Nevertheless, individual dendritic tufts do increase their selectivity for one or the other sensory stimulus, likely enhancing the ability of S1 neurons to accurately discriminate the two stimuli and trigger the appropriate motor response (to lick or not to lick). 

      One limitation of the present study is the lack of evidence for the necessity of the primary somatosensory cortex in the learning and execution of the task. As the authors have strongly emphasized in their previous publications, the primary somatosensory cortex may not be necessary for the learning and execution of simple whisker detection tasks, especially when the stimulus is very salient. Although this new task requires the discrimination between two whisker stimuli, the simplicity and salience of the whisker stimuli used could make this task cortex-independent. Especially when considering that some mice seem to not rely entirely on their whiskers to execute the task. 

      Nevertheless, this is an important result that shows for the first time changes in the selectivity to sensory stimuli at the level of individual apical dendritic tufts in correlation with the learning of a discrimination task. This study sheds new light on the cortical cellular substrates of reward-based learning and opens interesting perspectives for future research in this area. In future studies, it will be important to determine whether the change in selectivity of dendritic calcium spikes is causally involved in the learning of the task or whether it simply correlates with learning, as a consequence of changes in synaptic inputs caused by reward. The dendritic calcium spikes may be involved in the establishment of synaptic plasticity required for learning and impact the output of layer 5 pyramidal neurons to trigger the appropriate motor response. It would be important also to study the changes in selectivity in the apical dendrite of the identified projection neurons.  

      Reviewer #2 (Public Review):

      Summary: 

      The authors did not find an increased representation of CS+ throughout reinforcement learning in the tuft dendrites of Rbp4-positive neurons from layer 5B of the barrel cortex, as previously reported for soma from layer 2/3 of the visual cortex. 

      Alternatively, the authors observed an increased selectivity to both stimuli (CS+ and CS-) during reinforcement learning. This feature: 

      (1) was not present in repeated exposures (without reinforcement), 

      (2) was not explained by the animal's behaviour (choice, licking, and whisking), and 

      (3) was long-lasting, being present even when the mice disengaged from the task. 

      Importantly, increased selectivity was correlated with learning (% correct choices), and neural discriminability between stimuli increased with learning. 

      In conclusion, the authors show that tuft dendrites from layer 5B of the barrel cortex increase the representation of conditioned (CS+) and unconditioned stimuli (CS-) applied to the whiskers, during reinforcement learning. 

      Strengths: 

      The results presented are very consistent throughout the entire study, and therefore very convincing: 

      (1) The results observed are very similar using two different imaging techniques (2-photon planar imaging- and SCAPE-volumetric imaging). Figure 3 and Figure 4 respectively. 

      (2) The results are similar using "different groups" of tuft dendrites for the analysis (e.g. initially unresponsive and responsive pre- and post-learning). Figure 5.

      (3) The results are similar from a specific set of trials (with the same sensory input, but different choices). Figure 7.

      (4) Additionally, the selectivity of tuft dendrites from layer 5B of the barrel cortex was higher in the mice that exclusively used the whisker to respond to the stimuli (CS+ and CS-).  The results presented are controlled against a group of mice that received the same stimuli presentation, except for the reinforcement (reward). 

      Additionally, the behaviour outputs, such as choice, whisking, and licking could not account for the results observed. 

      Although there are no causal experiments, the correlation between selectivity and learning (percentage of correct choices), as well as the increased neural discriminability with learning, but not in repeated exposure, are very convincing. 

      Weaknesses: 

      The biggest weakness is the absence of causality experiments. Although inhibiting specifically tuft dendritic activity in layer 1 from layer 5 pyramidal neurons is very challenging, tuft dendritic activity in layer 1 could be silenced through optogenetic experiments as in Abs et al. 2018. By manipulating NDNF-positive neurons the authors could specifically modify tuft dendritic activity in the barrel cortex during CS presentations, and test if silencing tuft dendritic activity in layer 1 would lead to the lack of selectivity and an impairment of reinforcement learning. Additionally, this experiment will test if the selectivity observed during reinforcement learning is due to changes in the local network, namely changes in local synaptic connectivity, or solely due to changes in the long-range inputs.    

      We agree that such causal manipulations are a logical next step. Such manipulations are unfortunately not specific to layer 5 apicals, so the results would be difficult to interpret. We now discuss the challenge of such manipulations in the Discussion section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Overall, the study is solid and the article is well and clearly written. I have no suggestion for other experiments that would fall within the scope of this article. I would like only to suggest some additional analyses and clarifications in the writing. 

      Additional analyses: 

      Obviously, the main confounding factor in this type of data comes from the acquired motor response which follows - with a short latency - the sensory stimulus. This is particularly problematic for functional calcium imaging which has very low temporal resolution. The authors have addressed this question to some extent by showing that motor-evoked activity does not account for the change in selectivity acquired with learning and through the use of a post-conditioning session during which no reward was delivered. Figures 8C-D show that mice gradually stop licking in response to CS+ in this session and that the distribution of the selectivity index remains similar in these last blocks. Perhaps a more convincing analysis would be to simply select Miss and Correct rejection trials in which mice did not lick in response to the CS+ and CS-, respectively. Ideally, if the number of trials is sufficient, one could even select trials devoid of any evoked movement (no licking and no whisking).  

      We agree it would be interesting to compare Miss and Correct rejection trials to further rule out effects of a motor response, but there were never enough Miss trials to conduct such an analysis. Even in very early learning, there are few Miss trials (see Figure 1, session 2). We found that in early learning, animals would lick in most trials. Then, over the course of conditioning, they would learn to withhold licks during CS- presentation. Thus, we were able to examine Hits, Correct rejections, and False alarms (Figure 7), but not Miss trials. We have added text suggesting a future experiment in which the stimulus strengths are substantially reduced to drastically increase the error rates.

      The fact that changes in selectivity occur in both directions overall is really interesting. However, in the way the data are presented currently, one may wonder about mice/field of view vs single cell effect, i.e., do different dendritic tufts in the same field of view show opposite changes in selectivity? If we were to replot Figure 3A for a single mouse, would we obtain the same picture?

      We appreciate this very good suggestion and have added scatter plots and selectivity index histograms for individual conditioned animals in Supplementary figure 2. These data demonstrate that different dendritic tufts in the same field of view exhibit opposite changes in selectivity.

      The authors point out that they observed no change in the mean response or selectivity during learning, but did find changes in selectivity at the level of individual dendritic tufts. This suggests that, at the population level, the ability to discriminate between the two stimuli should improve. A possible complementary analysis would be to show that the ability to decode stimulus identity from dendritic tuft population activity increases with learning.  

      Given the substantial change in individual tuft selectivity and that tuft events are not rare, the population result is guaranteed. If individual tufts increase selectivity, the population will also increase its selectivity on a trial-by-trial basis. We have nevertheless included a new supplementary figure with a population analysis using SVMs to demonstrate this.

      Clarification: 

      The authors should make it clear from the beginning that mice are still water-restricted during the post-conditioning session and actually do keep licking for many CS+ trials. Therefore, this session is not devoid of motor response. 

      We have clarified this in the text.

      Did mice in the repeated exposure condition receive any reward during the recording sessions? If so when were rewards delivered? 

      We previously described in the Methods that these mice received water in their home cage, but we now additionally clarify this in the Results section.

      Minor: 

      Figure 2Aii, the labels of the Alpha and Beta barrels should be swapped.

      Fixed

      Line 218: I believe this sentence should read "Using SCAPE microscopy, ...". 

      Corrected.

      Line 665: 'Reconstruction from 50' does that refer to the single cell reconstruction on the left panel? 

      Yes – Clarified in legend

      Reviewer #2 (Recommendations For The Authors): 

      Minor suggestions: 

      The 'summary' should mention from which brain area the results were acquired. Otherwise, it is misleading, giving the idea that the results described a generic feature, which is still unknown.  

      Added to the text.

      Please correct sentence 219: "SCAPE microscopy, we image tuft activity of additional mice..." 

      Added to the text.

      In the same sentence (219) it would be good to provide the number of additional mice imaged (2). 

      Added to the text.

      Regarding Supplementary Figure 1, it would be interesting to correlate the second peak after reward and learning rate, to provide further support to the sentences 109 to 113. 

      We agree this would be interesting to examine, but only four animals exhibited this second peak, which is too small of a sample to observe a meaningful correlation. We now clarify this in the text.

      In Figure 3, why not present the correlation between 'neural discriminability' and % of correct choices? 

      We appreciate the suggestion and have added this plot to Figure 3.

      The 'results' section will benefit tremendously if the authors consistently indicate the figures to which the results are being described, or 'data not shown' if it is the case. To give a few examples: 

      Sentence 108 - "averaged 28% ΔF/F" - From which figure is this result coming from?

      Sentence 123 - "(p = 0.62, 0.64, respectively)" - comparison not shown, but see Figures 2E and D respectively?

      Sentence 125 - "(CS+ responsive (...) across all sessions)" - From which figure is this result coming from? 

      Sentence 130 - "during pre-conditioning (p=0.66) or post-conditioning sessions (p=0.44) - From which figure? 

      Sentence 154 - "(Pre: p=0.20; last rewarded: p=0.43; Post: p=0.64, sign-rank test)" - From which figure? 

      Sentence 175 - "(-0.049, -0.001, and 0.003" - From which figure? Please show the graph that shows that the mean SI is not different. It can be supplementary. The distribution of SI will be strengthened by it.  

      We added this plot to supplementary figure 2.

      Sentence 244 - "(conditioned: 458/603; repeated exposure: 334/457) - From Figure 5E. 

      Sentence 256 - "(p=0.04, 2-sample t-test comparison mice) - From Figure 5B.

      Sentence 258 - "(p=0.03, paired t-test) - from Figure 5B

      Sentences 370 to 378 - No reference to the figure.

      The 'discussion' section (sentences 459 to 494) refers to the differences between the current and previous studies (references 1,3,5), namely soma vs. dendrites and layer 2/3 vs. layer 5. However, it should also mention the difference between the nature of the stimuli and the brain area recorded (visual cortex vs. barrel cortex).

      We have addressed these issues in the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Authors reject the substance of Reviewer 1’s feedback, primarily because it reflects a clear lack of understanding of typical parameterization practices used to avoid overfitting. To ensure the accuracy of the Spearman rank correlation, 70% of all data was withheld from the optimization process and used solely for testing to yield Figure 6. Data was withheld prior to model parameterization, which avoids Reviewer 1’s charge of “artificially forcing the correlation”. Authors did appreciate the request for clarification of additional definitions and the minor reorganization suggestions. Below we provide specific responses to each numbered point (note: multiple responses are provided for some of the reviewer points).

      Point 1: Clarify Metrics Definition and Evaluation

      Authors clarified the description of biodiversity metrics. The metrics associated with manual methods are detailed in the third paragraph of the Materials and Methods: Data Analysis section, while the sensor-based metric is described in the second paragraph, and summarized in its last sentence.

      Text Additions:

      Authors added clarification to the introduction’s first paragraph defining biodiversity metrics, including species richness.

      Authors added detailed definitions of community metrics and their significance in community ecology in the Materials and Methods section (3rd paragraph of “Data Analysis” section). The discussion was updated to include a reference to community ecology and the benefits of big data, specifically highlighting the potential of autonomous optical sensors in entomology.

      Methods Reorganization

      We have reorganized the Methods section for clarity. The updated section clarifies the metrics studied, the location and dates, and the descriptions and methods for optical sensors, Malaise traps, and sweep netting.

      Text Additions:

      An overview paragraph was added to “Data analysis” (3rd paragraph) detailing key metrics used, specifying metrics such as abundance, richness, Shannon index, and Simpson index.

      Visualization methods for sensor data to deliver analogous metrics of abundance, richness, and diversity indices were added to the “Data analysis” section.

      Supplementary Table 1 and the first paragraph of the Materials and Methods section cover location, dates, and other general information.

      Detailed descriptions and methods for optical sensors, Malaise traps, and sweeping are provided.
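
      For readers less familiar with the community metrics named above (abundance, richness, Shannon index, Simpson index), the following is a minimal sketch of how they could be computed from a list of per-individual group labels. It is illustrative only and is not the analysis code used in the study. The function name `diversity_metrics`, the use of the natural logarithm for the Shannon index, and the Gini-Simpson form (1 minus the sum of squared proportions) for the Simpson index are all assumptions; other conventions exist.

```python
import math
from collections import Counter


def diversity_metrics(labels):
    """Compute basic community metrics from one label per observed individual.

    `labels` is any iterable of hashable group identifiers (for example
    species names from a trap, or cluster ids from a sensor). Hypothetical
    helper for illustration only.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one observation is required")

    # Relative abundance of each group.
    proportions = [c / total for c in counts.values()]

    return {
        # Abundance: total number of individuals observed.
        "abundance": total,
        # Richness: number of distinct groups observed.
        "richness": len(counts),
        # Shannon index H' = -sum(p * ln p), natural log assumed.
        "shannon": -sum(p * math.log(p) for p in proportions),
        # Simpson index in its Gini-Simpson form, 1 - sum(p^2) (assumed).
        "simpson": 1.0 - sum(p * p for p in proportions),
    }


if __name__ == "__main__":
    sample = ["a", "a", "b", "c"]
    print(diversity_metrics(sample))
```

      Applying the same function to trap identifications and to sensor-derived cluster labels would give directly comparable numbers, provided both are expressed as one label per individual.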

      Integration of Metrics

      In the 3rd paragraph of the discussion, authors integrated two paragraphs explaining the fundamental differences between conventional methods and the presented method of biodiversity measurement.

      Point 2: Body-to-Wing Ratio Calculation

      The backscattered optical cross-section is now clearly defined as the value measured at the maximum point of the event. Specifically, we have added the word ‘maximum’ to our methods section for clarity.

      Point 3: Ecosystem Services Paragraph

      We have shortened and edited this paragraph for clarity. The revised text is now more straightforward and comprehensible.

      Point 4: Results Section Structure

      We believe restructuring the results section around each metric would result in redundancy. The value of our analysis is in the comparison of different methods; therefore, instead of talking about methods in isolation, we provide an integrated discussion and comparison of all three methods across all metrics. We have therefore maintained our current structure but ensured that the metrics are consistently described and analyzed.

      Point 5: Abundance Correlation

      We agree that the lack of a correlation between methods for abundance remains an open question. However, we maintain that fitting a linear model would be inappropriate and potentially misleading in the absence of significant correlation. We have clarified this in our manuscript.

      Point 6: Richness and Diversity Evaluations

      The authors disagree with Reviewer 1's feedback, citing a clear misunderstanding of standard parameterization practices used to prevent overfitting. Specifically, authors implemented a 30/70 Training/Testing split. Therefore only 30% of the data was used to fit the model and 70% of the dataset was reserved for testing to ensure the validity and reliability of our clustering results. By validating with a 70% testing dataset, we ensure that the clustering model can accurately group new data points and is robust against overfitting. This process helps verify that the identified clusters are meaningful and consistent across different subsets of the data.  Spearman's rho converts the data values into ranks and does not assume a linear relationship between the variables or require the data to follow a normal distribution. Spearman's rank correlation offers robustness against non-linearity and outliers by focusing on ranks. This approach is explained in the 4th paragraph of the “Data Analysis” section.
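
      The two ingredients described above, a 30/70 fitting/testing split made before any parameterization and a rank-based correlation on the held-out portion, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the manuscript: the helper names `split_fit_test`, `average_ranks`, and `spearman_rho` are hypothetical, the split is a simple seeded shuffle, and tied values receive their average rank.

```python
import math
import random


def split_fit_test(items, fit_fraction=0.3, seed=0):
    """Shuffle once, then reserve `fit_fraction` for fitting and the rest for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_fit = round(fit_fraction * len(shuffled))
    return shuffled[:n_fit], shuffled[n_fit:]


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    fit_ids, test_ids = split_fit_test(range(10))
    print(len(fit_ids), len(test_ids))
    # A monotone but non-linear relationship still has perfectly matched ranks.
    print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))
```

      Because only ranks enter the calculation, any monotone relationship between the two variables is scored the same way regardless of its shape, which is the robustness property referred to above.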

      Point 7: Clustering Method Credibility

      Authors acknowledge the variability in optical sensor features. However, by the Law of Large Numbers, the much larger number of observations made by optical sensors, compared to conventional methods, is expected to increase the accuracy and stability of the insect measurements. The manuscript now includes a detailed discussion of these aspects in the 3rd paragraph of the discussion, emphasizing the correlation observed despite variability.

      Reviewer 2:

      Authors appreciate Reviewer 2’s feedback especially regarding contextualization. While authors disagree with the need for more specific experimental questions in a methods paper and the suggested need for more complex analysis, we agree with the essence of the review and added additional text regarding potential questions, method applications, and ecosystem processes for contextualization.

      Point 1: Larger Question Framing

      We present this article as a methodological paper rather than asking a specific experimental question. This approach is justified by the generalizable nature of methods papers, akin to those describing ImageJ or mass spectrometers. The method is widely applicable to a range of scientific questions. 

      We provided a discussion on how this technology could be applied in community ecology, conservation, and managed ecological systems like agriculture.

      In the Conclusion section we provided elaboration on the potential research questions and applications.

      Point 2: Complex Analyses

      While complex analyses like NMDS are useful for specific questions, this paper aims to establish the method. Once established, this method can be applied to various research questions in future studies. Therefore, as we are not directly asking an experimental question, more complex analysis is unnecessary.

      Point 3: Ecosystem Process (Granivory) Assay

      We have improved the contextualization and explanation of the ecosystem process assay throughout the manuscript, ensuring it is well-integrated and clear to readers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This paper explores how diverse forms of inhibition impact firing rates in models for cortical circuits. In particular, the paper studies how the network operating point affects the balance of direct inhibition from SOM inhibitory neurons to pyramidal cells, and disinhibition from SOM inhibitory input to PV inhibitory neurons. This is an important issue as these two inhibitory pathways have largely been studied in isolation. Support for the main conclusions is generally solid, but could be strengthened by additional analyses.

      Strengths:

      A major strength of the paper is the systematic exploration of how circuit architecture affects the impact of inhibition. This includes scans across parameter space to determine how firing rates and stability depend on effective connectivity. This is done through linearization of the circuit about an effective operating point, and then the study of how perturbations in input affect this linear approximation.

      Weaknesses:

      The linearization approach means that the conclusions of the paper are valid only in the linear regime of network behavior. The paper would be substantially strengthened with a test of whether the conclusions from the linearized circuit hold over a large range of network activity. Is it possible to simulate the full network and do some targeted tests of the conclusions from linearization? Those tests could be guided by the linearization to focus on specific parameter ranges of interest.

      We agree with the reviewer that it would be interesting to test if our results hold in a nonlinear regime of network behaviour (i.e. the chaotic regime, see also comment 1 by reviewer 2). As mentioned above, this requires a different type of model (either rate-based or spiking model with multiple neurons instead of modelling the mean population rate dynamics) which, in our opinion, exceeds the scope of this manuscript. Furthermore, the core measures of our study, network gain and stability, require linearization. In a chaotic regime where the linearization approach is impossible, we would need to consider/define new measures to characterize network response/activity. Therefore, while certainly being an interesting question to study, the broad scope of studying networks in a nonlinear regime is better tackled in a separate study. We now acknowledge in the discussion of our manuscript that the linearization approach is a limitation in our study and that it would be an interesting future direction to investigate chaotic dynamics.
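
      To make concrete why gain and stability both depend on linearization, the following minimal sketch linearizes a generic two-population (excitatory/inhibitory) rate model around a fixed point. It is a deliberately reduced illustration and not the circuit studied in the manuscript, which contains several interneuron classes. The assumed dynamics are tau_i dr_i/dt = -r_i + f(sum_j W_ij r_j + I_i); the function name `linearized_ei`, the parameter layout, and the numerical values are all hypothetical.

```python
def linearized_ei(weights, slopes, taus):
    """Linear stability and excitatory gain of a 2-population rate model.

    weights: ((w_ee, w_ei), (w_ie, w_ii)), with inhibitory weights negative.
    slopes:  (g_e, g_i), slope of the transfer function f at the fixed point.
    taus:    (tau_e, tau_i), population time constants.
    Assumes tau_i dr_i/dt = -r_i + f(sum_j W_ij r_j + I_i) (illustrative model).
    """
    (w_ee, w_ei), (w_ie, w_ii) = weights
    g_e, g_i = slopes
    tau_e, tau_i = taus

    # Jacobian of the dynamics at the fixed point: J_ij = (-delta_ij + g_i W_ij) / tau_i.
    j_ee = (-1.0 + g_e * w_ee) / tau_e
    j_ei = g_e * w_ei / tau_e
    j_ie = g_i * w_ie / tau_i
    j_ii = (-1.0 + g_i * w_ii) / tau_i

    # For a 2x2 matrix, both eigenvalues have negative real part
    # exactly when the trace is negative and the determinant is positive.
    trace = j_ee + j_ii
    det = j_ee * j_ii - j_ei * j_ie
    stable = trace < 0 and det > 0

    # Steady-state linear response: delta_r = (1 - G W)^(-1) G delta_I.
    # The excitatory gain is the (E, E) entry of that matrix.
    a = 1.0 - g_e * w_ee
    b = -g_e * w_ei
    c = -g_i * w_ie
    d = 1.0 - g_i * w_ii
    det_a = a * d - b * c
    if det_a == 0:
        raise ZeroDivisionError("linear response diverges at this operating point")
    gain_e = d * g_e / det_a

    return stable, gain_e


if __name__ == "__main__":
    stable, gain = linearized_ei(
        weights=((0.5, -1.0), (1.0, -0.5)), slopes=(1.0, 1.0), taus=(1.0, 1.0)
    )
    print(stable, gain)
```

      Both quantities are functions of the slopes at the operating point, so they are only defined where a fixed point exists and the dynamics can be approximated linearly around it.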

      The results illustrated in the figures are generally well described but there is very little intuition provided for them. Are there simplified examples or explanations that could be given to help the results make sense? Here are some places such intuition would be particularly helpful:

      page 6, paragraph starting ”In sum ...”

      Page 8, last paragraph

      Page 10, paragraph starting ”In summary ...”

      Page 11, sentence starting ”In sum ...”

      We agree with the reviewer that we didn’t provide enough intuition to our results. We now extended the paragraphs listed by the reviewer with additional information, providing a more intuitive understanding of the results presented in the respective chapter.

      Reviewer #2 (Public Review):

      Summary:

      Bos and colleagues address the important question of how two major inhibitory interneuron classes in the neocortex differentially affect cortical dynamics. They address this question by studying Wilson-Cowan-type mathematical models. Using a linearized fixed point approach, they provide convincing evidence that the existence of multiple interneuron classes can explain the counterintuitive finding that inhibitory modulation can increase the gain of the excitatory cell population while also increasing the stability of the circuit’s state to minor perturbations. This effect depends on the connection strengths within their circuit model, providing valuable guidance as to when and why it arises.

      Overall, I find this study to have substantial merit. I have some suggestions on how to improve the clarity and completeness of the paper.

      Strengths:

      (1) The thorough investigation of how changes in the connectivity structure affect the gain-stability relationship is a major strength of this work. It provides an opportunity to understand when and why gain and stability will or will not both increase together. It also provides a nice bridge to the experimental literature, where different gain-stability relationships are reported from different studies.

      (2) The simplified and abstracted mathematical model has the benefit of facilitating our understanding of this puzzling phenomenon. (I have some suggestions for how the authors could push this understanding further.) It is not easy to find the right balance between biologically detailed models vs simple but mathematically tractable ones, and I think the authors struck an excellent balance in this study.

      Weaknesses:

      (1) The fixed-point analysis has potentially substantial limitations for understanding cortical computations away from the steady-state. I think the authors should have emphasized this limitation more strongly and possibly included some additional analyses to show that their conclusions extend to the chaotic dynamical regimes in which cortical circuits often live.

      We agree with the reviewer that it would be interesting to test if our results hold in a chaotic regime of network behaviour (see also comment by reviewer 1). As mentioned above, this requires a different type of model (either rate-based or spiking model with multiple neurons instead of modelling the mean population rate dynamics) which, in our opinion, exceeds the scope of this manuscript. Furthermore, the core measures of our study, network gain and stability, require linearization. In a chaotic regime where the linearization approach is impossible, we would need to consider/define new measures to characterize network response/activity. Therefore, while it is certainly an interesting question, the broad topic of studying networks in a nonlinear regime is better tackled in a separate study. We now acknowledge in the discussion of our manuscript that the linearization approach is a limitation in our study and that it would be an interesting future direction to investigate chaotic dynamics.

      (2) The authors could have discussed – even somewhat speculatively – how SST interneurons fit into this picture. Their absence from this modelling framework stands out as a missed opportunity.

      We believe that the reviewer wanted us to speculate about VIP interneurons (and not SST interneurons, which we already do extensively in the manuscript). Previous models have included VIP neurons in the circuit (e.g. del Molino et al., 2017; Palmigiano et al., 2023; Waitzmann et al., 2024). While we do not model VIP cells explicitly, we implicitly assume that a possible source of modulation of SOM neurons comes from VIP cells. We have now added a short discussion on VIP cells in the last paragraph in our discussion section.

      (3) The analysis is limited to paths within this simple E,PV,SOM circuit. This misses more extended paths (like thalamocortical loops) that involve interactions between multiple brain areas. Including those paths in the expansion in Eqs. 11-14 (Fig. 1C) may be an important consideration.

      We agree with the reviewer that our framework can be extended to study many other different paths, like thalamocortical loops, cortical layer-specific connectivity motifs, or circuits with VIP or L1 inhibitory neurons. Studying these questions, however, is beyond the scope of our work. In our discussion, we now mention the possibility of using our framework to study those questions.

      Reviewer #3 (Public Review):

      Summary:

      Bos et al study a computational model of cortical circuits with excitatory (E) and two subtypes of inhibition parvalbumin (PV) and somatostatin (SOM) expressing interneurons. They perform stability and gain analysis of simplified models with nonlinear transfer functions when SOM neurons are perturbed. Their analysis suggests that in a specific setup of connectivity, instability and gain can be untangled, such that SOM modulation leads to both increases in stability and gain. This is in contrast with the typical direction in neuronal networks where increased gain results in decreased stability.

      Strengths:

      - Analysis of the canonical circuit in response to SOM perturbations. Through numerical simulations and mathematical analysis, the authors have provided a rather comprehensive picture of how SOM modulation may affect response changes.

      - Shedding light on two opposing circuit motifs involved in the canonical E-PV-SOM circuitry - namely, direct inhibition (SOM → E) vs disinhibition (SOM → PV → E). These two pathways can lead to opposing effects, and it is often difficult to predict which one results from modulating SOM neurons. In simplified circuits, the authors show how these two motifs can emerge and depend on parameters like connection weights.

      - Suggesting potentially interesting consequences for cortical computation. The authors suggest that certain regimes of connectivity may lead to untangling of stability and gain, such that increases in network gain are not compromised by decreasing stability. They also link SOM modulation in different connectivity regimes to versatile computations in visual processing in simple models.

      Weaknesses:

      The computational analysis is not novel per se, and the link to biology is not direct/clear.

      Computationally, the analysis is solid, but it’s very similar to previous studies (del Molino et al, 2017). Many studies in the past few years have done the perturbation analysis of a similar circuitry with or without nonlinear transfer functions (some of them listed in the references). This study applies the same framework to SOM perturbations, which is a useful and interesting computational exercise, in view of the complexity of the high-dimensional parameter space. But the mathematical framework is not novel per se, undermining the claim of providing a new framework (or ”circuit theory”).

      In the introduction we acknowledge that our analysis method is not novel but is rather based on previous studies (del Molino et al., 2017; Kuchibhotla et al., 2017; Kumar et al., 2023; Litwin-Kumar et al., 2016; Mahrach et al., 2020; Palmigiano et al., 2023; Veit et al., 2023; Waitzmann et al., 2024). We have now rewritten parts of the introduction to make sure that it does not sound like the computational analysis has been developed by us, but that we rather use those previously developed frameworks to dissect stability and gain via SOM modulation.

      Link to biology: the most interesting result of the paper with regard to biology is the suggestion of a regime in which gain and stability can be modulated in an unconventional way - however, it is difficult to link the results to biological networks: - A general weakness of the paper is a lack of direct comparison to biological parameters or experiments. How different experiments can be reconciled by the results obtained here, and what new circuit mechanisms can be revealed? In its current form, the paper reads as a general suggestion that different combinations of gain modulation and stability can be achieved in a circuit model equipped with many parameters (12 parameters). This is potentially interesting but not surprising, given the high dimensional space of possible dynamical properties. A more interesting result would have been to relate this to biology, by providing reasoning why it might be relevant to certain circuits (and not others), or to provide some predictions or postdictions, which are currently missing in the manuscript.

      - For instance, a nice motivation for the paper at the beginning of the Results section is the different results of SOM modulation in different experiments - especially between L23 (inhibition) and L4 (disinhibition). But no further explanation is provided for why such a difference should exist, in view of their results and the insights obtained from their suggested circuit mechanisms. How the parameters identified for the two regimes correspond to different properties of different layers?

      As pointed out by the reviewer, the main goal of our manuscript is to provide a general understanding of how gain and stability depend on different circuit motifs (i.e. different connectivity parameters), and how circuit modulations via SOM neurons affect those measures. However, we agree with the reviewer that it would be useful to provide some concrete predictions or postdictions following from our study.

      An interesting example of a postdiction of our model is that the firing rate change of excitatory neurons in response to a change in the stimulus (which we define as network gain, Eq. 2) depends on firing rates of the excitatory, PV, and SOM neurons at the moment of stimulus presentation (Fig. 3ii; Fig. 4Aii,Bii,Cii; Fig. 5Aii, Bii, Cii). Hence any change in input to the circuit can affect the response gain to a stimulus presentation, in line with experimental evidence which suggests that changes in inhibitory firing rates and changes in the behavioral state of the animal lead to gain modifications (Ferguson and Cardin 2020).

      Another recent concrete example is the study of Tobin et al., 2023, in which the authors show that optogenetically activating SOM cells in the mouse primary auditory cortex (A1) decreases the excitatory responses to auditory stimuli. In our framework, this corresponds to the case of decreases in network gain (gE) for positive SOM modulation, as seen in the circuit with PV to SOM feedback connectivity (Suppl. Fig. S1).

      Another example is the study by Phillips and Hasenstaub 2016, in which the authors study the effect of optogenetic perturbations of SOM (and PV) cells on tuning curves of pyramidal cells in mouse A1. While they find large heterogeneity in additive/subtractive or multiplicative/divisive tuning curve changes following SOM inactivation, most cells have a purely multiplicative or purely additive component (and none of the cells have a divisive component). In our study, we see that large multiplicative responses of the excitatory population follow from circuits with strong E to SOM feedback connectivity.

      We note that in future computational studies, it would be useful to apply our framework with a focus on a specific brain region and add all relevant cell types (at a minimum E, PV, SOM, and VIP) plus a dendritic compartment, in order to formulate much more precise experimental predictions.

      We have now added additional information to the discussion section.

      - Another caveat is the range of parameters needed to obtain the unintuitive untangling as a result of SOM modulation. From Figure 4, it appears that the ”interesting” regime (with increases in both gain and stability) is only feasible for a very narrow range of SOM firing rates (before 3 Hz). This can be a problem for the computational models if the sweet spot is a very narrow region (this analysis is by the way missing, so making it difficult to know how robust the result is in terms of parameter regions). In terms of biology, it is difficult to reconcile this with the realistic firing rates in the cortex: in the mouse cortex, for instance, we know that SOM neurons can be quite active (comparable to E neurons), especially in response to stimuli. It is therefore not clear if we should expect this mechanism to be a relevant one for cortical activity regimes.

      We agree with the reviewer that it’s important to test the robustness of our results. As suggested by the reviewer, we now include a new supplementary figure (Suppl. Fig. S2) which measures the percentage of data points in the respective quadrant Q1-Q4 when changing the SOM firing rates (as done in Fig. 5). We see that the percentage of data points in the quadrants in which network gain and stability change in the same direction (Q2 and Q3) remains high in the case of E to SOM feedback (Suppl. Fig. S2A) for SOM rates ranging over 0-10 Hz (and likely beyond).
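
      The robustness measure described here, the share of parameter settings in which gain and stability change in a given direction, can be sketched as follows. The function name, the sign convention (positive means an increase) and the sample values are hypothetical; the mapping of sign combinations onto the labels Q1-Q4 is not stated in this response, so the sketch reports sign combinations directly.

```python
from collections import Counter


def direction_fractions(gain_changes, stability_changes):
    """Fraction of points in each (gain, stability) sign combination.

    Positive values mean an increase. A zero change in either measure
    is labelled "no change" for that measure.
    """
    def sign_label(x):
        return "up" if x > 0 else "down" if x < 0 else "no change"

    counts = Counter(
        (sign_label(g), sign_label(s))
        for g, s in zip(gain_changes, stability_changes)
    )
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


# Hypothetical changes after SOM modulation, one entry per parameter setting.
gain = [0.2, 0.1, -0.3, 0.4, -0.1]
stability = [0.5, 0.3, -0.2, -0.1, 0.2]

fractions = direction_fractions(gain, stability)
# Share of settings in which gain and stability move in the same direction.
same_direction = fractions.get(("up", "up"), 0.0) + fractions.get(("down", "down"), 0.0)
print(fractions, same_direction)
```

      Repeating such a count for each SOM firing rate would give the kind of curve that a robustness figure could show, under the assumptions stated above.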

      - One of the key assumptions of the model is nonlinear transfer functions for all neuron types. In terms of modelling and computational analysis, a thorough analysis of how and when this is necessary is missing (an analysis similar to what has been attempted at in Figure 6 for synaptic weights, but for cellular gains). In terms of biology, the nonlinear transfer function has experimentally been reported for excitatory neurons, so it’s not clear to what extent this may hold for different inhibitory subtypes. A discussion of this, along with the former analysis to know which nonlinearities would be necessary for the results, is needed, but currently missing from the study. The nonlinearity is assumed for all subtypes because it seems to be needed to obtain the results, but it’s not clear how the model would behave in the presence or absence of them, and whether they are relevant to biological networks with inhibitory transfer functions.

      It is true that the nonlinear transfer function is a key component in our model. We chose identical transfer functions for E, PV, and SOM (Eq. 4) to simplify our analysis. If the transfer function of one of the neuron types were linear (β = 1), then the corresponding b term (the slope of the nonlinearity at the steady state; b = dfX/dqX; Fig. 1B; Eq. 4) would be equal to α. Therefore, if neurons had a linear transfer function in our model, there would be no dependence of network gain on E and PV firing rate as studied in Fig. 3-5. This is because the relationship between PV rates and their gain would be constant (bP = α) in Fig. 1B (bottom).

      If all the transfer functions were linear, changes in firing rates would not have an impact on network gain or stability. Changing the nonlinear transfer function by changing the α or β terms in Eq. 4 would only scale the way a change in the rates affects the b terms and hence the results presented in Fig. 3-5. More interesting would be to study how different types of nonlinearities, like sigmoidal functions or sublinear nonlinearities (i.e. saturating nonlinearities), would change our results. However, we think that such an investigation is out of scope for this study. We now added a comment to the Methods section.
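
      The argument about linear versus nonlinear transfer functions can be made concrete with a minimal sketch. It assumes a threshold power law, f(q) = α·max(q, 0)^β, which matches the roles this response gives to α and β but is not quoted from Eq. 4; the parameter values are illustrative only.

```python
def transfer(q, alpha=1.0, beta=2.0):
    """Assumed threshold power-law transfer function: alpha * max(q, 0) ** beta."""
    return alpha * max(q, 0.0) ** beta


def slope(q, alpha=1.0, beta=2.0):
    """Slope b = df/dq of the assumed transfer function at input q (q > 0)."""
    return alpha * beta * max(q, 0.0) ** (beta - 1.0)


# Hypothetical inputs, chosen only to illustrate the rate dependence of b.
inputs = [0.5, 1.0, 2.0, 4.0]

# Supralinear case (beta = 2): the slope grows with the input, so the
# cellular gain b depends on the operating point.
nonlinear_slopes = [slope(q, alpha=0.5, beta=2.0) for q in inputs]

# Linear case (beta = 1): the slope equals alpha at every operating point.
linear_slopes = [slope(q, alpha=0.5, beta=1.0) for q in inputs]

print(nonlinear_slopes)
print(linear_slopes)
```

      With β = 1 the slope no longer changes with the input, which is why, in this sketch, a change in firing rate could not feed back onto gain or stability through the b terms.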

      Experimentally, F-I curves have been measured also for PV and SOM neurons. For example, Romero-Sosa et al., 2021 measure the F-I curve of pyramidal, PV and SOM neurons in mouse cortical slices. They find that similar to pyramidal neurons, PV and SOM neurons show a nonlinear F-I curve. We now added the citation of Romero-Sosa et al., 2021 to our manuscript.

      - Tuning curves are simulated for an individual orientation (same for all), not considering the heterogeneity of neuronal networks with multiple orientation selectivity (and other visual features) - making the model too simplistic.

      The reviewer is correct that we only study changes in tuning curves in a simplistic model. In our model, the excitatory and PV populations are tuned to a single orientation (in the case of Fig. 7 to θ = 90). While this is certainly an oversimplification, it allows us to understand how additive/subtractive and multiplicative/divisive changes in the tuning curves come about in networks with different connectivity motifs. Modeling the heterogeneity of tuning responses within a network requires more complex models. A natural choice would be to extend a classical ring attractor model (Rubin et al., 2015) by splitting the inhibitory population into PV and SOM neurons, or to study the tuning curve heterogeneity that occurs in balanced networks (Hansel and van Vreeswijk 2012). However, such models have many more parameters, like the spatial connectivity profiles from and onto PV and SOM neurons. While highly valuable, we believe that studying such models exceeds the scope of our current manuscript. We now added a paragraph in the discussion section, mentioning this as an interesting future direction.
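
      One common way to separate additive/subtractive from multiplicative/divisive tuning-curve changes is to regress the modulated curve against the baseline curve: the slope captures the multiplicative part and the intercept the additive part. The sketch below assumes this linear description and uses a hypothetical Gaussian tuning curve peaked at θ = 90; it is not the analysis code of the manuscript, and all rates and widths are made up for illustration.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical baseline tuning curve (Hz) over orientation in degrees.
orientations = range(0, 181, 15)
baseline = [
    2.0 + 8.0 * math.exp(-((th - 90.0) ** 2) / (2 * 20.0 ** 2))
    for th in orientations
]

# Hypothetical modulated curve: scaled by 1.5 and shifted up by 1 Hz.
modulated = [1.5 * r + 1.0 for r in baseline]

multiplicative, additive = linear_fit(baseline, modulated)
print(multiplicative, additive)
```

      A slope above 1 indicates a multiplicative change and below 1 a divisive one; a positive intercept indicates an additive change and a negative one a subtractive change.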

      Reviewer #1 (Recommendations For The Authors):

      The last sentence of the abstract is hard to interpret before reading the rest of the paper - suggest replacing or rephrasing.

      We rephrased the sentence to make clearer what we mean.

      Page 3, last full paragraph: I think this assumes that phi is positive. What is the justification for that assumption? More generally, I think you could say a bit more about phi in the main text since it is a fairly complicated term.

      The reviewer is correct: for a stable system, phi is always positive. We now clarify this and explain phi in more detail in the main text.

      Fig 1D: It would be helpful to identify when the stimulus comes on and be clearer about what the stimulus is. I assume it’s a step increase in S input at 0.05 s or so - but that should be immediately apparent looking at the figure.

      We agree with the reviewer and we added a dashed line at the time of stimulus onset in Fig. 1D.

      Page 5: ”To motivate our analysis we compare ... (Fig. 2A)” - Figure 2A does not show responses without modulation, so this sentence is confusing.

      The dashed lines in Fig. 2A (and Fig. 2C) actually represent the rate change without modulation.

      Page 6: sentence “The central goal of our study ...” seems out of place since this is pretty far into the results, and that goal should already be clear.

      We agree with the reviewer, hence we updated the sentence.

      Page 10, top: the green curve in panel Aii always has a negative slope - so I am confused by the statement that increasing wSE decreases both gain and stability.

      We thank the reviewer for pointing out this mistake. We now fixed it in the text.

      Figure 6: in general it is hard to see what is going on in this figure (the green and blue in particular are hard to distinguish). Some additional labels would be helpful, but I would also see if the color scheme can be improved.

      We added a zoom-in to the panels which were hard to distinguish.

      Reviewer #2 (Recommendations For The Authors):

      Major recommendations:

      (1) The authors should explain early on in the results section what the key factor(s) is that differentiates SOM from PV cells in their model. E.g., in Fig. 1A, the only obvious difference is that SOM cells don’t inhibit themselves. However, later on in the paper, the difference in external stimulus drive to these interneuron classes is more heavily emphasized. Given the importance of that difference (in external stim drive), I think this should be highlighted early on.

      We now mention the key factors that differentiate PV and SOM neurons already when describing Fig. 1A.

      (2) The result in Figs. 5,6 demonstrate that recurrent SOM connectivity is important for achieving increases in both gain and stability. This observation could benefit from some intuitive explanation. Perhaps the authors could find this explanation by looking at their series expansion (Eqs. 11-14, Fig. 1C) and determining which term(s) are most important for this effect. The corresponding paths through the circuit – the most important ones – could then be highlighted for the reader.

      We agree with the reviewer that our results benefit from more intuitive explanations. This has also been pointed out by reviewer 1 in their public review. We now extended the concluding paragraphs in the context of Fig. 4-6 with additional information, providing a more intuitive understanding of the results presented in the respective chapter. While it is possible to gain an intuitive understanding of how the network gain depends on rate and weight parameters (Eq. 2), this understanding is unfortunately missing in the case of stability. The maximum eigenvalue of the system has a complex relationship with all the parameters, and often depends nonlinearly on changes of a parameter (e.g. as we show in Fig. 3iv or one can see in Fig. 6). We now discuss this difficulty at the end of the section “Influence of weight strength on network gain vs stability”.

      (3) I think the authors should consider including some analyses that do not rely on the system being at or near a fixed point. I admit that such analysis could be difficult, and this could of course be done in a future study. Nevertheless, I want to reiterate that this addition could add a lot of value to this body of work.

      As outlined above, we decided to not include additional analysis on network behaviour in nonlinear regimes but we now acknowledge in the discussion of our manuscript that the linearization approach is a limitation in our study and that it would be an interesting future direction to investigate chaotic dynamics.

      Minor recommendations:

      (1) At the top of P. 6, when the authors first discuss the stability criterion involving eigenvalues, they should address the question ”eigenvalues of what?”. I suggest introducing the idea of the Jacobian matrix, and explaining that the largest eigenvalue of that matrix determines how rapidly the system will return to the fixed point after a small perturbation.

      We included an additional sentence in the respective paragraph explaining the link between stability and negative eigenvalues, and we also added a sentence in the Methods section stating that the largest real eigenvalue dominates the behavior of the dynamical system.
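
      The link between the eigenvalues of the Jacobian and stability can be illustrated with a minimal sketch. It assumes a two-population (E, I) rate model, τ_X dr_X/dt = −r_X + f_X(Σ_Y w_XY r_Y), linearized around a fixed point, with hypothetical weights, slopes b and time constants. This is a reduced stand-in for illustration, not the E-PV-SOM model of the manuscript.

```python
import cmath


def jacobian_2x2(w, b, tau):
    """Jacobian of tau_X dr_X/dt = -r_X + f_X(sum_Y w[X][Y] r_Y) at a fixed point.

    w:   2x2 weight matrix (row = target, column = source), hypothetical values
    b:   slopes of the transfer functions at the fixed point
    tau: time constants of the two populations
    """
    return [
        [(-(1.0 if i == j else 0.0) + b[i] * w[i][j]) / tau[i] for j in range(2)]
        for i in range(2)
    ]


def eigenvalues_2x2(m):
    """Closed-form eigenvalues of a 2x2 matrix from its trace and determinant."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def max_real_eigenvalue(m):
    """Largest real part; a negative value means small perturbations decay."""
    return max(ev.real for ev in eigenvalues_2x2(m))


# Hypothetical E-I circuit: index 0 = excitatory, index 1 = inhibitory.
weights = [[1.5, -1.0], [1.2, -0.5]]
time_constants = [0.01, 0.01]  # seconds

# Low excitatory slope: the fixed point is stable in this example.
weak = max_real_eigenvalue(jacobian_2x2(weights, [0.5, 0.5], time_constants))
# High excitatory slope: recurrent excitation destabilizes the fixed point.
strong = max_real_eigenvalue(jacobian_2x2(weights, [2.0, 0.5], time_constants))

print(weak, strong)
```

      The more negative the largest real part, the faster the linearized system returns to the fixed point after a small perturbation; once it crosses zero, perturbations grow instead.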

      (2) The panel labelling in Fig. 3 is unnecessarily confusing. It would be simpler (and thus better) to simply label the panels A,B,C,D, or i,ii,iii,iv, instead of the current labelling: Ai, Aii, Aiii, Aiv. (There are currently no panels ”B” in Fig. 3).

      We updated the figure accordingly.

      Reviewer #3 (Recommendations For The Authors):

      • Suggestions for improved or additional experiments, data or analyses.

      Analysis of the effect of different nonlinear transfer functions is necessary.

      Please see our detailed answer to the reviewer’s comment in the public review above.

      Analysis of gain modulation in models with more realistic tuning properties.

      Please see our detailed answer to the reviewer’s comment in the public review above.

      Mathematical analysis of the conditions to obtain ”untangled” gain and stability:

      One of the promises of the paper is that it is offering a computational framework or circuit theory for understanding the effect of SOM perturbation. However, the main result, namely the untangling of gain and stability, has only been reported in numerical simulations (e.g. Fig. 6). Different parameters have been changed and the results of simulations have been reported for different conditions. Given the simplified model, which allows for rigorous mathematical analysis, isn’t it possible to treat this phenomenon more analytically? What would be the conditions for the emergence of the untangled regime? This is currently missing from the analyses and results.

      We agree with the reviewer that our results benefit from more intuitive explanations. This has also been pointed out by reviewer 1 in their public review. We now extended the concluding paragraphs in the context of Fig. 4-6 with additional information, providing a more intuitive understanding of the results presented in the respective chapter. While it is possible to understand analytically how the network gain depends on rate and weight parameters (Eq. 2), this understanding is unfortunately missing in the case of stability. The maximum eigenvalue of the system has a complex relationship with all the parameters, and often depends nonlinearly on changes of a parameter (e.g. as we show in Fig. 3iv or one can see in Fig. 6). This doesn’t allow for a deep analytical understanding of the entangling of gain and stability. We now discuss this difficulty at the end of the section “Influence of weight strength on network gain vs stability”.

      • Recommendations for improving the writing and presentation. The Results section is well written overall, but other parts, especially the Introduction and Discussion, would benefit from proof reading - there are many typos and problems with sentence structures and wording (some mentioned below).

      We have gone through the manuscript again and improved the writing.

      The presentation of the dependence on weight in Figure 6 can be improved. For instance, the authors talk about the optimal range of PV connectivity, but this is difficult to appreciate in the current illustration and with the current colour scheme.

      We added a zoom-in to the panels which were hard to distinguish.

      • Minor corrections to the text and figures. Text:

      We thank the reviewer for their thorough reading of our manuscript. We fixed all the issues listed below in the manuscript.

      Some examples of bad structure or wording:

      From the Abstract:

      ”We show when E - PV networks recurrently connect with SOM neurons then an SOM mediated modulation that leads to increased neuronal gain can also yield increased network stability.” From Introduction:

      Sentence starting with ”This new circuit reality ...”

      ”Inhibition is been long identified as a physiological or circuit basis for how cortical activity changes depending upon processing or cognitive needs ...”

      Sentence starting with ”Cortical models with both ...”

      ”... allowing SOM neurons the freedom to ..”

      From Results:

      ”... affects of SOM neurons on E ..”

      ”seem in opposition to one another, with SOM neuron activity providing either a source or a relief of E neuron suppression”. The sentence after is also difficult to read and needs to be simplified.

      P. 7: ”We first remark that ...”

      Difficult to read/understand - long and badly structured sentence.

      P. 8: ”adding a recurrent connection onto SOM neurons from the E-PV subcircuit” It’s from E (and not PV) to be more precise (Fig. 5).

      Discussion:

      ”Firstly, E neurons and PV neurons experience very similar synaptic environments.” What does it mean?

      ”Fortunately, PV neurons target both the cell bodies and proximal dendrites” Fortunately for whom or what? ”in line with arge heterogeneity”

      Methods:

      Matrix B is never defined - the diagonal matrix of b (power law exponents) I assume.

      Some of the other notations too, e.g. bs, etc (it’s implicit, but should be explained).

      Structure of sentence:

      ”Network gain is defined as ...” (p. 17)

      Figure:

      The schematics in Figure 4 can be tweaked to highlight the effect of input (rather than other components of the network, which are the same and repetitive), to highlight the main difference for the reader.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors seek to establish what aspects of nervous system structure and function may explain behavioral differences across individual fruit flies. The behavior in question is a preference for one odor or another in a choice assay. The variables related to neural function are odor responses in olfactory receptor neurons or in the second-order projection neurons, measured via calcium imaging. A different variable related to neural structure is the density of a presynaptic protein BRP. The authors measure these variables in the same fly along with the behavioral bias in the odor assays. Then they look for correlations across flies between the structure-function data and the behavior.

      Strengths:

      Where behavioral biases originate is a question of fundamental interest in the field. In an earlier paper (Honegger 2019) this group showed that flies do vary with regard to odor preference, and that there exists neural variation in olfactory circuits, but did not connect the two in the same animal. Here they do, which is a categorical advance, and opens the door to establishing a correlation. The authors inspect many such possible correlations. The underlying experiments reflect a great deal of work, and appear to be done carefully. The reporting is clear and transparent: All the data underlying the conclusions are shown, and associated code is available online.

      We are glad to hear the reviewer is supportive of the general question and approach.

      Weaknesses:

      The results are overstated. The correlations reported here are uniformly small, and don't inspire confidence that there is any causal connection. The main problems are

      Our revision overhauls the interpretation of the results to prioritize the results we have high confidence in (specifically, PC 2 of our Ca++ data as a predictor of OCT-MCH preference) versus results that are suggestive but not definitive (such as PC 1 of Ca++ data as a predictor of Air-OCT preference).

      It’s true that the correlations are small, with R2 values typically in the 0.1-0.2 range. That said, we would call it a victory if we could explain 10 to 20% of the variance of a behavior measure, captured in a 3 minute experiment, with a circuit correlate. This is particularly true because, as the reviewer notes, the behavioral measurement is noisy.

      (1) The target effect to be explained is itself very weak. Odor preference of a given fly varies considerably across time. The systematic bias distinguishing one fly from another is small compared to the variability. Because the neural measurements are by necessity separated in time from the behavior, this noise places serious limits on any correlation between the two.

      This is broadly correct, though to quibble, it’s our measurement of odor preference which varies considerably over time. We are reasonably confident that more variance in our measurements can be attributed to sampling error than to changes in true preference over time. As evidence, the correlations in sequential measures of individual odor preference, with delays of 3 hours or 24 hours, are not obviously different. We are separately working on methodological improvements to get more precise estimates of persistent individual odor preference, using averages of multiple, spaced measurements. This is promising, but beyond the scope of this study.

      (2) The correlations reported here are uniformly weak and not robust. In several of the key figures, the elimination of one or two outlier flies completely abolishes the relationship. The confidence bounds on the claimed correlations are very broad. These uncertainties propagate to undermine the eventual claims for a correspondence between neural and behavioral measures.

      We are broadly receptive to this criticism. The lack of robustness of some results comes from the fundamental challenge of this work: measuring behavior is noisy at the individual level. Measuring Ca++ is also somewhat noisy. Correlating the two will be underpowered unless the sample size is huge (which is impractical, as each data point requires a dissection and live imaging session) or the effect size is large (which is generally not the case in biology). In the current version we tried in some sense to avoid discussing these challenges head-on, instead trying to focus on what we thought were the conclusions justified by our experiments with sample sizes ranging from 20 to 60. Our revision is more candid about these challenges.

      That said, we believe the result we view as the most exciting, that PC2 of Ca++ responses predicts OCT-MCH preference, is robust. 1) It is based on a training set with 47 individuals and a test set composed of 22 individuals. The p-value is sufficiently low in each of these sets (0.0063 and 0.0069, respectively) to pass an overly stringent Bonferroni correction for the 5 tests (each PC) in this analysis. 2) The BRP immunohistochemistry provides independent evidence that is consistent with this result, namely a PC2 that predicts behavior (p = 0.03 from only one test) and has loadings that contrast DC2 and DM2. Taken together, these results are well above the field-standard bar of statistical robustness.

      In our revision, we are explicit that this is the (one) result we have high confidence in. We believe this result convincingly links Ca++ and behavior, and warrants spotlighting. We have less confidence in other results, and say so, and we hope this addresses concerns about overstating our results.

      (3) Some aspects of the statistical treatment are unusual. Typically a model is proposed for the relationship between neuronal signals and behavior, and the model predictions are correlated with the actual behavioral data. The normal practice is to train the model on part of the data and test it on another part. But here the training set at times includes the testing set, which tends to give high correlations from overfitting. Other times the testing set gives much higher correlations than the training set, and then the results from the testing set are reported. Where the authors explored many possible relationships, it is unclear whether the significance tests account for the many tested hypotheses. The main text quotes the key results without confidence limits.

      Our primary analyses are exactly what the reviewer describes, scatter plots and correlations of actual behavioral measures against predicted measures. We produced test data in separate experiments, conducted weeks to months after models were fit on training data. This is more rigorous than splitting into training and test sets data collected in a single session, as batch/environmental effects reduce the independence of data collected within a single session.

      We only collected a test set when our training set produced a promising correlation between predicted and actual behavioral measures. We never used data from test sets to train models. In our main figures, we showed scatter plots that combined test and training data, as the training and test partitions had similar correlations.

      We are unsure what the reviewer means by instances where we explored many possible relationships. The greatest number of comparisons that could lead to the rejection of a null hypothesis was 5 (corresponding to the top 5 PCs of Ca++ response variation or Brp signal). We were explicit that the p-values reported were nominal. As mentioned above, both the training and test correlations from the Ca++ to OCT-MCH preference model remain significant at alpha=0.05 after a Bonferroni correction for n=5 comparisons.
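
      The Bonferroni arithmetic referred to here can be checked in a few lines. This is only a sketch of the standard correction applied to the two p-values quoted in this response; the variable names are ours and are not taken from the manuscript's analysis code.

```python
# Bonferroni check for the two p-values quoted above (training and test sets).
# n_tests = 5 corresponds to the five PCs considered; alpha is the family-wise level.
alpha = 0.05
n_tests = 5
p_values = {"training": 0.0063, "test": 0.0069}

# Under Bonferroni, each individual test is held to alpha / n_tests.
threshold = alpha / n_tests

survives = {name: p < threshold for name, p in p_values.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p}, per-test threshold = {threshold}, survives = {survives[name]}")
```

      Both p-values fall below the per-test threshold of 0.05 / 5 = 0.01, which is the sense in which the correlations are said to survive the correction.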

      Our revision includes confidence intervals around 𝜌signal for the PN PC2 OCT-MCH model, and for the ORN Brp-Short PC2 OCT-MCH model (lines 170-172, 238).

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to identify the neural sources of behavioral variation in a decision between odor and air, or between two odors.

      Strengths:

      -The question is of fundamental importance.

      -The behavioral studies are automated, and high-throughput.

      -The data analyses are sophisticated and appropriate.

      -The paper is clear and well-written aside from some strong wording.

      -The figures beautifully illustrate their results.

      -The modeling efforts mechanistically ground observed data correlations.

      We are glad to read that the reviewer sees these strengths in the study. We hope the current revision addresses the strong wording.

      Weaknesses:

      -The correlations between behavioral variations and neural activity/synapse morphology are (i) relatively weak, (ii) framed using the inappropriate words "predict", "link", and "explain", and (iii) sometimes non-intuitive (e.g., PC 1 of neural activity).

      Taking each of these points in turn:

      i) It would indeed be nicer if our empirical correlations were higher. One quibble: we primarily report relatively weak correlations between measurements of behavior and Ca++/Brp. This could be the case even when the correlation between true behavior and Ca++/Brp is higher. Our analysis of the potential correlation between latent behavioral and Ca++ signals was an attempt to tease these relationships apart. The analysis suggests that there could, in fact, be a high underlying correlation between behavior and these circuit features (though the error bars on these inferences are wide).

      ii) We worked to ensure such words are used appropriately. “Predict” can often be appropriate in this context, as a model predicts true data values. “Explain” can also be appropriate, as X “explaining” a portion of the variance of Y is synonymous with X and Y being correlated. We cannot think of formal uses of “link,” and have revised the manuscript to resolve any inappropriate word choice.

      iii) If the underlying biology is rooted in non-intuitive relationships, there’s unfortunately not much we can do about it. We chose to use PCs of our Ca++/Brp data as predictors to deal with the challenge of having many potential predictors (odor-glomerular responses) and relatively few output variables (behavioral bias). Thus, using PCs is a conservative approach to deal with multiple comparisons. Because PCs are just linear transformations of the original data, interpreting them is relatively easy, and in interpreting PC1 and PC2, we were able to identify simple interpretations (total activity and the difference between DC2 and DM2 activation, respectively). All in all, we remain satisfied with this approach as a means to both 1) limit multiple comparisons and 2) interpret simple meanings from predictive PCs.

      No attempts were made to perturb the relevant circuits to establish a causal relationship between behavioral variations and functional/morphological variations.

      We did conduct such experiments, but we did not report them because they had negative results that we could not definitively interpret. We used constitutive and inducible effectors to alter the physiology of ORNs projecting to DC2 and DM2. We also used UAS-LRP4 and UAS-LRP4-RNAi to attempt to increase and decrease the extent of Brp puncta in ORNs projecting to DC2 and DM2. None of these manipulations had a significant effect on mean odor preference in the OCT-MCH choice, which was the behavioral focus of these experiments. We were unable to determine if the effectors had the intended effects in the targeted Gal4 lines, particularly in the LRP experiments, so we could not rule out that our negative finding reflected a technical failure.

      Author response image 1.

      We believe that even if these negative results are not technical failures, they are not necessarily inconsistent with the analyses correlating features of DC2 and DM2 to behavior. Specifically, we suspect that there are correlated fluctuations in glomerular Ca++ responses and Brp across individuals, due to fluctuations in the developmental spatial patterning of the antennal lobe. Thus, the DC2-DM2 predictor may represent a slice/subset of predictors distributed across the antennal lobe. This would also explain how we “got lucky” to find two glomeruli as predictors of behavior, when we were only able to image a small portion of the glomeruli.

      Reviewer #3 (Public Review):

      Churgin et al. seek to understand the neural substrates of individual odor preference in the Drosophila antennal lobe, using paired behavioral testing and calcium imaging from ORNs and PNs in the same flies, and testing whether ORN and PN odor responses can predict behavioral preference. The manuscript's main claims are that ORN activity in response to a panel of odors is predictive of the individual's preference for 3-octanol (3-OCT) relative to clean air, and that activity in the projection neurons is predictive of both 3-OCT vs. air preference and 3-OCT vs. 4-methylcyclohexanol (MCH). They find that the difference in density of fluorescently-tagged brp (a presynaptic marker) in two glomeruli (DC2 and DM2) trends towards predicting behavioral preference between 3-OCT vs. MCH. Implementing a model of the antennal lobe based on the available connectome data, they find that glomerulus-level variation in response reminiscent of the variation that they observe can be generated by resampling variables associated with the glomeruli, such as ORN identity and glomerular synapse density.

      Strengths:

      The authors investigate a highly significant and impactful problem of interest to all experimental biologists, nearly all of whom must often conduct their measurements in many different individuals and so have a vested interest in understanding this problem. The manuscript represents a lot of work, with challenging paired behavioral and neural measurements.

      Weaknesses:

      The overall impression is that the authors are attempting to explain complex, highly variable behavioral output with a comparatively limited set of neural measurements.

      We would say that we are attempting to explain a simple, highly variable behavioral measure with a comparatively limited set of neural measurements, i.e. we make no claims to explain the complex behavioral components of odor choice, like locomotion, reversals at the odor boundary, etc.

      Given the degree of behavioral variability they observe within an individual (Figure 1- supp 1) which implies temporal/state/measurement variation in behavior, it's unclear that their degree of sampling can resolve true individual variability (what they call "idiosyncrasy") in neural responses, given the additional temporal/state/measurement variation in neural responses.

      We are confident that different Ca++ recordings are statistically different. This is borne out in the analysis of repeated Ca++ recordings in this study, which finds that the significant PCs of Ca++ variation contain 77% of the variation in that data. That this variation is persistent over time and across hemispheres was assessed in Honegger & Smith, et al., 2019. We are thus confident that there is true individuality in neural responses. (Note: we prefer not to call it “individual variability,” as this could refer to variability within individuals, not variability across individuals.) It is a separate question whether individual differences in neural responses bear some relation to individual differences in behavioral biases. That was the focus of this study, and our finding of a robust correlation between PC 2 of Ca++ responses and OCT-MCH preference indicates a relation. Because behavior and Ca++ were collected with an hours-to-day long gap, this implies that there are latent versions of both behavioral bias and Ca++ response that are stable on timescales at least that long.

      The statistical analyses in the manuscript are underdeveloped, and it's unclear the degree to which the correlations reported have explanatory (causative) power in accounting for organismal behavior.

      With respect, we do not think our statistical analyses are underdeveloped. We do acknowledge the reviewer's helpful detailed suggestion to report confidence intervals around the point estimate of the strength of correlation between latent behavioral and Ca++ response states; we have added these for the PN PC2 linear model (lines 170-172).

      It is indeed a separate question whether the correlations we observed represent causal links from Ca++ to behavior (though our yoked experiment suggests there is not a behavior-to-Ca++ causal relationship — at least one where odor experience through behavior is an upstream cause). We attempted to be precise in indicating that our observations are correlations. That is why we used that word in the title, as an example. In the revision, we worked to ensure this is appropriately reflected in all word choice across the paper.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      Detailed comments: Many of the problems can be identified starting from Figure 4, which summarizes the main claims. I will focus on that figure and its tributaries.

      Acknowledging that the strength of several of our inferences is weak compared to what we consider the main result (the relationship between PC2 of Ca++ and OCT-MCH preference), we have removed Figure 4. This makes the focus of the paper much clearer and appropriately puts focus on the results that have strong statistical support.

      (1) The process of "inferring" correlation among the unobserved latent states for neural sensitivity and behavioral bias is unconventional and risky. The larger the assumed noise linking the latent to the observed variables (i.e. the smaller r_b and r_c) the bigger the inferred correlation rho from a given observed correlation R^2_cb. In this situation, the value of the inferred rho becomes highly dependent on what model one assumes that links latent to observed states. But the specific model drawn in Fig 4 suppl 1 is just one of many possible guesses. For example, models with nonlinear interactions could produce different inference.

      We agree with the reviewer’s notes of caution. To be clear, we do not intend for this analysis to be the main takeaway of the paper and have revised it to make this clear. The signal we are most confident in is the simple correlation between measured Ca++ PC2 and measured behavior. We have added more careful language saying that the attempt to infer the correlation between latent signals is one attempt at describing the data generation process (lines 166-172), and one possible estimate of an “underlying” correlation.
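
      To make the reviewer's point about model dependence concrete, here is a minimal sketch of one textbook way to relate an observed correlation to a latent one, Spearman's correction for attenuation. We stress that this is an illustrative assumption and not necessarily the model used in the manuscript; the function name and the numbers are hypothetical.

```python
import math

def disattenuated_correlation(r_observed, reliability_x, reliability_y):
    """Spearman's correction for attenuation: r_xy / sqrt(r_xx * r_yy).

    r_observed    : correlation between the two noisy measurements
    reliability_* : test-retest reliability of each measurement, in (0, 1]
    """
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)

# Hypothetical numbers chosen for illustration only (not values from the study).
noisy = disattenuated_correlation(0.3, 0.5, 0.5)   # unreliable measurements
clean = disattenuated_correlation(0.3, 0.9, 0.9)   # reliable measurements
print(noisy, clean)
```

      The same observed correlation yields a larger inferred latent correlation when the measurements are assumed to be noisier, which is why the inferred value is sensitive to the assumed noise model. Note that this correction is not bounded by 1 when the reliabilities are underestimated.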

      (2) If one still wanted to go through with this inference process and set confidence bounds on rho, one needs to include all the uncertainties. Here the authors only include uncertainty in the value of R^2_c,b and they peg that at +/-20% (Line 1367). In addition there is plenty of uncertainty associated also with R^2_c,c and R^2_b,b. This will propagate into a wider confidence interval on rho.

      We have replaced the arbitrary +/- 20% window with a bootstrap over the pairs of (predicted preference by PN PC2, measured preference) points, which gives a bootstrap distribution of R^2_c,b that is, not surprisingly, considerably wider. Still, we think there is some value in this analysis as the 90% CI of 𝜌signal under this model is 0.24-0.95. That is, including uncertainty about R^2_b,b and R^2_c,c in the model still implies a significant relationship between latent calcium and behavior signals.
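
      A paired bootstrap of this kind can be sketched as follows. This is a generic percentile bootstrap run on synthetic data, offered as an illustration rather than as our analysis code; `bootstrap_r2`, the number of resamples, and the simulated values are all assumptions made for the example.

```python
import numpy as np

def bootstrap_r2(predicted, measured, n_boot=2000, ci=90, seed=0):
    """Percentile bootstrap CI for R^2 between paired predicted/measured values."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(predicted)
    r2 = np.empty(n_boot)
    for i in range(n_boot):
        # Resample flies with replacement, keeping each (predicted, measured) pair intact.
        idx = rng.integers(0, n, size=n)
        r = np.corrcoef(predicted[idx], measured[idx])[0, 1]
        r2[i] = r ** 2
    tail = (100 - ci) / 2
    lo, hi = np.percentile(r2, [tail, 100 - tail])
    return lo, hi, r2

# Synthetic stand-in for 47 flies; these are not data from the study.
rng = np.random.default_rng(1)
x = rng.normal(size=47)
y = 0.45 * x + rng.normal(size=47)
lo, hi, dist = bootstrap_r2(x, y)
print(f"90% bootstrap CI for R^2: [{lo:.3f}, {hi:.3f}]")
```

      Resampling whole pairs, rather than each variable separately, preserves the dependence between predicted and measured preference, which is what the interval is meant to characterize.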

      (2.1) The uncertainty in R^2_cb is much greater than +/-20%. Take for example the highest correlation quoted in Fig 4: R^2=0.23 in the top row of panel A. This relationship refers to Fig 1L. Based on bootstrapping from this data set, I find a 90% confidence interval of CI=[0.002, 0.527]. That's an uncertainty of -100/+140%, not +/-20%. Moreover, this correlation is due entirely to the lone outlier on the bottom left. Removing that single fly abolishes any correlation in the data (R^2=0.04, p>0.3). With that the correlation of rho=0.64, the second-largest effect in Fig 4, disappears.

      We acknowledge that removal of the outlier in Fig 1L abolishes the correlation between predicted and measured OCT-AIR preference. We have thus moved that subfigure to the supplement (now Figure 1 – figure supplement 10B), note that we do not have robust statistical support of ORN PC1 predicting OCT-AIR preference in the results (lines 177-178), and place our emphasis on PN PC2’s capacity to predict OCT-MCH preference throughout the text.

      (2.2) Similarly with the bottom line of Fig 4A, which relies on Fig 1M. With the data as plotted, the confidence interval on R^2 is CI=[0.007, 0.201], again an uncertainty of -100/+140%. There are two clear outlier points, and if one removes those, the correlation disappears entirely (R^2=0.06, p=0.09).

      We acknowledge that removal of the two outliers in Fig 1M between predicted and measured OCT-AIR preference abolishes the correlation. We have also moved that subfigure to the supplement (now Figure 1 – figure supplement 10F) and do not claim to have robust statistical support of PN PC1 predicting OCT-AIR preference.

      (2.3) Similarly, the correlation R^2_bb of behavior with itself is weak and comes with great uncertainty (Fig 1 Suppl 1, panels B-E). For example, panel D figures prominently in computing the large inferred correlation of 0.75 between PN responses and OCT-MCH choice (Line 171ff). That correlation is weak and has a very wide confidence interval CI=[0.018, 0.329]. This uncertainty about R^2_bb should be taken into account when computing the likelihood of rho.

      We now include bootstrapping of the 3 hour OCT-MCH persistence data in our inference of 𝜌signal.

      (2.4) The correlation R^2_cc for the empirical repeatability of Ca signals seems to be obtained by a different method. Fig 4 suppl 1 focuses on the repeatability of calcium recording at two different time points. But Line 625ff suggests the correlation R^2_cc=0.77 all derives from one time point. It is unclear how these are related.

      Because our calcium model predictors utilize principal components of the glomerulus-odor responses (the mean Δf/f in the odor presentation window), we compute R^2_c,c through adding variance explained along the PCs, up to the point at which the component-wise variance explained does not exceed that of shuffled data (lines 609-620 in Materials and Methods). In this revision we now bootstrap the calcium data on the level of individual flies to get a bootstrap distribution of R^2_c,c, and propagate the uncertainty forward in the inference of 𝜌signal.
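
      The idea of summing variance explained over the PCs that beat a shuffle control can be sketched as below. The shuffling scheme (independently permuting each feature across flies), the number of shuffles, and the synthetic data are assumptions for illustration; the procedure actually used is the one described in Materials and Methods.

```python
import numpy as np

def variance_explained(X):
    """Fraction of total variance captured by each PC of X (rows = flies)."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()

def significant_pcs(X, n_shuffles=100, seed=0):
    """Count leading PCs whose variance explained exceeds the shuffled average."""
    rng = np.random.default_rng(seed)
    observed = variance_explained(X)
    shuffled = np.zeros_like(observed)
    for _ in range(n_shuffles):
        # Permute each column independently to destroy correlations across features.
        Xs = np.column_stack([rng.permutation(col) for col in X.T])
        shuffled += variance_explained(Xs)
    shuffled /= n_shuffles
    above = observed > shuffled
    # Index of the first PC that fails to beat the shuffle = number of leading PCs kept.
    n_sig = len(above) if above.all() else int(np.argmin(above))
    return n_sig, float(observed[:n_sig].sum())

# Synthetic data: 40 flies x 12 features driven by one shared latent factor.
rng = np.random.default_rng(2)
latent = rng.normal(size=(40, 1))
X = latent @ rng.normal(size=(1, 12)) + 0.5 * rng.normal(size=(40, 12))
n_sig, summed_variance = significant_pcs(X)
print(n_sig, round(summed_variance, 3))
```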

      (2.5) To summarize, two of the key relationships in Fig 1 are due entirely to one or two outlier points. These should not even be used for further analysis, yet they underlie two of the claims in Fig 4. The other correlations are weak, and come with great uncertainty, as confirmed by resampling. Those uncertainties should be propagated through the inference procedure described in Fig 4. It seems possible that the result will be entirely uninformative, leaving rho with a confidence interval that spans the entire available range [0,1]. Until that analysis is done, the claims of neuron-to-behavior correlation in this manuscript are not convincing.

      It is important to note that we never thought our analysis of the relationship between latent behavior and calcium signals should be interpreted as the main finding. Instead, the observed correlation between measured behavior and calcium is the take-away result. Importantly, it is also conservative compared to the inferred latent relationship, which in our minds was always a “bonus” analysis. Our revisions are now focused on highlighting the correlations between measured signals that have strong statistical support.

      As a response to these specific concerns, we have propagated uncertainty in all R^2 values (calcium-calcium, behavior-behavior, calcium-behavior) in our new inference for 𝜌signal, yielding a new median estimate for PN PC 2 underlying OCT-MCH preference of 0.68, with a 90% CI of 0.24-0.95. (Lines 171-172 in results, Inference of correlation between latent calcium and behavior states section in Materials and Methods).

      (3) Other statistical methods:

      (3.1) The caption of Fig 4 refers to "model applied to train+test data". Does that mean the training data were included in the correlation measurement? Depending on the number of degrees of freedom in the model, this could have led to overfitting.

      We have removed Figure 4 and now emphasize the key result in Figures 1 and 2: a statistically robust signal of PN PC 2 explaining OCT-MCH preference variation in both a training set and a testing set of flies (Fig 2 – figure supplement 1C-D).

      (3.2) Line 180 describes a model that performed twice as well on test data (31% EV) as it did on training data (15%). What would explain such an outcome? And how does that affect one's confidence in the 31% number?

      The test set recordings were conducted several weeks after the training set recordings, which were used to establish PN PC 2 as a correlate of OCT-MCH preference. The fact that the test data had a higher R^2 likely reflects sampling error (these two correlation coefficients are not significantly different). Ultimately this gives us more confidence in our model, as the predictive capacity is maintained in a totally separate set of flies.

      (3.3) Multiple models get compared in performance before settling on one. For example, sometimes the first PC is used, sometimes the second. Different weighting schemes appear in Fig 2. Do the quoted p-values for the correlation plots reflect a correction for multiple hypothesis testing?

      For all calcium-behavior models, we restricted our analysis to 5 PCs, as the proportion of calcium variance explained by each of these PCs was higher than that explained by the respective PC of shuffled data — i.e., there were at most five significant PCs in that data. We thus performed at most 5 hypothesis tests for a given model. PN PC 2 explained 15% of OCT-MCH preference variation, with a p-value of 0.0063 – this p-value is robust to a conservative Bonferroni correction to the 5 hypotheses considered at alpha=0.05.

      The weight schemes in Figure 2 and Figure 1 – figure supplement 10 reflect our interpretations of the salient features of the PCs and are follow-up analysis of the single principal component hypothesis tests. Thus they do not constitute additional tests that should be corrected. We now state in the methods explicitly that all reported p-values are nominal (line 563).

      (3.4) Line 165 ff: Quoting rho without giving the confidence interval is misleading. For example, the rho for the presynaptic density model is quoted as 0.51, which would be a sizeable correlation. But in fact, the posterior on rho is almost flat, see caption of Fig 4 suppl 1, which lists the CI as [0.11, 0.85]. That means the experiments place virtually no constraint on rho. If the authors had taken no data at all, the posterior on rho would be uniform, and give a median of 0.5.

      We now provide a confidence interval around 𝜌signal for the PN PC 2 model (lines 170-172). But per above, and consistent with the new focus of this revision, we view the 𝜌signal inference as secondary to the simple, significant correlation between PN PC 2 and OCT-MCH preference.

      (4) As it stands now, this paper illustrates how difficult it is to come to a strong conclusion in this domain. This may be worth some discussion. This group is probably in a better position than any to identify what are the limiting factors for this kind of research.

      We thank the reviewer for this suggestion and have added discussion of the difficulties in detecting signals for this kind of problem. That said, we are confident in stating that there is a meaningful correlation between PC 2 of PN Ca++ responses and OCT-MCH behavior given our model’s performance in predicting preference in a test set of flies, and in the consistent signal in ORN Bruchpilot.

      Reviewer #3 (Recommendations for the Authors):

      Two major concerns, one experimental/technical and one conceptual:

      (1) I appreciate the difficulty of the experimental design and problem. However, the correlations reported throughout are based on neural measurements in only 5 glomeruli (~10% of the olfactory system) at early stages of olfactory processing.

      We acknowledge that only imaging 5 glomeruli is regrettable. We worked hard to develop image analysis pipelines that could reliably segment as many glomeruli as possible from almost all individual flies. In the end, we concluded that it was better to focus our analysis on a (small) core set of glomeruli for which we had high confidence in the segmentation. Increasing the number of analyzed glomeruli is high on the list of improvements for subsequent studies. Happily, we are confident that we are capturing a significant, biologically meaningful correlation between PC 2 of PN calcium (dominated by the responses in DC2 and DM2) and OCT-MCH preference.

      3-OCT and MCH activate many glomeruli in addition to the five studied, especially at the concentrations used. There is also limited odor-specificity in their response matrix: notably responses are more correlated in all glomeruli within an individual, compared to responses across individuals (they note this in lines 194-198, though I don't quite understand the specific point they make here). This is a sign of high experimental variability (typically the dynamic range of odor response within an individual is similar to the range across individuals) and makes it even more difficult to resolve underlying individual variation.

      We respectfully disagree with the reviewer’s interpretation here. There is substantial odor-specificity in our response matrix. This is evident in both the ORN and PN response matrices (and especially the PN matrix) as variation in the brightness across rows. Columns, which correspond to individuals, are more similar than rows, which correspond to odor-glomerulus pairs. The dynamic range within an individual (within a column, across rows) is indeed greater than the variation among individuals (within a row, across columns).

      As an (important) aside, the odor stimuli are very unusual in this study. Odors are delivered at extremely high concentrations (variably 10-25% sv, line 464, not exactly sure what "variably" means; is the stimulus intensity not constant?) as compared to even the highest concentrations used in >95% of other studies (usually <~0.1% sv delivered).

      We used these concentrations for a variety of reasons. First, following the protocol of Honegger and Smith (2020), we found that dilutions in this range produce a linear input-output relationship, i.e. doubling or halving one odorant yields proportionate changes in odor-choice behavior metrics. Second, such fold dilutions are standard for tunnel assays of the kind we used. Claridge-Chang et al. (2009) used 14% and 11% for MCH and OCT respectively, for instance. Finally, the specific dilution factor (i.e., within the range of 10-25%) was adjusted on a week-by-week basis to ensure that in an OCT-MCH choice, the mean preference was approximately 50%. This yields the greatest signal of individual odor preference. We have added this last point to the methods section where the range of dilutions is described (lines 442-445).

      A parsimonious interpretation of their results is that the strongest correlation they see (ORN PC1 predicts OCT v. air preference) arises because intensity/strength of ORN responses across all odors (e.g. overall excitability of ORNs) partially predicts behavioral avoidance of 3-OCT. However, the degree to which variation in odor-specific glomerular activation patterns can explain behavioral preference (3-OCT v. MCH) seems much less clear, and correspondingly the correlations are weaker and p-values larger for the 3-OCT v. MCH result.

      With respect, we disagree with this analysis. The correlation between ORN PC 1 and OCT v. air preference (R^2 = 0.23) is quite similar to that of PN PC 2 and OCT vs MCH preference (R^2 = 0.20). However, the former is dependent on a single outlying point, whereas the latter is not. The latter relationship is also backed up by the BRP imaging and modeling. Therefore in the revision we have de-emphasized the OCT v. air preference model and emphasized the OCT v. MCH preference models.

      (2) There is a broader conceptual concern about the degree of logical consistency in the authors' interpretation of how neural variability maps to behavioral variability. For instance, the two odors they focus on, 3-OCT and MCH, barely activate ORNs in 4 of the 5 glomeruli they study. Most of the correlation of ORN PC1 vs. behavioral choice for 3-OCT vs. air, then, must be driven by overall glomerular activation by other odors (but remains predictive since responses across odors appear correlated within an individual). This gives pause to the interpretation that 3-OCT-evoked ORN activity in these five glomeruli is the neural substrate for variability in the behavioral response to 3-OCT.

      Our interpretation of the ORN PC1 linear model is not that 3-OCT-evoked ORN activity is the neural substrate for variability – instead, it is the general responsiveness of an individual’s AL across multiple odors (this is our interpretation of the uniformly positive loadings in ORN PC1). It is true that OCT and MCH do not activate ORNs as strongly as other odorants – our analysis rests on the loadings of the PCs that capture all odor/glomerulus combinations available in our data. All that said, since a single outlier in Figure 1L dominates the relationship, we have de-emphasized these particular results in our revision.

      This leads to the most significant concern, which is that the paper does not provide strong evidence that odor-specific patterns of glomerular activation in ORNs and PNs underlie individual behavioral preference between different odors (that each drive significant levels of activity, e.g. 3-OCT v. MCH), or that the ORN-PN synapse is a major driver of individual behavioral variability. Lines 26-31 of the abstract are not well supported, and the language should be softened.

      We have modified the abstract to emphasize our confidence in PN calcium correlating with odor-vs-odor preference (removing the ORN & odor-vs-air language).

      Their conclusions come primarily from having correlated many parameters reduced from the ORN and PN response matrices against the behavioral data. Several claims are made that a given PC is predictive of an odor preference while others are not, however it does not appear that the statistical tests to support this are shown in the figures or text.

      For each linear model of calcium dynamics predicting preference, we restricted our analysis to the first 5 principal components. Thus, we do not feel that we correlated many parameters against the behavioral data. As mentioned below, the correlations identified by this approach comfortably survive a conservative Bonferroni correction. In this revision, a linear model with a single predictor – the projection onto PC 2 of PN calcium – is the result we emphasize in the text, and we report R^2 between measured and predicted preference for both a training set of flies and for a test set of flies (Figure 1M and Figure 2 – figure supplement 1).

      That is, it appears that the correlation of models based on each component is calculated, then the component with the highest correlation is selected, and a correlation and p-value computed based on that component alone, without a statistical comparison between the predictive values of each component, or to account for effectively performing multiple comparisons. (Figure 1, k l m n o p, Figure 3, d f, and associated analyses).

      To reiterate, this was our process: 1) Collect a training data set of paired Ca++ recordings and behavioral preference scores. 2) Compute the first five PCs of the Ca++ data, and measure the correlation of each to behavior. 3) Identify the PC with the best correlation. 4) Collect a test data set with new experimental recordings. 5) Apply the model identified in step 3. For some downstream analyses, we combined test and training data, but only after confirming the separate significance of the training and test correlations.
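
      The five steps above can be sketched in code as follows. This is a schematic of the workflow on synthetic data, not our analysis pipeline; the function names, the use of an ordinary least-squares line, and the simulated feature matrix are assumptions made for illustration.

```python
import numpy as np

def fit_pcs(X_train, n_pcs=5):
    """Step 2a: compute PC loadings from the training recordings only."""
    mean = X_train.mean(axis=0)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_pcs]  # each row of vt is one PC's loadings

def project(X, mean, loadings):
    """Project recordings onto the training-derived PCs."""
    return (X - mean) @ loadings.T

def select_and_test(X_train, y_train, X_test, y_test, n_pcs=5):
    mean, loadings = fit_pcs(X_train, n_pcs)
    scores = project(X_train, mean, loadings)
    # Step 2b: correlate each PC score with behavior in the training set.
    r_train = np.array([np.corrcoef(scores[:, k], y_train)[0, 1] for k in range(n_pcs)])
    # Step 3: keep the PC with the strongest training correlation.
    best = int(np.argmax(r_train ** 2))
    # Fit the linear model on training data only.
    slope, intercept = np.polyfit(scores[:, best], y_train, 1)
    # Steps 4-5: apply the frozen model to independently collected test data.
    predicted = slope * project(X_test, mean, loadings)[:, best] + intercept
    # Signed r, so a reversal of the relationship in the test set would be visible.
    r_test = float(np.corrcoef(predicted, y_test)[0, 1])
    return best, float(r_train[best] ** 2), r_test

# Synthetic stand-in: 47 training flies, 22 test flies, 20 odor-glomerulus features.
rng = np.random.default_rng(3)

def simulate(n):
    X = rng.normal(size=(n, 20))
    y = 0.5 * X[:, 0] + rng.normal(size=n)
    return X, y

X_tr, y_tr = simulate(47)
X_te, y_te = simulate(22)
best, r2_train, r_test = select_and_test(X_tr, y_tr, X_te, y_te)
print(best, round(r2_train, 3), round(r_test, 3))
```

      The point of the design is that the test flies never influence the PC loadings, the choice of PC, or the regression coefficients, so the test correlation is an out-of-sample estimate.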

      The p-values associated with the PN PC 2 model predicting OCT-MCH preference are sufficiently low in each of the training and testing sets (0.0063 and 0.0069, respectively) to pass a conservative Bonferroni multiple hypothesis correction (one hypothesis for each of the 5 PCs) at an alpha of 0.05.

      Additionally, the statistical model presented in Figure 4 needs significantly more explanation or should be removed; it's unclear how they "infer" the correlation, and the conclusions appear inconsistent with Figure 3 - Figure Supplement 2.

      We have removed Figure 4 and have improved upon our approach of inferring the strength of the correlation between latent calcium and behavior in the Methods, incorporating bootstrapping of all sources of data used for the inference (lines 622-628). At the same time, we now emphasize that this analysis is a bonus of sorts, and that the simple correlation between Ca++ and behavior is the main result.

      Suggestions:

      (1) If the authors want to make the claim that individual variation in ORN or PN odor representations (e.g. glomerular activation patterns) underlie differences in odor preference (MCH v. OCT), they should generalize the weak correlation between ORN/PN activity and behavior to additional glomeruli and pair of odors, where both odors drive significant activity. Otherwise, the claims in the abstract should be tempered.

      We have modified the abstract to focus on the effect we have the highest confidence in: contrasting PN calcium activation of DM2 and DC2 predicting OCT-MCH preference.

      (2) One of the most valuable contributions a study like this could provide is to carefully quantify the amount of measurement variation (across trials, across hemispheres) in neural responses relative to the amount of individual variation (across individuals). Beyond the degree of variation in the amplitude of odor responses, the rank ordering of odor response strength between repeated measurements (to try to establish conditions that account for adaptation, etc.), between hemispheres, and between individuals is important. Establishing this information is foundational to this entire field of study. The authors take a good first step towards this in Figure 1J and Figure 1, supplement 5C, but the plots do not directly show variance, and the comparison is flawed because more comparisons go into the individual-individual crunch (as evidenced by the consistently smaller range of quartiles). The proper way to do this is by resampling.

      We do not know what the reviewer means by “individual-individual crunch,” unfortunately. Thus, it is difficult to determine why they think the analysis is flawed. We are also uncertain about the role of resampling in this analysis. The medians, interquartile ranges and whiskers in the panels referenced by the reviewer are not confidence intervals as might be determined by bootstrap resampling. Rather, these are direct statistics on the coding distances as measured – the raw values associated with these plots are visualized in Figure 1H.

      In our revision we updated the heatmaps in Figure 1 – figure supplement 3 to include recordings across the lobes and trials of each individual fly, and we have added a new supplementary figure, Figure 1 – figure supplement 4, to show the correspondence between recordings across lobes or trials, with associated rank-order correlation coefficients. Since the focus of this study was whether measured individual differences predict individual behavioral preference, a full characterization of the statistics of variation in calcium responses was not the focus, though it was the focus of a previous study (Honegger & Smith et al., 2019).

      To help the reader understand the data, we would encourage displaying data prior to dimensionality reduction - why not show direct plots of the mean and variance of the neural responses in each glomerulus across repeats, hemispheres, individuals?

      We added a new supplementary figure, Figure 1 – figure supplement 4, to show the correspondence between recordings across lobes or trials.

      A careful analysis of this point would allow the authors to support their currently unfounded assertion that odor responses become more "idiosyncratic" farther from the periphery (line 135-36); presumably they mean beyond just noise introduced by synaptic transmission, e.g. "idiosyncrasy" is reproducible within an individual. This is a strong statement that is not well-supported at present - it requires showing the degree of similarity in the representation between hemispheres is more similar within a fly than between flies in PNs compared to ORNs (see Hige... Turner, 2015).

      Here are the lines in question: “PN responses were more variable within flies, as measured across the left and right hemisphere ALs, compared to ORN responses (Figure 1 – figure supplement 5C), consistent with the hypothesis that odor representations become more idiosyncratic farther from the sensory periphery.”

      That responses are more idiosyncratic farther from the periphery is therefore not an “unfounded assertion.” It is clearly laid out as a hypothesis for which we can assess consistency in the data. We stand by our original interpretation: that several observations are consistent with this finding, including greater distance in coding space in PNs compared to ORNs, particularly across lobes and across flies. In addition, higher accuracy in decoding individual identity from PN responses compared to ORN responses (now appearing as Figure 1 – figure supplement 6A) is also consistent with this hypothesis.

      Still, to make confusion at this sentence less likely, we have reworded it as “suggesting that odor representations become more divergent farther from the sensory periphery.” (lines 139-140)

      (3) Figure 3 is difficult to interpret. Again, the variability of the measurement itself within and across individuals is not established up front. Expression of exogenous tagged brp in ORNs is also not guaranteed to reflect endogenous brp levels, so there is an additional assumption at that level.

      Figure 3 – figure supplement 1 Panels A-C display the variability of measurements (Brp volume, total fluorescence and fluorescence density) both within (left/right lobes) and across individuals (the different data points). We agree that exogenous tagged Brp levels will not be identical to endogenous levels. The relationship appears significant despite this caveat.

      Again there are statistical concerns with the correlations. For instance, the claim that "Higher Brp in DM2 predicted stronger MCH preference... " on line 389 is not statistically supported with p<0.05 in the ms (see Figure 3 G as the closest test, but even that is a test of the difference of DM2 and DC2, not DM2 alone).

      We have changed the language to focus on the pattern of the loadings in PC 2 of Brp-Short density and replaced “predict.” (lines 366-369).

      Can the authors also discuss what additional information is gained from the expansion microscopy in the figure supplement, and how it compares to brp density in DC2 using conventional methods?

      The expansion microscopy analysis was an attempt to determine what specific aspect of Brp expression was predictive of behavior, on the level of individual Brp puncta, as a finer look compared to the glomerulus-wide fluorescence signal in the conventional microscopy approach. Since this method did not yield a large sample size, at best we can say it provided evidence consistent with the observation from confocal imaging that Brp fluorescent density was the best measure in terms of predicting behavior.

      I would prefer to see the calcium and behavioral datasets strengthened to better establish the relationship between ORN/PN responses and behavior, and to set aside the anatomical dataset for a future work that investigates mechanisms.

      We are satisfied that our revisions put appropriate emphasis on a robust result relating calcium and behavior measurements: the relationship between OCT-MCH preference and idiosyncratic PN calcium responses. Finding that idiosyncratic Brp density has similar PC 2 loadings that also significantly predict behavior is an important finding that increases confidence in the calcium-behavior finding. We agree with the reviewer that these anatomical findings are secondary to the calcium-behavior analyses, but think they warrant a place in the main findings of the study. As the reviewer suggests, we are conducting follow-on studies that focus on the relationship between neuroanatomical measures and odor preference.

      (4) The mean imputation of missing data may have an effect on the conclusions that it is possible to draw from this dataset. In particular, as shown in Figure 1, supplemental figure 3, there is a relatively large amount of missing data, which is unevenly distributed across glomeruli and between the cell types recorded from. Strikingly, DC2 is missing in a large fraction of ORN recordings, while it is present in nearly all the PN recordings. Because DC2 is one of the glomeruli implicated in predicting MCH-OCT preference, this lack of data may be particularly likely to affect the evaluation of whether this preference can be predicted from the ORN data. Overall, mean imputation of glomerulus activity prior to PCA will artificially reduce the amount of variance contributed by the glomerulus. It would be useful to see an evaluation of which results of this paper are robust to different treatments of this missing data.

      We confirmed that the linear model predicting OCT-MCH preference from PN PC 2 calcium was minimally altered when we imputed missing values by alternating least squares: we used the pca function with option ‘als’ to infill missing values in the calcium matrix 1000 times and took the mean infilled matrix (see MATLAB documentation and Figure 1 – figure supplement 5 of Werkhoven et al., 2021). The fitted slope for the model using the mean-infilled data presented in the article was -0.0806 (SE = 0.028, model $R^2$ = 0.15); the fitted slope for the ALS-imputed model was -0.0806 (SE = 0.026, model $R^2$ = 0.17).
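      The reviewer's general point, that mean imputation shrinks the variance attributed to a variable with missing entries, can be illustrated with a minimal sketch. The response values below are invented for illustration and are not data from the study; `mean_impute` is a hypothetical helper, not the MATLAB routine mentioned above.

```python
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed (non-None) entries."""
    observed_values = [v for v in values if v is not None]
    fill = statistics.fmean(observed_values)
    return [fill if v is None else v for v in values]


# Hypothetical responses of one glomerulus across flies; None marks a missing recording.
responses = [0.2, None, 0.8, None, 0.5, 1.1, None, 0.4]

observed = [v for v in responses if v is not None]
var_observed = statistics.pvariance(observed)
var_imputed = statistics.pvariance(mean_impute(responses))

# Imputed entries sit exactly at the mean, so they add nothing to the sum of
# squared deviations while increasing the sample count.
print(var_observed, var_imputed)
```

      With population variance, the imputed variance equals the observed variance scaled by the fraction of entries that were actually observed (here 5 of 8), so the shrinkage grows with the amount of missing data.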

      Additional comments:

      (1) On line 255 there is an unnecessary condition: "non-negative positive".

      Thank you – non-negative has been removed.

      (2) In Figure 4 and the associated analysis, selection of +/- 20% interval around the observed $R^2$ appears arbitrary. This could be based on the actual confidence interval, or established by bootstrapping.

      We have replaced the +/- 20% rule by bootstrapping the calculation of behavior-behavior $R^2$, calcium-calcium $R^2$, and calcium-behavior $R^2$ and propagating the uncertainties forward (Inference of correlation between latent calcium and behavior states section in Materials and Methods).
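      For readers unfamiliar with the general technique, a percentile bootstrap of an $R^2$ value can be sketched as below. This is a generic illustration under stated assumptions, not the procedure in the Materials and Methods: the data are synthetic, the function names are hypothetical, and the sketch does not propagate uncertainty across several $R^2$ estimates.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate resample with no variance
    return sxy ** 2 / (sxx * syy)


def bootstrap_r_squared(x, y, n_boot=2000, seed=0):
    """Return a 95% percentile interval for R^2 by resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(r_squared([x[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Synthetic stand-ins for a calcium score and a behavioral preference score.
data_rng = random.Random(1)
calcium = [data_rng.gauss(0, 1) for _ in range(40)]
behavior = [-0.4 * c + data_rng.gauss(0, 1) for c in calcium]

observed_r2 = r_squared(calcium, behavior)
ci_low, ci_high = bootstrap_r_squared(calcium, behavior)
print(observed_r2, ci_low, ci_high)
```

      The interval width is determined by the data rather than by a fixed +/- 20% band, which is the design choice the reviewer asked for.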

      (3) On line 409 the claim is made "These sources of variation specifically implicate the ORN-PN synapse..." While the model recapitulates the glomerulus specific variation of activity under PN synapse density variation, it also occurs under ORN identity variation, which calls into question whether the synapse distribution itself is specifically implicated, or if any variation that is expected to be glomerulus specific would be equally implicated.

      We agree with this observation. We found that varying either the ORNs or the PNs that project to each glomerulus can produce patterns of PN response variation similar to what is measured experimentally. This is consistent with the idea that the ORN-PN synapse is a key site of behaviorally-relevant variation.

      (4) Line 214 "... we conclude that the relative responses of DM2 vs DC2 in PNs largely explains an individual's preference." is too strong of a claim, based on the fact that using the PC2 explains much more of the variance, while using the stated hypothesis noticeably decreases the predictive power ($R^2$ = 0.2 vs $R^2$ = 0.12).

      We have changed the wording here to “we conclude that the relative responses of DM2 vs DC2 in PNs compactly predict an individual’s preference.” (lines 192-193)

    1. Author response:

      Reviewer #1:

      We thank the reviewer for recognizing the impact of our work on the pivotal roles of N-glycan-dependent ERQC in cellular fitness and pathogenicity and providing valuable comments to be considered to improve the manuscript. As suggested, we will rearrange data, reduce text volume, and discuss the possibility of how ERQC mutation decreases EV secretion without significant defect in conventional secretion. Regarding the proteomics data, we have already initiated a comparative analysis of total intracellular and EV-associated proteins to determine whether the reduced cargo loading in the Ugg1 mutant is specific to EV-associated proteins. Additionally, we may extend the analysis to include total secretion, enabling a clearer comparison between classical secretion and EV-mediated secretion to better evaluate the extent of classical secretion defects in the Ugg1 mutant.

      Reviewer #2:

      We sincerely thank the reviewer for the positive evaluation of our work. As recommended, we will reduce the text and reorganize the data to enhance the manuscript's readability.

      Reviewer #3:

      We sincerely thank the reviewer for the high appreciation of our work. As recommended, we will provide a more detailed explanation of the results with improved interpretation, strongly grounded on the obtained data.

    1. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The assertion that membrane trafficking is impaired by this variant could be bolstered by additional data.

      We agree with this comment and will perform additional analysis and experiments to support the assertion that membrane trafficking is impaired. As noted by the Reviewers, standard biochemical approaches to obtain such data may be challenging due to the fact that Kv3.1 is expressed in only a subset of cells and that we do not have a Kv3.1-A421V specific antibody.

      (2) In some experiments details such as the age of the mice or cortical layer are emphasized, but in others, these details are omitted.

      We appreciate that the Reviewer has noted this omission. We will include such details in the resubmission.

      (3) The impairments in PV neuron AP firing are quite large. This could be expected to lead to changes in PV neuron activity outside of the hypersynchronous discharges that could be detected in the 2-photon imaging experiments, however, a lack of an effect on PV neuron activity is only loosely alluded to in the text. A more formal analysis is lacking. An important question in trying to understand mechanisms underlying channelopathies like KCNC1 is how changes in membrane excitability recorded at the whole cell level manifest during ongoing activity in vivo. Thus, the significance of this work would be greatly improved if it could address this question.

      Yes, the impairments in neocortical PV-IN excitability are more marked than any other PV interneuronopathy that we have studied. We will include a more extensive analysis of the 2-photon imaging data in the resubmission. However, there are limitations to the inferences that can be made as to firing patterns based on 2-photon calcium imaging data, particularly for interneurons.

      (4) Myoclonic jerks and other types of more subtle epileptiform activity have been observed in control mice, but there is no mention of littermate control analyzed by EEG.

      We did not observe myoclonic jerks in control mice. This data will be included in the resubmission.

      Reviewer #2 (Public review):

      Weaknesses:

      In some experiments, the age of the animal in each experiment is not clearly stated. For example, the experiments in Figure 2 demonstrate impaired K+ conductance and membrane localization, but it is not clear whether they correlated with the excitability and synaptic defects shown in subsequent figures. Similarly, it is unclear at what age the mice underwent EEG recordings, and whether non-epileptic mice are younger than those with seizures.

      We will include explicit information as to the age of the animals used for each experiment in the resubmission.

      The trafficking defect of mutant Kv3.1 proposed in this study is based only on the fluorescence density analysis which showed a minor change in membrane/cytosol ratio. It is not very clear how the membrane component was determined (any control staining?). In addition to fluorescence imaging, an addition of biochemical analysis will make the conclusion more convincing (while it might be challenging if the Kv3.1 is expressed only in PV+ cells).

      We will include additional information in the Methods section as to how the membrane component was determined in a revised version of the manuscript. We agree with Reviewer #2 regarding the limitations in the ability to further evaluate this.

      While the study focused on the superficial layer because Kv3.1 is the major channel subunit, the PV+ cells in the deeper cortical layer also express Kv3.1 (Chow et al., 1999) and they may also contribute to the hyperexcitable phenotype via negative effect on Kv3.2; the mutant Kv3.1 may also block membrane trafficking of Kv3.1/Kv3.2 heteromers in the deeper layer PV cells and reduce their excitability. Such an additional effect on Kv3.2, if present, may explain why the heterozygous A421V KI mouse shows a more severe phenotype than the Kv3.1 KO mouse (and why they are more similar to Kv3.2 KO). Analyzing the membrane excitability differences in the deep-layer PV cells may address this possibility.

      We will include recordings from PV-INs in deeper layers of the neocortex in the revised version of the manuscript, as requested.

      In Table 1, the A421V PV+ cells show a resting membrane potential that is depolarized relative to WT by ~5 mV, which seems a robust change and would influence the circuit excitability. The authors measured firing frequency after adjusting the membrane voltage to -65mV, but are the excitability differences less significant if the resting potential is not adjusted? It is also interesting that such a membrane potential difference is not detected in young adult mice (Table 2). This loss of potential compensation may be important for developmental changes in the circuit excitability. These issues can be more explicitly discussed.

      We will include a more thorough discussion of this finding in the revised version of the manuscript. However, we do not completely understand this finding. It could be compensatory, as suggested by the Reviewer; however, it is transient and seems to be an isolated finding (i.e., there does not appear to be parallel “compensation” in other properties). Alternatively, it could be that impaired excitability of the Kcnc1-A421V/+ PV-INs may reflect impaired/delayed development, which itself is known to be activity-dependent.

      Reviewer #3 (Public review):

      Weaknesses:

      The manuscript identifies a partial mechanism of disease that leaves several aspects unresolved including the possible role of the observed impairments in thalamic neurons in the seizure mechanism. Similarly, while the authors identify a reduction in potassium currents and a reduction in PV cell surface expression of Kv3.1 it is not clear why these impairments would lead to a more severe disease phenotype than other loss-of-function mutations which have been characterized previously. Lastly, additional analysis of video-EEG data would be helpful for interpreting the extent of the seizure burden and the nature of the seizure types caused by the mutation.

      We agree with this comment. We studied neurons in the reticular thalamus as these cells are known to express Kv3.1 and are linked to epilepsy pathogenesis. Yet, we focused on neocortical PV-INs over other Kv3.1-expressing neurons such as neurons of the reticular thalamus because we evaluated the impairments of intrinsic excitability to be more profound in neocortical PV-INs. Crossing Kcnc1-Flox(A421V)/+ mice to a cerebral cortex interneuron-specific driver that would avoid recombination in thalamus – such as Ppp1r2-Cre (RRID:IMSR_JAX:012686) – could assist in determining the relative contribution of thalamic reticular nucleus dysfunction to the overall phenotype, as performed by Makinson et al (2017) to address a similar question. There are of course other Kv3.1-expressing neurons in the brain, including GABAergic interneurons in hippocampus and amygdala. We will include additional discussion in a revised version of the manuscript as to why we think there is more severe impairment in our Kcnc1-Flox(A421V)/+ mice relative to Kv3.1 and Kv3.2 knockout mice. We will include additional data on the epilepsy phenotype in the revised version of the manuscript, as requested.

    1. Author response:

      We thank Dr. Ealand and the Reviewers for their thoughtful comments on our submitted manuscript. We are in the process of revising our manuscript in light of the comments received, outlined below.

      In addition to the requested revisions, we have new data with M. tuberculosis strain H37Rv +/- gidB deletion (and complementation), confirming that deletion of gidB sensitizes the strain to rifampicin, and extending our findings to pathogenic tuberculosis. This will also be incorporated into the revised manuscript.

      Reviewer #1:

      (1) The structural work at the end feels like both an afterthought in terms of the science and the writing. I would suggest re-writing that section to be clearer about what the figure says and does not say. For example, the caption of Figure 6 appears to be more informative than the text and refers to concepts not present in the main text. In general, I found this section to be the most difficult to understand.

      We are rewriting this section to make it more coherent with the rest of the manuscript.

      (2) "delta-gidB" is written out in the caption of Figure 6. Line 234: gidB not italics.

      Thank you, these changes will be incorporated in the revised manuscript.

      Reviewer #2:

      (1) It would be essential to provide information regarding the growth rate and, ideally, translation rates in the gidB KO and the isogenic WT. As translation balances accuracy and speed, only characterising the speed is not sufficient to understand the phenomenon.

      We are performing these assays and will incorporate them in the revised manuscript.

      (2) Cryo-EM analysis of vacant 70S ribosomes is not sufficient for understanding the mechanisms underlying the accuracy defects in the gidB KO. One should assemble and solve structurally near-cognate and non-cognate complexes. I believe the authors are over-interpreting the scant structural data they have. Furthermore, current representation makes it impossible to assess the resolution of the structure, especially in the areas of interest.

      While we agree with the Reviewer that structures of translating ribosomes will be most informative in elucidating the molecular mechanism(s) by which methylation (or not) by GidB contributes to mistranslation, those experiments are ongoing and beyond the scope of the current study. Unlike E. coli ribosomes, for which a plethora of mutant structures are available, there are very few structures of mycobacterial ribosomes beyond wild-type apo ribosomes. Therefore we feel that the structures of apo mycobacterial ribosomes +/- GidB-mediated methylation are still of value, and a necessary “first step” for the mechanistic work alluded to above. Secondly, the apo ribosome structures still hint at potential mechanisms by which mistranslation and 16S rRNA methylation may impact each other – as in the comments to R#1 above, we are revising the text to increase clarity and coherence of this section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors follow up on their published observation that providing a lower glucose parenteral nutrition (PN) reduces sepsis from a common pathogen [Staphylococcus epidermidis (SE)] in preterm piglets. Here they found that a higher dose of glucose could thread the needle and get the protective effects of low glucose without incurring significant hypoglycemia. They then investigate whether the change in low glucose PN impacts metabolism to confer this benefit. The finding that lower glucose reduces sepsis is important as sepsis is a major cause of morbidity and mortality in preterm infants, and adjusting PN composition is a feasible intervention.

      Strengths:

      (1) They address a highly significant problem of neonatal sepsis in preterm infants using a preterm piglet model.

      (2) They have compelling data in this paper (and in a previous publication, ref 27) that low glucose PN confers a survival advantage. A downside of the low glucose PN is hypoglycemia, which they mitigate in this paper by using a slightly higher amount of glucose in the PN.

      (3) The experiment where they change PN from high to low glucose after infection is very important to determine if this approach might be used clinically. Unfortunately, this did not show an ability to reduce sepsis risk with this approach. Perhaps this is due to the much lower mortality in the high glucose group (~20% vs 87% in the first figure).

      (4) They produce an impressive multiomics data set from this model of preterm piglet sepsis which is likely to provide additional insights into the pathogenesis of preterm neonatal sepsis.

      Weaknesses:

      (1) The high glucose control gives very high blood glucose levels (Figure 1C). Is this the best control for typical PN and glucose control in preterm neonates? Is the finding that low glucose is protective or high glucose is a risk factor for sepsis?

      This work is a follow-up to our previous work, in which we explored different PN glucose regimens. Taken together, our experiments strongly imply that glucose provision is associated with severity in a seemingly linear manner. In the clinical setting, there is no fixed glucose provision, but guidelines specify ranges that are acceptable. However, these guidelines do not take possible infections into account and are designed to optimize growth outcomes. Increased provision of glucose to preterm neonates may therefore increase their infection risk, but parenteral glucose cannot be entirely avoided as it would lead to hypoglycaemia and associated brain damage. In the present paper the reduced glucose PN reflects the lowest end of the recommended PN glucose intake. More work is needed to figure out the best glucose provision to infected preterm newborns, balancing positive and negative factors.

      (2) In Figure 1B, preterm piglets provided the high glucose PN have 13% survival while preterm piglets on the same nutrition in Figure 6B have ~80% survival. Were the conditions indeed the same? If so, this indicates a large amount of variation in the outcome of this model from experiment to experiment.

      In the follow-up experiment outlined in Figure 6 we reduced the follow-up time to 12 hours in an effort to minimize the suffering of the animals. We did this because we could detect relevant differences in the immune response between high and low glucose infected pigs at 12 hours. If we had extended the follow-up experiment to 22 hours we would likely have seen much higher mortality.

      (3) Piglets on the low glucose PN had consistently lower density of SE (~1 log) across all time points. This may be due to changes in immune response leading to better clearance or it could be due to slower growth in a lower glucose environment.

      We agree with this assessment and have adjusted our result section to reflect this.

      (4) Many differences in the different omics (transcriptomics, metabolomics, proteomics) were identified in the SE-LOW vs SE-HIGH comparison. Since the bacterial load is very different between these conditions, could the changes be due to bacterial load rather than metabolic reprogramming from the low glucose PN?

      We analyzed the relationship between bacterial burdens and mortality and found that they did not correlate within each of the treatment groups. We have now added these data to the Results section as supplemental material and report this fact in the section called “Reduced glucose supply increases hepatic OXPHOS and gluconeogenesis and attenuates inflammatory pathways”. This finding inspired us to further explore the relationship between bacterial burdens and infection responses in our model, which has resulted in our recent preprint: Wu et al. Regulation of host metabolism and defense strategies to survive neonatal infection. BioRxiv 2024.02.23.581534; doi: https://doi.org/10.1101/2024.02.23.581534

      Reviewer #2 (Public Review):

      Summary:

      The authors demonstrate that a low parenteral glucose regimen can lead to improved bacterial clearance and survival from Staph epi sepsis in newborn pigs without inducing hypoglycemia, as compared to a high glucose regimen. Using RNA-seq, metabolomic, and proteomic data, the authors conclude that this is primarily mediated by altered hepatic metabolism.

      Strengths:

      Well-defined controls for every time point, with multiple time points and biological replicates. The authors used different experimental strategies to arrive at the same conclusion, which lends credibility to their findings. The authors have published the negative findings associated with their study, including the inability to reverse sepsis-related mortality after switching from SE-high to SE-low at 3h or 6h and after administration of hIAIP.

      Weaknesses:

      (1) The authors mention, and it is well-known, that Staph epi is primarily involved in late-onset sepsis. The model of S. epi sepsis used in this study clearly replicates early-onset sepsis, but S. epi is extremely rare in this time period. How do the authors justify the clinical relevance of this model?

      The distinction between early and late onset sepsis makes sense clinically because they are likely to be caused by different organisms and therefore require different empirical antibiotic regimens. Early onset sepsis is caused by organisms transferred perinatally, often following chorioamnionitis or urogenital maternal infections (Strep. agalactiae/E. coli), whereas late onset sepsis is likely caused by organisms from indwelling catheters or mucosal surfaces, most often coagulase negative staphylococci. Timing of an infection after birth of course plays a role, but the virulence factors of the pathogen probably play a large role in shaping the immune response. Therefore, even though the infection in our model is initiated on the first day after birth, the organism that we use, Staph. epidermidis, makes it a better model for pathogenesis of late onset sepsis. However, it is also important to acknowledge that the pathophysiology of “sepsis” may be similar despite timing and pathogen and depends on the degree of immune activation and downstream effects on organs.

      (2) The authors find that the neutrophil subset of the leukocyte population is diminished significantly in the SE-low and SE-high populations. However, they conclude on page 10 that "modulations of hepatic, but not circulating immune cell metabolism, by reduced glucose supply..." and this is possible because the authors have looked at the entire leukocyte transcriptome. I am curious about why the authors did not sequence the neutrophil-specific transcriptome.

      We collected the whole blood transcriptome during the experiments, which reflects the transcription profile of all the circulating leucocytes. Since we did not do single cell RNA sequencing during the experiment, there is no possibility of isolating the neutrophil transcriptome at this time. Your point, however, is valid and we will consider incorporating single cell transcriptomics in future experiments.

      (3) The authors use high (30g/k/d) and low (7.2g/k/d) glucose regimens. These translate into a GIR of 21 and 5 mg/k/min respectively. A normal GIR for a preterm infant is usually 5-8, and sometimes up to 10. Do the authors have a "safe GIR" or a threshold they think we cannot cross? Maybe a point where the metabolism switch takes place? They do not comment on this, especially as GIR and glucose levels are continuous variables and not categorical.

      Our reduced glucose PN was chosen as it corresponded with the low end of recommended guidelines for PN glucose intake. There likely is not a “safe GIR” as the clinical responses to glucose intake during infections do not seem binary but increase with glucose intake. It is also important to remember that the reduced glucose intervention still resulted in significant morbidity and a 25% mortality within 22 hours. There is therefore still vast room for improvement, but even though further reduction in PN glucose would probably provide further protection it would entail dangerous hypoglycaemia (as described in our previous paper). The findings in this current paper have prompted us to explore several strategies to replace parenteral glucose with alternative macronutrients. Thus, the optimal PN for infected newborns would probably differ from standard PN in all macronutrients and will require much more preclinical and clinical research.
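      The reviewer's conversion from daily glucose intake to glucose infusion rate (GIR) is a unit change from g/kg/day to mg/kg/min. A minimal sketch of that arithmetic, using a hypothetical helper name and the two intakes quoted in the comment above:

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes


def gir_mg_per_kg_min(glucose_g_per_kg_day):
    """Convert a glucose intake in g/kg/day to an infusion rate in mg/kg/min."""
    return glucose_g_per_kg_day * 1000 / MINUTES_PER_DAY


gir_high = gir_mg_per_kg_min(30)    # high glucose regimen, 30 g/kg/day
gir_low = gir_mg_per_kg_min(7.2)    # reduced glucose regimen, 7.2 g/kg/day
print(round(gir_high, 1), round(gir_low, 1))
```

      These values round to the GIRs of about 21 and 5 mg/kg/min cited by the reviewer.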

      (4) In Figures 2B and C the authors show that SE-high and SE-low animals have differences in the oxphos, TCA, and glycolytic pathways. The authors themselves comment in the Supplementary Table S1B, E-F that these same metabolic pathways are also different in the Con-Low and Con-high animals, it is just the inflammatory pathways that are not different in the non-infected animals. How can they then justify that it is these metabolic pathways specifically which lead to altered inflammatory pathways, and not just the presence of infection along with some other unfound mechanism?

      It is to be expected that the inflammatory pathways do not differ between the Con-Low and Con-High groups as there is no infection to induce these pathways. The identified metabolic pathways that differ between SE-High and SE-Low animals seem to us the best explanation of the differences in clinical phenotype.

      (5) The authors mention in Figure 1F that SE-low animals had lower bacterial burdens than SE-high animals, but then go on to infer that the inflammatory cytokine differences are attributed to a rewiring of the immune response. However, they have not normalized the cytokine levels to the bacterial loads, as the differences in the cytokines might be attributed purely to a difference in bacterial proliferation/clearing.

      Please see our response to reviewer #1

      (6) The authors mention that switching from SE-high to SE-low at 3 or 6 h time points does not reduce mortality. Have the authors considered the reverse? Does hyperglycemia after euglycemia initially, worsen mortality? That would really conclude that there is some metabolic reprogramming happening at the very onset of sepsis and it is a lost battle after that.

      This is a very good point that we have not yet explored. We have added this consideration to the discussion and slightly amended our conclusions regarding this follow-up experiment.

      Reviewer #3 (Public Review):

      Summary:

      Baek and colleagues present important follow-up work on the role of serum glucose in the management of neonatal sepsis. The authors previously showed high glucose administration exacerbated neonatal sepsis, while strict glucose control improved outcomes but caused hypoglycemia. In the current report they examined the effect of a more tailored glucose management approach on outcomes and examined hepatic gene expression, plasma metabolome/proteome, blood transcriptome, as well as the therapeutic impact of hIAIP. The authors leverage multiple powerful approaches to provide robust descriptive accounts of the physiologic changes that occur with this model of sepsis in these various conditions.

      Strengths:

      (1) Use of preterm piglet model.

      (2) Robust, multi-pronged approach to address both hepatic and systemic implications of sepsis and glucose management.

      (3) Trial of therapeutic intervention - glucose management (Figure 6), hIAIP (Figure 7).

      Weaknesses:

      (1) The translational role of the model is in question. CONS is rarely if ever a cause of EOS in preterm neonates. The model uses preterm pigs exposed at 2 hours of age. This model most likely replicates EOS.

      Please see our response to Reviewer #2

      (2) Throughout the manuscript it is difficult to tell from which animals the data are derived. Given the ~90% mortality in the experimental CONS group, and 25% mortality in the intervention group, how are the data from animals "at euthanasia" considered? Meaning - are data from survivors and those euthanized grouped together? This should be clarified as biologically these may be very different populations (ie, natural survivor vs death).

      This is a very valid point. For all endpoints that are analyzed “at euthanasia” the age of the animal will vary. Some will have been euthanized early due to clinical deterioration and some will have survived all the way to the end of the experiment. This needs to be kept in mind when interpreting the results. We have further highlighted this point in the discussion and made it clear to the reader at what time-point each analysis was performed.

      (3) With limited time points (at euthanasia) for hepatic transcriptomics (Figure 2), plasma metabolite (Figure 3), blood transcriptome (Figure 4), and plasma proteome (Figure 5) it is difficult to make conclusions regarding mechanisms preceding euthanasia. Per methods, animals were euthanized with acidosis or clinical decompensation. Are the reported findings demonstrative of end-organ failure and deterioration leading to death, or reflective of events prior?

      Yes, all organ-specific endpoints are snapshots of the state of the animals at the time of euthanasia, pooling together animals that succumbed to sepsis and those that survived to 22 hours post infection. These results therefore reflect the end-state of the infection, and we cannot be sure when the differences between groups manifested themselves. However, given the stark differences in plasma lactate at 12 hours post infection, it is likely that changes to metabolism occurred before most of the animals succumbed to sepsis.

      We agree this is a weakness in our model, but we have since published a pre-print where we have further explored how metabolic adaptations shape the fate of similarly infected preterm pigs: BioRxiv 2024.02.23.581534; doi: https://doi.org/10.1101/2024.02.23.581534

      (4) Data are descriptive without corresponding "omics" from interventions (glucose management and/or hIAIP) or at least targeted assessment of key differences.

      We only performed in-depth analysis of the glucose intervention, as this showed the most promising clinical effects and warranted further investigation. It is possible that further insights could be gained from in-depth analysis of the other interventions, but given that there were no obvious clinical benefits we refrained from that.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I am intrigued that mortality was not correlated to bacterial burden. Please provide the "data not shown" as this would help the reader understand better whether the difference in bacterial burden is driving the phenotypes and findings of the low glucose group.

      We have added this data to supplementary figure 1.  

      Reviewer #2 (Recommendations For The Authors):

      (1) I would urge the authors to consider a neutrophil-specific transcriptomic analysis. I understand that this would add significantly to the resubmission process. If the authors wish to include that as a future direction instead, they need to specifically mention the limitations of whole blood transcriptomics and how different immune cell types react differently to bacterial antigens.

      We agree with your considerations, but we cannot obtain that data with the whole-blood method applied in the experiment. We have added your consideration to the discussion.

      (2) I urge the authors to remove any impression that this is a model of late-onset sepsis, which is implied from the introduction, lines 3 and 4.

      Our intention was not to directly suggest that our model is a perfect reflection of late-onset sepsis but rather to highlight the relevance of using a pathogen commonly associated with LOS. We believe our model primarily captures the effects of intense pro-inflammatory immune activation, which may have parallels with various forms of sepsis, including LOS.

      Reviewer #3 (Recommendations For The Authors):

      Drawing on the robust nature of your "omics", identify key measures and test whether they are altered earlier in the development of clinical sepsis. Test whether these are altered by the intervention.

      This is a very valid point, but at the moment it is not possible for us to explore this within the confines of these experiments. However, building upon these findings and those in our recent preprint, we are confident that shifts in the hepatic ratio of oxidative phosphorylation and gluconeogenesis vs glycolysis shape the immune response to infections in neonates. In our upcoming experiments we are planning to incorporate plasma metabolomics at earlier timepoints to monitor when shifts in metabolism occur. However, given the heterogeneity of pigs, as opposed to inbred rodent models, data from animals sacrificed at fixed timepoints to gauge their organ function will be hard to interpret, as it is impossible to know what the end state of a particular animal would have been. Longitudinal sampling of liver tissue during the course of infection would therefore be challenging.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In "Drift in Individual Behavioral Phenotype as a Strategy for Unpredictable Worlds," Maloy et al. (2024) investigate changes in individual responses over time, referred to as behavioral drift within the lifespan of an animal. Drift, as defined in the paper, complements stable behavioral variation (animal individuality/personality within a lifetime) over shorter timeframes, which the authors associate with an underlying bet-hedging strategy. The third timeframe of behavioral variability that the authors discuss occurs within seasons (across several generations of some insects), termed "adaptive tracking." This division of "adaptive" behavioral variability over different timeframes is intuitively logical and adds valuable depth to the theoretical framework concerning the ecological role of individual behavioral differences in animals.

      Strengths:

      While the theoretical foundations of the study are strong, the connection between the experimental data (Figure 1) and the modeling work (Figure 2-4) is less convincing.

      Weaknesses:

      In the experimental data (Figure 1), the authors describe the changes in behavioral preferences over time. While generally plausible, I identify three significant issues with the experiments:

      (1) All of the subsequent theoretical/simulation data is based on changing environments, yet all the experiments are conducted in unchanging environments. While this may suffice to demonstrate the phenomenon of behavioral instability (drift) over time, it does not properly link to the theory-driven work in changing environments. An experiment conducted in a changing environment and its effects on behavioral drift would improve the manuscript's internal consistency and clarify some points related to (3) below.

      In our framework, we posit that the amount of drift has been shaped by evolution to maximize fitness in the environments that the population has experienced, and this drift is observed independent of environment. While we agree that exploring the role of changing environments on the measure of drift would be interesting, we would anticipate the effects may be nuanced and beyond the scope of the current paper (and the scope of our theoretical work, which assumes that the individual phenotype is unaffected by change of environment except as mediated by death due to fitness effects). For example, it would be difficult to differentiate drift from idiosyncratic differences in learning (Smith et al., 2022), and non-adaptive plasticity to unrelated cues has been posited as a method of producing diverse phenotypes (Maxwell and Magwene, 2017), so “learning” to uncorrelated stimuli could conceivably be a mechanism for drift. Given the scope of the current study, we prioritized eliminating potential confounds for measuring drift, but remain interested in the interaction between learning and drift.

      (2) The temporal aspect of behavioral instability. While the analysis demonstrates behavioral instability, the temporal dynamics remain unclear. It would be helpful for the authors to clarify (based on graphs and text) whether the behavioral changes occur randomly over time or follow a pattern (e.g., initially more right turns, then more left turns). A proper temporal analysis and clearer explanations are currently missing from the manuscript.

      We agree it would be helpful to describe the dynamics over time in more detail, beyond the power spectrum and autoregressive model fits. We intend to provide a fuller description of these changes over time in a revision.

      (3) The temporal dimension leads directly into the third issue: distinguishing between drift and learning (e.g., line 56). In the neutral stimuli used in the experimental data, changes should either occur randomly (drift) or purposefully, as in a neutral environment, previous strategies do not yield a favorable outcome. For instance, the animal might initially employ strategy A, but if no improvement in the food situation occurs, it later adopts strategy B (learning). In changing environments, this distinction between drift and learning should be even more pronounced (e.g., if bananas are available, I prefer bananas; once they are gone, I either change my preference or face negative consequences). Alternatively, is my random choice of grapes the substrate for the learning process towards grapes in a changing environment? Further clarification is needed to resolve these potential conflicts.

      As in our response to point 1, we believe this is a crucial distinction, and we intend to further highlight it in the discussion in the revision and further expand our discussion of how the two strategies may interact.

      Reviewer #2 (Public review):

      Summary:

      This is an inspired study that merges the concept of individuality with evolutionary processes to uncover a new strategy that diversifies individual behavior that is also potentially evolutionarily adaptive.

      The authors use a time-resolved measurement of spontaneous, innate behavior, namely handedness or turn bias in individual, isogenic flies, across several genetic backgrounds.

      They find that an individual's behavior changes over time, or drifts. This has been observed before, but what is interesting here is that by looking at multiple genotypes, the authors find the amount of drift is consistent within genotype i.e., genetically regulated, and thus not entirely stochastic. This is not in line with what is known about innate, spontaneous behaviors. Normally, fluctuations in behavior would be ascribed to a response to environmental noise. However, here, the authors go on to find what is the pattern or rule that determines the rate of change of the behavior over time within individuals. Using modeling of behavior and environment in the context of evolutionarily important timeframes such as lifespan or reproductive age, they could show when drift is favored over bet-hedging and that there is an evolutionary purpose to behavioral drift. Namely, drift diversifies behaviors across individuals of the same genotype within the timescale of lifespan, so that the genotype's chance for expressing beneficial behavior is optimally matched with potential variation of environment experienced prior to reproduction. This ultimately increases the fitness of the genotype. Because they find that behavioral drift is genetically variable, they argue it can also evolve.

      Strengths:

      Unlike most studies of individuality, in this study, the authors consider the impact of individuality on evolution. This is enabled by the use of multiple natural genetic backgrounds and an appropriately large number of individuals to come to the conclusions presented in the study. I thought it was really creative to study how individual behavior evolves over multiple timescales. And indeed this approach yielded interesting and important insight into individuality. Unlike most studies so far, this one highlights that behavioral individuality is not a static property of an individual, but it dynamically changes. Also, placing these findings in the evolutionary context was beneficial. The conclusion that individual drift and bet-hedging are differently favored over different timescales is, I think, a significant and exciting finding.

      Overall, I think this study highlights how little we know about the fundamental, general concepts behind individuality and why behavioral individuality is an important trait. They also show that with simple but elegant behavioral experiments and appropriate modeling, we could uncover fundamental rules underlying the emergence of individual behavior. These rules may not at all be apparent using classical approaches to studying individuality, using individual variation within a single genotype or within a single timeframe.

      Weaknesses:

      I am unconvinced by the claim that serotonin neuron circuits regulate behavioral drift, especially because of its bidirectional effect and lack of relative results for other neuromodulators. Without testing other neuromodulators, it will remain unclear if serotonin intervention increases behavioral noise within individuals, or if any other pharmacological or genetic intervention would do the same. Another issue is that the amount of drugs that the individuals ingested was not tracked. Variable amounts can result in variable changes in behavior that are more consistent with the interpretation of environmental plasticity, rather than behavioral drift. With the current evidence presented, individual behavior may change upon serotonin perturbation, but this does not necessarily mean that it changes or regulates drift.

      However, I think for the scope of this study, finding out whether serotonin regulates drift or not is less important. I understand that today there is a strong push to find molecular and circuit mechanisms of any behavior, and other peers may have asked for such experiments, perhaps even simply out of habit. Fortunately, the main conclusions derived from behavioral data across multiple genetic backgrounds and the modeling are anyway novel, interesting, and in fact more fundamental than showing if it is serotonin that does it or not.

      We agree that our data do not support a strong conclusion that serotonin plays a privileged role in regulating drift. Based on previous literature (e.g. Kain et al., 2014, where identical pharmacological manipulations had an effect on variability while dopaminergic and octopaminergic manipulations did not), we think that the large global perturbations in serotonin that we observe are likely to influence plasticity that might be involved in drift (and thus we find the results we observe not particularly surprising). Nonetheless, we agree that the mechanism by which serotonin may affect drift could be indirect, and it is similarly plausible that many global perturbations could lead to some shift in the amount of drift. We intend to further discuss these issues in the revision.

      To this point, one thing that was unclear from the methods section is whether genotypes that were tested were raised in replicate vials and how was replication accounted for in the analyses. This is a crucial point - the conclusion that genotypes have different amounts of behavioral drift cannot be drawn without showing that the difference in behavioral drift does not stem from differences in developmental environment.

      While a cursory inspection suggests that batch effects between different replicates were small, we intend to clarify this and more explicitly address the effects of replicates in revision.

      Reviewer #3 (Public review):

      Summary:

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that while flies exhibit an individual turning bias (when averaged over time), their preferences fluctuate over slow timescales.

      To understand whether genetic or neuromodulatory mechanisms influence the drift in individual preference, the authors test different fly strains concluding that both genetic background and the neuromodulator serotonin contribute to the degree of drift.

      Finally, the authors use theoretical approaches to identify the range of environmental conditions under which drift in individual bias supports population growth.

      Strengths:

      The model provides a clear prediction of the environmental fluctuations under which a drift in bias should be beneficial for population growth.

      The approach attempts to identify genetic and neurophysiological mechanisms underlying drift in bias.

      Weaknesses:

      Different behavioral assays are used and are differently analysed, with little discussion on how these behaviors and analyses compare to each other.

      We intend to address this in a revision of the discussion.

      Some of the model assumptions should be made more explicit to better understand which aspects of the behaviors are covered.

      We will further clarify the assumptions of the model in revision.

  2. Nov 2024
    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Urination requires precise coordination between the bladder and external urethral sphincter (EUS), while the neural substrates controlling this coordination remain poorly understood. In this study, Li et al. identify estrogen receptor 1-expressing neurons (ESR1+) in Barrington's nucleus as key regulators that faithfully initiate or suspend urination. Results from peripheral nerve lesions suggest that BarEsr1 neurons play independent roles in controlling bladder contraction and relaxation of the EUS. Finally, the authors performed region-specific retrograde tracing, claiming that distinct populations of BarEsr1 neurons target specific spinal nuclei involved in regulating the bladder and EUS, respectively.

      Strength:

      Overall, the work is of high quality. The authors integrate several cutting-edge technologies and sophisticated, thorough analyses, including opto-tagged single unit recordings, combined optogenetics, and urodynamics, particularly those following distinct peripheral nerve lesions.

      Weakness:

      (1) My major concern is the novelty of this study. Keller et al. 2018 have shown that BarEsr1 neurons are active during urination and play an essential role in relaxing the external urethral sphincter (EUS). Minimally, substantial content that merely confirms previous findings (e.g. Figures 1A-E; Figures 3A-E) should be moved to the supplementary datasets.

      Indeed, we are aware of and have carefully studied the work of Keller et al. Our manuscript presents novel experiments beyond the scope of that paper. In response to this comment, we will substantially revise our manuscript to enhance the visibility of the novel data while moving the data that agree with previous findings to the supplementary material.

      (2) I also have concerns regarding the results showing that the inactivation of BarEsr1 neurons led to the cessation of EUS muscle firing (Figures 2G and S5C). As shown in the cartoon illustration of Figure 8, spinal projections of BarEsr1 neurons contact interneurons (presumably inhibitory) that innervate motor neurons, which in turn excite the EUS. I would therefore expect that the inactivation of BarEsr1 should shift the EUS firing pattern from phasic (as relaxation) to tonic (removal of relaxation), rather than stopping their firing entirely. Could the authors comment on this and provide potential reasons or mechanisms for this finding?

      We agree with this point. We meant that the phasic bursting pattern of the EUS stopped rapidly upon BarEsr1 photoinhibition, not that all firing stopped instantaneously. According to previous studies (Chang et al., 2007, de Groat, 2009, de Groat and Yoshimura, 2015, Kadekawa et al., 2016), the voiding physiology of rodents probably differs from that of humans: in rodents, urine is pumped out stepwise in the gaps between multiple consecutive EUS phasic bursting epochs, whereas in humans urine is expelled continuously once EUS firing is almost fully inhibited for a period of time. Namely, in mice the EUS displays sustained tonic activity following phasic bursting, while in humans the EUS keeps firing tonically until the moment of voiding onset (complete inhibition, muscle relaxed). Despite these prominent differences in basic physiological properties, our assumption is that the logic of the circuits from the brainstem to the urethra in this pathway is evolutionarily conserved in both species; thus the logic of brainstem coordination of voiding could also be the same in both species, which is the main interest of our study (using an animal model to address concerns of human health). Thus, to make our data accessible to a broader audience, we used a simplified and inaccurate description. We apologize for the inaccuracy and will correct it in the revised manuscript.

      (3) Current evidence is insufficient to support the claim that the majority of BarEsr1 neurons innervate the SPN but not DGC. The current spinal images are uninformative, as the fluorescence reflects the distribution of Esr1- or Crh-expressing neurons in the spinal cord, along with descending BarEsr1 or BarCrh axons. Given the close anatomical proximity of these two nuclei, a more thorough histological analysis is required to demonstrate that the spinal injections were accurately confined to either the SPN or the DGC.

      We agree that current evidence is insufficient to support the current claim. To address this concern and strengthen our claim, we will repeat the retrograde viral tracing experiments, combined with CTB647 injections to label the injection site, to validate specific targeting of SPN or DGC populations. We will also add higher-magnification imaging to distinguish BarESR1 axonal projections targeting SPN versus DGC. Results from these ongoing experiments will be incorporated into the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors have performed a rigorous study to assess the role of ESR1+ neurons in the PMC to control the coordination of bladder and sphincter muscles during urination. This is an important extension of previous work defining the role of these brainstem neurons, and convincingly adds to the understanding of their role as master regulators of urination. This is a thorough, well-done study that clarifies how the Pontine micturition center coordinates different muscle groups for efficient urination, but there are some questions and considerations that remain.

      Strengths:

      These data are thorough and convincing in showing that ESR1+PMC neurons exert coordinated control over both the bladder and sphincter activity, which is essential for efficient urination. The anatomical distinctions in pelvic versus pudendal control are clear, and it's an advance to understand how this coordination occurs. This work offers a clearer picture of how micturition is driven.

      Weaknesses:

      The dynamics of how this population of ESR1+ neurons is engaged in natural urination events remain unclear. Not all ESR1+ neurons are always engaged, and it is not measured whether this is simply variation in population activity, or if more neurons are engaged during more intense starting bladder pressures, for instance. In particular, the response dynamics of single and doubly-projecting neurons are not defined. Additionally, the model for how these neurons coordinate with CRH+ neuron activity in the PMC is not addressed, although these cell types seem to be engaged at the same time. Lastly, it would be interesting to know how sensory input can likely modulate the activity of these neurons, but this is perhaps a future direction.

      In response to the reviewer’s comments, we will attempt to perform the following revisions for this round:

      (1) Engagement of ESR1+ neurons in natural urination events:

      We agree that probably not all ESR1+ neurons are consistently engaged during urination. To address this, we will perform a detailed analysis of the opto-tagged single-unit recording data.

      (2) Response dynamics of single- and doubly-projecting neurons:

      (a) We will use retrograde labelling combined with Ca2+ photometry recordings to differentiate the response dynamics of SPN- and DGC-projecting neurons during urination.

      (b) We will perform functional validations to assess the specific roles of single- and doubly-projecting neurons in coordinating bladder and EUS activity.

      (3) Coordination with CRH+ neurons in the PMC:

      We appreciate the suggestion to include CRH+ neurons in our model. We will expand our model to incorporate CRH+ neurons and their potential interactions with ESR1+ neurons.

      (4) Sensory modulation of ESR1+ neurons:

      The reviewer raises an excellent point regarding sensory input modulation of ESR1+ neuron activity. Although this is beyond the scope of our current study, we recognize its importance and propose to include this as a future direction.

      Reviewer #3 (Public review):

      Summary:

      The paper by Li et al explored the role of Estrogen receptor 1 (Esr1) expressing neurons in the pontine micturition center (PMC), a brainstem region also known as Barrington's nucleus (Hou et al 2016, Keller et al 2018). First, the authors conducted bulk Ca2+ imaging/unit recording from PMCESR1 to investigate the correlations of PMCESR1 neural activity to voiding behavior in conscious mice and bladder pressure/external urethral muscle activity in urethane anesthetized mice. Next, the authors conducted optogenetic inactivation/activation of PMCESR1 to confirm the contribution to the voiding behavior, and also conducted peripheral nerve transection together with optogenetic activation to confirm the independent control of bladder pressure and urethral sphincter muscle.

      Weaknesses:

      (1) The study demonstrates that pelvic nerve transection reduces urinary volume triggered by PMCESR1+ cell photoactivation in freely moving mice. Could the role of pudendal nerve transection also be examined in awake mice to provide a more comprehensive understanding of neural involvement?

      Thank you for the suggestion. Pudendal nerve transection in awake mice is indeed a challenging experiment that we have missed. We will attempt it for the revision.

      (2) While the paper primarily focuses on PMCESR1+ cells in bladder-sphincter coordination, the analysis of PMCESR1+-DGC/SPN neural circuits - given their distinct anatomical projections in the sacral spinal cord - feels underexplored. How do these circuits influence bladder and sphincter function when activated or inhibited? Also, do you have any tracing data to confirm whether bladder-sphincter innervation comes from distinct spinal nuclei?

      Thank you for this great comment. The projection-specific analysis of neuronal function, as also suggested by Reviewer 2 in a similar comment (#8), is missing from our first submission. These experiments are so challenging that we missed them in the first round of tests, but we have decided to pursue this goal again. Namely, we will perform photometry recordings of PMC neurons projecting to the DGC/SPN while measuring bladder pressure and urethral sphincter EMG activity. Additionally, while our study does not include direct tracing data to confirm distinct spinal nuclei for bladder and sphincter innervation, this has been well documented in the classic literature (Yao et al., 2018, Karnup and De Groat, 2020, Karnup, 2021). Specifically, anatomical studies have shown that the SPN primarily innervates the bladder, while the DGC is associated with the innervation of the urethral sphincter. We will cite these references to provide context and support for our interpretations.

      (3) Although the paper successfully identifies the physiological role of PMCESR1+ cells in bladder-sphincter coordination, the study falls short in examining the electrophysiological properties of PMCESR1+-DGC/SPN cells. A deeper investigation here would strengthen the findings.

      While our study primarily focuses on the functional role of PMCESR1+ neurons in bladder-sphincter coordination, we acknowledge that understanding their intrinsic electrophysiological characteristics could further strengthen our findings. However, this aspect falls beyond the scope of the current study. Nevertheless, we recognize the significance of this direction and are excited to pursue it in future research. We appreciate the reviewer’s suggestion, as it highlights an important avenue for expanding upon our current findings.

      (4) The parameters for photoactivation (blue light pulses delivered at 25 Hz for 15 ms, every 30 s) and photoinhibition (pulses at 50 Hz for 20 ms) vary. What drove the selection of these specific parameters? Moreover, for photoactivation experiments, the change in pressure (ΔP = P5 sec - P0 sec) is calculated differently from photoinhibition (Δpressure = Ppeak - Pmin). Can you clarify the reasoning behind these differing approaches?

      We sincerely thank the reviewer for raising these important points and for the opportunity to clarify our experimental design and data analysis methods.

      Photoactivation versus photoinhibition parameters: The differences in photoactivation (25 Hz, 15 ms pulses) and photoinhibition (50 Hz, 20 ms pulses) protocols are based on the distinct physiological and technical requirements for activating versus inhibiting PMCESR1+ neurons. For photoactivation, 25 Hz stimulation aligns with the natural firing patterns of central neurons, allowing for intermittent activation without exceeding the neuronal refractory period. The shorter pulse duration (15 ms) minimizes phototoxicity and avoids overstimulation, as performed in previous studies (Keller et al., 2018). In contrast, photoinhibition requires sustained suppression of neuronal activity, achieved through higher frequencies (50 Hz) and longer pulses (20 ms) to ensure continuous coverage of neuronal activity.
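
      As a rough illustration of why the second protocol gives continuous coverage, the duty cycle of each pulse train can be computed from the stated frequency and pulse width. This is only a sketch: it assumes pulses are delivered at regular intervals, and the function name is hypothetical rather than taken from the study.

```python
def duty_cycle(frequency_hz: float, pulse_width_ms: float) -> float:
    """Fraction of each stimulation period during which the light is on."""
    period_ms = 1000.0 / frequency_hz
    # Cap at 1.0: a pulse at least as long as the period means continuous light.
    return min(pulse_width_ms / period_ms, 1.0)


# Parameters as stated in the response above.
activation = duty_cycle(25, 15)   # 40 ms period, light on for 15 ms
inhibition = duty_cycle(50, 20)   # 20 ms period, light on for 20 ms

print(f"photoactivation duty cycle: {activation:.3f}")
print(f"photoinhibition duty cycle: {inhibition:.3f}")
```

      Under this assumption, the 25 Hz protocol leaves the light off for most of each period, whereas 20 ms pulses at 50 Hz fill the entire period.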

      Calculation of pressure changes (ΔP) for photoactivation and photoinhibition: The differing methods for calculating pressure changes reflect the distinct physiological effects we aimed to capture. In photoactivation experiments (ΔP = P5 sec - P0 sec), the pressures before (P0 sec) and 5 seconds after (P5 sec) light delivery were compared to capture the immediate effect of light activation on bladder pressure, focusing on the onset and early dynamics of activation. In contrast, photoinhibition experiments assessed the immediate impact of light-induced suppression on bladder pressure during an ongoing voiding event. Here, Δpressure was calculated as Ppeak – Pmin to measure the rapid drop in pressure directly attributable to neuronal inhibition.
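
      The two definitions can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names, the nearest-sample lookup, the choice to take the minimum only after the peak, and the example traces are all assumptions made for the sketch.

```python
def delta_p_activation(times_s, pressures, light_on_s, window_s=5.0):
    """Delta P = P(5 s) - P(0 s): pressure 5 s after light onset minus pressure at onset."""
    def pressure_at(t):
        # Nearest-sample lookup (an assumption; interpolation would also work).
        idx = min(range(len(times_s)), key=lambda i: abs(times_s[i] - t))
        return pressures[idx]
    return pressure_at(light_on_s + window_s) - pressure_at(light_on_s)


def delta_p_inhibition(pressures_during_event):
    """Delta pressure = P_peak - P_min, with the minimum taken after the peak."""
    peak_idx = max(range(len(pressures_during_event)),
                   key=lambda i: pressures_during_event[i])
    p_peak = pressures_during_event[peak_idx]
    p_min = min(pressures_during_event[peak_idx:])
    return p_peak - p_min


# Hypothetical traces (arbitrary pressure units), for illustration only.
times = [0, 1, 2, 3, 4, 5, 6, 7]
rising = [5, 5, 6, 9, 14, 20, 22, 21]
falling = [30, 28, 20, 12, 10, 11]

activation_dp = delta_p_activation(times, rising, light_on_s=0)
inhibition_dp = delta_p_inhibition(falling)
print(activation_dp, inhibition_dp)
```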

      We will expand these details in the methods section of the revised manuscript to provide greater transparency.

      (5) The discussion could further emphasize how PMCESR1+ cells coordinate bladder contraction and sphincter relaxation to control urination, highlighting their central role in the initiation and suspension of this process.

      We fully agree with this point. Additionally, in response to your and other reviewers’ suggestions, we are preparing a new round of experiments with projection-specific recording, and thus our discussion and conclusion will also be updated according to the newly obtained data.

      (6) In Figure 8, the authors analyze the temporal sequence of bladder pressure and EUS bursting during natural voiding and PMC activation-induced voiding. It would be acceptable to consider the existence of a lower spinal reflex circuit; however, the interpretation of the data contains speculation. It is hard to say that bladder pressure measurement reflects efferent pelvic nerve activity in real time. (As a biological system, bladder contraction is mediated by smooth muscle, and does not reflect real-time efferent pelvic nerve activity. As an experimental set-up, bladder pressure measurement has some delay in reflecting bladder pressure because of tubing, but EUS bursting has no delay.) Especially for the inactivation experiment, these factors would contribute to the interpretation of data. This reviewer recommends a rewrite of the section considering these limitations. Most of the section is suitable for the results.

      Thank you for mentioning the possibility of a delay in the bladder pressure measurement. We would prefer to perform a physical control test to quantify how large this delay is under our experimental conditions. We will use a small balloon to mimic the bladder and two identical pressure sensors, one with a very short tube inserted into the balloon and one with an extended tube identical to that used in our animal experiments. We will then mimic both the initiation and the halting of contraction, and quantify the delay between the two sensors.
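
      One simple way such a delay could be quantified is to compare the time at which each sensor trace first crosses a fixed fraction of its own rise. The sketch below is illustrative only: the threshold-crossing method, the 10% fraction, the function name, and the synthetic traces are assumptions, not the planned analysis.

```python
def onset_time(times_ms, trace, fraction=0.1):
    """Time at which the trace first reaches baseline + fraction * (peak - baseline)."""
    baseline = trace[0]
    peak = max(trace)
    if peak <= baseline:
        return None  # no rise detected
    threshold = baseline + fraction * (peak - baseline)
    for t, value in zip(times_ms, trace):
        if value >= threshold:
            return t
    return None


# Synthetic example: 10 ms sampling; the long-tube trace lags by 5 samples.
times_ms = list(range(0, 1000, 10))
short_tube = [0.0 if i < 20 else min(1.0, (i - 20) * 0.1) for i in range(100)]
long_tube = [0.0 if i < 25 else min(1.0, (i - 25) * 0.1) for i in range(100)]

delay_ms = onset_time(times_ms, long_tube) - onset_time(times_ms, short_tube)
print(f"estimated tubing delay: {delay_ms} ms")
```

      The same comparison could be applied to the falling phase to estimate the delay when contraction is halted.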

      References

      • Chang HY, Cheng CL, Chen JJJ, de Groat WC. 2007. Serotonergic drugs and spinal cord transections indicate that different spinal circuits are involved in external urethral sphincter activity in rats. American Journal of Physiology-Renal Physiology 292: F1044-F1053. DOI: 10.1152/ajprenal.00175.2006

      • de Groat WC. 2009. Integrative control of the lower urinary tract: preclinical perspective. British Journal of Pharmacology 147. DOI: 10.1038/sj.bjp.0706604

      • de Groat WC, Yoshimura N. 2015. Anatomy and physiology of the lower urinary tract. Handb Clin Neurol 130: 61-108. DOI: 10.1016/B978-0-444-63247-0.00005-5

      • Kadekawa K, Yoshimura N, Majima T, Wada N, Shimizu T, Birder LA, Kanai AJ, de Groat WC, Sugaya K, Yoshiyama M. 2016. Characterization of bladder and external urethral activity in mice with or without spinal cord injury—a comparison study with rats. American Journal of Physiology-Regulatory, Integrative and Comparative Physiology 310: R752-R758. DOI: 10.1152/ajpregu.00450.2015

      • Karnup S. 2021. Spinal interneurons of the lower urinary tract circuits. Autonomic Neuroscience 235. DOI: 10.1016/j.autneu.2021.102861

      • Karnup SV, De Groat WC. 2020. Mapping of spinal interneurons involved in regulation of the lower urinary tract in juvenile male rats. IBRO Rep 9: 115-131. DOI: 10.1016/j.ibror.2020.07.002

      • Keller JA, Chen J, Simpson S, Wang EH-J, Lilascharoen V, George O, Lim BK, Stowers L. 2018. Voluntary urination control by brainstem neurons that relax the urethral sphincter. Nature Neuroscience 21: 1229-1238. DOI: 10.1038/s41593-018-0204-3             

      • Yao J, Zhang Q, Liao X, Li Q, Liang S, Li X, Zhang Y, Li X, Wang H, Qin H, Wang M, Li J, Zhang J, He W, Zhang W, Li T, Xu F, Gong H, Jia H, Xu X, Yan J, Chen X. 2018. A corticopontine circuit for initiation of urination. Nature Neuroscience 21: 1541-1550. DOI: 10.1038/s41593-018-0256-4

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study aimed to better understand the role of the H3 protein of the Monkeypox virus (MPXV) in host cell adhesion, identifying a crucial α-helical domain for interaction with heparan sulfate (HS). Using a combination of advanced computational simulations and experimental validations, the authors discovered that this domain is essential for viral adhesion and potentially a new target for developing antiviral therapies.

      Strengths:

      The study's main strengths include the use of cutting-edge computational tools such as AlphaFold2 and molecular dynamics simulations, combined with robust experimental techniques like single-molecule force spectroscopy and flow cytometry. These methods provided a detailed and reliable view of the interactions between the H3 protein and HS. The study also highlighted the importance of the α-helical domain's electric charge and the influence of the Mg(II) ion in stabilizing this interaction. The work's impact on the field is significant, offering new perspectives for developing antiviral treatments for MPXV and potentially other viruses with similar adhesion mechanisms. The provided methods and data are highly useful for researchers working with viral proteins and protein-polysaccharide interactions, offering a solid foundation for future investigations and therapeutic innovations.

      Weaknesses:

      However, some limitations are notable. Despite the robust use of computational methodologies, the limitations of this approach are not discussed, such as potential sources of error, standard deviation rates, and known controls for the H3 protein to justify the claims. Additionally, validations with methodologies like X-ray crystallography would further benefit the visualization of the H3 and HS interaction.

      Thank you very much for the evaluation and appreciation of our work. In response to the identified weakness, we have conducted additional analyses to further assess the limitations of the computational methodologies used. Specifically, we predicted the MPXV H3 structure using two other AI-based protein structure prediction models, ESMFold and RoseTTAFold2. Both models also predicted an α-helical structure, which supports our conclusion. However, they yielded lower pLDDT scores (Figure S1A-C in the revised SI), indicating that some error may be present.

      We agree with this reviewer, as well as the other reviewers, that X-ray crystallography data for the H3 structure would be highly valuable. Unfortunately, we lack the expertise in structural biology to obtain these results at this stage. To complement this, we performed molecular dynamics (MD) simulations, which suggest that the helical domain is connected to the main domain via a flexible linker. This flexibility may help explain the challenges in obtaining a high-resolution X-ray structure. In fact, to date, the only structural data available for H3 are from vaccinia virus (VACV), and that structure excludes the helical domain (the helical domain was cleaved off for the X-ray studies). We have added this point to the discussion and hope that experts in structural biology will be able to resolve the structure of this domain in the future.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript presenting the discovery of a heparan-sulfate (HS) binding domain in monkeypox virus (MPXV) H3 protein as a new anti-poxviral drug target, presented by Bin Zhen and co-workers, is of interest, given that it offers a potentially broad antiviral substance to be used against poxviruses. Using new computational biology techniques, the authors identified a new alpha-helical domain in the H3 protein, which interacts with cell surface HS, and this domain seems to be crucial for H3-HS interaction. Given that this domain is conserved across orthopoxviruses, authors designed protein inhibitors. One of these inhibitors, AI-PoxBlock723, effectively disrupted the H3-HS interaction and inhibited infection with Monkeypox virus and Vaccinia virus. The presented data should be of interest to a diverse audience, given the possibility of an effective anti-poxviral drug.

      Strengths:

      In my opinion, the experiments done in this work were well-planned and executed. The authors put together several computational methods, to design poxvirus inhibitor molecules, and then they test these molecules for infection inhibition.

      Weaknesses:

      One thing that could be improved, is the presentation of results, to make them more easily understandable to readers, who may not be experts in protein modeling programs. For example, figures should be self-explanatory and understood on their own, without the need to revise text. Therefore, the figure legend should be more informative as to how the experiments were done.

      Thank you very much for your appreciation of our work and your support. In response to the identified weakness, we have carefully reviewed all the figure legends to ensure they are more informative.

      Reviewer #3 (Public Review):

      Summary:

      The article is an interesting approach to determining the MPOX receptor using "in silico" tools. The results show the presence of two regions of the H3 protein with a high probability of being involved in the interaction with the HS cell receptor. However, the α-helical region seems to be the most probable, since modifications in this region affect the virus binding to the HS receptor.

      Strengths:

      In my opinion, it is an informative article with interesting results, generated by a combination of "in silico" and wet science to test the theoretical results. This is a strong point of the article.

      Weaknesses:

      Has a crystal structure of the H3 protein been reported?

      The following text is in line 104: "which may represent a novel binding site for HS". It is unclear whether this means this "new binding site" is an alternative site to an old one or whether it is the true binding site that had not been previously elucidated.

      Thank you very much for your thoughtful evaluation and appreciation of our work.

      We agree with this reviewer, as well as the other reviewers, that X-ray crystallography data for the H3 structure would be highly valuable. Unfortunately, we are not experts in structural biology, and we have not yet been able to obtain these structural results. To date, the only structure available for H3 is the one from vaccinia virus (VACV), which does not include the helical domain. We have included this point in the discussion and hope that experts in structural biology will be able to resolve the structure of this domain in the future.

      Regarding the "novel binding site," this term refers to "the true binding site that had not been previously elucidated." Previous research identified that H3 binds to heparan sulfate (HS), but the exact binding site had not been determined.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Validation of Results with Other Experimental Methods: While single-molecule force spectroscopy and flow cytometry provide valuable data, including complementary methods such as X-ray crystallography could offer additional insights into the H3-HS interaction and the effectiveness of the inhibitors.

      Discussion of Computational Model Limitations: Although the use of AlphaFold2 and other advanced tools is a strength, it is important to discuss the limitations of these models in more detail, including potential sources of error and how they may impact the interpretation of the results.

      During the manuscript evaluation, the protein's localization was not clear (transmembrane?), since the protein's end is very close to the virus membrane surface. All experiments examined the protein without it being anchored to the membrane, so the interaction site was always exposed. If the protein is linked to the membrane, how would the site be exposed, given the limited space between it and the virus structure?

      Thank you for these insightful comments. As you pointed out, the H3 protein, particularly the helical domain at the C-terminal, is indeed located close to the membrane, which could limit the available space for H3 binding. To investigate this further, we modeled the full-length H3 protein in the context of the membrane and performed molecular dynamics (MD) simulations to assess the available space. Our results show that there is more than 1 nm of space between the helical domain and the membrane, which should be sufficient for potential heparan sulfate (HS) binding (see Figure 1E, and Figure S1D&E in the revised manuscript).

      Minor corrections:

      Line 31: "is an emerging zoonotic pathogen" should be revised to reflect that Mpox is a re-emerging virus, given its history of causing outbreaks, such as in 2003.

      Line 71 and Line 75: Adding an explanation of "Mg binding sites" and "GAG motifs" would enhance reader understanding, as these represent important points in the study. The current positioning of Figure 1 causes some confusion for the reader.

      Line 111: High score? What controls were used for the protein? Are there known inhibitors of H3? If so, why weren't they tested for structure comparison? Additionally, what about other molecules that H3 binds to, such as UDP-Glucose, as demonstrated in the base article for the Vaccinia virus H3 protein available in the PDB?

      Figure 2B: Improve the legend, as the colors of the lines are not clear.

      Thank you for your instructive comments. We have addressed most of them in the revised manuscript.

      Regarding the "high score," AlphaFold2 provides a confidence score for its protein structure predictions, with a maximum score of 100. A score above 80 indicates a high level of confidence in the prediction.
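
      As a small illustration of how this criterion could be applied, the sketch below averages per-residue pLDDT values (each between 0 and 100) and compares the mean against the threshold of 80 mentioned above. The function names and the example scores are hypothetical; they are not values from the manuscript.

```python
def mean_plddt(per_residue_plddt):
    """Average the per-residue pLDDT scores (each in the range 0-100)."""
    return sum(per_residue_plddt) / len(per_residue_plddt)


def is_high_confidence(per_residue_plddt, threshold=80.0):
    """Apply the threshold quoted in the response (mean score above 80)."""
    return mean_plddt(per_residue_plddt) > threshold


# Hypothetical per-residue scores for two regions, for illustration only.
region_a = [92.1, 88.4, 85.0, 79.5, 90.3]
region_b = [65.0, 72.5, 58.0]

print(round(mean_plddt(region_a), 2), is_high_confidence(region_a))
print(round(mean_plddt(region_b), 2), is_high_confidence(region_b))
```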

      There are known inhibitors (such as antibodies) of H3, and while the sequence is available, no structure has been reported so far. Previous NMR titration measurements have shown that UDP-glucose binds to H3, but no structural data for the complex exist. To date, the only available crystal structure is of a truncated H3 from vaccinia virus (VACV), which does not include the helical domain we identified.

      Reviewer #2 (Recommendations For The Authors):

      The text described in the result section does not match the text presented in Figures. So, it is not easy to see what are the authors referring to when they mention the Figure. For example, the text referring to Figure S8 mentions the GB1 domain and the Cohesin module, but these are not mentioned in Figure S8.

      I do not understand the results presented in Figure 5B. It is not clear to me, from the Figure legend nor after reading the Material and Methods, how this experiment was done. Specifically, what is plotted on X, is it the amount of inhibitor or the amount of protein? These things have to be checked through the manuscript.

      It would be interesting to confirm if the inhibition of infection is based on the inhibition of viral binding to the cells. This should not be complicated to realize, and it could provide evidence for the mechanism of action.

      Extensive use of terms like "this domain" is not good in this type of article, like in lines 207, and 211. It is not always clear to what domain are authors referring to, so it may be much better to mention the domain in question by the exact name.

      Line 337, If I am not mistaken dilutions are serial not series.

      Line 613, in methods. Please use g force instead of rpm, it is more informative. Even if it is just to pellet cells.

      Thank you very much for your instructive comments. We have addressed most of them in the revised manuscript. For instance, the immobilization of the GB1 domain and the cohesin module is now mentioned in Figure S9. Additionally, in the previous Figure 5B, the x-axis represents the concentration of the inhibitor. The wording has been corrected to "serial" dilutions, and centrifugation speeds are now given as g force.

      Reviewer #3 (Recommendations For The Authors):

      Line 190

      Did you mutate all the amino acids at the same time? What was the impact of all these mutations on the structure of the helical region? Or if you modeled the protein again after replacing these 7 amino acids, did you find that there was no difference? Regardless of your answer, you must include a superposition of the mutated structure and the wt.

      Thank you for the insightful comment. We have now also predicted the structure of the serine mutant using AlphaFold2 (AF2). As expected, the helical domain structure remains largely preserved with only minor differences. We have included these results in Figure S6, as suggested.

      Figure 2D

      In this graph, the authors should indicate the ΔG as a negative value. In fact, the graph does not match the text.

      Thanks for the reminder. This has been corrected in the graph.

      Figure 4B

      Is the difference in binding force significantly different? 28.8 vs 33.7 pN

      The absolute difference in binding force is not large (~5 pN). However, for a system with a relatively low binding force, this difference is significant. Specifically, the 5 pN difference accounts for approximately a 14% reduction in binding force. We have included this percentage in the revised manuscript.
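
      The percentage can be checked directly from the two reported forces. The short sketch below takes the larger value as the reference, which is an assumption about how the percentage was computed; the function name is illustrative.

```python
def percent_reduction(reference_pn: float, reduced_pn: float) -> float:
    """Relative drop from the reference force, expressed as a percentage."""
    return (reference_pn - reduced_pn) / reference_pn * 100.0


# Forces as reported for Figure 4B, in piconewtons.
reduction = percent_reduction(33.7, 28.8)
print(f"{reduction:.1f}% reduction")
```

      With these inputs the reduction comes out between 14% and 15%, consistent with the approximate figure quoted above.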

      Figure 5

      If AI-PoxBlocks723 was the only peptide effective in inhibiting viral infection of MPOX and other related viruses but not with 100% effectiveness, do you think this could be a consequence of a low interaction efficiency or the existence of a different receptor? Or a secondary region of binding in the H3? Can you argue about this?

      It has been proposed that there are other adhesion proteins for MPXV, such as D8, in addition to H3. We believe this accounts for the observed less-than-100% effectiveness.

      The use of peptides as "inhibitory tools" could have an interesting effect in vitro, however, in vivo the immunological response against the peptide will reduce/eliminate it, how you may optimize the "drug" development with this system, as you state in line 387.

      Thank you for your thoughtful comment. You are correct that the use of peptides as inhibitory tools could induce an immune response in vivo, which might limit their effectiveness over time. To optimize this approach for drug development, the peptides could be conjugated with carrier molecules, such as liposomes, nanoparticles, or dendrimers, which can protect the peptides from immune detection and improve their delivery to target cells. This could allow for more controlled and sustained release of the peptide in vivo, reducing the chances of immune clearance. We have added this discussion in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      This study of mixed glutamate/GABA transmission from axons of the supramammillary nucleus to dentate gyrus seeks to sort out whether the two transmitters are released from the same or different synaptic vesicles. This conundrum has been examined in other dual-transmission cases and even in this particular pathway, there are different views. The authors use a variety of electrophysiological and immunohistochemical methods to reach the surprising (to me) conclusion that glutamate and GABA-filled vesicles are distinct yet released from the same nerve terminals. The strength of the conclusion rests on the abundance of data (approaches) rather than the decisiveness of any one approach, and I came away believing that the boutons may indeed produce and release distinct types of vesicles, but have reservations.

      We thank the reviewer for his/her evaluation of our work. To date, several studies have reported that a variety of combinations of two transmitters are co-released from different synaptic vesicles in the central nervous system. In this regard, we think the co-transmission of glutamate and GABA from different synaptic vesicles is not surprising. To better explain to the reader how much is known about co-release of dual transmitters in the brain, we have now added new sentences describing segregated co-release of two neurotransmitters at other synapses in the Introduction (line 63-80).

      Accepting the conclusion, one is now left with another conundrum, not addressed even in the discussion: how can a single bouton sort out VGLUTs and VIAATs to different vesicles, position them in distinct locations with nm precision, and recycle them without mixing? And why do it this way instead of with single vesicles having mixed chemical content? For example, could a quantitative argument be made that separate vesicles allow for higher transmitter concentrations? I feel the paper needs to address these problems with some coherent discussion, at minimum. 

      Although these questions are very important and interesting to address, little is known about the molecular mechanisms by which VGluT2 and VIAAT are sorted to different vesicles and by which each type of synaptic vesicle is segregated. That is why we did not mention the sorting mechanisms in the original manuscript. Nevertheless, in response to the reviewer’s suggestion, we have now added new sentences describing possible mechanisms for the sorting and segregation of VGluT2 and VIAAT in the Discussion (line 439-462).

      As for the question of why glutamate and GABA are released from different synaptic vesicles, we mentioned the functional advantages of separate release of the two transmitters over release from single vesicles several times in the Introduction (line 94-100), Results (line 300-302), and Discussion (line 406-408, 521-522). Although transmitter concentration in the vesicles is an interesting point to consider, we think this issue is beyond the scope of the present study. Given that manipulation of vesicular transmitter content is technically possible (Hori and Takamori, 2021), this issue awaits further investigation.

      Major concerns: 

      (1) Throughout the paper, the authors use repetitive optogenetic stimulation to activate SuM fibers and co-release glutamate and GABA. There are several issues here: first, can the authors definitively assure the reader that all the short-term plasticity is presynaptic and not due to ChR2 desensitization? This has not been addressed. Second, can the authors also say that all the activated fibers release both transmitters? If for example 20% of the fibers retained a one-transmitter identity and had distinct physiological properties, could that account for some of the physiological findings?

      Thank you for raising this important point. To examine whether repetitive light illumination induces ChR2 desensitization, the fiber volley was recorded extracellularly. We found that paired pulses or 10 stimuli at 5, 10, and 20 Hz reliably evoked fiber volleys of similar amplitude throughout light stimulation. These results clearly indicate that repetitive light stimulation can reliably activate ChR2 and elicit action potentials in the SuM axons. These new findings are now included in Figure 1-figure supplement 2 and Figure 5-figure supplement 2. We also previously demonstrated, by direct patch-clamp recordings from ChR2-expressing hippocampal mossy fiber terminals, that 125 light stimuli at 25 Hz reliably elicited action potentials (Fig. S1: Fukaya et al., 2023). Therefore, we believe that when the expression level of ChR2 is high, activation of ChR2 induces action potentials in response to repetitive light stimulation and mediates synaptic transmission with high efficiency.

      We found that most of the SuM terminals (95%) have both VGluT2 and VIAAT (Figure 1E). This anatomical evidence strongly indicates that most of the SuM terminals have the ability to release both glutamate and GABA, and the SuM fibers having one transmitter identity should be minor populations.

      (2) PPR differences in Figures 1F-I are statistically significant but still quite small. You could say they are more similar than different in fact, and residual differences are accounted for by secondary factors like differential receptor saturation. 

      In this experiment, the light intensity was adjusted to yield less than 80% of the maximum response, as described in the Methods section of the original and revised manuscripts, minimizing the possibility of receptor saturation. We also excluded the possibility that PPR differences could be attributed to differential receptor saturation and desensitization by using a low-affinity AMPA receptor antagonist and a low-affinity GABAA receptor antagonist (Figure 5-figure supplement 3). These results indicate that the PPR differences are of presynaptic origin.

      (3) The logic of the GPCR experiments needs a better setup. I could imagine different fibers released different transmitters and had different numbers of mGluRs, so that one would get different modulations. On the assumption that all the release is from a single population of boutons, then either the mGluRs are differentially segregated within the bouton, or the vesicles have differential responsiveness to the same modulatory signal (presumably a reduced Ca current). This is not developed in the paper. 

      Based on our minimal stimulation results and anatomical analysis, we believe that many SuM terminals contain both glutamate and GABA. Therefore, both transmissions are able to be modulated by mGluRs and GABAB receptors within the same terminals. As the reviewer pointed out, differential responsiveness of glutamate-containing and GABA-containing vesicles to the GPCR signal could be one of the molecular mechanisms for differential effects of GPCRs on EPSCs and IPSCs. In addition, the spatial coupling between GPCRs and active zones for glutamate and GABA in the same SuM terminals may be different, which may give rise to differential modulation of glutamate and GABA release. These possible mechanisms are now described in the Discussion (line 469-476).

      (4) The biphasic events of Figures 3 and S3: I find these (unaveraged) events a bit ambiguous. Another way to look at them is that they are not biphasic per se but rather are not categorizable. Moreover, these events are really tiny, perhaps generated by only a few receptors whose open probability is variable, thus introducing noise into the small currents. 

      We agree with the reviewer that some events are tiny and some small currents could be masked by background noise. We understand that detecting biphasic events by minimal stimulation has technical limitations. We detected biphasic events automatically, defining them as an EPSC-IPSC sequence only if an outward peak current following an inward current appeared within 20 ms of light illumination, as described in the Methods section. We therefore cannot exclude the possibility that the biphasic events we detected include false biphasic responses. To compensate for these technical limitations, we also used strontium-induced asynchronous release as another approach and obtained results similar to those of the minimal stimulation experiments (Figures 3E and 3F). Furthermore, we confirmed that the amplitudes and kinetics of minimal light stimulation-evoked EPSCs or IPSCs were not altered by blockade of their counterpart currents (Figure 3-figure supplement 2). Even if false biphasic responses were accidentally included in the analysis, biphasic events remain a minor population, and we successfully detected discernible independent EPSCs and IPSCs, which were the major population of uniquantal release-mediated synaptic responses. Thus, multiple pieces of evidence support distinct release of glutamate and GABA from SuM terminals.
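
      The detection rule described above can be sketched roughly as follows. This is a hedged illustration rather than the authors' detection code: it assumes a baseline-subtracted trace in which EPSCs are inward (negative) and IPSCs are outward (positive), and the amplitude threshold, function name, and example traces are hypothetical.

```python
def classify_event(times_ms, current_pa, light_on_ms,
                   window_ms=20.0, threshold_pa=5.0):
    """Label a response as 'biphasic', 'EPSC', 'IPSC', or 'failure'.

    Biphasic: an outward peak following an inward peak, both within
    window_ms of light onset. threshold_pa is an assumed detection limit.
    """
    window = [(t, i) for t, i in zip(times_ms, current_pa)
              if light_on_ms <= t <= light_on_ms + window_ms]
    if not window:
        return "failure"
    # Inward peak: the most negative sample in the window.
    t_inward, i_inward = min(window, key=lambda p: p[1])
    if i_inward <= -threshold_pa:
        later = [i for t, i in window if t > t_inward]
        if later and max(later) >= threshold_pa:
            return "biphasic"
        return "EPSC"
    if max(i for _, i in window) >= threshold_pa:
        return "IPSC"
    return "failure"


# Hypothetical traces sampled every 2 ms (currents in pA).
times_ms = list(range(0, 30, 2))
biphasic = [0, 0, -8, -12, -6, 2, 9, 11, 7, 3, 1, 0, 0, 0, 0]
epsc_only = [0, 0, -8, -12, -6, -2, 0, 0, 0, 0, 0, 0, 0, 0, 0]
ipsc_only = [0, 0, 3, 9, 11, 7, 3, 1, 0, 0, 0, 0, 0, 0, 0]

for trace in (biphasic, epsc_only, ipsc_only):
    print(classify_event(times_ms, trace, light_on_ms=0))
```

      A rule of this kind is sensitive to the chosen threshold, which is one reason noise can produce false biphasic classifications.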

      (5) Figure 4 indicates that the immunohistochemical analysis is done on SuM terminals, but I do not see how the authors know that these terminals come from SuM vs other inputs that converge in DG. 

      We thank the reviewer for raising an important point. As shown in Figure 4A, B, almost all VGluT2-positive terminals in the GC layer co-expressed VIAAT. We are aware that VTA neurons reportedly project to the GC layer of the DG and co-release glutamate and GABA (Ntamati and Luscher, 2016). Contrary to this report, our retrograde tracing analysis did not reveal direct projections from the VTA to the DG. These new data are now included in Figure 4-figure supplement 1. We also added pre-embedding immunogold EM analysis, in which SuM terminals were virally labeled with eYFP, confirming that they form both asymmetric and symmetric synapses (revised Figure 4F). Together with these new data, our results clearly demonstrate that SuM terminals in the GC layer form both asymmetric and symmetric synapses. While our results strongly suggest that VGluT2-positive terminals and SuM terminals in the GC layer are nearly identical, we cannot fully exclude the possibility that other inputs originating from unidentified brain regions may co-express VGluT2 and VIAAT in the GC layer. Therefore, in Figure 4 of the revised manuscript, we refer to “VGluT2-positive terminals” instead of “SuM terminals”.

      (6) Figure 4E also shows many GluN1 terminals not associated with anything, not even Vglut, and the apparent numbers do not mesh with the statistics. Why? 

      In triple immunofluorescence for VGluT2, VIAAT, and GluN1, free GluN1 puncta were predominantly observed in the molecular layer. Given that VGluT2-positive terminals are sparse in the molecular layer, these GluN1 puncta are primarily associated with VGluT1, the dominant subtype. In this study, we restricted the analysis of GluN1 puncta to the GC layer and excluded the molecular layer. To avoid confusion, we replaced the original Figure 4E with the new Figure 4G, which focuses on the GC layer and matches the quantitative analysis. Additionally, we used ultrathin sections (100 nm thick) to enhance spatial resolution, which limits the detection of co-localization events within this confined spatial range, as noted in the Discussion (lines 485-488).

      (7) Do the conclusions based on the fluorescence immuno mesh with the apparent dimensions of the EM active zones and the apparent intermixing of labeled vesicles in immuno EM? 

      To further support our immunofluorescence results, we performed an EM study and found that a single SuM terminal formed both asymmetric and symmetric synapses on a GC soma (revised Figures 4E and 4F). These new data and our immunofluorescence results clearly indicate that a single SuM terminal forms both glutamatergic and GABAergic synapses on a GC and co-releases glutamate and GABA.

      As the reviewer pointed out, our immuno-EM shows that VGluT2- and VIAAT-labeled vesicles appear to intermix in asymmetric and symmetric synapses. Accordingly, in the revised manuscript, Figure 7 has been modified to show the intermixing of glutamate- and GABA-containing vesicles in the SuM terminal. It should be noted that, because of low labeling efficiency, our immuno-EM images do not capture the full population of synaptic vesicles for glutamate and GABA. The distribution of vesicles could be biased toward their release sites (more VGluT2-containing vesicles close to asymmetric synapses and more VIAAT-containing vesicles close to symmetric synapses), as reported previously (Root et al., 2018). Our results could also be explained by another mechanism: co-release of glutamate and GABA from the same vesicles, with one transmitter going undetected because its postsynaptic receptor is absent. This possibility is now mentioned in the Discussion (lines 512-520). The detailed vesicle configuration within a single SuM terminal will have to be investigated in future studies.

      (8) Figure 6 is not so interesting to me and could be removed. It seems to test the obvious: EPSPs promote firing and IPSPs oppose it. 

      We believe these results are necessary for the following two reasons. First, we showed that the balance of glutamate/GABA co-transmission changes dynamically in a frequency-dependent manner (Figure 5). In terms of physiological significance, it is important to demonstrate how these frequency-dependent changes affect GC firing. Therefore, we believe that Figure 6, which shows how repetitive SuM stimulation modulates GC firing, is necessary for this paper. Second, we previously reported the excitatory effects of SuM inputs on GC firing, suggesting an important role for the glutamatergic transmission of SuM inputs in synaptic plasticity (Hashimotodani et al., 2018; Hirai et al., 2022; Tabuchi et al., 2022). In contrast, how GABAergic co-transmission contributes to SuM-GC synaptic plasticity and DG information processing was not well understood. Our results in Figure 6, which demonstrate the inhibitory effects of GABAergic co-transmission on GC firing during high-frequency repetitive SuM input activity, clearly show the contribution of GABAergic co-transmission to short-term plasticity at SuM-GC synapses. For these reasons, we would like to keep Figure 6. We hope that our explanations convince the reviewer.

      Reviewer #2:

      Summary:

      In this study, the authors investigated the release properties of glutamate/GABA co-transmission at the supramammillary nucleus (SuM)-granule cell (GC) synapses using in vitro electrophysiology and anatomical approaches at the light and electron microscopy level. They found that SuM to dentate granule cell synapses, which co-release glutamate and GABA, exhibit distinct differences in paired-pulse ratio, Ca2+ sensitivity, presynaptic receptor modulation, and Ca2+ channel-vesicle coupling configuration for each neurotransmitter. The study shows that glutamate/GABA co-release produces independent glutamatergic and GABAergic synaptic responses, with postsynaptic targets segregated. They show that most SuM boutons form distinct glutamatergic and GABAergic synapses in close proximity, characterized by GluN1 and GABAAα1 receptor labeling, respectively. Furthermore, they demonstrate that glutamate/GABA co-transmission exhibits distinct short-term plasticity, with glutamate showing frequency-dependent depression and GABA showing frequency-independent stable depression.

      Their findings suggest that these distinct modes of glutamate/GABA co-release by SuM terminals serve as frequency-dependent filters of SuM inputs. 

      Strengths:

      The conclusions of this paper are mostly well supported by the data. 

      We thank the reviewer for their positive and constructive comments on our manuscript.

      Weaknesses: 

      Some aspects of Supplementary Figure 1A and the table need clarification. Specifically, the claim that the authors have stimulated an axon fiber rather than axon terminals is not convincingly supported by the diagram of the experimental setup. Additionally, the antibody listed in the primary antibodies section recognizes the gamma2 subunit of the GABAA receptor, not the alpha1 subunit mentioned in the results and Figure 4. 

      We have answered these questions in the Recommendations section below.

      Reviewer #3:

      Summary: 

      In this manuscript, Hirai et al investigated the release properties of glutamate/GABA co-transmission at SuM-GC synapses and reported that glutamate/GABA co-transmission exhibits distinct short-term plasticity with segregated postsynaptic targets. Using optogenetics, whole-cell patch-clamp recordings, and immunohistochemistry, the authors reveal distinct transmission modes of glutamate/GABA co-release as frequency-dependent filters of incoming SuM inputs.

      Strengths: 

      Overall, this study is well-designed and executed; conclusions are supported by the results. This study addressed a long-standing question of whether GABA and glutamate are packaged in the same vesicles and co-released in response to the same stimuli in the SuM-GC synapses (Pedersen et al., 2017; Hashimotodani et al., 2018; Billwiller et al., 2020; Chen et al., 2020; Li et al., 2020; Ajibola et al., 2021). Knowledge gained from this study advances our understanding of neurotransmitter co-release mechanisms and their functional roles in the hippocampal circuits. 

      Weaknesses:

      No major issues are noted. Some minor issues related to data presentation and experimental details are listed below. 

      We appreciate the reviewer’s positive view of our study. We respond in more detail in the Recommendations section below.

      Recommendations for the authors:

      Reviewer #1:

      (1) The blue color for VIAAT in panel 1C is extremely hard to see. 

      Thank you for pointing this out. We have changed the color for VIAAT to cyan in Figure 1C and D in the revised manuscript.

      (2) Line 329 "perforant" not "perfomant".  

      We appreciate the reviewer’s careful attention. We have corrected this typo in the revised manuscript.

      Reviewer #2:

      To convincingly demonstrate that the authors stimulated SuM axon fiber instead of SuM terminals (Supplementary Figures 1A), they should provide an image showing the distribution of SuM-labeled fibers and axon terminals reaching the dentate gyrus (DG) and the trace of the optic fiber, rather than providing a diagram of the experimental setup.

      We appreciate the reviewer’s suggestion. We have now provided a new image of the experimental setup (Figure 1-figure supplement 1A) showing a single GC, the distribution of SuM fibers in the GC layer, and the illumination area at each location. Because SuM inputs make synapses onto the GC soma and onto dendrites close to the cell body, SuM-GC synapses on the recorded GCs are confined to a very limited area. This characteristic synaptic localization allowed us to position the illumination area so that light was not applied to the SuM terminals on the recorded GCs. The delayed onsets of EPSCs/IPSCs with over-axon stimulation (Figure 1-figure supplement 1C, D) also support the conclusion that SuM terminals on the recorded GCs were outside the illumination area.

      Additionally, the authors should clarify the discrepancy between the antibody mentioned in the list of primary antibodies, which recognizes the gamma2 subunit of the GABAA receptor, and the alpha1 subunit of the GABAA receptor mentioned in the results and Figure 4. 

      We apologize for this mistake. As described in the main text and figure, we used the antibody against the α1 subunit of the GABAA receptor. Table S1 has been corrected in the revised version of the paper.

      Reviewer #3:

      (1) In Figure 1, the authors used two [Ca2+]o concentrations to study the EPSC and IPSC amplitudes. How does the Ca2+ concentration affect the PPR in the EPSC and IPSC, respectively? 

      Given that lowering the extracellular Ca2+ concentration reduces the release probability, 1 mM extracellular Ca2+ is expected to increase the PPR compared with 2.5 mM. Indeed, we observed that lowering the extracellular Ca2+ concentration increased the 2nd to 10th synaptic responses (both EPSCs and IPSCs) during train stimulation (Figure 5).
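
      For reference, the paired-pulse ratio (PPR) and the normalization of train responses referred to above are simple ratios. The sketch below uses made-up amplitudes purely for illustration; the function names are hypothetical and the numbers are not data from the study.

```python
def paired_pulse_ratio(first_amp, second_amp):
    """PPR = |second response| / |first response| (sign-independent)."""
    return abs(second_amp) / abs(first_amp)


def normalize_train(amps):
    """Express every response in a train relative to the first response."""
    first = abs(amps[0])
    return [abs(a) / first for a in amps]


# Illustrative EPSC amplitudes in pA (inward currents are negative).
train = [-100.0, -50.0, -25.0]
ppr = paired_pulse_ratio(train[0], train[1])   # 0.5, i.e. paired-pulse depression
relative = normalize_train(train)              # [1.0, 0.5, 0.25]
```

      A lower release probability would be expected to move such ratios upward, which is the direction of change described above.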

      (2) In Figure 2D, does baclofen also have a dose-dependent effect on the inhibition of the EPSC and IPSC similar to the DCG-IV in Figure 2C? 

      Thank you for your question. Because we aimed to demonstrate the differential inhibitory effects of baclofen at a given concentration on glutamatergic and GABAergic co-transmission, we did not examine dose dependence in detail. In response to the reviewer’s comment, we tested the effect of a higher concentration of baclofen on EPSCs and IPSCs. As shown in the figure below, 50 µM baclofen inhibited EPSCs and IPSCs to a similar extent. Comparing the inhibitory effects of the two baclofen concentrations (5 and 50 µM), we believe that baclofen, like DCG-IV, has a dose-dependent inhibitory effect on both EPSCs and IPSCs.

      Author response image 1.

      (3) In Figure 2E, statistical labels, such as "*" or "n.s." (not significant), should be provided on the plots to facilitate the reading of figures. 

      In response to the reviewer’s comment, we have added statistical labels to Figure 2E.

      (4) In Figure 3A, the latency of the evoked EPSC for the lower light stimulation groups seems to be much slower than the one shown on the left or other figures in the paper, such as Figure 1F.

      Please double-check if the blue light stimulation label is placed in the right location. 

      Corrected, thanks.

      (5) The use of minimal light stimulation in optogenetic experiments is not appropriately justified or described. More detailed information should be provided, such as whether the optogenetic stimulation is performed on the axon or the terminals of the SuM. 

      We appreciate the reviewer’s suggestion. To detect stochastic synaptic responses effectively, the light stimulation was applied to the SuM terminals. We have now stated this in the text (line 212). We have also expanded the justification for using minimal light stimulation in the revised manuscript (lines 207-209).

      References

      Fukaya R, Hirai H, Sakamoto H, Hashimotodani Y, Hirose K, Sakaba T (2023) Increased vesicle fusion competence underlies long-term potentiation at hippocampal mossy fiber synapses. Sci Adv 9:eadd3616.

      Hashimotodani Y, Karube F, Yanagawa Y, Fujiyama F, Kano M (2018) Supramammillary Nucleus Afferents to the Dentate Gyrus Co-release Glutamate and GABA and Potentiate Granule Cell Output. Cell Rep 25:2704-2715 e2704.

      Hirai H, Sakaba T, Hashimotodani Y (2022) Subcortical glutamatergic inputs exhibit a Hebbian form of long-term potentiation in the dentate gyrus. Cell Rep 41:111871.

      Hori T, Takamori S (2021) Physiological Perspectives on Molecular Mechanisms and Regulation of Vesicular Glutamate Transport: Lessons From Calyx of Held Synapses. Front Cell Neurosci 15:811892.

      Ntamati NR, Luscher C (2016) VTA Projection Neurons Releasing GABA and Glutamate in the Dentate Gyrus. eNeuro 3.

      Root DH, Zhang S, Barker DJ, Miranda-Barrientos J, Liu B, Wang HL, Morales M (2018) Selective Brain Distribution and Distinctive Synaptic Architecture of Dual Glutamatergic-GABAergic Neurons. Cell Rep 23:3465-3479.

      Tabuchi E, Sakaba T, Hashimotodani Y (2022) Excitatory selective LTP of supra-mammillary glutamatergic/GABAergic co-transmission potentiates dentate granule cell firing. Proc Natl Acad Sci U S A 119:e2119636119.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Lodhiya et al. demonstrate that antibiotics with distinct mechanisms of action, norfloxacin, and streptomycin, cause similar metabolic dysfunction in the model organism Mycobacterium smegmatis. This includes enhanced flux through the TCA cycle and respiration as well as a build-up of reactive oxygen species (ROS) and ATP. Genetic and/or pharmacologic depression of ROS or ATP levels protect M. smegmatis from norfloxacin and streptomycin killing. Because ATP depression is protective, but in some cases does not depress ROS, the authors surmise that excessive ATP is the primary mechanism by which norfloxacin and streptomycin kill M. smegmatis. In general, the experiments are carefully executed; alternative hypotheses are discussed and considered; the data are contextualized within the existing literature. Clarification of the effect of 1) ROS depression on ATP levels and 2) ADP vs. ATP on divalent metal chelation would strengthen the paper, as would discussion of points of difference with the existing literature. The authors might also consider removing Figures 9 and 10A-B as they distract from the main point of the paper and appear to be the beginning of a new story rather than the end of the current one. Finally, statistics need some attention.

      Strengths:

      The authors tackle a problem that is both biologically interesting and medically impactful, namely, the mechanism of antibiotic-induced cell death.

      Experiments are carefully executed, for example, numerous dose- and time-dependency studies; multiple, orthogonal readouts for ROS; and several methods for pharmacological and genetic depletion of ATP.

      There has been a lot of excitement and controversy in the field, and the authors do a nice job of situating their work in this larger context.

      Inherent limitations to some of their approaches are acknowledged and discussed e.g., normalizing ATP levels to viable counts of bacteria.

      We sincerely appreciate the reviewer’s encouraging feedback.

      Weaknesses:

      The authors have shown that treatments that depress ATP do not necessarily repress ROS, and therefore conclude that ATP is the primary cause of norfloxacin and streptomycin lethality for M. smegmatis. Indeed, this is the most impactful claim of the paper. However, GSH and dipyridyl beautifully rescue viability. Do these and other ROS-repressing treatments impact ATP levels? If not, the authors should consider a more nuanced model and revise the title, abstract, and text accordingly.

      We thank the reviewer for asking this question. In the revised version of the manuscript, we have included data on the impact of the antioxidant GSH on antibiotic-induced ATP levels as Supplementary Figure S9C.

      Does ADP chelate divalent metal ions to the same extent as ATP? If so, it is difficult to understand how conversion of ADP to ATP by ATP synthase would alter metal sequestration without concomitant burst in ADP levels.

      We sincerely thank the reviewer for raising this insightful question. Indeed, ADP and AMP can also form complexes with divalent metal ions; however, these complexes tend to be less stable. According to the existing literature, ATP-metal ion complexes have a higher formation constant than ADP or AMP complexes. This has been attributed to the polyphosphate chain of ATP, which acts as an active site and forms a highly stable tridentate structure (Khan et al., 1962; Distefano et al., 1953). An antibiotic-induced increase in ATP levels, irrespective of any change in ADP levels or in the total pool size of purine nucleotides, could therefore still result in the formation of more stable complexes with metal ions, potentially leading to metal ion depletion. Recent studies indicate that antibiotic treatment stimulates purine biosynthesis (Lobritz MA et al., 2022; Yang JH et al., 2019), thereby imposing energy demands and enhancing ATP production; a corresponding increase in total purine nucleotide levels (ADP+ATP) is therefore possible, as mentioned in the Discussion section. However, this hypothesis requires further investigation.

      Khan MMT, Martell AE. Metal Chelates of Adenosine Triphosphate. Journal of Physical Chemistry (US). 1962 Jan 1;66(1):10–5.

      Distefano V, Neuman WF. Calcium complexes of adenosinetriphosphate and adenosinediphosphate and their significance in calcification in vitro. Journal of Biological Chemistry. 1953 Feb 1;200(2):759–63.

      Lobritz MA, Andrews IW, Braff D, Porter CBM, Gutierrez A, Furuta Y, et al. Increased energy demand from anabolic-catabolic processes drives β-lactam antibiotic lethality. Cell Chem Biol. 2022 Feb 17.

      Yang JH, Wright SN, Hamblin M, McCloskey D, Alcantar MA, Schrübbers L, et al. A White-Box Machine Learning Approach for Revealing Antibiotic Mechanisms of Action. Cell. 2019 May 30.

      Reviewer #1 (Recommendations for the authors):

      (1) Some of the results in the paper diverge from what has been previously reported by some of the referenced literature. These discrepancies should be clarified.

      We apologize for any confusion, but we are uncertain which specific discrepancies the reviewer is referring to. In the Discussion section, we have addressed and analysed our results within the broader context of the existing literature, regardless of whether our findings align with or differ from previous studies.

      (a) CCCP, nigericin, BDQ, and the atpD mutant all appear to affect M. smegmatis growth (Figures S6C, S7C, S7D-E, and Figure 1B from reference 41). Could depressed growth contribute to the rescue effects of these compounds?

      We concur with the reviewer that the reagents we used to suppress the ATP burst in the presence of antibiotics (CCCP, nigericin, and BDQ) do affect bacterial growth. This partial growth inhibition is expected given that they either uncouple the electron transport chain from oxidative phosphorylation or directly inhibit ATP synthase, leading to reduced ATP production compared with the untreated control. However, we chose concentrations that reduce the antibiotic-induced surge in ATP levels without significantly depriving the bacteria of the ATP essential for their survival, thereby avoiding cell death.

      Consequently, all three reagents (as shown in Figures S6C, S7C, and S7D-E) were employed at non-lethal concentrations. We would like to emphasize, however, that it was not feasible to select a reagent concentration that had no impact on growth yet still suppressed the antibiotic-induced ATP burst. We recognize the possibility that growth retardation may have contributed to the observed rescue effects. To address this concern, we used multiple orthogonal methods (CCCP, Nigericin, and BDQ), each with distinct mechanisms having a common effect of reducing the ATP surge, to minimize off-target effects and support our findings.

      Also, the authors report no growth phenotype for atpD mutant (Figure S8) but only carry out the growth curve to an OD of 2, which is approximately where the growth curve from ref 41 begins to diverge.

      Additionally, to further confirm that bacterial rescue was not due to growth retardation caused by these reagents, we used the atpD mutant. All experiments, including those involving the atpD mutant, were conducted when the OD600nm reached 0.8 (during the exponential phase). We specifically ensured that the growth of the atpD mutant was not compromised during this phase (Figure S8) and restricted our growth curve to the early stationary phase (OD600 between 1.5 and 2). While it is possible that the atpD mutant grows more slowly than wild-type bacteria in stationary phase at an OD600nm of 4 (as shown in ref 41), this does not affect our observations.

      (b) Reference 41 also reports that the atpD mutant is more sensitive to some antibiotics (Figure 6). This includes isoniazid, which references 34 and 35 have both reported to cause an ATP burst.

      We acknowledge the reviewer’s query regarding the phenotype of the atpD mutant against isoniazid (Reference 41). However, the cited reference does not provide clarity on why the M. smegmatis atpD mutant exhibits increased sensitivity to isoniazid and other antibiotics, nor does it explain whether this sensitivity is due to reduced ATP levels or altered cell wall properties, such as enhanced drug uptake, as observed with Nile red and ethidium bromide.

      While references 34 and 35 reported an ATP burst following isoniazid treatment in slow-growing M. bovis BCG and M. tuberculosis, it remains to be tested whether isoniazid acts similarly in the fast-growing M. smegmatis, where it is bacteriostatic rather than being bactericidal as observed in M. bovis BCG and M. tuberculosis.  

      (2) The statistics require some attention. First, the wording for almost all of the figures is something like "data points represent the mean of at least three independent replicates," is that correct? CFUs are notoriously messy so it is surprising (impressive?) that the variability between replicates is so low. Second, t-tests are not appropriate for multiple comparisons.

      We thank the reviewer for raising this important query. It is correct that all our experiments included at least three independent replicates, and many of our results exhibit a high degree of variability, as indicated by the large error bars. We would like to clarify that we did not perform multiple comparisons on our results. For all analyses, an unpaired t-test was conducted between the control group and one experimental group at a time. Consequently, statistical data were generated for each pair of results, and the comparisons were displayed on the graph relative to the control data points, as mentioned in the Methods section under the heading “Statistical analysis”.
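
      The pairwise scheme described above (control versus one treated group at a time) can be sketched as follows. This is an illustration under stated assumptions rather than the analysis actually run: the response does not say whether the equal-variance (Student) or the Welch form of the unpaired t-test was used, so the Student form is assumed here, the function names and replicate values are hypothetical, and the p-value is omitted because it requires the t distribution from a statistics library.

```python
import math
import statistics


def student_t(control, treated):
    """Return (t, degrees_of_freedom) for an unpaired, equal-variance t-test.

    Assumption: pooled-variance (Student) form; the source does not
    specify which variant was used.
    """
    n1, n2 = len(control), len(treated)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two replicates")
    pooled_var = ((n1 - 1) * statistics.variance(control)
                  + (n2 - 1) * statistics.variance(treated)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    t = (statistics.mean(treated) - statistics.mean(control)) / se
    return t, n1 + n2 - 2


def compare_each_to_control(control, groups):
    """Run one control-vs-group test per treated group, one pair at a time."""
    return {name: student_t(control, values) for name, values in groups.items()}


# Hypothetical replicate values, for illustration only.
control = [1.0, 2.0, 3.0]
results = compare_each_to_control(control, {"drug_low": [2.0, 3.0, 4.0],
                                            "drug_high": [4.0, 5.0, 6.0]})
```

      Each entry in `results` is computed independently of the others, which mirrors the one-pair-at-a-time comparison described in the response.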

      (3) Figures 9 and 10A-B seem tangential to the main point of the paper and, in the case of Figure 10A-B, preliminary.

      In this study, our aim was to comprehensively investigate the nature of antibiotic-induced stresses (i.e., mechanisms of action from T = 15 hrs) and leverage these insights to enhance our understanding of bacterial adaptation mechanisms, particularly antibiotic tolerance (from T = 25 hrs). While a significant portion of the manuscript focuses on the secondary consequences of antibiotic exposure, we also sought to assess the bacteria's ability to counteract these stresses, contributing to our understanding of how antibiotic tolerance phenotypes develop.

      The results presented in Figure 9 clearly demonstrate that bacteria attempt to reduce respiration by decreasing flux through the complete TCA cycle, thereby mitigating ROS and ATP production in response to antibiotics. These findings not only uncover potential metabolic pathways for downregulating respiration but also validate our observations regarding the role of increased respiration, ROS generation, and subsequent ATP production in antibiotic action.

      Importantly, bacterial responses to antibiotics were not limited to metabolic adaptations. They also included the upregulation of the intrinsic drug resistance determinant Eis (Figure 10A) and an increase in mutation frequency (Figure 10B), both of which indicate a greater likelihood of these bacteria developing antibiotic tolerance and resistance. Therefore, the data presented in Figures 9 and 10A-B are not peripheral to the central theme of the paper. Rather, they complement and strengthen it by providing a comprehensive understanding of the consequences of antibiotic exposure, which aligns with the primary objectives of our study.

      Do the various perturbations used here (especially streptomycin) affect expression and/or turnover of the genetically-encoded sensors Mrx1-roGFP2 or Peredox-mCherry?

      We appreciate the reviewer for raising this query. Since streptomycin treatment leads to mistranslation and eventually inhibits protein synthesis, it is possible that such treatment could impact the expression and/or turnover of the genetically encoded biosensors, Mrx1-roGFP2 (1) or Peredox-mCherry (2). However, we do not anticipate any effects on the readout as both biosensors provide ratiometric measurements of redox potential and NADH levels, respectively, which eliminates errors due to variations in protein abundance. Nevertheless, in our experiments with both drugs, we employed multiple time- and dose-dependent responses, ensuring that all meaningful conclusions were drawn from the overall trends seen in the data rather than an individual data point.

      (1) Bhaskar A, Chawla M, Mehta M, Parikh P, Chandra P, Bhave D, et al. (2014) Reengineering Redox Sensitive GFP to Measure Mycothiol Redox Potential of Mycobacterium tuberculosis during Infection. PLoS Pathog 10(1): e1003902. https://doi.org/10.1371/journal.ppat.1003902

      (2) Bhat SA, Iqbal IK, Kumar A (2016) Imaging the NADH:NAD+ Homeostasis for Understanding the Metabolic Response of Mycobacterium to Physiologically Relevant Stresses. Front Cell Infect Microbiol 6:145. doi: 10.3389/fcimb.2016.00145

      (4) Do the antibiotics affect permeability? Especially relevant to CellROX experiments.

      Antibiotics can affect, and even increase, bacterial membrane permeability, a phenomenon observed in the self-promoted uptake of aminoglycosides. When aminoglycosides bind to ribosomes, they induce mistranslation, including of membrane proteins, leading to the formation of membrane pores, which in turn enhances antibiotic uptake and lethality (1-2). However, whether the antibiotics used in our study (norfloxacin and streptomycin) altered membrane permeability at the concentrations applied is not known.

      Experiments involving the CellROX dye are unlikely to be influenced by changes in membrane permeability, as the dye is freely permeable to the mycomembrane.

      References:

      (1) Davis BD, Chen LL, Tai PC (1986) Misread protein creates membrane channels: an essential step in the bactericidal action of aminoglycosides. PNAS 83:6164–6168.

      (2) Ezraty B, Vergnes A, Banzhaf M, Duverger Y, Huguenot A, Brochado AR, Su SY, Espinosa L, Loiseau L, Py B, Typas A, Barras F (2013) Fe-S cluster biosynthesis controls uptake of aminoglycosides in a ROS-less death pathway. Science 340:1583–1587.

      (5) Figures 4E-H does GSH affect bacterial growth/viability on its own i.e. in the absence of a drug?

      We thank the reviewer for raising this query. Indeed, the 10 mM GSH used in our experiments to mitigate and rescue cells from antibiotic-induced ROS does impact bacterial growth on its own, though it does not affect viability, likely due to GSH inducing reductive stress on bacterial physiology. For clarification, we have included the viability measurement data in the presence of 10 mM GSH alone in the revised version of the manuscript, as supplementary figure (S4E).

      (6) p. 2 "...antibiotic resistance involves more complex mechanisms and manifests as genotypic resistance, antibiotic tolerance, and persistence." This reads as tolerance and persistence being a subset of resistance, which is not quite accurate. There is at least one other example of similar wording in the text.

      We thank the reviewer for highlighting this point. Our intention was to convey that resistance to antibiotics can manifest in two forms: permanent or genetic resistance, and transient resilience through antibiotic tolerance and persistence.

      (7) p. 3 "...and showing no visible differences in the growth rate...". It is hard to say this as all the values appear to be 0 - possible to zoom in on the CFU counts in this region? Same comment for p. 5 "...the unaffected growth rate in the early response phase...".

      We apologize for the lack of clarity regarding the resolution of the early time points in the growth curve. Unfortunately, it was not feasible to zoom in on the initial time points because of the large difference in cell viability between T=0 and T=25 hours (i.e., spanning 8 generations). To clarify the growth phenotype at early time points, please refer to Author response image 1, where CFU counts are plotted on a logarithmic scale. The y-axis spans 6-8 orders of magnitude across different conditions, making it difficult to visualize early time points on a linear scale.

      Author response image 1.

      (8) p. 5 "...data for each condition were subjected to rigorous quality control analysis (S2B)." I believe that this is the case, but how Figure S2B demonstrates this fact is not clear.

      Figures S2A and S2B present the quality assessment data for all six proteomics datasets. Figure S2A illustrates the consistency in the number of proteins identified across 10 samples (5 independent replicates for both control and drug treatment). The minimal variation in the number of identified proteins indicates reproducibility across the different runs. Similarly, Figure S2B displays the variability in Pearson correlation coefficient values of protein abundance (LFQ intensities) across the 10 samples. The closer and more consistent the Pearson correlation values, the greater the reproducibility of the quantitative data acquisition.
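
      As an illustration of the reproducibility check described above, the pairwise Pearson correlation between samples can be computed as in the sketch below. The sample names and intensity values are hypothetical, and the sketch does not reproduce the authors' pipeline; in particular, whether the LFQ intensities were log-transformed before correlation is not stated in the response.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(samples):
    """Correlate every pair of samples; keys are (sample_a, sample_b)."""
    names = sorted(samples)
    return {(a, b): pearson(samples[a], samples[b])
            for i, a in enumerate(names) for b in names[i + 1:]}


# Hypothetical protein abundances (one value per protein) for three runs.
runs = {
    "control_1": [10.0, 20.0, 30.0, 40.0],
    "control_2": [11.0, 19.0, 31.0, 39.0],
    "drug_1": [12.0, 22.0, 29.0, 41.0],
}
correlations = pairwise_correlations(runs)
```

      Values that are uniformly close to 1 across all pairs would correspond to the high reproducibility the authors describe for Figure S2B.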

      (9) p. 7 "To look for a shared mechanism of antibiotic action..." The wording implies an assumption. Perhaps "to test whether" would be more appropriate? Same comment for p. 12 "To further confirm whether enhanced respiration ...".

      We appreciate the reviewer’s suggestions for both sentences and have made the necessary changes in the revised version. Thank you for bringing this to our attention.

      (10) Figure S1A-B figure legend. How was this assay performed?

      The experiment for Figures S1A-B was conducted using a standard REMA assay, as described in the methods section. Cells were harvested at the 25th-hour time point, and drug MICs were compared between cells grown with and without 1/4x MBC99 of the drugs. This was done to determine whether the growth recovery observed during the recovery phase was due to the presence of drug-resistant bacteria.

      (11) p. 14 "...(CCCP), a protonophore, at non-toxic levels..." Figure S6C implies an effect on growth.

      As clarified earlier in response to query 1(a), the CCCP reagent was used at concentrations that effectively minimize the antibiotic-induced surge in ATP levels. However, at these concentrations, CCCP reduces cellular ATP production (Figure S6A), leading to bacterial growth delay (Figure S6C). By "non-toxic levels," we intended to convey that these concentrations of CCCP are non-lethal to the bacteria, as evidenced in Figure S6C.

      (12) Figure 8A y axis is this CFU/mL or OD/mL?

      The y-axis of Figure 8A shows CFU/ml, as it measures cell survival in response to increasing concentrations of bipyridyl.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to test the hypothesis that ATP bursts are the predominant driver of antibiotic lethality of Mycobacteria.

      Strengths:

      This reviewer has not identified any significant strengths of the paper in its current form.

      Weaknesses:

      A major weakness is that M. smegmatis has a doubling time of three hours and the authors are trying to conclude that their data would reflect the physiology of M. tuberculosis which has a doubling time of 24 hours. Moreover, the authors try to compare OD measurements with CFU counts and thus observe great variabilities.

      If the authors had evidence to support the conclusion that ATP burst is the predominant driver of antibiotic lethality in mycobacteria then this paper would be highly significant. However, with the way the paper is written, it is impossible to make this conclusion.

      We have identified a new mechanism of antibiotic action in Mycobacterium smegmatis. However, as discussed extensively in the manuscript's discussion section, whether and to what extent this mechanism applies to other organisms still needs to be tested.

      We have always drawn our inferences from CFU counts, as reported in all of our experiments, because OD600nm is not a reliable measure.

      Reviewer #2 (Recommendations for the authors):

      Figure 1 needs to have an x-axis that has intervals that have 10E5 CFU to 4 x 10E8. But even 4 x 10E8 CFU/ml is a late log and not exponentially growing cells.

      Figure 1 illustrates the growth curve. We hope the reviewer meant the y-axis, which represents CFU/ml on a linear scale. As mentioned in response to reviewer #1’s query no. 7, it was not feasible to include the viability (CFU/ml) values at T=0 and a few subsequent time points. Naturally, the starting cell count was not zero; we began with approximately 600,000 CFU/ml, corresponding to an OD600nm of 0.0025/ml. For clarification, we have mentioned the initial OD as well as the CFU/ml at T=0 hr in the figure legend.
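
      The relationship between the two stated starting values can be made explicit with a short calculation. This is only a sketch: the function names are hypothetical, and the linear scaling between OD600nm and CFU/ml is an assumption that generally holds only for dilute cultures, not a calibration reported in the response.

```python
def cfu_per_od_unit(cfu_per_ml, od600):
    """Conversion factor (CFU/ml per OD600nm unit) implied by one paired measurement."""
    if od600 <= 0:
        raise ValueError("OD600nm must be positive")
    return cfu_per_ml / od600


def od_to_cfu(od600, factor):
    """Estimate CFU/ml from OD600nm, assuming a linear relationship."""
    return od600 * factor


# Values stated in the response: ~600,000 CFU/ml at an OD600nm of 0.0025.
factor = cfu_per_od_unit(600_000, 0.0025)
print(f"{factor:.1e} CFU/ml per OD600nm unit")  # about 2.4e8
```

      An inoculum this dilute sits far below the range in which a spectrophotometer reads reliably, which is consistent with the decision to track early growth by CFU counts.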

      Carefully look at Figure 1, what were you trying to show? Your x-axis goes from 0 to 10E8, of course you did not inoculate 0 cells, but if you had measured CFUs, you might not have gotten the great variability you reported in your graph.

      We assume that the reviewer is suggesting that "if we had measured OD600nm/ml instead of CFU/ml, we might not have observed the high variability we reported." While we agree with the reviewer's comment, our decision to use CFU/ml for growth measurement was to obtain more resolved and detectable data points, as an OD600nm of 0.0025/ml cannot be reliably measured with a spectrophotometer. Additionally, at around T=15 hours, where we observed an extended lag phase (referred to as the stress phase), the OD600nm was approximately 0.05, which is barely detectable. Therefore, the significant differences between the control group and the ¼ x MBC99 drug-treated group might not have been observed if we had relied on OD-based measurements. Despite the presence of high error bars and variability in the data points, we were still able to demonstrate clear differences in bacterial growth between treated and untreated samples at sub-lethal drug doses. This ultimately allowed us to capture the nature of antibiotic-induced stresses.

      There is no doubt that sublethal concentrations of antibiotics will have an effect on the bacterial cells. But it is not clear how you are concluding that ATP burst is the dominant driver of lethality. M. smegmatis can be very different from Mtb.

      Using a series of time- and dose-dependent experiments with plasmid and kit-based approaches, we demonstrated that both antibiotics generate and rely on ROS and ATP bursts to induce lethality in M. smegmatis. Careful monitoring of oxidative stress in cells, following specific quenching of the antibiotic-induced ATP burst (Figure 7, S9A-B), revealed that the ATP burst is the dominant driver of antibiotic lethality. In all tested experiments, surviving bacteria exhibited elevated levels of oxidative stress but were able to maintain their viability, suggesting that oxidative stress alone is not the dominant factor in antibiotic-induced lethality. Furthermore, quenching of ROS by glutathione also suppressed the antibiotic-induced surge in ATP levels, thus supporting the notion that ROS alone is not the dominant driver of antibiotic action as previously understood.

      All reported experiments were conducted using fast-growing M. smegmatis, and we have acknowledged the need for similar experiments in other bacterial systems, including M. tuberculosis, to assess whether our findings apply to them.

      Another point, the use of a mutant in the ATP synthase is an interesting idea, but would it be better to use something where you knock out the ATP synthase activity with siRNA or a temperature-sensitive allele?

      We appreciate the reviewer’s encouraging comment. Knocking out ATP synthase would completely halt oxidative phosphorylation and shut down aerobic respiration, leading to severe metabolic and growth defects. Such stressful and non-growing conditions are not suitable for testing the efficacy of antibiotics, as it is widely accepted that antibiotics are more effective against metabolically active bacteria.

      Lastly, the conclusion is that norfloxacin and streptomycin have common mechanisms of action, but the authors do not explain how a DNA gyrase inhibitor shows the same mechanisms of action as a ribosome inhibitor.

      The connection between antibiotic target corruption (DNA gyrase or ribosome) and the activation of respiration is indeed unclear, intriguing, and represents one of the most exciting questions in the field of antibiotic mechanisms of action. In the discussion section, we have speculated on potential pathways for this connection, including the possibility that the inhibition of cell division by both drugs may create a perception of resource scarcity (energy and biosynthetic precursors), which could subsequently trigger increased metabolism, respiration, ROS production, and ATP synthesis. However, the precise mechanisms underlying this connection require further investigation and are beyond the scope of the present study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Author Response

      Reviewer #1 (Public Review): 

      Weaknesses: 

      - Having demonstrated that NK cell IFNgamma is important for recruiting and activating DCs and T cells in their model, one is left to wonder whether it is important for the therapeutic effect, which was not tested. 

      We conducted a preliminary study to compare the pro-survival effect of WT NK and Ifng-/- NK cell therapies. We found that, in the 95-500 mg day-21 tumor group, the overall survival (OS) of mice receiving Ifng-/- NK cell therapy significantly decreased (p = 0.045) compared to mice receiving WT NK cell therapy up to 60 days after tumor inoculation, but there was no difference in OS beyond 65 days after tumor inoculation. Therefore, we have added the following sentences at the end of the second paragraph in our Discussion (Page 32):

      “However, although Ifng-/- NK cells induced less cDC activation compared to WT NK cells, the levels of CD86 on cDCs of mice that received Ifng-/- NK cells were higher than those of mice not subjected to NK cell transfer (Figure 4B). This outcome indicates the presence of IFN-g-independent or/and compensatory mechanism(s) for cDC activation by the transferred NK cells, which is in line with our preliminary result that Ifng-/- NK cell therapy does not significantly diminish the pro-survival effect in comparison to WT NK cell therapy beyond 60 days after tumor cell inoculation (data not shown).”

      - It was somewhat difficult to gauge the clinical trial results because the trial was early stage and therefore not controlled. Evaluation of the results therefore relies on historical comparisons. To evaluate how encouraging the results are, it would be valuable for the authors to provide some context on the prognoses and likely disease progression of these patients at the time of treatment. 

      We had already indicated in our Results that all six patients had an ECOG performance status of 0 (Page 25 and Table). We have now added in the Results that they had “a predicted survival of >3 months” (Page 25).

      Reviewer #1 (Recommendations For The Authors):

      Minor points: 

      (1) It would be helpful if the authors provided a rationale for why they derived their NK cell product from bone marrow cells instead of the more common source, spleen cells. 

      We have added the following clarification to the section “Ex vivo expansion of murine and human NK cells” in our Materials and Methods (Page 35): “We used BM cells instead of splenocytes for NK cell culture because removal of T cells from BM cells before culturing is not necessary”.

      (2) It would have been helpful to provide summary results from replicates of the cytokine production data shown in Figure 1F. 

      We have now added a graphical panel on the relative ΔMFI of two independent experiments to Figure 1F and revised the figure legend accordingly (Page 7—8).

      (3) The role of conventional CD4+ T cells is a little unclear. The authors state in the discussion that they contribute to the antitumor response, which is consistent with their finding that depleting both CD4 T cells and CD8 T cells has a greater effect than depleting CD8 T cells. Depleting CD4 T cells alone trended towards improving the response, however. Probably Tregs are the culprit in the latter effect but a sentence or two would be helpful if the claim for a protective role for CD4 T cells is to remain.  

      We have now re-analyzed the data of Figure 3D by separating mice into two groups according to day 21 tumor weight, i.e., 95-600 mg and >600 mg (Page 13—14). We have revised our explanation of the Figure 3D data in the Results (Page 11—12) as follows:

      “Accordingly, we examined the role of T cells in NK cell therapy by depleting T cell subsets with anti-CD4 or/and anti-CD8 antibodies two days before primary tumor resection (Figure 3D Schema and Figure 3-figure supplement 1). In the 95-600 mg tumor group, depletion of CD8+ cells alone or both CD4+ and CD8+ cells diminished the effect of NK cell therapy, whereas depletion of CD4+ cells alone did not affect OS (Figure 3D). This result indicates that CD8+ T cells are essential for the effect of NK cell therapy. In contrast, the >600 mg tumor group displayed a limited NK-cell treatment effect as expected, but did exhibit improved OS upon depleting CD4+ cells alone (Figure 3D). As the proportion of lung Foxp3+CD4+ T cells in CD45+ cells positively correlated with day 21 tumor weight (data not shown), depletion of Foxp3+CD4+ T cells by anti-CD4 antibody likely has a stronger effect in augmenting the immune response for the >600 mg tumor group than the 95-600 mg tumor group. Moreover, both tumor groups showed diminished OS upon depletion of both CD4+ and CD8+ cells than was the case for depletion of CD8+ cells alone, indicating a CD8+ T cell-independent anti-tumor effect of CD4+ T cells (Figure 3D).”

      (4) The schema in Figure 3E states that mice were inoculated with either EO771 tumor cells or B16F10 tumor cells, but it appears that the data only show EO771 tumor challenges. This should be corrected. 

      Corrected according to the reviewer’s comment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper reports fossil soft-tissue structures (tail vanes) of pterosaurs, and attempts to relate this to flight performance and other proposed functions for the tail.

      Strengths:

      The paper presents new evidence for soft-tissue strengthening of vanes using exciting new methods.

      We thank Reviewer #1 for the positive assessment of our work.

      Weaknesses:

      There seems to be no discussion of bias in the sample selection method - even a simple consideration of whether discarded specimens were likely not to have had the cross-linking lattice, or if it was not visible.

      There seems to be no supporting evidence or theory to show how the lattice could have functioned, other than a narrative description. Moreover, there is no comparison to extant organisms where a comparison of function might be drawn.

      We note these weaknesses and have addressed them as part of the consensus of suggested edits given below (‘first option’). We thank the reviewer for this feedback.

      Reviewer #2 (Public review):

      Summary:

      The authors have set out to investigate and explain how early members of the Pterosauria were able to maintain stiffness in the vane of their tails. This stiffness, it is said, was crucial for flight in early members of this clade. Through the use of Laser-Stimulated Fluorescence imaging, the authors have revealed that certain pterosaurs had a sophisticated dynamic tensioning system that has previously been unappreciated.

      Strengths:

      The choice of method of investigation for the key question is sound enough, and the execution of the same is excellent. Overall the paper is well written and well presented, and provides a very succinct, accessible and clear conclusion.

      We thank Reviewer #2 for their positive assessment of our work.

      Weaknesses:

      None

      We thank Reviewer #2 for their positive assessment of our work.

      Recommendations for the authors:

      The consensus between the reviewers and reviewing board is that this manuscript can be substantially strengthened and this can be achieved in two ways that are presented in order of preference.

      First option; resolve the following weaknesses:

      - Include a rigorous discussion of possible bias in the sample selection method with consideration of discarded specimens in relation to cross-linking lattice observation.

      - Include published biomechanics theory, supported by citations or a self-derived biomechanical model, to show how the lattice could have functioned biomechanically.

      - Discuss whether you found similar mechanisms in extant organisms for comparative functional interpretation.

      We thank the reviewers and reviewing board for taking the time to discuss the review and propose two consensus options for how to substantially strengthen the manuscript. We carefully considered both proposed options and decided to implement the first option in full. We have therefore made main text edits relating to all three points of the first option. The marked up article file shows exactly which parts of the text were edited in relation to the points.

      Second option; rewrite the manuscript so no mechanistic claims are made that are not supported by the information presented:

      - Accept the possibility of sampling bias and its limitation in the presentation of cross-linking lattice observation, outlining future work needed to address this.

      - Discuss the biomechanics theory that needs to be developed to show how the lattice could have functioned biomechanically, and remove unsupported speculation about this. It is acceptable to present a new hypothesis, clearly outline the motivation for the hypothesis and how it can be tested with future biomechanical and comparative studies. Remove all current speculative sections and phrasing accordingly and replace them with the framework supporting the idea of a new hypothesis.

      The first option was implemented instead of the second option.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Previous work has shown that the evolutionarily-conserved division-orienting protein LGN/Pins (vertebrates/flies) participates in division orientation across a variety of cell types, perhaps most importantly those that undergo asymmetric divisions. Micromere formation in echinoids relies on asymmetric cell division at the 16-cell stage, and these authors previously demonstrated a role for the LGN/Pins homolog AGS in that ACD process. Here they extend that work by investigating and exploiting the question of why echinoids but not other echinoderms form micromeres. Starting with a phylogenetics approach, they determine that much of the difference in ACD and micromere formation in echinoids can be attributed to differences in the AGS C-terminus, in particular a GoLoco domain (GL1) that is missing in most other echinoderms.

      Thank you for the summary.

      Strengths: 

      There is a lot to like about this paper. It represents a superlative match of the problem with the model system and the findings it reports are a valuable addition to the literature. It is also an impressively thorough study; the authors should be commended for using a combination of experimental approaches (and consequently generating a mountain of data). 

      Thank you.

      Weaknesses: 

      There is an intriguing finding described in Figure 1. AGS in sea cucumbers looks identical to AGS in the pencil urchin, at least at the C terminus (including the GL1 domain). Nevertheless, there are no micromeres in sea cucumbers. Therefore another mechanism besides GL motif organization has arisen to support micromere formation. It is a consequential finding and an important consideration in interpreting the data, but I could not find any mention of it in the text. That is a missed opportunity and should be remedied, ideally not only through discussion but also experimentation. Specifically: does sea cucumber AGS (SbAGS) ever localize to the vegetal cortex in sea cucumbers? Can it do so in echinoids? Will that support micromere formation? 

      Thank you for pointing this out. 

      To respond to the Reviewer’s request, we synthesized sea cucumber (Sb) AGS based on the sequence available in the database and tested it in the sea urchin (Sp) embryos, which is enclosed in Fig. S3. We performed this experiment to confirm that SbAGS localizes less at the vegetal cortex than SpAGS as a proof of principle. However, we hesitate to conduct further studies using the synthetic sequence in this study. Sea cucumbers are an emerging yet understudied model. This species is not readily available or established as a model system for embryology. Even for the two species (A. japonicus in Japan and P. parvimensis in the USA) that were previously used for embryonic studies, their gametes are typically available only for 1-2 months in a year. Since some echinoderm researchers are aiming to establish sea cucumbers as a model system in the near future (see 2024 review: PMID: 38368336), we hope to be able to have better access to their embryos in the future. Yet, it may require a few more years to reach that condition.

      In this revised manuscript, we explained the above details and further added the discussion described below. All of the experimental models used in this study are wild animals obtained from the ocean, raising the standard for reproducibility. However, handling wild animals could come with challenges. We hope that the reviewer understands the unique benefits and challenges of this study.

      Discussion:

      Previous studies (PMIDs: 17726110; 21855794) suggest that GL1 is not involved in intramolecular interaction with TPR domains. This allows GL1 to interact independently with Gαi for cortical recruitment yet without influencing other GLs for AGS activation. To ensure GL1's independence, GL1 is typically located distantly from other GLs in Pins (flies), LGN (humans), and AGS (sea urchins). Based on this prior knowledge, we speculate three scenarios for sea cucumber (Sb) AGS not being able to localize or function during asymmetric cell division (ACD): 1) GL1 and GL2 are located too close to each other, compromising GL1's independence for recruitment. 2) A lack of GL4 loosens the autoinhibition state. 3) The GL1 sequence of SbAGS is quite different from that of echinoids’ AGS (Figure S2), compromising its recruiting efficacy. 

      For 1), we tested this possibility by making the SpAGS-GL1GL2 mutant that has GL1 and GL2 next to each other (Fig. 4G). This mutant indeed compromised its cortical localization and function in ACD. For 2), we showed that the lack of GL4 partially compromised ACD in SpAGS (Fig. 3F), suggesting that GL4 supports ACD. For 3), the results in Figure 4 indicate that the position but not the sequence of GL1 is critical for ACD. Based on these observations, we speculate that a combination of 1) and 2) compromised SbAGS's ACD function. However, it is still possible that a significant difference in the GL1 sequence diminished its function as a GL entirely. Future studies should address these remaining questions directly in the sea cucumber embryos once they are established as a model system in the near future (PMID: 38368336).

      The authors point out that AGS-PmGL demonstrates enrichment at the vegetal cortex (arrow in 5G, quantifications in 5H), unlike PmAGS. AGS-PmGL does not however support ACD. They interpret this result to indicate "that other elements of SpAGS outside of its C-terminus can drive its vegetal cortical localization but not function." This is a critical finding and deserves more attention. Put succinctly: Vegetal cortical localization of AGS is insufficient to promote ACD, even in echinoids. Why should this be?  

      Thank you for the suggestion. We revised our wording to be more succinct. Of note, as we noted in the text, AGS-PmGL has only two GL domains, which will likely not provide the full force to control ACD and result in insufficient ACD function.

      The authors did perform experiments to address this problem, hypothesizing that the difference might be explained by the linker region, which includes a conserved phosphorylation site that mediates binding to Dlg. They write "To test if this serine is essential for SpAGS localization, we mutated it to alanine (AGS-S389A in Fig. S3A). Compared to the Full AGS control, the mutant AGS-S389A showed reduced vegetal cortical localization (Fig. S3B-C) and function (Fig. S3D-E). Furthermore, we replaced the linker region of PmAGS with that of SpAGS (PmAGSSpLinker in Fig. S4A-B). However, this mutant did not show any cortical localization nor proper function in ACD (Fig. S4C-F). Therefore, the SpAGS C-terminus is the primary element that drives ACD, while the linker region serves as the secondary element to help cortical localization of AGS." 

      The experiments performed only make sense if the AGS-PmGL chimeric protein used in Figure 5 starts the PmGL sequence only after the Sp linker, or at least after the Sp phosphorylation site. I can't tell from the paper (Figure S3 indicates that it does, whereas S5 suggests otherwise), but it's a critical piece of information for the argument. 

      Thank you for the pointer, and we apologize for the confusion. AGS-PmGL contains the SpAGS linker domain. To clarify this point, we added the amino acid position at the junction of each chimeric construct diagram in Figs. 5 and S4. To clarify, Figure S5 is about the GL domain mutations (not about the Linker).  

      Another piece of missing information is whether the PmAGS can be phosphorylated at its own conserved phosphorylation site. The authors don't test this, which they could at least try using a phosphosite prediction algorithm, but they do show that the candidate phosphorylation site has a slightly different sequence in Pm than in Et and Sp (Fig. S4A). With impressive rigor, the authors go on to mutate the PmAGS phosphorylation site to make it identical to Sp. Nothing happens. Vegetal cortical localization does not increase over AGS-PmGL alone. Micromere formation is unrescued. 

      There is therefore a logic problem in the text, or at least in the way the text is written. The paragraph begins "Additionally, AGS-PmGL unexpectedly showed cortical localization (Figure 5G), while PmAGS showed no cortical localization (Figure 5B)." We want to understand why this is true, but the explanation provided in the remainder of the paragraph doesn't match the question: according to quite a bit of their own data, the phosphorylation site in the linker does not explain the difference. It might explain why AGS-PmGL fails to promote micromere formation, but only if the AGS-PmGL chimeric protein uses the Pm linker domain (see above).

      Thank you for the insightful suggestion. As suggested, we performed the phosphosite predictions using GPS 6.0 (PMID: 37158278) and enclosed the results in Fig. S4A (replacing the old Fig. S3A). The software predicts that SpAGS and EtAGS have an AuroraA phosphorylation site (RRRSMEN in Supplemental figure S4A) in their linker domain, while PmAGS does not. Sp and Et AGS also have an additional 5-7 predicted phosphorylation sites, while PmAGS has only three sites with low scores. Therefore, the linker domain is not conserved in PmAGS.

      The PmAGS+SpLinker mutant does restore the predicted AuroraA phosphorylation site on the software, yet it does not restore the cortical localization or ACD function in the embryo. Therefore, other sites in the Linker region might also be necessary for cortical localization and ACD function of AGS. In this study, we did not perform further manipulations in the Linker domain. As the reviewer rightfully pointed out, even if we identify the Linker regions essential for AGS localization and function, it will be difficult to interpret the result unless we know what proteins interact with the Linker domain of AGS. Therefore, this is beyond the scope of the current manuscript. We discussed these remaining matters in the discussion section. 

      Another concern that is potentially related is the measurement of cortical signal. For example, in the control panel of Figure 5C, there is certainly a substantial amount of "non-cortical" signal that I believe is nuclear. I did not see a discussion of this signal or its implications. My impression of the pictures generally is that the nuclear signal and cortical signal are inversely correlated, which makes sense if they are derived from the same pool of total protein at different points of the cell cycle. If that's the case (and it might not be) I would expect some quantifications to be impacted. For example, the authors show in Figure S3B that AGS-S389A mutant does not localize to the cortex. However, this mutant shows a radically different localization pattern to the accompanying control picture (AGS), namely strong enrichment in what I assume to be the nucleus. Is the S389 mutant preventing AGS from making it to the cortex? Or are these pictures instead temporally distinct, meaning that AGS hasn't yet made it out of the nucleus? Notably, the work of Johnston et al. (Cell 2009), cited in the text, does not show or claim that the linker domain impacts Pins localization. Their model is rather that Pins is anchored at the cortex by Gαi, not Dlg, and that is the same model described in this manuscript.

      In agreement with that model and the results of Johnston et al., a later study (Neville et al. EMBO Reports 2023) failed to find a role for Dlg or the conserved phosphorylation site in Pins localization. 

      In the sea urchin embryo, the dye or GFP often appears in the nucleus randomly on top of the cytoplasm (for example, see Fig. S2b of PMID: 35444184). Further, embryos tend to incorporate exogenous genomic fragments more efficiently during early embryogenesis (PMID: 3165895). It is proposed that early embryos may have a loosened or incomplete nuclear envelope compared to adult cells as they divide rapidly (every 40 minutes). Therefore, any excess protein with no specific localization signal may randomly appear in the nucleus as it serves as an available space in the cell. As the Reviewer rightfully pointed out, we consider that the nuclear AGS signal is due to the lack of a specific destination since this signal pattern is not consistent across embryos. In contrast, the proteins that have nuclear localization (e.g., transcription factors) usually show a consistent nuclear signal across cells and embryos with less cytoplasmic signal. To avoid confusion, we replaced the S389A image in Fig. S3B (which is now Fig. S4C) as well as any other images that may create similar confusion.

      Reviewer #2 (Public Review): 

      This study from Dr. Emura and colleagues addresses the relevance of AGS3 mutations in the execution of asymmetric cell divisions promoting the formation of the micromere during sea urchin development. To this aim, the authors use quantitative imaging approaches to evaluate the localisation of AGS3 mutants truncated at the N-terminal region or at the C-terminal region, and correlate these distributions with the formation of micromere and correct development of embryos to the pluteus stage. The authors also analyse the capacity of these mutated proteins to rescue developmental defects observed upon AGS3 depletion by morpholino antisense nucleotides (MO). Collectively these experiments revealed that the C-terminus of AGS3, coding for four GoLoco motifs binding to cortical Galphai proteins, is the molecular determinant for cortical localisation of AGS3 at the micromeres and correct pluteus development. Further genetic dissections and expression of chimeric AGS3 mutants carrying shuffled copies of the GoLoco motifs or four copies of the same motifs revealed that the position of GoLoco1 is essential for AGS3 functioning. To understand whether the AGS3-GoLoco1 evolved specifically to promote asymmetric cell divisions, the authors analyse chimeric AGS3 variants in which they replaced the sea urchin GoLoco region with orthologs from other echinoids that do not form micromeres, or from Drosophila Pins or human LGN. These analyses corroborate the notion that the GoLoco1 position is crucial for asymmetric AGS3 functions. In the last part of the manuscript, the authors explore whether SpAGS3 interacts with the molecular machinery described to promote asymmetric cell division in eukaryotes, including Insc, NuMA, Par3, and Galphai, and show that all these proteins colocalize at the nascent micromere, together with the fate determinant Vasa.
      Collectively this evidence highlighted how evolutionarily selected AGS3 modifications are essential to sustain asymmetric divisions and specific developmental programs associated with them.

      Thank you for the useful summary.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The quantifications of "vegetal cortical localization" are somewhat incomplete. As measured, "vegetal cortical localization" does not demonstrate particular enrichment at the vegetal cortex, only that some signal appears there. In other words, we can't tell for sure that there is any more signal at the vegetal cortex than anywhere else along the cortex, and in fact that's plainly true and even described for the AGS1111 and AGS2222 constructs. One solution would be to measure signal strength around the cell perimeter and see where it is strongest.

      As suggested by the Reviewer, we added new measurements, focusing on and comparing the signals on the animal versus vegetal cortices (Figs. 2C, 3D, 4C, 5C & H, 9D & F, S3D, S4D & I).
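
      One way to express an animal-versus-vegetal comparison of this kind is sketched below. It is an illustration under stated assumptions rather than the authors' measurement protocol: the perimeter profile, the angle convention (0 degrees at the animal pole, 180 degrees at the vegetal pole), the 45-degree arc half-width, and the function names are all hypothetical.

```python
from statistics import mean


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def vegetal_to_animal_ratio(profile, half_width_deg=45.0):
    """Ratio of mean cortical signal near the vegetal pole to that near the animal pole.

    `profile` is a list of (angle_deg, intensity) samples taken along the
    cell perimeter, with 0 = animal pole and 180 = vegetal pole (assumed).
    """
    animal = [i for ang, i in profile if angular_distance(ang, 0.0) <= half_width_deg]
    vegetal = [i for ang, i in profile if angular_distance(ang, 180.0) <= half_width_deg]
    if not animal or not vegetal:
        raise ValueError("profile has no samples within one of the polar arcs")
    return mean(vegetal) / mean(animal)


# Hypothetical profile sampled every 10 degrees: uniform background of 10,
# with a threefold brighter signal within 45 degrees of the vegetal pole.
profile = [
    (ang, 30 if angular_distance(ang, 180.0) <= 45.0 else 10)
    for ang in range(0, 360, 10)
]
ratio = vegetal_to_animal_ratio(profile)
print(ratio)
```

      A ratio near 1 would correspond to uniform cortical signal, whereas values well above 1 would indicate vegetal enrichment. Because only perimeter samples enter the calculation, nuclear signal does not affect the result.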

      A related issue is that the strength of cortical enrichment is indicated in this paper by the ratio of cortical to "non-cortical" signal, but "non-cortical" is not defined. Does it include the nuclear signal? 

      As described above, we replaced all measurements using the above animal vs. vegetal cortices to avoid confusion. The nuclear signal is thus not measured in these analyses.

      I'm enthusiastic about the results in Figure 7, but I can't really see them very well. Could you please consider changing the color scheme? For single-color figures, it would be helpful to view them as black on white rather than (for example) blue on black. That change is easily achieved with Fiji. 

      We revised the Figure as suggested.

      Page 3 Results section: "At the time of ACD, Insc recruits Pins/LGN to the cortex through Gαi": I understand this sentence to mean that Gαi is an intermediary protein that Insc uses to recruit Pins/LGN. I think the point should be made more clear. As shown in Figure 1, Insc binds to Pins/LGN directly and interacts with cortical polarity proteins directly. Recruitment therefore doesn't appear to require Gαi, but stable association with the membrane (a subsequent step) probably does. That model is shown and described in Figure 6A.

      Thank you for the pointer. We clarified our explanations as suggested.

      Reviewer #2 (Recommendations For The Authors): 

      The manuscript addresses an interesting question, and uses elegant genetic approaches associated with imaging analyses to elucidate the molecular mechanisms whereby AGS3 and spindle orientation proteins promote asymmetric divisions and specific developmental programs. This considered, it might be worth clarifying a few aspects of the reported findings. 

      (1) In some experimental settings, the presence of AGS3 mutants exacerbates the AGS3 deletion by MO (Figure 4F). Can the author speculate on what can be the molecular explanation? 

      Thank you for pointing this out. We speculate that AGS1111 and AGS2222 are unable to maintain the auto-inhibited form since they lack GL3 and GL4, as modeled in Figure 6. AGS-MO reduces the endogenous AGS, which compromises the vegetal polarity. In this embryo, constitutively active AGS likely further randomizes the polarity, as evidenced by AGS-OE results in Fig. S7, resulting in an even worse outcome. We elaborated on this part in the text.

      (2) Imaging analyses of Figure 4B-C suggest that the mutant AGS1111 does not localise at the vegetal cortex while AGS2222 does (Fig. 4C). However these mutants induce similar developmental defects (Figure 4F). What could be the reason? 

      We apologize for the confusion in Fig. 4C. The majority of embryos from both AGS1111 and 2222 groups failed to form micromeres and showed AGS localization across the cortex. Among the dozens we examined, 0 embryos from 1111 and 8 embryos from 2222 developed micromeres. Those 8 embryos still showed vegetal cortical localization, so the proportion appears high in Fig. 4B, yet it reflects the minority in the group. In contrast, development was scored for all embryos (including those that failed to form micromeres), so the graph reflects the majority of embryos. To avoid this confusion, we replaced the old Fig. 4C with a new graph that analyzes the cortical signal levels at the vegetal versus animal cortices.

      (3) Figure 7 shows the crosstalk between AGS3 and other asymmetry players including NuMA. Vertebrate and Drosophila NuMA are ubiquitously present in tissues and localise to the spindle poles in mitosis. However, in Figures 7A and 7E NuMA seems expressed only in a subset of sea urchin embryonic cells. Is this the case? 

      As the Reviewer rightfully pointed out, sea urchin NuMA is also present in all cells and localizes to the spindle (please see Fig. 2 of our previous paper PMID: 31439829). AGS is also slightly localized on the spindles of all cells. However, the PLA signal of AGS and NuMA mostly showed up in the vegetal cortex in this study, suggesting that major crosstalk may occur in the vegetal cortex. This does not rule out the possibility that minor interactions may also occur on the spindle or elsewhere in the cell, which was not quantifiable in this study. We clarified this point in the text.

    1. Author response:

      Reviewer #1 (Recommendations for the authors):

      (1) Storyline and Narrative Flow:

      Consider revising the manuscript to create a more coherent and consistent narrative. Clarify how each section of the study-particularly the transition from multi-omics data integration to single-cell RNA-seq validation-contributes to the overall research question. This will help readers better understand the logical flow of the study.

      In the upcoming revisions, we will optimize the logical connections between sections of the manuscript to clarify the role each part plays in the overall research question, making it easier for readers to follow.

      (2) Immune Cell Activity Analysis:

      Reevaluate the methods used to assess immune cell activities within the context of the tumor microenvironment. Consider providing additional justification for the relevance of using the cancer cell model for this analysis. If necessary, explore alternative methods or models that might offer more meaningful insights into immune-tumor interactions.

      We fully recognize the importance of using tumor models to analyze and validate immune activity results, and we are considering experimental research in this area in future projects.

      (3) Single-Cell RNA-Seq Validation:

      Expand the validation of your findings using single-cell RNA-seq data. This could include more in-depth analyses that explore the heterogeneity within the subtypes and confirm the robustness of your classification method at the single-cell level. This would strengthen the support for your claims about the relevance of the identified subtypes.

      In the current study, we have applied the obtained multi-omics profiling features to single-cell sequencing data to classify malignant cells. We analyzed the metabolic and cell communication differences between different subtypes of malignant cells and explored potential reasons for these differences. Next, we plan to conduct further analysis of the differences between malignant cell subtypes to identify additional clues and mechanisms underlying these variations.

      (4) Methodological Justification:

      Provide a more detailed rationale for the selection of machine learning algorithms and integration strategies used in the study. Explain why the chosen methods are particularly well-suited for this research, and discuss any potential limitations they might have.

      In the revised manuscript, we will include descriptions of the principles of these analytical methods, as well as examples of their application in other studies, to discuss the rationale and limitations of applying these methods in this research.

      (5) Figures and Visualizations:

      Improve the clarity of your figures by addressing the following:

      a) Figure 3A: Cluster the pathways to make the comparisons clearer and more meaningful.

      b) Figure 4A: Clearly explain the significance of the blue bar.

      c) Figure 4B: Ensure this figure is discussed in the main text to justify its inclusion.

      d) Figure 7C: Enhance the figure legend to provide more informative details.

      Additionally, ensure that figure descriptions go beyond the captions and provide detailed explanations that help the reader understand the significance of each figure.

      We fully agree with the reviewer’s suggestions regarding these figures, and we will make the necessary revisions in the revised manuscript.

      (6) Supplementary Materials:

      Consider including more detailed supplementary materials that provide additional validation data, extended methodological descriptions, and any other information that would support the robustness of your findings.

      When we submit the revised manuscript, we will include supplementary materials, such as figures or tables, that improve the completeness of the manuscript.

      (7) Recent Literature:

      a) Incorporate more recent studies in your discussion, especially those related to HCC subtypes and the application of machine learning in oncology. This will provide a more current context for your work and help position your findings within the broader field.

      We appreciate the reviewer's suggestion. We will incorporate more recent studies into the discussion section and optimize its content.

      (8) Data and Code Availability:

      Ensure that all data, code, and materials used in your study are made available in line with eLife's policies. Provide clear links to repositories where readers can access the data and code used in your analyses.

      We have indicated the sources of the data and tools used in the analysis process within the text, and these data and tools can be accessed through the websites or literature we have cited.

      Reviewer #2 (Recommendations for the authors):

      (1) While the computational findings are robust, further experimental validation of the two subtypes, particularly the role of the MIF signaling pathway, would strengthen the biological relevance of the findings. In vitro or in vivo validation could confirm the proposed mechanisms and their influence on patient prognosis.

      We fully recognize the importance of using tumor models to analyze and validate immune activity results, and we are considering experimental research in this area in future projects.

      (2) Consider testing the model on additional independent cohorts beyond the TCGA and ICGC datasets to further demonstrate its generalizability and applicability across different patient populations.

      We are considering looking for independent external datasets in the GEO database or other databases to validate our model.

      (3) Review the manuscript for long or complex sentences, which can be broken down into shorter, more readable parts.

      In the revised manuscript, we will address any grammatical issues present in the manuscript and modify long and complex sentences that may hinder reader comprehension.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Debeuf et al. introduce a new, fast method for the selection of suitable T cell clones to generate TCR transgenic mice, a method claimed to outperform traditional hybridoma-based approaches. Clone selection is based on the assessment of the expansion and phenotype of cells specific for a known epitope following immune stimulation. The analysis is facilitated by a new software tool for TCR repertoire and function analysis termed DALI. This work also introduces a potentially invaluable TCR transgenic mouse line specific for SARS-CoV-2.

      Strengths:

      The newly introduced method proved successful in the quick generation of a TCR transgenic mouse line. Clone selection is based on more comprehensive phenotypical information than traditional methods, providing the opportunity for a more rational T cell clone selection.

      The study provides a software tool for TCR repertoire analysis and its linkage with function.

      The findings entail general practical implications in the preclinical study of a potentially very broad range of infectious diseases or vaccination.

      A novel SARS-CoV-2 spike-specific TCR transgenic mouse line was generated.

      Weaknesses:

      The authors attempt to compare their novel method with a more conventional approach to developing TCR transgenic mice. In this reviewer's opinion, this comparison appears imperfect in several ways:

      (1) Work presenting the "traditional" method was inadequate to justify the selection of a suitable clone. It is therefore not surprising that it yielded negative results. More evidence would have been necessary to select clone 47 for further development of the TCR transgenic line, especially considering the significant time and investment required to create such a line.

      Based on Supplementary Figure 1A only, we understand the concern of the reviewer. However, the data presented in Supplementary Figure 1A were collected during the first rough screening of clones, where only the production of IL-2 and IFN-γ was measured as a readout for activation. Thereafter, a large selection of responsive clones was further grown and co-cultured with a dose-titration of the antigenic peptide pool. In this second co-culture, flow cytometry readouts such as CD69 expression were also included (as shown in Supplementary Figure 1B). Finally, a narrower selection of responder clones was co-cultured with the different individual peptides to unravel the specificity of the TCR of the clone. In conclusion, the clone was tested at least three times in three distinct set-ups with multiple different readouts.

      However, a good evaluation of a clone in an in vitro setting does not necessarily translate into optimal functioning of the cells in a biological context. For instance, some clones survive better in an in vitro setting than others or already have a more activated profile before stimulation.

      (2) The comparison is somewhat unfair, because the methods start at different points: while the traditional method was attempted using a pool of peptides whose immunogenicity does not appear to have been established, the new method starts by utilising tetramers to select T cells specific for a well-established epitope.

      Given the costs and time involved, only a single clone could be tested for either method, intrinsically making a proper comparison unfeasible. Even for their new method, the authors' ability to demonstrate that the selected clone is ideal is limited unless they made different clones with varying profiles to show that a particular profile was superior to others.

      In my view, there was no absolute need to compare this method with existing ones, as the proposed method holds intrinsic value.

      We acknowledge the importance of the well-established hybridoma technology and in no way intended to compare these methods head-to-head, nor do we want to question the validity of the classical methods. The reason why we also wanted to show the failed CORSET8 mouse was to highlight the parts of the TCR generating process which could be rationalized. We again want to emphasize that we do not want to compare methods in any way and recognise that we started from two different bases in terms of clone selection (peptide pool stimulation versus tetramer staining). While the tetramer staining that was employed in the generation of CORSET8 mice allowed us to enrich the samples for specific responder clones, this enrichment step is not an absolute requirement for the implementation of the presented method or for the successful generation of a TCR Tg mouse model. An alternative approach could be to use the described method to select for activated and expanded clones upon immunisation and test their reactivity in subsequent steps using peptide stimulation before selecting a receptor. In conclusion, we merely wish to present a novel roadmap for others to use for the generation of their TCR Tg mouse to aid in the selection of the most preferable clone for their purposes.

      (3) While having more data to decide on clone selection is certainly beneficial, given the additional cost, it remains unclear whether knowing the expression profiles of different proteins in Figure 2 aids in selecting a candidate. Is a cell expressing more CD69 preferable to a cell expressing less of this marker? Would either have been effective? Are there any transcriptional differences between clonotype 1 and 2 (red colour in Figure 2G) that justify selecting clone 1, or was the decision to select the latter merely based on their different frequency? If all major clones (i.e. by clonotype count) present similar expression profiles, would it have been necessary to know much more about their expression profiles? Would TCR sequencing and an enumeration of clones have sufficed, and been a more cost-effective approach?

      The method we present in the paper serves as a proof-of-concept, to be adapted to the researcher’s own needs. We agree with the reviewer that for our intentions with the CORSET8 mice, TCRseq in combination with an enumeration of the clones could also have sufficed and would lower the cost of sequencing. However, we wish to present a roadmap for others to use for the generation of their TCR Tg mouse. Importantly, the cellular phenotype and activation state can be taken into consideration, which might be essential for some projects.

      Nonetheless, we do see clear interclonal differences regarding the expression of “activation” genes, where clone 1 is clearly one of the well-activated and interferon-producing clones (as shown in Author response image 1). As such, researchers could expand these types of analysis to probe for specific phenotypes or characteristics.

      Author response image 1.
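
      The "enumeration of the clones" mentioned in the reply to point (3) can be sketched as follows. This is a minimal illustration, not the DALI implementation or the authors' workflow. It assumes, hypothetically, that each sequenced cell has already been assigned a clonotype key (here a pair of placeholder labels standing in for the paired alpha and beta chain sequences); the function name `rank_clonotypes` is invented for this example.

```python
from collections import Counter


def rank_clonotypes(cell_clonotypes):
    """Rank clonotypes by the number of cells carrying them.

    cell_clonotypes: iterable with one hashable clonotype key per cell
    (assumed here to be an (alpha_chain, beta_chain) tuple).
    Returns a list of (clonotype, cell_count, fraction_of_cells),
    largest clone first.
    """
    counts = Counter(cell_clonotypes)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(clone, n, n / total) for clone, n in counts.most_common()]


# Placeholder labels, not real sequences.
cells = [("alpha_1", "beta_1")] * 3 + [("alpha_2", "beta_2")]
for clone, n, frac in rank_clonotypes(cells):
    print(clone, n, f"{frac:.2f}")
```

      Ranking by clone size alone corresponds to the cheaper TCRseq-only option raised by the reviewer; joining such a table to per-cell expression data is what would add the phenotypic layer the authors describe.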

      (4) Lastly, it appears that several of the experiments presented were conducted only once. This information should have been explicitly stated in the figure legends.

      To control for interexperimental variation, every experiment represented in the manuscript has been performed at least two times. We have added the additional information regarding the experimental repetitions and groups in the figure legends.

      Reviewer #2 (Public Review):

      Summary:

      The authors seek to use single-cell sequencing approaches to identify TCRs specific for the SARS CoV2 spike protein, select a candidate TCR for cloning, and use it to construct a TCR transgenic mouse. The argument is that this process is less cumbersome than the classical approach, which involves the identification of antigen-reactive T cells in vitro and the construction of T cell hybridomas prior to TCR cloning. TCRs identified by single-cell sequencing that are already paired to transcriptomic data would more rapidly identify TCRs that are likely to contribute to a functional response. The authors successfully identify TCRs that have expanded in response to SARS CoV2 spike protein immunization, bind to MHC tetramers, and express genes associated with functional response. They then select a TCR for cloning and construction of a transgenic mouse in order to test the response of resulting T cells in vivo following immunization with spike protein or coronavirus infection.

      Strengths:

      (1) The study provides proof of principle for the identification and characterization of TCRs based on single-cell sequencing data.

      (2) The authors employ a recently developed software tool (DALI) that assists in linking transcriptomic data to individual clones.

      (3) The authors successfully generate a TCR transgenic animal derived from the most promising T cell clone (CORSET8) using the TCR sequencing approach.

      (4) The authors provide initial evidence that CORSET8 T cells undergo activation and proliferation in vivo in response to immunization or infection.

      (5) Procedures are well-described and readily reproducible.

      Weaknesses:

      (1) The purpose of presenting a failed attempt to generate TCR transgenic mice using a traditional TCR hybridoma method is unclear. The reasons for the failure are uncertain, and the inclusion of this data does not really provide information on the likely success rate of the hybridoma vs single cell approach for TCR identification, as only a single example is provided for either.

      We refer to comments 2 and 3 of reviewer 1 for an answer to this point.

      (2) There is little information provided regarding the functional differentiation of the CORSET8 T cells following challenge in vivo, including expression of molecules associated with effector function, cytokine production, killing activity, and formation of memory. The study would be strengthened by some evidence that CORSET8 T cells are successfully recapitulating the functional features of the endogenous immune response (beyond simply proliferating and expressing CD44). This information is important to evaluate whether the presented sequencing-based identification and selection of TCRs is likely to result in T-cell responses that replicate the criteria for selecting the TCR in the first place.

      We agree with the reviewer that the data in the initial manuscript included only a limited in vivo functional validation of the CORSET8 T cells. Therefore, we extended these in vivo readouts and measured IFN-g production, CD69 and T-bet expression (as measures of activation) and Ki-67 expression (as an alternative readout to CTV for proliferation). In the single cell data, we saw that these markers were more pronounced in the selected clone compared to other clones. We could confirm these findings in vivo, and found a stronger induction of IFN-g, CD69, T-bet and Ki-67 in CORSET8 T cells compared to endogenous CD45.2 cells and even Spike-Tetramer+ CD45.2 endogenous cells. We added these data in Figure 4.

      (3) While I find the argument reasonable that the approach presented here has a lot of likely advantages over traditional approaches for generating TCR transgenic animals, the use of TCR sequencing data to identify TCRs for study in a variety of areas, including cancer immunotherapy and autoimmunity, is in broad use. While much of this work opts for alternative methods of TCR expression in primary T cells (i.e. CRISPR or retroviral approaches), the process of generating a TCR transgenic mouse from a cloned TCR is not in itself novel. It would be helpful if the authors could provide a more extensive discussion explaining the novelty of their approach for TCR identification in comparison to other more modern approaches, rather than only hybridoma generation.

      By integrating the recent technological advances in single cell sequencing into the generation of TCR Tg mice, possibilities arise to rationalize clone selection regarding clonal size, lineage/phenotype and functional characteristics. Often, the selection process based on hybridoma selection yields multiple epitope-specific clones that upregulate CD69 or IL-2, and only minimal functional and phenotypic parameters are checked before prioritizing one clone to proceed with. In our experience, transgenic clones selected in this way sometimes render TCR clones unable to compete with endogenous polyclonal T cell clones in vivo. Taking all these caveats into account, the novelty we present here is that the researcher is fully able to select clones based on several layers of information without the need for extensive or repeated screening. Moreover, the selection of the TCR Tg clone can be done via the interactive and easily interpretable DALI tool. Owing to the browser-based interactive GUI, immunologists with limited coding experience can effectively analyse their complex datasets.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Regarding Supplementary Figure 1A, was the experiment conducted more than once? Clone 47 seems minimally superior to the other clones. Incorporating a positive control, such as the response of the OT-I hybridoma to SIINFEKL, could have provided a benchmark to gauge the strength of the observed responses.

      Also, what was the concentration of the peptide used to restimulate the T cells in vitro? High peptide concentrations can lead to non-specific responses. Ideally, a titration should have been performed, perhaps in a subsequent experiment that only tested those clones that responded well initially. Given the resources required to create and maintain a transgenic mouse line, proceeding with the chosen clone based on the data presented seems to carry considerable risk.

      The experiment has been performed three times. The data presented in Supplementary Figure 1A were collected during the first rough screening of clones, where only the production of IL-2 and IFN-γ was measured as a readout for activation. Thereafter, a large selection of responsive clones was further grown and co-cultured with a dose-titration of the antigenic peptide pool. In this second co-culture, flow cytometry readouts such as CD69 expression were also included (as shown in Supplementary Figure 1B). Finally, a narrower selection of responder clones was co-cultured with the different individual peptides to unravel the specificity of the TCR of the clone. In conclusion, the clone was tested at least three times in three distinct set-ups with multiple different readouts.

      In Supplementary Figure 1C, no response to stimulation was detected. Ideally, this figure should have included a positive control, such as PMA/Ionomycin or aCD3/CD28 stimulation.

      We agree with the reviewer that this experiment should have included a positive control to validate the non-specific responsiveness of the clone and the technical feasibility of the experiment. Unfortunately, the initial CORSET8 line is frozen and is thus not easily available to repeat the experiment.

      Can the authors clarify their gating strategy in the legend of Supplementary Figure 1D?

      Plotted cells are non-debris > single cells > viable cells > CD45+. We have added the information to the legend of Supplementary Figure 1D.

      In Figure 2, the figure legend should provide more detail on which cells were sorted for the single-cell RNA sequencing analysis. The materials and methods section explains that cells were stained for CD44. Were activated cells then sorted (either tetramer-positive or -negative), plus naïve CD8 T cells from a naïve mouse?

      Supplementary Figure 2 contains the detailed gating strategy during the sort for the single cell experiment. We have added additional red gates to the plots to clarify which samples were sent for sequencing. This has been adapted in the figure legends of both Figure 2 and Supplementary Figure 2. 

      In Figure 3, Rag1 sufficient transgenic mice display similar numbers of CD4 and CD8 T cells as WT mice in the spleen. Typically, transgenic mice present skewed frequencies of T cells towards the type generated (CD8 in this case), which the authors only found in the thymus of CORSET8 mice. Could this be discussed?

      The comment of the reviewer is valid as there is indeed a skewing towards CD8 T cells in the thymi of the CORSET8 mice. We looked back into the data of the experiments and noticed that poor resolution of some markers might have resulted in improper results. We have repeated this and added another T cell marker (TCRbeta) next to the already included CD3e marker. By including both markers, we were able to show that also in spleen the skewing towards the CD8 T cell phenotype is present.

      How many repetitions were performed for the experiments in Figures 3D and 3E? How many mice were analyzed for Figure 3E? Please provide this information in the figure legend. Also, include a proper quantification and statistical analysis of the data shown.

      New quantification graphs with statistical analysis have been added to Figure 3E. The accompanying figure legend has been adapted. The co-culture displayed in Figure 3D is a representative experiment of two repetitions.

      Figure 4C includes 3-4 mice per group. This experiment should have been replicated, and this information should be indicated in the figure legend.

      We apologise for omitting this data in the figure legend. The experiment presented in Figure 4A-C has been repeated twice, yielding results following the same trend. We were unable to pool the data as two different proliferation dyes were used in the separate experiments (CFSE and CTV). Furthermore, in the in vivo BSL3 experiments represented in figure 4E-H, we always took along the Spike/CpG-group as positive control. We have added the additional information regarding the experimental repetitions and groups in the figure legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Review #1:

      Summary:

      Jin et al. investigated how the bacterial DNA damage (SOS) response and its regulator protein RecA affect the development of drug resistance under short-term exposure to beta-lactam antibiotics. Canonically, the SOS response is triggered by DNA damage, which results in the induction of error-prone DNA repair mechanisms. These error-prone repair pathways can increase mutagenesis in the cell, leading to the evolution of drug resistance. Thus, inhibiting the SOS regulator RecA has been proposed as a means to delay the rise of resistance. 

      In this paper, the authors deleted the RecA protein from E. coli and exposed this ∆recA strain to selective levels of the beta-lactam antibiotic, ampicillin. After an 8-hour treatment, they washed the antibiotic away and allowed the surviving cells to recover in regular media. They then measured the minimum inhibitory concentration (MIC) of ampicillin against these treated strains. They note that after just 8-hour treatment with ampicillin, the ∆recA had developed higher MICs towards ampicillin, while by contrast, wild-type cells exhibited unchanged MICs. This MIC increase was also observed in subsequent generations of bacteria, suggesting that the phenotype is driven by a genetic change.

      The authors then used whole genome sequencing (WGS) to identify mutations that accounted for the resistance phenotype. Within resistant populations, they discovered key mutations in the promoter region of the beta-lactamase gene, ampC; in the penicillin-binding protein PBP3 which is the target of ampicillin; and in the AcrB subunit of the AcrAB-TolC efflux machinery. Importantly, mutations in the efflux machinery can impact the resistance towards other antibiotics, not just beta-lactams. To test this, they repeated the MIC experiments with other classes of antibiotics, including kanamycin, chloramphenicol, and rifampicin. Interestingly, they observed that the ∆recA strains pre-treated with ampicillin showed higher MICs towards all other antibiotics tested. This suggests that the mutations conferring resistance to ampicillin are also increasing resistance to other antibiotics.

      The authors then performed an impressive series of genetic, microscopy, and transcriptomic experiments to show that this increase in resistance is not driven by the SOS response, but by independent DNA repair and stress response pathways. Specifically, they show that deletion of the recA reduces the bacterium's ability to process reactive oxygen species (ROS) and repair its DNA. These factors drive the accumulation of mutations that can confer resistance to different classes of antibiotics. The conclusions are reasonably well-supported by the data, but some aspects of the data and the model need to be clarified and extended.

      We sincerely appreciate your overall summary of the manuscript and your positive evaluation of our work.

      Strengths:

      A major strength of the paper is the detailed bacterial genetics and transcriptomics that the authors performed to elucidate the molecular pathways responsible for this increased resistance. They systemically deleted or inactivated genes involved in the SOS response in E. coli. They then subjected these mutants to the same MIC assays as described previously. Surprisingly, none of the other SOS gene deletions resulted in an increase in drug resistance, suggesting that the SOS response is not involved in this phenotype. This led the authors to focus on the localization of DNA PolI, which also participates in DNA damage repair. Using microscopy, they discovered that in the RecA deletion background, PolI co-localizes with the bacterial chromosome at much lower rates than wild-type. This led the authors to conclude that deletion of RecA hinders PolI and DNA repair. Although the authors do not provide a mechanism, this observation is nonetheless valuable for the field and can stimulate further investigations in the future.

      In order to understand how RecA deletion affects cellular physiology, the authors performed RNA-seq on ampicillin-treated strains. Crucially, they discovered that in the RecA deletion strain, genes associated with antioxidative activity (cysJ, cysI, cysH, sodA, sufD) and Base Excision Repair (mutH, mutY, mutM), which repairs oxidized forms of guanine, were all downregulated. The authors conclude that down-regulation of these genes might result in elevated levels of reactive oxygen species in the cells, which in turn, might drive the rise of resistance. Experimentally, they further demonstrated that treating the ∆recA strain with an antioxidant GSH prevents the rise of MICs. These observations will be useful for more detailed mechanistic follow-ups in the future.

      We are grateful to you for your positive assessment of the strengths of our manuscript and your recognition of its potential future applications.

      Weaknesses:

      Throughout the paper, the authors use language suggesting that ampicillin treatment of the ∆recA strain induces higher levels of mutagenesis inside the cells, leading to the rapid rise of resistance mutations. However, as the authors note, the mutants enriched by ampicillin selection can play a role in efflux and can thus change a bacterium's sensitivity to a wide range of antibiotics, in what is known as cross-resistance. The current data is not clear on whether the elevated "mutagenesis" is driven by ampicillin selection or by a bona fide increase in mutation rate.

      We greatly appreciate your raising this issue, as it is an important premise that must be clearly stated throughout the entire manuscript. To verify that the observed increase in mutation rate is a bona fide increase and not due to experimental error, we used a non-selective antibiotic, rifampicin, to evaluate the mutation frequency after drug induction, as it is a gold-standard method documented in other studies [Heterogeneity in efflux pump expression predisposes antibiotic-resistant cells to mutation, Science, 362, 6415, 686-690, 2018.]. In the absence of ampicillin treatment, the natural mutation rates detected using rifampicin were consistent between the wild-type and the ΔrecA strain. However, after ampicillin treatment, the mutation rate detected using rifampicin was significantly elevated only in the ΔrecA strain (Fig. 1G). We also employed other antibiotics, such as ciprofloxacin and chloramphenicol, in our experiments to treat the cells (data not shown). However, we observed that only beta-lactam antibiotics induced the emergence of resistance or altered the MIC in our bacterial populations. If resistance had pre-existed before antibiotic exposure, rather than arising from a bona fide increase in mutation rate, we would expect other antibiotics to exhibit a similar selective effect, particularly given the potential for cross-resistance to multiple antibiotics.
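
      The frequency calculation referred to above can be sketched as follows. This is only a minimal illustration, assuming that mutation frequency is taken as the ratio of rifampicin-resistant colony-forming units (CFU) to total viable CFU, each corrected for dilution and plated volume. The function names and all counts are hypothetical and are not data from the study.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Convert a plate count to CFU/mL of the undiluted culture."""
    if plated_volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    return colonies * dilution_factor / plated_volume_ml


def mutation_frequency(resistant_cfu_per_ml, total_cfu_per_ml):
    """Fraction of viable cells that form colonies on the selective plate."""
    if total_cfu_per_ml <= 0:
        raise ValueError("total viable count must be positive")
    return resistant_cfu_per_ml / total_cfu_per_ml


# Hypothetical counts (assumption, for illustration only):
# 20 colonies on a rifampicin plate from 0.1 mL of undiluted culture,
# 100 colonies on a non-selective plate from 0.1 mL of a 10^6-fold dilution.
resistant = cfu_per_ml(colonies=20, dilution_factor=1, plated_volume_ml=0.1)
total = cfu_per_ml(colonies=100, dilution_factor=1e6, plated_volume_ml=0.1)
freq = mutation_frequency(resistant, total)
print(f"mutation frequency = {freq:.1e}")
```

      Because the denominator is the viable count of the same culture, a frequency computed this way reflects the proportion of resistant cells at the time of plating and does not by itself distinguish new mutations from enrichment of existing ones.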

      Furthermore, on a technical level, the authors employed WGS to identify resistance mutations in the ampicillin-treated wild-type and ∆recA strains. However, the WGS methodology described in the paper is inconsistent. Notably, wild-type WGS samples were picked from non-selective plates, while ΔrecA WGS isolates were picked from selective plates with 50 μg/mL ampicillin. Such an approach biases the frequency and identity of the mutations seen in the WGS and cannot be used to support the idea that ampicillin treatment induces higher levels of mutagenesis.

      We appreciate your concern regarding potential inconsistencies in the WGS methodology. However, we would like to clarify that the primary aim of the WGS experiment was to identify the types of mutations present in the wild-type and ΔrecA strains after ampicillin treatment, rather than to quantify or compare mutation frequencies. This purpose was explicitly stated in the manuscript.

      Furthermore, the choice of selective and non-selective conditions was made to ensure the successful isolation of mutants in both strains. Specifically, if selective conditions (50 μg/mL ampicillin) were applied to the wild-type strain, it would have been nearly impossible to recover colonies for WGS analysis, as wild-type cells are highly susceptible to ampicillin at this concentration (Top, Author response image 1). Conversely, under non-selective conditions, ΔrecA mutants carrying resistance mutations may not have been effectively isolated, which would have limited our ability to identify resistance mutations in these strains (Bottom, Author response image 1). Thus, the use of different selection pressures was essential for achieving the objective of mutation identification in this study.

      Author response image 1.

      After 8 hours of antibiotic treatment, the wild type or the ΔrecA cells were plated on agar plates either without ampicillin or with 50 μg/mL ampicillin and incubated for 24-48 hours. Top: Under selective conditions, no wild type colonies were recovered, indicating high susceptibility to the antibiotic, preventing further analysis. Bottom: In non-selective conditions, both ΔrecA resistant mutants and non-resistant cells grew, making it difficult to distinguish and isolate the mutants carrying resistance mutations.

      Finally, it is important to establish what the basal mutation rates of both the WT and ∆recA strains are. Currently, only the ampicillin-treated populations were reported. It is possible that the ∆recA strain has inherently higher mutagenesis than WT, with a larger subpopulation of resistant clones. Thus, ampicillin treatment might not in fact induce higher mutagenesis in ∆recA.

      Thanks for this suggestion. The basal mutation frequencies of the wild-type and the ∆recA strains have been measured using rifampicin (Fig. 1G), and there is no significant difference between them.

      Reviewer #2:

      Summary:

      This study aims to demonstrate that E. coli can acquire rapid antibiotic resistance mutations in the absence of a DNA damage response. To investigate this, the authors employed a sophisticated experimental framework based on a modified Adaptive Laboratory Evolution (ALE) workflow. This workflow involves numerous steps culminating in the measurement of antibiotic resistance. The study presents evidence that a recA strain develops ampicillin resistance mutations more quickly than the wild-type, as shown by measuring the Minimum Inhibitory Concentration (MIC) and mutation frequency. Whole-genome sequencing of 15 recA-colonies resistant to ampicillin revealed predominantly inactivation of genes involved in the multi-drug efflux pump system, whereas, in the wild-type, mutations appear to enhance the activity of the chromosomal ampC cryptic promoter. By analyzing mutants involved in the SOS response, including a lexA3 mutant incapable of inducing the SOS response, the authors conclude that the rapid evolution of antibiotic resistance occurs in an SOS-independent manner when recA is absent.

      Furthermore, RNA sequencing (RNA-seq) of the four experimental conditions suggests that genes related to antioxidative responses drive the swift evolution of antibiotic resistance in the recA-strain.

      We greatly appreciate your overall summary of the manuscript and your positive evaluation of our work.

      Weaknesses:

      However, a potential limitation of this study is the experimental design used to determine the 'rapid' evolution of antibiotic resistance. It may introduce a significant bottleneck in selecting ampicillin-resistant mutants early on. A recA mutant could be more susceptible to ampicillin than the wild-type, and only resistant mutants might survive after 8 hours, potentially leading to their enrichment in subsequent steps. To address this concern, it would be critical to perform a survival analysis at various time points (0h, 2h, 4h, 6h, and 8h) during ampicillin treatment for both recA and wild-type strains, ensuring there is no difference in viability.

      We appreciate your suggestion. We measured the survival fraction at 0, 2, 4, 6, and 8 hours after ampicillin treatment. The results show no significant difference in antibiotic sensitivity between the wild-type and ΔrecA strain (Fig. S2). We therefore added a description in the main text, “Meanwhile, after 8 hours of treatment with 50 μg/mL ampicillin, the survival rates of both wild type and ΔrecA strain were consistent (Fig. S2)”.
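
      The survival measurement described above can be sketched as follows. This is a minimal illustration, assuming the survival fraction at each time point is the viable count at that time divided by the viable count at 0 hours. The function name and the CFU values are hypothetical and are not the values underlying Fig. S2.

```python
def survival_fractions(cfu_by_hour):
    """Divide the viable count at each time point by the count at 0 h."""
    if 0 not in cfu_by_hour:
        raise ValueError("a 0 h time point is required as the baseline")
    baseline = cfu_by_hour[0]
    if baseline <= 0:
        raise ValueError("baseline count must be positive")
    return {hour: cfu / baseline for hour, cfu in sorted(cfu_by_hour.items())}


# Hypothetical CFU/mL at 0, 2, 4, 6 and 8 h of ampicillin treatment
# (assumption, for illustration only).
hypothetical_counts = {0: 1.0e9, 2: 5.0e8, 4: 1.0e8, 6: 2.0e7, 8: 1.0e7}
fractions = survival_fractions(hypothetical_counts)
for hour, fraction in fractions.items():
    print(f"{hour} h: {fraction:.3f}")
```

      Computing the same table for each strain allows the two kill curves to be compared point by point.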

      The observation that promoter mutations are absent in ΔrecA strains could be explained by previous research indicating that amplification of the AmpC genes is a mechanism for E. coli resistance to ampicillin, which does not occur in a recA-deficient background (PMID# 19474201).

      We are very grateful to you for providing this reference. We did examine the amplification of the ampC gene in both wild-type and recA-deficient strains, but we found no significant changes in its copy number after ampicillin treatment (Author response image 2). Therefore, the results and discussion regarding gene copy number were not included in this manuscript.

      Author response image 2.

      Copy number variations of genes in the chromosome before and after exposure to ampicillin at 50 µg/mL for 8 hours in the wild type and ΔrecA strain.

      The section describing Figure 3 is poorly articulated, and the conclusions drawn are apparent. The inability of a recA strain to induce the SOS response is well-documented (lines 210 and 278). The data suggest that merely blocking SOS induction is insufficient to cause 'rapid' evolution in their experimental conditions. To investigate whether SOS response can be induced independently of lexA cleavage by recA, alternative experiments, such as those using a sulA-GFP fusion, might be more informative.

      Thanks for your suggestion. We agree that detecting the expression level of SulA can provide valuable information to reveal the impact of the SOS system on rapid drug resistance. In addition to fluorescence visualization and quantification of SulA expression, regulating the transcription level of the sulA gene can achieve the same objective. Therefore, in our transcriptome sequencing analysis, we focused on evaluating the transcription level of sulA (Fig. 4E).

      In Figure 4E, the lack of increased SulA gene expression in the wild-type strain treated with ampicillin is unexpected, given that SulA is an SOS-regulated gene. The fact that polA (Pol I) is going down should be taken into account in the interpretation of Figures 2D and 2E.

      Thank you for your observation regarding the lack of increased SulA gene expression in the wild-type strain treated with ampicillin in Figure 4E. We agree that SulA is typically an SOS-regulated gene, and its expression is expected to increase in response to DNA damage induced by antibiotics like ampicillin. However, in our experimental conditions, the observed lack of increased SulA expression could be due to several factors. One possibility is that the concentration of ampicillin used, or the duration of treatment, was not sufficient to induce a strong SOS response in the wild type strain under the specific conditions tested. Additionally, differences in experimental setups such as timing, sampling, or cellular stress responses could account for the lack of a pronounced upregulation of SulA.

      You also note that the decrease in polA (Pol I) expression should be taken into account in the interpretation of Figures 3D and 3E, and we agree.

      The connection between compromised DNA repair, the accumulation of Reactive Oxygen Species (ROS) based on RNA-seq data, and accelerated evolution is merely speculative at this point and not experimentally established.

      We greatly appreciate your comments. First, the correlation between DNA mutations and the accumulation of reactive oxygen species (ROS) has been experimentally confirmed. As shown in Fig. 4I, after the addition of the antioxidant GSH, DNA resistance mutations were not detected in the ΔrecA strain treated with ampicillin for 8 hours, compared to those without the addition of GSH, proving that the rapid accumulation of ROS induces the enhancement of DNA resistance mutations. Second, the enhancement of DNA resistance mutations in relation to bacterial resistance has been widely validated and is generally accepted. Finally, we appreciate your suggestion to strengthen the evidence supporting ROS enhancement. To address this, we have added an experiment to measure ROS levels. Through flow cytometry, we found that ROS levels significantly increased in both the wild-type and ΔrecA strain after 8 hours of ampicillin treatment. However, ROS levels in the ΔrecA strain showed a significant further increase compared to the wild-type strain (Fig. 4G). Additionally, with the addition of 50 mM glutathione, no significant change in ROS levels was observed in either the wild-type or ΔrecA strain before and after ampicillin treatment (Fig. 4H). This result further confirms our finding in Fig. 4I, where adding GSH inhibited the development of antibiotic resistance.

      Reviewer #3:

      Summary:

      In the present work, Zhang et al investigate the involvement of the bacterial DNA damage repair SOS response in the evolution of beta-lactam drug resistance in Escherichia coli. Using a combination of microbiological, bacterial genetics, laboratory evolution, next-generation, and live-cell imaging approaches, the authors propose short-term drug resistance evolution that can take place in RecA-deficient cells in an SOS response-independent manner. They propose the evolvability of drug resistance is alternatively driven by the oxidative stress imposed by the accumulation of reactive oxygen species and inhibition of DNA repair. Overall, this is a nice study that addresses a growing and fundamental global health challenge (antimicrobial resistance). However, although the authors perform several multi-disciplinary experiments, there are several caveats to the authors' proposal that ultimately do not fully support their interpretation that the observed antimicrobial resistance evolution phenotype is due to compromised DNA repair.

      We greatly appreciate your overall summary of the manuscript and positive evaluation of our work.

      Strengths:

      The authors introduce new concepts to antimicrobial resistance evolution mechanisms. They show short-term exposure to beta-lactams can induce durably fixed antimicrobial resistance mutations. They propose this is due to compromised DNA repair and oxidative stress. This is primarily supported by their observations that resistance evolution phenotypes only exist for recA deletion mutants and not other genes in the SOS response.

      Thanks for your positive comments.

      Weaknesses:

      The authors do not show any direct evidence (1) that these phenotypes exist in strains harboring deletions in other DNA repair genes outside of the SOS response, (2) that DNA damage is increased, (3) that reactive oxygen species accumulate, (4) that accelerated resistance evolution can be reversed by anything other than recA complementation. The authors do not directly test alternative hypotheses. The conclusions drawn are therefore premature.

      We sincerely thank you for your insightful comments. First, in this study, our primary focus is on the role of recA deficiency in bacterial antibiotic resistance evolution. Therefore, we conducted an in-depth investigation on E. coli strains lacking RecA and found that its absence promotes resistance evolution through mechanisms involving increased ROS accumulation and downregulation of DNA repair pathways. While we acknowledge the importance of other DNA repair genes outside of the SOS response, exploring them is beyond the scope of this paper. However, in a separate unpublished study, we have identified the involvement of another DNA recombination protein, whose role in resistance evolution is not yet fully elucidated, in promoting resistance development. This finding is part of another independent investigation.

      Regarding DNA damage and repair, our paper emphasizes that resistance-related mutations in DNA are central to the development of antibiotic resistance. These mutations are a manifestation of DNA damage. To demonstrate this, we measured mutation frequency and performed whole-genome sequencing, both of which confirmed an increase in DNA mutations.

      We appreciate the reviewer's suggestion to provide additional evidence for ROS accumulation, and we have now supplemented our manuscript with relevant experiments. Through flow cytometry, we found that ROS levels significantly increased in both the wild type and ΔrecA strains after 8 hours of ampicillin treatment. However, ROS levels in the ΔrecA strain showed a significant further increase compared to the wild-type strain (Fig. 4G). Additionally, with the addition of 50 mM glutathione, no significant change in ROS levels was observed in either the wild-type or ΔrecA strain before and after ampicillin treatment (Fig. 4H). This result further confirms our finding in Fig. 4I, where adding GSH inhibited the development of antibiotic resistance.

      Finally, in response to your question about reversing accelerated resistance evolution, we would like to highlight that, in addition to recA complementation, we successfully suppressed rapid resistance evolution by supplementing with an antioxidant, GSH (Fig. 4I). This further supports our hypothesis that increased ROS levels play a key role in driving accelerated resistance evolution in the absence of RecA.

      Recommendations for the authors:

      Reviewer #1:

      The authors' model asserts that deletion of recA impairs DNA repair in E. coli, leading to an accumulation of ROS in the cell, and ultimately driving the rapid rise of resistance mutations. However, the experimental evidence does not adequately address whether the resistance mutations are true, de novo mutations that arose due to beta-lactam treatment, or mutations that confer cross-resistance enriched by ampicillin selection.

      a. Major: In Figure 1F & G, the authors show that the ∆recA strain, following ampicillin treatment, has higher resistance and mutation frequency towards rifampicin than WT. However, it is not clear whether the elevated resistance and mutagenesis are driven by mutations enriched by the ampicillin treatment (e.g. mutations in acrB, as seen in Figure 2) or by "new" mutations in the rpoB gene. As the authors note, the mutants enriched by ampicillin selection can play a role in efflux and can thus change a bacterium's sensitivity to a wide range of antibiotics, including rifampicin, in what is known as cross-resistance. Therefore, the mutation frequency calculation, which relies on quantifying rifampicin-resistant clones, might be confounded by bacteria with mutations that confer cross-resistance. A better approach to calculate mutation frequency would be to employ an assay that does not require antibiotic selection, such as a lac-reversion assay. This would mitigate the confounding effects of cross-resistance of drug-resistant mutations.

      We appreciate your thoughtful comments regarding the potential for cross-resistance to confound the mutation frequency calculation based on rifampicin-resistant clones. Indeed, as noted, ampicillin selection can enrich for mutants with enhanced efflux activity, which may confer cross-resistance to a range of antibiotics, including rifampicin.

      However, we believe that the current approach of calculating mutation frequency using rifampicin-resistant mutants is still valid in our specific context. Rifampicin targets the RNA polymerase β subunit, and resistance typically arises from specific mutations in the rpoB gene. These mutations are well-characterized and distinct from those typically associated with efflux-related cross-resistance. Thus, the likelihood of cross-resistance affecting our mutation frequency calculation is minimized in this scenario.

      Additionally, while the lac-reversion assay could be an alternative, it focuses on specific metabolic pathway mutations (such as those affecting lacZ) and would not necessarily capture the same types of mutations relevant to rifampicin resistance or antibiotic-induced mutagenesis. Given our experimental objective of understanding how ampicillin induces mutations that confer antibiotic resistance, the current approach of using rifampicin selection provides a direct and relevant measurement of mutation frequency under antibiotic stress.

      b. Major: It is important to establish what the basal mutation frequencies/rates of both the WT and ∆recA strains are. Currently, only the ampicillin-treated populations were reported. It is possible that the ∆recA strain has an inherently higher mutagenesis than WT. Thus, ampicillin treatment might not in fact induce higher mutagenesis in ∆recA.

      Thanks for your suggestion. The basal mutation frequencies of the wild-type and the ∆recA strains have been measured using rifampicin (Fig. 1G), and there is no significant difference between them.

      c. Major: In the text, the authors write, "To verify whether drug resistance associated DNA mutations have led to the rapid development of antibiotic resistance in recA mutant strain, we randomly selected 15 colonies on non-selected LB agar plates from the wild type surviving isolates, and antibiotic screening plates containing 50 μg/mL ampicillin from the ΔrecA resistant isolates, respectively." Why were the WT clones picked from non-selective plates and the recA mutant from selective ones for WGS? It appears that such a procedure would bias the recA mutant clones to show more mutations (caused by selection on the ampicillin plate). The authors need to address this discrepancy.

      We appreciate your concern regarding potential inconsistencies in the WGS methodology. However, we would like to clarify that the primary aim of the WGS experiment was to identify the types of mutations present in the wild-type and ΔrecA strains after ampicillin treatment, rather than to quantify or compare mutation frequencies. This purpose was explicitly stated in the manuscript.

      Furthermore, the choice of selective and non-selective conditions was made to ensure the successful isolation of mutants in both strains. Specifically, if selective conditions (50 μg/mL ampicillin) were applied to the wild type strain, it would have been nearly impossible to recover colonies for WGS analysis, as wild-type cells are highly susceptible to ampicillin at this concentration (Top, Author response image 1). Conversely, under non-selective conditions, ΔrecA mutants carrying resistance mutations may not have been effectively isolated, which would have limited our ability to identify resistance mutations in these strains (Bottom, Author response image 1). Thus, the use of different selection pressures was essential for achieving the objective of mutation identification in this study.

      d. Major: In some instances, the authors do not use accurate language to describe their data. In Figure 2A, the authors randomly selected 15 ∆recA clones from a selective plate with 50 µg/mL of ampicillin. These clones were then subjected to WGS, which subsequently identified resistant mutations. Based on the described methods, these mutations are a result of selection: in other words, resistant mutations were preexisting in the bacterial population, and the addition of ampicillin selection killed off the sensitive cells, enabling the proliferation of the resistant clones. However, in the Figure 2 legend and associated text, the authors suggest that these mutations were "induced" by beta-lactam exposure, which is misleading. The data does not support that.

      We appreciate your detailed feedback on the language used to describe our data. We understand the concern regarding the use of the term "induced" in relation to beta-lactam exposure. To clarify, we employed not only beta-lactam antibiotics but also other antibiotics, such as ciprofloxacin and chloramphenicol, in our experiments (data not shown). However, we observed that beta-lactam antibiotics specifically induced the emergence of resistance or altered the MIC in our bacterial populations. If resistance had pre-existed before antibiotic exposure, we would expect other antibiotics to exhibit a similar selective effect, particularly given the potential for cross-resistance to multiple antibiotics.

      Furthermore, we used two different ∆recA strains, and the results were consistent between the strains (Fig. S3). Given that spontaneous mutations can occur with significant variability in populations, if resistance mutations pre-existed before antibiotic exposure, the selective outcomes should have varied between the two strains.

      Most importantly, we found that the addition of the anti-oxidative compound GSH prevented the evolution of antibiotic resistance following ampicillin treatment in the ΔrecA strain. If we assume that resistant bacteria preexist in the ∆recA strain, then the addition of GSH should not affect the evolution of resistance. Therefore, we believe that the resistance mutations we detected were not simply the result of selection from preexisting mutations but were indeed induced by beta-lactam exposure.

      e. Major: For Figure 4J, using WGS the authors show that the addition of GSH to WT and ∆recA cells inhibited the rise of resistance mutations; no resistance mutations were reported. However, in the "Whole genome sequencing" section under "Materials and Methods", they state that "Resistant clones were isolated by selection using LB agar plates with the supplementation of ampicillin at 50 μg/mL". These clones were then genome-extracted and sequenced. Given the methodology, it is surprising that the WGS did not reveal any resistance mutations in the GSH-treated cells. How were these cells able to grow on 50 μg/mL ampicillin plates for isolation in the first place? The authors need to address this.

      We sincerely apologize for the confusion caused by the incorrect expression in the "Materials and Methods" section. Indeed, when bacteria were treated with the combination of antibiotics and GSH, resistance was significantly suppressed, and no resistant clones could be isolated from selective plates (i.e., LB agar supplemented with 50 μg/mL ampicillin).

      To address this, we instead plated the bacteria treated with antibiotics and GSH onto non-selective plates (without ampicillin) and randomly selected 15 colonies for WGS. None of them showed resistance mutations. We will revise the text in the "Materials and Methods" section to accurately reflect this procedure and provide clarity.

      f. Minor: for Figure 1G, it is misleading to have both "mutation frequency" and "mutant rate" in the y-axis; the two are defined and calculated differently. Based on the Materials and Methods, "mutation frequency" would be the appropriate term. Also, for the ∆recA strain, it is a bit unusual to see mutation frequencies that are tightly clustered. Usually, mutation frequencies follow the Luria-Delbruck distribution. Can the authors explain why the ∆recA data looks so different compared to, say, the WT mutation frequencies?

      Thank you for your insightful feedback. We agree that having both "mutation frequency" and "mutant rate" on the y-axis is misleading, as these terms are defined and calculated differently. To avoid confusion, we will revise Figure 1G to use only "mutation frequency" as the correct term, in line with the methods described in the Materials and Methods section.
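
      The distinction between the two terms can be made concrete with a short sketch. It contrasts mutation frequency (resistant cells divided by total cells in one culture) with a mutation rate estimated by the p0 method of fluctuation analysis, in which the expected number of mutational events per culture is m = -ln(p0), where p0 is the fraction of parallel cultures with no mutants. The p0 method is used here only as a standard textbook estimator and is not stated to be the method of the study; the function names and all counts are hypothetical.

```python
import math


def mutation_frequency(mutant_count, total_cells):
    """Proportion of mutant cells in a single culture at plating."""
    return mutant_count / total_cells


def p0_mutation_rate(mutants_per_culture, cells_per_culture):
    """Estimate mutational events per cell from parallel cultures.

    Uses m = -ln(p0), where p0 is the fraction of cultures without mutants,
    then divides m by the final number of cells per culture.
    """
    n_cultures = len(mutants_per_culture)
    n_zero = sum(1 for count in mutants_per_culture if count == 0)
    if n_cultures == 0 or n_zero == 0:
        raise ValueError("the p0 method needs at least one culture without mutants")
    p0 = n_zero / n_cultures
    m = -math.log(p0)  # expected mutational events per culture
    return m / cells_per_culture


# Hypothetical fluctuation test: 10 parallel cultures of 1e8 cells each
# (assumption, for illustration only).
cultures = [0, 0, 3, 0, 1, 0, 12, 0, 0, 2]
rate = p0_mutation_rate(cultures, cells_per_culture=1e8)
frequency = mutation_frequency(mutant_count=12, total_cells=1e8)
print(f"rate = {rate:.2e} per cell, frequency (culture 7) = {frequency:.1e}")
```

      In this toy data set, the culture with 12 mutants has a frequency more than twenty times the estimated rate, which illustrates why the two quantities should not share one axis label.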

      Regarding the ∆recA strain's mutation frequencies, we acknowledge that the data appear more tightly clustered compared to the expected Luria-Delbruck distribution seen in the wild type strain. This is because the y-axis of Figure 1G is logarithmic, which causes the data to appear more clustered.
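
      The effect of a logarithmic axis on apparent spread can be checked with simple arithmetic. The values below are hypothetical and chosen only to show that a fourfold difference on a linear scale occupies about 0.6 units on a log10 scale.

```python
import math

# Hypothetical mutation frequencies (assumption, for illustration only).
hypothetical_frequencies = [1e-6, 2e-6, 4e-6]

# Spread on a linear scale, expressed as a fold difference.
linear_fold_range = max(hypothetical_frequencies) / min(hypothetical_frequencies)

# The same spread on a log10 axis.
log10_span = math.log10(max(hypothetical_frequencies)) - math.log10(
    min(hypothetical_frequencies)
)

print(f"fold range = {linear_fold_range:.1f}, log10 span = {log10_span:.2f}")
```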

      We further added the basal mutation frequency in the wild type and ∆recA strains before the exposure to ampicillin. The basal mutation frequencies of the wild-type and the ∆recA strains have been measured using rifampicin (Fig. 1G), and there is no significant difference between them.

      g. Minor: It needs to be made clear in the Main Text what the selective antibiotic agar plate used was, rifampicin or ampicillin. I am assuming it was rifampicin, as ampicillin plates would yield resistance frequencies close to 100%, given the prior treatment of the culture with ampicillin.

      Thanks for your comments. Depending on the objective, we used different selective plates. For example, when testing the mutation frequency of antibiotic resistance, we used a selective plate containing rifampicin in order to utilize a non-inducing antibiotic, which is the standard method for calculating resistance mutation frequency. In the WGS experiment, to obtain mutations specific to ampicillin resistance, we used a selective plate containing ampicillin.

      Reviewer #2:

      The Y-axis label (log10 mutant rate) in Figure 1G is misleading or incorrect.

      Thanks for your comments, and we apologize for this misleading information. Figure 1G has been revised accordingly.

      In line 393 of the discussion, the authors claim that excessive ROS accumulation drives the evolution of ampicillin resistance, which has not been conclusively demonstrated. Additional experiments are needed to support this statement.

      We greatly appreciate your comments. First, the correlation between DNA mutations and the accumulation of reactive oxygen species (ROS) has been experimentally confirmed. As shown in Fig. 4I, after the addition of the antioxidant GSH, DNA resistance mutations were not detected in the ΔrecA strain treated with ampicillin for 8 hours, compared to those without the addition of GSH, proving that the rapid accumulation of ROS induces the enhancement of DNA resistance mutations. Second, the enhancement of DNA resistance mutations in relation to bacterial resistance has been widely validated and is generally accepted. Finally, we appreciate your suggestion to strengthen the evidence supporting ROS enhancement. To address this, we have added an experiment to measure ROS levels. Through flow cytometry, we found that ROS levels significantly increased in both the wild-type and ΔrecA strain after 8 hours of ampicillin treatment. However, ROS levels in the ΔrecA strain showed a significant further increase compared to the wild-type strain (Fig. 4G). Additionally, with the addition of 50 mM glutathione, no significant change in ROS levels was observed in either the wild-type or ΔrecA strain before and after ampicillin treatment (Fig. 4H). This result further confirms our finding in Fig. 4I, where adding GSH inhibited the development of antibiotic resistance.

      The abstract is overly complex and difficult to read, e.g. "Contrary to previous findings, it is shown that this accelerated resistance development process is dependent on the hindrance of DNA repair, which is completely orthogonal to the SOS response".

      Thank you for the valuable feedback regarding the complexity of the abstract. We agree that certain sections could be simplified for clarity. In response, we have revised the abstract to make it more concise and easier to understand. For example, the sentence “Contrary to previous findings, it is shown that this accelerated resistance development process is dependent on the hindrance of DNA repair, which is completely orthogonal to the SOS response” has been rewritten as: "Unlike earlier studies, we found that the rapid development of resistance relies on the hindrance of DNA repair, a mechanism that operates independently of the SOS response."

      Reviewer #3:

      As indicated above, direct evidence is needed to show (1) that these phenotypes exist in strains harboring deletions in other DNA repair genes outside of the SOS response, (2) that DNA damage is increased, (3) that reactive oxygen species accumulate, (4) that accelerated resistance evolution can be reversed by anything other than recA complementation. There are also other resistance evolution mechanisms untested here, including transcription-coupled repair (TCR) mechanisms involving Mfd. These need to be shown in order to draw the conclusions proposed.

      We sincerely thank you for your insightful comments. First, in this study, our primary focus is on the role of recA deficiency in bacterial antibiotic resistance evolution. Therefore, we conducted an in-depth investigation on E. coli strains lacking RecA and found that its absence promotes resistance evolution through mechanisms involving increased ROS accumulation and downregulation of DNA repair pathways. While we acknowledge the importance of other DNA repair genes outside of the SOS response and other resistance evolution mechanisms including the TCR mechanism, exploring them is beyond the scope of this paper. However, in a separate unpublished study, we have identified the involvement of another DNA recombination protein, whose role in resistance evolution is not yet fully elucidated, in promoting resistance development. This finding is part of another independent investigation.

      Regarding DNA damage and repair, our paper emphasizes that resistance-related mutations in DNA are central to the development of antibiotic resistance. These mutations are a manifestation of DNA damage. To demonstrate this, we measured mutation frequency and performed whole-genome sequencing, both of which confirmed an increase in DNA mutations.

      We appreciate the reviewer's suggestion to provide additional evidence for ROS accumulation, and we have now supplemented our manuscript with relevant experiments. Through flow cytometry, we found that ROS levels significantly increased in both the wild-type and ΔrecA strains after 8 hours of ampicillin treatment. However, ROS levels in the ΔrecA strain showed a significant further increase compared to the wild-type strain (Fig. 4G). Additionally, with the addition of 50 mM glutathione, no significant change in ROS levels was observed in either the wild-type or ΔrecA strain before and after ampicillin treatment (Fig. 4H). This result further confirms our finding in Fig. 4I, where adding GSH inhibited the development of antibiotic resistance.

      Finally, in response to your question about reversing accelerated resistance evolution, we would like to highlight that, in addition to recA complementation, we successfully suppressed rapid resistance evolution by supplementing with an antioxidant, GSH (Fig. 4I). This further supports our hypothesis that increased ROS levels play a key role in driving accelerated resistance evolution in the absence of RecA.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Thank you for your assessment and constructive critique, which helped us to improve the manuscript and its clarity. Upon carefully reading through the comments, we noticed that, based on the Reviewer's questions, some of our answers were already available but “hidden” as supplementary data. Thus, we changed the following two figures and text accordingly to showcase our results to the reader better:

      A) To highlight how mobile service data can indicate the spread of highly prevalent variants, we added a high-prevalence subcluster to Figure 2 (previously shown in Supplementary Figures S4 and S5) and, in exchange, moved one low-prevalence subcluster from Figure 2 back into the supplement. The figure now shows one low-prevalence and one high-prevalence subcluster instead of two low-prevalence subclusters.

      B) Based on Reviewer 1’s question about where samples were taken with regard to the mobility data from the community of the first identification (negative controls), we now highlight all the mobility data that was available to us in Figure 3 (as triangles) instead of just a few top mobility hits, for both mobility-guided and random surveillance (the latter serving as a negative control for the former). This way, we think, it is clearer how random sampling was also performed in some regions where mobility was coming from the community of origin (as asked by Reviewer 1). The detailed trips and sampling are now part of the supplement for data transparency reasons. We also noticed a typo in the GPS coordinates that misaligned one of the arrows; this is corrected in the improved Figure 3.

      We have also included the R-Scripts used to generate all the figures in the manuscript in an OSF repository (we updated the “Data sharing statement”). We also updated Figure 1 slightly and extended the supplemental material. The remaining comments to reviewers are addressed point-by-point below.

      Reviewer 1 (Public Review):

      In "Exploring the Spatial Distribution of Persistent SARS-CoV-2 Mutations - Leveraging mobility data for targeted sampling" Spott et al. combine SARS-CoV-2 genomic data alongside granular mobility data to retrospectively evaluate the spread of SARS-CoV-2 alpha lineages throughout Germany and specifically Thuringia. They further prospectively identified districts with strong mobility links to the first district in which BQ.1.1 was observed to direct additional surveillance efforts to these districts. The additional surveillance effort resulted in the earlier identification of BQ.1.1 in districts with strong links to the district in which BQ.1.1 was first observed.

      Thank you for taking the time to review our work.

      (1) It seems the mobility-guided increased surveillance included only districts with significant mobility links to the origin district and did not include any "control" districts (those without strong mobility links). As such, you can only conclude that increasing sampling depth increased the rate of detection for BQ.1.1., not necessarily that doing so in a mobility-guided fashion provided an additional benefit. I absolutely understand the challenges of doing this in a real-world setting and think that the work remains valuable even with this limitation, but I would like the lack of control districts to be more explicitly discussed.

      Thank you for the critical assessment of our work. We agree that a control is essential for interpreting the results. In our case, randomized surveillance (“the gold standard”) served as a control with a total sampling depth seven times higher than the mobility-guided sampling. To better reflect the sampling with regard to the available mobility data, we revisited Figure 3 and added all the mobility information from the origin that was available to us. We also added this information to the random surveillance to provide a clearer picture to the reader. This now clearly shows how randomized surveillance covered communities with varying degrees of incoming mobility from the community of first occurrence, thereby underlining its role as a negative control. We updated the manuscript to reflect these changes and included the October 2020 and June 2021 mobility datasets in Supplementary Table S6. We agree that greater sampling depth increases detection; this is precisely the point of guided sampling, which increases sampling depth specifically in areas where mobility points towards a possible spread. Regarding the negative control: random surveillance (not mobility-guided) in October covered 40 samples in the northwest region of Thuringia (mobility-guided sampling covered 19 samples). Thus, random surveillance also contained 31 out of 132 samples with a mobility link towards the first occurrence of BQ.1.1 but with varying amounts of mobility (low to high).

      We added this information to the main text:

      Line 270 to 293:

      Following its first Thuringian identification, we utilized the latest available dataset of the past two years of mobile service data (October 2020 and June 2021) to investigate the residential movements for the community of first detection. Considering the highest incoming mobility from both datasets, we identified 18 communities with high (> 10,000), 34 with medium (2,001-10,000), and 82 with low (30-2,000) number of incoming one-way trips from the originating community (purple triangles in Figure 3a). As a result, we specifically requested all the available samples from the eight communities with the highest incoming mobility. Still, we were restricted to the submission of third parties over whom we had no influence. This led to the inclusion of the following eight communities with the most residential movement from the originating community: four in central and three in NW of Thuringia, one in NW-neighboring state Saxony-Anhalt. The samples requested from central Thuringia were also due to their geographic arrangement as a “belt” in central Thuringia, linking three major cities (see Supplementary Figure S1). Subsequently, we collected 19 additional samples (isolated between the 17th and 25th of October 2022; see “Guided Sampling” for October 2022, Figure 3a) besides the randomized sampling strategy. Thus, the sampling depth was increased in communities with high incoming mobility from the first origin.

      As part of the general Thuringian surveillance, we collected 132 samples for October (covering dates between the 5th and 31st) and 69 samples in November (covering dates between the 1st and 25th; see Figure 3b and c). Randomized sampling was not influenced or adjusted based on the mobility-guided sample collection. Thus, it also contains samples from communities with a mobility link towards the first occurrence of BQ.1.1, as they were part of the regular random collection (see gray triangles in Figure 3b). A complete overview of all samples is provided in Supplementary Table S5. The mobility datasets from October 2020 and June 2021 for all sampled communities are provided in Supplementary Table S6.
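
      The mobility tiers in the quoted passage above can be illustrated with a minimal sketch. It assumes that each community has one incoming-trip count per dataset and that the higher of the two counts is used, as the passage suggests; the function names and the example counts are hypothetical and are not taken from the authors' R scripts.

```python
from typing import Optional


def highest_incoming(trips_oct_2020: int, trips_jun_2021: int) -> int:
    """Return the higher incoming-trip count across the two mobility datasets."""
    return max(trips_oct_2020, trips_jun_2021)


def mobility_tier(incoming_trips: int) -> Optional[str]:
    """Classify a community by incoming one-way trips from the originating community.

    Thresholds follow the quoted text: high (> 10,000), medium (2,001-10,000),
    low (30-2,000). Counts below 30 fall outside the reported ranges.
    """
    if incoming_trips > 10_000:
        return "high"
    if incoming_trips >= 2_001:
        return "medium"
    if incoming_trips >= 30:
        return "low"
    return None


# Hypothetical communities with (October 2020, June 2021) trip counts.
example_communities = {
    "community_a": (12_500, 9_800),
    "community_b": (1_900, 2_400),
    "community_c": (45, 20),
}

tiers = {
    name: mobility_tier(highest_incoming(oct_trips, jun_trips))
    for name, (oct_trips, jun_trips) in example_communities.items()
}
print(tiers)
```

      Under these assumptions, communities in the "high" tier would be the natural candidates for requesting additional samples.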

      Line 305 to 313:

      Among the 19 samples specifically collected based on mobile service data, we identified one additional sample of the specific Omicron sublineage BQ.1.1 in a community with high incoming mobility (n = 14, number of trips = 37,499) with a distance of approximately 16 km between both towns. Our randomly sampled routine surveillance strategy did not detect another sample during the same period. This was despite a seven times higher overall sample rate, which included 31 samples from communities with an identified incoming mobility from the community of the first occurrence (October 2022, Figure 3b). Only in the one-month follow-up were four other samples identified across Thuringia through routine surveillance (November 2022, Figure 3c).
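
      The sample counts reported in this response can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above (19 guided samples, 132 random samples in October, 31 of them from communities with a mobility link); the variable names are illustrative, and this is not a statistical test.

```python
# Counts as stated in the response for October 2022.
guided_samples = 19              # mobility-guided samples
random_samples_october = 132     # routine randomized surveillance
random_with_mobility_link = 31   # random samples from mobility-linked communities
guided_bq11_hits = 1             # additional BQ.1.1 sample found by guided sampling

# How much larger the randomized sampling effort was.
sampling_ratio = random_samples_october / guided_samples

# Share of random samples that came from mobility-linked communities.
linked_share = random_with_mobility_link / random_samples_october

# Proportion of guided samples that were BQ.1.1.
guided_hit_rate = guided_bq11_hits / guided_samples

print(f"random / guided sampling depth: {sampling_ratio:.1f}x")        # 6.9x
print(f"random samples with mobility link: {linked_share:.1%}")        # 23.5%
print(f"BQ.1.1 among guided samples: {guided_hit_rate:.1%}")           # 5.3%
```

      The ratio of about 6.9 corresponds to the "seven times higher" sampling rate mentioned in the text.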

      Line 325 to 333:

      In summary, increasing the sampling depth in the suspected regions successfully identified the specified lineage using only a fraction of the samples from the randomized sampling. Conversely, randomized surveillance, the “gold standard” acting as our negative control, did not identify additional samples with similar sampling depths in regions with no or low incoming mobility or even in high mobility regions with less sampling depth. Implementing such an approach effectively under pandemic conditions poses difficult challenges due to the fluctuating sampling sizes. Although the finding of the sample may have been coincidental, our proof of concept demonstrated how we can leverage the potential of mobile service data for targeted surveillance sampling.

      (2) Line 313: While this work has reliably shown that the spread of Alpha was slower in Thuringia, I don't think there have been sufficient analyses to conclude that this is due to the lack of transportation hubs. My understanding is that only mobility within Thuringia has been evaluated here and not between Thuringia and other parts of Germany.

      Thank you for pointing this out. We noticed that the original sentence lacked the necessary clarity. The statement in line 313 was based on the observation that Alpha first occurred in federal states with major transport hubs, such as international airports and ports, which Thuringia lacks, as demonstrated in the Microreact dataset. For clarification, we adjusted the sentence as follows:

      Line 340 and following:

      A plausible explanation for the delayed spread of the Alpha lineage in Thuringia is the lack of major transport hubs, as Alpha first occurred in federal states with such hubs. Previous studies have already highlighted the impact of major transportation hubs in the spread of SARS-CoV-2.

      (3) Line 333 (and elsewhere): I'm not convinced, based on the results presented in Figure 2, that the authors have reliably identified a sampling bias here. This is only true if you assume (as in line 235) that the variant was in these districts, but that hasn't actually been demonstrated here. While I recognize that for high-prevalence variants, there is a strong correlation between inflow and variant prevalence, low-prevalence variants by definition spread less and may genuinely be missing from some districts. To support this conclusion that they identified a bias, I'd like to see some type of statistical model that is based e.g. on the number of sequences, prevalence of a given variant in other districts, etc. Alternatively, the language can be softened ("putative sampling bias").

      Thank you for addressing this legitimate point of criticism in our interpretation. Due to the retrospective nature of the analysis and the fact that we found no additional samples of the clusters after the specified timeframes, we were limited to the samples in our dataset. Therefore, it is impossible to demonstrate whether a variant was present in the relevant districts afterward. We agree that, given their low prevalence, these variants may genuinely not have spread to some districts. For clarification, we added the following statements and changed the wording accordingly:

      Additional statement in line 248:

      However, due to their low prevalence, it is also possible that these subclusters have not spread to the indicated districts.

      Adjusted wording in line 361:

      We exemplified this approach with the Alpha lineage, where mobile service data indicated a putative sampling bias and partially predicted the spread of our Thuringian subclusters.

      Recommendations:

      (1) I applaud the use of the microreact page to make the data public, however, I don't see any reference to a GitHub or Zenodo repository with the analysis code. The NextStrain code is certainly appreciated but there is presumably additional code used to identify the clusters, generate figures, etc. I generally prefer this code be made public and it is recommended by eLife.

      Thank you for your appreciation. We have now included the R-scripts in the manuscript’s OSF repository. These were used to create the figures in the manuscript and supplement utilizing the supplementary tables 1-6, which are also stored in the repository. To clearly communicate which data is provided, we changed lines 513 and 514 of the “Data sharing statement” as follows:

      Line 513 and following:

      Supplementary tables and the R-scripts used to generate all figures are also provided in the repository under https://osf.io/n5qj6/. These include the mobile service data used in this study, which is available in processed and anonymized form.

      The subcluster identification was performed manually. By adding each sample's mutation profile to the Microreact metadata file, we visually screened the phylogenetic time tree for all non-Alpha specific mutations present in at least 20 Thuringian genomes. We then applied the criteria described in the Methods section to identify the nine Alpha subclusters. For clarification, we changed line 436:

      Line 436:

      We then manually screened for mutations present in at least 20 genomes with a small phylogenetic distance and a time occurrence of at least two months.
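
      The authors performed this screening manually. Purely as an illustration, the count and duration criteria could be pre-filtered as sketched below. The sketch assumes each sample is a collection date plus a set of mutation labels, approximates "two months" as 60 days, and does not model the phylogenetic-distance criterion; all names and the toy data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

MIN_GENOMES = 20      # "present in at least 20 genomes"
MIN_SPAN_DAYS = 60    # assumption: "at least two months" approximated as 60 days


def candidate_mutations(samples):
    """Return mutations meeting the count and time-span criteria.

    samples: iterable of (collection_date, set_of_mutation_labels).
    Phylogenetic distance is NOT checked here; that step remains manual.
    """
    dates_by_mutation = defaultdict(list)
    for collected, mutations in samples:
        for mutation in mutations:
            dates_by_mutation[mutation].append(collected)

    candidates = []
    for mutation, dates in dates_by_mutation.items():
        span_days = (max(dates) - min(dates)).days
        if len(dates) >= MIN_GENOMES and span_days >= MIN_SPAN_DAYS:
            candidates.append(mutation)
    return sorted(candidates)


# Toy data: "mut_a" appears in 25 genomes over 96 days, "mut_b" in 25 genomes
# over 24 days, and "mut_c" in only 5 genomes.
start = date(2021, 1, 1)
toy_samples = []
for i in range(25):
    mutations = {"mut_a", "mut_b"}
    if i < 5:
        mutations.add("mut_c")
    toy_samples.append((start + timedelta(days=i), mutations))
# Spread "mut_a" further in time by replacing the last sample.
toy_samples[-1] = (start + timedelta(days=96), {"mut_a"})
toy_samples.append((start + timedelta(days=10), {"mut_b"}))

print(candidate_mutations(toy_samples))
```

      A filter like this would only shortlist candidates; the subcluster definition in the Methods still applies afterwards.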

      Reviewer 2 (Public Review):

      In the manuscript, the authors combine SARS-CoV-2 sequence data from a state in Germany and mobility data to help in understanding the movement of the virus and the potential to help decide where to focus sequencing. The global expansion in sequencing capability is a key outcome of the public health response. However, there remains uncertainty about how to maximise the insights the sequence data can give. Improved ability to predict the movement of emergent variants would be a useful public health outcome. Also knowing where to focus sequencing to maximising insights is also key. The presented case study from one State in Germany is therefore a useful addition to the literature. Nevertheless, I have a few comments.

      Thank you for taking the time to review our work.

      (1) One of the key goals of the paper is to explore whether mobile phone data can help predict the spread of lineages. However, it appears unclear whether this was actually addressed in the analyses. To do this, the authors could hold out data from a period of time, and see whether they can predict where the variants end up being found.

      Based on your feedback, we noticed that the results of the other seven clusters presented in the supplement were not appropriately highlighted, causing them to be overlooked. We indeed demonstrated that predicting viral spread based on mobility data is possible, as shown for the high-prevalence subcluster 7 (Cluster “ORF1b:A520V”, 811 samples). This was briefly mentioned in lines 240-242, but the cluster was only shown in Supplementary Figures S4 and S5. Instead, we focused more on the putative sampling bias that the mobility for low-prevalence subclusters could indicate as an interesting use case of mobility data. This addresses a concrete problem of every surveillance: successfully identifying low-prevalence targets. However, based on your feedback, we revisited Figure 2, adding the plots of the high-prevalence subcluster: “ORF1b:A520V” from Supplementary Figures S4 and S5 while moving the low-prevalence subcluster “S:N185D” from Figure 2 into the Supplementary Figures S4 and S5. Additionally, we changed line 229 to highlight this result properly.

      line 229 and following:

      The mobile service data-based prediction of a subcluster’s spread aligned well with the subsequent regional coverage of fast-spreading, highly prevalent subclusters, such as subcluster 7, which covered 811 samples (see Figure 2). In contrast, the predicted spread for the low-prevalence subclusters did not correspond well with the actual occurrence.

      (2) The abstract presents the mobility-guided sampling as a success, however, the results provide a much more mixed result. Ultimately, it's unclear what having this strategy really achieved. In a quickly moving pandemic, it is unclear what hunting for extra sequences of a specific, already identified, variant really does. I'm not sure what public health action would result, especially given the variant has already been identified.

      Thank you for your critical assessment of the presented results and their interpretation.

      Here, we aimed to provide an alternative to the standard randomized surveillance strategy. Through mobility-guided sampling, we sought to increase identification chances while necessitating fewer samples and decreasing costs, ultimately enhancing surveillance efficiency. The Omicron-lineage BQ.1.1 was the perfect example to prove this concept under actual pandemic conditions. Yet, the strategy is not limited to low-prevalence sublineages but can be applied to virtually any surveillance case. However, from your question, we recognize that this conclusion was unclear from the text. Therefore, we adapted the conclusion to better communicate the real implications of our proof of concept. Additionally, we altered line 42 in the abstract for clarification.

      However, we did not assess the benefits of surveillance itself, as the German Robert Koch Institute (RKI) already had outlined its importance for tracking different viral variants. This tracking served several reasons, like monitoring vaccine escapism, mutational progress, and assessing available antibodies for treatment.

      Line 42:

      The latter concept was successfully implemented as a proof-of-concept for a mobility-guided sampling strategy in response to the surveillance of Omicron sublineage BQ.1.1.

      Line 364 to 374:

      Another approach is actively guiding the sampling process through mobile service data, which we demonstrated with our proof of principle focusing on the Omicron-lineage BQ.1.1 as a real-life example. This approach could allow for a flexible allocation of surveillance resources, enabling adaptation to specific circumstances and increasing sampling depth in regions where a variant is anticipated. By incorporating guided sampling, much fewer resources may be needed for unguided or random sampling, thereby reducing overall surveillance costs.

      Additionally, while this approach is particularly useful for identifying low-prevalence variants, it is not limited to such variants. Still, it can provide a guided, more cost-efficient, low-sampling alternative to general randomized surveillance that can also be applied to other viruses or lineages.

      (3) Relatedly, it is unclear to me whether simply relying on spatial distance would not be an alternative simpler approach than mobile phone data. From Figure 2, it seems clear that a simple proximity matrix would work well at reconstructing viral flow. The authors could compare the correlation of spatial, spatial proximity, and CDR data.

      Thank you for pointing this out. While proximity data might appear to be an obvious choice, it has significant limitations compared to mobility data, especially in the context of our study. Proximity data assumes that spatial distance alone can accurately represent movement patterns, which would only be true in a normally distributed traffic network. Geographic features such as mountains, cities, and highways affect traffic flows, leading to variability over distance and time, which are beyond the scope of spatial proximity but efficiently captured by mobility data. In Figure 2, we presented a simplified view of the mobility data. Hence, proximity and mobility data appear to provide the same insights. However, as shown in the updated Figure 3, a detailed overview of the available mobility data reveals obvious and non-obvious spatial connections that proximity data can not capture. Incorporating such a level of detail in Figure 2 would have cluttered the figure and reduced its clarity (e.g., adding triangles for each Thuringian community).

      While a comparison between proximity data and mobility data would indeed be informative, it is beyond the scope of our current study, as our primary focus was to examine the useability of mobility data in explaining our subcluster’s spread in the first place. However, we agree it would be a valuable direction for future research. We summarized our thoughts from above in the following additional sentence:

      Line 374:

      Pre-generated mobility networks automatically tailored to each state's unique infrastructure and population dynamics could provide better-targeted sampling guidance rather than simple geographical proximity.

      Recommendations:

      (1) Line 128: What do these percentages mean - the proportion of States with at least one Alpha variant? Please clarify.

      We clarified the values at their first appearance in the text:

      Line 127:

      By March, Alpha had spread to nearly all states and districts (districts are similar to counties or provinces) in Germany (Median: 76·47 % Alpha samples among a federal state's total sequenced samples compared to 36·03 % in February, excluding Thuringia) and Thuringia (Median: 85·29 %, up from 50·00 % in February).
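
      To make the meaning of these percentages concrete, the sketch below computes the median share of Alpha samples among each region's sequenced samples. The region names and counts are invented for illustration only and do not reproduce the study data.

```python
from statistics import median


def alpha_share_percent(alpha_samples: int, total_samples: int) -> float:
    """Percentage of Alpha samples among all sequenced samples of one region."""
    return 100 * alpha_samples / total_samples


def median_alpha_share(counts: dict) -> float:
    """Median of the per-region Alpha shares (in percent)."""
    return median(
        alpha_share_percent(alpha, total) for alpha, total in counts.values()
    )


# Hypothetical (alpha_samples, total_sequenced_samples) per federal state.
hypothetical_counts = {
    "State A": (30, 40),   # 75 %
    "State B": (45, 50),   # 90 %
    "State C": (10, 20),   # 50 %
}

print(median_alpha_share(hypothetical_counts))  # 75.0
```

      The same calculation applied to the districts of a single state would give a state-level median such as the one reported for Thuringia.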

      (2) Line 134: It's a little strange to compare the dynamics of a state with that of the whole country. For it lagged as compared to all other States?

      Line 134: “In summary, the spread of the Alpha lineage in Thuringia lagged roughly two weeks behind the general spread in the rest of Germany but showed similar proportions.”

      Thank you for the feedback. The statement refers to the comparison of Alpha-lineage proportions across federal states, excluding Thuringia, in lines 118 to 130. To simplify, we collectively referred to these federal states as “Germany” in the text. However, we recognize that this formulation is misleading, so we adjusted line 135 for clarification:

      Line 135:

      In summary, the spread of the Alpha lineage in Thuringia lagged roughly two weeks behind the general spread of other German federal states but showed similar proportions.

    1. Author response:

      Reviewer #1 (Public review)

      Weaknesses:

      The main weakness of the manuscript is that to a large degree, one of its main conclusions (MAP symmetry underlies differences in regenerative capacity) relies mainly on a correlation, without firmly establishing a causal link. However, this weakness is relatively minor because (1) it is partially addressed with the Spastin KO and (2) there isn't a trivial way to show a causal relationship in this case.

      We thank Reviewer #1 for their positive assessment of our manuscript. To further strengthen the claim that MAP asymmetry underlies differences in regenerative capacity, we could investigate the effect of depleting other MAPs that lose asymmetry after conditioning lesion (CRMP5 and katanin). One expects that similarly to spastin, this would disrupt the physiological asymmetry of DRG axons and impair axon regeneration. We will further discuss this issue in the revised version of the manuscript.

      Reviewer #2 (Public review):

      Weaknesses:

      In order for the method to be used it needs to be better described. For instance what proportion of neurons develop just two axonal branches, one of which is different? How selective are the researchers in finding appropriate neurons?

      We thank Reviewer #2 for their positive assessment of our manuscript. As suggested, we will include further methodological details on the in vitro system in the revised version of the manuscript. We have evaluated the percentage of DRG neurons exhibiting different morphologies in our cultures: multipolar (4%), bipolar (35%), bell-shaped (17%), and pseudo-unipolar neurons (43%). This will be included in the revised manuscript. All the pseudo-unipolar neurons analysed had distinct axonal branches in terms of diameter and microtubule dynamics. For imaging purposes, we selected pseudo-unipolar neurons with axons unobstructed by other cells or neurites within a distance of at least 20–30 μm from the bifurcation point, to ensure optimal imaging. In the case of laser axotomy experiments, this distance was increased to 100–200 μm to ensure clear analysis of regeneration. These selection criteria will be detailed in the Methods of the revised manuscript.

      Reviewer #3 (Public review):

      Weaknesses:

      While some of the data are compelling, experimental evidence only partially supports the main claims. In its current form, the study is primarily descriptive and lacks convincing mechanistic insights. It misses important controls and further validation using 3D in vitro models.

      We recognize the importance of further exploring the contribution of other MAPs to microtubule asymmetry and regenerative capacity of DRG axons. In future work, we plan to investigate this issue by using knockout mice for katanin and CRMP5. To understand the mechanisms underlying the differential localization of MAPs in DRG axons, we performed in-situ hybridization to assess the availability of axonal mRNA but no differences were found between central and peripheral DRG axons (Figure 4 – figure supplement 2). To address whether differences in protein transport exist, we attempted to transduce DRG neurons with GFP-tagged spastin both in vitro and in vivo. However, these experiments were inconclusive as very low levels of spastin-GFP were detected. We are actively optimizing these approaches and will address this challenge in future studies. This will be further discussed in the revised manuscript.

      Given the heterogeneity of dorsal root ganglion (DRG) neurons, it is unclear whether the in vitro model described in this study can be applied to all major classes of DRG neurons.

      We acknowledge the diversity of DRG neurons and agree that assessing the presence of different DRG subtypes in our culture system will enrich its future use. Despite this heterogeneity, we focused on DRG neuron features that are common to all subtypes, i.e., pseudo-unipolarization and higher regenerative capacity of peripheral branches. This will be further discussed in the revised version of the manuscript.

      Also unclear is the inconsistency with embryonic DRG cultures with embryonic (E)16 from rats and E13 from mice (spastin knockout and wild-type controls).

      Given our previous experience in establishing DRG neuron cultures from Wistar rats and C57BL/6 mice, these developmental stages are equivalent, yielding cultures of DRG neurons with similar percentages of different morphologies. Of note, in our colonies, gestation length is ~19 days in C57BL/6 mice (background of the spastin knockout line) and ~22 days in Wistar Han rats. This will be further clarified in the Methods.

      Furthermore, the authors stated (line 393) that only a small subset of cultured DRG neurons exhibited a pseudo-unipolar morphology. The authors should include the percentage of the neurons that exhibit a pseudo-unipolar morphology.

      We have previously evaluated the percentage of DRG neurons exhibiting different morphologies in our cultures: multipolar (4%), bipolar (35%), bell-shaped (17%), and pseudo-unipolar neurons (43%). This will be included in the revised manuscript. In line 393, we referred specifically to an experimental setup in which DRG neurons were transduced and 30 transduced neurons were randomly selected for longitudinal imaging. Of these, the number of viable pseudo-unipolar DRG neurons was limited by both the random nature of viral transduction and light-induced toxicity, as imaging was performed continuously at hourly intervals over seven consecutive days. This will be clarified in the revised manuscript.

      The significance of studying microtubule polymerization to DRG asymmetry in vitro is questionable, especially considering the model's validity. The authors might consider eliminating the in vitro data and instead focus on characterizing DRG asymmetry in vivo both before and after a conditioning lesion. If the authors choose to retain the in vitro data, classifying the central and peripheral-like branches in cultured DRG neurons will require further in-depth characterization. Additional validation should be performed in adult DRG neuron cultures not aged in vitro.

      The in vitro system here presented reliably reproduces several key features of DRG neurons observed in vivo, including asymmetry in axon diameter, regenerative capacity, axonal transport, and microtubule dynamics. Of note, most studies in the field were developed using multipolar DRG neurons that do not recapitulate in vivo morphology and asymmetries. Thus, the current in vitro system serves as a versatile tool for advancing our understanding of DRG biology and associated diseases. This system is particularly suited to study axon regeneration, and enables research on mechanisms occurring at the stem axon bifurcation, which are challenging to examine in vivo due to the length of the stem axon and the difficulty of locating the DRG T-junction. Optimizing similar cultures using adult DRG neurons comes with challenges, such as lower cell viability and decreased percentage of pseudo-unipolarization. This is the case with multiple other neuron types for which the vast majority of cultures are obtained from embryonic tissue. These embryonic cultures (as is the case with cortical and hippocampal neurons) are widely used to understand neuronal polarization, axon growth and/or regeneration. This will be further addressed in the revised manuscript.

      The comparison of asymmetry associated with a regenerative response between in vitro and in vivo paradigms has significant limitations due to the nature of the in vitro culture system. When cultured in isolation, DRG neurons fail to form functional connections with appropriate postsynaptic target neurons (the central branch) or to differentiate the peripheral domains associated with the innervation of target organs. Rather than growing neurons on a flat, hard surface like glass, more physiologically relevant substrates and/or culturing conditions should be considered. This approach could help eliminate potential artifacts caused by plating adult DRG neurons on a flat surface. Additionally, the authors should consider replicating their findings in a 3D culture model or using dorsal root ganglia explants, where both centrally and peripherally projecting axons are present.

      We agree that a more sophisticated system, such as a compartmentalized culture, holds great potential for future research. In this respect, we are currently engaged in developing such models. A compartmentalized system would enable the separation of three compartments: central nervous system neurons, DRG neurons, and peripheral targets. While previous efforts to create compartmentalized DRG cultures have been reported, these systems have not demonstrated the development of pseudo-unipolar morphology. Incorporating non-neuronal DRG cells into the DRG neuron compartment may successfully support the development of a pseudo-unipolar morphology.

      We also recognize the importance of dimensionality in fostering pseudo-unipolar morphology. Of note, our model provides a 3D-like environment, as DRG glial cells are continuously replicating over the 21 days in culture. In relation to DRG explants, we attempted their use but encountered limitations with confocal microscopy, as the axial resolution was insufficient to adequately resolve processes at the DRG T-junction or within individual branches. While tissue clearing could improve resolution, it would be incompatible with live imaging, which is essential for our experiments.

      The above issues will be further discussed in the revised manuscript.

      Panels 5H-J require additional processing with astrocyte markers to accurately define the lesion borders. Furthermore, including a lower magnification would facilitate a direct comparison of the lesion site.

      In our study, we relied on the alignment of nuclei to delineate the lesion site because, in our accumulated experience, this provides an accurate definition of the lesion border. Outside the lesion, the nuclei are well-aligned, while at the lesion site, they become randomly distributed. Additionally, CTB staining further supports the identification of the rostral border of the lesion, as most injured central DRG axons stop their growth at the injury site. This will be further detailed in the Methods.

      The use of cholera toxin subunit B (CTB) to trace dorsal column sensory axons is prone to misinterpretation, as the tracer accumulates at the axon's tip. This limitation makes it extremely challenging to distinguish between regenerating and degenerating axons.

      While alternative methods to trace or label regenerating axons exist, CTB is a well-established and widely used tracer for central sensory projections, as shown in multiple studies. Regarding the concern of possible CTB labeling in degenerating axons, we believe this is unlikely to be the case in our study because, in spinal cord injury controls, CTB-positive axons are nearly absent. Also, as regeneration was investigated six weeks after injury, axon degeneration has most likely already occurred, as shown previously (PMID: 15821747 and PMID: 25937174).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Rühling et al analyzes the mode of entry of S. aureus into mammalian cells in culture. The authors propose a novel mechanism of rapid entry that involves the release of calcium from lysosomes via NAADP-stimulated activation of TPC1, which in turn causes lysosomal exocytosis; exocytic release of lysosomal acid sphingomyelinase (ASM) is then envisaged to convert exofacial sphingomyelin to ceramide. These events not only induce the rapid entry of the bacteria into the host cells but are also described to alter the fate of the intracellular S. aureus, facilitating escape from the endocytic vacuole to the cytosol.

      Strengths:

      The proposed mechanism is novel and could have important biological consequences.

      Weaknesses:

      Unfortunately, the evidence provided is unconvincing and insufficient to document the multiple, complex steps suggested. In fact, there appear to be numerous internal inconsistencies that detract from the validity of the conclusions, which were reached mostly based on the use of pharmacological agents of imperfect specificity.

      We thank the reviewer for the detailed evaluation of our manuscript. We will address the criticism below.

      We agree with the reviewer that many of the experiments presented in our study rely on the usage of inhibitors. However, we want to emphasize that the main conclusion (invasion pathway affects the intracellular fate/phagosomal escape) was demonstrated without the use of inhibitors or genetic ablation in two key experiments (Figure 4g/h). These experiments were in line with the results we obtained with inhibitors (amitriptyline [Supp. Figure 4e], ARC39, PCK310 [Figure 4c] and Vacuolin-1 [Supp. Figure 4f]). Importantly, the hypothesis was also supported by another key experiment, in which we showed that the intracellular fate of bacteria is affected by removal of SM from the plasma membrane before invasion, but not by removal of SM from phagosomal membranes after bacterial internalization (Figure 4d-f). Taken together, we thus believe that the main hypothesis is strongly supported by our data.

      Moreover, we either used different inhibitors for the same molecule (ASM was inhibited by ARC39, amitriptyline and PCK310 with similar outcome) or supported our hypothesis with gene-ablated cell pools (TPC1, Syt7, SARM1), as we will point out in more detail below.

      Firstly, the release of calcium from lysosomes is not demonstrated. Localized changes in the immediate vicinity of lysosomes need to be measured to ascertain that these organelles are the source of cytosolic calcium changes. In fact, 9-phenanthrol, which the authors find to be the most potent inhibitor of invasion and hence of the putative calcium changes, is not a blocker of lysosomal calcium release but instead blocks plasmalemmal TRPM4 channels. On the other hand, invasion is seemingly independent of external calcium. These findings are inconsistent with each other and point to non-specific effects of 9-phenanthrol. The fact that ionomycin decreases invasion efficiency is taken as additional evidence of the importance of lysosomal calcium release. It is not clear how these observations support involvement of lysosomal calcium release and exocytosis; in fact, treatment with the ionophore should itself have induced lysosomal exocytosis and stimulated, rather than inhibited invasion. Yet, manipulations that increase and others that decrease cytosolic calcium both inhibited invasion.

      With respect to lysosomal Ca2+ release, we agree with the reviewer that direct visual demonstration of lysosomal Ca2+ release upon infection will improve the manuscript. We therefore will perform additional experimentation to show alterations of Ca2+ at the lysosomes during infection.

      As to the TRPM4 involvement in S. aureus host cell internalization, it has been reported that TRPM4 is activated by cytosolic Ca2+. However, the channel conducts monovalent cations such as K+ or Na+ but is impermeable to Ca2+ (refs. 1, 2). The following observations support this:

      i) S. aureus invasion is dependent on intracellular Ca2+ but is independent of extracellular Ca2+ (Figure 1c).

      ii) 9-phenanthrol treatment reduces S. aureus internalization by host cells, illustrating the dependence of this process on TRPM4 (Figure 1b). We therefore hypothesize that TRPM4 is activated by Ca2+ released from lysosomes (see above).

      TRPM4 is localized to focal adhesions and is connected to the actin cytoskeleton3, 4 – a requisite of host cell entry of S. aureus.5, 6 This suggests an important function of TRPM4 in the uptake of S. aureus in general, but TRPM4 need not be involved exclusively in the rapid uptake pathway.

      TRPM4 itself is not permeable to Ca2+ but is activated by the cation. Thus, it is unlikely to cause lysosomal exocytosis. The stronger reduction of bacterial uptake by treatment with 9-phenanthrol when compared to Ned19 thus may be caused by the involvement of TRPM4 in additional pathways of S. aureus host cell entry involving the association of TRPM4 with focal adhesions or, as pointed out by the reviewer, by unspecific side effects of 9-phenanthrol that we currently cannot exclude. We will include this information in the revised manuscript.

      Regarding the reduced S. aureus invasion after ionomycin treatment, we agree with the reviewer that ionomycin is known to lead to lysosomal exocytosis as was previously shown by others7 as well as our laboratory8.

      We hypothesized that pretreatment with ionomycin would trigger lysosomal exocytosis and thus would reduce the pool of lysosomes that can undergo exocytosis before host cells are contacted by S. aureus. As a result, we should observe a marked reduction of S. aureus internalization in such “lysosome-depleted cells”, if the lysosomal exocytosis is coupled to bacterial uptake. Our observation of reduced bacterial internalization after ionomycin treatment supports this hypothesis.

      However, ionomycin treatment and S. aureus infection of host cells are distinct processes.

      While ionomycin results in strong global and non-directional lysosomal exocytosis of all “releasable” lysosomes (~5-10 % of all lysosomes according to previous observations)7, we hypothesize that lysosomal exocytosis upon contact with S. aureus only involves a very small proportion of lysosomes at host-bacteria contact sites.

      Since ionomycin disturbs the overall cellular Ca2+ homeostasis, we agree with the reviewer that this does not directly show lysosomal Ca2+ liberation. We will discuss this in more detail in the revised manuscript.

      The proposed role of NAADP is based on the effects of "knocking out" TPC1 and on the pharmacological effects of Ned-19. It is noteworthy that TPC2, rather than TPC1, is generally believed to be the primary TPC isoform of lysosomes. Moreover, the gene ablation accomplished in the TPC1 "knockouts" is only partial and rather unsatisfactory. Definitive conclusions about the role of TPC1 can only be reached with proper, full knockouts. Even the pharmacological approach is unconvincing because the high doses of Ned-19 used should have blocked both TPC isoforms and presumably precluded invasion. Instead, invasion is reduced by only ≈50%. A much greater inhibition was reported using 9-phenanthrol, the blocker of plasmalemmal calcium channels. How is the selective involvement of lysosomal TPC1 channels justified?

      As to partial gene ablation of TPC1: To avoid clonal variances, we usually perform pool sorting to obtain a cell population that predominantly contains cells deficient in the target gene (here, TPC1), but also a small proportion of wildtype cells, as seen by the residual TPC1 protein on the Western blot. We observe a significant reduction of bacterial uptake in this cell pool, suggesting that the uptake reduction in a pure K.O. population may be even larger.

      As to the inhibition by Ned19: We agree with the reviewer that Ned19 inhibits TPC1 and TPC2. Since ablation of TPC1 reduced invasion of S. aureus, we concluded that TPC1 is important for S. aureus host cell invasion. However, we agree with the reviewer that a role for TPC2 cannot be excluded. We will clarify this in the revised manuscript. It needs to be noted, however, that deficiency in either TPC1 or TPC2 alone was sufficient to prevent Ebola virus infection9, which is in line with our observations.

      The 50% reduction of invasion upon Ned19 treatment (Figure 1d) is comparable with the reduction caused by other compounds that influence the ASM-dependent pathway (such as amitriptyline, ARC39 [Figure 2c], BAPTA-AM [Figure 1c], Vacuolin-1 [Figure 2a], β-toxin [Figure 2e] and ionomycin [Figure 1a]). Further, the partial reduction of invasion is most likely due to the concurrent activity of multiple internalization pathways which are not all targeted by the used compounds.

      Invoking an elevation of NAADP as the mediator of calcium release requires measurements of the changes in NAADP concentration in response to the bacteria. This was not performed. Instead, the authors analyzed the possible contribution of putative NAADP-generating systems and reported that the most active of these, CD38, was without effect, while the elimination of SARM1, another potential source of NAADP, had a very modest (≈20%) inhibitory effect that may have been due to clonal variation, which was not ruled out. In view of these data, the conclusion that NAADP is involved in the invasion process seems unwarranted.

      Our results from two independent experimental set-ups (Ned19 [Figure 1d] and TPC1 K.O. [Figure 1e & Figure 2f]) indicate the involvement of NAADP in the process. However, the measurement of NAADP concentration is non-trivial. We can nevertheless rule out clonal variation in the SARM1 mutant, since the experiments were conducted with a cell pool, as described above, precisely to avoid the variation between single clones.

      The mechanism behind biosynthesis of NAADP is still debated. CD38 was the first enzyme discovered to be capable of producing NAADP. However, it requires acidic pH to produce NAADP (ref. 10), which does not match the characteristics of a cytosolic NAADP producer. HeLa cells do not express CD38 and hence, it is not surprising that inhibition of CD38 had no effect on S. aureus invasion in HeLa cells. However, NAADP production by HeLa cells was observed in the absence of CD38 (ref. 11). Thus, CD38-independent NAADP generation is likely. SARM1 can produce NAADP at neutral pH (ref. 12) and is expressed in HeLa cells, thus providing a more promising candidate.

      We agree with the reviewer that the reduction of S. aureus internalization after ablation of SARM1 is less pronounced than in other experiments of ours. This may be explained by NAADP originating from other enzymes, such as the recently discovered DUOX1, DUOX2, NOX1 and NOX2 (ref. 13), which, with the exception of DUOX2, show low expression even in HeLa cells. We will discuss this in the revised manuscript.

      The involvement of lysosomal secretion is, again, predicated largely on the basis of pharmacological evidence. No direct evidence is provided for the insertion of lysosomal components into the plasma membrane, or for the release of lysosomal contents to the medium. Instead, inhibition of lysosomal exocytosis by vacuolin-1 is the sole source of evidence. However, vacuolin-1 is by no means a specific inhibitor of lysosomal secretion: it is now known to act primarily as a PIKfyve inhibitor and to cause massive distortion of the endocytic compartment, including gross swelling of endolysosomes. The modest (20-25%) inhibition observed when using synaptotagmin 7 knockout cells is similarly not convincing proof of the requirement for lysosomal secretion.

      We agree that the manuscript will strongly benefit from a functional analysis of lysosomal exocytosis. We therefore will conduct assays to investigate exocytosis in the revision. However, we previously i) showed, by addition of specific antisera, that LAMP1 is transiently exposed on the plasma membrane during ionomycin and pore-forming toxin challenge and ii) demonstrated the release of ASM activity into the culture medium under these conditions.8 Neither measurement is compatible with S. aureus infection: LAMP1 antibodies are also bound non-specifically by protein A and another IgG-binding protein on the S. aureus surface, which would bias the results. Since protein A also serves as an adhesin, we cannot simply delete the ORF without changing other aspects of staphylococcal virulence. Further, FBS contains an ASM background activity that impedes activity measurements of cell culture medium. We previously removed this background activity by a specific heat-inactivation protocol.8 However, S. aureus invasion is strongly reduced in culture medium containing this heat-inactivated FBS.

      We agree with the reviewer that Vacuolin-1 has unspecific side effects. We will address this in the revised version of the manuscript.

      As to the involvement of synaptotagmin 7:

      Synaptotagmin 7 is not the only protein possibly involved in Ca2+-dependent exocytosis. For instance, SYT1 has been shown to possess an overlapping function.14 This may explain the discrepancy between our vacuolin-1 and SYT7 ablation experiments. We will add a corresponding section to the discussion.

      ASM is proposed to play a central role in the rapid invasion process. As above, most of the evidence offered in this regard is pharmacological and often inconsistent between inhibitors or among cell types. Some drugs affect some of the cells, but not others. It is difficult to reach general conclusions regarding the role of ASM. The argument is made even more complex by the authors' use of exogenous sphingomyelinase (beta-toxin). Pretreatment with the toxin decreased invasion efficiency, a seemingly paradoxical result. Incidentally, the effectiveness of the added toxin is never quantified/validated by directly measuring the generation of ceramide or the disappearance of SM.

      Although pharmacological inhibitors can have unspecific side effects, we want to emphasize that the inhibitors used in our study act on the enzyme ASM by completely different mechanisms. Amitriptyline is a so-called functional inhibitor of ASM (FIASMA), which induces the detachment of ASM from lysosomal membranes, resulting in degradation of the enzyme.15 By contrast, ARC39 is a competitive inhibitor.16, 17

      We do not see inconsistencies in our data obtained with ASM inhibitors. Amitriptyline and ARC39 both reduce the invasion of S. aureus in HuLEC, HuVEC and HeLa cells (Figure 2c). ARC39 needs a longer pre-incubation, since its uptake by host cells is slower (data not shown). We observe a different outcome in 16HBE14o- and Ea.Hy 926 cells, with 16HBE14o- even demonstrating a slightly increased invasion of S. aureus upon ARC39 treatment. Amitriptyline had no effect in these cell lines (Figure 2c). Moreover, both inhibitors affected the invasion dynamics (Figure 3d), phagosomal escape (Figure 4c and Supp. Figure 4e) and Rab7 recruitment (Figure 4a and Supp. Figure 4b) in a similar fashion. Proper inhibition of ASM by both compounds in all cell lines used was validated by enzyme assays (Supp. Figure 2e), which suggests that the ASM-dependent pathway exists only in specific cell lines. This may also serve as an argument that we do not observe unspecific side effects of the compounds here. We will clarify this in the revised manuscript.

      ASM is a key player for SM degradation and recycling. In a clinical context, deficiency in ASM results in Niemann-Pick disease type A/B. The lipid profile of ASM-deficient cells is massively altered18, which will result in severe side effects. Short-term inhibition by small molecules therefore poses a clear benefit when compared to the usage of ASM K.O. cells.

      As to the treatment with a bacterial sphingomyelinase:

      Treatment with the bacterial SMase (bSMase, here: β-toxin) was performed in two different ways:

      i) Pretreatment of host cells with β-toxin to remove SM from the host cell surface before infection. This removes the substrate of ASM from the cell surface prior to addition of the bacteria (Figure 2e, Figure 4d-f). Since SM is not present on the extracellular plasma membrane leaflet after treatment, a release of ASM cannot cause localized ceramide formation at the sites of lysosomal exocytosis. Similar observations were made by others.19

      ii) Addition of bSMase to host cells together with the bacteria to complement for the absence of ASM (Figure 2f).

      Removal of the ASM substrate before infection (i) prevents localized ASM-mediated conversion of SM to Cer during infection and resulted in a decreased invasion, while addition of the SMase during infection resulted in an increased invasion in TPC1 and SYT7 ablated cells. Thus, both experiments are consistent with each other and in line with our other observations.

      Removal of SM from the plasma membrane by β-toxin was indirectly demonstrated by the absence of Lysenin recruitment to phagosomes/escaped bacteria when host cells were pretreated with the toxin before infection (Figure 4f). In another publication, we recently quantified the effectiveness of β-toxin treatment, albeit with slightly longer treatment times (75 min vs. 3 h).20 We will repeat the measurements for shorter treatment times as well.

      To clarify our experimental approaches to the readership we will add an explanatory section to the revised manuscript.

      As to the general conclusions regarding the role of ASM: ASM and lysosomal exocytosis have been shown to be involved in the uptake of a variety of pathogens19, 21-25, supporting their role in the process.

      The use of fluorescent analogs of sphingomyelin and ceramide is not well justified and it is unclear what conclusions can be derived from these observations. Despite the low resolution of the images provided, it appears as if the labeled lipids are largely in endomembrane compartments, where they would presumably be inaccessible to the secreted ASM. Moreover, considering the location of the BODIPY probe, the authors would be unable to distinguish intact sphingomyelin from its breakdown product, ceramide. What can be concluded from these experiments? Incidentally, the authors report only 10% of BODIPY-positive events after 10 min. What are the implications of this finding? That 90% of the invasion events are unrelated to sphingomyelin, ASM, and ceramide?

      During the experiments with fluorescent SM analogues (Figure 3a,b), S. aureus was added to the samples immediately before the start of video recording. Hence, bacteria slowly trickle onto the host cells, and we can thus image the initial contact between the host cells and the bacteria. For instance, the bacteria depicted in Figure 3a contact the host cell about 9 min before becoming BODIPY-FL-positive (see Supp. Video 1, 55 min). Hence, we think that in these cases we see the formation of phagosomes around bacteria rather than bacteria in endomembrane compartments. Since generation of phagosomes happens at the plasma membrane, SM is accessible to secreted ASM.

      The “trickling” approach for infection is an experimental difference to our invasion measurements, in which we synchronized the infection by a very slow centrifugation. This ensures that all bacteria have contact with host cells and are not just floating in the culture medium. However, live cell imaging of the initial bacteria-host contact cannot technically be combined with synchronization of the infection.

      In our invasion measurements (with synchronization), we typically see internalization of ~20% of all added bacteria after 30 min. Hence, most bacteria that are visible in our videos likely are still extracellular and only a small proportion was internalized. This explains why only 10% of total bacteria are positive for BODIPY-FL-SM after 10 min. The proportion of internalized bacteria that are positive for BODIPY-FL-SM should be considerably higher but cannot be determined with this method.

      We agree with the reviewer that we cannot observe conversion of BODIPY-FL-SM by ASM. In order to do that, we attempted to visualize the conversion of a visible-range SM FRET probe (Supp. Figure 3), but the structure of the probe is not compatible with measurement of conversion on the plasma membrane, since the FITC fluorophore is released into the culture medium by ASM activity and is thereby lost for imaging. In general, the visualization of SM conversion with subcellular resolution is challenging, and even with novel tools developed in our lab26 visualization of SM on the plasma membrane is difficult.

      The conclusions we draw from these experiments are that i) S. aureus invasion is associated with SM and ii) SM-associated invasion can be very fast, since bacteria are rapidly engulfed by BODIPY-FL-SM-containing membranes.

      It is also unclear how the authors can distinguish lysenin entry into ruptured vacuoles from the entry of RFP-CWT, used as a criterion of bacterial escape. Surely the molecular weights of the probes are not sufficiently different to prevent the latter one from traversing the permeabilized membrane until such time that the bacteria escape from the vacuole.

      We here want to clarify that both the Lysenin and the CWT reporters have access to ruptured vacuoles (Figure 4b). We used the Lysenin reporter in these experiments to estimate the SM content of phagosomal membranes. If a vacuole is ruptured, both the bacteria and the luminal leaflet of the phagosomal membrane remnants come into contact with the cytosol and hence with the cytosolically expressed reporters YFP-Lysenin and RFP-CWT, resulting in “Lysenin-positive escape” when phagosomes contained SM (see Figure 4f). By contrast, either β-toxin expression by S. aureus or pre-treatment with the bSMase resulted in absence of Lysenin recruitment, suggesting that the phagosomal SM levels were decreased/undetectable (Figure 4f, Supp. Figure 5f, g, i, j).

      This approach does not enable a quantitative measurement of phagosomal SM and rather gives a “yes or no” answer. However, we think this method is sufficient to show that β-toxin expression and pretreatment markedly decreased phagosomal SM levels in the host cells.

      The approach we used here to analyze “Lysenin-positive escape” can clearly be distinguished from Lysenin-based methods that were used by others.27 There Lysenin was used to show trans-bilayer movement of SM before rupture of bacteria-containing phagosomes.

      To clarify the function of Lysenin in our approach we will add an additional figure to the revised manuscript.

      Both SMase inhibitors (Figure 4C) and SMase pretreatment increased bacterial escape from the vacuole. The former should prevent SM hydrolysis and formation of ceramide, while the latter treatment should have the exact opposite effects, yet the end result is the same. What can one conclude regarding the need and role of the SMase products in the escape process?

      As pointed out above, pretreatment of host cells with SMase removes SM from the plasma membrane, so that ASM does not have access to its substrate. Thus, both treatment with ASM inhibitors and pretreatment with bacterial SMase prevent ASM from being active on the plasma membrane and thereby block the ASM-dependent uptake (Figure 2c, e). Although overall fewer bacteria were internalized by host cells under these conditions, the bacteria that invaded host cells did so in an ASM-independent manner.

      Since blockage of the ASM-dependent internalization pathway (with ASM inhibitor [Figure 4c], SMase pretreatment [Figure 4e] and Vacuolin-1 [Supp. Figure 4f]) always resulted in enhanced phagosomal escape, we conclude that bacteria that were internalized in an ASM-independent fashion show enhanced escape. Conversely, bacteria that enter host cells in an ASM-dependent manner demonstrate lower escape rates.

      This is supported by comparing the escape rates of “early” and “late” invaders [Figure 4g/h], which in our opinion is a key experiment that supports this hypothesis. The “early” invaders are predominantly ASM-dependent (see e.g. Figure 3e) and thus, bacteria that entered host cells in the first 10 min of infection should have been internalized predominantly in an ASM-dependent fashion, while slower entry pathways are active later during infection. The early ASM-dependent invaders possessed lower escape rates, which is in line with the data obtained with inhibitors (e.g. Figure 4c and Supp. Figure 4f).

      We hypothesize that the activity of ASM on the plasma membrane during invasion mediates the recruitment of a specific subset of receptors, which then influence downstream phagosomal maturation and escape. This hypothesis is supported by the fact that the subset of receptors interacting with S. aureus is altered upon inhibition of the ASM-dependent uptake pathway. We describe this in another study that is currently under evaluation elsewhere.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Rühling et al propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry.

      The evidence provided is solid, methods used are appropriate and results largely support their conclusions, but can be substantiated further as detailed below. The weakness is a reliance on chemical inhibitors that can be non-specific to delineate critical steps.

      Specific comments:

      A large number of experiments rely on treatment with chemical inhibitors. While this approach is reasonable, many of the inhibitors employed such as amitriptyline and vacuolin1 have other or non-defined cellular targets and pleiotropic effects cannot be ruled out. Given the centrality of ASM for the manuscript, it will be important to replicate some key results with ASM KO cells.

      We thank the reviewer for the critical evaluation of our manuscript and for the many constructive comments.

      We agree with the reviewer that ASM inhibitors such as the functional inhibitors of ASM (FIASMAs), like the amitriptyline used in our study, have unspecific side effects given their mode of action. FIASMAs induce the detachment of ASM from lysosomal membranes, resulting in degradation of the enzyme.15 However, we want to emphasize that we also used the competitive inhibitor ARC39 in our study16, 17, which acts on the enzyme by a completely different mechanism. All phenotypes (reduced invasion [Figure 2c, d], effect on invasion dynamics [Figure 3d], enhanced escape [Figure 4c and Supp. Figure 4e] and differential recruitment of Rab7 [Supp. Figure 4b]) were observed with both inhibitors, thereby supporting the role of ASM in the process.

      We further agree that experiments with genetic evidence usually support and improve scientific findings. However, ASM is a cellular key player for SM degradation and recycling. In a clinical context, deficiency in ASM results in Niemann-Pick disease type A/B. The lipid profile of ASM-deficient cells is massively altered18, which in itself will result in severe side effects. Thus, the usage of inhibitors provides a clear benefit when compared to ASM K.O. cells, since ASM activity can be targeted in a short-term fashion, thereby preventing larger alterations in cellular lipid composition.

      Most experiments are done in HeLa cells. Given the pathway is projected as generic, it will be important to further characterize cell type specificity for the process. Some evidence for a similar mechanism in other cell types S. aureus infects, perhaps phagocytic cell type, might be good.

      Whenever possible we performed the experiments not only in HeLa but also in HuLECs. For example, we refer to experiments concerning the role of Ca2+ (Figure 1c/Supp. Figure 1e), lysosomal Ca2+/Ned19 (Figure 1d/Supp. Figure 1g), lysosomal exocytosis/Vacuolin-1 (Figure 2a/Supp. Figure 2a), ASM/ARC39 and amitriptyline (Figure 2c), surface SM/β-toxin (Figure 2e/Supp. Figure 2g), analysis of invasion dynamics (complete Figure 3) and measurement of cell death during infection (Figure 5c-e, Supp. Figure 6a+b).

      HuLECs, however, are not readily amenable to genetic manipulation; hence, we were not able to generate gene deletions in these cells, and upon introduction of the fluorescence escape reporter the cells do not grow readily.

      As to ASM involvement in phagocytic cells: a role for ASM during the uptake of S. aureus by macrophages was previously reported by others.23 However, in professional phagocytes S. aureus does not escape from the phagosome and replicates within the vacuole.28

      I'm a little confused about the role of ASM on the surface. Presumably, it converts SM to ceramide, as the final model suggests. Overexpression of b-toxin results in the near complete absence of SM on phagosomes (having representative images will help appreciate this), but why is phagosomal SM detected at high levels in untreated conditions? If bacteria are engulfed by SM-containing membrane compartments, what role does ASM play on the surface? If surface SM is necessary for phagosomal escape within the cell, do the authors imply that ASM is tuning the surface SM levels to a certain optimal range? Alternatively, can there be additional roles for ASM on the cell surface? Can surface SM levels be visualized (for example, in Figure 4 E, F)?

      We initially hypothesized that we would detect higher phagosomal SM levels upon inhibition of ASM, since our model suggests SM cleavage by ASM on the host cell surface during bacterial cell entry. However, we did not detect any changes in our experiments (Supp. Figure 4d). We currently favor the following explanation: SM is the most abundant sphingolipid in human cells.29 If peripheral lysosomes are exocytosed and thereby release ASM, only a localized and relatively small proportion of SM may get converted to Cer, which most likely is below our detection limit. In addition, the detection of cytosolically exposed phagosomal SM by YFP-Lysenin is not quantitative and provides a “Yes or No” measurement. Hence, we think that the rather limited SM to Cer conversion in combination with the high abundance of SM in cellular membranes does not visibly affect the recruitment of the Lysenin reporter.

      In our experiments that employ BODIPY-FL-SM (Figure 3a+b), we cannot distinguish between native SM and downstream metabolites such as Cer. Hence, again we cannot make any assumptions on the extent to which SM is converted on the surface during bacterial internalization. Although our laboratory recently used trifunctional sphingolipid analogs to analyze the SM to Cer conversion20, the visualization of this process on the plasma membrane is currently still challenging.

      Overall, we hypothesize that the localized generation of Cer on the surface by released ASM leads to the formation of Cer-enriched platforms. Subsequently, a certain subset of receptors may be recruited to these platforms and influence the uptake process. These platforms are presumed to be very small, which would also explain why we did not detect changes in Lysenin recruitment.

      Related to that, why is ASM activity on the cell surface important? Its role in non-infectious or other contexts can be discussed.

      ASM release by lysosomal exocytosis has been implicated in plasma membrane repair upon injury. We will discuss this in the revised version of the manuscript.

      If SM removal is so crucial for uptake, can exocytosis of lysosomes alone provide sufficient ASM for SM removal? How much or to what extent is lysosomal exocytosis enhanced by initial signaling events? Do the authors envisage the early events in their model happening in localized confines of the PM, this can be discussed.

      Ionomycin treatment led to a release of ~10% of all lysosomes and also increased extracellular ASM activity.7, 8 However, to our knowledge, it is currently unclear to what extent the released ASM affects surface SM levels. Also, it is unknown what percentage of the lysosomes is released during infection with S. aureus. However, one can speculate that this will be only a fraction of the “releasable lysosomes”, as we assume that the effects (lysosomal Ca2+ liberation, lysosomal exocytosis and ASM activity) are very localized and take place only at host-pathogen contact sites (see also above). In initial experiments we attempted to visualize the local ASM activity on the cell surface by using a visible-range FRET probe (Supp. Fig. 3). Cleavage of the probe by ASM on the surface leads to release of FITC into the cell culture medium, which does not contribute a measurable signal at the surface.

      How are inhibitor doses determined? How efficient is the removal of extracellular bacteria at 10 min? It will be good to substantiate the cfu experiments for infectivity with imaging-based methods. Are the roles of TPC1 and TPC2 redundant? If so, why does silencing TPC1 alone result in a decrease in infectivity? For these and other assays, it would be better to show raw values for infectivity. Please show alterations in lysosomal Ca2+ at the doses of inhibitors indicated. Is lysosomal Ca2+ released upon S. aureus binding to the cell surface? Will be good to directly visualize this.

      Concerning the inhibitor concentrations, we either used values established in published studies or recommendations of the suppliers (e.g. 2-APB, Ned19, Vacuolin-1). For ASM inhibitors, we determined proper inhibition of ASM by activity assays. Concentrations of ionomycin resulting in Ca2+ influx and lysosomal exocytosis were determined in earlier studies from our lab.8, 30

      As to the removal of bacteria at 10 min p.i.: Lysostaphin is very efficient for removal of extracellular S. aureus and sterilizes the tissue culture supernatant. It significantly lyses bacteria within a few minutes, as determined by turbidity assays.31

      As to imaging-based infectivity assays: We will add an analysis of imaging-based invasion assays in the revised manuscript.

      Regarding the roles of TPC1 and TPC2: from our data we cannot conclude whether the roles of TPC1 and TPC2 are redundant. One could speculate that since blockage of TPC1 alone is sufficient to reduce internalization of bacteria, that both channels may have distinct roles. On the other hand, there might be a Ca2+ threshold in order to initiate lysosomal exocytosis that can only be attained if TPC1 and TPC2 are activated in parallel. Thus, our observations are in line with another study that shows reduced Ebola virus infection in absence of either TPC1 or TPC2.32

      As to raw CFU counts: whereas the observed effects upon blocking the invasion of S. aureus are stable, the number of internalized bacteria varies between individual biological replicates, for instance, by differences in host cell fitness or growth differences in bacterial cultures, which are prepared freshly for each experiment.

      With respect to visualization of lysosomal Ca2+ release: we agree with the reviewer that direct visual demonstration of lysosomal Ca2+ release upon infection will improve the manuscript. We therefore will perform additional experimentation to show alterations of Ca2+ at the lysosomes during infection.

      The precise identification of cytosolic vs phagosomal bacteria is not very easy to appreciate. The methods section indicates how this distinction is made, but how do the authors deal with partial overlaps and ambiguities generally associated with such analyses? Please show respective images. The number of events (individual bacteria) for the live cell imaging data should be clearly mentioned.

      We apologize for not having sufficiently explained the technology to detect escaped S. aureus. The cytosolic location of S. aureus is indicated by recruitment of RFP-CWT.33 CWT is the cell wall targeting domain of lysostaphin, which efficiently binds to the pentaglycine cross bridge in the peptidoglycan of S. aureus. This reporter is exclusively and homogenously expressed in the host cytosol. Only upon rupture of phagoendosomal membranes the reporter can be recruited to the cell wall of now cytosolically located bacteria. S. aureus mutants, for instance in the agr quorum sensing system, cannot break down the phagosomal membrane in non-professional phagocytes and thus stay unlabeled by the CWT-reporter.33 We will include respective images/movies of escape events and the bacteria numbers for live cell experiments in the revised version of the manuscript.

      In the phagosome maturation experiments, what is the proportion of bacteria in Rab5 or Rab7 compartments at each time point? Will the decreased Rab7 association be accompanied by increased Rab5? Showing raw values and images will help appreciate such differences. Given the expertise and tools available in live cell imaging, can the authors trace Rab5 and Rab7 positive compartment times for the same bacteria?

      We will include the proportion of Rab7-associated bacteria in the revised manuscript. Usually, we observe that Rab5 is only transiently (for a few minutes) present on phagosomes and only afterwards the phagosomes become positive for Rab7. We do not think that a decrease in Rab7-positive phagosomes would increase the proportion of Rab5-positive phagosomes. However, we cannot exclude this hypothesis with our data.

      We can achieve tracing of individual bacteria for recruitment of Rab5/Rab7 only manually, which impedes a quantitative evaluation. However, we will include information that illustrates the consecutive recruitment of the GTPases.

      The results with longer-term infection are interesting. Live cell imaging suggests that ASM-inhibited cells show accelerated phagosomal escape that reduces by 6 hpi. Where are the bacteria at this time point ? Presumably, they should have reached lysosomes. The relationship between cytosolic escape, replication, and host cell death is interesting, but the evidence, as presented is correlative for the populations. Given the use of live cell imaging, can the authors show these events in the same cell?

      We think that most bacteria-containing phagoendosomes should have fused with lysosomes 6 h p.i. as we have previously shown by acidification to pH of 5 and LAMP1 decoration.34

      We will provide images/videos to show the correlation between escape and replication in the revised manuscript.

      Given the inherent heterogeneity in uptake processes and the use of inhibitors in most experiments, the distinction between ASM-dependent and independent pathways might not be as clear-cut as the authors suggest. Some caution here will be good. Can the authors estimate what fraction of intracellular bacteria are taken up ASM-dependent?

      We agree with the reviewer that an overlap between internalization pathways is likely. A clear distinction is therefore certainly non-trivial. As an alternative to distinct ASM-dependent and ASM-independent pathways, ASM activity may also accelerate one or several internalization pathways. We will address this limitation in the revised manuscript.

      Early in infection (~10 min after contact with the cells), the proportion of bacteria that enter host cells in an ASM-dependent manner is relatively high, amounting to roughly 75% in HuLEC. After 30 min, this proportion decreases to about 50%. We will include this information in the revised version of the manuscript.

      References

      (1) Launay, P. et al. TRPM4 Is a Ca2+-Activated Nonselective Cation Channel Mediating Cell Membrane Depolarization. Cell 109, 397-407 (2002).

      (2) Nilius, B. et al. The Ca2+-activated cation channel TRPM4 is regulated by phosphatidylinositol 4,5-biphosphate. The EMBO Journal 25, 467-478 (2006).

      (3) Cáceres, M. et al. TRPM4 Is a Novel Component of the Adhesome Required for Focal Adhesion Disassembly, Migration and Contractility. PLoS One 10, e0130540 (2015).

      (4) Silva, I., Brunett, M., Cáceres, M. & Cerda, O. TRPM4 modulates focal adhesion-associated calcium signals and dynamics. Biophysical Journal 123, 390a (2024).

      (5) Schlesier, T., Siegmund, A., Rescher, U. & Heilmann, C. Characterization of the Atl-mediated staphylococcal internalization mechanism. International Journal of Medical Microbiology 310, 151463 (2020).

      (6) Jevon, M. et al. Mechanisms of Internalization of Staphylococcus aureus by Cultured Human Osteoblasts. Infection and Immunity 67, 2677-2681 (1999).

      (7) Rodriguez, A., Webster, P., Ortego, J. & Andrews, N.W. Lysosomes behave as Ca2+-regulated exocytic vesicles in fibroblasts and epithelial cells. J Cell Biol 137, 93-104 (1997).

      (8) Krones & Rühling et al. Staphylococcus aureus alpha-Toxin Induces Acid Sphingomyelinase Release From a Human Endothelial Cell Line. Front Microbiol 12, 694489 (2021).

      (9) Sakurai, Y. et al. Two-pore channels control Ebola virus host cell entry and are drug targets for disease treatment. Science 347, 995-998 (2015).

      (10) Aarhus, R., Graeff, R.M., Dickey, D.M., Walseth, T.F. & Lee, H.C. ADP-ribosyl cyclase and CD38 catalyze the synthesis of a calcium-mobilizing metabolite from NADP. J Biol Chem 270, 30327-30333 (1995).

      (11) Schmid, F., Fliegert, R., Westphal, T., Bauche, A. & Guse, A.H. Nicotinic acid adenine dinucleotide phosphate (NAADP) degradation by alkaline phosphatase. J Biol Chem 287, 32525-32534 (2012).

      (12) Angeletti, C. et al. SARM1 is a multi-functional NAD(P)ase with prominent base exchange activity, all regulated by multiple physiologically relevant NAD metabolites. iScience 25, 103812 (2022).

      (13) Gu, F. et al. Dual NADPH oxidases DUOX1 and DUOX2 synthesize NAADP and are necessary for Ca(2+) signaling during T cell activation. Sci Signal 14, eabe3800 (2021).

      (14) Schonn, J.-S., Maximov, A., Lao, Y., Südhof, T.C. & Sørensen, J.B. Synaptotagmin-1 and -7 are functionally overlapping Ca2+ sensors for exocytosis in adrenal chromaffin cells. Proceedings of the National Academy of Sciences 105, 3998-4003 (2008).

      (15) Kornhuber, J. et al. Functional Inhibitors of Acid Sphingomyelinase (FIASMAs): a novel pharmacological group of drugs with broad clinical applications. Cell Physiol Biochem 26, 9-20 (2010).

      (16) Naser, E. et al. Characterization of the small molecule ARC39, a direct and specific inhibitor of acid sphingomyelinase in vitro. J Lipid Res 61, 896-910 (2020).

      (17) Roth, A.G. et al. Potent and selective inhibition of acid sphingomyelinase by bisphosphonates. Angew Chem Int Ed Engl 48, 7560-7563 (2009).

      (18) Schuchman, E.H. & Desnick, R.J. Types A and B Niemann-Pick disease. Mol Genet Metab 120, 27-33 (2017).

      (19) Miller, M.E., Adhikary, S., Kolokoltsov, A.A. & Davey, R.A. Ebolavirus Requires Acid Sphingomyelinase Activity and Plasma Membrane Sphingomyelin for Infection. Journal of Virology 86, 7473-7483 (2012).

      (20) M. Rühling, L.K., F. Wagner, F. Schumacher, D. Wigger, D. A. Helmerich, T. Pfeuffer, R. Elflein, C. Kappe, M. Sauer, C. Arenz, B. Kleuser, T. Rudel, M. Fraunholz, J. Seibel Trifunctional sphingomyelin derivatives enable nanoscale resolution of sphingomyelin turnover in physiological and infection processes via expansion microscopy. Nat Commun accepted in principle (2024).

      (21) Peters, S. et al. Neisseria meningitidis Type IV Pili Trigger Ca(2+)-Dependent Lysosomal Trafficking of the Acid Sphingomyelinase To Enhance Surface Ceramide Levels. Infect Immun 87 (2019).

      (22) Grassmé, H. et al. Acidic sphingomyelinase mediates entry of N. gonorrhoeae into nonphagocytic cells. Cell 91, 605-615 (1997).

      (23) Li, C. et al. Regulation of Staphylococcus aureus Infection of Macrophages by CD44, Reactive Oxygen Species, and Acid Sphingomyelinase. Antioxid Redox Signal 28, 916-934 (2018).

      (24) Fernandes, M.C. et al. Trypanosoma cruzi subverts the sphingomyelinase-mediated plasma membrane repair pathway for cell invasion. J Exp Med 208, 909-921 (2011).

      (25) Luisoni, S. et al. Co-option of Membrane Wounding Enables Virus Penetration into Cells. Cell Host & Microbe 18, 75-85 (2015).

      (26) Rühling, M. et al. Trifunctional sphingomyelin derivatives enable nanoscale resolution of sphingomyelin turnover in physiological and infection processes via expansion microscopy. Nature Communications 15, 7456 (2024).

      (27) Ellison, C.J., Kukulski, W., Boyle, K.B., Munro, S. & Randow, F. Transbilayer Movement of Sphingomyelin Precedes Catastrophic Breakage of Enterobacteria-Containing Vacuoles. Curr Biol 30, 2974-2983 e2976 (2020).

      (28) Moldovan, A. & Fraunholz, M.J. In or out: Phagosomal escape of Staphylococcus aureus. Cell Microbiol 21, e12997 (2019).

      (29) Slotte, J.P. Biological functions of sphingomyelins. Progress in Lipid Research 52, 424-437 (2013).

      (30) Stelzner, K. et al. Intracellular Staphylococcus aureus Perturbs the Host Cell Ca(2+) Homeostasis To Promote Cell Death. mBio 11 (2020).

      (31) Kunz, T.C. et al. The Expandables: Cracking the Staphylococcal Cell Wall for Expansion Microscopy. Front Cell Infect Microbiol 11, 644750 (2021).

      (32) Sakurai, Y. et al. Ebola virus. Two-pore channels control Ebola virus host cell entry and are drug targets for disease treatment. Science 347, 995-998 (2015).

      (33) Grosz, M. et al. Cytoplasmic replication of Staphylococcus aureus upon phagosomal escape triggered by phenol-soluble modulin alpha. Cell Microbiol 16, 451-465 (2014).

      (34) Giese, B. et al. Staphylococcal alpha-toxin is not sufficient to mediate escape from phagolysosomes in upper-airway epithelial cells. Infect Immun 77, 3611-3625 (2009).

    1. Author response:

      Reviewer 1:

      (1) Free energy barriers appear to be very high for a substrate transport process. In Figure 3, the transitions from IF (Inward facing) to OF (Outward facing) state appear to have a barrier of 12 kcal/mol. Other systems with mutant or sodium unbound have even higher barriers. This does not seem consistent with previous studies where transport mechanisms of transporters have been explored using molecular dynamics. 

      First, in Figure 3, the transition from IF to OF state doesn’t have a barrier of 12 kcal/mol. The IFF to OFB transition is almost barrierless, and from OFB to OFF is ~5 kcal/mol, which is also evident in Figure 2.

      If the reviewer was referring to the transition from OFB to IFB states, the barrier is 6.8 kcal/mol (Na+ bound state), and the rate-limiting barrier in the entire sugar transport process (Na+ bound state) is 8.4 kcal/mol, as indicated in Figure 2 and Table 1, which is much lower than the 12 kcal/mol barrier the reviewer mentioned. When the Na+ is unbound, the barrier can be as high as 12 kcal/mol, but it is this high barrier that leads to our conclusion that the Na+ binding is essential for sugar transport, and the 12 kcal/mol barrier indicates an energetically unfavorable sugar translocation process when the Na+ is unbound, which is unlikely to be the major translocation process in nature. 

      Even for the 12 kcal/mol barrier reported for the Na+ unbound state, it is still not too high considering the experimentally measured MelB sugar active transport rate, which is estimated to be on the order of 10 to 100 s-1. This range of transport rate is typical for similar MFS transporters such as the lactose permease (LacY), which has an active transport rate of 20 s-1. The free energy barrier associated with the active transport is thus on the order of ~15-16 kcal/mol based on transition state theory assuming kBT/h as the prefactor. This experimentally estimated barrier is higher than all of our calculated barriers. Our calculated barrier for the sugar translocation with Na+ bound is 8.4 kcal/mol, which means an additional ~7-8 kcal/mol barrier is contributed by the Na+ release process after sugar release in the IFF state. This is a reasonable estimation of the Na+ unbinding barrier.
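
      The conversion from a measured transport rate to an apparent free energy barrier used above can be reproduced with a short calculation. The sketch below assumes simple transition state theory with kBT/h as the prefactor, as stated in the response, and a temperature of 300 K (an assumption made here for this passage). The function names are illustrative and are not taken from the manuscript.

```python
import math

KB = 1.380649e-23       # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J*s
R_KCAL = 1.987204e-3    # molar gas constant, kcal/(mol*K)


def barrier_from_rate(rate_per_s, temperature_k=300.0):
    """Apparent barrier (kcal/mol) for a rate, assuming k = (kB*T/h) * exp(-dG/RT)."""
    prefactor = KB * temperature_k / H  # about 6e12 s^-1 at 300 K
    return R_KCAL * temperature_k * math.log(prefactor / rate_per_s)


def rate_from_barrier(barrier_kcal, temperature_k=300.0):
    """Inverse relation: rate (s^-1) implied by a barrier (kcal/mol)."""
    prefactor = KB * temperature_k / H
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # Rates spanning the 10 to 100 s^-1 range quoted for MelB and LacY
    for k in (10.0, 20.0, 100.0):
        print(f"k = {k:6.1f} s^-1 -> barrier = {barrier_from_rate(k):.1f} kcal/mol")
```

      Under these assumptions, rates of 10 to 100 s-1 map to barriers of roughly 15 to 16 kcal/mol, in line with the estimate given above. A smaller prefactor than kBT/h would lower the inferred barrier accordingly.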

      Therefore, whether the calculated barrier is too high depends on the experimental kinetics measurements, which are often challenging to perform. Based on the existing experimental data, the MFS transporters are usually relatively slow in their active transport cycle. The calculated barrier thus falls within the reasonable range considering the experimentally measured active transport rates.

      (2) Figure 2b: The PMF between images 20-30 shows the conformation change from OF to IF, where the occluded (OC) state is the highest barrier for transition. However, OC state is usually a stable conformation and should be in a local minimum. There should be free energy barriers between OF and OC and in between OC and IF.  

      First, the occluded state (OCB) is not between images 20-30, it is between images 10 to 20. Second, there is no solid evidence that the OCB state is a stable conformation and a local minimum. Existing experimental structures of MFS transporters seldom have the fully occluded state resolved.

      (3) String method pathway is usually not the only transport pathway and alternate lower energy pathways should be explored. The free energy surface looks like it has not deviated from the string pathway. Longer simulations can help in the exploration of lower free energy pathways. 

      We agree with the reviewer that the string method pathway is usually not the only transport pathway and alternate lower energy pathways could exist. However, we also note that even if the fully occluded state is a local minimum and our free energy pathway does visit this missing local minimum after improved sampling, the overall free energy barrier will not be lowered from our current calculated value. This is because the current rate-limiting barrier arises from the transition from the OFB state to the IFF state, and the barrier top corresponds to the sugar molecule passing through the most constricted region in the cytoplasmic region, i.e., the IFC intermediate state visited after the IFB state is reached. Therefore, the free energy difference between the OFB state and the IFC state will not be changed by another hypothetical local minimum between the OFB and IFB states, i.e., the occluded OCB state. In other words, a hypothetical local minimum corresponding to the occluded state, even if it exists, will not decrease the overall rate-limiting barrier and may even increase it further, depending on the depth of the local minimum and the additional barriers of entering and escaping from this new minimum. 
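
      The argument above can be illustrated with a toy one-dimensional profile. The sketch below takes the effective forward barrier to be the largest climb from any earlier minimum to a later maximum along the path, which is a simplification adopted here, not the analysis used in the manuscript. All free energy values are invented for illustration and are not the calculated PMF.

```python
def highest_forward_barrier(pmf):
    """Largest rise from any earlier minimum to a later point along the path."""
    lowest_so_far = pmf[0]
    largest_climb = 0.0
    for value in pmf:
        lowest_so_far = min(lowest_so_far, value)
        largest_climb = max(largest_climb, value - lowest_so_far)
    return largest_climb


# Hypothetical profiles in kcal/mol: start state, intermediates, end state
baseline = [0.0, 5.0, 3.0, 8.0, 4.0]
# Same profile with an extra shallow well (1.0) inserted before the top
with_shallow_well = [0.0, 5.0, 1.0, 5.5, 3.0, 8.0, 4.0]
# Same profile with an extra well (-2.0) that is deeper than the start state
with_deep_well = [0.0, 5.0, -2.0, 5.5, 3.0, 8.0, 4.0]

if __name__ == "__main__":
    for name, profile in [("baseline", baseline),
                          ("shallow well", with_shallow_well),
                          ("deep well", with_deep_well)]:
        print(name, highest_forward_barrier(profile))
```

      In this toy model, a new intermediate minimum that lies above the starting state leaves the overall barrier unchanged, and one that lies below it raises the barrier. This matches the qualitative reasoning given for a hypothetical occluded-state minimum.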

      (4) The conformational change in transporters from OF to IF state is a complicated multi-step process. First, only 10 images in the string pathway are used to capture the transition from OF to IF state. I am not sure if this number is enough to capture the process. Second, the authors have used a geodesic interpolation algorithm to generate the intermediate images. However, looking at Figure 3B, it looks like the transition pathway has not captured the occluded (OC) conformation, where the transport tunnel is closed at both the ends. Transporters typically follow a stepwise conformational change mechanism where OF state transitions to OC and then to IF state. It appears that the interpolation algorithm has created an hourglass-like state, where IF gates are opening and OF gates are closing simultaneously thereby creating a state where the transport tunnel is open on both sides of the membrane. These states are usually associated with high energy. References 30-42 cited in the manuscript reveal a distinct OC state for different transporters.

      In our simulations, even with 10 initial images representing the OF to IF conformational transition, the occluded state is sampled in the final string pathway. There is an ensemble of snapshots where the extracellular and intracellular gates are both relatively narrower than the OF and IF states, preventing the sugar from leaking into either side of the bulk solution. In contrast to the reviewer’s guess, we never observed an hourglass-like state in our simulation where both gates are open. Figure 3B is a visual representation of the backbone structure of the OCB state without explicitly showing the actual radius of the gating region, which also depends on the side chain conformations. Thus, Figure 3B alone cannot be used to conclude that we are dominantly sampling an hourglass-like intermediate conformation instead of the occluded state, as mentioned by the reviewer. 

      Moreover, not all references in 30-42 have sampled the occluded state since many of them did not even simulate the substrate translocation process at all. For the ones that did sample substrate translocation processes, only two of them were studying the cation-coupled MFS family symporter (ref 38, 40) and they didn’t provide the PMF for the entire translocation process. There is no strong evidence for a stable minimum corresponding to a fully occluded state in these two studies.  In fact, different types of transporters with different coupling cations may exhibit different stability of the fully occluded state. For example, the fully occluded state has been experimentally observed for some MFS transporters, such as multidrug transporter EmrD, but not for others, such as lactose permease LacY. Thus, it is not generally true that a stable, fully-occluded state exists in all transporters, and it highly depends on the specific type of transporter and the coupling ion under study. 

      Reviewer 2:

      The manuscript by Liang and Guan provides an impressive attempt to characterize the conformational free energy landscape of a melibiose permease (MelB), a symporter member of major facilitator superfamily (MFS) of transporters. Although similar studies have been conducted previously for other members of MFS, each member or subfamily has its own unique features that make the employment of such methods quite challenging. While the methodology is indeed impressive, characterizing the coupling between large-scale conformational changes and substrate binding in membrane transporters is quite challenging and requires a sophisticated methodology. The conclusions obtained from the three sets of path-optimization and free energy calculations done by the authors are generally supported by the provided data and certainly add to our understanding of how sodium binding facilitates the transport of melibiose in MelB. However, the data is not generated reliably which questions the relevance of the conclusions as well. I particularly have some concerns regarding the implementation of the methodology that I will discuss below. 

      (1) In enhanced sampling techniques, often much attention is given to the sampling algorithm. Although the sampling algorithm is quite important and this manuscript has chosen an excellent pair: string method with swarms of trajectories (SMwST) and replica-exchange umbrella sampling (REUS) for this task, there are other important factors that must be taken into account. More specifically, the collective variables used and the preparation of initial conformations for sampling. I have objectives for both of these (particularly the latter) that I detail below. Overall, I am not confident that the free energy profiles generated (summarized in Figure 5) are reliable, and unfortunately, much of the data presented in this manuscript heavily relies on these free energy profiles. 

      Since comments (1) and (2) from this review are related, please see our response to (2) below. 

      (2) The authors state that they have had an advantage over other similar studies in that they had two endpoints of the string to work from experimental data. I agree that this is an advantage. However, this could lead to some dangerous flaws in the methodology if not appropriately taken into account. Proteins such as membrane transporters have many slow degrees of freedom that can be fully captured within tens of nanoseconds (90 ns was the simulation time used here for the REUS). Biased sampling allows us to overcome this challenge to some extent, but it is virtually impossible to take into account all slow degrees of freedom in the enhanced sampling protocol (e.g., the collective variables used here do not represent anything related to sidechain dynamics). Therefore, if one mixes initial conformations that form different initial structures (e.g., an OF state and an IF state from two different PDB files), it is very likely that despite all equilibration and relaxation during SMwST and REUS simulations, the conformations that come from different sources never truly mix. This is dangerous in that it is quite difficult to detect such inconsistencies and from a theoretical point of view it makes the free energy calculations impossible. Methods such as WHAM and its various offshoots all rely on overlap between neighboring windows to calculate the free energy difference between two windows and the overlap should be in all dimensions and not just the ones that we use for biasing. This is related to well-known issues such as hidden barriers and metastability. If one uses two different structures to generate the initial conformations, then the authors need to show their sampling has been long enough to allow the two sets of conformations to mix and overlap in all dimensions, which is a difficult task to do. 

      We partly agree with the reviewer in that it is challenging to investigate whether the structures generated from the two different initial structures are sufficiently mixed in terms of orthogonal degrees of freedom outside the CV space during our string method and REUS simulations. We acknowledge that our simulations are within 100 ns for each REUS window, and there could be some slow degrees of freedom that are not fully sampled within this timescale. However, the conjectures and concerns raised by the reviewer are somewhat subjective in that they are almost impossible to be completely disproven. In a sense, these concerns are essentially the same as the general suspicion that the biomolecular simulation results are not completely converged, which cannot be fully ruled out for relatively complex biomolecular systems in any computational study involving MD simulations.  We also note that comparison among the PMFs of different cation bound/unbound states will have some error cancellation effects because of the consistent use of the same sampling methods for all three systems. Our main conclusions regarding the cooperative binding and transport of the two substrates lie in such comparison of the PMFs and additionally on the unbiased MD simulations. Thus, although there could be insufficient sampling, our key conclusions based on the relative comparison between the PMFs are more robust and less likely to suffer from insufficient sampling.

      (3) I also have concerns regarding the choice of collective variables. The authors have split the residues in each transmembrane helix into the cyto- and periplasmic sides. Then they have calculated the mass center distance between the cytoplasmic sides of certain pairs of helices and have also done the same for the periplasmic side. Given the shape of a helix, this does not seem to be an ideal choice since rather than the rotational motion of the helix, this captures more the translational motion of the helix. However, the transmembrane helices are more likely to undergo rotational motion than the translational one. 

      Our choice of CVs captures not only the translational motion but also the rotational motion of the helices. Consider a pair of helices. If there is a relative rotation in the angle between the two helices, causing the extracellular halves of the two helices to get closer and the intracellular halves to move further apart, this rotational motion is captured as a decrease in the CV describing the extracellular distance and an increase in the CV describing the intracellular distance between the two helices. Conversely, if one of the two CVs is forced to increase and the other is forced to decrease, it can, in principle, bias the relative rotation of the two helices with respect to each other. Indeed, comparing Figure 3 with Figure S4, the reorientation of the helices with respect to the membrane normal (Fig. S4) is accompanied by the simultaneous decrease and increase in the pairwise distances between different segments of the helices. Therefore, our choice of CVs in the string method and REUS is not biased against the rotation of the helices, as the reviewer assumed.
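
      The geometric point can be checked with a minimal two-dimensional model. The sketch below treats each helix as a straight rod of equally weighted points in the x-z plane, with z along the membrane normal, and tilts one rod about its midpoint. The rod length, spacing, separation and tilt angle are arbitrary values chosen for illustration, and the function names are hypothetical.

```python
import math


def half_helix_coms(x_center, tilt_rad, half_length=10):
    """Centers of mass of the upper and lower halves of a rod-like helix.

    The rod passes through (x_center, 0) and is tilted by tilt_rad from the
    z axis. Points sit at unit spacing and carry equal mass.
    """
    upper = list(range(1, half_length + 1))
    lower = [-s for s in upper]

    def com(positions):
        xs = [x_center + s * math.sin(tilt_rad) for s in positions]
        zs = [s * math.cos(tilt_rad) for s in positions]
        return (sum(xs) / len(xs), sum(zs) / len(zs))

    return com(upper), com(lower)


def pair_distances(tilt_rad, separation=10.0):
    """Return (upper-half distance, lower-half distance) for two helices.

    Helix A stays upright at x = 0; helix B sits at x = separation and is
    tilted by tilt_rad about its own midpoint.
    """
    a_up, a_low = half_helix_coms(0.0, 0.0)
    b_up, b_low = half_helix_coms(separation, tilt_rad)
    return math.dist(a_up, b_up), math.dist(a_low, b_low)


if __name__ == "__main__":
    print("upright:", pair_distances(0.0))
    print("tilted: ", pair_distances(-0.2))  # top of helix B leans toward helix A
```

      In this model a pure rotation of helix B, with no translation of its midpoint, shortens one half-helix distance and lengthens the other. This is the signature that a pair of distance CVs of this kind responds to.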

      (4) Convergence: String method convergence data does not show strong evidence for convergence (Figure S2) in my opinion. REUS convergence is also not discussed. No information is provided on the exchange rate or overlap between the windows.

      The convergence of the string method and REUS, as well as the exchange rate and overlap between windows, will be discussed in the revised manuscript.

      Reviewer 3:

      The paper from Liang and Guan details the calculation of the potential of mean force for the transition between two key states of the melibiose (Mel) transporter MelB. The authors used the string method along with replica-exchange umbrella sampling to model the transition between the outward- and inward-facing Mel-free states, including the binding and subsequent release of Mel. They find a barrier of ~6.8 kcal/mol and an overall free-energy difference of ~6.4 kcal/mol. They also investigate the same process without the co-transported Na+, finding a higher barrier, while in the D59C mutant, the barrier is nearly eliminated.

      For the Na+-bound state, the rate-limiting barrier is 8.4 kcal/mol instead of 6.8 kcal/mol. The overall free energy difference is 3.7 kcal/mol instead of 6.4 kcal/mol. These numbers need to be corrected in the public review.

      I found this to be an interesting and technically competent paper. I was disappointed actually to see that the authors didn't try to complete the cycle. I realize this is beyond the scope of the study as presented.

      We agree with the reviewer that characterizing the complete cycle is our eventual goal. However, characterizing the complete cycle of the transporter would also require the free energy landscapes of the Na+ binding and unbinding process in the sugar-bound and unbound states, as well as that of the OF to IF conformational transition in the apo state. These additional calculations are expensive, and the amount of work devoted to them is estimated to be at least the same as for the current study. Therefore, we prefer to carry out and analyze these new simulations in a future study.

      The results are in qualitative agreement with expectations from experiments. Could the authors try to make this comparison more quantitative? For example, by determining the diffusivity along the path, the authors could estimate transition rates.

      In our revised manuscript, we will determine the diffusivity along the path and estimate transition rates.

      Relatedly, could the authors comment on how typical concentration gradients of Mel and Na+ would affect these numbers?

      The concentration gradients of Mel and Na+ can be varied in different experimental setups. In a typical active transport assay, Na+ has a higher concentration outside the cell, and melibiose has a higher concentration inside the cell. In the steady state, depending on the experimental setup, the extracellular Na+ concentration is in the range of 10-20 mM, and the intracellular concentration is self-balanced in the range of 3-4 mM due to the presence of other ion channels and pumps. In addition to the Na+ concentration gradient, there is also a transmembrane voltage potential of -200 mV (the intracellular side being more negative than the extracellular side), which facilitates the Na+ release into the intracellular side. In the steady state, the extracellular concentration of melibiose is ~0.4 mM, and the intracellular concentration is at least 1000 times the extracellular concentration, greater than 0.4 M. In this scenario, the free energy change of intracellular melibiose translocation will be increased by ~5 kcal/mol at 300 K, leading to a total ∆𝐺 of ~8 kcal/mol. The total barrier for the melibiose translocation is expected to be increased by less than 5 kcal/mol. However, the increase in ∆𝐺 for intracellular melibiose translocation will be compensated by a decrease in ∆𝐺 of similar magnitude (~5 kcal/mol) for intracellular Na+ translocation. In a typical sugar self-exchange assay, there is no net gradient in melibiose or Na+ across the membrane, and the overall free energy changes we calculated apply to this situation.
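      The gradient estimates above can be reproduced with a short ideal-solution calculation. The sketch below is illustrative only: it assumes ∆G = RT ln(c_to/c_from) + zFV, takes the midpoints of the quoted Na+ ranges (15 mM outside, 3.5 mM inside) as hypothetical inputs, and uses a hypothetical function name, `transport_free_energy`. It is not the authors' calculation.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
F_KCAL = 23.0605      # Faraday constant in kcal/(mol*V)


def transport_free_energy(c_from, c_to, temperature=300.0, charge=0, voltage=0.0):
    """Free energy (kcal/mol) of moving a solute from c_from to c_to.

    voltage is the potential of the destination side minus that of the
    source side, in volts; charge is the valence of the solute.
    """
    chemical = R_KCAL * temperature * math.log(c_to / c_from)
    electrical = charge * F_KCAL * voltage
    return chemical + electrical


# Melibiose moved inward against a 1000-fold gradient (0.4 mM -> 0.4 M).
dg_melibiose = transport_free_energy(0.4e-3, 0.4)

# Na+ moved inward, down its gradient and down a -200 mV potential.
# The 15 mM and 3.5 mM values are assumed midpoints of the quoted ranges.
dg_sodium = transport_free_energy(15e-3, 3.5e-3, charge=1, voltage=-0.2)

print(f"melibiose: {dg_melibiose:+.2f} kcal/mol")
print(f"Na+:       {dg_sodium:+.2f} kcal/mol")
```

      Under these assumptions, an exactly 1000-fold melibiose ratio gives a chemical term of roughly 4 kcal/mol, and larger ratios (the response says "at least 1000 times") give larger values. The Na+ term is of similar magnitude and opposite sign, which is the compensation described above.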

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In their manuscript "PDGFRa signaling regulates Srsf3 transcript binding to affect PI3K signaling and endosomal trafficking" Forman and colleagues use iMEPM cells to characterize the effects of PDGF signaling on alternative splicing. They first perform RNA-seq using a one-hour stimulation with Pdgf-AA in control and Srsf3 knockdown cells. While Srsf3 manipulation results in a sizeable number of DE genes, PDGF does not. They then turn to examine alternative splicing, due to findings from this lab. They find that both PDGF and Srsf3 contribute much more to splicing than transcription. They find that the vast majority of PDGF-mediated alternative splicing depends upon Srsf3 activity and that skipped exons are the most common events with PDGF stimulation typically promoting exon skipping in the presence of Srsf3. They used eCLIP to identify RNA regions bound to Srsf3. Under both PDGF conditions, the majority of peaks were in exons with +PDGF having a substantially greater number of these peaks. Interestingly, they find differential enrichment of sequence motifs and GC content in stimulated versus unstimulated cells. They examine 2 transcripts encoding PI3K pathway (enriched in their GO analysis) members: Becn1 and Wdr81. They then go on to examine PDGFRa and Rab5, an endosomal marker, colocalization. They propose a model in which Srsf3 functions downstream of PDGFRa signaling to, in part, regulate PDGFRa trafficking to the endosome. The findings are novel and shed light on the mechanisms of PDGF signaling and will be broadly of interest. This lab previously identified the importance of PDGF signaling on alternative splicing. The combination of RNA-seq and eCLIP is an exceptional way to comprehensively analyze this effect. The results will be of great utility to those studying PDGF signaling or neural crest biology. There are some concerns that should be considered, however.

      We thank the Reviewer for these supportive comments.

      (1) It took some time to make sense of the number of DE genes across the results section and Figure 1. The authors give the total number of DE genes across Srsf3 control and loss conditions as 1,629 with 1,042 of them overlapping across Pdgf treatment. If the authors would add verbiage to the point that this leaves 1,108 unique genes in the dataset, then the numbers in Figure 1D would instantly make sense. The same applies to PDGF in Figure 1F and the Venn diagrams in Figure 2. 

      We have edited the relevant sentence for Figure 1D as follows: “There was extensive overlap (521 out of 1,108; 47.0%) of Srsf3-dependent DE genes across ligand treatment conditions, resulting in a total of 1,108 unique genes within both datasets (Fig. 1C,D; Fig. S1A).” Similarly, we edited the relevant sentence for Figure 1F as follows: “There was limited overlap (4 out of 47; 8.51%) of PDGF-AA-dependent DE genes across Srsf3 conditions, resulting in a total of 47 unique genes within both datasets (Fig. 1E,F; Fig. S1B).” We edited the relevant sentence for Figure 2B as follows: “There was limited overlap (203 out of 1,705; 11.9%) of Srsf3-dependent alternatively-spliced transcripts across ligand treatment conditions, resulting in a total of 1,705 unique events within both datasets (Fig. 2A,B).” Finally, we edited the relevant sentence for Figure 2D as follows: “There was negligible overlap (9 out of 622; 1.45%) of PDGF-AA-dependent alternatively-spliced transcripts across Srsf3 conditions, resulting in a total of 622 unique events within both datasets (Fig. 2C,D).”
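      The bookkeeping behind these edited sentences is inclusion-exclusion: a gene found in both lists is counted once in the unique total. The sketch below is only an illustration; the function names are hypothetical, and the 1,629 figure is the combined list length quoted in the Reviewer's comment.

```python
def unique_from_total(total_listed, n_overlap):
    """Unique items across two lists, given their combined length and overlap.

    total_listed is len(A) + len(B); each shared item appears in both lists,
    so it is subtracted once: |A or B| = |A| + |B| - |A and B|.
    """
    return total_listed - n_overlap


def overlap_percent(n_overlap, n_unique):
    """Overlap expressed as a percentage of the unique total."""
    return 100.0 * n_overlap / n_unique


# Srsf3-dependent DE genes (Figure 1D): 1,629 listed, 521 shared.
n_unique = unique_from_total(1629, 521)
print(n_unique, f"{overlap_percent(521, n_unique):.1f}%")

# The other three comparisons quoted in the response.
for shared, unique in [(4, 47), (203, 1705), (9, 622)]:
    print(shared, unique, f"{overlap_percent(shared, unique):.2f}%")
```

      The Reviewer's figure of 1,042 is twice 521, that is, the shared genes counted once per list.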

      (2) The percentage of skipped exons in the +DPSI on the righthand side of Figure 2F is not readable.  

      We have moved the label for the percentage of skipped exon events with a +DPSI for the -PDGF-AA vs +PDGF-AA (scramble) alternatively-spliced transcripts in Figure 2E so that it is legible.

      (3) It would be useful to have more information regarding the motif enrichment in Figure 3. What is the extent of enrichment? The authors should also provide a more complete list of enriched motifs, perhaps as a supplement. 

      We have added P values beneath the motifs in Figure 3F and 3G. Further, we have added a new Supplementary Figure, Figure S5, that lists the occurrence of the top 10 most enriched motifs in the unstimulated and, separately, stimulated samples in the eCLIP dataset and in a control dataset, as well as their P values.

      (4) It is unclear what subset of transcripts represent the "overlapping datasets" on lines 280-315. The authors state that there are 149 unique overlapping transcripts, but the Venn diagram shows 270. Also, it seems that the most interesting transcripts are the 233 that show alternative splicing and are bound by Srsf3. Would the results shown in Figure 5 change if the authors focused on these transcripts? 

      The Reviewer is correct that 233 of the alternatively-spliced transcripts had an Srsf3 eCLIP peak, as indicated in Figure 5A. However, several of these eCLIP peaks were a large distance from an alternatively-spliced element in the rMATS datasets, indicating that Srsf3 binding may not be contributing to the splicing outcomes in these cases. Instead, we correlated the eCLIP peaks with AS events by identifying transcripts in which Srsf3 bound within an alternatively-spliced exon or within 250 bp of the neighboring introns. We have added additional text clarifying this point in the Results: “We next sought to identify high-confidence transcripts for which Srsf3 binding had an increased likelihood of contributing to AS. Previous studies revealed enrichment of functional RBP motifs near alternatively-spliced exons (Yee et al., 2019). As such, we correlated the eCLIP peaks with AS events across all four treatment comparisons by identifying transcripts in which Srsf3 bound within an alternatively-spliced exon or within 250 bp of the neighboring introns (Tables S12-S15).” Further, we have relabeled Figure 5B as “High-confidence, overlapping datasets biological process GO terms”.
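      The filtering criterion described above amounts to an interval-overlap test. The sketch below is a minimal illustration, not the authors' pipeline. It assumes half-open coordinates on a single strand, reads "within 250 bp of the neighboring introns" as the 250 bp of intron flanking each side of the exon, and uses hypothetical function names and toy data.

```python
def peak_near_exon(peak_start, peak_end, exon_start, exon_end, flank=250):
    """True if the peak overlaps the exon or its flanking intronic windows.

    Coordinates are half-open [start, end). The exon is widened by `flank`
    bases on each side, then tested for overlap with the peak.
    """
    window_start = exon_start - flank
    window_end = exon_end + flank
    return peak_start < window_end and peak_end > window_start


def high_confidence_transcripts(peaks, exons, flank=250):
    """Transcript IDs with at least one peak near an alternatively-spliced exon.

    peaks and exons are iterables of (transcript_id, start, end) tuples.
    """
    hits = set()
    for exon_tx, exon_start, exon_end in exons:
        for peak_tx, peak_start, peak_end in peaks:
            if peak_tx == exon_tx and peak_near_exon(
                peak_start, peak_end, exon_start, exon_end, flank
            ):
                hits.add(exon_tx)
    return hits


# Toy data: one peak 150 bp upstream of an exon, one far away.
toy_exons = [("txA", 1200, 1300), ("txB", 5000, 5100)]
toy_peaks = [("txA", 1000, 1050), ("txB", 100, 200)]
print(high_confidence_transcripts(toy_peaks, toy_exons))
```

      A real analysis would also match chromosome and strand and would typically use an interval-tree or sorted-sweep approach rather than the nested loop shown here.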

      (5) In general, there is little validation of the sequencing results, performing qPCR on Arhgap12 and Cep55. The authors should additionally validate the PI3K pathway members that they analyze. Related, is Becn1 expression downregulated in the absence of Srsf3, as would be predicted if it is undergoing NMD? 

      We have added two new figure panels, Figure 5F-5G, assessing Wdr81 AS and Wdr81 protein sizes, as this gene has previously been implicated in craniofacial development. We have added the following text to the Results section: “Finally, as Wdr81 protein levels are predicted to regulate RTK trafficking between early and late endosomes, we confirmed the differential AS of Wdr81 transcripts between unstimulated scramble cells and scramble cells treated with PDGF-AA ligand for 1 hour by qPCR using primers within constitutively-expressed exons flanking alternatively-spliced exon 9. This analysis revealed a decreased PSI for Wdr81 in each of three biological replicates upon PDGF-AA ligand treatment (Fig. 5F). Relatedly, we assessed the ratio of larger isoforms of Wdr81 protein (containing the WD3 domain) to smaller isoforms (missing the WD3 domain) via western blotting. Consistent with our RNA-seq and qPCR results, PDGF-AA stimulation for 24 hours in the presence of Srsf3 led to an increase in smaller Wdr81 protein isoforms (Fig. 5G).”

      (6) What is the alternative splicing event for Acap3?  

      We have added the following text to the Results section and updated Figure 5E with Acap3 eCLIP peak visualization and the predicted alternative splicing outcome: “Finally, Acap3 is a GTPase-activating protein (GAP) for the small GTPase Arf6, converting Arf6 to an inactive, GDP-bound state (Miura et al., 2016). Arf6 localizes to the plasma membrane and endosomes, and has been shown to regulate endocytic membrane trafficking by increasing PI(4,5)P2 levels at the cell periphery (D’Souza-Schorey and Chavrier, 2006). Further, constitutive activation of Arf6 leads to upregulation of the gene encoding the p85 regulatory subunit of PI3K and increased activity of both PI3K and AKT (Yoo et al., 2019)… Srsf3 binding was additionally increased in Acap3 exon 19 upon PDGF-AA stimulation, at an enriched motif within the high-confidence, overlapping datasets, and we observed a corresponding increase in excision of adjacent intron 19 (Fig. 5D,E). As Acap3 intron 19 contains a PTC, this event is predicted to result in more transcripts encoding full-length protein (Fig. 5E).”

      (7) The insets in Figure 6 C"-H" are useful but difficult to see due to their small size. Perhaps these could be made as their own figure panels. 

      We have increased the size of the previous insets in new Figure 6 panels C’’’-H’’’.

      (8) In Figure 6A, it is not clear which groups have statistically significant differences. A clearer visualization system should be used. 

      We have added bracket shapes to Figure 6A indicating the statistically significant differences between scramble 0 minutes and scramble 60 minutes, and between scramble 60 minutes and shSrsf3 60 minutes.

      (9) Similarly in Figure 6B, is 15 vs 60 minutes in the shSrsf3 group the only significant difference? Is there a difference between scramble and shSrsf3 at 15 minutes? Is there a difference between 0 and 15 minutes for either group? 

      We have added a bracket shape to Figure 6B indicating the statistically significant difference between shSrsf3 at 15 minutes and shSrsf3 at 60 minutes. No other pairwise comparisons between treatments or timepoints were statistically significantly different.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript builds upon the work of a previous study published by the group (Dennison, 2021) to further elucidate the coregulatory axis of Srsf3 and PDGFRa on craniofacial development. The authors in this study investigated the molecular mechanisms by which PDGFRa signaling activates the RNA-binding protein Srsf3 to regulate alternative splicing (AS) and gene expression (GE) necessary for craniofacial development. PDGFRa signaling-mediated Srsf3 phosphorylation drives its translocation into the nucleus and affects binding affinity to different proteins and RNA, but the exact molecular mechanisms were not known. The authors performed RNA sequencing on immortalized mouse embryonic mesenchyme (MEPM) cells treated with shRNA targeting 3' UTR of Srsf3 or scramble shRNA (to probe AS and DE events that are Srsf3 dependent) and with and without PDGF-AA ligand treatment (to probe AS and DE events that are PDGFRa signaling dependent). They found that PDGFRa signaling has more effect on AS than on DE. A matching eCLIP-seq experiment was performed to investigate how Srsf3 binding sites change with and without PDGFRa signaling. 

      Strengths: 

      (1) The work builds well upon the previous data and the authors employ a variety of appropriate techniques to answer their research questions. 

      (2) The authors show that Srsf3 binding pattern within the transcript as well as binding motifs change significantly upon PDGFRa signaling, providing a mechanistic explanation for the significant changes in AS. 

      (3) By combining RNA-seq and eCLIP datasets together, the authors identified a list of genes that are directly bound by Srsf3 and undergo changes in GE and/or AS. Two examples are Becn1 and Wdr81, which are involved in early endosomal trafficking.

      We thank the Reviewer for these supportive comments.

      Weaknesses: 

      (1) The authors identify two genes whose AS are directly regulated by Srsf3 and involved in endosomal trafficking; however, they do not validate the differential AS results and whether changes in these genes can affect endosomal trafficking. In Figure 6, they show that PDGFRa signaling is involved in endosome size and Rab5 colocalization, but do not show how Srsf3 and the two genes are involved. 

      We have added two new figure panels, Figure 5F-5G, assessing Wdr81 AS and Wdr81 protein sizes, as this gene has previously been implicated in craniofacial development. We have added the following text to the Results section: “Finally, as Wdr81 protein levels are predicted to regulate RTK trafficking between early and late endosomes, we confirmed the differential AS of Wdr81 transcripts between unstimulated scramble cells and scramble cells treated with PDGF-AA ligand for 1 hour by qPCR using primers within constitutively-expressed exons flanking alternatively-spliced exon 9. This analysis revealed a decreased PSI for Wdr81 in each of three biological replicates upon PDGF-AA ligand treatment (Fig. 5F). Relatedly, we assessed the ratio of larger isoforms of Wdr81 protein (containing the WD3 domain) to smaller isoforms (missing the WD3 domain) via western blotting. Consistent with our RNA-seq and qPCR results, PDGF-AA stimulation for 24 hours in the presence of Srsf3 led to an increase in smaller Wdr81 protein isoforms (Fig. 5G).” The experiments in Figure 6 compare early endosome size, PDGFRa localization in early endosomes and phospho-Akt levels in response to PDGF-AA stimulation in scramble versus shSrsf3 cells, demonstrating that Srsf3-mediated PDGFRa signaling leads to enlarged early endosomes, retention of PDGFRa in early endosomes and increased downstream phospho-Akt signaling. Though we agree with the Reviewer that functionally linking the AS events to the endosomal phenotype would strengthen our conclusions, these are technically challenging experiments for several reasons. First, this approach has typically relied on tiling oligos against a region of interest to find the optimal sequence. We identified several transcripts that are bound by Srsf3 and undergo alternative splicing upon PDGFRa signaling to potentially contribute to the regulation of PI3K signaling and early endosomal trafficking. We do not expect that these effects are mediated by a single transcript but may instead be mediated by a combination of alternative splicing changes. As such, these experiments would require us to identify and validate multiple splice-switching antisense oligonucleotides (ASOs). Second, ASOs designed against a specific target may not lead to alternative splicing of that target, even in cases of high predicted binding affinities (Scharner et al., 2020, Nucleic Acids Res 48(2), 802-816). Third, ASOs have been shown to result in off-target mis-splicing effects, which are hard to predict (Scharner et al., 2020, Nucleic Acids Res 48(2), 802-816). The design of functional ASOs is thus a long-standing challenge in the field, and likely beyond the scope of this manuscript. We have added the following text to the Discussion to highlight this potential future direction: “In the future, it will be worthwhile to attempt to functionally link the AS of transcripts such as Becn1, Wdr81 and/or Acap3 to the endosomal trafficking changes observed above using splice-switching antisense oligonucleotides (ASOs).”

      (2) The proposed model does not account for other proteins mediating the activation of Srsf3 after Akt phosphorylation. How do we know this is a direct effect (and not a secondary or tertiary effect)? 

      This point is introduced in the Discussion: “Whether phosphorylation of Srsf3 directly influences its binding to target RNAs or acts to modulate Srsf3 protein-protein interactions which then contribute to differential RNA binding remains to be determined, though findings from Schmok et al., 2024 may argue for the latter mechanism. Studies identifying proteins that differentially interact with Srsf3 in response to PDGF-AA ligand stimulation are ongoing and will shed light on these mechanisms…. Again, this shift could be due to loss of RNA binding owing to electrostatic repulsion and/or changes in ribonucleoprotein composition and will be the subject of future studies.” We have added a potential change in Srsf3 protein-protein interactions upon Akt phosphorylation in the model in Figure 6J.

      Reviewer #2 (Recommendations For The Authors): 

      Suggestions: 

      (1) It would strengthen the paper and improve the connection with the other sections of the paper if the authors show: 

      a)  validation of PDGFRa signaling leading to AS of Becn1 and Wdr81 and corresponding changes in protein, and  

      We have added two new figure panels, Figure 5F-5G, assessing Wdr81 AS and Wdr81 protein sizes, as this gene has previously been implicated in craniofacial development. We have added the following text to the Results section: “Finally, as Wdr81 protein levels are predicted to regulate RTK trafficking between early and late endosomes, we confirmed the differential AS of Wdr81 transcripts between unstimulated scramble cells and scramble cells treated with PDGF-AA ligand for 1 hour by qPCR using primers within constitutively-expressed exons flanking alternatively-spliced exon 9. This analysis revealed a decreased PSI for Wdr81 in each of three biological replicates upon PDGF-AA ligand treatment (Fig. 5F). Relatedly, we assessed the ratio of larger isoforms of Wdr81 protein (containing the WD3 domain) to smaller isoforms (missing the WD3 domain) via western blotting. Consistent with our RNA-seq and qPCR results, PDGF-AA stimulation for 24 hours in the presence of Srsf3 led to an increase in smaller Wdr81 protein isoforms (Fig. 5G).”

      b)  functionally link the AS event(s) to endosomal phenotype using ASOs, etc. 

      Though we agree with the Reviewer that such results would strengthen our conclusions, these are technically challenging experiments for several reasons. First, this approach has typically relied on tiling oligos against a region of interest to find the optimal sequence. We identified several transcripts that are bound by Srsf3 and undergo alternative splicing upon PDGFRa signaling to potentially contribute to the regulation of PI3K signaling and early endosomal trafficking. We do not expect that these effects are mediated by a single transcript but may instead be mediated by a combination of alternative splicing changes. As such, these experiments would require us to identify and validate multiple splice-switching antisense oligonucleotides (ASOs). Second, ASOs designed against a specific target may not lead to alternative splicing of that target, even in cases of high predicted binding affinities (Scharner et al., 2020, Nucleic Acids Res 48(2), 802-816). Third, ASOs have been shown to result in off-target mis-splicing effects, which are hard to predict (Scharner et al., 2020, Nucleic Acids Res 48(2), 802-816). The design of functional ASOs is thus a long-standing challenge in the field, and likely beyond the scope of this manuscript. We have added the following text to the Discussion to highlight this potential future direction: “In the future, it will be worthwhile to attempt to functionally link the AS of transcripts such as Becn1, Wdr81 and/or Acap3 to the endosomal trafficking changes observed above using splice-switching antisense oligonucleotides (ASOs).”

      (2) The Venn diagram in Figure 5A and the description of the analysis the authors did to combine the RNA-seq and eCLIP-seq data are a little confusing. The authors say that they correlated eCLIP peaks with GE or AS events across all four treatment comparisons. The purpose of looking at both datasets was to find genes that are directly bound by Srsf3 and also have significantly affected GE and/or AS. Therefore, the data with and without PDGF-AA should be considered separately. For example, eCLIP peaks in the -PDGF-AA condition can be correlated to Srsf3-dependent AS differences (comparing shSrsf3 and scramble) in the -PDGF-AA condition, and eCLIP peaks in the +PDGF-AA condition can be correlated to Srsf3-dependent AS differences in the +PDGF-AA condition. In the Venn diagram and the description, it seems like all comparisons were combined and it is not clear how the data were analyzed.

      As indicated in Figure 5A, 233 of the alternatively-spliced transcripts uniquely found in one of the four treatment comparisons had an Srsf3 eCLIP peak. However, several of these eCLIP peaks were a large distance from an alternatively-spliced element in the rMATS datasets, indicating that Srsf3 binding may not be contributing to the splicing outcomes in these cases. Instead, we correlated the eCLIP peaks with AS events by identifying transcripts in which Srsf3 bound within an alternatively-spliced exon or within 250 bp of the neighboring introns. We have added additional text clarifying this point in the Results: “We next sought to identify high-confidence transcripts for which Srsf3 binding had an increased likelihood of contributing to AS. Previous studies revealed enrichment of functional RBP motifs near alternatively-spliced exons (Yee et al., 2019). As such, we correlated the eCLIP peaks with AS events across all four treatment comparisons by identifying transcripts in which Srsf3 bound within an alternatively-spliced exon or within 250 bp of the neighboring introns (Tables S12-S15).” Further, we have relabeled Figure 5B as “High-confidence, overlapping datasets biological process GO terms”. We respectfully disagree with the Reviewer’s suggested comparisons. A comparison of the -PDGF-AA eCLIP data with the scramble vs shSrsf3 (-PDGF-AA) data from the list of high-confidence transcripts resulted in only 7 transcripts. Similarly, a comparison of the +PDGF-AA eCLIP data with the scramble vs shSrsf3 (+PDGF-AA) data from the list of high-confidence transcripts resulted in only 14 transcripts. Separate gene ontology analyses of these lists of 7 and 14 transcripts revealed 21 and 40 significant terms for biological process, respectively, the majority of which encompassed one, and never more than two, transcripts. Had we separately examined the -PDGF-AA and +PDGF-AA data, we would not have detected the changes in Becn1, Wdr81 and Acap3 in Figure 5E.

    1. Author response:

      We appreciate the reviewer’s recognition of the strengths of our work as well as their constructive critiques and insightful suggestions for improvement. In this provisional response, we outline how we plan to address the reviewer’s comments in the revised manuscript. 

      (1) Viscosity and surface tension are not accurately measured. 

      We thank the reviewers for bringing up this important point. We are aware that FRAP is not the best method to accurately measure condensate viscoelasticity due to the problems the reviewers and others in the field have pointed out. More accurate methods of measuring fluorescent protein mobility, such as single-molecule tracking or fluorescence correlation spectroscopy, can be used; however, they cannot accurately reflect the time scale dependence of viscoelasticity in the condensate either. Other methods such as rheology and micropipette aspiration that have been used to measure condensate viscoelasticity in vitro are not accessible in living cells yet. Similarly, there is no readily available method to directly measure the surface tension of condensates in live cells. Therefore, we used FRAP and fusion assays to estimate the ratio of surface tension between the two condensates. This ratio was then used to determine the surface tension of the coiled coil condensates in the model after estimating the surface tension for disordered condensate from in vitro measurements (https://doi.org/10.1016/j.bpr.2021.100011). In the revision, we will adjust our FRAP fitting and use condensates with similar sizes to make our FRAP data more accurate. However, based on the large difference we observed for these two condensates, we do not believe these FRAP improvements would change the conclusions. 

      We are also aware that the Stokes-Einstein relation strictly applies to purely viscous systems. One can apply the generalized Stokes-Einstein relation, which links the diffusion coefficient to the complex viscoelastic modulus of the medium. However, the complex modulus is difficult to determine in cells through live imaging. We thus used the Stokes-Einstein relation to estimate the ratio of effective viscosities, assuming elastic deformations relax faster. In the revision, we will add these assumptions to our discussion.
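      For reference, the Stokes-Einstein relation D = k_B T / (6πηr) implies that, for the same probe radius and temperature, the ratio of effective viscosities is the inverse ratio of diffusion coefficients. The sketch below illustrates that relation only. The function names and numerical inputs are hypothetical, and the step from FRAP recovery curves to diffusion coefficients is not shown.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def stokes_einstein_viscosity(diffusion_m2_s, radius_m, temperature_k=298.15):
    """Effective viscosity (Pa*s) from a diffusion coefficient and probe radius."""
    return K_B * temperature_k / (6.0 * math.pi * diffusion_m2_s * radius_m)


def viscosity_ratio(diffusion_a, diffusion_b):
    """eta_a / eta_b for the same probe and temperature: equals D_b / D_a."""
    return diffusion_b / diffusion_a


# Hypothetical diffusion coefficients (m^2/s) for two condensates.
d_condensate_a = 1.0e-13
d_condensate_b = 1.0e-12
print(viscosity_ratio(d_condensate_a, d_condensate_b))
```

      Because only the ratio is used, the probe radius and temperature cancel, which is why the estimate does not require knowing either value precisely.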

      (2) Justification of a Neo-Hookean elasticity model for chromatin. 

      We thank the reviewer for highlighting this important aspect of our work. The observation that the strains R/ξ in our initial model are of the order of 100 is valid and raises questions about the applicability of the Neo-Hookean model. While it is true that at such high strains, the pressure becomes nearly constant (5E/6), our model remains applicable within the range of strains relevant to chromatin, particularly for small droplets where R/ξ values are more moderate. This is explicitly considered in the section “Effect of mechanical heterogeneity on condensate nucleation and growth,” where we also account for heterogeneous mesh sizes correlated with local stiffness. While these points are discussed in the supplementary material, we acknowledge that these details are not clearly presented in the main text, and we will revise the manuscript to explicitly discuss the strain regime and model applicability.
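      The 5E/6 plateau mentioned above is the large-stretch limit of the classic cavity-expansion result for an incompressible neo-Hookean solid, P = E(5/6 - 2/(3λ) - 1/(6λ^4)). The sketch below evaluates that textbook expression. Identifying the stretch λ with R/ξ follows the wording of the response and is an assumption here, as is the function name.

```python
def neo_hookean_pressure(stretch, youngs_modulus=1.0):
    """Elastic pressure on a spherical cavity in an incompressible neo-Hookean solid.

    stretch is the ratio of current to initial cavity radius (>= 1).
    The result is in the same units as youngs_modulus.
    """
    if stretch < 1.0:
        raise ValueError("stretch must be >= 1")
    return youngs_modulus * (
        5.0 / 6.0 - 2.0 / (3.0 * stretch) - 1.0 / (6.0 * stretch**4)
    )


# Pressure in units of E for moderate and large stretches.
for stretch in (1.0, 2.0, 10.0, 100.0):
    print(stretch, round(neo_hookean_pressure(stretch), 4))
```

      The pressure rises from zero at λ = 1 and approaches 5E/6 from below, so at stretches of order 100 it is nearly independent of droplet size, while at moderate stretches it still depends on size.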

      We agree that varying both the stiffness E and mesh size ξ would provide a more comprehensive understanding of the system, as both parameters are likely affected by experimental perturbations. We will revisit our analysis to incorporate variations in ξ alongside E and discuss the potential effects on our results.

      Furthermore, the stabilization of condensate size by chromatin elasticity arises from the size-dependent pressure exerted by the elastic network, which is a feature of strain-stiffening elastic media rather than a specific property of the Neo-Hookean model. However, we agree that exploring the robustness of our results under alternative elasticity models would strengthen the manuscript. In the revised version, we will analyze additional elasticity models, including strain stiffening and thinning, to evaluate how these might influence our conclusions and to provide a broader context for the predicted growth phases.

      The connection between the nucleation barrier and the cavitation barrier is particularly intriguing. The referenced study (https://doi.org/10.1073/pnas.2102014118) highlights non-linear elastic effects, including breakage and cavitation, which may be relevant in our system. We will explore whether cavitation effects due to elastic confinement play a role in the nucleation dynamics observed here and include a discussion of these mechanisms in the revised manuscript.

      (3) Unclear description of nucleation in the model. 

      We thank the reviewer for pointing out the lack of clarity in our description of nucleation. R_0 represents the critical radius for nucleation, beyond which droplets grow spontaneously. The nucleation probability p_nuc is evaluated at R_0, which depends on the free energy barrier ΔG, supersaturation S, and the elastic properties of the surrounding medium. We will include a clearer explanation of R_0, its dependence on parameters, and its role in nucleation in the revised manuscript.

      We ensure that the stiffness is sampled from a truncated normal distribution, preventing negative stiffness values. Sampling is performed at fixed intervals, and we will clarify the protocol to avoid bias and ensure consistency in the simulations.
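      One simple way to draw stiffness values from a truncated normal distribution is rejection sampling: draw from the untruncated normal and discard values at or below the lower bound. The sketch below is a generic illustration, not the authors' protocol. The function name, the mean and standard deviation, and the lower bound of zero are all hypothetical.

```python
import random


def sample_truncated_normal(mean, std, lower=0.0, rng=random, max_tries=10000):
    """Draw one value from a normal distribution truncated below at `lower`.

    Uses rejection sampling, which is efficient when `lower` is not far
    above the mean. rng may be the random module or a random.Random instance.
    """
    for _ in range(max_tries):
        value = rng.gauss(mean, std)
        if value > lower:
            return value
    raise RuntimeError("no sample accepted; lower bound may be too far above the mean")


# Hypothetical stiffness field: mean 1.0, standard deviation 0.8 (arbitrary units).
rng = random.Random(0)
stiffness = [sample_truncated_normal(1.0, 0.8, rng=rng) for _ in range(1000)]
print(min(stiffness) > 0.0)
```

      Libraries such as SciPy provide a dedicated truncated normal distribution (`scipy.stats.truncnorm`), which avoids the rejection loop when the truncation removes most of the probability mass.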

      Supersaturation S will be defined in terms of solute and solvent concentrations, and we will discuss its influence on ΔG and R_0.

      The dependence of the elastic pressure P_E on R_0, with stiffer surroundings leading to smaller nucleated droplets, will be explicitly clarified. We also agree that Figure S4A may be misleading, as it suggests spatial correlations in stiffness. We will revise the figure and caption to better represent the model assumptions.

      (4) Limited data for the elastic ripening claim.

      We acknowledge the reviewer’s concern regarding the limitation of support for the claim in the current manuscript. We believe our data do indicate elastic ripening. Particularly, the data points very close to zero are not necessarily artifacts of the fitting, as the elastic ripening can be very slow due to small differences in the local stiffness values around the droplets. We have mentioned this at the end of the section “Condensate material properties and chromatin heterogeneity determine the modes of ripening”. We shall revisit these results and remedy this concern with more data and analysis in the revised manuscript. 

      (5) Confusion for dynamic regimes such as "fusion", "ripening", and "diffusion-based" and the problem with using “ripening time” to compare ripening speed.

      We will clarify our definitions of the dynamic regimes and use them consistently. The ripening time was defined as the time a droplet takes to shrink per unit length. This removes the size dependence of the absolute ripening time, so the quantity can be used to compare the speed of ripening between two condensates. This is not well explained in our current version. In the revision, we will redefine the normalized ripening time to avoid this confusion.
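
      One way to read the normalization above is sketched below. It is an assumption-laden illustration rather than the definition that will appear in the revision: it takes the normalized ripening time to be the elapsed time divided by the decrease in droplet radius over that interval, and the function name and input format are hypothetical.

```python
def normalized_ripening_time(times, radii):
    """Time taken to shrink per unit length, from a droplet's radius trace.

    Assumes `times` and `radii` are equal-length sequences for one shrinking
    droplet, ordered in time. The result has units of time per length, so two
    droplets of different initial size can be compared directly.
    """
    if len(times) != len(radii) or len(times) < 2:
        raise ValueError("need at least two matching time and radius samples")
    shrinkage = radii[0] - radii[-1]
    if shrinkage <= 0:
        raise ValueError("droplet did not shrink over the given interval")
    return (times[-1] - times[0]) / shrinkage


# A droplet that loses 1.0 length unit over 20 time units.
tau = normalized_ripening_time([0, 10, 20], [2.0, 1.5, 1.0])
```

      Under this reading, a larger value means slower ripening, independent of how large the droplet was to begin with.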

      (6) Chromatin should be excluded from the condensates 

      We have data to support that chromatin is excluded from the condensates. We will add the data in the revision. 

      (7) Effect of protein production on the diffusive growth process.

      Based on the experiments, we do not believe that protein production is a significant source of the diffusive growth, because coiled-coil condensates nucleated with Hotag3 showed little diffusive growth. In the model as well, condensates can grow for hours in the absence of protein production, depending on chromatin stiffness and surface tension. We aim to address the effect of protein production on growth in the revised manuscript.

    1. Author response:

      We thank the anonymous reviewers very much for dedicating their time to thoroughly review our manuscript. We sincerely appreciate their thoughtful consideration and detailed assessment. Regarding the concerns raised, we acknowledge the importance of exploring the full scope of class IIb microcins; however, we believe that in-depth characterization, purification, and in vivo application of the 12 novel compounds go beyond the scope of this short report and discovery article.

      At the same time, the reviewers acknowledge that the analysis, experimental design, the expression system as well as the performed assays are “sound”, “convincing”, and “corroborated by suitable controls”. In the present manuscript we sought to identify novel antimicrobials and to comprehensively verify their antimicrobial activity in E. coli irrespective of the siderophore-dependent delivery mechanism. Notably, none of the reviewers questioned that we describe new antimicrobials, the characteristics we used to find them, that they are class IIb microcins, or that they do exhibit antimicrobial activity against Gram-negative ESKAPE and plant pathogens.

      We believe that our discovery study can serve as a stepping stone towards the application of bacterially produced antimicrobial compounds to target Gram-negative pathogens in numerous plant and animal species, including humans.

    1. Author response:

      Our response to Reviewer #1:

      We appreciate the reviewer’s comments clarifying the strengths and weaknesses of our work. Whether the effect of GM-CSF/IL-3 on the bowel is pro-inflammatory or anti-inflammatory has been controversial. In the present study, we have shown that CD131 mediated a pro-inflammatory effect of GM-CSF on the intestine, which may have worked in synergy with tissue-infiltrating macrophages. Although its downstream signaling has been investigated repeatedly, we did not examine it here. Using macrophage-specific CD131-deficient animals is important to clarify the effects of macrophage-specific CD131 on bowel inflammation. Our present work is indeed incomplete, and we anticipate working on it further in future research. Concerning the results in human subjects, the results from the animal experiments were indeed not completely reproduced. We believe that CD131 does have an effect on ulcerative colitis; however, due to the use of biological agents (e.g. anti-TNFs), the need for surgery in the treatment of ulcerative colitis has dramatically decreased, and we could not obtain enough samples for a more convincing statistical analysis. The twenty-nine patients included in the present study were all of those who received surgical intervention at our center during the past decade, and more human subjects will be needed in future research, possibly from a multi-center study.

      Our response to Reviewer #2:

      We greatly appreciate the reviewer’s valuable comments and suggestions. We realized that the number of animals per group was not indicated in each figure; to document the experimental rigor, we have deposited the data used to generate the results of the present study in Dryad. Concerning the heterozygous CD131 knock-out animals, we are aware that others have used homozygous mice in their studies; however, we observed premature deaths in those animals and could not obtain a single homozygous mouse. We cannot give the exact reason, but we did observe robust phenotypes in the heterozygous mice. We do realize that our present work is incomplete, and more experiments need to be done to establish a causal relationship between CD131 and its downstream effects. We anticipate using macrophage-specific homozygous CD131-deficient mice in our future research, which we believe will produce more meaningful and convincing results.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the Authors:

      Reviewer #2:

      (1) In my previous review, I noted that using three different movies to conclude that different genres evoke different thought patterns is an overinterpretation with only one instance per genre. In the rebuttal letter, the authors state that they provide "evidence that is necessary but not sufficient to conclude that we can distinguish different genres of films" (page 15). Accordingly, I suggest refraining from statements such as "There was a significant main effect of movie genre on memory" (page 13) in the manuscript.

      Thank you for this point. We have removed any reference to genre.

      Page 18 (referring to page 13) [354-355] “First, there was a significant main effect of movie on memory, F(2, 254.12) = 49.33, p <.001, η2 = .28.”

      Reviewer #3:

      The revised manuscript is easier to read and better contextualized.

      Thank you for this comment and for your feedback, which allowed us to make the manuscript clearer.

      Public Reviews:

      Reviewer #1:

      The lack of direct interrogation of individual differences/reliability of the mDES scores warrants some pause.

      Our study's goal was to understand how group-level patterns of thought in one group of participants relate to brain activity in a different group of participants. To this end, we decomposed trial-level mDES data to show dimensions that are common across individuals, which demonstrated excellent split-half reliability. Then we used these data in two complementary ways. First, we established that these ratings reliably distinguished between the different films (showing that our approach is sensitive to manipulations of semantic and affective features in a film) and that these group-level patterns were also able to predict patterns of brain activity in a different group of participants (suggesting that mDES dimensions are also sensitive to the way brain activity emerges during movie watching). Second, we established that variation across individuals in their mDES scores predicted their comprehension of information from films. Thus, our study establishes that when applied to movie-watching, mDES is sensitive to individual differences in the movie-watching experience (as determined by an individual's comprehension). Given the success of this study and the relative ease with which mDES can be performed, it will be possible in the future to conduct mDES studies that home in on both the general features of the movie-watching experience and aspects that are more unique to an individual.

      Reviewer #2:

      (1) The distinction between thinking and stimulus processing (in the sense of detecting and assigning meaning to features, modulated by factors such as attention) remains unclear. Is "thinking" a form of conscious access or a reportable read-out from sensory and higher-level stimulus processing? Or does it simply refer to the method used here to identify different processing states?

      Thank you for highlighting this first point, which is an important consideration when attempting to map cognitive states. We have added some additional comments to our discussion section to expand on this point.

      Page 35-36 [698-711] “It is possible, therefore, that the identification of regions of visual and auditory cortex by our study reflects the participants’ attention to sensory input, rather than the complex analysis of these inputs that may be required for certain features of the movie watching experience. On the other hand, it is possible that the movie-watching state is a qualitatively different type of mental state to those that emerge in typical task situations. For example, unlike tasks, the movie-watching state is characterized by multi-modal sensory input, semantically rich themes, that evolve together to reveal a continuous narrative to the viewer. It is possible, therefore, that movies engender an absorbed state which depends more on processing in sensory cortex than would occur in traditional task paradigms such as a working memory task (when systems in association cortex may be needed to maintain information related to task rules). Important headway into addressing this uncertainty can be achieved by using mDES to compare the types of states that occur in different contexts (including both movies and tasks) and comparing the topography of brain activity associated with different experiential states.”

      (2) The dimensions of thought appear to be directly linked to brain areas traditionally associated with core faculties of perception and cognition. For example, superior temporal cortex codes for speech information, which is also where thought reports on verbal detail localize in this study. This raises the question of whether the present study truly captures mechanisms specific to thinking and distinct from processing, especially given that individual variations in reports were not considered and movie-specific features were not controlled for.

      Thank you for this point. We have added a paragraph to the discussion to expand on this.

      Page 35 [692-698] “Finally, it is worth considering whether the patterns of brain activity identified by our analysis reflect the stimuli that are processed during movie watching, or the cognitive and affective processing of this information. On the one hand, the regions we found were often within regions of sensory cortex, areas of the brain which are often ascribed basic stimulus processing functions [1]. Moreover, according to perspectives on cognition derived from more traditional task paradigms, complex features of cognition, such as the regulation of thought, are often attributed to regions of association cortex, such as the dorsolateral prefrontal cortex [2].”

      Reviewer #3:

      This paper is framed as presenting a new paradigm but it does little to discuss what this paradigm serves, what are its limitations and how it should have been tested. The novelty appears to be in using experience sampling from 1 sample to model the responses of a second sample.

      Thank you for this comment. As you correctly identified, the novelty lies in the methodology. We have now made this clear by expanding the point beyond the methods section, so that the reader is oriented to the application and limitations of our methodological approach and paradigm.

      Page 7-8 [149-174] “One challenge that arises when attempting to map the dynamics of thought onto brain activity during movie-watching is accounting for the inherently disruptive nature of experience sampling: to measure experience with sufficient frequency to map experiential reports during movies would inherently disrupt the natural processes of the brain and alter the viewer’s experience (for example, by pausing the film at a moment of suspense). Therefore, if we periodically interrupt viewers to acquire a description of their thoughts while recording brain activity, this could impact on the ability to capture important dynamic features of the brain. On the other hand, if we measured fMRI activity continuously over movie-watching (as is usually the case), we would lack the capacity to directly relate brain signals to the corresponding experiential states. Thus, to overcome these obstacles, we developed a novel methodological approach using two independent samples of participants. In the current study, one set of 120 participants was probed with mDES five times across the three ten-minute movie clips (11 minutes total, no sampling in the first minute). We used a jittered sampling technique where probes were delivered at different intervals across the film for different people depending on the condition they were assigned. Probe orders were also counterbalanced to minimize the systematic impact of prior and later probes at any given sampling moment. We used these data to construct a precise description of the dynamics of experience for every 15 seconds of three ten-minute movie clips. These data were then combined with fMRI data from a different sample of 44 participants who had already watched these clips without experience sampling [3]. 
      By combining data from two different groups of participants, our method allows us to describe the time series of different experiential states (as defined by mDES) and relate these to the time series of brain activity in another set of participants who watched the same films with no interruptions. In this way, our study set out to explicitly understand how the patterns of thoughts that dominate different moments in a film in one group of participants relate to the brain activity at these time points in a second set of participants and, therefore, better understand the contribution of different neural systems to the movie-watching experience.”
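
      The logic of the jittered design quoted above can be illustrated with a short sketch. It is not the schedule used in the study: it assumes an 11-minute clip with no probes in the first minute, one probe slot every 15 seconds thereafter, and five probes per participant per clip, from which it follows that eight staggered conditions are needed to cover every slot. The function name and the interleaving rule are hypothetical.

```python
def jittered_schedules(clip_s=660, skip_s=60, step_s=15, probes=5):
    """Split all probe slots of one clip into staggered per-condition schedules.

    Returns a dict mapping condition index -> list of probe times in seconds.
    Each condition gets `probes` slots, and together the conditions cover
    every `step_s`-second slot after the unsampled first `skip_s` seconds.
    All default values are illustrative assumptions.
    """
    slots = list(range(skip_s + step_s, clip_s + 1, step_s))
    if len(slots) % probes != 0:
        raise ValueError("slots cannot be divided evenly among conditions")
    n_conditions = len(slots) // probes
    # Condition c takes every n_conditions-th slot, starting at offset c.
    return {c: slots[c::n_conditions] for c in range(n_conditions)}


schedules = jittered_schedules()
covered = sorted(t for times in schedules.values() for t in times)
```

      Each participant is interrupted only five times per clip, yet pooling across conditions yields a report at every 15-second slot, which is what allows a group-level time series of experience to be aligned with the uninterrupted fMRI recordings.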

      Page 33-35 [658-691] “Importantly, our study provides a novel method for answering these questions and others regarding the brain basis of experiences during films that can be applied simply and cost-effectively. As we have shown, mDES can be combined with existing brain activity, allowing information about both brain activity and experience to be determined at a relatively low cost.  For example, the cost-effective nature of our paradigm makes it an ideal way to explore the relationship between cognition and neural activity during movie-watching during different genres of film. In neuroimaging, conclusions are often made using one film in naturalistic paradigm studies [4]. Although the current study only used three movie clips, restraining our ability to form strong conclusions regarding how different patterns of thought relate to specific genres of film, in the future, it will be possible to map cognition across a more extensive set of movies and discern whether there are specific types of experience that different genres of films engage. One of the major strengths of our approach, therefore, is the ability to map thoughts across groups of participants across a wide range of movies at a relatively low cost.

      Nonetheless, this paradigm is not without limitations. This is the first study, as far as we know, that attempts to compare experiential reports in one sample of participants with brain activity in a second set of participants, and while the utility of this method enables us to understand the relationship between thought and brain activity during movies, it will be important to extend our analysis to mDES data during movie-watching while brain activity is recorded. In addition, our study is correlational in nature, and in the future, it could be useful to generate a more mechanistic understanding of how brain activity maps onto the participants’ experience. Our analysis shows that mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, we propose that in the future, researchers could derive mechanistic insights into how the semantic features may influence the mDES data. For example, it may be possible to ask participants to watch movies in a scrambled order to understand how the structure of semantic or information influences the mapping between brains and ongoing experience as measured by mDES. Finally, our study focused on mapping group-level patterns of experience onto group-level descriptions of brain activity. In the future it may be possible to adopt a “precision-mapping” approach by measuring longer periods of experience using mDES and determining how the neural correlates of experience vary across individuals who watched the same movies while brain activity was collected [5]. In the future, we anticipate that the ease with which our method can be applied to different groups of individuals and different types of media will make it possible to build a more comprehensive and culturally inclusive understanding of the links between brain activity and movie-watching experience.”

      What are the considerations for treating high-order thought patterns that occur during film viewing as stable enough to use across participants? What would be the limitations of this method? (Do all people reading this paper think comparable thoughts reading through the sections?) This is briefly discussed in the revised manuscript and generally treated as an opportunity rather than as a limitation.

      It is likely, based on our study, that films can evoke both stereotyped thought patterns (i.e. thoughts that many people will share) and others that are individualistic. It is clear that, in principle, mDES is capable of capturing empirical information on both stereotypical thoughts and idiosyncratic thoughts. For example, clear differences in experiences across films and, in particular, during specific periods within a film, show that movie-watching can evoke broadly similar thought patterns in different groups of participants (see Figure 3 right-hand panel). On the other hand, the association between comprehension and the different mDES components indicates that certain individuals respond to the same film clip in different ways and that these differences are rooted in objective information (i.e. their memory of an event in a film clip). A clear example of these more idiosyncratic features of movie watching experience can be seen in the association between “Episodic Knowledge” and comprehension. We found that “Episodic Knowledge” was generally high in the romance clip from 500 Days of Summer but was especially high for individuals who performed the best, indicating they remembered the most information. Thus, good comprehenders responded to the 500 Days of Summer clip with responses that had more evidence of “Episodic Knowledge”. In the future, since the mDES approach can account for both stereotyped and idiosyncratic features of experience, it will be an important tool in understanding the common and distinct features that movie watching experiences can have, especially given the cost-effective manner in which these studies can be run.

      In conclusion, this study tackles a highly interesting subject and does it creatively and expertly. It fails to discuss and establish the utility and appropriateness of its proposed method.

      Thank you very much for your feedback and critique. In our revision and our responses to these questions, we have provided more information about the method's robustness, utility, and application to understanding cognition. Thank you for bringing these points to our attention.

      References

      (1) Kaas, J.H. and C.E. Collins, The organization of sensory cortex. Current Opinion in Neurobiology, 2001. 11(4): p. 498-504.

      (2) Turnbull, A., et al., Left dorsolateral prefrontal cortex supports context-dependent prioritisation of off-task thought. Nature Communications, 2019. 10.

      (3) Aliko, S., et al., A naturalistic neuroimaging database for understanding the brain using ecological stimuli. Scientific Data, 2020. 7(1).

      (4) Yang, E., et al., The default network dominates neural responses to evolving movie stories. Nature Communications, 2023. 14(1): p. 4197.

      (5) Gordon, E.M., et al., Precision Functional Mapping of Individual Human Brains. Neuron, 2017. 95(4): p. 791-807.e7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper represents a huge amount of work on a condition whose patients' health and well-being have not always been prioritized, and only relatively recently has the immune dysregulation seen in patients with Down Syndrome (DS) been garnering major research interest.

      This paper provides an unparalleled examination of immune disorders in patients with DS. The authors also report the results from a clinical trial with the JAK inhibitor tofacitinib in DS patients.

      Strengths:

      This manuscript reports a herculean effort and provides an unparalleled examination of immune disorders in a large number of patients with DS.

      Weaknesses:

      Not a major weakness but, apart from finding an elevation of CD4 T central memory cells and more differentiated plasmablast, several of the alterations reported in this manuscript had already been suggested by a few case reports and a very small series. On the other hand, the number of patients (and controls) utilized for this study is remarkable and allows for drawing much firmer conclusions.

      We are grateful for the Reviewer’s very positive assessment of the work and results presented in this manuscript. We agree that many of the changes in the peripheral immune system reported here had been previously documented by our team and others using smaller sample sizes. However, as the Reviewer appreciated, this study involves an order of magnitude more research participants than previous studies (i.e., ~400 total participants, ~300 of them with trisomy 21 versus ~100 controls), which enabled us to investigate associations between immune changes and clinical variables, while also helping us draw much firmer conclusions.

      Reviewer #2 (Public Review):

      In this manuscript, Rachubinski and colleagues provide a comprehensive clinical, immunological, and autoantibody assessment of autoimmune/inflammatory manifestations of patients with Down syndrome (DS) in a large number of patients with this disorder. These analyses confirm prior results of excess interferon and cytokine signals in DS patients and extend these observations to highlight early-onset immunological aberrancies, far before symptoms occur, as well as characterizing novel autoantibody reactivities in this patient population. Then, the authors report the interim analysis of an open-label, Phase II, clinical trial of the JAK1/3 inhibitor, tofacitinib, that aims to define the safety, clinical efficacy, and immunological outcomes of DS patients who suffer from inflammatory conditions of the skin. The clinical trial analysis indicates that the treatment is tolerated without serious adverse effects and that the majority of patients have experienced clinical improvement or remission in their corresponding clinical cutaneous manifestations as well as improvement or normalization of aberrant immunological signals such as cytokines.

      The major strength of the study is the recruitment and uniform, systematic evaluation of an impressive number of DS patients. Moreover, the promising early results from the tofacitinib clinical trial pave the way for analysis of a larger number of patients within the Phase II trial and otherwise, which may lead to improved clinical outcomes for affected patients. An inherent weakness of such studies is the descriptive nature of several parameters and the relatively small size of tofacitinib-treated DS patients. However, the descriptive nature of some of the correlative research analyses is of scientific interest and is useful to generate hypotheses for future additional (including mechanistic) work, and treatment of 10 DS patients in a formal clinical trial at interim analysis is not a trivial task for a disease like this. The manuscript achieves the aims of the authors and the results support their conclusions. The authors appropriately acknowledge areas that require more research and areas that are not well understood. The results are represented in a useful manner and statistical methods and analyses appear sound.

      We appreciate the very positive evaluation by this Reviewer. We agree with the Reviewer on the descriptive nature of many of the analyses completed and on the value of a larger cohort of individuals with Down syndrome treated with a JAK inhibitor. The clinical trial will involve a total of 40 participants, and we look forward to reporting the results from the full cohort in the near future.

      Reviewer #3 (Public Review):

      Summary:

      Individuals with Down syndrome (DS) have high rates of autoimmunity and can have exaggerated immune responses to infection that can unfortunately cause significant medical complications. Prior studies from these authors and others have convincingly demonstrated that individuals with DS have immune dysregulation including increased Type I IFN activity, elevated production of inflammatory cytokines (hypercytokinemia), increased autoantibodies, and populations of dysregulated adaptive immune cells that pre-dispose to autoimmunity. Prior studies have demonstrated that using JAK inhibitors to treat patient samples in vitro, in small case series of patients, and in mouse models of DS leads to improvement of immune phenotype and/or clinical disease. This manuscript provides two major advances in our understanding of immune dysregulation and therapy for patients. First, they perform deep immune phenotyping on several hundred individuals with DS and demonstrate that immune dysregulation is present from infancy. Second, they report a promising interim analysis of a Phase II clinical trial of a JAK inhibitor in 10 people with DS and moderate to severe skin autoimmunity.

      Strengths and weaknesses:

      The relatively large cohort and careful clinical annotation here provide new insights into the immune phenotype of patients with DS. For example, it is interesting that regardless of autoimmune disease or autoantibody status, individuals with DS have elevated cytokines and CRP. Analysis of the cohorts by age demonstrated that some cytokines are significantly elevated in people with DS starting in infancy (e.g., IL-9 and IL-17C). Nearly all adults with DS in this study had autoantibodies (98%) and most had six or more autoantibodies (63%), which differed significantly from euploid study participants. This implies that all patients with DS might benefit from early intervention with therapy to reduce inflammation. However, it is also worth considering an alternative interpretation: since hypercytokinemia does not vary based on disease state in individuals with DS, it may not be a key factor driving autoimmunity (although it may be relevant for other clinical symptoms such as neuroinflammation).

      Small case series have suggested the benefit of JAK inhibitors to treat autoimmunity in DS. This is the first report of a prospective clinical trial to test a JAK inhibitor in this setting. The clinical trial entry criteria included moderate to severe autoimmune skin disease in patients aged 12-50 years with DS, and treatment was with the JAK1/3 inhibitor tofacitinib. This clinical trial is a critically important step for the field. The early results support that treatment is well tolerated with an improvement of interferon scores in patients and reduction of autoantibodies. Most patients experienced clinical improvement, with alopecia areata having the greatest response. Treatment may not affect all skin diseases equally, for example of the 5 patients with hidradenitis suppurativa, only 1 showed clinical improvement based on skin score. While very promising, the clinical trial results reported here are preliminary and based on an interim analysis of 10 patients at 16 weeks. Individuals with DS have a lifelong risk of immune dysregulation and thus it is unclear how long therapy, if of benefit, would need to be continued. The results of longer-term therapy will be informative when considering the risks/benefits of this therapy.

      We thank the Reviewer for the very positive evaluation. We agree with the Reviewer that the hypercytokinemia of Down syndrome may contribute to other pathophysiological processes beyond autoimmune conditions. Although many cytokines elevated in Down syndrome have well-demonstrated pathogenic roles in the etiology of autoimmune diseases in the general population (e.g., TNF-α, IL-6), their consistent upregulation in DS regardless of clinical evidence of autoimmune pathology indicates the existence of a prolonged pre-clinical period, where the hypercytokinemia likely precedes evident tissue damage and symptomatology. Alternatively, it is possible that these elevated cytokines are contributing to the overall pathophysiology of DS (e.g., neuroinflammation, cognitive impairments, complications from viral infections) without formal diagnosis of an autoimmune disease. We also agree with the Reviewer that not all immune skin conditions would respond equally to JAK inhibition. Based on recent approvals for JAK inhibitors in the immunodermatology field, it is expected that JAK inhibition would show the greatest benefits for alopecia areata, atopic dermatitis, and psoriasis, with less clear results for hidradenitis suppurativa. We hope to contribute to this field through the analysis of the full clinical trial cohort in the near future. Lastly, we strongly agree with the need to assess the value of long-term therapy with JAK inhibitors or other immune therapies in people with Down syndrome for various clinical endpoints.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This paper represents a huge amount of work on a condition whose patients' health and well-being have not always been prioritized, and only relatively recently has the immune dysregulation seen in patients with Down Syndrome (DS) been garnering major research interest.

      This paper provides an unparalleled examination of immune disorder in patients with DS. In a truly herculean effort, the authors provided the cumulative examination of over 440 patients with DS, confirmed the alterations in immune cell subsets (n=292, 96 controls) and multi-organ autoimmunity seen in these patients as they age, and identified autoantibody production that could contribute to conditions co-occurring in patients with DS. They also sought to look at whether the early immunosenescence seen in DS was due to the inflammatory profile by comparing age-associated markers in DS patients and euploid controls separately, finding that several markers are regulated with age regardless of group, while comparing the effect of age versus DS status on cytokine status identified inflammatory markers elevated in DS patients across the lifespan that do not increase with age or that increase with age only in the DS cohort. This is very interesting in the context of DS in particular, and immunity during aging in general.

      The second part of the manuscript presents the results from a clinical trial with the JAK inhibitor tofacitinib in DS patients. While the number of DS patients treated with tofacitinib was small, the results were often quite striking. Treatment was well-tolerated and the improvement of dermatological conditions was clear. The less responsive patients AA4 and AA2 provide a very clear illustration that these patients are sensitive to immune triggers during treatment. Additionally, the demonstration that patients' IFN scores and cytokine levels decreased without clear immunosuppression with tofacitinib treatment is encouraging, since treatment with this drug would need to be continuous. I would be curious to see if the patients added past the cutoff for interim analysis follow a similar trajectory. I would not ask the authors to add any data; the paper is well-written and logically constructed.

      I only have a small comment: I really did not like how Figure 2 a, d, and g tethered the coloring to the magnitude of fold change to show the effect of DS particularly for 2a and 2g. Given that these fold changes are quite modest, the coloring is very light and hard to distinguish. The clear takeaway is that the effect on T cells is greatest, but there must be a better way to illustrate this. Perhaps displaying this graph on a non-white background could help with contrast.

      We are grateful for the Reviewer’s very positive assessment of the manuscript and constructive feedback. We want to assure the Reviewer that similar analyses will be completed in the future for the entire cohort recruited into the trial to determine if similar trajectories and results are observed with the larger sample size. Additionally, following the Reviewer’s guidance, we have modified the color scales in Figures 2a, d and g so that each panel is on its own dynamic range, thus emphasizing the differences within each immune cell lineage.

      Reviewer #2 (Recommendations For The Authors):

      • Although the focus of the patients in the first part of the paper is on autoimmune/inflammatory conditions, it will be useful to also list the non-autoimmune infectious manifestations for reference with prevalence data. For example, otitis media, or lung infections (mentioned within the paper), or mucosal candidiasis. Same for other manifestations such as cardiac or malignant conditions. Given the impressive number of patients, it will be useful to the readers to have prevalence data for these as well, even in brief statements within the results.

      We appreciate this inquiry by the Reviewer. Following Reviewer’s guidance, we have included information on recurrent otitis media, frequent/recurrent pneumonia, congenital heart defects requiring repair, and various forms of leukemia. These additional data are presented in a revised Supplementary file 1 and briefly discussed in the results.

      • Have the authors looked at DN T cells and whether they may be enriched in DS patients, given their enrichment in some autoimmune conditions?

      Thanks for this inquiry. We did examine DN T cells (double negative T cells), which we referred to in our Figure 2 and Figure 2 – figure supplement 1 as non-CD4+ CD8+ T cells. Although this T cell subset is mildly elevated (in terms of frequency among T cells) in individuals with Down syndrome, the result did not reach statistical significance after multiple hypothesis correction. This negative result is shown in the heatmap in Figure 2 – figure supplement 1d.

      • It would be useful to move the segment of the discussion that discusses the interim predefined analysis of the phase 2 trial to the corresponding segment of the results. As this reviewer was reading the paper, it was unclear why the interim analysis was done, whether it was predefined and it was not until the discussion that it became apparent. I believe it will help the readers to have a brief mention that this interim analysis was predefined and set to occur at the first 10 DS enrollees. Also, it would be helpful to state what is the total number of DS patients planned for enrollment in the Phase 2 trial which is continuing recruitment.

      We appreciate this comment. Following the Reviewer’s guidance, we have revised the text to explain in the Results section that the interim analysis was predefined and triggered once the first 10 participants completed the 16 weeks of treatment. We also explain that the trial will be considered complete once a total of 40 participants have undergone 16 weeks of treatment.

      • Although the authors present data on TPO autoantibodies before and after tofacitinib, it remains unclear whether the other non-TPO autoantibodies were altered during treatment or whether this was a TPO autoantibody-specific phenomenon. Was there an alteration in mature B cells or plasmablast populations after tofacitinib? If these data are available, they would further enhance the manuscript. If they are not available, it would be useful for the authors to discuss those in the discussion of the manuscript.

      We are grateful for this comment, which strongly aligns with our future research interests and plans for the analysis of the full cohort once the trial is completed. In the interim analysis, we analyzed only auto-antibodies related to autoimmune thyroid disease and celiac disease, as shown in the manuscript. However, we plan to complete a more comprehensive analysis of the effects of JAK inhibition on autoantibody production once the full sample set is available at the end of the trial. Likewise, the clinical trial protocol contemplates collection and processing of blood samples for immune mapping using mass cytometry, which will enable us to answer the question from the Reviewer about potential changes in B cells or plasmablast populations. Following Reviewer’s guidance, we discuss these planned analyses in the Discussion of the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) Cellular immune phenotyping data in Figure 2 presents a large number of patients with DS versus euploid controls (292 and 96 respectively). Given the relatively large cohort there would seem to be an opportunity to determine whether age or sex alters the immune phenotype shown, for example, TEMRAs, etc. Was the data analyzed in this way?

      We welcome this comment, which clearly aligns with our research interests and planned additional analyses of these datasets generated by the Human Trisome Project. We can share with the Reviewer that although sex as a biological variable has minimal impacts on the strong immune dysregulation observed in Down syndrome, there are clear age-dependent effects, with some immune changes occurring early during childhood versus others taking place later in adult life. A manuscript describing a complete analysis of age-dependent effects on the multi-omics datasets in the Human Trisome Project is currently under preparation.

      (2) The authors should strongly consider incorporating/discussing the findings from Gansa et al, Journal of Clinical Immunology May 2024 - where they reviewed the immune phenotype of 1299 patients with Down syndrome.

      Thanks for bringing this publication to our attention; it is now cited in the revised manuscript.

      (3) It is difficult to differentiate patients Hs2 and Ps1 in Figure 5d.

      Thanks for this observation. We have modified the labels for greater clarity in the revised manuscript.

      (4) Given their finding of no correlation between cytokine levels/immune phenotype and autoimmunity, some additional discussion of the relevance of hypercytokinemia in the pathogenesis of autoimmunity would seem relevant (given that this was the basis for the clinical trial). The authors mention that cytokine levels may not be appropriate measures of disease in the patients.

      We welcome this suggestion and have revised the Discussion along these lines.

      (5) Data availability statement: appropriate.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We greatly appreciate the opportunity to submit a revision of our manuscript entitled: "The Autophagy Protein, ATG14 Safeguards Against Unscheduled Pyroptosis Activation to Enable Embryo Transport During Early Pregnancy" by Popli et al. We thank all three Referees for underscoring the importance of our findings as well as the constructive critiques that we used to improve our paper. Most notably, we added the following new data:

      · To provide more insight into whether pyroptosis activation occurs distinctly in the oviduct, we also examined expression of GSDMD (the primary executioner of the pyroptosis pathway) in the uterus and ovary. We observed no signs of pyroptosis activation in response to ATG14 loss in either the uterus or ovary of Atg14 cKO mice compared to controls, suggesting that ATG14 plays a distinct role in regulating pyroptosis specifically in the oviduct (Revised Figure 5F).

      · To better understand the molecular mechanisms of pyroptosis activation in the oviducts, we examined various key markers of mitochondrial integrity, architecture, and function in control and Atg14 cKO oviducts. Our findings indicate a significant loss of mitochondrial structural and functional integrity, possibly contributing to the embryo retention phenotype by activating the pyroptosis pathway in the oviduct (Revised Figure 5B & C).

      · To address the spatiotemporal and region-specific expression of ATG14 in the oviduct, we performed immunofluorescence analysis and observed consistent expression of ATG14 in all cellular compartments of the oviduct, including ciliary epithelial cells, secretory epithelial cells, and smooth muscle cells. Moreover, the region-specific expression analysis revealed that distinct expression of ATG14 in the ampullary region of the cKO mouse oviduct helps to preserve its structural integrity. Conversely, its loss in the isthmus region of the oviduct, in concordance with PR-cre activity, causes completely distorted epithelial structures with luminal obliteration or narrowing, resulting in an unorganized and obstructed lumen and leading to embryo retention. This suggests that ATG14 is essential for maintaining the structural integrity of the oviduct (Revised Figure 3F & S2A).

      · Considering the expression of PR-cre in the pituitary, which could potentially influence hormonal secretion and ovulation, we evaluated the levels of E2 and P4 during pregnancy. Our findings show that these hormone levels remained unchanged in Atg14 cKO mice, indicating that the absence of ATG14 does not negatively affect the HPG axis or pituitary function (Revised Figure 2F).

      · ATG14 is an essential factor for the initiation of autophagy, and its loss can lead to reduced or inhibited autophagic activity. Consistently, we observed elevated levels of LC3b and p62 proteins, two well-known markers of autophagic flux, in the oviducts of Atg14-deficient mice, implying that loss of ATG14 leads to defective autophagy, potentially disturbing the structural integrity of oviductal epithelial cells and impairing embryo transport (New Supplementary Figure S2B).

      Reviewer #1 (Public Review):

      This study by Popli et al. evaluated the function of Atg14, an autophagy protein, in reproductive function using a conditional knockout mouse model. The authors showed that female mice lacking Atg14 were infertile partly due to defective embryo transport function of the oviduct and faulty uterine receptivity and decidualization using PgrCre/+; Atg14f/f mice. The findings from this work are exciting and novel. The authors demonstrated that a loss of Atg14 led to an excessive pyroptosis in the oviductal epithelial cells that compromises cellular integrity and structure, impeding the transport function of the oviduct. In addition, the authors use both genetic and pharmacological approaches to test the hypothesis. Therefore, the findings from this study are high-impact and likely reproducible. However, there are multiple major concerns that need to be addressed to improve the quality of the work.

      Major comments:

      (1) It is interesting that deletion of Atg14 using PgrCre results in pyroptosis only in the oviduct; the authors should speculate/evaluate why the oviduct, but not the uterus or follicles. Is there any cellular specificity that is sensitive to autophagy/pyroptosis in the oviduct but not in other cell types? This has not been evaluated or discussed in the manuscript. Is it possible to include GSDMD IHC for the uterine section to ensure that there was no pyroptosis event in the cKO uteri?

      We performed GSDMD IHC and found that, unlike in the oviduct, the cKO uteri and ovaries do not exhibit detectable pyroptosis (Revised Figure 5F). Additionally, we have added text to the discussion section addressing possible reasons for the differential impact of Atg14 loss on pyroptosis along the reproductive tract continuum (Line number: 532-538).

      (2) Please include an explanation of how a loss of Atg14, important for the initiation process of autophagy (as indicated in line 88), can lead to pyroptosis. There was some discussion about inflammation. But the connection is still missing.

      We thank the reviewer for noting this. We have now included a possible explanation of how autophagy could impact pyroptosis in the discussion section (Line number: 532-538).

      (3) No expression data of ATG14 using IHC/IF analysis were included in the manuscript - this is missing. This is needed and important as the authors found that Foxj1Cre/+; Atg14f/f cKO mice had no fertility defect. Is it possible that ATG14 is not present in the ciliated epithelial cells of the oviduct? In addition, the data in Figure 5B also points to this speculation. This is because the GSDMD (the pyroptosis marker) is only observed in the isthmus region but not the ampulla.

      We thank the reviewer for this nice suggestion. We performed immunofluorescence analysis for ATG14 expression in control and Atg14 cKO oviducts and observed consistent expression of ATG14 in all cellular compartments of the oviduct, including ciliary epithelial cells, secretory epithelial cells, and smooth muscle cells (New Supplementary Figure S2A). We also examined α-tubulin expression in the oviducts of Foxj1Cre/+; Atg14 f/f mice and control mice and observed that ciliated epithelial cells positive for acetylated α-tubulin staining did not appear to differ between Foxj1Cre/+; Atg14 f/f oviducts and controls (Revised Figure 4C). However, due to the unavailability of reliable fluorescent-labeled antibodies for both Foxj1 and Atg14, we were unable to conduct the co-localization study as intended. This limitation hindered our ability to precisely determine the spatial overlap of these proteins within the tissue.

      (4) In line with the previous comment, is ATG14 present in the human Fallopian tube? If so, which cell type? This needs to be addressed.

      Author’s Response: We appreciate the reviewer's valuable suggestion. While we currently lack access to human fallopian tube biopsies, the Human Protein Atlas (https://www.proteinatlas.org/ENSG00000126775-ATG14) demonstrates distinct ATG14 expression in various fallopian tube cell types, with localization in the cytoplasm, membrane, and nucleus.

      (5) As PgrCre is also expressed in the pituitary, is it possible that the deletion of Atg14 using PgrCre would affect pituitary function – hence a change in the FSH/LH secretion that subsequently affects ovulation? Although the uterine and ovarian histology in the Atg14 cKO looks similar to the controls, is it possible that cyclicity is also affected? The authors should evaluate whether the estrous cycle takes place regularly.

      Author’s Response: Thank you for the insightful comment. However, evaluating the estrous cycle requires significant time and effort and is beyond the scope of the current manuscript. Nonetheless, we have now shown that both P4 and E2 levels were not altered in Atg14 cKO mice, indicating that the loss of Atg14 did not adversely impact the HPG axis, and by extension, pituitary function (Revised Figure 2F).

      (6) The number of total embryos/oocytes in the cKO compared to the control has not been evaluated - this data must be included. Do the changes in autophagy in Atg14 cKO affect preimplantation embryo development? Please categorize the embryos found in the oviduct/uterus in both genotypes. i.e., % blastocyst, % morula, % developmentally delayed, % non-viable etc. It would be interesting to evaluate if the oviduct with heavy pyroptosis can support preimplantation embryo development.

      Author’s Response: We thank the reviewer for this nice suggestion. We categorized the embryos into different categories as suggested and included the data (Revised Figure 3C and Figure 6D).

      (7) It is unclear why the superovulation+mating experiment (Figure 3C) was performed. Please provide justification. Why was the data from natural mating (Figure 3A) insufficient?

      Author’s Response: In Figure 3C, superovulation was employed to complement the natural mating studies and to provide stronger evidence for the embryo retention phenotype observed in the oviduct.

      (8) In lines 297-298, the conclusion that "ATG14 is required for P4-mediated but not for E2-mediated actions during uterine receptivity" is not entirely correct. This is because the authors also observed that the downregulation of MUC1 (E2-target protein) is absent in the PgrCre/+;Atg14f/f cKO female uteri.

      We thank the reviewer for noting this. We detected more E2-induced targets in D-4 pregnant uterine samples and found no change in their expression in response to Atg14 depletion in cKO females (Revised Figure 2E).

      (9) Figure 3D: Please include an image that also represents the ampulla region. All images are from the isthmus region. It would be informative to see if the loss of cell boundaries also takes place at the ampulla region in the cKO oviduct.

      We thank the reviewer for this nice suggestion. We included the ampulla section from the cKO and control female oviducts (Revised Figure 3F). As PR-cre activity is limited to isthmus only [1, 2], we did not see any structural abnormality in ampulla sections of cKO oviducts.

      (10) Figure 3E: Please indicate which region the TEM was performed. Isthmus? Ampulla? Were the changes in mitochondrial phenotype observed across all oviductal regions?

      The TEM imaging was performed by the WashU Core services. Although we clearly instructed the core staff to image the isthmus region only, we cannot be sure that they followed these instructions accurately.

      (11) Figure 4B; the evaluation of FOXJ1 IHC. The authors need to include sections that also have an ampulla region-especially in the cKO. In addition, it is misleading to state that there were fewer FOXJ1+ cells (line 361) in the cKO if the region being evaluated is the isthmus (which has a lot fewer ciliated epithelial cells in general) while the control image showed an ampulla where the abundancy of ciliated epithelial cells (FOXJ1+) is higher than that of the isthmus. The authors also need to include a higher resolution image (a zoom-in at the ciliated epithelial cells with FOXJ1+ signal) as well as the quantification of FOXJ1+ cells.

      We thank the reviewer for the suggestion. In Figure 4A, we have already shown the ampulla region from both control and cKO oviducts, wherein alpha-tubulin staining was evident in both oviducts.

      We agree with the reviewer that the isthmus usually has fewer ciliary epithelial cells than the ampulla; however, as illustrated in Figures 4A and 4B, Atg14 depletion causes a marked disruption of structural integrity with loss of cell boundaries specifically in the isthmus, which is far more pronounced than in the ampulla. One reason for this is the reported PgrCre activity, which is much more robust in the isthmus than in the ampulla [1, 2]. This disruption leads to the substantial loss of both ciliated and secretory cells, compromising the epithelial architecture to such an extent that it is impossible to accurately quantify the Foxj1 signal, as can be seen in the higher-resolution images in New Supplementary Figure S3.

      For more clarity, we modified the statement in the revised file (Line Number: 393-396).

      (12) All IHC/IF and embryo images need to include the scale bars.

      We thank the reviewer for this suggestion. We have now included scale bars in all the images.

      (13) Figure 5H: although IL1B is being discussed, there was no data in this study to support the figure.

      In Figure 5H, IL1B is presented as part of the pyroptosis signaling pathway. As we have already shown other key executioners of this pathway: Caspase 1 and GSDMD, we believe that additional IL1B data would not provide new insights beyond what has already been shown.

      Minor comments:

      (1) Please include n (sample size) for all data, including the histology image in the figure legends for all studies.

      We have now included the sample size in the figure legends for all data shown in the manuscript.

      (2) Line 32, did the authors mean to say, "Self-digestion of..." instead of "Self-digestion for..."?

      In Line 32, we meant, “Cellular self-digestion for female reproductive tract functions”. We have now corrected the statement.

      Fig. 1A - please include negative control.

      We included the negative control (Revised Figure 1).

      (3) Figure 1E left panel and Figure 4C - please label "Average no. of pups/female/litter" as each female has more than one litter over her reproductive lifespan. If the authors represent pups/females, then the number should be accumulative in the range of 35-40pups/females in the control group.

      We thank the reviewer for noting this. We have now corrected the label in both Revised Figure 1E and Revised Figure 4E.

      (4) Line 273: please remove "& F" as there is no Figure F in the image.

      We removed “&F” from Line 273.

      (5) The presence of CL is not always indicative of normal hormonal levels; therefore, the authors should include the measurement of progesterone levels at 3.5 dpc in the cKO compared to the control group. Hormonal regulation is also crucial for embryo transport.

      We thank the reviewer for this suggestion. We measured not only P4 but also E2 levels in D4 pregnant females and found no significant difference in their levels compared to corresponding controls (Revised Figure 2F).

      (6) Figure 2A shows that KRT expression is not present in the control uteri. Although the KRT8 levels may have decreased at 4 dpc, they should be present (see Figure S2A).

      We observed no decrease in KRT expression in control uteri on 5 dpc. We included better-resolution images for KRT expression (Revised Figure 2A).

      (7) The dotted white lines in Figure 2A are too thick. It's difficult to see the Ki67 positive signal in the luminal epithelial cells. Please also add a quantitative analysis of Ki67+ cells in the luminal epithelium vs. stromal cells.

      We have now corrected the dotted lines in Revised Figure 2B. However, as Ki-67-marked proliferation is evident in the representative images, we believe a quantitative analysis would not add anything new to the existing conclusion.

      (8) Figure 2D - the y-axis mentions the weight ratio. However, the figure legend describes the transcript levels of Atg14 - please correct this.

      We corrected the label in the revised manuscript.

      (9) Line 294 - Please correct Figure 2C to Figure 2B.

      We corrected it.

      (10) Line 308 - Please correct Figure 2E to Figure 2F.

      We corrected it.

      (11) Line 310 - Please correct Figure 2F to Figure 2G.

      We corrected it.

      (12) Line 311 - Please correct Figure 2F to Figure 2G.

      We corrected it.

      (13) Information in Figure S2A and S2B should be included in the main figure.

      We thank the reviewer for this nice suggestion. We have now included Figures S2A and S2B in the main figure (Revised Figure 2C & D).

      (14) Figure 3C - due to a lot of cellular debris after flushing, it's difficult to see. But it seems like there are secondary follicles in the flushing of control oviducts - this is highly unlikely. This could be due to an artifact of an accidental poking of the ovaries during collection.

      We agree with the reviewer. It might be due to the unintentional poking of the ovaries. We will take extra care in future experiments to avoid this and ensure clean flushing to prevent any confusion from debris or artifacts.

      (15) Figure 2B and Figure 3D signals from DAPI are missing - it's black with no blue signal. This could be the data loss during file compression for manuscript submission.

      We included better-resolution pictures for the DAPI signal in Revised Figure 2B & Figure 3F.

      (16) Explain why some embryos in the cKO make it to the uterus when the females are superovulated.

      It might be due to the heightened hormonal stimulation provided by superovulation, which could facilitate the movement of some embryos through the oviduct despite any defects or abnormalities caused by the loss of ATG14 in the oviduct.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Popli et al investigated the roles of the autophagy-related gene, Atg14, in the female reproductive tract (FRT) using conditional knockout mouse models. By ablation of Atg14 in both oviduct and uterus with PR-Cre (Atg14 cKO), the authors discovered that such females are completely infertile. They went on to show that Atg14 cKO females have impaired embryo implantation and uterus receptivity due to impaired response to P4 stimulation and stromal decidualization. In addition to the uterus defect, the authors also discovered that early embryos are trapped inside the oviduct and cannot be efficiently transported to the uterus in these females. They went on to show that oviduct epithelium in Atg14 cKO females showed increased pyroptosis, which disrupts oviduct epithelial integrity and leads to obstructive oviduct lumen and impaired embryo transport. Therefore, the authors concluded that autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable proper embryo transport.

      Strengths:

      This study revealed an important and unexpected role of the autophagy-related gene Atg14 in preventing pyroptosis and maintaining oviduct epithelial integrity, which is poorly studied in the field of reproductive biology. The study is well designed to test the roles of ATG14 in mouse oviduct and uterus. The experimental data in general support the conclusion and the interpretations are mostly accurate. This work should be of interest to reproductive biologists and scientists in the field of autophagy and pyroptosis.

      Weaknesses:

      Despite the strengths, there are several major weaknesses raising concerns. In addition, the mismatched figure panels, the undefined acronyms, and the poor description/presentation of some of the data significantly hinder the readability of the manuscript.

      (1) In the abstract, the authors stated that "autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable embryo transport". This statement is not substantiated. Although Atg14 is an autophagy-related gene and plays a critical role in oviduct homeostasis, the authors did not show a direct link between autophagy and pyroptosis/oviduct integrity. In addition, the authors pointed out in the last paragraph of the introduction that none of the other autophagy-related genes (ATG16L, FIP200, BECN1) exhibited any discernable impact on oviduct function. Therefore, the oviduct defect is caused by Atg14 specifically, not necessarily by autophagy.

      We thank the reviewer for noting this. We corrected the statement in the revised manuscript (Line number: 53-54).

      (2) In lines 412-414, the authors stated that "Atg14 ablation in the oviduct causes activation of pyroptosis", which is also not supported by the experimental data. The authors did not show that Atg14 is expressed in oviduct cells. PR-Cre is also not specific in oviduct cells. It is possible that Atg14 knockout in other PR-expressing tissues (such as the uterus) indirectly activates pyroptosis in the oviduct. More experiments will be required to support this claim. In line with the no defect when Atg14 has knocked out in oviduct ciliary cells, it will be good to use the secretory cells Cre, such as Pax8-Cre, to demonstrate that Atg14 functions in the secretory cells of the oviduct thus supporting this conclusion.

      We have now included the ATG14 expression data in the oviduct (New Supplementary Figure S2A). Consistent with previous studies reporting PR-cre activity in the isthmus [1, 2], we observed that Atg14 depletion was more pronounced in the isthmus compared to the ampulla. However, generating a secretory cell-specific Pax8-Cre mouse model will require a substantial amount of time and effort, and we respectfully note that this is beyond the scope of the current manuscript.

      (3) With FOXJ1-Cre, the authors attempted to specifically knockout Atg14 in ciliary cells, but there are no clear fertility and embryo implantation defects in Foxj1/Atg14 cKO mice. The author should provide verification data to show that Atg14 had been effectively depleted in ciliary cells if Atg14 is normally expressed.

      We understand the reviewer’s concern. We included new data for ATG14 expression in control and Atg14 cKO mouse oviducts (New Supplementary Figure S2A). However, due to the unavailability of reliable fluorescent-labeled antibodies for both Foxj1 and Atg14, we could not conduct the co-localization studies as intended, and this limitation hindered our ability to precisely determine the spatial overlap of these proteins within the oviduct. Nonetheless, Foxj1-cre is a widely used mouse model with reported cre activity in ciliary epithelial cells, including those of the oviduct [3]. Given the widespread expression of ATG14 in all the ciliary and secretory cells (New Supplementary Figure S2A) and distinct FOXJ1 expression in the oviduct (New Supplementary Figure S3), we are confident that Atg14 is deleted in the ciliary epithelial cells of Foxj1/Atg14 cKO mouse oviducts.

      (4) In lines 307-313, the author tested whether ATG14 is required for the decidualization of HESCs. The author stated that "Control siRNA transfected cells when treated with EPC seemed to change their morphological transformation from fibroblastic to epithelioid (Fig. 2E) and had increased expression of the decidualization markers IGFBP1 and PRL by day three only (Fig. 2F)". First, the labels in Figure 2 are not corresponding to the description in the text. Second, the morphology of the HESCs in the control and Atg14 siRNA group showed no obvious difference even at day 3 and day 6. The author should point out the difference in each panel and explain in the text or figure legend.

      Decidualization is a post-implantation event, whereas our study primarily focuses on pre-implantation events in the oviduct. Therefore, we have removed all data related to human and mouse decidualization to enhance the clarity and precision of our study.

      (5) In lines 332-336, the authors pointed out that the cKO mice oviduct lining shows marked eosinophilic cytoplasmic change, but there's no data to support the claim. In addition, the authors further described that "some of the cells showed degenerative changes with cytoplasmic vacuolization and nuclear pyknosis, loss of nuclear polarity, and loss of distinct cell borders giving an appearance of fusion of cells (Fig. 3D)". First, Figure 3D did not show all these phenotypes, and it is likely a mismatch to Figure 3E. Even in Figure 3E, it is not obvious to notice all the phenotypes described here. The figure legend is overly simple, and there's no explanation of the arrowheads in the panel. More data/images are required to support the claim here and provide a clear indication and explanation in the figure legend.

      Dr. Ramya Masand, Chief pathologist in the Pathology Department at the Baylor College of Medicine, and a contributing author, assessed the H&E-stained oviduct sections from control and cKO mice. We have now included a new Supplementary Figure S3 with previous representative H&E images that depict the cellular alterations described in lines 332–336.

      (6) In lines 317-325, the description of the proportion of embryos recovered from the oviduct and uterus is rather confusing. In addition, the total number of embryos was not provided. I would recommend presenting the numerical data to show the average number of embryos from the oviduct and uterus instead of using the percentage data in Figures 3A and 5G.

      We thank the reviewer for this nice suggestion. We calculated the average number of embryos and found no difference in the number of embryos recovered from cKO or polyphyllin-treated pregnant mice at 4 dpc compared to their controls (New Supplementary Figure S4A & B).

      (7) In lines 389-391, authors tested whether Polyphyllin VI treatment led to activated pyroptosis and blocked embryo transport. Although Figures 5F-G showed the expected embryo transport defect, the authors did not show the pyroptosis and oviduct morphology. It will be important to show that the Polyphyllin VI treatment indeed led to oviduct pyroptosis and lumen disruption.

      We performed GSDMD IHC staining on oviducts from Polyphyllin VI- or vehicle-treated mice and observed elevated GSDMD expression with Polyphyllin VI (New Figure 6E). However, no significant lumen disruption was detected, which may be attributed to the short-term exposure of the oviducts to pyroptosis induction, in contrast to the more pleiotropic effects observed in genetically induced models. Nonetheless, this observation clearly indicates that unscheduled or unwarranted activation of pyroptosis impedes embryo transport.

      (8) In line 378, it would be better to include a description of pyroptosis and its molecular mechanisms to help readers better understand your experiments. Alternatively, you can add it in the introduction.

      We thank the reviewer for this nice suggestion. We included literature on the pyroptosis pathway in the introduction section (Line Number: 105-118).

      (9) Please make sure to provide definitions for the acronyms such as FRT, HESCs, GSDMD, etc.

      We added definitions for the acronyms such as FRT, HESCs, and GSDMD used in the study.

      (10) It is rather confusing to use oviducal cell plasticity in this manuscript. The work illustrated the oviducal epithelial integrity, not the plasticity.

      We thank the reviewer for the suggestion. We have revised the manuscript accordingly to ensure clarity and precision in describing the oviductal epithelial structural changes observed in the absence of ATG14.

      A few of the additional comments for authors to consider improving the manuscript are listed below.

      (1) Some of the figures are missing scale bars, while others have inconsistent scale bars. It would be better to be consistent.

      We have now included scale bars in all images.

      (2) On a couple of occasions, the DAPI signal cannot be seen, such as in Figure 2B and Figure 3D.

      We have now included better-resolution images for the DAPI signal in all fluorescent images shown in the revised manuscript.

      (3) Overall, the figure legends can be improved to provide more detailed information to help the reader to interpret the data.

      We included additional details in all the figure legends in the revised manuscript.

      (4) In Figure 2D, the Y-axis showed the stimulated/unstimulated uterine weight ratio, why did the author put "Atg14" at the top of the graph? At the same time, the X-axis title is missing in Figure 2D.

      We apologize for the typographical error. We removed “Atg14” from the top of the graph and included the X-axis title in the revised manuscript.

      (5) In the left panel of Figure 2G, "ATG14" at the top should be "Atg14" to be consistent.

      In Figure 2G, we are representing “ATG14” according to human gene annotation.

      (6) In line 559, "(A)" is missing in front of "Immunofluorescence analysis of GSDMD".

      We thank the reviewer for noting this. We corrected it in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Pooja Popli and co-authors tested the importance of Atg14 in the female reproductive tract by conditionally deleting Atg14 using Pr Cre and also Foxj1cre. The authors showed that loss of Atg14 leads to infertility due to the retention of embryos within the oviduct. The authors further concluded that the retention of embryos within the oviduct is due to pyroptosis in oviduct cells leading to defective cellular integrity. The manuscript has some interesting findings, however there are also areas that could be improved.

      Strengths:

      The importance of Atg14 and autophagy in the female reproductive tract is incompletely understood. The manuscript also provides spatial evidence about a new mechanism linking Atg14 to pyroptosis.

      We thank the reviewer for the positive statements and constructive comments on our manuscript.

      Weaknesses:

      (1) It is not clear why the loss of Atg14 selectively induces Pyroptosis within oviduct cells but not in other cellular compartments. The authors should demonstrate that these events are not happening in uterine cells.

      We thank the reviewer for this nice suggestion. We performed GSDMD IHC and found that, unlike in the oviduct, the cKO uteri and ovaries do not exhibit detectable pyroptosis (Revised Figure 5F). Additionally, we have added text to the discussion section addressing possible reasons for the differential impact of Atg14 loss on pyroptosis along the reproductive tract continuum (Line number: 532-538).

      (2) The manuscript never showed any effect on the autophagy upon loss of Atg14. Is there any effect on autophagy upon Atg14 loss? If so, does that contribute to the observation?

      We thank the reviewer for the nice suggestion. We found that protein levels of LC3b and p62, two well-known markers of autophagic flux, are elevated upon Atg14 loss in the oviduct (New Supplementary Figure S2B). Since p62 accumulation is indicative of reduced autophagic flux [4], we posit that loss of Atg14 results in defective autophagy in the oviduct. Importantly, this defective autophagy adversely impacted the structural integrity of oviductal epithelial cells, impairing embryo transport.

      (3) It is not clear what the authors meant by cellular plasticity and integrity. There is no evidence provided in that aspect that the plasticity of oviduct cells is lost. Similarly, more experimental evidence is necessary for the conclusion about cellular integrity.

      We thank the reviewer for the suggestion. We have revised the text for clarity and precision in describing the oviductal epithelial structural changes observed in the absence of ATG14. To avoid ambiguity, we have removed the term "cellular plasticity." We have already provided extensive evidence, including multiple H&E stains and immunofluorescence analyses for KRT8 and smooth muscle actin to illustrate cellular integrity in both control and cKO oviducts. However, we respectfully believe that performing additional experiments on cellular integrity would not contribute further to the conclusions already drawn.

      (4) The mitochondrial phenotype shown in Figure 3 didn't appear as severe as it is described in the results section. The analyses should be more thorough. They should include multiple frames (in supplemental information) showing mitochondrial morphology in multiple cells. The authors should also test that aspect in uterine cells. The authors should measure Feret's diameter, the difference in membrane potential, etc. for a definitive conclusion.

      We appreciate the reviewer’s suggestion. We carried out a TOM20 (mitochondrial structural marker) and cytochrome c (mitochondrial damage and cell death marker) immuno-colocalization study and found loss of TOM20 signal with concomitant cytochrome c leakage into the perinuclear space (Revised Figure 5B). Additionally, we observed reduced expression of mitochondrial structural and functional markers by qPCR analysis (Revised Figure 5C). However, we respectfully argue that conducting membrane potential studies on murine oviducts is extremely complex and beyond the scope of this study.

      (5) The comment that the loss of Atg14 and pyroptosis leads to the narrowing of the lumen in the oviduct should be experimentally shown.

      We have now included a New Supplementary Figure S3 with representative previous immunofluorescence images that clearly show the narrowing of the lumen with Atg14 loss in the oviduct.

      (6) The manuscript never showed the proper mechanism through which Atg14 loss induces pyroptosis. The authors should link the mechanism.

      We respectfully disagree with the reviewer on this point. We have provided substantial evidence regarding the cellular mechanisms through which the loss of Atg14 may lead to the activation of pyroptosis as outlined below:

      (1) Cellular Changes: Loss of ATG14 in the oviduct results in cellular swelling and the formation of fused membranous structures, which are characteristic features of pyroptosis activation.

      (2) Expression of Key Pyroptosis Proteins: We observed an induced expression of GSDMD and Caspase-1, primary executioners of the pyroptotic pathway, in response to Atg14 loss.

      (3) Inflammatory Markers: Elevated levels of inflammatory markers such as TNF-α and CXCR3 were detected, both of which are known to promote pyroptosis [5, 6].

      (4) Mitochondrial Damage: We have added new data demonstrating disrupted colocalization of TOM20 (a mitochondrial structural marker) and Cytochrome c (a cell death marker), resulting in Cytochrome c leakage into the perinuclear space (Revised Figure 5B). Additionally, qPCR analysis revealed reduced expression of mitochondrial structural and functional markers in cKO oviduct tissues (Revised Figure 5C).

      Based on this evidence, we can clearly say that Atg14 has some direct or indirect link to inflammasome activation. However, understanding the complex rheostat between the Atg14-mediated autophagy and inflammation regulatory axis will necessitate future studies employing sophisticated models, such as combined knockout mice where ATG14 is deleted alongside key inflammatory regulators (e.g., NLRP3, GSDMD, or CASPASE-1). These dual knockout models could provide crucial insights into how ATG14 modulates inflammatory pathways.

      References:

      (1) Herrera, G.G.B., et al., Oviductal Retention of Embryos in Female Mice Lacking Estrogen Receptor alpha in the Isthmus and the Uterus. Endocrinology, 2020. 161(2).

      (2) Soyal, S.M., et al., Cre-mediated recombination in cell lineages that express the progesterone receptor. Genesis, 2005. 41(2): p. 58-66.

      (3) Zhang, Y., et al., A transgenic FOXJ1-Cre system for gene inactivation in ciliated epithelial cells. Am J Respir Cell Mol Biol, 2007. 36(5): p. 515-9.

      (4) Mizushima, N., T. Yoshimori, and B. Levine, Methods in mammalian autophagy research. Cell, 2010. 140(3): p. 313-26.

      (5) Vaher, H., Expanding the knowledge of tumour necrosis factor-alpha-induced gasdermin E-mediated pyroptosis in psoriasis. Br J Dermatol, 2024. 191(3): p. 319-320.

      (6) Liu, C., et al., CXCR4-BTK axis mediate pyroptosis and lipid peroxidation in early brain injury after subarachnoid hemorrhage via NLRP3 inflammasome and NF-kappaB pathway. Redox Biol, 2023. 68: p. 102960.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The anatomical connectivity of the claustrum and the role of its output projections has, thus far, not been studied in detail. The aim of this study was to map the outputs of the endopiriform (EN) region of the claustrum complex, and understand their functional role. Here the authors have combined sophisticated intersectional viral tracing techniques, and ex vivo electrophysiology to map the neural circuitry of EN outputs to vCA1, and shown that optogenetic inhibition of the EN→vCA1 projection impairs both social and object recognition memory. Interestingly the authors find that the EN neurons target inhibitory interneurons providing a mechanism for feedforward inhibition of vCA1.

      Strengths:

      The strength of this study was the application of a multilevel analysis approach combining a number of state-of-the-art techniques to dissect the contribution of the EN→vCA1 to memory function.

      Weaknesses:

      Some authors would disagree that the vCA1 represents a 'node for recognition of familiarity' especially for object recognition although that is not to say that it might play some role in discrimination, as shown by the authors. I note however that the references provided in the Introduction, concerning the role of vCA1 in memory refer to anxiety, social memory, temporal order memory, and not novel object recognition memory. Given the additional projections to the piriform cortex shown in the results, I wonder to what extent the observations may be explained by odour recognition effects.

      We have added references demonstrating that the ventral hippocampus contributes to object recognition memory in rodents (Broadbent NJ et al., Learn Mem 2010; Titulaer J et al., Front Behav Neurosci 2021).

      The odor recognition effect is an interesting perspective that we have also considered. However, in our object recognition test, the same odor (70% EtOH) was used for both objects, yet the mice were able to discriminate between the familiar and novel objects. This suggests that the odor cue is unlikely to have contributed to their performance in the object discrimination test.

      In addition, I wondered whether the impairments in discrimination following chemogenetic inhibition of the EN→vCA1 were due to the subject treating the novel and familiar stimuli as either both novel, which might be observed as an increase in exploration, or both familiar, with a decrease in overall exploration.

      We thank the reviewer for raising this interesting point. We analyzed the total exploration time (i.e., time in the interaction zones for familiar and novel stimuli) during the social discrimination test. The data have been added to Fig. S9. Total exploration time was not affected by CNO treatment. This indicates that inhibition of ENvCA1-proj. neurons reduced interaction time with the novel conspecific and increased interaction time with the familiar conspecific. The subject mice appear to give equal weight to familiar and novel stimuli.
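
      The reasoning in this reply can be illustrated with a small numerical sketch. It assumes a commonly used discrimination index, (novel - familiar) / (novel + familiar), which is an assumption here and not necessarily the index used in the manuscript; the interaction times are hypothetical values chosen only to show that the index can fall to zero while total exploration time stays constant.

```python
def discrimination_index(novel_s: float, familiar_s: float) -> float:
    """Assumed index: (novel - familiar) / (novel + familiar).

    This definition is a common convention, not taken from the manuscript.
    """
    total = novel_s + familiar_s
    if total <= 0:
        raise ValueError("total exploration time must be positive")
    return (novel_s - familiar_s) / total


# Hypothetical interaction times in seconds (illustrative only).
control = {"novel": 60.0, "familiar": 30.0}
cno = {"novel": 45.0, "familiar": 45.0}

# Total exploration time is identical in both conditions...
total_control = control["novel"] + control["familiar"]
total_cno = cno["novel"] + cno["familiar"]

# ...but the preference for the novel stimulus disappears under CNO.
di_control = discrimination_index(control["novel"], control["familiar"])
di_cno = discrimination_index(cno["novel"], cno["familiar"])
```

      With equal totals, a lower index can only arise from less time with the novel stimulus and correspondingly more time with the familiar one, which is the pattern described above.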

      Reviewer #2 (Public Review):

      Summary:

      Yamawaki et al. conducted a series of neuroanatomical tracing and whole-cell recording experiments to elucidate and characterise a relatively unknown pathway between the endopiriform (EN) and CA1 of the ventral hippocampus (vCA1) and to assess its functional role in social and object recognition using fibre photometry and dual vector chemogenetics. The main findings were that the EN sends robust projections to the vCA1 that collateralise to the prefrontal cortex, lateral entorhinal cortex, and piriform cortex, and these EN projection neurons terminate in the stratum lacunosum-moleculare (SLM) layer of distal vCA1, synapsing onto GABAergic neurons that span across the Pyramidal-Stratum Radiatum (SR) and SR-SLM borders. It was also demonstrated that EN input disynaptically inhibits vCA1 pyramidal neurons. vCA1 projecting EN neurons receive afferent input from the piriform cortex, and from within EN. Finally, fibre photometry experiments revealed that vCA1 projecting EN neurons are most active when mice explore novel objects or conspecifics, and pathway-specific chemogenetic inhibition led to an impairment in the ability to discriminate between novel vs. familiar objects and conspecifics.

      This is an interesting mechanistic study that provides valuable insights into the function and connectivity patterns of afferent input from the endopiriform to the CA1 subfield of the ventral hippocampus. The authors propose that the EN input to the vCA1 interneurons provides a feedforward inhibition mechanism by which novelty detection could be promoted. The experiments appear to be carefully conducted, and the methodological approaches used are sound. The conclusions of the paper are supported by the data presented on the whole.

      We thank the reviewer for their positive comments on our work.

      The authors used dual retrograde tracing and observed that the highest percentage (~30%) of vCA1 projecting EN cells also projected to the PFC. They then employed an intersectional approach to show the presence of collaterals in other cortical areas such as the entorhinal cortex and piriform cortex in addition to the PFC. However, they state that 'Projection to prefrontal cortex was sparse relative to other areas, as expected based on the retrograde labeling data' (referring to Figure 2K) and subsequently appear to dismiss the initial data set indicating strong axonal projections to the PFC.

      Our interpretation is that 70% of the ENCA1-proj. population does not send collaterals to the PFC, suggesting that the PFC is not a major target for this population (unlike vCA1 where 100% of its population projects). This hypothesis is supported by our axon branching study, which showed lower axon density in the PFC compared to vCA1 (and other regions). We revised the text to 'much sparser relative to that of vCA1' (line 101) to facilitate a direct comparison with the retrograde and anterograde labeling study.

      Since this is a relatively unknown connection, it would be helpful if some evidence/discussion is provided for whether the EN projects to other subfields (CA3, DG) of the ventral hippocampus. This is important, as the retrograde tracer injections depicted in Figure 1B clearly show a spread of the tracer to vCA3 and potentially vDG and it is not possible to ascertain the regional specificity of the pathway.

      We addressed the potential caveat associated with the retrograde tracer injection, as mentioned by the reviewer, by performing intersectional axon branching analysis. This analysis demonstrated that EN axons are primarily located in the SLM of the distal CA1 subfield (Figs. 2, 3, S2). However, we occasionally observed very weak labeling in the CA3 or dentate gyrus. We modified our text (lines 106-108) and figure (Fig. S2D) to account for this.

      The vCA1 projecting EN cells appear to originate from an extensive range along the AP axis. Is there a topographical organization of these neurons within the vCA1? A detailed mapping of this kind would be valuable.

      This is an interesting question for future research. Our data show a non-uniform distribution of this cell type, suggesting the potential for topographic organization.

      Given this extensive range in the location of vCA1 EN originating cells, how were the targets (along the AP axis) in EP selected for the calcium imaging?

      Using our injection coordinates, ENvCA1-proj. neurons were consistently labeled at high density just posterior to the bregma (Fig. 1J). Therefore, we targeted this region for our imaging.

      The vCA1 has extensive reciprocal connections with the piriform cortex as well, which is in close proximity to the EN. How certain are the authors that the chemogenetic targeting was specific to the EN-vCA1 connection?

      We performed histology on every animal used in the behavioral study to examine the specificity of hM4D expression, and only included those with specific labeling in the EN.

      Raw data for the sociability and discrimination indices should be provided so that the readers can gain further insight into the nature of the impairment.

      The raw data for total interaction time during the social discrimination test has been added (Fig. S9F).

      Line 222: It is unclear how locomotor activity informs anxiety in the behavioral tests.

      The degree of exploratory behavior in a novel context is generally used to infer anxiety levels in rodents. We have added a review paper (Ref 44, Prut, 2003) that discusses this point.

      Figure 7 title; It is stated that activity of EN neurons 'predict' social/object discrimination performance. However, caution must be exercised with this interpretation as the correlational data are underpowered (n=5-8). Furthermore, the results show a significant correlation between calcium event ratios and the discrimination index in the social discrimination test but not the object discrimination test.

      We increased the sample size for EN calcium imaging during the object recognition memory test (Fig. 7G). The updated data indicate a significant correlation between EN activity and the object recognition index (N = 9, Pearson R = 0.8, p = 0.01).

      We have changed the title of Figure 7 to 'Activity of ENvCA1-proj. neurons correlates with social/object discrimination performance'.
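
      As a rough check on how a correlation of this size relates to a small sample, the sketch below computes a Pearson coefficient and the corresponding t statistic, t = r * sqrt((n - 2) / (1 - r^2)). The function names are hypothetical and this is not the authors' analysis code; only the reported summary values (N = 9, R = 0.8) come from the response.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Reported summary values from the response: N = 9, R = 0.8.
t_reported = t_statistic(0.8, 9)
```

      For these values t is about 3.53 on 7 degrees of freedom, which sits close to the two-sided 0.01 threshold and is in line with the reported p value; with so few animals, the estimate remains sensitive to individual data points.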

      While both male and female mice were included in the anatomical tracing and recording experiments, only male mice were used for behavioral tests.

      The female behavior was highly inconsistent in the control condition of our social recognition memory paradigm; therefore, we decided to conduct the study with males. We will design a new behavioral paradigm for future studies to address this challenge.

      Reviewer #1 (Recommendations For The Authors):

      (1) It is not clear how the relative number of vCA1 projecting neurons in Figure 1H was acquired, not enough detail is presented in the methods section. To what extent could these data have been affected by differences in the size or anatomical position of the injection site in vCA1, which judging from the example fluorescent image in Figure 1B also appears to include CA3.

      We used AMaSiNe (Song et al. 2020) to semi-automatically quantify fluorescently labeled presynaptic neurons. This open-source software identifies the number and location of these cells across different regions based on the Allen Mouse Brain Common Coordinate Framework. To control for transfection variability (e.g., due to slight differences in injection volume or site), we normalized the presynaptic cell count in each region by the total number of cells in the regions of interest. We performed this analysis for N = 5 brains and found a consistent trend, as seen in Fig. 1H (grey lines).

      We have added the detailed method of quantification in the Materials and Methods section (line 393).
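
      The normalization described in this reply can be sketched as follows. This is a minimal illustration, not the AMaSiNe pipeline or the authors' code; the region labels and cell counts are hypothetical and serve only to show why dividing by the total count removes differences in overall labeling efficiency between brains.

```python
from typing import Dict


def normalize_counts(counts: Dict[str, int]) -> Dict[str, float]:
    """Express each region's labeled-cell count as a fraction of the total
    count across all regions of interest in the same brain."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labeled cells in the regions of interest")
    return {region: n / total for region, n in counts.items()}


# Hypothetical counts: brain_b has half the labeling of brain_a overall,
# e.g. because of a smaller injection volume.
brain_a = {"EN": 120, "PIR": 60, "LEC": 20}
brain_b = {"EN": 60, "PIR": 30, "LEC": 10}

frac_a = normalize_counts(brain_a)
frac_b = normalize_counts(brain_b)
```

      After normalization both hypothetical brains give the same regional fractions, so the values can be compared across animals despite different absolute counts. The normalization does not correct for an injection that spreads into a neighbouring subfield, which is the separate concern raised about CA3.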

      (2) For a number of the results, the full statistical values are not presented in the Results section or figure legend.

      We have included the full statistical values in the figure legends of the revised manuscript.

      (3) It is not clear how much virus was injected in the different experiments (tract tracing, electrophysiology, behaviour, etc.). The methods state 50-100ul, but there is no further detail in the results or figure legends.

      We have included the injected volumes of the virus in the revised manuscript.

      (4) Figure 2 mentions the CLA complex (line 702) but this is not defined in the text. Although the introduction does refer to the claustrum complex, there is no acronym.

      We have corrected the manuscript accordingly.

      (5) Line 131- 'we recorded from 3-4 GABAergic neurons' - presumably this is in each animal?

      We recorded 3 to 4 GABAergic neurons sequentially from the same slice to compare input strength. We have edited the text to clarify this (line 134).

      Reviewer #2 (Recommendations For The Authors):

      Figure 3C: It is not clear what the dashed lines labelled proximal and distal represent.

      It is the proximal and distal vCA1 regions where GFP signals were measured for Fig. 3D. We have modified the figure legend to clarify this (line 736).

      Figure 5D: what do the different colors represent? Different colors for one brain?

      I assume that the reviewer meant to refer to Fig. 4D instead of Fig. 5D. In Fig. 4D, one color indicates starter cells in one brain. To clarify this, we have edited the figure legend (line 748).

      Figure S6E: The images are low resolution and it is hard to decipher the exact locations of labeled neurons. Please provide more guidance (e.g., labeling areas of interest).

      We have added reference lines and labels in Figure S6E.

      Some details are missing: what was the volume of AAV injected for each site/experiment; how was CNO made, and where was it purchased from?

      We have added this information (lines 330-331; 431-434).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This work presents a replicable difference in predictive processing between subjects with and without tinnitus. In two independent MEG studies and using a passive listening paradigm, the authors identify an enhanced prediction score in tinnitus subjects compared to control subjects. In the second study, individuals with and without tinnitus were carefully matched for hearing levels (next to age and sex), increasing the probability that the identified differences could truly be attributed to the presence of tinnitus. Results from the first study could successfully be replicated in the second, although the effect size was notably smaller.

      Throughout the manuscript, the authors provide a thoughtful interpretation of their key findings and offer several interesting directions for future studies. Their conclusions are fully supported by their findings. Moreover, the authors are sufficiently aware of the inherent limitations of cross-sectional studies.

      Strengths:

      The robustness of the identified differences in prediction scores between individuals with and without tinnitus is remarkable, especially as successful replication studies are rare in the tinnitus field. Moreover, the authors provide several plausible explanations for the decline of the effect size observed in the second study.

      The rigorous matching for hearing loss, in addition to age and sex, in the second study is an important strength. This ensures that the identified differences cannot be attributed to differences in hearing levels between the groups.

      The used methodology is explained clearly and in detail, ensuring that the used paradigms may be employed by other researchers in future studies. Moreover, the registering of the data collection and analysis methods for Study 2 as a Registered Report should be commended, as the authors have clearly adhered to the methods as registered.

      Weaknesses:

      Although the authors have been careful to match their experimental groups for age, sex, and hearing loss, there are other factors that may confound the current results. For example, subjects with tinnitus might present with psychological comorbidities such as anxiety and depression. The authors' exclusion of distress as a candidate for explaining the found effects is based solely on an assessment of tinnitus-related distress, while it is currently not possible to exclude the effects of elevated anxiety or depression levels on the results. Additionally, as the authors address in the discussion, the presence of hyperacusis may also play a role in predictive processing in this population.

      The authors write that sound intensity was individually determined by presenting a short audio sequence to the participants and adjusting the loudness according to an individual pleasant volume. Neural measurements made during listening paradigms might be influenced by sound intensity levels. The intensity levels chosen by the participants might therefore also have an effect on the outcomes. The authors currently do not provide information on the sound intensity levels in the experimental groups, making it impossible to assess whether sound intensity levels might have played a role.

      Thank you very much for your favorable and constructive evaluation of our manuscript. We agree with you on various additional confounds that we did not consider and included a section in our discussion. It is also correct that we did not include the sound intensity levels in our analysis, which is also a potential confound. Unfortunately, we do not have the data on the individual sound intensity levels but we included a section regarding this issue in our discussion as well.

      Line 937-949:

      “In both studies, tinnitus distress was not correlated with the reported prediction effects. Nevertheless, tinnitus can also be characterized by other features such as its loudness, pitch or duration which were not included in the experimental assessment. Additionally, we solely used a short version of the Mini-TQ (Goebel and Hiller, 1992) in Study 2, which did not allow us to relate prediction scores to subscales like sleep disturbances which potentially influence cognitive functioning and thus predictive processing. Next to sleeping disorders and distress, tinnitus is often also accompanied by psychological comorbidities such as depression or anxiety (Langguth, 2011) which are potential confounds of the results. For the work described in this manuscript the replicability of the core finding was of main importance. More studies are needed to relate the prediction patterns in more detail to aspects of tinnitus sensation and distress.”

      Reviewer #2 (Public Review):  

      Summary:  

      This study aimed to test experimentally a theoretical framework that aims to explain the perception of tinnitus, i.e., the perception of a phantom sound in the absence of external stimuli, through differences in auditory predictive coding patterns. To this aim, the researchers compared the neural activity preceding and following the perception of a sound using MEG in two different studies. The sounds could be highly predictable or random, depending on the experimental condition. They revealed that individuals with tinnitus and controls had different anticipatory predictions. This finding is a major step in characterizing the top-down mechanisms underlying sound perception in individuals with tinnitus.

      Strengths:  

      This article uses an elegant, well-constructed paradigm to assess the neural dynamics underlying auditory prediction. The findings presented in the first experiment were partially replicated in the second experiment, which included 80 participants. This large number of participants for an MEG study ensures very good statistical power and a strong level of evidence. The authors used advanced analysis techniques - Multivariate Pattern Analysis (MVPA) and classifier weights projection - to determine the neural patterns underlying the anticipation and perception of a sound for individuals with or without tinnitus. The authors evidenced different auditory prediction patterns associated with tinnitus. Overall, the conclusions of this paper are well supported, and the limitations of the study are clearly addressed and discussed.  

      Weaknesses:  

      Even though the authors took care of matching the participants in age and sex, the control could be more precise. Tinnitus is associated with various comorbidities, such as hearing loss, anxiety, depression, or sleep disorders. The authors assessed individuals' hearing thresholds with a pure tone audiogram, but they did not take into account the high frequencies (6 kHz to 16 kHz) in the patient/control matching. Moreover, other hearing dysfunctions, such as speech-in-noise deficits or hyperacusis, could have been taken into account to reinforce their claim that the observed predictive pattern was not linked to hearing deficits. Mental health and sleep disorders could also have been considered more precisely, as they were accounted for only indirectly with the score of the 10-item mini-TQ questionnaire evaluating tinnitus distress. Lastly, testing the links between the individuals' scores in auditory prediction and tinnitus characteristics, such as pitch, loudness, duration, and occurrence (how often it is perceived during the day), would have been highly informative.

      Thank you very much for your careful and constructive evaluation. We agree with the weaknesses stated in our manuscript and aimed to highlight these aspects more in our analyses and discussion, so future studies can take them into account (see e.g., line 937-949).

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      I would strongly recommend the inclusion of data on the used sound intensity levels. It would be very useful to assess whether there are any group differences regarding sound intensity of the stimuli, to exclude any effects of sound intensity on the results.

      We agree with you that - next to experimental aspects like the stimulus frequencies and the number of trials - the sound intensity levels potentially influence the effects as well. Unfortunately, this data was not saved during the experimental procedure and we are not able to include this as a variable in our analyses. As we, however, acknowledge this issue and want to provide guidelines for future research, we added a section to our discussion targeting sound intensity levels. 

      Line 902-913:

      “Thirdly, both studies used individual sound intensity levels to ensure a comfortable listening situation for the participants. These differences in sound intensity levels are, however, a potential confound in the experimental design as well since sound intensity can have an impact on neural responses (Thaerig et al., 2008). Although in this design, we expect the intensity levels balanced equally to the hearing loss of the participants (which did not differ between groups), and basic decoding of sound frequency did not differ in both studies, we are not able to ultimately exclude the sound intensity level as a driver of our effects. Future studies should include a perceived loudness matching for each frequency and should compare the adapted sound intensity values between each group or integrate them into the analysis (e.g., using the logistic regression approach in Fig. 8).”

      Reviewer #2 (Recommendations For The Authors):

      Major comments

      Introduction

      • The authors wrote: "Overall, this situation calls for the pursuit of alternative or complementary models that place less emphasis on the hearing status of the individual." They clearly demonstrated that the altered-gain model focuses on hearing loss and does not overcome the three described limitations. However, they mentioned other models focusing on brain activity outside of the auditory pathway (noise cancellation, map reorganization, specific neural networks). The authors should better explain the novelty of their approach compared to the existing ones.

      Thank you for your input. The inconclusive results and open questions about the altered-gain framework led us to search for a different theoretical foundation for this work. We agree with you that there are other models, such as the map reorganization theory or neural network models, next to the altered-gain model, and recent literature showed results supporting these frameworks (see e.g., a review from our group discussing tinnitus research in MEG over the last 10 years, Reisinger et al. (2023)). Nevertheless, as we focus on prediction processes, the Bayesian inference framework in tinnitus (Sedley et al., 2016) fits best for our approach. As we stated in line 113-116 “The Bayesian inference framework could, therefore, explain the experience of tinnitus in lieu of any increase in neural activity in the auditory system, or indicate an additional alteration, on top of hearing loss, for tinnitus to be perceived”, this framework differs from the other models and represents a novel approach in tinnitus research. The novelty in this work is our methodological approach, which allows for explicit analyses of predictive patterns, irrespective of the exact location in the brain. This is a first step towards our actual underlying question of whether aberrant auditory prediction patterns act as a neural correlate of tinnitus or rather as a risk factor or disposition. In our opinion, this question is of crucial relevance for understanding tinnitus processes on a neural level, and our robust effects highlight the necessity to investigate these predictive processes in a longitudinal manner. We included a paragraph in our manuscript to make this more apparent for the reader.

      Line 128-137:

      “We utilized a powerful, recently established experimental approach (Demarchi et al., 2019) showing anticipatory activations of tonotopically specific auditory templates for regular tone sequences. This method allows us to explicitly investigate predictive patterns in line with the Bayesian inference framework (Sedley et al., 2016), leading towards the overall question whether alterations in predictive coding can be interpreted as a neural correlate of tinnitus or rather as a risk factor. Since this question can solely be targeted in a longitudinal manner, we aimed in a first step to investigate prediction patterns in tinnitus over two independent samples, deriving robust effects that should be considered in future research.”

      • "This conceptual model bridges several explanatory gaps: for example, the inconsistent findings in humans regarding the "altered gain" view which states enhanced neural activity in the auditory pathway". What are "the inconsistent findings in humans regarding the 'altered gain'"? It would be helpful if the authors were more explicit about their idea here and added reference(s) to support it.

      Thank you for pointing that out. We agree with you that this section lacks clarity and we aimed to be more precise. 

      Line 108-116:

      “This conceptual model bridges several explanatory gaps: for example, the inconsistent findings in humans regarding the “altered gain” view which states altered neural activity in the auditory pathway. Recent findings vary in both the targeted frequency bands and the direction of the reported power changes, which impedes consistent conclusions (Eggermont and Roberts, 2015; Elgoyhen et al., 2015; Reisinger et al., 2023). The Bayesian inference framework could, therefore, explain the experience of tinnitus in lieu of any increase in neural activity in the auditory system, or indicate an additional alteration, on top of hearing loss, for tinnitus to be perceived.”

      • I suggest moving this part to the discussion:

      "However, alternative explanations cannot be excluded with certainty, such as tinnitus being the cause of altered prediction tendencies or that there is a third variable being responsible for predictions and tinnitus development. Furthermore, even if altered predictive tendencies were to be found, there could be various possibilities of exactly how they could be altered to contribute to the onset or persistence of tinnitus. Some further clarity might then be gained through longitudinal studies in humans or animals."

      Thank you for your suggestion, we moved this part to the corresponding section in the discussion.

      Line 742-756:

      “Distinct predictive processing patterns could e.g., either develop within an individual in contributing to chronification of tinnitus (e.g., shift of “default prediction” from silence to sound; Sedley, 2019). Alternatively, they could be conceived as sensory processing style, making certain individuals more vulnerable to develop tinnitus under certain conditions (e.g., hearing loss, aging), a notion reminiscent of the “strong prior” hypothesis of hallucinations (Corlett et al., 2019). Hence, the direction of the effect remains unclear and alternative explanations, such as a third variable being responsible for predictions and tinnitus development, cannot be excluded with certainty. Furthermore, even if altered predictive tendencies were to be found, there could be various possibilities of exactly how they could be altered to contribute to the onset or persistence of tinnitus. In any case, any more conclusive claims would require longitudinal data, ideally with a tinnitus-free baseline. As such research is challenging to implement, especially in humans, we first focused in this work on finding cross-sectional group differences between individuals with and without tinnitus.”

      Methods

      Participants

      • "We calculated the individual mean hearing ability based on the values for 500, 1000, 2000, and 4000 Hz, which is a common approach for averaging results of pure-tone audiometry". Even if this method has been used multiple times in the literature, I would not recommend it as it can hide differences. Hearing loss is usually larger at high frequencies (starting at 6 000 Hz). An average threshold calculated with those central frequencies is more relevant for clinical use than in research. I strongly recommend performing a linear model with the factors Frequency (including all tested frequencies), Group, Ear side, and their interactions to precisely test the group differences in hearing thresholds.

      Thank you for pointing that out. We agree with you that higher frequencies are of potential interest as well when analyzing hearing loss. We included your suggested linear model in our methods section and the results were in line with our assumption that the groups did not differ substantially. Additionally, we included another logistic regression model in our exploratory analyses when investigating the influence of hearing loss on the prediction scores. Once more, the addition of higher frequencies did not substantially influence the effects.

      Line 194-203:

      “We calculated the individual mean hearing ability based on the values for 500, 1000, 2000, and 4000 Hz, which is a common approach for averaging results of pure-tone audiometry (i.e., PTA-4, see for example Lin et al. (2011); Ozdek et al. (2010)). Using independent t-tests, we found no differences in hearing status over frequencies between groups for the left (t=-1.19, p=.238) and right ear (t=-1.72, p=.09). An additional linear regression including all frequencies from 125 Hz to 8000 Hz also showed that hearing thresholds did not differ between ears (b=0.311, SE=1.600, p=.846) or groups (b=1.702, SE=1.553, p=.273), but solely between frequencies (b=0.003, SE=0.000, p<.001). Interactions were not significant either.”
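
The PTA-4 average described in this passage can be illustrated with a minimal sketch. The audiogram values below are hypothetical and chosen only to show how a four-frequency average can leave a high-frequency threshold shift invisible, which is the concern the reviewer raises; the function name `pta4` is an assumption made for this example and is not taken from the manuscript.

```python
# Frequencies (Hz) that enter the four-frequency pure-tone average (PTA-4).
PTA4_FREQUENCIES = (500, 1000, 2000, 4000)


def pta4(thresholds_db):
    """Average the hearing thresholds (dB) at 500, 1000, 2000 and 4000 Hz.

    `thresholds_db` maps frequency in Hz to the threshold for one ear.
    """
    values = [thresholds_db[f] for f in PTA4_FREQUENCIES]
    return sum(values) / len(values)


# Hypothetical audiograms for one ear (illustrative values, not study data).
ear_a = {125: 10, 250: 10, 500: 15, 1000: 20, 2000: 25, 4000: 40, 8000: 20}
ear_b = {125: 10, 250: 10, 500: 15, 1000: 20, 2000: 25, 4000: 40, 8000: 70}

# Both ears share the same PTA-4 although they differ by 50 dB at 8000 Hz.
print(pta4(ear_a), pta4(ear_b))
```

Because the 8000 Hz value never enters the average, the two hypothetical ears are indistinguishable by PTA-4, which is why a model over all tested frequencies is a useful complement.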

      Line 712-725:

      “As these logistic regression models were computed using an average hearing score computed over the frequencies 500, 1000, 2000, and 4000 Hz (i.e., PTA-4, see for example Lin et al. (2011); Ozdek et al. (2010)), we questioned whether hearing loss in higher frequencies influenced our effects. We therefore computed an additional logistic regression including also the PTA values of 6000 and 8000 Hz. In this analysis, hearing loss was not a significant predictor of tinnitus but rather showed a trend with b=0.211, SE=0.111, p=.062. Prediction scores, however, remained a significant predictor of tinnitus even after including high-frequency hearing loss (b=0.232, SE=0.111, p=.040). In this analysis, odds ratios indicated an increase of 26% in the odds of having tinnitus with a one standard deviation increase in the prediction score. Overall, this analysis strongly supports the notion that the main effect genuinely reflects a process related to the experience or statistical risk of experiencing tinnitus.”
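
The 26% figure follows from exponentiating the logistic regression coefficient. The sketch below shows that conversion for the reported coefficient of the prediction score; it assumes, as the quoted text states, that the predictor is scaled so that one unit equals one standard deviation. The helper name is an assumption made for this example.

```python
import math


def percent_change_in_odds(b):
    """Convert a logistic regression coefficient into a percent change in odds.

    exp(b) is the odds ratio for a one-unit increase in the predictor.
    """
    return (math.exp(b) - 1.0) * 100.0


# Coefficient of the prediction score as reported in the quoted passage.
b_prediction = 0.232
print(round(percent_change_in_odds(b_prediction)))  # 26
```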

      Stimuli and experimental procedure

      • Can you explain the use of movies during sound listening? And not an active listening task with oddball events, for example, to ensure that the subject attention is directed to the sounds?

      Thank you for your comment. We agree with you that attention is a relevant factor and with our design we cannot exclude potential attention effects on our findings. We chose this paradigm since previous research in our group including this exact experimental design (Demarchi et al., 2019) impressively demonstrated the formation of feature-specific auditory predictions in the brain and we aimed to investigate to what extent this can be detected in the tinnitus brain.

      We acknowledged this issue in our discussion (see line 916-919): “In the current work, we used passive listening tasks including a movie to reduce attentional focus on the presented stimuli. Therefore, we cannot draw conclusions whether differences in attention had an influence on the effects. Future studies should include more manipulations of attention to investigate its relevance”. 

      Results

      Pre-stimulus effects are not related to hearing loss and tinnitus-related features

      • How was the hearing loss calculated for this analysis? I recommend a PCA on the hearing levels, to get individual scores with a data-driven approach. Usually, the first dimension will be an average of all the frequencies. The second should be a difference between low and high frequencies. The same comment applies to study 2.

      Thank you for pointing that out. In the first study, participant groups were not controlled for hearing loss and pure-tone audiograms were solely averaged over all frequencies and both ears. As we pointed out throughout the manuscript, insufficient control for hearing loss was the key issue in study 1, which led to the implementation of study 2. Further, we do not have data about the hearing status of every participant in study 1, and we therefore do not believe that a more complex approach for calculating hearing loss would increase interpretability in study 1. Nevertheless, we agree with you that it is not apparent how hearing loss was calculated in study 1. The results of the pure-tone audiometry were averaged over all frequencies and both ears, but no cut-off values were defined to characterize hearing loss. We therefore highly appreciate your detailed revision of our manuscript and adjusted the phrasing in the corresponding section. With our approach, it is not justifiable to talk about hearing loss but rather hearing thresholds. As for study 2, the methodological approach was reviewed and accepted as a Registered Report and we therefore do not want to deviate drastically from our pre-registered approach.

      Line 162-165:

      “Standardized pure-tone audiometric testing for frequencies from 125Hz to 8kHz was performed in 31 out of 34 tinnitus participants using Interacoustic AS608 audiometer.

      Averages were computed over all frequencies and both ears.”

      Line 356-362:

      “In the whole sample of participants with tinnitus (n=34) we performed a Spearman correlation of the β-coefficient values corresponding to the time-point of the maximum and the minimum t-value in intergroup analysis (comprised of positive and negative significant clusters emerging in group comparison for sound trials) with hearing thresholds (averaged audiogram for both ears), tinnitus loudness (10-point scale) and tinnitus distress scores (TQ).”

      Line 463-464:

      See as well Line 471-481.

      Line 491-495:

      “Our main findings are: 1) basic processing of carrier frequencies are not altered in tinnitus; 2) with increasing regularity of the sequence, individuals with tinnitus show relatively enhanced predictions of frequency information; 3) the effect is not related to hearing thresholds and tinnitus distress or loudness in this sample.”

      • In the methods, the authors indicated that the volume was adjusted individually at a pleasant volume. Can authors test if the volume was related to the individual's accuracy? Did they test that all frequencies were audible for all participants?

      Thank you for your feedback. We agree with you that it would be interesting to see whether sound intensity levels were related to the accuracy. Unfortunately, data regarding the volume was not saved during the experimental procedure and we are not able to include this as a variable in our analyses. We acknowledge this issue and added a section to our discussion targeting sound intensity levels. As for the second question, the individual volume adjustment was also meant to guarantee that all frequencies were audible for the participant. We clarified this in the methods section. Overall, it is important to mention that we did not find any differences between groups in the decoding of random tones (see Fig. 2 and Fig. 6C), indicating that the volume did not substantially have an influence on one group compared to the other.

      Line 232-234:

      “Sound intensity was individually determined by presenting a short audio sequence to the participants and adjusting the loudness according to an individual pleasant volume with all four frequencies audible for the participant.”

      Line 902-913:

      “Thirdly, both studies used individual sound intensity levels to ensure a comfortable listening situation for the participants. These differences in sound intensity levels are, however, a potential confound in the experimental design as well since sound intensity can have an impact on neural responses (Thaerig et al., 2008). Although in this design, we expect the intensity levels balanced equally to the hearing loss of the participants (which did not differ between groups), and basic decoding of sound frequency did not differ in both studies, we are not able to ultimately exclude the sound intensity level as a driver of our effects. Future studies should include a perceived loudness matching for each frequency and should compare the adapted sound intensity values between each group or integrate them into the analysis (e.g., using the logistic regression approach in Fig. 8).”

      Pre-stimulus differences in ordered and random tone sequences are not related to tinnitus distress

      • Accuracy was not correlated with tinnitus distress. Could the authors test if the accuracy was related to other clinical data, such as tinnitus pitch, duration, and loudness? And to the subscales of the mini-TQ?

      We appreciate your constructive feedback and agree with you that other tinnitus features such as pitch, duration, or loudness are also interesting in this regard. Unfortunately, these features were not assessed in study 2 and we are therefore not able to provide this information. Additionally, we solely used a short version of the Mini-TQ in this study and did not assess all subscales but rather used all available items for calculating tinnitus distress. This is a limitation of our study design and we included it in the discussion.

      Line 937-949:

      “In both studies, tinnitus distress was not correlated with the reported prediction effects. Nevertheless, tinnitus can also be characterized by other features such as its loudness, pitch or duration which were not included in the experimental assessment. Additionally, we solely used a short version of the Mini-TQ (Goebel and Hiller, 1992) in Study 2, which did not allow us to relate prediction scores to subscales like sleep disturbances which potentially influence cognitive functioning and thus predictive processing. [...] More studies are needed to relate the prediction patterns in more detail to aspects of tinnitus sensation and distress.”

      The strength of group effects differs between the two studies

      • This section should be in the discussion, not the results

      Thank you for your valuable input. In this section, we show comparisons between the two studies and report Bayes factors over time for the differences in decoding accuracy (see Figure 7A). We introduce novel results and believe therefore that this section should remain in the results and is discussed later in the manuscript.  

      Discussion

      • Globally, the discussion is very long and a bit speculative. I recommend the authors shorten the discussion (especially the speculations), and delete the repetition.

      Thank you very much for your constructive feedback. We aimed to shorten our discussion and delete repetitions to increase clarity and readability.

      • The effect of hearing loss has been tested in this study, evaluated as the mean hearing threshold of 4 central frequencies. However, hearing abilities cannot be limited to a central audiogram. High frequencies, speech-in-noise abilities, or other hidden hearing loss can be impacted, even for individuals without hearing loss on 500Hz- 4000Hz. The conclusion on the prediction effect being independent of hearing loss should include this limitation.

      Thank you for pointing that out. We added this limitation to the discussion.

      Line 781-794:

      “In a complementary analysis, we used our prediction score in addition to hearing loss magnitudes as predictors of tinnitus in a logistic regression. Prediction related pre-activation levels were informative whether participants perceived tinnitus, also when statistically controlling for hearing loss. However, it has to be mentioned that we calculated hearing loss based on the PTA results of the frequencies between 500 and 4000 Hz. This does not reflect hearing impairments like high frequency hearing loss or hidden hearing loss (i.e., hearing difficulties despite a normal audiogram, Liberman (2015)). As for hidden hearing loss, we were not able to draw conclusions regarding our effects since this concept of hearing damage is difficult to measure objectively, especially in humans. However, we included an additional logistic regression expanding the frequency range up to 8000 Hz and again, hearing loss did not substantially impact the prediction score as an informative tinnitus predictor.”

      Line 712-723:

      “As these logistic regression models were computed using an average hearing score computed over the frequencies 500, 1000, 2000, and 4000 Hz (i.e., PTA-4, see for example Lin et al. (2011); Ozdek et al. (2010)), we questioned whether hearing loss in higher frequencies influenced our effects. We therefore computed an additional logistic regression including also the PTA values of 6000 and 8000 Hz. In this analysis, hearing loss was not a significant predictor of tinnitus but rather showed a trend with b=0.211, SE=0.111, p=.062. Prediction scores, however, remained a significant predictor of tinnitus even after including high-frequency hearing loss (b=0.232, SE=0.111, p=.040). In this analysis, odds ratios indicated an increase of 26% in the odds of having tinnitus with a one standard deviation increase in the prediction score.”

      • "An increased focus on hippocampal regions, e.g., in fMRI, patient, or animal studies, could be a worthwhile complement to our MEG work, given the outstanding relevance of medial temporal areas in the formation of associations in statistical learning paradigms (see e.g., Covington et al., (2018); Schapiro et al., (2016)).".

      In the opinion of this reviewer, this claim is not well introduced and should be removed.

      Thank you for pointing that out. In our opinion, an increased focus on hippocampal regions is an important consideration for future research and we decided to keep this part in the manuscript. However, we added a third reference highlighting the relevance of temporal areas in tinnitus to strengthen our claim. 

      Line 866-868:

      “... given the outstanding relevance of medial temporal areas in the formation of associations in statistical learning paradigms (see e.g., Covington et al., (2018); Paquette et al., (2017); Schapiro et al., (2016)).”

      References:

      Paquette, S., Fournier, P., Dupont, S., de Edelenyi, F. S., Galan, P., & Samson, S. (2017). Risk of tinnitus after medial temporal lobe surgery. JAMA neurology, 74(11), 1376-1377. https://doi.org/10.1001/jamaneurol.2017.2718.

      • "Overall, our work clearly underlines the true presence of differences, in terms of predictive processing, between individuals with and without tinnitus. At the same time, distinct design choices impact the strength of the effects which is not only apparent in the present work but was also reported recently by Yukhnovich and colleagues (2024). Further to controlling for basic variables (age, sex, hearing loss), future studies using our paradigm and analysis approach should opt for a broad frequency spacing (>2 octaves) and ideally more than 2000 trials per carrier frequency in the random sequence. These recommendations are likely even more important for efforts of testing this paradigm using EEG, which normally comes with inferior data quality as compared to MEG."

      This reviewer considers that the entire paragraph should be deleted, as the effects are already covered in the previous paragraph.

      Thank you very much for your feedback, however, we believe that this paragraph acts as a brief and accurate summary for our guidelines to improve future research in this field. This section therefore remained in the manuscript.

      Minor comments

      Introduction

      • "The onsets of tinnitus and hearing loss often do not occur at the same time ". This sentence should have a reference.

      We appreciate your careful evaluation of our manuscript and included a reference to the sentence pointing out hearing loss as a precursor of tinnitus.

      Line 95f.:

      “2) The onsets of tinnitus and hearing loss often do not occur at the same time (Roberts et al., 2010).” 

      Methods

      Participants

      • Participants' laterality needs to be mentioned.

      Thank you for your input. We agree with you that laterality is an interesting aspect that should be taken into account. Unfortunately, however, we did not assess this in the current design. We mentioned the lack of this information in the methods section.

      Line 158:

      “Laterality of the participants was not assessed.”

      Line 176-177:

      “No participants with psychiatric or neurological diseases were included in the sample. Laterality of the participants was not assessed.”

      "Four individuals with tinnitus did not show any audiometric abnormality; four of the participants showed unilateral hearing impairments; 26 volunteers had high-frequency hearing loss; and six individuals were hearing impaired over most frequencies (i.e. hearing thresholds higher than 30 dB)."

      This part is not precise enough. "Unilateral hearing impairment": is it on one or multiple frequencies? "26 volunteers had high-frequency hearing loss". What is considered high-frequency here? The precision "(i.e. hearing thresholds higher than 30 dB)" can be dropped as it was defined in the sentence just before.

      We appreciate your constructive feedback and added information to clarify the audiometric characteristics of our participants.

      Line 186-190:

      “Four individuals with tinnitus did not show any audiometric abnormality; four of the participants showed unilateral hearing impairments on at least one frequency; 26 volunteers had high-frequency hearing loss (i.e. hearing thresholds higher than 30 dB); and six individuals were hearing impaired over most frequencies (i.e. hearing thresholds higher than 30 dB).”

      Results

      • Figure 3C: are those group differences significant? It should be noted on the graphs.

      • Figure 6D: I would suggest to remove this figure, as the correlation is not significant.

      • Figure 7A: It would be useful to precise the number of trials for each study, in parenthesis.

      • Figure 8 is unnecessary.

      Thank you for your careful assessment of our figures. We agree with you that significance should be indicated in Figure 3C and that the precise number of trials is relevant information in Figure 7A. We corrected the figures accordingly. However, Figures 6D and 8 remained in the manuscript since they were already part of our Registered Report and we do not want to remove graphical information that was already reviewed and accepted.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kimura et al performed a saturation mutagenesis study of CDKN2A to assess the functionality of all possible missense variants and compare them to previously identified pathogenic variants. They also compared their assay result with those from in silico predictors.

      Strengths:

      CDKN2A is an important gene that modulates cell cycle and apoptosis, therefore it is critical to accurately assess the functionality of missense variants. Overall, the paper reads well and touches upon major discoveries in a logical manner.

      Weaknesses:

      The paper lacks proper details for experiments and basic data, leaving the results less convincing. Analyses are superficial and do not provide variant-level resolution.

      We thank the reviewer for their comments. We have updated the manuscript to include additional detail of experimental methods and variant-level resolution of data and analyses. We have also conducted additional analyses to compare variant classifications using a gamma generalized linear model and log2 normalized fold change, establish the effect of low variant coverage on variant functional classifications, determine the performance of combining multiple in silico predictions, and determine the prevalence of functionally deleterious variants in gnomAD and functionally deleterious variants of uncertain significance in ClinVar compared to all CDKN2A missense variants.

      Reviewer #2 (Public Review):

      This study describes a deep mutational scan across CDKN2A using suppression of cell proliferation in pancreatic adenocarcinoma cells as a readout for CDKN2A function. The results are also compared to in silico variant predictors currently utilized by diagnostic frameworks to gauge these predictors' performance. The authors also functionally classify CDKN2A somatic mutations in cancers across different tissues.

      This study is a potentially important contribution to the field of cancer variant interpretation for CDKN2A, but is almost impossible to review because of the severe lack of details regarding the methods and incompleteness of the data provided with the paper. We do believe that the cell proliferation suppression assay is robust and works, but when it comes to the screening of the library of CDKN2A variants the lack of primary data and experimental detail prevents assessment of the scientific merit and experimental rigor.

      We are grateful for the opportunity to clarify our experimental methods and to provide additional data in the revised manuscript. The manuscript has been updated to include, among other changes, additional information on assay design, analysis of variant representation in the library, inclusion of primary data with variant level resolution, and a comparison of variant classifications using a gamma generalized linear model and log2 normalized fold change.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major issues:

      (1) Can the pathogenicity values of individual amino acid changes be opened to the public? It would serve as a valuable asset to the community.

      Thank you for your suggestion. We are happy to provide this information. Individual variant data and functional classifications from the functional assay are given in Appendix 1-table 4.

      (2) In the method section, it is not clear (at least to the reviewer) whether the protocol describing the construction of the CDKN2A missense library was provided.

      Thank you for your comment. We have included additional information in the manuscript describing construction of the CDKN2A missense library.

      “CDKN2A expression plasmid libraries

      Codon-optimized CDKN2A cDNA, based on the p16INK4A amino acid sequence (NP_000068.1), was designed (Appendix 1-table 12), and pLJM1 containing codon-optimized CDKN2A (pLJM1-CDKN2A) was generated by Twist Bioscience (South San Francisco, CA). 156 plasmid libraries were then synthesized using pLJM1-CDKN2A, such that each library contained all 20 possible amino acid variants (19 missense and 1 synonymous) at a given position, generating 500 ng of each plasmid library (Twist Bioscience, South San Francisco, CA). The proportion of each variant in each library is shown in Appendix 1-table 2. Variants with a representation of less than 1% in a plasmid library were individually generated using the Q5 Site-Directed Mutagenesis kit (New England Biolabs, Ipswich, MA; catalog no. E0552), and added to each library to a calculated proportion of 5%. Primers used for site-directed mutagenesis are given in Appendix 1-table 13. Each library was then amplified to generate at least 5 µg of plasmid DNA using the QIAGEN Plasmid Midi Kit (QIAGEN, Germantown, MD; catalog no. 12143).”
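
The library arithmetic in this passage can be sketched as follows. With 20 variants per position, an evenly represented library would give each variant a share of 1/20 = 5%, which matches the spike-in target, and 156 positions with 19 missense variants each give the 2,964 missense variants analysed later in the response. The function `underrepresented` and the example shares are hypothetical and only illustrate the 1% cut-off; they are not the authors' pipeline.

```python
N_POSITIONS = 156      # one plasmid library per residue position
N_AMINO_ACIDS = 20     # 19 missense + 1 synonymous variant per library

# Total missense variants across all libraries.
missense_total = N_POSITIONS * (N_AMINO_ACIDS - 1)

# Share of each variant if a library were perfectly balanced.
expected_share = 1 / N_AMINO_ACIDS


def underrepresented(shares, threshold=0.01):
    """Return variants whose share of a library falls below `threshold`."""
    return sorted(v for v, share in shares.items() if share < threshold)


# Hypothetical shares for a few variants of one library (not study data).
example_shares = {"A": 0.061, "C": 0.004, "D": 0.048, "W": 0.009}

print(missense_total, expected_share, underrepresented(example_shares))
```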

      (3) The paper lacks basic experimental results. The results cover almost all possible missense variants, but it would be clearer if actual coverage values used for calculating relative enrichment were shown. Are all variants well covered? Isn't there any spurious signal due to low coverage? How many times were the experiments performed? Also, how many cells were used, what was the expected MOI, and what proportion of harvested cells is thought to have a single variant? How can you distinguish the effect of a single variant from a multiple variants effect?

      We thank the reviewer for their comment. We have provided additional information in the manuscript to address these issues. Briefly, in response to each issue:

      (1) We have provided read count data for all variants, used to determine functional classifications based on either gamma generalized linear model or normalized fold change, in Appendix 1-table 4.

      (2) To assess if low variant coverage resulted in spurious signals, we compared prevalence of functionally deleterious classifications among variants binned by coverage in the Day 9 cell pool. We did not identify any statistically significant differences based on variant coverage.

      “We also determined whether underrepresentation in the cell pool at Day 9 affected variant functional classifications. Fifty-three of 2,964 missense variants (1.8%) were present in the cell pool at Day 9 of the first assay replicate (experiment 1) at < 2%, as determined by the number of sequence reads supporting the variant (Figure 2 -figure supplement 4A, Appendix 1-table 4). There was no statistically significant difference in the proportion of variants classified as functionally deleterious for variants present in less than 2% of the cell pool at Day 9 (12 of 53 variants; 22.6%), and variants present in more than 2% of the cell pool (496 of 2,911 variants; 17.0%) (P value = 0.28) (Figure 2 -figure supplement 4B). We also found no significant differences in the proportion of variants classified as functionally deleterious for variants present in more than 2% of the cell pool at Day 9 when variants were binned in 1% intervals (Figure 2 -figure supplement 4B).”

      (3) The assay was repeated in duplicate for 28 CDKN2A residues. For the remaining 128 residues of CDKN2A, the assay was completed once. We found good agreement between variant classifications in assay repeats. We have added to the text as follows:

      “To confirm the reproducibility of our variant classifications, 28 amino acid residues were assayed in duplicate, and variants classified using the gamma GLM. The majority of missense variants, 452 of 560 (80.7%), had the same functional classification in each of the two replicates (Figure 2 -figure supplement 3A and B, Appendix 1-table 4).”

      We have also added discussion of this study limitation to the manuscript:

      “We repeated our functional assay twice for 28 CDKN2A residues. For the remaining 128 residues of CDKN2A, the functional assay was completed once. While we found general agreement between functional classifications from each replicate for the 28 residues assayed in duplicate, additional repeats for each residue are necessary to determine variability in variant functional classifications.”

      (4) We have added additional information about the number of cells used for transduction and MOI to the method section:

      “Lentiviral transduction

      PANC-1 cells were used for CDKN2A plasmid library and single variant CDKN2A expression plasmid transductions. PANC-1 cells previously transduced with pLJM1-CDKN2A (PANC-1CDKN2A) and selected with puromycin were used for CellTag library transductions. Briefly, 1 x 10^5 cells were cultured in media supplemented with 10 µg/ml polybrene and transduced with 4 x 10^7 transducing units per mL of lentivirus particles. Cells were then centrifuged at 1,200 x g for 1 hour. After 48 hours of culture at 37°C and 5% CO2, transduced cells were selected using 3 µg/ml puromycin (CDKN2A plasmid libraries and single variant CDKN2A expression plasmids) or 5 µg/ml blasticidin (CellTag plasmid library) for 7 days. Expected MOI was one. After selection, cells were trypsinized and 5 x 10^5 cells were seeded into T150 flasks. DNA was collected from the remaining cells, and this sample was designated Day 9. T150 flasks were cultured until confluent and then DNA was collected. The time for cells to become confluent varied for each amino acid residue (Day 16 – 40, Appendix 1-table 5).”

      (5) Our assay was not designed to distinguish multiple variant effects. However, we do not anticipate multiple transductions to significantly impact variant classifications in our assay. We found that our functional classifications were consistent with previously reported classifications:

      “In general, our results were consistent with previously reported classifications. Of variants identified in patients with cancer and previously reported to be functionally deleterious in published literature and/or reported in ClinVar as pathogenic or likely pathogenic (benchmark pathogenic variants), 27 of 32 (84.4%) were functionally deleterious in our assay (Figure 2B, Figure 2 -figure supplement 1B and 1C, Appendix 1-table 4) (Chaffee et al., 2018; Chang et al., 2016; Horn et al., 2021; Hu et al., 2018; Kimura et al., 2022; McWilliams et al., 2018; Roberts et al., 2016; Zhen et al., 2015). Five benchmark pathogenic variants were characterized as indeterminate function, with log2 P values from -19.3 to -33.2. Of 156 synonymous variants and six missense variants previously reported to be functionally neutral in published literature and/or reported in ClinVar as benign or likely benign (benchmark benign variants), all were characterized as functionally neutral in our assay (Figure 2B, Figure 2 -figure supplement 1B and 1C, Appendix 1-table 4) (Kimura et al., 2022; McWilliams et al., 2018; Roberts et al., 2016). Of 31 VUSs previously reported to be functionally deleterious, 28 (90.3%) were functionally deleterious and 3 (9.7%) were of indeterminate function in our assay. Similarly, of 18 VUSs previously reported to be functionally neutral, 16 (88.9%) were functionally neutral and 2 (11.1%) were of indeterminate function in our assay (Figure 2B, Figure 2 -figure supplement 1B and 1C, Appendix 1-table 4).”

      (4) Comparison of functional classifications (shown in Figure 3) from this study and other in silico tools is superficial. The analysis is based on the presumption that their result is gold-standard, thereby calculating the sensitivity, accuracy, and PPV of individual predictors. But apparently, this won't be true, so it would be more reasonable to check the "correlation" of the study results and other predictors: e.g. which variants show consistent results between this study and other predictors? Are there any indicators of consistent vs inconsistent results? How does the consistency change by protein sequences or domains? Etc

      Thank you for your comment. We have added additional analysis to our manuscript comparing our functional classifications with in silico variant effect predictions. Specifically, we have included analysis combining multiple predictors:

      “We also tested the effect of combining multiple in silico predictors. 904 missense variants had in silico predictions from all 7 algorithms. The remaining 2,060 missense variants had in silico predictions from 5 algorithms. Of variants with in silico predictions from all 7 algorithms, 378 (41.8%) had predictions of deleterious or pathogenic effect from a majority of algorithms (≥ 4), and of these, 137 (36.2%) were functionally deleterious in our assay. Similarly, of 2,060 missense variants that had in silico predictions from 5 algorithms, 1107 (53.7%) had predictions of deleterious or pathogenic effect from a majority of algorithms (≥ 3), of which, 361 (32.6%) were functionally deleterious in our assay (Appendix 1-table 7).”
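
      The majority rule described in the quoted passage (at least 4 of 7 algorithms, or at least 3 of 5) can be sketched as follows. This is only an illustration: the predictor names and the label strings are hypothetical placeholders, since the response does not list the seven algorithms or their output vocabularies.

```python
from typing import Mapping

# Hypothetical label vocabulary; real predictors each use their own output terms.
DELETERIOUS_CALLS = {"deleterious", "pathogenic"}


def majority_deleterious(calls: Mapping[str, str]) -> bool:
    """Return True if a strict majority of the available predictors
    call the variant deleterious or pathogenic."""
    n_deleterious = sum(1 for call in calls.values()
                        if call.lower() in DELETERIOUS_CALLS)
    # Strict majority: 4 of 7 predictors, or 3 of 5 predictors.
    threshold = len(calls) // 2 + 1
    return n_deleterious >= threshold


# Illustrative variant with predictions from 5 placeholder tools.
example = {"tool_a": "deleterious", "tool_b": "pathogenic",
           "tool_c": "deleterious", "tool_d": "benign", "tool_e": "benign"}
print(majority_deleterious(example))
```

      Deriving the threshold from the number of available predictions means that variants scored by 5 and by 7 algorithms are handled by the same function.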

      (5) Similarly, Figure 4 does not deliver much information, either. Rather than delivering a simple summary, it would be more informative if deeper analyses were conducted. e.g., do pathogenic variants show higher frequency among patients, or higher variant frequency in tumors (if data were available).

      We have included additional analysis of somatic alterations in the manuscript. We found pathogenic/likely pathogenic somatic mutations were enriched in patients. This was also the case for somatic mutations that were classified as functionally deleterious in our assay. We also found statistically significant depletion of functionally deleterious mutations in colorectal adenocarcinoma. Interestingly, no patients with a somatic mutation in a mismatch repair gene had a functionally deleterious CDKN2A missense somatic mutation. However, this observation was not statistically significant. Future studies will determine whether CDKN2A and MMR gene somatic mutations are mutually exclusive in colorectal adenocarcinoma.

      “We found that 34.2% - 53.4% of unique missense somatic mutations were classified as functionally deleterious, with 61.4% - 67.6% of patients having a functionally deleterious somatic mutation (Figure 4A, Appendix 1-table 9). As with functionally deleterious variants, functionally deleterious missense somatic mutations were also not distributed evenly across CDKN2A, being enriched within the ankyrin repeat 3 (Figure 4B, Appendix 1-table 9). We found that 32.4% - 50.0% of all functionally deleterious missense somatic mutations occurred within ankyrin repeat 3, with 48.0% - 58.0% of patients in each cohort having a functionally deleterious missense somatic mutation in this domain. Notably, 65.7% - 76.0% of functionally deleterious missense somatic mutations in this domain were in residues 80-89 (Appendix 1-table 9).”

      “We were also able to determine the functional classification of CDKN2A missense somatic mutations in COSMIC, TCGA, JHU, and MSK-IMPACT by cancer type. We found that 22.2% - 100% of CDKN2A missense somatic mutations were functionally deleterious depending on cancer type (Figure 4-figure supplement 2A-D). When considering missense somatic mutations reported in any database, there was a statistically significant depletion of functionally deleterious mutations in colorectal adenocarcinoma (20.4%; adjusted P value = 5.4 x 10^-9) (Figure 4C). As the proportion of missense somatic mutations that were functionally deleterious was lower in colorectal carcinoma than in other types of cancer, we assessed whether somatic mutations in mismatch repair genes (MLH1, MLH3, MSH2, MSH6, PMS1, and PMS2) were associated with the functional status of CDKN2A missense somatic mutations. Thirty-five patients in COSMIC had a CDKN2A missense somatic mutation, of which 12 (34.3%) had a somatic mutation in a mismatch repair gene. We found that no patients with a somatic mutation in a mismatch repair gene had a functionally deleterious CDKN2A missense somatic mutation compared to 6 of 23 samples (26.1%) without a somatic mutation in a mismatch repair gene (P value = 0.062).”

      (6) It would be helpful to validate the neutral variants set. Are variants of UK biobank or gnomAD enriched on neutral population? Are synonymous variants exclusively found in neutral populations?

      Thank you for the suggestion. All synonymous variants were found to be functionally neutral in our assay. We also assessed VUSs from gnomAD and found a lower prevalence of functionally deleterious variants compared to all CDKN2A variants and CDKN2A missense somatic mutations:

      “The Genome Aggregation Database (gnomAD) v4.1.0 reports 287 missense variants in CDKN2A, including 13 pathogenic, 4 likely pathogenic, 3 likely benign, 3 benign, and 264 VUSs classified using ACMG variant interpretation guidelines (Figure 5A, Figure 5B, and Appendix 1-table 10). Of the 264 missense VUSs, 177 (67.0%) were functionally neutral, 56 (21.2%) were of indeterminate function, and 31 (11.7%) were functionally deleterious in our assay using the gamma GLM for classification (Figure 5C).”

      (7) They used a pancreatic cancer cell line and assayed for cell proliferation. The limitations of this method and the possibility of complementing the limitations should be discussed.

      Thank you for the suggestion. We have added discussion of this limitation to our manuscript:

      “We characterized variants based upon a broad cellular phenotype, cell proliferation, in a single PDAC cell line. It is possible that CDKN2A variant functional classifications are cell-specific and assay-specific. Our assay may not encompass all cellular functions of CDKN2A and an alternative assay of a specific CDKN2A function, such as CDK4 binding, may result in different variant functional classifications. Furthermore, CDKN2A variants may have different effects if alternative cell lines are used for the functional assay. However, cell-specific effects appear to be limited. In our previous study, we characterized 29 CDKN2A VUSs in three PDAC cell lines, using cell proliferation and cell cycle assays, and found agreement between all functional classifications (Kimura et al., 2022).”

      Minor issues:

      (1) Figures 2B, C: it would be more intuitive to plot significance by logging p-values than raw p-values.

      We used log2 P value (or log2 normalized fold change) for figures in the manuscript as appropriate.

      (2) Figure 2D: annotate protein domain information at the side. Supplementary Figure 2 shows the domains but it would be more informative to show it in Figure 2D heatmap.

      Thank you for the suggestion. We have annotated protein domain information on the left side of the heatmap in what is now Figure 2C.

      Reviewer #2 (Recommendations For The Authors):

      Major Concerns:

      (1) How many replicates of the screen were performed? It seems like only one library infection/ proliferation assay was done. If so this is insufficient to obtain any idea of the uncertainty of measurement for each variant.

      The assay was repeated in duplicate for 28 CDKN2A residues. For the remaining 128 residues of CDKN2A, the assay was completed once. We found good agreement between variant classifications in assay repeats. We have added to the text as follows:

      “To confirm the reproducibility of our variant classifications, 28 amino acid residues were assayed in duplicate, and variants classified using the gamma GLM. The majority of missense variants, 452 of 560 (80.7%), had the same functional classification in each of the two replicates (Figure 2 -figure supplement 3A and B, Appendix 1-table 4).”

      We have also added discussion of this study limitation to the manuscript:

      “We repeated our functional assay twice for 28 CDKN2A residues. For the remaining 128 residues of CDKN2A, the functional assay was completed once. While we found general agreement between functional classifications from each replicate for the 28 residues assayed in duplicate, additional repeats for each residue are necessary to determine variability in variant functional classifications.”

      (2) The count data from the experiment and NGS pipeline to call variants need to be provided for each replication (i.e. the counts that were fed into the gamma model)

      Accompanying this should be information about the depth of sequencing of the cells, the number of cells infected with the library, and standard metrics for pooled screens.

      Quality metrics regarding the representation and completeness of the TWIST library need to be provided. See Brenan et al. Cell Reports (2016) Supplemental Figure 1

      Thank you for your suggestion. We are happy to provide this additional information. Sequence read counts for each variant are given in Appendix 1-table 4. We have provided additional detail in the methods section on the functional assay, including the number of cells infected with each library:

      “Lentiviral transduction

      PANC-1 cells were used for CDKN2A plasmid library and single variant CDKN2A expression plasmid transductions. PANC-1 cells previously transduced with pLJM1-CDKN2A (PANC-1CDKN2A) and selected with puromycin were used for CellTag library transductions. Briefly, 1 x 10^5 cells were cultured in media supplemented with 10 µg/ml polybrene and transduced with 4 x 10^7 transducing units per mL of lentivirus particles. Cells were then centrifuged at 1,200 x g for 1 hour. After 48 hours of culture at 37°C and 5% CO2, transduced cells were selected using 3 µg/ml puromycin (CDKN2A plasmid libraries and single variant CDKN2A expression plasmids) or 5 µg/ml blasticidin (CellTag plasmid library) for 7 days. Expected MOI was one. After selection, cells were trypsinized and 5 x 10^5 cells were seeded into T150 flasks. DNA was collected from the remaining cells, and this sample was designated Day 9. T150 flasks were cultured until confluent and then DNA was collected. The time for cells to become confluent varied for each amino acid residue (Day 16 – 40, Appendix 1-table 5). DNA was extracted from PANC-1 cells using the PureLink Genomic DNA Mini Kit (Invitrogen, Carlsbad, CA; catalog no. K1820-01). The assay for the CellTag library was repeated in triplicate. We repeated our CDKN2A assay in duplicate for 28 residues. For the remaining 128 CDKN2A residues the assay was completed once.”

      We have also provided additional information on the TWIST library:

      “CDKN2A expression plasmid libraries

      Codon-optimized CDKN2A cDNA, based on the p16INK4A amino acid sequence (NP_000068.1), was designed (Appendix 1-table 12), and pLJM1 containing codon-optimized CDKN2A (pLJM1-CDKN2A) was generated by Twist Bioscience (South San Francisco, CA). 156 plasmid libraries were then synthesized using pLJM1-CDKN2A, such that each library contained all 20 possible amino acid variants (19 missense and 1 synonymous) at a given position, generating 500 ng of each plasmid library (Twist Bioscience, South San Francisco, CA). The proportion of each variant in each library is shown in Appendix 1-table 2. Variants with a representation of less than 1% in a plasmid library were individually generated using the Q5 Site-Directed Mutagenesis kit (New England Biolabs, Ipswich, MA; catalog no. E0552) and added to each library to a calculated proportion of 5%. Primers used for site-directed mutagenesis are given in Appendix 1-table 13. Each library was then amplified to generate at least 5 µg of plasmid DNA using the QIAGEN Plasmid Midi Kit (QIAGEN, Germantown, MD; catalog no. 12143).”

      (3) It is unclear when barcode abundance is assessed in the cell proliferation assay/in the screen. The exact timepoints of "before and after in vitro culture" (line 91) need to be clarified in the text.

      We are happy to clarify. We collected DNA on Day 9 post-transduction and at confluency. The day of confluency for each residue is detailed in Appendix 1-table 5. The text of the manuscript has been updated accordingly.

      (4) Is "before" day 9, as detailed in Figure 1 source data 1? If so, it is misleading to state that the experiment is in culture for 14 days but call day 9 "before... in vitro culture."

      The "before" sample should be obtained immediately after viral infection and selection with the library to provide a representation of library representation.

      We apologize for the confusion. We have clarified in the text and figures that our baseline measurement was at Day 9 post-transduction. We also determined whether the proportion of each variant was maintained in the Day 9 cell pool compared to the amplified plasmid library for three CDKN2A amino acid residues (p.R24, p.H66, and p.A127) and updated the manuscript text:

      “To confirm that the representation of each variant was maintained after transduction, we transduced three lentiviral libraries (amino acid residues p.R24, p.H66, and p.A127) individually into PANC-1 cells and determined the proportion of each variant in the amplified plasmid library and in the cell pool at Day 9 post-transduction. The proportion of each variant in the amplified plasmid library and in the cell pool at Day 9 were highly correlated (Figure 1 -figure supplement 2C and D, Appendix 1-table 3).”

      (5) There is no information regarding the function of each variant, aside from just a p-value resulting from the final analysis with the gamma model. Some variants may cause loss of function, others may be neutral while others may be gain of function. Simply providing a p-value is not sufficient. The standard in the field is to provide a function score/ test-statistic giving the sign and magnitude of the effect. For proliferation assays at least a ratio of fold-change of (mut/ synonymous)[day 14] vs (mut/synonymous)[baseline] should be provided.

      Thank you for your comment. We have provided read counts, P values, and functional classifications for each variant using the gamma GLM in Appendix 1-table 4. We have also analyzed variants using log2 normalized fold change. This data is presented in the text and compared to our classifications with the gamma GLM. We have provided normalized fold change and resulting classification for each variant in Appendix 1-table 6.
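
      For readers unfamiliar with the fold-change score the reviewer requested, the ratio (mut/synonymous) at the endpoint versus (mut/synonymous) at baseline can be sketched as below. This follows the reviewer's formula, not necessarily the authors' exact normalization, which is not spelled out in this response. The function name, the read counts, and the pseudocount are assumptions made for illustration.

```python
import math


def log2_normalized_fold_change(variant_baseline: float,
                                variant_endpoint: float,
                                synonymous_baseline: float,
                                synonymous_endpoint: float,
                                pseudocount: float = 0.5) -> float:
    """log2 of (variant/synonymous) at the endpoint divided by
    (variant/synonymous) at baseline (for example, Day 9).

    The pseudocount is an assumption added to avoid division by zero
    when a variant has no supporting reads at one time point.
    """
    ratio_endpoint = ((variant_endpoint + pseudocount)
                      / (synonymous_endpoint + pseudocount))
    ratio_baseline = ((variant_baseline + pseudocount)
                      / (synonymous_baseline + pseudocount))
    return math.log2(ratio_endpoint / ratio_baseline)


# Illustrative read counts only; these are not data from the study.
score = log2_normalized_fold_change(variant_baseline=100, variant_endpoint=400,
                                    synonymous_baseline=100, synonymous_endpoint=100,
                                    pseudocount=0.0)
print(score)
```

      A positive score means the variant became enriched relative to the synonymous control during culture, and a score near zero means it tracked the control. Unlike a P value alone, the score carries both the sign and the magnitude of the effect.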

      (6) A plot of the distribution of function scores for all variants is needed. This will serve as an effective visual to distinguish the control variants from those that are functionally deleterious or benign/neutral (see Findlay et al. Nature (2018) Figure 3A for an example visual).

      Thank you for your suggestion. We have provided additional figures to visualize distribution of assay outputs using the gamma GLM in Figure 2 -figure supplement 1.

      (7) Synonymous variants are used as a proxy for WT per variant library, but do all the synonymous variants truly behave like WT CDKN2A in their ability to suppress cell proliferation? A plot of the distribution of synonymous variant function relative to WT CDKN2A function would be effective here.

      All 156 synonymous variants suppressed cell proliferation and were classified as functionally neutral in our assay using the gamma GLM. The manuscript has been updated to reflect this:

      “Of 156 synonymous variants and six missense variants previously reported to be functionally neutral in published literature and/or reported in ClinVar as benign or likely benign (benchmark benign variants), all were characterized as functionally neutral in our assay (Figure 2B, Figure 2 -figure supplement 1B and 1C, Appendix 1-table 4)”

      (8) The gamma generalized linear model is not commonly used to analyze the results of saturation mutagenesis screens. Please provide a justification for the use of this analysis method vs using log fold change as other dms scan studies have done (PMID: 27760319, PMID: 30224644).

      Thank you for this important suggestion. We are happy to provide additional information. We used a gamma GLM to functionally characterize CDKN2A variants because it does not rely on an annotated set of pathogenic and benign variants to determine classification thresholds. Instead, classification thresholds are determined using the change in representation of 20 non-functional barcodes in a pool of PANC-1 cells stably expressing CDKN2A after a period of in vitro growth. As a gamma GLM is not commonly used for saturation mutagenesis screens, as noted by the reviewer, we also classified variants using log2 normalized fold change. We compared variant functional classifications from the two methods and found general agreement: 98.5% of missense variants classified as functionally deleterious using the gamma GLM were also classified as functionally deleterious using log2 normalized fold change. We have updated the text to reflect this reasoning and additional analysis.

      (9) The statistical methods used to calculate enrichment of deleterious variants per region of CDKN2A (Figure 2 supplement 1B; lines 163-168) are not described anywhere in the paper. Additionally, the same statistical analysis is not applied to the variants in the subregions near the ankyrin repeats (lines 168-172).

      We are happy to clarify and have added text to the methods section:

      “Z-tests with multiple test correction performed with the Bonferroni method were used in the following comparisons: 1) proportion of functionally deleterious variants present in < 2% of the cell pool and ≥ 2% of the cell pool at Day 9 binned in 1% intervals, 2) proportion of variants in each domain predicted to have deleterious or pathogenic effect by the majority of algorithms, 3) proportion of functionally deleterious variants in each domain, and 4) proportion of functionally deleterious missense variants and somatic mutations.”
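
      As a rough illustration of the kind of test described, the sketch below implements a pooled two-proportion z-test with a Bonferroni adjustment using only the Python standard library. The pooled, two-sided form is an assumption, since the response does not say which variant of the z-test was used. The counts come from the coverage comparison quoted earlier in this response (12 of 53 versus 496 of 2,911 variants).

```python
import math
from typing import List, Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided P value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def bonferroni(p_values: List[float]) -> List[float]:
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Variants below 2% of the Day 9 cell pool versus variants above 2%.
z, p = two_proportion_z_test(12, 53, 496, 2911)
print(round(z, 2), round(p, 2))
```

      Under these assumptions the unadjusted P value comes out at roughly 0.28, in line with the value quoted for this comparison. When several comparisons are made (for example, one per 1% coverage bin or per domain), the resulting P values would be passed together through `bonferroni`.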

      Minor:

      (1) Please review the manuscript for spelling and grammatical errors.

      Sure.

    1. Author response:

      Reviewer #1:

      Weaknesses:

      However, the authors should conduct a more thorough computational analysis to complement their manuscript. While the identification of improved multi-point mutants is commendable, the manuscript lacks a detailed investigation into the mechanisms by which these mutations enhance protein properties. The authors briefly mention that some physicochemical characteristics of the mutants are unusual, but they do not delve into why these mutations result in improved performance. Could computational techniques, such as molecular dynamics simulations, be employed to explore the effects of these mutations?  Additionally, the authors claim that their method is efficient. However, the selected VHH is relatively short (<150 AA), resulting in lower computational costs. It remains unclear whether the computational cost of this approach would still be acceptable when designing larger proteins (>1000 AA). Besides, the design process involves a large number of prediction tasks, including the properties of both single-site saturation and multi-point mutants. The computational load is closely tied to the protein length and the number of mutation sites. Could the authors analyze the model's capability boundaries in this regard and discuss how scalable their approach is when dealing with larger proteins or more complex mutation tasks?

      We agree that further analysis of the mechanisms by which the identified mutations enhance protein performance would strengthen our study. In the revised manuscript, we plan to conduct molecular dynamics simulations to explore the physicochemical effects of these mutations in more detail. This analysis will help elucidate how the observed structural and dynamic changes contribute to the improved resistance and stability of the designed VHH antibody.

      We acknowledge the need to assess the scalability of our method to larger proteins. To address this, we will include an analysis of the method’s performance when applied to longer proteins, including an estimation of computational cost and potential bottlenecks.

      Reviewer #2:

      (1) The writing throughout the paper is poor. This leaves the reader confused.

      (2) The main technical issue the authors address is whether AI can identify protein mutations that adapt to extreme environments based solely on natural protein data. However, the introduction could be more concise and focused on the key points to better clarify the significance of this question.

      (3) The authors did not develop a new model but instead used their previously developed Pro-PRIME model. This significantly weakens the novelty and contribution of this work.

      (4) The computational experiments are not well-justified. For instance, the authors used a zero-shot setting for single-point mutation experiments but opted for fine-tuning in multiple-point mutation experiments. There is no clear explanation for this discrepancy. How does the model perform in zero-shot settings for multiple-point mutations? How would fine-tuning affect single-point mutation results? The choice of these strategies seems arbitrary and lacks sufficient discussion.

      (1&2) We will revise the manuscript to improve the overall clarity and readability. Specifically, we will restructure the introduction to focus more concisely on the key scientific questions and contributions of our study.

      (3) While the Pro-PRIME model was previously developed, this work focuses on designing proteins with properties that do not naturally exist and are scarce in the natural world. To address the concern about novelty, we will expand the discussion to highlight this unique contribution and its implications for advancing protein design.

      (4) We appreciate the comment regarding the discrepancy between the zero-shot and fine-tuning strategies. In the revised manuscript, we will provide a detailed explanation for the choice of these settings, including an analysis of the trade-offs between zero-shot and fine-tuning approaches in multi-point mutation tasks. We will also explore the model’s performance in zero-shot settings for multi-point mutations and report these results in the supplementary materials to ensure completeness.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Qin and colleagues analysed data from the Human Connectome Project on four right-handed subgroups with different gyrification patterns in Heschl's gyrus. Based on these groups, the authors highlight the structure-function relationship of planum temporale asymmetry in lateralised language processing at the group level and next at the individual level. In particular, the authors propose that especially microstructural asymmetries are related to functional auditory language asymmetries in the planum temporale.

      Strengths:

      The study is interesting because of an ongoing and long-standing debate about the relationship between structural and functional brain asymmetries, and in particular whether structural brain asymmetries can be seen as markers of functional language brain lateralisation.

      In this debate, the relationship between Heschl's gyrus asymmetry and planum temporale asymmetry is rare and therefore valuable here. A large sample size and inter-rater reliability support the findings.

      Weaknesses:

      In this case of multiple brain measures, it would be important to provide the reader with some sort of effect size (e.g. Cohen's d) to help interpret the results.

      Thank you for pointing this out. In the revised version, the effect size, i.e., Cohen's d, has been incorporated into the results (page 8, line 159-160; page 9, line 181-186, supplementary page 14, Table S14).
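
      For reference, one common form of Cohen's d (the mean difference divided by the pooled standard deviation of two independent groups) can be computed as in the sketch below. Which variant of d the authors report is not stated here, so the pooled, independent-groups form is an assumption, and the numbers are illustrative only.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d for two independent groups using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_variance = (((n_a - 1) * variance(group_a)
                        + (n_b - 1) * variance(group_b))
                       / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)


# Illustrative values only; these are not measurements from the study.
print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))
```

      By the usual rule of thumb, values of about 0.2, 0.5, and 0.8 are read as small, medium, and large effects, which helps compare results across brain measures recorded on different scales.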

      In addition, the authors highlight the microstructural results in spite of the macrostructural results. However, the macrostructural surface results are also strong. I would suggest either reducing the emphasis on micro vs macrostructural results or adding information to justify the microstructural importance.

      In the original manuscript, we highlighted the results of microstructural measures because the correlations between PT microstructural and functional measures were more pronounced, both within the hemispheres and in terms of asymmetry, than the significant results for surface area. Following your comments, we have now toned down the emphasis on the microstructural results (page 2, line 40; page 14, line 267) and added relevant discussion of the macrostructural results in the revised version (page 18, line 363-370; as copied below):

      “As for macrostructural measures, the asymmetric PT surface area was also associated with speech comprehension AI. Given that the within-hemispheric coupling tendency between surface area and speech comprehension existed only in the left PT, it is possible that the larger surface area of the left PT led to less recruitment of its right homologue, and therefore the lateralization of functional activity would be more pronounced. Additionally, opposite tendencies were found for the correlations of speech perception and speech comprehension with surface area, potentially implying segregation of different speech processes in the PT area.”

      Recommendations for the authors:

      I have only some comments that I wish to be addressed by the authors:

      (1) Please always specify "structural" or "functional" asymmetry or lateralisation, as the reader may be confused.

      This has been done in relevant places.

      (2) Please state that the scale is not the same between the results in Figure 3.

      This has been specified, as suggested (see below).

      “Notably, we did not standardize these structural measures, so the scales differed between indicators.”

      (3) It may be of interest to the reader to learn more about interpretations of how Heschl's gyrus and planum temporale asymmetries are related.

      Thank you for this comment. Given that the asymmetry of Heschl's gyrus was not analyzed in the present study, we do not have direct data or results to support such an interpretation. We also reviewed the literature but found no relevant results on how Heschl's gyrus and planum temporale asymmetries are related. A specific investigation targeting this topic is needed to address the question. This has now been added to the discussion (page 20, line 415-417).

      (4) As this manuscript builds somewhat on the Science Advances article by Ocklenburg et al. (2018), it would be important to discuss how this more liberal planum temporale definition might (or might not) affect the results compared to the more conservative planum temporale definition described here.

      Yes, the definition of the planum temporale varies across studies. Our current manual definition is more conservative than that of Ocklenburg et al. (2018), in which the planum temporale was automatically derived from the Destrieux atlas. We believe that the definition of the planum temporale likely has a non-trivial impact on the results, and that our manual definition, which takes HG duplication into account, should be more reliable and accurate and is therefore preferable to the alternatives. This has been briefly discussed in the revision (page 15-16, line 300-304).

      (5) I would like the authors to briefly but critically discuss what exactly the MRI NODDI model measures and how this is interpreted as measuring microstructural properties of tissue.

      We have now provided relevant information regarding the NODDI measures (page 26, line 552-558; as copied below).

      “NODDI is a highly effective method for detecting key features of neurite morphology, which employs a tissue model that detects three microstructural environments: the intracellular, extracellular and cerebrospinal fluid compartments (Zhang et al., 2012). In the grey matter of the cerebral cortex, the neurite density index (NDI) is an estimated volume fraction of the intracellular microstructural environment, with higher NDIs indicating greater neurite density (Jespersen et al., 2010; Zhang et al., 2012). The orientation dispersion index (ODI) is a measure of the alignment or dispersion of neurite, with higher ODIs indicating more dispersed neurite and lower ODIs indicating more aligned neurite (Jespersen et al., 2012; Zhang et al., 2012).”

      (6) While not mandatory, I would be interested to read the authors' thoughts on the evolution of such a functional/(micro)structural lateralisation link of the planum temporale, in light of the literature on planum temporale asymmetries in (newborn) non-human primate species.

      Thank you for this inspiring suggestion. We have incorporated relevant discussion into the revised version (page 15, line 281-288; as copied below).

      “Moreover, there exist evolutionary evidence supporting the role of the PT as an anatomical substrate for language lateralization. For example, the leftward structural asymmetry of the PT have been observed in multiple non-human primates, including chimpanzees, macaques, and baboons (Becker et al., 2024; Gannon et al., 1998; Xia et al., 2019). Particularly, recent studies on baboons further demonstrated that PT structural leftward asymmetry in newborn baboons could predict future development of communicative gestures, implying a key role of PT structural asymmetry in the lateralized communication system for human and non-human brain evolution (Becker et al., 2024, 2021).”

      Reference

      Becker Y, Phelipon R, Marie D, Bouziane S, Marchetti R, Sein J, Velly L, Renaud L, Cermolacce A, Anton J-L, Nazarian B, Coulon O, Meguerditchian A. 2024. Planum temporale asymmetry in newborn monkeys predicts the future development of gestural communication’s handedness. Nat Commun 15:4791. doi:10.1038/s41467-024-47277-6

      Becker Y, Sein J, Velly L, Giacomino L, Renaud L, Lacoste R, Anton J-L, Nazarian B, Berne C, Meguerditchian A. 2021. Early Left-Planum Temporale Asymmetry in newborn monkeys (Papio anubis): A longitudinal structural MRI study at two stages of development. NeuroImage 227:117575. doi:10.1016/j.neuroimage.2020.117575

      Gannon PJ, Holloway RL, Broadfield DC, Braun AR. 1998. Asymmetry of Chimpanzee Planum Temporale: Humanlike Pattern of Wernicke’s Brain Language Area Homolog. Science 279:220–222. doi:10.1126/science.279.5348.220

      Jespersen SN, Bjarkam CR, Nyengaard JR, Chakravarty MM, Hansen B, Vosegaard T, Østergaard L, Yablonskiy D, Nielsen NChr, Vestergaard-Poulsen P. 2010. Neurite density from magnetic resonance diffusion measurements at ultrahigh field: Comparison with light microscopy and electron microscopy. NeuroImage 49:205–216. doi:10.1016/j.neuroimage.2009.08.053

      Jespersen SN, Leigland LA, Cornea A, Kroenke CD. 2012. Determination of Axonal and Dendritic Orientation Distributions Within the Developing Cerebral Cortex by Diffusion Tensor Imaging. IEEE Trans Med Imaging 31:16–32. doi:10.1109/TMI.2011.2162099

      Xia J, Wang F, Wu Z, Wang L, Zhang C, Shen D, Li G. 2019. Mapping hemispheric asymmetries of the macaque cerebral cortex during early brain development. Hum Brain Mapp. doi:10.1002/hbm.24789

      Zhang H, Schneider T, Wheeler-Kingshott CA, Alexander DC. 2012. NODDI: Practical in vivo neurite orientation dispersion and density imaging of the human brain. NeuroImage 61:1000–1016. doi:10.1016/j.neuroimage.2012.03.072

      Reviewer #2 (Public Review):

      Summary:

      The authors assessed the link between structural and functional lateralization in area PT, one of the brain areas that is known to exhibit strong structural lateralization, and which is known to be implicated in speech processing. Importantly, they included the sulcal configuration of Heschl's gyrus (HG), presenting either as a single or duplicated HG, in their analysis. They found several significant associations between microstructural indices and task-based functional lateralization, some of which depended on the sulcal configuration.

      Strengths:

      A clear strength is the large sample size (n=907), an openly available database, and the fact that HG morphology was manually classified in each individual. This allows for robust statistical testing of the effects across morphological categories, which is not often seen in the literature.

      Weaknesses:

      - Unfortunately, no left-handers were included in the study. It would have been a valuable addition to the literature, to study the effect of handedness on the observed associations, as many previous studies on this topic were not adequately powered. The fact that only right-handers were studied should be pointed out clearly in the introduction or even the abstract.

      Thank you for pointing this out. We have explicitly specified this in the Abstract and Introduction.

      - The tasks to quantify functional lateralization were not specifically designed to pick up lateralization. In the interest of the sample size, it is understandable that the authors used the available HCP-task-battery results, however, it would have been feasible to access another dataset for validation. A targeted subset of results, concerning for example the relationship between sulcal morphology and task-based functional lateralization, could be re-assessed using other open-access fMRI datasets.

      Yes, the fMRI task was not specifically designed to evaluate PT functional lateralization, which has been acknowledged in the discussion (page 17, line 330-342). Given the small effect size observed for our structural-functional relationship, reproducing similar results in other datasets would require a cohort with a large sample size. This would be quite labor-intensive given our current manual protocol for outlining the PT and HG in every participant. The lack of validation with an independent dataset has been discussed as a limitation in the revised version. We will try to conduct such a validation in future work, likely after developing an automatic pipeline that accurately extracts the PT and HG in individual space (as the manual outlining protocol does).

      - The study is mainly descriptive and the general discussion of the findings in the larger context of brain lateralization comes a bit short. For example, are the observed effects in line with what we know from other 'language-relevant' areas? What could be the putative mechanisms that give rise to functional lateralization based on the microstructural markers observed? And which mechanisms might be underlying the formation of a duplicated HG?

      Thank you for these insightful comments. As suggested, we strengthened the discussion as below:

      “Another possible explanation could be that higher myelin content and larger surface area in left PT potentially indicated more white matter connection with other language-related regions such as Broca’s area, and therefore is more involved in language tasks than its right homolog (Allendorfer et al., 2016; Catani et al., 2005; Giampiccolo and Duffau, 2022).

      The distinct roles of left and right PT in speech processing have been well-documented. A number of studies substantiated that PT of the left hemisphere responded more strongly to lexical-semantic and syntactic aspects of sentence processing, whereas the right hemisphere demonstrated a greater involvement in the speech melody (Albouy et al., 2020; Meyer et al., 2002).

      These findings are consistent with those reported for the arcuate fasciculus (AF). The left AF has been identified as a crucial structure for language function (Giampiccolo and Duffau, 2022; Zhang et al., 2021). Disruption to this pathway has been linked to multimodal phonological and semantic deficits (Agosta et al., 2010), while injuries in the right AF did not affect language function (Zeineh et al., 2015).”

      Regarding the mechanism underlying the formation of a duplicated HG, we could not identify a well-supported explanation after a careful literature review. We also feel that this question is somewhat beyond the scope of the present study and therefore did not add further discussion on this topic.

      Recommendations for the authors:

      (1) The data availability statement makes no explicit mention of the manual labels of HG configuration. Would the authors consider making available a list of HCP-subject-ID with a morphological group (L1/R1, L1/R2, etc.) for replicability and for re-use by other researchers?

      The list of HCP subject IDs with their morphological group (L1/R1, L1/R2, etc.) is now available in Supplementary Material 2. We have specified this in the revised version.

      (2) It would be helpful to state again the statistical tests associated with the p-value in the figure/table caption, e.g. Table 2.

      As suggested, we now specified the statistical method in the figure/table caption.

      (3) Sometimes, the y-axis labels are missing or not clear, for example in Figure S2.

      Sorry about these. We double-checked all the figures, and corrected the missing or unclear labels for Figure S2 and S3 in the revised version.

      (4) In a few instances the font sizes vary within a figure caption.

      This has been corrected in the revision.

      Reference

      Agosta F, Henry RG, Migliaccio R, Neuhaus J, Miller BL, Dronkers NF, Brambati SM, Filippi M, Ogar JM, Wilson SM, Gorno-Tempini ML. 2010. Language networks in semantic dementia. Brain J Neurol 133:286–299. doi:10.1093/brain/awp233

      Albouy P, Benjamin L, Morillon B, Zatorre RJ. 2020. Distinct sensitivity to spectrotemporal modulation supports brain asymmetry for speech and melody. Science 367:1043–1047. doi:10.1126/science.aaz3468

      Allendorfer JB, Hernando KA, Hossain S, Nenert R, Holland SK, Szaflarski JP. 2016. Arcuate fasciculus asymmetry has a hand in language function but not handedness. Hum Brain Mapp 37:3297–3309. doi:10.1002/hbm.23241

      Catani M, Jones DK, Ffytche DH. 2005. Perisylvian language networks of the human brain. Ann Neurol 57:8–16. doi:10.1002/ana.20319

      Giampiccolo D, Duffau H. 2022. Controversy over the temporal cortical terminations of the left arcuate fasciculus: a reappraisal. Brain J Neurol 145:1242–1256. doi:10.1093/brain/awac057

      Meyer M, Alter K, Friederici AD, Lohmann G, von Cramon DY. 2002. FMRI reveals brain regions mediating slow prosodic modulations in spoken sentences. Hum Brain Mapp 17:73–88. doi:10.1002/hbm.10042

      Zeineh MM, Kang J, Atlas SW, Raman MM, Reiss AL, Norris JL, Valencia I, Montoya JG. 2015. Right arcuate fasciculus abnormality in chronic fatigue syndrome. Radiology 274:517–526. doi:10.1148/radiol.14141079

      Zhang H, Schneider T, Wheeler-Kingshott CA, Alexander DC. 2012. NODDI: Practical in vivo neurite orientation dispersion and density imaging of the human brain. NeuroImage 61:1000–1016. doi:10.1016/j.neuroimage.2012.03.072

      Zhang J, Zhong S, Zhou L, Yu Yamei, Tan X, Wu M, Sun P, Zhang W, Li J, Cheng R, Wu Y, Yu Yanmei, Ye X, Luo B. 2021. Correlations between Dual-Pathway White Matter Alterations and Language Impairment in Patients with Aphasia: A Systematic Review and Meta-analysis. Neuropsychol Rev 31:402–418. doi:10.1007/s11065-021-09482-8

      Reviewing Editor:

      I encourage the authors to incorporate the suggestions of the reviewers, such as:

      (1) to provide more in-depth interpretations about how and why structural and functional lateralization relate,

      Done.

      (2) to provide statistical effect sizes,

      Done.

      (3) to make their sulcal-morphology classification openly available,

      Done.

      (4) to provide statistical effect sizes,

      Done.

      (5) to discuss the possible impact of diverging PT definitions with regard to previous studies,

      Done.

      (6) to provide more in-depth interpretations about how and why structural and functional lateralization relate.

      Done.

      Detailed comments:

      In an impressive cohort of 907 human participants, the present paper presents a very interesting set of data on PT asymmetries not only at the macro-structural but also at the microstructural levels in order to investigate their potential correlates with PT functional asymmetry in relation to perceptual acoustic language tasks.

      I believe this is a key paper for the following reasons:

      (1) it provides critical data and results for addressing a controversial but important question: the relevance of measures of anatomical asymmetry for inferring its language-related functional hemispheric specialization;

      (2) to do so, the authors made a very impressive effort to manually trace the anatomical delineation of the planum temporale at different levels in every participant, the best (but crazy time-consuming) approach so far to document interindividual variability of the PT and to address such a question;

      (3) the contribution is particularly relevant regarding the statistical power of the study, the study and measures having been done in 907 participants!

      (4) I also found the study well designed and well written with great relevance of the findings for the field.

      As the results, the authors reported asymmetric measures of microstructural asymmetry (including intracortical myelin content, neurite density, and neurite orientation) but also of macrostructural asymmetries in relation to functional lateralization for language.

      Comments:

      I have only 2 additional minor comments of my own:

      (1) In agreement with reviewer 2, I don't understand why the authors seem to downplay the links they found between gross PT asymmetry and functional lateralization. I recommend the authors to highlight and discuss this important result, just as the microstructural PT asymmetries and their functional links.

      This has been done (page 18, line 363-370).

      (2) PT structural asymmetry (both micro & macro) has been well documented in nonhuman primates (and their functional link with manual lateralization for gestural communication). Without detailing this literature, I recommend the authors at least mention this literature as a comparative perspective in the introduction and/or discussion in order to make the question of PT asymmetry less anthropocentric.

      This has been done (page 15, line 281-288).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This work from Cui, Pan, Fan, et al explores memory impairment in chronic pain mouse models, a topic of great interest in the neurobiology field. In particular, the work starts from a very interesting observation, that WT mice can be divided into susceptible and unsusceptible to memory impairment upon modelling chronic pain with CCI. This observation represents the basis of the work where the authors identify the sphingosine receptor S1PR1 as down-regulated in the dentate gyrus of susceptible animals and demonstrate through an elegant range of experiments involving AAV-mediated knockdown or overexpression of S1PR1 that this receptor is involved in the memory impairment observed with chronic pain. Importantly for translational purposes, they also show that activation of S1PR1 through a pharmacological paradigm is able to rescue the memory impairment phenotype.

      The authors also link these defects to reduced dendritic branching and a reduced number of mature excitatory synapses in the DG to the memory phenotype.

      They then proceed to explore possible mechanisms downstream of S1PR1 that could explain this reduction in dendritic spines. They identify integrin α2 as an interactor of S1PR1 and show a reduction in several proteins involved in actin dynamic, which is crucial for dendritic spine formation and plasticity.

      They thus hypothesize that the interaction between S1PR1 and Integrin α2 is fundamental for the activation of Rac1 and Cdc42 and consequently for the polymerisation of actin; a reduction in this pathway upon chronic pain would thus lead to impaired actin polymerisation, synapse formation, and thus impaired memory.

      The work is of great interest and the experiments are of very good quality with results of great importance. I have however some concerns. The main concern I have relates to the last part of the work, namely Figures 8 and 9, which I feel are not at the same level as the results presented in the previous 7 Figures, which are instead outstanding.

      In particular:

      - In Figure 8, given the reduction in all the proteins tested, the authors need to check some additional proteins as controls. One good candidate could be RhoA, considering the authors say it is activated by S1PR2 and not by S1PR1;

      Thanks for your suggestion. We tested the expression level of RhoA in mice 7 days and 21 days post CCI as negative controls (Supplemental Figure 9).

      - In addition to the previous point, could the authors also show that the number of neurons is not grossly different between susceptible and unsusceptible mice? This could be done by simply staining for NeuN or performing a western blot for a neuronal-specific protein (e.g. Map2 or beta3-tubulin);

      As suggested, we performed immunofluorescence using NeuN antibody to detect the number of neurons in susceptible and unsusceptible mice. The number is not significantly different between the two populations (Supplementary Figure 7).

      - In Figure 8, the authors should also evaluate the levels of activated RAC1 and activated Cdc42, which are much more important than just basal levels of the proteins to infer an effect on actin dynamics. This is possible through kits that use specific adaptors to pulldown GTP-Rac1 and GTP-Cdc42;

      Thanks for your constructive suggestion. Both an elevated level and hyperactivation of Rac1 protein are associated with actin dynamics and dendritic development [1]. We agree that the levels of activated Rac1 would be a better basis for inferring its effect on actin dynamics. In Figure 8, however, the purpose of the experiment is to show that the levels of proteins related to actin organization change with the expression level of S1PR1, and thus to conclude that actin organization was disrupted; it is not intended to show specifically that S1PR1 activates these proteins. We apologize for the confusion, but we think the current data are sufficient to support the conclusion.

      Thanks again for your advice. Your understanding is greatly appreciated.

      - In Figure 9C, the experiment is performed in an immortalised cell line. I feel this needs to be performed at least in primary hippocampal neurons;

      Thanks for your suggestion. As suggested, we performed the experiment in primary hippocampal neurons. Knockdown of S1pr1 in primary hippocampal neurons reduced the number of branches and the amount of filamentous actin. Please refer to the updated Figure 9C.

      - In Figure 9D, the authors use a Yeast two-hybrid system to demonstrate the interaction between S1PR1 and Integrin α2. However, as the yeast two-hybrid system is based on the proximity of the GAL4 activating domain and the GAL4 binding domain, which are used to activate the transcription of reporter genes, the system is not often used when probing the interaction between transmembrane proteins. Could the authors use other transmembrane proteins as negative controls?;

      Thanks for your question. We apologize for the unclear description in the Methods. The traditional yeast two-hybrid system can only detect protein interactions that occur in the nucleus and cannot detect interactions between membrane proteins. Here, we used the split-ubiquitin membrane-based yeast two-hybrid system. Briefly, in the split-ubiquitin system, ubiquitin, a 76-amino-acid protein that can target proteins for proteasomal degradation, is split into two halves, namely Cub at the C-terminus and NubG at the N-terminus, which are fused to and expressed with the bait protein (“Bait”) and the prey protein (“Prey”), respectively. Cub is additionally fused to a transcription factor. If the Bait and Prey proteins bind, Cub and NubG are brought together and a complete ubiquitin is reconstituted; this is recognized by ubiquitin-specific proteases, which cleave off the fused transcription factor, allowing it to enter the nucleus and activate expression of the reporter gene. We then determine whether the Bait and Prey proteins interact from the growth of the yeast.

      Thanks again for pointing this out. We reworded the method in M&M (Line 678-696).

      - In Figure 9E, the immunoblot is very unconvincing. The bands in the inputs are very weak for both ITGA2 and S1PR1, the authors do not show the enrichment of S1PR1 upon its immunoprecipitation and the band for ITGA2 in the IP fraction has a weird appearance. Were these experiments performed on DG lysates only? If so, I suggest the authors repeat the experiment using the whole brain (or at least the whole hippocampus) so as to have more starting material. Alternatively, if this doesn't work, or in addition, they could also perform the immunoprecipitation in heterologous cells overexpressing the two proteins;

      Thanks for the question and suggestion. We used lysates from both dentate gyri of a single mouse as the starting material. We have updated the result, which now shows clearer bands (Figure 9E).

      - About the point above, even if the results were convincing, the authors can't say that they demonstrate an interaction in vivo. In co-IP experiments, the interaction is much more likely to occur in the lysate during the incubation period rather than being conserved from the in vivo state. These co-IPs demonstrate the ability of proteins to interact, not necessarily that they do it in vivo. If the authors wanted to demonstrate this, they could perform a Proximity ligation assay in primary hippocampal neurons, using antibodies against S1PR1 and ITGA2.

      Thanks for your concern. Co-immunoprecipitation (Co-IP) is the gold standard for identifying protein-protein interactions [2], and it is one of the most efficient techniques for studying these interactions in vivo [3]. We repeated the experiment and followed the experimental procedure exactly to avoid artifactual protein interactions caused by over-incubation. Over-incubation, particularly at room temperature, may result in non-specific binding and therefore high background, so we performed the Co-IPs at 4°C to preserve protein interactions. We agree that the proximity ligation assay is better suited to studies of endogenously expressed proteins in primary cells [4]. Because we optimized the experimental procedure to avoid non-specific binding and, in particular, because the Co-IP used proteins from DG lysates, which can validate the specificity of the protein interaction in native tissue, we prefer to keep the Co-IP result in Figure 9E.

      Thanks again for your suggestion. We appreciate your understanding on this matter.

      - In Figure 9H, could the authors increase the N to see if shItga2 causes further KD in the CCI?

      As suggested, we repeated the experiment and increased the N to 6. As shown in the following picture, shItga2 did not cause further KD in the CCI.

      Author response image 1.

      - To conclusively demonstrate that S1PR1 and ITGA2 participate in the same pathway, they could show that knocking down the two proteins at the same time does not have additive effects on behavioral tests compared to the knockdown of each one of them in isolation.

      Thanks for your suggestion. As suggested, we knocked down the two proteins at the same time and did not observe additive effects on behavioral tests compared with the knockdown of each protein in isolation. Please refer to Figure 9L-O.

      Other major concerns:

      - Supplementary Figure 5: the image showing colocalisation between S1PR1 and CamKII is not very convincing. Is the S1PR1 antibody validated on Knockout or knockdown in immunostaining?;

      S1PR1 is a membrane receptor and the S1P1 antibody (PA1-1040, Invitrogen) shows membranous staining with diffuse dot-like signals (Please refer to the image “A” provided by ThermoFisher Scientific). Here, we utilized the antibody to detect the expression of S1PR1 in DG granule cells. We can see the diffuse dot-like signals aggregated in each single granule cell. CaMKII shows intense staining around the border of the granule cell soma (Image “B”) [5]. According to the images shown in Supplementary Figure 5B, we concluded that S1PR1 is expressed in CaMKII+ cells.

      Besides, as suggested, we validated the S1PR1 antibody on knockdown in immunostaining (Image “C” and “D”). The expression of S1PR1 is significantly decreased compared with the control.

      Author response image 2.

      - It would be interesting to check S1PR2 levels as a control in CCI-chronic animals;

      As suggested, we quantified the S1PR2 levels in Sham and CCI animals, and there is no significant difference between groups (Supplementary Figure 9).

      - Figure 1: I am a bit concerned about the Ns in these experiments. In the chronic pain experiments, the N for Sham is around 8 whereas is around 20 for CCI animals. Although I understand higher numbers are necessary to see the susceptible and unsusceptible populations, I feel that then the same number of Sham animals should be used;

      Thanks for your concern. In the preliminary experiment, we noticed that the ratio of susceptible to unsusceptible animals is around 1:1. After the behavioral tests, we need to take samples to investigate the molecular and cellular changes in each group. Thus, we set the sham group at around 8 animals and the CCI group at around 20, so that after classification into susceptible and unsusceptible groups, each group has a roughly equal number of animals for further investigation.
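
      The arithmetic behind this group-size choice can be sketched as follows, under the simplifying assumption that each CCI animal independently ends up susceptible with probability 0.5 (the roughly 1:1 ratio noted above). The binomial model and the function name `prob_both_groups_at_least` are illustrative assumptions and not part of the authors' analysis.

```python
from math import comb


def prob_both_groups_at_least(n_animals, p_susceptible, min_per_group):
    """Probability that both subgroups reach min_per_group animals.

    Models the number of susceptible animals as Binomial(n_animals, p_susceptible).
    """
    lowest = min_per_group               # fewest susceptible animals allowed
    highest = n_animals - min_per_group  # most susceptible animals allowed
    return sum(
        comb(n_animals, k)
        * p_susceptible ** k
        * (1 - p_susceptible) ** (n_animals - k)
        for k in range(lowest, highest + 1)
    )


# Chance that 20 CCI animals yield at least 8 susceptible and 8 unsusceptible mice.
chance = prob_both_groups_at_least(20, 0.5, 8)
```

      With 20 CCI animals the expected split is 10 and 10, so each subgroup is on average comparable in size to a sham group of about 8, although any single batch can deviate from an even split.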

      - Figures 1E and 1G have much higher Ns than the other panels. Why is that? If they have performed this high number of animals why not show them in all panels?;

      Thanks for your concern. For Figure 1B, C, D and F, we showed the data from a single batch of experiments, while for Figure 1E and 1G, we used data pooled from all batches. By showing data from a single batch, we aimed to demonstrate that the ratio of susceptible to unsusceptible animals is relatively stable within a batch and is not apparent only in a large pooled sample.

      - In the experiments where viral injection is performed, the authors should show a zoomed-out image of the brain to show the precision of the injection and how spread the expression of the different viruses was;

      As suggested, we showed the zoomed-out image in Supplementary Figure 6. The viruses are mainly expressed in the hippocampal DG.

      - The authors should check if there is brain inflammation in CCI chronic animals. This would be interesting to explain if this could be the trigger for the effects seen in neurons. In particular, the authors should check astrocytes and microglia. This is of interest also because the pathways altered in Figure 8A are related to viral infection.

      - If the previous point shows increased brain inflammation, it would be interesting for the authors to check whether a prolonged anti-inflammatory treatment in CCI animals administered before the insurgence of memory impairment could stop it from happening;

      - In addition, the authors should speculate on what could be the signal that can induce these molecular changes starting from the site of injury;

      - Also, as the animals are all WT, the authors should speculate on what could render some animals prone to have memory impairments and others resistant.

      Thanks for the above four suggestions. We have observed inflammation, including T cell infiltration and microglial activation, in the hippocampal DG of CCI chronic animals, and we have also used an S1PR1 modulator, which has anti-lymphocyte-mediated inflammatory effects, to prevent the onset of memory impairment. We also examined changes in the numbers of peripheral T-lymphocyte subsets and in the serum levels of cytokines. Furthermore, we found a neuron-microglia dialogue in the DG that may promote resilience to memory impairment in CCI animals. Since these results are unpublished, we apologize that we cannot share more detailed information publicly at this stage. We will publish these data as soon as possible. Thanks for your understanding.

      Reviewer #2 (Public Review):

      Summary:

      The study investigates the molecular mechanisms underlying chronic pain-related memory impairment by focusing on S1P/S1PR1 signaling in the dentate gyrus (DG) of the hippocampus. Through behavioural tests (Y-maze and Morris water maze) and RNA-seq analysis, the researchers segregated chronic pain mice into memory impairment-susceptible and -unsusceptible subpopulations. They discovered that S1P/S1PR1 signaling is crucial for determining susceptibility to memory impairment, with decreased S1PR1 expression linked to structural plasticity changes and memory deficits.

      Knockdown of S1PR1 in the DG induced a susceptible phenotype, while overexpression or pharmacological activation of S1PR1 promoted resistance to memory impairment and restored normal synaptic structure. The study identifies actin cytoskeleton-related pathways, including ITGA2 and its downstream Rac1/Cdc42 signaling, as key mediators of S1PR1's effects, offering new insights and potential therapeutic targets for chronic pain-related cognitive dysfunction.

      This manuscript consists of a comprehensive investigation and significant findings. The study provides novel insights into the molecular mechanisms of chronic pain-related memory impairment, highlighting the critical role of S1P/S1PR1 signaling in the hippocampal dentate gyrus. The clear identification of S1P/S1PR1 as a potential therapeutic target offers promising avenues for future research and treatment strategies. The manuscript is well-structured, methodologically sound, and presents valuable contributions to the field.

      Strengths:

      (1) The manuscript is well-structured and written in clear, concise language. The flow of information is logical and easy to follow.

      (2) The segregation of mice into memory impairment-susceptible and -unsusceptible subpopulations is innovative and well-justified. The statistical analyses are robust and appropriate for the data.

      (3) The detailed examination of S1PR1 expression and its impact on synaptic plasticity and actin cytoskeleton reorganization is impressive. The findings are significant and contribute to the understanding of chronic pain-related memory impairment.

      Weaknesses:

      (1) Results: While the results are comprehensive, some sections are data-heavy and could be more reader-friendly with summarized key points before diving into detailed data.

      Thanks for the suggestion. To make the Results more reader-friendly, the first sentence of each part/paragraph now summarizes what the following experiments investigate. These sentences are highlighted in blue in the main text.

      (2) Discussion: There is a need for a more balanced discussion regarding the limitations of the study. For example, addressing potential biases in the animal model or limitations in the generalizability of the findings to humans would strengthen the discussion. Also, providing specific suggestions for follow-up studies would be beneficial.

      As suggested, we expanded the discussion of the limitations of this study and outlined some directions for future research (Line 481-498).

      (3) Conclusion: The conclusion, while concise, could better highlight the study's broader impact on the field and potential clinical implications.

      Thanks. We reworded the conclusion to better highlight the impacts of this study (Line 501-505).

      Reviewer #3 (Public Review):

      Summary of the Authors' Objectives:

      The authors aimed to delineate the role of S1P/S1PR1 signaling in the dentate gyrus in the context of memory impairment associated with chronic pain. They sought to understand the molecular mechanisms contributing to the variability in memory impairment susceptibility and to identify potential therapeutic targets.

      Major Strengths and Weaknesses of the Study:

      The study is methodologically robust, employing a combination of RNA-seq analysis, viral-mediated gene manipulation, and pharmacological interventions to investigate the S1P/S1PR1 pathway. The use of both knockdown and overexpression approaches to modulate S1PR1 levels provides compelling evidence for its role in memory impairment. The research also benefits from a comprehensive assessment of behavioral changes associated with chronic pain.

      However, the study has some weaknesses. The categorization of mice into 'susceptible' and 'unsusceptible' groups based on memory performance requires further validation. Additionally, the reliance on a single animal model may limit the generalizability of the findings. The study could also benefit from a more detailed exploration of the impact of different types of pain on memory impairment.

      Assessment of the Authors' Achievements:

      The authors successfully identified S1P/S1PR1 signaling as a key factor in chronic pain-related memory impairment and demonstrated its potential as a therapeutic target. The findings are supported by rigorous experimental evidence, including biochemical, histological, and behavioral data. However, the study's impact could be enhanced by further exploration of the molecular pathways downstream of S1PR1 and by assessing the long-term effects of S1PR1 manipulation.

      Impact on the Field and Utility to the Community:

      This study is likely to have a significant impact on pain research by providing a novel perspective on the mechanisms underlying memory impairment in chronic pain conditions. The identification of the S1P/S1PR1 pathway as a potential therapeutic target could guide the development of new treatments.

      Additional Context for Readers:

      The study's approach to categorizing susceptibility to memory impairment could inspire new methods for stratifying patient populations in clinical settings.

      Recommendations:

      (1) A more detailed explanation of the k-means clustering algorithm and its application in categorizing mice should be provided.

      As suggested, we explained the k-means clustering algorithm in detail (Line 697-711).
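
      For readers unfamiliar with the method, the general idea of k-means with two clusters can be sketched as follows. This is only an illustration under stated assumptions: the two features (one Y-maze score and one Morris water maze score per mouse), the example values, the function names, and the deterministic initialization from the lowest and highest points are all hypothetical and are not taken from the manuscript, whose actual procedure is described at Line 697-711.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, max_iter=100):
    """Lloyd's algorithm for k = 2 (illustrative sketch only).

    Assumption: centroids start at the lexicographically smallest and
    largest points so that the example is deterministic.
    """
    centroids = [min(points), max(points)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Hypothetical (Y-maze score, MWM score) pairs for six mice.
points = [
    (0.70, 0.45), (0.68, 0.50), (0.72, 0.48),  # better performers
    (0.40, 0.20), (0.38, 0.22), (0.42, 0.18),  # poorer performers
]
labels, centroids = two_means(points)
print(labels, centroids)
```

      In this toy example the cluster with the lower centroid would correspond to a "susceptible" group and the other to an "unsusceptible" group; with real data the result can depend on initialization and feature scaling.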

      (2) The discussion on the potential influence of different pain types or sensitivities on memory impairment should be expanded.

      Thanks for your suggestion. We discussed this point in the limitations of this study (Line 484-491).

      (3) The protocol for behavioral testing should be clarified and the potential for learning or stress effects should be addressed.

      Thanks for your suggestion. We clarified the order of the battery of behavioral tests in this study (Line 537-542). We started with the least stressful test (Y-maze) and left the most stressful test (Morris water maze) for last [6]. In addition, we conducted behavioral assays to show that a one-day rest is enough to reduce carryover effects from the prior test (Y-maze). We examined stress-related behaviors one day after the Y-maze (23 d post CCI) using the open field test (OFT) and the elevated plus maze (EPM). As shown in Author response image 3, these tests gave no indication that the mice were under stress. Thus, the order in which the tests were performed is appropriate in this study.

      Author response image 3.

      (4) Conduct additional behavioral assays for other molecular targets implicated in the study.

      We agree that the effects of other molecular targets on susceptibility to memory impairment would be interesting to know. Our study was designed to focus specifically on ITGA2 this time and we'd like to keep the focus intact, but we have included your point as a consideration for future study (Lines 496-498). Thank you for the suggestion.

      (5) The effective drug thresholds and potential non-specific effects of pharmacological interventions should be discussed in more detail.

      As suggested, we emphasized this point for the drug SEW2871 (Line 242-245).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns:

      - In Figure 6E the lines of the different groups are not visible. Showing the errors as error bars for each point would probably be better;

      We apologize for the mistake of using mean±SD here instead of mean±SEM. After changing to mean±SEM, the lines of Figure 6E, Figure 7E and 7L become much clearer. It looks a little bit messy to show the error bars since there are numerous points, so we prefer to keep the line style.
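
      The reason the bands narrow after this change is that the SEM is the SD divided by the square root of the group size. A minimal sketch, using hypothetical latency values and a hypothetical helper name that are not taken from the study:

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for one group at one time point."""
    sd = statistics.stdev(values)      # sample SD, n - 1 denominator
    sem = sd / math.sqrt(len(values))  # SEM = SD / sqrt(n)
    return sd, sem


# Hypothetical escape latencies (s) for four mice on one training day.
latencies = [10.0, 12.0, 14.0, 16.0]
sd, sem = sd_and_sem(latencies)
print(f"mean = {statistics.mean(latencies):.1f}, SD = {sd:.2f}, SEM = {sem:.2f}")
```

      With n = 4 the SEM is half the SD, so a shaded mean±SEM band is half as wide as the corresponding mean±SD band.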

      - Do the authors have any speculation on why the % time in the quadrant is not further affected in the KD Itga2 in CCI animals (Figure 9K)?;

      In CCI animals, the level of S1PR1 expression is decreased. ITGA2 may participate in the same pathway as S1PR1. Thus, knocking down ITGA2 in CCI animals would not further affect animal behavior. This is supported by knocking down the two proteins at the same time: no additive effects on the behavioral tests were observed compared with knocking down either one alone (Figure 9L-O).

      - In the methods, it's unclear if in the multiple infusion, the animals were anaesthetised or kept awake;

      We have clarified this point in the Methods: mice were deeply anesthetized with 1% pentobarbital sodium (40 mg/kg, i.p.) (Line 649-650).

      - As the DG is quite small, could the authors clarify if, when performing western blots, they used the two DGs from one animal for each sample or if they pulled together the DGs of several animals?;

      We used the two DGs from one animal for each sample. The amount of protein extracted from each sample is enough for 20-30 Western blot assays. We have now added this to the Methods for clarity (Line 612).

      - Is it possible to check the correlation between performance in the YM and MWM with S1PR1 levels?;

      We would also be interested in this point. Our current data cannot address it, because it is difficult to manipulate S1PR1 levels using KD and overexpression viruses.

      - EM images have a poor resolution in the figures, could the authors show higher-resolution images?;

      We have inserted 300 DPI images for high resolution output.

      - In line 268 there is a mention of an "ShLamb1"?

      We apologize for the mistake; it has been corrected.

      Reviewer #3 (Recommendations For The Authors):

      This study explored the role of S1P/S1PR1 signaling within the dentate gyrus (DG) in chronic pain-related memory impairment using a murine model. The authors identified decreased expression of S1PR1 in the DG of mice susceptible to memory deficits. They demonstrated that S1PR1 knockdown increased susceptibility to memory deficits, whereas its overexpression or pharmacological activation mitigated these effects. Further biochemical and immunofluorescence analyses indicated that disruptions in S1P/S1PR1 signaling were related to disruptions in actin cytoskeleton dynamics, influenced by molecular pathways involving ITGA2, Rac1/Cdc42 signaling, and the Arp2/3 complex. These findings offer intriguing insights and suggest a potential therapeutic target for treating memory impairment in chronic pain.

      Major Concerns:

      The following five major concerns are the same as the five recommendations from Reviewer 3 on Page 9-10. Please refer to the answers above.

      (1) The division of subjects into 'susceptible' and 'unsusceptible' categories requires further clarification regarding the methodologies and rationale employed, particularly concerning the use of the k-means clustering algorithm in data analysis. This explanation will strengthen the scientific grounding of the categorization process.

      (2) The categorization of 'susceptible' and 'unsusceptible' groups might also benefit from a more detailed analysis or discussion concerning the influence of different pain sensitivities or types of pain assessments. Although the study mentions that memory impairment stands independent of pain thresholds, a more nuanced exploration could provide deeper insights.

      (3) The article could benefit from more clarity on the protocol of behavioral testing, especially regarding the potential effects of repeated testing on performance outcomes due to learning or stress.

      (4) While the connection between S1P/S1PR1 signaling and the molecular pathways highlighted (ITGA2, Rac1/Cdc42, Arp2/3) is intriguing, only ITGA2 underwent further behavioral validation in vivo. Conducting additional behavioral assays for one or more of the molecular targets could substantially strengthen these findings.

      (5) Discussions regarding effective drug thresholds and the potential for non-specific effects are essential to fully evaluate the implications of pharmacological interventions utilized in the study.

      Minor Concerns:

      (1) Clarification of evidence of the specific infusion sites in pharmacological experiments would enhance the transparency and replicability of these methods.

      For the infusion of the S1PR1 agonist, a guide cannula (internal diameter 0.34 mm, RWD) was unilaterally implanted into the DG of the hippocampus (-1.3 A/P, -1.95 M/L, and -2.02 D/V), as evidenced by Figure 5B.

      (2) It would be beneficial if the manuscript provided details regarding the efficiency and reach of viral transfection within the neuronal population. This information would help in assessing the impact of genetic manipulations.

      S1PR1 immunostaining showed that the efficiency is quite high and the reach of viral transfection is sufficient.

      Author response image 4.

      (3) The manuscript should make explicit the normalization techniques used in quantitative assessments such as Western blotting, including the housekeeping genes or proteins used for this purpose.

      Here, we used housekeeping protein normalization for normalizing Western blot data. GAPDH was used as the internal control. First, the stained blot is imaged, a rectangle is drawn around the target protein in each lane, and the signal intensity inside the rectangle is measured by using ImageJ. The signal intensity obtained can then be normalized by being divided by the signal intensity of the loading internal control (GAPDH) detected on the same blot. The average of the ratios from the control group is calculated, and all individual ratios are divided by this average to obtain a new set of values, which represent the normalized values (Line 619-625).
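
      The normalization steps described above can be sketched in a few lines. This is an illustration only: the function name, the group labels, and the intensity values are hypothetical, and the band intensities are assumed to have been measured beforehand (for example in ImageJ, as the response describes).

```python
def normalize_blot(target, gapdh, groups, control="Sham"):
    """Normalize target-band intensities to GAPDH, then to the control mean.

    target, gapdh: band intensities per lane, measured on the same blot.
    groups: group label per lane; `control` names the reference group.
    """
    # Step 1: divide each target signal by its lane's loading control.
    ratios = [t / g for t, g in zip(target, gapdh)]
    # Step 2: average the ratios of the control group.
    ctrl = [r for r, grp in zip(ratios, groups) if grp == control]
    ctrl_mean = sum(ctrl) / len(ctrl)
    # Step 3: express every lane relative to the control mean.
    return [r / ctrl_mean for r in ratios]


# Hypothetical intensities (arbitrary units) for four lanes.
target = [100.0, 120.0, 50.0, 60.0]
gapdh = [100.0, 100.0, 100.0, 100.0]
groups = ["Sham", "Sham", "CCI", "CCI"]
normalized = normalize_blot(target, gapdh, groups)
print([round(v, 3) for v in normalized])
```

      By construction the control group averages 1 after normalization, so the other lanes read directly as fold change relative to control.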

      (4) Details on whether the control groups in behavioral assessments were subjected to handling and experimental conditions comparable to those of the chronic pain groups, barring nerve injury, are crucial for maintaining the integrity of the comparative analysis.

      We agree that the control group and the experimental group should be identical in all respects except for one difference: nerve injury. We have added this point to the Methods (Line 520-522).

      Minor Recommendations:

      The following four minor recommendations are the same as the four minor concerns from Reviewer 3 on Page 12-13. Please refer to the answers above.

      (1) Clarify the specifics of infusion site verification in pharmacological experiments.

      (2) Provide details on the efficiency and neuronal reach of viral transfections.

      (3) Explicitly describe the normalization techniques used in quantitative assessments.

      (4) Ensure that control groups in behavioral assessments undergo comparable handling to maintain analysis integrity.

      References

      (1) Gualdoni, S., et al., Normal levels of Rac1 are important for dendritic but not axonal development in hippocampal neurons. Biology of the Cell, 2007. 99(8): p. 455-464.

      (2) Alam, M.S., Proximity Ligation Assay (PLA). Curr Protoc Immunol, 2018. 123(1): p. e58.

      (3) Song, P., S. Zhang, and J. Li, Co-immunoprecipitation Assays to Detect In Vivo Association of Phytochromes with Their Interacting Partners. Methods Mol Biol, 2021. 2297: p. 75-82.

      (4) Krieger, C.C., et al., Proximity ligation assay to study TSH receptor homodimerization and crosstalk with IGF-1 receptors in human thyroid cells. Frontiers in Endocrinology, 2022. 13.

      (5) Arruda-Carvalho, M., et al., Conditional Deletion of α-CaMKII Impairs Integration of Adult-Generated Granule Cells into Dentate Gyrus Circuits and Hippocampus-Dependent Learning. The Journal of Neuroscience, 2014. 34(36): p. 11919-11928.

      (6) Wolf, A., et al., A Comprehensive Behavioral Test Battery to Assess Learning and Memory in 129S6/Tg2576 Mice. PLoS One, 2016. 11(1): p. e0147733.

    1. Author response:

      We thank the reviewers for their feedback. We are currently revising the manuscript to address their questions and concerns. Here we briefly summarize our planned revisions.

      Reviewer 1 requested clarification on three points. We will clarify all these points with text edits. One point is brief enough to be addressed here: in cases when we pooled data from the left and right hemispheres, the reviewer wants to know how this was done. Simply put, we defined the “ipsi” side of the body as the side where the recorded DN resided, and we defined “contra” as the other side.
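
      As a rough illustration of this pooling rule, the sketch below relabels body sides relative to the recorded cell before combining recordings. The data layout (a dict per recording with a hypothetical `dn_side` field and per-side value lists) and the function names are assumptions made for the example, not the format used in the study.

```python
SIDES = ("left", "right")


def body_side_label(recorded_dn_side, body_side):
    """Label a body side relative to the hemisphere of the recorded DN."""
    if recorded_dn_side not in SIDES or body_side not in SIDES:
        raise ValueError("sides must be 'left' or 'right'")
    return "ipsi" if body_side == recorded_dn_side else "contra"


def pool_recordings(recordings):
    """Pool per-side measurements across left- and right-hemisphere recordings."""
    pooled = {"ipsi": [], "contra": []}
    for rec in recordings:
        for side in SIDES:
            pooled[body_side_label(rec["dn_side"], side)].extend(rec[side])
    return pooled


# Hypothetical recordings: one DN in the left hemisphere, one in the right.
recordings = [
    {"dn_side": "left", "left": [1.0, 2.0], "right": [0.1]},
    {"dn_side": "right", "left": [0.2], "right": [3.0]},
]
pooled = pool_recordings(recordings)
print(pooled)
```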

      Reviewer 2 requested clarification on two minor points. We will clarify these points with text edits and with an additional analysis.

      Reviewer 3 had a number of substantive concerns. Briefly:

      (1) The reviewer asks us to improve our discussion of some relevant literature. We will provide updated information on the DN steering network, and in particular, we will cite Bidaye et al. 2020 and Sapkal et al. 2024. We apologize for the oversight.

      (2) The reviewer asks us for immunofluorescent images documenting the expression patterns of our effector transgenes. With regard to GtACR1::eYFP expression, we will include these images in our resubmission. With regard to ReaChR expression, we expressed this reagent stochastically under hs-FLP control, and so different brains had different expression patterns; however, we carefully documented the number of DNa02 cells that expressed ReaChR in each brain. With regard to GFP expression, these expression patterns are available online from the FlyLight documentation associated with Namiki et al. eLife 2018 (https://splitgal4.janelia.org/precomputed/Descending%20Neurons%202018.html). The UAS-GFP transgene used by Namiki et al. 2018 (pJFRC200-10XUASIVS-myr::smGFP-HA in attP18) is different from the UAS-GFP transgene we used (10XUAS-IVS-mCD8::GFP in su(Hw)attP8), and so there may be minor differences in expression pattern. However, it should be noted that we only used GFP expression to target somata for patch clamp recording, and DNa01 and DNa02 somata have a distinctive location and a distinctive size; when we performed these recordings, we only targeted a soma in this location, and we verified that there were no “distractor” somata in this vicinity with similar size and appearance. The same applies to patch clamp recordings targeted via Halo7 expression (SiR110-HaloTag fluorescence). In paired recordings from both DNa02 and DNa01, we verified the identity of each cell as described in Fig. S1.

      (3) The reviewer asks why we focused on DNa02 in the latter part of the manuscript, rather than DNa01. We made this decision because DNa02 is more highly predictive of steering behavior, as compared to DNa01 (Fig. 1H). Also, an impulse of DNa02 activity is followed by a relatively large turning maneuver, on average, whereas an impulse of DNa01 activity is followed by a relatively small turning maneuver (Fig. 1E-F). Moreover, DNa02 has many more synaptic inputs in the brain (Fig. 7A), and it has many more direct synaptic connections onto motor neurons (Fig. 1B).

      (4) The reviewer highlights difficulties in interpreting DN activity during backward movement (Figs. S3/S4). We included this material in the spirit of completeness, but we agree with the reviewer that it is difficult to interpret. In our revision, we will omit Fig. S3C and Fig. S4A-B, and we will revise these legends to improve clarity.

      (5) The reviewer asks why we did not do a systematic analysis of paired DNa01 recordings, as we did for DNa02. It is difficult to get paired right/left recordings from two DNs of the same type in the same fly, while the fly is walking vigorously, and we were only able to get two such paired recordings from DNa01. We did not feel this was a sufficiently large sample size to support a systematic analysis. We chose not to invest more time in getting more paired DNa01 recordings because we thought that DNa02 was more important, for the reasons noted above.

      (6) The reviewer asks for an analysis of trials where bump-jump led to turning in the opposite direction to the DNa02 being recorded. We will provide this analysis in the revision.

      (7) The reviewer points out that “latent” steering drives might not be latent, as they might produce small postural changes we are not capturing. This is a fair point, and we will note this in our revision.

      (8) The reviewer asks for a systematic analysis of DNa01 inputs in Figure 7, similar to our analysis of DNa02 inputs. Here we would prefer to focus on DNa02, for three reasons. First, we think DNa02 is likely more important, for the reasons noted above. Second, there has been some uncertainty as to the identity of DNa01 in connectome data; indeed, in the hemibrain data set, the cell recently identified as DNa01 was annotated as VES006 (Schlegel et al. Nature 634: 139-152). Third, the cell now identified as DNa01 does not receive direct input from either the central complex or the mushroom body, and for this reason, we felt that the inputs to DNa01 might be less interesting to a general audience.

      (9) The reviewer wonders whether DNa01 is more involved in sideways movement, rather than rotational movement. Our data do not support this conclusion: rather, our data show that DNa01 is only weakly correlated with sideways movement. Thus, the forward filter (Fig. 1F) shows that an impulse of DNa01 activity is (on average) followed by a relatively small amount of sideways movement. Conversely, the reverse filter (in Fig. S2I) shows that an impulse of sideways movement is (on average) preceded by a relatively large amount of DNa01 activity.

      (10) The reviewer points out that the phenotype associated with optogenetic suppression in Fig. 8G is weak. We will highlight this point and discuss potential reasons for this weak phenotype in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      This study presents a valuable finding on sperm flagellum and HTCA stabilization. The evidence supporting the authors' claims is incomplete. The work will be of broad interest to cell and reproductive biologists working on cilium and sperm biology.

      We thank the Editor and the two reviewers for their time and thorough evaluation of our manuscript. We greatly appreciate their valuable guidance on improving our study. In the revised manuscript, we have conducted additional experiments and provided quantitative data in response to the reviewers' comments. Furthermore, we have refined the manuscript and added further context to elucidate the significance of our findings for the readers.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper, Wu et al. investigated the physiological roles of CCDC113 in sperm flagellum and HTCA stabilization using CRISPR/Cas knockout mouse models, co-IP, and single sperm imaging. They find that CCDC113 localizes in the linker region among radial spokes (RS), the nexin-dynein regulatory complex (N-DRC), and doublet microtubules (DMTs) and interacts with axoneme-associated proteins CFAP57 and CFAP91, acting as an adaptor protein that facilitates the linkage between RS, N-DRC, and DMTs within the sperm axoneme. They show that the disruption of CCDC113 produced spermatozoa with disorganized sperm flagella, and that CFAP91 and DRC2 could not colocalize with DMTs in Ccdc113-/- spermatozoa. Interestingly, the data also indicate that CCDC113 could localize on the HTCA region, and interact with HTCA-associated proteins. The knockout of Ccdc113 could also produce acephalic spermatozoa. By using Sun5 and Centlein knockout mouse models, the authors further find SUN5 and CENTLEIN are indispensable for the docking of CCDC113 to the implantation site on the sperm head. Overall, the experiments were designed properly and performed well to support the authors' observation in each part. Furthermore, the study's findings offer valuable insights into the physiological and developmental roles of CCDC113 in the male germ line, which can provide insight into impaired sperm development and male infertility. The conclusions of this paper are mostly well supported by data, but some points need to be clarified and discussed.

      We thank Reviewer #1 for his or her critical reading and the positive assessment.

      (1) In Figure 1, a sperm flagellum protein, which is far away from CCDC113, should be selected as a negative control to exclude artificial effects in co-IP experiments.

      We greatly appreciate Reviewer #1’s insightful suggestion. In response, we selected two sperm outer dense fiber proteins, ODF1 and ODF2, which are located distant from the sperm axoneme, as negative controls in the co-IP experiments. As shown in Figure 1- figure supplement 1A and B, neither ODF1 nor ODF2 bound to CCDC113, indicating the interaction observed in Figure 1 is not an artifact.

      (2) Whether the detachment of sperm head and tail in Ccdc113-/- mice is a secondary effect of the sperm flagellum defects? The author should discuss this point.

      Good question. Considering that CCDC113 is localized in the sperm neck region and interacts with SUN5 and CENTLEIN, it may play a direct role in connecting the sperm head and tail. Indeed, PAS staining revealed that Ccdc113–/– sperm heads exhibit abnormal orientation in stages V–VIII of the seminiferous epithelia (Figure 6C-D). Furthermore, transmission electron microscopy (TEM) analysis indicated that the absence of CCDC113 caused detachment of the damaged coupling apparatus from the sperm head in step 9–11 spermatids (Figure 6E). These results suggest that the detachment of the sperm head and tail in Ccdc113–/– mice may not be a secondary effect of sperm flagellum defects. We have discussed this point further below:

      “CCDC113 can interact with SUN5 and CENTLEIN, but not PMFBP1 (Figure 7A-C), and is left on the tip of the decapitated tail in Sun5–/– and Centlein–/– spermatozoa (Figure 7K and L). Furthermore, CCDC113 colocalizes with SUN5 in the HTCA region, and immunofluorescence staining in spermatozoa shows that SUN5 is positioned closer to the sperm nucleus than CCDC113 (Figure 7G and H). Therefore, SUN5 and CENTLEIN may be closer to the sperm nucleus than CCDC113. PAS staining revealed that Ccdc113–/– sperm heads are abnormally oriented in stages V–VIII seminiferous epithelia (Figure 6C and D), and TEM analysis further demonstrated that the disruption of CCDC113 causes the detachment of the destroyed coupling apparatus from the sperm head in step 9–11 spermatids (Figure 6E). All these results suggest that the detachment of sperm head and tail in Ccdc113–/– mice may not be a secondary effect of sperm flagellum defects.”

      (3) Given that some cytoplasm materials could be observed in Ccdc113-/- spermatozoa (Fig. 5A), whether CCDC113 is also essential for cytoplasmic removal?

      Good question. Unremoved cytoplasm could be detected in spermatozoa by using transmission electron microscopy (TEM) analysis, including disrupted mitochondria, damaged axonemes, and large vacuoles. These observations indicate defects in cytoplasmic removal in Ccdc113–/– mice. We have discussed this point as below:

      “Moreover, TEM analysis detected excess residual cytoplasm in spermatozoa, including disrupted mitochondria, damaged axonemes, and large vacuoles, indicating defects in cytoplasmic removal in Ccdc113–/– mice (Figure 5A).”

      (4) Although CCDC113 could not bind to PMFBP1, the localization of CCDC113 in Pmfbp1-/- spermatozoa should be also detected to clarify the relationship between CCDC113 and SUN5-CENTLEIN-PMFBP1.

      We appreciate Reviewer #1’s suggestion. We have analyzed the localization of CCDC113 in Pmfbp1-/- spermatozoa and found that CCDC113 was located at the tip of the decapitated tail in Pmfbp1-/- spermatozoa (Figure 7K and L). This finding has been incorporated into the revised manuscript as below:

      “To further elucidate the functional relationships among CCDC113, SUN5, CENTLEIN, and PMFBP1 at the sperm HTCA, we examined the localization of CCDC113 in Sun5-/-, Centlein–/–, and Pmfbp1–/– spermatozoa. Compared to the control group, CCDC113 was predominantly localized on the decapitated flagellum in Sun5-/-, Centlein–/–, and Pmfbp1–/– spermatozoa (Figure 7K and L), indicating SUN5, CENTLEIN, and PMFBP1 are crucial for the proper docking of CCDC113 to the implantation site on the sperm head. Taken together, these data demonstrate that CCDC113 cooperates with SUN5 and CENTLEIN to stabilize the sperm HTCA and anchor the sperm head to the tail.”

      Reviewer #2 (Public Review):

      Summary:

      In the present study, the authors select the coiled-coil protein CCDC113 and revealed its expression in the stages of spermatogenesis in the testis as well as in the different steps of spermiogenesis with expression also mapped in the different parts of the epididymis. Gene deletion led to male infertility in CRISPR-Cas9 KO mice and PAS staining showed defects mapped in the different stages of the seminiferous cycle and through the different steps of spermiogenesis. EM and IF with several markers of testis germ cells and spermatozoa in the epididymis indicated defects in flagella and head-to-tail coupling for flagella as well as acephaly. The authors' co-IP experiments of expressed CCDC113 in HEK293T cells indicated an association with CFAP91 and DRC2 as well as SUN5 and CENTLEIN.

      The authors propose that CCDC113 connects CFAP91 and DRC2 to doublet microtubules of the axoneme and CCDC113's association with SUN5 and CENTLEIN to stabilize the sperm flagellum head-to-tail coupling apparatus. Extensive experiments mapping CCDC113 during postnatal development are reported as well as negative co-IP experiments and studies with SUN5 KO mice as well as CENTLEIN KO mice.

      Strengths:

      The authors provide compelling observations to indicate the relevance of CCDC113 to flagellum formation with potential protein partners. The data are relevant to sperm flagella formation and its coupling to the sperm head.

      We are grateful to Reviewer #2 for his or her recognition of the strength of this study.

      Weaknesses:

      The authors' observations are consistent with the model proposed but the authors' conclusions for the mechanism may require direct demonstration in sperm flagella. The Walton et al paper shows human CCDC96/113 in cilia of human respiratory epithelia. An application of such methodology to the proteins indicated by Wu et al for the sperm axoneme and head-tail coupling apparatus is eagerly awaited as a follow-up study.

      We thank Reviewer 2 for his/her kind help in improving the manuscript. We now understand that directly determining the precise localization of CCDC113 in the sperm axoneme and head-tail coupling apparatus (HTCA) using cryo-electron microscopy (cryo-EM) could powerfully strengthen our model. Recent advances in cryo-EM have indeed advanced our understanding of axonemal structures and determined the structures of native axonemal DMTs from mouse, bovine, and human sperm (Leung et al., 2023; Zhou et al., 2023). However, high-resolution structures of the sperm axoneme and HTCA regions, including those involving CCDC113, have yet to be fully characterized. Thus, we would like to discuss this point and consider it a valuable direction for future research.

      “Given that cryo-EM of the sperm axoneme and HTCA could powerfully strengthen the evidence for the role of CCDC113 in stabilizing the sperm axoneme and head-tail coupling apparatus, it is a valuable direction for future research.”

      References:

      Bazan, R., Schröfel, A., Joachimiak, E., Poprzeczko, M., Pigino, G., & Wloga, D. (2021). Ccdc113/Ccdc96 complex, a novel regulator of ciliary beating that connects radial spoke 3 to dynein g and the nexin link. PLoS Genet, 17(3), e1009388.

      Ghanaeian, A., Majhi, S., McCafferty, C. L., Nami, B., Black, C. S., Yang, S. K., Legal, T., Papoulas, O., Janowska, M., Valente-Paterno, M., Marcotte, E. M., Wloga, D., & Bui, K. H. (2023). Integrated modeling of the Nexin-dynein regulatory complex reveals its regulatory mechanism. Nat Commun, 14(1), 5741.

      Leung, M. R., Zeng, J., Wang, X., Roelofs, M. C., Huang, W., Zenezini Chiozzi, R., Hevler, J. F., Heck, A. J. R., Dutcher, S. K., Brown, A., Zhang, R., & Zeev-Ben-Mordehai, T. (2023). Structural specializations of the sperm tail. Cell, 186(13), 2880-2896.e2817.

      Walton, T., Gui, M., Velkova, S., Fassad, M. R., Hirst, R. A., Haarman, E., O'Callaghan, C., Bottier, M., Burgoyne, T., Mitchison, H. M., & Brown, A. (2023). Axonemal structures reveal mechanoregulatory and disease mechanisms. Nature, 618(7965), 625-633.

      Zhou, L., Liu, H., Liu, S., Yang, X., Dong, Y., Pan, Y., Xiao, Z., Zheng, B., Sun, Y., Huang, P., Zhang, X., Hu, J., Sun, R., Feng, S., Zhu, Y., Liu, M., Gui, M., & Wu, J. (2023). Structures of sperm flagellar doublet microtubules expand the genetic spectrum of male infertility. Cell, 186(13), 2897-2910.e2819.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please provide full gel for the Figure 2C experiment (could be as a supplementary file).

      Thanks for your insightful suggestions. We have replaced Figure 2C and provided the full gel in Figure 2-figure supplement 1A.

      (2) The authors write on Line 163 "In contrast, the flagellum staining appeared reduced in Ccdc113-/- seminiferous tubules (Fig. 2J, red asterisk)." However, the magnification of the pictures is not sufficient to distinguish anything in the panel mentioned, please provide others.

      Many thanks for pointing this out. We have provided a representative figure to show the flagellar defect in seminiferous tubules.

      (3) Please add statistical p-values for figures.

      Thanks for your valuable advice. We have added statistical p-values to the figures in the revised manuscript.

      (4) Line 128: Should "speculate" be "speculated"?

      Thank you for pointing out this problem. We have corrected it in the revised manuscript, as shown below:

      “Given that CFAP91 has been reported to stabilize RS on the DMTs (Bicka et al., 2022; Dymek et al., 2011; Gui et al., 2021) and cryo-EM analysis shows that CCDC113 is close to DMTs, we speculated that CCDC113 may connect RS to DMTs by binding to CFAP91 and microtubules.”

      (5) In lines 384-385, more "-" is typed.

      Thank you for pointing out this problem. We have corrected it in the revised manuscript, as shown below:

      “Furthermore, CCDC113 colocalizes with SUN5 in the HTCA region, and immunofluorescence staining in spermatozoa shows that SUN5 is closer to the sperm nucleus than CCDC113 (Figure 7G and H). Therefore, SUN5 and CENTLEIN may be closer to the sperm nucleus than CCDC113.”

      (6) In general, the article has many typos and should be professionally proofread.

      Many thanks for pointing this out. We have thoroughly revised the manuscript with the assistance of professional proofreading.

      Reviewer #2 (Recommendations For The Authors):

      Can the authors indicate in the Materials and Methods if n=3 biological replicates were done for all co-IP, EM, LM, and IF studies? The statistical analysis section indicates this but quantification is missing for most figures including co-IP, most IF, PAS staining, EM, etc.

      We thank Reviewer 2 for the insightful comments and guidance to improve our data quality. All the experiments in this study were repeated at least three times to ensure reproducibility. We have quantified the co-IP experiments in Figures 1C-H and 7A-F, the IF data in Figures 2K, 5C, and 5D, as well as the PAS staining in Figure 6C. Since electron microscopy samples require very little testicular tissue and the sections obtained are very thin, the likelihood of capturing sections specifically at the sperm head-tail junction is considerably low. This challenge makes it difficult to perform quantitative analysis and statistical evaluation in the TEM experiment. To address this limitation, we have quantified the percentage of Ccdc113-/- sperm heads with abnormal orientation in stages V–VIII of the seminiferous epithelium to indicate impaired head-to-tail anchorage.

      Figure S2 is compelling and might be indicated as a major figure instead of a supplementary figure.

      We appreciate the positive comment. We have included it as a major figure in Figure 3F.

      Figure 4A may be incomplete. Data sets for RNA expression suggest high expression in the ovary and other organs in males and females including the brain and are not indicated by the authors. Figure 4A may be considered for removal with a more complete study for another paper.

      Thank you for pointing out this issue. We reviewed RNA expression data from various tissues using RNA-Seq data from Mouse ENCODE (https://www.ncbi.nlm.nih.gov/gene/244608) and found that CCDC113 is highly expressed in the testis, but not significantly in the ovary and brain (Figure 4-figure supplement 1A). Additionally, we re-evaluated CCDC113 protein levels in the spleen, lung, kidney, testis, intestine, stomach, brain, and ovary, confirming that it is highly expressed in the testes, with negligible expression in the ovary and brain (Figure 4-figure supplement 1B). In line with Reviewer 2's suggestion, we have removed Figure 4A in the revised manuscript.

      There are grammatical errors throughout the manuscript and Figure 7 is truncated.

      Thank you for pointing out this problem. We have thoroughly revised the manuscript with the assistance of professional proofreading.

      The Introduction and Discussion parts of the paper may need some clarification for the general reader. The material in the "Additional Context " section of the critique below may be a helpful place to introduce what a stage is, and the steps in germ cell development in the testis with the latter of course where and when the flagellum develops.

      We appreciate your valuable suggestions. We have referred to the material in the “Additional Context” section to introduce the stages of spermatogenesis and the steps in germ cell development in the testis in the introduction and results.

      “Male fertility relies on the continuous production of spermatozoa through a complex developmental process known as spermatogenesis. Spermatogenesis involves three primary stages: spermatogonia mitosis, spermatocyte meiosis, and spermiogenesis. During spermiogenesis, spermatids undergo complex differentiation processes to develop into spermatozoa, which include nuclear elongation, chromatin remodeling, acrosome formation, cytoplasm elimination, and flagellum development (Hermo et al., 2010).”

      Hermo, L., Pelletier, R. M., Cyr, D. G., & Smith, C. E. (2010). Surfing the wave, cycle, life history, and genes/proteins expressed by testicular germ cells. Part 1: background to spermatogenesis, spermatogonia, and spermatocytes. Microscopy research and technique, 73(4), 241–278. https://doi.org/10.1002/jemt.20783

      “Pioneering work in the mid-1950s used the PAS stain in histologic sections of mouse testis to visualize glycoproteins of the acrosome and Golgi in seminiferous tubules (Oakberg, 1956). The pioneers discovered in cross-sectioned seminiferous tubules the association of differentiating germ cells with successive layers to define different stages that in mice are twelve, indicated as Roman numerals (XII). For each stage, different associations of maturing germ cells were always the same with early cells in differentiation at the periphery and more mature cells near the lumen. In this way, progressive differentiation from stem cells to mitotic, meiotic, acrosome-forming, and post-acrosome maturing spermatocytes was mapped to define spermatogenesis with the XII stages in mice representing the seminiferous cycle. The maturation process from acrosome-forming cells to mature spermatocytes is defined as spermiogenesis with 16 different steps that are morphologically distinct spermatids (O'Donnell L, 2015).”

      Oakberg, E. F. (1956). A description of spermiogenesis in the mouse and its use in analysis of the cycle of the seminiferous epithelium and germ cell renewal. The American journal of anatomy, 99(3), 391-413. https://doi.org/10.1002/aja.1000990303

      O'Donnell L. (2015). Mechanisms of spermiogenesis and spermiation and how they are disturbed. Spermatogenesis, 4(2), e979623. https://doi.org/10.4161/21565562.2014.979623

      For the Discussion, the authors indicate that the function of CCDC113 in mammals is unknown yet the authors point to the work of Walton et al on human respiratory epithelia that points to a function for CCDC96/113. The work in the manuscript here does indicate a role in sperm flagella and the head-to-tail coupling apparatus but remains descriptive until the methodology of Walton et al is applied. Hopefully, the authors will consider it for a follow-up study.

      Thank you for pointing out this problem. We have revised this part and highlighted the work of Walton et al. in the Discussion.

      “CCDC113 is a highly evolutionarily conserved component of motile cilia/flagella. Studies in the model organism, Tetrahymena thermophila, have revealed that CCDC113 connects RS3 to dynein g and the N-DRC, which plays an essential role in cilia motility (Bazan et al., 2021; Ghanaeian et al., 2023). Recent studies have also identified the localization of CCDC113 within the 96-nm repeat structure of the human respiratory epithelial axoneme, where it localizes to the linker region among RS, N-DRC and DMTs (Walton et al., 2023). In this study, we reveal that CCDC113 is indispensable for male fertility, as Ccdc113 knockout mice produce spermatozoa with flagellar defects and head-tail linkage detachment (Figure 3D).”

      “Overall, we identified CCDC113 as a structural component of both the flagellar axoneme and the HTCA, where it performs dual roles in stabilizing the sperm axonemal structure and maintaining the structural integrity of HTCA. Given that the cryo-EM of sperm axoneme and HTCA could powerfully strengthen the role of CCDC113 in stabilizing sperm axoneme and head-tail coupling apparatus, it is a valuable direction for future research.”

      The Discussion may be focused on the key aspects of CCDC113 related to sperm flagella and the head-to-tail coupling apparatus that represent a genuine advance. The more speculative parts of the Discussion that have not been addressed by experimentation in the Results section may be considered for removal in the Discussion section.

      Thank you for pointing this out. We have removed the speculative parts of the Discussion that have not been addressed by experimentation in the Results section.

      Additional Context to help readers understand the significance of the work:

      Pioneering work in the mid-1950s used the periodic acid Schiff (PAS) stain in histologic sections of rodent testis to visualize glycoproteins of the acrosome and Golgi in seminiferous tubules. The pioneers discovered in cross-sectioned seminiferous tubules the association of differentiating germ cells with successive layers to define different stages that in mice are twelve, indicated as Roman numerals (XII). For each stage, different associations of maturing germ cells were always the same with early cells in differentiation at the periphery and more mature cells near the lumen. In this way, progressive differentiation from stem cells to mitotic, meiotic, acrosome-forming, and post-acrosome maturing spermatocytes was mapped to define spermatogenesis with the XII stages in mice representing the seminiferous cycle. The maturation process from acrosome-forming cells to mature spermatocytes is defined as spermiogenesis with 19 different steps that are morphologically distinct spermatids. It is from steps 8-19 of spermiogenesis that the formation of the flagellum takes place. Final maturation occurs in the epididymis as sperm move through the caput, corpus, and cauda of the organ with motile spermatozoa generated.

      Thank you very much!

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to investigate the oscillatory activity of GnRH neurones in freely behaving mice. By utilising GCaMP fiber photometry, they sought to record real-time neuronal activity to understand the patterns and dynamics of GnRH neuron firing and their implications for reproductive physiology.

      Strengths:

      (1) The use of GCaMP fiber photometry allows for high temporal resolution recordings of neuronal activity, providing real-time data on the dynamics of GnRH neurones.

      (2) Recording in freely behaving animals ensures that the findings are physiologically relevant and not artifacts of a controlled laboratory environment.

      (3) The authors used statistical methods to characterise the oscillatory patterns, ensuring the reliability of their findings.

      Weaknesses:

      (1) While the study identifies distinct oscillatory patterns in GnRH neurones' calcium dynamics, it falls short in exploring the functional implications of these patterns for GnRH pulsatility and overall reproductive physiology.

      The functional roles of pulsatile and surge patterns of GnRH release are extremely well established. We have found perfect correlations between GnRH neuron dendron GCaMP activity and LH pulses as well as the LH surge clearly indicating the function of these activity patterns. We do not know the functional role of the clustered high-frequency basal activity that we have discovered and, as noted in the Discussion, are unsure of its physiological importance. Although it may be minor, it will require future investigation.

      (2) The study lacks a broader discussion to include comparisons with existing studies on GnRH neurone activity and pulsatility and highlight how the findings of this study align with or differ from previous research and what novel contributions are made.

      The Reviewer fails to recognise that these are the first recordings of GnRH neurons in vivo. There are no prior studies for comparison. We have noted the only other in vivo study (undertaken by ourselves) many years ago in anaesthetized mice. It was never expected that electrophysiological recordings of GnRH neurons in acute brain slices (by ourselves and others) would reflect their activity in vivo. Now that we know this to be the case, it would be churlish to point this out explicitly. We have made some modifications to the Discussion by comparing the present data more thoroughly with other in vivo GnRH secretion and kisspeptin neuron activity studies.

      (3) The authors aimed to characterise the oscillatory activity of GnRH neurons and successfully identified distinct oscillatory patterns. The results support the conclusion that GnRH neurons exhibit complex oscillatory behaviours, which are critical for understanding their role in reproductive physiology. However, it has not been made clear what exactly the authors mean by "multi-dimensional oscillatory patterns" and how has this been shown.

      The study shows three types of GnRH neuron activity; two of which would be classified as oscillatory in nature and these show different temporal dimensions.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors report GCaMP fiber-photometry recordings from the GnRH neuron distal projections in the ventral arcuate nucleus. The recordings are taken from intact, male and female, freely behaving mice. They report three patterns of neuronal activity:

      (1) Abrupt increases in the Ca2+ signals that are perfectly correlated with LH pulses.

      (2) A gradual, yet fluctuating (with a slow ultradian frequency), increase in activity, which is associated with the onset of the LH surge in female animals.

      (3) Clustered (high frequency) baseline activity in both female and male animals.

      Strengths:

      The GCaMP fiber-photometry recordings reported here are the first direct recordings from GnRH neurones in vivo. These recordings have uncovered a rich repertoire of activity suggesting the integration of distinct "surge" and "pulse" generation signals, and an ultradian rhythm during the onset of the surge.

      Weaknesses:

      The data analysis method used for the characterisation of the ultradian rhythm observed during the onset of the surge is not detailed enough. Hence, I'm left wondering whether this rhythm is in any way correlated with the clusters of activity observed during the rest of the cycle and which have similar duration.

      We have provided further information on the characterisation of the ultradian rhythm observed at the time of the surge. Whether this is related to the clustered basal activity is an interesting point but very difficult to resolve. We note that the “basal” and “surge” ultradian oscillations have very different durations of ~30 and ~80 min, suggesting that they may be independent phenomena. However, the only way to really exclude a similar genesis will be to establish the origin of each type of oscillatory activity. Preliminary data in the lab show that the RP3V kisspeptin neurons exhibit an identical pattern of ultradian oscillation at the time of the surge, leading us to suspect that the surge oscillation is driven by this input. As noted in the Discussion, it is presently difficult to determine where the high basal activity originates.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Evidence of Multi-Dimensional Oscillatory Patterns: The manuscript presents data showing the oscillatory activity of GnRH neurones with distinct frequency and amplitude characteristics. The analysis includes statistical tests that illustrate the variability in neuronal firing patterns. However, the multi-dimensional nature of this activity has not been demonstrated. It is not clear what is meant by "dimension" with regard to the calcium recordings (oscillatory activity). If the authors refer to the frequency content of the calcium signal then a proper Fourier or Wavelet analysis should be carried out to characterise the multiple frequencies present in the calcium dynamics in male mice and during various stages of the cycle in female mice

      The study shows three types of GnRH neuron activity, two of which would be classified as oscillatory in nature. One occurs for ~10 min every hour or so and the other occurs for ~12 hours once every 4-5 days. No further analysis is required to distinguish between the two or to claim that they are different, i.e. multidimensional.

      (2) Data Interpretation: Expand the discussion on the physiological relevance of the identified oscillatory patterns. Specifically, explore how these patterns might influence GnRH pulsatility, hormone secretion dynamics, and reproductive cycles.

      The functional roles of pulsatile and surge patterns of GnRH release are extremely well established. We have found perfect correlations between GnRH neuron dendron GCaMP activity and LH pulses as well as the LH surge clearly indicating the function of these activity patterns. We do not know the functional role of the clustered high-frequency basal activity that we have discovered and, as noted in the Discussion, are unsure of its physiological importance. Although it may be minor, it will require future investigation.

      (3) Literature Contextualisation: Broaden the discussion to include comparisons with existing studies on GnRH neuron activity and pulsatility. Highlight how the findings of this study align with or differ from previous research and what novel contributions are made.

      The Reviewer fails to recognise that these are the first recordings of GnRH neurons in vivo. There are no prior studies for comparison. We have noted the only other in vivo study (undertaken by ourselves) many years ago in anaesthetized mice. It would be naive to expect that electrophysiological recordings of GnRH neurons in acute brain slices (by ourselves and others) would reflect their activity in vivo. Now that we know this to be the case, it would be churlish to point this out explicitly. We have made some modifications to the Discussion by comparing the present data more thoroughly with other in vivo GnRH secretion and kisspeptin neuron activity studies.

      (4) Future Directions: Suggest potential follow-up experiments to explore the regulatory mechanisms underlying the observed oscillatory patterns. This could include investigating the role of neurotransmitters, hormonal feedback mechanisms, and other factors that might influence GnRH neuron activity.

      By addressing these recommendations, the authors can further strengthen their manuscript and enhance its impact on the field.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions:

      (1) The authors might want to analyse their inter-peak interval data by fitting them to a simple parametric statistical model (the gamma distribution would be a good choice to capture the skewness of these data). This way they would be able to describe the observed variability and, if the fits are not good, back up their claims "The dSEs occurred on average ... and showed no clear modal distribution pattern (Fig. 2D)".

      Thank you for the suggestion. We have carried out a Shapiro-Wilk test on the male inter-peak interval distribution and found a W value of 0.87 and a P value <0.0001, providing strong evidence that the data are not normally distributed. The skewness and kurtosis values are 1.39 and 1.81, respectively, indicating that the distribution is right-skewed and platykurtic, that is, less peaked and more spread out than the normal distribution (which has a kurtosis of 3). This has now been added to the manuscript.
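
      As an illustrative sketch only (not the analysis code used in the study), the moment-based skewness and kurtosis quoted above, together with a method-of-moments gamma fit of the kind the reviewer suggests, can be computed as follows. The function names and the interval values are hypothetical. Kurtosis is assumed here to follow the non-excess (Pearson) convention, for which a normal distribution gives 3, and population (divide-by-n) moments are used; statistical packages may apply other conventions or bias corrections.

```python
def sample_moments(values):
    """Return mean, variance, skewness and (non-excess) kurtosis.

    Uses population moments (divide by n); packages that apply
    bias corrections will give slightly different numbers.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    return mean, m2, skewness, kurtosis


def gamma_method_of_moments(values):
    """Estimate gamma shape and scale by matching mean and variance."""
    mean, variance, _, _ = sample_moments(values)
    shape = mean ** 2 / variance
    scale = variance / mean
    return shape, scale


# Hypothetical inter-peak intervals in minutes (not data from the study).
intervals = [35, 40, 42, 45, 50, 55, 60, 75, 90, 140]
mean, variance, skewness, kurtosis = sample_moments(intervals)
shape, scale = gamma_method_of_moments(intervals)
print(f"skewness={skewness:.2f} kurtosis={kurtosis:.2f}")
print(f"gamma shape={shape:.2f} scale={scale:.2f}")
```

      A positive skewness indicates a long right tail. The fitted gamma parameters could then be compared with the interval histogram to judge whether a single-mode model describes the data.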

      (2) If I understand correctly, in Figure 3D, inter-peak intervals from all 4 stages of the estrus cycle are pooled together. It would also be interesting if the authors gave the interval histograms for the different stages of the cycle separately.

      We have now plotted the inter-peak interval distribution histograms for each individual cycle next to the example traces in Figure 3. The descriptions of the distribution pattern are also updated in the figure legends.

      (3) In Figure 3C, one can see the mean interval for different animals (as open circles), is that right? Is the statistical test run on these animals mean, or is the entire dSEs dataset used? In any case, it's not clear to the reader how variable intervals are in individual recordings from each animal. Could the authors add this information (could be easily added in the figure caption)?

      The reviewer is correct that each open circle is the mean interval for one animal. The statistical test was run on the per-animal means. This information has now been added to the figure legend.

      (4) The authors should explain how they identify the regions (clusters) of high-frequency baseline activity, which they present in Figure 4.

      The relevant information has now been added to the Methods section under the heading ‘GCaMP6 fiber photometry and blood sampling’.

      (5) The authors should detail how to identify and characterise the ultradian rhythm they observe at the onset of the surge.

      The relevant information has now been added to the Methods section under the heading ‘GCaMP6 fiber photometry and blood sampling’.

      (6) The author could perform some kind of wavelet-type analysis to quantify and analyse how the frequency content of the observed Ca2+ signal changes over the cycle. From their current analysis, I am not sure whether the ultradian oscillations they observe during the surge are related to the low-activity cluster events they observe during the other stages of the cycle.

      This is an interesting point but very difficult to resolve. We note that the “basal” and “surge” ultradian oscillations have very different durations of ~30 and ~80 min, suggesting that they may be independent phenomena. However, the only way to really exclude a similar genesis will be to establish the origin of each type of oscillatory activity. Preliminary data in the lab show that the RP3V kisspeptin neurons exhibit an identical pattern of ultradian oscillation at the time of the surge, leading us to suspect that the surge oscillation is driven by this input. As noted in the Discussion, it is presently difficult to determine where the high basal activity originates.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer’s comments

      We are most grateful for the opportunity to address the reviewer comments. Point-by-point responses are presented below.

      Overall, the paper has several strengths, including leveraging large-scale, multi-modal datasets, using computationally reasonable tools, and having an in-depth discussion of the significant results.

      We thank the reviewer for the very supportive comments.

      Based on the comments and questions, we have grouped the concerns and corresponding responses into three categories.

      (1) The scope and data selection

      The results are somewhat inconclusive or not validated.

      The overall results are carefully designed, but most of the results are descriptive. While the authors are able to find additional evidence either from the literature or explain the results with their existing knowledge, none of the results have been biologically validated. Especially, the last three result sections (signaling pathways, eQTLs, and TF binding) further extended their findings, but the authors did not put the major results into any of the figures in the main text.

      The goal of this manuscript is to provide a list of putative childhood obesity target genes to yield new insights and help drive further experimentation. Moreover, the outputs from signaling pathways, eQTLs, and TF binding, although noteworthy and supportive of our method, were not particularly novel. In our manuscript we placed our focus on the novel findings from the analyses. We did, however, report the part of the eQTLs analysis concerning ADCY3, which brought new insight to the pathology of obesity, in Figure 4C.

      The manuscript would benefit from an explanation regarding the rationale behind the selection of the 57 human cell types analyzed. It is essential to clarify whether these cell types have unique functions or relevance to childhood development and obesity.

      We elected to comprehensively investigate the GWAS-informed cellular underpinnings of childhood development and obesity. By including a diverse range of cell types from different tissues and organs, we sought to capture the multifaceted nature of cellular contributions to obesity-related mechanisms, and open new avenues for targeted therapeutic interventions.

      There are clearly cell types that are already established as being key to the pathogenesis of obesity when dysregulated: adipocytes for energy storage, immune cell types regulating inflammation and metabolic homeostasis, hepatocytes regulating lipid metabolism, pancreatic cell types intricately involved in glucose and lipid metabolism, skeletal muscle for glucose uptake and metabolism, and brain cell types in the regulation of appetite, energy expenditure, and metabolic homeostasis.

      While it is practical to focus on cell types already proven to be associated with or relevant to obesity, this approach has its limitations. It confines our understanding to established knowledge and rules out the potential for discovering novel insights from new cellular mechanisms or pathways that could play significant roles in the pathogenesis of obesity. Therefore, it was essential to reflect known biology against the unexplored cell types to expand our overall understanding and potentially identify innovative targets for treatment or prevention.

      I wonder whether the used epigenome datasets are all from children. Although the authors use literature to support that body weight and obesity remain stable from infancy to adulthood, it remains uncertain whether epigenomic data from other life stages might overlook significant genetic variants that uniquely contribute to childhood obesity.

      The datasets utilized in our study were derived from a combination of sources, both pediatric and adult. We recognize that epigenetic profiles can vary across different life stages, but our principal effort was to characterize susceptibility before disease onset.

      Given that the GTEx tissue samples are derived from adult donors, there appears to be a mismatch with the study's focus on childhood obesity. If possible, identifying alternative validation strategies or datasets more closely related to the pediatric population could strengthen the study's findings.

      We thank the reviewer for raising this important point. We acknowledge that the GTEx tissue samples are derived from adult donors, which might not perfectly align with the study's focus on childhood obesity. The ideal strategy would be a longitudinal design that follows individuals from childhood into adulthood to bridge the gap between pediatric and adult data, offering systematic insights into how early-life epigenetic markers influence obesity later in life. In future work, we aim to carry out such efforts, which will represent a substantial time and financial commitment.

      Along the same lines, the Developmental Genotype-Tissue Expression (dGTEx) Project is a new effort to study development-specific genetic effects on gene expression at 4 developmental windows spanning from infant to post-puberty (0-18 years). Donor recruitment began in August 2023 and remains ongoing. Tissue characterization and data production are underway. We hope that with the establishment of this resource, our future research in the field of pediatric health will be further enhanced.

      Figure 1B: in subplots c and d, the results are either from Hi-C or capture-C. Although the authors use different colors to denote them, I cannot help wondering how much difference between Hi-C and capture-C brings in. Did the authors explore the difference between the Hi-C and capture-C?

      Thank you for your comment. It is not within the scope of our paper to explore the differences between the Hi-C and Capture-C methods. In the context of our study, both methods serve the same purpose of detecting chromatin loops that bring putative enhancers to sometimes genomically distant gene promoters. Consequently, our focus was on utilizing these methods to identify relevant chromatin interactions rather than comparing their technical differences.

      (2) Details on defining different categories of the regions of interest

      Some technical details are missing.

      While the authors described all of their analysis steps, a lot of the time, they did not mention the motivation. Sometimes, the details were also omitted.

      We have added a section to the revision to address the rationale behind the different OCR categories.

      Line 129: should "-1,500/+500bp" be "-500/+500bp"?

      A gene promoter was defined as a region 1,500 bases upstream to 500 bases downstream of the TSS. Most transcription factor binding sites are distributed upstream (5’) of the TSS, and the assembly of transcription machinery occurs up to 1,000 bases 5’ of the TSS. Given our interest in SNPs that can potentially disrupt transcription factor binding, this defined promoter length allowed us to capture such SNPs in our analyses.
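
      The promoter definition above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, coordinates are treated as 1-based and inclusive, and "upstream" is taken relative to the direction of transcription, so the window is mirrored for minus-strand genes.

```python
def promoter_window(tss, strand, upstream=1500, downstream=500):
    """Return (start, end) of the promoter around a TSS.

    Assumes 1-based inclusive coordinates; 'upstream' follows the
    direction of transcription, so it lies at higher coordinates
    for genes on the minus strand.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        return tss - downstream, tss + upstream
    raise ValueError("strand must be '+' or '-'")


def snp_in_promoter(snp_pos, tss, strand):
    """True if a SNP position falls inside the -1,500/+500 window."""
    start, end = promoter_window(tss, strand)
    return start <= snp_pos <= end


# Hypothetical positions, for illustration only.
print(promoter_window(10000, "+"))
print(promoter_window(10000, "-"))
print(snp_in_promoter(9000, 10000, "+"))
```

      A SNP 1,000 bases 5’ of a plus-strand TSS falls inside this window, whereas it would fall outside a symmetric -500/+500 window.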

      How did the authors define a contact region?

      Chromatin contact regions identified by Hi-C or Capture-C assays are always reported as pairs of chromatin regions. The Supplementary eMethods provide details on the method of processing and interaction calling from the Hi-C and Capture-C data.

      The manuscript would benefit from a detailed explanation of the methods used to define cREs, particularly the process of intersecting OCRs with chromatin conformation data. The current description does not fully clarify how the cREs are defined.

      In the result section titled "Consistency and diversity of childhood obesity proxy variants mapped to cREs", the authors introduced the different types of cREs in the context of open chromatin regions and chromatin contact regions, and TSS. Figure 2A is helpful in some way, but more explanation is definitely needed. For example, it seems that the authors introduced three chromatin contacts on purpose, but I did not quite get the overall motivation.

      We apologize for the confusion. Our definition of cREs is consistent throughout the study. To aid the reader, the original Figure 2A now appears as Figure 1A in the revision.

      The 3 representative chromatin loops illustrate different ways the chromatin contact regions (pairs of blue regions under blue arcs) can overlap with OCRs (yellow regions under yellow triangles – ATAC peaks) and gene promoters.

      (1) The first chromatin loop has one contact region that overlaps with OCRs at one end and with the gene promoter at the other. This satisfies the formation of cREs; thus, the area under the yellow ATAC-peak triangle is green.

      (2) The second loop overlaps with an OCR at only one end, and there is no gene promoter nearby, so it does not qualify as a cRE.

      (3) The third chromatin loop has OCR and promoter overlapping at one end. We defined this as a special cRE formation; thus, the area under the yellow ATAC-peak triangle is green.

      To avoid further confusion for the reader, we have eliminated this variation in the new illustration for the revised manuscript.
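
      The three cases above can be expressed as a small interval-overlap rule. The sketch below is a hedged illustration, not the authors' implementation: the tuple layout, the half-open coordinates, and the labels "distal_cre" and "promoter_cre" are assumptions introduced here.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_ocr(ocr, loops, promoters):
    """Classify one open chromatin region (OCR) against chromatin loops.

    loops: list of (anchor_a, anchor_b) contact-region pairs.
    Returns "promoter_cre" if the OCR sits in a contact region and overlaps
    a promoter itself (case 3), "distal_cre" if the opposite contact region
    overlaps a promoter (case 1), and None otherwise (case 2).
    """
    for anchor_a, anchor_b in loops:
        for near, far in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
            if not overlaps(ocr, near):
                continue
            if any(overlaps(ocr, p) for p in promoters):
                return "promoter_cre"
            if any(overlaps(far, p) for p in promoters):
                return "distal_cre"
    return None
```

      A proxy SNP would then be retained only if it falls inside an OCR for which this function returns a label other than None.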

      Figure 2A: The authors used triangles filled differently to denote different types of cREs but I wonder what the height of the triangles implies. Please specify.

      The triangles are illustrations for ATAC-seq peaks, and the yellow chromatin regions under them are OCRs. The different heights of ATAC-seq peaks are usually quantified as intensity values for OCRs. However, in our study, when an ATAC-seq peak passed the significance threshold from the data pipeline, we only considered their locations, regardless of their intensities. To avoid further confusion for the reader, we have eliminated this variation in the new illustration for the revised manuscript.

      Figure 1B-c. the title should be "OCRs at putative cREs". Similarly in Figure 1B-d.

      cREs are a subset of OCRs.

      - In the section "Cell type specific partitioned heritability", the authors used "4 defined sets of input genomic regions". Are you corresponding to the four types of regions in Figure 2A? 

      The original Figure 2A now appears as Figure 1A in the revision and has been modified to show how we define OCRs and cREs.

      It seems that the authors described the 771 proxies in "Genetic loci included in variant-to-genes mapping" (ln 154), and then somehow narrowed down from 771 to 94 (according to ln 199) because they are cREs. It would be great if the authors could describe the selection procedure together, rather than isolated, which made it quite difficult to understand.

      In the Methods section entitled “Genetic loci included in variant-to-genes mapping," we described the process of LD expansion to include 771 proxies from 19 sentinel obesity-significantly associated signals. Not all of these proxies are located within our defined cREs. Figure 2B, now Figure 2A in the revision, illustrates different proportions of these proxies located within different types of regions, reducing the proxy list to 94 located within our defined cREs.

      Figure 2. What's the difference between the 771 and 758 proxies?

      13 out of 771 proxies did not fall within any defined regions. The remaining 758 were located within contact regions of at least one cell type regardless of chromatin state.

      (3) Typos

      In the paragraph "Childhood obesity GWAS summary statistics", the authors may want to describe the case/control numbers in two stages differently. "in stage 1" and "921 cases" together made me think "1,921" is one number.

      This has been amended in the revision.

      Hi-C technology should be spelled as Hi-C. In many places, it is misspelled as "hi-C". In Figure 1, the authors used "hiC" in the legend. Similarly, Capture-C was sometimes spelled as "capture-C" in the manuscript.

      At the end of the fifth row in the second paragraph of the Introduction section: "exisit" should be "exist".

      In Figure 2A: "Within open chromatin contract region" should be "Within open chromatin contact region".

      These typos and terminology inconsistencies have been amended in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Zhang et al. report a genetic screen to identify novel transcriptional regulators that could coordinate mitochondrial biogenesis. They performed an RNAi-based modifier screen wherein they systematically knocked down all known transcription factors in the developing Drosophila eye, which was already sensitised and had decreased mitochondrial DNA content. Through this screen, they identify CG1603 as a potential regulator of mitochondrial content. They show that protein levels of mitochondrial proteins like TFAM, SDHA, and other mitochondrial proteins and mtDNA content are downregulated in CG1603 mutants. RNA-Seq and ChIP-Seq further show that CG1603 binds to the promoter regions of several known nuclear-encoded mitochondrial genes and regulates their expression. Finally, they also identified YL-1 as an upstream regulator of CG1603. Overall, it is a very important study as our understanding of the regulation of mitochondrial biogenesis remains limited across metazoans. Most studies have focused on PGC-1α as a master regulator of mitochondrial biogenesis, which seems to be a context-dependent regulator. Also, PGC-1α-mediated regulation could not explain the regulation of the 1100 genes that are required for mitochondrial biogenesis. Therefore, identifying a new regulator is crucial for understanding the overall regulation of mitochondrial biogenesis.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors aim to identify the nuclear genome-encoded transcription factors that regulate mtDNA maintenance and mitochondrial biogenesis. They started with an RNAi screening in developing Drosophila eyes with reduced mtDNA content and identified a number of putative candidate genes. Subsequently, using ChIP-seq data, they built a potential regulatory network that could govern mitochondrial biogenesis. Next, they focused on a candidate gene, CG1603, for further characterization. Based on the expression of different markers, such as TFAM and SDHA, in the RNAi and OE clones in the midgut cells, they argue that CG1603 promotes mitochondrial biogenesis and the expression of ETC complex genes. Then, they used a mutant of CG1603 and showed that both mtDNA levels and mitochondrial protein levels were reduced. Using clonal analyses, they further show a reduction in mitochondrial biogenesis and membrane potential upon loss of CG1603. They made a reporter line of CG1603, showed that the protein is localized to the mitochondria, and binds to polytene chromosomes in the salivary gland. Based on the RNA-seq results from the mutants and the ChIP data, the authors argue that the nucleus-encoded mitochondrial genes that are downregulated >2 folds in the CG1603 mutants and that are bound by CG1603 are related to ETC biogenesis. Finally, they show that YL-1, another candidate in the network, is an upstream regulator of CG1603.

      Strengths:

      This is a valuable study, which identifies a potential regulator and a network of nucleus-encoded transcription factors that regulate mitochondrial biogenesis. Through in-vivo and in-vitro experimental evidence, the authors identify the role of CG1603 in this process. The screening strategy was smart, and the follow-up experiments were nicely executed.

      Weaknesses:

      Some additional experiments showing the effects of CG1603 loss on ETC integrity and functionality would strengthen the work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Fig 3F: SDHA levels are severely downregulated in CG1603 RNAi clones. Therefore, estimating mitochondrial volume based on the SDHA reporter might be misleading. I suggest the authors perform this experiment with an independent marker of mitochondria, like mitoTracker Green or other dyes. I also suggest checking for mitochondrial number/quantity/size by electron microscopy.

      Even though it was downregulated, the SDHA-mNeon signal in EC clones clearly outlined mitochondria and the overall mitochondrial network, allowing us to quantify the total mitochondrial volume. Examining mitochondrial number/quantity/size by electron microscopy would further strengthen this statement, and we will consider it in future studies.

      (2) The authors might comment on whether there was any decrease in the volume of CG1603i clone cells. And whether this was taken into account while normalising the mitochondrial volume.

      The size/volume of CG1603i clone cells was indeed decreased, and this was taken into account when normalizing the mitochondrial volume. We clarified this point in the Methods section (page 18, line 511-512 (revised version page 18, line 515-517)).

      (3) Line 230-234: Collectively, these results demonstrate that CG1603 promotes the expression of both nuclear and mtDNA-encoded ETC genes and boosts mitochondrial biogenesis. CG1603 RNAi produced very few EC clones, consistent with the notion that mitochondrial respiration is necessary for ISCs differentiation.

      (4) Quantifying the number of EC clone cells observed might help support this statement.

      This is a great point. We quantified the number of EC clone cells, and the data was included in the revised Figure 3—figure supplement.

      (5) Figure 5: The intensity of MTGreen in CG1603 clones seems comparable to that in control cells, at least visually. Since the authors claim a reduction in mitochondrial volume in CG1603 mutants, it is crucial to estimate mitochondrial volume based on MTGreen intensity in mutant and control cells.

      There are two types of clones shown in Figure 5: germ cell clones, which include all 16 germ cells in the same egg chamber, and follicle cell clones. We highlight these two types of clones in the revised Figure 5 to emphasize this point. The total MTGreen intensity in both germ cell and follicle cell CG1603PBac clones was reduced compared to germ cells in adjacent egg chambers and adjacent follicle cells in the same egg chamber, respectively. We included the quantification of MTGreen intensity in the revised Figure 5—figure supplement C. Examining mitochondrial number/quantity/size by electron microscopy would further strengthen this statement, and we will consider it in future studies.

      (6) Figure 8: It would be interesting to know what happens to steady-state mtDNA levels during YL-1 knockdown. If decreased, could overexpressing CG1603 in YL-1 knockdown cells rescue the phenotype?

      YL-1 knockdown reduced steady-state mtDNA levels in eyes, and overexpressing CG1603 restored mtDNA level in YL-1 knockdown cells. These results are included in the revised Figure 8-figure supplement C.

      Minor comments:

      (7) The paper is lucidly written, but there are minor typos in several places. The authors might proofread it to remove these errors.

      We corrected typos and other minor errors in the manuscript.

      (8) Quantification for Figure 8 - Supplementary needs to be included.

      We performed the quantification, and the result is shown in Figure 8—figure supplement B.

      Reviewer #2 (Recommendations For The Authors):

      (1) In lines 275-276 and Figure 6E, the authors mention that more than 800 nuclear-encoded mitochondrial genes were reduced by >2-folds in CG1603 mutants. One gene related to mitochondrial replication and three genes related to mtDNA transcription were among them. Was TFAM one of these candidates? What were the reduction levels of TFAM mRNA in RNA seq results? Can the author confirm it via RT-PCR?

      In RNA-seq analyses, TFAM was differentially expressed with a log2 fold-change of -0.74, corresponding to a ~1.6-fold decrease, and hence was not among the candidates that were down-regulated more than two-fold in the CG1603 mutant. Per the reviewer’s suggestion, we carried out RT-PCR and found that TFAM was downregulated about 2-fold in the CG1603 mutant. We included this result in the revised Figure 6F and listed all differentially expressed genes in Supplementary file 5a.
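
      The conversion between a log2 fold-change and a linear fold decrease used in this answer can be checked with a few lines. This is a minimal sketch; the function names and the strict "more than two-fold" cutoff at a log2 fold-change of -1 are assumptions introduced here for illustration.

```python
def fold_decrease(log2_fold_change):
    """Convert a (negative) log2 fold-change into a linear fold decrease."""
    return 2.0 ** (-log2_fold_change)


def is_down_more_than_twofold(log2_fold_change):
    """A decrease of more than two-fold means log2 fold-change below -1."""
    return log2_fold_change < -1.0


tfam_log2fc = -0.74  # value quoted in the response
print(f"TFAM fold decrease: {fold_decrease(tfam_log2fc):.2f}")
print("More than two-fold down:", is_down_more_than_twofold(tfam_log2fc))
```

      A log2 fold-change of -0.74 gives a decrease of between 1.6- and 1.7-fold, which is below the two-fold cutoff and explains why TFAM was not among the listed candidates.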

      (2) In many places, the authors argued about the role of CG1603 in ETC biogenesis. Also, the RNA-seq data shows that 64 genes related to the ETC complex were reduced by > 2-fold in CG1603 mutant. Therefore, it would be critical to expand a little on this aspect. For example, what are these genes and related to which of the ETC complex? Can the authors show the reduced levels of some of the candidate genes from each complex via RT-PCR?

      We listed all ETC genes that were down-regulated more than two-fold in the CG1603 mutant in a separate sheet in Supplementary file 5b. We further validated the reduced expression of ETC genes by RT-PCR on three randomly selected candidate genes from each complex. The result is included in the revised Figure 6F.

      (3) To make their argument solid on the role of CG1603 on ETC biogenesis, it is important to show the assembly/integrity of ETC complexes as well as the functionality/activity of the ETC complexes in CG1603 mutants.

      We purified mitochondria and assayed the assembly/integrity of three ETC complexes (Complexes I, II and IV) and their activities, using blue native PAGE analysis and in-gel activity analysis, respectively. The amounts of these three complexes and, accordingly, their activities were all markedly reduced in the CG1603 mutant compared to wild type. The result is included as Figure 4—figure supplement A.

      (4) CG1603 has already been named as cliff. Why do the authors not use this name, or alternatively propose one?

      We thank the reviewer for the note. CG1603 had not yet been named cliff when we were preparing this manuscript.

      (5) In lines 230-231, based on the TFAM-GFP and SDHA-mNG levels, the authors claim that "these results demonstrate that CG1603 promotes the expression of both nuclear and mtDNA-encoded ETC genes..." The authors may tone down this statement since it sounds overstating. It would be prudent to claim that a subset of genes are regulated by CG1603.

      We appreciate the reviewer’s suggestion. We revised the text to tone down this statement (page 8, line 201; page 9, line 229-230).

    1. Author response:

      Reviewer #1:

      Weaknesses:

      However, given that S1P is upstream NF-κB signaling, it is unclear if it offers conceptual innovations as compared to previous studies from the same team (Palazzo et al. 2020; 2022, 2023)

      We find distinct differences between the impacts of S1P- and NFκB-signaling on glial activation, neuronal differentiation of the progeny of MGPCs and neuronal survival in damaged retinas. In the current study we demonstrate that 2 consecutive daily intravitreal injections of S1P selectively activated mTor (pS6) and Jak/Stat3 (pStat3), but not MAPK (pERK1/2) signaling in Müller glia. Further, inhibition of S1P synthesis (SPHK1 inhibitor) decreased ATF3, mTor (pS6) and pSmad1/5/9 levels in activated Müller glia in damaged retinas. Inhibition of NFκB-signaling in damaged chick retinas did not impact the above-mentioned cell signaling pathways (Palazzo et al., 2020). Thus, S1P-signaling impacts cell signaling pathways in MG that are distinct from NFκB, but we cannot exclude the possibility of cross-talk between NFκB and these pathways. Further, inhibition of NFκB-signaling potently decreases numbers of dying cells and increases numbers of surviving ganglion cells (Palazzo et al., 2020). Consistent with these findings, a TNF orthologue, which presumably activates NFκB-signaling, exacerbates cell death in damaged retinas (Palazzo et al., 2020). By contrast, 5 different drugs targeting S1P-signaling had no effect on numbers of dying cells and only one S1PR1 inhibitor modestly decreased numbers of dying cells (current study). In addition, inhibition of NFκB does not influence the neurogenic potential of MGPCs in damaged chick retinas (Palazzo et al., 2020), whereas inhibition of S1P receptors (S1PR1 and S1PR3) and inhibition of S1P synthesis (SPHK1) significantly increased the differentiation of amacrine-like neurons in damaged retinas (current study). Collectively, in comparison to the effects of pro-inflammatory cytokines and NFκB-signaling, our current findings indicate that S1P-signaling through S1PR1 and S1PR3 in Müller glia has distinct effects upon cell signaling pathways, neuronal regeneration and cell survival in damaged retinas.
      We will revise text in the Discussion to better highlight these important distinctions between NFκB- and S1P-signaling.

      Reviewer #2:

      Weaknesses:

      The methodology is not very clean. A number of drugs (inhibitors/ antagonists/agonists signal modulators) are used to modulate S1P expression or signaling in the retina without evidence that these drugs are reaching the target cells. No alternative evaluation if the drugs, in fact, are effective. The drug solubility in the vehicle and in the vitreous is not provided, and how did they decide on using a single dose of each drug to have the optimal expected effect on the S1P pathway?

      Müller glia are the predominant retinal cell type that expresses S1P receptors. Consistent with these patterns of expression, we report Müller glia-specific effects of different agonists and antagonists that increase or decrease S1P-signaling. Since we compare cell-level changes within contralateral eyes wherein one retina is exposed to vehicle and the other is exposed to vehicle plus drug, it seems highly probable that the drugs are eliciting effects upon the Müller glia. It is possible that the responses we observed could have resulted from drugs acting on extra-retinal tissues, which might secondarily release factors that elicit cellular responses in Müller glia. However, this seems very unlikely given the distinct patterns of expression for different S1P receptors in Müller glia, and the outcomes of inhibiting Sphk1 or S1P lyase on retinal levels of S1P.

      For example, we provide evidence that S1PR1 and S1PR3 expression is predominant in Müller glia in the chick retina using single cell-RNA sequencing and fluorescence in situ hybridization (FISH). Thus, we expect S1PR1/3-targeting small molecule inhibitors to act directly on Müller glia, which is consistent with our read-outs of cell signaling with injections of S1P in undamaged retinas. We show that SPHK1 and SGPL1, which encode the enzymes that synthesize or degrade S1P, are expressed by different retinal cell types, including the Müller glia. The efficacy of the drugs that target SPHK1 and SGPL1 was assessed by measuring levels of S1P in the retina. By using liquid chromatography and tandem mass spectrometry (LC-MS/MS), we provide data that inhibition of S1P synthesis (inhibition of SPHK1) significantly decreased levels of S1P in normal retinas, whereas inhibition of S1P degradation (inhibition of SGPL1) increased levels of S1P in damaged retinas (Fig. 5). These data suggest that the SPHK1 inhibitor and the SGPL1 inhibitor specifically act at the intended target to influence retinal levels of S1P. Further, inhibition of SPHK1 (to decrease levels of S1P) results in decreased levels of ATF3, pS6 (mTor) and pSMAD1/5/9 in Müller glia, consistent with the notion that reduced levels of S1P in the retina impact signaling in Müller glia. Finally, we find similar cellular responses to chemically different agonists or antagonists, and we find opposite cellular responses to agonists and antagonists, which are expected to be complementary if the drugs are specifically acting at the intended targets in the retina. We will revise the Discussion to better address caveats and concerns regarding the actions and specificity of different drugs within the retina following intravitreal delivery.

      We will provide the drug solubility specifications and estimates of the initial maximum dose per eye for each drug. For chick eyes between P7 and P14, these estimates will assume a volume of about 100 µl of liquid vitreous, 800 µl gel vitreous and an average eye weight of 0.9 grams. We will revise Table 1 (pharmacological compounds) with ranges of reported in vivo ED50’s (mg/kg) for drugs and we will list the calculated initial maximum dose (mg/kg equivalent per eye). Doses were chosen based on estimates of the initial maximum ocular dose that were within the range of reported ED50’s. However, as is the case for any in vivo model system, it is difficult to predict rates of drug diffusion out of the vitreous, how quickly the drugs are cleared from the entire eye, how much of the compound enters the retina, and how quickly the drug is cleared from the retina. Accordingly, we assessed drug specificity and sites of activation by relying upon readouts of cell signaling pathways, parsed with S1P receptor expression patterns, together with measurements of retinal levels of S1P following exposure to drugs targeting enzymes that catalyze synthesis or degradation of S1P, as described above.
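
      As a rough illustration of the dose estimates described above, the calculation could look like the sketch below. The vitreous volumes (100 µl liquid, 800 µl gel) and the 0.9 g eye weight come from the response; the function names, the example injected mass, and the example ED50 range are hypothetical placeholders, not values from the study.

```python
EYE_WEIGHT_G = 0.9          # average chick eye weight (from the response)
LIQUID_VITREOUS_UL = 100.0  # approximate liquid vitreous volume
GEL_VITREOUS_UL = 800.0     # approximate gel vitreous volume


def dose_mg_per_kg(injected_ug, eye_weight_g=EYE_WEIGHT_G):
    """Initial maximum dose per eye, expressed as a mg/kg equivalent."""
    mass_mg = injected_ug / 1000.0
    weight_kg = eye_weight_g / 1000.0
    return mass_mg / weight_kg


def initial_vitreous_conc_ug_per_ml(injected_ug):
    """Initial concentration if the drug mixes evenly through the vitreous."""
    volume_ml = (LIQUID_VITREOUS_UL + GEL_VITREOUS_UL) / 1000.0
    return injected_ug / volume_ml


def within_reported_range(dose, ed50_low, ed50_high):
    """Check a mg/kg-equivalent dose against a reported ED50 range."""
    return ed50_low <= dose <= ed50_high


injected_ug = 0.9  # hypothetical injected mass, for illustration only
print(dose_mg_per_kg(injected_ug), "mg/kg equivalent per eye")
print(initial_vitreous_conc_ug_per_ml(injected_ug), "ug/ml in vitreous")
```

      These are upper-bound estimates at the moment of injection. They do not model diffusion out of the vitreous or clearance from the eye, which the response notes are difficult to predict.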

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Shrestha et al report an investigation of mechanisms underlying gustatory preference for carboxylic acids in Drosophila. They begin with a screen of selected IR mutants, identifying 5 candidates - 2 IR co-receptors and 3 other IRs - whose loss of function causes defects in feeding preference for one or more of the three tested carboxylic acids. The requirement for IR51b, IR94a, and IR94h in carboxylic acid responses is evaluated in more detail using behavior, electrophysiology (labellar sensilla), and calcium imaging (pharyngeal neurons). The behavioral valence of IR94a and IR94h neurons is assessed using optogenetics. Overall the study uses a variety of approaches to test and validate the requirement of IRs in pharyngeal carboxylic acid taste.

      Strengths:

      The involvement of the identified IRs in gustatory responses to carboxylic acids is very clear from this study. The authors use mutants and transgenic rescue experiments and evaluate outcomes using electrophysiology, behavior, and imaging. Complementary approaches of loss-of-function and artificial activation support the main conclusion that the identified pharyngeal neurons sense carboxylic acids and convey a positive behavioral valence.

      Weaknesses:

      Some aspects of expression analysis and calcium imaging need to be clarified to better support the conclusions.

      (1) The conclusion of two parallel IR-mediated pathways rests on expression analysis of Ir94a-GAL4 and Ir94h-GAL4 lines and the observation that Ir51b expression driven by either can rescue the Ir51b mutant phenotype. However, the expression analysis is not as rigorous as it needs to be for such a conclusion. Prior work found co-expression of Ir94a and Ir94h in the LSO. Here, the co-expression of the two drivers has not been examined, and Ir94a-GAL4 does not appear to be expressed in the LSO. Given the challenges in validating expression patterns in pharyngeal organs, the possibility that the drivers do not entirely capture endogenous expression cannot be ruled out. Rescue experiments using feeding preference or single-cell imaging don't suffice as validation. Plus, the expression of Ir51b could not be defined.

      Based on current literature, Ir94a and Ir94h exhibit distinct expression patterns localized to different sensory regions. Specifically, Ir94a is primarily expressed in the V5 region of the VCSO, where it co-localizes with Ir94c-GAL4 (Chen et al., 2017). Conversely, Ir94h is found in the L7-7 sensilla of the LSO, where it co-expresses with Ir94f, and also within the V2 cells of the VCSO. Notably, the projections of Ir94a and Ir94h into the dorso-anterior subesophageal ganglion suggest divergent expression patterns rather than co-expression in the pharyngeal regions (Koh et al., 2014). Regarding co-expression of Ir94a and Ir94h in the LSO, we did not find any evidence to support this claim. Our data reinforce this view, showing that Ir94a-GAL4 expression is limited to the VCSO, while Ir94h-GAL4 is present in both the LSO and VCSO. Thus, the notion of co-expression of Ir94a and Ir94h in the LSO is not substantiated by current evidence.

      As the reviewer suggested, it is possible that the GAL4 drivers utilized may not fully reflect the endogenous expression of these receptors. Despite this limitation, our behavioral, expression, and physiological analyses strongly suggest that Ir94a and Ir94h are located in distinct regions, supporting a model of two parallel IR-mediated pathways operating within the sensory system.

      In addition, RT-PCR analysis confirmed the presence of Ir51b. However, due to methodological constraints, we were unable to conduct cell-type-specific expression studies using Ir51b-GAL4. This limitation, which we have acknowledged in the manuscript, does not detract from our core findings but highlights an area for future research. Further studies utilizing cell-specific expression analysis and co-expression studies with additional drivers could offer more definitive insights into IR51b’s functional role and its interactions within broader IR-mediated pathways.

      (2) The description of methods and results for the ex vivo calcium imaging is not satisfactory. Details about which cells are being analyzed, and in which organs are not included. No solvent stimulus is tested. The temporal dynamics of the responses are not presented. Movies of the imaging are not included as supplementary information - it would be important to visualize those with what was considered modest movement.

      We appreciate this valuable feedback. As discussed above, Ir94h is specifically expressed in the L7-7 sensilla of the LSO, while Ir94a is expressed in the V2 cells of the VCSO. This evidence led us to focus specifically on these cells in our calcium imaging study to ensure accuracy and relevance. In our experiments, adult hemolymph solution (AHL) (108 mM NaCl, 5 mM KCl, 8.2 mM MgCl2, 2 mM CaCl2, 4 mM NaHCO3, 1 mM NaH2PO4, 5 mM HEPES, pH 7.5) was used as the solvent and applied as a pre-stimulus (as mentioned in the Methods section). During this phase, we observed no changes in fluorescence, indicating that AHL itself did not influence the responses. Fluorescence changes occurred only when the test chemical, dissolved in AHL, was introduced. To further confirm that AHL had no impact on the results, we conducted continuous recordings with AHL alone before beginning our main experiments, and these trials confirmed the absence of fluorescence alterations. We have included the temporal dynamics and supplementary video recordings to provide a more comprehensive understanding of our findings.
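
      One common way to express such fluorescence changes relative to a solvent-only baseline is ΔF/F. The sketch below is an assumption-laden illustration: the response does not state how fluorescence changes were quantified, and the function name, the baseline definition (mean of the pre-stimulus AHL frames), and the example trace are all hypothetical.

```python
def delta_f_over_f(trace, n_baseline_frames):
    """Return (F - F0) / F0 for each frame of a fluorescence trace.

    F0 is taken as the mean of the first n_baseline_frames, assumed here
    to be recorded during the solvent-only (AHL) pre-stimulus phase.
    """
    if n_baseline_frames <= 0 or n_baseline_frames > len(trace):
        raise ValueError("n_baseline_frames must be within the trace length")
    f0 = sum(trace[:n_baseline_frames]) / n_baseline_frames
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero")
    return [(f - f0) / f0 for f in trace]


# Hypothetical trace: three AHL-only frames, then tastant applied in AHL.
example_trace = [100.0, 100.0, 100.0, 150.0, 200.0]
print(delta_f_over_f(example_trace, n_baseline_frames=3))
```

      Plotting the returned values against frame time gives the temporal dynamics of the response, and a flat trace during the baseline frames corresponds to the absence of a solvent effect.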

      (3) The observed differences in phenotypes of Ir25a and Ir76b mutants are intriguing, as are those between the co-receptor mutants and Ir51b, Ir94a, and Ir94h, but have not been sufficiently considered. Prior studies have also found roles for other response modes (OFF response), other IRs and GRs, and other organs (labellum, tarsi) in behavioral responses to carboxylic acids. Overall, the authors' model may be overly simplistic, and the discussion does not do justice to how their model reconciles with the body of work that already exists.

      Stanley et al. (2021) reported that the gustatory detection of lactic acid requires both IRs and GRs functioning together. Specifically, they found that IR25a mediates the onset peak response (ON response) to lactic acid, while GRs dampen this response and contribute to a removal peak (OFF response). Interestingly, in Ir25a mutants, a small onset peak still occurred, while Gr64a-f mutants showed an enhanced onset, suggesting that IRs and GRs interact dynamically to modulate taste responses.

      In our previous work, we also observed the role of sweet GRs, in addition to Ir25a and Ir76b, in detecting carboxylic acids in the labellum (Shrestha et al., 2021). This raises the possibility of a similar interplay with carboxylic acids in our current study, where different IRs may contribute to distinct aspects of sensory responses in the pharynx, leading to the phenotypic differences we observed. Moreover, Chen et al. (2017) demonstrated that sour-sensing neurons in the tarsi express both IR76b and IR25a and specifically respond to carboxylic and inorganic acids without reacting to sweet or bitter compounds. This finding points to a specialized role for these receptors in sour detection and suggests a coordinated response involving multiple sensory organs—such as the labellum, tarsi, and pharynx.

      The phenotypic differences observed in our mutants align with a more integrated model of carboxylic acid detection, in which multiple receptors and sensory organs contribute to the overall behavioral response. This supports the idea that our current model offers a more detailed understanding of how different carboxylic acids are detected and processed by the gustatory system.

      Reviewer #2 (Public review):

      Shrestha et al investigated the role of IR receptors in the detection of 3 carboxylic acids in adult Drosophila. A low concentration of either of these carboxylic acids added to 2 mM sucrose (1% lactic acid (LA), citric acid (CA), or glycolic acid (GA)) stimulates the consumption of adult flies in choice conditions. The authors use this behavioral test to screen the impact of mutations within 33 receptors belonging to the IR family, a large family of receptors derived from glutamate receptors and expressed both in the olfactory and gustatory sensilla of insects. Within the panel of mutants tested, they observed that 3 receptors (IR25a, IR51b, and IR76b) impaired the detection of LA, CA, and GA, and that 2 others impacted the detection of CA and GA (IR94a and IR94h). Interestingly, impairing IR51b, IR94a, and IR94h did not affect the electrophysiological responses of external gustatory sensilla to LA, CA, and GA. Thanks to the use of GAL4 strains associated with these receptors and thanks to the use of poxn mutants (which do not develop external gustatory sensilla but still have functional internal receptors), they show evidence that IR94a and IR94h are only expressed in two clusters of gustatory neurons of the pharynx, respectively in the VCSO (ventral cibarial sense organ) and in the VCSO + LSO (labral sense organ). As for IR51b, the GAL4 approach was not successful but RT-PCR made on different parts of the insect showed an expression both in the pharyngeal organs and in peripheral receptors. These main findings are then complemented by a host of additional experiments meant to better understand the respective roles of IR94a and IR94h, by using optogenetics and brain calcium imaging using GCamp6. They also report a failed attempt to co-express IR51b, IR94a, and IR94h into external receptors, a co-expression which did not confer the capability of bitter-sensitive cells (expressing GR33a-GAL4) to detect either of the carboxylic acids. 
      These data complete and expand previous observations made by this group and others, and point to 2 new IR receptors which show an unsuspected specific expression, in organs that still remain difficult to study.

      The conclusions of this paper are supported by the data presented, but it remains difficult to make general conclusions as concerns the mechanisms by which carboxylic acids are detected.

      (1) All experiments were done with 1% of carboxylic acids. What is the dose dependency of the behavioral responses to these acids, and is it conceivable that other receptors are involved at other concentrations?

      In our study, we conducted experiments to examine the dose dependency of behavioral responses to carboxylic acids, with results presented in Supplementary Figure 1. We found that lower concentrations of carboxylic acids are perceived as attractive, while higher concentrations are aversive. This differential response suggests that the receptors identified in our study are primarily tuned to detect low concentrations of these acids. Since higher concentrations elicited aversive responses, it is plausible that additional receptors, beyond the scope of our study, may be involved in sensing these higher concentrations. These receptors could be part of other gustatory receptor neurons that respond specifically to increased acid levels, as fruit flies tend to avoid higher concentrations. We propose that future research could investigate these alternative pathways to gain a complete understanding of the behavioral responses to carboxylic acids. In summary, our findings suggest that specific receptors are involved in detecting low concentrations, while distinct receptor pathways—possibly mediated by other GRNs—may regulate responses to higher concentrations.

      (2) One result needs to be better discussed and hypotheses proposed - which is why the mutations of most receptors lead to a loss of detection (mutant flies become incapable of detecting the acid) while mutations in IR94a and IR94h make CA and GA potent deterrents. Does it mean that CA and GA are detected by another set of receptors that, when activated, make flies actively avoid CA and GA? In that case, do the authors think that testing receptors one by one is enough to uncover all the receptors participating in the detection of these substances?

      As we mentioned above, it is possible that distinct receptor pathways mediate avoidance of GA and CA. This suggests that CA and GA might activate different sets of receptors that trigger avoidance behavior, pointing to a more complex interplay of receptor activity than we initially considered. Certain acids may indeed be detected by multiple receptors, with each receptor contributing uniquely to the behavioral response. Regarding the sufficiency of testing receptors individually, we recognize the limitations of this approach. Examining receptors one by one may not reveal the full spectrum of receptors involved, especially due to potential interactions or compensatory mechanisms that only emerge when certain receptors are inactive. Therefore, a more holistic approach—such as genetic screens for behavioral responses or using complex genetic models to disrupt multiple receptors simultaneously—could provide deeper insights. Moving forward, incorporating receptor interactions that modulate each other, along with more comprehensive assays, could help explain these discrepancies by uncovering previously overlooked receptor functions.

      (3) The paper needs to be updated with a recent paper published by Guillemin et al (2024), indicating that LA is detected externally by a combination of IR94e, IR76b and IR25a. IR25a might help to form a fully functional receptor in GR33a neurons (a former study from Chen et al (2017) indicates that IR25a is expressed in all gustatory neurons of the pharynx).

      According to Guillemin et al. (2024), the combination of IR94e, IR76b, and IR25a is required for amino acid detection but not for detecting lactic acid (LA). In their calcium imaging experiments, 100 mM LA elicited a response similar to the vehicle control, suggesting that these receptors do not play a role in LA detection.

      (4) Although it was not the main focus of the paper, it would have been most interesting if the cells expressing IR94a and IR94h were identified, and placed on the functional map proposed by the group of Dahanukar (Chen et al 2017 Cell Reports, Chen et al 2019 Cell Reports).

      The expression patterns of IR94a and IR94h were previously detailed by Chen et al. (2017), showing that IR94h is expressed in the labral sense organ (LSO, specifically in L7-7) and the ventral cibarial sense organ (VCSO, V2), while IR94a is expressed in the VCSO (V5). Given this established information, we referenced these known expression patterns without replicating the mapping in our study. Our primary focus was to investigate the functional role of these neurons within the pharynx, and we believe we have successfully highlighted their specific contributions. However, we recognize that integrating the functional mapping of these neurons in alignment with the work of Dahanukar’s group would have strengthened our findings and provided a more comprehensive understanding. We acknowledge this as a limitation of our study and appreciate your suggestion, as it points to a valuable direction for future research.

      Reviewer #3 (Public review):

      Summary:

      In this work, the authors investigated the molecular and cellular basis of sour taste perception in Drosophila melanogaster, focusing on identifying receptors that mediate attractive responses to certain carboxylic acids. It builds on previous work from the same group that had identified the IR co-receptors IR25a and IR76b for this sensory process, screening a set of mutants in IRs to identify three, IR51b, IR94a, and IR94h, required for feeding preference responses to some or all of the tested acids.

      Strengths:

      The work is of interest because it assigns sensory roles to IRs of previously unknown function, in particular IR94a and IR94h, and points to pharyngeal neurons in which these receptors are expressed as the relevant sensory neurons (potentially with different roles for IR94a- and IR94h-expressing neurons). The work combines elegant genetics, simple but effective feeding and taste assays, chemo-/opto-genetic activation, and some calcium imaging. Overall the presented data look solid and well-controlled.

      Weaknesses:

      The in situ expression analysis relies entirely on transgenic driver lines for IR94a and IR94h (which had been previously described, though not fully cited in this work). Importantly, given that many of the behavioral experiments (genetic rescue, physiology, artificial activation) use the IR94a and IR94h GAL4 driver lines, it would be helpful to validate that these faithfully reflect IR94a and IR94h expression (as far as I can tell, such validation wasn't done in the original papers describing these lines as part of a large collection of IR drivers). For IR51b, pharyngeal expression is concluded indirectly from non-quantitative RT-PCR analysis (genetic reporters did not work). The lack of direct detection of gene/protein expression (for example, through RNA FISH, immunofluorescence, or protein tagging) would have made for a more complete characterization of these receptors (for example, there is no direct evidence that they also express IR25a and IR76b, as one might expect). Finally, the relationship of IR94a and IR94h neurons to other types of pharyngeal neurons remains unclear, as are their projection patterns in the SEZ.

      Conceptually, the work is of interest mostly to those in the immediate field; there have been a very large number of studies in the past decade (several from this lab) characterizing the contributions of different IRs to various chemosensory processes. The current work doesn't lend much insight into the nature of the minimal functional unit of gustatory IRs (reconstitution of a functional IR in a heterologous neuron/cell has not been achieved here, but this is a limitation of many other previous studies), nor to how different pharyngeal sensory pathways might collaborate to control behavior. Nevertheless, the findings provide a useful contribution to the literature.

      We appreciate your thoughtful feedback. As noted in our response, our primary objective was to investigate the sensory functions of IR94a and IR94h. To this end, we conducted behavioral assays, which we validated with additional approaches including genetic rescue, physiological tests, and artificial activation. Throughout these experiments, we extensively utilized Ir94a- and Ir94h-GAL4 driver lines. To ensure these lines accurately reflect the expression of IR94a and IR94h, we verified their expression patterns using immunohistochemistry across various body parts. Our results align with previous findings that show both receptors are exclusively expressed in the pharynx. Regarding IR51b, we employed RT-PCR due to its high sensitivity and specificity, which supported our hypothesis. Nonetheless, we agree that more direct detection methods would have provided a stronger validation of IR51b expression. Our previous study (Sang et al., 2024) also demonstrated the pharyngeal expression of co-expressed receptors, specifically IR25a and IR76b. However, we recognize that the lack of direct evidence for their co-expression with IR51b remains a significant gap. This limitation primarily stems from the unavailability of specific reagents needed for direct assays targeting IR51b, which restricted our experimental approach.

      You also raised the potential relationship between IR94a and IR94h neurons and other pharyngeal neuron types, including their projection patterns in the subesophageal zone. This is indeed an important area for future research that could clarify neural connectivity and further our understanding of sensory mechanisms. However, our study was focused on exploring sensory mechanisms in peripheral regions rather than detailed neural mapping in the SEZ. Investigating these connections would undoubtedly provide valuable insights into the neural circuitry involved and represents an intriguing direction for future research.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to assess the effect of salt stress on root:shoot ratio, identify the underlying genetic mechanisms, and evaluate their contribution to salt tolerance. To this end, the authors systematically quantified natural variations in salt-induced changes in root:shoot ratio. This innovative approach considers the coordination of root and shoot growth rather than exploring biomass and the development of each organ separately. Using this approach, the authors identified a gene cluster encoding eight paralog genes with a domain-of-unknown-function 247 (DUF247), with the majority of SNPs clustering into SR3G (At3g50160). In the manuscript, the authors utilized an integrative approach that includes genomic, genetic, evolutionary, histological, and physiological assays to functionally assess the contribution of their genes of interest to salt tolerance and root development.

      Strengths:

      The holistic approach and integrative methodologies presented in the manuscript are essential for gaining a mechanistic understanding of a complex trait such as salt tolerance. The authors focused on At3g50160 but included in their analyses additional DUF247 paralogs, which further contributes to the strength of their approach. In addition, the authors considered the developmental stage (young seedlings, early or late vegetative stages) and growth conditions of the plants (agar plates or soil) when investigating the role of SR3G in salt tolerance and root or shoot development.

      Weaknesses:

      The authors' claims and interpretation of the results are not fully supported by the data and analyses. In several cases, the authors report differences that are not statistically significant (e.g., Figures 4A, 7C, 8B, S14, S16B, S17C), use inappropriate statistical tests (e.g., t-test instead of Dunnett Test/ANOVA as in Figures 10B-C, S19-23), present standard errors that do not seem to be consistent with the post-hoc Tukey HSD Test (e.g., Figures 4, 9B-C, S16B), or lack controls (e.g., Figure 5C-E, staining of the truncated versions with FM4-64 is missing).

      We thank the reviewer for their critical thoughts on the presented data. We have revised our data interpretation in the main text to more accurately reflect the results. Given the nature of our experimental setup, where we trace the roots of individual Arabidopsis seedlings grown on plates, there is considerable biological variation, which makes achieving strong statistical significance between samples or genotypes challenging. However, we think that representing the data as transparently as possible is necessary to give readers and reviewers a true picture of the variability that we are observing. Consequently, we have centered our data interpretation around observable trends that facilitate drawing conclusions.

      The choice of statistical test is closely tied to the specific biological question being addressed. In Figures 10A-C, as in Figures 6A-B, we compared each genotype to the wild-type Col-0 within each condition; an ANOVA, which tests the general effect of genotype across the mutants and Col-0 together, is therefore not appropriate. Similarly, in Figures S19-S23, we compared each mutant line to the wild-type Col-0 under each condition.

      We repeated the post-hoc Tukey HSD Test for Figures 4, 9B-C, and S16B and made adjustments where necessary (see tracked changes manuscript).

      The truncated versions do not localize to the plasma membrane; instead, they are targeted to the nucleus and cytosol, mimicking the localization pattern of free GFP, which was used as a control in Panel F. Therefore, we believe that FM4-64 staining would not be an informative control for these specific images, and that free GFP serves as a better control for these constructs.

      In other cases, traits of root system architecture and expression patterns are inconsistent between different assays despite similar growth conditions (e.g., Figures S17A-B vs. 10A-C vs. 6A, and Figures S16B vs. 4A/9B), or T-DNA insertion alleles of WRKY75 that are claimed to be loss-of-function show comparable expression of WRKY75 as WT plants. Additionally, several supplemental figures are mislabeled (Figures S6-9), and some figure panels are missing (e.g., Figures S16C and S17E).

      We thank the reviewer for raising these points and noticing the inconsistency between different assays (e.g., Figures S17A-B vs. 10A-C vs. 6A, and Figures S16B vs. 4A/9B). As mentioned above, considerable biological variation makes achieving strong statistical significance between samples, genotypes, or experiments challenging. Thus, we have centered our data interpretation around observable “trends” between experiments to facilitate drawing conclusions. Considering Figures S17A-B, 10A-C, and 6A, we acknowledge the reviewer's concern about inconsistencies in root system architecture across experiments. Initially, we observed that the sr3g mutant had reduced lateral root length compared to Col-0 under salt stress. This led us to focus on this specific phenotypic trait rather than the overall root system architecture. Despite some variation, the sr3g mutant consistently showed a similar trend/phenotype when compared to Col-0 under salt stress. We believe the variation in main root length and lateral root number between experiments is due to inherent differences between biological replicates.

      Regarding gene expression patterns between Figures S16B and 4A/9B, we included part of Figure 9B (SR3G gene expression in Col-0) in Figure 4A. Figure S16B represents a completely different assay. Despite variations between assays, the overall message remains consistent: SR3G gene expression is induced under salt stress in the root but not in the shoot.

      Both SR3G and WRKY75 are expressed at very low levels, even under the 75 mM salt stress condition we tested. When gene expression is so low, detecting changes is challenging due to inherent variations. Nonetheless, we observed a reduction in WRKY75 expression in the mutant lines compared to wild-type Col-0, though this reduction was not statistically significant. More importantly, we observed a similar phenotype in the wrky75 mutant, specifically reduced main root length under salt stress, consistent with the findings of the published paper in The Plant Cell by Lu et al. (2023) “Lu, K.K., Song, R.F., Guo, J.X., Zhang, Y., Zuo, J.X., Chen, H.H., Liao, C.Y., Hu, X.Y., Ren, F., Lu, Y.T. and Liu, W.C., 2023. CycC1; 1–WRKY75 complex-mediated transcriptional regulation of SOS1 controls salt stress tolerance in Arabidopsis. The Plant Cell, 35(7), pp.2570-2591”.

      We thank the reviewer for spotting the missing labels for Figures S6-9. We corrected them in the main text, figures, and legends. We added panel C to Figure S16 and removed panel E from the Figure S17 legend, so the legends now match the actual figures.

      Consequently, the authors' decisions regarding subsequent functional assays, as well as major conclusions about gene function, including SR3G function in root system architecture, involvement in root suberization, and regulation of cellular damage are incomplete.

      We greatly appreciate the reviewer's thorough review of our manuscript and their critical comments. We have carefully addressed all comments and concerns.

      Reviewer #2 (Public Review):

      Salt stress is a significant and growing concern for agriculture in some parts of the world. While the effects of sodium excess have been studied in Arabidopsis and (many) crop species, most studies have focused on Na uptake, toxicity, and overall effects on yield, rather than on developmental responses to excess Na, per se. The work by Ishka and colleagues aims to fill this gap.

      Working from an existing dataset that exposed a diverse panel of A. thaliana accessions to control, moderate, and severe salt stress, the authors identify candidate loci associated with altering the root:shoot ratio under salt stress. Following a series of molecular assays, they characterize a DUF247 protein which they dub SR3G, which appears to be a negative regulator of root growth under salt stress.

      Overall, this is a well-executed study that demonstrates the functional role played by a single gene in plant response to salt stress in Arabidopsis.

      The abstract and beginning of the Discussion section highlight the "new tool" developed here for measuring biomass accumulation. I feel that this distracts from the central aims of the study, which is really about the role of a specific gene in root development under salt stress. I would suggest moving the tool description to less prominent parts of the manuscript.

      We appreciate the reviewer's suggestion. We believe that the innovative tool used to extract shoot-to-root ratio data from previous experiments underscores the value of reutilizing previously acquired data for new discoveries and demonstrates how reanalyzing the same data can provide fresh insights, such as identification of new allelic variation. Therefore, we decided to retain this section, as our discovery of the SR3G gene originated from this innovative tool.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      Line 58 (opening sentence) - salt accumulation in the soil is not caused by evaporation exceeding input; that scenario results in soil water deficit. The issue is when the input water has dissolved ions.

      We thank the reviewer for raising this important point. While this point is theoretically true, all of the water that is found in natural environments contains some dissolved ions. Therefore, drought conditions will lead, over time, to increased soil salinization. We have amended this sentence to represent our point better.

      “Salt stress is predominant in the dryland areas where evaporation rate exceeds water input. As all water contains dissolved ions, the prolonged exposure to drought stress results in increased accumulation of salts in the upper soil layers 1–3.”

      I feel that it would be helpful, for replication and for interpretation, if the authors could provide water potentials for the growing media used throughout. What water potentials are the plants experiencing when grown in 1/2 MS + agar at 0, 75, and 150mM NaCl? Juenger and Verslues present a great recent discussion of the importance of reporting these values (Juenger, T. E. and P. E. Verslues (2023). "Time for a drought experiment: Do you know your plants' water status?" Plant Cell 35(1): 10-23.)

      Critically, how do the water potentials experienced by agar-grown plants compare to those experienced in soil-grown plants? As a stated aim of this study is to allow translation to crops these data are very important to convince physiologists of the relevance of the results.

      We thank the reviewer for raising this important point. We completely agree that growing plants on agar plates is an artificial setup and knowing the water potential of the plants within this setup would be highly informative. However, as indicated in the review by Juenger and Verslues (2023), the agar plate setup is much more reproducible compared to various soil conditions, and we report the media composition in sufficient detail for it to be reproduced in other laboratory conditions.

      Furthermore, while investigating the water status of plants and soil is indeed intriguing, it is beyond the scope of this study and would require us to redo the experiments with specific tools listed in the Juenger and Verslues review, which are not currently available in our laboratory.

      Importantly, any changes reported in this manuscript apply equally to both wild-type and mutant lines under all conditions. We provide an extensive report on the soil type used, as well as the soil quantity. We use the gravimetric method to determine the water content and to apply salt stress, as described in previous works from our lab (Yu and Sussman et al., 2024 Plant Physiology and Awlia et al., 2016 Frontiers in Plant Science).

      Nonetheless, we have now included water content measurements for soil-grown plants under different conditions, calculated by subtracting dry weight from fresh weight (new Fig. S24). Although plant water content may not fully capture the water status of the media or soil, our measurements did not reveal any significant differences in water content between genotypes across the various conditions tested.

      Line 69- missing an "and" after "(ABA)."

      Thanks. We added the missing “and”.

      Line 79 - I think the association being made is between natural variation in root and shoot growth and genetic variants, not "underlying genes."

      We thank the reviewer for this suggestion. The cause for the identified association indeed relies on allelic variation within the genetic region. We have re-phrased this sentence within the manuscript.

      “Many forward genetic studies were highly successful in associating natural variation in root and shoot growth with allelic variation in gene coding and promoter regions, thereby identifying potential new target traits for improved stress resilience 18,20,21.”

      Figure 1 - what do "seGF" and "reGF" stand for? Shoot and root growth rate, respectively, but there are extra letters in there…

      The abbreviations stand for shoot exponential Growth Factor and root exponential Growth factor. An explanation of the acronym has been added to the text.

      “The increase in the projected area of shoot and root (Fig. S2) was used to estimate (A) shoot and (B) root exponential growth rate (seGR and reGR respectively).”
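
      As an illustration only, one common way to estimate an exponential growth rate from a time series of projected areas is a least-squares fit of ln(area) against time. The sketch below assumes that approach, and it assumes the root:shoot ratio is taken as the ratio of the two fitted rates; the function name, the time unit (days), and the example numbers are hypothetical and are not taken from the manuscript or its tool.

```python
import math

def exponential_growth_rate(times, areas):
    """Return the slope of ln(area) versus time (ordinary least squares)."""
    if len(times) != len(areas) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    logs = [math.log(a) for a in areas]
    t_mean = sum(times) / len(times)
    y_mean = sum(logs) / len(logs)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    return sxy / sxx

# Hypothetical projected areas (arbitrary units) generated from known rates.
days = [0, 1, 2, 3, 4]
shoot_area = [10.0 * math.exp(0.30 * d) for d in days]
root_area = [8.0 * math.exp(0.45 * d) for d in days]

shoot_rate = exponential_growth_rate(days, shoot_area)
root_rate = exponential_growth_rate(days, root_area)
root_shoot_ratio = root_rate / shoot_rate  # assumed definition of the ratio
```

      Fitting on the log scale keeps the estimate independent of the starting area, which is why the two organs can be compared through their rates even when their absolute sizes differ.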

      Figure 1 legend - there's an "s" missing in "across." And two "additionally" in the penultimate sentence.

      Thanks for spotting the errors. We fixed these errors.

      Line 109 - how was the white balance estimated for the images on the flatbed scanner?

      Within the developed tool, we have not adjusted or controlled for white balance in any way, as the white balance of the flatbed scanner is kept at one value. The tool sorts the imaged pixels into bins of white (root), green (shoot), and blue (background) pixels based on the closest distance in the RGB scale to each reference color, which makes correcting for white balance unnecessary. We have provided an additional explanation of this in the M&M section.

      “A Matlab-based tool was developed to simplify and speed up the segmentation and analysis pipeline. For automatic segmentation, the tool uses a combination of image operations (histogram equalization), thresholding on different color spaces (e.g., RGB, YCbCr, Lab, HSV), and binary image processing (boundary and islands removal). As the tool digitizes various color scales and classifies pixels into white (root), green (shoot), or blue (background) categories, adjustment for white balance is unnecessary.”
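
      The nearest-color binning described above can be sketched as follows. This is a simplified Python illustration, not the Matlab tool itself: it covers only the nearest-reference-color step in RGB space, and the reference colors, function names, and sample pixels are assumptions made for the example.

```python
# Reference colors are assumed for illustration (pure white, green, blue).
REFERENCE_COLORS = {
    "root": (255, 255, 255),    # white
    "shoot": (0, 255, 0),       # green
    "background": (0, 0, 255),  # blue
}

def classify_pixel(rgb):
    """Return the class whose reference color is nearest in RGB space."""
    def squared_distance(reference):
        return sum((c - r) ** 2 for c, r in zip(rgb, reference))
    return min(REFERENCE_COLORS,
               key=lambda name: squared_distance(REFERENCE_COLORS[name]))

def count_classes(pixels):
    """Count how many pixels fall into each class."""
    counts = {name: 0 for name in REFERENCE_COLORS}
    for pixel in pixels:
        counts[classify_pixel(pixel)] += 1
    return counts

# Hypothetical pixels: off-white, leaf green, plate blue, near-white.
pixels = [(240, 238, 230), (40, 160, 50), (20, 30, 200), (250, 250, 250)]
counts = count_classes(pixels)
```

      Because each pixel is assigned by relative distance to the three references, a modest uniform color cast shifts all distances together and rarely changes the winning class, which is consistent with the argument that white-balance correction is not needed.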

      GWAS was performed separately on traits measured at control, 75mM, and 150mM NaCl treatments. Would it also be informative to map the STI measurement (i.e. plasticity) introduced here?

      We thank the reviewer for this important point. We performed GWAS on both “raw” and STI traits; however, the associations identified with STI traits were less abundant than those identified with “raw” traits. This makes sense, as STI compounds the root or shoot growth under both conditions, and plastic responses to the environment are expected to be genetically more complex, as they involve more genetic regulators than phenotypes with low plasticity. We have added this to the description of the results, as we acknowledge that this might be an interesting observation for the field to build upon and might provide fodder for new methods to deconvolute the complexity of mapping plastic traits.

      “To identify genetic components underlying salt-induced changes in root:shoot ratio, we used the collected data as an input for GWAS. The associations were evaluated based on the p-value, the number of SNPs within the locus, and the number of traits associated with individual loci. As the Bonferroni threshold differs depending on the minor allele count (MAC) considered, we identified significant associations based on a Bonferroni threshold for each subpopulation of SNPs based on MAC (Table S3). We conducted GWAS on directly measured traits as well as on their Salt Tolerance Index (STI) values; however, the number of associations with STI was much lower than with directly measured traits (Table S3). This observation aligns with the understanding that plastic responses to environmental conditions tend to be genetically more complex. This complexity likely stems from the involvement of more genetic regulators compared to low-plasticity phenotypes.”
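
      To make the MAC-dependent threshold concrete, the sketch below assumes that each SNP subpopulation is defined by a minimum MAC cutoff and that the threshold is the significance level divided by the number of SNPs passing that cutoff. The cutoffs, the significance level of 0.05, and the MAC values are hypothetical; the response does not state how the subpopulations were defined.

```python
import math

def thresholds_by_mac(snp_macs, mac_cutoffs, alpha=0.05):
    """Bonferroni threshold (alpha / number of tests) for each MAC cutoff."""
    thresholds = {}
    for cutoff in mac_cutoffs:
        n_tests = sum(1 for mac in snp_macs if mac >= cutoff)
        if n_tests > 0:  # no threshold is defined when no SNP passes
            thresholds[cutoff] = alpha / n_tests
    return thresholds

# Hypothetical minor allele counts for ten SNPs.
snp_macs = [1, 2, 3, 5, 8, 10, 12, 20, 25, 40]
thresholds = thresholds_by_mac(snp_macs, mac_cutoffs=[1, 5, 10])

# The same thresholds on the -log10 scale used for Manhattan plots.
log_thresholds = {cutoff: -math.log10(p) for cutoff, p in thresholds.items()}
```

      A stricter MAC cutoff leaves fewer SNPs to test, so the per-test threshold becomes less stringent; this is why a single genome-wide line may not describe every subpopulation.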

      Line 167 - how was LD incorporated into this analysis? Did you use a genome average? Or was LD allowed to vary (as it does) across the genome?

      Initially, we used the genome-average LD for this purpose (10 kbp for Arabidopsis) and extended the region of interest based on the number of coding genes within the window. We have added this description to our manuscript.

      “For the most promising candidate loci (Table S4), we have identified the gene open reading frames that were located within the genome-wide linkage-disequilibrium (LD) of the associated SNPs. The LD was expanded if multiple SNPs were identified within the region, and the region of interest was expanded based on the number of coding genes within the LD window.”
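
      A minimal sketch of the window-based candidate selection follows. It assumes the region spans from the lowest associated SNP minus the LD distance to the highest associated SNP plus the LD distance, and that any gene overlapping that region is a candidate. The 10 kbp default comes from the response above; the gene names and coordinates are hypothetical, and the further expansion based on gene count is not modeled.

```python
LD_WINDOW_BP = 10_000  # genome-average LD for Arabidopsis, as stated above

def genes_in_ld_window(snp_positions, genes, window_bp=LD_WINDOW_BP):
    """Return names of genes overlapping the LD region around the SNPs.

    genes is a list of (name, start, end) tuples on the same chromosome.
    """
    region_start = min(snp_positions) - window_bp
    region_end = max(snp_positions) + window_bp
    return [name for name, start, end in genes
            if start <= region_end and end >= region_start]

# Hypothetical locus with two associated SNPs and five annotated genes.
snps = [105_000, 112_000]
genes = [
    ("geneA", 90_000, 94_000),    # ends before the region starts
    ("geneB", 94_500, 96_000),    # overlaps the left edge
    ("geneC", 108_000, 110_000),  # lies between the two SNPs
    ("geneD", 121_000, 123_000),  # overlaps the right edge
    ("geneE", 123_000, 125_000),  # starts after the region ends
]
candidates = genes_in_ld_window(snps, genes)
```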

      Line 291 - I think the water potentials are essential, here. What does 50% of soil water holding capacity equal in these soils? In the substrate that we use in our lab, that would represent a considerable soil water deficit even without any salts in the soil.

      We thank the reviewer for this comment. As Arabidopsis occurs naturally in soils with low water holding capacity (i.e., sandy soils), it typically grows better in soils that are not highly saturated with water. Throughout the many experiments performed within this study, and in other studies performed in our lab (results reported in Awlia et al., 2016 Frontiers in Plant Science and Yu & Sussman et al., 2024 Plant Physiology), we have not observed any drought-like symptoms at 50% soil water holding capacity. The fact that this is reproducible across similar soil types in two laboratories (one in Saudi Arabia and one in the USA) is not to be dismissed. Again, we are currently not equipped to measure water potentials for these plants, as this is not (yet) standard practice for stress experiments, but we are taking these comments on board for all of our future experiments.

      Moreover, our control plants are also “dried down” to 50% of SWHC, and soaked in non-saline water during the “salt stress treatment” to make sure that the soil water saturation is accounted for within the experimental setup. This “dry down” of soil is necessary to ensure equal and effective salt penetration into the soil particles. More details on this method can be found in Awlia et al., 2016.

      Again, we have added a new dataset measuring water content in individually soil-grown plants under different conditions as a proxy for soil water status (see new Fig. S24). While we did not observe any significant differences in water content between genotypes under the various conditions, the sr3g mutant showed a slightly higher, though non-significant, water content compared to wild-type Col-0 under control conditions.

      We have provided additional information and comments to warn the readers about this method:

      “The seeds were germinated in ½ MS media for one week, as described for the agar-based plate experiments. One week after germination, the seedlings were transplanted to the pot (12 x 4 cm insert) containing the Cornell Mix soil (per batch combine: 0.16 m3 of peat moss, 20.84 kg of vermiculite, 0.59 kg of Uni-Mix fertilizer, and 2.27 kg of lime) watered to 100% water holding capacity and placed in the walk-in growth chamber with the 16 h light / 8 h dark period, 22°C and 60% relative humidity throughout the growth period. When all of the pots dried down to the weight corresponding to 50% of their water holding capacity, they were soaked for 1 h in tap water or a 200 mM NaCl solution, resulting in an effective concentration of 100 mM NaCl based on the 50% soil water holding capacity, which corresponded to a moderate level of salt stress (Awlia et al., 2016). The control pots were soaked for the same length of time in 0 mM NaCl solution, to account for the soil saturation effect. We then allowed the pots to be drained for 2-3 h to eliminate excess moisture. The pots were placed under phenotyping rigs equipped with an automated imaging system (Yu et al., 2023) and the pot weight was measured daily to maintain the reference weight corresponding to 50% of the soil water holding capacity throughout the experiment. We would like to note that this gravimetric based method for application of salt stress has been developed for soils typically used for pot-grown plants, with relatively high water holding capacity (Awlia et al. 2016). Within these specific conditions, no drought stress symptoms were observed.”
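
      The arithmetic behind the 200 mM soak giving an effective 100 mM can be sketched as below, assuming the pot holds salt-free water at 50% of its water holding capacity before soaking, is brought to 100% by the soak, and mixes completely. The second function shows how a gravimetric reference weight could be derived; the function names and pot weights are hypothetical.

```python
def effective_concentration(soak_mM, fraction_whc_before):
    """Concentration after soaking a partly dried pot to full capacity.

    Assumes the water already in the pot is salt-free and mixes fully
    with the soak solution that fills the remaining capacity.
    """
    if not 0.0 <= fraction_whc_before <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return soak_mM * (1.0 - fraction_whc_before)

def target_pot_weight(dry_weight_g, saturated_weight_g, fraction_whc):
    """Pot weight corresponding to a given fraction of water holding capacity."""
    water_at_capacity_g = saturated_weight_g - dry_weight_g
    return dry_weight_g + fraction_whc * water_at_capacity_g

effective_mM = effective_concentration(200.0, 0.5)
# Hypothetical pot: 100 g dry, 300 g at 100% water holding capacity.
reference_weight_g = target_pot_weight(100.0, 300.0, 0.5)
```

      Under these assumptions, draining excess solution after the soak does not change the concentration of what remains, so daily watering back to the reference weight is what keeps the salt level steady.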

      Lines 415-416 - are these contrasts significant? Figure S3 likewise does not have any notation for significant differences in the means.

      We had not previously tested whether 125 mM had a stronger effect than 75 mM on relative root and shoot growth, and thus these test results were initially not included in Fig. S3. We have now added the tests to Fig. S3 and described their significance in the main body of the manuscript:

      “In comparison, the growth rates of the shoot were significantly reduced to 0.71 and 0.43 of the control in 75 and 125 mM NaCl treatments, respectively (Fig. S3). While the mean value of root:shoot growth rate did not change upon salt stress treatment, the variance in the root:shoot ratio significantly expanded with the increasing concentrations of salt (Fig. 1C). These results suggest that while root and shoot growth are well coordinated under non-stress conditions, salt stress exposure results in loss of coordination of organ growth across Arabidopsis accessions.”

      Line 418 - same comment as preceding. Is this change in variance significant?

      We had not previously tested this. We have now added the ANOVA tests to each figure and described their significance in the main body of the manuscript (see text above).

      Line 421 - why would we expect there to be a correlation between root:shoot growth ratio and seedling size?

      We were trying to use the seedling size as a proxy for “fitness” - or how well the plants can survive under these specific conditions. We were testing here whether any simple and directional strategy - such as increase or decrease in root:shoot ratio under salt stress - is resulting in better salt tolerance - which would translate into larger overall seedlings. We have rephrased this within the manuscript, to better explain the hypothesis being tested within this specific figure:

      “To test whether there is a clear directional correlation between the change in root:shoot ratio and overall salt stress tolerance, we have used the overall seedling size as a proxy for plant salt tolerance (Fig. S4, S5). No significant correlation was found between the root:shoot growth ratio and total seedling size (Fig. S4, S5), indicating that the relationship between coordination of root and shoot growth and salt tolerance during the early seedling establishment is complex.”

      Line 438 - I think a stable web link would be more appropriate than listing Dr. Nordborg's email address.

      Sorry about this. There is a glitch with our reference citing software. We agree, and thank the reviewer for noticing this! We assigned reference number 43 to it.

      Line 439 - I expect that many of your readers may not be experienced with GWAS. Can you provide an explanation as to why only one locus was detected with both the 250K SNP panel and the 4M SNP panel?

      We thank the reviewer for raising this point. We have added additional explanation to this observation:

      “Increased SNP density can provide more potential associations, highlighting the associated loci with more confidence, due to more SNPs being detected within a specific region. The different panels could capture different LD blocks across the genome. If the locus detected by both panels is in a region of strong LD or under selection, it could be detected consistently. In contrast, other loci may not be captured well by the lower-density 250K SNP panel. The new GWAS revealed 32 additional loci, with only one significantly associated locus being picked up by both the 250K and 4M SNP GWAS (locus 30, Table S3). The detection of only one common locus between the two SNP panels is likely due to differences in resolution, statistical power, and how well each panel captures the genomic regions associated with the trait.”

      Figure 2A and B - I suggest adding the p-value cutoff to the y-axis of the Manhattan Plots

      We thank the reviewer for this suggestion, however this is not appropriate. The genome wide p-value cutoffs for GWAS studies are arbitrary, and we have not used a genome-wide cutoff for our SNPs, but rather used cutoffs depending on the minor allele frequency. Therefore, we think adding a straight line to the graphs in Fig. 2A-B representing the overall cutoff, would be misleading. Please see below the text where we explain how the threshold was calculated for individual groups of SNPs with varying MAF:

      “The GWAS associations were evaluated for minor allele count (MAC) and association strength above the Bonferroni threshold with -log10(p-value/#SNPs), calculated for each sub-population of SNPs above threshold MAC (Table S3, Bonf.threshold.MAC.specific)”
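      A MAC-specific Bonferroni threshold of this kind can be sketched as follows. This is a minimal illustration, assuming a significance level of 0.05 in the numerator (the quoted text writes only "p-value/#SNPs") and made-up minor allele counts. The function names, the cutoffs, and the example data are hypothetical and do not reproduce the values in Table S3.

```python
import math


def bonferroni_threshold(n_snps: int, alpha: float = 0.05) -> float:
    """Return the Bonferroni threshold on the -log10(p) scale for n_snps tests."""
    if n_snps <= 0:
        raise ValueError("n_snps must be positive")
    return -math.log10(alpha / n_snps)


def mac_specific_thresholds(mac_counts, mac_cutoffs, alpha: float = 0.05):
    """Compute one threshold per MAC cutoff.

    For each cutoff, only SNPs with a minor allele count at or above the cutoff
    are counted as tests. Cutoffs that no SNP passes are skipped.
    """
    thresholds = {}
    for cutoff in mac_cutoffs:
        n_passing = sum(1 for mac in mac_counts if mac >= cutoff)
        if n_passing > 0:
            thresholds[cutoff] = bonferroni_threshold(n_passing, alpha)
    return thresholds


# Hypothetical minor allele counts for a handful of SNPs.
example_macs = [1, 5, 10, 20]
print(mac_specific_thresholds(example_macs, mac_cutoffs=[5, 10]))
```

      Because fewer SNPs pass a stricter MAC cutoff, the threshold differs between subsets, which is consistent with the authors' point that a single horizontal line on the Manhattan plots would not represent the cutoffs actually used.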

      Line 490-492 - Presents the results of the gene tree to support a model in which SR3G diverged from AT3G50150 prior to the speciation events leading to Capsella and Arabidopsis. But this topology requires at least two independent losses of SR3G - can you rule out the hypothesis that the position of SR3G on the gene tree is a result of long branch attraction? Given the syntenic orientation of AT3G50150 and SR3G, and apparent directional selection experienced by the latter lineage, it seems more parsimonious that AT3G50150 and SR3G arose from a very recent duplication event.

      We agree with the reviewer that it seemed most parsimonious for AT3G50160 (SR3G) to be a recent tandem duplication of AT3G50150 – and this was certainly our expectation given the other tandem duplications that have occurred in this genomic region. However, irrespective of the type of alignment from which we built the phylogeny (nucleotide vs AA; sometimes nucleotide is noisier but provides more information) we were never able to recapitulate a tree where AT3G50160 was immediately sister to AT3G50150 – even with a long branch for AT3G50160 indicating a rapid pace of nucleotide/AA change relative to AT3G50150. In regards to long branch attraction, it is our interpretation that long branch attraction typically requires multiple long branches that get placed together at a poorly supported node where sampling is sparse (https://www.nature.com/articles/s41576-020-0233-0), whereas we have the single long branch for AT3G50160, and all other A/C clade (Arabidopsis/Camelina/Capsella) members forming a lineage with a much shorter branch. To test the possibility of long branch attraction we subtracted out individual members of the AT3G50150/160 clade to see if there was algorithmic uncertainty in the placement of AT3G50160. We did not observe this in any of the branch subtractions that we performed (see below). Thus, it appears that we must stick with our original interpretation. If the reviewer would like us to soften this interpretation, we would be more than happy to do so, as it does not impact the overall conclusions for AT3G50160 being a rapidly evolving member of this clade.

      Author response image 1.

      Line 494 (and throughout) - I expect that all of the genes being studied herein are "experiencing selection," even if it's boring-old purifying selection on functionally conserved proteins. I think you mean to say "directional selection."

      We thank the reviewer for this comment and completely agree that we lacked precision on our statement. We have corrected this throughout the manuscript.

      Line 497 - state the background and foreground values of omega, here.

      We apologize for not including these values and have added them at this point in the manuscript (new Table S6).

      Line 511 and Line 673 - Inspection of Figure S13B suggests that SR3G is not "predominantly" expressed nor does it have the "highest enrichment" in the root stele. Certainly, among root cell types, this is predominant. But it appears to be quite highly expressed in late-stage seeds and some floral organs, as well.

      We appreciate the reviewer for recognizing that SR3G is not a highly expressed gene. In root cell types, its expression is enriched in the root stele. Overall, SR3G is expressed at both early and later developmental stages. Our investigation of later developmental stages related to seed production did not reveal any significant phenotypic differences in fertility.

      Line 514 - "54-folds" should be "54-fold."

      Thanks. We made corrections.

      Figure 7 - For symmetry, I suggest adding the "Beginning of salt stress" arrow to the "Early Stress" panel as well (even if it's right at day 0).

      Thanks. We added the arrow to Early Stress in both Panels A and B.

      Figure S2 - both graphs should have the same scale on the y-axis

      Thanks - we have now re-plotted the graph with the matching y-axis scales.

      Line 531 - I feel that this is a significant overstatement. The strongest statement supported by the results presented here is that SR3G is the most prominent DUF247 studied herein in root development under salt stress.

      Thanks for the comments. We have rephrased the statement:

      “These results suggest that SR3G is the most prominent DUF247 examined in our study to affect root development under salt stress.”

      Lines 583-605 - These data seem to me to be tangential to the central aims of the study. I suggest removing them for clarity/brevity.

      We greatly appreciate the reviewer's suggestion. Our study primarily focused on characterizing the main GWAS candidate, SR3G. Since SR3G is located within a cluster of other DUF247 genes on chromosome 3, we believe that screening the neighboring DUF247 genes could provide further insights into SR3G’s role in root development. Additionally, we believe that the generated data and lines will serve as a valuable resource for other researchers interested in studying these genes. For these reasons, we have decided to retain these datasets in the manuscript.

      Lines 650-652 - these sections 1-3 differences in suberization between SR3G and Col-0 under control conditions are not significant. At best, this may be described as a "trend" and not "higher levels." In section 4, it is VERY marginally significant (and probably not at all after the large number of tests performed, here.)

      We appreciate the reviewer's feedback and have revised the wording accordingly.

      Line 660 - this statement is only true for Section 1. I suggest adding this caveat.

      We appreciate the reviewer's comments on this matter. We quantified four suberin monomers in whole root seedlings rather than in individual root sections due to the technical challenges of separating the sections without microscopy and the limited availability of samples for GC-MS analysis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Syngnathid fishes (seahorses, pipefishes, and seadragons) present very particular and elaborated features among teleosts and a major challenge is to understand the cellular and molecular mechanisms that permitted such innovations and adaptations. The study provides a valuable new resource to investigate the morphogenetic basis of four main traits characterizing syngnathids, including the elongated snout, toothlessness, dermal armor, and male pregnancy. More particularly, the authors have focused on a late stage of pipefish organogenesis to perform single-cell RNA-sequencing (scRNA-seq) completed by in situ hybridization analyses to identify molecular pathways implicated in the formation of the different specific traits. 

      The first set of data explores the scRNA-seq atlas composed of 35,785 cells from two samples of gulf pipefish embryos that authors have been able to classify into major cell types characterizing vertebrate organogenesis, including epithelial, connective, neural, and muscle progenitors. To affirm identities and discover potential properties of clusters, authors primarily use KEGG analysis that reveals enriched genetic pathways in each cell type. While the analysis is informative and could be useful for the community, some interpretations appear superficial and data must be completed to confirm identities and properties. Notably, supplementary information should be provided to show quality control data corresponding to the final cell atlas including the UMAP showing the sample source of the cells, violin plots of gene count, UMI count, and mitochondrial fraction for the overall dataset and by cluster, and expression profiles on UMAP of selected markers characterizing cluster identities.

      We thank the reviewer for these suggestions, and have added several figures and supplemental files in response. We added a supplemental UMAP showing the sample that each cell originated (S1). We also added supplemental violin plots for each sample showing the gene count, unique molecular identifier (UMI) count, mitochondrial fraction, and the doublet scores (S2). We added feature plots of zebrafish marker genes for these major cell types and marker genes identified from our dataset to the supplement (S3:S57). We also provided two supplemental files with marker genes. These changes should clarify the work that went into labeling the clusters. Although some of the cluster labels are general, we decided it would be unwise to label clusters with speculated specific annotations. We only gave specific annotations to clusters with concrete markers and/or in situ hybridization (ISH) results that cemented an annotation.  As shown in the new supplemental figures and files, certain clusters had clear, specific markers while others did not. Therefore, we used caution when we annotated clusters without distinct markers. 

      The second set of data aims to correlate the scRNA-seq analysis with in situ hybridizations (ISH) in two different pipefish (gulf and bay) species to identify and characterize markers spatially, and validate cell types and signaling pathways active in them. While the approach is rational, the authors must complete the data and optimize labeling protocols to support their statements. One major concern is the quality of ISH stainings and images; embryos show a high degree of pigmentation that could hide part of the expression profile, and only subparts and hardly detectable tissues/stainings are presented. The authors should provide clear and good-quality images of ISH labeling on whole-mount specimens, highlighting the magnification regions and all other organs/structures (positive controls) expressing the marker of interest along the axis. Moreover, ISH probes have been designed and produced on gulf pipefish genome and cDNA respectively, while ISH labeling has been performed indifferently on bay or gulf pipefish embryos and larvae. The authors should specify stages and species on figure panels and should ensure sequence alignment of the probe-targeted sequences in the two species to validate ISH stainings in the bay pipefish. Moreover, spatiotemporal gene expression being a very dynamic process during embryogenesis, interpretations based on undefined embryonic and larval stages of pipefish development and compared to 3dpf zebrafish are insufficient to hypothesize on developmental specificities of pipefish features, such as on the absence of tooth primordia that could represent a very discrete and transient cell population. The ISH analyses would require a clean and precise spatiotemporal expression comparison of markers at the level of the entire pipefish and zebrafish specimens at well-defined stages, otherwise, the arguments proposed on teleost innovations and adaptations turn out to be very speculative. 

      We are appreciative of the reviewer’s feedback. We primarily used the in situ hybridization (ISH) data as supplementary to the scRNAseq library and we are aware that further evidence is necessary to identify origins of syngnathid’s evolutionary novelties. Our goal was to provide clues for the developmental genetic basis of syngnathid derived features.  We hope that our study will inspire future investigations and are excited for the prospect that future research could include this reviewer’s ideas. 

      All of the developmental stages and species information for the embryos used were in the figure captions as well as in supplemental file 6. Because we primarily used wild caught embryos, we did not have specific ages of most embryos. Syngnathid species are challenging to culture in the laboratory, and extracting embryos requires euthanizing the father which makes it difficult to obtain enough embryos for ISH. In addition, embryos do not survive long when removed from the brood pouch prematurely. We supplemented our ISH with bay pipefish caught off the Oregon coast because these fish have large broods. Wild caught pregnant male bay pipefish were immediately euthanized, and their broods were fixed. Because we did not have their age, we classified them based on developmental markers such as presence of somites and the extent of craniofacial elongation. Although these classification methods are not ideal, they are consistent with the syngnathid literature (Sommer et al. 2012). Since the embryos used for the ISH were primarily wild caught, we had a few different developmental stages represented in our ISH data. For our tooth primordia search, we used embryos from the same brood (therefore, same stage) for these experiments.

      We understand the concern for the degree of pigmentation in the samples. We completed numerous bleach trials before embarking on the in situ hybridization experiments. After completing a bleach trial with a probe created from the gene tnmd for ISH, we noticed that the bleached embryos were missing expression domains found in the unbleached embryos. We were, therefore, concerned that using bleached embryos for our experiments would result in incorrect conclusions about the expression domains of these genes. We sparingly used bleaching at older stages, hatched larvae, where it was fundamentally necessary to see staining. As stated above, the primary goal of this manuscript was to generate and annotate the first scRNA-seq atlas in a syngnathid, and the ISHs were utilized to support inferred cluster annotations only through a positive identification of marker gene expression in expected tissues/cells. Therefore, the obscuring of gene expression by pigmentation would have resulted in the absence of evidence for a possible cluster annotation, not an incorrect annotation.

      For the ease of viewing the ISHs, we improved annotations and clarity. We increased the brightness and contrast of images. In the original submission, we had to lower the image resolution to make the submission file smaller. We hope that these improvements plus the true image quality improve clarity of ISH results. We also included alignments in our supplementary files of bay pipefish sequences to the Gulf pipefish probes to showcase the high degree of sequence similarity.

      Sommer, S., Whittington, C. M., & Wilson, A. B. (2012). Standardised classification of pre-release development in male-brooding pipefish, seahorses, and seadragons (Family Syngnathidae). BMC Developmental Biology, 12, 12–15. 

      To conclude, whereas the scRNA-seq dataset in this unconventional model organism will be useful for the community, the spatiotemporal and comparative expression analyses have to be thoroughly pushed forward to support the claims. Addressing these points is absolutely necessary to validate the data and to give new insights to understand the extraordinary evolution of the Syngnathidae family. 

      We really appreciate the reviewer’s enthusiasm for syngnathid research, and hope that the additional files and explanation of the supporting role of the ISHs have adequately addressed their concerns. We share the reviewer’s enthusiasm and are excited for future work that can extend this study. 

      Reviewer #2 (Public Review):

      Summary: 

      The authors present the first single-cell atlas for syngnathid fishes, providing a resource for future evolution & development studies in this group. 

      Strengths: 

      The concept here is simple and I find the manuscript to be well written. I like the in situ hybridization of marker genes - this is really nice. I also appreciate the gene co-expression analysis to identify modules of expression. There are no explicit hypotheses tested in the manuscript, but the discovery of these cell types should have value in this organism and in the determination of morphological novelties in seahorses and their relatives.  

      We are grateful for this reviewer’s appreciation of the huge amount of work that went into this study, and we agree that the in situ hybridizations (ISHs) support the scRNAseq study as we intended. We appreciate that the reviewer thinks that this work will add value to the syngnathid field.

      Weaknesses: 

      I think there are a few computational analyses that might improve the generality of the results. 

      (1) The cell types: The authors use marker gene analysis and KEGG pathways to identify cell types. I'd suggest a tool like SAMap (https://elifesciences.org/articles/66747) which compares single-cell data sets from distinct organisms to identify 'homologous' cell types - I imagine the zebrafish developmental atlases could serve as a reasonable comparative reference. 

      We appreciate the reviewer’s request, and in fact we would have loved to integrate our dataset with zebrafish. However, syngnathid’s unique craniofacial development makes it challenging to determine the appropriate stage for comparison. While 3 days post fertilization (dpf) zebrafish data were appropriate for comparisons of certain cell types (e.g. epidermal cells), it would have been problematic for other cell types (e.g. osteoblasts) that are not easily detectable until older zebrafish stages. Therefore, determining equivalent stages between these species is difficult and contains potential for error. Future research should focus on trying to better match stages across syngnathids and zebrafish (and other fish species such as stickleback). Studies of this nature promise to uncover the role of heterochrony in the evo-devo of syngnathid’s unique snouts.

      (2) Trajectory analyses: The authors suggest that their analyses might identify progenitor cell states and perhaps related differentiated states. They might explore cytoTRACE and/or pseudotime-based trajectory analyses to more fully delineate these ideas.

      We thank the reviewer for this suggestion! We added a trajectory analysis using cytoTRACE to the manuscript. It complemented our KEGG analysis well (L172-175; S73) and has improved the manuscript.

      (3) Cell-cell communication: I think it's very difficult to identify 'tooth primordium' cell types, because cell types won't be defined by an organ in this way. For instance, dental glia will cluster with other glia, and dental mesenchyme will likely cluster with other mesenchymal cell types. So the histology and ISH is most convincing in this regard. Having said this, given the known signaling interactions in the developing tooth (and in development generally) the authors might explore cell-cell communication analysis (e.g., CellChat) to identify cell types that may be interacting. 

      We agree! It would have been a wonderful addition to the paper to include a cell-cell communication analysis. One limitation of CellChat is that it only includes mouse and human orthologs. Given concerns of reviewer #3 for mouse-syngnathid comparisons, we decided to not pursue CellChat for this study. We are looking forward to future cell communication resources that include teleost fishes.

      Reviewer #3 (Public Review): 

      Summary: 

      This study established a single-cell RNA sequencing atlas of pipefish embryos. The results obtained identified unique gene expression patterns for pipefish-specific characteristics, such as fgf22 in the tip of the palatoquadrate and Meckel's cartilage, broadly informing the genetic mechanisms underlying morphological novelty in teleost fishes. The data obtained are unique and novel, potentially important in understanding fish diversity. Thus, I would enthusiastically support this manuscript if the authors improve it to generate stronger and more convincing conclusions than the current forms. 

      Thank you, we appreciate the reviewer’s enthusiasm!

      Weaknesses: 

      Regarding the expression of sfrp1a and bmp4 dorsal to the elongating ethmoid plate and surrounding the ceratohyal: are their expression patterns spatially extended or broader compared to the pipefish ancestor? Is there a much closer species available to compare gene expression patterns with pipefish? Did the authors consider using other species closely related to pipefish for ISH? Sfrp1a and bmp4 may be expressed in the same regions of much more closely related species without face elongation. I understand that embryos of such species are not always accessible, but it is also hard to argue responsible genes for a specific phenotype by only comparing gene expression patterns between distantly related species (e.g., pipefish vs. zebrafish). Due to the same reason, I would not directly compare/argue gene expression patterns between pipefish and mice, although I should admit that mice gene expression patterns are sometimes helpful to make a hypothesis of fish evolution. Alternatively, can the authors conduct ISH in other species of pipefish? If the expression patterns of sfrp1a and bmp4 are common among fishes with face elongation, the conclusion would become more solid. If these embryos are not available, is it possible to reduce the amount of Wnt and BMP signal using Crispr/Cas, MO, or chemical inhibitor? I do think that there are several ways to test the Wnt and/or BMP hypothesis in face elongation. 

      We appreciate the reviewer’s suggestion, and their recognition of the challenges within this system. In response to this comment, we completed further in situ hybridization experiments in threespine stickleback, a short snouted fish that is much more closely related to syngnathids than is zebrafish, to make comparisons with pipefish craniofacial expression patterns (S76-S79). We added ISH data for the signaling genes (fgf22, bmp4, and sfrp1a) as well as prdm16. By adding these additional ISH results, we speculated that craniofacial expression of bmp4, sfrp1a, and prdm16 is conserved across species. However, compared to the specific ceratohyal/ethmoid staining seen in pipefish, stickleback had broad staining throughout the jaws and gills. These data suggest that pipefish have co-opted existing developmental gene networks in the development of their derived snouts. We added this interpretation to the results and discussion of the manuscript (L244-L248; L262-277; L444-470).

      Recommendations for the authors:  

      Reviewing Editor (Recommendations for the Authors)

      We hope that the eLife assessment, as well as the revisions specified here, prove helpful to you for further revisions of your manuscript. 

      Revisions considered essential: 

      (1) Marker genes and single-cell dataset analyses. While these analyses have been performed to a good standard in broad terms, there is a majority view here that cell type annotations and trajectory analyses can be improved. In particular, there is question about the choice of marker genes for the current annotation. For one it can depend on the use of single marker genes (see tnnti1 example for clusters 17 and 31). Here, we recommend incorporating results from SAMap and trajectory analysis (e.g., cytoTRACE or standard pseudotime).

      Because of the reviewer comments, we became aware that we insufficiently communicated how cell clusters were annotated. We did mention in the manuscript that we did not use single marker genes to annotate clusters, but instead we used multiple marker genes for each cluster for the annotation process. We used both marker genes derived from our dataset and marker genes identified from zebrafish resources for cluster annotation. We chose single marker genes for each cluster for visualization purposes and for in situ hybridizations. However, it is clear from the reviewers’ comments that we needed to make more clear how the annotations were performed. To make this effort more clear in our revision, we included two new supplementary files – one with Seurat derived marker genes and one with marker genes derived from our DotPlot method. We also included extensive supplementary figures highlighting different markers. Using Daniocell, we identified 6 zebrafish markers per major cell type and showed their expression patterns in our atlas with FeaturePlots. We also included feature plots of the top 6 marker genes for each cluster. We hope that the addition of these 40+ plots (S3:S57) to the supplement fully addresses these concerns. 
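      The step of selecting the top 6 marker genes per cluster can be sketched as follows. This is a minimal illustration, assuming a flat table of (cluster, gene, score) rows in which a higher score means a stronger marker. The scoring metric, the function name, and the example rows are hypothetical; the response does not state which statistic was used to rank markers.

```python
from collections import defaultdict


def top_markers(rows, n=6):
    """Return the n highest-scoring genes for each cluster.

    `rows` is an iterable of (cluster, gene, score) tuples, where score is a
    hypothetical ranking statistic and higher means a stronger marker.
    """
    by_cluster = defaultdict(list)
    for cluster, gene, score in rows:
        by_cluster[cluster].append((score, gene))
    return {
        cluster: [gene for _, gene in sorted(items, key=lambda item: -item[0])[:n]]
        for cluster, items in by_cluster.items()
    }


# Made-up rows; gene names are placeholders, not markers from the atlas.
example_rows = [
    (17, "geneA", 2.0),
    (17, "geneB", 3.0),
    (37, "geneC", 1.0),
]
print(top_markers(example_rows, n=1))
```

      Each gene list produced this way would then be passed to a plotting function to draw one feature plot per marker.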

      We appreciated the suggestion of cytoTRACE from reviewer #2! We ran cytoTRACE on three major cell lineages (neural, muscle, and connective; S73), which complemented our KEGG analysis in suggesting an undifferentiated fate for clusters 8, 10, and 16. We chose to not run SAMap because it is a scRNA-seq library integration tool. Although we compared our lectin epidermal findings to 3 dpf zebrafish scRNA-seq data, we did not integrate the datasets out of concern that we could draw erroneous conclusions for other cell types. Future work that explores this technical challenge may uncover the role of heterochrony in syngnathid craniofacial development. We detail these changes more fully in our responses to reviewers.

      (2) The claims regarding evolutionary novelty and/or the genes involved are considered speculative. In part, this comes from relying too heavily on comparisons against zebrafish, as opposed to more closely related species. For example, the discussion regarding C-type lectin expression in the epidermis and KEGG enrichment (lines 358 - 364) seems confusing. Another good example here is the discussion on sfrp1a (lines 258 - 261). Here, the text seems to suggest craniofacial sfrp1a expression (or specifically ethmoid expression?) is connected to the development of the elongated snout in pipefish. However, craniofacial expression of sfrp1a is also reported in the arctic charr, which the authors grouped into fishes with derived craniofacial structures. Separately, sfrp2 expression was also reported in stickleback fish, for example. Do these different discussions truly support the notion that sfrp1a expression is all that unique in pipefish, rather than that pipefish and zebrafish are only distantly related and that sfrp1a was a marker gene first, and co-opted gene second? The authors should respond to the comments in the public review related to this aspect, and include more informative comparison and discussion. 

      A much more nuanced discussion with appropriate comparisons and caveats would be strongly recommended here.  

      We appreciate this insight and used it as a motivator to complete and add select comparative ISH data to this manuscript. We added in situ hybridization experiments from stickleback fish for craniofacial development genes (sfrp1a, prdm16, bmp4, and fgf22; S76-S79). After adding stickleback ISH to the manuscript, we were able to make comparisons between pipefish and stickleback patterns and draw more informed conclusions (L244-L248; L262-277; L444-470). We added additional nuance to the discussion of the head, tooth (L485-489), and male pregnancy (L358-L391) sections to address concerns of study limitations. We describe in more detail these additional data in response to reviewers.

      (3) In situ hybridization results: as already included above, there is generally weak labeling of species, developmental stages, and other markings that can provide context. The collective feeling here is that as it is currently presented, the ISH results do not go too far beyond simply illustrative purposes. To take these results further, more detailed comparison may be needed. At a minimum, far better labeling can help avoid making the wrong impression. 

      Based on the reviewers’ comments, we made changes to improve ISH clarity and add select comparative ISH findings. ISH was used to further interpretation of the scRNAseq atlas. All the developmental stages and species information for the embryos used were in the figure captions as well as in supplemental file 4. Since we primarily used wild caught embryos, we did not have specific ages of most embryos. The technical challenges of acquiring and staging Syngnathus embryos are detailed above. Because we did not have their age, we classified them based on developmental markers (such as presence of somites and the extent of craniofacial elongation). Although these classification methods are not ideal, they are consistent with the syngnathid literature (Sommer et al. 2012).  

      We followed reviewer #1’s recommendations by adding an annotated graphic of a pipefish head, aligning bay and Gulf pipefish sequences for the probe regions, expanding out our supplemental figures for ISH into a figure for each probe, and improving labeling. These changes improved the description of the ISH experiments and have increased the quality of the manuscript.

      We would have loved to complete detailed comparative studies as suggested, but doing such a complete analysis was not feasible for this study. Therefore, we completed an additional focused analysis. We followed reviewer #3’s idea and added ISHs from threespine stickleback, a short snouted fish, for 4 genes (sfrp1a, prdm16, fgf22, and bmp4). While more extensive ISHs tracking all marker genes through a variety of developmental stages in pipefish and stickleback would have provided crucial insights, we feel that it is beyond the scope of this study and would require a significant amount of additional work. We, thus, primarily interpreted the ISH results as illustrative data points in our discussion. As we state in the response to reviewer 1, the generation and annotation of the first scRNA-seq atlas in a syngnathid is the primary goal of this manuscript.  The ISHs were utilized primarily to support inferred cluster annotations if a positive identification of marker gene expression in expected tissues/cells occurred. 

      Reviewer #1 (Recommendations For The Authors): 

      While the scRNA-seq dataset offers a valuable resource for evo-devo analyses in fish and the hypotheses are of interest, critical aspects should be strengthened to support the claims of the study. 

      Concerning the scRNA-seq dataset, the major points to be addressed are listed below: 

      - Supplementary file 3 reports the single markers used to validate cluster annotations. To confirm cluster identities, more markers specific to each cluster should be highlighted and presented on the UMAP. 

      We recognize the reviewer’s concern and had in reality used numerous markers to annotate the clusters. Based upon the reviewer’s comment we decided to make this clear by creating feature plots for every cluster with the top 6 marker genes. These plots showcase gene specificity in UMAP space. We also added feature plots for zebrafish marker genes for key cell types. Through these changes and the addition of 54 supplementary figures (S3:S57), we hope that it is clear that numerous markers validated cluster identity.

      For example, as clusters 17 and 37 share the same tnnti1 marker, which other markers permit to differentiate their respective identity. 

      This is a fair point. Cluster 17 and 37 both are marked by a tnni1 ortholog.

      Different paralogous co-orthologs mark each cluster (cluster 17: LOC125989146; cluster 37: LOC125970863). In our revision to the above comment, additional (6) markers per cluster were highlighted which should remedy this concern. 

      - L146: the low number of identified cartilaginous cells (only 2% of total connective tissue cells) appears aberrant compared to bone cell number, while Figure 1 presents a well-developed cartilaginous skeleton with poor or no signs of ossification. Please discuss this point. 

      We also found this to be interesting and added a brief discussion on this subject to the results section (L147-L149). Single cell dissociations can have variable success for certain cell types. It is possible that the cartilaginous cells were more difficult to dissociate than the osteoblast cells.

      - L162: pax3a/b are not specific to muscle progenitors as the genes are also expressed in the neural tube and neural crest derivatives during organogenesis. Please confirm cluster 10 identity.  

      Thank you for the reminder, we added numerous feature plots that explored zebrafish (from Daniocell) and pipefish markers (identified in our dataset). Examining zebrafish satellite muscle markers (myog, pabpc4, and jam2a) shows a strong correspondence with cluster #10.

      - L198: please specify in the text the pigment cell cluster number. 

      We completed this change.

      - L199: it is not clear why considering module 38 correlated to cluster 20 while modules 2/24 appear more correlated according to the p-value color code. 

      We thank the reviewer for pointing this confusing element out! Although the t-statistic value for module 38 (3.75) is lower than the t-statistics for modules 2 and 24 (5.6 and 5.2, respectively), we chose to highlight module 38 for its ‘connectivity dependence’ score. In our connectivity test, we examined whether removing cells from a specific cell cluster reduced the connectivity of a gene network. We found that removing cluster 20 led to a decrease in module 38’s connectivity (-.13, p=0) while it led to an increase in modules 2 and 24’s connectivity (.145, p=1; .145, p=9.14; our original supplemental files 9-10). Therefore, the connectivity analysis showed that module 38’s structure was more dependent on cluster 20 than were the structures of modules 2 and 24. Although you highlighted an interesting quandary, we decided that this is tangential to the paper and did not add this discussion to the manuscript. 

      - Please describe in the text Figure 4A. 

      Completed, we thank the reviewer for catching this! 

      Concerning embryo stainings, the major points to be addressed are listed below: 

      - Figure 1: please enhance the light/contrast of figures to highlight or show the absence of alcian/alizarin staining. Mineralized structures are hardly detectable in the head and slight differences can be seen between the two samples. The developmental stage should be added. Please homogenize the scale bar format (remove the unit on panels E and, G as the information is already in the text legend). It would be useful to illustrate the data with a schematic view of the structures presented in panels B, and E, and please annotate structures in the other panels.  

      We thank the reviewer for these suggestions to improve our figure. We increased the brightness and contrast for all our images. We also added an illustration of the head with labels of elements. As discussed, we used wild caught pregnant males and, therefore, do not know the exact age of the specimens. However, we described the developmental stage based on morphological observations. Slight differences in morphology between samples are expected. We and others have noticed that developmental rate varies, even within the same brood pouch, for syngnathid embryos. We observed several mineralization zones in the embryos, including the upper and lower jaws, the mes(ethmoid), and the pectoral fin. We recognize the cartilage staining is more apparent than the bone staining, though increasing image brightness and contrast did improve the visibility of the mineralization front.

      - All ISH stainings and images presented in Figures 4-6/ Figures S2-3 should be revised according to comments provided in the public review. 

      We thank the reviewer for providing thorough comments, we provided an in-depth response to the public review. We made several improvements to the manuscript to address their concerns. 

      - Figure 4: Figure 4B should be described before 4C in the text or inverse panels / L222 the Meckel's cartilage is not shown on Figure 4C. The schematic views in H should be annotated and the color code described / the ISH data must be completed to correlate spatially clusters to head structures. 

      We thank the reviewer for pointing this out, we fixed the issues with this figure and added annotations to the head schematics.

      - Figure 5: typo on panels 'alician' = alcian. 

      We completed this change. 

      - Figures S2-3: data must be better presented, polished / typo in captions 'relavant'= relevant. 

      Thank you for this critique, we created new supplementary figures to enhance interpretation of the data (S59-S71). In these new figures, we included a feature plot for each gene and respective ISHs.

      - Figure S3: soat2 = no evidence of muscle marker neither by ISH presented nor in the literature. 

      We realized this staining was not clear with the previous S2/S3 figures. Our new changes in these supplementary figures based on the reviewer’s ideas made these ISH results clearer. We observed soat2 staining in the sternohyoideus muscle (panel B in S71).

      Other points: 

      - The cartilage/bone developmental state (Alcian/alizarin staining) and/or ISH for classical markers of muscle development (such as pax3/myf5) could be used to clarify the This could permit the completion of a comparative analysis between the two species and the interpretation of novel and adaptative characters.  

      We appreciate this idea! We thought deeply about a well characterized comparative analysis between pipefish and zebrafish for this study. We discussed our concerns in our public response to reviewer 2. We found that it was challenging to stage match all cell types, and were concerned that we could make erroneous conclusions. For example, our pipefish samples were still inside the male brood pouch and possessed yolk sacs. However, we found osteoblast cells in our scRNAseq atlas, and in alizarin staining. Although zebrafish literature notes that the first zebrafish bone appears at 3 dpf (Kimmel et al. 1995), osteoblasts were not recognized until 5 dpf in two scRNAseq datasets (Fabian et al. 2022; Lange et al. 2023). A 5dpf zebrafish is considered larval and has begun hunting. Therefore, we chose to not integrate our data out of concern that osteoblast development may occur at different timelines between the fishes. 

      Fabian, P., Tseng, K.-C., Thiruppathy, M., Arata, C., Chen, H.-J., Smeeton, J., Nelson, N., & Crump, J. G. (2022). Lifelong single-cell profiling of cranial neural crest diversification in zebrafish. Nature Communications 2022 13:1, 13(1), 1–13. 

      Lange, M., Granados, A., VijayKumar, S., Bragantini, J., Ancheta, S., Santhosh, S., Borja, M., Kobayashi, H., McGeever, E., Solak, A. C., Yang, B., Zhao, X., Liu, Y., Detweiler, A. M., Paul, S., Mekonen, H., Lao, T., Banks, R., Kim, Y.-J., … Royer, L. A. (2023). Zebrahub – Multimodal Zebrafish Developmental Atlas Reveals the State-Transition Dynamics of Late-Vertebrate Pluripotent Axial Progenitors. BioRxiv, 2023.03.06.531398. 

      Kimmel, C., Ballard, S., Kimmel, S., Ullmann, B., Schilling, T. (1995). Stages of Embryonic Development of the Zebrafish. Developmental Dynamics 203:253–310.

      'in situs' in the text should be replaced by 'in situ experiments'.  

      We made this change (L395, L663, L666, L762).

      - Lines 562-565: information on samples should be added at the start of the result section to better apprehend the following scRNA-seq data.

      We thank the reviewer for pointing out this issue. Although we had a few sentences on the samples in the first paragraph of the result section, we understand that it was missing some critical pieces of information. Therefore, we added these additional details to the beginning of the results section (L126-L132). 

      - Lines 629-665: PCR with primers designed on gulf pipefish genome could be performed in parallel on bay and gulf cDNA libraries, and amplification products could be sequenced to analyze alignment and validate the use of gulf pipefish ISH probes in bay pipefish embryos. Probe production could also be performed using gulf primers on bay pipefish cDNA pools. 

      After the submission of this manuscript, a bay pipefish genome was prepared by our laboratory. We used this genome to align our probes, these alignments demonstrate strong sequence conservation between the species. We included these alignments in our supplemental files.

      - L663: the bleaching step must be optimized on pipefish embryos. 

      We understand this concern and had completed several bleach optimization experiments prior to publication. Although we found that bleaching improved visibility of staining, we noticed with the probe tnmd that bleached embryos did not have complete staining of tendons and ligaments. The unbleached embryos had more extensive staining than the bleached embryos. We were concerned that bleaching would lead to failures to detect expression domains (false negatives) important for our analysis. Therefore, we did not use bleaching in our in situ experiments (except with hatched fish with a high degree of pigmentation). 

      - Indicate the number of specimens analyzed for each labeling condition.  

      We thank the reviewer for noticing this issue. We added this information to the methods (L766-767).

      - Describe the fixation and pre-treatment methods previous to ISH and skeleton stainings

      We thank the reviewer for pointing out this issue, we added these descriptions (L765-766; L772-774). 

      Reviewer #3 (Recommendations For The Authors): 

      (1) If sfrp1a expression is observed also in other fish species with derived craniofacial structures, it's important to discuss this more in the Discussion. This could be a common mechanism to modify craniofacial structures, although functional tests are ultimately required (but not in this paper, for sure). Can lines 421-428 involve the statement "a prolonged period of chondrocyte differentiation" underlies craniofacial diversity?

      This is a great idea, and we added a sentence that captures this ethos (L451-452).

      (2) Lines 334-346 need to be rephrased. It's hard to understand which genes are expressed or not in pipefish and zebrafish. Did "23 endocytosis genes" show significant enrichment in zebrafish epidermis, or are they expressed in zebrafish epidermis? 

      We thank the reviewer for this comment, we re-phrased this section for clarity (L365-368).

      (3) Figure 4 is missing the "D" panel and two "E" panels. 

      We thank the reviewer for noticing this, we fixed this figure.

      (4) Line 302: "whole-mount" or "whole mount"

      We thank the reviewer for the catch!

    1. Author response:

      Reviewer #1 (Public review):

      Comment 1: In the Results section, the rationale behind selecting the beta band for the central (C3, CP3, Cz, CP4, C4) regions and the theta band for the fronto-central (Fz, FCz, Cz) regions is not clearly explained in the main text. This information is only mentioned in the figure captions. Additionally, why was the beta band chosen for the S-ROI central region and the theta band for the S-ROI fronto-central region? Was this choice influenced by the MVPA results?

      We thank the reviewer for the question regarding the rationale for the S-ROI selection in our study. The beta band was chosen for the central region due to its established relevance in motor control (Engel & Fries, 2010), movement planning (Little et al., 2019) and motor inhibition (Duque et al., 2017). The fronto-central theta band (or frontal midline theta) was a widely recognized indicator in cognitive control research (Cavanagh & Frank, 2014), associated with conflict detection and resolution processes. Moreover, recent empirical evidence suggested that the fronto-central theta reflected the coordination and integration between stimuli and responses (Senoussi et al., 2022). Although we have described the cognitive processes linked to these different frequencies in the introduction and discussion sections, along with the potential patterns of results observed in Stroop-related studies, we did not specify the involved cortical areas. Therefore, we have specified these areas in the introduction to enhance the clarity of the revised version (in the fourth paragraph of the Introduction section).

      Regarding whether the selection of S-ROIs was influenced by the MVPA results, we would like to clarify here that we selected the S-ROIs based on prior research and then conducted the decoding analysis. Specifically, we first extracted the data representing different frequency indicators (three F-ROIs and three S-ROIs) as features, followed by decoding to obtain the MVPA results. Subsequently, the time-frequency analysis, combined with the specific time windows during which each frequency was decoded, provided detailed interaction patterns among the variables for each indicator. The specifics of feature selection are described in the revised version (in the first paragraph of the Multivariate Pattern Analysis section).

      Comment 2: In the Data Analysis section, line 424 states: “Only trials that were correct in both the memory task and the Stroop task were included in all subsequent analyses. In addition, trials in which response times (RTs) deviated by more than three standard deviations from the condition mean were excluded from behavioral analyses.” The percentage of excluded trials should be reported. Also, for the EEG-related analyses, were the same trials excluded, or were different criteria applied?

      We thank the reviewer for this suggestion. Beyond the behavioral exclusion criteria, trials with EEG artifacts were also excluded from the data for the EEG-related analyses. We have now reported the percentage of excluded trials for both behavioral and EEG data analyses in the revised version (in the second paragraph of the EEG Recording and Preprocessing section and the first paragraph of the Behavioral Analysis section).
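      The behavioral exclusion rule quoted in Comment 2 (correct trials only, then response times within three standard deviations of the condition mean) can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the trial dictionaries, the field names `condition`, `rt`, and `correct`, and the function name are hypothetical, and the single `correct` flag is assumed to mean correct in both the memory task and the Stroop task. EEG artifact rejection is not modeled.

```python
from collections import defaultdict
from statistics import mean, stdev


def exclude_rt_outliers(trials, n_sd=3.0):
    """Keep correct trials whose RT lies within n_sd SDs of their condition mean.

    trials: non-empty list of dicts with hypothetical keys
            'condition', 'rt' (ms), and 'correct' (bool).
    Returns (kept_trials, percent_excluded_of_all_trials).
    """
    # Step 1: drop trials that were not answered correctly.
    correct = [t for t in trials if t["correct"]]

    # Step 2: collect RTs per condition and compute mean +/- n_sd * SD.
    rts_by_condition = defaultdict(list)
    for t in correct:
        rts_by_condition[t["condition"]].append(t["rt"])

    bounds = {}
    for condition, rts in rts_by_condition.items():
        m = mean(rts)
        sd = stdev(rts) if len(rts) > 1 else 0.0  # sample SD
        bounds[condition] = (m - n_sd * sd, m + n_sd * sd)

    # Step 3: keep trials inside the bounds of their own condition.
    kept = [
        t for t in correct
        if bounds[t["condition"]][0] <= t["rt"] <= bounds[t["condition"]][1]
    ]

    # Percentage of all recorded trials removed by steps 1 and 3 together.
    percent_excluded = 100.0 * (len(trials) - len(kept)) / len(trials)
    return kept, percent_excluded
```

      Reporting the returned percentage per participant or per condition is one way to satisfy the reviewer's request; which breakdown the authors actually reported is not stated in the response.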

      Comment 3: In the Methods section, line 493 mentions: “A 400-200 ms pre-stimulus time window was selected as the baseline time window.” What is the justification in the literature for choosing the 400-200 ms pre-stimulus window as the baseline? Why was the 200-0 ms pre-stimulus period not considered?

      We thank the reviewer for this question and would like to provide the following justification. First, although a baseline ending at 0 ms is common in ERP analyses, it may not be suitable for time-frequency analysis. Due to the inherent temporal smoothing characteristic of wavelet convolution in time-frequency decomposition, task-related early activities can leak into the pre-stimulus period (before 0 ms) (Cohen, 2014). This means that extending the baseline to 0 ms will include some post-stimulus activity in the baseline window, thereby increasing baseline power and compromising the accuracy of the results. Second, an ideal baseline duration is recommended to be around 10-20% of the entire trial of interest (Morales & Bowers, 2022). In our study, the epoch duration was 2000 ms, making 200-400 ms an appropriate baseline length. Third, given that the minimum duration of the fixation point before the stimulus in our experiment was 400 ms, we chose 400 ms before stimulus onset as the start of the baseline so that the window contained only the fixation period. In summary, considering edge effects, duration requirements, and the need to exclude other influences, we selected a baseline correction window of -400 to -200 ms. To enhance the clarity of the revised version, we have provided the rationale for the selected time windows along with relevant references (in the first paragraph of the Time-frequency analysis section).
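      The arithmetic behind the baseline choice, and how a -400 to -200 ms window would be applied, can be illustrated with a short sketch. The 10-20% guideline, the 2000 ms epoch, and the window limits come from the response above; the function names are hypothetical, and the decibel conversion is an assumption (a common normalization for time-frequency power), since the response does not state which normalization was used.

```python
import math
from statistics import mean


def baseline_length_range(epoch_ms, lo_frac=0.10, hi_frac=0.20):
    """Recommended baseline duration: 10-20% of the epoch length."""
    return epoch_ms * lo_frac, epoch_ms * hi_frac


def baseline_indices(times_ms, start_ms=-400, end_ms=-200):
    """Indices of time points inside the baseline window (inclusive)."""
    return [i for i, t in enumerate(times_ms) if start_ms <= t <= end_ms]


def db_baseline_correct(power, times_ms, start_ms=-400, end_ms=-200):
    """Express power at one frequency in dB relative to mean baseline power.

    Assumption: dB normalization, 10 * log10(power / baseline).
    """
    idx = baseline_indices(times_ms, start_ms, end_ms)
    baseline = mean(power[i] for i in idx)
    return [10.0 * math.log10(p / baseline) for p in power]


# A 2000 ms epoch gives a recommended baseline of 200 to 400 ms.
low_ms, high_ms = baseline_length_range(2000)

# Toy power time course sampled every 100 ms from -500 to 1400 ms.
times = list(range(-500, 1500, 100))
power = [2.0 if t <= -200 else 4.0 for t in times]
corrected = db_baseline_correct(power, times)
```

      Because the window ends at -200 ms rather than 0 ms, post-stimulus power smeared backward by the wavelet does not enter the baseline mean, which is the first point made in the response.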

      Comment 4: Is the primary innovation of this study limited to the methodology, such as employing MVPA and RSA to establish the relationship between late theta activity and behavior?

      We thank the reviewer for this insightful question and would like to clarify that our research extends beyond mere methodological innovation; rather, it utilized new methods to explore novel theoretical perspectives. Specifically, our research presents three levels of innovation: methodological, empirical, and theoretical. First, methodologically, MVPA overcame the drawbacks of traditional EEG analyses based on specific averaged voltage intensities, providing new perspectives on how the brain dynamically encoded particular neural representations over time. Furthermore, RSA aimed to identify which indicators among the decoded were directly related to behavioral representation patterns. Second, in terms of empirical results, using these two methods, we have identified for the first time three EEG markers that modulate the Stroop effect under verbal working memory load: SP, late theta, and beta, with late theta being directly linked to the elimination of the behavioral Stroop effect. Lastly, from a theoretical perspective, we proposed the novel idea that working memory played a crucial role in the late stages of conflict processing, specifically in the stimulus-response mapping stage (the specific theoretical contributions are detailed in the second-to-last paragraph of the Discussion section).

      Comment 5: On page 14, lines 280-287, the authors discuss a specific pattern observed in the alpha band. However, the manuscript does not provide the corresponding results to substantiate this discussion. It is recommended to include these results as supplementary material.

      We thank the reviewer for this suggestion. We added a new figure along with the corresponding statistical results that displayed the specific result patterns for the alpha band (Supplementary Figure 1).

      Comment 6: On page 16, lines 323-328, the authors provide a generalized explanation of the findings. According to load theory, stimuli compete for resources only when represented in the same form. Since the pre-memorized Chinese characters are represented semantically in working memory, this explanation lacks a critical premise: that semantic-response mapping is also represented semantically during processing.

      We thank the reviewer for this insightful suggestion. We fully agree with the reviewer’s perspective. As stated in our revised version, load theory suggests that cognitive resources are limited and dependent on a specific type (in the second paragraph of the Discussion section). The previously memorized Chinese characters are stored in working memory in the form of semantic representations; meanwhile the stimulus-response mapping should also be represented semantically, leading to resource occupancy. We have included this logical premise in the revised version (in the third-to-last paragraph of the Discussion section).

      Comment 7: The classic Stroop task includes both a manual and a vocal version. Since stimulus-response mapping in the vocal version is more automatic than in the manual version, it is unclear whether the findings of this study would generalize to the impact of working memory load on the Stroop effect in the vocal version.

      We fully agree with the reviewer’s point that the verbal version of the Stroop task differs from the manual version in terms of the degree of automation in the stimulus-response mapping. Specifically, the verbal version relies on mappings that are established through daily language use, while the manual version involves arbitrary mappings created in the laboratory. Therefore, the stimulus-response mapping in the verbal response version is more automated and less likely to be suppressed. However, our previous research indicated that the degree of automation in the stimulus-response mapping was influenced by practice (Chen et al., 2013). After approximately 128 practice trials, semantic conflict almost disappears, suggesting that the level of automation in stimulus-response mapping for the verbal Stroop task is comparable to that of the manual version (Chen et al., 2010). Given that participants in our study completed 144 practice trials (in the Procedure section), we believe these findings can be generalized to the verbal version.

      Comment 8: While the discussion section provides a comprehensive analysis of the study’s results, the authors could further elaborate on the theoretical and practical contributions of this work.

      We thank the reviewer for the constructive suggestions. We recognize that the theoretical and practical contributions of the study were not thoroughly elaborated in the original manuscript. Therefore, we have now provided a more detailed discussion. Specifically, the theoretical contributions focus on advancing load theory and highlighting the critical role of working memory in conflict processing. The practical contributions emphasize the application of load theory and the development of intervention strategies for enhancing inhibitory control. A more detailed discussion can be found in the revised version (in the second-to-last paragraph of the Discussion section).

      Reviewer #2 (Public review):

      Comment 1: As the researchers mentioned, a previous study reported a diminished Stroop effect with concurrent working memory tasks to memorize meaningless visual shapes rather than memorize Chinese characters as in the study. My main concern is that lower-level graphic processing when memorizing visual shapes also influences the Stroop effect. The stage of Stroop conflict processing affected by the working memory load may depend on the specific content of the concurrent working memory task. If that’s the case, I sense that the generalization of this finding may be limited.

      We thank the reviewer for this insightful concern. As mentioned in the manuscript, this may be attributed to the inherent characteristics of Chinese characters. In contrast to English words, the processing of Chinese characters relies more on graphemic encoding and memory (Chen, 1993). Therefore, the processing of line patterns essentially occupies some of the resources needed for character processing, which aligns with our study’s hypothesis based on dimensional overlap. Additionally, regarding the results, even though the previous study presented lower-level line patterns, the results still showed that the working memory load modulated the later theta band. We hypothesize that, regardless of the specific content of the pre-presented working memory load, once the stimulus disappears from view, these loads are maintained as representations in the working memory platform. Therefore, they do not influence early perceptual processing, and resource competition only occurs once the distractors reach the working memory platform. Lastly, a previous study has shown that spatial loads, which do not overlap with either the target or distractor dimensions, do not influence the conflict effect (Zhao et al., 2010). Taken together, we believe that regardless of the specific content of the concurrent working memory tasks, as long as they occupy resources related to irrelevant stimulus dimensions, they can influence the late-stage processing of the conflict effect. Perhaps our original manuscript did not convey this clearly, so we have rephrased it in a more straightforward manner (in the second paragraph of the Discussion section).

      Comment 2: The P1 and N450 components are sensitive to congruency in previous studies as mentioned by the researchers, but the results in the present study did not replicate them. This raised concerns about data quality and needs to be explained.

      We thank the reviewer for this insightful concern. For P1, we aimed to convey that the early perceptual processing represented by P1 is part of the conflict processing process. Therefore, we included it in our analysis. Additionally, as mentioned in the discussion, most studies find P1 to be insensitive to congruency. However, we inappropriately cited a study in the introduction that suggested P1 shows differences in congruency, which is among the few studies that hold this perspective. To prevent confusion for readers, we have removed this citation from the introduction.

      As for N450, most studies have indeed found it to be influenced by congruency. In our manuscript, we did not observe a congruency effect at our chosen electrodes and time window. However, significant congruency effects were detected at other central-parietal electrodes (CP3, CP4, P5, P6) during the 350-500 ms interval. The interaction between task type and consistency remained non-significant, consistent with previous results. Furthermore, with respect to the location of the electrodes chosen, existing studies on N450 vary widely, including central-parietal electrodes and frontal-central electrodes (for a review, see Heidlmayr et al., 2020). We speculate that this phenomenon may be related to the extent of practice. With fewer total trials, the task may involve more stimulus conflicts, engaging more frontal brain areas. On the other hand, with more total trials, the task may involve more response conflicts, engaging more central-parietal brain areas (Chen et al., 2013; van Veen & Carter, 2005). Due to the extensive practice required in our study, we identified a congruency N450 effect in the central-parietal region. We apologize for not thoroughly exploring other potential electrodes in the previous manuscript, and we have revised the results and interpretations regarding N450 accordingly in the revised version (in the N450 section of the ERP results and the third paragraph of the Discussion section).

      Reference

      Cavanagh, J. F., & Frank, M. J. (2014). Frontal theta as a mechanism for cognitive control. Trends in Cognitive Sciences, 18(8), 414–421. https://doi.org/10.1016/j.tics.2014.04.012

      Chen, M. J. (1993). A Comparison of Chinese and English Language Processing. In Advances in Psychology (Vol. 103, pp. 97–117). North-Holland. https://doi.org/10.1016/S0166-4115(08)61659-3

      Chen, X. F., Jiang, J., Zhao, X., & Chen, A. (2010). Effects of practice on semantic conflict and response conflict in the Stroop task. Psychol. Sci., 33, 869–871.

      Chen, Z., Lei, X., Ding, C., Li, H., & Chen, A. (2013). The neural mechanisms of semantic and response conflicts: An fMRI study of practice-related effects in the Stroop task. NeuroImage, 66, 577–584. https://doi.org/10.1016/j.neuroimage.2012.10.028

      Cohen, M. X. (2014). Analyzing Neural Time Series Data: Theory and Practice. The MIT Press. https://doi.org/10.7551/mitpress/9609.001.0001

      Duprez, J., Gulbinaite, R., & Cohen, M. X. (2020). Midfrontal theta phase coordinates behaviorally relevant brain computations during cognitive control. NeuroImage, 207, 116340. https://doi.org/10.1016/j.neuroimage.2019.116340

      Duque, J., Greenhouse, I., Labruna, L., & Ivry, R. B. (2017). Physiological Markers of Motor Inhibition during Human Behavior. Trends in Neurosciences, 40(4), 219–236. https://doi.org/10.1016/j.tins.2017.02.006

      Engel, A. K., & Fries, P. (2010). Beta-band oscillations—Signalling the status quo? Current Opinion in Neurobiology, 20(2), 156–165. https://doi.org/10.1016/j.conb.2010.02.015

      Heidlmayr, K., Kihlstedt, M., & Isel, F. (2020). A review on the electroencephalography markers of Stroop executive control processes. Brain and Cognition, 146, 105637. https://doi.org/10.1016/j.bandc.2020.105637

      Little, S., Bonaiuto, J., Barnes, G., & Bestmann, S. (2019). Human motor cortical beta bursts relate to movement planning and response errors. PLOS Biology, 17(10), e3000479. https://doi.org/10.1371/journal.pbio.3000479

      Morales, S., & Bowers, M. E. (2022). Time-frequency analysis methods and their application in developmental EEG data. Developmental Cognitive Neuroscience, 54, 101067. https://doi.org/10.1016/j.dcn.2022.101067

      Senoussi, M., Verbeke, P., Desender, K., De Loof, E., Talsma, D., & Verguts, T. (2022). Theta oscillations shift towards optimal frequency for cognitive control. Nature Human Behaviour, 6(7), Article 7. https://doi.org/10.1038/s41562-022-01335-5

      van Veen, V., & Carter, C. S. (2005). Separating semantic conflict and response conflict in the Stroop task: A functional MRI study. NeuroImage, 27(3), 497–504. https://doi.org/10.1016/j.neuroimage.2005.04.042

      Zhao, X., Chen, A., & West, R. (2010). The influence of working memory load on the Simon effect. Psychonomic Bulletin & Review, 17(5), 687–692. https://doi.org/10.3758/PBR.17.5.687

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for The Authors):

      Q1: In response to reviewers you noted totally 292 sequenced LECs, however in reviewer figure 3 B the numbers seem to add up to 221. Please include mention of the total number of LEC sequences. Please mention line 119, page 4 the total number of explored LEC transcriptomes

      Thank you for your careful review. We have updated Fig 2A, 2C and 2E. Our initial analysis included 242 (not 292) LECs, which contained the d5 post-MI sample from the raw data (E-MTAB-7895). We dropped d5 in our subsequent analysis because the change at d5 did not significantly differ from d3. Therefore, we included 221 LECs in our final analysis, as updated in Fig 2A, 2C and 2E.

      Q2-1: Figure 3A supposedly shows % of LEC subpopulations relative to their numbers found in day 0 samples. However, there seem to be some errors, because for example the subpop LEC Cap I include 13 cells day 1 and 6 cells day 1, which corresponds to 46% of initial numbers. However, from your graph 3B the blue population seems to occupy 10%. Please revise or explain how these relative % were calculated.

      Thank you for your question. In Figure 3A, each column was calculated as dn/d0*100%, where d0 is the total number of LECs at day 0 (57 cells): d0 = 57/57*100% = 100%, d1 = 21/57*100% = 36.84%, d3 = 9/57*100% = 15.79%, and likewise for d7, d14, and d28. Therefore, Cap I at d0 (13 cells) is 13/57*100% = 22.81%, and Cap I at d1 (6 cells) is 6/57*100% = 10.53%.
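      The difference between the reviewer's 46% and the 10% shown in the figure comes down to the choice of denominator, which a few lines of arithmetic make explicit. The cell counts are those given in the response above; the helper name `percent_of` and the variable names are hypothetical.

```python
def percent_of(count, denominator):
    """Return count as a percentage of denominator, rounded to 2 decimals."""
    return round(100.0 * count / denominator, 2)


D0_TOTAL = 57  # total LECs at day 0, as stated in the response

# Normalization used in the response: every count divided by the day-0 total.
all_lecs_d1 = percent_of(21, D0_TOTAL)  # all LECs at d1
all_lecs_d3 = percent_of(9, D0_TOTAL)   # all LECs at d3
cap1_d0 = percent_of(13, D0_TOTAL)      # Cap I at d0
cap1_d1 = percent_of(6, D0_TOTAL)       # Cap I at d1

# The reviewer's reading: Cap I at d1 relative to Cap I's own day-0 count.
cap1_d1_vs_own_d0 = percent_of(6, 13)

print(all_lecs_d1, all_lecs_d3, cap1_d0, cap1_d1, cap1_d1_vs_own_d0)
```

      Both numbers describe the same six cells; the figure reports them against all 57 day-0 LECs, whereas the reviewer compared them with the 13 Cap I cells present at day 0.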

      Q2-2: Further, based on the relative % of LEC subpopulations, using the numbers mentioned in Fig 3B, it would appear that the relative frequency LEC cap II population is actually stable at around 20-30% of all LECs per time point throughout the study (except day 1 drop). This contrasts with line 136 p. 4 statement. I would also urge caution for interpreting too much into the variation of relative levels of LEC co, as these represent exceeding rare cells in your samples, and could reflect technical issues rather than true biological variation (total LEC co numbers analyzed ranging from 1-24 cells/ time point). The same could be said of LEC cap II and cap III.

      We strongly agree with your comment on the proportion of LEC subtypes post-MI. As you pointed out, we have revised the description of the results on Page 4, lines 137-143, as follows.

      “In the early stages of myocardial infarction (D1 and D3), the quantity of LECs decreased sharply. The number of LECs gradually increased from day 7 and returned to normal levels by day 14 after MI. Moreover, from day 14 onwards, the number and proportion of Cap I type LECs significantly increased.”

      Q3: Please list in supplement the gene features used to identify in spatial transcriptomics the different LEC subpopulations, as their profiles (notably for capillary LECs) don't appear to be very different based on data in Fig 2F.

      We have supplied gene features in supplementary materials.

      Q4: In section 2.7 you refer to Gal9 secretion. Please replace with expression as no measure of protein levels from LECs has been described in your study.

      Thank you for your suggestion, we have replaced secretion with expression.

      Q5: The updated method to exclude non-lymphatic cells from lymphatic vessel analyses by incorporating pdpn as an additional marker ('present costained areas wherever possible' line 350 p 10)

      Thank you for your correction. We have updated the description as follows and highlighted it in the manuscript: rabbit anti-Lyve1 (1:300, ab14917, Abcam, UK), [Syrian hamster anti-Podoplanin (1:100, 53-5381-82, Thermo, USA), rabbit anti-Prox1 (1:300, ab199359, Abcam, UK); both anti-Podoplanin and anti-Prox1 are additional markers co-stained with Lyve1 to exclude non-lymphatic cells from lymphatic vessels].

      Q6: Fig 1B, it is highly surprising to see the lymphatic density in the BZ go from 25 um² at day 3 to more than 1000 um² only four days later (day 7). Is it possible that your day 3 measurements were in the infarct area, and not BZ area? The H&E image shown in Fig1a for d3 sample would seem to indicate the analysis was done in a dead area, rather than BZ. Please revise (perhaps select similar zone as shown for d1 in fig 6D, adjusted for subepicardial region and not mid-myocardial as seems to be the case currently), and also provide lymphatic area measures in healthy myocardium for day 0 samples. The unit used (um²) also would depend on the size of the area examined. Is this unit per image? If so please report total imaged area as a reference.

      A6: Thank you for the reminder and advice. We have labeled each zone on the H&E and IF images in Figure 1-figure supplement 2B, and replaced the histological photo taken at 3 days post-MI in Fig. 1A with a clearer one. Furthermore, we recalculated the lymphatic vessel area ratio as you suggested, as the ratio of the LEC co-stained area to the total imaged area at 100-fold magnification.
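
      The area ratio described here is dimensionless, which addresses the concern that an absolute area in um² depends on the size of the imaged field. A minimal sketch follows; the function name and the example areas are hypothetical and are not values from the study.

```python
def lymphatic_area_ratio(costained_area_um2, total_imaged_area_um2):
    """Return the co-stained lymphatic area as a percentage of the imaged area."""
    if total_imaged_area_um2 <= 0:
        raise ValueError("total imaged area must be positive")
    if costained_area_um2 < 0 or costained_area_um2 > total_imaged_area_um2:
        raise ValueError("co-stained area must lie within the imaged area")
    return costained_area_um2 / total_imaged_area_um2 * 100

# Hypothetical example: 250 um^2 of co-stained signal in a 10,000 um^2 field.
example_ratio = lymphatic_area_ratio(250.0, 10_000.0)
print(f"{example_ratio:.2f}% of the imaged area")
```

      Because both areas are measured in the same field, images taken at the same magnification can be compared directly across time points.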

      Q7: The mention that CD68 antibody isn't compatible with lyve1 antibody could easily have been bridged by using other macrophage markers, such as F4/80, which is readily available and often used marker for macs in mice and comes notably as a rat anti-mouse F4-80. It would have added much more relevant information to exclude Lyve1-/F4/80+ cells as compared to the current analysis, which may indeed include in area measures Lyve1+ /Pdpn- single cells erroneously spotted as 'lymphatic vessels'

      Thank you for your excellent suggestion. We co-stained the samples with F4/80 and LYVE1 and have added the result as Figure 1-figure supplement 1E, which is also shown in Author response image 1.

      Author response image 1.

      Immunofluorescence (IF) co-staining of tissue sections with F4/80 and LYVE1 in sham and MI model mice at d3, d7, d14, and d28 post-MI. LYVE1: lymphatic vessel endothelial hyaluronan receptor 1; DAPI: 4′,6-diamidino-2-phenylindole; scale bars: 100 μm (10×), 25 μm (40×).

      Reviewer 2 (Recommendations for The Authors):

      Q1: Language expression must be improved. Many incomplete sentences exist throughout the manuscript. A few examples: Line 70-71: In order to further elucidate the effects and regulatory mechanisms of the lymphatic vessels in the repair process of myocardial injury following MI. Line 71-73. This study, integrated single-cell sequencing and spatial transcriptome data from mouse heart tissue at different timepoints after MI from publicly available data (E-MTAB-7895, GSE214611) in the ArrayExpress and gene expression omnibus (GEO) databases. Line 88-89: Since the membrane protein LYVE1 can present lymphatic vessel morphology more clearly than PROX1.

      Thank you for your correction. We have carefully inspected and corrected the whole manuscript.

      Q2: The type of animal models (i.e., permanent MI or MI plus reperfusion) included in Array Express and gene expression omnibus (GEO) databases must be clearly defined as these two models may have completely different effects on lymphatic vessel development during post-MI remodeling.

      Thank you for your excellent suggestion. The animal models used in both E-MTAB-7895 and GSE214611 are permanent MI. We have modified the model information in the methodology section (page 12, line 400-401).

      Q3: Line 119-120: Caution must be taken regarding Cav1 as a lymphocyte marker because Cav1 is expressed in all endothelial cells, not limited to LEC.

      Thank you for the reminder. Cav1 was used in our clustering as one of the marker genes because it is differentially expressed among LEC subtypes, as reported in the article PMID: 31402260.

      Q4: Figure 1 legend needs to be improved. RZ, BZ, and IZ need to be labeled in all IF images. Day 0 images suggest that RZ is the tissue section from the right ventricle.

      Thank you for your suggestion. We have labeled and updated the regions of RZ, BZ, and IZ in H&E and IF image in Figure1-Figure supplement 2B.

      Q5: The discussion section needs to be improved and better focused on the findings from the current study.

      Thank you for your helpful comment. Based on your suggestion, we have revised the first paragraph of the discussion, lines 250-256 (Page 7), as follows:

      Cardiac lymphatics play an important role in myocardial edema and inflammation. This study, for the first time, integrated single-cell sequencing data and spatial transcriptome data from mouse heart tissue at different time points post-MI, and identified four transcriptionally distinct subtypes of LECs and their dynamic transcriptional heterogeneity distribution in different regions of myocardial tissue post-MI. These subgroups of LECs were shown to perform different functions involved in inflammation, apoptosis, ferroptosis, and water absorption-related regulation of vasopressin during the process of myocardial repair after MI.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the work "Endosomal sorting protein SNX4 limits synaptic vesicle docking and release", Josse Poppinga and collaborators addressed the synaptic function of Sorting Nexin 4 (SNX4). Employing a newly developed in vitro KO model, with live imaging experiments, electrophysiological recordings, and ultrastructural analysis, the authors evaluate modifications in synaptic morphology and function upon loss of SNX4. The data demonstrate increased neurotransmitter release and alteration in synapse ultrastructure with a higher number of docked vesicles and shorter AZ. The evaluation of the presynaptic function of SNX4 is of relevance and tackles an open and yet unresolved question in the field of presynaptic physiology.

      Strengths:

      The sequential characterization of the cellular model is nicely conducted and the different techniques employed are appropriate for the morpho-functional analysis of the synaptic phenotype and the derived conclusions on SNX4 function at presynaptic site. The authors succeeded in presenting a novel in vitro model that resulted in chronical deletion of SNX4 in neurons. A convincing sequence of experimental techniques is applied to the model to unravel the role of SNX4, whose functions in neuronal cells and at synapses are largely unknown. The understanding of the role of endosomal sorting at the presynaptic site is relevant and of high interest in the field of synaptic physiology and in the pathophysiology of the many described synaptopathies that broadly result in loss of synaptic fidelity and quality control at release sites.

      We thank the reviewer for their positive evaluation of our manuscript.

      Weaknesses:

      The flow of the data presentation is mostly descriptive with several consistent morphological and functional modifications upon SNX loss. The paper would benefit from a wider characterization that would allow us to address the physiological roles of SNX4 at the synaptic site and speculate on the underlying molecular mechanisms. In addition, due to the described role of SNX4 in autophagy and the high interest in the regulation of synaptic autophagy in the field of synaptic physiology, an initial evaluation of the autophagy phenotype in the neuronal SNX4KO model is important, and not to be only restricted to the discussion section.

      We thank the reviewer for their suggestions and agree that broader characterization would help us speculate on the underlying mechanism. To address this, we have conducted additional independent experiments investigating the role of SNX4 in neuronal autophagy, as suggested by this reviewer. These experiments are now included in the main figures and are no longer limited to the discussion section. Please see the detailed responses to this reviewer's recommendations below.

      Reviewer #2 (Public Review):

      Summary:

      SNX4 is thought to mediate recycling from endosomes back to the plasma membrane in cells. In this study, the authors demonstrate the increases in the amounts of transmitter release and the number of docked vesicles by combining genetics, electrophysiology, and EM. They failed to find evidence for its role in synaptic vesicle cycling and endocytosis, which may be intuitively closer to the endosome function.

      Strengths:

      The electrophysiological data and EM data are in principle, convincing, though there are several issues in the study.

      We thank the reviewer for their positive evaluation of our manuscript.

      Weaknesses:

      It is unclear why the increase in the amounts of transmitter release and docked vesicles happened in the SNX4 KO mice. In other words, it is unclear how the endosomal sorting proteins in the end regulate or are connected to presynaptic, particularly the active zone function.

      We thank the reviewer for their suggestions and agree that further characterization would help to understand how endosomal sorting proteins regulate presynaptic neurotransmission. We have now added extra data on electrophysiological recordings clarifying SNX4’s role in the synapse. Please see the detailed responses to this reviewer's recommendations below.

      Reviewer #3 (Public Review):

      Summary:

      The study aims to determine whether the endosomal protein SNX4 performs a role in neurotransmitter release and synaptic vesicle recycling. The authors exploited a newly generated conditional knockout mouse to allow them to interrogate the SNX4 function. A series of basic parameters were assessed, with an observed impact on neurotransmitter release and active zone morphology. The work is interesting, however as things currently stand, the work is descriptive with little mechanistic insight. There are a number of places where the data appear to be a little preliminary, and some of the conclusions require further validation.

      Strengths:

      The strengths of the work are the state-of-the-art methods to monitor presynaptic function.

      We thank the reviewers for their positive evaluation of our manuscript.

      Weaknesses:

      The weaknesses are the fact that the work is largely descriptive, with no mechanistic insight into the role of SNX4. Further weaknesses are the absence of controls in some experiments and the design of specific experiments.

      We thank the reviewer for their suggestions and agree that addition of extra control groups and experiments would strengthen interpretation of the observed phenotype. To address this, we have now performed experiments to investigate the miniature excitatory postsynaptic currents and added extra control groups such as overexpression of SNX4 on control background. In addition, we assessed SNX4-mediated neuronal autophagy as a potential molecular mechanism by which SNX4 affects synaptic output. Please see the detailed responses to this reviewers’ recommendations below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The characterization of the neurite outgrowth presented in Figure 1 is a necessary starting point for the characterization of the model and the interpretation of the following data. As the analysis was conducted at 21 DIV, a significant portion of the neurite tree is out of the analyzed field. Adding a Sholl analysis would better indicate the complexity of the neurite tree, which appears to be influenced by SNX4 loss in the representative images shown in Figure 1f.

      We fully agree and have now performed a Sholl analysis of dendrite branches to investigate dendritic complexity. (Figure 1(i), page 2-3, line 86-88). SNX4 depletion does not affect dendrite length or dendrite branching.

      (2) Analogously, the characterization of synapse number is of relevance for the interpretation of the data. For a better flow of the data, Figure 4 might be presented as Figure 2 (without the repetition of panel h in Figure 1). An explanation of how VAMP2 puncta are processed is necessary in the method section. A double labelling with a postsynaptic marker would allow trafficking organelles to be distinguished from mature synaptic contacts. Indeed, the analysis of VAMP2 intensity along neurite in mature 21DIV neurons should reveal peaks in the intensity profile that represent synaptic contacts. For unexplained reasons, the profile is rather flat in the two experimental groups. Focusing on axonal branches will surely result in a peaked profile for VAMP2 labelling.

      We fully agree that the characterization of synapses is relevant for the interpretation of the data. We have now added a section to our Materials and Methods describing how the VAMP2 puncta are processed (p14, line 517-520). Instead of labeling mature synapses using double labeling of VAMP2 and PSD95, we analyzed the number of active synapses in live neurons using SypHy (Fig. 3g). The reviewer is correct that the VAMP2 data presented in Fig 1I and Fig 4 are part of the same dataset, and we have clarified this in the figure legend. In Fig 1I only the total number of VAMP2 puncta is plotted as a marker for synapse number, while in Fig 4 we assess VAMP2 as a potential SNX4 sorting cargo (Ma et al., 2017). Because of these different aims, we prefer to keep the figures separate. The analysis of VAMP2 intensity as a function of distance from the soma is a Sholl analysis (Fig. 4d) and represents the average VAMP2 intensity over distance from the soma of 35-41 neurons per group. In contrast to a line scan of a single neurite, this average profile lacks the peaks of individual synapses.
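
      The smoothing effect of averaging can be illustrated with a toy sketch. It assumes hypothetical intensity profiles in which each neuron has puncta at different distances from the soma; the profile length, baseline and peak height are arbitrary, and the example is deliberately idealized so that the averaged profile comes out completely flat.

```python
def make_profile(length, peak_positions, baseline=1.0, peak_height=10.0):
    """Build a toy intensity profile with sharp peaks at the given distance bins."""
    profile = [baseline] * length
    for pos in peak_positions:
        profile[pos] = peak_height
    return profile

def average_profiles(profiles):
    """Average intensity bin by bin across neurons."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]

# Six hypothetical neurons, each with two puncta at different distances.
neurons = [
    make_profile(12, [1, 7]),
    make_profile(12, [3, 9]),
    make_profile(12, [5, 11]),
    make_profile(12, [0, 6]),
    make_profile(12, [2, 8]),
    make_profile(12, [4, 10]),
]

# Peak-to-baseline contrast of one neuron versus the group average.
single_contrast = max(neurons[0]) / min(neurons[0])
mean_profile = average_profiles(neurons)
mean_contrast = max(mean_profile) / min(mean_profile)

print(single_contrast, mean_contrast)
```

      Real puncta are not spaced this regularly, so an averaged profile from recorded neurons would be smoothed to a lesser degree, but the direction of the effect is the same.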

      (3) Miniature excitatory postsynaptic currents recordings would strengthen the synaptic characterization and complement the electrophysiological recordings shown in Figure 2. Analyzing frequency and amplitude parameters would complement the data on the number of synaptic connections defined by the pre and postsynaptic colocalization puncta as suggested above and may support the data shown in Figure 3 g that suggests a decreased number of active synapses in SNX4-KO cells.

      We fully agree that the characterization of miniature excitatory postsynaptic currents would strengthen the synaptic characterization and complement the other electrophysiological data. Therefore, we have now added additional experiments showing the mEPSCs (Fig. 2k-m, page 4) in SNX4 cKO neurons versus control. This data shows that the amplitude and frequency of spontaneous miniature EPSCs (mEPSCs) were not affected upon SNX4 depletion, consistent with a normal first evoked EPSC and RRP estimate. Furthermore, these data suggest that it is unlikely that the observed increase in neurotransmission is due to post-synaptic effects.

      (4) Recordings on the first evoked response shown in Figure 2 b and quantified in Figures c and d suggest that SNX4 overexpression per se exerts some effect on the Amplitude and the Charge of the first evoked response. This is also evident in the supplementary Figure 2 with lower frequency trains. An additional experimental group, namely control+SNX4 is needed for the correct interpretation of the observed phenotype. The possibility that SNX4 per se exerts an effect on evoked transmission could be discussed in terms of putative mechanisms and interactions.

      We thank the reviewer for their suggestion and agree that an additional experimental group (control + SNX4) would strengthen interpretation of the observed phenotype. We have now added a new experimental condition with overexpression of SNX4 on a control background (Supplementary Fig. 3, page 20). This data shows that the amplitude and charge of the first evoked response were not affected in control + SNX4 neurons compared to control, and no differences were detected in the response to the 40 Hz stimulation train (Supplementary Fig. 3a-e).  Together, these data suggest that SNX4 overexpression in itself does not affect the neurotransmission protocols studied in SNX4 cKO experiments.

      (5) To correctly interpret the SyPhy experiments and exclude an effect of SNX silencing on SV recycling, it is suggested to repeat the experiments shown in Figure 3 in the absence and in the presence of bafilomycin. Indeed, the quantifications shown in Figure 3 d and f do not represent "release fraction" as stated (lines 139/140) but they rather refer to an average difference between release fraction and recovered fraction. With the use of bafilomycin, the comparison of the deltaFmax/deltaFNH4Cl with and without bafilomycin would enable the release fraction to be correctly evaluated and compared.

      We appreciate the reviewer’s suggestion and agree on the importance of considering the impact of SV recycling when evaluating the released fraction. We agree that the presence of bafilomycin is critical to isolate the released component during stimulation. We have now rephrased this conclusion. To assess synaptic recycling in these assays, bafilomycin is not critically required, and we show by multiple independent experiments, including SypHy and FM64 dye assays, that SV recycling is either not affected or the effect is too small to be detected by these methods.

      (6) In the ultrastructural analysis, additional quantifications are needed to exclude the accumulation of endosome-like structures. It is not clear if, in the evaluation of total SV number (Figure 5e), the authors counted all vesicles or vesicles < 50nm. This has to be explained and additional quantification of # of SV < 50nm and # SV > 50nm is informative, taking into account the endosomal nature of SNX4. Indeed, although the average size of SV is not changed (fig. 5 d), the density of "bigger vesicle" may result from endosomal-like structure accumulation. An additional suggested quantification is on vesicle # SV > 80nm as previously reported in the cited references dealing with endosomal proteins and presynaptic morphology.

      We fully agree that the characterization of vesicle size is important and that it was not clearly stated which vesicles were included in the total number of SV (Fig. 5e). We have now added this to the figure description. We have also added a histogram that contains the vesicle numbers of different bin sizes for SNX4 cKO synapses and control synapses (Supplementary Fig. 4, page 21) including # SVs > 80nm. (Whilst it seems that there are more “bigger” vesicles in the KO, further analysis revealed that this is mostly driven by one experiment and this effect is not consistent.)
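
      A size-binned count of this kind can be sketched as follows, using the 50 nm and 80 nm thresholds raised by the reviewer. The diameters in the example are hypothetical, and the treatment of values exactly at a threshold is an assumption; the binning used for Supplementary Fig. 4 may differ.

```python
def bin_vesicles(diameters_nm):
    """Count vesicles per size class using 50 nm and 80 nm thresholds."""
    counts = {"<50 nm": 0, "50-80 nm": 0, ">80 nm": 0}
    for d in diameters_nm:
        if d < 50:
            counts["<50 nm"] += 1
        elif d <= 80:  # assumption: both boundary values fall in the middle class
            counts["50-80 nm"] += 1
        else:
            counts[">80 nm"] += 1
    return counts

# Hypothetical diameters (nm) for one synapse profile.
example_diameters = [38.0, 41.5, 44.2, 47.9, 52.3, 61.0, 79.5, 83.1, 95.0]
example_counts = bin_vesicles(example_diameters)
print(example_counts)
```

      Running the same count per experiment, not only on the pooled data, is one way to see whether an excess of large vesicles is driven by a single culture.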

      (7) Due to the high scientific interest in presynaptic autophagy for SV recycling and degradation, and the paucity of experimental work assessing the proteins involved, an initial evaluation of the neuronal autophagy process (by western blot analysis and immunocytochemistry) for the characterization of the model will better support the paragraph in the discussion (lines 314-322) and contribute to future work in the field. Although very rare, autophagosomes quantification at presynaptic sites can also be performed from the already acquired images. A double membrane structure with the material inside is evident in the representative control image presented!

      We appreciate the reviewer’s suggestion and agree that presynaptic autophagy is an interesting potential mechanism that would elaborate our current working model. To address the reviewer’s suggestion, we added multiple independent experiments to investigate basal autophagy markers such as ATG5 using western blot analysis, characterized p62 levels using immunohistochemistry, and performed additional morphometric analysis on the electron microscopy data (Supplementary Fig. 5). In SNX4 cKO neurons, there was no significant difference in p62 puncta numbers or p62 somatic intensity under basal conditions or after blocking autophagic p62 degradation by bafilomycin treatment, suggesting that autophagic flux remains normal. Also, no changes in total ATG5 protein levels were observed, and ultrastructural analysis revealed no differences in the total number of autophagosomes. Collectively, these data indicate that SNX4 depletion does not impact the basal autophagic flux, ATG5 protein levels, or the number of autophagosomes.

      Minor points:

      (1) Dorrbaun et al. 2018 is missing from the reference list. In the legend to figure 1 there is an incorrect reference to Figure 6, rather than Figure 4.

      We have now adjusted the figure legend and added the reference (page 16, line 604).

      (2) Information on the construct employed for the rescue is missing. Is it a fluorescent tag construct? Representative images of the three autaptic neurons (control, KO, KO+SNX4) would nicely complement data presentation in Figure 2. 

      We have now elaborated on this in the Materials and Methods section (p12, line 418-421). Unfortunately, we did not obtain pictures of the autaptic neurons used for electrophysiology experiments.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 2d and f are somewhat inconsistent. Total charges for the 1st EPSCs differ almost 2-fold in the same condition.

      We appreciate the reviewer’s concern. The average charge of the first evoked EPSC was 89, 122 and 57 pC for control, KO and rescued neurons, respectively. The average charge of the first pulse of the 40 Hz train was 41, 58 and 32 pC for control, KO and rescued neurons, respectively, which is roughly 50% of the naïve response of the same cells. These trains were recorded after 2 or 3 other stimulation paradigms, which may have affected the total charge released in the 40 Hz train. That said, the proportional difference between groups is highly comparable: SNX4 cKO neurons show a 37% increase in average released charge compared to control in the naïve response and a 41% increase in the first response of the 40 Hz train, and rescued cells show a 53% reduction in average released charge compared to SNX4 cKO in the naïve response and a 44% reduction in the first response of the 40 Hz train. Although the absolute values differ between these readouts, we conclude that the biological comparison between groups is consistent.
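
      The proportional comparisons can be recomputed from the rounded group means quoted in this response. This is a minimal sketch with hypothetical variable names, and small deviations from the quoted percentages are expected because the inputs are rounded. With these means, the reductions of roughly 53% and 45% for the rescue group are obtained when it is compared with the KO group.

```python
# Mean charges in pC as quoted in the response (rounded values).
first_epsc = {"control": 89, "KO": 122, "rescue": 57}   # naive first evoked EPSC
train_first = {"control": 41, "KO": 58, "rescue": 32}   # first pulse of the 40 Hz train

def percent_change(value, reference):
    """Signed percentage change of value relative to reference."""
    return (value - reference) / reference * 100

# KO relative to control in both readouts.
ko_vs_control_naive = percent_change(first_epsc["KO"], first_epsc["control"])
ko_vs_control_train = percent_change(train_first["KO"], train_first["control"])

# Rescue relative to KO in both readouts.
rescue_vs_ko_naive = percent_change(first_epsc["rescue"], first_epsc["KO"])
rescue_vs_ko_train = percent_change(train_first["rescue"], train_first["KO"])

# First train pulse as a percentage of the naive response, per group.
train_fraction = {
    group: train_first[group] / first_epsc[group] * 100 for group in first_epsc
}

for name, value in [
    ("KO vs control, naive", ko_vs_control_naive),
    ("KO vs control, train", ko_vs_control_train),
    ("rescue vs KO, naive", rescue_vs_ko_naive),
    ("rescue vs KO, train", rescue_vs_ko_train),
]:
    print(f"{name}: {value:+.1f}%")
print({group: round(pct, 1) for group, pct in train_fraction.items()})
```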

      (2) Figure 2h. This type of analysis has a drawback. See Neher (2015) for the problems associated with this analysis.

      We fully agree with the reviewer’s comment. As noted in our discussion (page 9, line 285), while this analysis has its limitations, it can still provide an indication of the readily releasable pool.

      (3) The EPSC phenotype may be due to postsynaptic effects. This should be excluded by additional experiments (mEPSC analysis) or further clarification.

      We fully agree that the characterization of miniature excitatory postsynaptic currents recording would strengthen the synaptic characterization and complement the electrophysiological recordings. Therefore, we have now added additional experiments showing the mEPSCs (Fig. 2k-m) in SNX4 cKO neurons versus control. This data shows that the amplitude and frequency of spontaneous miniature EPSCs (mEPSCs) were not affected upon SNX4 depletion, suggesting that it is unlikely that the observed increase in neurotransmission is due to post-synaptic effects.

      (4) The increased number of docked vesicles observed in EM and the increased slope (vesicle recruitment, Figure 2h) are not consistent with each other. Maybe the definition of docked vesicles is unclear in this version of the manuscript.

      As noted in our material & methods (page 15, line 547-548), SVs were defined as docked if there was no distance visible between the SV membrane and the active zone membrane. We have added the pixel size for clarification. Indeed, we do not observe an increase in release probability or first evoked response, which would correspond with an increased docked pool. However, we think that the increase in docked vesicles might contribute to an enhanced SV recruitment (see discussion).

      (5) Figure 3: Vesicle cycling was monitored in only a limited condition. It is known that there are multiple pathways of vesicle cycling. Ideally, these pathways should be dissected. At least, the authors mention the possibility that they have missed some "positive" conditions.

      We fully agree with the reviewer’s comment that vesicle recycling is complex with several parallel pathways involved. While we did not study individual endocytosis pathways, we used different assays covering various recycling pathways. The SypHy assay (Fig. 3c & f) combined with the 100 AP stimulation paradigm at room temperature predominantly addresses clathrin-mediated endocytosis. Additionally, the FM-64 dye assay at 37 degrees Celsius covers ultrafast endocytosis pathways as well as bulk endocytosis routes. Since neither assay showed major effects, we decided not to pursue further experiments focusing on different endocytosis pathways.

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      (1) Since all of the work here is culture-focussed, the in vivo phenotype is not as relevant, however the in vitro properties are. The incomplete Cre-dependent removal of SNX4 is concerning (especially axonal SNX4 levels identified via immunofluorescence), however, the main concern is that there was no profiling of the other molecular changes within these cultures. This is important, since there may be considerable alterations in the expression of a number of presynaptic proteins which may explain the observed phenotypes. Ideally, these cultures could have been profiled in an unbiased manner via mass spectrometry to identify potential changes in the presynaptic proteome, or at the very least the levels of key fusion molecules would have been assessed via Western blotting.

      We thank the reviewer for their suggestion and agree that mass spectrometry would strengthen the interpretation of the observed phenotype. However, due to contractual constraints, we are unable to pursue a mass spectrometry follow-up experiment. We agree that characterizing key fusion molecules is of potential interest. Therefore, based on the literature, we selected a likely candidate, VAMP2, which did not show any alterations in expression levels upon SNX4 knockout. Given the previously described role of SNX4 in the degradation pathway, one would expect increased degradation of key fusion molecules if they are recycled by SNX4. Other literature indicates that reduced levels of key fusion molecules, such as synaptotagmin or SNAP-25 (Broadie et al., 1994; Washbourne et al., 2001), do not mimic our phenotype.

      (2) The experiments reported in Figure 2, in particular those in 2c and 2d, suggest that overexpression of SNX4 has a dominant-negative effect on neurotransmitter release. This is strongly supported by the supplementary data during a stimulus train (particularly the start point of the 5 Hz train in Supplementary Figure 2). Therefore, the perceived rescue of EPSC charge in Figure 2f, 2g may be a result of SNX4 inhibiting neurotransmitter release. A determination of the impact of SNX4 overexpression (and level of overexpression) in WT neurons is essential to show that this is a bona fide rescue, rather than a direct inhibition by SNX4 overexpression.

      We thank the reviewer for their suggestion and agree that an additional experimental group (control + SNX4) would strengthen interpretation of the observed phenotype. We have now added a new experiment with an extra experimental condition with overexpression of SNX4 on a control background (Supplementary Fig. 3 page 21). This data shows that the amplitude and charge of the first evoked response were not affected in control + SNX4 neurons compared to control, and no differences were detected in the response to the 40 Hz stimulation train (Supplementary Fig. 3a-e).  Together, these data suggest that SNX4 overexpression in itself does not affect the neurotransmission protocols studied in SNX4 cKO experiments.

      (3) The experiments in Figure 3 clearly reveal a lack of effect of SNX4 depletion on synaptic vesicle endocytosis. However, the assumption that synaptic vesicle recycling is unaffected is a little premature. The fact that the second evoked SypHy peak is significantly larger than the first (Figures 3c-e) suggests that more vesicles may be recycling in KO neurons. Furthermore, the FM dye experiments do not aid interpretation, since there may be insufficient time (10 min) for new vesicles to be generated from endosomal intermediates experiments. Therefore, to confirm an absence of effect on recycling, the authors could either 1) perform the same experiment as 3c, but with 4 stimulation trains (to drive the system harder to reveal any phenotype) or 2) repeat the FM dye experiment but increase the time between loading and unloading to 30 min.

      We fully agree with the reviewer's comment that vesicle recycling is an important component to consider and is complex, with several parallel pathways involved. We conducted multiple independent experiments covering the most significant recycling pathways. The SypHy assay (Fig. 3c & f) combined with the 100 AP stimulation paradigm at room temperature predominantly addresses clathrin-mediated endocytosis. Additionally, the FM-64 dye assay at 37 degrees Celsius covers ultrafast endocytosis pathways as well as bulk endocytosis routes. To further challenge the system and reveal recycling phenotypes, we included a second 100 AP stimulation in our SypHy assay. While only the increase of the second SypHy peak is significant, the absolute numbers do not differ much from the first peak (0.17 for control and 0.21 for KO for the second peak, and 0.19 for control and 0.22 for KO for the first peak; Supplementary Table 1). We nevertheless do not see any effects on recycling after the second peak (mean decay time is 27 for control and 26 for KO; Supplementary Table 1). A single 100 AP 40 Hz train depletes all the synchronous release (not shown) and most of the evoked charge (see Fig. 2f); hence, two of these trains with one minute of recovery is already a very demanding protocol. Although increasing the time between loading and unloading to 30 minutes might uncover other recycling components, it has been shown that ultrafast endocytosis occurs within 30 seconds (Watanabe et al., 2013), suggesting that 10 minutes should provide enough time for synaptic vesicle recycling. This is also evident from the fact that we can significantly destain synapses loaded with FM dye by electrical stimulation (Fig. 3j), indicating that synaptic vesicle recycling took place. Since neither assay showed major effects, we concluded that under these circumstances, synaptic recycling is not significantly affected. However, we cannot exclude the possibility that recycling deficits in SNX4 cKO neurons could be detected in other paradigms.

      (4) There is no obvious effect on VAMP2 levels or location in SNX4 KO neurons (Figure 4). However, when one considers that SNX4 is proposed to have a role in VAMP2 trafficking, it is surprising that an experiment examining the live trafficking of VAMP2-SypHy was not performed. This would have revealed activity-dependent alterations that would have been missed by simply measuring VAMP2 expression and localization, and potentially provided a molecular explanation for the enhanced neurotransmitter release during a stimulus train.

      We appreciate the reviewer’s suggestion and agree that it could be a valuable experiment. However, overexpressing a VAMP2-pHluorin construct might obscure potential phenotypes related to VAMP2 trafficking. SNX4 is expected to be involved in VAMP2 recycling, even with activity-dependent changes. Mis-sorted VAMP2 would accumulate in acidic vesicles, which could be masked by the VAMP2-pHluorin construct. Similarly, mis-sorting of other SNX4 cargo, such as the transferrin receptor, has been identified through lysosomal degradation, as shown by Western blot analysis of expression levels of the endogenous protein. We did not detect any differences in endogenous levels of VAMP2 within 21 days of SNX4 deletion (Fig. 4), indicating that SNX4-dependent endosome sorting is not essential for VAMP2 recycling.

      (5) The morphological data in Figure 5 report a series of small changes in docked vesicles and active zone length. In many cases, significance is obtained due to synapses being used as the experimental n, thus inflating the statistical power. When one considers that no significant effect was observed on evoked release (apart from during a stimulus train), it suggests that the number of docked vesicles does not alter release probability in this system (which the authors point out). Instead, they suggest that an increased supply of vesicles is responsible, via increased recruitment to RRP/releasable pool (but not via increased recycling). If this is the case, it should have been reflected as an increase in the evoked SypHy response in Fig 2c,d (which is borderline significant). What may help is to determine the morphological landscape immediately after a stimulus train, since this is the only condition where enhanced release is observed, and thus provide a morphological correlate to the physiological data.

      We fully agree with the reviewer’s suggestion that an ultrastructural characterization immediately after a stimulus train would be informative. Unfortunately, contract constraints prevent us from performing this experiment. For our ultrastructural morphological data, we treated synapses as the experimental n, since it is not possible to determine whether synapses in a micronetwork on one sapphire originate from the same neuron. We used 18 independent sapphires from 3 independent pups to ensure technical and biological replication of our data and to sample independent neurons. We fully agree with the reviewer’s comment that one should be careful about ‘inflating the statistical power’ through potential nesting effects when using synapses as the experimental n. To mitigate the potential nesting effect of analyzing multiple synapses per neuron, the intracluster correlation (ICC) was calculated per variable and per nesting effect. If the ICC was close to 0.1, indicating that a considerable portion of the total variance can be attributed to e.g. synapse or sapphire, multilevel analysis was performed to accommodate nested data (Aarts et al., 2014).
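
      The ICC check described above can be illustrated with a minimal sketch. It assumes a one-way random-effects ICC estimated from ANOVA mean squares; the function name `icc_oneway`, the example measurements, and the use of 0.1 as a hard cut-off are hypothetical illustrations and not the analysis code used for the manuscript.

```python
import numpy as np

def icc_oneway(groups):
    """One-way random-effects ICC for a list of clusters (e.g. synapses per sapphire).

    Hypothetical helper: each element of `groups` holds the measurements
    belonging to one cluster. Cluster sizes may differ.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                                  # number of clusters
    sizes = np.array([g.size for g in groups])
    n_total = sizes.sum()
    grand_mean = np.concatenate(groups).mean()
    means = np.array([g.mean() for g in groups])

    # Between- and within-cluster mean squares from a one-way ANOVA
    ms_between = np.sum(sizes * (means - grand_mean) ** 2) / (k - 1)
    ms_within = sum(np.sum((g - g.mean()) ** 2) for g in groups) / (n_total - k)

    # Effective cluster size, which handles unbalanced designs
    n0 = (n_total - np.sum(sizes ** 2) / n_total) / (k - 1)
    return float((ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within))

# Made-up docked-vesicle counts for three clusters (illustration only)
example_clusters = [[4, 5, 6, 5], [7, 8, 7], [5, 6, 6, 7, 5]]
icc = icc_oneway(example_clusters)

# Assumed decision rule: treat 0.1 as the threshold for using a multilevel model
use_multilevel = icc >= 0.1
print(f"ICC = {icc:.3f}, multilevel analysis: {use_multilevel}")
```

      An ICC near zero means that observations within a cluster are no more alike than observations from different clusters, in which case treating synapses as independent observations is less problematic.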

      Minor points

      (1) When a new mouse model is generated, it is usually accompanied by a thorough characterization of its properties. However, in this case, there was no information provided about the conditional SNX4 knockout mouse. This is surprising and, at a minimum, the following should be provided: a) the background strain, b) method of generation, c) the number of animals used to establish the colony, d) breeding strategy, e) backcrossing strategy, f) genotyping protocol.

      We apologize that a thorough characterization of our novel mouse model was lacking and have therefore added this to our Materials and Methods section (page 11, line 377-391).

      (2) There is a noticeable difference between WT and KO neurons during train stimulation in Figure 2f, however, this appears to be due to the fact that there is a far higher EPSC charge to begin with in KO neurons. Why is there such a disparity when there is no difference in response to single pulses (Figures 2b-d) or presynaptic plasticity (Figure 2e)?

      We understand the reviewer’s concern. We excluded an outlier (3x SD) in the KO dataset that drove the initial far higher EPSC charge in the graph (it was already excluded from the statistics, Supplementary Table 1). The average charge of the first pulse of the 40 Hz train is 41 pC for control neurons and 58 pC for KO neurons, which did not differ significantly. The trains in Fig. 2f were recorded after 2 or 3 other stimulation paradigms, which may have affected the total charge released in the 40 Hz train. That said, the proportional difference between groups is highly comparable between Fig. 2b-d and 2f, with a 37% increase in average charge released in SNX4 cKO compared to control in the naïve response (Fig. 2d) and a 41% increase in the first response of the 40 Hz train (Fig. 2f), and rescued cells show a 53% reduction in average released charge compared to control in the naïve response compared to a 44% reduction in the first response of the 40 Hz train. Although the absolute values differ between these readouts, we conclude that the biological comparison between groups is consistent.
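
      The 41% figure for the first response of the 40 Hz train follows directly from the two mean charges quoted above. A minimal sketch of that arithmetic, with a hypothetical helper name and variable names:

```python
def percent_change(value, reference):
    """Relative difference of `value` versus `reference`, in percent."""
    return 100.0 * (value - reference) / reference

# Mean charge of the first pulse of the 40 Hz train, as quoted in the response
control_charge_pc = 41.0
ko_charge_pc = 58.0

first_pulse_change = percent_change(ko_charge_pc, control_charge_pc)
print(f"KO vs control, first pulse: {first_pulse_change:.0f}% increase")
```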

      (3) Line 343-344 - "(Supplementary Figure 1a)" should be "(Figure 1a)".

      We thank the reviewer for this comment and adjusted this in the manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by McKim et al seeks to provide a comprehensive description of the connectivity of neurosecretory cells (NSCs) using a high-resolution electron microscopy dataset of the fly brain and several single-cell RNA seq transcriptomic datasets from the brain and peripheral tissues of the fly. They use connectomic analyses to identify discrete functional subgroups of NSCs and describe both the broad architecture of the synaptic inputs to these subgroups as well as some of the specific inputs including from chemosensory pathways. They then demonstrate that NSCs have very few traditional presynapses consistent with their known function as providing paracrine release of neuropeptides. Acknowledging that EM datasets can't account for paracrine release, the authors use several scRNAseq datasets to explore signaling between NSCs and characterize widespread patterns of neuropeptide receptor expression across the brain and several body tissues. The thoroughness of this study allows it to largely achieve its goal and provides a useful resource for anyone studying neurohormonal signaling.

      Strengths:

      The strengths of this study are the thorough nature of the approach and the integration of several large-scale datasets to address shortcomings of individual datasets. The study also acknowledges the limitations that are inherent to studying hormonal signaling and provides interpretations within the context of these limitations.

      Weaknesses:

      Overall, the framing of this paper needs to be shifted from statements of what was done to what was found. Each subsection, and the narrative within each, is framed on topics such as "synaptic output pathways from NSC" when there are clear and impactful findings such as "NSCs have sparse synaptic output". Framing the manuscript in this way allows the reader to identify broad takeaways that are applicable to other model systems. Otherwise, the manuscript risks being encyclopedic in nature. An overall synthesis of the results would help provide the larger context within which this study falls.

      We agree with the reviewer and will replace all the subsection titles as suggested.

      The cartoon schematic in Figure 5A (which is adapted from a 2020 review) has an error. This schematic depicts uniglomerular projection neurons of the antennal lobe projecting directly to the lateral horn (without synapsing in the mushroom bodies) and multiglomerular projection neurons projecting to the mushroom bodies and then lateral horn. This should be reversed (uniglomerular PNs synapse in the calyx and then further project to the LH and multiglomerular PNs project along the mlACT directly to the LH) and is nicely depicted in a Strutz et al 2014 publication in eLife.

      We thank the reviewer for spotting this error. We will modify the schematic as suggested.

      Reviewer #2 (Public review):

      Summary:

      The authors aim to provide a comprehensive description of the neurosecretory network in the adult Drosophila brain. They sought to assign and verify the types of 80 neurosecretory cells (NSCs) found in the publicly available FlyWire female brain connectome. They then describe the organization of synaptic inputs and outputs across NSC types and outline circuits by which olfaction may regulate NSCs, and by which Corazonin-producing NSCs may regulate flight behavior. Leveraging existing transcriptomic data, they also describe the hormone and receptor expressions in the NSCs and suggest putative paracrine signaling between NSCs. Taken together, these analyses provide a framework for future experiments, which may demonstrate whether and how NSCs, and the circuits to which they belong, may shape physiological function or animal behavior.

      Strengths:

      This study uses the FlyWire female brain connectome (Dorkenwald et al. 2023) to assign putative cell types to the 80 neurosecretory cells (NSCs) based on clustering of synaptic connectivity and morphological features. The authors then verify type assignments for selected populations by matching cluster sizes to anatomical localization and cell counts using immunohistochemistry of neuropeptide expression and markers with known co-expression.

      The authors compare their findings to previous work describing the synaptic connectivity of the neurosecretory network in larval Drosophila (Huckesfeld et al., 2021), finding that there are some differences between these developmental stages. Direct comparisons between adults and larvae are made possible through direct comparison in Table 1, as well as the authors' choice to adopt similar (or equivalent) analyses and data visualizations in the present paper's figures.

      The authors extract core themes in NSC synaptic connectivity that speak to their function: different NSC types are downstream of shared presynaptic outputs, suggesting the possibility of joint or coordinated activation, depending on upstream activity. NSCs receive some but not all modalities of sensory input. NSCs have more synaptic inputs than outputs, suggesting they predominantly influence neuronal and whole-body physiology through paracrine and endocrine signaling.

      The authors outline synaptic pathways by which olfactory inputs may influence NSC activity and by which Corazonin-releasing NSCs may regulate flight. These analyses provide a basis for future experiments, which may demonstrate whether and how such circuits shape physiological function or animal behavior.

      The authors extract expression patterns of neuropeptides and receptors across NSC cell types from existing transcriptomic data (Davie et al., 2018) and present the hypothesis that NSCs could be interconnected via paracrine signaling. The authors also catalog hormone receptor expression across tissues, drawing from the Fly Cell Atlas (Li et al., 2022).

      Weaknesses:

      The clustering of NSCs by their presynaptic inputs and morphological features, along with corroboration with their anatomical locations, distinguished some, but not all cell types. The authors attempt to distinguish cell types using additional methodologies: immunohistochemistry (Figure 2), retrograde trans-synaptic labeling, and characterization of dense core vesicle characteristics in the FlyWire dataset (Figure 1, Supplement 1). However, these corroborating experiments often lacked experimental replicates, were not rigorously quantified, and/or were presented as singular images from individual animals or even individual cells of interest. The assignments of DH44 and DMS types remain particularly unconvincing.

      We thank the reviewer for this comment. We would like to clarify that the images presented in Figure 2 and Figure 1 Supplement 1 are representative images based on at least 5 independent samples. We will clarify this in the figure caption and methods. The electron micrographs showing dense core vesicle (DCV) characteristics (Figure 1 Supplement E-G) are also representative images based on examination of multiple neurons. However, we agree with the reviewer that a rigorous quantification would be useful to showcase the differences between DCVs from NSC subtypes. Therefore, we have now performed a quantitative analysis of the DCVs in putative m-NSC<sup>DH44</sup> (n=6), putative m-NSC<sup>DMS</sup> (n=6) and descending neurons (n=4) known to express DMS. For consistency, we examined the cross section of each cell where the diameter of the nucleus was largest. We quantified the mean gray value of at least 50 DCVs per cell. Our analysis shows that the mean gray values of putative m-NSC<sup>DMS</sup> and DMS descending neurons are not significantly different, whereas the mean gray values of m-NSC<sup>DH44</sup> are significantly larger. This analysis is in agreement with our initial conclusion.

      Author response image 1.

      The authors present connectivity diagrams for visualization of putative paracrine signaling between NSCs based on their peptide and receptor expression patterns. These transcriptomic data alone are inadequate for drawing these conclusions, and these connectivity diagrams are untested hypotheses rather than results. The authors do discuss this in the Discussion section.

      We fully agree with the reviewer and will further elaborate on the limitations of our approach in the revised manuscript. However, there is a very high likelihood that a given NSC subtype can signal to another NSC subtype using a neuropeptide if its receptor is expressed in the target NSC. This is due to the fact that all NSC axons are part of the same nerve bundle (nervi corpora cardiaca), which exits the brain. The axons of different NSCs form release sites that are extremely close to each other. Neuropeptides from these release sites can easily diffuse via the hemolymph to peripheral tissues (e.g. fat body and ovaries) that are much further away from the release sites than neighboring NSCs are. We believe that neuropeptide receptors are expressed in NSCs near these release sites, where they can receive inputs not just from the adjacent NSCs but also from other sources such as the gut enteroendocrine cells. Hence, neuropeptide diffusion is not a limiting factor preventing paracrine signaling between NSCs, and receptor expression is a good indicator for putative paracrine signaling.

      Reviewer #3 (Public review):

      Summary:

      The manuscript presents an ambitious and comprehensive synaptic connectome of neurosecretory cells (NSC) in the Drosophila brain, which highlights the neural circuits underlying hormonal regulation of physiology and behaviour. The authors use EM-based connectomics, retrograde tracing, and previously characterised single-cell transcriptomic data. The goal was to map the inputs to and outputs from NSCs, revealing novel interactions between sensory, motor, and neurosecretory systems. The results are of great value for the field of neuroendocrinology, with implications for understanding how hormonal signals integrate with brain function to coordinate physiology.

      The manuscript is well-written and provides novel insights into the neurosecretory connectome in the adult Drosophila brain. Some additional behavioural experiments will significantly strengthen the conclusions.

      Strengths:

      (1) Rigorous anatomical analysis

      (2) Novel insights on the wiring logic of the neurosecretory cells.

      Weaknesses:

      (1) Functional validation of findings would greatly improve the manuscript.

      We agree with this reviewer that assessing the functional output from NSCs would improve the manuscript. Given that we currently lack genetic tools to measure hormone levels and that behaviors and physiology are modulated by NSCs on slow timescales, it is difficult to assess the immediate functional impact of the sensory inputs to NSCs using approaches such as optogenetics. However, since l-NSC<sup>CRZ</sup> are the only known cell type that provides output to descending neurons, we will functionally test this output pathway using different behavioral assays recommended by this reviewer.

    1. Author response:

      Reviewer 2:

      (1) It appears that the purified γ-secretase complex generates the same amount of Aβ40 and Aβ42, which is quite different in cellular and biochemical studies. Is there any explanation for this?

      Roughly equal production of Aβ40 and Aβ42 is a phenomenon seen with purified enzyme assays, and the reason for this has not been identified. However, we suggest that what is meaningful in our studies is the relative difference between the effects of FAD-mutant vs. WT PSEN1 on each proteolytic processing step. All FAD mutations are deficient in multiple cleavage steps in γ-secretase processing of APP substrate, and these deficiencies correlate with stabilization of E-S complexes.

      (2) It has been reported that the Aβ production lines from Aβ49 and Aβ48 can be crossed with various combinations (PMID: 23291095 and PMID: 38843321). How does the production line crossing impact the interpretation of this work?

      In the cited reports, such crossover was observed when using synthetic Aβ intermediates as substrate. In PMID: 23291095 (Okochi M et al, Cell Rep, 2013), Aβ43 is primarily converted to Aβ40, but also to some extent to Aβ38. In PMID: 38843321 (Guo X et al, Science, 2024), Aβ48 is ultimately converted to Aβ42, but also to a minor degree to Aβ40. We have likewise reported such product line “crossover” with synthetic Aβ intermediates (PMID: 25239621; Fernandez MA et al, JBC, 2014). However, when using APP C99-based substrate, we did not detect any noncanonical tri- and tetrapeptide coproducts of Aβ trimming events in the LC-MS/MS analyses (PMID: 33450230; Devkota S et al, JBC, 2021). In the original report on identification of the small peptide coproducts for C99 processing by γ-secretase using LC-MS/MS (PMID: 19828817; Takami M et al, J Neurosci, 2009), only very low levels of noncanonical peptides were observed. In the present study, we did not search for such noncanonical trimming coproducts, so we cannot rule out some degree of product line crossover.

      (3) In Figure 5, did the authors look at the protein levels of PS1 mutations and C99-720, as well as secreted Aβ species? Do the different amounts of PS1 full-length and PS1-NTF/CTF influence FLIM results?

      This is a good question. Our preliminary investigation by Western blot shows no correlation between C99 and PSEN1 expression levels and FLIM results, but we will fully address the concern in our point-by-point responses submitted with a revised manuscript.

      (4) It is interesting that both Aβ40 and Aβ42 ELISA kits detect Aβ43. Have the authors tested other kits on the market? It might change the interpretation of some published work.

      We have not tested other ELISA kits. In light of our findings, it would be a good idea for other investigators to test whatever ELISAs they use for specificity vis-à-vis Aβ43.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      Weaknesses:

      The match between fractal and classical cycles is not one-to-one. For example, the fractal method identifies a correlation between age and cycle duration in adults that is not apparent with the classical method. This raises the question as to whether differences are due to one method being more reliable than another or whether they are also identifying different underlying biological differences. It is not clear for example whether the agreement between the two methods is better or worse than between two human scorers, which generally serve as a gold standard to validate novel methods. The authors provide some insight into differences between the methods that could account for differences in results. However, given that the fractal method is automatic it would be important to clearly identify criteria for recordings in which it will produce similar results to the classical method.

      We thank the reviewer for the insightful suggestions. In the revised Manuscript, we have added a number of additional analyses that provide a quantitative comparison between the classical and fractal cycle approaches aiming to identify the source of the discrepancies between classical and fractal cycle durations. Likewise, we assessed the intra-fractal and intra-classical method reliability.

      Reviewer 2:

      One weakness of the study, from my perspective, was that the IRASA fits to the data (e.g. the PSD, such as in Figure 1B), were not illustrated. One cannot get a sense of whether or not the algorithm is based entirely on the fractal component or whether the oscillatory component of the PSD also influences the slope calculations. This should be better illustrated, but I assume the fits are quite good.

      Thank you for this suggestion. In the revised Manuscript, we have added a new figure (Fig.S1 E, Supplementary Material 2), illustrating the goodness of fit of the data as assessed by the IRASA method.

      The cycles detected using IRASA are called fractal cycles. I appreciate the use of a simple term for this, but I am also concerned whether it could be potentially misleading? The term suggests there is something fractal about the cycle, whereas it's really just that the fractal component of the PSD is used to detect the cycle. A more appropriate term could be "fractal-detected cycles" or "fractal-based cycle" perhaps?

      We agree that these cycles are not fractal per se. In the Introduction, when we mention them for the first time, we name them “fractal activity-based cycles of sleep” and immediately after that add “or fractal cycles for short”. In the revised version, we re-introduce this abbreviation at the start of each major section and in the Abstract. Nevertheless, given that the term “fractal cycles” is used 88 times, after those “reminders” we use the short name again to facilitate readability. We hope that this will highlight that the cycles are not fractal per se and thus reduce the possible confusion while keeping the manuscript short.

      The study performs various comparisons of the durations of sleep cycles evaluated by the IRASA-based algorithm vs. conventional sleep scoring. One concern I had was that it appears cycles were simply identified by their order (first, second, etc.) but were not otherwise matched. This is problematic because, as evident from examples such as Figure 3B, sometimes one cycle conventionally scored is matched onto two fractal-based cycles. In the case of the Figure 3B example, it would be more appropriate to compare the duration of conventional cycle 5 vs. fractal cycle 7, rather than 5 vs. 5, as it appears is currently being performed.

      In cases where the number of fractal cycles differed from the number of classical cycles (34 to 55% of participants in different datasets, as in the case of Fig. 3B), we did not perform one-to-one matching of cycles. Instead, we averaged the durations of the fractal and classical cycles within each participant and only then correlated them (Fig. 2C). For the subset of participants with a one-to-one match between the fractal and classical cycles (45 – 66% of the participants in different datasets), we performed an additional correlation without averaging, i.e., we correlated the durations of individual fractal and classical cycles (Fig. S4 of Supplementary Material 2). This is stated in the Methods, section Statistical analysis, paragraph 2.
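
      The average-then-correlate procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the durations are made up, the helper names are hypothetical, and a Pearson coefficient is used only as an example because the response does not name the correlation statistic.

```python
import numpy as np

def participant_means(durations_by_participant):
    """Average cycle duration per participant; cycle counts may differ between participants."""
    return np.array([np.mean(d) for d in durations_by_participant], dtype=float)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

# Made-up cycle durations in minutes, one inner list per participant.
# Participant 0 has five fractal cycles but only three classical cycles,
# so cycle-by-cycle matching is not possible for that participant.
fractal_cycles = [[80, 95, 100, 90, 85], [110, 100, 105], [70, 90, 85, 75]]
classical_cycles = [[90, 100, 95], [105, 115, 100], [80, 85, 90, 70]]

# Averaging first yields exactly one value per participant and method
r_averaged = pearson_r(participant_means(fractal_cycles),
                       participant_means(classical_cycles))
print(f"r (participant means) = {r_averaged:.2f}")
```

      Averaging within each participant sidesteps the matching problem but discards cycle-to-cycle variability, which is why the additional cycle-wise correlation is restricted to participants with equal cycle counts.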

      There are a few statements in the discussion that I felt were not well-supported. L629: about the "little biological foundation" of categorical definitions, e.g. for REM sleep or wake? I cannot agree with this statement as written. Also about "the gradual nature of typical biological processes". Surely the action potential is not gradual and there are many other examples of all-or-none biological events.

      In the revised Manuscript, we have removed these statements from both Introduction and Discussion.

      The authors appear to acknowledge a key point, which is that their methods do not discriminate between awake and REM periods. Thus their algorithm essentially detected cycles of slow-wave sleep alternating with wake/REM. Judging by the examples provided, this appears to account for both the correspondence between fractal-based and conventional cycles, as well as their disagreements during the early part of the sleep cycle. While this point is acknowledged in the discussion section around L686, I am surprised that the authors then argue against this correspondence on L695. I did not find the "not-a-number" controls to be convincing. No examples were provided of such cycles, and it's hard to understand how positive z-values of the slopes are possible without the presence of some wake unless N1 stages are sufficient to provide a detected cycle (in which case, the argument still holds except that it's alternations between slow-wave sleep and N1 that could be what drives the detection).

      In the revised Manuscript, we have removed the “NaN analysis” from both Results and Discussion. We have replaced it with the correlation between the difference between the durations of the classical and fractal cycles and proportion of wake after sleep onset. The finding is as follows:

      “A larger difference between the durations of the classical and fractal cycles was associated with a higher proportion of wake after sleep onset in 3/5 datasets as well as in the merged dataset (Supplementary Material 2, Table S10).” Results, section “Fractal cycles and wake after sleep onset”, last two sentences. This is also discussed in Discussion, section “Fractal cycles and age”, paragraph 1, last sentence. 

      To me, it seems important to make clear whether the paper is proposing a different definition of cycles that could be easily detected without considering fractals or spectral slopes, but simply adjusting what one calls the onset/offset of a cycle, or whether there is something fundamentally important about measuring the PSD slope. The paper seems to be suggesting the latter but my sense from the results is that it's rather the former.

      Thank you for this important comment. Overall, our paper suggests that the fractal approach might reflect the cycling nature of sleep in a more precise and sensitive way than classical hypnograms. Importantly, neither fractal nor classical methods can shed light on the mechanism underlying sleep cycle generation due to their correlational approach. Despite this, the advantages of fractal over classical methods mentioned in our Manuscript are as follows:

      (1) Fractal cycles are based on a real-valued metric with known neurophysiological functional significance, which introduces a biological foundation and a more gradual impression of nocturnal changes compared to the abrupt changes that are inherent to hypnograms that use rather arbitrarily assigned categorical values (e.g., wake=0, REM=-1, N1=-2, N2=-3 and SWS=-4, Fig. 2A).

      (2) Fractal cycle computation is automatic and thus objective, whereas classical sleep cycle detection is usually based on the visual inspection of hypnograms, which is time-consuming, subjective and error-prone. Few automatic algorithms are available for sleep cycle detection, and these correlated only moderately with classical cycles detected by human raters (r’s = 0.3 – 0.7 in different datasets here).

      (3) Defining the precise end of a classical sleep cycle with skipped REM sleep, which is common in children, adolescents and young adults, is often difficult and arbitrary when using a hypnogram. The fractal cycle algorithm could detect such cycles in 93% of cases, while the hypnogram-based agreement on the presence/absence of skipped cycles between two independent human raters was only 61%, i.e., 32 percentage points lower.

      (4) The fractal analysis showed a stronger effect size, higher F-value and R-squared than the classical analysis for the cycle duration comparison in children and adolescents vs young adults. The first and second fractal cycles were significantly shorter in the pediatric compared to the adult group, whereas the classical approach could not detect this difference.

      (5) Fractal – but not classical – cycle durations correlated with the age of adult participants.

      These bullets are now summarized in Table 5 that has been added to the Discussion of the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      The authors have added a lot of quantifications to provide a more complete comparison of classical and fractal cycles that address the points I raised.

      Regarding the question of skipped REM cycles: I am not sure the comparison of skipped cycle accuracies between fractal and manual methods makes sense. To make a fair comparison, fractal and 2nd scorer classifications should be compared to the same baseline dataset, which doesn't seem to be the case since the number of skipped cycles is not the same. Moreover, it's not indicated whether the fractal method identifies any false positive skipped cycles.

      Thank you for this comment. In the revised Manuscript, we have reported the number of false positive skipped cycles identified by the fractal algorithm. Likewise, we have added the comparison between the fractal algorithm and the second scorer detection of cycles with skipped REM sleep (Results, the section “Skipped cycles”, last paragraph). The text has been revised as follows:

      “Visual inspection of the hypnograms from Datasets 1 – 6 was performed by two independent researchers. Scorer 1 and Scorer 2 detected that out of 226 first sleep cycles 58 (26%) and 64 (28%), respectively, lacked REM episodes. The agreement on the presence of skipped cycles between two human raters equaled 91% (58 cycles detected by both raters out of 64 cycles detected by either one or two scorers). The fractal cycle algorithm detected skipped cycles in 57 out of 58 (98%) cases detected by Scorer 1 with one false positive (which, however, was tagged as a skipped cycle by Scorer 2), and in 58 out of 64 (91%) cases detected by Scorer 2 with no false positives.”
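
      The percentages in the quoted passage can be re-derived from the reported counts. A minimal sketch, in which the variable and function names are hypothetical and the counts are taken from the passage above:

```python
def percent(part, whole):
    """Share of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)

# Counts as reported in the quoted Results text
first_cycles_total = 226
skipped_scorer1 = 58        # first cycles without REM according to Scorer 1
skipped_scorer2 = 64        # first cycles without REM according to Scorer 2
skipped_both = 58           # tagged as skipped by both scorers
skipped_either = 64         # tagged as skipped by at least one scorer
fractal_hits_scorer1 = 57   # Scorer 1 skipped cycles also found by the algorithm
fractal_hits_scorer2 = 58   # Scorer 2 skipped cycles also found by the algorithm

summary = {
    "skipped, Scorer 1 (%)": percent(skipped_scorer1, first_cycles_total),
    "skipped, Scorer 2 (%)": percent(skipped_scorer2, first_cycles_total),
    "inter-scorer agreement (%)": percent(skipped_both, skipped_either),
    "fractal vs Scorer 1 (%)": percent(fractal_hits_scorer1, skipped_scorer1),
    "fractal vs Scorer 2 (%)": percent(fractal_hits_scorer2, skipped_scorer2),
}
for label, value in summary.items():
    print(f"{label}: {value}")
```

      Note that the inter-scorer agreement here is the overlap divided by the union of detected skipped cycles, so it ignores the cycles that both scorers agreed were not skipped.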

      Minor points

      I suggest reporting the values of inter-method / inter-scorer correlations with the classical method in the main text since otherwise interpreting the value for fractal vs classical is impossible.

      Thank you for this comment. In the revised Manuscript, we have moved this section to the main text (Table 3).

      Table 5 + text of discussion: cycle identification based on hypnograms is claimed to be "based on arbitrary assigned categorical values". The categories are not arbitrary, since they correspond to well-validated sleep states; only the numbers associated with them are, and this does not seem to be very important since they are only for visualization purposes.

      Thank you for this comment. In the revised Manuscript, we have removed the phrase “arbitrary assigned”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Comment 1: In the Results section, the rationale behind selecting the beta band for the central (C3, CP3, Cz, CP4, C4) regions and the theta band for the fronto-central (Fz, FCz, Cz) regions is not clearly explained in the main text. This information is only mentioned in the figure captions. Additionally, why was the beta band chosen for the S-ROI central region and the theta band for the S-ROI fronto-central region? Was this choice influenced by the MVPA results?

      We thank the reviewer for the question regarding the rationale for the S-ROI selection in our study. The beta band was chosen for the central region due to its established relevance in motor control (Engel & Fries, 2010), movement planning (Little et al., 2019) and motor inhibition (Duque et al., 2017). The fronto-central theta band (or frontal midline theta) was a widely recognized indicator in cognitive control research (Cavanagh & Frank, 2014), associated with conflict detection and resolution processes. Moreover, recent empirical evidence suggested that the fronto-central theta reflected the coordination and integration between stimuli and responses (Senoussi et al., 2022). Although we have described the cognitive processes linked to these different frequencies in the introduction and discussion sections, along with the potential patterns of results observed in Stroop-related studies, we did not specify the involved cortical areas. Therefore, we have specified these areas in the introduction to enhance the clarity of the revised version (in the fourth paragraph of the Introduction section).

      Regarding whether the selection of S-ROIs was influenced by the MVPA results, we would like to clarify here that we selected the S-ROIs based on prior research and then conducted the decoding analysis. Specifically, we first extracted the data representing different frequency indicators (three F-ROIs and three S-ROIs) as features, followed by decoding to obtain the MVPA results. Subsequently, the time-frequency analysis, combined with the specific time windows during which each frequency was decoded, provided detailed interaction patterns among the variables for each indicator. The specifics of feature selection are described in the revised version (in the first paragraph of the Multivariate Pattern Analysis section).

      Comment 2: In the Data Analysis section, line 424 states: “Only trials that were correct in both the memory task and the Stroop task were included in all subsequent analyses. In addition, trials in which response times (RTs) deviated by more than three standard deviations from the condition mean were excluded from behavioral analyses.” The percentage of excluded trials should be reported. Also, for the EEG-related analyses, were the same trials excluded, or were different criteria applied?

      We thank the reviewer for this suggestion. Beyond the behavioral exclusion criteria, trials with EEG artifacts were also excluded from the data for the EEG-related analyses. We have now reported the percentage of excluded trials for both behavioral and EEG data analyses in the revised version (in the second paragraph of the EEG Recording and Preprocessing section and the first paragraph of the Behavioral Analysis section).
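
      As an illustration of the behavioral criterion discussed in this comment (dropping trials whose RT deviates by more than three standard deviations from the condition mean, and reporting the percentage excluded), a minimal sketch follows. The function names and the toy data are hypothetical, and the use of the sample standard deviation is an assumption rather than a detail confirmed by the manuscript.

```python
import statistics


def keep_mask(rts, conditions, n_sd=3.0):
    """Return one boolean per trial: True if the RT lies within n_sd sample
    standard deviations of the mean RT of its own condition.
    Each condition needs at least two trials for the SD to be defined."""
    keep = [True] * len(rts)
    for cond in set(conditions):
        idx = [i for i, c in enumerate(conditions) if c == cond]
        values = [rts[i] for i in idx]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD (assumption)
        for i in idx:
            keep[i] = abs(rts[i] - mean) <= n_sd * sd
    return keep


def percent_excluded(keep):
    """Percentage of trials flagged for exclusion."""
    return 100.0 * keep.count(False) / len(keep)


# Toy data: one extreme RT in condition "a", none in condition "b".
rts = [500.0] * 20 + [5000.0] + [600.0] * 10
conditions = ["a"] * 21 + ["b"] * 10
keep = keep_mask(rts, conditions)
print(f"{percent_excluded(keep):.1f}% of trials excluded")
```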

      Comment 3: In the Methods section, line 493 mentions: “A 400-200 ms pre-stimulus time window was selected as the baseline time window.” What is the justification in the literature for choosing the 400-200 ms pre-stimulus window as the baseline? Why was the 200-0 ms pre-stimulus period not considered?

      We thank the reviewer for this question and would like to provide the following justification. First, although a baseline ending at 0 ms is common in ERP analyses, it may not be suitable for time-frequency analysis. Due to the inherent temporal smoothing characteristic of wavelet convolution in time-frequency decomposition, task-related early activities can leak into the pre-stimulus period (before 0 ms) (Cohen, 2014). This means that extending the baseline to 0 ms will include some post-stimulus activity in the baseline window, thereby increasing baseline power and compromising the accuracy of the results. Second, an ideal baseline duration is recommended to be around 10-20% of the entire trial of interest (Morales & Bowers, 2022). In our study, the epoch duration was 2000 ms, making 200-400 ms an appropriate baseline length. Third, given that the minimum duration of the fixation point before the stimulus in our experiment was 400 ms, we chose the 400 ms before the stimulus as the baseline point to ensure its purity. In summary, considering edge effects, duration requirements, and the need to exclude other influences, we selected a baseline correction window of -400 to -200 ms. To enhance the clarity of the revised version, we have provided the rationale for the selected time windows along with relevant references (in the first paragraph of the Time-frequency analysis section).
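
      The duration argument and the effect of the chosen window can be sketched as follows. This is an illustration only: decibel conversion is assumed here as the normalization (the response does not state which normalization was used), and the function name and toy power values are hypothetical.

```python
import math


def baseline_db(power, times_ms, window=(-400.0, -200.0)):
    """Express power relative to the mean power inside the baseline window,
    in decibels. The dB conversion is an assumption for illustration."""
    start, stop = window
    base = [p for p, t in zip(power, times_ms) if start <= t <= stop]
    if not base:
        raise ValueError("no samples fall inside the baseline window")
    base_mean = sum(base) / len(base)
    return [10.0 * math.log10(p / base_mean) for p in power]


# Duration check: 10-20% of a 2000 ms epoch is 200-400 ms.
epoch_ms = 2000
recommended_ms = (epoch_ms * 10 // 100, epoch_ms * 20 // 100)
window_length_ms = -200 - (-400)

# Toy power trace: flat before stimulus onset, doubled afterwards.
times_ms = list(range(-500, 1500, 100))
power = [2.0 if t < 0 else 4.0 for t in times_ms]
normalized = baseline_db(power, times_ms)
print(recommended_ms, window_length_ms, round(normalized[-1], 2))
```

      Because the window stops at -200 ms, any post-stimulus activity smeared backwards by the wavelet into the last 200 ms before onset does not enter the baseline mean.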

      Comment 4: Is the primary innovation of this study limited to the methodology, such as employing MVPA and RSA to establish the relationship between late theta activity and behavior?

      We thank the reviewer for this insightful question and would like to clarify that our research extends beyond mere methodological innovation; rather, it utilized new methods to explore novel theoretical perspectives. Specifically, our research presents three levels of innovation: methodological, empirical, and theoretical. First, methodologically, MVPA overcame the drawbacks of traditional EEG analyses based on specific averaged voltage intensities, providing new perspectives on how the brain dynamically encoded particular neural representations over time. Furthermore, RSA aimed to identify which indicators among the decoded were directly related to behavioral representation patterns. Second, in terms of empirical results, using these two methods, we have identified for the first time three EEG markers that modulate the Stroop effect under verbal working memory load: SP, late theta, and beta, with late theta being directly linked to the elimination of the behavioral Stroop effect. Lastly, from a theoretical perspective, we proposed the novel idea that working memory played a crucial role in the late stages of conflict processing, specifically in the stimulus-response mapping stage (the specific theoretical contributions are detailed in the second-to-last paragraph of the Discussion section).

      Comment 5: On page 14, lines 280-287, the authors discuss a specific pattern observed in the alpha band. However, the manuscript does not provide the corresponding results to substantiate this discussion. It is recommended to include these results as supplementary material.

      We thank the reviewer for this suggestion. We added a new figure along with the corresponding statistical results that displayed the specific result patterns for the alpha band (Supplementary Figure 1).

      Comment 6: On page 16, lines 323-328, the authors provide a generalized explanation of the findings. According to load theory, stimuli compete for resources only when represented in the same form. Since the pre-memorized Chinese characters are represented semantically in working memory, this explanation lacks a critical premise: that semantic-response mapping is also represented semantically during processing.

      We thank the reviewer for this insightful suggestion. We fully agree with the reviewer’s perspective. As stated in our revised version, load theory suggests that cognitive resources are limited and dependent on a specific type (in the second paragraph of the Discussion section). The previously memorized Chinese characters are stored in working memory in the form of semantic representations; meanwhile the stimulus-response mapping should also be represented semantically, leading to resource occupancy. We have included this logical premise in the revised version (in the third-to-last paragraph of the Discussion section).

      Comment 7: The classic Stroop task includes both a manual and a vocal version. Since stimulus-response mapping in the vocal version is more automatic than in the manual version, it is unclear whether the findings of this study would generalize to the impact of working memory load on the Stroop effect in the vocal version.

      We fully agree with the reviewer’s point that the verbal version of the Stroop task differs from the manual version in terms of the degree of automation in the stimulus-response mapping. Specifically, the verbal version relies on mappings that are established through daily language use, while the manual version involves arbitrary mappings created in the laboratory. Therefore, the stimulus-response mapping in the verbal response version is more automated and less likely to be suppressed. However, our previous research indicated that the degree of automation in the stimulus-response mapping was influenced by practice (Chen et al., 2013). After approximately 128 practice trials, semantic conflict almost disappears, suggesting that the level of automation in stimulus-response mapping for the verbal Stroop task is comparable to that of the manual version (Chen et al., 2010). Given that participants in our study completed 144 practice trials (in the Procedure section), we believe these findings can be generalized to the verbal version.

      Comment 8: While the discussion section provides a comprehensive analysis of the study’s results, the authors could further elaborate on the theoretical and practical contributions of this work.

      We thank the reviewer for the constructive suggestions. We recognize that the theoretical and practical contributions of the study were not thoroughly elaborated in the original manuscript. Therefore, we have now provided a more detailed discussion. Specifically, the theoretical contributions focus on advancing load theory and highlighting the critical role of working memory in conflict processing. The practical contributions emphasize the application of load theory and the development of intervention strategies for enhancing inhibitory control. A more detailed discussion can be found in the revised version (in the second-to-last paragraph of the Discussion section).

      Reviewer #2 (Public review):

      Comment 1: As the researchers mentioned, a previous study reported a diminished Stroop effect with concurrent working memory tasks to memorize meaningless visual shapes rather than memorize Chinese characters as in the study. My main concern is that lower-level graphic processing when memorizing visual shapes also influences the Stroop effect. The stage of Stroop conflict processing affected by the working memory load may depend on the specific content of the concurrent working memory task. If that’s the case, I sense that the generalization of this finding may be limited.

      We thank the reviewer for this insightful concern. As mentioned in the manuscript, this may be attributed to the inherent characteristics of Chinese characters. In contrast to English words, the processing of Chinese characters relies more on graphemic encoding and memory (Chen, 1993). Therefore, the processing of line patterns essentially occupies some of the resources needed for character processing, which aligns with our study’s hypothesis based on dimensional overlap. Additionally, regarding the results, even though the previous study presented lower-level line patterns, the results still showed that the working memory load modulated the later theta band. We hypothesize that, regardless of the specific content of the pre-presented working memory load, once the stimulus disappears from view, these loads are maintained as representations in the working memory platform. Therefore, they do not influence early perceptual processing, and resource competition only occurs once the distractors reach the working memory platform. Lastly, a previous study has shown that spatial loads, which do not overlap with either the target or distractor dimensions, do not influence the conflict effect (Zhao et al., 2010). Taken together, we believe that regardless of the specific content of the concurrent working memory tasks, as long as they occupy resources related to irrelevant stimulus dimensions, they can influence the late-stage processing of the conflict effect. Perhaps our original manuscript did not convey this clearly, so we have rephrased it in a more straightforward manner (in the second paragraph of the Discussion section).

      Comment 2: The P1 and N450 components are sensitive to congruency in previous studies as mentioned by the researchers, but the results in the present study did not replicate them. This raised concerns about data quality and needs to be explained.

      We thank the reviewer for this insightful concern. For P1, we aimed to convey that the early perceptual processing represented by P1 is part of the conflict processing process. Therefore, we included it in our analysis. Additionally, as mentioned in the discussion, most studies find P1 to be insensitive to congruency. However, we inappropriately cited a study in the introduction that suggested P1 shows differences in congruency, which is among the few studies that hold this perspective. To prevent confusion for readers, we have removed this citation from the introduction.

      As for N450, most studies have indeed found it to be influenced by congruency. In our manuscript, we did not observe a congruency effect at our chosen electrodes and time window. However, significant congruency effects were detected at other central-parietal electrodes (CP3, CP4, P5, P6) during the 350-500 ms interval. The interaction between task type and consistency remained non-significant, consistent with previous results. Furthermore, with respect to the location of the electrodes chosen, existing studies on N450 vary widely, including central-parietal electrodes and frontal-central electrodes (for a review, see Heidlmayr et al., 2020). We speculate that this phenomenon may be related to the extent of practice. With fewer total trials, the task may involve more stimulus conflicts, engaging more frontal brain areas. On the other hand, with more total trials, the task may involve more response conflicts, engaging more central-parietal brain areas (Chen et al., 2013; van Veen & Carter, 2005). Due to the extensive practice required in our study, we identified a congruency N450 effect in the central-parietal region. We apologize for not thoroughly exploring other potential electrodes in the previous manuscript, and we have revised the results and interpretations regarding N450 accordingly in the revised version (in the N450 section of the ERP results and the third paragraph of the Discussion section).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comment 1: In the Introduction, line 108 states: “Second, alpha oscillations (8-13 Hz) can serve as a neural inverse index of mental activity or alertness, while a decrease in alpha power reflects increased alertness or enhanced attentional inhibition of distractors (Arakaki et al., 2022; Tafuro et al., 2019; Zhou et al., 2023; Zhu et al., 2023).” Please clarify which specific psychological process related to conflict processing is reflected by alpha oscillations.

      We appreciate your suggestion and we have clearly highlighted the role of alpha oscillations in attentional engagement during conflict processing in the revised version (in the third-to-last paragraph of the introduction).

      Comment 2: In Figures 3C and 3E, a space is needed between “amplitude” and the preceding parenthesis. Similar adjustments are required in Figures 4A, 4B, 4C, 5C, and 6C. Additionally, in Figures 3B and 3D, a space should be added between the numbers and “ms.” This issue also appears in Figure 8. Please review all figures for these formatting inconsistencies.

      We apologize for the inconsistencies in formatting and have corrected them throughout the revised version.

      Comment 3: There are some clerical errors in the manuscript that need correction. For instance, on page 19, line 403: “Participants were asked to answer by pressing one of two response buttons (“S” with the left ring finger and “L” with the left ring finger).” This should be corrected to: “L” with the right ring finger. I recommend that the authors carefully proofread the manuscript to identify and correct such errors.

      We sincerely apologize for the errors present in the manuscript and have now carefully proofread it (in the Procedure section).

      Comment 4: On page 13, line 254, the elimination of the Stroop effect should not be interpreted as an improvement in processing.

      We greatly appreciate your suggestion. We agree that the elimination of the Stroop effect should not be confused with improvements in processing. We have corrected this in the revised version (the second paragraph of the Discussion section).

      Reviewer #3 (Recommendations for the authors):

      Comment 1: In the introduction section, the N450 was introduced as “a frontal-central negative deflection”, but in the methods part the N450 was computed using central-parietal electrodes. This inconsistency is confusing and needs to be clarified.

      We apologize for this confusion. We have provided a detailed explanation regarding the differences in electrodes and the rationale behind choosing central-parietal electrodes in our response to Reviewer 2’s second comment. To clarify, we have updated the introduction to consistently label them as central-parietal deflections (in the third paragraph of the Introduction section).

      Comment 2: I speculate the “beta” was mistakenly written as “theta” in line 212.

      We sincerely apologize for this mistake. We have corrected this error (in the RSA results section).

      Comment 3: The speculation that “changes in beta bands may be influenced by theta bands, thereby indirectly influencing the behavioral Stroop effect” needs to be rationalized.

      We appreciate your suggestion. What we intended to convey is that we found an interaction effect in the beta bands; however, the RSA results did not show a correlation with the behavioral interaction effect. We speculate that beta activity might be influenced by the theta bands. On the one hand, we realize that the idea of beta bands indirectly influencing the behavioral Stroop effect was inappropriate, and we have removed this point in the revised version. On the other hand, we have provided rational evidence for the idea that beta bands may be influenced by theta bands. This is based on the biological properties of theta oscillations, which support communication between different cortical neural signals, and their functional role in integrating and transmitting task-relevant information to response execution (in the third-to-last paragraph of the Discussion section).

      Comment 4: Typo in line 479: [10,10].

      We sincerely apologize for this mistake. We have corrected this error: [-10,10] (in the Multivariate pattern analysis section).

      References

      Cavanagh, J. F., & Frank, M. J. (2014). Frontal theta as a mechanism for cognitive control. Trends in Cognitive Sciences, 18(8), 414–421. https://doi.org/10.1016/j.tics.2014.04.012

      Chen, M. J. (1993). A Comparison of Chinese and English Language Processing. In Advances in Psychology (Vol. 103, pp. 97–117). North-Holland. https://doi.org/10.1016/S0166-4115(08)61659-3

      Chen, X. F., Jiang, J., Zhao, X., & Chen, A. (2010). Effects of practice on semantic conflict and response conflict in the Stroop task. Psychol. Sci., 33, 869–871.

      Chen, Z., Lei, X., Ding, C., Li, H., & Chen, A. (2013). The neural mechanisms of semantic and response conflicts: An fMRI study of practice-related effects in the Stroop task. NeuroImage, 66, 577–584. https://doi.org/10.1016/j.neuroimage.2012.10.028

      Cohen, M. X. (2014). Analyzing Neural Time Series Data: Theory and Practice. The MIT Press. https://doi.org/10.7551/mitpress/9609.001.0001

      Duprez, J., Gulbinaite, R., & Cohen, M. X. (2020). Midfrontal theta phase coordinates behaviorally relevant brain computations during cognitive control. NeuroImage, 207, 116340. https://doi.org/10.1016/j.neuroimage.2019.116340

      Duque, J., Greenhouse, I., Labruna, L., & Ivry, R. B. (2017). Physiological Markers of Motor Inhibition during Human Behavior. Trends in Neurosciences, 40(4), 219–236. https://doi.org/10.1016/j.tins.2017.02.006

      Engel, A. K., & Fries, P. (2010). Beta-band oscillations—Signalling the status quo? Current Opinion in Neurobiology, 20(2), 156–165. https://doi.org/10.1016/j.conb.2010.02.015

      Heidlmayr, K., Kihlstedt, M., & Isel, F. (2020). A review on the electroencephalography markers of Stroop executive control processes. Brain and Cognition, 146, 105637. https://doi.org/10.1016/j.bandc.2020.105637

      Little, S., Bonaiuto, J., Barnes, G., & Bestmann, S. (2019). Human motor cortical beta bursts relate to movement planning and response errors. PLOS Biology, 17(10), e3000479. https://doi.org/10.1371/journal.pbio.3000479

      Morales, S., & Bowers, M. E. (2022). Time-frequency analysis methods and their application in developmental EEG data. Developmental Cognitive Neuroscience, 54, 101067. https://doi.org/10.1016/j.dcn.2022.101067

      Senoussi, M., Verbeke, P., Desender, K., De Loof, E., Talsma, D., & Verguts, T. (2022). Theta oscillations shift towards optimal frequency for cognitive control. Nature Human Behaviour, 6(7), Article 7. https://doi.org/10.1038/s41562-022-01335-5

      van Veen, V., & Carter, C. S. (2005). Separating semantic conflict and response conflict in the Stroop task: A functional MRI study. NeuroImage, 27(3), 497–504. https://doi.org/10.1016/j.neuroimage.2005.04.042

      Zhao, X., Chen, A., & West, R. (2010). The influence of working memory load on the Simon effect. Psychonomic Bulletin & Review, 17(5), 687–692. https://doi.org/10.3758/PBR.17.5.687

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We are grateful to the editors and reviewers for their careful reading and constructive comments. We have now done our best to respond to them fully through additional analyses and text revisions. In the sections below, the original reviewer comments are in black, and our responses are in red.

      To summarize, the major changes in this round of review are as follows:

      (1) We have included a new introductory figure (Figure 1) to explain the distinction between feature-based tasks and property-based tasks.

      (2) We have included a section on “key predictions” and a section on “overview of this study” in the Introduction to clearly delineate our key predictions and provide an overview of our study.

      (3) We have included additional analyses to address the reviewers’ concerns about circularity in Experiments 1 & 2. We show that distance-to-center or visual homogeneity computations performed on object representations obtained from deep networks (instead of the perceptual dissimilarities from Experiment 1) also yield comparable predictions of target-present and target-absent responses in Experiment 2.

      (4) We have extensively reworked the manuscript wherever possible to address the specific concerns raised by the reviewers.

      We hope that the revised manuscript adequately addresses the concerns raised in this round of review, and we look forward to a positive assessment.

      eLife Assessment

      This study uses carefully designed experiments to generate a useful behavioural and neuroimaging dataset on visual cognition. The results provide solid evidence for the involvement of higher-order visual cortex in processing visual oddballs and asymmetry. However, the evidence provided for the very strong claims of homogeneity as a novel concept in vision science, separable from existing concepts such as target saliency, is inadequate.

      Thank you for your positive assessment. We agree that visual homogeneity is similar to existing concepts such as target saliency, memorability etc. We have proposed it as a separate concept because visual homogeneity has an independent empirical measure (the reciprocal of target-absent search time in oddball search, or the reciprocal of same response time in a same-different task, etc) that may or may not be the same as other empirical measures such as saliency and memorability. Investigating these possibilities is beyond the scope of our study but would be interesting for future work. We have now clarified this in the revised manuscript (Discussion, p. 42).

      However, we’d like to emphasize that the question of whether visual homogeneity is novel or related to existing concepts misses entirely the key contribution of our study.

      Our key contribution is a quantitative, falsifiable model for how the brain could be solving property-based tasks like same-different, oddball or symmetry. Most theories of decision making consider feature-based tasks where there is a well-defined feature space and decision variable. Property-based tasks pose a significant challenge to standard theories since it is not clear how these tasks could be solved. In fact, oddball search, same-different and symmetry tasks have been considered so different that they are rarely even mentioned in the same study. Our study represents a unifying framework showing that all three tasks can be understood as solving the same underlying fundamental problem, and presents evidence in favor of this solution.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors define a new metric for visual displays, derived from psychophysical response times, called visual homogeneity (VH). They attempt to show that VH is explanatory of response times across multiple visual tasks. They use fMRI to find visual cortex regions with VH-correlated activity. On this basis, they declare a new visual region in human brain, area VH, whose purpose is to represent VH for the purpose of visual search and symmetry tasks.

      Thank you for your accurate and positive assessment.

      Strengths:

      The authors present carefully designed experiments, combining multiple types of visual judgments and multiple types of visual stimuli with concurrent fMRI measurements. This is a rich dataset with many possibilities for analysis and interpretation.

      Thank you for your accurate and positive assessment.

      Weaknesses:

      The datasets presented here should provide a rich basis for analysis. However, in this version of the manuscript, I believe that there are major problems with the logic underlying the authors' new theory of visual homogeneity (VH), with the specific methods they used to calculate VH, and with their interpretation of psychophysical results using these methods. These problems with the coherency of VH as a theoretical construct and metric value make it hard to interpret the fMRI results based on searchlight analysis of neural activity correlated with VH.

      We respectfully disagree with your concerns, and have done our best to respond to them fully below.

      In addition, the large regions of VH correlations identified in Experiments 1 and 2 vs. Experiments 3 and 4 are barely overlapping. This undermines the claim that VH is a universal quantity, represented in a newly discovered area of visual cortex, that underlies a wide variety of visual tasks and functions.

      We respectfully disagree with your assertion. First of all, there is partial overlap between the VH regions, for which there are several other obvious explanations that must be considered first before dismissing VH outright as a flawed construct. We acknowledge these alternatives in the Results (p. 27), and the relevant text is reproduced below.

      “We note that it is not straightforward to interpret the overlap between the VH regions identified in Experiments 2 & 4. The lack of overlap could be due to stimulus differences (natural images in Experiment 2 vs silhouettes in Experiment 4), visual field differences (items in the periphery in Experiment 2 vs items at the fovea in Experiment 4) and even due to different participants in the two experiments. There is evidence supporting all these possibilities: stimulus differences (Yue et al., 2014), visual field differences (Kravitz et al., 2013) as well as individual differences can all change the locus of neural activations in object-selective cortex (Weiner and Grill-Spector, 2012a; Glezer and Riesenhuber, 2013). We speculate that testing the same participants on search and symmetry tasks using similar stimuli and display properties would reveal even larger overlap in the VH regions that drive behavior.”

      Maybe I have missed something, or there is some flaw in my logic. But, absent that, I think the authors should radically reconsider their theory, analyses, and interpretations, in light of detailed comments below, in order to make the best use of their extensive and valuable datasets combining behavior and fMRI. I think doing so could lead to a much more coherent and convincing paper, albeit possibly supporting less novel conclusions.

      We respectfully disagree with your assessment, and we hope that our detailed responses below will convince you of the merit of our claims.

      THEORY AND ANALYSIS OF VH

      (1) VH is an unnecessary, complex proxy for response time and target-distractor similarity.

      VH is defined as a novel visual quality, calculable for both arrays of objects (as studied in Experiments 1-3) and individual objects (as studied in Experiment 4). It is derived from a distance-to-center calculation in a perceptual space. That space in turn is derived from multi-dimensional scaling of response times for target-distractor pairs in an oddball detection task (Experiments 1 and 2) or in a same-different task (Experiments 3 and 4). Proximity of objects in the space is inversely proportional to response times for arrays in which they were paired. These response times are higher for more similar objects. Hence, proximity is proportional to similarity. This is visible in Fig. 2B as the close clustering of complex, confusable animal shapes.

      VH, i.e. distance-to-center, for target-present arrays is calculated as shown in Fig. 1C, based on a point on the line connecting target and distractors. The authors justify this idea with previous findings that responses to multiple stimuli are an average of responses to the constituent individual stimuli. The distance of the connecting line to the center is inversely proportional to the distance between the two stimuli in the pair, as shown in Fig. 2D. As a result, VH is inversely proportional to distance between the stimuli and thus to stimulus similarity and response times. But this just makes VH a highly derived, unnecessarily complex proxy for target-distractor similarity and response time. The original response times on which the perceptual space is based are far more simple and direct measures of similarity for predicting response times.

      Thank you for carefully thinking through our logic. We agree that a distance-to-centre calculation is entirely unnecessary as an explanation for target-present visual search. The difficulty of target-present search is already known to be directly proportional to the similarity between target and distractor, so there is nothing new to explain here.

      However, this is a narrow and selective interpretation of our findings because you are focusing only on our results on target-present searches, which are only half of all our data. The other half is the target-absent responses which previously have had no clear explanation. You are also missing the fact that we are explaining same-different and symmetry tasks as well using the same visual homogeneity computation.

      We urge you to think more deeply about the problem of how to decide whether an oddball is present or not in the first place. How do we actually solve this task? There must be some underlying representation and decision process. Our study shows that a distance-to-centre computation can actually serve as a decision variable to solve disparate property-based visual tasks. These tasks pose a major challenge to standard models of decision making, because the underlying representation and decision variable have been unclear. Our study resolves this challenge by proposing a novel computation that can be used by the brain to solve all these disparate tasks, and bring these tasks into the ambit of standard theories of decision making.  

      Our results also explain several interesting puzzles in the literature. If oddball search were driven only by target-distractor similarity, the time taken to respond when a target is absent should not vary at all, and should actually be longer than for all target-present searches. But in fact, systematic variations in target-absent times have always been observed in the literature, yet have never been explained by any theoretical model. Our results explain why target-absent times vary systematically: it is due to visual homogeneity.

      Similarly, in same-different tasks, participants are known to take longer to make a “different” response when the two items differ only slightly. By this logic, they should take the longest to make a “same” response, but in fact, paradoxically, participants are actually faster to make “same” responses. This fast-same effect has been noted several times, but never explained using any models. Our results provide an explanation of why “same” responses to an image vary systematically – it is due to visual homogeneity. 

      Finally, in symmetry tasks, symmetric objects evoke fast responses, and this has always been taken as evidence for special symmetry computations in the brain. But we show that the same distance-to-center computation can explain both responses to symmetric and asymmetric objects. Thus there is no need for a special symmetry computation in the brain.

      (2) The use of VH derived from Experiment 1 to predict response times in Experiment 2 is circular and does not validate the VH theory.

      The use of VH, a response time proxy, to predict response times in other, similar tasks, using the same stimuli, is circular. In effect, response times are being used to predict response times across two similar experiments using the same stimuli. Experiment 1 and the target present condition of Experiment 2 involve the same essential task of oddball detection. The results of Experiment 1 are converted into VH values as described above, and these are used to predict response times in Experiment 2 (Fig. 2F). Since VH is a derived proxy for response values in Experiment 1, this prediction is circular, and the observed correlation shows only consistency between two oddball detection tasks in two experiments using the same stimuli.

      You are indeed correct in noting that both Experiment 1 & 2 involve oddball search, and so at the superficial level, it looks circular that the oddball search data of Experiment 1 is being used to explain the oddball search data of Experiment 2.

      However, closer scrutiny reveals more fundamental differences: Experiment 1 consisted only of oddball search with the target appearing on the left or right, whereas Experiment 2 consisted of oddball search with the target either present or completely absent. In fact, we were merely using the search dissimilarities from Experiment 1 to reconstruct the underlying object representation, because it is well known that neural dissimilarities are predicted well by search dissimilarities (Sripati and Olson, 2010; Zhivago and Arun, 2014).

      To thoroughly refute any lingering concern about circularity, we reasoned that the model predictions for Experiment 2 could have been obtained by a distance-to-center computation on any brain-like object representation. To this end, we used object representations from deep neural networks pretrained on object categorization, whose representations are known to match well with the brain, and asked if a distance-to-center computation on these representations could predict the search data in Experiment 2. This was indeed the case, and these results are now included as an additional section in Supplementary Material (Section S1).

      (3) The negative correlation of target-absent response times with VH as it is defined for target-absent arrays, based on distance of a single stimulus from center, is uninterpretable without understanding the effects of center-fitting. Most likely, center-fitting and the different VH metric for target-absent trials produce an inverse correlation of VH with target-distractor similarity.

      Unfortunately, as we have mentioned above, target-distractor similarity cannot explain how target-absent searches behave, since there is no distractor in such searches.

      We do understand your broader concern about the center-fitting algorithm itself. We performed a number of additional analyses to confirm the generality of our results and reject alternate explanations – these are summarized in a new section titled “Confirming the generality of visual homogeneity” (p. 12), and the section is reproduced below for your convenience.   

      “Confirming the generality of visual homogeneity

      We performed several additional analyses to confirm the generality of our results, and to reject alternate explanations.

      First, it could be argued that our results are circular because they involve taking oddball search times from Experiment 1 and using them to explain search response times in Experiment 2. This is a superficial concern since we are using the search dissimilarities from Experiment 1 only as a proxy for the underlying neural representation, based on previous reports that neural dissimilarities closely match oddball search dissimilarities (Sripati and Olson, 2010; Zhivago and Arun, 2014). Nonetheless, to thoroughly refute this possibility, we reasoned that we would get similar predictions of the target present/absent responses in Experiment 2 using any other brain-like object representation. To confirm this, we replaced the object representations derived from Experiment 1 with object representations derived from deep neural networks pretrained for object categorization, and asked if distance-to-center computations could predict the target present/absent responses in Experiment 2. This was indeed the case (Section S1).

      Second, we wondered whether the nonlinear optimization process of finding the best-fitting center could be yielding disparate optimal centres each time. To investigate this, we repeated the optimization procedure with many randomly initialized starting points, and obtained the same best-fitting center each time (see Methods).

      Third, to confirm that the above model fits are not due to overfitting, we performed a leave-one-out cross validation analysis. We left out all target-present and target-absent searches involving a particular image, and then predicted these searches by calculating visual homogeneity estimated from all other images. This too yielded similar positive and negative correlations (r = 0.63, p < 0.0001 for target-present, r = -0.63, p < 0.001  for target-absent).

      Fourth, if heterogeneous displays indeed elicit similar neural responses due to mixing, then their average distance to other objects must be related to their visual homogeneity. We confirmed that this was indeed the case, suggesting that the average distance of an object from all other objects in visual search can predict visual homogeneity (Section S1).

      Fifth, the above results are based on taking the neural response to oddball arrays to be the average of the target and distractor responses. To confirm that averaging was indeed the optimal choice, we repeated the above analysis by assuming a range of relative weights between the target and distractor. The best correlation was obtained for almost equal weights in the lateral occipital (LO) region, consistent with averaging and its role in the underlying perceptual representation (Section S1).

      Finally, we performed several additional experiments on a larger set of natural objects as well as on silhouette shapes. In all cases, present/absent responses were explained using visual homogeneity (Section S2).”
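      The weight analysis in the fifth point above can be sketched roughly as follows. This is an illustrative sketch only, not the study's code: the two-dimensional coordinates, the `center` at the origin, the weight grid, and the helper names `pearson`, `mixed_distance` and `sweep_weights` are all assumptions, and the "observed" values are synthetic, generated here with equal weights so that the sweep has a known answer.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mixed_distance(target, distractor, center, w_target):
    """Distance from the center of a weighted mix of target and distractor."""
    mix = [w_target * t + (1.0 - w_target) * d
           for t, d in zip(target, distractor)]
    return math.dist(mix, center)

def sweep_weights(pairs, observed, center, weights):
    """Correlate predicted distances with observations for each candidate weight."""
    scores = {}
    for w in weights:
        predicted = [mixed_distance(t, d, center, w) for t, d in pairs]
        scores[w] = pearson(predicted, observed)
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical target/distractor coordinates (not from the study).
center = [0.0, 0.0]
pairs = [
    ([2.0, 0.0], [0.0, 2.0]),
    ([4.0, 0.0], [0.0, 0.0]),
    ([0.0, 0.0], [0.0, 6.0]),
    ([1.0, 1.0], [-1.0, -1.0]),
]
# Synthetic observations generated with equal weights (an assumption).
observed = [mixed_distance(t, d, center, 0.5) for t, d in pairs]

best_w, scores = sweep_weights(pairs, observed, center, [0.25, 0.5, 0.75])
print(best_w, scores)
```

      With real data the observed values would come from measurements rather than from the model itself, so the best weight would be an empirical finding rather than a built-in answer.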

      The construction of the VH perceptual space also involves fitting a "center" point such that distances to center predict response times as closely as possible. The effect of this fitting process on distance-to-center values for individual objects or clusters of objects is unknowable from what is presented here. These effects would depend on the residual errors after fitting response times with the connecting line distances. The center point location and its effects on distance-to-center of single objects and object clusters are not discussed or reported here.

      While it is true that the optimal center needs to be found by fitting to the data, there is no particular mystery to the algorithm: we are simply performing standard gradient descent to maximize the fit to the data. We have described the algorithm clearly and are making our code public. We find that the algorithm yields stable optimal centers despite many randomly initialized starting points. We find that the optimal center can predict responses to entirely novel images that were excluded during model training. We are making no assumption about the location of the center with respect to individual points. Therefore, we see no cause for concern regarding the center-finding algorithm.
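      As a rough illustration of what a stability check over random starting points might look like, the sketch below fits a center by plain gradient descent from several random initializations and counts how many restarts land on the same solution. It is not the study's algorithm: the squared-error loss between distance-to-center and a target value is a stand-in assumption for the actual fit criterion, and the points, the learning rate, the number of steps, and the function names are all hypothetical.

```python
import math
import random

def loss(center, points, targets):
    """Sum of squared errors between distance-to-center and target values."""
    return sum((math.dist(p, center) - t) ** 2 for p, t in zip(points, targets))

def gradient(center, points, targets, eps=1e-12):
    """Analytic gradient of the loss with respect to the center."""
    grad = [0.0] * len(center)
    for p, t in zip(points, targets):
        d = math.dist(p, center)
        scale = 2.0 * (d - t) / max(d, eps)  # guard against d == 0
        for k in range(len(center)):
            grad[k] += scale * (center[k] - p[k])
    return grad

def fit_center(points, targets, start, lr=0.05, steps=2000):
    """Plain gradient descent from a given starting point."""
    center = list(start)
    for _ in range(steps):
        g = gradient(center, points, targets)
        center = [c - lr * gk for c, gk in zip(center, g)]
    return center

def fit_with_restarts(points, targets, n_restarts=20, seed=0):
    """Repeat the fit from random starts; return the best fit and all fits."""
    rng = random.Random(seed)
    dim = len(points[0])
    fits = []
    for _ in range(n_restarts):
        start = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        c = fit_center(points, targets, start)
        fits.append((loss(c, points, targets), c))
    return min(fits), fits

# Hypothetical object coordinates and a known "true" center for the demo.
points = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]]
true_center = [1.0, 2.0]
targets = [math.dist(p, true_center) for p in points]

(best_loss, best_center), fits = fit_with_restarts(points, targets)
n_agree = sum(1 for _, c in fits if math.dist(c, best_center) < 1e-3)
print(best_center, best_loss, n_agree, "of", len(fits), "restarts agree")
```

      Agreement across restarts is not guaranteed in general for a non-convex fit like this one; the count of agreeing restarts is what such a check would report.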

      Yet, this uninterpretable distance-to-center of single objects is chosen as the metric for VH of target-absent displays (VHabsent). This is justified by the idea that arrays of a single stimulus will produce an average response equal to one stimulus of the same kind. But it is not logically clear why response strength to a stimulus should be a metric for homogeneity of arrays constructed from that stimulus, or even what homogeneity could mean for a single stimulus from this set. And it is not clear how this VHabsent metric based on single stimuli can be equated to the connecting line VH metric for stimulus pairs, i.e. VHpresent, or how both could be plotted on a single continuum.

      Most visual tasks, such as finding an animal, are thought to involve building a decision boundary on some underlying neural representation. Even visual search has been portrayed as a signal-detection problem where a particular target is to be discriminated from a distractor. However none of these formulations work in the case of property-based visual tasks, where there is no unique feature to look for.

      We are proposing that, when we view a search array, the neural response to the search array can be deduced from the neural responses to the individual elements using well known rules, and that decisions about an oddball target being present or absent can be made by computing the distance of this neural response from some canonical mean firing rate of a population of neurons. This distance to center computation is what we denote as visual homogeneity. We have revised our manuscript throughout to make this clearer and we hope that this helps you understand the logic better. 
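      The distance-to-center computation described above can be sketched as follows. This is a minimal illustration rather than the study's code: the two-dimensional coordinates, the object names, the `center` placed at the origin, and the helper names `array_response` and `visual_homogeneity` are all assumptions made for the example. The array response is taken as an equal-weight average of the target and distractor responses, and a target-absent array is treated as a target identical to its distractor.

```python
import math

def array_response(target, distractor, w_target=0.5):
    """Response to a search array, modelled as a weighted mix of the
    responses to the target and to the distractor."""
    return [w_target * t + (1.0 - w_target) * d
            for t, d in zip(target, distractor)]

def visual_homogeneity(response, center):
    """Distance of a response vector from the reference center."""
    return math.dist(response, center)

# Hypothetical 2-D embedding of three objects (not from the study).
objects = {
    "cup": [0.0, 3.0],
    "deer": [1.0, 0.0],
    "dog": [-1.0, 0.0],
}
center = [0.0, 0.0]  # assumed reference point

# Target-absent array: every item is a cup, so the mix equals the cup itself.
vh_absent_cup = visual_homogeneity(
    array_response(objects["cup"], objects["cup"]), center)

# Target-present array: a deer among dogs averages to a point near the center.
vh_present_deer_dog = visual_homogeneity(
    array_response(objects["deer"], objects["dog"]), center)

print(vh_absent_cup, vh_present_deer_dog)
```

      In this toy layout the homogeneous array lies far from the center and the heterogeneous array lies close to it, which is the separation that a single decision boundary on this distance would exploit.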

      It is clear, however, what *should* be correlated with difficulty and response time in the target-absent trials, and that is the complexity of the stimuli and the numerosity of similar distractors in the overall stimulus set. Complexity of the target, similarity with potential distractors, and number of such similar distractors all make ruling out distractor presence more difficult. The correlation seen in Fig. 2G must reflect these kinds of effects, with higher response times for complex animal shapes with lots of similar distractors and lower response times for simpler round shapes with fewer similar distractors.

      You are absolutely correct that the stimulus complexity should matter, but there are no good empirically derived measures for stimulus complexity, other than subjective ratings which are complex on their own and could be based on any number of other cognitive and semantic factors. But considering what factors are correlated with target-absent response times is entirely different from asking what decision variable or template is being used by participants to solve the task.

      The example points in Fig. 2G seem to bear this out, with higher response times for the deer stimulus (complex, many close distractors in the Fig. 2B perceptual space) and lower response times for the coffee cup (simple, few close distractors in the perceptual space). While the meaning of the VH scale in Fig. 2G, and its relationship to the scale in Fig. 2F, are unknown, it seems like the Fig. 2G scale has an inverse relationship to stimulus complexity, in contrast to the expected positive relationship for Fig. 2F. This is presumably what creates the observed negative correlation in Fig. 2G.

      Taken together, points 1-3 suggest that VHpresent and VHabsent are complex, unnecessary, and disconnected metrics for understanding target detection response times. The standard, simple explanation should stand. Task difficulty and response time in target detection tasks, in both present and absent trials, are positively correlated with target-distractor similarity.

      We strongly disagree. Your assessment seems to be based on only considering target-present searches, which are of course driven by target-distractor similarity. Your argument is flawed because systematic variations in target-absent trials cannot be linked to any target-distractor similarity since there are no targets in the first place in such trials.

      We have shown that target-absent response times are in fact, independent of experimental context, which means that they index an image property that is independent of any reference target (Results, p. 15; Section S4). This property is what we define as visual homogeneity.

      I think my interpretations apply to Experiments 3 and 4 as well, although I find the analysis in Fig. 4 especially hard to understand. The VH space in this case is based on Experiment 3 oddball detection in a stimulus set that included both symmetric and asymmetric objects. But the response times for a very different task in Experiment 4, a symmetric/asymmetric judgment, are plotted against the axes derived from Experiment 3 (Fig. 4F and 4G). It is not clear to me why a measure based on oddball detection that requires no use of symmetry information should be predictive of within-stimulus symmetry detection response times. If it is, that requires a theoretical explanation not provided here.

      We were simply using an oddball detection task to construct the underlying object representation, on the basis of observations that search dissimilarities are strongly correlated with neural dissimilarities. In Section S1, we show that similar results could have been obtained using other object representations such as deep networks, as long as the representation is brain-like.

      (4) Contrary to the VH theory, same/different tasks are unlikely to depend on a decision boundary in the middle of a similarity or homogeneity continuum.

      We have provided empirical proof for our claims, by showing that target-present response times in a visual search task are correlated with “different” responses in the same-different task, and that target-absent response times in the visual search task are correlated with “same” responses in the same-different task (Section S4).

      The authors interpret the inverse relationship of response times with VHpresent and VHabsent, described above, as evidence for their theory. They hypothesize, in Fig. 1G, that VHpresent and VHabsent occupy a single scale, with maximum VHpresent falling at the same point as minimum VHabsent. This is not borne out by their analysis, since the VHpresent and VHabsent value scales are mainly overlapping, not only in Experiments 1 and 2 but also in Experiments 3 and 4. The authors dismiss this problem by saying that their analyses are a first pass that will require future refinement. Instead, the failure to conform to this basic part of the theory should be a red flag calling for revision of the theory.

      Again, the opposite correlations of target-present and target-absent search times with VH are the crucial empirical validation of our claim that a distance-to-center calculation explains how we perform these property-based tasks. The VH predictions do not fully explain the data. We have explicitly acknowledged this shortcoming, so we are hardly dismissing it as a problem.

      The reason for this single scale is that the authors think of target detection as a boundary decision task, along a single scale, with a decision boundary somewhere in the middle, separating present and absent. This model makes sense for decision dimensions or spaces where there are two categories (right/left motion; cats vs. dogs), separated by an inherent boundary (equal left/right motion; training-defined cat/dog boundary). In these cases, there is less information near the boundary, leading to reduced speed/accuracy and producing a pattern like that shown in Fig. 1G.

      Finding an oddball, deciding if two items are same or different and symmetry tasks are disparate visual tasks that do not fit neatly into standard models of decision making. The key conceptual advance of our study is that we propose a plausible neural representation and decision variable that allow all three property-based visual tasks to be reconciled with standard models of decision making.

      This logic does not hold for target detection tasks. There is no inherent middle point boundary between target present and target absent. Instead, in both types of trial, maximum information is present when target and distractors are most dissimilar, and minimum information is present when target and distractors are most similar. The point of greatest similarity occurs at the limit of any metric for similarity. Correspondingly, there is no middle point dip in information that would produce greater difficulty and higher response times. Instead, task difficulty and response times increase monotonically with similarity between targets and distractors, for both target present and target absent decisions. Thus, in Figs. 2F and 2G, response times appear to be highest for animals, which share the largest numbers of closely similar distractors.

      Your alternative explanation rests on vague factors like “maximum information” which cannot be quantified. By contrast we are proposing a concrete, falsifiable model for three property-based tasks – same/different, oddball present/absent and object symmetry. Any argument based solely on item similarity to explain visual search or symmetry responses cannot explain systematic variations observed for target-absent arrays and for symmetric objects, for the reasons explained earlier.

      DEFINITION OF AREA VH USING fMRI

      (1) The area VH boundaries from different experiments are nearly completely non-overlapping.

      In line with their theory that VH is a single continuum with a decision boundary somewhere in the middle, the authors use fMRI searchlight to find an area whose responses positively correlate with homogeneity, as calculated across all of their target present and target absent arrays. They report VH-correlated activity in regions anterior to LO. However, the VH defined by symmetry Experiments 3 and 4 (VHsymmetry) is substantially anterior to LO, while the VH defined by target detection Experiments 1 and 2 (VHdetection) is almost immediately adjacent to LO. Fig. S13 shows that VHsymmetry and VHdetection are nearly non-overlapping. This is a fundamental problem with the claim of discovering a new area that represents a new quantity that explains response times across multiple visual tasks. In addition, it is hard to understand why VHsymmetry does not show up in a straightforward subtraction between symmetric and asymmetric objects, which should show a clear difference in homogeneity.

      We respectfully disagree. The partial overlap between the VH regions identified in Experiments 1 & 2 can hardly be taken as evidence against the quantity VH itself, because there are several other obvious alternate explanations for this partial overlap, as summarized earlier as well. The VH region does show up in a straightforward subtraction  between symmetric and asymmetric objects (Section S7), so we are not sure what the Reviewer is referring to here.

      (2) It is hard to understand how neural responses can be correlated with both VHpresent and VHabsent.

      The main paper results for VHdetection are based on both target-present and target-absent trials, considered together. It is hard to interpret the observed correlations, since the VHpresent and VHabsent metrics are calculated in such different ways and have opposite correlations with target similarity, task difficulty, and response times (see above). It may be that one or the other dominates the observed correlations. It would be clarifying to analyze correlations for target-present and target-absent trials separately, to see if they are both positive and correlated with each other.

      Thanks for raising this point. We have now confirmed that the positive correlation between VH and neural response holds even when we do the analysis separately for target-present and target-absent searches. The correlation between the neural response in the VH region and visual homogeneity was r = 0.66 (n = 32, p < 0.0005) for target-present searches and r = 0.56 (n = 32, p < 0.005) for target-absent searches.

      (3) Definition of the boundaries and purpose of a new visual area in the brain requires circumspection, abundant and convergent evidence, and careful controls.

      Even if the VH metric, as defined and calculated by the authors here, is a meaningful quantity, it is a bold claim that a large cortical area just anterior to LO is devoted to calculating this metric as its major task. Vision involves much more than target detection and symmetry detection. Cortex anterior to LO is bound to perform a much wider range of visual functionalities. If the reported correlations can be clarified and supported, it would be more circumspect to treat them as one byproduct of unknown visual processing in cortex anterior to LO, rather than treating them as the defining purpose for a large area of visual cortex.

      We totally agree with you that reporting a new brain region would require careful interpretation and abundant and converging evidence. However, this requires many studies' worth of work, and historically category-selective regions like the FFA have achieved consensus only after they were replicated and confirmed across many studies. We believe our proposal for the computation of a quantity like visual homogeneity is conceptually novel, and our study represents a first step that provides some converging evidence (through replicable results across different experiments) for such a region. We have reworked our manuscript to make this point clearer (Discussion, p. 32).

      Reviewer #3 (Public Review):

      Summary:

      This study proposes visual homogeneity as a novel visual property that enables observers to perform several seemingly disparate visual tasks, such as finding an odd item, deciding if two items are the same, or judging if an object is symmetric. In Exp 1, the reaction times on several objects were measured in human subjects. In Exp 2, visual homogeneity of each object was calculated based on the reaction time data. The visual homogeneity scores predicted reaction times. This value was also correlated with the BOLD signals in a specific region anterior to LO. Similar methods were used to analyze reaction time and fMRI data in a symmetry detection task. It is concluded that visual homogeneity is an important feature that enables observers to solve these two tasks.

      Thank you for your accurate and positive assessment.

      Strengths:

      (1) The writing is very clear. The presentation of the study is informative.

      (2) This study includes several behavioral and fMRI experiments. I appreciate the scientific rigor of the authors.

      We are grateful to you for your balanced assessment and constructive comments.

      Weaknesses:

      (1) My main concern with this paper is the way visual homogeneity is computed. On page 10, lines 188-192, it says: "we then asked if there is any point in this multidimensional representation such that distances from this point to the target-present and target-absent response vectors can accurately predict the target-present and target-absent response times with a positive and negative correlation respectively (see Methods)". This is also true for the symmetry detection task. If I understand correctly, the reference point in this perceptual space was found by deliberately satisfying the negative and positive correlations in response times. And then on page 10, lines 200-205, it shows that the positive and negative correlations actually exist. This logic is confusing. The positive and negative correlations emerge only because this method is optimized to do so. It seems more reasonable to identify the reference point of this perceptual space independently, without using the reaction time data. Otherwise, the inference process sounds circular. A simple way is to just use the mean point of all objects in Exp 1, without any optimization towards reaction time data.

      We disagree with you since the same logic applies to any curve-fitting procedure. When we fit data to a straight line, we are finding the slope and intercept that minimizes the error between the data and the straight line, but we would hardly consider the process circular when a good fit is achieved – in fact we take it as a confirmation that the data can be fit linearly. In the same vein, we would not have observed a good fit to the data, if there did not exist any good reference point relative to which the distances of the target-present and target-absent search arrays predicted these response times.

      In Section S2, we show that the visual homogeneity estimates for each object are strongly correlated with the average distance of each object to all other objects (r = 0.84, p < 0.0005, Figure S1).
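      The comparison described above can be illustrated with a small sketch, under stated assumptions. The coordinates are made up, the centroid of the points stands in for the fitted center (the reviewer's suggested "mean point", not the optimized center used in the study), and the helper names are hypothetical. The sketch computes, for each object, its average distance to all other objects and its distance to the centroid, then correlates the two.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_distance_to_others(points):
    """Average distance from each point to every other point."""
    result = []
    for i, p in enumerate(points):
        others = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        result.append(sum(others) / len(others))
    return result

def distance_to_centroid(points):
    """Distance from each point to the mean of all points."""
    dim = len(points[0])
    centroid = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    return [math.dist(p, centroid) for p in points]

# Hypothetical object coordinates: four clustered objects and one outlier.
points = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [10.0, 0.0]]

mean_dists = mean_distance_to_others(points)
center_dists = distance_to_centroid(points)
r = pearson(mean_dists, center_dists)
print(mean_dists, center_dists, r)
```

      In this toy set the outlying object is far from both the other objects and the centroid, so the two measures are positively correlated; whether the same holds for a fitted center in real data is an empirical question.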

      We have performed several additional analyses to confirm the generality of our results and to reject alternate explanations (see Results, p. 12, Section titled “Confirming the generality of visual homogeneity”). In particular, to confirm that the results we obtained are not due to overfitting, we performed a cross-validation analysis, where we removed all searches involving a particular image and predicted these response times using visual homogeneity. This too revealed a significant model correlation confirming that our results are not due to overfitting.

      (2) Visual homogeneity (at least given the current form) is an unnecessary term. It is similar to distractor heterogeneity/distractor variability/distractor statistics in the literature. However, the authors attempt to claim it as a novel concept. The title is "visual homogeneity computations in the brain enable solving generic visual tasks". The last sentence of the abstract is "a NOVEL IMAGE PROPERTY, visual homogeneity, is encoded in a localized brain region, to solve generic visual tasks". In the significance, it is mentioned that "we show that these tasks can be solved using a simple property WE DEFINE as visual homogeneity". If the authors agree that visual homogeneity is not new, I suggest a complete rewrite of the title, abstract, significance, and introduction.

      We respectfully disagree that visual homogeneity is an unnecessary term. Please see our comments to Reviewer 1 above. Just like saliency and memorability can be measured empirically, we propose that visual homogeneity can be empirically measured as the reciprocal of the target-absent search time in a search task, or as the reciprocal of the “same” response time in a same-different task. Understanding how these three quantities interact will require measuring them empirically for an identical set of images, which is beyond the scope of this study but an interesting possibility for future work.

      (3) Also, "solving generic tasks" is another overstatement. The oddball search tasks, same-different tasks, and symmetric tasks are only a small subset of many visual tasks. Can this "quantitative model" solve motion direction judgment tasks, visual working memory tasks? Perhaps so, but at least this manuscript provides no such evidence. On line 291, it says "we have proposed that visual homogeneity can be used to solve any task that requires discriminating between homogeneous and heterogeneous displays". I think this is a good statement. A title that says "XXXX enable solving discrimination tasks with multi-component displays" is more acceptable. The phrase "generic tasks" is certainly an exaggeration.

      Thank you for your suggestion. We have now replaced the term “generic tasks” with the term “property-based tasks”, which we feel is more appropriate and reflects the fact that oddball search, same-different and symmetry tasks all involve looking for a specific image property.

      (4) If I understand it correctly, one of the key findings of this paper is "the response times for target-present searches were positively correlated with visual homogeneity. By contrast, the response times for target-absent searches were negatively correlated with visual homogeneity" (lines 204-207). I think the authors have already acknowledged that the positive correlation is not surprising at all because it reflects the classic target-distractor similarity effect. But the authors claim that the negative correlation in target-absent searches is the truly novel finding.

      (5) I would like to make it clear that this negative correlation is not new either. The seminal paper by Duncan and Humphreys (1989) has clearly stated that "difficulty increases with increased similarity of targets to nontargets and decreased similarity between nontargets" (the sentence in their abstract). Here, "similarity between nontargets" is the same as the visual homogeneity defined here. Similar effects have been shown in Duncan (1989) and Nagy, Neriani, and Young (2005). See also the inconsistent results in Nagy & Thomas, 2003, Vincent, Baddeley, Troscianko & Gilchrist, 2009. More recently, Wei Ji Ma has systematically investigated the effects of heterogeneous distractors in visual search. I think the introduction part of Wei Ji Ma's paper (2020) provides a nice summary of this line of research. I am surprised that these references are not mentioned at all in this manuscript (except Duncan and Humphreys, 1989).

      You are right in noting that Duncan and Humphreys (1989) propose that searches are more difficult when nontargets are dissimilar. However, since our searches have identical distractors, the similarity between nontargets is always constant across target-absent searches, and therefore this cannot predict any systematic variation in target-absent search that is observed in our data. By contrast, our results explain both target-absent searches and target-present searches.

      Thank you for pointing us to previous work. These studies show that it is not just the average distractor similarity but the statistics of the distractor similarity that drive visual search. However, these studies do not explain why target-absent searches should vary systematically.

      (6) If the key contribution is the quantitative model, the study should be organized in a different way. Although the findings of positive and negative correlations are not novel, it is still good to propose new models to explain classic phenomena. I would like to mention the three studies by Wei Ji Ma (see below). In these studies, Bayesian observer models were established to account for trial-by-trial behavioral responses. These computational models can also account for the set-size effect, behavior in both localization and detection tasks. I see much more scientific rigor in their studies. Going back to the quantitative model in this paper, I am wondering whether the model can provide any qualitative prediction beyond the positive and negative correlations? Can the model make qualitative predictions that differ from those of Wei Ji's model? If not, can the authors show that the model can quantitatively better account for the data than existing Bayesian models? We should evaluate a model either qualitatively or quantitatively.

      Thank you for pointing us to prior work by Wei Ji Ma. These studies systematically examined visual search for a target among heterogeneous distractors using simple parametric stimuli and a Bayesian modeling framework. By contrast, our experiments involve searching for single oddball targets among multiple identical distractors, so it is not clear to us that the Wei Ji Ma models can be easily used to generate predictions about the searches used in our study.

      We are not sure what you mean by offering quantitative predictions beyond positive and negative correlations. We have tried to explain systematic variation in target-present and target-absent response times using a model of how these decisions are being made. Our model explains a lot of systematic variation in the data for both types of decisions.

      (7) In my opinion, one of the advantages of this study is the fMRI dataset, which is valuable because previous studies did not collect fMRI data. The key contribution may be the novel brain region associated with display heterogeneity. If this is the case, I would suggest using a more parametric way to measure this region. For example, one can use Gabor stimuli and systematically manipulate the variations of multiple Gabor stimuli, the same logic also applies to motion direction. If this study uses static Gabor, random dot motion, object images that span from low-level to high-level visual stimuli, and consistently shows that the stimulus heterogeneity is encoded in one brain region, I would say this finding is valuable. But this sounds like another experiment. In other words, it is insufficient to claim a new brain region given the current form of the manuscript.

      We agree that parametric stimulus manipulations are important for studying early visual areas where stimulus dimensions are known (e.g. orientation, spatial frequency). Using parametric stimulus manipulations for more complex stimuli is fraught with issues because the underlying representation may not be encoding the dimensions being manipulated. This is the reason why we attempted to recover the underlying neural representation using dissimilarities measured using visual search, and then asked whether a decision-making process operating on this underlying representation can explain how decisions are made. Therefore, we disagree that parametric stimulus manipulations are the only way to obtain insight into such tasks.

      We have proposed a quantitative model that explains how decisions about target present and absent can be made through distance-to-center computations on an underlying object representation. We feel that the behavioural and the brain imaging results strongly point to a novel computation that is being performed in a localized region in the brain. These results represent an important first step in understanding how complex, property-based tasks are performed by the brain. We have revised our manuscript to make this point clearer.

      REFERENCES

      - Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96(3), 433-458. doi: 10.1037/0033-295x.96.3.433

      - Duncan, J. (1989). Boundary conditions on parallel processing in human vision. Perception, 18(4), 457-469. doi: 10.1068/p180457

      - Nagy, A. L., Neriani, K. E., & Young, T. L. (2005). Effects of target and distractor heterogeneity on search for a color target. Vision Research, 45(14), 1885-1899. doi: 10.1016/j.visres.2005.01.007

      - Nagy, A. L., & Thomas, G. (2003). Distractor heterogeneity, attention, and color in visual search. Vision Research, 43(14), 1541-1552. doi: 10.1016/s0042-6989(03)00234-7

      - Vincent, B., Baddeley, R., Troscianko, T., & Gilchrist, I. (2009). Optimal feature integration in visual search. Journal of Vision, 9(5), 15-15. doi: 10.1167/9.5.15

      - Singh, A., Mihali, A., Chou, W. C., & Ma, W. J. (2023). A Computational Approach to Search in Visual Working Memory.

      - Mihali, A., & Ma, W. J. (2020). The psychophysics of visual search with heterogeneous distractors. BioRxiv, 2020-08.

      - Calder-Travis, J., & Ma, W. J. (2020). Explaining the effects of distractor statistics in visual search. Journal of Vision, 20(13), 11-11.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have not made substantive changes to address my major concerns. Instead, they have responded with arguments about why their original manuscript was good as written. I did not find these arguments persuasive. Given that, I've left my public review the same, since it still represents my opinions about the paper. Readers can judge which viewpoints are more persuasive.

      We respectfully disagree: we have tried our best to address your concerns with additional analysis wherever feasible, and by acknowledging any limitations.

      Reviewer #3 (Recommendations For The Authors):

      (1) As I mentioned above, please consider rewriting title, abstract, introduction, and significance. Please remove the word "visual homogeneity" and instead use distractor heterogeneity/distractor variability/distractor statistics as often used in literature.

      To clarify, visual homogeneity is NOT the same as distractor homogeneity. Visual homogeneity refers to a distance-to-center computation and represents an image-computable property that can vary systematically even when all distractors are identical. By contrast, distractor heterogeneity varies only when distractors are different from each other.
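
      The distinction drawn above can be illustrated with a minimal sketch. It assumes each item is a point in some feature space, that a display of identical items can be summarised by that single item's feature vector, and that the center is a fixed reference point. The function names, the two-dimensional features and the choice of the origin as center are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def distance_to_center(item, center):
    """Distance of an item's feature vector from a fixed reference point."""
    return float(np.linalg.norm(np.asarray(item, dtype=float) - np.asarray(center, dtype=float)))

def distractor_heterogeneity(distractors):
    """Mean pairwise distance among the distractors of one display."""
    d = np.asarray(distractors, dtype=float)
    n = len(d)
    if n < 2:
        return 0.0
    # Pairwise Euclidean distances, then average over unique pairs only.
    dist = np.linalg.norm(d[:, None, :] - d[None, :, :], axis=-1)
    return float(dist[np.triu_indices(n, k=1)].mean())

# Two hypothetical target-absent displays, each made of six identical items.
center = [0.0, 0.0]              # assumed reference point
display_a = [[1.0, 0.0]] * 6     # six copies of item A
display_b = [[3.0, 4.0]] * 6     # six copies of item B

het_a = distractor_heterogeneity(display_a)   # 0.0: identical distractors
het_b = distractor_heterogeneity(display_b)   # 0.0: identical distractors
vh_a = distance_to_center(display_a[0], center)   # 1.0
vh_b = distance_to_center(display_b[0], center)   # 5.0
```

      In this toy case the heterogeneity measure is zero for both displays, whereas the distance-to-center measure differs between them, which is the sense in which the two quantities are not interchangeable.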

      (2) Better to remove the phrase "generic tasks".

      Thanks for your suggestions. We now refer to these tasks as property-based tasks. 

      (3) Better to explicitly specify the predictions made by the quantitative model beyond positive and negative correlations.

      The predictions of the quantitative model are to explain systematic variation in the response times. We are not sure what else there is to predict in the response times.

      (4) If the quantitative model is the key contribution, better to highlight the details and algorithmic contribution of the model, and show the advantage of this model either qualitatively or quantitatively.

      Please see our responses above. Our quantitative model explains behavior and brain imaging data on three disparate tasks – the same/different, oddball visual search and symmetry tasks. 

      (5) If the new brain region is the key contribution, better to downplay the quantitative model.

      Please see our responses above. Our quantitative model explains behavior and brain imaging data on three disparate tasks – the same/different, oddball visual search and symmetry tasks.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public Review):

      The authors explain that an action potential that reaches an axon terminal emits a small electrical field as it ”annihilates”. This happens even though there is no gap junction, at chemical synapses. The generated electrical field is simulated to show that it can affect a nearby, disconnected target membrane by tens of microvolts for tenths of a microsecond. Longer effects are simulated for target locations a few microns away.

      To simulate action potentials (APs), the paper does not use the standard Hodgkin-Huxley formalism because it fails to explain AP collision. Instead, it uses the Tasaki and Matsumoto (TM) model which is simplified to only model APs with three parameters and as a membrane transition between two states of resting versus excited. The authors expand the strictly binary, discrete TM method to a Relaxing Tasaki Model (RTM) that models the relaxation of the membrane potential after an AP. They find that the membrane leak can be neglected in determining AP propagation and that the capacitive currents dominate the process.

      The strength of the work is that the authors identified an important interaction between neurons that is neglected by the standard models. A weakness of the proposed approach is the assumptions that it makes. For instance, the external medium is modeled as a homogeneous conductive medium, which may be further explored to properly account for biological processes.

      The authors provide convincing evidence by performing experiments to record action potential propagation and collision properties and then developing a theoretical framework to simulate the effect of their annihilation on nearby membranes. They provide both experimental evidence and rigorous mathematical and computer simulation findings to support their claims. The work has the potential of explaining significant electrical interaction between nerve centers that are connected via a large number of parallel fibers.

      We thank the reviewer for the distinct analysis of our work and the assessment that we ’identified an important interaction between neurons that is neglected by standard models’.

      Indeed, we modeled the external (extracellular) medium as a homogeneous conductive medium and, compared to real biological systems, this is a simplification. Our intention is to keep our formal model as general as possible; however, it can be extended to account for specific properties. Accessory structures at axon terminals (such as the pinceau at Purkinje cells) most likely evolved to shape ephaptic coupling. In addition, the extracellular medium is neither homogeneous nor isotropic, and to fully mimic a particular neural connection this has to be implemented in a model as well. We agree and look forward to seeing how specific modifications of the external medium in biological systems will affect ephaptic coupling. We hope to facilitate progress on this question by providing our source code for further exploration. Using the tools that have been developed by the BRIAN community, one can generate or import arbitrarily complex cell morphologies (e.g. NeuroML files). Our source code adds the TM and RTM models, which allows exploring the direct impact of extracellular properties on target neurons.

      Reviewer 2 (Public Review):

      In this study, the authors measured extracellular electrical features of colliding APs travelling in different directions down an isolated earthworm axon. They then used these features to build a model of the potential ephaptic effects of AP annihilation, i.e. the electrical signals produced by colliding/annihilating APs that may influence neighbouring tissue. The model was then applied to some different hypothetical scenarios involving synaptic connections. The conclusion was that an annihilating AP at a presynaptic terminal can ephaptically influence the voltage of a postsynaptic cell (this is, presumably, the ’electrical coupling between neurons’ of the title), and that the nature of this influence depends on the physical configuration of the synapse.

      As an experimental neuroscientist who has never used computational approaches, I am unable to comment on the rigour of the analytical approaches that form the bulk of this paper. The experimental approaches appear very well carried out, and here I just have one query - an important assumption made is that the conduction velocity of anti- and orthodromically propagating APs is identical in every preparation, but this is never empirically/statistically demonstrated.

      My major concern is with the conclusions drawn from the synaptic modelling, which, disappointingly, is never benchmarked against any synaptic data. The authors state in their Introduction that a ’quantitative physical description’ of ephaptic coupling is ’missing’, however, they do not provide such a description in this manuscript. Instead, modelled predictions are presented of possible ephaptic interactions at different types of synapses, and these are then partially and qualitatively compared to previous published results in the Discussion. To support the authors’ assertion that AP annihilation induces electrical coupling between neurons, I think they need to show that their model of ephaptic effects can quantitatively explain key features of experimental data pertaining to synaptic function. Without this, the paper contains some useful high-precision quantitative measurements of axonal AP collisions, some (I assume) high-quality modelling of these collisions, and some interesting theoretical predictions pertaining to synaptic interactions, but it does not support the highly significant implications suggested for synaptic function.

      We thank the reviewer for highlighting the potential and the limitation of our model. We demonstrated with empirical data that the measured conduction velocities of anti- and orthodromically propagating APs are indeed very similar, and values are provided in Appendix 3 – table 1.

      In order to address how our model ’of ephaptic effects can quantitatively explain key features of experimental data’, we used the modulation of AP rates in Purkinje fibers measured by Blot and Barbour (2014), and our results are now included in the manuscript. In our model, we implemented the ephaptic coupling of the Basket cell (with an annihilating AP) and predicted the modulation of AP rate in the Purkinje cell. Our model predictions are compared to the measured modulation of AP rates in Purkinje cells, and this comparison has been added as Fig. 5 to the main manuscript (lines 264 to 284). With this example, we show that ephaptic coupling as described with our RTM model can quantitatively describe key features of experimental data. Both the rapid inhibition and the rebound activity are described by our model with implementation of non-excitable parts at the pinceau of the Basket cell. Future experimental research can use the provided formalism to investigate in more detail the ephaptic coupling in systems like the Mauthner cell and the Purkinje cell by exploring how accessory structures and concomitant physical parameters, e.g. the extracellular properties, impact ephaptic coupling.

      Reviewer 3 (Public Review):

      This manuscript aims to exploit experimental measurements of the extracellular voltages produced by colliding action potentials to adjust a simplified model of action potential propagation that is then used to predict the extracellular fields at axon terminals. The overall rationale is that when solving the cable equation (which forms the substrate for models of action potential propagation in axons), the solution for a cable with a closed end can be obtained by a technique of superposition: a spatially reflected solution is added to that for an infinite cable and this ensures by symmetry that no axial current flows at the closed boundary. By this method, the authors calculate the expected extracellular fields for axon terminals in different situations. These fields are of potential interest because, according to the authors, their magnitude can be larger than that of a propagating action potential and may be involved in ephaptic signalling. The authors perform direct measurements of colliding action potentials, in the earthworm giant axon, to parameterise and test their model.
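
      The superposition argument summarised above can be checked numerically on a simplified case. The sketch below is an illustration only: it uses the impulse response of a passive (linear) cable in arbitrary units rather than a propagating action potential, and the function names and parameter values are hypothetical. Adding the mirror-image solution makes the voltage profile even about the closed end, so the spatial gradient, and with it the axial current, vanishes there.

```python
import numpy as np

def v_infinite(x, t, x0, d=1.0, tau=1.0):
    """Passive infinite-cable response at (x, t) to a unit charge placed at x0 at t = 0.

    d is the ratio lambda**2 / tau (space constant squared over time constant).
    """
    spread = np.exp(-(x - x0) ** 2 / (4.0 * d * t)) / np.sqrt(4.0 * np.pi * d * t)
    return np.exp(-t / tau) * spread

def v_sealed_end(x, t, x0, d=1.0, tau=1.0):
    """Closed end at x = 0: infinite-cable solution plus its mirror image at -x0."""
    return v_infinite(x, t, x0, d, tau) + v_infinite(x, t, -x0, d, tau)

def axial_gradient_at_end(t, x0, h=1e-6):
    """Central-difference estimate of dV/dx at the boundary x = 0."""
    return (v_sealed_end(h, t, x0) - v_sealed_end(-h, t, x0)) / (2.0 * h)

# Gradient at the boundary with and without the mirror term.
grad_with_image = axial_gradient_at_end(t=0.5, x0=1.0)
grad_without_image = (v_infinite(1e-6, 0.5, 1.0) - v_infinite(-1e-6, 0.5, 1.0)) / 2e-6
```

      With the mirror term the gradient at the boundary is zero, whereas the infinite-cable solution alone has a clearly non-zero gradient at the same location.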

      Although simplified models can be useful and the trick of exploiting the collision condition is interesting, I believe there are several significant problems with the rationale, presentation, and application, such that the validity and potential utility of the approach is not established.

      Simplified model vs. Hodgkin and Huxley

      The authors employ a simplified model that incorporates a two-state membrane (in essence resting and excited states) and adds a recovery mechanism. This generates a propagating wave of excitation and key observables such as propagation speed and action potential width (in space) can be adjusted using a small number of parameters. However, even if a Hodgkin-Huxley model does contain a much larger number of parameters that may be less easy to adjust directly, the basic formalism is known to be accurate and typical modifications of the kinetic parameters are very well understood, even if no direct characterisations already exist or cannot be obtained. I am therefore unconvinced by the utility of abandoning the Hodgkin-Huxley version.

      In several places in the manuscript, the simplified model fits the data well whereas the Hodgkin-Huxley model deviates strongly (e.g. Fig. 3CD). This is unsatisfying because it seems unlikely that the phenomenon could not be modelled accurately using the HH formulation. If the authors really wish to assert that it is ”not suitable to predict the effects caused by AP [collision]” (p9) they need to provide a good deal more analysis to establish the mechanism of failure.

      We are not as convinced as the reviewer that, at the current state of parameter estimation, the HH model is suited for predicting ephaptic coupling after ’adjusting’ parameters. There are strong arguments against such an approach. A major function of a model is to make testable predictions rather than to just mimic a biological phenomenon. The predictive power of a model heavily depends on how reasonably model parameters can be estimated or measured. As the reviewer correctly points out in the specific comments (”... the parameters adjusted to fit the model are the membrane capacitance and intracellular resistance. These have a physical reality and could easily be measured or estimated quite accurately...”), our model contains only parameters that can be assessed experimentally; thus it has better predictive power than the HH model with a multitude of parameters for which ”no direct characterisations already exist or cannot be obtained” (citing reviewer from above).

      Already the founders of the HH model were well aware of the limitations, as stated by Hodgkin and Huxley in 1952 (J Physiol 117:500–544):

      An equally satisfactory description of the voltage clamp data could no doubt have been achieved with equations of very different form ... The success of the equations is no evidence in favour of the mechanism of permeability change that we tentatively had in mind when formulating them.

      A catchy but sloppy description for the problem of overfitting with too many parameters is given by the quote of John von Neumann: With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.

      We do not rule out the possibility that the HH model eventually can be used to predict ephaptic coupling. However, at the moment, parameter estimation for the HH model prevents its usability for predicting ephaptic coupling.

      (In)applicability of the superposition principle

      The reflecting boundary at the terminal is implemented using the symmetry of the collision of action potentials. However, at a closed cable there is no reflecting boundary in the extracellular space and this implied assumption is particularly inappropriate where the extracellular field is one objective of the modelling, as here. I believe this assumption is not problematic for the calculation of the intracellular voltage, because extracellular voltage gradients can usually be neglected, but the authors need to explain how the issue was dealt with for the calculation of the extracellular fields of terminals. I assume they were calculated from the membrane currents of one-half of the collision solution, but this does not seem to be explained. It might be worth showing a spatial profile of the calculated field.

      We disagree with the reviewer’s statement ’...at a closed cable there is no reflecting boundary in the extracellular space and this implied assumption is particularly inappropriate...’. We do not imply this assumption in our model! We do not assume any symmetry or boundary condition in the extracellular space. Instead, the extracellular field is calculated for an infinite homogeneous volume conductor (Eq. 6).

      We conduct separate calculations for (1) source membrane current, (2) resulting extracellular field, and (3) impact upon a target neuron. The boundary condition used for our calculations only refers to the axial current being zero at the axon terminal. Consequently all the internal current that enters the last compartment must leave the last compartment as membrane current and contributes to the extracellular current and field.
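
      The first two steps described above can be sketched in a few lines. This is a hedged illustration, not the authors' code: it treats each compartment as a point current source, uses the textbook point-source expression for an infinite homogeneous volume conductor (whether this matches the manuscript's Eq. 6 term by term is an assumption), and all function names are hypothetical. The default resistivity of 100 Ohm m is the value quoted in this response.

```python
import numpy as np

def membrane_currents(v_intra, r_axial):
    """Membrane current (A) leaving each compartment of a cable sealed at both ends.

    v_intra : intracellular voltage of each compartment (V)
    r_axial : axial resistance between neighbouring compartments (Ohm)
    """
    v = np.asarray(v_intra, dtype=float)
    i_axial = (v[:-1] - v[1:]) / r_axial            # current flowing from k to k+1
    # Zero axial current beyond the first and last compartment (boundary condition).
    i_axial = np.concatenate(([0.0], i_axial, [0.0]))
    return i_axial[:-1] - i_axial[1:]               # axial inflow minus axial outflow

def extracellular_potential(points, sources, currents, rho_e=100.0):
    """Potential (V) at `points` from point current sources in an infinite medium.

    points, sources : arrays of shape (n, 3) in metres
    currents        : source currents in amperes
    rho_e           : extracellular resistivity in Ohm m
    """
    p = np.atleast_2d(np.asarray(points, dtype=float))
    s = np.atleast_2d(np.asarray(sources, dtype=float))
    dist = np.linalg.norm(p[:, None, :] - s[None, :, :], axis=-1)
    return rho_e / (4.0 * np.pi) * (np.asarray(currents, dtype=float) / dist).sum(axis=1)

# Toy three-compartment cable: the current entering the last compartment axially
# has to leave it through the membrane.
i_m = membrane_currents([1.0, 0.5, 0.0], r_axial=1.0)
```

      Because no axial current crosses the sealed ends, the membrane currents sum to zero over the whole cable, and the membrane current of the last compartment equals the axial current entering it. No boundary is imposed on the extracellular side: the potential is simply the sum over sources in an unbounded medium.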

      The extracellular field around the axon terminal is not symmetric, as can be seen from its impact upon a target in Figure 4—figure supplement 1, which is also not symmetric. The symmetry of the extracellular field when APs are colliding (cf. symmetry in Fig 1C) is merely the result of the symmetric stimulation and counterpropagation of two APs. We now describe the boundary condition for colliding and terminating APs more specifically, starting in the Introduction: ’A suitable boundary condition (intracellular, axial current equals zero) can be generated experimentally by a collision of two counter-propagating APs ... Within any cable model, the very same boundary condition also exists within the axon at the synaptic terminal due to the broken translation symmetry for the current loops ...’ Later, in the Results section (Discharge of colliding APs), we continue with ’AP propagation is blocked when the axial current is shut down at a boundary condition, e.g. by reaching the axon terminal or by AP collision....’ and implement this condition in our calculations for the axon terminals.

      Missing demonstrations

      Central analytical results are stated rather brusquely, notably equations (3) and (4) and the relation between them. These merit an expanded explanation at the least. A better explanation of the need for the collision measurements in parameterising the models should also be provided.

      We thank the reviewer for pointing out the insufficient explanation of equations 3 and 4. We rephrased the paragraph ’Discharge of colliding APs’ in order to clarify the origin and the function of the two equations (eq. 3: how much charge is expelled and eq. 4: the resulting extracellular potential that is used for model validation).

      Later, in the Discussion, we rephrased the paragraph where we describe the annihilation process and explain further that one term of eq. 4 is sometimes referred to as the ’activating function’ when using microelectrodes for stimulation.

      With respect to the ’explanation of the need for the collision measurement’, we think that the explanations we give at several locations in the manuscript are sufficient as is. We explain and elaborate in the Introduction: ’We explore the behaviour of APs at boundaries ... In this study, we first focus on collisions of APs. Our experimental observation of colliding APs provides unique access to the spatial profile of the extracellular potential around APs that are blocked by collisions and thus annihilate..... Recording propagating APs allows to determine both the propagation velocity and the amplitude of the extracellular electric potentials. The collision experiment provides additional information ...’ In the Results, we recall: ’The width of the collision is a measure of the characteristic length λ⋆ of the AP and is uniquely revealed by a collision sweep experiment.’

      Adjusted parameters

      I am uncomfortable that the parameters adjusted to fit the model are the membrane capacitance and intracellular resistance. These have a physical reality and could easily be measured or estimated quite accurately. With a variation of more than 20-fold reported between the different models in Appendix 2 we can be sure that some of the models are based upon quite unrealistic physical assumptions, which in turn undermines confidence in their generality.

      The fact that the parameters of our model have physical realities is clearly in favor of our models. We rephrased the legend of the table, now explaining the procedure for the model fitting and the rationale behind it. Although the values of g⋆ can differ by a factor of 15 and the resulting amplitude is very different, the relationship r_i c_m = v_p λ⋆ is very similar, independently of the model used, and this confirms our analytical framework.

      p8 - the values of both the extracellular (100 Ohm m) and intracellular resistivity (1 Ohm m) appear to be in error, especially the former.

      We have the following justification for the resistivity values we used. For the intracellular resistivity, literature values range from 0.4 - 1.5 Ohm m, and therefore we selected 1 Ohm m. See: Carpenter et al (1975) doi: 10.1085/jgp.66.2.139; Cole et al (1975) doi: 10.1085/jgp.66.2.133; Bekkers (2014) doi: 10.1007/978-1-4614-7320-6_35-2.

      Estimating extracellular resistivity is less straightforward, since it depends crucially on the structure around the synapse, which consists of conducting saline and insulating fatty tissue. Ranges from 3 to 600 Ohm m are reported (Linden et al (2011) doi: 10.1016/j.neuron.2011.11.006; Bakiri et al (2011) doi: 10.1113/jphysiol.2010.201376). Weiss et al (2008; doi: 10.1073/pnas.0806145105) report extracellular resistivities in the Mauthner Cap between 50-600 Ohm m in SI. Since the pinceau is structurally similar to the Mauthner cell’s axon cap, we argue that a value of 100 Ohm m is a reasonable choice for our calculations. Additionally, we derived a value from Blot and Barbour (doi: 10.1038/nn.3624), rephrased the paragraph in the main text and added our calculation to the supplementary material (Appendix 1).

      (In)applicability to axon terminals

      The rationale of the application of the collision formalism to axon terminals is somewhat undermined by the fact that they tend not to be excitable. There is experimental evidence for this in the Calyx of Held and the cerebellar pinceau.

      The solution found via collision is therefore not directly applicable in these cases.

      We do not agree with the reviewer’s statement that ’the solution found via collision is (therefore) not directly applicable...’. Our model is well suited for application to axon terminals that are not excitable, e.g. the pinceau of the basket cell, as the reviewer points out. We have included a calculation for this case and present the results in the new Fig. 5 (main text lines 264 to 284).

      Comparison with experimental data

      More effort should be made to compare the modelling with the extracellular terminal fields that have been reported in the literature.

      As outlined above (see: response to reviewer 2), we now compare directly the predictions of our models with the measured modulation of AP rates in Purkinje fibers (Blot and Barbour 2014) and our results are included in the manuscript (Fig. 5 and main text lines 264 to 284). See also our response to reviewer 2 in which we address how our model ’of ephaptic effects can quantitatively explain key features of experimental data’.

      Choice of term ”annihilation”

      The term annihilation does not seem wholly appropriate to me. The dictionary definitions are something along the lines of complete destruction by an external force or mutual destruction, for example of an electron and a positron. I don’t think either applies exactly here. I suggest retaining the notion of collision which is well understood in this context.

      Experimentally, we generated a collision of APs and showed that colliding APs disappear and do not pass each other. For this process the term annihilation is used in our and in other studies (see e.g. Berg et al (2017) doi: 10.1103/PhysRevX.7.028001; Johnson et al (2018) doi: 10.3389/fphys.2018.00779; Follmann (2015) doi: 10.1103/PhysRevE.92.032707; Shrivastava et al (2018) doi: 10.1098/rsif.2017.0803). The physical processes involved in the termination of an AP at a closed end are essentially identical to those of two colliding APs. This, we think, justifies using the term annihilation for those processes.

      Recommendations for the authors:

      We believe the work is of high quality and should motivate future experimental work. We are including the review comments here for your information. The main piece of feedback we are offering is that the broad claims need to be adjusted to the strength of evidence provided: as is, the manuscript provides compelling predictions but the claim that these predictions are in full agreement with data remains to be substantiated. A technical concern raised by the reviewers is that the reflecting boundary condition may need further justification. The authors may wish to respond to this issue in a rebuttal and/or adjust the manuscript as necessary.

      We substantiated our claim that our predictions are in full agreement with experimental data. We added to the manuscript a section in which we compare our models’ predictions to published experimental data. To this end, we extracted data from the publication of Blot and Barbour (2014), elaborated on the parameters used and ran our model accordingly. We added to the Results/Model of ephaptic coupling a paragraph on ’The modulation of activity in Purkinje cells...’ (line 264), where we describe our results, and we also included another figure in the main text for illustration (Fig. 5).

      We clarified the term ’boundary condition’ by rephrasing parts of the introduction and we explain the rationale behind it in ’Discharge of colliding APs’ (...AP propagation is blocked when axial current is shut down...) and in ’Model of ephaptic coupling’ (Within any cable model, the same boundary...). See also our response to the general comments of reviewer 3 above.

      Reviewer 1 (Recommendations For The Authors):

      Major:

      Accessing data and code requires signing in, which should not be required. The link provided also seems to be not accessible yet - could be pending review.

      The repository is now publicly available. We did provide an access code within the letter to the editor; this code is no longer required.

      Line 74: how about morphology? Authors should clarify and emphasize in the introduction that the TM model is a spatially continuous model with partial differential equations as opposed to discrete morphological models to simulate HH equations.

      The reviewer is correct that the TM model is continuous. However, so is the HH model. The difference between HH and TM is only that the TM model can be solved analytically, which yields a spatially homogeneous analytical solution. It should be noted that this analytical solution can only be valid for a homogeneous (therefore infinite) nerve. Every numerical computation, be it HH or TM, requires a finite number of discrete compartments. In our calculations, we used identical compartment models for the HH, TM and RTM models. In each compartment, the differential equations are solved numerically. Since there is no fundamental difference between these models, we abstain from changing the text.

      Minor:

      Major typo: ventral nerve cord, not ”chord”. Repeated in several places.

      Thank you for indicating this typo to us.

      Line 25: inhibition, excitation, and modulation?

      We changed the line to: ... leads to modulation, e.g. excitation or inhibition

      Line 70: better term for ”length” of AP would be ”duration”. Also, the sentence could be simplified to use either ”its” or ”of the AP”

      Space and time are not interchangeable. Thus, the term length cannot be replaced by duration. We simplified the structure of the sentence as suggested.

      Fig 1A/B: it’s strange that panel B precedes panel A.

      Exchanged

      Fig 1C: don’t see the ”horizontal line”; also regarding ”The recording was at a medial position”, the caption is not clear until one reads the main text.

      We changed the legend to: ... The collision is captured in the recording line at y-position 0 mm, while orthodromic propagation is at the top and antidromic propagation is at the bottom. (D) The peak amplitude as a function of the distance to the collision. Examples of four sweeps at three positions along the nerve cord....

      Line 127: the per distance measures could be named as ”specific” conductivity, etc.

      We explicitly provide the units, thereby defining the quantities unambiguously.

      Line 176: typo ”ad-hoc”.

      Thank you.

      Fig 4B: should clarify that the circle in the schematic is not the soma but a synaptic bouton.

      We rephrased to ’...(B,C) when the AP is annihilating at a bouton of a neuron terminal (upper neuron in end-to-shaft geometry, similar to the Basket cell–Purkinje cell synapse)...’, and we added a label to Fig 4B.

      Reviewer 2 (Recommendations For The Authors):

      Can the authors’ model be quantitatively compared with experimental data of ephaptic interactions at synapses (e.g. the Blot & Barbour study described in the Discussion)?

      We did so as outlined in our response to the reviewer above.

      Can statistical evidence be provided that the velocities of anti- and orthodromic APs are indeed identical in the earthworm nerve recordings?

      These data and statistics are available in Appendix 2, now 3 – table 1

      Why not reorder ABCD in Fig1 so the subpanels run from left to right?

      We adjusted the labels accordingly.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This paper contains what could be described as a "classic" approach towards evaluating a novel taste stimulus in an animal model, including standard behavioral tests (some with nerve transections), taste nerve physiology, and immunocytochemistry of the tongue. The stimulus being tested is ornithine, from a class of stimuli called "kokumi", which are stimuli that enhance other canonical tastes, essentially increasing the hedonic attributes of these other stimuli; the mechanism for ornithine detection is thought to be GPRC6A receptors expressed in taste cells. The authors showed evidence for this in an earlier paper with mice; this paper evaluates ornithine taste in a rat model.

      Strengths:

      The data show the effects of ornithine on taste: in two-bottle and briefer intake tests, adding ornithine results in a higher intake of most, but not all, stimuli tested. Bilateral nerve cuts or the addition of GPRC6A antagonists decrease this effect. Small effects of ornithine are shown in whole-nerve recordings.

      Weaknesses:

      The conclusion seems to be that the authors have found evidence for ornithine acting as a taste modifier through the GPRC6A receptor expressed on the anterior tongue. It is hard to separate their conclusions from the possibility that any effects are additive rather than modulatory. Animals did prefer ornithine to water when presented by itself. Additionally, the authors refer to evidence that ornithine is activating the T1R1-T1R3 amino acid taste receptor, possibly at higher concentrations than they use for most of the study, although this seems speculative. It is striking that the largest effects on taste are found with the other amino acid (umami) stimuli, leading to the possibility that these are largely synergistic effects taking place at the tas1r receptor heterodimer.

      We would like to thank Reviewer #1 for the valuable comments. Our basis for considering ornithine as a taste modifier stems from our observation that a low concentration of ornithine (1 mM), which does not elicit a preference on its own, enhances the preference for umami substances, sucrose, and soybean oil through the activation of the GPRC6A receptor. Notably, this receptor is not typically considered a taste receptor. The reviewer suggested that the enhancement of umami taste might be due to potentiation occurring at the TAS1R receptor heterodimer. However, we propose that a different mechanism may be at play, as an antagonist of GPRC6A almost completely abolished this enhancement. In the revised manuscript, we will endeavor to provide additional information on the role of ornithine as a taste modifier acting through the GPRC6A receptor.

      Reviewer #2 (Public review):

      Summary:

      The authors used rats to determine the receptor for a food-related perception (kokumi) that has been characterized in humans. They employ a combination of behavioral, electrophysiological, and immunohistochemical results to support their conclusion that ornithine-mediated kokumi effects are mediated by the GPRC6A receptor. They complemented the rat data with some human psychophysical data. I find the results intriguing, but believe that the authors overinterpret their data.

      Strengths:

      The authors examined a new and exciting taste enhancer (ornithine). They used a variety of experimental approaches in rats to document the impact of ornithine on taste preference and peripheral taste nerve recordings. Further, they provided evidence pointing to a potential receptor for ornithine.

      Weaknesses:

      The authors have not established that the rat is an appropriate model system for studying kokumi. Their measurements do not provide insight into any of the established effects of kokumi on human flavor perception. The small study on humans is difficult to compare to the rat study because the authors made completely different types of measurements. Thus, I think that the authors need to substantially scale back the scope of their interpretations. These weaknesses diminish the likely impact of the work on the field of flavor perception.

      We would like to thank Reviewer #2 for the valuable comments and suggestions. Regarding the question of whether the rat is an appropriate model system for studying kokumi, we have chosen this species for several reasons: it is readily available as a conventional experimental model for gustatory research; the calcium-sensing receptor (CaSR), known as the kokumi receptor, is expressed in taste bud cells; and prior research has demonstrated the use of rats in kokumi studies involving gamma Glu-Val-Gly (Yamamoto and Mizuta, Chem. Senses, 2022). We acknowledge that fundamentally different types of measurements were conducted in the human psychophysical study and the rat study. Kokumi can indeed be assessed and expressed in humans; however, we do not currently have the means to confirm that animals experience kokumi in the same way that humans do. Therefore, human studies are necessary to evaluate kokumi, a conceptual term denoting enhanced flavor, while animal studies are needed to explore the potential underlying mechanisms of kokumi. We believe that a combination of both human and animal studies is essential, as is the case with research on sugars. While sugars are known to elicit sweetness, it is unclear whether animals perceive sweetness identically to humans, even though they exhibit a strong preference for sugars. In the revised manuscript, we will incorporate additional information to address the comments raised by the reviewer. We will also carefully review and revise our previous statements to ensure accuracy and clarity.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to investigate whether GPRC6A mediates kokumi taste initiated by the amino acid L-ornithine. They used Wistar rats, a standard laboratory strain, as the primary model and also performed an informative taste test in humans, in which miso soup was supplemented with various concentrations of L-ornithine. The findings are valuable and overall the evidence is solid. L-Ornithine should be considered to be a useful test substance in future studies of kokumi taste and the class C G protein-coupled receptor known as GPRC6A (C6A) along with its homolog, the calcium-sensing receptor (CaSR) should be considered candidate mediators of kokumi taste.

      Strengths:

      The overall experimental design is solid based on two-bottle preference tests in rats. After determining the optimal concentration for L-Ornithine (1 mM) in the presence of MSG, it was added to various tastants, including inosine 5'-monophosphate; monosodium glutamate (MSG); mono-potassium glutamate (MPG); intralipos (a soybean oil emulsion); sucrose; sodium chloride (NaCl); citric acid and quinine hydrochloride. Robust effects of ornithine were observed in the cases of IMP, MSG, MPG, and sucrose, and little or no effects were observed in the cases of sodium chloride, citric acid, and quinine HCl. The researchers then focused on the preference for Ornithine-containing MSG solutions. The inclusion of the C6A inhibitors Calindol (0.3 mM but not 0.06 mM) or the gallate derivative EGCG (0.1 mM but not 0.03 mM) eliminated the preference for solutions that contained Ornithine in addition to MSG. The researchers next performed transections of the chorda tympani nerves (with sham operation controls) in anesthetized rats to identify the role of the chorda tympani branches of the facial nerves (cranial nerve VII) in the preference for Ornithine-containing MSG solutions. This finding implicates the anterior half to two-thirds of the tongue in ornithine-induced kokumi taste. They then used electrical recordings from intact chorda tympani nerves in anesthetized rats to demonstrate that ornithine enhanced MSG-induced responses following the application of tastants to the anterior surface of the tongue. They went on to show that this enhanced response was insensitive to amiloride, selected to inhibit 'salt tastant' responses mediated by the epithelial Na+ channel, but eliminated by Calindol. Finally, they performed immunohistochemistry on sections of rat tongue demonstrating C6A-positive spindle-shaped cells in fungiform papillae that partially overlapped in their distribution with the IP3 type-3 receptor, used as a marker of Type-II cells, but not with (i) gustducin, the G protein partner of Tas1 receptors (T1Rs), used as a marker of a subset of type-II cells; or (ii) 5-HT (serotonin) and Synaptosome-associated protein 25 kDa (SNAP-25) used as markers of Type-III cells.

      Weaknesses:

      The researchers undertook what turned out to be largely confirmatory studies in rats with respect to their previously published work on Ornithine and C6A in mice (Mizuta et al Nutrients 2021).

      The authors point out that animal models pose some difficulties of interpretation in studies of taste and raise the possibility in the Discussion that umami substances may enhance the taste response to ornithine (Line 271, Page 9). They miss an opportunity to outline the experimental results from the study that favor their preferred interpretation that ornithine is a taste enhancer rather than a tastant.

      At least two other receptors in addition to C6A might mediate taste responses to ornithine: (i) the CaSR, which binds and responds to multiple L-amino acids (Conigrave et al, PNAS 2000), and which has been previously reported to mediate kokumi taste (Ohsu et al., JBC 2010) as well as responses to Ornithine (Shin et al., Cell Signaling 2020); and (ii) T1R1/T1R3 heterodimers which also respond to L-amino acids and exhibit enhanced responses to IMP (Nelson et al., Nature 2001). While the experimental results as a whole favor the authors' interpretation that C6A mediates the Ornithine responses, they do not make clear either the nature of the 'receptor identification problem' in the Introduction or the way in which they approached that problem in the Results and Discussion sections. It would be helpful to show that a specific inhibitor of the CaSR failed to block the ornithine response. In addition, while they showed that C6A-positive cells were clearly distinct from gustducin-positive, and thus T1R-positive cells, they missed an opportunity to clearly differentiate C6A-expressing taste cells and CaSR-expressing taste cells in the rat tongue sections.

      It would have been helpful to include a positive control kokumi substance in the two-bottle preference experiment (e.g., one of the known gamma-glutamyl peptides such as gamma-glu-Val-Gly or glutathione), to compare the relative potencies of the control kokumi compound and Ornithine, and to compare the sensitivities of the two responses to C6A and CaSR inhibitors.

      The results demonstrate that enhancement of the chorda tympani nerve response to MSG occurs at substantially greater Ornithine concentrations (10 and 30 mM) than were required to observe differences in the two bottle preference experiments (1.0 mM; Figure 2). The discrepancy requires careful discussion and if necessary further experiments using the two-bottle preference format.

      We would like to thank Reviewer #3 for the valuable comments and helpful suggestions. We propose that ornithine has two stimulatory actions: one acting on GPRC6A, particularly at lower concentrations, and another on amino acid receptors such as T1R1/T1R3 at higher concentrations. Consequently, ornithine is not preferable at lower concentrations but becomes preferable at higher concentrations. For our study on kokumi, we used a low concentration (1 mM) of ornithine. The possibility mentioned in the Discussion that 'the umami substances may enhance the taste response to ornithine' is entirely speculative. We will reconsider including this description in the revised version. As the reviewer suggested, in addition to GPRC6A, ornithine may bind to CaSR and/or T1R1/T1R3 heterodimers. However, we believe that ornithine mainly binds to GPRC6A, as a specific inhibitor of this receptor almost completely abolished the enhanced response to umami substances, and our immunohistochemical study indicated that GPRC6A-expressing taste cells are distinct from CaSR-expressing taste cells (see Supplemental Fig. 3). We conducted essentially the same experiments using gamma-Glu-Val-Gly in Wistar rats (Yamamoto and Mizuta, Chem. Senses, 2022) and compared the results in the Discussion. The reviewer may have misunderstood the chorda tympani results: we added the same concentration (1 mM) used in the two-bottle preference test to MSG (Fig. 5-B). Fig. 5-A shows nerve responses to five concentrations of plain ornithine. In the revised manuscript, we will strive to provide more precise information reflecting the reviewer’s comments.

    1. Author response:

      We thank both reviewers for their considerate reviews. In this provisional response we would like to make a few key points.

      Given that we introduced a bespoke likelihood model for the second dataset, Reviewer 1 asks whether "every unique dataset requires a tailored prior or likelihood to produce the best results". Our intention is to advocate for the horseshoe prior model as a 'standard' first analysis for any cell count dataset. If extra knowledge about the data is available, or if any data artefacts are detected, more elaborate likelihoods could be introduced as needed in a follow-up analysis. Our introduction of the zero-inflated Poisson likelihood for the second dataset was one such example, but many alternatives could exist. This iterative approach to model building, sometimes referred to as a 'Bayesian workflow', is seen as good practice in the Bayesian data analysis literature. In the revised version of the paper, we will try to explain the recommendations and modelling philosophy behind this method while emphasising that tailoring or bespoke modelling is not required for our 'standard analysis', which we would regard as the Bayesian replacement for a t-test on counts.
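
      For readers unfamiliar with the zero-inflated Poisson (ZIP) likelihood mentioned above, the following is a minimal sketch of its textbook form: with probability `zero_prob` a count is a structural zero, otherwise it is drawn from a Poisson distribution with mean `rate`. The function and parameter names are hypothetical, and the parameterisation used in the paper's model may differ.

```python
import math


def zip_pmf(k, rate, zero_prob):
    """Probability of observing count k under a zero-inflated Poisson.

    rate      -- mean of the Poisson component (assumed > 0)
    zero_prob -- probability of a structural zero (assumed in [0, 1))
    """
    poisson = math.exp(-rate) * rate ** k / math.factorial(k)
    if k == 0:
        # A zero can come from the inflation component or from the Poisson itself.
        return zero_prob + (1.0 - zero_prob) * poisson
    return (1.0 - zero_prob) * poisson


def zip_log_likelihood(counts, rate, zero_prob):
    """Log-likelihood of independent counts under the ZIP model."""
    return sum(math.log(zip_pmf(k, rate, zero_prob)) for k in counts)
```

      Setting `zero_prob` to 0 recovers the ordinary Poisson likelihood, which is one way to see the ZIP model as an extension added only when excess zeros are detected in the data.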

      Reviewer 1 notes that "the differences between the results produced by the two Bayesian models in case study 2 are not discussed". We agree that this discrepancy, arising from the specific assumptions of each model, is an interesting issue which we should better explore in the paper. In Figure 6 we plotted the actual data values alongside posterior and confidence intervals to explain how the results from the ZIP likelihood and Horseshoe prior compare with those from a t-test. However, our example regions did not highlight cases where differences could be noted between the two Bayesian models. In the revised version of the paper, we will extend Figure 6 to include further brain regions, such as those mentioned by the referee, and will use that as an opportunity to discuss the broader issue of what to do when the Bayesian models give conflicting results.

      We agree with reviewer 2's point that the model description terminology could be made clearer for the target eLife audience. We tried to strike a balance between introducing the reader to the conventional technical terminology used in the Bayesian data analysis necessary for understanding the model while avoiding exhaustive statistical terminology. We erred too much on the side of the latter instead of providing clear links between the model construction and experimental data. In the revised version of the paper, we will augment any technical terms with more biological language and provide a Glossary for reader reference.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Loh and colleagues investigate valence encoding in the mesolimbic dopamine system. Using an elegant approach, they show that when sucrose, which normally evokes strong dopamine neuron activity and release in the nucleus accumbens, is made aversive via conditioned taste aversion, the same sucrose stimulus later evokes much less dopamine neuron activity and release. Thus, dopamine activity can dynamically track the changing valence of an unconditioned stimulus. These results are important for helping clarify valence- and value-related questions that are the matter of ongoing debate regarding dopamine functions in the field.

      Strengths:

      This is an elegant way to ask this question; the within-subjects design and the continuity of the stimulus are a strong way to remove a lot of the common confounds that make it difficult to interpret valence-related questions. I think these are valuable studies that help tie up questions in the field while also setting up a number of interesting future directions. There are a number of control experiments and tweaks to the design that help eliminate a number of competing hypotheses regarding the results. The data are clearly presented and contextualized.

      Weaknesses for consideration:

      The focus on one relatively understudied region of the rat striatum for dopamine recordings could potentially limit generalization of the findings. While this can be determined in future studies, the implications should be further discussed in the current manuscript.

      We agree that the manuscript would benefit from providing a stronger rationale for our recording sites and acknowledging the potential for regional differences in dopamine signaling. We have made the following additions to the manuscript:

      Added to the Discussion: “Recordings were targeted to the lateral VTA and the corresponding approximate terminal site in the NAc lateral shell (Lammel et al., 2008). Subregional differences in dopamine activity likely contribute to mixed findings on dopamine and affect. For example, dopamine in the NAc lateral shell differentially encodes cues predictive of rewarding sucrose and aversive footshock, which is distinct from NAc medial shell dopamine responses (de Jong et al., 2019). Our findings are similar to prior work from our group targeting recordings to the NAc dorsomedial shell (Hsu et al., 2020; McCutcheon et al., 2012; Roitman et al., 2008): there, intraoral sucrose increased NAc dopamine release while the response in the same rats to quinine was significantly lower.”

      Reviewer #2 (Public review):

      Summary:

      Loh et al. report an interesting manuscript studying dopamine binding in the lateral accumbens shell of rats across the course of conditioned taste aversion. The question being asked here is how does the dopamine system respond to aversion? The authors take advantage of unique properties of taste aversion learning (notably, within-subjects remapping of valence to the same physical stimulus) to address this.

      Combining a well-controlled behavioural design (including key, unpaired controls) with fibre photometry of dopamine binding via GrabDA and of dopamine neuron activity via GCaMP, and careful analyses of behaviour (e.g., head movements; home cage ingestion), the authors show that 1) conditioned taste aversion of sucrose suppresses the activity of VTA dopamine neurons and lateral shell dopamine binding to subsequent presentations of the sucrose tastant; 2) this pattern of activity was similar to that for the innately aversive tastant quinine; 3) dopamine responses were negatively correlated with behavioural reactivity (inferred taste reactivity); and 4) dopamine responses tracked the contingency between sucrose and illness because these responses recovered across extinction of the conditioned taste aversion.

      Strengths:

      There are important strengths here: the use of a well-controlled design, the measurement of both dopamine binding and VTA dopamine neuron activity, the inclusion of an extinction manipulation, and the thorough reporting of the data. I was not especially surprised by these results, but these data are a potentially important piece of the dopamine puzzle (e.g., as the authors note, a salience-based argument struggles to explain these data).

      Weaknesses for consideration:

      (1) The focus here is on the lateral shell. This is a poorly investigated region in the context of the questions being asked here. Indeed, I suspect many readers might expect a focus on the medial shell. So, I think this focus is important. But, I think it does warrant greater attention in both the introduction and discussion. We do know from past work that there can be extensive compartmentalisation of dopamine responses to appetitive and aversive events and many of the inconsistent findings in the literature can be reconciled by careful examination of where dopamine is assessed. I do think readers would benefit from acknowledgement of this - for example it is entirely reasonable to suppose that the findings here may be specific to the lateral shell.

      As with our response to Reviewer 1, we agree that we should provide further rationale for focusing our recordings on the lateral shell and acknowledge potential differences in dopamine dynamics across NAc subregions. In addition to the changes in the Discussion detailed in our response to Reviewer 1, we have made the following additions to the Introduction:

      Added to the Introduction: “NAc lateral shell dopamine differentially encodes cues predictive of rewarding (i.e., sipper spout with sucrose) and aversive stimuli (i.e., footshock), which is distinct from other subregions (de Jong et al., 2019). It is important to note that other regions of the NAc may serve as hedonic hotspots (e.g. dorsomedial shell) or may more closely align with the signaling of salience (e.g. ventromedial shell; Yuan et al., 2021).”

      (2) Relatedly, I think readers would benefit from an explicit rationale for studying the lateral shell as well as consideration of this in the discussion. We know that there are anatomical (PMID: 17574681), functional (PMID: 10357457), and cellular (PMID: 7906426) differences between the lateral shell and the rest of the ventral striatum. Critically, we know that profiles of dopamine binding during ingestive behaviours there can be highly dissimilar to the rest of ventral striatum (PMID: 32669355). I do think these points are worth considering.

      There are several reasons why dopamine dynamics were recorded in the NAc lateral shell:

      (1) Dopamine neurons in more medial aspects of the VTA preferentially target the NAc medial shell and core whereas dopamine neurons in the lateral VTA – our target for VTA DA recordings – project to the lateral shell of the NAc (Lammel et al., 2008). Thus, our goal was to sample NAc release dynamics in areas that receive projections from our cell body recording sites.

      (2) Cues predictive of reward availability (i.e., sipper spout with sucrose) and aversive stimuli (i.e., footshock) are differentially encoded by NAc lateral shell dopamine, which is distinct from NAc ventromedial shell dopamine responses (de Jong et al., 2019). These findings suggest a role for NAc lateral shell dopamine in the encoding of a stimulus’s valence, which made the subregion an area of interest for further examination.

      (3) With respect to the medial NAc shell specifically, extensive literature had already shown it to be a ‘hedonic hotspot’ (Morales and Berridge, 2020; Yuan et al., 2021) whereas the ventral portion is more mixed with respect to valence (Yuan et al., 2021). We had previously shown that intraoral infusions of primary taste stimuli of opposing valence (i.e., sucrose and quinine) evoke differential responses in dopamine release within the NAc dorsomedial shell (Roitman et al., 2008). We more recently replicated differential dopamine responses from dopamine cell bodies in the lateral VTA (Hsu et al., 2020) and thus endeavored to examine the possibility of changing dopamine responses in the lateral VTA to the same stimulus as its valence changes. As a result of these choices, measuring dopamine release in the lateral shell was a logical choice. The field would greatly benefit from continued future work surveying the entirety of the VTA DA projection terminus.

      We have included these points of justification in the Introduction and Discussion sections.

      (3) I found the data to be very thoughtfully analysed. But in places I was somewhat unsure:

      (a) Please indicate clearly in the text when photometry data show averages across trials versus when they show averages across animals.

      We have now explicitly indicated in the figure legends of Figures 1, 3, 5, 7, and 8:

      (1) In heat maps, each row represents the averaged (across rats) response on that trial.

      (2) Traces below heat maps represent the response to infusion averaged first across trials for each rat and then across all rats.

      (3) Insets represent the average z-score across the infusion period averaged first across all trials for each rat and then across all rats.
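
      The two-stage averaging described in points (2) and (3) can be sketched as follows. This is an illustration under assumed inputs (a mapping from a rat identifier to that rat's per-trial z-scores); the names are hypothetical and this is not the authors' analysis code. A pooled average is included only for contrast.

```python
from statistics import mean


def grand_average(responses_by_rat):
    """Average across trials within each rat first, then across rats.

    responses_by_rat -- dict mapping a rat id to a list of per-trial values
    (hypothetical input layout). Every rat contributes equally.
    """
    per_rat_means = [mean(trials) for trials in responses_by_rat.values()]
    return mean(per_rat_means)


def pooled_average(responses_by_rat):
    """Average over all trials from all rats at once.

    Rats with more trials carry more weight, unlike grand_average.
    """
    all_trials = [x for trials in responses_by_rat.values() for x in trials]
    return mean(all_trials)
```

      The two quantities coincide when every rat contributes the same number of trials and diverge otherwise, which is why stating the order of averaging in the figure legends removes ambiguity.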

      (b) I did struggle with the correlation analyses, for two reasons.

      (i) First, the key finding here is that the dopamine response to intraoral sucrose is suppressed by taste aversion. So, this will significantly restrict the range of dopamine transients, making interpretation of the correlations difficult.

      The overall hypothesis is that the dopamine response would correlate with the valence of a taste stimulus – even and especially when the stimulus remained constant but its valence changed. We inferred valence from the behavioral reactivity to the stimulus – reasoning that an appetitive taste will evoke minimal movement of the nose and paws (presumably because the animals are primarily engaging in small mouth movements associated with ingestion as shown by the seminal work of Grill and Norgren (1978) and the many studies published by the K.C. Berridge group) whereas an aversive taste will evoke significantly more movement as the rats engage in rejection responses (e.g. forelimb flails, chin rubs, etc.). When we conducted our regression analyses we endeavored to be as transparent as possible and labeled each symbol based on group (Unpaired vs Paired) and day (Conditioning vs Test). Both behavioral reactivity and dopamine responses change – but only for the Paired rats across days. In this sense, we believe the interpretation is clear. However, the Reviewer raises an important criticism that there would essentially be a floor effect with dopamine responses. We believe this is mitigated by data acquired across extinction and especially in Figure 9B. Here, the observations that dopamine responses fall to near zero but return to pre-conditioning levels in the Paired group with strong correlation between dopamine and behavioral reactivity throughout would hopefully partially allay the Reviewer’s concerns. See Part ii below for further support.

      (ii) Second, the authors report correlations by combining data across groups/conditions. I understand why the authors have done this, but it does risk obscuring differences between the groups. So, my question is: what happens to this trend when the correlations are computed separately for each group? I suspect other readers will share the same question. I think reporting these separate correlations would be very helpful for the field, regardless of the outcome.

      To address this concern, we performed separate regression analyses for Paired and Unpaired rats and provide the table below to detail results where data were combined across groups or separated. As expected, all analyses in Paired rats indicated a significant inverse relationship between dopamine and behavioral reactivity. After all, it is only in this group that behavioral reactivity to the taste stimulus changes as a function of conditioning. Perhaps even more striking, even when restricting the regression analysis to Unpaired rats, we still observed a significant inverse relationship between dopamine and behavioral reactivity in most experiments. We have outlined the separated correlations below (asterisks denote slopes significantly different from 0; * p<0.05; ** p<0.01; *** p<0.005; **** p<0.001):

      Author response table 1.
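
      The comparison between combined and per-group regressions can be sketched as below. This is an illustrative ordinary least-squares fit on hypothetical `(group, reactivity, dopamine)` records, not the authors' analysis, and it returns slopes only; the significance tests reported in the table are not reproduced here.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for paired samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slopes_by_group(records):
    """Slope of dopamine response against behavioural reactivity.

    records -- list of (group, reactivity, dopamine) tuples (hypothetical layout).
    Returns one slope per group plus a "combined" slope over all records.
    """
    grouped = {}
    for group, reactivity, dopamine in records:
        xs, ys = grouped.setdefault(group, ([], []))
        xs.append(reactivity)
        ys.append(dopamine)
    slopes = {group: fit_line(xs, ys)[0] for group, (xs, ys) in grouped.items()}
    slopes["combined"] = fit_line([r[1] for r in records],
                                  [r[2] for r in records])[0]
    return slopes
```

      Reporting the per-group and combined slopes side by side makes it visible when a pooled trend is driven mainly by one group.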

      (4) Figure 1A is not as helpful as it might be. I do think readers would expect a more precise reporting of GCaMP expression in TH+ and TH- neurons. I also note that many of the nuances in terms of compartmentalisation of dopamine signalling discussed above apply to ventral tegmental area dopamine neurons (e.g. medial v lateral) and this is worth acknowledging when interpreting t

      Others have reported (Choi et al., 2020) and quantified (Hsu et al., 2020) GCaMP6f expression in TH+ neurons. While we didn’t report these quantifications, our observations were very much in line with previous quantifications from our laboratory (Hsu et al., 2020).

      We agree that we should elaborate on VTA subregional differences and have addressed this point above (see responses to Reviewer 1 Weakness #1 and Reviewer 2 Weakness #2).

      Reviewer #3 (Public review):

      Summary:

      This study helps to clarify the mixed literature on dopamine responses to aversive stimuli. While it is well accepted that dopamine in the ventral striatum increases in response to various rewarding and appetitive stimuli, aversive stimuli have been shown to evoke phasic increases or decreases depending on the exact aversive stimuli, behavioral paradigm, and/or dopamine recording method and location examined. Here the authors use a well-designed set of experiments to show differential responses to an appetitive primary reward (sucrose) that later becomes a conditioned aversive stimulus (sucrose previously paired with lithium chloride in a conditioned taste aversion paradigm). The results are interesting and add valuable data to the question of how the mesolimbic dopamine system encodes aversive stimuli; however, the conclusions are strongly stated given that the current data do not necessarily align with prior conflicting data in terms of recording location, and it is not clear exactly how to interpret the generally biphasic dopamine response to the CTA-sucrose, which also evolves over exposures within a single session.

      Strengths:

      • The authors nicely demonstrate that their two aversive stimuli examined, quinine and sucrose following CTA, evoked aversive facial expressions and paw movements that differed from those following rewarding sucrose to support that the stimuli experienced by the rats differ in valence.

      • Examined dopamine responses to the exact same sensory stimuli conditioned to have opposing valences, avoiding standard confounds of appetitive and aversive stimuli being sensed by different sensory modalities (i.e., sweet taste vs. electric shock)

      • The authors examined multiple measurements of dopamine activity - cell body calcium (GCaMP6f) in midbrain and release in NAc (Grab-DA2h), which is useful as the prior mixed literature on aversive dopamine responses comes from a variety of recording methods.

      • Correlations between sucrose preference and dopamine signals demonstrate behavioral relevance of the differential dopamine signals.

      • The delayed testing experiment in Figure 7 nicely controls for the effect of time to demonstrate that the "rewarding" dopamine response to sucrose only recovers after multiple extinction sucrose exposures to extinguish the CTA.

      Weaknesses for consideration:

      (1) Regional differences in dopamine signaling to aversive stimuli are mentioned in the introduction and discussion. For instance, the idea that dopamine encodes salience is strongly argued against in the discussion, but the paper cited as arguing for that (Kutlu et al. 2021) is recording from the medial core in mice. Given other papers cited in the text about the regional differences in dopamine signaling in the NAc and from different populations of dopamine neurons in midbrain, it's important to mention this distinction with respect to salience signaling. Relatedly, the text says that the lateral NAc shell was targeted for accumbens recordings, but the histology figure looks like the majority of fibers were in the anterior lateral core of NAc. For the current paper to be a convincing last word on the issue, it would be extremely helpful to have similar recordings done in other parts of the NAc to do a more thorough comparison against other studies.

      As the Reviewer notes, NAc dopamine recordings were aimed at the lateral NAc shell. It is possible that some dopamine neurons lying within the anterior lateral core were recorded. Fiber photometry and the size of the fiber optics cannot definitively identify the precise location and number of dopamine neurons from which we recorded. Still, recording sites did not systematically differ between groups. Further, the within-subjects design helps to mitigate any potential biases for one subregion over another. The results presented in the manuscript strongly support a valence code. It is difficult to be the ‘last word’ on this topic and we suspect debate will continue. We used taste stimuli for appetitive and aversive stimuli – whereas many in the field will continue to use other noxious stimuli (e.g. foot shock) that likely recruit different circuits en route to the VTA. And there may very well be a different regional profile for dopamine signaling with different noxious stimuli. Moreover, we used intraoral infusion to avoid confounds of stimulus avoidance and competing motivations (e.g. food or fluid deprivation). We believe that this is one of the most important and unique features of our report. Recent work supports a role for phasic increases in dopamine in avoidance of noxious stimuli (Jung et al., 2024) and it will be critical for the field to reflect on the differences between avoidance and aversion. Moreover, in ongoing studies we aspire to fully survey dopamine signaling in conditioned taste aversion across the medial-lateral and dorsal-ventral axes of the VTA and NAc.

      (2) Dopamine release in the NAc never dips below baseline for the conditioned sucrose. Is it possible to really consider this as a signal for valence per se, as opposed to it being a weaker response relative to the original sucrose response?

      Indeed, NAc dopamine release in response to intraoral quinine or aversive sucrose doesn’t dip below baseline; rather, dopamine binding doesn’t change from pre-infusion baseline levels. It should be noted that VTA dopamine cell body activity does indeed dip below baseline in response to aversive sucrose. Moreover, using fast-scan cyclic voltammetry, we showed that dopamine release dips below baseline in the NAc dorsomedial shell in response to intraoral quinine (Roitman et al., 2008). The differences across recording sites may reflect regional differences but they may also reflect differences in recording approaches. GrabDA2h, used here, has relatively slow kinetics that may obscure dips below baseline (see response to Weakness #8 below).

      (3) Related to this, the main measure of the dopamine signal here, "mean z-score," obscures the temporal dynamics of the aversive dopamine response across a trial. This measure is used to claim that sucrose after CTA is "suppressing" dopamine neuron activity and release, which is true relative to the positive valence sucrose response. However, both GRAB-DA and cell-body GCaMP measurements show clear increases after onset of sucrose infusion before dipping back to baseline or slightly below in the average of all example experiments displayed. One could point to these data to argue either that aversive stimuli cause phasic increases in dopamine (due to the initial increase) or decreases (due to the delayed dip below baseline) depending on the measurement window. Some discussion of the dynamics of the response and how it relates to the prior literature would be useful.

      We have used mean z-score to do much of our quantitative analyses but the Reviewer raises the intriguing possibility that we are masking an initial increase in dopamine release and VTA DA activity evoked by aversive taste by doing so. We included the heat maps in the manuscript to be as transparent as possible about the time course of dopamine responses, both within a trial and across trials. The Reviewer’s point prompted us to reflect further on the heat maps and recognize that trials early in the session often showed a brief increase in dopamine for aversive sucrose but this response dissipated (NAc dopamine release) or flipped (VTA DA cell body activity) over trials. We now quantitatively characterize this feature by looking at the timecourse of dopamine responses in each third of the trials (1-10, 11-20, 21-30; see Author response images 1, 2 and 3). As we infer the valence of the stimulus from nose and paw movements (behavioral reactivity), it is especially striking that we observe a similar timecourse for changes in behavior. Collectively, the data may reflect an updating process that is relatively slow and requires experience of the stimulus in a new (aversive) state, that is, a model-free process. While our experiments were not designed to test the updating of dopamine responses and discern their participation in model-based versus model-free learning processes (another debate in the dopamine field; Cone et al., 2016; Deserno et al., 2021), the data reflect a model-free process. This is further supported in the experiment involving multiple conditioning sessions, where dopamine ‘dips’ are observed in trials 1-10 on Conditioning Day 3 and Extinction Day 1 when the new value of sucrose has been established. Finally, the relatively slow updating of the value of sucrose is reflected in older literature using a continuous intraoral infusion. Using this approach, rats began rejecting the saccharin infusion only after ~2 min rather than immediately (Schafe et al., 1998; Schafe and Bernstein, 1996; Wilkins and Bernstein, 2006).

      Author response image 1.

      Author response image 2.

      Author response image 3.
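
      The trial-block analysis described in this response can be sketched in a few lines. The example below is only an illustration under stated assumptions: each trial is z-scored against its own pre-infusion baseline (the exact normalization is not specified in this response), the function names `zscore_trial` and `block_mean_z` are hypothetical, and the data are synthetic rather than taken from the study.

```python
from statistics import mean, pstdev


def zscore_trial(trace, n_baseline):
    """Z-score one trial against its own pre-infusion baseline samples."""
    baseline = trace[:n_baseline]
    mu, sd = mean(baseline), pstdev(baseline)
    if sd == 0:
        raise ValueError("baseline has zero variance")
    return [(x - mu) / sd for x in trace]


def block_mean_z(trials, n_baseline, block_size=10):
    """Average the infusion-period mean z-score over consecutive trial blocks."""
    per_trial = [mean(zscore_trial(t, n_baseline)[n_baseline:]) for t in trials]
    return [mean(per_trial[i:i + block_size])
            for i in range(0, len(per_trial), block_size)]


# Synthetic data (not from the study): 30 trials, 4 baseline samples each,
# with an infusion response that shrinks across the three blocks of 10 trials.
baseline = [0.0, 1.0, 0.0, 1.0]  # mean 0.5, population SD 0.5
amplitudes = [1.0] * 10 + [0.5] * 10 + [0.0] * 10
trials = [baseline + [0.5 + a] * 6 for a in amplitudes]

blocks = block_mean_z(trials, n_baseline=4)
print(blocks)  # one mean z-score per block: trials 1-10, 11-20, 21-30
```

      Averaging within blocks rather than across the whole session is what would let an early, transient increase show up separately from the later response, which a single session-wide mean z-score would blur together.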

      (4) Would this delayed below-baseline dip be visible with a shorter infusion time?

      While our experiments did not explore this parameter, it would be interesting to parametrically vary infusion duration times and examine differences in dopamine responses. However, we believe the most parsimonious explanation is that the ‘dip’ in VTA cell body activity develops as a function of the slow updating of the value of sucrose reflective of a model-free process. We recognize that this is mere speculation.

      (5) Does the max of the increase or the dip of the decrease better correlate with the behavioral measures of aversion (orofacial, paw movements) or sucrose preference than the "mean z-score" measure used here?

      It seems plausible that the most extreme deviation from baseline could correlate better with behavioral measures. However, the time courses to the maximum increase and the maximum decrease are different. Moreover, with appetitive sucrose, there are often multiple transients that occur throughout a single intraoral infusion. Coupled with a noisy time course for individual components of behavioral reactivity, we determined that averaging data across the whole infusion period (i.e. mean z-score) was the most objective way we could analyze the dopamine and behavioral responses to taste stimuli.

      (6) The authors argue strongly in the discussion against the idea that dopamine is encoding "salience." Could this initial peak (also seen in the first few trials of quinine delivery, fig 1c color plot) be a "salience" response?

      Our response above to the potential for ‘mixed’ dopamine responses to aversive sucrose led to additional analyses that support a slow updating of both behavior and dopamine to the new, aversive value of sucrose. Quinine is innately aversive and thus the Reviewer rightly points out that even here we observe an increase in dopamine release evoked by quinine on the first few trials (as observed in the heat map). We’d like to note, though, that the order of stimulus exposure was counterbalanced across rats. In those rats first receiving a sucrose session, quinine initially caused a modest increase in dopamine release during the first 10 trials (which is more pronounced in the first 2 trials). In the subsequent 2 blocks of 10 trials, no such increase was observed. Interestingly, in rats for which quinine was their first stimulus, we did not see an increase in dopamine release on the first few trials (see Author response image 4). We speculate that the initial sucrose session required the value of intraoral infusions to be updated when quinine was delivered to these rats and that, once more, the updating process may be slow and akin to a model-free process. This analysis, at present, is underpowered but will direct future attention in follow-up work.

      Author response image 4.

      (7) Related to this, the color plots showing individual trials show a reduction in the increases to positive valence sucrose across conditioning day trials and a flip from infusion-onset increase to delayed increases across test day trials. This evolution across days makes it appear that the last few conditioning day trials would be impossible to discriminate from the first few test day trials in the CTA-paired group. Presumably, from the strength of CTA as a paradigm, the sucrose is already aversive to the animals at the first trial of test day. Why do the authors think the response evolves across this session?

      As the Reviewer noted, Points 3-7 are related. We have speculated that the evolving dopamine response in Paired rats across test day trials reflects a model-free process. Importantly, as in the manuscript, our additional analyses once again show a tight relationship between behavioral reactivity and the dopamine response across the test session trials. It is important to note, though, that these experiments were not designed to test if responses reflect model-free or model-based processes.

      (8) Given that most of the work is using a conditioned aversive stimulus, the comparison to a primary aversive tastant quinine is useful. However, the authors saw basically no dopamine response to a primary aversive tastant quinine (measured only with GRAB-DA) and saw less noticeable decreases following CTA for NAc recordings with GRAB-DA2h than with cell body GCaMP. Given that they are using the high-affinity version of the GRAB sensor, this calls into question whether this is a true difference in release vs. soma activity or issue of high affinity release sensor making decreases in dopamine levels more difficult to observe.

      We share the same speculation as the Reviewer. Using fast-scan cyclic voltammetry, albeit measuring dopamine concentration in the dorsomedial shell, we observed a clear decrease from baseline with intraoral infusions of quinine (Roitman et al., 2008). Using fiber photometry here, the Reviewer and we note that GRAB_DA2h is a high-affinity (i.e., EC50: 7 nM) dopamine sensor with relatively long off-kinetics (i.e., t1/2 decay time: 7300 ms) (Labouesse et al., 2020). It may therefore be much more difficult to observe decreases (below baseline) using this sensor. The publication of new dopamine sensors with lower affinity, faster kinetics, and greater dynamic range (Zhuo et al., 2024) introduces opportunities for comparison and greater potential for capturing decreases below baseline. Due to the poorer kinetics associated with GRAB_DA2h, we would not assert that direct comparisons between the GCaMP- and GRAB-based signals observed here represent true differences between somatic and terminal activity.
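
      To make the kinetics argument concrete, the sketch below models the sensor as a simple first-order filter whose reported signal relaxes toward the true dopamine level with a fixed half-life. This is a deliberately crude assumption (real sensors have distinct on- and off-rates and a nonlinear binding curve), and `sensor_trace` and the 0.1 s comparison half-life are hypothetical; only the 7.3 s half-life comes from the value quoted above.

```python
import math


def sensor_trace(dopamine, dt, t_half):
    """Reported signal that relaxes exponentially toward the true level.

    dopamine: true level at each time step (arbitrary units)
    dt: time step in seconds
    t_half: relaxation half-life in seconds (assumed equal for rises and falls)
    """
    k = math.log(2) / t_half
    alpha = 1.0 - math.exp(-k * dt)  # fraction of the gap closed per step
    reported, s = [], dopamine[0]
    for x in dopamine:
        s += alpha * (x - s)
        reported.append(s)
    return reported


dt = 0.01  # seconds per sample
# True dopamine: baseline of 1.0, a 1 s pause to 0.0, then back to baseline.
true_level = [1.0] * 200 + [0.0] * 100 + [1.0] * 200

slow = sensor_trace(true_level, dt, t_half=7.3)  # half-life quoted for GRAB_DA2h
fast = sensor_trace(true_level, dt, t_half=0.1)  # hypothetical faster sensor

print(f"lowest reported level, slow sensor: {min(slow):.3f}")
print(f"lowest reported level, fast sensor: {min(fast):.3f}")
```

      Under these assumptions, a complete 1 s pause in release lowers the slow sensor's signal by less than a tenth of baseline, whereas the fast sensor reports nearly the full drop. That is the sense in which a below-baseline dip could be present but hard to detect.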

      References

      Choi JY, Jang HJ, Ornelas S, Fleming WT, Fürth D, Au J, Bandi A, Engel EA, Witten IB. 2020. A Comparison of Dopaminergic and Cholinergic Populations Reveals Unique Contributions of VTA Dopamine Neurons to Short-Term Memory. Cell Rep 33. doi:10.1016/j.celrep.2020.108492

      Cone JJ, Fortin SM, McHenry JA, Stuber GD, McCutcheon JE, Roitman MF. 2016. Physiological state gates acquisition and expression of mesolimbic reward prediction signals. Proc Natl Acad Sci U S A 113. doi:10.1073/pnas.1519643113

      de Jong JW, Afjei SA, Pollak Dorocic I, Peck JR, Liu C, Kim CK, Tian L, Deisseroth K, Lammel S. 2019. A Neural Circuit Mechanism for Encoding Aversive Stimuli in the Mesolimbic Dopamine System. Neuron 101. doi:10.1016/j.neuron.2018.11.005

      Deserno L, Moran R, Michely J, Lee Y, Dayan P, Dolan RJ. 2021. Dopamine enhances model-free credit assignment through boosting of retrospective model-based inference. Elife 10. doi:10.7554/eLife.67778

      Hsu TM, Bazzino P, Hurh SJ, Konanur VR, Roitman JD, Roitman MF. 2020. Thirst recruits phasic dopamine signaling through subfornical organ neurons. Proc Natl Acad Sci U S A 117:30744–30754. doi:10.1073/pnas.2009233117

      Jung K, Krüssel S, Yoo S, An M, Burke B, Schappaugh N, Choi Y, Gu Z, Blackshaw S, Costa RM, Kwon HB. 2024. Dopamine-mediated formation of a memory module in the nucleus accumbens for goal-directed navigation. Nat Neurosci. doi:10.1038/s41593-024-01770-9

      Labouesse MA, Cola RB, Patriarchi T. 2020. GPCR-based dopamine sensors—A detailed guide to inform sensor choice for in vivo imaging. Int J Mol Sci. doi:10.3390/ijms21218048

      Lammel S, Hetzel A, Häckel O, Jones I, Liss B, Roeper J. 2008. Unique Properties of Mesoprefrontal Neurons within a Dual Mesocorticolimbic Dopamine System. Neuron 57. doi:10.1016/j.neuron.2008.01.022

      McCutcheon JE, Ebner SR, Loriaux AL, Roitman MF, Tobler PN. 2012. Encoding of aversion by dopamine and the nucleus accumbens. Front Neurosci 6. doi:10.3389/fnins.2012.00137

      Morales I, Berridge KC. 2020. ‘Liking’ and ‘wanting’ in eating and food reward: Brain mechanisms and clinical implications. Physiol Behav. doi:10.1016/j.physbeh.2020.113152

      Roitman MF, Wheeler RA, Wightman RM, Carelli RM. 2008. Real-time chemical responses in the nucleus accumbens differentiate rewarding and aversive stimuli. Nat Neurosci 11:1376–1377. doi:10.1038/nn.2219

      Schafe GE, Bernstein IL. 1996. Forebrain contribution to the induction of a brainstem correlate of conditioned taste aversion: I. The amygdala. Brain Res 741. doi:10.1016/S0006-8993(96)00906-7

      Schafe GE, Thiele TE, Bernstein IL. 1998. Conditioning method dramatically alters the role of amygdala in taste aversion learning. Learning and Memory 5. doi:10.1101/lm.5.6.481

      Wilkins EE, Bernstein IL. 2006. Conditioning method determines patterns of c-fos expression following novel taste-illness pairing. Behavioural Brain Research 169. doi:10.1016/j.bbr.2005.12.006

      Yuan L, Dou YN, Sun YG. 2021. Topography of reward and aversion encoding in the mesolimbic dopaminergic system. Journal of Neuroscience 39. doi:10.1523/JNEUROSCI.0271-19.2019

      Zhuo Y, Luo B, Yi X, Dong H, Miao X, Wan J, Williams JT, Campbell MG, Cai R, Qian T, Li F, Weber SJ, Wang L, Li B, Wei Y, Li G, Wang H, Zheng Y, Zhao Y, Wolf ME, Zhu Y, Watabe-Uchida M, Li Y. 2024. Improved green and red GRAB sensors for monitoring dopaminergic activity in vivo. Nat Methods 21. doi:10.1038/s41592-023-02100-w

    1. Author response:

      Reviewer #1:

      We agree with Reviewer 1 that the flexibility of SPRAWL also makes it difficult to interpret its outputs. We consider SPRAWL to be a hypothesis-generation tool to answer simple questions of subcellular localization in a statistically robust manner. In this paper we include examples of how it can be incorporated with other tools and wetlab experimentation to build biological intuition. Our hope is that the SPRAWL software, or even the underlying simple statistical ideas, will be of use to others in the field.

      Reviewer #2:

      We agree with Reviewer #2 that this manuscript does not demonstrate biological significance of the observed results of applying SPRAWL to massively multiplexed FISH datasets. We agree it would require additional wetlab experiments such as cell-type specific and isoform-resolved fluorescence in-situ hybridization, which we consider beyond the scope of this paper. We believe that the observed correlations of subcellular localization detected by SPRAWL and the differential 3’ UTR usage detected by ReadZS are compelling, although not conclusive, as are the Timp3 experimental studies.

      Our understanding is that Baysor is primarily a cell-segmentation algorithm, which is not what SPRAWL attempts to achieve. The Baysor authors state that “cells of a distinct type will give rise to small molecular neighborhoods with stereotypical transcriptional composition, making it possible to interpret such neighborhoods without performing explicit cell segmentation”, which we understand to mean that Baysor identifies spatial groupings of cells with “stereotypical transcriptional composition” rather than subcellular RNA localization. We do not think that SPRAWL and Baysor are comparable; instead, Baysor could be used as an upstream step to SPRAWL to potentially improve cell segmentation.

      Reviewer #3:

      We thank Reviewer #3 for identifying discrepancies in the paper, which we addressed to the best of our ability.

    1. Author response:

      Reviewer 1:

      Many thanks for your positive review and clear overview of our paper. We also agree with your interpretation of our results that ‘the information that is decodable and the information that is task-relevant may relate in very different ways’ and we could have emphasised this point more in the paper.

      With regard to the qualitative similarities between our models and our data, we agree. Because one can achieve any desired level of activity, decoding accuracy, performance, etc. in a model, we focussed on changes over learning of key metrics that are commonly used in the field. Although this can appear qualitative at times because the raw values can differ between the data and our models, our main results are ultimately strongly quantitative (e.g., Fig. 3c,d, and Fig. 5f). We note that we could have fine-tuned the models to have similar activity levels, decoding accuracies, etc. to our data, and on the face of it this may have made the results appear more convincing, but we felt that such trivial fine-tuning does not change any of our key results in any fundamental way and is not the aim of computational modelling. The model one chooses to analyse will always be abstracted from biology in some way, by definition.

      Reviewer 2:

      Thank you very much for your kind comments and clear overview of our paper. We also hope that our paper ‘provides a valuable analysis of the effect of two parameters on representations of irrelevant stimuli in trained RNNs.’

      With regard to our suggested mechanism of suppressing dynamically irrelevant stimuli, we are sorry that we did not provide a sufficient explanation of suppressing color representations when they are irrelevant. We hopefully provide a longer explanation here. Our mechanism of suppression of dynamically irrelevant stimuli does not suggest that the stimulus becomes un-suppressed later; only the behaviourally relevant variable should be decodable when it is needed (i.e., XOR). Although color decodability did increase slightly in the data and some of the models from the color period to the shape period, it was typically not significant and was therefore not a result that we emphasise in the paper (although this could be analysed further to see if additional mechanisms might explain it). We emphasise throughout that color decoding is typically similar between color and shape periods (either high or low) and either decreases or increases over time in both periods. We also focus on whether color decodability increases or decreases over learning during the color period when it is irrelevant (which we call ‘early color decoding’). Importantly, decoding of color or shape is not needed to perform the task; only decoding of XOR is needed. For example, in our two-neuron networks, we observe perfect XOR decoding and only 75% decoding of color and shape, and decoding during the shape period is the same as the network at initialisation before any training. The mechanism we suggest of suppressing dynamically irrelevant stimuli does not predict that that stimulus should be un-suppressed later; only the behaviourally relevant variable should be decodable (i.e., XOR). Instead, what we try to explain is that color inputs can generate 0 firing rate during the color period, when that input does not need to be used and is therefore irrelevant (and color decoding decreases during the color period over learning), but these inputs can be combined with shape inputs later to create a perfectly decodable XOR response.
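
      A toy example may help illustrate how an input can produce zero firing rate on its own yet still contribute to a perfectly decodable XOR. The sketch below is not the trained recurrent network from the paper: it is a static, hand-weighted pair of rectified-linear units, the function `rates` and its weights are hypothetical, and it assumes the color input is still available when the shape input arrives.

```python
def relu(x):
    return max(0.0, x)


def rates(color, shape):
    """Firing rates of two hypothetical rectified-linear units.

    color and shape are coded as +1 or -1; shape=0 stands for the color
    period, before any shape input has arrived.
    """
    r1 = relu(color - shape - 1.0)
    r2 = relu(shape - color - 1.0)
    return r1, r2


# Color period: both units are silent for either color.
color_period = {c: rates(c, 0) for c in (+1, -1)}

# Shape period: all four color/shape combinations.
conditions = [(c, s) for c in (+1, -1) for s in (+1, -1)]
shape_period = {cs: rates(*cs) for cs in conditions}

# XOR readout: summed activity is positive exactly when color != shape.
xor_correct = sum((sum(shape_period[(c, s)]) > 0.5) == (c != s)
                  for c, s in conditions)

# Color readout: the two "same" conditions share the state (0, 0) but differ
# in color, so no readout of these rates can get more than 3 of 4 right.
def decode_color(r):
    return -1 if r[1] > 0.5 else +1

color_correct = sum(decode_color(shape_period[(c, s)]) == c
                    for c, s in conditions)

print(color_period)
print(f"XOR decoding: {xor_correct}/4, color decoding: {color_correct}/4")
```

      In this toy case the negative bias keeps each unit below threshold until color and shape inputs combine, which mirrors the qualitative point above: color is carried only in inputs that produce no firing during the color period, XOR is perfectly decodable afterwards, and color itself is decodable in only three of four conditions.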

      With regard to the interpretation of our results based on metabolic cost constraints, we feel that it is an unnecessarily strong criticism to say that it ‘is not backed up by the presented data/analyses.’ All of our models were trained with only a metabolic cost constraint, a noise strength, and a task performance term. Therefore, the results of the models are directly attributable to the strength of metabolic cost that we use. Additionally, although one could in principle pick any of infinitely many different parameters to change and measure the response in an optimized network, metabolic cost and noise are two of the most fundamental constraints that neural circuits must contend with, and many studies have analysed the impact they have on neural circuit dynamics. Furthermore, in line with previous studies (Yang et al., 2019, Whittington et al., 2022, Sussillo et al., 2015, Orhan et al., 2019, Kao et al., 2021, Cueva et al., 2020, Driscoll et al., 2022, Song et al., 2016, Masse et al., 2019, Schimel et al., 2023), we operationalized metabolic cost in our models through L2 firing rate regularization. This cost penalizes high overall firing rates. (Such an operationalization of metabolic cost also makes sense for our models because network performance is based on firing rates rather than subthreshold activities.) There are, however, alternative conceivable ways to operationalize a metabolic cost; for example, L1 firing rate regularization has been used previously when optimizing neural networks and promotes sparser neural firing. Interestingly, although L2 regularization is generally considered to be weaker than L1 regularization, we still found that it encouraged the network to use purely sub-threshold activity in our task. The regularization of synaptic weights may also be biologically relevant because synaptic transmission uses more energy in the brain than other processes (Faria-Pereira et al., 2022, Harris et al., 2012). Additionally, even subthreshold activity could be regularized as it also consumes energy (although orders of magnitude less than spiking (Zhu et al., 2019)). Therefore, future work will be needed to examine how different metabolic costs affect the dynamics of task-optimized networks.
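
      The difference between the two firing-rate penalties mentioned above can be shown with a small numerical sketch. The function names and the `strength` parameter are hypothetical, the rate vectors are made up for illustration, and the way the penalty is combined with the task term is an assumption rather than the exact objective used in the paper.

```python
def l2_rate_cost(rates):
    """Mean squared firing rate: penalizes large rates disproportionately."""
    return sum(r * r for r in rates) / len(rates)


def l1_rate_cost(rates):
    """Mean absolute firing rate: penalizes total activity linearly."""
    return sum(abs(r) for r in rates) / len(rates)


def total_loss(task_loss, rates, strength, cost=l2_rate_cost):
    """Task term plus a weighted metabolic penalty (assumed additive form)."""
    return task_loss + strength * cost(rates)


# Two made-up activity patterns with the same total firing.
dense = [0.5, 0.5, 0.5, 0.5]   # activity spread over all units
sparse = [2.0, 0.0, 0.0, 0.0]  # the same total, carried by a single unit

print("L1:", l1_rate_cost(dense), l1_rate_cost(sparse))
print("L2:", l2_rate_cost(dense), l2_rate_cost(sparse))
print("loss with L2:", total_loss(0.1, dense, strength=0.01))
```

      With equal total activity, the L1 cost is identical for the two patterns, whereas the L2 cost is four times larger for the pattern that concentrates activity in one strongly firing unit. This is one way to see why the choice of penalty can shape the solutions a task-optimized network settles on.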

      With regard to color representations in PFC only qualitatively matching those in our models, in line with the comment from Reviewer 1, we agree. Because one can achieve any desired level of activity, decoding accuracy, performance, etc. in a model, we focussed on changes over learning of key metrics that are commonly used in the field. Although this can appear qualitative at times because the raw values can differ between the data and our models, our main results are ultimately strongly quantitative (e.g., Fig. 3c,d, and Fig. 5f). We note that we could have fine-tuned the models to have similar activity levels, decoding accuracies, etc. to our data, and on the face of it this may have made the results appear more convincing, but we felt that such trivial fine-tuning does not change any of our key results in any fundamental way and is not the aim of computational modelling. The model one chooses to analyse will always be abstracted from biology in some way, by definition. Finally, we of course note that changes in color decoding could result from other causes, but we focussed on two key phenomena that neural circuits must contend with: noise and metabolic costs. Therefore, it is likely that these two variables play a strong role in stimulus representations in neural circuits.

      Reviewer 3:

      Thank you very much for your thorough and clear overview of our paper. We agree that it is important to use computational models to investigate phenomena and manipulations that are almost impossible to study in vivo, and we are pleased you found our mathematical analyses rigorous and nicely documented.

      Although we agree that it can be useful to study the responses of individual neurons, we focussed on population analyses of all available neurons without omitting or specifically selecting neurons based on their dynamics. We are also not suggesting that the activities of individual ‘neurons’ in the models and data should be similar, since our models are highly abstract firing rate models. Rather, the overall computational strategy, which one can access through population decoding and cross-generalised decoding, was what we were interested in comparing between the models and the data, and is arguably the correct level of analysis of such models (and data) given our key questions (Vyas et al., 2020, Churchland et al., 2012, Mante et al., 2013, Ebitz et al., 2021).

      We also certainly agree and are more than open to the fact that suppression of irrelevant stimuli may already be happening on the inputs arriving in PFC. Indeed, we actually suggest this as the mechanism in Fig. 5 (together with recurrent circuit dynamics that make use of these inputs).

      With regards to the dynamics of the two-neuron networks not being ‘informative of what happens in brain networks’, we agree that these models are very simplified and may only contain very fundamental similarities with biological neurons. However, we only used them to illustrate the fundamental mechanism of generating 0 firing rate during the color epoch so that it is more easily understandable for readers as they can see the entire 2-dimensional state space and the entire computational strategy can be seen (Fig. 5a-d). We also note that we did this for both rectified linear and tanh networks, thus showing that such a mechanism is preserved across fundamentally different firing rate nonlinearities. Additionally, after illustrating this fundamental mechanism of networks receiving color information but generating 0 firing rate, we show that the exact same mechanism is at play in the large networks we use throughout the paper (Fig. 5e). We also only compare the large networks to our neural recordings. We do agree though that it would be interesting to further compare fundamental similarities and differences between our models and our neural recordings (always at the right level of analysis that makes sense for our chosen models) to show that the mechanisms we uncover in our models are also strongly relevant for our data.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have used full-length single-cell sequencing on a sorted population of human fetal retina to delineate expression patterns associated with the progression of progenitors to rod and cone photoreceptors. They find that rod and cone precursors contain a mix of rod/cone determinants, with a bias in both amounts and isoform balance likely deciding the ultimate cell fate. Markers of early rod/cone hybrids are clarified, and a gradient of lncRNAs is uncovered in maturing cones. Comparison of early rods and cones exposes an enriched MYCN regulon, as well as expression of SYK, which may contribute to tumor initiation in RB1 deficient cone precursors.

      Strengths:

      (1) The insight into how cone and rod transcripts are mixed together at first is important and clarifies a long-standing notion in the field.

      (2) The discovery of distinct active vs inactive mRNA isoforms for rod and cone determinants is crucial to understanding how cells make the decision to form one or the other cell type. This is only really possible with full-length scRNAseq analysis.

      (3) New markers of subpopulations are also uncovered, such as CHRNA1 in rod/cone hybrids that seem to give rise to either rods or cones.

      (4) Regulon analyses provide insight into key transcription factor programs linked to rod or cone fates.

      (5) The gradient of lncRNAs in maturing cones is novel, and while the functional significance is unclear, it opens up a new line of questioning around photoreceptor maturation.

      (6) The finding that SYK mRNA is naturally expressed in cone precursors is novel, as previously it was assumed that SYK expression required epigenetic rewiring in tumors.

      Weaknesses:

      (1) The writing is very difficult to follow. The nomenclature is confusing and there are contradictory statements that need to be clarified.

      (2) The drug data is not enough to conclude that SYK inhibition is sufficient to prevent the division of RB1 null cone precursors. Drugs are never completely specific so validation is critical to make the conclusion drawn in the paper.

      We thank the reviewer for describing the study’s strengths and weaknesses.  In the upcoming revision, we will:

      (1) improve the writing and clarify the nomenclature and contradictory statements, particularly those noted in the Reviewer’s Recommendations for Authors; and

      (2) scale back the claims related to the role of SYK in the cone precursor response to RB1 loss; we agree that genetic perturbation of SYK is required to prove its role and will perform such analyses in a separate study.

      Reviewer #2 (Public review):

      Summary:

      The authors used deep full-length single-cell sequencing to study human photoreceptor development, with a particular emphasis on the characteristics of photoreceptors that may contribute to retinoblastoma.

      Strengths:

      This single-cell study captures gene regulation in photoreceptors across different developmental stages, defining post-mitotic cone and rod populations by highlighting their unique gene expression profiles through analyses such as RNA velocity and SCENIC. By leveraging full-length sequencing data, the study identifies differentially expressed isoforms of NRL and THRB in L/M cone and rod precursors, illustrating the dynamic gene regulation involved in photoreceptor fate commitment. Additionally, the authors performed high-resolution clustering to explore markers defining developing photoreceptors across the fovea and peripheral retina, particularly characterizing SYK's role in the proliferative response of cones in the RB loss background. The study provides an in-depth analysis of developing human photoreceptors, with the authors conducting thorough analyses using full-length single-cell RNA sequencing. The strength of the study lies in its design, which integrates single-cell full-length RNA-seq, long-read RNA-seq, and follow-up histological and functional experiments to provide compelling evidence supporting their conclusions. The model of cell type-dependent splicing for NRL and THRB is particularly intriguing. Moreover, the potential involvement of the SYK and MYC pathways with RB in cone progenitor cells aligns with previous literature, offering additional insights into RB development.

      Weaknesses:

      The manuscript feels somewhat unfocused, with a lack of a strong connection between the analysis of developing photoreceptors, which constitutes the bulk of the manuscript, and the discussion on retinoblastoma. Additionally, given the recent publication of several single-cell studies on the developing human retina, it is important for the authors to cross-validate their findings and adjust their statements where appropriate.

      We thank the reviewer for summarizing the main findings and for noting the compelling support for the conclusions, the intriguing cell type-dependent splicing of rod and cone lineage factors, and the insights into retinoblastoma development. 

      We concur that some studies of developing photoreceptors were not well connected to retinoblastoma, which diminished the focus. However, we suggest that it was valuable to highlight how deep, long-read sequencing provided new insights into retinoblastoma. For example, our demonstration of similar rod- and cone-related gene expression in early cones and RB cells addressed concerns with the proposed cone cell-of-origin, adding disease relevance.

      We will address the Reviewer’s request to cross-validate our findings with those of other single-cell studies of developing human retina and to adjust the related statements in our upcoming revision.

      Reviewer #3 (Public review):

      Summary:

      The authors use high-depth, full-length scRNA-Seq analysis of fetal human retina to identify novel regulators of photoreceptor specification and retinoblastoma progression.

      Strengths:

      The use of high-depth, full-length scRNA-Seq to identify functionally important alternatively spliced variants of transcription factors controlling photoreceptor subtype specification, and identification of SYK as a potential mediator of RB1-dependent cell cycle reentry in immature cone photoreceptors.

      Human developing fetal retinal tissue samples were collected between 13-19 gestational weeks and this provides a substantially higher depth of sequencing coverage, thereby identifying both rare transcripts and alternative splice forms, and thereby representing an important advance over previous droplet-based scRNA-Seq studies of human retinal development.

      Weaknesses:

      The weaknesses identified are relatively minor. This is a technically strong and thorough study, that is broadly useful to investigators studying retinal development and retinoblastoma.

      We thank the reviewer for describing the strengths of the study. Our upcoming revision will address the minor concerns that were raised separately in the Reviewer’s Recommendations for Authors.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Minor Concern (Original Comment 1):

      “We think that this is sufficient to address our concern. Some citations may be in order to underpin the new text.”

      We appreciate the reviewer’s assessment that the revised text clarifies the complexity of the upstream circuitry beyond the retina, including inputs from the thalamus. As recommended, we have now included additional citations in the revised manuscript to support these points.

      Major Concern (Original Comment 5):

      “We do not feel that this important concern has been addressed. The stats are definitively negative. There is no statistical evidence from these data that multisensory integration is occurring in this assay. The anesthesia, paralysis, and low n may provide explanations for this negative result, but it is still a negative result (p=0.5269). To show two examples of multisensory integration for subthreshold stimuli fits the narrative, but this result is not supported. Examples where individual stimuli caused APs (and combined stimuli did not) also occurred, presumably, and at a rate that is statistically indistinguishable to the examples shown in Figure 5. As such, if results from this assay are going to be in the manuscript, acoustic-only and tectum-only examples should be shown as well, although they would not fit the narrative. To be meaningful, this experiment would have to show that multisensory integration is happening in this circuit. Frustrating though it must be, the experiment has given a negative result to that question.”

      We understand the reviewer’s concern regarding Figure 5C and the firing of action potentials (APs) in response to multisensory stimuli. We acknowledge that our assay is not suited to answer this question definitively and that our results do not provide statistical support for this hypothesis. In response, we have removed the examples previously shown in Figure 5C, along with the related description in the Results section (lines 420–426), to avoid implying unsupported integration in suprathreshold conditions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Point 1: While the manuscript is methodologically sound, the following aspects of image acquisition and data analysis need to be clarified to ensure replicability and reproducibility. The authors state that the sample is a "population-derived adult lifespan sample", the lack of demographic information makes it impossible to know if the sample is truly representative. Though this may seem inconsequential, education may impact both cognitive performance and functional activation patterns. Moreover, the authors do not report race/ethnicity in the manuscript. This information is essential to ensure representativeness in the sample. It is imperative that barriers to study participation within minoritized groups are addressed to ensure rigor and reproducibility of findings.

      First, the section Methods-Participants has been updated to refer readers to a prior article where the sample’s demographics are broken down into nine decile age groups (see Wu et al. 2023 Table 1), including information about their education levels. Second, we have updated the Data Availability section text to indicate that all Cam-CAN IDs are included in the available OSF datasets, allowing anyone to verify additional participant demographics described in the Cam-CAN protocol article (Shafto et al., 2014). Third, we have updated the Participants section text to refer to another prior study that reported on the representativeness of the Cam-CAN sample, indicating that at least some elements of the sample have been independently deemed representative (e.g., sex).

      Page-24

      “A healthy population-derived adult lifespan human sample (N = 223; ages approximately uniformly distributed from 19 - 87 years; females = 112; 50.2%) was collected as part of the Cam-CAN study (Stage 3 cohort; Shafto et al., 2014). Participants were fluent English speakers in good physical and mental health, based on the Cam-CAN cohort’s exclusion criteria which includes poor mini mental state examination, ineligibility for MRI and medical, psychiatric, hearing or visual problems. Throughout analyses, age is defined at the Home Interview (Stage 1; Shafto et al., 2014). The study was approved by the Cambridgeshire 2 (now East of England–Cambridge Central) Research Ethics Committee and participants provided informed written consent. Further demographic information of the sample is reported in Wu et al. (2023) and is openly available (see section Data Availability) with a recent report indicating the representativeness of the sample across sexes (Green et al., 2018).”

      Page-30

      “Raw and minimally pre-processed MRI (i.e., from automatic analysis; Taylor et al., 2017) and behavioural data are available by submitting a data request to Cam-CAN (https://camcan-archive.mrc-cbu.cam.ac.uk/dataaccess/). The univariate and multivariate ROI data, and behavioural data, can be downloaded from the Open Science Framework, which includes Cam-CAN participant identifiers allowing the retrieval of any additional demographic data (https://osf.io/v7kmh), while the analysis code is available on GitHub.”

      Point 2: For the whole-brain analysis in which the ROIs were derived, the authors used a threshold-free cluster enhancement (TFCE; Smith & Nichols 2009). The methodological paper cited suggests that individuals' TFCE image should still be corrected for multiple comparisons using the following: "to correct for multiple comparisons, one [...] has to build up the null distribution (across permutations of the input data) of the maximum (across voxels) TFCE score, and then test the actual TFCE image against that. Once the 95th percentile in the null distribution is found then the TFCE image is simply thresholded at this level to give inference at the p < 0.05 (corrected) level." (Smith & Nichols, 2009). Although the authors mention that clusters were estimated using 2000 permutations, there is no mention of the TFCE image itself being thresholded. While this would impact the overall size of the ROIs used in the study, the remaining analyses are methodologically sound.

      We have updated the section Experimental Design & Statistical Analysis to detail the t = 1.97 (i.e., p = .05) threshold that we applied to the resultant TFCE images before interpretation. This threshold value can also be verified in the analysis code that is referenced on GitHub from the section Data Availability, within the requisite toolbox functions: https://github.com/kamentsvetanov/CommonalityAnalysis/blob/main/code/ca_vba_tfce_threshold.m#L24 and https://github.com/kamentsvetanov/CommonalityAnalysis/blob/main/code/external/ca_matlab_tfce_transform.m

      Page-30

      “For whole-brain voxelwise analyses, clusters were estimated using threshold-free cluster enhancement (TFCE; Smith & Nichols 2009) with 2000 permutations and the resulting images were thresholded at a t-statistic of 1.97 before interpretation.”
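
      The maximum-statistic correction that the reviewer quotes from Smith & Nichols (2009) can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: it assumes a one-sample design with sign-flipping permutations, uses a plain voxelwise mean as a stand-in for the TFCE score, and all function names are hypothetical.

```python
import random


def voxel_means(data):
    """Mean across subjects for each voxel (a stand-in for a TFCE score map)."""
    n_subjects = len(data)
    n_voxels = len(data[0])
    return [sum(data[s][v] for s in range(n_subjects)) / n_subjects
            for v in range(n_voxels)]


def max_stat_null(data, n_perm=2000, seed=0):
    """Null distribution of the maximum (across voxels) statistic.

    Each permutation randomly flips the sign of every subject's whole map,
    an assumption that suits a one-sample test only.
    """
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        flipped = []
        for row in data:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * x for x in row])
        null.append(max(voxel_means(flipped)))
    return null


def corrected_threshold(null, alpha=0.05):
    """Value at the (1 - alpha) quantile of the max-statistic null."""
    ordered = sorted(null)
    index = min(len(ordered) - 1, int((1 - alpha) * len(ordered)))
    return ordered[index]


def threshold_map(stat_map, threshold):
    """Zero every voxel that does not exceed the corrected threshold."""
    return [x if x > threshold else 0.0 for x in stat_map]
```

      In use, `threshold_map(voxel_means(data), corrected_threshold(max_stat_null(data)))` would keep only the voxels that survive at the corrected level, where `data` is a list of per-subject voxel lists.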

      Point 3: The authors should consider moving the ROI section to results. The way the manuscript currently reads, the ROIs seem to be derived a priori as opposed to being derived from activation maps in the current study.

      After consideration of this point, we have decided to leave the methodological details regarding the definition of ROIs in the methods, to maintain the focus of the Results section. However, we have improved signposting in the results section to highlight that the ROIs were derived from the overlapped activation maps.

      Page-8

      “Crucially, two areas of the brain showed spatially-overlapping positive effects of age and performance, which is suggestive of an age-related compensatory response (Figure 2A yellow intersection). These were in bilateral cuneal cortex (Figure 2B magenta) and bilateral frontal cortex (Figure 2B brown), the latter incorporating parts of the middle frontal gyri and anterior cingulate. Therefore, based on traditional univariate analyses, these are two candidate regions for age-related functional compensation (Cabeza et al. 2013; 2018). Accordingly, we defined regions of interest within these two regions using the overlap activation maps (see section: ROIs) to be used for subsequent univariate and multivariate analysis.”

      Point 4: The manuscript can be strengthened by explaining why the authors chose a greedy search algorithm over a dynamic Bayesian model.

      The text has been updated to note that the computationally efficient greedy search implementation is appropriate given the size of the fMRI cohort dataset.

      Page-28

      “The pattern weights specifying the mapping of data features to the target variable are optimized with a greedy search algorithm using a standard variational scheme (Friston et al., 2007) which was particularly appropriate given the large dataset.”

      Reviewer #2:

      Point 1: However, it might have been nice to see an analysis of a more crystallised intelligence task included too, as a contrast since this is an area that does not demonstrate such a decline (and perhaps continues to improve over aging).

      We (Samu et al., 2017) have previously investigated, but failed to find, univariate evidence for functional compensation in this cohort’s performance on a sentence comprehension task that is more closely aligned to a measure of crystallised intelligence. Based on the additional previous studies where we have applied these types of univariate and multivariate criteria of functional compensation (Morcom & Henson, 2018; Knights et al., 2021), we have consistently observed that the uni-/multivariate effects are in the same direction. Therefore, we would not initially expect a different conclusion here, where the univariate and multivariate effects suggest different outcomes. Notably, the univariate analysis approach in Samu et al. (2017) did differ from focusing on the age x behaviour interaction term here, so it could still be worth future investigation, but it does seem less likely that evidence of compensation would be observed than for fluid intelligence. However, as the Reviewer suggests, such a task may make another good contrast to show evidence against the existence of functional compensation (as in Morcom & Henson, 2018; Knights et al., 2021).

      Point 2: Figure 1B: Consider adding coefficients describing relationships to plots.

      Annotations of the coefficients have been added to Figure 1B.

      Point 3: Figure 2C. The scale of the axis for RSFA-Scales cuneal cortex ROI activations should be the same as the other 3 plots.

      Figure axes are updated such that ROIs are on matching scales, according to whether data were RSFA-scaled or not.

      Point 4: Figure 2C. Adding in the age ranges for each of the three groups following the tertile split may be informative to the reader.

      The age group tertile definition used for Figure 2C visualisations is now added to the Figure description.

      Page-10

      “Figure 2. Univariate analysis. (A) Whole-brain effects of age and performance. Age (green) and performance (red) positively predicted unique aspects of increased task activation, with their spatial overlap (yellow) being overlaid on a template MNI brain, using p < 0.05 TFCE. (B) Intersection ROIs. A bilateral cuneal (magenta) and frontal cortex (brown) ROI were defined from voxels that showed a positive and unique effect of both age and performance (yellow map in Figure 2A). (C) ROI Activation. Activation (raw = left; RSFA-scaled = right) is plotted against behavioural performance based on a tertile split between three age groups (19-44, 45-63 & 64-87 years).”

      Reviewer #3:

      Point 1: [Public Review] 1) I don't quite follow the argumentation that compensatory recruitment would need to show via non-redundant information carried by any given non-MDN region (cf. p14). Wouldn't the fact that a non-MDN region carries task-related information be sufficient to infer that it is involved in the task and, if activated increasingly with increasing age, that its stronger recruitment reflects compensation, rather than inefficiency or dedifferentiation? Put differently, wouldn't "more of the same" in an additional region suffice to qualify as compensation, as compared to the "additional information in an additional region" requirement set by the authors? As a consequence, in my honest opinion, showing that decoding task difficulty from non-MDN ROIs works better with higher age would already count as evidence for compensation, rather than asking for age-related increases in decoding boosts obtained from adding such ROIs. It would be interesting to see whether the arguably redundant frontal ROI would satisfy this less demanding criterion. At any rate, it seems useful to show whether the difference in log evidence for the real vs. shuffled models is also related to age.

      We agree with the logic for conducting a weaker assessment of functional compensation whereby a brain region does not necessarily have to provide a unique contribution beyond that of the ordinarily activated task-relevant network. However, although non-unique recruitment is predicted by a compensation theory, it can also be explained by a nonspecific mechanism that recruits multiple regions in tandem. In contrast, unique additional recruitment is compatible with compensation but not with nonspecific recruitment. In this article, and those prior (Morcom & Henson, 2018; Knights et al. 2021), we have also deliberately avoided using the specific kind of analysis proposed (i.e., testing for an effect of age on differential log evidence) because these would involve applying statistical tests directly to the log evidence, a variable that is already a statistical test output.

      Nevertheless, temporarily putting these caveats aside, we did run the suggested test. Results from multiple regression showed that using log evidence from frontal cortex models still did not meet this less demanding criterion for functional compensation as there was an effect of age in the opposite direction to that expected by functional compensation: there was a significant negative effect of age (t(218) = -7.95, p < .001) indicating that as age increased, the difference in log evidence decreased. This effect is visualised below for transparency, but we preferred not to add this information to the article because we do not wish to encourage using this kind of analysis for the reason mentioned above. Thus, although our main multivariate test of interest is stringent, the additional step of mapping log evidence back to the boost-likelihood categories (e.g., boost vs. no difference to model performance) lends itself to the more appropriate logistic regression statistical approach.

      Author response image 1.

      Negative effect of age on MVB log evidence model outcomes for frontal cortex.

      A different approach that could be taken to assess a more lenient definition of functional compensation would be to analyse the effects of age on the spread of multivariate responses predicting task difficulty (i.e., standard deviation of fitted MVB voxel weights; also see Morcom & Henson, 2018; Knights et al., 2021) specifically from models that only include the candidate ‘compensation’ ROIs.

      Accordingly, these analyses and their discussion have been added to the article. To summarise, these analyses showed that (1) the frontal cortex still did not show evidence of functional compensation (i.e., a negative effect of age like in Morcom & Henson, 2018) and (2) no effect of age on the cuneal ROI, implying that the original model comparison approach (i.e., Figure 2C in the manuscript now) can provide more sensitivity for detecting evidence of functional compensation (perhaps because of the importance of including task-relevant network responses when building decoding models).

      Page-15

      “As a final analysis, we also tested a more lenient definition of functional compensation, whereby the multivariate contribution from the “compensation ROI” does not necessarily need to be above and beyond that of the task-relevant network (Morcom & Henson, 2018; Knights et al., 2021). To do this, we again assessed whether age was associated with an increase in the spread (standard deviation) of the weights over voxels, for smaller models containing only the cuneal or frontal ROI. This tested whether increased age led to more voxels carrying substantial information about task difficulty, a pattern predicted by functional compensation (but also consistent with non-specific additional recruitment). In this case, the results of this test did not support functional compensation, as there was no effect detected for the cuneal cortex and even a negative effect of age for the frontal cortex where the spread of the information across voxels was lower for older age (Figure 3C; Table 2).”

      Page-21

      “The age- and performance-related activation in our frontal region satisfied the traditional univariate criteria for functional compensation, but our multivariate (MVB) model comparison analysis showed that additional multivariate information beyond that in the MDN was absent in this region, which is inconsistent with the strongest definition of compensation. In fact, the results from the spread analysis showed that as age increased, this frontal area processed less, rather than more, multivariate information about the cognitive outcome (Figure 3C) as previously observed in two (memory) tasks for a comparable ROI within the same Cam-CAN cohort (Morcom & Henson, 2018).”

      Page-24

      “This said, univariate criteria for functional compensation will continue to play a role in hypothesis testing. For instance, the over-additive interaction observed in the cuneal cortex - where the increase in activity with better performance is more pronounced in older adults - offers stronger evidence of compensation compared to the simple additive effect of age and performance observed in the frontal cortex (Figure 2C). So far, the two studies that have combined these rigorous univariate, behavioral and multivariate approaches to assess functional compensation (i.e., Knights et al., 2021; the present study) have generally found converging evidence regardless of the method used. However, it is important to note that the MVB approach uniquely shifts the focus from individual differences to the specific task-related information that compensatory neural activations are assumed to carry and provides a specific test of region- (or network-) unique information. With further studies, it may also be that multivariate approaches prove more sensitive for detecting compensation effects than when using mean responses over voxels (e.g., Friston et al., 1995) particularly since over-additive effects are challenging to observe because compensatory effects are typically ‘partial’ and do not fully restore function (for review see Scheller et al., 2014; Morcom & Johnson, 2015). Within the multivariate analysis options themselves, it is also interesting to highlight that the stringent MVB boost likelihood analysis could detect functional compensation unlike the more lenient analysis focusing on the spread of MVB voxel weights. This suggests the importance of including task-relevant network responses when building decoding models to assess compensation.”

      Page-32

      “Alongside the MVB boost analysis, we also included an additional measure using the spread (standard deviation) of voxel classification weights (Morcom & Henson, 2018). This measure indexes the absolute amplitude of voxel contributions to the task, reflecting the degree to which multiple voxels carry substantial task-related information. When related to age this can serve as a multivariate index of information distribution, unlike univariate analyses. However, it is worth highlighting that even if an ROI shows an effect of age on this spread measure, such an effect could instead be explained by a non-specific mechanism that represents the same information in tandem across multiple regions (rather than reflecting compensation) as seen previously (Knights et al., 2021; also see Morcom & Johnson, 2015). Thus, it is the MVB boost analysis that is the most compelling assessment of functional compensation because it can directly detect novel information representation.”
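
      The spread measure described above can be sketched in a few lines of Python. This is an illustration only: the toy weights and ages are invented, the sample standard deviation is an assumption, and the single-predictor slope omits the covariates used in the reported regression models.

```python
import statistics


def weight_spread(weights):
    """Spread of one participant's voxel weights (sample standard deviation)."""
    return statistics.stdev(weights)


def slope(x, y):
    """Ordinary least-squares slope of y on x (one predictor, no covariates)."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Toy data, invented for illustration: three participants, four voxels each.
ages = [25, 50, 80]
voxel_weights = [
    [-2.0, 2.0, -1.5, 1.5],  # weights far from zero: wide spread
    [-1.0, 1.0, -0.8, 0.8],
    [-0.2, 0.2, -0.1, 0.1],  # weights near zero: narrow spread
]

spreads = [weight_spread(w) for w in voxel_weights]
age_effect = slope(ages, spreads)  # negative in this toy example
```

      A positive `age_effect` would be the pattern predicted by functional compensation; a negative one, as in this toy example, matches the direction reported for the frontal ROI.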

      Point 2: [Public Review] 2) Relatedly, does the observed boost in decoding by adding the cuneal ROI (in older adults) really reflect "additional, non-redundant" information carried by this ROI? Or could it be that this boost is just a statistical phenomenon that is obtained because the cuneus just happens to show a more clear-cut, less noisy difference in hard vs. easy task activation patterns than does the MDN (which itself may suffer from increased neural inefficiency in older age), and thus the cuneus improves decoding performance without containing additional (novel) pieces of information (but just more reliable ones)? If so, the compensation account could still be maintained by reference to the less demanding rationale for what constitutes compensation laid out above.

      We agree that this is a possibility and have added this as an additional explanation to the Discussion. We have also discussed why we think it is a less likely possibility, but do concede that it cannot be ruled out currently.

      Page-20

      “Another possibility is that the age-related increases in fMRI activations (for hard versus easy) in one or both of our ROIs do not reflect greater fMRI signal for hard problems in older than younger people, but rather lower fMRI signal for easy problems in the older. Without a third baseline condition, we cannot distinguish these two possibilities in our data. However, a reduced “baseline” level of fMRI signal (e.g., for easy problems) in older people is consistent with other studies showing an age-related decline in baseline perfusion levels, coupled with preserved capacity of cerebrovascular reactivity to meet metabolic demands of neuronal activity at higher cognitive load  (Calautti et al., 2001; Jennings et al., 2005). Though age-related decline in baseline perfusion occurs in the cuneal cortex (Tsvetanov et al., 2021), the brain regions showing modulation of behaviourally-relevant Cattell fMRI activity by perfusion levels did not include the cuneal cortex (Wu et al., 2023). This suggests that the compensatory effects in the cuneus are unlikely to be explained by age-related hypo-perfusion, consistent with the minimal effect here of adjusting for RSFA (Figure 2C).

      One final possibility is whether the observed boost in decoding from adding the cuneal ROI simply reflects less noisy task-related information (i.e., a better signal-to-noise ratio (SNR)) than the MDN and, consequently, the boosted decoding is the result of more resilient patterns of information (rather than the representation of additional information) based on a steeper age-related decline of SNR in the MDN. Overall then, as none of the explanations above agree with all aspects of the results, to functionally explain the role of the cuneal cortex in this task would require further investigation.”

      Point 3: [Public Review] 3) On page 21, the authors state that "...traditional univariate criteria alone are not sufficient for identifying functional compensation." To me, this conclusion is quite bold as I'd think that this depends on the univariate criterion used. For instance, it could be argued that compensation should be more clearly indicated by an over additive interaction as observed for the relationship of cuneal activity with age and performance (i.e., the activity increase with better performance becomes stronger with age), rather than by an additive effect of age and performance as observed for the prefrontal ROI (see Fig. 2C). In any case, I'd appreciate it if the authors discussed this issue and the relationship between univariate and multivariate results in more detail (e.g. how many differences in sensitivity between the two approaches have contributed), in particular since the sophisticated multivariate approach used here is not widely established in the field yet.

      We have now considered this point further in a section of the Discussion (which is merged with points 1 & 2 above) about the relevance and distinction of univariate / multivariate criteria for functional compensation. As described in text below, whilst we agree that univariate / behavioural approaches have a role in testing functional compensation, we still view the MVB boost analysis to be a particularly compelling approach for assessing this theory.

      Page-22

      “This said, univariate criteria for functional compensation will continue to play a role in hypothesis testing. For instance, the over-additive interaction observed in the cuneal cortex - where the increase in activity with better performance is more pronounced in older adults - offers evidence of compensation compared to the simple additive effect of age and performance observed in the frontal cortex (Figure 2C). However, the conclusions that can be drawn from age-related differences in cross-sectional associations of brain and behaviour are limited, mainly because individual performance differences are largely lifespan-stable (see Lindenberger et al., 2011; Morcom & Johnson, 2015). So far, the two studies that have combined these univariate-behavioral and multivariate approaches to assess functional compensation (i.e., Knights et al., 2021; the present study) have generally found converging evidence regardless of the method used. However, it is important to note that the MVB approach uniquely shifts the focus from individual differences to the specific task-related information that compensatory neural activations are assumed to carry. With further studies, it may also be that multivariate approaches prove more sensitive for detecting compensation effects than when using mean responses over voxels (e.g., Friston et al., 1995) particularly since over-additive effects are challenging to observe because compensatory effects are typically ‘partial’ and do not fully restore function. Within the multivariate analysis options themselves, it is also interesting to highlight that the stringent MVB boost likelihood analysis could detect functional compensation unlike the more lenient analysis focusing on the spread of MVB voxel weights. This suggests the importance of including task-relevant network responses when building decoding models to assess compensation.”

      Point 4: [Public Review] 4) As to the exclusion of poorly performing participants (see p24): If only based on the absolute number of errors, wouldn't you miss those who worked (overly) slowly but made few errors (possibly because of adjusting their speed-accuracy tradeoff)? Wouldn't it be reasonable to define a criterion based on the same performance measure (correct - incorrect) as used in the main behavioural analyses?

      This is a good point. Excluding participants with a chance-level criterion based on the formula used to measure behavioural performance removes the same subjects as those originally excluded. Accordingly, the text has been updated to reflect this more parsimonious approach to defining the exclusion criteria.

      Page-25

      “In a block design, participants completed eight 30-second blocks which contained a series of puzzles from one of two difficulty levels (i.e., four hard and four easy blocks completed in an alternating block order; Figure 1A). The fixed block time allowed participants to attempt as many trials as possible. Therefore, to balance speed and accuracy, behavioural performance was measured by subtracting the number of incorrect from correct trials and averaging over the hard and easy blocks independently (i.e., ((hard correct - hard incorrect) + (easy correct - easy incorrect))/2; Samu et al., 2017). For assessing reliability and validity, behavioural performance (total number of puzzles correct) was also collected from the same participants during a full version of the Cattell task (Scale 2 Form A) administered outside the scanner at Stage 2 of the Cam-CAN study (Shafto et al., 2014). Both the in- and out-of-scanner measures were z-scored. We excluded participants (N = 28; 17 females) who performed at chance level ((correct + incorrect) / incorrect < 0.5) on the fMRI task, leading to the same subset as reported in Samu et al. (2017).”
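
      The behavioural performance measure defined above can be written out as a short Python sketch. The function names are hypothetical, and using the sample standard deviation for z-scoring is an assumption, since the text does not state which form was used.

```python
import statistics


def performance(hard_correct, hard_incorrect, easy_correct, easy_incorrect):
    """((hard correct - hard incorrect) + (easy correct - easy incorrect)) / 2."""
    hard_score = hard_correct - hard_incorrect
    easy_score = easy_correct - easy_incorrect
    return (hard_score + easy_score) / 2


def zscore(values):
    """Standardise scores across participants (sample SD is an assumption)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]
```

      For example, a participant with 10 correct and 2 incorrect hard trials, and 20 correct and 4 incorrect easy trials, scores (8 + 16) / 2 = 12 before standardisation.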

      Point 5: [Public Review] 5) Did the authors consider testing for negative relationships between performance and brain activity, given that there is some literature arguing that neural efficiency (i.e. less activation) is the hallmark of high intelligence (i.e. high performance levels in the Cattell task)? If that were true, at least for some regions, the set of ROIs putatively carrying task-related information could be expanded beyond that examined here. If no such regions were found, it would provide some evidence bearing on the neural efficiency hypothesis.

      No, we did not test for negative relationships between performance and brain activity in this study. However, in Wu et al. (2023) we did specifically test for this, and neither the results reported in section 3.3.1 (i.e., unique relationship between activity and performance) nor those in section 3.3.2 (i.e., age-related relationship between activity and performance) showed the queried direction of effects. Note that the negative effect in section 3.3.2 (Age U Performance) is a suppression effect: it represents a positive relationship between performance and activity that becomes stronger as age is added to the model.

      Point 6: [Recommendations for the authors] 1) Page 26: It is not quite clear how the authors made sure their age and performance covariates functioned as independent regressors in the univariate group-level GLM, given the correlation between age and performance (i.e. shared variance).

      We included age and performance as covariates (of the age x performance effect of interest) by simply including these as independent regressors in the group-level GLM design matrix in addition to the interaction term (i.e., activity ~ age*performance + covariates equivalent to activity ~ age:performance + age + performance + covariates; Wilkinson & Roger 1973 notation), allowing us to examine the unique variance explained by each predictor (Table 1 and Table 2) and to control for their shared variance.

      We should note that while the GLM approach we used accounts for unique and shared effects, it does not explicitly report shared effects in its standard output. To directly examine shared variance, one would need to employ commonality analysis. For reference, results from a commonality analysis on this task have been previously reported in Wu et al. (2023).
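
      The equivalence between the two formula notations mentioned above can be made concrete with a short sketch. It is illustrative only: the helper names and the example values are hypothetical, and it builds a single design-matrix row rather than fitting the GLM.

```python
from itertools import combinations


def expand_crossing(factors):
    """Expand a crossing a*b*... into main effects plus all interactions."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms


def design_row(values, terms):
    """One design-matrix row: intercept, then the product of variables per term."""
    row = [1.0]
    for term in terms:
        product = 1.0
        for name in term.split(":"):
            product *= values[name]
        row.append(product)
    return row


# activity ~ age * performance + sex
terms = expand_crossing(["age", "performance"]) + ["sex"]
# Hypothetical standardised values for one participant.
row = design_row({"age": 1.5, "performance": -0.5, "sex": 1.0}, terms)
```

      Because age, performance and their product each get their own column, the coefficient on each column reflects only the variance that the other columns cannot account for.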

      Prompted by this point, we have made some further minor improvements to help ensure our methodological steps are reproducible, as highlighted below.

      Page-30

      “Continuous age and behavioural performance variables were standardised and treated as linear predictors in multiple regression throughout the behavioural (Figure 1B), wholebrain voxelwise (Figure 1C/2A), univariate (Table 1; Figure 1B/2B) and MVB (Table 2; Figure 3) analyses. Throughout, sex was included as a covariate. The models, including interaction terms, can be described, according to Wilkinson & Roger’s (1973) notation, as activity ~ age * performance + covariates (which is equivalent to activity ~ age:performance + age + performance + covariates), allowing us to examine the unique variance explained by each predictor (Table 1) and to control for their shared variance. For whole-brain voxelwise analyses, clusters were estimated using threshold-free cluster enhancement (TFCE; Smith & Nichols 2009) with 2000 permutations and the resulting images were thresholded at a t-statistic of 1.97 before interpretation. Bonferroni correction was applied to a standard alpha = 0.05 based on the two ROIs (cuneal and frontal) that were examined. For Bayes Factors, interpretation criteria norms were drawn from Jarosz & Wiley (2014).”
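
      As a worked example of the correction described in this passage, the following sketch computes the Bonferroni-adjusted threshold for the two ROIs. The function names are hypothetical and the p-values in the usage note are made up for illustration.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, alpha=0.05, n_tests=2):
    """Compare a p-value against the corrected threshold (two ROIs by default)."""
    return p_value < bonferroni_alpha(alpha, n_tests)


# Two ROIs (cuneal and frontal) give a per-ROI threshold of 0.05 / 2.
threshold = bonferroni_alpha(0.05, 2)
```

      Under this rule an illustrative p-value of 0.03 would not survive correction, whereas 0.01 would.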

      Point 7: [Recommendations for the authors] 2) Figure 3: I suggest changing the subheading in panel B to "Joint vs. MDN-only Model," in line with the wording in the main text.

      The subheading of Figure 3B is updated as suggested to “Joint vs. MDN-only Model”.

      Point 8: [Recommendations for the authors] 3) In Figures 1C and 2A, MNI z coordinates should be added to the section views. The appreciation of Figure 2B could be enhanced by adding some rendering with a saggital (medial and/or lateral) view.

      The slice mosaics in Figure 1C and 2A are now updated with each slice’s MNI Z coordinates and mentioned in the figure descriptions.

      Point 9: [Recommendations for the authors] 4) Page 7 (l. 135): What exactly is meant by "lateral occipital temporal cortex"?

      The text is updated to specify the anatomical landmarks that were used for guidance when referring to activation within the lateral occipital temporal cortex, based on ROI criteria definitions used in Knights, Mansfield et al. (2021):

      Page-7 Line-135:

      “Additional activation was observed bilaterally in the inferior/ventral and lateral occipital temporal cortex (i.e., a cluster around the lateral occipital sulcus that extended anteriorly beyond the anterior occipital sulcus), likely due to the visual nature of the task.”

      Point 10: [Recommendations for the authors] 5) On p18ff. (ll. 259-318) the authors discuss in quite some detail how the age-related decoding boost seen with the cuneus ROI can be functionally explained, but it seems like none of the explanations agrees with all aspects of the results. While this is not a major problem for the paper, it may be advisable if this part of the discussion ends with a clearer statement that this issue is not fully solved yet and provides material for future research.

      A more direct sentence has been added to make it clear that future investigation will be needed to explain the role of the cuneal cortex here.

      Page-20 Line-322:

      “Another possibility is that the age-related increases in fMRI activations (for hard versus easy) in one or both of our ROIs do not reflect greater fMRI signal for hard problems in older than younger people, but rather lower fMRI signal for easy problems in the older. Without a third baseline condition, we cannot distinguish these two possibilities in our data. However, a reduced “baseline” level of fMRI signal (e.g., for easy problems) in older people is consistent with other studies showing an age-related decline in baseline perfusion levels, coupled with preserved capacity of cerebrovascular reactivity to meet metabolic demands of neuronal activity at higher cognitive load  (Calautti et al., 2001; Jennings et al., 2005). Though age-related decline in baseline perfusion occurs in the cuneal cortex (Tsvetanov et al., 2021), the brain regions showing modulation of behaviourally-relevant Cattell fMRI activity by perfusion levels did not include the cuneal cortex (Wu et al., 2021). This suggests that the compensatory effects in the cuneus are unlikely to be explained by age-related hypo-perfusion, consistent with the minimal effect here of adjusting for RSFA (Figure 2C). Overall then, as none of the explanations above agree with all aspects of the results, to functionally explain the role of the cuneal cortex in this task will require further investigation.”

      Point 11: [Recommendations for the authors] 6) The threshold choice for Bayesian log evidence (> 3) should be motivated in some more detail, rather than just pointing to a book reference, as there is no established convention in the field, the choice may depend on the type of data and/or analysis, and a sizeable part of the readership may not be deeply familiar with the particular Bayesian approach used here.

      Text is updated to further clarify our motivation for using the log evidence BF>3 criterion:

      Page-29

      “The outcome measure was the log evidence for each model (Morcom & Henson, 2018; Knights et al., 2021). To test whether activity from an ROI is compensatory, we used an ordinal boost measure (Morcom & Henson, 2018; Knights et al., 2021) to assess the contribution of that ROI for the decoding of task-relevant information (Figure 3B). Specifically, Bayesian model comparison assessed whether a model that contains activity patterns from a compensatory ROI and the MDN (i.e., a joint model) boosted the prediction of task-relevant information relative to a model containing the MDN only. The compensatory hypothesis predicts that the likelihood of a boost to model decoding will increase with older age. The dependent measure, for each participant, was a categorical recoding of the relative model evidence to indicate the outcome of the model comparison. The three possible outcomes were: a boost to model evidence for the joint vs. MDN-only model (difference in log evidence > 3), ambiguous evidence for the two models (difference in log evidence between -3 and 3), or a reduction in evidence for the joint vs. MDN-only model (difference in log evidence < -3). These values were selected because a log difference of three corresponds to a Bayes Factor of approximately 20, which is generally considered strong evidence (Lee & Wagenmakers, 2014). Further, with uniform priors, this chosen criterion (difference in log evidence > 3) corresponds to a p-value of p<~.05 (since the natural logarithm of 20 approximately equals three, as evidence for the alternative hypothesis).”
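
      The three-way recoding of the log-evidence difference can be sketched as below. This is a minimal illustration, not the MVB implementation: the function name and the integer codes (1, 0, -1) are hypothetical choices, and values exactly at the threshold are treated as ambiguous.

```python
import math


def recode_boost(log_ev_joint, log_ev_mdn_only, threshold=3.0):
    """Recode the joint vs. MDN-only comparison into an ordinal outcome.

    Returns 1 for a boost, -1 for a reduction and 0 for ambiguous evidence.
    """
    diff = log_ev_joint - log_ev_mdn_only
    if diff > threshold:
        return 1
    if diff < -threshold:
        return -1
    return 0


# A log-evidence difference of 3 corresponds to a Bayes Factor of exp(3).
bayes_factor_at_threshold = math.exp(3.0)
```

      Evaluating exp(3) gives a value just above 20, which is where the "Bayes Factor of approximately 20" reading of the threshold comes from.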

      Point 12: [Recommendations for the authors] 7) Adding page numbers would be helpful.

      Page numbers have been added to the manuscript file – apologies for this oversight.

      References

      Green, E., Bennett, H., Brayne, C., & Matthews, F. E. (2018). Exploring patterns of response across the lifespan: The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study. BMC Public Health, 18, 1-7.

      Knights, E., Mansfield, C., Tonin, D., Saada, J., Smith, F. W., & Rossit, S. (2021). Hand-selective visual regions represent how to grasp 3D tools: brain decoding during real actions. Journal of Neuroscience, 41(24), 5263-5273.

      Samu, D., Campbell, K. L., Tsvetanov, K. A., Shafto, M. A., & Tyler, L. K. (2017). Preserved cognitive functions with age are determined by domain-dependent shifts in network responsivity. Nature Communications, 8(1), 14743.

      Shafto, M. A., Tyler, L. K., Dixon, M., Taylor, J. R., Rowe, J. B., Cusack, R., ... & Cam-CAN. (2014). The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) study protocol: a cross-sectional, lifespan, multidisciplinary examination of healthy cognitive ageing. BMC Neurology, 14, 1-25.

      Wu, S., Tyler, L. K., Henson, R. N., Rowe, J. B., & Tsvetanov, K. A. (2023). Cerebral blood flow predicts multiple demand network activity and fluid intelligence across the adult lifespan. Neurobiology of Aging, 121, 1-14.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors describe the construction of an extremely large-scale anatomical model of juvenile rat somatosensory cortex (excluding the barrel region), which extends earlier iterations of these models by expanding across multiple interconnected cortical areas. The models are constructed in such a way as to maintain biological detail from a granular scale - for example, individual cell morphologies are maintained, and synaptic connectivity is founded on anatomical contacts. The authors use this model to investigate a variety of properties, from cell-type specific targeting (where the model results are compared to findings from recent large-scale electron microscopy studies) to network metrics. The model is also intended to serve as a platform and resource for the community by being a foundation for simulations of neuronal circuit activity and for additional anatomical studies that rely on the detailed knowledge of cellular identity and connectivity.

      Strengths:

      As the authors point out, the combination of scale and granularity of their model is what makes this study valuable and unique. The comparisons with recent electron microscopy findings are some of the most compelling results presented in the study, showing that certain connectivity patterns can arise directly from the anatomical configuration, while other discrepancies highlight where more selective targeting rules (perhaps based on molecular cues) are likely employed. They also describe intriguing effects of cortical thickness and curvature on circuit connectivity and characterize the magnitude of those effects on different cortical layers.

      The detailed construction of the model is drawn on a wide range of data sources (cellular and synaptic density measures, neuronal morphologies, cellular composition measures, brain geometry, etc.) that are integrated together; other data sources are used for comparison and validation. This consolidation and comparison also represent a valuable contribution to the overall understanding of the modeled system.

      We thank the reviewer for the kind comments.

      Weaknesses:

      The scale of the model, which is a primary strength, also can carry some drawbacks. In order to integrate all the diverse data sources together, many specific decisions must be made about, for example, translating findings from different species or regions to the modeled system, or deciding which aspects of the system can be assumed to be the same and which should vary. All these decisions will have effects on the predicted results from the model, which could limit the types of conclusions that can be made (both by the authors and by others in the community who may wish to use the model for their own work).

      We agree that this is a downside of the principle of biophysically detailed modeling that is best addressed by continuous refinement in collaboration with the community. We would like to once again invite any interested party to participate in this process.

      As an example, while it is interesting that broad brain geometry has effects on network structure (Figure 7), it is not clear how those effects are actually manifested. I am not sure if some of the effects could be due to the way the model is constructed - perhaps there may be limited sets of morphologies that fit into columns of particular thicknesses, and those morphologies may have certain idiosyncrasies that could produce different statistics of connectivities where they are heavily used. That may be true to biology, but it may also be somewhat artifactual if, for example, the only neurons in the library that fit into that particular part of the cortex differ from the typical neurons that are actually found in that region (but may not have been part of the morphological sampling).

      We agree that the limited pool of morphological reconstructions can lead to artifactual results in the way the reviewer pointed out. To investigate that hypothesis, we added a supplementary figure (S14) where we characterize (1) to what degree the morphological composition of a columnar subvolume reflects the overall composition of the model; and (2) the level of morphological diversity in each columnar subvolume. We discuss the results at the end of section 2.6. Briefly, while we cannot fully rule out the possibility of an artificial result, we found a high and virtually uniform level of morphological diversity in all columns and layers. This makes it unlikely that individual idiosyncratic morphologies strongly affect the local connectivity. However, we acknowledge that the minimum level of morphological diversity required is unknown. We believe that at this stage all we can do is characterize this and leave final interpretation to the reader.

      I also wonder how much the assumption that the layers have the same relative thicknesses everywhere in the cortex affects these findings, since layer thicknesses do in fact vary across the cortex.

      We agree that layer thickness variation would affect circuit properties. Variability of layer thickness can be split into two components: variability stemming from differences in total thickness, which our model covers, and variability of relative, i.e., normalized, layer thickness, which we miss. In this region of cortex, though, data on the relative thickness of cortical layers are sparse. The Waxholm Atlas does not distinguish somatosensory cortical layers in its labels [Kleven et al, 2023]. Yusufoğulları (2015) compares layer thicknesses of rat hindlimb and barrel field regions. After normalization against total thickness, the relative difference increased towards the superficial layers from 0 in L6 to 33% in L1. Normalized thicknesses within developed rat barrel cortex, based on layer boundaries reported in Narayanan et al. (2017), vary by 2% to 5% over approximately 2 mm. One major effect of such variability would be to scale the number of neurons in a given layer locally by the corresponding factors. For comparison, the resulting variability in neuron counts due to differences in conicality (Fig. 7D1) was around ±25%. A further effect of variable relative layer thickness would be its impact on the selection of suitable morphologies to be placed in the volume.
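
      The decomposition and the scaling argument above can be illustrated with a small sketch. It assumes that neuron density and column cross-section stay constant, so that the neuron count of a layer is proportional to its thickness; the function names, the baseline count, the total thickness and the layer fraction are hypothetical, while the 2%, 5% and 33% changes are the figures quoted in the reply.

```python
def layer_thickness(total_thickness_um, relative_fraction):
    """Absolute layer thickness = total cortical thickness x normalized fraction."""
    return total_thickness_um * relative_fraction


def scaled_neuron_count(baseline_count, relative_thickness_change):
    """Neuron count after a relative change in layer thickness.

    Assumes constant density and cross-section, so count scales with thickness.
    """
    return baseline_count * (1.0 + relative_thickness_change)


# Hypothetical layer: 10% of a 2000 um thick cortex, holding 1000 neurons.
thickness = layer_thickness(2000.0, 0.10)
counts = {change: scaled_neuron_count(1000, change) for change in (0.02, 0.05, 0.33)}
```

      Under these assumptions, a 2% to 5% change in normalized thickness shifts local neuron counts by less than the roughly ±25% attributed to conicality, whereas a 33% change would exceed it.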

      In summary, adjustment of layer thickness is a refinement which should be done in future versions of the model, once more data is available. The discussion section has been updated to acknowledge this limitation. However, as outlined at the beginning of this point-by-point reply, we will not conduct such updates to the model in the context of this manuscript, as it describes the version of the model used for a number of follow-up studies.

      In addition, the complexity of the model means that some complicated analyses and decisions are only presented in this manuscript with perhaps a single panel and not much textual explanation. I find, for example, that the panels of Figure S2 seem to abstract or simplify many details to the point where I am not clear about what they are actually illustrating - how does Figure S2D represent the results of "the process illustrated in B"? Why are there abrupt changes in connectivity at region borders (shown as discontinuous colors), when dendrites and axons span those borders and so would imply interconnectivity across the borders? What do the histograms in E1 and E2 portray, and how are they related to each other?

      We apologize for the confusion. We have updated the figure caption of Figure S2 to better explain its contents.

      Overall, the model presented in this study represents an enormous amount of work and stands as a unique resource for the community, but also is made somewhat unwieldy for the community to employ due to the weight of its manifold specific construction decisions, size, and complexity.

      Reviewer #2 (Public Review):

      Summary:

      The authors build a colossal anatomical model of juvenile rat non-barrel primary somatosensory cortex, including inputs from the thalamus. This enhances past models by incorporating information on the shape of the cortex and estimated densities of various types of excitatory and inhibitory neurons across layers. This is intended to enable an analysis of the micro- and mesoscopic organisation of cortical connectivity and to be a base anatomical model for large-scale simulations of physiology.

      Strengths:

      • The authors incorporate many diverse data sources on morphology and connectivity.

      • This paper takes on the challenging task of linking micro- and mesoscale connectivity.

      • By building in the shape of the cortex, the authors were able to link cortical geometry to connectivity. In particular, they make an unexpected prediction that cortical conicality affects the modularity of local connectivity, which should be testable.

      • The authors' analysis of the model led to the interesting prediction that layer 5 neurons connect local modules, which may be testable in the future, and provide a basis to link from detailed anatomy to functional computations.

      • The visualisation of the anatomy in various forms is excellent.

      • A subnetwork of the model is openly shared (but see question below).

      We thank the reviewer for their kind comments.

      Weaknesses:

      • Why was non-barrel S1 of the juvenile rat cortex selected as the target for this huge modelling effort? This is not explained.

      We have added an explanation of this decision to the third paragraph of the introduction.

      • There is no effort to determine how specific or generalisable the findings here are to other parts of the cortex. Although there is a link to physiological modelling in another paper, there is no clear pathway to go from this type of model to understand how the specific function of the modelled areas may emerge here (and not in other cortical areas).

      With respect to the generality versus specificity of the findings, our philosophy is as follows: Although most of our source data come from juvenile rat somatosensory cortex, we also had to generalize many data sources across organisms, ages or regions. Hence, in this iteration we focused on investigating the general features of the (multi-region) mammalian cortex, e.g., high-order motifs, connected by L5 neurons across subregions, or the effect of curvature on the connectivity. In the future, more specific data sources can be used to build diverging versions of the model, e.g. one for adult vs. juvenile rat. They can then be used to contrast the ages and focus on more specific findings. We already defined a number of structural metrics that can be used to contrast more specific versions of the model quantitatively.

      We now clarify this pathway to understanding more specific function in the last paragraph of the discussion.

      • In a few places the manuscript could be improved by being more specific in the language, for example:

      - "our anatomy-based approach has been shown to be powerful", I would prefer instead to read about specific contributions of past papers to the field, and how this builds on them.

      - similarly: "ensuring that the total number of synapses in a region-to-region pathway matches biology." Biology here is a loose term and implies too much confidence in the matching to some ground truth. Please instead describe the source of the data, including the type of experiment.

      We have removed or rewritten the mentioned parts. We now clarify that we work based on biological estimates from experiments and cite the experiment sources. We also provide brief descriptions of the types of data and how they were derived.

      • Some of the decisions seem a little ad-hoc, and the means to assess those decisions are not always available to the reader e.g.

      - pg. 10. "Based on these results, we decided that the local connectome sufficed to model connectivity within a region.". What is the basis for this decision? Can it be formalised?

      - "In the remaining layers the results of the objective classification were used to validate the class assignments of individual pyramidal cells. We found the objective classification to match the expert classification closely (i.e., for 80-90% of the morphologies). Consequently, we considered the expert classification to be sufficiently accurate to build the model." The description of the validation is a little informal. How many experts were there? What are their initials? Was inter-rater or intra-rater reliability assessed? What are these numbers? The match with Kanari's classification accuracy should be reported exactly. There are clearly experts among the author list, but we are all fallible without good controls in place, and they should be more explicit about those controls here, in my opinion.

      - "Morphology selection was then performed as previously (Markram et al., 2015), that is, a morphology was selected randomly from the top 10% scorers for a given position." A lot of the decisions seem a little ad-hoc, without justification other than this group had previously done the same thing. For example, why 10% here? Shouldn't this be based on selecting from all of the reasonable morphologies?

      We have clarified that the density of local connectivity is verified against the validation datasets by comparing the diagonals in Figure 4B, in addition to the quantification of Figure 4C.

      For the classification, we have now published a detailed preprint describing the objective confirmation of expert classification by a variety of methods (see Kanari et al. 2024 https://www.biorxiv.org/content/10.1101/2024.09.13.612635v1). We cannot include the full methodology in the current paper, due to its large extent. For the benefit of the reader, we have included the appropriate citation and extended the short description of the methodology. As described in that preprint, the classification accuracy varies per layer, cell type, etc. We now describe these results in more detail in the manuscript; the full details can be accessed in our preprint.
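
      The morphology selection step questioned by the reviewer ("a morphology was selected randomly from the top 10% scorers for a given position") can be sketched as follows. This is a hypothetical illustration, not the model's placement code: the score dictionary, the function name and the rule of always keeping at least one candidate are assumptions.

```python
import random


def select_morphology(scores, top_fraction=0.10, rng=None):
    """Pick one morphology at random from the best-scoring fraction.

    `scores` maps a morphology name to its placement score at one position
    (higher is better). At least one candidate is always kept.
    """
    rng = rng or random.Random()
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return rng.choice(ranked[:n_top])


# Hypothetical scores for ten candidate morphologies at one position.
example_scores = {f"morph_{i}": float(i) for i in range(10)}
chosen = select_morphology(example_scores, rng=random.Random(0))
```

      Sampling from a top fraction, rather than always taking the single best scorer, keeps some morphological variety among neighbouring positions; the choice of 10% itself is the parameter the reviewer asks the authors to justify.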

      • I would like to know if one of the key results relating to modularity and cortical geometry can be further explored. In particular, there seem to be sharp changes in the data at the end of the modelled cortical regions, which need to be explored or explained further.

      We now explore these results further in supplementary figure S15, which we discuss in the results Section 2.6.

      • The shape of the juvenile cortex - a key novelty of this work - was based on merely a scalar reduction of the adult cortex. This is very surprising, and surely an oversimplification. Huge efforts have gone into modelling the complex nonlinear development of the cortex, by teams including the developing Human Connectome Project. For such a fundamental aspect of this work, why isn't it possible to reconstruct the shape of this relatively small part of the juvenile rat cortex?

      We agree that a more complex approach should be used in the future. However, as outlined at the beginning of this point-by-point reply, we will not conduct such updates to the model in the context of this manuscript, as it describes the version of the model used for a number of follow-up studies.

      • The same relative laminar depths are used for all subregions. This will have a large impact on the model. However, relative laminar depths can change drastically across the cortex (see e.g. many papers by Palomero-Gallagher, Zilles, and colleagues). The authors should incorporate the real laminar depths, or, failing that, show evidence to show that the laminar depth differences across the subregions included in the model are negligible.

      This point has also been raised by reviewer #1 above. For convenience, we repeat our reply below.

      We agree that layer thickness variation would affect circuit properties. Variability of layer thickness can be split into two components: variability stemming from differences in total thickness, which our model covers, and variability of relative, i.e., normalized, layer thickness, which we miss. In this region of cortex, though, data on the relative thickness of cortical layers are sparse. The Waxholm Atlas does not distinguish somatosensory cortical layers in its labels [Kleven et al, 2023]. Yusufoğulları (2015) compares layer thicknesses of rat hindlimb and barrel field regions. After normalization against total thickness, the relative difference increased towards the superficial layers from 0 in L6 to 33% in L1. Normalized thicknesses within developed rat barrel cortex, based on layer boundaries reported in Narayanan et al. (2017), vary by 2% to 5% over approximately 2 mm. One major effect of such variability would be to scale the number of neurons in a given layer locally by the corresponding factors. For comparison, the resulting variability in neuron counts due to differences in conicality (Fig. 7D1) was around ±25%. A further effect of variable relative layer thickness would be its impact on the selection of suitable morphologies to be placed in the volume.

      In summary, adjustment of layer thickness is a refinement which should be done in future versions of the model, once more data is available. The discussion section has been updated to acknowledge this limitation. However, as outlined at the beginning of this point-by-point reply, we will not conduct such updates to the model in the context of this manuscript, as it describes the version of the model used for a number of follow-up studies.

      • The authors perform an affine mapping between mouse and rat cortex. This is again surprising. In human imaging, affine mappings are insufficient to map between two individual brains of the same species and nonlinear transformations are instead used. That an affine transformation should be considered sufficient to map between two different species is then very surprising. For some models, this may be fine, but there is a supposed emphasis here on biological precision in terms of anatomical location.

      We agree that this is a weakness that we will address in future revisions of the model.

      • One of the most interesting conclusions, that the connectivity pattern observed is in part due to cooperative synapse formation, is based on analyses that are unfortunately not shown.

      We originally decided not to show this part as we underestimated the interest in this particular result. We have now included the result in supplementary figure S10 and discuss the figure in the results.

      • Open code:

      - Why is only a subvolume available to the community?

      We have now made the entire model available under doi.org/10.7910/DVN/HISHXN. The Data and Code availability section has been updated to clarify this.

      - Live nature of the model. This is such a colossal model, and effort, that I worry that it may be quite difficult to update in light of new data. For example, how much person and computer time would it take to update the model to account for different layer sizes across subregions? Or to more precisely account for the shape of the juvenile rat cortex?

      To provide more information to people interested in participating in model refinements, we have added a new Figure 9. We discuss potential opportunities for refinement at the end of the discussion section.

      Reviewer #3 (Public Review):

      This manuscript reports a detailed model of the rat non-barrel somatosensory cortex, consisting of 4.2 million morphologically and biophysically detailed neuron models, arranged in space and connected according to highly sophisticated rules informed by diverse experimental data. Due to its breadth and sophistication, the model will undoubtedly be of interest to the community, and the reporting of anatomical details of modeling in this paper is important for understanding all the assumptions and procedures involved in constructing the model. While a useful contribution to this field, the model and the manuscript could be improved by employing data more directly and comparing simple features of the model's connectivity - in particular, connection probabilities - with relevant experimental data.

      The manuscript is well-written overall but contains a substantial number of confusing or unclear statements, and some important information is not provided.

      Below, major concerns are listed, followed by more specific but still important issues.

      Major issues

      (1) Cortical connectivity.

      Section 2.3, "Local, mid-range and extrinsic connectivity modeled separately", and Figure 4: I am confused about what is done here and why. The authors have target data for connectivity (Figure 4B1). But then they use an apposition-based algorithm that results in connectivity that is quite different from the data (Figure 4B2, C). They then use a correction based on the data (Figure 4E) to arrive at a more realistic connectivity. Why not set the connectivity based on the data right away then? That would seem like a more straightforward approach.

      We have completely re-written our description and discussion of connectivity in the model. We now more explicitly motivate our connectivity modeling choices in the first paragraph of section 2.3 of the results and in the second paragraph of the discussion.

      The same comment applies to Section 2.4., "Specificity of axonal targeting": the distributions of synapses on different types of target cell compartments were not well captured by the original model based on axon-dendrite overlap and pruning, so the authors introduced further pruning to match data specificity. While details of this process and what worked and what didn't may be interesting to some, overall it is not surprising, as it has been well known that cell types exhibit connectivity that is much more specific than "Peters rule" or its simple variations. The question is, since one has the data, why not use the data in the first place to set up the connectivity, instead of using the convoluted process of employing axon-dendrite overlap followed by multiple corrections?

      We would like to point out that we are not employing “Peters rule”; we now make this explicit in the first paragraph of section 2.3 of the results. Furthermore, we would argue that the match to the Motta et al. data indicates that our approach is more than just a “simple variation”. Finally, we believe that there is important insight in (1) the specific ways in which the algorithm had to be changed to match the Schneider-Mizell data, e.g. that the connectivity of SST positive neurons did not have to be adapted at all, and (2) the fact that the specificity of the other two types could still be matched by selecting a subset of axonal appositions (i.e., of potential synapses).

      Most importantly, what is missing from the whole paper is the characterization of connection probabilities, at least for the local circuit within one area. Such connection probabilities can be obtained from the data that the authors already use here, such as the MICRONS dataset. Another good source of such data is Campagnola et al., Science, 2022. Both datasets are for mouse V1, but they provide a comprehensive characterization across all cortical layers, thus offering a good benchmark for comparison of the model with the data. It would be important for the authors to show how connection probabilities realized in their model for different cell types compared to these data.

      We now report connection probabilities in the reworked figure 4 and compare them to reported connection probabilities from many different sources and labs in supplementary figure S8. We prefer a comparison to a wide range of sources to relying on a single report.
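
      For readers who want to reproduce this kind of comparison, the quantity under discussion can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `connection_probabilities`, the dense boolean adjacency matrix, and the toy labels are all assumptions made for the example. It counts connected ordered (pre, post) pairs per cell-type pathway and divides by the number of possible pairs, excluding self-pairs. Experimental estimates are usually restricted to pairs within a given inter-somatic distance, which this sketch does not model.

```python
import numpy as np


def connection_probabilities(adj, labels):
    """Per-pathway connection probability (hypothetical helper).

    adj[i, j] is True if neuron i (pre) connects to neuron j (post).
    labels[i] is the cell type of neuron i.
    Returns (types, probs), where probs[a, b] is the fraction of ordered
    pairs from type a to type b that are connected; NaN if no pairs exist.
    """
    adj = np.asarray(adj, dtype=bool)
    labels = np.asarray(labels)
    types = np.unique(labels)
    probs = np.full((len(types), len(types)), np.nan)
    for a, pre in enumerate(types):
        pre_idx = np.flatnonzero(labels == pre)
        for b, post in enumerate(types):
            post_idx = np.flatnonzero(labels == post)
            n_pairs = len(pre_idx) * len(post_idx)
            n_connected = adj[np.ix_(pre_idx, post_idx)].sum()
            if pre == post:
                # Exclude self-pairs (autapses) from both counts.
                n_pairs -= len(pre_idx)
                n_connected -= adj[pre_idx, pre_idx].sum()
            if n_pairs > 0:
                probs[a, b] = n_connected / n_pairs
    return types, probs


# Toy example: two excitatory cells ("E") and one inhibitory cell ("I").
toy_adj = [[0, 1, 1],
           [0, 0, 0],
           [1, 0, 0]]
toy_labels = ["E", "E", "I"]
toy_types, toy_probs = connection_probabilities(toy_adj, toy_labels)
```

      For a connectome with millions of neurons, a sparse matrix and distance binning would be needed; the dense version above is only meant to make the definition concrete.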

      (2) Section 2.5, "Structure of thalamic inputs" and Figure 6.

      The text in section 2.5 should provide more details on what was done - namely, that the thalamic axons were generated based on the axon density profiles and then synapses were established based on their overlap with cortical dendrites. Figure S10, where the target axon densities from data and the model axon densities are compared, is not even mentioned here. Now, Figure S10 only shows that the axon densities were generated in a way that matches the data reasonably well. However, how can we know that it results in connectivity that agrees with data? Are there data sources that can be used for that purpose? For example, the authors show that in their model "the peaks of the mean number of thalamic inputs per neuron occur at lower depths than the peaks of the synaptic density". Is this prediction of the model consistent with any available data?

      Most importantly, the authors should show how the different cell types in their model are targeted by the thalamic inputs in each layer. Experimental studies have been done suggesting specificity in targeting of interneuron types by thalamic axons, such as PV cells being targeted strongly whereas SST and VIP cells being targeted less.

      We have updated the Results section to provide context for the thalamic axon placement, and referred the reader to the methods for more detail. A reference to Figure S10 has now been added to this section as well.

      As for validations of the structure of the thalamo-cortical inputs: We found that the existing literature on the topic, such as Cruikshank et al., 2007, 2010 and more recently Sermet et al., 2019, is predominantly on the physiological strengths of the pathways. We acknowledge that the authors of these studies provide compelling arguments that their findings are likely partially due to differences in the anatomical innervation strengths. On the other hand, Sporns, 2013 cautioned against mixing up structural and functional connectivity. Overall, we believe that it is simply cleaner to perform this validation in the accompanying manuscript (“Part II: Physiology and Experimentation”), using the full physiological model. Note that we have actually performed that validation in the manuscript (see preprint under the following doi: 10.1101/2023.05.17.541168, Figure 3H1).

      Note that a higher physiological strength onto PV+ neurons is observed.

      (3) "We have therefore made not only the model but also most of our tool chain openly available to the public (Figure 1; step 7)."

      In fact it is not the whole model that is made publicly available, but only about 5% of it (211,000 out of 4,200,000 neurons). Also, why is "most" of the tool chain made openly available, and not the whole tool chain?

      We have now made the entire model available under doi.org/10.7910/DVN/HISHXN. This has also been added to the Key resource table.

      With regard to the tool chain, everything is on our public github (https://github.com/BlueBrain/) except for the algorithm for detecting axonal appositions. For that tool there are currently unresolved potential copyright issues with former collaboration partners. We are working to resolve them.

      Other issues

      "At each soma location, a reconstruction of the corresponding m-type was chosen based on the size and shape of its dendritic and axonal trees (Figure S6). Additionally, it was rotated according to the orientation towards the cortical surface at that point."

      After this procedure, were cells additionally rotated around the white matter-pia axis? If yes, then how much and randomly or not? If not, then why not? Such rotations would seem important because otherwise additional order potentially not present in the real cortex is introduced in the model affecting connectivity and possibly also in vivo physiology (such as the dynamics of the extracellular electric field).

      They are indeed additionally randomly rotated. We have clarified this in the revision.

      The term "new in vivo reconstructions" for the 58 neurons used in this paper in addition to "in vitro reconstructions" is a misnomer. It is not straightforward to see where the procedure is described, but then one finds that the part of Methods that describes experimental manipulations is mostly about that (so, a clearer pointer to that part of Methods could be useful). However, the description in Methods makes it clear that it is only labeling that is done in vivo; the microscopy and reconstruction are done subsequently in vitro. I would recommend changing the terminology here, as it is confusing. Also, can the authors show reconstructions of these neurons in the supplementary figures? Is the reconstruction shown in Figure 4A representative?

      The term is used because the staining is done in vivo. To the best of our knowledge, the reconstruction process cannot be performed in vivo. However, to avoid any confusion, we have modified the text to refer to these reconstructions as “in vivo stained”.

      With respect to the reconstruction in Figure 4: The intent of the panel is to demonstrate the concept of targeted long-range axons that our morphologies are missing, necessitating the use of a second algorithm for longer-range connectivity. As such, it is not one of the reconstructions we used, but one of Janelia MouseLight. While we mentioned MouseLight in the figure caption, we formulated it in a way that could be misunderstood to mean that we merely used the MouseLight browser to render one of our morphologies. We apologize for the confusion, and we have fixed the figure caption.

      In this revision we have added exemplars of representative morphology reconstructions (in slice stained and in vivo stained) in a new supplementary figure, as requested (Figure S5). It is referenced in the last paragraph of section 2.1.

      In the Discussion, "This was taken into account during the modeling of the anatomical composition, e.g. by using three-dimensional, layer-specific neuron density profiles that match biological measurements, and by ensuring the biologically correct orientation of model neurons with respect to the orientation towards the cortical surface. As local connectivity was derived from axo-dendritic appositions in the anatomical model, it was strongly affected by these aspects.

      However, this approach alone was insufficient at the large spatial scale of the model, as it was limited to connections at distances below 1000μm."

      As mentioned above, it is not clear that this approach was sufficient for local connectivity either. It would be great if the authors showed a systematic comparison of local connection probabilities between different cell types in their model with experimental data and commented here in the Discussion about how well the model agrees with the data.

      As mentioned in the reply to a previous comment, we now report connection probabilities.

      In the Discussion: "The combined connectome therefore captures important correlations at that level, such as slender-tufted layer 5 PCs sending strong non-local cortico-cortical connections, but thick-tufted layer 5 PCs not." (Also the corresponding findings in Results.)

      If I understand this statement correctly, it may not agree with biological data. See analysis from MICRONS dataset in Bodor et al., https://www.biorxiv.org/content/10.1101/2023.10.18.562531v1.

      Our statement was indeed misleading and formulated too strongly. While thick-tufted pyramidal cells do form long-range intra-cortical connections, the structural strength of these pathways is weaker than for slender-tufted PCs, which are associated with the IT (intra-telencephalic) projection type. We have made this clear in the revision.

      Table 2 is confusing. What do pluses and minuses mean? What does it mean that some entries have two pluses? This table is not mentioned anywhere else in the text. If pluses mean some meaningful predictions of the model, then their distribution in the table seems quite liberal and arbitrary. It is not clear to me that the model makes that many predictions, especially for type-specificity and plasticity. Also, why is the hippocampus mentioned in this table? I don't see anything about the hippocampus anywhere else in the paper.

      We have clarified the description of the table in its caption and removed references to hippocampus, which were left from an earlier draft of the paper.

      In the Discussion, "Thus, we made the tools to improve our model also openly available (see Data and Code availability section)."

      As mentioned before, the authors themselves write that they made "most of our tool chain openly available to the public", but not all of it.

      With regard to the tool chain, everything is on our public github (https://github.com/BlueBrain/) except for the algorithm for detecting axonal appositions. For that tool there are currently unresolved potential copyright issues with former collaboration partners. We are working to resolve them.

      Table S2 has multiple question marks. It is not clear whether the "predictions" listed in that table are truly well-thought-out and/or whether experimental confirmations are real.

      Some of the citations in that table were broken due to technical difficulties with the citation manager used. We apologize and have fixed this in the revision.

      Introduction: It would be quite appropriate to cite here Einevoll et al., Neuron, 2019 ("The Scientific Case for Brain Simulations").

      We now reference this important work.

      Recommendations for the authors:

      Reviewing Editor's note:

      Consultation with the reviewers highlighted three main issues: the integration of connection probability profiles, non-uniform cortical thickness, and the overall organization of the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Apart from the points discussed in the public review, my main concern is that the manuscript itself is not as tightly constructed as it should be, to the detriment of the reader's ability to understand the model itself and the conclusions from the presented analyses.

      There are places where the text references seemingly incorrect figure panels or refers to panels that don't exist:

      - Section 2.2, first paragraph - refers to Figure 2D, E but those panels do not exist in Figure 2.

      - Section 2.2, second paragraph - refers to Figure 3D3 - perhaps it should be 3B3?

      - Section 2.8, first paragraph - has no figure references but seems like it should be referring to parts of Figure 8 (perhaps Figure 8B1 specifically?)

      - Is the reference to Figure S11A on page 16 supposed to be to S12A?

      In other places, figure labels and descriptions are not clear, and terminology is not always well-defined or explained.

      - Figure 8 and the associated section 2.8 are very difficult to draw conclusions from as presented - several of the terms used are opaque and not clearly defined in the text or legends. I could not easily infer how the normalization works for the "normalized node participation per layer", or what "position in simplex" means for "unique neurons in core", and what their "relative counts" are relative to.

      - Are "targets" in Figure S12A the same as "sinks"? If so, it would be better to use a single term consistently throughout.

      - Figure S12 - figures in part B do not have enough labels to interpret - what is the y-axis of the "rich-club analysis" graph? Also, the figures in part B bottom are labeled "long-range" rather than "mid-range" connections.

      In general, I found the use of both letters and numbers for figure panels (e.g. Figure 7E1) more confusing than helpful - it didn't seem like panels with the same letter were visually grouped consistently, and it sometimes made it more difficult to follow the flow of a figure. I would recommend using only letters in nearly every case here.

      We thank the reviewer for directing our attention to these issues. We have fixed them in the revision. However, we have decided to keep our original panel numbering scheme. Panels with the same letter are meant to be conceptually grouped as they address related or similar measures.

      Other minor points:

      - Section 2.4 - paragraph 2 - sentence 5 "inhbititory" -> "inhibitory".

      - Figure 5B figure legend - references Schneider-Mizell et al. 2023 but probably should be Motta et al. 2019?

      - Figure 5C - figure key "expcected" -> "expected".

      - The lower part of Figure 7C looks like it belongs to panel D2 instead of panel C due to relative spacing.

      We once again thank the reviewer, and we have fixed the listed issues.

      Reviewer #2 (Recommendations For The Authors):

      (1) Abstract:

      - Is it really 'integrating whole brain-scale data'? This seems a bit misleading.

      - "We delineated the limits of determining connectivity from anatomy" - here I think you mean determining connectivity from morphology, or dendrite/axon appositions. Electron microscopy is still anatomy and presumably would be much closer to function.

      We originally used the term “anatomy” as connectivity depends on the correct placement of neurons in addition to their morphology. However, as the reviewer points out, this term is misleading as it would encompass electron microscopy, which can go beyond what we do with the model. We have updated the text to read “morphology and placement”.

      (2) Introduction:

      "Investigating the multi-scale interactions that shape perception requires a model of multiple cortical subregions with inter-region connectivity, but it also requires the subcellular resolution provided by a morphologically detailed model." - This statement, as written, is not true in my opinion. You can argue for the value of morphologically-detailed neuron models to the study of perception, but they are not required for the investigation of perception.

      We have updated the text to be clearer: subcellular resolution is only required for certain aspects that are related to perception.

      (3) Results:

      - Pg. 9/10. There are three sentences in a row that are of the style: "ensuring that the total number of synapses in a region-to-region pathway matches biology." Biology here is a loose term and implies too much confidence in the matching to some ground truth. Please instead describe the source of the data, including the type of experiment here already.

      - Pg. 10. On the first read, I found it quite hard to follow what exactly was done in Figure 4. What are the target values adapted from Reimann et al., 2019, for example?

      - Pg. 10. "Based on these results, we decided that the local connectome sufficed to model connectivity within a region." What is the basis for this decision? Can it be formalised?

      - Pg. 16, Figure 7 B-C. The apparent effect of geometry on modularity is potentially very interesting. However, are the sharp drop-offs in values for modularity (but also conicality and height) true, or are some artefacts due to columns at the edges of the sampled area?

      We have discussed these points above in the general comments and strengths and weaknesses.

      - Pg. 18. Simplicial cores define central subnetworks, tied together by mid-range connections. This work, in particular leading to the conclusion of the layer 5 highway hubs, stands out as being a successful attempt to simplify the highly detailed model to a degree that it generates useable new understanding.

      We thank the reviewer for the kind comment.

      (4) Figures:

      - Figure 2: The caption doesn't seem to match the Figure (e.g. there are no brain regions depicted in A).

      - Figure 4f. This is a key panel, but is squished into a small corner of Figure 4, and therefore hard-to-read.

      We have fixed this in the revision.

      Reviewer #3 (Recommendations For The Authors):

      In Major comments, point (1) discusses the issue of connectivity known from data. For all the aspects of connectivity mentioned there, I would recommend the authors re-build their model using the connectivity data directly. It would be interesting to test whether a model constructed in such a way would have any difference in simulated neural activity relative to the model they have constructed.

      This is indeed a very interesting avenue of research. However, we believe that it is best conducted in separate manuscripts. First, in Pokorny et al., 2024 (https://doi.org/10.1101/2024.05.24.593860) we conduct this investigation, comparing the emerging activity in the model to the one for simpler connectivity models. Additionally, in Egas-Santander et al., 2024 (https://www.biorxiv.org/content/10.1101/2024.03.15.585196v3) we found that simpler connectomes lead to less reliable spiking activity globally. Finally, in the accompanying manuscript (https://www.biorxiv.org/content/10.1101/2023.05.17.541168v5) we compare activity with and without the targeting specificity of Schneider-Mizell et al.

      In Major comments, point (2) discusses thalamic inputs. I would recommend the authors to address the issues mentioned there.

      We have replied to those comments above.

      In addition, panels F and G of Figure 6 are mentioned in the caption but are not shown in the figure. In panel B, the choice of visualization is strange. It would make sense to show box plots for all the data instead of bars for mean values and points for randomly selected 50 cells. Panels E1 and E2 lack units.

      We have removed mentions of panels F and G and changed the style of plot. Units for E1 and E2 are now explained in the figure caption.

      In Major comments, point (3) touches upon model and tool sharing. I would recommend making such statements more accurate and reflecting what exactly is provided to the community since not everything is shared.

      We have now made the entire model available under doi.org/10.7910/DVN/HISHXN.

      With regard to the tool chain, everything is on our public github (https://github.com/BlueBrain/) except for the algorithm for detecting axonal appositions. For that tool there are currently unresolved potential copyright issues with former collaboration partners. We are working to resolve them.

      I would recommend the authors address all the other points mentioned in the public review as well. In addition, below are some smaller issues that should be fixed.

      Figure 2: the caption appears to be partially wrong and partially misassigned to the figure panels.

      We fixed the issue.

      Also, note that in L6 the types L6_TPC:A and L6_TPC:C are listed in the figure, but L6_TPC:B is not mentioned.

      There is indeed no TPC:B type in layer 6. The distinction between TPC:A and TPC:B is based on early or late bifurcations of the apical dendrite and is only observed in layer 5.

      Figure 3, panel B2: the caption refers to colors in panel (C), but the authors probably meant to refer to panel (A).

      We fixed the issue.

      "The placement of morphological reconstructions matched expectation, showing an appropriately layered structure with only small parts of neurites leaving the modeled volume (Figure 2D, E)."

      Figure 2 does not have panels D and E.

      "The volume was clearly dominated by dendrites, filling between 23% and 47% of the space, compared to 2% to 11% for axons (Figure 3D3)." There is no panel D or D3 in Figure 3.

      "Recently, the MICrONS dataset (MICrONS-Consortium et al., 2021) has been analyzed with respect to the axonal targeting of inhibitory subtypes in a 100 x 100 μm subvolume spanning all layers (Schneider-Mizell et al., 2023)."

      100 x 100 μm is an area (and should be 100 x 100 μm^2), not a volume.

      Figure S11B requires a legend for the color map.

      We fixed the issues.

      Table S1: What is the difference between L6_BP and L6_BPC? They both are referred to as L6 bipolar cells.

      We have changed the description of L6_BPC to “Layer 6 bitufted pyramidal cell”.

    1. Author response:

      Reviewer #1:

      We sincerely thank you for your thoughtful review and constructive comments on our work and we appreciate your positive assessment of our study’s innovative design, which allows for improved observation of 3D cell spheroids from an additional lateral view. Your comments underscore the importance of our approach in advancing methods for investigating cell behaviors in tumor organoid studies.

      In response to your suggestions, we will first add a detailed image of the ‘First surface mirror’ in Fig. 1 to provide a reference for readers and other researchers, thereby facilitating broader use of this method in similar observations. Regarding the suitable sample sizes for this device, as the spheroid sizes are relatively small compared to the mirror and culture dish, we have been able to image samples up to 5 mm in height, which provides ample capacity for most spheroids under 1 mm. We will include additional experiments and explanations in the manuscript to clarify this further.

      Concerning the ring-shaped seeding pattern of spheroids, we have conducted extensive culture experiments to optimize this method. The agarose microwells-based method has proven to be highly tolerant of variations. Within these microwells, cells have a propensity to self-aggregate, leading to the formation of spheroid structures. We will add a discussion in the revised manuscript to address this issue.

      Lastly, this device can accommodate the fluorescence imaging of 3D spheroid samples. We will supplement the discussion with a schematic illustrating the principles of fluorescence imaging using this device, providing a foundation for future work in this area. We will also make language improvements to enhance the overall quality of the manuscript.

      Thank you once again for your valuable insights, which have greatly contributed to the strengthening of our manuscript.

      Reviewer #2:

      We sincerely thank you for your detailed and supportive review of our manuscript. Your recognition of our system’s capabilities for in situ observation of 3D structures along multiple axes, as well as its potential applications in studying therapeutic effects, is highly encouraging. Your comments on the advantages of this system for analyzing cell migration, morphological changes, and responses to therapeutic agents are especially appreciated.

      Thank you again for your thoughtful feedback and for highlighting the contributions of our work. Your insights have been invaluable in refining the focus and clarity of our study, and we hope that our revisions meet your expectations.

    1. Author response:

      Public reviews:

      Reviewer #1:

      Epigenetic regulation complex (PRC2) is essential for neural crest specification, and its misregulation has been shown to cause severe craniofacial defects. This study shows that Eed, a core PRC2 component, is critical for craniofacial osteoblast differentiation and mesenchymal proliferation after neural crest induction. Using mouse genetics and single-cell RNA sequencing, the researcher found that conditional knockout of Eed leads to significant craniofacial hypoplasia, impaired osteogenesis, and reduced proliferation of mesenchymal cells in post-migratory neural crest populations.

      Overall, the study is superficial and descriptive. No in-depth mechanism was analyzed and the phenotype analysis is not comprehensive.

      We thank the reviewer for sharing their expertise and for taking the time to provide a helpful suggestion to improve our study. We are gratified that the striking phenotypes we report from Eed loss in post-migratory neural crest craniofacial tissues were appreciated. The breadth and depth of our phenotyping techniques, including skeletal staining, micro-CT, echocardiogram, immunofluorescence, histology, and unbiased single-cell gene expression analysis, provide comprehensive data in support of our conclusion that PRC2 is required for craniofacial osteoblast differentiation. We hypothesize that epigenetic regulation of chromatin accessibility downstream of PRC2 activity is the molecular mechanism that underlies these phenotypes. To test this hypothesis in our revision, we are using CUT&Tag to profile H3K27me3 epigenetic modifications genome-wide and at the loci encoding the differentially expressed genes revealed by our single-cell transcriptomics in developing craniofacial structures. We anticipate that these experiments will reveal an epigenetic mechanism underlying the phenotypes we report from Eed loss in post-migratory neural crest craniofacial tissues.

      Reviewer #2:

      Summary: The role of PRC2 in post-neural crest induction was not well understood. This work developed an elegant mouse genetic system to conditionally deplete EED upon SOX10 activation. Substantial developmental defects were identified for craniofacial and bone development. The authors also performed extensive single-cell RNA sequencing to analyze differentiation gene expression changes upon conditional EED disruption.

      Strengths:

      (1) Elegant genetic system to ablate EED post neural crest induction.

      (2) Single-cell RNA-seq analysis is extremely suitable for studying the cell type-specific gene expression changes in developmental systems.

      We thank the reviewer for their generous and helpful comments on our study. We are pleased that our mouse genetic and single-cell RNA sequencing approaches were appropriate in pairing the craniofacial phenotypes we report with distinct gene expression changes in post-migratory neural crest tissues upon Eed deletion.

      Weaknesses:

      (1) Although this study is well designed and contains state-of-the-art single-cell RNA-seq analysis, it lacks the mechanistic depth in the EED/PRC2-mediated epigenetic repression. This is largely because no epigenomic data was shown.

      Thank you for this suggestion. As described in response to Reviewer #1, we will include H3K27me3 CUT&Tag data in craniofacial tissue harvested from E12.5 and E16.5 Sox10-Cretg+ Eedfl/fl and Sox10-Cretg+ Eedfl/wt embryos in our revision. Our analyses will include genome-wide and targeted metaplot visualizations across genotypes and developmental timepoints and will assess how H3K27me3 occupancy relates to gene expression changes in our single-cell RNA sequencing data.

      (2) The mouse model of conditional loss of EZH2 in neural crest has been previously reported, as the authors pointed out in the discussion. What is novel in this study to disrupt EED? Perhaps a more detailed comparison of the two mouse models would be beneficial.

      We acknowledge the study the reviewer has indicated (Schwarz et al. Development 2014). This elegant investigation uses Wnt1-Cre to delete Ezh2 and found a similar phenotype to ours in the form of catastrophic craniofacial hypoplasia. We sought to add depth to the study of PRC2’s vital role in neural crest development by ablating Eed, which has a unique function in the PRC2 complex by binding to H3K27me3 and allosterically activating Ezh2. In this sense, we sought to test if phenotypes arising from deletion of Eed, the PRC2 “reader”, differ from phenotypes arising from deletion of Ezh2, the PRC2 “writer”, in neural crest derived tissues. Due to limitations associated with the Wnt1-Cre transgene (Lewis et al. Developmental Biology 2013), we used the Sox10-Cre allele which targets the migratory neural crest and is completely recombined by E10.5, instead of Wnt1-Cre which targets pre-migratory neural crest cells. A more detailed comparison of these mouse models will be included in the Discussion section of our revised manuscript, and we thank the reviewer for this thoughtful suggestion.

      (3) The presentation of the single-cell RNA-seq data may need improvement. The complexity of the many cell types blurs the importance of which cell types are affected the most by EED disruption.

      We agree with the reviewer’s critique of the scRNA-seq data presentation. Because Sox10+ cells were not sorted (via FACS, for example) from craniofacial tissues before single-cell RNA sequencing, we identified a breadth of cell types in UMAP space unrelated to epigenetic disruption of neural crest derived tissues. We will include subcluster visualization plots in the figures of our revised manuscript to highlight specific changes in clusters, such as osteoblasts and mesenchymal stem cells, that arise from Eed loss in post-migratory neural crest craniofacial tissues.

      (4) While it's easy to identify PRC2/EED target genes using published epigenomic data, it would be nice to tease out the direct versus indirect effects in the gene expression changes (e.g., Figure 4e).

      We agree with the reviewer that our single-cell RNA sequencing data do not provide insight into direct versus indirect changes in gene expression downstream of PRC2. We hope that the aforementioned CUT&Tag experiment will provide the necessary mechanistic insight into H3K27me3 occupancy and direct effects on gene expression resulting from PRC2 inactivation in our mouse model.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      One of the roadblocks in PfEMP1 research has been the challenges in manipulating var genes to incorporate markers to allow the transport of this protein to be tracked and to investigate the interactions taking place within the infected erythrocyte. In addition, the ability of Plasmodium falciparum to switch to different PfEMP1 variants during in vitro culture has complicated studies due to parasite populations drifting from the original (manipulated) var gene expression. Cronshagen et al have provided a useful system with which they demonstrate the ability to integrate a selectable drug marker into several different var genes that allows the PfEMP1 variant expression to be 'fixed'. This on its own represents a useful addition to the molecular toolbox and the range of var genes that have been modified suggests that the system will have broad application. As well as incorporating a selectable marker, the authors have also used selective linked integration (SLI) to introduce markers to track the transport of PfEMP1, investigate the route of transport, and probe interactions with PfEMP1 proteins in the infected host cell.

      What I particularly like about this paper is that the authors have not only put together what appears to be a largely robust system for further functional studies, but they have used it to produce a range of interesting findings including:

      - Co-activation of rif and var genes when in a head-to-head orientation.

      - The reduced control of expression of var genes in the 3D7-MEED parasite line.

      - More support for the PTEX transport route for PfEMP1.

      - Identification of new proteins involved in PfEMP1 interactions in the infected erythrocyte, including some required for cytoadherence.

      In most cases the experimental evidence is straightforward, and the data support the conclusions strongly. The authors have been very careful in the depth of their investigation, and where unexpected results have been obtained, they have looked carefully at why these have occurred.

      (1) In terms of incorporating a drug marker to drive mono-variant expression, the authors show that they can manipulate a range of var genes in two parasite lines (3D7 and IT4), producing around 90% expression of the targeted PfEMP1. Removal of drug selection produces the expected 'drift' in variant types being expressed. The exceptions to this are the 3D7-MEED line, which looks to be an interesting starting point to understand why this variant appears to have impaired mutually exclusive var gene expression, and the EPCR-binding IT4var19 line. This latter finding was unexpected, and the modified construct required several rounds of panning to produce parasites that expressed the targeted PfEMP1 and bound to EPCR. The authors identified a PTP3 deficiency as the cause of the lack of PfEMP1 expression, which is an interesting finding in itself but potentially worrying for future studies. What was not clear was whether the selected IT4var19 line retained specific PfEMP1 expression once receptor panning was removed.

      This is a very interesting point. We do not have systematic long-term data for the Var19 line, only medium-term data. After panning the Var19 line, the binding assays were done within 3 months without additional panning. The first binding assay was 2 months after the panning and the last binding assays three weeks later. While there is inherent variation in these assays that precludes detection of smaller changes, the last assay showed the highest level of binding, giving no indication of a rapid loss of the binding phenotype. Hence, the binding phenotype appears to be stable for many weeks without panning the cells again.

      Systematic long-term experiments to assess how long the Var19 parasites retain binding would be interesting, but given that the binding phenotype appears to remain stable over many weeks, this would only make sense if done over a much longer time (6 months or more). Given the time needed to carry out such an experiment, it would not be practical to include it in the present study. However, it might be advisable if the Var19 line is used in future experiments that run over extended periods of time. We intend to include a statement in the discussion of the revised manuscript to highlight that if long-term work with this line is planned, monitoring the binding phenotype and potentially re-panning might be advisable.

      (2) The transport studies using the mDHFR constructs were quite complicated to understand but were explained very clearly in the text with good logical reasoning.

      We are aware of this being a complex issue and are glad this was nevertheless understandable.

      (3) By introducing a second SLI system, the authors have been able to alter other genes thought to be involved in PfEMP1 biology, particularly transport. An example of this is the inactivation of PTP1, which causes a loss of binding to CD36 and ICAM-1. It would have been helpful to have more insight into the interpretation of the IFAs as the anti-SBP1 staining in Figure 5D (PTP-TGD) looks similar to that shown in Figure 1C, which has PTP intact. The anti-EXP2 results are clearly different.

      We realize the description of the PTP1-TGD IFA data and that of the other TGDs was rather cursory. We intend to amend this in the revision.

      (4) It is good to see the validation of PfEMP1 expression includes binding to several relevant receptors. The data presented use CHO-GFP as a negative control, which is relevant, but it would have been good to also see the use of receptor mAbs to indicate specific adhesion patterns. The CHO system is fine for expression validation studies, but due to the high levels of receptor expression on these cells, moving to the use of microvascular endothelial cells would be advisable. This may explain the unexpected ICAM-1 binding seen with the panned IT4var19 line.

      We agree with the reviewer that it is desirable to have better binding systems for studying individual binding interactions. As the main purpose of this paper was to introduce the system and show binding, we did not move to more complicated binding systems. However, we would like to point out that the CSA binding was done on receptor alone in addition to the CSA-expressing HBEC-5i cells and was competed successfully with soluble CSA. In addition, apart from the additional ICAM-1 binding of the Var19 line, all binding phenotypes conformed to expectations. We therefore hope the tools used for binding studies are acceptable at this stage of introducing the system, while future work interested in specific PfEMP1 receptor interactions is advised to use better systems, ideally also including endothelial organoid models, inhibitory antibodies and possibly domain competition. We intend to add a sentence to the discussion highlighting that future work using this system to study individual receptor interactions could benefit from using optimized binding systems.

      (5) The proxiome work is very interesting and has identified new leads for proteins interacting with PfEMP1, as well as suggesting that KAHRP is not one of these. The reduced expression seen with BirA* in position 3 is a little concerning but there appears to be sufficient expression to allow interactions to be identified with this construct. The quantitative impact of reduced expression for proxiome experiments will clearly require further work to define it.

      This is a valid point. Clearly there seems to be some impact on binding when BirA* is placed in the extracellular domain (either through reduced presentation or direct reduction of binding efficiency of the modified PfEMP1). The exact impact on the proxiome is indeed difficult to assess. However, we hope that the general coverage of proteins proximal to PfEMP1 with the 3 PfEMP1-BirA* constructs will aid in the identification of proteins involved in PfEMP1 transport and surface display as illustrated with two of the hits targeted here.

      (6) The reduced receptor binding results from the TryThrA and EMPIC3 knockouts were very interesting, particularly as both still display PfEMP1 on the surface of the infected erythrocyte. While care needs to be taken in cross-referencing adhesion work in P. berghei and whether the machinery truly is functionally orthologous, it is a fair point to make in the discussion. The suggestion that interacting proteins may influence the "correct presentation of PfEMP1" is intriguing and I look forward to further work on this.

      We hope future work will be able to shed light on this.

      Overall, the authors have produced a useful and reasonably robust system to support functional studies on PfEMP1, which may provide a platform for future studies manipulating the domain content in the exon 1 portion of var genes. They have used this system to produce a range of interesting findings and to support its use by the research community.

      Finally, a small concern. Being able to select specific var gene switches using drug markers could provide some useful starting points to understand how switching happens in P. falciparum. However, our trypanosome colleagues might remind us that forcing switches may show us some mechanisms but perhaps not all.

      Point noted! From non-systematic data with the Var01 line that has been cultured for extended periods of time (several years), it seems other non-targeted vars remain silent in our SLI “activation” lines, but how much SLI-based var-expression “fixing” tampers with the integrity of natural switching mechanisms is indeed very difficult to gauge at this stage. We intend to add a statement to the manuscript that even if mutually exclusive expression is maintained, it is not certain the mechanisms controlling var expression all remain intact.

      Reviewer #2 (Public review):

      Summary

      Cronshagen et al. develop a range of tools based on selection-linked integration (SLI) to study PfEMP1 function in P. falciparum. PfEMP1 is encoded by a family of ~60 var genes subject to mutually exclusive expression. Switching expression between different family members can modify the binding properties of the infected erythrocyte while avoiding the adaptive immune response. Although critical to parasite survival and malaria disease pathology, PfEMP1 proteins are difficult to study owing to their large size and variable expression between parasites within the same population. The SLI approach previously developed by this group for genetic modification of P. falciparum is employed here to selectively and stably activate the expression of target var genes at the population level. Using this strategy, the binding properties of specific PfEMP1 variants were measured for several distinct var genes with a novel semi-automated pipeline to increase throughput and reduce bias. Activation of similar var genes in both the common lab strain 3D7 and the cytoadhesion competent FCR3/IT4 strain revealed higher binding for several PfEMP1 IT4 variants with distinct receptors, indicating this strain provides a superior background for studying PfEMP1 binding. SLI also enables modifications to target var gene products to study PfEMP1 trafficking and identify interacting partners by proximity-labeling proteomics, revealing two novel exported proteins required for cytoadherence. Overall, the data demonstrate a range of SLI-based approaches for studying PfEMP1 that will be broadly useful for understanding the basis for cytoadhesion and parasite virulence.

      Comments

      (1) While the capability of SLI to actively select var gene expression was initially reported by Omelianczyk et al., the present study greatly expands the utility of this approach. Several distinct var genes are activated in two different P. falciparum strains and shown to modify the binding properties of infected RBCs to distinct endothelial receptors; development of SLI2 enables multiple SLI modifications in the same parasite line; SLI is used to modify target var genes to study PfEMP1 trafficking and determine PfEMP1 interactomes with BioID. Curiously, Omelianczyk et al activated a single var (Pf3D7_0421300) and observed elevated expression of an adjacent var arranged in a head-to-tail manner, possibly resulting from local chromatin modifications enabling expression of the neighboring gene. In contrast, the present study observed activation of neighboring genes with head-to-head but not head-to-tail arrangement, which may be the result of shared promoter regions. The reason for these differing results is unclear although it should be noted that the two studies examined different var loci.

      The point that we are looking at different loci is very valid and we realize this is not mentioned in the discussion. In the revision we intend to add this as a possible reason for this discrepancy. As stated in the discussion, the head-to-head scenario was observed before in lines obtained with panning. However, given the rather few examples where this was analyzed, it is quite possible that this varies with gene locus, and we will make sure that the revised manuscript highlights that it is not clear how far this observation in our work can be generalized.

      (2) The IT4var19 panned line that became binding-competent showed increased expression of both paralogs of ptp3 (as well as a phista and gbp), suggesting that overexpression of PTP3 may improve PfEMP1 display and binding. Interestingly, IT4 appears to be the only known P. falciparum strain (only available in PlasmoDB) that encodes more than one ptp3 gene (PfIT_140083100 and PfIT_140084700). PfIT_140084700 is almost identical to the 3D7 PTP3 (except for a ~120 residue insertion in 3D7 beginning at residue 400). In contrast, while the C-terminal region of PfIT_140083100 shows near-perfect conservation with 3D7 PTP3 beginning at residue 450, the N-terminal regions between the PEXEL and residue 450 are quite different. This may indicate the generally stronger receptor binding observed in IT4 relative to 3D7 results from increased PTP3 activity due to multiple isoforms or that specialized trafficking machinery exists for some PfEMP1 proteins.

      We thank the reviewer for pointing this out; it is an interesting idea that the PTP3 duplication could be a reason for the superior binding of IT4. We intend to add this point to the discussion of the revision.

      So far it seems the PTP3 issue occurred only with Var19. The thought of an extra layer of control, particularly for PfEMP1 variants that might be associated with virulence such as Var19, is very attractive. At present, the manuscript alludes to the possibility of an extra layer of control in the discussion. As var-type specificity and existence of such mechanisms in vivo are so far not known we decided not to speculate on this.

      Reviewer #3 (Public review):

      Summary:

      The submission from Cronshagen and colleagues describes the application of a previously described method (selection linked integration) to the systematic study of PfEMP1 trafficking in the human malaria parasite Plasmodium falciparum. PfEMP1 is the primary virulence factor and surface antigen of infected red blood cells and is therefore a major focus of research into malaria pathogenesis. Since the discovery of the var gene family that encodes PfEMP1 in the late 1990s, there have been multiple hypotheses for how the protein is trafficked to the infected cell surface, crossing multiple membranes along the way. One difficulty in studying this process is the large size of the var gene family and the propensity of the parasites to switch which var gene is expressed, thus preventing straightforward gene modification-based strategies for tagging the expressed PfEMP1. Here the authors solve this problem by forcing the expression of a targeted var gene by fusing the PfEMP1 coding region with a drug-selectable marker separated by a skip peptide. This enabled them to generate relatively homogenous populations of parasites all expressing tagged (or otherwise modified) forms of PfEMP1 suitable for study. They then applied this method to study various aspects of PfEMP1 trafficking.

      Strengths:

      The study is very thorough, and the data are well presented. The authors used SLI to target multiple var genes, thus demonstrating the robustness of their strategy. They then perform experiments to investigate possible trafficking through PTEX, they knock out proteins thought to be involved in PfEMP1 trafficking and observe defects in cytoadherence, and they perform proximity labeling to further identify proteins potentially involved in PfEMP1 export. These are independent and complementary approaches that together tell a very compelling story.

      Weaknesses:

      (1) When the authors targeted IT4var19, they were successful in transcriptionally activating the gene, however, they did not initially obtain cytoadherent parasites. To observe binding to ICAM-1 and EPCR, they had to perform selection using panning. This is an interesting observation and potentially provides insights into PfEMP1 surface display, folding, etc. However, it also raises questions about other instances in which cytoadherence was not observed. Would panning of these other lines have successfully selected for cytoadherent infected cells? Did the authors attempt panning of their 3D7 lines? Given that these parasites do export PfEMP1 to the infected cell surface (Figure 1D), it is possible that panning would similarly rescue binding. Likewise, the authors knocked out PTP1, TryThrA, and EMPIC3 and detected a loss of cytoadhesion, but they did not attempt panning to see if this could rescue binding. To ensure that the lack of cytoadhesion in these cases is not serendipitous (as it was when they activated IT4var19), they should demonstrate that panning cannot rescue binding.

      These are very important points. Indeed, we had repeatedly attempted to pan 3D7 when we failed to get the SLI-generated 3D7 PfEMP1 expressor lines to bind, but this had not been successful. After the move to IT4, which readily bound, we made no further efforts to understand why 3D7 does not bind, but the fact that PfEMP1 is on the surface indicates this is not a PTP3 issue. Also, as the parent 3D7 could not be panned, we assumed it is not easily fixed.

      Panning the TGD lines: we see the reasoning for conducting panning experiments with the TGD lines, but on second thought we are unsure this should be attempted. The outcome might not be easily interpretable if panning leads to increased binding, and considerable follow-up analyses would be needed to define what has happened. The reason for this is that at least two forces will contribute to the selection in panning experiments with TGD lines that lost binding. Firstly, panning would work against the SLI of the TGD, resulting in a tug of war between the TGD-SLI and binding: a very low frequency of parasites can be expected to loop out the TGD plasmid and would normally be eliminated during standard culturing due to the SLI drug used for the TGD. These revertant cells would bind and the panning would enrich them (hence, panning and SLI are opposed in the case of a TGD abolishing binding). It is unclear how strong such an effect can be, but this might lead to mixed populations that complicate interpretations. The second selecting force is possible compensatory changes that restore binding. These can come in two flavors: reversal of potential independent changes that may have occurred in the TGD parasites and that are in reality causing the binding loss (the concern of the reviewer), or new changes that compensate for the loss of the TGD target (in case the TGD is the cause of the binding loss). As both of the TGDs in the paper show some residual binding and have VAR01 on the surface to at least some extent, it is possible that new compensatory changes might indeed occur that indirectly increase binding again. In summary, even if more binding after panning of the lines occurs, it is not clear whether this is due to a compensatory change ameliorating the TGD or reversal of an unrelated change. The impact of repeated panning against SLI is also unknown. To determine the cause, the panned TGD lines would need to be subjected to a complex and time-consuming analysis (WGS, RNASeq, possibly Maurer’s clefts IFA phenotype) to find out whether they had an unrelated chance change that was reverted or a new compensatory change that helps binding.

      The detection of VAR01 on the surface of these TGDs speaks against a PTP3 effect. While we can’t fully exclude other changes in the TGDs that might affect binding, we conducted WGS which did not show any obvious alterations that could be responsible. To fully exclude loss of ptp3 expression as the reason as seen with Var19 (something we would not have seen in the WGS if it is only due to a transcriptional change), we intend to carry out RNASeq with the two TGD lines. The third TGD mentioned by the reviewer (targeting ptp1) was a positive control of a known PfEMP1 trafficking protein, so we assume this does not need to be further validated.

      (2) The authors perform a series of trafficking experiments to help discern whether PfEMP1 is trafficked through PTEX. While the results were not entirely definitive, they make a strong case for PTEX in PfEMP1 export. The authors then used BioID to obtain a proxiome for PfEMP1 and identified proteins they suggest are involved in PfEMP1 trafficking. However, it seemed that components of PTEX were missing from the list of interacting proteins. Is this surprising and does this observation shed any additional light on the possibility of PfEMP1 trafficking through PTEX? This warrants a comment or discussion.

      This is an interesting comment and we agree we should have discussed this. A likely reason why PTEX components are not picked up as interactors is that BirA* is expected to become unfolded when it passes through the channel and in that state can’t biotinylate. Labelling would likely only be possible if PfEMP1 lingered at the PTEX translocation step before BirA* became unfolded to go through the channel, which we would not expect under physiological conditions. We intend to add a sentence to the discussion explaining why we think PTEX components would not be detected in our BioIDs even if PfEMP1 passes through PTEX, while noting that their absence might also be an argument against it passing through PTEX.

    1. Author response:

      Reviewer #1 (Public review):

      The results of this manuscript look at the interplay between pleiotropy, standing genetic variation, and parallelism (i.e. predictability of evolution) in gene expression. Ultimately, their results suggest that (a) pleiotropic genes typically have a smaller range in variation/expression, and (b) adaptation to similar environments tends to favor changes in pleiotropic genes, which leads to parallelism in mechanisms (though not dramatically). However, it is still uncertain how much parallelism is directly due to pleiotropy, instead of a complex interplay between them and ancestral variation.

      I have a few things that I was uncertain about. It may be these things are easily answered but require more discussion or clarity in the manuscript.

      (1) The variation being talked about in this manuscript is expression levels, and not SNPs within coding regions (or elsewhere). The cause of any specific gene having a change in expression can obviously be varied - transcription factors, repressors, promoter region variation, etc. Is this taken into account within the "network connectivity" measurement? I understand the network connectivity is a proxy for pleiotropy - what I'm asking is, conceptually, what can be said about how/why those highly pleiotropic genes have a change (or not) in expression. This might be a question for another project/paper, but it feels like a next step worth mentioning somewhere.

      In the current study, we are only able to detect significant and repeatable expression changes but are unable to identify the underlying causal variants. An eQTL study in the founder population in combination with genomic resequencing for both evolved and ancestral populations would be required to address this question.

      (2) The authors do have a passing statement in line 361 about cis-regulatory regions. Is the assumption that genetic variation in promoter regions is the ultimate "mechanism" driving any change in expression? In the same vein, the authors bring up a potential confounding factor, though they dismiss it based on a specific citation (lines 476-481; citation 65). I'm of the mindset that in order to more confidently disregard this "issue" based on previous evidence, it requires more than one citation. Especially since the one citation is a plant. That specific point jumps out to me as needing a more careful rebuttal.

      It was not our intention to claim that the expression changes in our experiment are caused by cis-regulatory variation only. We believe that the observed expression variation has both cis- and trans-genetic components, although some studies tend to estimate much higher cis-variation for gene expression in Drosophila populations (e.g. [1, 2]). We mentioned the positive correlation between cis-regulatory polymorphism and expression variation to (1) highlight the genetic control of gene expression and (2) make the connection between polygenic adaptation and gene expression evolutionary parallelism.

      (3) I feel like there isn't enough exploration of tissue specificity versus network connectivity. Tissue specificity was best explained by a model in which pleiotropy had both direct and indirect effects on parallelism; while network connectivity was best explained (by a small margin) via the model which was mostly pleiotropy having a direct effect on ancestral variation, that then had a direct effect on parallelism. When the strengths of either direct/indirect effects were quantified, tissue specificity showed a stronger direct effect, while network connectivity had none (i.e. not significant). My confusion is with the last point - if network connectivity is explained by a direct effect in the best-supported model, how does this work, since the direct effect isn't significant? Perhaps I am misunderstanding something.

      To clarify, for network connectivity, there is a significant “indirect” effect on parallelism (i.e. network connectivity affects ancestral gene expression and ancestral gene expression affects parallelism). Hence, in Table 2, the direct effect of network connectivity on parallelism is weak and not significant while the indirect effect via ancestral variation is significant.
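
      The distinction between a direct and an indirect effect can be illustrated with a minimal sketch on simulated data. Everything below is an assumption made for illustration: the variable names, the effect sizes, and the use of simple least-squares regressions are hypothetical and are not the causal analysis or the data of the manuscript. The simulated pleiotropy measure influences parallelism only through ancestral variation, so the estimated direct path is close to zero while the indirect path is clearly positive.

```python
import numpy as np

def ols_slopes(y, *predictors):
    """Least-squares slopes of y on the predictors (an intercept is fitted and dropped)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(0)
n = 5000

# Hypothetical standardized variables; effect sizes are invented for illustration.
connectivity = rng.normal(size=n)                          # proxy for pleiotropy
ancestral_var = -0.6 * connectivity + rng.normal(size=n)   # path a: pleiotropy -> variation
parallelism = -0.5 * ancestral_var + rng.normal(size=n)    # path b: variation -> parallelism
# Note: no direct connectivity -> parallelism path is simulated.

(a,) = ols_slopes(ancestral_var, connectivity)
direct, b = ols_slopes(parallelism, connectivity, ancestral_var)
indirect = a * b                                           # effect routed through the mediator
(total,) = ols_slopes(parallelism, connectivity)

print(f"direct={direct:.3f} indirect={indirect:.3f} total={total:.3f}")
```

      In linear least-squares models fitted on the same sample, the total effect equals the sum of the direct and indirect effects, which is why a predictor can matter overall even when its direct path is not significant.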

      Also, network connectivity might favor the most pleiotropic genes being transcription factor hubs (or master regulators for various homeostasis pathways); while the tissue specificity metric perhaps is a kind of a space/time element. I get that a gene having expression across multiple tissues does fit the definition of pleiotropy in the broad sense, but I'm wondering if some important details are getting lost - I'm just thinking about the relative importance of what tissue specificity measurements say versus the network connectivity measurement.

      We examined the statistical relationship between the two measures and found a moderate positive correlation, on the basis of which we argued that the two measures may capture different aspects of pleiotropy. We appreciate the reviewer’s suggestions about the biological basis of the two estimates of pleiotropy, but we think that without further experimental insights, an extended discussion of this topic would be premature and would not provide meaningful insights to the readership.

      Reviewer #2 (Public review):

      Summary:

      Lai and collaborators use a previously published RNAseq dataset derived from an experimental evolution setup to compare the pleiotropic properties of genes whose expression evolved in response to fluctuating temperature for over 100 generations. The authors correlate gene pleiotropy with the degree of parallelism in the experimental evolution setup to ask: are genes that evolved in multiple replicates more or less pleiotropic?

      They find that, maybe counter to expectation, highly pleiotropic genes show more replicated evolution. Such an effect seems to be driven by direct effects (which the authors can only speculate on) and indirect effects through low variance in pleiotropic genes (which the authors indirectly link to genetic variation underlying gene expression variance).

      Weaknesses:

      The results offer new insights into the evolution of gene expression and into the parameters that constrain such evolution, i.e., pleiotropy. Although the conclusions are supported by the data, I find the interpretation of the results a little bit complicated.

      Major comment:

      The major point I ask the authors to address is whether the connection between polygenic adaptation and parallelism can indeed be used to interpret gene expression parallelism. If the answer is no, please rephrase the introduction and discussion; if the answer is yes, please make it explicit in the text why it is so.

      Our answer is yes, we interpreted gene expression parallelism (high ancestral variance -> less parallelism) using the same framework that links polygenic adaptation and parallelism (high polygenicity = less trait parallelism). We believe that our response covers several of the reviewer’s concerns.

      The authors' argument: parallelism in gene expression is the same as parallelism in SNP allele frequency (AFC) (see L389-383 here they don't mention that this explanation is derived from SNP parallelism and not trait parallelism, and see Figure 1 b). In previous publications, the authors have explained the low level of AFC parallelism using a polygenic argument. Polygenic traits can reach a new trait optimum via multiple SNPs and therefore although the trait is parallel across replicates, the SNPs are not necessarily so.

      Importantly, our rationale is based on the idea that gene expression is rarely the direct target of selection, but rather an intermediate trait [3]. Recently, we have specifically tested this assumption for gene expression and metabolite concentrations, and our analysis showed that both traits are redundant [4], as previously shown for DNA sequences [5]. The important implication for this manuscript is that gene expression is also redundant, so that adaptation can be achieved by distinct changes in gene expression in replicate populations adapting to the same selection pressure. This implies that we can use the same simulation framework for gene expression as for sequencing data. In our case, different SNP frequencies correspond to different expression levels (averaged across individuals from a population), which in turn increase fitness by modifying the selected trait. Importantly, the selected trait in our simulations is not gene expression, but an undefined high-level phenotype. A key insight from our simulations is that with increasing polygenicity the expression of a gene is more variable in the ancestral population.

      In the current paper, they seem to be exchanging SNP AFC by gene expression, and to me, those are two levels that cannot be interchanged. Gene expression is a trait, not an SNP, and therefore the fact that a gene expression doesn't replicate cannot be explained by a polygenic basis, because again the trait is gene expression itself. And, actually, the results of the simulations show that high polygenicity = less trait parallelism (Figure 4).

      As detailed above, because adaptation can be reached by changes in gene expression at different sets of genes, redundancy is also operating on the expression level not just on the level of SNPs. To clarify, the x-axis of Fig. 4 is the expression variation in the ancestral population.

      Now, if the authors focus on high parallel genes (present in e.g. 7 or more replicates) and they show that the eQTLs for those genes are many (highly polygenic) and the AFC of those eQTLs are not parallel, then I would agree with the interpretation. But, given that here they just assess gene expression and not eQTL AFC, I do not think they can use the 'highly polygenic = low parallelism' explanation.

      The interpretation of the results to me, should be limited to: genes with low variance and high pleiotropy tend to be more parallel, and the explanation might be synergistic pleiotropy.

      While we understand the desire to model the full hierarchy from eQTLs to gene expression and adaptive traits, we caution that this would be a very challenging task. eQTLs very often underestimate the contribution of trans-acting factors, hence the understanding of gene expression evolution based on eQTLs is very likely incomplete and cannot explain the redundancy of gene expression during adaptation. Hence, we think that the focus on redundant gene expression is conceptually simpler and thus allows us to address the question of pleiotropy without the incorporation of allele frequency changes.

      Reviewer #3 (Public review):

      The authors aim to understand how gene pleiotropy affects parallel evolutionary changes among independent replicates of adaptation to a new hot environment of a set of experimental lines of Drosophila simulans using experimental evolution. The flies were RNA-sequenced after more than 100 generations of lab adaptation and the changes in average gene expression were obtained relative to ancestral expression levels from reconstructed ancestral lines. Parallelism of gene expression change among lines is evaluated as variance in differential gene expression among lines relative to error variance. Similarly, the authors ask how the standing variation in gene expression estimated from a handful of flies from a reconstructed outbred line affects parallelism. The main findings are that parallelism in gene expression responses is positively associated with pleiotropy and negatively associated with expression variation. Those results are in contradiction with theoretical predictions and empirical findings. To explain those seemingly contradictory results the authors invoke the role of synergistic pleiotropy and correlated selection, although they do not attempt to measure either.

      Strengths:

      (1) The study uses highly replicated outbred laboratory lines of Drosophila simulans evolved in the lab under a constant hot regime for over 100 generations. This allows for robust comparisons of evolutionary responses among lines.

      (2) The manuscript is well written and the hypotheses are clearly delineated at the onset.

      (3) The authors have run a causal analysis to understand the causal dependencies between pleiotropy and expression variation on parallelism.

      (4) The use of whole-body RNA extraction to study gene expression variation is well justified.

      Weaknesses:

      (1) It is unclear how well phenotypic variation in gene expression of the evolved lines has been estimated by the sample of 20 males from a reconstructed outbred line not directly linked to the evolved lines under study. I see this as a general weakness of the experimental design.

      Our intention was not to measure the phenotypic variance of the evolved lines, but rather to estimate the phenotypic variance at the beginning of the experiment. Hence, we measured and investigated the variation of gene expression in the ancestral population, since this was the starting point of the replicated experimental evolution. Furthermore, since the ancestral population represents the natural population in Florida, the gene expression variation reflects the history of selection acting on it.

      (2) There are no estimates of standing genetic variation of expression levels of the genes under study, only phenotypic variation. I wished the authors had been clear about that limitation and had discussed the consequences of the analysis. This also constitutes a weakness of the study.

      The reviewer is correct that we do not aim to estimate the standing genetic variation, which is responsible for differences in gene expression. While we agree that it could be an interesting research question to use eQTL mapping to identify the genetic basis of gene expression, we caution that trans-effects are difficult to estimate and therefore an important component of gene expression evolution will be difficult to estimate. Hence, we consider that our focus on variation in gene expression without explicit information about the genetic basis is simpler and sufficient to address the question about the role of pleiotropy.

      (3) Moreover, since the phenotype studied is gene expression, its genetic basis extends beyond expressed sequences. The phenotypic variation of a gene's expression may thus likely misrepresent the genetic variation available for its evolution. The genetic variation of gene expression phenotypes could be estimated from a cross or pedigree information but since individuals were pool-sequenced (by batches of 50 males), this type of analysis is not possible in this study.

      We agree with the reviewer that gene expression variation may also have a non-genetic basis; we discuss this in depth in the Discussion of the manuscript.

      (4) The authors have not attempted to estimate synergistic pleiotropy among genes, nor how selection acts on gene expression modules. It makes any conclusion regarding the role of synergistic pleiotropy highly speculative.

      We mentioned synergistic pleiotropy as a possible explanation for our results. A positive correlation between the fitness effect of gene expression variation would predict more replicable evolutionary changes. A similar argument has been made by [6]. 

      I don't understand the reason why the analysis would be restricted to significantly differentially expressed genes only. It is then unclear whether pleiotropy, parallelism, and expression variation do play a role in adaptation because the two groups of adaptive and non-adaptive genes have not been compared. I recommend performing those comparisons to help us better understand how "adaptive" genes differentially contribute to adaptation relative to "nonadaptive" genes relative to their difference in population and genetic properties.

      We agree with the reviewer that the comparison between the pleiotropy of adaptive and non-adaptive genes is interesting. We performed the analysis but omitted it from the current manuscript for simplicity. Similar to the results in [6], non-adaptive genes are more pleiotropic than adaptive genes. For adaptive genes, we find a positive correlation between the level of pleiotropy and evolutionary parallelism. Thus, high pleiotropy limits the evolvability of a gene, but moderate and potentially synergistic pleiotropy increases the repeatability of adaptive evolution. We have included this result in the revised manuscript and discuss it.

      There is a lack of theoretical groundings on the role of so-called synergistic pleiotropy for parallel genetic evolution. The Discussion does not address this particular prediction. It could be removed from the Introduction.

      We respectfully disagree with the reviewer: synergistic pleiotropy is covered by theory, and empirical results also support its importance.

      References

      (1) Genissel A, McIntyre LM, Wayne ML, Nuzhdin SV. Cis and trans regulatory effects contribute to natural variation in transcriptome of Drosophila melanogaster. Molecular biology and evolution. 2008;25(1):101-10. Epub 20071112. doi: 10.1093/molbev/msm247. PubMed PMID: 17998255.

      (2) Osada N, Miyagi R, Takahashi A. Cis- and Trans-regulatory Effects on Gene Expression in a Natural Population of Drosophila melanogaster. Genetics. 2017;206(4):2139-48. Epub 20170614. doi: 10.1534/genetics.117.201459. PubMed PMID: 28615283; PubMed Central PMCID: PMCPMC5560811.

      (3) Barghi N, Hermisson J, Schlötterer C. Polygenic adaptation: a unifying framework to understand positive selection. Nature reviews Genetics. 2020;21(12):769-81. Epub 2020/07/01. doi: 10.1038/s41576-020-0250-z. PubMed PMID: 32601318.

      (4) Lai WY, Otte KA, Schlötterer C. Evolution of Metabolome and Transcriptome Supports a Hierarchical Organization of Adaptive Traits. Genome biology and evolution. 2023;15(6). Epub 2023/05/26. doi: 10.1093/gbe/evad098. PubMed PMID: 37232360; PubMed Central PMCID: PMCPMC10246829.

      (5) Barghi N, Tobler R, Nolte V, Jaksic AM, Mallard F, Otte KA, et al. Genetic redundancy fuels polygenic adaptation in Drosophila. PLoS biology. 2019;17(2):e3000128. Epub 2019/02/05. doi: 10.1371/journal.pbio.3000128. PubMed PMID: 30716062.

      (6) Rennison DJ, Peichel CL. Pleiotropy facilitates parallel adaptation in sticklebacks. Molecular ecology. 2022;31(5):1476-86. Epub 2022/01/09. doi: 10.1111/mec.16335. PubMed PMID: 34997980; PubMed Central PMCID: PMCPMC9306781.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment 

      This valuable study is a detailed investigation of how chromatin structure influences replication origin function in yeast ribosomal DNA, with a focus on the role of the histone deacetylase Sir2 and the chromatin remodeler Fun30. Convincing evidence shows that Sir2 does not affect origin licensing but rather affects local transcription and nucleosome positioning, which correlates with increased origin firing. Overall, the evidence is solid and the model plausible. However, the methods employed do not rigorously establish a key aspect of the mechanism, namely where initiation precisely occurs, or rigorously exclude alternative models, and the effect of Sir2 on transcription is not re-examined in the fun30 context.

      Clarification on Sir2 Effect on Transcription in the fun30 Context

      We appreciate the reviewers’ thorough assessment but would like to clarify that the effect of Sir2 on transcription in the fun30 context was addressed in both the original and revised manuscripts. However, we recognize that the presentation of the qPCR results may have been unclear, as we initially plotted absolute transcript levels without normalizing for rDNA array size differences among the genotypes. We have now corrected this.

      After normalizing for copy number variations, the qPCR data show that the sir2 fun30 double mutant results in a ~40-fold increase in C-pro transcription relative to WT, compared to a 4-fold and 19-fold increase in fun30 and sir2 single mutants, respectively (Figure 5, figure supplement 6). These results have been discussed in the manuscript result section, where we note that "C-pro RNA levels were approximately twice as high in sir2 fun30 compared to sir2 cells when adjusted for rDNA size differences." This observation is critical for addressing both alternative models of MCM disappearance and for pinpointing transcription initiation sites, as detailed in the following sections.
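
      The copy-number normalization can be sketched as simple arithmetic. The snippet below is a minimal illustration, not the analysis code used for Figure 5, figure supplement 6. The function names and the absolute signal and copy-number inputs are hypothetical; only the three fold changes relative to WT are taken from the text above.

```python
def per_copy_level(transcript_level, rdna_copies):
    """Transcript signal divided by the number of rDNA repeats in that strain."""
    return transcript_level / rdna_copies


def fold_change(mutant_level, mutant_copies, wt_level, wt_copies):
    """Copy-number-normalized fold change of a mutant relative to WT."""
    return per_copy_level(mutant_level, mutant_copies) / per_copy_level(wt_level, wt_copies)


# Hypothetical inputs: a strain with half the WT repeats and 5x the absolute
# signal has a 10-fold higher transcript level per rDNA copy.
example = fold_change(mutant_level=5.0, mutant_copies=90, wt_level=1.0, wt_copies=180)

# Normalized fold changes relative to WT, as quoted in the response.
reported = {"fun30": 4.0, "sir2": 19.0, "sir2 fun30": 40.0}

# Ratio of the double mutant to the sir2 single mutant (40 / 19, about 2.1),
# which matches the "approximately twice as high" statement.
double_vs_sir2 = reported["sir2 fun30"] / reported["sir2"]
```

      Without dividing by copy number, a strain with a contracted rDNA array would appear to have less C-pro transcript than it does per repeat, which is why the unnormalized plot was hard to interpret.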

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This paper presents a mechanistic study of rDNA origin regulation in yeast by SIR2. Each of the ~180 tandemly repeated rDNA gene copies contains a potential replication origin. Early, efficient initiation of these origins is suppressed by Sir2, reducing competition with origins distributed throughout the genome for rate-limiting initiation factors. Previous studies by these authors showed that SIR2 deletion advances replication timing of rDNA origins by a complex mechanism of transcriptional de-repression of a local PolII promoter causing licensed origin proteins (MCM complexes) to re-localize (slide along the DNA) to a different (and altered) chromatin environment. In this study, they identify a chromatin remodeler, FUN30, that suppresses the sir2∆ effect, and remarkably, results in a contraction of the rDNA to about one-quarter its normal length/number of repeats, implicating replication defects of the rDNA. Through examination of replication timing, MCM occupancy and nucleosome occupancy on the chromatin in sir2, fun30, and double mutants, they propose a model where nucleosome position relative to the licensed origin (MCM complexes) intrinsically determines origin timing/efficiency. While their interpretations of the data are largely reasonable and can be interpreted to support their model, a key weakness is the connection between Mcm ChEC signal disappearance and origin firing. While the cyclical chromatin association-dissociation of MCM proteins with potential origin sequences may be generally interpreted as licensing followed by firing, dissociation may also result from passive replication and, as shown here, displacement by transcription and/or chromatin remodeling. Moreover, linking its disappearance from chromatin in the ChEC method with such precise resolution needs to be validated against an independent method to determine the initiation site(s). Differences in rDNA copy number and relative transcription levels also are not directly accounted for, obscuring a clearer interpretation of the results. Nevertheless, this paper makes a valuable advance with the finding of Fun30 involvement, which substantially reduces rDNA repeat number in sir2∆ background. The model they develop is compelling and I am inclined to agree, but I think the evidence on this specific point is purely correlative and a better method is needed to address the initiation site question. The authors deserve credit for their efforts to elucidate our obscure understanding of the intricacies of chromatin regulation. At a minimum, I suggest their conclusions on these points of concern should be softened and caveats discussed. Statistical analysis is lacking for some claims.

      Strengths are the identification of FUN30 as suppressor, examination of specific mutants of FUN30 to distinguish likely functional involvement. Use of multiple methods to analyze replication and protein occupancies on chromatin. Development of a coherent model. 

      Weaknesses are failure to address copy number as a variable; insufficient validation of ChEC method relationship to exact initiation locus; lack of statistical analysis in some cases. 

      Review of revised version and response letter: 

      In the response, the authors make some improvements by better quantifying 2D gels, adding some missing statistical analyses, analyzing the effect of fun30 on rDNA replication in strains with reduced rDNA copy number, and using ChIP-seq of MCMs to support the ChEC-seq data. However, these additions do not address the main issue that is at the heart of their model: where initiation precisely occurs and whether the location is altered in the mutant(s). Thus, mechanistic insight is limited.

      We discuss the issue regarding the initiation site below.

      Under the section "Addressing Alternative Explanations", the authors claim that processes like transcription and passive replication cannot affect the displaced complex specifically. Why? They are not on same DNA (as mentioned in the Fig 1 legend). 

      Premature origin activation, not transcription, drives the disappearance of repositioned MCM complexes in sir2 mutants in HU.

      Indeed, the reviewer is correct in suggesting that C-pro transcription confined to rDNA units with repositioned MCM complexes could selectively displace those complexes, potentially explaining the selective disappearance of displaced MCMs in sir2 cells. However, our analysis of C-pro transcription and MCM occupancy in G1 versus HU across the genotypes allows us to rule out this possibility.

      We show that the fraction of repositioned MCMs in G1 cells is proportional to the level of C-pro transcription (WT < fun30 << sir2 < sir2 fun30), consistent with the involvement of transcription in the repositioning process during MCM loading in G1. Accordingly, with approximately twice the transcription in sir2 fun30 compared to sir2, we observe more repositioned MCMs in sir2 fun30 cells than in sir2 cells in G1 (Fig 5C).

      However, if the disappearance of repositioned MCMs in HU were solely due to C-pro transcription rather than origin activation, we would expect the repositioned MCMs to disappear more quickly in sir2 fun30 cells. Contrary to this expectation, our data show that repositioned MCM complexes are more stable in sir2 fun30 mutants compared to sir2 mutants, indicating that transcription is not the primary factor in the disappearance of displaced MCM complexes in HU; rather, rDNA origin activation appears to be the key factor.

      Replication initiation site in sir2. Using multiple independent approaches, including 2D gels, ChIP-seq, and EdU incorporation, we have demonstrated that rDNA origins fire prematurely in sir2 mutants, a conclusion that the reviewer does not contest. Once an origin fires, the MCM signal disappears from the site of its initial deposition, as expected, and this is confirmed in our MCM ChIP and HU ChEC data, both at rDNA origins and across the genome.

      Given that the majority of MCM complexes in sir2 mutants are repositioned, it is expected that these repositioned complexes disappear following premature origin activation. With less than half of the licensed origins (or <30% of total rDNA copies) retaining MCM at non-repositioned sites in sir2 mutants, if only these non-repositioned complexes were firing, and the repositioned MCM complexes were disappearing via mechanisms other than replication initiation (e.g., transcription), rDNA replication in sir2 mutants would be severely compromised rather than accelerated. Given this, and the strong experimental evidence that repositioned MCM complexes fire prematurely, continued focus on alternative explanations for MCM complex disappearance seems unwarranted.

      We present this analysis in the results section as follows:

      “Finally, although deletion of FUN30 could suppress replication initiation at the rDNA either by inhibiting the firing of the active, repositioned MCM complex or by preventing MCM repositioning to the "active location" in the first place, our results suggest that suppression occurs through the former mechanism. Consistent with previous reports that fun30 mutants are deficient in transcriptional silencing (Neves-Costa et al. 2009), C-pro RNA levels were approximately twice as high in sir2 fun30 cells compared to sir2 cells when adjusted for rDNA size (Figure 5—figure supplement 6).

      Moreover, deletion of FUN30 shifts the distribution toward the repositioned MCM location over the non-repositioned one in G1 cells (Figure 5C), aligning with the increased C-pro transcription observed in fun30 mutants. This shift is evident in both sir2 and SIR2 cells. Despite the increased transcription-mediated repositioning in sir2 fun30 cells compared to sir2 cells during G1, repositioned MCM persists longer in sir2 fun30 cells than in sir2 cells after release into HU. Additionally, sir2 fun30 mutants exhibit reduced MCM accumulation at the RFB compared to sir2 mutants after release into HU, supporting the conclusion that MCM disappearance in HU reflects origin activation rather than transcription-mediated displacement.”

      The model in Fig 7 implies that initiation sites are different in WT versus the mutants and this determines their timing/efficiency. But they also suggest that the same site might be used with different efficiencies in this response. I agree that both are possibilities and are not resolved. 

      Adjustment of the model to account for repositioned MCMs in WT cells. In Figure 5—figure supplement 5, we demonstrate that even in WT cells, a small fraction of repositioned MCMs (~5%) can be detected, and that these repositioned MCM complexes disappear prematurely. However, because this represents a very small fraction of MCMs in WT cells, we initially did not include it in our overall model in Figure 7. In light of the reviewer's comment, we have now revised the model to incorporate this detail.

      Supporting their model requires better resolution to determine the actual replication initiation site. While this may be challenging, it should be feasible with methods to map nascent strands like DNAscent, or Okazaki fragment mapping.

      The initiation site in sir2 mutants has been thoroughly analyzed and supported by extensive experimental data, as discussed above. While high-resolution techniques such as DNAscent or Okazaki fragment mapping could potentially offer another layer of validation, the likelihood of obtaining finer detail that would change the conclusions is minimal. The methods we employed provide sufficient resolution to pinpoint the initiation site, and our results align consistently with established replication models.

      Further experimentation would not only be redundant but also unlikely to provide new insights beyond revalidation. Given the strength of our current data, we believe the conclusions regarding replication initiation are robust and well-supported, making additional experiments unnecessary at this stage. Our priority is to focus on advancing other aspects of the research that require deeper exploration.

      The 2D gel analysis of strains with reduced rDNA copy numbers adequately addresses the copy number variable with regard to the replication effect. 

      Overall, the paper is improved by providing additional data and improved analysis. The paper nicely characterizes the effect of Fun30. The model is reasonable but remains lacking in precise details of mechanism. 

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, the authors follow up on their previous work showing that in the absence of the Sir2 deacetylase the MCM replicative helicase at the rDNA spacer region is repositioned to a region of low nucleosome occupancy. Here they show that the repositioned displaced MCMs have increased firing propensity relative to non-displaced MCMs. In addition, they show that activation of the repositioned MCMs and low nucleosome occupancy in the adjacent region depend on the chromatin remodeling activity of Fun30. 

      Strengths: 

      The paper provides new information on the role of a conserved chromatin remodeling protein in regulation of origin firing and in addition provides evidence that not all loaded MCMs fire and that origin firing is regulated at a step downstream of MCM loading. 

      Weaknesses: 

      The relationship between the authors' results and prior work on the role of Sir2 (and Fob1) in regulation of rDNA recombination and copy number maintenance is not explored, making it difficult to place the results in a broader context. Sir2 has previously been shown to be recruited by Fob1, which is also required for DSB formation and recombination-mediated changes in rDNA copy number. Are the changes that the authors observe specifically in fun30 sir2 cells related to this pathway? Is Fob1 required for the reduced rDNA copy number in fun30 sir2 double mutant cells?

      Reviewer #3 (Public review): 

      Summary: 

      Heterochromatin is characterized by low transcription activity and late replication timing, both dependent on the NAD-dependent protein deacetylase Sir2, the founding member of the sirtuins. This manuscript addresses the mechanism by which Sir2 delays replication timing at the rDNA in budding yeast. Previous work from the same laboratory (Foss et al. PLoS Genetics 15, e1008138) showed that Sir2 represses transcription-dependent displacement of the Mcm helicase in the rDNA. In this manuscript, the authors show convincingly that the repositioned Mcms fire earlier and that this early firing partly depends on the ATPase activity of the nucleosome remodeler Fun30. Using read-depth analysis of sorted G1/S cells, fun30 was the only chromatin remodeler mutant that somewhat delayed replication timing in sir2 mutants, while nhp10, chd1, isw1, htl1, swr1, isw2, and irc5 had no effect. The conclusion was corroborated with orthogonal assays including two-dimensional gel electrophoresis and analysis of EdU incorporation at early origins. Using an insightful analysis with an Mcm-MNase fusion (Mcm-ChEC), the authors show that the repositioned Mcms in sir2 mutants fire earlier than the Mcm at the normal position in wild type. This early firing at the repositioned Mcms is partially suppressed by Fun30. In addition, the authors show Fun30 affects nucleosome occupancy at the sites of the repositioned Mcm, providing a plausible mechanism for the effect of Fun30 on Mcm firing at that position. However, the results from the MNase-seq and ChEC-seq assays are not fully congruent for the fun30 single mutant. Overall, the results support the conclusions, providing a much better mechanistic understanding of how Sir2 affects replication timing at rDNA.

      Strengths 

      (1) The data clearly show that the repositioned Mcm helicase fires earlier than the Mcm in the wild type position. 

      (2) The study identifies a specific role for Fun30 in replication timing and an effect on nucleosome occupancy around the newly positioned Mcm helicase in sir2 cells. 

      Weaknesses 

      (1) It is unclear which strains were used in each experiment. 

      (2) The relevance of the fun30 phospho-site mutant (S20AS28A) is unclear. 

      (3) For some experiments (Figs. 3, 4, 6) it is unclear whether the data are reproducible and the differences significant. Information about the number of independent experiments and quantitation is lacking. This affects the interpretation, as fun30 seems to affect the +3 nucleosome much more than let on in the description. 

      Recommendations for the authors:  

      Reviewer #2 (Recommendations for the authors)

      The authors have addressed my concerns by the addition of new experiments and analysis. 

      One point remains unclear regarding additional support for the Mcm-ChEC results using ChIP experiments to verify whether MCM redistributes in sir2Δ cells. In their rebuttal, the authors state that, "New supporting based evidence: ChIP at rDNA Origins. Our ChIP analysis also shows that the disappearance of the MCM signal at rDNA origins in sir2Δ cells released into HU is accompanied by signal accumulation at the replication fork barrier (RFB), indicative of stalled replication forks at this location (Figure 5 figure supplement 3)...." The ChIP data in Figure 5 supplement 3 show accumulation of the Mcm2 ChIP signal to the left of the RFB in sir2Δ cells but it doesn't look like there is any decrease in the MCM signal in sir2Δ relative to wild-type cells for the peak C-Pro. There is a new MCM peak suggesting perhaps a new MCM loading event.

      Figure 5 figure supplement 3 shows the relative abundance of the MCM ChIP signal across the ~2 kb rDNA region, spanning from the MCM loading site at the rDNA origin (on the left) to the replication fork barrier (RFB) on the right. The MCM-ChIP data are normalized to the highest signal within this rDNA region rather than across the entire genome, meaning that only the relative abundance of MCM within this region is represented, and not comparisons between different conditions. We have now presented the results with the same axes for both alpha factor and HU.
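
      The consequence of normalizing each profile to its own regional maximum can be illustrated with a minimal sketch. The function name and the binned signal values below are hypothetical and are not taken from the ChIP data; the point is only that this normalization preserves the shape of a profile while discarding its absolute level.

```python
def normalize_to_region_max(signal):
    """Scale a ChIP profile so its highest bin within the region equals 1.0."""
    peak = max(signal)
    if peak <= 0:
        raise ValueError("profile has no positive signal to normalize against")
    return [value / peak for value in signal]


# Hypothetical binned MCM signal from the origin (left) to the RFB (right).
g1_like = [100.0, 20.0, 10.0, 5.0]
hu_like = [10.0, 2.0, 1.0, 0.5]      # same shape, ten-fold less total signal
shifted = [30.0, 40.0, 50.0, 60.0]   # signal redistributed toward the RFB

# The first two profiles become indistinguishable after normalization,
# so absolute differences between conditions are not represented.
print(normalize_to_region_max(g1_like))
print(normalize_to_region_max(hu_like))
# A change in where the signal sits within the region remains visible.
print(normalize_to_region_max(shifted))
```

      Under this normalization, a rightward shift of the peak within the region is interpretable, whereas the height of a peak in one condition cannot be compared with its height in another.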

      In wild-type (WT) cells, the MCM signal remains primarily at the initial loading site. However, in sir2 mutants, a significant portion of the MCM signal shifts rightward, consistent with rDNA origin activation and the movement of MCM along with the progressing replication fork. While some replication forks stall at the RFB, others are positioned between the MCM loading site and the RFB. The additional MCM peak observed does not represent a new MCM loading event, as the experiment was conducted during S-phase, when new MCM loading is not possible.

      Reviewer #3 (Recommendations for the authors): 

      In this revision the authors addressed my concerns and improved the manuscript and the presentation of the data. All my recommendations were implemented.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public review):

      Summary:

      In their manuscript the authors report that fecal transplantation from young mice into old mice alleviates susceptibility to gout. The gut microbiota in young mice is found to inhibit activation of the NLRP3 inflammasome pathway and reduce uric acid levels in the blood in the gout model.

      Strengths:

      They focused on the butanoate metabolism pathway based on the results of metabolomics analysis after fecal transplantation and identified butyrate as the key factor in mitigating gout susceptibility. In general, this is a well-performed study.

      Weaknesses:

      The discussion on the current results and previous studies regarding the effect of butyrate on gout symptoms is insufficient. The authors need to provide a more thorough discussion of other possible mechanisms and relevant literature.

      Reviewer #2 (Recommendations for the authors):

      General comments:

      I appreciate the authors' efforts to answer the comments raised in my previous review (as Reviewer#2). However, I still detect some issues that need to be fully addressed, with inadequate or even no answers for several comments.

      Thank you for your valuable feedback. Your previous suggestions have been incredibly helpful for our paper. Although we have strived to make the article as comprehensive as possible, there may still be some areas that are not perfectly refined.

      The response to comment 1: The author's statement is not very convincing. What are the trends of inflammation factors? The data in Figure 1G-H suggest that butyrate may not be the only factor to explain this phenomenon. Authors should carefully interpret the data in Figure 1G-H.

      We apologize for not answering your question clearly. We used antibiotic treatment to establish the relationship between gut microbiota, age, and gout. Our findings indicate that serum uric acid levels tend to increase with age, and that the response to MSU stimulation becomes more pronounced with age. After clearing the gut microbiota and then stimulating with MSU, the age-dependent trends in inflammatory factors and serum uric acid levels disappear. Thus, we can preliminarily conclude that the gut microbiota is closely associated with age, gout, and hyperuricemia.

      The response to comment 2: I understand the importance of evaluating a range of indicators, but foot thickness is the most crucial clinical marker for diagnosing gout. Please move the data from Supplemental Figure 1A to the main figure.

      Thank you for your suggestions. We have included the most significant results in the main figure, and foot thickness is already described in the text of the manuscript. Given the layout and arrangement of the figure panels, we have placed these data in Supplementary Figure 1.

      The response to comment 3: The immunostaining for ZO-1 and Occludin is unclear. Please provide higher magnification images to confirm the specific staining.

      Thank you for your valuable feedback. We have enhanced the clarity of the images. In addition to adding immunohistochemical images in Supplementary Material 4, we have also submitted independent images.

      The response to comment 4: The authors still haven't directly addressed my comment.

      Please accept our sincere apologies for not providing a clearer response to your question. The indicators related to uric acid-producing enzymes and uric acid transporters have been separately analyzed according to different age groups. The specific results are detailed in section "The expression of uric acid-producing enzymes activity and uric acid transporters at the mRNA level across different age groups" of Supplementary Material 4.

      No response was given for comment 5. Please address it.

      In a PCoA plot, the distance between samples reflects the similarity in the structure of the microbial communities: the closer the distance, the more similar the composition of the communities; the greater the distance, the more pronounced the differences. We judge based on the relative distances of each group in the plot, observing their degree of proximity.
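
      Because PCoA plots approximate an underlying matrix of pairwise dissimilarities, the visual judgment of proximity can be complemented by comparing within-group and between-group distances directly. The sketch below is illustrative only. It assumes Bray-Curtis dissimilarity, which may not be the metric used for the plots in the manuscript, and the taxon counts, group labels, and function names are hypothetical.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def mean_distance(samples_a, samples_b=None):
    """Mean dissimilarity within one group, or between two groups."""
    if samples_b is None:
        pairs = list(combinations(samples_a, 2))
    else:
        pairs = [(a, b) for a in samples_a for b in samples_b]
    return sum(bray_curtis(a, b) for a, b in pairs) / len(pairs)


# Hypothetical counts of three taxa in two groups of three samples each.
young = [[50, 30, 20], [48, 32, 20], [52, 28, 20]]
old = [[20, 30, 50], [22, 28, 50], [18, 32, 50]]

within_young = mean_distance(young)
within_old = mean_distance(old)
between = mean_distance(young, old)

# Groups that separate on a PCoA plot show between-group distances
# that exceed the within-group distances.
print(within_young, within_old, between)
```

      Reporting such summary distances, or a permutation-based test on the same distance matrix, would make the comparison of group proximity less dependent on visual inspection.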

      The response to comment 6: I understand the author's statement, and I suggest incorporating it into the discussion section of the revised manuscript.

      Thank you for your suggestions. We have incorporated the relevant content into our discussion.

      The response to comment 7: Again, please incorporate this statement into the discussion section of the revised manuscript.

      Thank you for your suggestions. We have incorporated the relevant content into our discussion.

      Reviewer #3 (Public review):

      Summary:

      The revised manuscript presents interesting findings on the role of gut microbiota in gout, focusing on the interplay between age-related changes, inflammation, and microbiota-derived metabolites, particularly butyrate. The study provides valuable insights into the therapeutic potential of microbiota interventions and metabolites for managing hyperuricemia and gout. While the authors have addressed many of the previous concerns, a few areas still require clarification and improvements to strengthen the manuscript's clarity and overall impact.

      (1) While the authors mention that outliers in the data do not affect the conclusions, there remains a concern about the reliability of some figures (e.g., Figure 2D-G). It is recommended to provide a more detailed explanation of the statistical analysis used to handle outliers. Additionally, the clarity of the Western blot images, particularly IL-1β in Figure 3C, should be improved to ensure clear and supportive evidence for the conclusions.

      Thank you for your suggestion. We respond as follows: (1) Outliers can occasionally constitute intrinsic elements of the dataset, reflecting genuine occurrences within the experimental context. The elimination of such outliers has the potential to introduce bias into the results, thereby facilitating misconceptions regarding the underlying phenomenon under investigation. In order to maintain the transparency and integrity of the dataset, we have elected to retain the outliers within our analysis. This decision is based on the recognition that these values may represent genuine experimental observations or unique conditions that are inherently meaningful to the phenomenon under investigation. By preserving these data points, we aim to provide a comprehensive and unbiased representation of the experimental results, allowing for a more nuanced interpretation of the findings. (2) Due to the scarcity of samples, we are unable to fulfill your request in the short term. Furthermore, we have noted that the band for IL-1β in Figure 3C is indeed visible and we consider it suitable for subsequent analysis.

      (2) The manuscript raises a key question about why butyrate supplementation and FMT have different effects on uric acid metabolism and excretion. While the authors have addressed this by highlighting the involvement of multiple bacterial genera, it is still recommended to expand on the differences between these interventions in the discussion, providing more mechanistic insights based on available literature.

      Thank you for your suggestion. We have enriched the discussion in the manuscript and included additional comparisons.

      (3) It is noted that IL-6 and TNF-α results in foot tissue were requested and have been added to supplementary material. However, the main text should clearly reference these additions, and the supplementary figures should be thoroughly reviewed for consistency with the main findings. The use of abbreviations (e.g., ns for no significant difference) and labeling should also be carefully checked across all figures.

      Thank you for your valuable feedback. We have revised the manuscript in accordance with your suggestions.

      (4) The manuscript presents butyrate as a key molecule in gout therapy, yet there are lingering concerns about its central role, especially given that other short-chain fatty acids (e.g., acetic and propionic acids) also follow similar trends. The authors should consider further acknowledging these other SCFAs and discussing their potential contribution to gout management. Additionally, the rationale for focusing primarily on butyrate in subsequent research should be made clearer.

      Thank you for your input. We have incorporated additional evidence into the discussion, explaining why we ultimately chose butyrate in subsequent research.

      (5) The full-length uncropped Western blot images should be provided as requested, to ensure transparency and reproducibility of the data.

      Thank you for your suggestion. We have already included the relevant explanations in the manuscript.

      (6) Despite the authors' revisions, several references still lack page numbers. Please ensure that all references are properly formatted, including complete page ranges.

      Thank you for your suggestions; we will make more detailed revisions to the references.

      The manuscript has improved with the revisions made, particularly regarding clarifications on experimental design and the inclusion of supplementary data. However, some concerns about data quality, mechanistic insights, and clarity in the figures remain. Addressing these points will enhance the overall impact of the work and its potential contribution to the understanding of the gut microbiome in gout and hyperuricemia. A final revision, with careful attention to both major and minor points, is highly recommended before resubmission.

      Once again, we are grateful for your suggestions and recognition. Your input has been of immense help to our manuscript and has also provided us with a valuable learning opportunity.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      The aim of this valuable study is to identify novel genes involved in sleep regulation and memory consolidation. It combines transcriptomic approaches following memory induction with measurements of sleep and memory to discover molecular pathways underlying these interlinked behaviors. The authors explore transcriptional changes in specific mushroom body neurons and suggest roles for two genes involved in RNA processing, Polr1F and Regnase-1, in the regulation of sleep and memory. Although this work exploits convincing and validated methodology, the strength of the evidence is incomplete to support the main claim that these two genes establish a definitive link between sleep and memory consolidation.

      We appreciate the reconsideration of our manuscript and recognize that we should have toned down the claims, especially with respect to the link between sleep and memory consolidation.  We have now changed the title, the abstract and the main text and also Figure 5 to essentially just state our findings.  While there is a little speculation in the Discussion, we point out that future work would be required to draw conclusions. We believe the manuscript still represents a considerable advance in showing the modulation of RNA processing genes during sleep-dependent memory consolidation in the relevant neurons, and also showing how one such gene affects sleep and translation and a second affects sleep and memory. 

      Public Reviews:

      Reviewer #2 (Public review):

      Prior work by the Sehgal group has shown that a small group of neurons in the fly brain (anterior posterior (ap) α'β' mushroom body neurons (MBNs)) promote sleep and sleep-dependent appetitive memory specifically under fed conditions (Chouhan et al., (2021) Nature). Here, Li, Chouhan et al. combine cell-specific transcriptomics with measurements of sleep and memory to identify molecular processes underlying this phenomenon. They define transcriptional changes in ap α'β' MBNs and suggest a role for two genes downregulated following memory induction (Polr1F and Regnase-1) in regulating sleep and memory.

      The transcriptional analyses in this manuscript are impressive. The authors have now included additional experiments that define acute and developmental roles for Polr1F and Regnase-1 respectively in regulating sleep. They have also provided additional data to strengthen their conclusion that Polr1F knockdown in α'β' mushroom body neurons enhances sleep.

      The resubmitted work represents a convincing investigation of two novel sleep-regulatory proteins that may also play important roles in memory formation.

      The authors have comprehensively addressed my comments, which I very much appreciate. I congratulate them on this excellent work.

      We very much appreciate the reviewer’s positive feedback. Thank you!

      Reviewer #3 (Public review):

      Previous work (Chouhan et al., 2022) from the Sehgal group investigated the relationship between sleep and long-term memory formation by dissecting the role of mushroom body intrinsic neurons, extrinsic neurons, and output neurons during sleep-dependent and sleep-independent memory consolidation. In this manuscript, Li et al., profiled transcriptome in the anterior-posterior (ap) α'/β' neurons and identified genes that are differentially expressed after training in fed condition, which supports sleep-dependent memory formation. By knocking down candidate genes systematically, the authors identified Polr1F and Regnase-1 as two important hits that play potential roles in sleep and memory formation. What is the function of sleep and how to create a memory are two long-standing questions in science. The present study used a new approach to identify novel components that may link sleep and memory consolidation in a specific type of neuron. Importantly, these components implicated that RNA processing may play a role in these processes.

      While I am enthusiastic about the innovative approach employed to identify RNA processing genes involved in sleep regulation and memory consolidation, I feel that the data presented in the manuscript is insufficient to support the claim that these two genes establish a definitive link between sleep and memory consolidation. First, the developmental role of Regnase-1 in reducing sleep remains unclear because knocking down Regnase-1 using the GeneSwitch system produced neither acute nor chronic sleep loss phenotype. In the revised manuscript, the author used the Gal80ts to restrict the knockdown of Regnase-1 in adult animals and concluded that Regnase-1 RNAi appears to affect sleep through development. Conducting overexpression experiments of Regnase-1 would lend some credibility to the phenotypes, however, this is not pursued in the revised manuscript. Second, while constitutive Regnase-1 knockdown produced robust phenotypes for both sleep-dependent and sleep-independent memory, it also led to a severe short-term memory phenotype. This raises the possibility that flies with constitutive Regnase-1 knockdown are poor learners, thereby having little memory to consolidate. The defect in learning could be simply caused by chronic sleep loss before training. Thus, this set of results does not substantiate a strong link between sleep and long-term memory consolidation. Lastly, the discussion on the sequential function of training, sleep, and RNA processing on memory consolidation appears speculative based on the present data.

      We thank the reviewer for the enthusiasm about the approach. As noted above, we have now removed all claims about a link between sleep and memory, and instead just emphasize that we have identified RNA processing genes that affect sleep and memory.  We agree with the reviewer that the basis of the Regnase-1 memory phenotype is unclear as the flies may be poor learners.  Also, the learning/memory defects could be secondary to sleep loss or, as Reviewer 4 below suggests, all the behavioral deficits could be caused by impaired development/function of the relevant ap ɑ′/β′ cells. We have now included this possibility in the discussion of the manuscript.  And we have modified the discussion on training, RNA processing, sleep and memory to emphasize the need for future experiments to address the sequence and relationship of these different processes. 

      Reviewer #4 (Public review):

      Summary:

      Li and Chouhan et al. follow up on a previous publication describing the role of anterior-posterior (ap) and medial (m) ɑ′/β′ Kenyon cells in mediating sleep-dependent and sleep-independent memory consolidation, respectively, based on feeding state in Drosophila melanogaster. The authors sequenced bulk RNA of ap ɑ′/β′ Kenyon cells 1h after flies were either trained-fed, trained-starved or untrained-fed and find a small number of genes (59) differentially expressed (3 upregulated, 56 downregulated) between trained-fed and trained-starved conditions. Many of these genes encode proteins involved in the regulation of gene expression. The authors then screened these differentially expressed genes for sleep phenotypes by expressing RNAi hairpins constitutively in ap ɑ′/β′ Kenyon cells and measuring sleep patterns. Two hits were selected for further analysis: Polr1F, which promoted sleep, and Regnase-1, which reduced sleep. The pan-neuronal expression of Polr1F and Regnase-1 RNAi constructs was then temporally restricted to adult flies using the GeneSwitch system. Polr1F sleep phenotypes were still observed, while Regnase-1 sleep phenotypes were not, indicating developmental defects. Appetitive memory was then assessed in flies with constitutive knockdown of Polr1F and Regnase-1 in ap ɑ′/β′ Kenyon cells. Polr1F knockdown did not affect sleep-dependent or sleep-independent memory, while Regnase-1 knockdown disrupted sleep-dependent memory, sleep-independent memory, as well as learning. Polr1F knockdown increased pre-ribosomal RNA transcripts in the brain, as measured by qPCR, in line with its predicted role as part of the RNA polymerase I complex. A puromycin incorporation assay to fluorescently label newly synthesized proteins also indicated higher levels of bulk translation upon Polr1F knockdown. Regnase-1 knockdown did not lead to observable changes in measurements of bulk translation.

      Strengths:

      The proposed involvement of RNA processing genes in regulating sleep and memory processes is interesting, and relatively unexplored. The methods are satisfactory.

      Weaknesses:

      The main weakness of the paper is in the overinterpretation of their results, particularly relating to the proposed link between sleep and memory consolidation, as stated in the title. Constitutive Polr1F knockdown in ap ɑ′/β′ Kenyon cells had no effect on appetitive long-term memory, while constitutive Regnase-1 knockdown affected both learning and memory. Since the effects of constitutive Regnase-1 knockdown on sleep could be attributed to developmental defects, it is quite plausible that these same developmental defects are what drive the observed learning and memory phenotypes. In this case, an alternative explanation of the authors' findings is that constitutive Regnase-1 knockdown disrupts the entire functioning of ap ɑ′/β′ Kenyon cells, and as a consequence behaviors involving these neurons (i.e. learning, memory and sleep) are disrupted. It will be important to provide further evidence of the function of RNA processing genes in memory in order to substantiate the memory link proposed by the authors.

      As noted above, we have removed claims of a link between sleep and memory and instead focused the manuscript on our findings of RNA processing genes modulated during sleep-dependent memory. We concur that impaired development of ap ɑ′/β′ neurons could account for the sleep and memory phenotype observed and have included this possibility in the manuscript.

      Recommendations for the authors:

      Reviewer #4 (Recommendations for the authors):

      The title of the paper should be reconsidered to reflect the results. The evidence for a link between RNA processing genes and memory is weak.

      We have changed the title.

      Line 328. The term "central dogma" is misused. The central dogma refers to the unidirectional flow of information from DNA to protein. Instead the authors mean "gene expression".

      Changed, thank you.

      A couple of minor comments relating to the figures:

      Figure 1b. It is not clear what the number 10570 in the bottom right corner refers to.

      Fixed.

      Figure 3b. RU- and RU+ annotation is missing (as shown in 3d).

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      (1) The identification of the proximal to distal degeneration of the tailgut within the human tail is difficult to distinguish with the current images present in Figure 3. A picture within a picture of the area containing the tail gut could be provided to prominently demonstrate the cellular architecture. Additionally, quantification of the localization of apoptosis would strongly support this observation, as well as provide a visualization of the tail's regression overall. For example, a graph plotting the number of apoptotic cells versus the rostral to caudal locations of the transverse sections while accounting for the CS stage of each analyzed embryo could be created; this could even be further broken down by region of tail, for example, tailgut, ventral ectodermal ridge, somite, etc.

      To provide more information on apoptosis, we prepared serial sections from an additional 6 human tails, 5 of which were processed for fluorescence anti-caspase 3 immunohistochemistry with DAPI staining (Fig 4) and H&E (Fig 6). This confirmed our previous finding of apoptosis especially in the tailgut and ventral mesoderm. We have not quantified the apoptosis, given the difficulty of deciding whether anti-caspase signals represent single or multiple dying cells. Instead, we performed a tissue area analysis from caudal to rostral along the tail (new section on p 9). This shows a progressive enlargement of the neural tube, no change in the notochord and a striking reduction in tailgut area (Fig 4C,D). The smaller tailgut has fewer nuclei in cross section rostrally compared with more caudally (Fig 4E). Given that apoptosis is present in the tailgut at all rostro-caudal levels, this is consistent with a rostral-to-caudal loss of the tailgut, as is also found in mouse and rat embryos.

      (2) The identification of the mode of formation of the secondary neural tube is probably the most interesting question to be addressed, however, Figure 7's evidence is not completely satisfying in its current form. While I agree that it is unlikely that multiple polarization foci form within the most caudal part of the tail and coalesce more rostrally, I am equally unsure that a single polarization would form rostrally and then split and re-coalesce as it moves caudally, as is currently depicted by 7B. Multiple groups have recently shown the influence of geometric confinement on neuroectoderm and its ability to polarize and form a singular central lumen (Karzbrun 2021, Knight 2018), or the inverse situation of a lack of confinement resulting in the presence of multiple lumens. The tapering of the diameter of the tail and its shared perimeter and curvature with the polarization bears a striking resemblance to this controlled confinement. An interesting quantification to depict would include the number of lumens versus the transverse section diameter and CS stage to see if there is any correlation between embryo size and the number of multiple polarizations. Anecdotally, the fusion of multiple polarizations/lumens tends to occur often in these human organoid-type platforms, while splitting to multiple lumens as the tissues mature does not. Other supplements to Figure 7 could include 3D renderings of lumens of interest as depicted in Catala 2021, especially if it demonstrates the recoalescence as seen in 7B. The non-pathologic presence of multiple polarizations in human tails compared to the rodent pathogenic counterpart is interesting given that rodents obviously maintain this appendage while it is lost in humans.   

      The additional 6 sectioned human embryo tails (as described above) provide further information in support of the original findings of the paper: (i) that the secondary neural tube formation initially involves a single lumen, and (ii) that neural tube duplication occurs in many tails at more rostral levels. Neural tube duplication was observed in 15/25 of our sectioned tails: hence, overall 60% of human tails exhibited neural tube duplication in this study. We have replaced all the cross sectional images in the original Fig 7 (now Fig 6) to better illustrate the findings of neural tube duplication at relatively rostral levels of the human tail. Additionally, the axial positions of sections containing a duplicated neural tube are indicated by arrows in the graph of neural tube areas (Fig 4C). From this analysis it appears that neural tube duplication is not contingent on an increasing tail diameter, as raised by the reviewer, because some tails show a transition to neural tube duplication, and then return to a single lumen morphology more rostrally. While the 3D renderings of lumens would be interesting, we consider it beyond the scope of the present study.

      (3) Of potential interest is the process of junctional neurulation describing the mechanistic joining of the primary and secondary neural tube, which has recently been explored in chick embryos and demonstrated to have relevance to human disease (Dady 2014, Eibach 2017, Kim 2021). While it is clear this paper's goal does not center on the relationship between primary and secondary neurulation, such a mechanism may be relevant to the authors' interpretation of their observations of lumen coalescence. I wonder if the embryos studied provide any evidence to support junctional neurulation.  

      We agree this is an important point to address in the paper, and a new section has been inserted in the Discussion: ‘Transition from primary to secondary neurulation’ (pp 13-14). In brief, we find no evidence for a specific mode of ‘junctional neurulation’ in the human embryos. In any event, its existence is hypothetical in humans, suggested largely as an ‘embryological explanation’ for the finding of rare interrupted spinal cord defects in neurosurgical patients (Eibach, 2017). In chick neurulation there is longitudinal dorso-ventral overlap between the primary and secondary neural tubes (Dryden, 1980), with the junctional zone derived from ingressing cells at the node-streak border (Dady, 2014), a known source of neuromesodermal progenitors (NMPs). However, this is a very different developmental situation from the human so-called ‘junctional neurulation’ defect (Eibach, 2017), in which the spinal cord is physically and functionally interrupted, with only a rudimentary filament connecting the rostral and caudal parts.

      Reviewer #1 (Recommendations For The Authors):

      (1) Figures 3, 4, and 7, would be easier to digest quickly with inclusions of labels that mark the rostral and caudal transverse sections. For example, "caudal" over 3G and "rostral" over 3F.  

      Figures 3 and 4 have been combined to form revised Figure 3, and the rostral/caudal sections are no longer included, as these are superseded by the new Figure 4. Similarly Figure 7 has been replaced by new images in the revised Figure 6, with clear labelling of axial levels.

      (2) The manuscript does a nice job of comparing and contrasting the human findings to mouse, however, there are several instances where it would be nice to continue this trend within the text, such as including the rate of somite formation for rodents in the sections that you state the quantified human and published organoid findings, as well as the total number of somites rodents exhibit. Additionally, the last sentence of the "Morphology of human PNP closure" section correctly states that human PNPs seem to close via Mode 2 neurulation that is seen in the mouse. However, my read of the literature (published by Dr. Copp) demonstrates that the PNP in mice actually closes via Mode 3 at the most caudal portion. If this is the case, it would be pertinent to explicitly state that regionally dependent morphogenetic difference between the two species.

      We agree these are important points to include. The additional somite data (for mouse) has been inserted in the Results section on ‘Somite formation’ (p 8), and the apparent absence of Mode 3 during human spinal neural tube closure is now included in the new Discussion section, ‘Transition from primary to secondary neurulation’ (pp 13-14).

      (3) The introduction to secondary neural tube formation with the hypothesis diagrams in Figure 7 is slightly jarring. At the beginning of the Figure, a schematic depicting the morphogenetic differences between primary and secondary would be helpful in introducing the readership to these complex embryologic events. An example of this could be similar to Figure 1 in Dr. Copp's paper:

      Nikolopoulou, E., et al. Neural tube closure: cellular, molecular and biomechanical mechanisms. Development 144, 552-566 (2017).  

      We feel that a summary diagram of primary and secondary neurulation would simply reproduce diagrams that are already widespread in the literature. As noted by the Reviewer, our article in Development (Nikolopoulou, 2017) contains just such a summary diagram as Figure 1. Therefore, we prefer to explicitly cite this article/figure in our Introduction (see modified first sentence, third paragraph, p 3), so that readers can consult the freely accessible Nikolopoulou review for more detail. The diagram in Figure 7 (now revised Figure 6) has been completely redrawn to make much clearer the hypotheses being examined in the study of human secondary neural tube formation, and neural tube duplication.

      (4) Finally, a matter of semantics, the second paragraph of the introduction describes myelomeningocele as a neurodegenerative defect, while it is true amniotic fluid further degrades exposed neural tissue while exposed, to me, the term neurodegenerative defect suggests a lifelong degeneration, which is not the case for human patients. Perhaps shortening to neurological defect is a compromise. Thank you for the important and interesting work.  

      We agree that ‘neurodegenerative’ can mean different things to different people. Literally, it refers to degeneration of neural tissue, which of course includes neuroepithelial loss due to amniotic fluid action in the uterus. Nevertheless, to avoid confusion, the word has been removed and the sentence expanded to include a reference to the adverse effects of amniotic fluid on the exposed neuroepithelium (see Introduction, second paragraph, p 3).

      Reviewer #2 (Public Review)

      It is not clear how the gestational age of the specimens was determined or how that can be known with certainty. There is no information given in the methods on this. With this in mind, bunching the samples at 2-day intervals in Figure 1J will lead to inaccuracies in assessing the rate of somite formation. This is pointed out as a major difference between specimens and organoids in the abstract but a similar result in the results section. The data supporting either of these statements is not convincing.

      Human embryos were assigned to Carnegie stages based on standard morphological criteria. This was stated, with references, in the first Results paragraph, and we have now also included this information in the Methods (first paragraph, p 19). We assigned the embryos to 2-day intervals based on the standard literature timing of these Carnegie stages, as described in O’Rahilly and Muller (1987). We have clarified both Carnegie staging and assignment of embryos to 2-day intervals in a new sentence within the Methods, first paragraph, p 19. “Embryos were assigned to Carnegie Stages (CS) using morphological criteria (O'Rahilly and Muller 1987; Bullen and Wilson 1997) and to 2-day post-conception intervals for regression analysis based on timings in Table 0-1 of O’Rahilly and Muller (1987).” This has also been inserted in the legend to Figure 1J.

      The regression analysis of somite number against days post-conception (Figure 1J) allowed a conclusion to be drawn on the rate of somite formation in early human embryos. We have added 95% confidence intervals to our finding of a new somite formed every 7.1 h in humans. We consider this to be important for comparison with non-human species and organoid systems. On p 8, second paragraph, we simply state our finding of a 7.1 h somite periodicity in human embryos, compared with 5 h in the organoid system (and 2 h in mouse and rat – as suggested by Reviewer 1). We are careful not to say it is a ‘major difference’ or ‘similar result’ in different parts of the paper, as the Reviewer has drawn attention to.
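
      The conversion from a regression slope to a somite periodicity can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the data points are synthetic, generated from an assumed period rather than taken from Figure 1J, the fit is an ordinary least-squares line, and the helper name `somite_period_hours` is hypothetical. The 95% confidence interval mentioned above is not computed here.

```python
import numpy as np

def somite_period_hours(days, somites):
    """Fit somite number against days post-conception and return hours per somite."""
    slope, _intercept = np.polyfit(days, somites, 1)  # slope is in somites per day
    return 24.0 / slope

# Synthetic illustration only: embryos binned at 2-day intervals, with somite
# counts generated from an assumed 7.1 h period (24 / 7.1 somites per day).
assumed_period_h = 7.1
days = np.array([26.0, 28.0, 30.0, 32.0, 34.0])
somites = 10.0 + (24.0 / assumed_period_h) * (days - days[0])

period = somite_period_hours(days, somites)  # recovers the assumed period
```

      Because the periodicity is the reciprocal of the slope, a steeper line means faster somite formation: a 2 h period, as quoted for mouse and rat, corresponds to 12 somites per day.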

      Whenever possible, give the numbers of specimens that had the described findings. For example, in Figure 2C - how many embryos were examined with the massive rounded end at CS13? Apoptosis in Figures 3 and 4?  

      Numbers of embryos analysed in Figures 2 and 3 (the latter now a combined version of the original Figures 3 and 4) are shown in Table 2. We have also created a new Supplementary Figure 1 to show additional examples of human embryonic tails, which illustrate the consistency of morphology through the stages from CS13 to CS18. Numbers of samples that contributed to Figures 4-6 are detailed in the legends.

      For Figure 2I-K, it would be informative to superimpose the individual data points on the box plots distinguishing males from females, as in Figure 1I.  

      This was attempted but the data points overlie the box plots and look confusing. Instead, we have created Supplementary Table 2 which gives the raw data on which Figure 2I-K are based. We have also drawn attention to the fact that not all embryos yielded all types of measurement, especially tail lengths.

      Is it possible to quantitate apoptosis and proliferation data?  

      We have not quantified apoptosis, given the difficulty of deciding whether anti-caspase signals represent single or multiple dying cells. Instead, we performed a new tissue area analysis along the body axis, which has shed light on the possible direction (rostral to caudal) of tailgut loss in the human caudal region (see response to Reviewer 1 above). Since the cell proliferation data were limited in extent, and not a major focus of the paper, we have removed that analysis completely from the revised version.

      The Tunel staining in Figure 3 is difficult to make out.  

      We have extended our analysis of anti-caspase 3 immunohistochemistry and removed the TUNEL images.

      Reviewer #2 (Recommendations For The Authors)

      The anatomy of the sections in Figures 3, 4, and 7 is difficult to discern. Is it possible to insert adjacent panels tracing and labeling the structures in each panel? Also, drawings showing the axial level of each section would be helpful.

      To clarify the axial levels of sections, we have inserted images of mouse and human embryos as parts A and B of the revised Figure 3. We have tried to clarify the morphology of sections by labelling all relevant structures in the sections themselves.   

      High-magnification views of the tailbud in Figure 5 would be more informative. Staining is difficult to see after CS13. The low-magnification views can be shown in an insert. Figures 5 and 6 can be combined.

      At the reviewer’s suggestion, we have merged Figures 5 and 6 into a revised Figure 5. Now, the sections provide higher magnification images of the areas of expression as shown in the lower magnification whole mount images. We feel this makes the gene expression findings much clearer than before.

      Some of the writing in the abstract, introduction, and results is very descriptive, with a lack of summary and integration of information. For instance, the abstract could be rewritten to include an overall conclusion at the end and a better description of the longstanding questions addressed. Moreover, the abstract suggests multiple lumens are not found in human specimens. Another example is the second paragraph of the introduction lists various NTDs but doesn't provide an integrative conclusion of the information. The discussion is much better but lacks a conclusion at the end.

      We agree that more concluding sentences should be used, as the Reviewer suggests. To this end, we have rewritten the Abstract (p 2) to emphasise the long-standing questions that our study addresses, and concluding sentences are now included in other places (e.g. somite results, p 8). A new ‘Conclusions’ section has been added at the end of the Discussion (pp 17-18).

      ADDITIONAL CHANGES MADE TO REVISED MANUSCRIPT

      Title. This has been amended to: “Spinal neural tube formation and tail development in human embryos” to reflect the greater focus on developmental events, and less on tail regression.

      Additional studies have been added to Supplementary Table 1, to include the main transcriptomic studies of human embryos in the primary/secondary neurulation stage range. This takes the number of previous studies to 28 and the total number of embryos to 925. See p 4, top and p 12, first paragraph for corresponding changes to the text.

      We added a sentence to the Discussion (p 13, first paragraph) to counter the claim that humans have undergone ‘tail-loss’, as included in Xia et al, 2024, “On the genetic basis of tail-loss evolution in humans and apes”. Nature 626:1042-8. Clearly, the human embryo is tailed, which undermines these authors’ statement.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the paper, Yan and her colleagues investigate at which stage of development different categorical signals can be detected with EEG using a steady-state visual evoked potential paradigm. The study reports the development trajectory of selective responses to five categories (i.e., faces, limbs, corridors, characters, and cars) over the first 1.5 years of life. It reveals that while responses to faces show significant early development, responses to other categories (i.e., characters and limbs) develop more gradually and emerge later in infancy. The paper is well-written and enjoyable, and the content is well-motivated and solid.

      Strengths:

      (1) This study contains a rich dataset with a substantial amount of effort. It covers a large sample of infants across ages (N=45) and asks an interesting question about when visual category representations emerge during the first year of life.

      (2) The chosen category stimuli are appropriate and well-controlled. These categories are classic and important for situating the study within a well-established theoretical framework.

      (3) The brain measurements are solid. Visual periodicity allows for the dissociation of selective responses to image categories within the same rapid image stream, which appears at different intervals. This is important for the infant field, as it provides a robust measure of ERPs with good interpretability.

      Weaknesses:

      The study would benefit from a more detailed explanation of analysis choices, limitations, and broader interpretations of the findings. This includes:

      a) improving the treatment of bias from specific categories (e.g., faces) towards others;

      b) justifying the specific experimental and data analysis choices;

      c) expanding the interpretation and discussion of the results.

      I believe that giving more attention to these aspects would improve the study and contribute positively to the field.

      We thank the reviewer for their clear summary of the work and their constructive feedback. To address the reviewer’s concerns, in the revised manuscript we now provide a detailed explanation of analysis choices, limitations, and broader interpretations, as summarized in the point-by-point responses in the section: Reviewer #1 (Recommendations For The Authors) below, for which we give here an overview in points (a), (b), and (c):

      (a) The reviewer is concerned that using face stimuli as one of the comparison categories may hinder the detection of selective responses to other categories like limbs. Unfortunately, because of the frequency tagging design of our study, we cannot compare the responses to one category vs. only some of the other categories (e.g. limbs vs objects but not faces). In other words, our experimental design does not enable us to do the analysis suggested by the reviewer. Nonetheless, we underscore that faces comprise only ¼ of the contrast stimuli and we are able to detect significant selective responses to limbs, corridors and characters in infants after 6-8 months of age even as faces are included in the contrast and the response to faces continues to increase (see Fig 4). We discuss the reviewer’s point regarding how contrast can contribute to differences in findings in the discussion on pages 12-13, lines 344-351. Full details below in Reviewer 1: Recommendations for Authors - Frequency tagging category responses.

      (b) We expanded the justification of specific experimental and data analysis choices, see details below in Reviewer 1: Recommendations for Authors ->Specific choices for experiment and data analysis.

      (c) We expand the interpretation and discussion, see details below in Reviewer 1: Recommendations for Authors -> More interpretation and discussion.

      Reviewer #2 (Public Review):

      Summary:

      The current work investigates the neural signature of category representation in infancy. Neural responses during steady-state visually-evoked potentials (ssVEPs) were recorded in four age groups of infants between 3 and 15 months. Stimuli (i.e., faces, limbs, corridors, characters, and cars) were presented at 4.286 Hz with category changes occurring at a frequency of 0.857 Hz. The results of the category frequency analyses showed that reliable responses to faces emerge around 4-6 months, whereas responses to limbs, corridors, and characters emerge at around 6-8 months. Additionally, the authors trained a classifier for each category to assess how consistent the responses were across participants (leave-one-out approach). Spatiotemporal responses to faces were more consistent than the responses to the remaining categories and increased with increasing age. Faces showed an advantage over other categories in two additional measures (i.e., representation similarity and distinctiveness). Together, these results suggest a different developmental timing of category representation.

      Strengths:

      The study design is well organized. The authors described and performed analyses on several measures of neural categorization, including innovative approaches to assess the organization of neural responses. Results are in support of one of the two main hypotheses on the development of category representation described in the introduction. Specifically, the results suggest a different timing in the formation of category representations, with earlier and more robust responses emerging for faces over the remaining categories. Graphic representations and figures are very useful when reading the results.

      Weaknesses:

      (1) The role of the adult dataset in the goal of the current work is unclear. All results are reported in the supplementary materials and minimally discussed in the main text. The unique contribution of the results of the adult samples is unclear and may be superfluous.

      (2) It would be useful to report the electrodes included in the analyses and how they have been selected.

      We thank the reviewer for their constructive feedback and for summarizing the strengths and weaknesses of our study. We revised the manuscript to address these two weaknesses.

      (1) The reviewer indicates that the role of the adult dataset is unclear. The goal of testing adult participants was to validate the EEG frequency tagging paradigm. We chose to use adults because a large body of fMRI research shows that both clustered and distributed responses to visual categories are found in adults’ high-level visual cortex. Therefore, the goal of the adult data is to determine whether, with the same amount of data as we collect on average in infants, we have sufficient power to detect categorical responses using the same frequency tagging paradigm that we use in infants. Because these data serve a methodological validation purpose, we believe they belong in the supplemental data.

      We clarify this in the Results, second paragraph, page 5, where we now write: “As the EEG-SSVEP paradigm is novel and we are restricted in the amount of data we can obtain in infants, we first tested if we can use this paradigm and a similar amount of data to detect category-selective responses in adults. Results in adults validate the SSVEP paradigm for measuring category-selectivity: as they show that (i) category-selective responses can be reliably measured using EEG-SSVEP with the same amount of data as in infants (Supplementary Figs S1-S2), and that (ii) category information from distributed spatiotemporal response patterns can be decoded with the same amount of data as in infants (Supplementary Fig S3).”

      (2) The reviewer asks us to report the electrodes used in the analysis and their selection. We note that the selection of electrodes included in the analyses has been reported in our original manuscript (Methods, section: Univariate EEG analyses). On pages 18-19, lines 530-538, we write: “Both image update and categorical EEG visual responses are reported in the frequency and time domain over three regions-of-interest (ROIs): two occipito-temporal ROIs (left occipitotemporal (LOT): channels 57, 58, 59, 63, 64, 65 and 68; right occipitotemporal (ROT) channels: 90, 91, 94, 95, 96, 99, and 100) and one occipital ROI (channels 69, 70, 71, 74, 75, 76, 82, 83 and 89). These ROIs were selected a priori based on a previously published study51. We further removed several channels in these ROIs for two reasons: (1) Three outer rim channels (i.e., 73, 81, and 88) were not included in the occipital ROI for further data analysis for both infant and adult participants because they were consistently noisy. (2) Three channels (66, 72, and 84) in the occipital ROI, one channel (50) in the LOT ROI, and one channel (101) in the ROT ROI were removed because they did not show substantial responses in the group-level analyses.”

      In the section Reviewer 2, Recommendations for the authors, we also addressed the reviewer’s minor points.

      Reviewer #3 (Public Review):

      Yan et al. present an EEG study of category-specific visual responses in infancy from 3 to 15 months of age. In their experiment, infants viewed visually controlled images of faces and several non-face categories in a steady state evoked potential paradigm. The authors find visual responses at all ages, but face responses only at 4-6 months and older, and other category-selective responses at later ages. They find that spatiotemporal patterns of response can discriminate faces from other categories at later ages.

      Overall, I found the study well-executed and a useful contribution to the literature. The study advances prior work by using well-controlled stimuli, subgroups of different ages, and new analytic approaches.

      I have two main reservations about the manuscript: (1) limited statistical evidence for the category by age interaction that is emphasized in the interpretation; and (2) conclusions about the role of learning and experience in age-related change that are not strongly supported by the correlational evidence presented.

      We thank the reviewer for their enthusiasm and their constructive feedback.

      (1) The overall argument of the paper is that selective responses to various categories develop at different trajectories in infants, with responses to faces developing earlier. Statistically, this would be most clearly demonstrated by a category-by-age interaction effect. However, the statistical evidence presented for a category-by-age interaction effect is relatively weak, and no interaction effect is tested for frequency domain analyses. The clearest evidence for a significant interaction comes from the spatiotemporal decoding analysis (p. 10). In the analysis of peak amplitude and latency, an age x category interaction is only found in one of four tests, and is not significant for latency or left-hemisphere amplitude (Supp Table 8). For the frequency domain effects, no test for category by age interaction is presented. The authors find that the effects of a category are significant in some age ranges and not others, but differences in significance don't imply significant differences. I would recommend adding category by age interaction analysis for the frequency domain results, and ensuring that the interpretation of the results is aligned with the presence or lack of interaction effects.

      The reviewer is asking for additional evidence for age x category interaction by repeating the interaction analysis in the frequency domain. The reason we did not run this analysis in the original manuscript is that the categorical responses of interest are reflected in multiple frequency bins: the category frequency (0.857 Hz) and its harmonics, and there are arguments in the field as to how to quantify response amplitudes from multiple frequency bins (Peykarjou, 2022). Because there is no consensus in the field and also because how the different harmonics combine depends not just on their amplitudes but also on their phase, we chose to transform the categorical responses across multiple frequency bins from the frequency domain to the time domain. The transformed signal in the time domain includes both phase and amplitude information across the category frequency and its harmonics. Therefore, subsequent analyses and statistical evaluations were done in the time domain.

      However, we agree with the reviewer that adding category by age interaction analysis for the frequency domain results can further solidify the results. Thus, in the revised manuscript we added a new analysis, in which we quantified the root mean square (RMS) amplitude value of the responses at the category frequency (0.857 Hz) and its first harmonic (1.714 Hz) for each category condition and infant. Then we used an LMM to test for an age by category interaction. The LMM was conducted separately for the left and right lateral occipitotemporal ROIs. This analysis revealed a significant category by age interaction; that is, in both hemispheres, the development of response RMS amplitudes varied across categories (left occipitotemporal ROIs: βcategory x age = -0.21, 95% CI: -0.39 – -0.04, t(301) = -2.40, pFDR < .05; right occipitotemporal ROIs: βcategory x age = -0.26, 95% CI: -0.48 – -0.03, t(301) = -2.26, pFDR < .05). We have added this analysis in the manuscript, pages 7-8, lines 186-193: “We next examined the development of the category-selective responses separately for the right and left lateral occipitotemporal ROIs. The response amplitude was quantified by the root mean square (RMS) amplitude value of the responses at the category frequency (0.857 Hz) and its first harmonic (1.714 Hz) for each category condition and infant. With an LMM analysis, we found significant development of response amplitudes in both occipitotemporal ROIs which varied by category (left occipitotemporal ROIs: βcategory x age = -0.21, 95% CI: -0.39 – -0.04, t(301) = -2.40, pFDR < .05; right occipitotemporal ROIs: βcategory x age = -0.26, 95% CI: -0.48 – -0.03, t(301) = -2.26, pFDR < .05, LMM as a function of log (age) and category; participant: random effect).” We also added the formula for the LMM analysis in Table 1 in the Methods section, page 21.
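
      To make the RMS quantification described above concrete, here is a minimal sketch. It is not the authors' code. It assumes that the "RMS amplitude" is the root mean square of the two spectral amplitudes (at 0.857 Hz and 1.714 Hz), and the function names `amplitude_at` and `rms_amplitude` are hypothetical.

```python
import cmath
import math

CATEGORY_HZ = 0.857        # category frequency (from the text)
FIRST_HARMONIC_HZ = 1.714  # its first harmonic (from the text)


def amplitude_at(signal, sample_rate_hz, freq_hz):
    """Amplitude of a real-valued signal at one frequency (single-bin DFT)."""
    n = len(signal)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
        for k, x in enumerate(signal)
    )
    # Factor 2 recovers the peak amplitude of a real sinusoid.
    return 2.0 * abs(acc) / n


def rms_amplitude(signal, sample_rate_hz,
                  freqs_hz=(CATEGORY_HZ, FIRST_HARMONIC_HZ)):
    """Assumed definition: RMS over the amplitudes at the given frequencies."""
    amps = [amplitude_at(signal, sample_rate_hz, f) for f in freqs_hz]
    return math.sqrt(sum(a * a for a in amps) / len(amps))
```

      One such value per infant, category, and ROI would then serve as the dependent variable of the LMM. The single-bin estimate is exact only when the recording spans a whole number of cycles of each frequency.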

      (2) The authors argue that their results support the claim that category-selective visual responses require experience or learning to develop. However, the results don't bear strongly on the question of experience. Age-related changes in visual responses could result from experience or experience-independent maturational processes. Finding age-related change with a correlational measure does not favor either of these hypotheses. The results do constrain the question of experience, in that they suggest against the possibility that category-selectivity is present in the first few months of development, which would in turn suggest against a role of experience. However the results are still entirely consistent with the possibility of age effects driven by experience-independent processes. The manner in which the results constrain theories of development could be more clearly articulated in the manuscript, with care taken to avoid overly strong claims that the results demonstrate a role of experience.

      Thanks for the comment. We agree with this nuanced point. It is possible that development of category-selective visual responses is a maturational process. In response to this comment, we have revised the manuscript to discuss both perspectives, see revised discussion section – A new insight about cortical development: different category representations emerge at different times during infancy, pages 14-15, lines 403-426, where we now write: “In sum, the key finding from our study is that the development of category selectivity during infancy is non-uniform: face-selective responses and representations of distributed patterns develop before representations to limbs and other categories. We hypothesize that this differential development of visual category representations may be due to differential visual experience with these categories during infancy. This hypothesis is consistent with behavioral research using head-mounted cameras that revealed that the visual input during early infancy is dense with faces, while hands become more prevalent in the visual input later in development and especially when in contact with objects 41,42. Additionally, a large body of research has suggested that young infants preferentially look at faces and face-like stimuli 17,18,33,34, as well as look longer at faces than other objects 41, indicating that not only the prevalence of faces in babies’ environments but also longer looking times may drive the early development of face representations. Further supporting the role of visual experience in the formation of category selectivity is a study that found that infant macaques that are reared without seeing faces do not develop face-selectivity but develop selectivity to other categories in their environment like body parts 40. An alternative hypothesis is that differential development of category representations is maturational. For example, we found differences in the temporal dynamics of visual responses among four infant age groups, which suggests that the infant’s visual system is still developing during the first year of life. While the mechanisms underlying the maturation of the visual system in infancy are yet unknown, they may include myelination and cortical tissue maturation 66-71. Future studies can test these alternatives by examining infants’ visual diet, looking behavior, and brain development and examine responses using additional behaviorally relevant categories such as food 72–74. These measurements can test how environmental and individual differences in visual experiences may impact infants’ developmental trajectories. Specifically, a visual experience account predicts that differences in visual experience would translate into differences in development of cortical representations of categories, but a maturational account predicts that visual experience will have no impact on the development of category representations.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points:

      Bias from faces to other categories:

      - Frequency tagging category responses:

      We see faces from non-face objects and limbs from non-limb objects. Non-limb objects include faces; I suspect that finding the effects of limbs is challenging with faces in the non-limbs category. How would you clarify the choice of categories, and to what extent are the negative (i.e., non-significant) effects on other categories not because of the heavy bias to faces?

      The reviewer is concerned that using face stimuli as one of the comparison categories may hinder the ability to detect selective responses to other categories like limbs in our study. Unfortunately, because of the frequency tagging design of our study, we cannot compare the responses to one category to only some of the other categories (e.g. limbs vs objects but not faces), so our experimental design does not enable us to do the analysis suggested by the reviewer. Nonetheless, we underscore that faces comprise only ¼ of the contrast stimuli in the category frequency tagging and we are able to detect significant selective responses to limbs, corridors and characters in infants after 6-8 months of age, when faces are included in the contrast and the responses to faces continue to increase more than for other categories (see Fig 4).

      We address this point in the discussion where we consider differences between our findings and those of Kosakowski et al. 2022, on pages 12-13, lines 344-351 we write: “We note that, the studies differ in several ways: (i) measurement modalities (fMRI in 27 and EEG here), (ii) the types of stimuli infants viewed: in 27 infants viewed isolated, colored and moving stimuli, but in our study, infants viewed still, gray-level images on phase-scrambled backgrounds, which were controlled for several low level properties, and (iii) contrasts used to detect category-selective responses, whereby in 27 the researchers identified within predefined parcels – the top 5% of voxels that responded to the category of interest vs. objects, here we contrasted the category of interest vs. all other categories the infant viewed. Thus, future research is necessary to determine whether differences between findings are due to differences in measurement modalities, stimulus format, and data analysis choices.”

      - Decoding analyses:

      Figure 5 Winner-take-all classification. First, the classifier may be biased towards the categories with strong and clean data, similar to the last point, this needs clarification on the negative effect. Second, it could be helpful to see how exactly the below-chance decoded categories were being falsely classified to which categories at the group level. Decoding accuracy here means a 20% chance the selection will go to the target category, but the prediction and the exact correlation coefficient the winner has is not explicit; concerning a value of 0.01 correlation could take the winner among negative or pretty bad correlations with other categories. It would be helpful to report how exactly the category was correlated, as it could be a better way to define the classification bias, for example, correlation differences between hit and miss classification. Also, the noise ceiling of the correlation within each group should be provided. Third, this classifier needs improvement in distinguishing between noise and signals to identify the type of information it extracts. Do you have thoughts about that?

      Thanks for the questions, answers below:

      In the winner-take-all (WTA) classifier analysis, at each iteration, the LOOCV classifier computed the correlation between each of the five category vectors from the left-out participant (test data, for an unknown stimulus) and each of the mean spatiotemporal vectors across the N-1 participants (training data, labeled data). The winner-take-all (WTA) classifier classifies the test vector to the category that yields the highest correlation with the training vector. For a given test pattern, correct classification yielded a score of 1 and an incorrect classification yielded a score of 0. Then we computed the group mean decoding performance across all N iterations for each category and the group mean decoding accuracies across five categories.
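
      The leave-one-out procedure described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation. It assumes Pearson correlation as the similarity measure and a hypothetical data layout `data[participant][category]` holding one spatiotemporal vector per category.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_vector(vectors):
    """Element-wise mean of several equal-length vectors."""
    return [sum(vals) / len(vals) for vals in zip(*vectors)]


def loocv_wta(data):
    """Leave-one-participant-out winner-take-all classification.

    Returns (per-category accuracy, mean accuracy across categories).
    """
    participants = list(data)
    categories = list(data[participants[0]])
    hits = {c: 0 for c in categories}
    for left_out in participants:
        # Training patterns: mean vector per category over the other N-1.
        train = {
            c: mean_vector([data[p][c] for p in participants if p != left_out])
            for c in categories
        }
        for true_cat in categories:
            test_vec = data[left_out][true_cat]
            # The winner is the training category with the highest correlation.
            winner = max(categories, key=lambda c: pearson(test_vec, train[c]))
            hits[true_cat] += int(winner == true_cat)  # score 1 or 0
    per_category = {c: hits[c] / len(participants) for c in categories}
    overall = sum(per_category.values()) / len(categories)
    return per_category, overall
```

      Because only the rank of the correlations matters, a winner can be chosen even when all correlations are weak, which is the concern the reviewer raises. With five categories, chance level is 20%.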

      For the classification data in Fig 5, the statistics and differences from chance are provided in Fig 5B, where we report overall classification across all categories from an infant’s brain data. Like the reviewer, we were interested in assessing whether successful classification is uniform across categories or is driven by some categories. As is visible in Fig 5C, decoding success is non-uniform across categories, and is higher for faces than for other categories. Because this analysis is broken down by category, we cannot compare it to chance; what is reported in Fig 5C is the percentage of infants in each age group for whom a particular category was successfully decoded. Starting from 4 months of age, faces can be decoded from distributed brain data in a majority of infants, but other categories only in 20-40% of infants.

      The reviewer also asks about what levels of correlations drive the classification. The analysis of RSMs in Fig 6a shows the mean correlations of distributed responses to different images within and between categories per age group. As is evident from the RSM, reproducible responses for a category only start to emerge at 4-6 months of age and the highest within category correlations are for faces. To quantify what drives the classification we measure distinctiveness - within category minus between-category correlations of distributed responses; all individual infant data per category are in Fig 6C. Distinctiveness values vary by age and category, see text related to Fig 6 in section: What is the nature of categorical spatiotemporal patterns in individual infants?
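
      As a small illustration of the distinctiveness measure defined above (within-category minus between-category correlation), the sketch below computes it from one representational similarity matrix. It assumes that the between-category term is the mean of the off-diagonal entries in that category's row; the response does not specify the exact averaging, and the function name is hypothetical.

```python
def distinctiveness(rsm, categories):
    """Within-category minus mean between-category correlation, per category.

    rsm[i][j] is the correlation between distributed responses to
    category i and category j; the diagonal holds within-category values.
    """
    result = {}
    for i, cat in enumerate(categories):
        between = [rsm[i][j] for j in range(len(categories)) if j != i]
        result[cat] = rsm[i][i] - sum(between) / len(between)
    return result
```

      A value near zero means the pattern for a category is no more similar to itself than to the other categories, so it cannot support reliable classification.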

      Figure 6 Category distinctiveness. An analysis that runs on a "single item level" would ideally warrant a more informative category distinction. Did you try that? Does it work?

      Thanks for the question. We agree that doing an analysis at the single item level would be interesting. However, none of the images were repeated, so we do not have sufficient SNR to perform this analysis.

      Specific choices for experiment and data analysis:

      - Although using the SSVEP paradigm is familiar to the field, the choice could be detailed for understanding or evaluation of the effectiveness of the paradigm. For example, how the specific frequency for entrainment was chosen, and are there any theories or related warrants for studying in infants?

      Thanks for the questions. We chose to use the SSVEP paradigm over traditional ERP designs for several reasons, which are listed in our original manuscript (Results part, first paragraph, pages 4-5, lines 90-94): “We used the EEG-SSVEP approach because: (i) it affords a high signal-to-noise ratio with short acquisitions making it effective for infants 23,46, (ii) it has been successfully used to study responses to faces in infants23,46,49, and (iii) it enables measuring both general visual response to images by examining responses at the image presentation frequency (4.286 Hz), as well as category-selective responses by examining responses at the category frequency (0.857 Hz, Fig 1A).”

      With regard to our choice of presentation rate, a previous SSVEP study in 4-6-month-olds by de Heering and Rossion (2015) presented faces and objects at 6 Hz (i.e. 167 ms per image) to study infants’ categorical responses to natural faces relative to objects. Here, we chose to use a relatively slower presentation rate, 4.286 Hz (i.e. 233 ms per image), so that our infant participants would have more time to process each image yet would still be unlikely to make eye movements across a stimulus. Both de Heering and Rossion (2015) and our study found significant selective responses to faces relative to other categories in 4-6-month-olds, across these presentation rates. As discussed in a recent review, “Frequency tagging with infants: The visual oddball paradigm” (Peykarjou, 2022), there are many factors to consider when adapting SSVEP paradigms to infants. We agree that an interesting direction for future studies is to examine how SSVEP parameters such as stimulus and oddball presentation rate, and overall duration of acquisition, affect the sensitivity of the SSVEP paradigm in infants. We added a discussion point on this on page 12, lines 332-334 where we write: “As using SSVEP to study high-level representations is a nascent field52–54, future work can further examine how SSVEP parameters such as stimulus and target category presentation rate may affect the sensitivity of measurements in infants (see review by54).”
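
      The per-image durations quoted above follow directly from the presentation rates. The short sketch below reproduces that arithmetic. It also checks the ratio of the image rate to the category rate, which is close to 5; reading this as one category image in every five is an inference here, not something stated in this response.

```python
def ms_per_image(rate_hz):
    """Duration of one image, in milliseconds, at a given presentation rate."""
    return 1000.0 / rate_hz


image_rate_hz = 4.286     # image presentation frequency (from the text)
category_rate_hz = 0.857  # category frequency (from the text)

print(round(ms_per_image(image_rate_hz)))            # 233 ms per image
print(round(ms_per_image(6.0)))                      # 167 ms per image at 6 Hz
print(round(image_rate_hz / category_rate_hz, 2))    # 5.0
print(round(2 * category_rate_hz, 3))                # 1.714, the first harmonic
```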

      - There is no baseline mentioned in the study. How was the baseline considered in the paradigm and data analysis? The baseline is important for evaluating how robust/ reliable the periodic responses within each group are in the first place. It also helps us to see how different the SNR changes in the fast periodic responses from baseline across age groups. Would the results be stable if the response amplitudes were z-scored by a baseline?

      Thanks for the question. Previous studies using a similar frequency tagging paradigm have compared response amplitude at stimulus-related frequencies to that of neighboring frequency bins as their baseline for differentiating signal from noise. We use a more statistically powerful method, Hotelling’s T2 statistic, to test whether response amplitudes were statistically different from 0 amplitude. Importantly, this method takes into consideration both the amplitude and phase information of the response. That is, a significant response is expected to have consistent phase information across participants as well as significant amplitude.
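
      To illustrate why such a test is sensitive to phase as well as amplitude, here is a minimal sketch of a one-sample Hotelling's T2 test against zero. It assumes the test is applied to the real and imaginary parts of each participant's Fourier coefficient at one frequency; this is a common formulation, but the response does not spell out the details, and the function name is hypothetical. The p-value uses the closed-form survival function of the F distribution with 2 numerator degrees of freedom.

```python
def hotelling_t2_vs_zero(coeffs):
    """One-sample Hotelling's T2 test of complex coefficients against 0.

    coeffs: one complex Fourier coefficient per participant (n >= 3).
    Returns (T2, F, p), with F distributed as F(2, n - 2) under the null.
    """
    n = len(coeffs)
    xs = [c.real for c in coeffs]
    ys = [c.imag for c in coeffs]
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix of (real, imaginary) parts.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Quadratic form mean^T S^{-1} mean for the 2x2 case.
    quad = (syy * mx * mx - 2.0 * sxy * mx * my + sxx * my * my) / det
    t2 = n * quad
    f_stat = (n - 2) / (2.0 * (n - 1)) * t2
    # Survival function of F(2, m): (1 + 2 f / m) ** (-m / 2), with m = n - 2.
    p = (1.0 + 2.0 * f_stat / (n - 2)) ** (-(n - 2) / 2.0)
    return t2, f_stat, p
```

      Coefficients with similar amplitude but scattered phases average toward zero in the complex plane and give a small T2, whereas phase-consistent coefficients give a large one.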

      - Statistical inferences: could the variance of data be considered appropriately in your LMM? Why?

      As we have explained in our original manuscript (Methods part, section-Statistical Analyses of Developmental Effects, page 21 lines 611-615): “LMMs allow explicit modeling of both within-subject effects (e.g., longitudinal measurements) and between-subject effects (e.g., cross-sectional data) with unequal number of points per participants, as well as examine main and interactive effects of both continuous (age) and categorical (e.g., stimulus category) variables. We used random-intercept models that allow the intercept to vary across participants (term: 1|participant).” This statistical model is widely used in developmental studies that combine both longitudinal and cross-sectional measurements (e.g. Nordt et al. 2022, 2023; Natu et al. 2021; Grotheer et al. 2022).
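
      For readers less familiar with this notation, a random-intercept model of the kind described above can be written out as follows. This is a generic sketch, not the formula from the manuscript's Table 1: the symbols are chosen here for illustration, and the coding of the category term c is not specified in this response.

```latex
% Response y of participant j at observation i (e.g. RMS amplitude):
y_{ij} = \beta_0
       + \beta_{\text{age}} \log(\text{age}_{ij})
       + \beta_{\text{category}}\, c_{ij}
       + \beta_{\text{category} \times \text{age}}\, c_{ij} \log(\text{age}_{ij})
       + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

      The per-participant intercept u_j corresponds to the (1|participant) term: it absorbs between-participant variance so that repeated measurements from the same infant are not treated as independent.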

      - The sampling of the age groups. Why are these age groups considered, as 8-12 months are not considered? Or did the study first go with an equal sampling of the ages from 3 to 15 months? Then how was the age group defined? The log scale of age makes sense for giving a simplified view of the effects, but the sampling procedure could be more detailed.

      Thanks for the question. Our study recruited infants longitudinally for both anatomical MRI and EEG studies. Some of the infants participated in both studies and some only in one of the studies. Infants were recruited at around newborn, 3 months, 6 months, and 12 months. We did not recruit infants between 8-12 months of age because around 9 months there is little contrast between gray and white matter in anatomical MRI scans that were necessary for the MRI study. For the EEG study we binned the subjects by age group such that there were a similar number of participants across age groups to enable similar statistical power. The division of age groups was decided based on the distribution of the infants included in the analyses.

      We have now added the sampling procedure details in the Methods part, under section: Participants, pages 15-16, lines 440-445: “Sixty-two full-term, typically developing infants were recruited. Twelve participants were part of an ongoing longitudinal study that obtained both anatomical MRI and EEG data in infants. Some of the infants participated in both studies and some only in one of the studies. Infants were recruited at around newborn, 3 months, 6 months, and 12 months. We did not recruit infants between 8-12 months of age because around 9 months there is little contrast between gray and white matter in anatomical MRI scans that were necessary for the MRI study.”

      - 30 Hz cutoff is arbitrary, but it makes sense as most EEG effects can be expected in a lower frequency band than higher. However, this specific choice is interesting and informative, when faced with developmental data and this type of paradigm. Would the results stay robust as the cutoff changes? Would the results benefit from going even lower into the frequency cutoff?

      In the time domain analyses, we chose the 30 Hz cutoff to be consistent with previous EEG studies, including those done with infants. However, our results from the frequency domain (Fig 3, right panel, and supplementary Fig S6-S9) show that there are barely any category-selective responses above about 6 Hz. Therefore, we expect that using a lower frequency cutoff, such as 10 Hz, will not lead to different results.

      More interpretation and discussion:

      - You report the robust visual responses in occipital regions, the responses that differ across age groups, and their characteristics (i.e., peak latency and amplitude) in time curves. This part of the results needs more interpretation to help the data be better situated in the field; I wondered whether this relates to the difference in the signal processing of the information. Could this be the signature of slow recurrence connection development? Or how could this be better interpreted?

      Thanks for the question. Changes in speed of processing can arise from several related reasons including (i) myelination of white matter connections that would lead to faster signal transmission (Lebenberg et al. 2019; Grotheer et al. 2022), (ii) maturation of cortical visual circuits affecting temporal integration time, and (iii) development of feedback connections. Our data cannot distinguish among these different mechanisms. Future studies that combine functional high temporal resolution measurements with structural imaging of tissue properties could elucidate changes in cortical dynamics over development.

      We added this as a discussion point, on page 15 lines 416-420 we write: “For example, we found differences in the temporal dynamics of visual responses among four infant age groups, which suggests that the infant’s visual system is still developing during the first year of life. While underlying maturational mechanisms are yet unknown, they may include myelination and cortical tissue maturation68–73.”

      - The supplementary material includes a detailed introduction to the methods when facing the developing visual acuity, which justifies the choice of the paradigm. I appreciate this thorough explanation. Interestingly, high visual acuity has its potential developmental downside; for instance, low visual acuity would aid in the development of holistic processing associated with face recognition (as discussed by Vogelsang et al., 2018, in PNAS). How do you view this point in relation to the emergence of complex cognitive processes, as here the category-selective responses?

      Thanks for linking this to the Vogelsang (2018) study. Just as faces are processed in a hierarchical manner, starting with low-level features (edges, contours) and progressing to high-level features (identity, expression), other complex visual categories like cars, scenes, and body parts follow similar hierarchies. Early holistic processing could provide a foundation for recognizing objects quickly and efficiently, while feature-based processing might allow for more precise recognition and categorization as acuity increases. Therefore, as visual acuity improves, an infant’s brain can integrate finer details into those holistic representations, supporting more refined and complex cognitive processes. The balance between low- and high-level visual acuity highlights the intricate interplay between sensory processing and cognitive development across various domains.

      Minor points:

      Paradigm:

      - Are the colored cartoon images for motivating infants' fixation counterbalanced across categories in the paradigm? Or how exactly were the cartoon images presented in the paradigm?

      Response: Yes, the small cartoon images that were presented at the center of the screen during stimuli presentation were used to engage infants’ attention and accommodation to the screen. For each condition, they were randomly drawn from a pool of 70 images (23 flowers, 22 butterflies, 25 birds) from categories unrelated to the ones under test. They were presented in random order with durations uniformly distributed between 1 and 1.5 s.  We have added these details of the paradigm to the Methods section, page 17, lines 479-481: “To motivate infants to fixate and look at the screen, we presented at the center of the screen small (~1°) colored cartoon images such as butterflies, flowers, and ladybugs. They were presented in random order with durations uniformly distributed between 1 and 1.5 s.”

      Analysis:

      - Are the visual responses over the occipital cortex different across different category conditions in the first place? I guess this should not be different; this probably needs one more supplementary figure.

      The visual responses reflect the responses to images that are randomly drawn from the five stimuli categories at a presentation frequency of 4.286 Hz. The only difference between the five conditions is that the stimuli presentation order is different. Therefore, the visual response over the occipital cortex across conditions should not be different within an age group.

      In the revised manuscript, we have added Supplementary Figure S5, which shows the frequency spectra distribution and the response topographies of the visual response at 4.286 Hz and its first 3 harmonics separately for each condition and age group, and a new Supplementary Materials section: 5. Visual responses over occipital cortex per condition for all age groups. On page 5, lines 116-120, we now write: “Analysis of visual responses in the occipital ROI separately by category condition revealed that visual responses were not significantly different across category conditions (Supplementary Fig S5; no significant main effect of category (βcategory = 0.08, 95% CI: -0.08 – 0.24, t(301) = 0.97, p = .33) or category by age interaction (βcategory x age = -0.04, 95% CI: -0.11 – 0.03, t(301) = -1.09, p = .28), LMM on RMS of response to first three harmonics).”

      - The summary of epochs used for each category for each age group needs to be included; this is important while evaluating whether the effects are due to not having enough data for categories or others.

      This information is provided in the manuscript in the Methods section, page 18, lines 521-524, and in supplementary Table S2. Our analysis shows that there was no significant difference in the number of pre-processed epochs across age groups (F(3,57) = 1.5, p = .2).

      - Numbers of channels of EEG being interpolated should be provided; is that a difference across age groups?

      Thanks for the suggestion. We have now added information about the number of channels being interpolated for each age group in the Methods section (page 18, lines 525-528): “The number of electrodes being interpolated for each age group were 10.0 ± 4.8 for 3-4-month-olds, 9.9 ± 3.7 for 4-6-month-olds, 9.9 ± 3.9 for 6-8-month-olds, and 7.7 ± 4.7 for 12-15-month-olds. There was no significant difference in the number of electrodes being interpolated across infant age-groups (F(3,55) = 0.78, p = .51).”

      - I noticed that the removal of EEG artifacts (i.e., muscles and eye-blinks) for data analysis is missing; did the preprocessing pipeline involve any artifacts removing procedures that are typically used in both infants and adults SSVEP data analysis? If so, please provide more information.

      In our analysis, artifact rejection was performed in two steps. First, the continuous filtered data were evaluated according to a sample-by-sample thresholding procedure to locate consistently noisy channels. Channels with more than 20% of samples exceeding a 100-150 μV amplitude threshold were replaced by the average of their six nearest spatial neighbors. Once noisy channels were interpolated in this fashion, the EEG was re-referenced from the Cz reference used during the recording to the common average of all sensors and segmented into epochs (1166.7-ms). Finally, EEG epochs that contained more than 15% of time samples exceeding threshold (150-200 microvolts) were excluded on a sensor-by-sensor basis. This method is provided in the manuscript under Methods section, page 18 lines 510-516.
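
      The first step above (flagging consistently noisy channels and replacing them with the average of their neighbors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the plain-list data layout, and the neighbor table are hypothetical, and the threshold is fixed at 100 μV although the response gives a 100-150 μV range.

```python
def find_noisy_channels(data, threshold_uv=100.0, max_fraction=0.20):
    """Return indices of channels with too many supra-threshold samples.

    data: list of channels, each a list of amplitudes in microvolts.
    A channel is flagged when MORE than max_fraction of its samples
    exceed threshold_uv in absolute value.
    """
    noisy = []
    for ch, samples in enumerate(data):
        n_bad = sum(1 for v in samples if abs(v) > threshold_uv)
        if n_bad / len(samples) > max_fraction:
            noisy.append(ch)
    return noisy


def interpolate_channels(data, noisy, neighbors):
    """Replace each noisy channel by the sample-wise mean of its neighbors.

    neighbors: dict mapping a channel index to a list of neighbor indices
    (hypothetical input; the described pipeline uses the six nearest
    spatial neighbors). Neighbor values are read from the original data.
    """
    cleaned = [list(samples) for samples in data]
    for ch in noisy:
        nbrs = neighbors[ch]
        cleaned[ch] = [
            sum(data[n][t] for n in nbrs) / len(nbrs)
            for t in range(len(data[ch]))
        ]
    return cleaned


# Toy example: 4 channels x 5 samples; channel 1 has 2/5 samples above threshold.
toy = [
    [0, 0, 0, 0, 0],
    [200, 200, 0, 0, 0],
    [10, 10, 10, 10, 10],
    [20, 20, 20, 20, 20],
]
flagged = find_noisy_channels(toy)
repaired = interpolate_channels(toy, flagged, {1: [2, 3]})
```

      The second step (excluding epochs with more than 15% supra-threshold samples on a sensor-by-sensor basis) would apply the same fraction test to each epoch of each sensor rather than to the continuous recording.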

      Figure:

      - Supplementary Figure 8. The illustration of the WTA classifier was not referred to anywhere in the main text.

      Thanks for pointing this out. The supplementary Figure 8 should be noted as supplementary Figure 10 instead. We have now mentioned it in the manuscript, page 10, line 267.

      - Figure 5 WTA classifier needed to be clarified. It was correlation-based but used to choose the most correlated response patterns averaged across the N-1 subjects for the leave-one-out subject. The change from correlation coefficients to decoding accuracy could be clearer as I spent some time making sense of it. The correlation coefficient here evaluates how correlated the two vectors are, but the actual decoding accuracy estimated at the end is the percentage of participants who can be assigned to the "ground truth" label, so one step in between is missing. Can this be better illustrated?

      Thanks for surfacing that this is not described sufficiently clearly and for your suggestions. The spatiotemporal vector was calculated separately for each category. This is illustrated in Fig 5A. At each iteration, the LOOCV classifier computed the correlation between each of the five category vectors from the left-out participant (test data, for an unknown stimulus) and each of the mean spatiotemporal vectors across the N-1 participants (training data, labeled data). The winner-take-all (WTA) classifier assigns the test vector to the category whose training vector yields the highest correlation with it. This is illustrated in Fig 5A, with spatiotemporal patterns and correlation values from an example infant shown. For a given test pattern, a correct classification yields a score of 1 and an incorrect classification yields a score of 0. We compute the percentage correct across all categories for each left-out infant, and then the mean decoding performance across all participants in an age group (Fig 5B). We have now added these details in the Methods part, section – Decoding analyses, Group-level, page 20 lines 590-597, where we write: “At each iteration, the LOOCV classifier computed the correlation between each of the five category vectors from the left-out participant (test data, for an unknown stimulus) and each of the mean spatiotemporal vectors across the N-1 participants (training data, labeled data). The winner-take-all (WTA) classifier classifies the test vector to the category of the training vector that yields the highest correlation with the training vector (Fig 5A). For a given test pattern, correct classification yields a score of 1 and an incorrect classification yields a score of 0. For each left-out infant, we computed the percentage correct across all categories, and then the mean decoding performance across all participants in an age group (Fig 5B).”
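
      The step from correlation coefficients to decoding accuracy can be sketched in a few lines. This is an illustrative reconstruction of the procedure described above, not the authors' code: the function names and the nested-list data layout are assumptions, and the toy data use two categories and four-element vectors rather than five categories and full spatiotemporal patterns.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def loocv_wta_accuracy(patterns):
    """Leave-one-participant-out winner-take-all decoding.

    patterns[p][c] is the response vector of participant p for category c.
    Returns one score per participant: the fraction of that participant's
    category vectors assigned to the correct category.
    """
    n_participants = len(patterns)
    n_categories = len(patterns[0])
    scores = []
    for left_out in range(n_participants):
        train = [patterns[p] for p in range(n_participants) if p != left_out]
        # Training template per category: mean vector across the N-1 participants.
        templates = [
            [sum(vals) / len(vals) for vals in zip(*(t[c] for t in train))]
            for c in range(n_categories)
        ]
        correct = 0
        for c in range(n_categories):
            r = [pearson(patterns[left_out][c], tmpl) for tmpl in templates]
            predicted = max(range(n_categories), key=lambda k: r[k])
            correct += int(predicted == c)  # 1 if correct, 0 otherwise
        scores.append(correct / n_categories)
    return scores


# Toy data: 3 participants x 2 categories with clearly distinct patterns.
toy_patterns = [
    [[1, 2, 3, 4.0], [4, 3, 2, 1.0]],
    [[1, 2, 3, 4.1], [4, 3, 2, 0.9]],
    [[1, 2, 3, 4.2], [4, 3, 2, 0.8]],
]
per_participant = loocv_wta_accuracy(toy_patterns)
group_accuracy = sum(per_participant) / len(per_participant)
```

      The per-participant scores correspond to the percentage correct for each left-out infant, and their mean corresponds to the group-level decoding performance for an age group.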

      Reviewer #2 (Recommendations For The Authors):

      I only have some minor comments.

      Typo on line 90 ("Infants participants in 5 conditions, which [...]").

      Thanks for pointing this out. We have now corrected ‘participants’ to ‘participated’.

      Typo on line 330: "[...] in example 4-5-months-olds.".

      Thanks for pointing this out. We changed ‘4-5-months-olds’ to ‘4-5-month-olds’.

      Figure 2 - bar plots: rotating and spacing out values on the x-axis may improve readability. Ditto for the line plots in Figure 4.

      Thanks for the suggestions. In the revised manuscript, we have improved the readability of Figure 2.

      Caption of Figure 6: description of the distinctiveness plots may refer to panel C, instead of the bottom panels of section B.

      Thanks for pointing this out. We have now corrected this information in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Opioids and related drugs are powerful analgesics that reduce suffering from pain. Unfortunately, their use often leads to addiction and there is an opioid-abuse epidemic that affects people worldwide. This study represents an ongoing effort to develop non-opioid analgesics for pain management. The findings point to an alternative approach to control post-surgical pain in lieu of opioid medications.

      Strengths:

      (1) The study responds to the urgent need for the development of non-opioid analgesics.

      (2) The study demonstrates the efficacy of Clarix Flo (FLO) and HC-HA/PTX3 from the human amniotic membrane (AM) in reducing pain in a mouse model without the adverse effects of opioids.

      (3) The study further explored the underlying mechanisms of how HC-HA/PTX3 produces its effects on neurons, suggesting the molecules/pathways involved in pain relief.

      (4) The potential use of naturally derived biologics from human birth tissues (AM) is safe and sustainable, compared to synthetic pharmaceuticals.

      (5) The study was conducted with scientific rigor, involving purification of active components, comparative analysis with multiple controls, and mechanistic explorations.

      Weaknesses:

      (1) It should be cautioned that while the preclinical findings are promising, these results still need to be translated into clinical settings that are complex and often unpredictable.

      (2) The study shows the efficacy of FLO and HC-HA/PTX3 in one preclinical model of post-surgical pain. The observed effect may be variable in other pain conditions.

      We thank the reviewer for these good comments and support! We agree with your suggestions and have provided more information in the discussion (Pages 11-12) and conclusion to address these comments.

      Reviewer #2 (Public review):

      Summary:

      This is an outstanding piece of work on the potential of FLO as a viable analgesic biologic for the treatment of postsurgical pain. The authors purified the HC-HA/PTX3 from FLO and demonstrated its potential as an effective non-opioid therapy for postsurgical pain. They further unraveled the mechanisms of action of the compound at cellular and molecular levels.

      Strengths:

      Prominent strengths include the incorporation of behavioral assessment, electrophysiological and imaging recordings, the use of knockout and knockdown animals, and the use of antagonist agents to verify biological effects. The integrated use of these techniques, combined with the hypothesis-driven approach and logical reasoning, provides compelling evidence and novel insight into the mechanisms of the significant findings of this work.

      Weaknesses:

      I did not find any significant weaknesses even with a critical mindset. The only minor suggestion is that the Results section may focus on the results from this study and minimize the discussions of background information.

      We thank the reviewer for the support! We revised the Results section as suggested and reduced the discussion of background information.

      Reviewer #3 (Public review):

      Summary:

      Non-opioid analgesics derived from human amniotic membrane (AM) product represent a novel and unique approach to analgesia that may avoid the traditional harms associated with opioids. Here, the study investigators demonstrate that HC-HA/PTX3 is the primary bioactive component of the AM product FLO responsible for anti-nociception in mouse-model and in-vitro dorsal root ganglion (DRG) cell culture experiments. The mechanism is demonstrated to be via CD44, with an induced acute cytoskeleton rearrangement that inhibits Na+ and Ca++ current through ion channels. Taken together, the studies reported in the manuscript provide supportive evidence clarifying the mechanisms and efficacy of HC-HA/PTX3 antinociception and analgesia.

      Strengths:

      Extensive experiments including murine behavioral paw withdrawal latency and Catwalk test data demonstrating analgesic properties. The breadth and depth of experimental data are clearly supporting mechanisms and antinociceptive properties.

      Weaknesses:

      A few changes to the text of the manuscript would be recommended but no major weaknesses were identified.

      We thank the reviewer for the support! We revised the text as suggested.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The study showed an effect on baseline nociception and acute post-surgical pain. Chronic post-surgical pain is a major problem and should be considered.

      We thank the reviewer for this comment. To further improve the translational potential, we will extend the current findings in the future by employing chronic post-surgical pain models, such as skin/muscle incision and retraction (SMIR) in the thigh of the rodent (1-3), as well as chronic pain models such as neuropathic pain. We acknowledged this limitation in the discussion (Page 12).

      (2) Indicate the source of cultured DRGs.

      We added “Method 15 Culturing DRG neurons” in the revised manuscript.   

      (3) The size of DRG neurons was described in cross-sectional area (Figure 2 caption) and diameter (method). Be consistent.

      We thank the reviewer for this comment. Cross-sectional area has often been used to describe the size of DRG neurons in in vivo calcium imaging studies, including our previous work (4, 5). To remain consistent and keep data comparable between studies, we also used cross-sectional area for the in vivo calcium imaging experiment in the current study (Fig 2). On the other hand, cell diameter has been widely used for in vitro experiments such as electrophysiological recording and immunofluorescence staining of cultured DRG neurons. To be consistent with this tradition, we used cell diameter in these experiments. Methods for measuring the area and diameter are explicitly described for each experimental setting and are consistent between the current study and our previous studies (6). Our previously published studies are also cited in the Methods section (Method “4 In vivo calcium imaging in mice” and “10.2 Intrinsic excitability studies of DRG neurons”).

      (4) Clarify what "% of total" means in Figure 2. For bar graphs in 2B-D, the percent of total activated neurons (small, medium, and large) does not add up to 100.

      “% of total” represents the proportion of activated neurons relative to the total number of neurons counted in the same analyzed image. This information was added to the legend of Figure 2 (B-C) and to Method “4 In vivo calcium imaging in mice” in the revised manuscript. At the end of each experiment, we can overexpose the image to reveal all neuronal profiles and count the total number of neurons in that field/image. Only a small portion of neurons in each size category responded to the test stimulation, and hence the percentages do not add up to 100.
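
      A small worked example may make the arithmetic concrete. The counts below are hypothetical and chosen only for illustration, and the denominator is assumed to be the total number of neurons counted in the image, as stated above.

```python
def percent_of_total(activated_by_size, total_neurons):
    """Percentage of activated neurons per size class, relative to ALL
    neurons counted in the same image (activated or not)."""
    return {
        size: 100 * count / total_neurons
        for size, count in activated_by_size.items()
    }


# Hypothetical counts for one image: 200 neurons in total, 20 of them activated.
activated = {"small": 12, "medium": 5, "large": 3}
percentages = percent_of_total(activated, total_neurons=200)
summed = sum(percentages.values())  # well below 100: most neurons did not respond
```

      Because the denominator includes non-responding neurons, the bars sum to the overall fraction of activated neurons rather than to 100.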

      (5) Discuss clinical data or human studies to validate the efficacy and safety of FLO or HC-HA/PTX3 in patients.

      Thanks for the great suggestion. We provided a brief discussion (Pages 11-12).

      Cryopreserved AM/UC has been clinically validated through several hundred peer-reviewed publications since 1995, including 12 studies specifically assessing FLO (Clarix Flo). These studies collectively support the safety and preliminary effectiveness of Clarix Flo in managing some clinical pain conditions such as knee osteoarthritis (7, 8), discogenic pain (9), rotator cuff tears (10), and painful neuropathy of the lower extremities (11). Currently, HC-HA/PTX3 is limited to pre-clinical research, and to our knowledge, there are no available data on its clinical efficacy and safety.

      (6) Introduction, last sentence of the second paragraph, delete "also".

      Thanks for carefully examining our manuscript. It was revised as suggested.

      Reviewer #2 (Recommendations for the authors):

      My only recommendation for improving the writing and presentation is to shorten the discussion of background information in Results.

      We thank the reviewer for the support and comments! We originally intended to provide some background information to help readers understand the premise and rationale of the study before presenting our findings. Nevertheless, we reduced the background information in the Results section as suggested by the reviewer to make it more straightforward.

      Reviewer #3 (Recommendations for the authors):

      P4 last sentence - "Our findings highlight the potential of a naturally derived biologic from human birth tissues as an effective non-opioid treatment for post-surgical pain and unravel the underlying mechanisms." - another sentence clause is required before "unravel".

      As advised, we revised the sentence to: “Collectively, our findings highlight the potential of naturally derived biologics from human birth tissues as an effective non-opioid treatment for post-surgical pain. Moreover, we unravel the underlying mechanisms of pain inhibition induced by FLO and HC-HA/PTX3.”

      P7 second paragraph - please edit the following sentence for clarity: "Since HC-HA/PTX3 mimics FLO in producing pain inhibition, and it has high purity and is more water-soluble than FLO, making it suitable for probing cellular mechanisms.".

      As advised, we have revised the sentence to: “Since HC-HA/PTX3 mimics FLO in its ability to inhibit pain and has higher purity and greater water solubility compared to FLO, it is well-suited for investigating cellular mechanisms.”

      References:

      (1) Flatters SJ. Characterization of a model of persistent postoperative pain evoked by skin/muscle incision and retraction (SMIR). Pain. 2008;135(1-2):119-30.

      (2) Ying YL, Wei XH, Xu XB, She SZ, Zhou LJ, Lv J, et al. Over-expression of P2X7 receptors in spinal glial cells contributes to the development of chronic postsurgical pain induced by skin/muscle incision and retraction (SMIR) in rats. Experimental neurology. 2014;261:836-43.

      (3) Cao S, Bian Z, Zhu X, and Shen SR. Effect of Epac1 on pERK and VEGF Activation in Postoperative Persistent Pain in Rats. Journal of molecular neuroscience : MN. 2016;59(4):554-64.

      (4) Chen Z, Huang Q, Song X, Ford NC, Zhang C, Xu Q, et al. Purinergic signaling between neurons and satellite glial cells of mouse dorsal root ganglia modulates neuronal excitability in vivo. Pain. 2022;163(8):1636-47.

      (5) Chen Z, Zhang C, Song X, Cui X, Liu J, Ford NC, et al. BzATP Activates Satellite Glial Cells and Increases the Excitability of Dorsal Root Ganglia Neurons In Vivo. Cells. 2022;11(15).

      (6) Ford NC, Barpujari A, He SQ, Huang Q, Zhang C, Dong X, et al. Role of primary sensory neurone cannabinoid type-1 receptors in pain and the analgesic effects of the peripherally acting agonist CB-13 in mice. Br J Anaesth. 2022;128(1):159-73.

      (7) Castellanos R, and Tighe S. Injectable Amniotic Membrane/Umbilical Cord Particulate for Knee Osteoarthritis: A Prospective, Single-Center Pilot Study. Pain Med. 2019;20(11):2283-91.

      (8) Mead OG, and Mead LP. Intra-Articular Injection of Amniotic Membrane and Umbilical Cord Particulate for the Management of Moderate to Severe Knee Osteoarthritis. Orthop Res Rev. 2020;12:161-70.

      (9) Buck D. Amniotic Umbilical Cord Particulate for Discogenic Pain. J Am Osteopath Assoc. 2019;119(12):814-9.

      (10) Ackley JF, Kolosky M, Gurin D, Hampton R, Masin R, and Krahe D. Cryopreserved amniotic membrane and umbilical cord particulate matrix for partial rotator cuff tears: A case series. Medicine (Baltimore). 2019;98(30):e16569.

      (11) Buksh AB. Ultrasound-guided injections of amniotic membrane/umbilical cord particulate for painful neuropathy of the lower extremity. Cogent Medicine. 2020;7(1):1724067.

    1. Author response:

      eLife Assessment

      “The work presented is important for our understanding of the development of the cardiac conduction system and its regulation by T-box transcription factors. The conclusions are supported by convincing data. Overall, this is an excellent study that advances our understanding of cardiac biology and has implications beyond the immediate field of study.”

      We appreciate the positive assessment of this work and the recognition of its importance in advancing our understanding of the cardiac conduction system, its regulation by T-box transcription factors, and contribution beyond the immediate field.

      Reviewer #1 (Public review):

      Summary:

      In a heroic effort, Ozanna Burnicka-Turek et al. have made and investigated conduction system-specific Tbx3-Tbx5 deficient mice and investigated their cardiac phenotype. Perhaps according to expectations, given the body of literature on the function of the two T-box transcription factors in the heart/conduction system, the cardiomyocytes of the ventricular conduction system seemed to convert to "ordinary" ventricular working myocytes. As a consequence, loss of VCS-specific conduction system propagation was observed in the compound KO mice, associated with PR and QRS prolongation and elevated susceptibility to ventricular tachycardia.

      Strengths:

      Great genetic model. Phenotypic consequences at the organ and organismal levels are well investigated. The requirement of both Tbx3 and Tbx5 for maintaining VCS cell state has been demonstrated.

      We thank Reviewer #1 for acknowledging the effort involved in generating and characterizing the Tbx3/Tbx5 double conditional knockout mouse model and for highlighting the significance of this work in elucidating the role of these transcription factors in maintaining the functional and transcriptional identity of the ventricular conduction system.

      Weaknesses:

      The actual cell state of the Tbx3/Tbx5 deficient conducting cells was not investigated in detail, and therefore, these cells could well only partially convert to working cardiomyocytes, and may, in reality, acquire a unique state.

      We agree with Reviewer #1 that the Tbx3/Tbx5 double mutant ventricular conduction myocardial cells may only partially convert to working cardiomyocytes or may acquire a unique state.  The transcriptional state of the double mutant VCS cells was investigated by bulk profiling of key genes associated with specific conduction and non-conduction cardiac regions, including fast conduction, slow conduction, or working myocardium. Neither the bulk transcriptional approaches nor the optical mapping approaches we employed capture single-cell data; in both cases, the data represents aggregated signals from multiple cells (1, 2). Single cell approaches for transcriptional profiling and cellular electrophysiology would clarify this concern and are appropriate for future studies.

      (1) O’Shea C, Nashitha Kabri S, Holmes AP, Lei M, Fabritz L, Rajpoot K, Pavlovic D (2020) Cardiac optical mapping – State-of-the-art and future challenges. The International Journal of Biochemistry & Cell Biology 126:105804. doi: 10.1016/j.biocel.2020.105804.

      (2) Efimov IR, Nikolski VP, and Salama G (2004) Optical Imaging of the Heart. Circulation Research 95:21-33. doi: 10.1161/01.RES.0000130529.18016.35.

      Reviewer #2 (Public review):

      Summary:

      The goal of this work is to define the functions of T-box transcription factors Tbx3 and Tbx5 in the adult mouse ventricular cardiac conduction system (VCS) using a novel conditional mouse allele in which both genes are targeted in cis. A series of studies over the past 2 decades by this group and others have shown that Tbx3 is a transcriptional repressor that patterns the conduction system by repressing genes associated with working myocardium, while Tbx5 is a potent transcriptional activator of "fast" conduction system genes in the VCS. In a previous work, the authors of the present study further demonstrated that Tbx3 and Tbx5 exhibit an epistatic relationship whereby the relief of Tbx3-mediated repression through VCS conditional haploinsufficiency allows better toleration of Tbx5 VCS haploinsufficiency. Conversely, excess Tbx3-mediated repression through overexpression results in disruption of the fast-conduction gene network despite normal levels of Tbx5. Based on these data the authors proposed a model in which repressive functions of Tbx3 drive the adoption of conduction system fate, followed by segregation into a fast-conducting VCS and slow-conduction AVN through modulation of the Tbx5/Tbx3 ratio in these respective tissue compartments.

      The question motivating the present work is: If Tbx5/Tbx3 ratio is important for slow versus fast VCS identity, what happens when both genes are completely deleted from the VCS? Is conduction system identity completely lost without both factors and if so, does the VCS network transform into a working myocardium-like state? To address this question, the authors have generated a novel mouse line in which both Tbx5 and Tbx3 are floxed on the same allele, allowing complete conditional deletion of both factors using the VCS-specific MinK-CreERT2 line, convincingly validated in previous work. The goal is to use these double conditional knockout mice to further explore the model of Tbx3/Tbx5 co-dependent gene networks and VCS patterning. First, the authors demonstrate that the double conditional knockout allele results in the expected loss of Tbx3 and Tbx5 specifically in the VCS when crossed with MinK-CreERT2 and induced with tamoxifen. The double conditional knockout also results in premature mortality. Detailed electrophysiological phenotyping demonstrated prolonged PR and QRS intervals, inducible ventricular tachycardia, and evidence of abnormal impulse propagation along the septal aspect of the right ventricle. In addition, the mutants exhibit downregulation of VCS genes responsible for both fast conduction AND slow conduction phenotypes with upregulation of 2 working myocardial genes including connexin-43. The authors conclude that loss of both Tbx3 and Tbx5 results in "reversion" or "transformation" of the VCS network to a working myocardial phenotype, which they further claim is a prediction of their model and establishes that Tbx3 and Tbx5 "coordinate" transcriptional control of VCS identity.

      We appreciate Reviewer #2’s detailed summary of the study’s aims, methodologies, and findings, as well as their thoughtful suggestions for further analysis. We are grateful for their recognition of our genetic model’s novelty and robustness.

      Overall Appraisal:

      As noted above, the present study does not further explore the Tbx5/Tbx3 ratio concept since both genes are completely knocked out in the VCS. Instead, the main claims are that the absence of both factors results in a transcriptional shift of conduction tissue towards a working myocardial phenotype, and that this shift indicates that Tbx5 and Tbx3 "coordinate" to control VCS identity and function.

      We agree with this reviewer’s assessment of the assertions in our manuscript.  The novel combined Tbx5/Tbx3 double mutant model does not further explore the TBX5/TBX3 ratio concept, which we previously examined in detail (1). Instead, as the Reviewer notes, this manuscript focuses on testing a model that the coordinated activity of Tbx3 and Tbx5 defines specialized ventricular conduction identity.

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      Strengths:

      (1) Successful generation of a novel Tbx3-Tbx5 double conditional mouse model.

      (2) Successful VCS-specific deletion of Tbx3 and Tbx5 using a VCS-specific inducible Cre driver line.

      (3) Well-powered and convincing assessments of mortality and physiological phenotypes.

      (4) Isolation of genetically modified VCS cells using flow.

      We thank Reviewer #2 for acknowledging the listed strengths of our study.

      Weaknesses:

      (1) In general, the data is consistent with a long-standing and well-supported model in which Tbx3 represses working myocardial genes and Tbx5 activates the expression of VCS genes, which seem like distinct roles in VCS patterning. However, the authors move between different descriptions of the functional relationship and epistatic relationship between these factors, including terms like "cooperative", "coordinated", and "distinct" at various points. In a similar vein, sometimes terms like "reversion" are used to describe how VCS cells change after Tbx3/Tbx5 conditional knockout, and other times "transcriptional shift" and at other times "reprogramming". But these are all different concepts. The lack of a clear and consistent terminology for describing the phenomena observed makes the overarching claims of the manuscript more difficult to evaluate.

      We distinguish prior work, which established the “long-standing and well-supported model” by investigating the roles of Tbx5 and Tbx3 independently, from the present work, which examines their coordinated role. Prior work demonstrated that Tbx3 represses working myocardial genes and Tbx5 activates expression of VCS genes, consistent with the reviewer’s suggestion of their distinct roles in VCS patterning. However, the current study is the first to evaluate the combined role of Tbx3 and Tbx5 in distinguishing specialized conduction identity from working myocardium.

      We appreciate Reviewer #2’s feedback regarding the need for consistent terminology when describing the impact of the double Tbx3 and Tbx5 mutant. We will edit the manuscript to replace terms like “reversion” with “transcriptional shift” or “transformation” when describing the observed phenotype, and we will use “coordination” to describe the combined role of Tbx5 and Tbx3 in maintaining VCS-specific identity.

      (2) A more direct quantitative comparison of Tbx5 Adult VCS KO with Tbx5/Tbx3 Adult VCS double KO would be helpful to ascertain whether deletion of Tbx3 on top of Tbx5 deletion changes the underlying phenotype in some discernable way beyond mRNA expression of a few genes. Superficially, the phenotypes look quite similar at the EKG and arrhythmia inducibility level and no optical mapping data from a single Tbx5 KO is presented for comparison to the double KO.

      We thank Reviewer #2 for the suggestion that a direct comparison between the Tbx5 single conditional knockout and Tbx3/Tbx5 double conditional knockout models may help isolate the specific contribution of Tbx3 deletion in addition to Tbx5 deletion.

      Previous studies have assessed the effect of single Tbx5 CKO in the VCS of murine hearts (1, 3, 5). Arnolds et al. demonstrated that the removal of Tbx5 from the adult ventricular conduction system results in VCS slowing, including prolonged PR and QRS intervals, prolongation of the His duration and His-ventricular (HV) interval (3). Furthermore, Burnicka-Turek et al. demonstrated that the single conditional knockout of Tbx5 in the adult VCS caused a shift toward a pacemaker cell state, with ectopic beats and inappropriate automaticity (1). Whole-cell patch clamping of VCS-specific Tbx5-deficient cells revealed action potentials characterized by a slower upstroke (phase 0), prolonged plateau (phase 2), delayed repolarization (phase 3), and enhanced phase 4 depolarization - features characteristic of nodal action potentials rather than typical VCS action potentials (3). These observations were interpreted as uncovering nodal potential of the VCS in the absence of Tbx5. Based on the role of Tbx3 in CCS specification (2), we hypothesized that the nodal state of the VCS uncovered in the absence of Tbx5 was enabled by maintained Tbx3 expression. This motivated us to generate the double Tbx5 / Tbx3 knockout model to examine the state of the VCS in the absence of both T-box TFs.

      In the current study, we demonstrate that the VCS-specific deletion of Tbx3 and Tbx5 results in the loss of fast electrical impulse propagation in the VCS, similar to that observed in the single Tbx5 mutant. However, unlike the Tbx5 single mutant, the Tbx3/Tbx5 double deletion does not cause a gain of pacemaker cell state in the VCS. Instead, the physiological data suggest a transition toward non-conduction working myocardial physiology. This conclusion is supported by the presence of only a single upstroke in the optical action potential (OAP) recorded from the His bundle region and VCS cells in Tbx3/Tbx5 double conditional knockout mice. The electrical properties of VCS cells in the double knockout are functionally indistinguishable from those of ventricular working myocardial cells. As a result, ventricular impulse propagation is significantly slowed, resembling activation through exogenous pacing rather than the rapid conduction typically associated with the VCS. We will edit the text of the manuscript to more carefully distinguish the observations between these models, as suggested.

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      (2) Mohan RA, Bosada FM, van Weerd JH, van Duijvenboden K, Wang J, Mommersteeg MTM, Hooijkaas IB, Wakker V, de Gier-de Vries C, Coronel R, Boink GJJ, Bakkers J, Barnett P, Boukens BJ, Christoffels VM (2020) T-box transcription factor 3 governs a transcriptional program for the function of the mouse atrioventricular conduction system. Proc Natl Acad Sci U S A. 117:18617-18626. doi: 10.1073/pnas.1919379117.

      (3) Arnolds DE, Liu F, Fahrenbach JP, Kim GH, Schillinger KJ, Smemo S, McNally EM, Nobrega MA, Patel VV, Moskowitz IP (2012) TBX5 drives Scn5a expression to regulate cardiac conduction system function. The Journal of Clinical Investigation 122:2509–2518. doi: 10.1172/JCI62617.

      (4) Frank DU, Carter KL, Thomas KR, Burr RM, Bakker ML, Coetzee WA, Tristani-Firouzi M, Bamshad MJ, Christoffels VM, Moon AM (2012) Lethal arrhythmias in Tbx3-deficient mice reveal extreme dosage sensitivity of cardiac conduction system function and homeostasis. Proc Natl Acad Sci U S A. 109:E154-63. doi: 10.1073/pnas.1115165109.

      (5) Moskowitz IP, Pizard A, Patel VV, Bruneau BG, Kim JB, Kupershmidt S, Roden D, Berul CI, Seidman CE, Seidman JG (2004) The T-Box transcription factor Tbx5 is required for the patterning and maturation of the murine cardiac conduction system. Development 131:4107-4116. doi: 10.1242/dev.01265. PMID: 15289437.

      (3) The authors claim that double knockout VCS cells transform to working myocardial fate, but there is no comparison of gene expression levels between actual working myocardial cells and the Tbx3/Tbx5 DKO VCS cells so it's hard to know if the data reflect an actual cell state change or a more non-specific phenomenon with global dysregulation of gene expression or perhaps dedifferentiation. I understand that the upregulation of Gja1 and Smpx is intended to address this, but it's only two genes and it seems relevant to understand their degree of expression relative to actual working myocardium. In addition, the gene panel is somewhat limited and does not include other key transcriptional regulators in the VCS such as Irx3 and Nkx2-5. RNA-seq in these populations would provide a clearer comparison among the groups.

      And

      the main claims are that the absence of both factors results in a transcriptional shift of conduction tissue towards a working myocardial phenotype, and that this shift indicates that Tbx5 and Tbx3 "coordinate" to control VCS identity and function. However, only limited data are presented to support the claim of transcriptional reprogramming since the knockout cells are not directly compared to working myocardial cells at the transcriptional level and only a small number of key genes are assessed (versus genome-wide assessment).

      We appreciate Reviewer #2’s suggestion to expand the gene expression analysis in Tbx3/Tbx5-deficient VCS cells by including other specific genes and comparisons with “native”/actual working ventricular myocardial cells and broadening the gene panel. In this study, we evaluated core cardiac conduction system markers, revealing a loss of conduction system-specific gene expression in the double mutant VCS. Furthermore, we evaluated key working myocardial markers normally excluded from the conduction system, Gja1 and Smpx, revealing a shift towards a working myocardial state in the double mutant VCS (Figure 4). We agree that a more comprehensive analysis, such as transcriptome-wide approaches, would offer greater clarity on the extent and specificity of the observed shift from conduction to non-conduction identity. These approaches are appropriate directions for future studies.

      (4) From the optical mapping data, it is difficult to distinguish between the presence of (a) a focal proximal right bundle branch block due to dysregulation of gene expression in the VCS but overall preservation of the right bundle and its distal ramifications; from (b) actual loss of the VCS with reversion of VCS cells to a working myocardial fate. Related to this, the authors claim that this experiment allows for direct visualization of His bundle activation, but can the authors confirm or provide evidence that the tissue penetration of their imaging modality allows for imaging of a deep structure like the AV bundle as opposed to the right bundle branch which is more superficial? Does the timing of the separation of the sharp deflection from the subsequent local activation suggest visualization of more distal components of the VCS rather than the AV bundle itself? Additional clarification would be helpful.

      And

      In addition, the optical mapping dataset is incomplete and has alternative interpretations that are not excluded or thoroughly discussed.

      We agree with Reviewer #2 that the resolution of the optical mapping experiment may be insufficient to precisely localize the conduction block due to the limited signal strength from the VCS. It is possible that the region defined as the His Bundle also includes portions of the right bundle branch. Our control mice show VCS OAP upstrokes consistent with those reported by Tamaddon et al. (2000) using Di-4-ANEPPS (1). We appreciate the Reviewer’s attention to alternative interpretations, and we will incorporate these caveats into the manuscript text.

      (1) Tamaddon HS, Vaidya D, Simon AM, Paul DL, Jalife J, Morley GE (2000) High-resolution optical mapping of the right bundle branch in connexin40 knockout mice reveals slow conduction in the specialized conduction system. Circulation Research 87:929-36. doi: 10.1161/01.res.87.10.929. 

      Impact:

      The present study contributes a novel and elegantly constructed mouse model to the field. The data presented generally corroborate existing models of transcriptional regulation in the VCS but do not, as presented, constitute a decisive advance.

      And

      In sum, while this study adds an elegantly constructed genetic model to the field, the data presented fit well within the existing paradigm of established functions of Tbx3 and Tbx5 in the VCS and in that sense do not decisively advance the field. Moreover, the authors' claims about the implications of the data are not always strongly supported by the data presented and do not fully explore alternative possibilities.

      We appreciate Reviewer #2’s acknowledgment of the elegance and novelty of the mouse model we generated. However, we respectfully disagree with their assessment that this work merely corroborates existing models without providing a decisive advance. Previous studies have investigated single Tbx5 or Tbx3 gene knockouts in depth and established the T-box ratio model for distinguishing fast VCS from slow nodal conduction identity (1) that the reviewer alludes to in earlier comments. In contrast, this study aimed to explore a different model, that the combined effects of Tbx5 and Tbx3 distinguish adult VCS identity from non-conduction working myocardium. The coordinated role of Tbx3 and Tbx5 in conduction system identity remained untested due to the lack of a mouse model that allowed their simultaneous removal. The very model the reviewer recognizes as “novel and elegantly constructed” has allowed the examination of the coordinated role of Tbx5 and Tbx3 for the first time. While we acknowledge the opportunity for additional depth of investigation of this model in future studies, the data we present provide consistent experimental support for the coordinated requirement of both Tbx5 and Tbx3 for ventricular cardiac conduction system identity.

      (1) Burnicka-Turek O, Broman MT, Steimle JD, Boukens BJ, Petrenko NB, Ikegami K, Nadadur RD, Qiao Y, Arnolds DE, Yang XH, Patel VV, Nobrega MA, Efimov IR, Moskowitz IP (2020) Transcriptional Patterning of the Ventricular Cardiac Conduction System. Circulation Research 127:e94-e106. doi:10.1161/CIRCRESAHA.118.314460. 

      Reviewer #3 (Public review):

      Summary:

      In the study presented by Burnicka-Turek et al., the authors generated for the first time a mouse model to cause the combined conditional deletion of Tbx3 and Tbx5 genes. This has been impossible to achieve to date due to the proximity of these genes on chromosome 5, preventing the generation of loss of function strategies to delete simultaneously both genes. It is known that both Tbx3 and Tbx5 are required for the development of the cardiac conduction system by transcription factor-specific but also overlapping roles as seen in the common and diverse cardiac defects found in patients with mutations for these genes. After validating the deletion efficiency and specificity of the line, the authors characterized the cardiac phenotype associated with the cardiac conduction system (CCS)-specific combined deletion of Tbx5 and Tbx3 in the adult by inducing the activation of the CCS-specific tamoxifen-inducible Cre recombination (MinK-creERT) at 6 weeks after birth. Their analysis of 8-9-week-old animals did not identify any major morphological cardiac defects. However, the authors found conduction defects including prolonged PR and QRS intervals and ventricular tachycardia causing the death of the double mutants, which do not survive more than 3 months after tamoxifen induction. Molecular and optical mapping analysis of the ventricular conduction system (VCS) of these mutants concluded that, in the absence of Tbx5 and Tbx3 function, the cells forming the ventricular conduction system (VCS) become working myocardium and lose the specific contractile features characterizing VCS cells. Altogether, the study identified the critical combined role of Tbx3 and Tbx5 in the maintenance of the VCS in adulthood.

      Strengths:

      The study generated a new animal model to study the combined deletion of Tbx5 and Tbx3 in the cardiac conduction system. This unique model has provided the authors with the perfect tool to answer their biological questions. The study includes top-class methodologies to assess the functional defects present in the different mutants analyzed, and gathered very robust functional data on the conduction defects present in these mutants. They also applied optical action potential (OAP) methods to demonstrate the loss of conduction action potential and the acquisition of working myocardium action potentials in the affected cells because of Tbx5/Tbx3 loss of function. The study used simpler molecular and morphological analysis to demonstrate that there are no major morphological defects in these mutants and that indeed, the conduction defects found are due to the acquisition of working myocardium features by the VCS cells. Altogether, this study identified the critical role of these transcription factors in the maintenance of the VCS in the adult heart.

      We appreciate the Reviewer’s comments regarding the originality and utility of our model and the strengths of our methodological approach. The Reviewer’s appreciation of the molecular and morphological analyses as well as their constructive feedback is highly valuable.

      Weaknesses:

      In the opinion of this reviewer, the weakness in the study lies in the morphological and molecular characterization. The morphological analysis simply described the absence of general cardiac defects in the adult heart, however, whether the CCS tissues are present or not was not investigated. Lineage tracing analysis using the reporter lines included in the crosses described in the study will determine if there are changes in CCS tissue composition in the different mutants studied. Similarly, combining this reporter analysis with the molecular markers found to be dysregulated by qPCR and western blot, will demonstrate that indeed the cells that were specified as VCS in the adult heart, become working myocardium in the absence of Tbx3 and Tbx5 function.

      We appreciate the reviewer’s concern regarding the morphology of the cardiac conduction system in the Tbx3/Tbx5 double conditional knockout model. We did not observe any structural abnormalities, as the Reviewer notes. We agree with their suggestion for using Genetic Inducible Fate Mapping to mark cardiac conduction cells expressing MinKCre. In fact, we utilized this approach to isolate VCS cells for transcriptional profiling. Specifically, we combined the tamoxifen-inducible MinKCreERT allele with the Cre-dependent R26Eyfp reporter allele to label MinKCre-expressing cells in both control VCS and VCS-specific double Tbx3/Tbx5 knockouts. EYFP-positive cells were isolated for transcriptional studies, ensuring that our analysis exclusively targeted conduction system-lineage marked cells. The ability to isolate MinKCre-marked cells from both controls and Tbx5/Tbx3 double mutants indicates that VCS cells persisted in the double knockout. Nonetheless, the suggestion for in-vivo marking by Genetic Inducible Fate Mapping and morphologic analysis is a valuable recommendation for future studies.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Mutations in CDHR1, the human gene encoding an atypical cadherin-related protein expressed in photoreceptors, are thought to cause cone-rod dystrophy (CRD). However, the pathogenesis leading to this disease is unknown. Previous work has led to the hypothesis that CDHR1 is part of a cadherin-based junction that facilitates the development of new membranous discs at the base of the photoreceptor outer segments, without which photoreceptors malfunction and ultimately degenerate. CDHR1 is hypothesized to bind to a transmembrane partner to accomplish this function, but the putative partner protein has yet to be identified.

      The manuscript by Patel et al. makes an important contribution toward improving our understanding of the cellular and molecular basis of CDHR1-associated CRD. Using gene editing, they generate a loss of function mutation in the zebrafish cdhr1a gene, an ortholog of human CDHR1, and show that this novel mutant model has a retinal dystrophy phenotype, specifically related to defective growth and organization of photoreceptor outer segments (OS) and calyceal processes (CP). This phenotype seems to be progressive with age. Importantly, Patel et al, present intriguing evidence that pcdh15b, also known for causing retinal dystrophy in previous Xenopus and zebrafish loss of function studies, is the putative cdhr1a partner protein mediating the function of the junctional complex that regulates photoreceptor OS growth and stability.

      This research is significant in that it:

      (1) provides evidence for a progressive, dystrophic photoreceptor phenotype in the cdhr1a mutant and, therefore, effectively models human CRD; and

      (2) identifies pcdh15b as the putative, and long sought after, binding partner for cdhr1a, further supporting the theory of a cadherin-based junction complex that facilitates OS disc biogenesis.

      Nonetheless, the study has several shortcomings in methodology, analysis, and conceptual insight, which limits its overall impact.

      Below I outline several issues that the authors should address to strengthen their findings.

      Major comments:

      (1) Co-localization of cdhr1a and pcdh15b proteins

      The model proposed by the authors is that the interaction of cdhr1a and pcdh15b occurs in trans as a heterodimer. In cochlear hair cells, PCDH15 and CDH23 are proposed to interact first as dimers in cis and then as heteromeric complexes in trans. This was not shown here for cdhr1a and pcdh15b, but it is a plausible configuration, as are single heteromeric dimers or homodimers. Regardless, this model depends on the differential compartmental expression of the cdhr1a and pcdh15b proteins. Data in Figure 1 show convincing evidence that these two proteins can, at least in some cases, be distributed along the length of photoreceptor membranes that are juxtaposed, as would be the case for OS and CP. If pcdh15b is predominantly expressed in CPs, whereas cdhr1a is predominantly expressed in OS, then this should be confirmed with actin double labeling with cdhr1a and pcdh15b since the apicobasal oriented (vertical) CPs would express actin in this same orientation but not in the OS. This would help to clarify whether cdhr1a and pcdh15b can be trafficked to both OS and CP compartments or whether they are mutually exclusive.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we are undertaking imaging of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections. Additionally, we have recently established an immuno-gold-TEM protocol and are going to provide data showcasing co-labeling of cdhr1a and pcdh15b at TEM resolution.

      Photoreceptor heterogeneity goes beyond the cone versus rod subtypes discussed here and it is known that in zebrafish, CP morphology is distinct in different cone subtypes as well as cone versus rod. It would be important to know which specific photoreceptor subtypes are shown in zebrafish (Figures 1A-C) and the non-fish species depicted in Figures 1E-L. Also, a larger field of view of the staining patterns for Figures 1E-L would be a helpful comparison (could be added as a supplementary figure).

      The revised manuscript will include clear labeling of the different cone cell types as well as lower magnification images to be included as supplemental figures.

      (2) Cdhr1a function in cell culture

      The authors should explain the multiple bands in the anti-FLAG blots. Also, it would be interesting to confirm that the cdhr1a D173 mutant prevents the IP interaction with pcdh15b as well as the additive effects in aggregate assays of Figure 2.

      We believe that the D173 mutation results in no cdhr1a polypeptide, based on the lack of in situ signal in our WISH studies (figures showing absence of cdhr1a mRNA will be provided in a new supplemental figure). However, we will clone the D173 mutant and attempt co-IP with pcdh15b in our cell culture system, as well as the aggregation assay using K562 cells.

      Is it possible that the cultured cells undergo proliferation in the aggregation assays shown in Figure 2? Cells might differentially proliferate as clusters form in rotating cultures. A simple assay for cell proliferation under the different transfection conditions showing no differences would address this issue and lend further support to the proposed specific changes to cell adhesion as a readout of this assay.

      This is a possibility. However, we did not use rotating cultures; the assay was performed in monolayer culture. We did not observe any differences in total cell number between the different transfections. As such, we do not feel that proliferation explains the aggregation of K562 cells.

      Also, the authors report that the number of clusters was normalized to the field of view, but this was not defined. Were the n values different fields of view from one transfection experiment, or were they different fields of view from separate transfection experiments? More details and clarification are needed.

      This will be clarified in the revised manuscript. In short, we replicated this experiment 3 times, quantifying 5 different fields of view in each replicate.

      (3) Methodological issues in quantification and statistical analyses

      Were all the OS and CP lengths counted in the observation region or just a sample within the region? If the latter, what were the sampling criteria? For CPs, it seems that the length was an average estimate based on all CPs observed surrounding one cone or one rod cell. Is this correct? Again, if sampled, how was this implemented? In Fig 4M', the cdhr1a-/- ROS mostly looks curvilinear. Did the measurements account for this, or were they straight linear dimension measurements from base to tip of the OS as depicted in Fig 5A-E? A clearer explanation of the OS and CP length quantification methodology is required.

      The revised manuscript will clearly outline measurement methods. In short, we measured every CP/OS in the imaged regions. We did not average CPs per cell; we simply included all CP measurements in our analysis. All CP measurements (actin, cdhr1a, or pcdh15) were made in the presence of a counterstain (WGA, prph2, gnb1, or PNA) to provide a landmark for measurement and to confirm association with the correct cell type.

      All measurements were taken as best as possible to reflect a straight linear dimension for consistency.

      How were cone and rod photoreceptor cell counts performed? The legend in Figure 4 states that they again counted cells in the observation region, but no details were provided. For example, were cones and rods counted as an absolute number of cells in the observation region (e.g., number of cones per defined area) or relative to total (DAPI+) cell nuclei in the region? Changes in cell density in the mutant (smaller eye or thinner ONL) might affect this quantification so it would be important to know how cell quantification was normalized.

      The revised manuscript will clearly outline measurement methods. In short, rod and cone cell counts were based on the number of outer segments that were observed in the imaging region and previously measured for length. We did not observe any eye size differences in our mutant fish.

      In Figure 6I, K, measuring the length of the signal seems problematic. The dimension of staining is not always in the apicobasal (vertical) orientation. It might be more accurate to measure the cdhr1a expression domain relative to the OS (since the length of the OS is already reduced in the mutants). Another possible approach could be to measure the intensity of cdhr1a staining relative to the intensity within a Prph2 expression domain in each group. The authors should provide complementary evidence to support their conclusion.

      The revised manuscript will clearly outline measurement methods. In short, all CP measurements (actin, cdhr1a, or pcdh15) were made in the presence of a counterstain (WGA, prph2, gnb1, or PNA) to ensure accurate measurement and association with the correct cell type.

      A better description of the statistical methodology is required. For example, the authors state that "each of the data points has an n of 5+ individuals." This is confusing and could indicate that in Figure 4F alone there were ~5000 individuals assayed (~100 data points per treatment group x n=5 individuals per data point x 10 treatment groups). I don't think that is what the authors intended. It would be clearer if the authors stated how many OS, CP, or cells were counted in their observation region averaged per individual, and then provided the n value of individuals used per treatment group (controls and mutants), on which the statistical analyses should be based.

      This will be addressed in the revised manuscript. In short, we analyzed n = 5 individual fish for each genotype and time point. We will also include the numbers of OS/CP quantified in the observation regions.

      There are hundreds of data points in the separate treatment groups shown in several of the graphs. It would not be correct to perform the ANOVA on the separate OS or CP length measurements alone as this will bias the estimates since they are not all independent samples. For example, in Figure 6H, 5dpf pcdh15b+/- have shorter CPs compared to WT but pcdh15b-/- have longer compared to WT. This could be an artifact of the analysis. Moreover, the authors should clarify in the Methods section which ANOVA post hoc tests were used to control for multiple pairwise comparisons.

      This will be clarified in the revised manuscript.
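
      The reviewer's concern about non-independent measurements can be illustrated with a minimal sketch. It assumes hypothetical length measurements grouped by fish; the values, genotype labels, and function names below are illustrative only and are not data or code from the study. Each fish is first collapsed to a single mean, so that the individual rather than the outer segment is the unit of analysis, and a one-way ANOVA F statistic is then computed on those per-fish means.

```python
from statistics import mean


def per_individual_means(measurements):
    """Collapse {genotype: {fish_id: [lengths]}} to {genotype: [one mean per fish]}."""
    return {
        genotype: [mean(lengths) for lengths in fish.values()]
        for genotype, fish in measurements.items()
    }


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements (arbitrary units), three fish per genotype.
example = {
    "WT": {"fish1": [10.0, 11.0, 12.0], "fish2": [9.0, 10.0], "fish3": [11.0, 12.0, 13.0]},
    "mutant": {"fish4": [6.0, 7.0], "fish5": [7.0, 8.0, 9.0], "fish6": [5.0, 6.0, 7.0]},
}

fish_means = per_individual_means(example)
f_stat, df_b, df_w = one_way_anova_f(list(fish_means.values()))
print(fish_means, f_stat, df_b, df_w)
```

      In this sketch the within-group degrees of freedom reflect the number of fish (6 fish, 2 genotypes) rather than the 16 individual measurements. A library routine such as `scipy.stats.f_oneway` would additionally return a p-value, and a mixed-effects model with fish as a random effect is an alternative when the per-measurement data should be retained.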

      (4) Cdhr1a function in photoreceptors

      The cdhr1a IHC staining in 5dpf WT larvae in Figure 3E appears different from the cdhr1a IHC staining in 5dpf WT larvae in Figure 1A or Figure 6I. Perhaps this is just the choice of image. Can the authors comment or provide a more representative image?

      The image in Figure 3E was captured using an earlier protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we will include an image that better represents cdhr1a staining in the WT and mutant.

      The authors show that pcdh15b localization after 5dpf mirrored the disorganization of the CP observed with actin staining. They also show in Figure 5O that at 180dpf, very little pcdh15b signal remains. They suggest based on this data that total degradation of CPs has occurred in the cdhr1a-/- photoreceptors by this time. However, although reduced in length, COS and cone CPs are still present at 180dpf (Figure 5E, E'). Thus, contrary to the authors' general conclusion, it is possible that the localization, trafficking, and/or turnover of pcdh15b is maintained through a cdhr1a-dependent mechanism, irrespective of the degree to which CPs are maintained. The experiments presented here do not clearly distinguish between a requirement for maintenance of localization versus a secondary loss of localization due to defective CPs.

      We agree, this point will be addressed in our revised manuscript.

      (5) Conceptual insights

      The authors claim that cdhr1a and pcdh15b double mutants have synergistic OS and CP phenotypes. I think this interpretation should be revisited.

      First, assuming the model of cdhr1a-pcdh15b interaction in trans is correct, the authors have not adequately explained the logic of why disrupting one side of this interaction in a single mutant would not give the same severity of phenotype as disrupting both sides of this interaction in a double mutant.

      Second, and perhaps more critically, at 10dpf the OS and CP lengths in cdhr1a-/- mutants (Figure 7J, T) are significantly increased compared to WT. In contrast, there are no significant differences in these measurements in the pcdh15b-/- mutants. Yet in double homozygous mutants, there is a significant reduction of ~50% in these measurements compared to WT. A synergistic phenotype would imply that each mutant causes a change in the same direction and that the magnitude of this change is beyond additive in the double mutants (but still in the same direction). Instead, I would argue that the data presented in Figure 7 suggest that there might be a functionally antagonistic interaction between cdhr1a and pcdh15b with respect to OS and CP growth at 10dpf.

      If these proteins physically interacted in vivo, it would appear that the interaction is complex and that this interaction underlies both OS growth-promoting and growth-restraining (stabilizing) mechanisms working in concert. Perhaps separate homodimers or heterodimers subserve distinct CP-OS functional interactions. This might explain the age-dependent differences in mutant CP and OS length phenotypes if these mechanisms are temporally dynamic or exhibit distinct OS growth versus maintenance phases. Regardless of my speculations, the model presented by the authors appears to be too simplistic to explain the data.

      We agree with the reviewer and will address this conclusion in our revised manuscript. To do so, we will revise our final model and include more flexibility in the proposed mechanisms.

      Reviewer #2 (Public review):

      Summary:

      The goal of this study was to develop a model for CDHR1-based Cone-rod dystrophy and study the role of this cadherin in cone photoreceptors. Using genetic manipulation, a cell binding assay, and high-resolution microscopy the authors find that like rods, cones localize CDHR1 to the lateral edge of outer segment (OS) discs and closely oppose PCDH15b which is known to localize to calyceal processes (CPs). Ectopic expression of CDHR1 and PCDH15b in K562 cells indicates these cadherins promote cell aggregation as heterophilic interactants, but not through homophilic binding. This data suggests a model where CDHR1 and PCDH15b link OS and CPs and potentially stabilize cone photoreceptor structure. Mutation analysis of each cadherin results in cone structural defects at late larval stages. While pcdh15b homozygous mutants are lethal, cdhr1 mutants are viable and subsequently show photoreceptor degeneration by 3-6 months.

      Strengths:

      A major strength of this research is the development of an animal model to study the cone-specific phenotypes associated with CDHR1-based CRD. The data supporting CDHR1 (OS) and PCDH15 (CP) binding is also a strength, although this interaction could be better characterized in future studies. The quality of the high-resolution imaging (at the light and EM levels) is outstanding. In general, the results support the conclusions of the authors.

      Weaknesses:

      While the cellular phenotyping is strong, the functional consequences of CDHR1 disruption are not addressed. While this is not the focus of the investigation, such analysis would raise the impact of the study overall. This is particularly important given some of the small changes observed in OS and CP structure. While statistically significant, are the subtle changes biologically significant? Examples include cone OS length (Figures 4F, 6E) as well as other morphometric data (Figure 7I in particular). Related, for quantitative data and analysis throughout the manuscript, more information regarding the number of fish/eyes analyzed as well as cells per sample would provide confidence in the rigor. The authors should also note whether the analysis was done in an automated and/or masked manner.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      The revised manuscript will clearly outline both the methods and the statistics used for quantitation of our data (please see our comments to reviewer 1). While we do not include direct evidence of the mechanism of CDHR1 function, we do propose that its role is important in anchoring the CP and the OS, particularly in the cones, while in rods it may serve to regulate the release of newly formed disks (as previously proposed in mice). We do plan to test both of these hypotheses directly; however, that will be the basis of our future studies.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Patel et al investigates the hypothesis that CDHR1a on photoreceptor outer segments is the binding partner for PCDH15 on the calyceal processes, and the absence of either adhesion molecule results in separation between the two structures, eventually leading to degeneration. PCDH15 mutations cause Usher syndrome, a disease of combined hearing and vision loss. In the ear, PCDH15 binds CDH23 to form tip links between stereocilia. The vision loss is less understood. Previous work suggested PCDH15 is localized to the calyceal processes, but the expression of CDH23 is inconsistent between species. Patel et al suggest that CDHR1a (formerly PCDH21) fulfills the role of CDH23 in the retina.

      The experiments are mainly performed using the zebrafish model system. Expression of Pcdh15b and Cdhr1a protein is shown in the photoreceptor layer through standard confocal and structured illumination microscopy. The two proteins co-IP and can induce aggregation in vitro. Loss of either Cdhr1a or Pcdh15, or both, results in degeneration of photoreceptor outer segments over time, with cones affected primarily.

      The idea of the study is logical given the photoreceptor diseases caused by mutations in either gene, the comparisons to stereocilia tip links, and the protein localization near the outer segments. The work here demonstrates that the two proteins interact in vitro and are both required for ongoing outer segment maintenance. The major novelty of this paper would be the demonstration that Pcdh15 localized to calyceal processes interacts with Cdhr1a on the outer segment, thereby connecting the two structures. Unfortunately, the data presented are inadequate proof of this model.

      Strengths:

      The in vitro data to support the ability of Pcdh15b and Cdhr1a to bind is well done. The use of pcdh15b and cdhr1a single and double mutants is also a strength of the study, especially being that this would be the first characterization of a zebrafish cdhr1a mutant.

      Weaknesses:

      (1) The imaging data in Figure 1 is insufficient to show the specific localization of Pcdh15 to calyceal processes or Cdhr1a to the outer segment membrane. The addition of actin co-labelling with Pcdh15/Cdhr1a would be a good start, as would axial sections. The division into rod and cone-specific imaging panels is confusing because the two cell types are in close physical proximity at 5 dpf, but the cone Cdhr1a expression is somehow missing in the rod images. The SIM data appear to be disrupted by chromatic aberration but also have no context. In the zebrafish image, the lines of Pcdh15/Cdhr1a expression would be 40-50 um in length if the scale bar is correct, which is much longer than the outer segments at this stage and therefore hard to explain.

      First let me thank the reviewer for taking the time to comprehensively evaluate our work and provide constructive criticism which will improve the quality of our final version.

      To address this issue, we are undertaking imaging of actin/cdhr1a and actin/pcdh15b using SIM in both transverse and axial sections. Additionally, we have recently established an immuno-gold-TEM protocol and are going to provide data showcasing co-labeling of cdhr1a and pcdh15b at TEM resolution. We are also going to include lower magnification images to complement the SIM images presented in figure 1.

      (2) Figure 3E staining of Cdhr1a looks very different from the staining in Figure 1. It is unclear what the authors are proposing as to the localization of Cdhr1a. In the lab's previous paper, they describe Cdhr1a as being associated with the connecting cilium and nascent OS discs, and fail to address how that reconciles with the new model of mediating CP-OS interaction. And whether Cdhr1a localizes to discrete domains on the disc edges, where it interacts with Pcdh15 on individual calyceal processes.

      The image in figure 3E was captured using an earlier protocol without antigen retrieval, which limits the resolution of the cdhr1a signal along the CP. In the revised manuscript we will include an image that better represents cdhr1a staining in the WT and mutant.

      (3) The authors state "In PRCs, Pcdh15 has been unequivocally shown to be localized in the CPs". However, the immunostaining here does not match the pattern seen in the Miles et al 2021 paper, which used a different antibody. Both showed loss of staining in pcdh15b mutants so unclear how to reconcile the two patterns.

      We agree that our staining appears different, but we attribute this to our antigen retrieval protocol which differed from the Miles et al paper. We also point to the fact that pcdh15b localization has been shown to be similar to our images in other species (monkey and frog). As such, we believe our protocol reveals the proper localization pattern which might be lost/hampered in the procedure used in Miles et al 2021.

      (4) The explanation for the CRISPR targets for cdhr1a and the diagram in Figure 3 does not fit with crRNA sequences or the mutation as shown. The mutation spans from the latter part of exon 5 to the initial portion of exon 6, removing intron 5-6. It should nevertheless be a frameshift mutation but requires proper documentation.

      This was an error in figure preparation that we overlooked. We apologize and will correct it in the revised manuscript.

      (5) There are complications with the quantification of data. First, the number of fish analyzed for each experiment is not provided, nor is the justification for performing statistics on individual cell measurements rather than using averages for individual fish. Second, all cone subtypes are lumped together for analysis despite their variable sizes. Third, t-tests are inappropriately used for post-hoc analysis of ANOVA calculations.

      As we discussed for reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript.

      (6) Unclear how calyceal process length is being measured. The cone measurements are shown as starting at the external limiting membrane, which is not equivalent to the origin of calyceal processes, and it is uncertain what defines the apical limit given the multiple subtypes of cones. In Figure 5, the lines demonstrating the measurements seem inconsistently placed.

      As we discussed for reviewers 1 and 2, all methods and quantification/statistics will be clearly described in the revised manuscript.

      (7) The number of fish analyzed by TEM and the prevalence of the phenotype across cells are not provided. A lower magnification view would provide context. Also, the authors should explain whether or not overgrowth of basal discs was observed, as seen previously in cdhr1-null frogs (Carr et al., 2021).

      The revised manuscript will include the aforementioned stats and lower magnification images. We will also compare our results directly to Carr 2021.

      (8) The statement describing the separation between calyceal processes and the outer segment in the mutants is not backed up by the data. TEM or co-labelling of the structures in SIM could be done to provide evidence.

      We will work to include more TEM and co-labeling data for the revised manuscript (see comments to reviewer 1).

      (9) "Based on work in the murine model and our own observations of rod CPs, we hypothesize that zebrafish rod CPs only extend along the newly forming OS discs and do not provide structural support to the ROS." Unclear how murine work would support that conclusion given the lack of CPs in mice, or what data in the manuscript supports this conclusion.

      In the revised manuscript we will improve our discussion of murine CPs: we still detect the juxtaposition of cdhr1 and pcdh15 along a potential remnant of the CP, as previously described in SEM studies. Our findings do not indicate that mice or rats have CPs; we simply wanted to outline that the behavior of cdhr1 and pcdh15 remains conserved despite the absence of long traditional CPs.

      (10) The authors state "from the fact that rod CPs are inherently much smaller than cone CPs" without providing a reference. In the manuscript, the measurements do show rod CPs to be shorter, but there are errors in the cone measurements, and it is possible that the RPE pigment is interfering with the rod measurements.

      We will include a reference where rod CPs have been found to be shorter (monkey and frog data). We have no doubt that in zebrafish the rod CPs are significantly shorter. All our CP measurements are done with a counter stain for rods and cones to be sure that we are measuring the correct cell type.

      (11) The discussion should include a better comparison of the results with ocular phenotypes in previously generated pcdh15 and cdhr1 mutant animals.

      In the revised manuscript we will include this in our discussion.

      (12) The images in panels B-F of the Supplemental Figure are uncannily similar, possibly even of the same fish at different focal planes.

      We assure the reviewer that each of the images in supplemental figure 1 is distinct and represents a different in situ experiment.

    1. Author response:

      We thank the reviewers for the positive and constructive feedback on our manuscript. We appreciate you highlighting the importance of our work in advancing our understanding of HIV latency and viral reactivation. The reviewers had mostly minor comments that we are in the process of addressing by completing additional experiments that are responsive to reviewer comments as well as some clarification of the text. These include:

      (1) The impact of INTS12 knockout on cell viability.

      We did not see an effect of the knockout of INTS12 on cell viability in the flow cytometry gating of live/dead cells, nor a gross difference in cell proliferation. However, we will test cell viability and proliferation more quantitatively and include this data in the revision.

      (2) The effect of INTS12 knockout on additional LRAs.

      There is published data that the Integrator complex inhibits HIV reactivation via additional LRAs that we will better highlight in the revision. In addition, we have data that we did not include in the original submission suggesting that INTS12 knockout affects the degree of HIV reactivation with additional LRAs. We will confirm these results and include the data in the revision.

      (3) Extend the discussion on how exquisitely sensitive HIV transcription is to pausing and transcriptional elongation and the insights this provides about general HIV transcriptional regulation.

      Yes, we agree with this and will extend the discussion in this manner. We will also include additional data that we recently obtained that further emphasizes this point.

      (4) Comparison to another CRISPR screen using the same library (Hsieh et al., PLOS Pathogens, 2023).

      Indeed, INTS12 was one of the hits in the previous paper (Hsieh et al., 2023) but was not specifically described or validated in that paper. We will point that out in the revision. Also, the Hsieh et al paper already described the library in more detail, but we will include additional text in the revision to emphasize that it casts a wide net on processes involved in transcriptional regulation.

      (5) We made a mistake in the numbering of the supplemental figures, which led to some misunderstanding. We will correct this and incorporate the reviewers' other suggestions for clarification.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      D'Oliviera et al. have demonstrated cleavage of human TRMT1 by the SARS-CoV-2 main protease in vitro. Following this, they solved the structure of Mpro (Nsp5)-C145A bound to TRMT1 substrate peptide, revealing binding conformation distinct from most viral substrates. Overall, this work enhances our understanding of substrate specificity for a key drug target of CoV2. The paper is well-written and the data is clearly presented. It complements the companion article by demonstrating interaction between Mpro and TRMT1, as well as TRMT1 cleavage under isolated conditions in vitro. They show that cleaved TRMT1 has reduced tRNA binding affinity, linking a functional consequence to TRMT1 cleavage by MPro. Importantly, the revelation for flexible substrate binding of Nsp5 is fundamental for understanding Nsp5 as a drug target. Trmt1 cleavage assays by Mpro revealed similar kinetics for TRMT1 cleavage as compared to nsp8/9 viral polyprotein cleavage site. They purify TRMT1-Q530K, in which there is a mutation in the predicted cleavage consensus sequence, and confirm that it is resistant to cleavage by recombinant Mpro. I am unable to comment critically on the structural analyses as it is outside of my expertise. Overall, I think that these findings are important for confirming TRMT1 as a substrate of Mpro, defining substrate binding and cleavage parameters for an important drug target of SARS-CoV-2, and may be of interest to researchers studying RNA modifications.

      We thank the reviewer for their positive assessment and summary of our work in this paper!

      Reviewer #2 (Public review):

      Summary:

      The manuscript 'Recognition and Cleavage of Human tRNA Methyltransferase TRMT1 by the SARS-CoV-2 Main Protease' from Angel D'Oliviera et al., uncovers that TRMT1 can be cleaved by SARS-CoV-2 main protease (Mpro) and defines the structural basis of TRMT1 recognition by Mpro. They use both recombinant TRMT1 and Mpro as well as endogenous TRMT1 from HEK293T cell lysates to convincingly show cleavage of TRMT1 by the SARS-CoV-2 protease. Using in vitro assays, the authors demonstrate that TRMT1 cleavage by Mpro blocks its enzymatic activity leading to hypomodification of RNA. To understand how Mpro recognizes TRMT1, they solved a co-crystal structure of Mpro bound to a peptide derived from the predicted cleavage site of TRMT1. This structure revealed important protein-protein interfaces and highlights the importance of the conserved Q530 for cleavage by Mpro. They then compare their structure with previous X-ray crystal structures of Mpro bound to substrate peptides derived from the viral polyprotein and propose the concept of two distinct binding conformations to Mpro: P3´-out and P3´-in conformations (here P3´ stands for the third residue downstream of the cleavage site). It remains unknown what is the physiological role of these two binding conformations on Mpro function, but the authors established that Mpro has dramatically different cleavage efficiencies for three distinct substrates. In an effort to rationalize this observation, a series of mutations in Mpro's active site and the substrate peptide were tested but unexpectedly had no significant impact on cleavage efficiency. While molecular dynamic simulations further confirmed the propensity of certain substrates to adopt the P3´-out or P3´-in conformation, it did not provide additional insights into the dramatic differences in cleavage efficiencies between substrates. This led the authors to propose that the discrimination of Mpro for preferred substrates might occur at a later stage of catalysis after binding of the peptide. Overall, this work will be of interest to biologists studying proteases and substrate recognition by enzymes and RNA modifications as well as help efforts to target Mpro with peptide-like drugs.

      We thank the reviewer for this thorough and accurate summary of our work in this manuscript.

      Strengths:

      • The authors' statements are well supported by their data, and they used relevant controls when needed. Indeed, they used the Mpro C145A inactive variant to unambiguously show that the TRMT1 cleavage detected in vitro is solely due to Mpro's activity. Moreover, they used two distinct polyclonal antibodies to probe TRMT1 cleavage.

      • They demonstrate the impact of TRMT1 cleavage on RNA modification by quantifying both its activity and binding to RNA.

      • Their 1.9 Å crystal structure is of high quality and increases the confidence in the reported protein-protein contacts seen between TRMT1-derived peptide and Mpro.

      • Their extensive in vitro kinetic assay was performed in ideal conditions although it is sometimes unclear how many replicates were performed.

      • They convincingly show how Mpro cleavage is conserved among most but not all mammalian TRMT1 bringing an interesting evolutionary perspective on virus-host interactions.

      • The authors test multiple hypotheses to rationalize the preference of Mpro for certain substrates.

      • While this reviewer is not able to comment on the rigor of the MD simulations, the interpretations made by the authors seem reasonable and convincing.

      • The concept of two binding conformations (P3´-out or P3´-in) for the substrate in the active site of Mpro is significant and can guide drug design.

      We thank the reviewer for these positive assessments of manuscript strengths!

      Weaknesses:

      • The two polyclonal antibodies used by the authors seem to have strong non-specific binding to proteins other than TRMT1 but did not impact the authors' conclusions or statements. This is a limitation of the commercially available antibodies for TRMT1.

      Yes, there are some levels of non-specific binding for all of the TRMT1 antibodies we have tested (this limitation of commercially available TRMT1 antibodies is also observed and noted by Zhang et al), but we agree that this does not impact the overall conclusions and that by using multiple different antibodies to show the same effects, we can have high confidence in the Western blot analysis and interpretation.

      • Despite the reasonable efforts of the authors, it remains unknown why Mpro shows higher cleavage efficiency for the nsp4/5 sequence compared to TRMT1 or nsp8/9 sequences. This is a challenging problem that will take substantially more effort by several labs to decipher mechanistically.

      True! To our knowledge and despite significant past efforts of many research groups studying similar coronavirus proteases (e.g. SARS-CoV-1 Mpro) a clear understanding of the detailed mechanistic relationship between cleavage sequence and cleavage kinetics remains mostly undefined. This is a great and important problem for mechanistic and computational groups with deep interests in proteases to tackle in the future! To highlight these and similar open questions, we have added a short paragraph to the Discussion section (second from the last paragraph).

      • The peptide cleavage kinetic assay used by the authors relies on a peptide labelled with a fluorophore (MCA) on the N-terminus and a quencher (Dnp) on the C-terminus. This design allows high-throughput measurements compatible with plate readers and is a robust and convenient tool. Nevertheless, the authors did not control for the impact of the labels (MCA and Dnp) on the activity of Mpro. While in most cases the introduced fluorophore/quencher do not impact activity, sometimes it can.

      Yes, we agree that it is possible the MCA and Dnp labels could have effects on the measured cleavage rates. These fluorophore/quencher peptide cleavage assays are the standard assays used by many labs in the protease field to study diverse proteases and diverse cleavage targets. When other labs have compared cleavage kinetic parameters measured with fluorophore/quencher-based peptide cleavage assays versus HPLC-based peptide cleavage assays, these are often found to be quite similar (e.g. Lee, J., Worrall, L.J., Vuckovic, M. et al. Crystallographic structure of wild-type SARS-CoV-2 main protease acyl-enzyme intermediate with physiological C-terminal autoprocessing site. Nat Commun 11, 5877 (2020). https://doi.org/10.1038/s41467-020-19662-4), although there are also examples where differences arise. In any case, we agree there could be some effects on the cleavage kinetics introduced by the fluorophore and/or quencher groups. However, our main focus in this paper is to show how a sequence in the human tRNA-modifying enzyme TRMT1 is cleaved by Mpro (and in this revision we have also added new data to show the functional effects of cleavage on TRMT1 activity); it will take significant future work to fully dissect the detailed relationships between peptide sequence, including the quantitative effects of fluorophore/quencher labels, and protease-directed cleavage kinetics. Based on our work in this paper and many past studies of similar proteases, understanding how peptide sequence or conformation relates to cleavage efficiency is a longer-term and very challenging problem that we view as beyond the scope of this work. We have added a brief section elaborating on this in the Discussion.

      • An unanswered question not addressed by the authors is if the peptides undergo conformational changes upon Mpro binding or if they are pre-organized to adopt the P3´-out and P3´-in conformations. This might require substantially more work outside the scope of this immediate article.

      We agree this is unanswered; we considered additional MD experiments to address this, but ultimately decided that since both of these sequences are cleaved in the context of much larger polypeptides (FL TRMT1 or the viral polypeptide), any simple analysis to assess the possibility of pre-organization and relate this preferred binding conformation to cleavage kinetics would be difficult to interpret in a biologically meaningful way. We think this and similar questions about how pre-organization of peptides or amino acid sequences in the polypeptides might influence protease binding and cleavage activity are interesting and important future questions for protease-focused groups in this field.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors have used a combination of enzymatic, crystallographic, and in silico approaches to provide compelling evidence for substrate selectivity of SARS-CoV-2 Mpro for human TRMT1.

      Strengths:

      In my opinion, the authors came close to achieving their intended aim of demonstrating the structural and biochemical basis of Mpro catalysis and cleavage of human TRMT1 protein. The revised version of the manuscript has addressed most of the questions I had posed in my earlier review.

      We thank the reviewer for their positive assessment of this work, and we are glad to hear the manuscript revisions were helpful in addressing the first round of reviews and questions.

      Weaknesses:

      Although several new hypotheses are generated from the Mpro structural data, the manuscript falls a bit short of testing them in functional assays, which would have solidified the conclusions the authors have drawn.

      Toward showing some of the functional effects of TRMT1 cleavage, in this revised version of the manuscript we have added new data and a new results section (‘Cleavage of TRMT1 results in complete loss of tRNA m2,2G modification activity and reduced tRNA binding in vitro’) showing that cleavage of TRMT1 results in reduced tRNA binding to TRMT1 (Figure 2D) and the complete loss of TRMT1-mediated tRNA modification activity in vitro (Figure 2C). This complements the in-cell data presented by Zhang et al showing that cleavage of TRMT1 in SARS-CoV-2 infected human cells results in the reduction of m2,2G modification levels. We think these data are a strong addition to this paper that broadens the impacts of our reported results more directly into the RNA modifications field.

      In terms of showing the further, downstream biological effects of TRMT1 cleavage and/or the specific impacts of TRMT1 cleavage on SARS-CoV-2 propagation and replication, while we agree further functional assays could absolutely heighten the overall impact, we view the main focus of our paper as showing how TRMT1 is recognized and cleaved by Mpro at the structural level and characterizing the biochemistry of the TRMT1-Mpro interaction and the effects of cleavage on TRMT1 tRNA-modifying activity. Zhang et al present some cellular data suggesting that loss of TRMT1 and/or TRMT1 cleavage during infection is actually detrimental to SARS-CoV-2 replication and infectivity. However, a full understanding of how TRMT1-mediated m2,2G modification of tRNA impacts viral translation, whether TRMT1 plays other roles during the viral life cycle, or whether TRMT1 cleavage (even if not important for viral fitness) contributes to cellular phenotypes during infection, will take a significant amount of future cell biology and virology work to unravel. Indeed, our understanding is that characterizing some of the endogenous cleavage targets for the HIV protease and determining the downstream biological effects and impacts on HIV infection took well over a decade. We hope that the biochemical and structural characterization of the Mpro-TRMT1 interaction presented in our paper will provide the necessary fundamental groundwork and impetus for future virology and cellular biochemistry studies to further investigate the biological roles of TRMT1 cleavage by SARS-CoV-2 Mpro.

      ---

      The following is the authors’ response to the original reviews.

      eLife Assessment:

      This manuscript provides important structural insights into the recognition and degradation of the host tRNA methyltransferase by SARS-CoV-2 protease nsp5 (Mpro). The data convincingly support the main conclusions of the paper. These results will be of interest to researchers studying structures and substrate recognition and specificity of viral proteases.

      We thank the eLife editors and reviewers for handling this manuscript and the overall positive assessment of our work.

      In this revised version of the manuscript we have included significant, new experimental data with recombinant purified, catalytically active TRMT1 that directly shows cleavage of TRMT1 reduces its tRNA binding affinity (by gel shift assays) and results in the complete loss of tRNA modifying activity in vitro (by radiolabel-based methyltransferase assays). Because these added experiments provide new information about how Mpro-mediated cleavage specifically impacts TRMT1 tRNA binding and m2,2G modification activity, and thus new information about the functional effects of loss of the TRMT1 Zn finger domain, we would strongly suggest adding that “this work may be of interest to researchers studying RNA modifications”, or a similar phrase, in the eLife assessment.

      Please find below our point-by-point response to each of the reviewer comments, which outlines additional changes to the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      D'Oliviera et al. have demonstrated cleavage of human TRMT1 by the SARS-CoV-2 main protease in vitro. Following this, they solved the structure of Mpro-C145A bound to TRMT1 substrate peptide, revealing binding conformation distinct from most viral substrates. Overall, this work enhances our understanding of substrate specificity for a key drug target of CoV2. The paper is well-written and the data is clearly presented. It complements the companion article by demonstrating the interaction between Mpro and TRMT1 and TRMT1 cleavage under isolated conditions in vitro. Importantly, the revelation of flexible substrate binding of Nsp5 is fundamental for understanding Nsp5 as a drug target. Trmt1 cleavage assays revealed similar kinetics for TRMT1 cleavage as compared to the nsp8/9 viral polyprotein cleavage site, however, it would have been more rigorous for the authors to independently reproduce the kinetics reported for nsp8/9 using their specific experimental conditions. The finding that murine TRMT1 lacks a conserved consensus sequence is interesting, but is not experimentally tested here and is reported elsewhere. I am unable to comment critically on the structural analyses as it is outside of my expertise. Overall, I think that these findings are important for confirming TRMT1 as a substrate of Mpro and defining substrate binding and cleavage parameters for an important drug target of SARS-CoV-2.

      We thank the reviewer for their positive assessment and summary of our work in this paper!

      We absolutely agree that comparing to nsp8/9 cleavage kinetics measured in our own hands would be more rigorous here, and we have carried out these measurements in triplicate under the same conditions as were used to measure all the other peptide cleavage kinetics in this manuscript. Figures 5A & B (as well as Table S3 and Dataset S2) have been updated with our new nsp8/9 kinetic data (kcat = 0.019 +/- 0.002 s-1 and KM = 40 +/- 7.5 µM). As expected, our newly measured nsp8/9 kinetic parameters are very similar to those that we had previously cited from MacDonald et al (kcat = 0.013 +/- 0.001 s-1, KM = 36 +/- 6.0 µM), and show that Mpro-mediated TRMT1 peptide cleavage has similar proteolysis kinetics to the nsp8/9 viral polypeptide cleavage site.
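
      As a rough illustration of how close the two nsp8/9 measurements are, the values quoted above can be converted to catalytic efficiencies (kcat/KM). The sketch below is only a worked calculation under simple assumptions: the function name is hypothetical, the reported uncertainties are not propagated, and the values are taken directly from the text above.

```python
def catalytic_efficiency(kcat_per_s, km_micromolar):
    """Return kcat/KM in M^-1 s^-1, given kcat in s^-1 and KM in micromolar."""
    km_molar = km_micromolar * 1e-6  # convert µM to M
    return kcat_per_s / km_molar


# nsp8/9 parameters as quoted in the response (uncertainties ignored here).
this_work = catalytic_efficiency(0.019, 40.0)   # newly measured values
macdonald = catalytic_efficiency(0.013, 36.0)   # values cited from MacDonald et al

print(f"this work:       {this_work:.0f} M^-1 s^-1")
print(f"MacDonald et al: {macdonald:.0f} M^-1 s^-1")
print(f"ratio:           {this_work / macdonald:.2f}")
```

      Under these assumptions the two efficiencies come out at roughly 475 and 361 M^-1 s^-1, that is, within about 1.3-fold of each other, in line with the statement that the parameters are very similar.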

      We have also purified full-length human TRMT1 Q530K, which is the key change in the cleavage consensus sequence that likely makes murine TRMT1 resistant to Mpro-mediated cleavage. In in vitro cleavage assays we find that indeed TRMT1 Q530K is entirely resistant to cleavage by recombinant Mpro and we have added this data to the manuscript in Figure 6D. These findings are consistent with previously cited data from Lu et al, which suggest mouse and hamster TRMT1 are not cleaved in HEK293T cells expressing Mpro.

      With the addition of the TRMT1 Q530K mutant data, we decided to move the evolutionary analysis together with this kinetic data to a new section in the Results. We think these additions and changes make the paper stronger and clearer, and thank the reviewer for these suggestions!

      Reviewer #2 (Public Review):

      Summary:

      The manuscript 'Recognition and Cleavage of Human tRNA Methyltransferase TRMT1 by the SARS-CoV-2 Main Protease' from Angel D'Oliviera et al., uncovers that TRMT1 can be cleaved by SARS-CoV-2 main protease (Mpro) and defines the structural basis of TRMT1 recognition by Mpro. They use both recombinant TRMT1 and Mpro as well as endogenous TRMT1 from HEK293T cell lysates to convincingly show cleavage of TRMT1 by the SARS-CoV-2 protease. To understand how Mpro recognizes TRMT1, they solved a co-crystal structure of Mpro bound to a peptide derived from the predicted cleavage site of TRMT1. This structure revealed important protein-protein interfaces and highlights the importance of the conserved Q530 for cleavage by Mpro. They then compared their structure with previous X-ray crystal structures of Mpro bound to substrate peptides derived from the viral polyprotein and proposed the concept of two distinct binding conformations to Mpro: P3´-out and P3´-in conformations (here P3´ stands for the third residue downstream of the cleavage site). It remains unknown what is the physiological role of these two binding conformations on Mpro function, but the authors established that Mpro has dramatically different cleavage efficiencies for three distinct substrates. In an effort to rationalize this observation, a series of mutations in Mpro's active site and the substrate peptide were tested but unexpectedly had no significant impact on cleavage efficiency. While molecular dynamic simulations further confirmed the propensity of certain substrates to adopt the P3´-out or P3´-in conformation, they did not provide additional insights into the dramatic differences in cleavage efficiencies between substrates. This led the authors to propose that the discrimination of Mpro for preferred substrates might occur at a later stage of catalysis after binding of the peptide. 
      Overall, this work will be of interest to biologists studying proteases and substrate recognition by enzymes, and will help efforts to target Mpro with peptide-like drugs.

      We thank the reviewer for this thorough and accurate summary of our work in this manuscript.

      Strengths:

      • The authors' statements are well supported by their data, and they used relevant controls when needed. Indeed, they used the Mpro C145A inactive variant to unambiguously show that the TRMT1 cleavage detected in vitro is solely due to Mpro's activity. Moreover, they used two distinct polyclonal antibodies to probe TRMT1 cleavage.

      • Their 1.9 Å crystal structure is of high quality and increases the confidence in the reported protein-protein contacts seen between TRMT1-derived peptide and Mpro.

      • Their extensive in vitro kinetic assay was performed in ideal conditions although it is unclear how many replicates were performed.

      • The authors test multiple hypotheses to rationalize the preference of Mpro for certain substrates.

      • While this reviewer is not able to comment on the rigor of the MD simulations, the interpretations made by the authors seem reasonable and convincing.

      • The concept of two binding conformations (P3´-out or P3´-in) for the substrate in the active site of Mpro is significant and can guide drug design.

      We thank the reviewer for these positive assessments of manuscript strengths!

      Weaknesses:

      • While the authors convincingly show that TRMT1 is cleaved by Mpro, the exact cleavage site was never confirmed experimentally. It is most likely that the predicted site is the main cleavage site as proposed by the authors (region 527-534). Nevertheless, in Fig 1C (first lane from the right) there are two bands clearly observed for the cleavage product containing the MT Domain. If the predicted site was the only cleavage site recognized by Mpro, then a single band for the MT domain would be expected. This observation suggests that there might be two cleavage sites for Mpro in TRMT1. Indeed, residues RFQANP (550-555) in TRMT1 might be a secondary weaker cleavage site for Mpro, which would explain the two observed bands in Fig 1C. A mass spectrometry analysis of the cleaved products would clarify this.

      We agree with the reviewer that based on the originally presented data it is possible there could be an additional Mpro-targeted cleavage site in TRMT1 beyond the 527-534 region that we validated through peptide cleavage assays of the TRMT1 526-536 peptide. Because it may be difficult to unambiguously identify and differentiate other putative cleavage sites that are nearby to 527-534 (e.g. the suggested possibility of 550-555) by mass spectrometry, we instead carried out additional in vitro cleavage assays with purified FL TRMT1 Q530K. Mutation of the invariant P1 Gln residue in the cleavage sequence is expected to prevent cleavage at this site, and allow us to probe whether there are other sites in TRMT1 that can be cleaved by Mpro (and if so, more straightforwardly identify them by mass spectrometry). We compared cleavage of purified WT FL TRMT1 and FL TRMT1 Q530K with recombinant Mpro in in vitro cleavage assays and found that TRMT1 Q530K is not cleaved by Mpro over the course of a 2h cleavage reaction. In these experiments, we also saw clear cleavage of WT FL TRMT1 over the course of 2h into only a single detectable band. Together, both of these pieces of data strongly suggest that the 527-534 region is the only Mpro-targeted cleavage site in TRMT1 (if there was an additional cleavage site, we should have seen some amount of cleavage in the Q530K mutant, but we do not). Overall, we feel that the updated WT and Q530K experiments clearly demonstrate that there is only one Mpro-mediated cleavage site in human TRMT1, which also is consistent with experiments in Zhang et al showing that Q530N mutations also block TRMT1 cleavage by co-expressed Mpro in human cells.

      The updated WT and Q530K cleavage assays have been added to the manuscript in Figure 6D.

      • A control is missing in Fig 1D. Since the authors use western blots to show the gradual degradation of endogenous TRMT1, a control with a protein that does not change in abundance over the course of the measurement is important. This is required to show that the differences in intensity of TRMT1 by western blotting are not due to loading differences etc.

      Yes, we agree this is an important control and have repeated these experiments and blotted for TRMT1 and GAPDH as a loading control. The updated Western blots are now shown in Figure 2B, and show the same result as the older data.

      • The two polyclonal antibodies used by the authors seem to have strong non-specific binding to proteins other than TRMT1 but did not impact the author's conclusions. This is a limitation of the commercially available antibodies for TRMT1, and unless the authors select a new monoclonal antibody specific to TRMT1 (costly and lengthy process), this limitation seems out of their control.

      Yes, there are some levels of non-specific binding for all of the TRMT1 antibodies we have tested (this limitation of commercially available TRMT1 antibodies is also observed and noted by Zhang et al), but we agree that this does not impact the overall conclusions and that by using multiple different antibodies to show the same effects, we can have high confidence in the Western blot analysis and interpretation.

      • The recombinantly purified TRMT1 seems to have some non-negligible impurities (extra bands in Fig 1C). This does not impact the conclusions of the authors but might be relevant to readers interested in working with TRMT1 for biochemical, structural, or other purposes.

      Yes, our initial isolations of recombinant TRMT1 for the first version of this paper produced smaller amounts of TRMT1 with some impurities; we agree that these do not impact the conclusions of the cleavage experiments. However, since our first submission, we have optimized our purification protocols for TRMT1 and are now able to obtain larger quantities of higher purity recombinant human TRMT1 from bacterial cells and we have used this material for the TRMT1 activity and tRNA binding assays added in this revision; we have also included updates to the expression and purification section for recombinant TRMT1. We hope that these improvements will be helpful to readers interested in working on TRMT1.

      • Despite the reasonable efforts of the authors, it remains unknown why Mpro shows higher cleavage efficiency for the nsp4/5 sequence compared to TRMT1 or nsp8/9 sequences.

      True! To our knowledge and despite significant past efforts of many research groups studying similar coronavirus proteases (e.g. SARS-CoV-1 Mpro) a clear understanding of the detailed mechanistic relationship between cleavage sequence and cleavage kinetics remains mostly undefined. This is a great and important problem for mechanistic and computational groups with deep interests in proteases to tackle in the future! To highlight these and similar open questions, we have added a short paragraph to the Discussion section (second from the last paragraph).

      • The peptide cleavage kinetic assay used by the authors relies on a peptide labelled with a fluorophore (MCA) on the N-terminus and a quencher (Dpn) on the C-terminus. This design allows high-throughput measurements compatible with plate readers and is a robust and convenient tool. Nevertheless, the authors did not control for the impact of the labels (MCA and Dpn) on the activity of Mpro. It is possible that the differences in cleavage efficiencies between peptides are due to unexpected conformational changes in the peptide upon labelling. Moreover, the TRMT1 peptide has an E at the N-terminus and an R at the C-terminus (while the nsp4/5 peptide has an S and M, respectively). It is possible that these two terminal residues form a salt bridge in the TRMT1 peptide that might constrain the conformation of the peptide and thus reduce its accessibility and cleavage by Mpro. Enzymatic assays in the absence of labels and MD simulations with the bona fide peptides (including the labels) used in the kinetic measurements are needed to prove that the cleavage efficiencies are not biased by the fluorescence assay.

      These fluorophore/quencher peptide cleavage assays are the standard assays used by many labs in the protease field to study diverse proteases and diverse cleavage targets. When other labs have compared cleavage kinetic parameters measured with fluorophore/quencher-based peptide cleavage assays versus HPLC-based peptide cleavage assays, these are often found to be quite similar (e.g. Lee, J., Worrall, L.J., Vuckovic, M. et al. Crystallographic structure of wild-type SARS-CoV-2 main protease acyl-enzyme intermediate with physiological C-terminal autoprocessing site. Nat Commun 11, 5877 (2020). https://doi.org/10.1038/s41467-020-19662-4), although there are also examples where differences arise. In any case, we agree there could be some effects on the cleavage kinetics introduced by the fluorophore and/or quencher groups or sequence-specific conformational preferences of the peptides. However, because our main focus in this paper is to show how a sequence in the human tRNA-modifying enzyme TRMT1 is cleaved by Mpro (and in this revision we have also added new data to show the functional effects of cleavage on TRMT1 activity), and the broad focus of our lab is understanding the mechanisms controlling the function and activity of RNA-modifying enzymes, we will leave it to other labs focused more specifically on protease biochemistry to fully dissect the detailed relationships between peptide sequence and conformation to protease-directed cleavage kinetics. As discussed above, based on our work in this paper and many past studies of similar proteases, understanding how sequence relates to cleavage efficiency is a longer-term and very challenging problem that we view as beyond the scope of this work. As noted above, we have added a brief section explaining this in the Discussion.

      • The authors used the A531S variant of the TRMT1-derived peptide to disrupt the P3´-in conformation. While this reviewer agrees with the rationale behind the A531S design, it is important to confirm experimentally that the mutation disrupted the P3´-in conformation in favor of the P3´-out conformer. The authors could use their MD simulations to determine if the TRMT1 A531S variant favors the P3´-out conformation.

      Thank you for this suggestion; we agree and have carried out the suggested MD simulations with TRMT1 A531S peptides bound to Mpro. Surprisingly, these simulations suggest that the A531S peptide can still readily adopt the P3’-in conformation by orienting the Ser sidechain in a different way as compared to its positioning in the Mpro-nsp4/5 structure. Since this somewhat changes our interpretation of the results of the A531S kinetic experiments, we have rewritten this section of the manuscript by: (a) removing the ‘TRMT1 mutations predicted to alter peptide binding conformation have little effect on cleavage kinetics’ section in the Results, (b) instead adding several sentences talking about the A531S mutation to the previous section of the results, and including this mutation as another example of how mutations to either Mpro or TRMT1 residues that might be expected to impact cleavage kinetics do not in fact affect cleavage rates, and finally (c) adding the new MD simulation results to the A531S kinetic data in Figure S5 in the Supporting Information. We thank the reviewer for suggesting this important follow-up simulation!

      • An unanswered question not addressed by the authors is if the peptides undergo conformational changes upon Mpro binding or if they are pre-organized to adopt the P3´-out and P3´-in conformations.

      We agree this is unanswered; we considered additional MD experiments to address this, but ultimately decided that since both of these sequences are cleaved in the context of much larger polypeptides (FL TRMT1 or the viral polypeptide), any simple analysis to assess the possibility of pre-organization and relate this preferred binding conformation to cleavage kinetics would be difficult to interpret in a biologically meaningful way. We think this and similar questions about how pre-organization of peptides or amino acid sequences in the polypeptides might influence protease binding and cleavage activity are interesting and important future questions for protease-focused groups in this field.

      • While the authors describe at great length the hydrogen bonds involved in substrate recognition by Mpro, they omitted to highlight important stacking interactions in this interface. For instance, Phe533 from TRMT1 stacks with Met49 while L529 from TRMT1 packs against His41 of Mpro. Both hydrogen bonding and stacking interactions seem important for TRMT1-derived peptide recognition by Mpro.

      Thank you for these suggestions toward additional structural analysis. We have added a short description of L529 packing in the S2 pocket to the main text and Figure S3B. We have also added a short description of F533 packing in the S3’ pocket to the main text and Figure S3C.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors have used a combination of enzymatic, crystallographic, and in silico approaches to provide compelling evidence for substrate selectivity of SARS-CoV-2 Mpro for human TRMT1.

      Strengths:

      In my opinion, the authors came close to achieving their intended aim of demonstrating the structural and biochemical basis of Mpro catalysis and cleavage of human TRMT1 protein. The combination of orthogonal approaches is highly commendable.

      We thank the reviewer for their positive assessment of this work!

      Weaknesses:

      It would have been of high scientific impact if the consequences of TRMT1 cleavage by Mpro on cellular metabolism were provided. Furthermore, assays to investigate the effect of inhibition of this Mpro activity on SARS-CoV-2 propagation and infection would have been extremely useful in providing insights into host-SARS-CoV-2 interactions.

      Toward showing some of the consequences of TRMT1 cleavage, in this revised version of the manuscript we have added new data and a new results section (‘Cleavage of TRMT1 results in complete loss of tRNA m2,2G modification activity and reduced tRNA binding in vitro’) showing that cleavage of TRMT1 results in reduced tRNA binding to TRMT1 (Figure 2D) and the complete loss of TRMT1-mediated tRNA modification activity in vitro (Figure 2C). This complements the in-cell data presented by Zhang et al showing that cleavage of TRMT1 in SARS-CoV-2 infected human cells results in the reduction of m2,2G modification levels. We think these data are a strong addition to this paper that broadens the impacts of our reported results more directly into the RNA modifications field.

      In terms of showing the further, downstream biological effects of TRMT1 cleavage and/or the specific impacts of TRMT1 cleavage on SARS-CoV-2 propagation and replication, while we agree this would absolutely heighten the overall impact, we view the main focus of our paper as showing how TRMT1 is recognized and cleaved by Mpro at the structural level and characterizing the biochemistry of the TRMT1-Mpro interaction and the effects of cleavage on TRMT1 tRNA-modifying activity. Zhang et al present some cellular data suggesting that loss of TRMT1 and/or TRMT1 cleavage during infection is actually detrimental to SARS-CoV-2 replication and infectivity. However, a full understanding of how TRMT1-mediated m2,2G modification of tRNA impacts viral translation, whether TRMT1 plays other roles during the viral life cycle, or whether TRMT1 cleavage (even if not important for viral fitness) contributes to cellular phenotypes during infection, will take a significant amount of future cell biology and virology work to unravel. Indeed, our understanding is that characterizing some of the endogenous cleavage targets for the HIV protease and determining the downstream biological effects and impacts on HIV infection took well over a decade. We hope that the biochemical and structural characterization of the Mpro-TRMT1 interaction presented in our paper will provide the necessary fundamental groundwork and impetus for future virology and cellular biochemistry studies to further investigate the biological roles of TRMT1 cleavage by SARS-CoV-2 Mpro.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Please list Mpro alias Nsp5 in the Abstract and Introduction, as this is the nomenclature used in the companion article.

      OK, we have made these changes.

      Reviewer #2 (Recommendations For The Authors):

      In addition to the points mentioned in the public review, this reviewer encourages the authors to address the following points:

      • Citation 14 is important for this work since the authors used multiple structures from that earlier study for comparison. Citation 14 seems outdated since it refers to a preprint that has been published since then in Nat Comm. The authors should cite the peer-reviewed work https://pubmed.ncbi.nlm.nih.gov/35729165/

      Thank you, we have updated this reference.

      • The description of the hydrogen bonds is tedious to read. The authors could instead classify them into two groups: hydrogen bonds between main chain backbones and hydrogen bonds between side chains. For instance, they mention the contact between Mpro Glu166-TRMT1 Arg528. This can lead to confusion that a salt bridge is formed, while these two residues interact only via their main chain backbones. Indeed, the side chain of R528 is exposed to the solvent.

      OK, we have taken this suggestion and tried to simplify and clarify this portion of the text (along with the accompanying structure Figure 3 showing key hydrogen bonds; see below).

      • For Figure 2, please label the residues of the peptide with the TRMT1 numbering. This will help the reader to follow the text while looking at the figure.

      OK, we have added the TRMT1 numbering to what is now Figure 3A, and labeled key TRMT1 residues in Figures 3B, C, and D.

      • Fig 2B is important but crowded. The authors could use two panels to show two different views of this interface.

      Thank you for this suggestion, we have split B (now C and D in Figure 3) into two panels, rotated 90 degrees from one another, with each view showing a different subset of TRMT1-Mpro interactions. These updated panels are less crowded, and will hopefully be much clearer to readers.

      • For increased clarity, the authors could color P3´-out in orange and P3´-in teal in Fig 3D.

      OK, we have made this change.

      • Please proofread the method section. There should be a space between values and their units. For example, 20mM HEPES should be 20 mM HEPES.

      Thank you, we have corrected these formatting errors in the methods section of the revised version of the manuscript.

      • The authors did not identify the mechanism for the higher efficiency of nsp4/5 cleavage despite testing several mutants and MD simulations. Did the authors consider changes in the network of water molecules that might be identified in the MD simulations?

      We did look at the positioning of waters in nsp4/5 vs nsp8/9 vs TRMT1 MD simulations. In the nsp4/5 simulation we do see a slightly higher density of water molecules positioned at approximately reasonable attack angles for substrate hydrolysis. If we consider water molecules with an attack angle on the scissile amide of 82 – 96 degrees and an attack distance of 4 Å or closer, the probabilities for these conditions in the simulations are: nsp4/5 – 19%, nsp8/9 – 9%, TRMT1 – 6%. More water positioned at reasonable attack positions for nsp4/5 might be consistent with its higher cleavage efficiency, but: (a) these are relatively small differences in water positioning across these 3 Mpro-substrate simulations that would not be enough to clearly explain the large differences in observed kinetics, and (b) hydrolysis happens in the later steps of the catalytic cycle, so to accurately capture this we would likely need to simulate reaction intermediates formed after initial attack of the active site Cys.
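
      The geometric filter described above (attack angle of 82 to 96 degrees and attack distance of 4 Å or less) can be sketched as follows. This is a minimal illustration and not the analysis code used for the manuscript. It assumes that the attack angle is measured as water O···C=O at the scissile carbonyl carbon and that the reported probability is the fraction of frames with at least one qualifying water; neither definition is stated in the response. All function names and the input layout are hypothetical.

```python
import math


def _sub(a, b):
    """Component-wise difference of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def attack_geometry(water_o, carbonyl_c, carbonyl_o):
    """Return (distance in Å, angle in degrees) for a water approaching C=O.

    ASSUMPTION: the angle is water O ... C = O, measured at the carbonyl carbon.
    """
    to_water = _sub(water_o, carbonyl_c)
    to_oxygen = _sub(carbonyl_o, carbonyl_c)
    d_water, d_oxygen = _norm(to_water), _norm(to_oxygen)
    if d_water == 0.0 or d_oxygen == 0.0:
        raise ValueError("coincident atoms; angle is undefined")
    cos_t = sum(p * q for p, q in zip(to_water, to_oxygen)) / (d_water * d_oxygen)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
    return d_water, math.degrees(math.acos(cos_t))


def is_attack_competent(water_o, carbonyl_c, carbonyl_o,
                        max_dist=4.0, min_angle=82.0, max_angle=96.0):
    """Apply the distance and angle cutoffs quoted in the response."""
    dist, angle = attack_geometry(water_o, carbonyl_c, carbonyl_o)
    return dist <= max_dist and min_angle <= angle <= max_angle


def fraction_of_frames(frames):
    """Fraction of frames with at least one qualifying water.

    `frames` is a list of (water_oxygens, carbonyl_c, carbonyl_o) tuples,
    where water_oxygens is a list of (x, y, z) coordinates in Å.
    """
    if not frames:
        return 0.0
    hits = sum(
        1 for waters, c, o in frames
        if any(is_attack_competent(w, c, o) for w in waters)
    )
    return hits / len(frames)
```

      In practice the coordinates would come from a trajectory analysis package; plain tuples are used here only to keep the sketch self-contained.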

      We very much appreciate the reviewer’s enthusiasm in pushing us to understand the mechanistic basis for Mpro-directed cleavage efficiencies, and we would have absolutely loved to figure this out! (As it appears to be a long-standing question in the field!) But as discussed above and in the manuscript, we think that it will take a detailed dissection of different steps in the catalytic cycle to understand where and how this selectivity arises. We will leave it to research groups focused more exclusively on the details of protease biochemistry and simulations of reactive intermediates to take up these significant and long-term challenges!

      • In the PDB deposition, Y154 from chain B should be fixed.

      • In the PDB deposition, some added glycerols seem to conflict. Although this is not important for the biological work discussed in this study, the authors should check if glycerol 403 in chain A and 402, 403 in chain B are properly modeled. Does the density justify placing a glycerol there?

      • In the PDB deposition, there are over 51 RSRZ outliers. The authors should double-check if they cannot fix them with additional refinements. While such outliers in poorly defined linkers are understandable, this is unexpected for well-defined regions in the map.

      We have made a number of updates to our PDB deposition to address the above three points. (1) We have reexamined and tweaked the loop region at Y154 chain B; this region of the structure has relatively poorly defined electron density, but we now have a model where Y154 is no longer a Ramachandran outlier. The PDB model is now free of any Ramachandran outliers. (2) We have reexamined each of the modeled glycerol molecules and removed one of these (GOL 402), which had a weaker fit to the electron density. The remaining two glycerols appear to be well-modeled (omit maps leaving out each glycerol show strong Fo-Fc density that clearly looks like a glycerol in shape, adding each glycerol back into the model decreases Rwork and Rfree, and the refined 2Fo-Fc map fits well to the modeled glycerols). (3) We agree there are a large number of RSRZ outliers in this structure. We have reexamined many of these, and come to the same conclusion as for our original deposition: that most of these result from residues where there is clear enough density for placing the backbone into the map, but very poor density for the sidechain. Modeling different sidechain positions for the RSRZ outliers we reexamined did not appreciably improve the model fit or change their RSRZ outlier status. For example, Y154 in chains A and B remain some of the worst RSRZ outliers; while the density for these loop regions is generally not very good, it is clear that the backbone atoms of Y154 can be modeled into the structure, but there is very weak density for the sidechain. We tried modeling alternative and/or multiple sidechain conformations for Y154, but this did not significantly reduce the size of the RSRZ outlier. In short, while we could remove some of these residues or truncate the sidechain where the sidechain density is very poor to lower the total number of RSRZ outliers, we think the best model is one where we leave these residues built into the structure and accept the higher number of RSRZ outliers. Importantly, none of the significant RSRZ outliers are key residues of biological interest that would affect our interpretation of the structure and/or TRMT1-Mpro biochemistry.

      We have deposited a new, re-refined PDB model (9DW6) that incorporates these changes and supersedes our old PDB entry (8D35). We have updated the manuscript with the new PDB ID. We thank the reviewer for these suggestions that improved the overall structural model.

      Reviewer #3 (Recommendations For The Authors):

      The crystal structure entry in the PDB should mention the Cys-to-Ala substitution in Mpro.

      Thank you, we have made this change.

      Fig 2A and 2B: Can the authors highlight the Gln530-Ala531 peptide bond with a different color, please? It gets lost in panel B.

      Yes, we have made significant revisions to what is now Figure 3, and have highlighted the scissile peptide bond atoms in orange in each of these panels. Thank you for this suggestion, we agree it helps readers to orient themselves within the structure.

      "Importantly, the identified Mpro-targeted residues in human TRMT1 are conserved in the human population (i.e. no missense polymorphisms), showing that human TRMT1 can be recognized and cleaved by SARS-CoV-2 Mpro." Is TRMT1 prone to a high frequency of missense polymorphisms? If so, then this point makes sense. If not, it is not clear if this really informs on any biologically relevant mechanism.

      Given (i) that primate TRMT1 was previously identified under positive selection (i.e. rapid evolution) in an evolutionary screen (Cariou et al PNAS 2022) and (ii) that our study is mostly in vitro, we thought it was important to, first, make sure that this sequence of TRMT1 used in functional assays is not specific to a reference sequence that we tested in vitro, but is actually the sequence of TRMT1 in the human population. Further, we were also looking for whether some variations in the Mpro cleavage site of TRMT1 were possibly present in some humans (could these be linked with severe COVID or susceptibility, for example?).

      Overall, this statement aims to anchor our in vitro results to the TRMT1 sequences actually present in humans. However, we agree this does not inform “biologically relevant mechanism”. We therefore took out the “Importantly” that was probably misleading.

      "TRMT1 engages the Mpro active site in a distinct binding conformation."

      This is reported as an observation with little analysis. What is the structural basis of this conformational difference between the bound peptides? Why are the psi angles different? Is there a steric factor that is different between these peptide chains? This section can be substantially improved in detail from its current state.

      See our related answer to the next comment below.

      "Molecular dynamics simulations suggest kinetic discrimination happens during later steps of Mpro-catalyzed substrate cleavage." This section could have partly addressed my previous comment. It is not clear why there is such a large difference in the psi-angle. With access to several peptide-bound structures, the authors should derive and provide insights into the underlying fundamental principles. After all, this is a major point of discovery in their investigation.

      We agree that it is not entirely clear why TRMT1 seems to favor the P3’-in conformation when binding to Mpro. The only other known peptide-bound structure that adopts a similar P2’ psi angle is nsp6/7, but there are no clear sequence, steric, or interaction features that distinguish TRMT1 and nsp6/7 from the other 6 peptide-Mpro structures that favor a P3’-out conformation with a larger P2’ psi angle. In particular, the identities of the P1’ and P3’ residues, which would probably be expected to have the largest impact on this conformation, have no clear commonality in TRMT1 and nsp6/7 that gives hints about why these adopt this unique conformation. As we describe in the discussion section of the manuscript, and as has been observed in many other studies of Mpro, the protease active site is very plastic and able to accommodate a diverse range of sequences surrounding the invariant P1 Gln. Furthermore, while the crystal structures of TRMT1 and other nsp cleavage sequences bound to Mpro show a single peptide conformation in the active site, our MD simulations suggest that both P3’-in and P3’-out type conformations are present in solution for TRMT1, nsp4/5, and nsp8/9, just with different populations. It is very likely that there is a delicate energetic balance between these conformations that may depend subtly on multiple sequence features of the peptide and how they interact with each other and the flexible Mpro active site. As with our replies to questions from Reviewer 2 above about deciphering the underlying principles that connect peptide sequence to cleavage efficiency, we expect that dissecting the detailed links between sequence and binding conformation will be a long-term challenge for mechanistic and biocomputational groups focused on viral protease enzymes; systematic mutation of all residues in the cleavage sequence to multiple different amino acid identities, followed by structure determination experimentally and/or computationally, will likely be required to uncover the key sequence or steric properties and interactions that underlie and drive favored peptide binding conformations.

      To highlight these questions as significant and difficult future challenges toward understanding the fundamental principles underlying SARS-CoV Mpro proteolysis, we have added an additional paragraph (second from the last paragraph) in the discussion section.

      This work can be taken to a whole new level if the authors were to provide insights into how TRMT1 degradation by Mpro affects host cell biology and how the inhibition of this activity affects CoV biology.

      We certainly agree that showing the biological effects of TRMT1 degradation on host cell biology and/or viral biology could raise the impact of this work. But as discussed in more detail above in our response to the weakness listed in Reviewer 3’s public review, we see the main focus of this work as showing the biochemical and structural basis for TRMT1 recognition and cleavage by SARS-CoV-2 Mpro, and directly showing the immediate effects of this cleavage on the TRMT1-tRNA interaction and modification activity. As was the case with other viral proteases, like the HIV-1 protease, understanding the potentially diverse and nuanced downstream biological effects of host protein cleavage and its impacts on cellular phenotypes or viral fitness could take many years of careful cell biology and virology work. We hope that our paper provides the key first steps to viral biology labs taking on this difficult but important challenge for TRMT1!

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study has uncovered some important initial findings about cellular responses to aneuploidy through analysis of gene expression in a set of donated human embryos. While the study's findings are in general solid, some experiments lack statistical power due to small sample sizes. The authors should try to get much more insight with their data highlighting the novel findings.

      We thank the editor for considering our manuscript for publication at eLife, and for the helpful and thorough reviews of our work. Based on the suggestions of the reviewers, we have carried out additional experiments, expanded the sample size and reanalyzed the data. This has resulted in a thoroughly revised manuscript and much improved work, which we are convinced meets the requirements to be published as a version of record. Of note, the experiments for the revision required the support of 2 additional researchers from our lab, who are now coauthors.

      These are the main changes made to the initial manuscript:

      (1) The RNA-seq data (Figures 1+2) have now been FDR-corrected and reanalyzed. This has not affected the initial observations on the activation of p53 and apoptosis in aneuploid human embryos, or the finding that the transcriptomic changes are driven by gene dosage effects.

      (2) We have included the transcriptome analysis of reversine-treated embryos in the supplementary data.

      (3) For validation of novel findings such as the presence of DNA-damage and the expression of DRAM1 in aneuploid embryos, we now include the stainings of 30 human blastocysts (Figure 3o-t). We found absence of DNA-damage in aneuploid embryos and that DRAM1 is increased in the TE but not the ICM of aneuploid embryos. 

      (4) We re-analyzed the co-expression of CASP8/HSP70 in reversine-embryos as suggested by reviewer 1 and found that both proteins tend to be co-expressed. 

      (5) We have added a new analysis of NANOG expression (Figure 4a,b) of the embryos used in Figure 3o-t and have found retention of NANOG protein in both the TE and ICM.

      (6) We have added 6 euploid and 4 aneuploid embryos to Figure 4l-s, which support the conclusions on the absence of autophagy activation in the ICM and failure of PrE formation in aneuploid embryos.

      (7) We have significantly changed the layout of the figures, revised the supplementary tables, added source data files and rewritten the discussion.

      Regarding the sample size of the study, it is important to emphasize that human embryos are ethically sensitive material and that those with the specific genetic content we used in this study are rare, limiting our ability to expand the sample size. For the revision, we have added 40 human blastocysts to our initial 85 embryos. Compared to similar and high-quality studies using human embryos, our study shows a relatively large sample size (n=125): Victor et al. 2021: 30 human blastocysts for immunostainings1; Martin et al. 2023: 14 human blastocysts2; Martin et al. 2024: 64 human blastocysts3; Domingo-Muelas et al. 2023: 23 human blastocysts4.              

      Public Reviews:

      Reviewer #1 (Public Review):

      This study investigated an important question in human reproduction: why most fully aneuploid embryos are incompatible with normal fetal development. Specifically, the authors investigated the cellular responses to aneuploidy through analysis of gene expression in a set of donated human blastocysts. The samples included uniform aneuploid embryos of meiotic origin and mosaic aneuploid embryos from the SAC inhibitor reversine treatment. The authors relied mainly on low-input RNA sequencing and immunofluorescence staining. Pathway analysis with RNA-seq data of trophectoderm cells suggested activation of p53 and possibly apoptosis, and this cellular signature appeared to be stronger in TE cells with a higher degree of aneuploidy. Immunostaining also found some evidence of apoptosis, increased expression of HSP70 and autophagy in some aneuploid cells. With combinational OCT4 and GATA4 as lineage markers, it appeared that aneuploidy could alter the second lineage segregation and primitive endoderm formation in particular.

      Although this study is largely descriptive, it generated valuable RNA-seq data from a set of aneuploid TE cells with known karyotypes. Immunostaining results in general were consistent with findings in mouse embryos and human gastruloids.

      We thank the reviewer for the thorough evaluation of our manuscript. We have implemented most of the suggestions, which have further strengthened the original findings.

      While there is a scarcity of human embryo materials for research, the lack of single cell level data limits further extension of the presented data on the consequences of mosaic embryos.  

      We did not include single cell RNA-seq data of mosaic human embryos in our study because we focused on embryos diagnosed with complex meiotic abnormalities. Our hypothesis was that the cellular consequences of aneuploidy would be strongest and easiest to identify in this type of aneuploidy, and that this would allow us to provide a basis for the mechanisms of elimination of aneuploid cells in human embryos. In the manuscript (lines 596-626) we acknowledge the limitations of the extrapolation of our results to mosaic embryos.

      A major concern is that the gene list used for pathway analysis is not FDR controlled. It is also unclear how the many plots generated with the "supervised approach" were actually performed. 

      We agree with the concerns about the fact that our differential expression gene list was not FDR but p-value ranked. We followed the suggestion of the reviewer and revised the RNAseq analysis and focused primarily on pathway analysis. We have also added the comparison between aneuploid and reversine treated embryos to the supplementary data and expanded the analysis of high dosage and low dosage embryos. Importantly, the new analysis has not changed the original finding that aneuploid embryos show hallmarks of p53 activation and apoptosis, and that these effects are gene dosage dependent. The manuscript now includes two completely revised and new figures 1 and 2.

      Since we discarded the data generated from our previous approach, we do not use the term supervised approach anymore.
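      The difference between ranking genes by raw p-value and controlling the FDR can be illustrated with a minimal sketch of the Benjamini-Hochberg adjustment. The p-values below are hypothetical and are not taken from the study; the function name is our own and is not part of any package used by the authors.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes (illustrative only).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(raw)

n_raw = sum(p < 0.05 for p in raw)       # genes passing an uncorrected cut-off
n_fdr = sum(q < 0.05 for q in q_values)  # genes passing a 5% FDR cut-off
```

      In this toy example four genes pass an uncorrected p < 0.05 threshold but only two survive a 5% FDR threshold, which is the kind of shrinkage a reviewer would expect after correction.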

      The authors also appear to have ignored the possibility that high-dosage group could have a higher mitotic defect.

      This is indeed a possibility. In the discussion (lines 504-508) we have now incorporated the notion that the high dosage embryos could have higher mitotic defects, although our data cannot provide any evidence for this. Of note, the gene expression data shows that all aneuploid embryos (including low dosage and reversine embryos) equally show an enrichment for mitotic spindle pathway genes.

      Assuming a fully aneuploid embryo, why do only some cells display p53 and autophagy marker? 

      This is a very good question, on which we can only speculate, but the answer likely lies in the diversity across cells of the same embryo.

      Even in genetically homogenous tissues and cell cultures, individual cells can exhibit different levels of stress responses, such as p53 activation and apoptosis. This variation may be influenced by the local cellular environment, stochastic gene expression, or differences in cell cycle stages. Other studies on fully aneuploid human embryos could also not detect apoptotic responses in every cell1,3.

      For instance, p53 activation differs even between cells that have a similar number of DNA breaks, and this activation is influenced by both cell-intrinsic factors and previous exposure to DNA damage5.

      Cell cycle tightly regulates the response of cells to different stressors. For instance, cells in G1 or S-phase might be more sensitive to apoptosis signals6, while those in G2/M might escape this response temporarily7.  Autophagy is more induced in G1 and S phases, with reduced activity in G2 and M phases8.

      Individual cells may also have different levels of success in the activation of the compensatory pathways, including the unfolded protein response, autophagy, or changes in metabolism, resulting in some cells adapting better than others.

      The expression of p53 and the sensitivity to apoptosis could also be influenced by epigenetic differences between cells, which may alter their transcriptional response to aneuploidy. Even in a genetically identical population, cells can have different epigenetic landscapes, leading to heterogeneous gene expression patterns.

      The conclusion about proteotoxic stress was largely based on staining of HSP70. It appears from Figure 3 d,h that the same cells exhibited increased HSP70 and CASP8 staining. Since HSP70 is known to have anti-apoptotic effect, could the increased expression of Hsp70 be an anti-apoptotic response?

      Our conclusion about proteotoxic stress was not solely based on HSP70 expression. We also stained for LC3B and p62, which are markers for autophagy and when highly expressed indirectly point towards underlying proteotoxic stress in the cells. 

      We reanalyzed the imaging of the stainings in the reversine-treated embryos, and found that most cells were positive for both HSP70 and CASP8 staining, while a minority were single positive (shown now in Figure 3k,l).

      Indeed, HSP70 not only unfolds misfolded and aggregated proteins but also functions in cell survival and apoptosis9. For instance, HSP70 has been found to inhibit the cleavage of Bid by active CASP8 within the extrinsic apoptosis pathway10. It is thus possible that it temporarily plays this role, and we have acknowledged this in the discussion (lines 623-626). On the other hand, the evidence points to active apoptosis in the TE, with concomitant cell loss, so if HSP70 does have an anti-apoptotic effect, its impact is limited.

      Reviewer #2 (Public Review): 

      A high fraction of cells in early embryos carry aneuploid karyotypes, yet even chromosomally mosaic human blastocysts can implant and lead to healthy newborns with diploid karyotypes. Previous studies in other models have shown that genotoxic and proteotoxic stresses arising from aneuploidy lead to the activation of the p53 pathway and autophagy, which helps eliminate cells with aberrant karyotypes. These observations have been here evaluated and confirmed in human blastocysts. The study also demonstrates that the second lineage and formation of primitive endoderm are particularly impaired by aneuploidy.

      This is a timely and potentially important study. Aneuploidy is common in early embryos and has a negative impact on their development, but the reasons behind this are poorly understood. Furthermore, how mosaic aneuploid embryos with a fraction of euploidy greater than 50 % can undergo healthy development remains a mystery. Most of our current information comes from studies on murine embryos, making a substantial study on human embryos of great importance. However, there are only very few new findings or insights provided by this study. Some of the previous findings were reproduced, but it is difficult to say whether this is a real finding, or whether it is a consequence of a low sample number. The authors could get much more insight with their data.

      We thank the reviewer for the thorough evaluation of our manuscript and the valuable suggestions made in the private recommendations. We have expanded the sample size and have carried out additional experiments that have significantly improved the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Instead of using a cut-off to generate a list, the authors could just rank the entire detected transcriptome for GSEA. This method better fits the authors' intention of being "primarily focused on pathway analysis." The cut-off value "-log10(p-value)<0.05" is not correct. As we can see from the PCA plot, one would not expect many cut-off-defined DEGs at all. The most obvious transcriptome change is dosage dependent, as the authors clearly showed with InferCNV.

      We thank the reviewer for this suggestion and agree that this was an important concern of the study. We have entirely revised the RNA-seq analysis based on the proposed approach (Figure 1 and 2, Supplementary Figure 1). Also, we have included the analysis of aneuploid versus reversine treated embryos, which has allowed us to determine the differences between naturally occurring chromosomal abnormalities and those that are induced using reversine (Supplementary Figure 1). 

      We first performed differential gene expression analysis using DESeq2 with a cut-off value for significantly differentially expressed genes of | log2FC | > 1 and an FDR < 0.05. Based on the PCAs and the low number of differentially expressed genes for all comparisons, besides high dosage versus euploid embryos, we focussed primarily on pathway analysis.
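      The stated cut-off can be written as a simple filter over a results table. The following is a minimal sketch, assuming a table of (gene, log2 fold change, adjusted p-value) rows; the gene names and values are hypothetical, and the handling of missing adjusted p-values is our own assumption rather than a description of the authors' pipeline.

```python
def significant_genes(results, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Keep genes with |log2FC| > lfc_cutoff and adjusted p-value < fdr_cutoff."""
    kept = []
    for gene, log2fc, padj in results:
        # Assumption: rows without an adjusted p-value (None) are skipped.
        if padj is None:
            continue
        if abs(log2fc) > lfc_cutoff and padj < fdr_cutoff:
            kept.append(gene)
    return kept


# Hypothetical rows: (gene, log2 fold change, FDR-adjusted p-value).
example = [
    ("GENE_A", 1.8, 0.010),   # passes both thresholds
    ("GENE_B", -2.3, 0.003),  # passes both thresholds (downregulated)
    ("GENE_C", 0.4, 0.001),   # significant but effect size too small
    ("GENE_D", 3.1, 0.200),   # large effect but not significant
    ("GENE_E", -1.2, None),   # no adjusted p-value available
]
hits = significant_genes(example)
```

      Both conditions must hold at once, so a gene with a large fold change but a weak adjusted p-value (or the reverse) is excluded.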

      For that, based on the reviewer’s suggestion, we generated a ranked gene list using the GSEA software (version 4.2.2, MSigDatabase) based on the normalized count matrix of the whole transcriptome that was detected after differential gene expression. The ranked gene list was then subjected to the run GSEA function, and we searched the Hallmark and C2 library for significantly enriched pathways. Thus, we could generate normalized enrichment scores, allowing us to predict whether a pathway is activated or suppressed. The details of the new analysis are described in the Material and Methods section (lines 220-232). Significance was determined using a cut-off value of 25% FDR. This cut-off is proposed in the user guide of the GSEA (https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideTEXT.htm) especially for incoherent gene expression datasets, as suggested by our PCAs, which allows for hypothesis driven validation of the dataset. 
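      To make the idea of ranking the whole transcriptome concrete, the sketch below orders genes by a signal-to-noise score, a metric commonly used for two-group comparisons in GSEA. The response does not state which ranking metric was used, so this choice, the zero-variance fallback, and all names and values are assumptions for illustration only.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """Difference of group means divided by the sum of group standard deviations."""
    denom = stdev(group_a) + stdev(group_b)
    # Assumption: genes with no variance in either group get a neutral score.
    if denom == 0:
        return 0.0
    return (mean(group_a) - mean(group_b)) / denom


def rank_genes(expression, a_idx, b_idx):
    """Return gene names sorted from most up in group A to most up in group B."""
    scores = {}
    for gene, values in expression.items():
        a = [values[i] for i in a_idx]
        b = [values[i] for i in b_idx]
        scores[gene] = signal_to_noise(a, b)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical normalized counts; columns 0-2 aneuploid, columns 3-5 euploid.
expression = {
    "GENE_A": [10, 12, 11, 4, 5, 3],
    "GENE_B": [5, 6, 7, 5, 7, 6],
    "GENE_C": [2, 3, 1, 8, 9, 10],
}
ranked = rank_genes(expression, a_idx=[0, 1, 2], b_idx=[3, 4, 5])
```

      Because every detected gene receives a score, no significance cut-off is needed before enrichment testing; gene sets are then evaluated by where their members fall along the ranked list.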

      Indeed, we found that the most important transcriptome changes are aneuploidy dosage dependent. High dosage embryos show signatures of cellular unfitness, while low-dosage embryos still seem to activate survival pathways (lines 349-364). 

      This new analysis did not only increase robustness of our results but also introduced novel findings, which pave the road for future studies. 

      The validity of our findings is supported by recent work by the Zernicka-Goetz lab. We found that hypoxia is upregulated in low dosage human aneuploid TE cells. In line with our data, the Zernicka-Goetz lab found in a mouse model of low degree chromosomal abnormalities that hypoxia inducible factor 1A (HIF1A) promotes survival of extraembryonic aneuploid cells by reducing levels of DNA damage11.

      (2) It would be very helpful if the authors could perform co-staining of multiple stress markers to better understand the origins of apoptosis and autophagy cells. In Fig 3d and 3h, it seems that the same reversine treated embryo was stained with CASP8, LC3B and HSP70. Is there any correlation between CASP8 and HSP70 at the single cell level? Is there any correlation between p53 and LC3B as the authors suggested, possibly through DRAM1?

      We decided to use the complex aneuploid embryos that were left at our facility for the validation of novel findings such as upregulation of DRAM1 and presence and consequences of DNA damage in aneuploid embryos. As suggested by the editor and the other reviewer we also added embryos to existing datasets to increase the sample size where necessary. Therefore, we did not include other co-stainings of multiple stress markers.

      Following the reviewer’s suggestion, we reanalyzed the existing stainings and evaluated whether there is a correlation between CASP8 and HSP70 at the single cell level. The reversine-treated embryos were the only embryo group that was co-stained for both CASP8 and HSP70. We quantified the percentage of cells that were single or double positive for CASP8 and HSP70 and found a higher proportion of double-positive cells than of single-positive cells. Therefore, we concluded that there is indeed a correlation between both proteins at the single cell level in reversine-treated embryos and included this data in Figure 3k,l.
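      The quantification described here amounts to tallying per-cell marker calls. A minimal sketch follows, assuming each cell has already been scored as positive or negative for each marker; the cell calls are hypothetical, and the category labels are our own.

```python
from collections import Counter


def classify_cells(cells):
    """Count cells by CASP8/HSP70 status from per-cell boolean calls."""
    counts = Counter()
    for casp8_pos, hsp70_pos in cells:
        if casp8_pos and hsp70_pos:
            counts["double_positive"] += 1
        elif casp8_pos:
            counts["CASP8_only"] += 1
        elif hsp70_pos:
            counts["HSP70_only"] += 1
        else:
            counts["double_negative"] += 1
    return counts


def percentages(counts):
    """Convert category counts to percentages of all scored cells."""
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}


# Hypothetical calls for ten cells: (CASP8 positive?, HSP70 positive?).
cells = (
    [(True, True)] * 6
    + [(True, False)] * 1
    + [(False, True)] * 1
    + [(False, False)] * 2
)
counts = classify_cells(cells)
pct = percentages(counts)
```

      A higher share of double-positive than single-positive cells is consistent with co-expression, although a formal association test on the resulting 2x2 table would be needed to quantify it.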

      During the experiments for the revision, we found that the DRAM1 protein was upregulated in the cytoplasm of TE cells but not in the ICM of aneuploid embryos (Figure 3s,t), which validates the findings of the gene expression analysis. This data also supports our findings that autophagy is active in aneuploid TE cells while not significantly increased in aneuploid pluripotent ICM cells. Unfortunately, we could not stain LC3B and DRAM1 in the same embryo because the antibodies were raised in the same species.

      (3) While "the possibilities for functional studies and lineage tracing experiments in human embryos are very limited," the authors can leverage in silico modelling (ie, PMID: 28700688) to address the roles of aneuploidy in blastocyst formation and development. Is there any self-regulating mechanism underlying the ratios of PrE and EPI? Is apoptosis of ICM cells a natural process during PrE formation (PMID: 18725515)?

      It is a very interesting proposal to use in silico modelling to address the roles of aneuploidy during human blastocyst formation and lineage segregation. Although this type of analysis would yield very important insights, we are not able to address this point of the revision because our group lacks expertise in this type of analysis, and it would require setting up a collaboration with experts in this field. In the discussion we proposed that future studies can leverage our data to carry out in silico modelling and cited the proposed article (lines 608-610).

      On the second part of the question, we would like to discuss the differences between mouse and human embryo studies. Parts of this were included in the discussion on the possible mechanisms of PrE elimination. 

      Is there a self-regulating mechanism for EPI/PrE formation?

      To extrapolate the knowledge on mouse development to human it is important to bear in mind that (1) human embryos are outbred, as compared to inbred super-fertile laboratory mouse strains and (2) the embryos are donated to research by subfertile couples, which could compromise the EPI/PrE ratios. For instance, Chousal and colleagues found that poor quality blastocysts have a reduced number of PrE cells12. In human embryos the proportion of EPI and PrE cells is indeed highly variable (20%-60%) and while the number of EPI cells does not increase between dpf6 and 7, the number of PrE cells does grow13. We found a similarly variable number of EPI and PrE cells in our study on the lineage segregation mechanisms in good quality human embryos, with an absolute number of EPI of 12.1±6.5 cells and 8.4±3.44 PrE cells14.

      By comparison, in late mouse blastocysts, the EPI/PrE cell ratio is consistent (2/3)15. Overall, self-regulating mechanisms in the human embryo have not yet been studied in detail because functional testing is hardly possible.

      Is apoptosis a natural process during PrE formation?

      Yes, in mice apoptosis is a natural process during PrE formation to eliminate misallocated cells of the inner cell mass through cell competition16,17. Yet, in the human embryo there is no evidence of such mechanisms. Although apoptosis is present even in human blastocysts of good quality18, the origin of such apoptotic cells has still not been shown, although suboptimal culture conditions are known to increase cellular fragmentation19. Conversely, our data and that of others1,2 support the notion that the pluripotent inner cell mass in human embryos is more resistant to apoptosis than the trophectoderm, even in karyotypically aberrant cells.

      (4) The "count tables generated from the raw data files" could not be found in the source data files.

      This escaped our attention; we have now added the count tables to the source data files. Our apologies.

      (5) Citations on aneuploidy literature were not done in a fully scholarly manner. It appears that authors selectively cite previous papers that are in support of their hypothesis but left out those with alternative conclusions.

      We apologize if we missed any literature that contradicts our findings, it is not intentional. We would be grateful if the reviewer could provide such references. 

      In the manuscript we describe the alignment and differences of key findings with several studies (listed below) and the limitations of our study are extensively described in lines 596-626.

      Our findings align with other work on these aspects:

      - RNA-sequencing data2,20–26

      - Gene dosage effects drive the transcriptome of the aneuploid human embryo27,28

      - Aneuploid cells are cleared by sustained proteotoxic stress followed by p53 activation, autophagy and eventually apoptosis29–37.

      - p53 is active in constitutional aneuploid cells38

      - The ICM is less sensitive to apoptosis1,2

      Our findings differ with other work on these points:

      - p53 activation is independent from DNA-damage39

      - p53 is active in constitutional aneuploid cells40,41

      - Apoptosis is only present in the aneuploid TE of aneuploid cells in the embryo29,30,42    

      Reviewer #2 (Recommendations For The Authors):

      Comments:

      (1) The main problem is that there is no substantial novelty. The authors look at previously identified factors affected by chromosome gains and losses, but at none of the new ones from their analysis. Anything that could be potentially novel is not carefully analyzed (e.g. the difference between reversine-treated and aneuploid samples, or new potential candidates) or explained. This is really a pity.

      In the revision, we have further elaborated on the DNA damage aspect by staining for DNA double-stranded breaks and have validated DRAM1 as an activated downstream effector of p53. We have also added the analyses of the gene-expression of the reversine-treated embryos.

      (2) Some of the general statements on aneuploidy are confusing and often borderline generalized. E.g. introduction line 106: "If this (proteotoxic stress) remains unresolved by the activation of autophagy..." I am not aware of any publication suggesting that autophagy resolves proteotoxic stress in aneuploid cells. Citations that replication stress causes DNA damage in aneuploid cells are wrong. This link was first shown by Passerini et al. in 2016. etc.

      We have clarified these statements in the introduction and added the proposed citations on replication stress that causes DNA damage in aneuploid cells (lines 95-108).

      (3) In the figures the authors show a representative image of aneuploid and diploid embryos. Given the aneuploid embryos have widely different karyotypes, it would be important to clarify which of the embryos has been actually shown. Similarly, in the heat maps it is not clear which line is which embryo. This would be very useful.

      We added the karyotypes of the aneuploid embryos to the images in figure 3 and 4. Since the heatmaps were removed from the figures we added the karyotypes to the PCAs in all figures.

      (4) The authors constantly state that aneuploid embryos accumulate more DNA damage, which is supported by some of their observations, e.g. the DNA damage response is upregulated. It would be great if they would validate these statements by testing some markers for DNA damage.

      We agree with the reviewer that this was an important point and addressing it has revealed that our initial assumption was incorrect and has provided new interesting findings. From the revised RNA-seq analysis, we found only one pathway (DNA damage response TP53) to be activated in all aneuploid embryos (Fig.1e). The ATM pathway was also activated specifically in high-dosage embryos. Following this, we set out to test if DNA damage was indeed increased in aneuploid embryos by staining for DNA double strand breaks with gH2AX.

      First, we investigated the gH2AX expression in 5dpf embryos in which we induced DNA damage with Bleomycin. We compared 6 untreated versus 6 Bleomycin treated human embryos (Fig. 3m) and found that gH2AX foci were rarely present in the untreated embryos and that all cells of the treated embryos showed a pan-nuclear gH2AX staining.

      Second, we compared the presence of gH2AX foci in the TE (NANOG negative cells), ICM (NANOG positive cells) and the whole embryo of 7 euploid versus 11 aneuploid embryos. Interestingly, we found no differences in the number of gH2AX foci or pan-nuclear gH2AX nuclei between euploid and aneuploid embryos (Fig 3o). When dividing our aneuploid embryos into high and low dosage embryos we also could not detect any differences. Our data now suggest that complex aneuploid human embryonic cells of meiotic origin do not contain more DNA double strand breaks, precluding DNA damage as the source of p53 activation. Last, in our previous experiment we found that phosphorylated S15p53 is increased in aneuploid embryos, supporting an active p53 pathway as suggested by our transcriptomic data. Since we could not find DNA damage in aneuploid human embryos we speculate that p53 is phosphorylated on Serine15 through metabolic stress as suggested by Jones and colleagues43. We also argue that proteotoxic stress might induce p53 expression as proposed by Singla and colleagues29.

      (5) The source of embryos is only partially described in a figure legend. This should be expanded and described in the Materials and Methods section. The embryos are named, but this is nowhere explained. One can only assume that T is for trisomy and M is for monosomy.

      We have divided the embryos into different experimental series (Experiment 1-4). This is now described in the Material and methods section (lines 157-175). Also, we have added the experiment number of each embryo to the supplementary tables and to the source data. The abbreviation for T = Trisomy and M= Monosomy was initially introduced in the last paragraph of the figure legend of figure 4.  We now added it to every panel.

      (6) Recent works from non-embryonic cells suggest that the cellular response to monosomy is different than the response to trisomy. Did the authors try to test this possible difference? For example, one could compare embryos M174/21, M2/19 and M17 with T2/10, T10/22 and T1/15/18/22.

      We thank the reviewer for pointing this out. Our RNA-seq dataset consisted of three embryos that contained trisomies only and four embryos that contained monosomies only. When reanalyzing our data, we found different transcriptomic responses between monosomic only and trisomic only cells. Compared to euploid cells, monosomy only cells activate mainly the p53 pathway and protein secretion, while translation, DNA replication, cell cycle G1/S, DNA synthesis and processing of DNA double strand breaks were inhibited. Trisomy only cells show activated oxidative phosphorylation, ribosome and translation, while protein secretion, apoptosis and cell cycle are inhibited. These differences were confirmed by testing transcriptomic differences between trisomic versus monosomic cells. Our results are similar to studies on human embryos20,26 and other monosomic and trisomic cell lines44,45. However, the interpretation of these results is very limited by the small sample size and the comparison of monosomies and trisomies of different chromosomes. Thus, we decided to keep this analysis out of the manuscript.

      Author response image 1.

      On the protein level, next to the small sample size, our results were also limited by the fact that not all embryos were stained with the same combinations of antibodies. LC3B was the only protein for which all embryos were immunostained. Thus, other protein data could not be re-analyzed due to even lower sample sizes. 

      Below we have separated the LC3B puncta per cell counts into euploid, trisomies only, monosomies only and all other aneuploid embryos. We performed a Kruskal-Wallis test with multiple comparisons. It is worth noting that the difference between euploid and monosomies only (and those that contained both) was statistically significant, while the differences between euploid vs trisomies only and between trisomies only vs monosomies only were not statistically significant. These differences contradict the studies on monosomic cell lines, which found that proteotoxic stress and autophagy are absent from monosomic cell lines and specific to trisomic cell lines. Here we also decided to keep this specific protein expression analysis out of the manuscript due to the above-mentioned limitations.
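      For readers unfamiliar with the test, the Kruskal-Wallis H statistic compares rank sums across groups. The sketch below computes H for four groups of hypothetical puncta-per-cell values; it omits the tie correction and the post hoc multiple comparisons mentioned above, and none of the numbers come from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction) for a list of groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical LC3B puncta per cell for four embryo groups (illustrative only).
euploid = [12, 15, 14]
trisomy_only = [20, 22, 21]
monosomy_only = [30, 35, 33]
other_aneuploid = [40, 44, 42]
h_stat = kruskal_wallis_h([euploid, trisomy_only, monosomy_only, other_aneuploid])
```

      With four groups, H is compared against a chi-square distribution with three degrees of freedom (critical value 7.815 at the 5% level), although this approximation is rough for groups as small as those in the example.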

      Author response image 2.

      (7) Line 329: "a trisomy 12 meiotic chromosomal abnormality in one reversine-treated embryo." What does it mean? Why meiotic chromosomal abnormality when the reversine treatment was administered 4 days after fertilization? In the discussion, the authors state "presumed meiotic," but this should be discussed and described more clearly.

      Since reversine induces mitotic abnormalities of different types leading to chromosomally mosaic embryos, we could not identify these induced abnormalities using inferCNV on the RNAseq of TE biopsies of said embryos. However, we were not aware of the karyotype of the embryos that were used for these experiments, as they were thawed after they had been cryopreserved at day 3 of development and had not been subjected to genetic testing. This makes it possible that some of the embryos we used for the reversine experiments in fact carried endogenously acquired meiotic and mitotic chromosomal abnormalities. Since inferCNV can only detect aneuploidies that are homogeneously present in the majority of the cells of the sequenced biopsy, we only picked up this trisomy 12. It is possible that this was not a meiotic abnormality but a mitotic one originating at the first cleavage and present in a high percentage of cells in the blastocyst. At any rate, the exact origin of this aneuploidy has no further implications for the results of the study. We clarified this in the manuscript (lines 310-315).

      (8) Line 422: "The gene expression profiles suggest that the accumulation of autophagic proteins in aneuploid embryos is caused by increased autophagic flux due to differential expression of the p53 target gene DNA Damage Regulated Autophagy Modulator-1 (DRAM1), rather than by inhibition of autophagy (Supplementary Table 2)." This is highly speculative, as the authors do not have any evidence to support this statement.

      To validate this finding, we have now stained 7 euploid and 11 aneuploid embryos with a DRAM1 antibody. We found DRAM1 protein to be significantly enriched in the cytoplasm of TE cells, but not in the ICM, of aneuploid embryos compared with euploid embryos (Fig. 3s,t). These data are consistent with the finding that autophagy is increased in the TE and not the ICM of aneuploid human embryos (Fig. 4l-o). Potential implications of DRAM1 expression have been mentioned in the discussion.

      (9) The figure legends are confusing. They are mixed up with the methods and some key information is missing.

      We revised all figure legends accordingly and removed the experimental set-up figures from the manuscript to reduce any confusion. The methods section was revised and expanded.

      (10) In Figure 1, what is the difference between "activated" and "deregulated"?

      Since we analyzed our RNA-seq dataset with the method proposed by reviewer 1, we have now generated normalized enrichment scores. The terms "activated" and "deregulated" are thus no longer present.

      (11) The p62 images are not really clear. There might be more puncta (not obvious, though), but the staining intensity seems lower in the representative images.  

      We do not agree with the reviewer that there might be more p62 puncta (purple); however, we agree that this was not clearly visible from the pictures. Below we show an example of the counting mask (in green) of the aneuploid embryo from figure 3i, where one can clearly appreciate that all the puncta are captured by the counting mask. In this case, the software counted 1704 puncta. To further clarify, we have now added a zoom of a randomly chosen ROI of the p62 staining to figure 3i.

      Author response image 3.

      (12) The authors claim that there are differences between lineages in response to aneuploidy, such as autophagy not being activated in the OCT4+ lineage, etc. However, the differences are very small and based on a small number of embryos. It is difficult to draw far-reaching conclusions based on a small number of experiments (Fig. 4n-r). The authors also claim in the Abstract that they demonstrated "clear differences with previous findings in the mouse", which are however difficult to identify in the text.

      We agree with the reviewer that our conclusions on figures 4l-o were based on a small number of embryos. We have increased the sample size as much as possible. This is challenging due to the constraints on accessing human embryos, and especially the limited number of embryos with meiotic complex aneuploidy. We have performed immunostainings for LC3B, OCT4 and GATA4 on six additional euploid and four additional aneuploid human embryos. This did not change our overall finding that aneuploid embryos upregulate autophagy in the TE rather than the ICM (Figure 4l-o). After the inclusion of additional embryos, we removed from the manuscript our speculation that autophagy is present in ICM cells that have already differentiated towards EPI/PrE.

      We have rephrased the abstract to state that we highlight a few differences with previous findings in the mouse. Here we focused especially on the different transcriptomic response of reversine-treated embryos, on the observation that aneuploid mouse embryos do not seem to suffer from lineage segregation errors, and on the fact that the ICM of aneuploid human embryos lacks apoptosis while aneuploid mouse embryos show elimination from the EPI. Likewise, we highlighted the similar stress responses and that we could give novel insights into p53-mediated autophagy and apoptosis activation through DRAM1 in aneuploid TE cells but not the ICM.

      (13) The text needs thorough editing - long sentences, typos, and grammar errors are frequent. Punctuation is largely missing.

      We have revised the text.

      References

      (1) Victor, A. R. et al. One hundred mosaic embryos transferred prospectively in a single clinic: exploring when and why they result in healthy pregnancies. Fertil Steril 111, 280–293 (2019).

      (2) Martin, A. et al. Mosaic results after preimplantation genetic testing for aneuploidy may be accompanied by changes in global gene expression. Front Mol Biosci 10, 264 (2023).

      (3) Martín, Á. et al. Trophectoderm cells of human mosaic embryos display increased apoptotic levels and impaired differentiation capacity: a molecular clue regarding their reproductive fate? Human Reproduction 39, 709–723 (2024).

      (4) Domingo-Muelas, A. et al. Human embryo live imaging reveals nuclear DNA shedding during blastocyst expansion and biopsy. Cell 186, 3166-3181.e18 (2023).

      (5) Loewer, A., Karanam, K., Mock, C. & Lahav, G. The p53 response in single cells is linearly correlated to the number of DNA breaks without a distinct threshold. BMC Biol 11, 1–13 (2013).

      (6) Kim, H., Watanabe, S., Kitamatsu, M., Watanabe, K. & Ohtsuki, T. Cell cycle dependence of apoptosis photo-triggered using peptide-photosensitizer conjugate. Scientific Reports 10, 1–8 (2020).

      (7) Pollak, N. et al. Cell cycle progression and transmitotic apoptosis resistance promote escape from extrinsic apoptosis. J Cell Sci 134, (2021).

      (8) Neufeld, T. P. Autophagy and cell growth--the yin and yang of nutrient responses. J Cell Sci 125, 2359–2368 (2012).

      (9) Lanneau, D. et al. Heat shock proteins: essential proteins for apoptosis regulation. J Cell Mol Med 12, 743 (2008).

      (10) Gabai, V. L., Mabuchi, K., Mosser, D. D. & Sherman, M. Y. Hsp72 and Stress Kinase c-Jun N-Terminal Kinase Regulate the Bid-Dependent Pathway in Tumor Necrosis Factor-Induced Apoptosis. Mol Cell Biol 22, 3415 (2002).

      (11) Sanchez-Vasquez, E., Bronner, M. E. & Zernicka-Goetz, M. HIF1A contributes to the survival of aneuploid and mosaic pre-implantation embryos. bioRxiv 2023.09.04.556218 (2023) doi:10.1101/2023.09.04.556218.

      (12) Chousal, J. N. et al. Molecular profiling of human blastocysts reveals primitive endoderm defects among embryos of decreased implantation potential. Cell Rep 43, (2024).

      (13) Corujo-Simon, E., Radley, A. H. & Nichols, J. Evidence implicating sequential commitment of the founder lineages in the human blastocyst by order of hypoblast gene activation. Development (Cambridge) 150, (2023).

      (14) Regin, M. et al. Lineage segregation in human pre-implantation embryos is specified by YAP1 and TEAD1. Human Reproduction 38, 1484–1498 (2023).

      (15) Saiz, N., Williams, K. M., Seshan, V. E. & Hadjantonakis, A. K. Asynchronous fate decisions by single cells collectively ensure consistent lineage composition in the mouse blastocyst. Nature Communications 7, 1–14 (2016).

      (16) Plusa, B., Piliszek, A., Frankenberg, S., Artus, J. & Hadjantonakis, A. K. Distinct sequential cell behaviours direct primitive endoderm formation in the mouse blastocyst. Development 135, 3081–3091 (2008).

      (17) Hashimoto, M. & Sasaki, H. Epiblast Formation by TEAD-YAP-Dependent Expression of Pluripotency Factors and Competitive Elimination of Unspecified Cells. Dev Cell 50, 139-154.e5 (2019).

      (18) Hardy, K. Apoptosis in the human embryo. Rev Reprod 4, 125–134 (1999).

      (19) Ramos-Ibeas, P. et al. Embryo responses to stress induced by assisted reproductive technologies. Mol Reprod Dev 86, 1292–1306 (2019).

      (20) Licciardi, F. et al. Human blastocysts of normal and abnormal karyotypes display distinct transcriptome profiles. Sci Rep 8, 1–9 (2018).

      (21) Maxwell, S. M. et al. Investigation of Global Gene Expression of Human Blastocysts Diagnosed as Mosaic using Next-generation Sequencing. Reproductive Sciences 1–11 (2022) doi:10.1007/s43032-022-00899-x.

      (22) Groff, A. F. et al. RNA-seq as a tool for evaluating human embryo competence. Genome Res 29, 1705–1718 (2019).

      (23) Starostik, M. R., Sosin, O. A. & McCoy, R. C. Single-cell analysis of human embryos reveals diverse patterns of aneuploidy and mosaicism. Genome Res 30, 814–826 (2020).

      (24) Vera-Rodriguez, M., Chavez, S. L., Rubio, C., Pera, R. A. R. & Simon, C. Prediction model for aneuploidy in early human embryo development revealed by single-cell analysis. Nat Commun 6, 7601 (2015).

      (25) Sanchez-Ribas, I. et al. Transcriptomic behavior of genes associated with chromosome 21 aneuploidies in early embryo development. Fertil Steril 111, 991-1001.e2 (2019).

      (26) Fuchs Weizman, N. et al. Towards Improving Embryo Prioritization: Parallel Next Generation Sequencing of DNA and RNA from a Single Trophectoderm Biopsy. Sci Rep 9, 1–11 (2019).

      (27) Fernandez Gallardo, E. et al. A multi-omics genome-and-transcriptome single-cell atlas of human preimplantation embryogenesis reveals the cellular and molecular impact of chromosome instability. bioRxiv 2023.03.08.530586 (2023) doi:10.1101/2023.03.08.530586.

      (28) Dürrbaum, M. & Storchová, Z. Effects of aneuploidy on gene expression: implications for cancer. FEBS J 283, 791–802 (2016).

      (29) Singla, S., Iwamoto-Stohl, L. K., Zhu, M. & Zernicka-Goetz, M. Autophagy-mediated apoptosis eliminates aneuploid cells in a mouse model of chromosome mosaicism. Nat Commun 11, 1–15 (2020).

      (30) Bolton, H. et al. Mouse model of chromosome mosaicism reveals lineage-specific depletion of aneuploid cells and normal developmental potential. Nat Commun 7, 1–12 (2016).

      (31) Ohashi, A. et al. Aneuploidy generates proteotoxic stress and DNA damage concurrently with p53-mediated post-mitotic apoptosis in SAC-impaired cells. Nat Commun 6, 1–16 (2015).

      (32) Santaguida, S. & Amon, A. Short- and long-term effects of chromosome missegregation and aneuploidy. Nature Reviews Molecular Cell Biology 16, 473–485 (2015) doi:10.1038/nrm4025.

      (33) Santaguida, S., Vasile, E., White, E. & Amon, A. Aneuploidy-induced cellular stresses limit autophagic degradation. Genes Dev 29, 2010–2021 (2015).

      (34) Chunduri, N. K. & Storchová, Z. The diverse consequences of aneuploidy. Nature Cell Biology 21, 54–62 (2019).

      (35) Dürrbaum, M. et al. Unique features of the transcriptional response to model aneuploidy in human cells. BMC Genomics 15, 139 (2014).

      (36) Pan, J.-A., Ullman, E., Dou, Z. & Zong, W.-X. Inhibition of protein degradation induces apoptosis through a microtubule-associated protein 1 light chain 3-mediated activation of caspase-8 at intracellular membranes. Mol Cell Biol 31, 3158–70 (2011).

      (37) Stingele, S. et al. Global analysis of genome, transcriptome and proteome reveals the response to aneuploidy in human cells. Mol Syst Biol 8, 608 (2012).

      (38) Tang, Y.-C., Williams, B. R., Siegel, J. J. & Amon, A. Identification of aneuploidy-selective antiproliferation compounds. Cell 144, 499–512 (2011).

      (39) Janssen, A., Van Der Burg, M., Szuhai, K., Kops, G. J. P. L. & Medema, R. H. Chromosome segregation errors as a cause of DNA damage and structural chromosome aberrations. Science 333, 1895–1898 (2011).

      (40) Li, M. et al. The ATM-p53 pathway suppresses aneuploidy-induced tumorigenesis. Proc Natl Acad Sci U S A 107, 14188–14193 (2010).

      (41) Thompson, S. L. & Compton, D. A. Proliferation of aneuploid human cells is limited by a p53-dependent mechanism. J Cell Biol 188, 369–381 (2010).

      (42) Yang, M. et al. Depletion of aneuploid cells in human embryos and gastruloids. Nat Cell Biol 23, 314–321 (2021).

      (43) Jones, R. G. et al. AMP-activated protein kinase induces a p53-dependent metabolic checkpoint. Mol Cell 18, 283–293 (2005).

      (44) Chunduri, N. K., Barthel, K. & Storchova, Z. Consequences of Chromosome Loss: Why Do Cells Need Each Chromosome Twice? Cells 11, 1530 (2022).

      (45) Krivega, M., Stiefel, C. M. & Storchova, Z. Consequences of chromosome gain: A new view on trisomy syndromes. American Journal of Human Genetics 109, 2126–2140 (2022) doi:10.1016/j.ajhg.2022.10.014.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This work presents an important method for depleting ribosomal RNA from bacterial single-cell RNA sequencing libraries, enabling the study of cellular heterogeneity within microbial biofilms. The approach convincingly identifies a small subpopulation of cells at the biofilm's base with upregulated PdeI expression, offering invaluable insights into the biology of bacterial biofilms and the formation of persister cells. Further integrated analysis of gene interactions within these datasets could deepen our understanding of biofilm dynamics and resilience.

      Thank you for your valuable feedback and for recognizing the importance of our method for depleting ribosomal RNA from bacterial single-cell RNA sequencing libraries. We are pleased that our approach has convincingly identified a small subpopulation of cells at the base of the biofilm with upregulated PdeI expression, providing significant insights into the biology of bacterial biofilms and the formation of persister cells.

      We acknowledge your suggestion for a more comprehensive analysis of multiple genes and their interactions. While we conducted a broad analysis across the transcriptome, our decision to focus on the heterogeneously expressed gene PdeI was primarily informed by its critical role in biofilm biology. In addition to PdeI, we investigated other marker genes and noted that lptE and sstT exhibited potential associations with persister cells. However, our interaction analysis revealed that LptE and SstT did not demonstrate significant relationships with c-di-GMP and PdeI based on current knowledge. This insight led us to concentrate on PdeI, given its direct relevance to biofilm formation and its close connection to the c-di-GMP signaling pathway.

      We fully agree that other marker genes may also have important regulatory roles in different aspects of biofilm dynamics. Thus, we plan to explore the expression patterns and potential functions of these genes in our future research. Specifically, we intend to conduct more extensive gene network analyses to uncover the complex regulatory mechanisms involved in biofilm formation and resilience.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Yan and colleagues introduce a modification to the previously published PETRI-seq bacterial single cell protocol to include a ribosomal depletion step based on a DNA probe set that selectively hybridizes with ribosome-derived (rRNA) cDNA fragments. They show that their modification of the PETRI-seq protocol increases the fraction of informative non-rRNA reads from ~4-10% to 54-92%. The authors apply their protocol to investigating heterogeneity in a biofilm model of E. coli, and convincingly show how their technology can detect minority subpopulations within a complex community.

      Strengths:

      The method the authors propose is a straightforward and inexpensive modification of an established split-pool single cell RNA-seq protocol that greatly increases its utility, and should be of interest to a wide community working in the field of bacterial single cell RNA-seq.

      We sincerely thank the reviewer for their thoughtful and positive evaluation of our work. We appreciate the recognition of our modification to the PETRI-seq bacterial single-cell RNA sequencing protocol by incorporating a ribosomal depletion step. The significant increase in the fraction of informative non-rRNA reads, as noted in the reviewer’s summary, underscores the effectiveness of our method in enhancing the utility of the PETRI-seq approach. We are also encouraged by the reviewer's acknowledgment of our ability to detect minority subpopulations within complex biofilm communities. Our team is committed to further validating and optimizing this method, and we believe that RiboD-PETRI will contribute meaningfully to the field of bacterial single-cell transcriptomics. We hope this innovative approach will facilitate new discoveries in microbial ecology and biofilm research.

      Reviewer #2 (Public review):

      Summary:

      This work introduces a new method of depleting the ribosomal reads from the single-cell RNA sequencing library prepared with one of the prokaryotic scRNA-seq techniques, PETRI-seq. The advance is very useful since it allows broader access to the technology by lowering the cost of sequencing. It also allows more transcript recovery with fewer sequencing reads. The authors demonstrate the utility and performance of the method for three different model species and find a subpopulation of cells in the E. coli biofilm that express a protein, PdeI, which causes elevated c-di-GMP levels. These cells were shown to be in a state that promotes persister formation in response to ampicillin treatment.

      Strengths:

      The introduced rRNA depletion method is highly efficient, with the depletion for E. coli resulting in over 90% of reads containing mRNA. The method is ready to use with existing PETRI-seq libraries, which is a large advantage, given that no other rRNA depletion methods were published for split-pool bacterial scRNA-seq methods. Therefore, the value of the method for the field is high. There is also evidence that a small number of cells at the bottom of a static biofilm express PdeI, which is causing the elevated c-di-GMP levels that are associated with persister formation. This finding highlights the potentially complex role of PdeI in regulation of c-di-GMP levels and persister formation in microbial biofilms.

      Weaknesses:

      Given many current methods that also introduce different techniques for ribosomal RNA depletion in bacterial single-cell RNA sequencing, it is unclear what is the place and role of RiboD-PETRI. The efficiency of rRNA depletion varies greatly between species for the majority of the available methods, so it is not easy to select the best fitting technique for a specific application.

      Thank you for your insightful comments regarding the place and role of RiboD-PETRI in the landscape of ribosomal RNA depletion techniques for bacterial single-cell RNA sequencing. We appreciate the opportunity to address your concerns and clarify the significance of our method.

      We acknowledge that the field of rRNA depletion in bacterial single-cell RNA sequencing is diverse, with many methods offering different approaches. We also recognize the challenge of selecting the best technique for a specific application, given the variability in rRNA depletion efficiency across species for many available methods. In light of these considerations, we believe RiboD-PETRI occupies a distinct and valuable niche in this landscape for the following reasons: 1) Low-input compatibility: Our method is specifically tailored for the low-input requirements of single-cell RNA sequencing, maintaining high efficiency even with limited starting material. This makes RiboD-PETRI particularly suitable for single-cell studies where sample quantity is often a limiting factor. 2) Equipment-free protocol: One of the unique advantages of RiboD-PETRI is that it can be conducted in any lab without the need for specialized equipment. This accessibility ensures that a wide range of researchers can implement our method, regardless of their laboratory setup. 3) Broad species coverage: Through comprehensive probe design targeting highly conserved regions of bacterial rRNA, RiboD-PETRI offers a robust solution for samples involving multiple bacterial species or complex microbial communities. This approach aims to provide consistent performance across diverse taxa, addressing the variability issue you mentioned. 4) Versatility and compatibility: RiboD-PETRI is designed to be compatible with various downstream single-cell RNA sequencing protocols, enhancing its utility in different experimental setups and research contexts.

      In conclusion, RiboD-PETRI's unique combination of low-input compatibility, equipment-free protocol, broad species coverage, and versatility positions it as a robust and accessible option in the landscape of rRNA depletion methods for bacterial single-cell RNA sequencing. We are committed to further validating and improving our method to ensure its valuable contribution to the field and to provide researchers with a reliable tool for their diverse experimental needs.

      Despite transcriptome-wide coverage, the authors focused on the role of a single heterogeneously expressed gene, PdeI. A more integrated analysis of multiple genes and/or interactions between them using these data could reveal more insights into the biofilm biology.

      Thank you for your valuable feedback. We understand your suggestion for a more comprehensive analysis of multiple genes and their interactions. While we indeed conducted a broad analysis across the transcriptome, our decision to focus on the heterogeneously expressed gene PdeI was primarily based on its crucial role in biofilm biology. Beyond PdeI, we also conducted overexpression experiments on several other marker genes and examined their phenotypes. Notably, the lptE and sstT genes showed potential associations with persister cells. We performed an interaction analysis, which revealed that LptE and SstT did not show significant relationships with c-di-GMP and PdeI based on current knowledge. This finding led us to concentrate our attention on PdeI. Given PdeI's direct relevance to biofilm formation and its close connection to the c-di-GMP signaling pathway, we believed that an in-depth study of PdeI was most likely to reveal key biological mechanisms.

      We fully agree with your point that other marker genes may play regulatory roles in different aspects. The expression patterns and potential functions of these genes will be an important direction in our future research. In our future work, we plan to conduct more extensive gene network analyses to uncover the complex regulatory mechanisms of biofilm formation.

      Author response image 1.

      The proportion of persister cells in strains overexpressing selected marker genes and in the empty-vector control group. Following induction of expression with 0.002% arabinose for 2 hours, a persister counting assay was conducted on the strains using 150 μg/ml ampicillin.

      The authors should also present the UMIs capture metrics for RiboD-PETRI method for all cells passing initial quality filter (>=15 UMIs/cell) both in the text and in the figures. Selection of the top few cells with higher UMI count may introduce biological biases in the analysis (the top 5% of cells could represent a distinct subpopulation with very high gene expression due to a biological process). For single-cell RNA sequencing, showing the statistics for a 'top' group of cells creates confusion and inflates the perceived resolution, especially when used to compare to other methods (e.g. the parent method PETRI-seq itself).

      Thank you for your valuable feedback regarding the presentation of UMI capture metrics for the RiboD-PETRI method. We appreciate your concern about potential biological biases and the importance of comprehensive data representation in single-cell RNA sequencing analysis. We have now included the UMI capture metrics for all cells passing the initial quality filter (≥15 UMIs/cell) for the RiboD-PETRI method. This information has been added to both the main text and the relevant figures, providing a more complete picture of our method's performance across the entire range of captured cells. These revisions strengthen our manuscript and provide readers with a more complete understanding of the RiboD-PETRI method in the context of single-cell RNA sequencing.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The reviewers have responded thoughtfully and comprehensively to all of my comments. I believe the details of the protocol are now much easier to understand, and the text and methods have been significantly clarified. I have no further comments.

      Reviewer #2 (Recommendations for the authors):

      The authors edited the manuscript thoroughly in response to the comments, including both performing new experiments and showing more data and information. Most of the major points raised between both reviewers were addressed. The authors explained the seeming contradiction between c-di-GMP levels and PdeI expression. Despite these improvements, a few issues remain:

      - Despite now depositing the data and analysis files to GEO, the access is embargoed and the reviewer token was not provided to evaluate the shared data and accessory files.

      Please note that although the data and analysis files have been deposited to GEO, access is currently embargoed. To evaluate the shared data and accessory files, you will need a reviewer token, which appears to have not been provided.

      To gain access, please follow these steps:

      Visit the GEO accession page at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE260458

      In the designated field, enter the reviewer token: ehipgqiohhcvjev

      - Despite now discussing performance metrics for RiboD-PETRI method for all cells passing initial quality filter (>=15 UMIs/cell) in the text, the authors continued to also include the statistics for top 1000 cells, 5,000 cells and so on. Critically, Figure 2A-B is still showing the UMI and gene distributions per cell only for these select groups of cells. The intent to focus on these metrics is not quite clear, as selection of the top few cells with higher UMI count may introduce biological biases in the analysis (what if the top 5% of cells are unusual because they represent a distinct subpopulation with very high gene expression due to a biological process). I understand the desire to demonstrate the performance of the method by highlighting a few select 'best' cells, however, for single-cell RNA sequencing showing the statistics for a 'top' group of cells is not appropriate and creates confusion, especially when used to compare to other methods (e.g. the parent method PETRI-seq itself).

      We appreciate your insightful feedback regarding our presentation of the RiboD-PETRI method's performance metrics. We acknowledge the concerns you've raised and agree that our current approach requires refinement. We have revised our analysis to prominently feature metrics for all cells that pass the initial quality filter (≥15 UMIs/cell) (Fig. 2A, Fig. 3A, Supplementary Fig. 1A, B and Supplementary Fig. 2A, G). This approach provides a more representative view of the method's performance across the entire dataset, avoiding potential biases introduced by focusing solely on top-performing cells.

      We recognize that selecting only the top cells based on UMI counts can indeed introduce biological biases, as these cells may represent distinct subpopulations with unique biological processes rather than typical cellular states. To address this, we have clearly stated the potential for bias when highlighting select 'best' cells. We also provided context for why these high-performing cells are shown, explaining that they demonstrate the upper limits of the method's capabilities (line 139). In addition, when comparing RiboD-PETRI to other methods, including the parent PETRI-seq, we ensured that comparisons are made using consistent criteria across all methods.

      By implementing these changes, we aim to provide a more accurate, unbiased, and comprehensive representation of the RiboD-PETRI method's performance while maintaining scientific rigor and transparency. We appreciate your critical feedback, as it helps us improve the quality and reliability of our research presentation.
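
      To make the distinction between all-cell and top-N statistics concrete, the following Python sketch summarizes a list of per-cell UMI counts in both ways. The function name `summarize_umis`, its parameters, and the toy counts are illustrative assumptions; only the ≥15 UMIs/cell threshold is taken from the text.

```python
from statistics import median


def summarize_umis(umis_per_cell, min_umis=15, top_n=None):
    """Summarize UMI capture for cells passing the quality filter.

    If top_n is given, only the top_n cells with the highest UMI counts
    (among those passing the filter) contribute to the median.
    """
    if not umis_per_cell:
        raise ValueError("umis_per_cell must not be empty")
    passing = sorted((u for u in umis_per_cell if u >= min_umis), reverse=True)
    selected = passing if top_n is None else passing[:top_n]
    return {
        "cells": len(selected),
        # Recovery rate always refers to every cell passing the filter.
        "recovery_rate": len(passing) / len(umis_per_cell),
        "median_umis": median(selected) if selected else None,
    }


# Hypothetical per-cell UMI counts (illustration only, not study data).
toy_counts = [5, 10, 15, 20, 40, 100, 300]
all_cells = summarize_umis(toy_counts)           # every cell with >= 15 UMIs
top_two = summarize_umis(toy_counts, top_n=2)    # only the two richest cells
```

      In this toy example the median over all passing cells is 40, whereas the median over the top two cells is 200, which shows how a top-N summary can overstate typical per-cell capture.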

      - Line 151 " The findings reveal that our sequencing saturation is 100% (Fig. S1B, C)" - I suggest the authors revisit this calculation as this parameter is typically very challenging to get above 95-96%. The sequencing saturation should be calculated from the statistics of alignment themselves, i.e. the parameter calculated by Cell Ranger as described here https://kb.10xgenomics.com/hc/en-us/articles/115003646912-How-is-sequencing-saturation-calculated :

      "The web_summary.html output from cellranger count includes a metric called "Sequencing Saturation". This metric quantifies the fraction of reads originating from an already-observed UMI. More specifically, this is the fraction of confidently mapped, valid cell-barcode, valid UMI reads that are non-unique (match an existing cell-barcode, UMI, gene combination).

      The formula for calculating this metric is as follows:

      Sequencing Saturation = 1 - (n_deduped_reads / n_reads)

      where

      n_deduped_reads = Number of unique (valid cell-barcode, valid UMI, gene) combinations among confidently mapped reads.

      n_reads = Total number of confidently mapped, valid cell-barcode, valid UMI reads.

      Note that the numerator of the fraction is n_deduped_reads, not the non-unique reads that are mentioned in the definition. n_deduped_reads is a degree of uniqueness, not a degree of duplication/saturation. Therefore we take the complement of (n_deduped_reads / n_reads) to measure saturation."

      We appreciate your insightful comment regarding our sequencing saturation calculation. The sequencing saturation algorithm we initially employed was based on the methodology used in the BacDrop study (PMCID: PMC10014032, https://pmc.ncbi.nlm.nih.gov/articles/PMC10014032/).

      We acknowledge the importance of using standardized and widely accepted methods for calculating sequencing saturation. As per your suggestion, we have recalculated our sequencing saturation using the method described by 10x Genomics. Given the differences between RiboD-PETRI and 10x Genomics datasets, we have adapted the calculation as follows:

      · n_deduped_reads: We used the number of UMIs as a measure of unique reads.

      · n_reads: We used the total number of confidently mapped reads.

      After applying this adapted calculation method, we found that our sequencing saturation ranges from 92.16% to 93.51%. This range aligns more closely with typical expectations for sequencing saturation in single-cell RNA sequencing experiments, suggesting that we have captured a substantial portion of the transcript diversity in our samples. We have also updated Figure S1 to reflect these recalculated sequencing saturation values. We will also provide a detailed description of our calculation method in the methods section to ensure transparency and reproducibility. It is important to note that this saturation calculation method was originally designed for 10x Genomics data. While we have adapted it for our study, we acknowledge that its applicability to our specific experimental setup may be limited.
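
      The adapted calculation can be sketched in Python as follows. The function names are hypothetical, and the example counts are invented for illustration; they are not the values behind the 92.16% to 93.51% range reported above.

```python
def sequencing_saturation(n_deduped_reads, n_reads):
    """Sequencing saturation = 1 - (n_deduped_reads / n_reads).

    n_deduped_reads: number of UMIs (unique molecules) detected.
    n_reads: total number of confidently mapped reads.
    """
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    if not 0 <= n_deduped_reads <= n_reads:
        raise ValueError("n_deduped_reads must lie between 0 and n_reads")
    return 1 - n_deduped_reads / n_reads


def saturation_from_reads(reads):
    """Compute saturation from an iterable of (cell_barcode, umi, gene) tuples."""
    reads = list(reads)
    # Each distinct (cell barcode, UMI, gene) combination counts once.
    return sequencing_saturation(len(set(reads)), len(reads))


# Hypothetical totals: 1,000,000 mapped reads collapsing to 70,000 UMIs.
example = sequencing_saturation(70_000, 1_000_000)

# Hypothetical read-level input: four reads, two of them duplicates.
toy_reads = [
    ("cellA", "AAAC", "geneX"),
    ("cellA", "AAAC", "geneX"),
    ("cellA", "GGTT", "geneX"),
    ("cellB", "AAAC", "geneX"),
    ("cellB", "AAAC", "geneX"),
]
toy_saturation = saturation_from_reads(toy_reads)
```

      A value close to 1 means that most additional reads re-observe molecules that were already counted, so further sequencing would add few new UMIs.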

      We thank you for bringing this important point to our attention. This recalculation not only improves the accuracy of our reported results but also aligns our methodology more closely with established standards in the field. We believe these revisions strengthen the overall quality and reliability of our study.

      - Further, this calculated saturation should be taken into account when comparing the performance of the method in terms of retrieving diverse transcripts from cells. I.e., if the RiboD-PETRI dataset was subsampled to the same saturation as the original PETRI-seq dataset was obtained with, would the median UMIs/cell for all cells above filter be comparable? In other words, does rRNA depletion just decrease the cost to sequence to saturation, or does it provide UMI capture benefits at a comparable saturation?

      We appreciate your insightful question regarding the comparison of method performance in terms of transcript retrieval diversity and the impact of saturation. To address your concerns, we compared the RiboD-PETRI and original PETRI-seq datasets at equivalent saturation levels, in addition to our original analysis at equivalent sequencing depth.

      At equivalent sequencing depth, RiboD-PETRI detects significantly more Unique Molecular Identifiers (UMIs) per cell than PETRI-seq alone (Fig. 1C). The method recovered approximately 20175 cells (92.6% recovery rate) with ≥ 15 UMIs per cell and a median UMI count of 42 per cell, significantly higher than PETRI-seq's recovery rate of 17.9% with a median UMI count of 20 per cell (Figure S1A, B), indicating a marked increase in the number of mRNAs detected per cell.

      When we subsampled the RiboD-PETRI dataset to match the saturation level of the original PETRI-seq dataset (i.e., equalizing the n_deduped_reads/n_reads ratio), we found that the median UMIs/cell for all cells above the filter threshold was higher in the RiboD-PETRI dataset compared to the original PETRI-seq (as shown in Author response image 2). This observation can be primarily attributed to the introduction of the rRNA depletion step in the RiboD-PETRI method. Our analysis suggests that rRNA depletion not only reduces the cost of sequencing to saturation but also provides additional benefits in UMI capture efficiency at comparable saturation levels. The rRNA depletion step effectively reduces the proportion of rRNA-derived reads in the sequencing output. Consequently, at equivalent saturation levels, this leads to a relative increase in the number of n_deduped_reads corresponding to mRNA transcripts. This shift in read composition enhances the capture of informative UMIs, resulting in improved transcript diversity and detection.
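      The saturation-matched comparison described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the pipeline used in the study: each read is represented as a hypothetical (cell, umi, gene) tuple, reads are shuffled once, and the prefix whose saturation is closest to the target is kept. All function names and the synthetic data are assumptions made for the example.

```python
import random
from collections import defaultdict
from statistics import median


def saturation(reads):
    """1 - unique molecules / total reads, for a list of (cell, umi, gene)."""
    return 1.0 - len(set(reads)) / len(reads)


def subsample_to_saturation(reads, target, seed=0):
    """Shuffle once, then keep the prefix whose saturation is closest to target."""
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)
    seen = set()
    best_n, best_gap = 0, float("inf")
    for n, read in enumerate(shuffled, start=1):
        seen.add(read)
        gap = abs((1.0 - len(seen) / n) - target)
        if gap < best_gap:
            best_n, best_gap = n, gap
    return shuffled[:best_n]


def median_umis_per_cell(reads, min_umis=15):
    """Return (cells passing the filter, median UMIs per passing cell)."""
    molecules = defaultdict(set)
    for cell, umi, gene in reads:
        molecules[cell].add((umi, gene))
    counts = [len(m) for m in molecules.values() if len(m) >= min_umis]
    return len(counts), (median(counts) if counts else 0)


# Synthetic data: 50 cells x 40 molecules, each molecule read 5 times,
# so the full dataset has a saturation of 1 - 1/5 = 0.8.
reads = [
    (f"cell{c}", f"u{m}", f"g{m % 10}")
    for c in range(50)
    for m in range(40)
    for _ in range(5)
]
sub = subsample_to_saturation(reads, target=0.5)
print(len(sub), round(saturation(sub), 3), median_umis_per_cell(sub))
```

      Running the same subsampling on both datasets with a shared target would then allow the filtered cell counts and median UMIs per cell to be compared at matched saturation.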

      In conclusion, our findings indicate that the rRNA depletion step in RiboD-PETRI offers dual advantages: it decreases the cost to sequence to saturation and provides enhanced UMI capture benefits at comparable saturation levels, ultimately leading to more efficient and informative single-cell transcriptome profiling.

      Author response image 2.

      Number of cells exceeding the screening criterion (≥15 UMIs) and median number of UMIs per cell in RiboD-PETRI and PETRI-seq data from exponential-phase E. coli (3 h), at almost the same sequencing saturation (64% and 67%).

      - smRandom-seq and BaSSSh-seq need to also be discussed since these newer methods are also demonstrating rRNA depletion techniques. (https://doi.org/10.1038/s41467-023-40137-9 and https://doi.org/10.1101/2024.06.28.601229)

      Thank you for your valuable feedback. We appreciate the opportunity to discuss our method, RiboD-PETRI, in the context of other recent advances in bacterial RNA sequencing techniques, particularly smRandom-seq and BaSSSh-seq.

      RiboD-PETRI employs a Ribosomal RNA-derived cDNA Depletion (RiboD) protocol. This method uses probe primers that span all regions of the bacterial rRNA sequence, with the 3'-end complementary to rRNA-derived cDNA and the 5'-end complementary to a biotin-labeled universal primer. After hybridization, Streptavidin magnetic beads are used to eliminate the hybridized rRNA-derived cDNA, leaving mRNA-derived cDNA in the supernatant. smRandom-seq utilizes a CRISPR-based rRNA depletion technique. This method is designed for high-throughput single-microbe RNA sequencing and has been shown to reduce the rRNA proportion from 83% to 32%, effectively increasing the mRNA proportion four times (from 16% to 63%). While specific details about BaSSSh-seq's rRNA depletion technique are not provided in the available information, it is described as employing a rational probe design for efficient rRNA depletion. This technique aims to minimize the loss of mRNA during the depletion process, ensuring a more accurate representation of the transcriptome.

      RiboD-PETRI demonstrates significant enhancement in rRNA-derived cDNA depletion across both gram-negative and gram-positive bacterial species. It increases the mRNA ratio from 8.2% to 81% for E. coli in exponential phase, from 10% to 92% for S. aureus in stationary phase, and from 3.9% to 54% for C. crescentus in exponential phase. smRandom-seq shows high species specificity (99%), a minor doublet rate (1.6%), and a reduced rRNA percentage (32%). These metrics indicate its efficiency in single-microbe RNA sequencing. While specific performance metrics for BaSSSh-seq are not provided in the available information, its rational probe design approach suggests a focus on maintaining mRNA integrity during the depletion process.

      RiboD-PETRI is described as a cost-effective ($0.0049 per cell), equipment-free, and high-throughput solution for bacterial scRNA-seq. This makes it an attractive option for researchers with budget constraints. While specific cost information is not provided, the efficiency of smRandom-seq is noted to be affected by the overwhelming quantity of rRNAs (>80% of mapped reads). The CRISPR-based depletion technique likely adds to the complexity and cost of the method. Cost and accessibility information for BaSSSh-seq is not provided in the available data, making a direct comparison difficult.

      All three methods represent significant advancements in bacterial RNA sequencing, each offering unique approaches to the challenge of rRNA depletion. RiboD-PETRI stands out for its cost-effectiveness and demonstrated success in complex systems like biofilms. Its ability to significantly increase mRNA ratios across different bacterial species and growth phases is particularly noteworthy. smRandom-seq's CRISPR-based approach offers high specificity and efficiency, which could be advantageous in certain research contexts, particularly where single-microbe resolution is crucial. However, the complexity of the CRISPR system might impact its accessibility and cost-effectiveness. BaSSSh-seq's focus on minimizing mRNA loss during depletion could be beneficial for studies requiring highly accurate transcriptome representations, although more detailed performance data would be needed for a comprehensive comparison. The choice between these methods would depend on specific research needs. RiboD-PETRI's cost-effectiveness and proven application in biofilm studies make it particularly suitable for complex bacterial community analyses. smRandom-seq might be preferred for studies requiring high-throughput single-cell resolution. BaSSSh-seq could be the method of choice when preserving the integrity of the mRNA profile is paramount.

      In conclusion, while all three methods offer valuable solutions for rRNA depletion in bacterial RNA sequencing, RiboD-PETRI's combination of efficiency, cost-effectiveness, and demonstrated application in complex biological systems positions it as a highly competitive option in the field of bacterial transcriptomics.

      We have revised the discussion in the manuscript in line with the above analysis (lines 116-119).

      - Ctrl and Delta-Delta abbreviations are used in main text but not defined there (lines 107-110).

      Thank you for your valuable feedback. We have now defined the abbreviations "Ctrl" and "Delta-Delta" in the main text for clarity.

      - The utility of Figs 2E and 3E is questionable - the same information can be conveyed in text.

      Thank you for your thoughtful observation regarding Figures 2E and 3E. We appreciate your feedback and would like to address the concerns you've raised.

      While we acknowledge that some of the information in these figures could be conveyed textually, we believe that their visual representation offers several advantages. Figures 2E and 3E provide a comprehensive visual overview of the pathway enrichment analysis for marker genes, which may be more easily digestible than a textual description. This analysis was conducted in response to another reviewer's request, demonstrating our commitment to addressing diverse perspectives in our research.

      These figures allow for a systematic interpretation of gene expression data, revealing complex interactions between genes and their involvement in biological pathways that might be less apparent in a text-only format. Visual representations can make complex data more accessible to readers with different learning styles or those who prefer graphical summaries. Additionally, including such figures is consistent with standard practices in our field, facilitating comparison with other studies. We believe that the pathway enrichment analysis results presented in these figures provide valuable insights that merit inclusion as visual elements. However, we are open to discussing alternative ways to present this information if you have specific suggestions for improvement.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This work investigated the role of CXXC-finger protein 1 (CXXC1) in regulatory T cells. CXXC1-bound genomic regions largely overlap with Foxp3-bound regions and regions with H3K4me3 histone modifications in Treg cells. CXXC1 and Foxp3 interact with each other, as shown by co-immunoprecipitation. Mice with Treg-specific CXXC1 knockout (KO) succumb to lymphoproliferative diseases between 3 and 4 weeks of age, similar to Foxp3 KO mice. Although the immune suppression function of CXXC1 KO Treg is comparable to WT Treg in an in vitro assay, these KO Tregs failed to suppress autoimmune diseases such as EAE and colitis in Treg transfer models in vivo. This is partly due to the diminished survival of the KO Tregs after transfer. CXXC1 KO Tregs do not have an altered DNA methylation pattern; instead, they display weakened H3K4me3 modifications within the broad H3K4me3 domains, which contain a set of Treg signature genes. These results suggest that CXXC1 and Foxp3 collaborate to regulate Treg homeostasis and function by promoting Treg signature gene expression through maintaining H3K4me3 modification.

      Strengths:

      Epigenetic regulation of Treg cells has been a constantly evolving area of research. The current study revealed CXXC1 as a previously unidentified epigenetic regulator of Tregs. The strong phenotype of the knockout mouse supports the critical role CXXC1 plays in Treg cells. Mechanistically, the link between CXXC1 and the maintenance of broad H3K4me3 domains is also a novel finding.

      Weaknesses:

      (1) It is not clear why the authors chose to compare H3K4me3 and H3K27me3 enriched genomic regions. There are other histone modifications associated with transcription activation or repression. Please provide justification.

      Thank you for highlighting this important point. We prioritized H3K4me3 and H3K27me3 because they are well-established markers of transcriptional activation and repression, respectively. These modifications provide a robust framework for investigating the dynamic interplay of chromatin states in Treg cells, particularly in regulating the balance between activation and suppression of key genes. While histone acetylation, such as H3K27ac, is linked to enhancer activity and transcriptional elongation, our focus was on promoter-level regulation, where H3K4me3 and H3K27me3 are most relevant. Although other histone modifications could provide additional insights, we chose to focus on these two to maintain clarity and feasibility in our analysis. We are happy to further elaborate on this rationale in the manuscript if necessary.

      (2) It is not clear what separates Clusters 1 and 3 in Figure 1C. It seems they share the same features.

      We apologize for not describing these clusters clearly. Clusters 1 and 3 both belong to the H3K4me3-only group, with H3K4me3 enrichment and gene expression levels being higher in Cluster 1. We initially set out to divide the promoters into four categories: H3K4me3 only, H3K27me3 only, H3K4me3-H3K27me3 co-occupied, and None. In the actual classification, however, we could not resolve a distinct H3K4me3-H3K27me3 co-occupied group. Instead, we obtained two H3K4me3-only categories, with Cluster 1 showing higher H3K4me3 enrichment and higher gene expression than Cluster 3.

      (3) The claim, "These observations support the hypothesis that FOXP3 primarily functions as an activator by promoting H3K4me3 deposition in Treg cells." (line 344), seems to be a bit of an overstatement. Foxp3 certainly can promote transcription in ways other than promoting H3K4me3 deposition, and it also can repress gene transcription without affecting H3K27me3 deposition. Therefore, it is not justified to claim that promoting H3K4me3 deposition is Foxp3's primary function.

      We appreciate the reviewer’s thoughtful observation regarding our claim about FOXP3’s role in promoting H3K4me3 deposition. We acknowledge that FOXP3 is a multifunctional transcription factor with diverse mechanisms of action, including transcriptional activation independent of H3K4me3 deposition and transcriptional repression that does not necessarily involve H3K27me3 deposition.

      Our intention was not to imply that promoting H3K4me3 deposition is the exclusive or predominant function of FOXP3 but rather to highlight that this mechanism contributes significantly to its role in regulating Treg cell function. We agree that our wording may have overstated this point, and we will revise the text to provide a more nuanced interpretation. Specifically, we will clarify that our observations suggest FOXP3 can facilitate transcriptional activation, in part, by promoting H3K4me3 deposition, but this does not preclude its other regulatory mechanisms.

      (4) For the in vitro suppression assay in Figure S4C, and the Treg transfer EAE and colitis experiments in Figure 4, the Tregs should be isolated from Cxxc1 fl/fl x Foxp3 cre/wt female heterozygous mice instead of Cxxc1 fl/fl x Foxp3 cre/cre (or cre/Y) mice. Tregs from the homozygous KO mice are already activated by the lymphoproliferative environment and could have vastly different gene expression patterns and homeostatic features compared to resting Tregs. Therefore, it's not a fair comparison between these activated KO Tregs and resting WT Tregs.

      Thank you for this insightful comment and for pointing out the potential confounding effects associated with using Treg cells from homozygous Foxp3Cre/Cre (or Cre/Y) Cxxc1fl/fl mice. We agree that using Treg cells from Foxp3Cre/+ Cxxc1fl/fl (referred to as “het-KO”) and their littermate Foxp3Cre/+ Cxxc1fl/+ (referred to as “het-WT”) female mice would provide a more balanced comparison, as these Treg cells are less likely to be influenced by the activated lymphoproliferative environment present in homozygous KO mice.

      To address this concern, we will perform additional experiments using Treg cells isolated from Foxp3Cre/+ Cxxc1fl/fl (“het-KO”) and their littermate Foxp3Cre/+ Cxxc1fl/+ (“het-WT”) female mice. We will update the manuscript with these new data to provide a more accurate assessment of the impact of CXXC1 deficiency on Treg cell function.

      (5) The manuscript didn't provide a potential mechanism for how CXXC1 strengthens broad H3K4me3-modified genomic regions. The authors should perform Foxp3 ChIP-seq or CUT&Tag with WT and Cxxc1 cKO Tregs to determine whether CXXC1 deletion changes Foxp3's binding pattern in Treg cells.

      Thank you for your insightful comments and valuable suggestions. We greatly appreciate your recommendation to explore the potential mechanism by which CXXC1 enhances broad H3K4me3-modified genomic regions.

      In response, we plan to conduct CUT&Tag experiments for Foxp3 in both WT and Cxxc1 cKO Treg cells.

      Reviewer #2 (Public review):

      FOXP3 has been known to form diverse complexes with different transcription factors and enzymes responsible for epigenetic modifications, but how extracellular signals timely regulate FOXP3 complex dynamics remains to be fully understood. Histone H3K4 tri-methylation (H3K4me3) and CXXC finger protein 1 (CXXC1), which is required to regulate H3K4me3, also remain to be fully investigated in Treg cells. Here, Meng et al. performed a comprehensive analysis of H3K4me3 CUT&Tag assay on Treg cells and a comparison of the dataset with the FOXP3 ChIP-seq dataset revealed that FOXP3 could facilitate the regulation of target genes by promoting H3K4me3 deposition.

      Moreover, CXXC1-FOXP3 interaction is required for this regulation. They found that specific knockdown of Cxxc1 in Treg leads to spontaneous severe multi-organ inflammation in mice and that Cxxc1-deficient Treg exhibits enhanced activation and impaired suppression activity. In addition, they have also found that CXXC1 shares several binding sites with FOXP3 especially on Treg signature gene loci, which are necessary for maintaining homeostasis and identity of Treg cells.

      The findings of the current study are pretty intriguing, and it would be great if the authors could fully address the following comments to support these interesting findings.

      Major points:

      (1) There is insufficient evidence in the first part of the Results to support the conclusion that "FOXP3 functions as an activator by promoting H3K4Me3 deposition in Treg cells". The authors should compare the results for H3K4Me3 in FOXP3-negative conventional T cells to demonstrate that at these promoter loci, FOXP3 promotes H3K4Me3 deposition.

      We appreciate the reviewer’s critical observation regarding our claim about FOXP3’s role in promoting H3K4me3 deposition. We acknowledge that FOXP3 is a multifunctional transcription factor with diverse mechanisms of action, including transcriptional activation independent of H3K4me3 deposition and transcriptional repression that does not necessarily involve H3K27me3 deposition.

      Our intention was not to imply that promoting H3K4me3 deposition is the exclusive or predominant function of FOXP3 but rather to highlight that this mechanism contributes significantly to its role in regulating Treg cell function. We agree that our wording may have overstated this point, and we will revise the text to provide a more nuanced interpretation. Specifically, we will clarify that our observations suggest FOXP3 can facilitate transcriptional activation, in part, by promoting H3K4me3 deposition, but this does not preclude its other regulatory mechanisms.

      We will compare H3K4me3 levels at the promoter loci of interest between FOXP3-negative conventional T cells and FOXP3-positive regulatory T cells. This comparison will help elucidate whether FOXP3 directly promotes H3K4me3 deposition at these loci.

      (2) In Figure 3 F&G, the activation status and IFNγ production should be analyzed in Treg cells and Tconv cells separately rather than in total CD4+ T cells. Moreover, are there changes in autoantibodies and IgG and IgE levels in the serum of cKO mice?

      We appreciate the reviewer’s constructive feedback on the analyses presented in Figures 3F and 3G and the additional suggestion to investigate autoantibodies and serum immunoglobulin levels.

      Regarding Figures 3F and 3G, we agree that separating Treg cells and Tconv cells for analysis of activation status and IFN-γ production would provide a more precise understanding of the cellular dynamics in Cxxc1 cKO mice.

      To address this, we will reanalyze the data to examine Treg and Tconv cells independently and include these results in the revised manuscript.

      As for the changes in autoantibodies and serum IgG and IgE levels, we acknowledge that these parameters are important indicators of systemic immune dysregulation.

      We will now measure serum autoantibodies and immunoglobulin levels in Cxxc1 cKO mice and WT controls.

      (3) Why did Cxxc1-deficient Treg cells not show impaired suppression compared with WT Treg cells in the in vitro suppression assay, despite the reduced expression of Treg cell suppression-associated markers at the transcriptional level demonstrated in both scRNA-seq and bulk RNA-seq?

      Thank you for your thoughtful question. We appreciate your interest in understanding the apparent discrepancy between the reduced expression of Treg-associated suppression markers at the transcriptional level and the lack of impaired suppression observed in the in vitro suppression assay.

      There are several potential explanations for this observation:

      (1) Functional Redundancy: Treg cell suppression is a complex, multi-faceted process involving various effector mechanisms such as cytokine production (e.g., IL-10, TGF-β), cell-cell contact, and metabolic regulation. Thus, even though the transcriptional signature of suppression-associated genes is altered, compensatory mechanisms may still allow Cxxc1-deficient Treg cells to retain functional suppression capacity under these specific in vitro conditions.

      (2) In Vitro Assay Limitations: The in vitro suppression assay is a simplified model of Treg function that may not capture all the complexities of Treg-mediated suppression in vivo. While we observed altered gene expression in Cxxc1-deficient Treg cells, this might not directly translate to a functional defect under the specific conditions of the assay. In vivo, additional factors such as cytokine milieu, cell-cell interactions, and tissue-specific environments may be required for full suppression, which could be missing in the in vitro assay.

      (4) Is there a disease in which Cxxc1 is expressed at low levels or absent in Treg cells? Is the same immunodeficiency phenotype present in patients as in mice?

      Thank you for your insightful question regarding the role of CXXC1 in Treg cells and its potential link to human disease. To our knowledge, no specific human disease has been identified where CXXC1 is expressed at low levels or absent specifically in Treg cells. There is currently no direct evidence of an immunodeficiency phenotype in human patients that parallels the one observed in Cxxc1-deficient mice.

      Reviewer #3 (Public review):

      In the report entitled "CXXC-finger protein 1 associates with FOXP3 to stabilize homeostasis and suppressive functions of regulatory T cells", the authors demonstrated that Cxxc1-deletion in Treg cells leads to the development of severe inflammatory disease with impaired suppressive function. Mechanistically, CXXC1 interacts with Foxp3 and regulates the expression of key Treg signature genes by modulating H3K4me3 deposition. Their findings are interesting and significant. However, there are several concerns regarding their analysis and conclusions.

      Major concerns:

      (1) Despite cKO mice showing an increase in Treg cells in the lymph nodes and Cxxc1-deficient Treg cells having normal suppressive function, the majority of cKO mice died within a month. What causes cKO mice to die from severe inflammation?

      Considering the results of Figures 4 and 5, a decrease in Treg cell population due to their reduced proliferative capacity may be one of the causes. It would be informative to analyze the population of tissue Treg cells.

      We thank the reviewer for this insightful comment and acknowledge the importance of understanding the causes of severe inflammation and early mortality in cKO mice. Based on our data and previous studies, we propose the following explanations:

      (1) Reduced Treg Proliferative Capacity: As shown in Figure 5I, the decreased proportion of FOXP3+Ki67+ Treg cells in cKO mice likely reflects impaired proliferative capacity, which may limit the expansion of functional Treg cells in response to inflammatory cues, particularly in peripheral tissues where active suppression is required.

      (2) Altered Treg Function and Activation: Cxxc1-deficient Treg cells exhibit increased expression of activation markers (Il2ra, Cd69) and pro-inflammatory genes (Ifng, Tbx21). This suggests a functional dysregulation that may impair their ability to suppress inflammation effectively, despite their presence in lymphoid organs.

      (3) Tissue Treg Populations: Although our study focuses on lymph node-resident Treg cells, tissue-resident Treg cells play a crucial role in maintaining local immune homeostasis. It is plausible that Cxxc1 deficiency compromises the accumulation or functionality of tissue Treg cells, contributing to uncontrolled inflammation in non-lymphoid organs. Unfortunately, we currently lack data on tissue Treg populations, which limits our ability to directly address this hypothesis.

      Regarding the suggestion to analyze tissue Treg populations, we agree that this would be an important next step in understanding the cause of the severe inflammation and early mortality in Cxxc1-deficient mice.

      We plan to perform detailed analyses of Treg cell populations in various tissues, including the gut, lung, and liver, to determine if there are specific defects in tissue-resident Treg cells that could contribute to the observed phenotype.

      (2) In Figure 5B, scRNA-seq analysis indicated that Mki67+ Treg subset are comparable between WT and Cxxc1-deficient Treg cells. On the other hand, FACS analysis demonstrated that Cxxc1-deficient Treg shows less Ki-67 expression compared to WT in Figure 5I. The authors should explain this discrepancy.

      Thank you for pointing out the apparent discrepancy between the scRNA-seq and FACS analyses regarding Ki-67 expression in Cxxc1-deficient Treg cells.

      In Figure 5B, the scRNA-seq analysis identified the Mki67+ Treg subset as comparable between WT and Cxxc1-deficient Treg cells. This finding reflects the overall proportion of cells expressing Mki67 transcripts within the Treg population. In contrast, the FACS analysis in Figure 5I specifically measures Ki-67 protein levels, revealing reduced expression in Cxxc1-deficient Treg cells compared to WT.

      To address this discrepancy more comprehensively, we will further analyze the scRNA-seq data to directly compare Mki67 mRNA expression levels between WT and Cxxc1-deficient Treg cells.

      In addition, the authors concluded on line 441 that CXXC1 plays a crucial role in maintaining Treg cell stability. However, there appears to be no data on Treg stability. Which data represent the Treg stability?

      We appreciate the reviewer’s observation and recognize that our wording may have been overly conclusive. Our data primarily highlight the impact of Cxxc1 deficiency on Treg cell homeostasis and transcriptional regulation, rather than providing direct evidence for Treg cell stability. Specifically, the downregulation of Treg-specific suppressive genes (Nt5e, Il10, Pdcd1) and the upregulation of pro-inflammatory markers (Gzmb, Ifng, Tbx21) indicate a shift in functional states. While these findings may suggest an indirect disruption in the maintenance of suppressive phenotypes, they do not constitute a direct measure of Treg cell stability.

      To address the reviewer’s concern, we will revise our conclusion to more accurately state that our data support a role for CXXC1 in maintaining Treg cell homeostasis and functional balance, without overextending claims about Treg cell stability. Thank you for bringing this to our attention, as it will help us improve the clarity and precision of our manuscript.

      (3) The authors found that Cxxc1-deficient Treg cells exhibit weaker H3K4me3 signals compared to WT in Figure 7. This result suggests that Cxxc1 regulates H3K4me3 modification via H3K4 methyltransferases in Treg cells. The authors should clarify which H3K4 methyltransferases contribute to the modulation of H3K4me3 deposition by Cxxc1 in Treg cells.

      Thank you for pointing out the need to clarify the role of H3K4 methyltransferases in the modulation of H3K4me3 deposition by CXXC1 in Treg cells.

      In our study, we found that Cxxc1-deficient Treg cells exhibit reduced H3K4me3 levels, as shown in Figure 7. CXXC1 has been previously reported to function as a non-catalytic component of the Set1/COMPASS complex, which contains H3K4 methyltransferases such as SETD1A and SETD1B. These methyltransferases are the primary enzymes responsible for H3K4 trimethylation.

      References:

      (1) Lee J.H., Skalnik D.G. CpG-binding protein (CXXC finger protein 1) is a component of the mammalian Set1 histone H3-Lys4 methyltransferase complex, the analogue of the yeast Set1/COMPASS complex. J. Biol. Chem. 2005; 280:41725–41731.

      (2) J. P. Thomson, P. J. Skene, J. Selfridge, T. Clouaire, J. Guy, S. Webb, A. R. W. Kerr, A. Deaton, R. Andrews, K. D. James, D. J. Turner, R. Illingworth, A. Bird, CpG islands influence chromatin structure via the CpG-binding protein Cfp1. Nature 464, 1082–1086 (2010).

      (3) Shilatifard, A. 2012. The COMPASS family of histone H3K4 methylases: mechanisms of regulation in development and disease pathogenesis. Annu. Rev. Biochem. 81:65–95.

      (4) Brown D.A., Di Cerbo V., Feldmann A., Ahn J., Ito S., Blackledge N.P., Nakayama M., McClellan M., Dimitrova E., Turberfield A.H. et al. The SET1 complex selects actively transcribed target genes via multivalent interaction with CpG Island chromatin. Cell Rep. 2017; 20:2313–2327.

      Furthermore, it would be important to investigate whether Cxxc1-deletion alters Foxp3 binding to target genes.

      Thank you for this important suggestion regarding the impact of Cxxc1 deletion on FOXP3 binding to target genes. We agree that understanding whether Cxxc1 deficiency affects FOXP3’s ability to bind to its target genes would provide valuable insight into the regulatory role of CXXC1 in Treg cell function.

      To address this, we plan to perform CUT&Tag experiments to assess FOXP3 binding profiles in Cxxc1-deficient versus wild-type Treg cells. These experiments will allow us to determine if Cxxc1 loss disrupts FOXP3’s occupancy at key regulatory sites, which may contribute to the observed functional impairments in Treg cells.

      (4) In Figure 7, the authors concluded that CXXC1 promotes Treg cell homeostasis and function by preserving the H3K4me3 modification since Cxxc1-deficient Treg cells show lower H3K4me3 densities at the key Treg signature genes. Are these Cxxc1-deficient Treg cells derived from mosaic mice? If Cxxc1-deficient Treg cells are derived from cKO mice, the gene expression and H3K4me3 modification status are inconsistent because scRNA-seq analysis indicated that expression of these Treg signature genes was increased in Cxxc1-deficient Treg cells compared to WT (Figure 5F and G).

      Thank you for the insightful comment. To clarify, the Cxxc1-deficient Treg cells analyzed for H3K4me3 modification in Figure 7 were indeed derived from Cxxc1 conditional knockout (cKO) mice, not mosaic mice.

      The scRNA-seq analysis presented in Figures 5F and G revealed an upregulation of Treg signature genes in Cxxc1-deficient Treg cells. This finding suggests that the loss of Cxxc1 drives these cells toward a pro-inflammatory, activated state, underscoring the pivotal role of CXXC1 in maintaining Treg cell homeostasis and suppressive function.

      Regarding the apparent discrepancy between the reduced H3K4me3 levels and the increased expression of these genes, it is important to note that H3K4me3 primarily functions as an epigenetic mark that facilitates chromatin accessibility and transcriptional regulation, acting as an upstream modulator of gene expression. However, gene expression levels are also influenced by downstream compensatory mechanisms and complex inflammatory environments. In this context, the reduction in H3K4me3 likely reflects the direct role of CXXC1 in epigenetic regulation, whereas the upregulation of gene expression in Cxxc1-deficient Treg cells may result as a side effect of the inflammatory environment.

      To further substantiate our findings, we performed RNA-seq analysis on Treg cells from Foxp3Cre/+ Cxxc1fl/fl (“het-KO”) and their littermate Foxp3Cre/+ Cxxc1fl/+ (“het-WT”) female mice, as presented in Figure S6C. This analysis revealed a notable reduction in the expression of key Treg signature genes, including Icos, Ctla4, Tnfrsf18, and Nt5e, in het-KO Treg cells. Importantly, the observed changes in gene expression were consistent with the altered H3K4me3 modification status, further supporting the epigenetic regulatory role of CXXC1. These results further emphasize that CXXC1 promotes Treg cell homeostasis and function by preserving the H3K4me3 modification.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The manuscript by Rowell et al aims to identify differences in TCR recombination and selection between foetal and adult thymus in mice. The authors sequenced the unpaired bulk TCR repertoire in foetal and adult mouse thymi and studied both TCRB and TCRa characteristics in the double positive (DP, CD4+CD8+) and single positive (SP4 CD4+CD8-CD3+ and SP8 CD4-CD8+CD3+) populations. They identified age-related differences in TCRa and TCRB segment usage, including a preferential bias toward 3'TRAV and 5' TRAJ rearrangements in foetal cells compared to adults, which had a greater preference for 5'TRAV segments. By depleting the thymocyte population in adult thymi using hydrocortisone, the authors demonstrated that the repertoire became more foetal-like; they therefore argue that the preferential 5'TRAV rearrangements in adults may result from prolonged/progressive TCRa rearrangements in the adult thymocytes. In line with previous studies, the authors demonstrate that the foetal TCR repertoire was less diverse, less evenly distributed and had fewer non-template insertions while containing more clonal expansions. In addition, the authors claim that changes in V-J usage and CDR1 and CDR2 in the DP vs SP repertoires indicated that positive selection of foetal thymocytes is less dependent on interactions with the MHC. 

      Strengths: 

      Overall, the manuscript provides an extensive analysis of the foetal and adult TCR repertoire in the thymus, resulting in new insights in T cell development in foetal and adult thymi. 

      Weaknesses: 

      Three major concerns arise:

      (1) the authors have analysed TCR repertoires of only 4 foetal and 4 adult mice; considering the high spread, the study may have been underpowered. 

      Given the concerns of the reviewer we have sequenced more libraries and added more data to include repertoires from 7 embryos and 6 young adults (biological replicates from different sorts). We believe that including more replicates has indeed strengthened our study. 

      Our experimental approach was to sequence TCR transcripts, and in studies using RNA-sequencing of inbred mice, often only 3 individuals (biological replicates) are sequenced.

      Our study sequenced from 7 foetal thymuses (generating TCRα and TCRβ repertoires from 4 FACS-sorted cell populations); 6 adult thymuses (generating TCRα and TCRβ repertoires from 4 FACS-sorted cell populations); and 5 adult thymuses from hydrocortisone-treated mice (generating TCRα and TCRβ repertoires from FACS-sorted CD3lo and CD3hi DP populations). We thus analysed 124 distinct repertoires from different populations and libraries, and many tens of thousands of unique sequences.  
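      The total of 124 repertoires can be checked from the group sizes given above. The short sketch below assumes that every thymus contributed every listed sorted population for both chains, which is what the stated total implies; the dictionary layout and function name are illustrative only.

```python
# Two chains (TCRα and TCRβ) are sequenced for every sorted population.
CHAINS = 2

# (number of thymuses, number of FACS-sorted populations), as stated in the response.
groups = {
    "foetal": (7, 4),
    "adult": (6, 4),
    "hydrocortisone-treated adult": (5, 2),  # CD3lo and CD3hi DP populations
}


def count_repertoires(groups, chains=CHAINS):
    """Return the number of repertoires per group: thymuses x populations x chains."""
    return {name: n * pops * chains for name, (n, pops) in groups.items()}


per_group = count_repertoires(groups)
total = sum(per_group.values())
print(per_group, total)
```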

      (2) Gating strategies are missing.

      We have included gating strategies for cell-sorting as SFig7 and SFig8.

      (3) the manuscript is very technical and clearly aimed for a highly specialised audience with expertise in both thymocyte development and TCR analysis. Authors are recommended to provide schematics of the TCR rearrangements/their findings and include a summary conclusions/implications of their findings at the end of each results section rather than waiting till the discussion. This will help the reader to interpret their findings while reading the results. 

      We have modified the manuscript to include a more general introductory paragraph (page 3) to introduce the reader to the topic and we have included brief summaries of the findings at the end of each result section (pages 7,9,10,12,13,15).

      Reviewer #2 (Public Review): 

      Summary: 

      The authors comprehensively assess differences in the TCRB and TCRA repertoires in the fetal and adult mouse thymus by deep sequencing of sorted cell populations. For TCRB and TCRA they observed biased gene segment usage and less diversity in fetal thymocytes. The TCRB repertoire was less evenly distributed and displayed more evidence of clonal expansions and repertoire sharing among individuals in fetal thymocytes. In both fetal and adult thymocytes they show skewing of V segment (CDR1-2) repertoires in CD4 and CD8 as compared to DP thymocytes, which they attribute to MHC-I vs MHC-II restriction during positive selection. However the authors assess these effects to be weaker in fetal thymocytes, suggesting weaker MHC-restriction. They conclude that in multiple respects fetal repertoires are distinct from and more innate-like than adult. 

      Strengths: 

      The analyses of the F18.5 and adult thymic repertoires are comprehensive with respect to the cell populations analyzed and the diversity of approaches used to characterize the repertoires. Because repertoires were analyzed in pre- and post-selection thymocyte subsets, the data offer the potential to assess repertoire selection at different developmental stages. The analysis of repertoire selection in fetal thymocytes may be unique. 

      Weaknesses: 

      (1) Problematic experimental design and some lack of familiarity with prior work have resulted in highly problematic interpretations of the data, particularly for TCRA repertoire development. 

      The authors note fetal but not adult thymocytes to be biased towards usage of 3' V segments and 5'J segments. It should be noted that these basic observations were made 20 years ago using PCR approaches (Pasqual et al., J.Exp.Med. 196:1163 (2002)), and even earlier by others.

      We have cited this manuscript (Introduction, page 5) which used PCR of genomic DNA to investigate some TCRα VJ rearrangements in foetal and adult thymus. In contrast, our study uses next generation sequencing of transcripts to investigate all possible combinations of TCRα and TCRβ VJ combinations in different sorted thymocyte populations ex vivo. The greater sensitivity of this more modern technology has thus enabled us to detect many more TCRαVJ rearrangements than the 2002 study, and to conclude on the basis of stringent statistical testing that the foetal repertoire is enriched for 3’V to 5’J combinations (Fig. 4). 

      The authors also note that in fetal thymus this bias persists after positive selection, and it can be reproduced in adults during recovery from hydrocortisone treatment. The authors conclude that there are fewer rounds of sequential TCRA rearrangements in the fetal thymus, perhaps due to less time spent in the DP compartment in fetus versus adult. However, the repertoire difference noted by the authors does not require such an explanation. What the authors are analyzing in the fetus is the leading edge of a synchronous wave of TCRA rearrangements, whereas what they are analyzing in adults is the unsynchronized steady state distribution. It is certainly true, as has been shown previously, that the earliest TCRA rearrangements use 3' TRAV and 5'TRAJ segments. But analysis of adult thymocytes has shown that the progression from use of 3' TRAV and 5' TRAJ to use of 5' TRAV and 3' TRAJ takes several days (Carico et al., Cell Rep. 19:2157 (2017)). The same kinetics, imposed on fetal development, would put development of a more complete TCRA repertoire at or shortly after birth. In fact, Pasqual showed exactly this type of progression from F18 through D1 after birth, and could reproduce the progression by placing F16 thymic lobes in FTOC. It is not appropriate to compare a single snapshot of a synchronized process in early fetal thymocytes to the unsynchronized steady state situation in adults. In fact, the authors' own data support this contention, because when they synchronize adult thymocytes by using hydrocortisone, they can replicate the fetal distribution. Along these lines, the fact that positive selection of fetal thymocytes using 3' TRAV and 5' TRAJ segments occurs within 2 days of thymocyte entry into the DP compartment does not mean that DP development in the fetus is intrinsically rapid and restricted to 2 days. It simply means that thymocytes bearing an early rearranging TCR can be positively selected shortly after TCR expression. The expectation would be that those DP thymocytes that had not undergone early positive selection using a 3' TRAV and a 5' TRAJ would remain longer in the DP compartment and continue the progression of TCRA rearrangements, with the potential for selection several days later using more 5'TRAV and 3'TRAJ. 

      We agree with this summary provided by the reviewer, which corresponds closely to the points we made ourselves in the manuscript. Indeed, we discuss the synchronization and kinetics of the first wave of T-cell development in Results page 13 and Discussion page 17, which was the rationale for the hydrocortisone experiment. We have also discussed findings from Carico et al 2017 in this context (see pages 13, 16, 17).

      (2) The authors note 3' V and 5'J biases for TCRB in fetal thymocytes. The previously outlined concerns about interpreting TCRA repertoire development do not directly apply here. But it would be appropriate to note that by deep sequencing, Sethna (PNAS 114:2253 (2017)) identified skewed usage of some of the same TRBV gene segments in fetal versus adult.  It should also be noted that Sethna did not detect significantly skewed usage of TRBJ  segments. Regardless, one might question whether the skewed usage of TRBJ segments detected here should be characterized as relating to chromosomal location. There are two logical ways one can think about chromosomal location of TRBJ segments - one being TRBJ1 cluster vs TRBJ2 cluster, the other being 5' to 3' within each cluster. The variation reported here does not obviously fit either pattern. Is there a statistically significant difference in aggregate use of the two clusters? There is certainly no clear pattern of use 5' to 3' across each cluster. 

      We have included a statistical comparison of the aggregate TRBJ use between the J1 cluster and the J2 cluster (see SFig5) and Results page 9. 

      (3) The authors show that biases in TCRA and TCRB V and J gene usage between fetal and adult thymocytes are mostly conserved between pre- and post-selection thymocytes (Fig 2). In striking contrast, TCRA and TCRB combinatorial repertoires show strong biases preselection that are largely erased in post-selection thymocytes (Fig 3). This apparent discrepancy is not addressed, but interpretation is challenging. 

      We think the reviewer is referring to heatmaps for individual gene segment usage shown in Figure 2 in comparison to combinatorial usage shown in Figure 4. There is not a discrepancy in the data, but rather the differences between these two figures lie in the way in which the comparisons are made and visualised. The heatmaps in Figure 2A-D show mean proportional usage of each individual gene segment for each cell type in the two life stages, clustered by Euclidean distance. This visualisation clearly shows bias in foetal 3’ TRAV usage and 5’TRAJ usage (looking at areas of red, which have higher usage), with less pronounced enrichment for TRBV and TRBJ. The heatmaps also show differences in intensity between different cell populations in each life-stage. 

      In contrast, in Figure 4 the tiles show combinations with statistically significant (P<0.05) differences in mean counts for each VJ combination in each cell type between 7 foetal and 6 adult repertoires by Student’s t-test, after correcting for false discovery rate (FDR) due to multiple combinations. It is the case that there are fewer significant differences in proportional combinatorial VxJ use between foetal and adult repertoires after selection. We find this an interesting finding and have expanded our discussion of this aspect of the data (page 10). More than half of the significant differences persist after repertoire selection, and the reduction in each individual SP population of course in part reflects the lineage divergence.
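      The multiple-testing step described above can be illustrated with a short sketch. The response does not name the FDR procedure used, so the Benjamini-Hochberg adjustment below is an assumption, and the combination labels and p-values are invented placeholders rather than data from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_combinations(p_by_combo, alpha=0.05):
    """Return the VxJ combinations whose adjusted p-value is below alpha."""
    names = list(p_by_combo)
    adjusted = benjamini_hochberg([p_by_combo[name] for name in names])
    return [name for name, q in zip(names, adjusted) if q < alpha]


# Hypothetical per-combination t-test p-values (placeholders, not real results).
toy_p = {"V1xJ1": 0.001, "V2xJ2": 0.04, "V3xJ3": 0.03, "V4xJ4": 0.5}
toy_adjusted = benjamini_hochberg(list(toy_p.values()))
toy_hits = significant_combinations(toy_p)
print(toy_adjusted, toy_hits)
```

      In this toy case two combinations with raw p-values below 0.05 no longer pass after adjustment, which is why a tile map built on corrected values can look sparser than one built on raw values.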

      (4) The observation that there is a higher proportion of nonproductive TCRB rearrangements in fetal thymus compared to adult is challenging to interpret, given that the results are based upon RNA sequencing so are unlikely to reflect the ratio in genomic DNA due to processes like NMD.

      We have added two sentences to explain that transcripts of non-productive rearrangements are eliminated by nonsense-mediated decay (NMD), but some non-productive transcripts are detected in many studies of TCR repertoire sequencing, and we have cited three studies from different groups that document this (see Results, page 10-11). We have not commented on how the increase in non-productive TCR rearrangements in the foetal populations (in comparison to adult) relates to rearrangements in genomic DNA or NMD.   We have likewise not commented on the possible significance or biological role of nonproductive TCR transcripts, but simply reported our findings.

      (5) An intriguing and paradoxical finding is that fetal DP, CD4 and CD8 thymocytes all display greater sharing of TCRB CDR3 sequences among individuals than do adults (Fig 5DE), whereas DP and CD8 thymocytes are shown to display greater CDR3 amino acid triplet motif sharing in adults (with a similar trend in CD4). 

      As foetal DP, CD4SP and CD8SP TCRbeta repertoires have fewer non-template insertions and lower mean CDR3 length, they are expected to share more CDR3 repertoires than their adult counterparts. However, in the case of CDR3 amino acid triplet motifs (k-mers) what is being analysed is the sharing of each possible individual k-mer. If k-mers are shared more in the adult for some populations, but CDR3 repertoires are shared more in the foetus, we think it means that some k-mers appear in many different CDR3 sequences in the adult, so that they are over-represented in multiple different CDR3s (presumably due to selection processes, although we agree that this is just an assumption).
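      The distinction between sharing whole CDR3 sequences and sharing triplet motifs can be made concrete with a small sketch. It assumes sharing is measured as a Jaccard index over sets of unique sequences or unique triplets; the sequences are invented for illustration and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two collections, treated as sets of unique items."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def triplets(cdr3s, k=3):
    """Return every overlapping k-mer (default: amino acid triplet) in the sequences."""
    return {s[i:i + k] for s in cdr3s for i in range(len(s) - k + 1)}


# Toy CDR3 amino acid sequences, invented for illustration only.
mouse_a = ["CASSLGQNTLYF", "CASSDRGNTLYF"]
mouse_b = ["CASSLGQNTLYF", "CASSPGTNTLYF"]

cdr3_sharing = jaccard(mouse_a, mouse_b)
kmer_sharing = jaccard(triplets(mouse_a), triplets(mouse_b))
print(cdr3_sharing, kmer_sharing)
```

      Here the triplet index is higher than the whole-sequence index because the same triplets recur in different CDR3s, so the two measures need not move in the same direction.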

      The authors attribute high amino acid triplet sharing to the result of selection of recurrent motifs by contact with pMHC during positive selection. But this interpretation seems highly problematic because the difference between fetal and adult thymocytes is dramatic even in unfractionated DP thymocytes, the vast majority of which have not yet undergone positive selection. How then to explain the differences in CDR3 sharing visualized by the different approaches? 

      The TCRβ repertoire has been selected in the adult DP population through the process of β-selection, which is believed to involve immune synapse formation and MHC-interactions (Allam et al 2021,10.1083/jcb.201908108). We have now included this reference in the introduction to make this clear (page 4). However, we agree with the reviewer’s comments that it is challenging to explain the k-mer analysis and that we have not been able to actually show that increased k-mer sharing in the adult is a direct consequence of increased positive selection: it was our interpretation of this seemingly paradoxical finding.  For clarity, we have therefore removed the k-mer analyses from the manuscript.

      (6) The authors conclude that there is less MHC restriction in fetal thymocytes, based on measures of repertoire divergence from DP to CD4 and CD8 populations (Fig. 6). But the authors point to no evidence of this in analysis of TRBV usage, either by PC or heatmap analyses (A,B,D). The argument seems to rest on PC analysis of TRAV usage (Fig S6), despite the fact that dramatic differences in the SP4 and SP8 repertoires are readily apparent in the fetal thymocyte heatmaps. The data do not appear to be robust enough to provide strong support for the authors' conclusion. 

      We have written the text very carefully so as not to make the claim too strong, stating in the abstract: “In foetus we identified less influence of MHC-restriction on α-chain and β-chain combinatorial VxJ usage and CDR1xCDR2 (V region) usage in SP compared to adult, indicating weaker impact of MHC-restriction on the foetal TCR repertoire.” We are not saying that MHC-restriction does not impact VJ gene usage in foetal repertoires, but rather that it has less influence (particularly when compared to life-stage). Evidence for this comes from: [1] Heatmaps in Fig2A-D, which show that all repertoires cluster first by life-stage ahead of cell type; [2] Fig3A and B: PCA of adult and foetal TCRβ VxJ combinations: all repertoires cluster by life-stage on PC1. PC2 separates adult repertoires by cell type (adult SP8 are positive on PC2 while adult SP4 are negative on PC2, and DP cells are between them) but for foetal repertoires the SP8 and SP4 are highly dispersed, with some SP4 cells falling on the positive side of PC2. Only foetal DP repertoires cluster tightly. [3] Fig6A-C: PCA of β-chain CDR1xCDR2 (corresponding to Vβ gene segment usage) again shows the same pattern. Adult repertoires separate by cell type on PC2 (SP8 positive on PC2, SP4 negative on PC2, with DP in between), but foetal SP8 repertoires are much more dispersed. [4] SFig6J-K: PCA of α-chain CDR1xCDR2 (Vα usage) frequency distributions: adult repertoires cluster together and are separated by cell type on PC2 (SP4 positive, SP8 negative), but foetal populations are highly dispersed and fail to cluster by cell type on either axis. [5] We have additionally added new PCA analyses to explore differences in MHC-restriction between foetal and adult SP populations. This is shown in the new Figure 7. 
      We reasoned that in a PCA that included foetal and adult repertoires together, the foetal repertoires might not segregate by SP cell type (MHC-restriction) because of their overall bias towards particular VJ combinations, which would mean that effectively the PCA would be imposing adult MHC restriction on the foetal repertoires. We therefore carried out PCA in which we analysed the adult repertoires separately from the foetal repertoires. As expected for adult repertoires, PCA separated SP4 repertoires from SP8 repertoires on PC1 in each comparison (β-chain VxJ (Fig. 7B), α-chain VxJ (Fig. 7F), β-chain CDR1xCDR2 (V region) (Fig. 7H) and α-chain CDR1xCDR2 (V region) (Fig. 7L)). In contrast, for foetal TCRα repertoires (α-chain VxJ and α-chain CDR1xCDR2 (V region)), PCA failed to separate SP4 from SP8 repertoires on PC1 or PC2, so we did not detect impact of MHC-restriction on foetal TCRα repertoires (Fig. 7E and K). For foetal TCRβ repertoires, PCA separated SP4 β-chain VxJ from SP8 on PC2, accounting for only 11.1% of variance (Fig. 7A) (in contrast to the 44.2% of variance accounted for by MHC-restriction in adult β-chain VxJ PCA (Fig. 7B)). Thus, in adult repertoires ~4-fold more of the variance in β-chain VxJ usage can be accounted for by MHC-restriction than in foetal repertoires. PCA of foetal β-chain CDR1xCDR2 (V region) separated SP4 from SP8 on PC1, accounting for 28.8% of variance, whereas in PCA of adult β-chain CDR1xCDR2, MHC-restriction accounted for 56.1% (~2-fold more than in foetus). Thus, even when we considered V-region usage alone, we detected a stronger influence of MHC-restriction on the TCRβ repertoire in adult compared to foetal thymus.
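      The fold differences follow directly from the percentages of variance quoted above. The sketch below only redoes that arithmetic; the dictionary keys and function name are illustrative labels, not identifiers from the study.

```python
# Percentage of variance on the principal component that separates SP4 from SP8,
# as quoted in the response for the TCRβ analyses.
variance_pct = {
    "beta VxJ": {"foetal": 11.1, "adult": 44.2},
    "beta CDR1xCDR2": {"foetal": 28.8, "adult": 56.1},
}


def fold_adult_over_foetal(entry):
    """Ratio of adult to foetal variance attributed to MHC-restriction."""
    return entry["adult"] / entry["foetal"]


folds = {name: fold_adult_over_foetal(v) for name, v in variance_pct.items()}
for name, fold in folds.items():
    print(f"{name}: {fold:.2f}-fold")
```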

      Reviewer #3 (Public Review): 

      Summary:

      This study provides a comparison of TCR gene segment usage between foetal and adult thymus.

      Strengths:

      Interesting computational analyses were performed to find differences in TCR gene usage within unpaired TCRa and TCRb chains between foetal and adult thymus.

      Weaknesses:

      This study was significantly lacking insight and interpretation into what the data analysed actually means for the biology. The dataset discussed in the paper is from only two experiments: one comparing foetal and adult thymi from 4 mice per group and another which involved hydrocortisone treatment. The paper uses TCR sequencing methodology that sequences the TCR alpha and beta chains in an unpaired way, meaning that the true identity of the TCR heterodimer is lost. This also has the added problem of overestimating clonality and underestimating diversity.

      We have discussed the limitations and benefits of our approach of sequencing TCRβ and TCRα repertoires separately in the Discussion (page 19).  This approach allows the analysis of thousands of sequences from different cell types and different individuals at relatively low cost. We have made no claims in our manuscript about overall diversity or pairing, and given that each chain’s gene locus rearranges at a different time point in development, we believe it is of interest to consider the repertoires individually within this context.

      Limited detail in the methods sections also limits the ability for readers to properly interpret the dataset. What sex of mice were used? Are there any sex differences? What were the animal ethics approvals for the study?

      We have included this information in the Methods (page 19).  Both sexes were used and we found no sex differences, although that was not the focus of our study. All animal experimentation in the UK is carried out under UK Home Office Regulations (following ethical review). This is included in the Methods (page 19).  

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors): 

      Major points: 

      - Group sizes are very small (4 foetal and 4 adult mice). Considering the spread in TCR analysis (eg fig 1 B-H, Sup figures 2-4), the study is likely underpowered as it often looks like one mouse prevents or supports a statistical difference. Authors should therefore consider increasing the group size. 

      We have sequenced more libraries and included more data, from 7 foetal and 6 young adult animals (biological replicates).  

      - The authors should include a gating strategy for their sorted cells. This is essential to verify the quality of their findings. 

      We have added this to the Methods and SFig7 and SFig8.

      Authors should include a summary sentence at the end of each result section which interprets the main finding. Furthermore, the manuscript would greatly benefit from a schematic figure of their main findings, particularly with regards to the rearrangements and selection differences in foetal and adult thymi. 

      We have added a summary sentence to the end of each results section.

      - Authors should be more careful with their claim that MHC has less of an effect on foetal TCR selection. Authors demonstrated that there is a difference in VJ recombination between the foetal and adult TCR repertoire, skewing the foetal TCR repertoire to certain variable and junctional segments. Since both CDR1 and CDR2 are encoded by the variable gene, this is likely to affect their ability to interact with the MHC during positive selection. Have Authors considered whether the selection process is actually a bystander effect of the differences in the rearrangement process? One way to support the authors' claim is to demonstrate that mice with an alternative MHC background have similar foetal/adult gene rearrangements but a different TCR repertoire in the SP populations. 

      Time and resources have prevented us from repeating our experiments in another strain of inbred mice.  However, we note that a previous PCR study that showed 3’TRAV to 5’TRAJ bias in foetal repertoires was carried out in BALB/c mice (Pasqual JEM 2002). We have added this point to the Discussion (page 17). 

      - (supplementary) tables have not been provided. 

      Supplementary Tables were uploaded with the submission.  STables 1 and 2 show antibodies used for cell sorts and STable 3 primers used.

      Moderate points: 

      - The loading plots in Figure 3 onward are visually strong. Authors could consider including separate V and J loading plots for Figure 3 E, F and G to demonstrate preferential V and J usage. 

      We have included additional loading plots in Figure 7 for the new PCA we have added (see Fig. 7C, D,I and J).

      - "the proportion of non-productive rearrangements was higher in the foetal SP8 population than adults (Fig 5A)" Authors should explain how non-productive TCRs end up in SP populations as they need to pass positive and negative selection which both require interactions between the TCR and the MHC. 

      As we used RNA sequencing in our study, we did not comment on how the increase in nonproductive TCRbeta rearrangements in the foetal populations (in comparison to adult) relates to rearrangements in genomic DNA or to nonsense-mediated decay (NMD) that is believed to down-regulate transcripts of non-productively rearranged TCR.  We have not commented on the possible significance or biological role of non-productive TCR transcripts, but simply reported our findings. 

      - Authors have studied CDR3 sequential amino acid triplets (k-mers). However, CDR3 regions are longer than 3 amino acids in length, hence authors should 1) provide an overview/comparison of the identified k-mers in foetal or adult thymocytes and 2) explain how different k-mers relate to each other, eg whether they are expressed in the same TCR. Have authors considered using alternative programs to identify CDR3 motifs that are based on the full CDR3 amino acid sequence? For example, TCRdist provides motifs and indicates which amino acids are germline encoded or inserted. 

      In light of this comment from this reviewer and also comments from Reviewer 2, we have removed the comparison of k-mers from the manuscript.  Please see response to point 5 of Reviewer 2.  

      - The term "innate-like" is confusing as it implies that foetal cells are not antigen specific. However, once in the circulation, foetal cells will respond in an antigen-specific manner. Hence authors should use another term. 

      We have removed the term “innate-like” from the abstract and the first time we used it in the first paragraph of the Discussion. However, the second time we used the term, we are actually taking it from the manuscript we cited (Beaudin et al 2016) and in this case we left it in. We agree that foetal cells are likely to respond in an antigen-specific manner. 

      - To support their hypothesis in the discussion "However, as TCRd gene segments are nested.... so that 5' TRAV segments are not favoured" can authors confirm that there are indeed fewer γδ T cells in the foetal repertoire? 

      We have removed this section from the discussion, because although it is interesting, it is highly speculative, and the manuscript is already quite complicated to interpret.

      Minor points: 

      - The authors may find the publication by De Greef 2021 PNAS of interest to identify TRBD segments 

      - Authors need to clarify that they mean CDR3-beta in the sentence "The mean predicted CDR3 length.... compared to young adult" 

      We have included new data in the manuscript to show that mean CDR3 length is lower in all foetal populations of beta (Fig5C) and alpha (SFig5C) and clarified which we are referring to in the text. 

      - Authors should bring the section "During TCRb gene rearrangement, these segments.... Initiating the sequence of rearrangements" forward to Figure 2 and provide the reader with a visual schematic of the foetal vs adult recombination events. 

      - Discussion: "The first wave of foetal abT-cells that leave the thymus... tolerant to both self and maternal MHC/antigens". Have Authors considered the alternative hypothesis published by Thomas 2019 in Curr Opin System Biol that the observed bias could potentially provide better protection against childhood pathogens? 

      We have indeed considered this, as stated in the first paragraph of the Discussion “The first wave of foetal αβT-cells that leave the thymus must provide early protection against infection in the neonatal animal”. We have now cited the Thomas 2019 study.

      - Discussion: Authors should rephrase the sentence "The transition from DP to SP cell in the foetus.... From DN3 to SP cell may be slower" as it is unclear what the authors mean. 

      We have rephrased this (see page 17)

      - Discussion "TRAV and TRAJ Array" do authors mean "TRAV and TRAJ area"? 

      We did indeed mean array (as in series of gene segments) but we have changed the wording for clarity (page 14).

      - Methods, Fluorescence activated cell sorting: can authors clarify whether they stained, sorted and sequenced the full thymus and /or specify how many cells were included. Can authors also explain why foetal and adult cells were treated differently (eg the volume of master mix)? 

      - Methods Fluorescence activated cell sorting authors should specify what they mean with "mastermix of either 1:50 (foetal thymus) or 1:100 (adult thymus)". Does this mean all antibodies in the foetal mastermix were 1:50 and all antibodies in the adult master mix were 1:100? If so, why were different concentrations used and why were antibodies not individually titrated before use?  

      We have clarified the methods, and the antibodies used are listed with clones in the supplementary tables.

      Figures: 

      - Several figures did not fit on the page and therefore missed the top or side 

      - Figure 1A: missing a label on the Y axis

      This is visible

      - Figure 2A-D: please indicate the 5' and 3' terminus in each graph. The cell type legend should include two separate colours for the two DP populations. 

      We have added 5’ and 3’ labels.  The two DP populations are clearly labelled.

      - Figure 4: please indicate the 5' and 3' terminus in each graph. 

      We have added 5’ and 3’ labels.   

      - Figure 5C: y axis should read mean CDR3B length (aa), Figure 5D and E: y axis should read Jaccard Index CDR3B, Figure 5 F and G: y axis should read Jaccard index CDR3B k-mers. Same comment for Sup Fig 5 but then CDR3a. 

      We have added these labels for both Figure 5 and Supplementary Figure 6 (was SFig5 previously).

      - Figure 6C top label should read CDR1B x CDR2B with highest contribution 

      We have added this label.

      - Figure 7: please indicate the 5' and 3' terminus in each graph. 

      We have added 5’ and 3’ labels.  This is now Figure 8, as we have added new analyses (new Figure 7).

      - Supplementary Figure 1-4 are missing a colour legend next to the graphs.

      We have added the legends in.  

      Reviewer #2 (Recommendations For The Authors): 

      (1) The authors need to provide better support for the notion that the fetal thymus produces ab T cells with properties and functions that are distinct from adult T cells. There are several  ways they might provide a more meaningful assessment: (1) They could analyze the fetal repertoire at multiple time points. (2) They could compare instead the steady state distributions in early postnatal and adult thymus samples. (3) They could compare the peripheral T cell repertoires in the first week of life versus adult. This last approach would allow them to draw the most impactful conclusion. 

      We appreciate these suggestions. Sadly, these experiments are beyond our budget for the current manuscript and beyond the scope of our current study, which we believe provides interesting new information.

      (2) Fig S2D shows TRBJ1-4 in black lettering meant to indicate no significant difference whereas the figure shows use of this gene segment to be elevated in adult. I believe TRBJ1-4 should be in blue lettering.

      This is now coloured correctly.

      (3) The figure call out on p11 (Fig5I-J) should be H-I.

      This is now corrected.

      (4) Please indicate in the main text that Jaccard analysis in Fig 5 D-E is for TCRB.

      This is now corrected.

      (5) The analysis of usage of TCRB CDR1xCDR2 combinations in Fig6D is said to "reflect the bias observed in their TRBV gene usage (Fig 2C)". Isn't it the case that every TRBV gene presents a distinct CDR1xCDR2 combination, meaning that there is no difference between TRBV usage and TRBV CDR1xCDR2 usage? If so, please make this clearer.

      Yes, this is the case, we have made this clearer in the text.

      Reviewer #3 (Recommendations For The Authors): 

      In general, although there are many interesting analyses that can be done with these large datasets, I feel as though the authors did not fully interpret the real meaning and significance of many of these results. Whilst there was some speculation in the discussion section on why a foetal repertoire might differ from that of adults, the rationale for each individual analysis was not clearly explained. I would suggest that the rationale and a thorough explanation of each analysis be added to the results section, including a finishing sentence on what it means. 

      We have added short summaries to each results section to make the points we are making clearer.

      The authors did not mention how many cells were sorted for from each thymus for sequencing. Was the cell number normalised between each population? As this might have an influence on various downstream measurements of diversity, evenness and clonality, if there is a sampling issue. 

      This is explained in the methods.  We used sampling to allow comparisons between repertoires of different sizes, and this is also explained in the methods.
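
      As an illustration of the general idea, the sketch below subsamples every repertoire to the size of the smallest one, so that diversity measures are compared at equal depth. It is a generic example under stated assumptions, not the procedure described in the methods: the data layout (a mapping from clonotype to count), the function names, and the fixed random seed are all hypothetical.

```python
import random
from collections import Counter


def subsample(clone_counts, depth, seed=0):
    """Draw `depth` sequences without replacement from one repertoire.

    `clone_counts` maps each clonotype to its observed count.
    """
    # Expand counts into one entry per observed sequence.
    pool = [clone for clone, n in clone_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds repertoire size")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return Counter(rng.sample(pool, depth))


def subsample_to_common_depth(repertoires, seed=0):
    """Subsample every repertoire to the size of the smallest one."""
    depth = min(sum(counts.values()) for counts in repertoires.values())
    return {name: subsample(counts, depth, seed)
            for name, counts in repertoires.items()}


# Hypothetical repertoires, for illustration only.
repertoires = {
    "foetal": {"A": 5, "B": 1},
    "adult": {"A": 10, "C": 20, "D": 3},
}
normalised = subsample_to_common_depth(repertoires)
```

      In practice the draw would usually be repeated with several seeds and the resulting measures averaged, so that a single random sample does not drive the comparison.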

      The authors should include the cell sorting profiles and example flow cytometry plots, including gating strategies and the post sort purity of each sorted population. 

      We have included sorting strategies in the methods (SFig7 and SFig8).

      I think the manuscript could also be improved if there were some basic characterisation of foetal vs. adult thymus development. How many thymocytes are in a foetal vs adult thymus at the timepoints chosen? 

      I think there were some interesting findings in this paper. Given that overall, the foetal thymus appeared to be less diverse than that of the adult, one question I thought would be interesting to discuss was the overlap between the two repertoires. Is the foetal thymus simply a sub-fraction of the adult repertoire or is it totally distinct with no overlapping sequences? 

      Our analyses indicate that the repertoires are actually different. This is evident in Fig. 4 and in the PCA loading plots shown in Fig. 3C and new Fig. 7C, D, I and J.

      I think that some of the interpretation in the results section may be a bit vague. "When we compared by thymocyte population, each adult population clustered together, with adult SP4 separating from adult SP8 on PC2 and DP cells scoring in between, suggesting that PC2 might correspond to MHC restriction of the adult populations." - whilst I think I know what the authors mean, I do believe that this could be explained in clearer detail and more explicitly. SP4 and SP8 are known to be positively selected in the thymus on distinct MHC class I and MHC class II molecules for example. 

      We have tried to clarify the text describing that PCA and have additionally added a new figure (new Fig. 7) to compare the influence of MHC restriction on the TCR repertoire in foetal and adult thymus.

      In the methods section, the age and sex of mice used were not explained at all. What was used in the experiment? Are there any sex differences? 

      The age and sex of the mice are given in the methods.  We have not detected sex differences.

      This is a huge omission from the manuscript. In general, I don't believe the methods section has described the analysis in sufficient detail for replication. All analysis code and data should be publicly accessible and in a format that allows the reader to replicate the figures in the paper upon running the code, perhaps even allowing them to run their own TCR datasets. Overall, I think the manuscript needs some rewriting to include additional details and deeper interpretation of each individual analysis. 

      Sequencing data files will be made publicly available on UCL Research Data Repository.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors report compound heterozygous deleterious variants in the kinase domains of the non-receptor tyrosine kinases (NRTK) TNK2/ACK1 in familial SLE. They suggest that ACK1 and BRK deficiencies are associated with human SLE and impair efferocytosis.

      Strengths: 

      The identification of similar mutations in non-receptor tyrosine kinases (NRTKs) in two different families with familial SLE is a significant finding in human disease. Furthermore, the paper provides a detailed analysis of the molecular mechanisms behind the impairment of efferocytosis caused by mutations in ACK1 and BRK.

      Weaknesses: 

      A critical point in this paper is whether the loss of function of ACK1 or BRK contributes to the onset of familial SLE. The authors emphasize that inhibitors of ACK1/BRK worsened IgG deposition in the kidneys in a pristane-induced SLE model, which contributes not to the onset but to the exacerbation of SLE, thus only partially supporting their claim.

      The evidence supporting that the loss of function of ACK1 or BRK contributes to the onset of SLE in the patients from the 2 families mostly relies on the genetic analysis. As the reviewer states, the observation that inhibitors of ACK1/BRK worsened IgG deposition in the kidneys in a pristane-induced SLE model supports the genetic evidence.

      To further address the possible role of ACK1 or BRK variants in the onset of autoimmunity in vivo, we treated wild-type (WT) BALB/cByJ female mice with inhibitors in the absence of pristane.

      The results indicated that mice that had received a weekly injection of ACK1 or BRK inhibitors developed a large array of serum anti-nuclear IgG antibodies, including but not limited to autoantibodies associated with SLE such as anti-histone, anti-chromatin, anti-U1-snRNP, anti-SSA, and anti-Ku, in comparison with the control group (Revised Fig. 3A). However, they did not develop glomerular IgG deposits after 12 weeks of treatment, in contrast to mice that had received pristane (Revised Fig. 3B,C, Figure 3-figure supplement 1).

      These additional data suggest that inhibition of ACK1 and BRK stimulates the production of serum autoantibodies, which strengthens the claim that ACK1 and BRK kinase deficiencies contribute to autoimmunity in BALB/cByJ mice.

      Reviewer #2 (Public Review):

      Summary: 

      In this manuscript, the authors revealed that genetic deficiencies of ACK1 and BRK are associated with human SLE. First, the authors found compound heterozygous deleterious variants in the kinase domains of the non-receptor tyrosine kinases (NRTK) TNK2/ACK1 in one multiplex family and PTK6/BRK in another family. Then, by an experimental blockade of ACK1 or BRK in a mouse SLE model, they found an increase in glomerular IgG deposits and circulating autoantibodies. Furthermore, they reported that ACK and BRK variants from the SLE patients impaired the MERTK-mediated anti-inflammatory response to apoptotic cells in human induced pluripotent stem cell (hiPSC)-derived macrophages. This work identified new SLE-associated ACK and BRK variants and a role for the NRTK TNK2/ACK1 and PTK6/BRK in efferocytosis, providing a new molecular and cellular mechanism of SLE pathogenesis.

      Strengths: 

      This work identified new SLE-associated ACK and BRK variants and a role for the NRTK TNK2/ACK1 and PTK6/BRK in efferocytosis, providing a new molecular and cellular mechanism of SLE pathogenesis.

      Weaknesses: 

      Although the manuscript is well-organized and clearly stated, there are some points below that should be considered:

      In this study, the authors used forward genetic analyses to identify novel gene mutations that may cause SLE, combined with GWAS studies of SLE. To further explore the importance of these variants, haplotype analysis of two candidate genes could be performed, to observe the evolution and selection relationship of candidate genes in the population (UK 1000 biobank, for example). 

      To investigate whether ACK1/TNK2 or BRK/PTK6 were subject to selection, we gathered data using different metrics quantifying negative selection in the human genome. We collected the f parameter from SnIPRE (1), LoFtool (2), and EvoTol (3), as well as intraspecies metrics from RVIS (4), LOEUF (5), and pLI (6) (including pRec). We also used our in-house CoNeS metric (7). None of these indicators suggests that the genes are under strong negative selection (Revised Figure 2-figure supplement 2). This is consistent with the deficiency being recessive. We also tested the variants with a MAF greater than 0.005 and found them to be neutral. We therefore did not test whether they were associated with any phenotype in the UK Biobank.

      Although the authors focused on SLE and macrophage efferocytosis in their studies, direct evidence of how macrophage efferocytosis significantly affects SLE is lacking. This point should at least be explicitly introduced and discussed by citing appropriate literature.

      We provide a more detailed description of the role of macrophage efferocytosis in autoimmunity and SLE in the revised manuscript. Specifically, we state (in the results section, paragraph: ACK1 and BRK kinase domain variants may lose the ability to link MERTK to RAC1, AKT and STAT3 activation for efferocytosis): “NRTKs such as ACK1 8 and PTK2/FAK 9 are also downstream targets of the TAM family receptor MERTK which is expressed on macrophages and controls the anti-inflammatory engulfment of apoptotic cells, a process known as efferocytosis 10-12. Efferocytosis allows for the clearance of apoptotic cells before they undergo necrosis and release intracellular inflammatory molecules, and simultaneously leads to increased production of anti-inflammatory molecules (TGFb, IL-10, and PGE2) and a decreased secretion of proinflammatory cytokines (TNF-alpha, IL-1b, IL-6) 10-14. In line with these findings, mice deficient in molecular components used by macrophages to efficiently perform efferocytosis, such as MFG-E8, MERTK, TIM4, and C1q, develop phenotypes associated with autoimmunity 10,11,14-27. Furthermore, defects in efferocytosis are also observed in patients with SLE and glomerulonephritis 14,28-31.”

      It is still not clear how the target molecules identified in this paper may influence macrophage efferocytosis. More direct evidence should be established. 

      Our studies show that wild-type ACK1 and BRK, but not the patient variants, are activated by MERTK, a key receptor that mediates the recognition of apoptotic cells. Our studies also show that the wild-type proteins, but not the variants, activate RAC1, which is necessary for engulfment, and phosphorylate AKT and STAT3, which are involved in the anti-inflammatory response to PtdSer recognition.

      The TAM family receptor MERTK mediates recognition of PtdSer on apoptotic cells via GAS6 and Protein S 10,15,32 leading to their engulfment, which involves activation of RAC1 for actin reorganization and the formation of a phagocytic cup 9,33. Using IP kinase assays we show that MERTK and GAS6 can activate the kinase activity of wild-type ACK1 8 or BRK but not of the patient’s ACK1 or BRK variant alleles (Figure 4D). To further support the role of ACK1 and BRK downstream from PtdSer recognition and uptake of apoptotic cells, we show that reference ACK1 and BRK alleles, in contrast to the patient variant alleles, can activate RAC1 to generate RAC-GTP which is necessary for engulfment 9,33 (Figure 4C).

      PtdSer recognition also typically stimulates an anti-inflammatory process mediated in part via AKT 34 and STAT3 and their target genes such as SOCS3 35-41 and results in the inhibition of LPS-mediated production of inflammatory mediators such as TNF and IL-1b, and the production of cytokines such as IL-10 and TGFb 11,25-27,42. Consistent with this literature and the findings of the paper, we show that reference ACK1 and BRK, unlike the patient’s variant alleles, can phosphorylate AKT and STAT3 (Figure 4A, B). The role of ACK1 and BRK in these signaling pathways is further supported by our transcriptomics data comparing the response of controls, patients, and inhibitor-treated iPSC-derived macrophages to apoptotic thymocytes by RNA-seq. Specifically, we show that transcriptional repressors, including the AKT targets ATF3, TGIF1, NFIL3, and KLF4, the STAT3 targets SOCS3 and DUSP5, as well as CEBPD and the inhibitor of E-box DNA binding ID3, were among the top ten genes whose expression is induced by apoptotic cells in WT macrophages (Figure 4F), but this regulation was lost in mutant and inhibitor-treated macrophages (Figure 4F).

      For some transcriptional repressors mentioned in their studies, the authors should check whether there is clear experimental evidence. If not, it is recommended to supplement the experimental verifications for clarity.

      Transcriptional repressors, including the AKT targets ATF3, TGIF1, NFIL3, and KLF4, the STAT3 targets SOCS3 and DUSP5, as well as CEBPD and the inhibitor of E-box DNA binding ID3, were among the top ten genes whose expression is induced by apoptotic cells in WT macrophages (Figure 4F), but this regulation was lost in mutant and inhibitor-treated macrophages (Figure 4F).

      In the manuscript we cited published evidence, to the best of our knowledge, for the role of these genes in the regulation of inflammatory responses. Specifically we state: “ATF3, TGIF1, NFIL3, and KLF4 are involved in the negative regulation of inflammation in macrophages 35-38, SOCS3 is an inhibitor of the macrophage inflammatory response and DUSP5 is a negative regulator of ERK activation 39,40,43. These data suggest that the kinase domain of ACK1 and BRK contribute to the macrophage anti-inflammatory gene expression program driven by apoptotic cells.”

      In Figures 4C and 4D, it is seen that the usage of inhibitors causes cytoskeletal changes, however this reviewer would not have expected such large change. Did the authors check whether the cells die after heavy treatment by the inhibitors?

      We carefully examined the viability of isogenic WT, BRK, and ACK1 mutant macrophages (left panel) and of WT macrophages treated with ACK1 or BRK inhibitors, and we did not observe changes in viability (Figure 4-figure supplement 2).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A crucial step in the development of SLE is the production of autoantibodies. It is shown in Figure 2F that inhibitors of ACK1/BRK enhanced the production of autoantibodies against histones and SSA in a pristane-induced SLE model, which is a significant result that could support the authors' claim. Strangely, this autoantigen panel does not include double-stranded DNA, RNP, or Sm, which should be presented regarding antibody production.

      We thank the reviewer for this comment. In the revised manuscript (Revised Figure 3 – Supplement 1) we added the remainder of the autoantibody panel, which includes double-stranded DNA, RNP, and Sm autoantibody levels. We also added the results for serum IgG autoantibody levels in BALB/cByJ mice treated for three months with DMSO, ACK1, or BRK inhibitors but did not receive a pristane injection (Revised Figure 3A). This data shows that mice which received ACK1 or BRK inhibitors had increased serum IgG autoantibodies in comparison to DMSO treated controls.

      Additionally, if there is information that inhibitors of ACK1/BRK promote the differentiation of follicular helper T cells, memory B cells, and plasma cells in a pristane-induced SLE model, it could be considered indirect evidence supporting the authors' claims.

      These are not available at present to the best of our knowledge.

      Reviewer #2 (Recommendations For The Authors):

      Minor points:

      * In the literature, unpaired t-tests and ordinary one-way ANOVA (Tukey's multiple comparisons test) were used for statistical analysis, which requires data to be normally distributed. This part of the proposal is reflected in the text, and the non-conforming results need to be statistically analyzed using the non-parametric test of graphpad prism.

      We would like to thank the reviewer for pointing out this oversight. In the revised manuscript, for all applicable datasets, we tested whether the data was normally distributed using a Shapiro-Wilk normality test. For datasets that were normally distributed statistical significance was determined by a Student t test or ordinary one-way ANOVA with Tukey’s multiple comparisons test depending on the number of conditions being compared and the experimental setup. In contrast, for datasets that were not normally distributed statistical significance was determined using a Mann-Whitney, Kruskal-Wallis multiple comparisons tests, or Wilcoxon matched-pairs signed rank test depending on the experimental setup. P values below 0.05 were considered significant for all statistical tests.
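
      The decision procedure described above can be sketched as a small function. This is an illustration under stated assumptions rather than the analysis as it was run: the function names are hypothetical, the normality flag is assumed to come from a separate Shapiro-Wilk test on each dataset, and "experimental setup" is reduced here to a single paired/unpaired switch, which is our simplification.

```python
def choose_test(n_groups, all_normal, paired=False):
    """Name the significance test for one comparison.

    n_groups   -- number of conditions being compared (at least 2)
    all_normal -- True if every dataset passed the normality test
    paired     -- True for matched measurements (assumed meaning of
                  "experimental setup"; used for non-normal data only)
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if all_normal:
        if n_groups == 2:
            return "Student t test"
        return "one-way ANOVA with Tukey's multiple comparisons test"
    if n_groups == 2:
        if paired:
            return "Wilcoxon matched-pairs signed rank test"
        return "Mann-Whitney test"
    return "Kruskal-Wallis test"


def is_significant(p_value, alpha=0.05):
    """P values below 0.05 are considered significant."""
    return p_value < alpha


# Example: three non-normally distributed conditions.
test_name = choose_test(n_groups=3, all_normal=False)
```

      Keeping the choice of test in one place makes it easy to confirm that every figure panel followed the same rule.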

      The authors used different methods to represent the level of significant difference. Therefore, it is suggested that the significance level should be expressed by letters. 

      As suggested by the reviewer, in the revised manuscript we have designated the significance level throughout all figures using letters (p, or q values).

      For RNA-seq, more information should be provided in the paper. For example, the correlation between sample biological replicates, the total number of differentially expressed genes, and randomly selected genes for qRT-PCR results verification.

      We would like to thank the reviewer for pointing out this oversight. In the revised manuscript we provided more information regarding the RNA-seq dataset, including a Principal Component Analysis (PCA) showing correlation between sample replicates (Revised Figure 4-figure supplement 1A), as well as a table indicating the number of upregulated and downregulated genes between relevant datasets (Revised Figure 4-figure supplement 1B).

      The results of the RNA-seq analysis indicated that ACK1 and BRK contribute to the macrophage anti-inflammatory gene expression program driven by apoptotic cells. MERTK-dependent anti-inflammatory program elicited by apoptotic cells on macrophages is best evidenced by the reduction of LPS-mediated production of inflammatory mediators such as TNF or IL1b 25-27,34,44. Therefore, to validate the RNA-seq results in a functional manner we tested the decrease of LPS-induced production of TNF and IL1b by apoptotic cells in isogenic WT, ACK1 deficient, and BRK deficient macrophages. Consistent with the RNA-seq data, the functional assays indicated that ACK1 and BRK kinase activities are required for the decrease of TNF and IL1b production induced by LPS in response to apoptotic cells (Revised Figure 4H,I).

      The raw data files for the RNA-seq analysis have been deposited in the NCBI Gene Expression Omnibus under accession number GEO: GSE118730.

      The authors did not have the formats for some of the citations correct. This should be fixed. 

      References were reformatted.

      (1) Eilertson, K. E., Booth, J. G. & Bustamante, C. D. SnIPRE: selection inference using a Poisson random effects model. PLoS Comput Biol 8, e1002806 (2012). https://doi.org:10.1371/journal.pcbi.1002806

      (2) Fadista, J., Oskolkov, N., Hansson, O. & Groop, L. LoFtool: a gene intolerance score based on loss-of-function variants in 60 706 individuals. Bioinformatics 33, 471-474 (2017). https://doi.org:10.1093/bioinformatics/btv602

      (3) Rackham, O. J., Shihab, H. A., Johnson, M. R. & Petretto, E. EvoTol: a protein-sequence based evolutionary intolerance framework for disease-gene prioritization. Nucleic Acids Res 43, e33 (2015). https://doi.org:10.1093/nar/gku1322

      (4) Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet 9, e1003709 (2013). https://doi.org:10.1371/journal.pgen.1003709

      (5) Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434-443 (2020). https://doi.org:10.1038/s41586-020-2308-7

      (6) Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285-291 (2016). https://doi.org:10.1038/nature19057

      (7) Rapaport, F. et al. Negative selection on human genes underlying inborn errors depends on disease outcome and both the mode and mechanism of inheritance. Proc Natl Acad Sci U S A 118 (2021). https://doi.org:10.1073/pnas.2001248118

      (8) Mahajan, N. P., Whang, Y. E., Mohler, J. L. & Earp, H. S. Activated tyrosine kinase Ack1 promotes prostate tumorigenesis: role of Ack1 in polyubiquitination of tumor suppressor Wwox. Cancer Res 65, 10514-10523 (2005). https://doi.org:10.1158/0008-5472.CAN-05-1127

      (9) Wu, Y., Singh, S., Georgescu, M. M. & Birge, R. B. A role for Mer tyrosine kinase in alphavbeta5 integrin-mediated phagocytosis of apoptotic cells. J Cell Sci 118, 539-553 (2005). https://doi.org:10.1242/jcs.01632

      (10) Scott, R. S. et al. Phagocytosis and clearance of apoptotic cells is mediated by MER. Nature 411, 207-211 (2001). https://doi.org:10.1038/35075603

      (11) Henson, P. M. & Bratton, D. L. Antiinflammatory effects of apoptotic cells. J Clin Invest 123, 2773-2774 (2013). https://doi.org:10.1172/JCI69344

      (12) Henson, P. M. Cell Removal: Efferocytosis. Annu Rev Cell Dev Biol 33, 127-144 (2017). https://doi.org:10.1146/annurev-cellbio-111315-125315

      (13) deCathelineau, A. M. & Henson, P. M. The final step in programmed cell death: phagocytes carry apoptotic cells to the grave. Essays Biochem 39, 105-117 (2003). https://doi.org:10.1042/bse0390105

      (14) Nagata, S. Apoptosis and Clearance of Apoptotic Cells. Annu Rev Immunol 36, 489-517 (2018). https://doi.org:10.1146/annurev-immunol-042617-053010

      (15) Cohen, P. L. et al. Delayed apoptotic cell clearance and lupus-like autoimmunity in mice lacking the c-mer membrane tyrosine kinase. J Exp Med 196, 135-140 (2002). https://doi.org:10.1084/jem.20012094

      (16) Hanayama, R. et al. Autoimmune disease and impaired uptake of apoptotic cells in MFG-E8-deficient mice. Science 304, 1147-1150 (2004). https://doi.org:10.1126/science.1094359

      (17) Miyanishi, M., Segawa, K. & Nagata, S. Synergistic effect of Tim4 and MFG-E8 null mutations on the development of autoimmunity. Int Immunol 24, 551-559 (2012). https://doi.org:10.1093/intimm/dxs064

      (18) Colonna, L., Parry, G. C., Panicker, S. & Elkon, K. B. Uncoupling complement C1s activation from C1q binding in apoptotic cell phagocytosis and immunosuppressive capacity. Clin Immunol 163, 84-90 (2016). https://doi.org:10.1016/j.clim.2015.12.017

      (19) Nagata, S., Hanayama, R. & Kawane, K. Autoimmunity and the clearance of dead cells. Cell 140, 619-630 (2010). https://doi.org:10.1016/j.cell.2010.02.014

      (20) Kimani, S. G. et al. Contribution of Defective PS Recognition and Efferocytosis to Chronic Inflammation and Autoimmunity. Front Immunol 5, 566 (2014). https://doi.org:10.3389/fimmu.2014.00566

      (21) Hanayama, R., Tanaka, M., Miwa, K., Shinohara, A., Iwamatsu, A. & Nagata, S. Identification of a factor that links apoptotic cells to phagocytes. Nature 417, 182-187 (2002). https://doi.org:10.1038/417182a

      (22) Kawano, M. & Nagata, S. Lupus-like autoimmune disease caused by a lack of Xkr8, a caspase-dependent phospholipid scramblase. Proc Natl Acad Sci U S A 115, 2132-2137 (2018). https://doi.org:10.1073/pnas.1720732115

      (23) Watanabe-Fukunaga, R., Brannan, C. I., Copeland, N. G., Jenkins, N. A. & Nagata, S. Lymphoproliferation disorder in mice explained by defects in Fas antigen that mediates apoptosis. Nature 356, 314-317 (1992). https://doi.org:10.1038/356314a0

      (24) Singer, G. G., Carrera, A. C., Marshak-Rothstein, A., Martinez, C. & Abbas, A. K. Apoptosis, Fas and systemic autoimmunity: the MRL-lpr/lpr model. Current opinion in immunology 6, 913-920 (1994).

      (25) Cvetanovic, M. & Ucker, D. S. Innate immune discrimination of apoptotic cells: repression of proinflammatory macrophage transcription is coupled directly to specific recognition. J Immunol 172, 880-889 (2004). https://doi.org:10.4049/jimmunol.172.2.880

      (26) Fadok, V. A., Bratton, D. L., Konowal, A., Freed, P. W., Westcott, J. Y. & Henson, P. M. Macrophages that have ingested apoptotic cells in vitro inhibit proinflammatory cytokine production through autocrine/paracrine mechanisms involving TGF-beta, PGE2, and PAF. J Clin Invest 101, 890-898 (1998). https://doi.org:10.1172/JCI1112

      (27) Voll, R. E., Herrmann, M., Roth, E. A., Stach, C., Kalden, J. R. & Girkontaite, I. Immunosuppressive effects of apoptotic cells. Nature 390, 350-351 (1997). https://doi.org:10.1038/37022

      (28) Herrmann, M., Voll, R. E., Zoller, O. M., Hagenhofer, M., Ponner, B. B. & Kalden, J. R. Impaired phagocytosis of apoptotic cell material by monocyte-derived macrophages from patients with systemic lupus erythematosus. Arthritis Rheum 41, 1241-1250 (1998). https://doi.org:10.1002/1529-0131(199807)41:7<1241::AID-ART15>3.0.CO;2-H

      (29) Baumann, I. et al. Impaired uptake of apoptotic cells into tingible body macrophages in germinal centers of patients with systemic lupus erythematosus. Arthritis Rheum 46, 191-201 (2002). https://doi.org:10.1002/1529-0131(200201)46:1<191::AID-ART10027>3.0.CO;2-K

      (30) Schrijvers, D. M., De Meyer, G. R. Y., Kockx, M. M., Herman, A. G. & Martinet, W. Phagocytosis of apoptotic cells by macrophages is impaired in atherosclerosis. Arterioscl Throm Vas 25, 1256-1261 (2005). https://doi.org:10.1161/01.ATV.0000166517.18801.a7

      (31) Morioka, S., Maueroder, C. & Ravichandran, K. S. Living on the Edge: Efferocytosis at the Interface of Homeostasis and Pathology. Immunity 50, 1149-1162 (2019). https://doi.org:10.1016/j.immuni.2019.04.018

      (32) Seitz, H. M., Camenisch, T. D., Lemke, G., Earp, H. S. & Matsushima, G. K. Macrophages and dendritic cells use different Axl/Mertk/Tyro3 receptors in clearance of apoptotic cells. J Immunol 178, 5635-5642 (2007). https://doi.org:10.4049/jimmunol.178.9.5635

      (33) Mao, Y. & Finnemann, S. C. Regulation of phagocytosis by Rho GTPases. Small GTPases 6, 89-99 (2015). https://doi.org:10.4161/21541248.2014.989785

      (34) Sen, P. et al. Apoptotic cells induce Mer tyrosine kinase-dependent blockade of NF-kappaB activation in dendritic cells. Blood 109, 653-660 (2007). https://doi.org:10.1182/blood-2006-04-017368

      (35) Vergadi, E., Ieronymaki, E., Lyroni, K., Vaporidi, K. & Tsatsanis, C. Akt Signaling Pathway in Macrophage Activation and M1/M2 Polarization. J Immunol 198, 1006-1014 (2017). https://doi.org:10.4049/jimmunol.1601515

      (36) Byles, V. et al. The TSC-mTOR pathway regulates macrophage polarization. Nat Commun 4, 2834 (2013). https://doi.org:10.1038/ncomms3834

      (37) Liao, X. et al. Kruppel-like factor 4 regulates macrophage polarization. J Clin Invest 121, 2736-2749 (2011). https://doi.org:10.1172/JCI45444

      (38) Roberts, A. W., Lee, B. L., Deguine, J., John, S., Shlomchik, M. J. & Barton, G. M. Tissue-Resident Macrophages Are Locally Programmed for Silent Clearance of Apoptotic Cells. Immunity 47, 913-927 e916 (2017). https://doi.org:10.1016/j.immuni.2017.10.006

      (39) Matsukawa, A. et al. Stat3 in resident macrophages as a repressor protein of inflammatory response. J Immunol 175, 3354-3359 (2005).

      (40) Sica, A. & Mantovani, A. Macrophage plasticity and polarization: in vivo veritas. J Clin Invest 122, 787-795 (2012). https://doi.org:10.1172/JCI59643

      (41) Yi, Z., Li, L., Matsushima, G. K., Earp, H. S., Wang, B. & Tisch, R. A novel role for c-Src and STAT3 in apoptotic cell-mediated MerTK-dependent immunoregulation of dendritic cells. Blood 114, 3191-3198 (2009). https://doi.org:10.1182/blood-2009-03-207522

      (42) Rothlin, C. V., Carrera-Silva, E. A., Bosurgi, L. & Ghosh, S. TAM receptor signaling in immune homeostasis. Annu Rev Immunol 33, 355-391 (2015). https://doi.org:10.1146/annurev-immunol-032414-112103

      (43) Seo, H. et al. Dual-specificity phosphatase 5 acts as an anti-inflammatory regulator by inhibiting the ERK and NF-kappaB signaling pathways. Sci Rep 7, 17348 (2017). https://doi.org:10.1038/s41598-017-17591-9

      (44) Camenisch, T. D., Koller, B. H., Earp, H. S. & Matsushima, G. K. A novel receptor tyrosine kinase, Mer, inhibits TNF-alpha production and lipopolysaccharide-induced endotoxic shock. J Immunol 162, 3498-3503 (1999).

    1. Author Response:

      Reviewer #4 (Public Review):

      In this work, Tee et al. study the implications of Heparan Sulfate (HS) binding mutations observed on the Enterovirus A71 (EV-A71) capsid. HS-binding mutations are observed for several virus infections and are often presumed to be a cell culture adaptation. However, in the case of EV-A71, the presence of HS-binding mutations in clinical samples and the contradictory findings in animal studies have made the clinical relevance of HS-binding a subject of debate. Therefore, to better understand the role of HS-binding in EV-A71, the authors use a mouse-adapted EV-A71 variant (MP4) and compare it to a cell-adapted strong HS-binder (MP4-97R/167G). Using these two variants, the authors show that the strong HS-binder does not require acidification for uncoating and genome release. Furthermore, it is demonstrated that the capsid stability of the HS-binding variant is compromised, resulting in pH-independent uncoating. Overall, this study provides new insights demonstrating that seemingly beneficial mutations increasing viral replication may be counterbalanced by other unintended consequences.

      Strengths:

      The thoroughness of the experiments performed to demonstrate that the HS-binding phenotype results in pH-independent entry and capsid destabilisation is worth highlighting. In this regard, the authors have explored viral entry using a range of approaches involving lysosomotropic drugs, viral binding assays, and neutral red-labelled viruses coupled with diverse techniques such as FISH, RNAscope, and transient expression of constitutively active molecules to inhibit parts of the viral cycle. In my opinion, this is necessary to rule out the other downstream effects of the lysosomotropic drugs and to confirm the role of the HS-binding mutation in the entry phase. The use of in silico analysis coupled with negative staining electron microscopy and environmental challenge assays is notable. Finally, the demonstration of some of the work using a human-relevant strain is commendable.

      We appreciate the reviewer's recognition of the significance of our study and the valuable advice.

      Weaknesses:

      A major weakness in this study is the focus on using a mouse-adapted EV-A71 strain (MP4). In the introduction, it is argued that HS-binding mutations are controversial due to their occurrence in cell culture. However, due to host limitations, mice are not the natural hosts for EV-A71 and thus, the same argument can be made for a mouse-adapted strain. It is not clear how different this strain is from circulating EV-A71 strains and the relevance of these findings to the human situation is questionable. This is particularly made evident in the discussion where it is highlighted that HS-binding variants (VP1-145G/Q mutants) have been associated with severe neurological cases while the same variants show attenuated phenotypes in mice and monkeys. This contrast between clinical data and animal studies should be highlighted in the introduction, rather than later in the discussion, as currently the in vivo animal studies are presented as the optimal situation and may lead to misconstrued conclusions from the results.

      As requested by the reviewer, we included new experiments performed with a clinical strain isolated from an immunosuppressed patient (Cordey et al., 2012). We compared the HCQ sensitivity of this human strain with and without the VP1 L97R and E167G mutations and confirmed a differential sensitivity to HCQ similar to that observed with the MP4 variant. This result is presented as a new supplementary figure (Figure 6-figure supplement 1) and is described in the results section of the revised manuscript (Page 7, lines 251).

      Page 7, lines 251: To determine if our observations are applicable to human strains, we examined the sensitivity of a closely related clinical strain. This strain was isolated from the respiratory tract of an immunosuppressed patient with a disseminated EV-A71 infection27. Additionally, we tested a strong HS-binding derivative that harbors the same VP1-L97R and E167G mutations as our MP4 double mutant. Notably, this human clinical strain shares 98.3% amino acid similarity with the MP4 variant used in this study and exhibits similar HS-binding phenotypes28. As shown in Figure 6-figure supplement 1, the original human strain was inhibited by HCQ, whereas the double mutant exhibited insensitivity to the drug.

      We also added the comment about discrepancy between clinical data and animal studies in the introduction as requested (page 2, lines 69-76): However, epidemiological surveillance of human EV-A71 infections19-21 and experimental evidence from 2D human fetal intestinal models22, human airway organoids23 and air-liquid interface cultures24 suggest that HS binding may enhance viral replication and virulence in humans. In addition, recent research has shown that EV-A71 can be released and transmitted via cellular extrusions25 or exosomes26, potentially preventing viral trapping of HS-binding strains in the circulation. Further studies are required to evaluate the true impact of HS-binding mutations on the spread and virulence of EV-A71 in both animal models and humans.

      An important consideration is that the results are based primarily on image analysis. The inclusion of RT-qPCR and/or plaque assays as supplementary data will help strengthen the findings.

      We have performed RT-qPCR to confirm the immunostaining data and included them in the supplementary data (Figure 1-figure supplement 1E). Reference to these data is made in the result section [Page 4, lines 114-116: These results were confirmed by viral load quantification with real-time RT-PCR (Figure 1-figure supplement 1E).]

      Moreover, there are suggestions of an intermediate binder having a different phenotype. As this intermediate binder is the clinical phenotype, data on the entry of this intermediate binder will be valuable.

      While we agree with the reviewer that the single mutant is an intermediate binder and exhibits a clinical phenotype, we made the decision to work with variants that display clear phenotypes, selecting MP4 and the double mutant, as the latter is fully attenuated in both immunocompetent and immunosuppressed mice (Weng et al., 2023). Additionally, we performed an experiment using HCQ, where we observed an intermediate effect with the single mutant. This further confirmed our decision to proceed with MP4 and the double mutant for all experiments. The data supporting this are shown in Author response image 1, which we are sharing exclusively with the reviewer.

      Author response image 1.

      Differential sensitivity of MP4, MP4-97R and MP4-97R167G to lysosomotropic drugs

      Another weakness in the study is the lack of contextualization of the results to current EV-A71 literature. For instance, SCARB2 is referred to as the internalization receptor but a recent study has shown that SCARB2 is not required for internalization (https://doi.org/10.1128%2Fjvi.02042-21). The findings from this study are consistent with the localization of SCARB2 in the lysosomal membranes. Furthermore, the same study has highlighted host sulfation as a key factor in EV-A71 entry. Post-translational sulfation introduces negatively charged residues on host proteins including HS and SCARB2. This increases the binding of HS-binding strains to these proteins. In this regard, the reduced infectivity upon soluble SCARB2 treatment may simply be due to enhanced binding rather than capsid opening as suggested in the results. Therefore, additional experiments (e.g. nSEM following soluble SCARB2 treatment) must be performed to support the conclusion of capsid opening, due to inherent instability, upon SCARB2 binding.

      We apologize for not citing this relevant literature excluding the role of SCARB2 in viral attachment. We have now included these references in the revised version of the manuscript (Page 2, lines 54-56: “Since SCARB2 is mostly localized on endosomal and lysosomal membrane and sparsely on plasma membrane3,5, it seems to play only a minor role in EV-A71 cell attachment6,7.”).

      We thank the reviewer for mentioning the possibility that the sulfation of SCARB2 may enhance its binding to the mutated virus compared to the wild-type virus, potentially explaining the selective competitive inhibition of this variant by soluble SCARB2 produced in mammalian cells. To investigate this hypothesis, we performed nsEM imaging of the double mutant incubated with soluble SCARB2 and we observed an increase in the proportion of empty capsids in the presence of soluble SCARB2 (4% versus 0.7%), supporting our original findings that the inactivation is indeed associated with capsid opening. The results are included in the revised manuscript in Figure 5-figure supplement 4 and described on Page 7, lines 243-245: “However, the double mutant exhibited a ~5-fold increase in empty capsid percentage after treatment with sSCARB2 (Figure 5-figure supplement 4), consistent with the functional data above.”

      In addition to the above, other existing literature on EV-A71 pathogenesis using organoids contradicts some of the explanations of differential phenotype in clinical observations versus mice models. In the introduction, it is suggested that reduced neurovirulence of HS-binding strains is due to binding to the vascular endothelia. However, the correlation of clinical severity to viremia (https://doi.org/10.1186/1471-2334-14-417) and the association of HS-binding mutants to clinical disease counteract this suggestion. Similarly, viral infection in human organoids with EV-A71 results in as low as 0.4% of the cells being infected (https://doi.org/10.1038/s41564-023-01339-5). In this case, if viral binding to (ubiquitously expressed) HS results in viral trapping then the HS-binding mutants should show lowered infectivity in organoid models rather than the observed higher infectivity (https://doi.org/10.3389/fmicb.2023.1045587, https://doi.org/10.1038/s41426-018-0077-2). Finally, EV-A71 release has also been shown to occur in exosomes (https://doi.org/10.1093%2Finfdis%2Fjiaa174) which effectively provides a protective lipid membrane. These recent findings must be incorporated into the article and will help better contextualize their findings.

      We appreciate the reviewer's thoughtful comments. We do not believe that the correlation between clinical severity and viremia contradicts the viral trapping hypothesis. For strains that do not bind to HS, the absence of viral trapping could indeed lead to higher viral concentrations in the bloodstream, potentially increasing neurovirulence. However, we agree with the reviewer that other observations in humans, along with experimental data from more relevant models such as organoids, challenge the trapping hypothesis. We are grateful for the suggested citations and have incorporated these references in the introduction, where we discuss this point in more detail:

      Page 2, lines 69-76: “However, epidemiological surveillance of human EV-A71 infections19-21 and experimental evidence from 2D human fetal intestinal models22, human airway organoids23 and air-liquid interface cultures24 suggest that HS binding may enhance viral replication and virulence in humans. In addition, recent research has shown that EV-A71 can be released and transmitted via cellular extrusions25 or exosomes26, potentially preventing viral trapping of HS-binding strains in the circulation. Further studies are required to evaluate the true impact of HS-binding mutations on the spread and virulence of EV-A71 in both animal models and humans.”

      Overall, the authors present new findings with convincing methodology. The manuscript can be improved in the contextualization of the findings and highlighting the weakness in translating these findings to resolve the debate surrounding the relevance of HS-binding phenotype. The inclusion of additional experiments and data recommended to the authors will also help strengthen the manuscript.

    1. Author Response:

      eLife Assessment

      This manuscript makes an important contribution to the understanding of protein-protein interaction (PPI) networks by challenging the widely held assumption that their degree distributions uniformly follow a power law. The authors present convincing evidence that biases in study design, such as data aggregation and selective research focus, may contribute to the appearance of power-law-like distributions. While the power law assumption has already been questioned in network biology, the methodological rigor and correction procedures introduced here are valuable for advancing our understanding of PPI network structure.

      Thanks for this assessment which perfectly reflects our study.

      Reviewer #1 (Public Review):

      This manuscript was previously reviewed and this earlier evaluation resulted in two conflicting assessments. I fully endorse the favourable opinion of former Reviewer 1 and find most negative comments of former Reviewer 2 inappropriate.

      This work is absolutely necessary. Even though the authors find it difficult to be fully assertive in the end, their ground work in trying to demonstrate the existence of bias in PPI data is undeniably valuable. Other authors have tried before to show the limitation of unequivocally assigning the degree distribution to a power law but these doubts have had a weak impact. This new study is a great opportunity to discuss further a concern for a simplistic view of PPI network topology. The recent contribution of Broido & Clauset was definitely one to bounce on. The approach of this new manuscript is compelling. Dividing the study in several parts, each reflecting an attempt to bring out commonly used shortcuts in PPI network analyses, makes sense.

      Surprisingly, the authors do not refer to the endless controversy of labeling hubs as party or date, which is another manifestation of the interpretative bias of PPI data.

      This is a good point. In particular, it may be interesting if hub nodes that emerge from considering only prey interactions differ regarding party and date nodes. We now refer to this distinction in the Discussion:

      “[...] Further work will be needed to establish if true hub proteins exist in the PPI network and what their role is. For instance, it was previously claimed (Han et al., 2004) – and controversially discussed (Agarwal et al., 2010) – that the correlation of gene expression values between hub nodes with their interaction partners follows a bimodal distribution, leading to the distinction of party (high correlation) and date (low correlation) hubs. In the future, it would be interesting to study if the ratio of party and date hubs changes when considering prey degree only.”

      The only worthy point prompted by former Reviewer 2 is the effect of spoke expansion. In their response, the authors suggest that it would probably extend questioning and even if it is considered as future work, it could be mentioned in the main manuscript.

      Thank you for this comment. We agree that considering different expansion methods is an interesting research question regarding its effect on the PL property. We have added the following sentences to the Discussion to highlight the opportunity for future work:

      “[...] An additional complexity arising in AP-MS studies is that more than two interaction partners can be detected. These n-ary interactions are commonly transformed into binary interactions using either the spoke model, which reports all interactions with the bait protein (as used by IntAct, for example), or the matrix expansion model, which reports all pairwise interactions. Both expansion models can, in principle, introduce false positives and it would be interesting to consider the effect of expansion model choice on the PL property in future work.”
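
      The difference between the two expansion models described above can be illustrated with a minimal Python sketch. The function names and the example pull-down (bait "A" with preys "B", "C" and "D") are hypothetical and are not taken from the manuscript or from IntAct; the sketch only shows how the same purification yields different binary edges and node degrees under each model.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Spoke model: report only bait-prey pairs (hypothetical helper)."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_expansion(bait, preys):
    """Matrix model: report every pair among all co-purified proteins."""
    members = {bait, *preys}
    return {frozenset(pair) for pair in combinations(sorted(members), 2)}


def degrees(edges):
    """Count how many binary interactions each protein takes part in."""
    counts = {}
    for edge in edges:
        for protein in edge:
            counts[protein] = counts.get(protein, 0) + 1
    return counts


# Hypothetical AP-MS purification: one bait and three detected preys.
spoke = spoke_expansion("A", ["B", "C", "D"])
matrix = matrix_expansion("A", ["B", "C", "D"])

print(len(spoke), len(matrix))
print(degrees(spoke))
print(degrees(matrix))
```

      In this toy case the spoke model concentrates degree on the bait, whereas the matrix model gives every co-purified protein the same degree, which is one reason the choice of expansion model could plausibly affect the observed degree distribution.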

      In the end, this submission is an invitation to constructively rethink the analysis of PPI networks and it feeds the discussion on modelling degree distributions that should not be considered as a solved issue.

      Reviewer #2 (Public Review):

      Many naturally occurring networks are assumed to have a power-law (PL) degree distribution. This assumption has certainly been widely held in the field of protein interactomes (PPIs), although important studies around 2010 have conclusively shown that many of these PL distributions are either the result of data mis-handling or of sloppy statistical procedures (see e.g. Porter and Stumpf in Science around 2014, which I would advise the authors to cite). The value of the present study is to introduce a new mechanism, experiment bias, to explain the appearance of such distributions in the PPI case, and in particular to show how correcting empirically for this mechanism can lead to a reappraisal of which proteins are genuine hubs in these networks. The claims are well supported by empirical evidence and some theoretical analysis. Overall, this is a worthwhile contribution and, while its significance is somewhat dented by the fact that the PL enthusiasm of many had already been tempered by the studies mentioned above,

      Thanks a lot for your constructive feedback. We now cite the work by Porter and Stumpf and have addressed your specific recommendations as detailed below.

      Reviewer #3 (Public Review):

      I would like to congratulate the authors on an impressive piece of work highlighting important real and potential biases, which may lead to power-law distributed node degrees in protein-protein interaction networks. This manuscript is easy to follow and very well written. I truly enjoyed the concise and convincing scientific presentation. Even if some of the concerns have already been discussed or raised in the past, the manuscript assesses potential biases in PPIs in a rigorous manner.

      I deem the following observations highly relevant to be communicated to the community again:

      (1) PL-like distributions emerge by aggregation of data sets alone.

      (2) Research interest in itself is PL-distributed and drives PL-like properties in PPI networks.

      (3) Bait usage is a major driver of PL-like behaviour.

      (4) Accounting for biases changes the biological interpretation of the networks.

      (5) Simulation studies further corroborate these findings.

      Thank you for this positive assessment of our work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewing editor:

      The biological significance of the results presented in this manuscript is the potential absence of active sequestration mechanisms in certain species, leading to variation in their ability to transport and store specific compounds, such as alkaloids. The concept of passive accumulation is introduced as an evolutionary intermediate between toxin consumption and sequestration.

      I agree with the reviewers' comments on the limitations of the current manuscript. Additionally, I'd like to raise a point about combining data from LC/MS and GC/MS as these techniques have different sensitivities. GC-MS excels in annotation, allowing for confident identification of detected compounds. However, it may have limitations in the number of extractable substances. Conversely, LC-MS/MS offers a broader range of detectable substances, but annotation can be more challenging. While methods to bridge this gap exist, the current approach might not fully account for the potential influence of the analysis equipment on the observed differences in alkaloid numbers between the Texas and Panama samples analyzed by LC-MS/MS. To address this, consider including data from both methods (if possible) to gain a more comprehensive understanding of the alkaloid profiles. Alternatively, analyzing the Texas and Panama samples with GC-MS could be considered for a more focused comparison with the other samples.

      Thank you for the suggestion. Unfortunately, we do not have GC-MS data for the Texas and Panama samples. While the strength of these two datasets is that they present two independent lines of data corroborating that “undefended” frogs have detectable alkaloid levels, we have more explicitly made clear for readers that the datasets should not be compared directly. We reviewed the text to check that we carefully acknowledge in the manuscript the higher sensitivity of our LC-MS assay, and we added more detail about the differences between the two assay types (section 4d): “The UHPLC-HESI-MSMS pipeline used on the samples from Panama and Texas allows for higher sensitivity to detect a broader array of compounds compared to our GC-MS methods, but has lower retention-time resolution and produces less reliable structural predictions. Furthermore, due to the lack of liquid-chromatography-derived references for poison-frog alkaloids, precise alkaloid annotations from the UHPLC-HESI-MSMS dataset could not be obtained. Therefore, the UHPLC-HESI-MSMS and GC-MS datasets are not directly comparable, and UHPLC-HESI-MSMS data are not included in Fig. 2”. We have also revised the asterisk accompanying the table to further reinforce that alkaloid numbers between the two assay types should not be compared. It now states: “Note that the UHPLC-HESI-MS/MS and GC-MS assays differed in both instrument and analytical pipeline, so “Alkaloid Number” values from the two assay types should not be compared to each other directly”. We further point out differences between the two assay types in section 2b: “Similarly, the analysis of UHPLC-HESI-MS/MS data was untargeted, and thus enables a broader survey of chemistry compared to that from prior GC-MS studies.”

      Finally, we point out that the output from the analytical pipeline for UHPLC-HESI-MSMS annotates compounds as “alkaloids,” using broader criteria than the targeted GC-MS component of our study. In an effort to make the datasets more comparable, at least conceptually, we now include an assessment of which alkaloids identified by UHPLC-HESI-MSMS match known molecular formulae and structural classes in frogs (see Table S6 and revised text on lines 335-343 and 410-415).

      Reviewer #1 (Public Review):

      This is a very relevant study, clearly with the potential of having a high impact on future research on the evolution of chemical defense mechanisms in animals. The authors present a substantial number of new and surprising experimental results, i.e., the presence in low quantities of alkaloids in amphibians previously deemed to lack these toxins. These data are then combined with literature data to weave the importance of passive accumulation mechanisms into a 4-phases scenario of the evolution of chemical defense in alkaloid-containing poison frogs.

      In general, the new data presented in the manuscript are of high quality and high scientific interest, the suggested scenario compelling, and the discussion thorough. Also, the manuscript has been carefully prepared with a high quality of illustrations and very few typos in the text. Understanding that the majority of dendrobatid frogs, including species considered undefended, can contain low quantities of alkaloids in their skin provides an entirely new perspective to our understanding of how the amazing specializations of poison frogs evolved. Although only a few non-dendrobatids were included in the GCMS alkaloid screening, some of these also included minor quantities of alkaloids, and the capacity of passive alkaloid accumulation may therefore characterize numerous other frog clades, or even amphibians in general.

      Thank you for the kind evaluation.

      While the overall quality of the work is exceptional, major changes in the structure of the submitted manuscript are necessary to make it easier for readers to disentangle scope, hypotheses, evidence and newly developed theories.

      Based on reviewer comments, we revised the manuscript structure substantially to make the different aspects of the paper more readily identifiable to readers. Specifically, we moved the content of Figure 2 into a new section in the introduction. We also added more introductory text to better introduce the main ideas of the new model and to summarize the scope and aim of the paper. We reorganized the result section headings and moved Figure 1 (now Fig. 3) down into section 2c.

      Reviewer #2 (Public Review):

      Summary:

      This was a well-executed and well-written paper. The authors have provided important new datasets that expand on previous investigations substantially. The discovery that changes in diet are not so closely correlated with the presence of alkaloids (based on the expanded sampling of non-defended species) is important, in my opinion.

      Strengths:

      Provision of several new expanded datasets using cutting edge technology and sampling a wide range of species that had not been sampled previously. A conceptually important paper that provides evidence for the importance of intermediate stages in the evolution of chemical defense and aposematism.

      Thank you for the kind comments.

      Weaknesses:

      There were some aspects of the paper that I thought could be revised. One thing I was struck by is the lack of discussion of the potentially negative effects of toxin accumulation, and how this might play out in terms of different levels of toxicity in different species.

      Thank you for the suggestion. We now explicitly address the possible negative effects of toxin accumulation and how costs may play out with respect to varying levels of chemical defense among different organisms, including poison frogs. We note early on that, “short-term alkaloid feeding experiments (e.g., Daly et al., 1994; Sanchez et al., 2019) demonstrate that both defended and undefended dendrobatids can survive the immediate effects of alkaloid intake, although the degree of resistance and the alkaloids that different species can resist vary” (section 2c), and we address the sparse literature suggesting some species-level variation in alkaloid resistance in frogs. Later, we make the point that, “origins of chemical defenses are also shaped by the cost of resisting and accumulating toxins, which can change over evolutionary time as animals adapt to novel relationships with toxins” (section 2d). We broadly discuss costs of target-site resistance, a common mode of molecular resistance in poison frogs and other animals, and compensatory molecular adaptations that offset the costs. We also discuss examples from the literature of negative effects of high levels of resistance and toxin accumulation that are not completely offset. We also note that to the best of our knowledge, potential lifetime fitness costs to alkaloid consumption by dendrobatids have not been evaluated.

      Further, are there aspects of ecology or evolutionary history that might make some species less vulnerable to the accumulation of toxins than others? This could be another factor that strongly influences the ultimate trajectory of a species in terms of being well-defended. I think the authors did a good job in terms of describing mechanistic factors that could affect toxicity (e.g. potential molecular mechanisms) but did not make much of an attempt to describe potential ecological factors that could impact trajectories of the evolution of toxicity. This may have been done on purpose (to avoid being too speculative), but I think it would be worth some consideration.

      We agree that other factors can influence the trajectory of chemical defense. We incorporated these ideas into the new section 2d, which provides a somewhat brief overview of ecological factors that could influence the origins of chemical defense, the physiological costs of toxin resistance and accumulation, and some of the possible eco-evo factors that shape chemical defense once it evolves.

      In the discussion, the authors make the claim that poison frogs don't (seem to) suffer from eating alkaloids. I don't think this claim has been properly tested (the cited references don't adequately address it). To do so would require an experimental approach, ideally obtained data on both lifespan and lifetime reproductive success.

      We agree with the reviewer that more data are necessary to make this broad claim, which we have removed. We revised this to state: “regardless, it is clear that all or nearly all dendrobatid poison frogs consume alkaloid-containing arthropods as part of their regular diet” (section 2c). We then expand on this statement with data from short-term experimental work that support the notion that at least some dendrobatids are resistant to (i.e., can survive) the immediate effects of alkaloids. We also point out later in the manuscript that, “as far as we are aware, the possible lifetime fitness costs (e.g., in reproductive success) of alkaloid consumption in dendrobatids have not been measured” (section 2d).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      While in general I am very open to "unorthodox" ways to write a manuscript (i.e., differing from the standard structure intro-methods-results-discussion) I feel there is much room for improvement in this case. When reading the manuscript line by line, I was several times totally uncertain about the scope and content of the original data in the manuscript. It is too often unclear which of the outlined theories are new and why they are presented, which hypotheses were tested and why, which data were newly obtained, which technological improvements led to the novel and surprising results, and why no alternative hypotheses are tested. I feel the authors need to fundamentally reconsider the structure of the manuscript - which does not mean everything needs to be rewritten, but some major reshuffling of paragraphs from one section to the other may already lead to substantial improvement. I will in the following list (not ordered by priority) different issues that I encountered, without always providing a specific suggestion for improvement - please come up with an improved structure that removes these issues in one way or the other!

      Thank you for the suggestions. We did our best to improve the structure of the paper. Specifically, we substantially revised the introduction to provide a clearer background of the ideas leading up to the new evolutionary model. We moved most of what was previously figure 2 (now Fig. 1) into an earlier part of the introduction in the main text. We moved what was previously figure 1 (now Fig. 3) to much later in the discussion (section 2c). We attempted to clarify and separate throughout the text the new data from existing data. Please see our responses below for additional details.

      Line 42-45: Please provide a reference on this statement on traversing adaptive landscapes.

      We added the following reference: Martin, CH and PC Wainwright. 2013. Multiple fitness peaks on the adaptive landscape drive adaptive radiation in the wild. Science 339: 208-211. https://doi.org/10.1126/science.1227710

      Line 50: Why are these phases "likely" to occur? - no evidence is presented for this hypothesized high likelihood. Presenting this scenario already in the second paragraph of the intro is very weird. Are these really the only possible phases? Wouldn't it be possible to come up with totally different scenarios? In my opinion, this specific four-phase scenario should be more clearly labelled as a novel theory presented in this paper, and perhaps it should come much later in the introduction.

      Thank you for the suggestion. We moved this paragraph down into a new subsection of the introduction. We also revised the language to clarify that the model is a new evolutionary theory based on new and existing ideas.

      Line 51: Here you use for the first time the term "elimination". While it is intuitively clear what is meant by it, there still could be different meanings. The alkaloids could simply be passively excreted, or they could be actively biochemically decomposed. Later in the Discussion the authors imply that elimination requires some kind of metabolic process, but this perhaps should be made clearer already in the introduction.

      We now spend more time in the introduction describing pharmacokinetics as well as the terms we used (including elimination), which are slightly modified from terms in pharmacokinetics.

      Figure 1. I have major concerns about this figure. I found the figure very confusing, and the authors really need to reconsider and modify (simplify) it. The figure caption starts with "Major processes involved..." as if this was established textbook knowledge rather than a totally hypothetical illustration of how different factors (sequestration, elimination....) can lead to defended or undefended phenotypes. Only later on in the caption it becomes clear this is just a suggestion/hypothesis/model: "we hypothesize...".

      We revised the figure (now Fig. 3) and its legend. It now starts with the following text: “Hypothesized physiological processes that interact to determine the defense phenotype.” We also simplify the figure by removing two lines and recoding the table (see comment below).

      Secondly, the way the graph is drawn suggests some kind of experimental result where specific evolutionary pathways lead to very specific degrees of "defendedness", recognizable by the points on the right axis stacked very precisely one above the other. Do you really want to imply that you want to suggest such a specific model, where particular accumulation/intake/elimination rates lead to exactly these outcomes? Also, wouldn't it be possible to somewhat simplify the categories in the table? Again, why so specific, is there any experimental evidence for it? Why sometimes 1 plus, 2 plus, 3 plus? Wouldn't it be better to just suggest categories such as strong, weak and absent?

      We simplified the figure by removing the secondary (dashed) passive accumulation and active sequestration lines. We also changed the + signs to “low,” “med,” or “high” and tried to simplify the text in the figure and in the legend.

      Line 101-103: "We propose ..." Here, as the concluding statement of the introduction, the authors suggest a very general hypothesis which seems rather disconnected from the four-phase model and from the experimental results. Here, at the latest, I would have expected to learn (1) what the overall scope of the paper is, (2) which kind of approaches were followed and which novel experimental results will be presented in the following, and (3) how the experimental results will be used to derive a new theory / novel. Again, it is obvious that the scope of the paper is broader than testing just a single and narrow hypothesis, but rather to support and develop a broader theory and evolutionary model, but this should be clear to readers once they arrive at this line.

      Thank you for the suggestion. We added a paragraph to the end of the first section of the introduction that outlines the content of the rest of the paper. We also reorganized some of the subheadings to make the flow of ideas and the source of data in each subsection clearer. We split up and moved what was previously in section 2a into parts of the introduction and discussion. We moved the results text about diet and the discussion about resistance to section 2a, to better provide data and discussion of phases 1 and 2.

      Figure 2. My opinion on this figure is much less strong than on Fig. 1. However, the authors may want to reconsider whether it really makes sense here to show all the historical trees and theories (which are not really systematically reviewed in the text) or whether they may wish to go on with panel D only (the most recent tree and scenario, which is also used consistently for further discussion in the manuscript).

      We moved the content from Fig. 2A–C to the main text (now section 1b) and narrowed the focus of Fig. 2 (now Fig. 1) to what was previously panel 2D.

      Results and Discussion: The whole section on phases 1 to 2 is not based on any new results. This is OK (as I said, I have no problems with "unorthodox" manuscript structure) but it should be clearer to readers why this is presented here and what it represents. A new theory? A recapitulation of textbook knowledge? Something necessary to later understand the experimental results?

      We split up and moved what was previously in section 2a into parts of the introduction and discussion. Now, section 2a still focuses on phases 1 and 2 but presents the diet data from our study (phase 1) and a review of known resistance mechanisms (phase 2; previously in the discussion section).

      Line 168. Here we have arrived at the "core" of the paper, that is, the actual experimental results. Surprisingly, you find alkaloids in dendrobatids usually considered "undefended". This is great, surprising and of high importance. However, I am missing at least some technical/methodological discussion about this finding, except for the statement that it was based on GCMS. Why have previous studies not detected these alkaloids? Did you use particularly sensitive GCMS instruments? Did you look more in depth than it was done in previous studies? Can you totally exclude these contaminations/artefacts?

      We added the following paragraph to section 2b: “The large number of structures that we identified is in part due to the way we reviewed GC-MS data: in addition to searching for alkaloids with known fragmentation patterns, we also searched for anything that could qualify as an alkaloid mass spectrometrically but that may not match a previously known structure in a reference database. Similarly, the analysis of UHPLC-HESI-MS/MS data was untargeted, and thus enables a broader survey of chemistry compared to that from prior GC-MS studies. Structural annotations in our UHPLC-HESI-MS/MS analysis were made using CANOPUS, a deep neural network that is able to classify unknown metabolites based on MS/MS fragmentation patterns, with 99.7% accuracy in cross-validation (Dührkop et al., 2021).” We also moved the paragraph on contamination from the methods section into section 2b.

      Line 169. This sentence (and several others in the subsequent paragraphs) does a poor job of explaining the taxon and specimen sampling. The particular sentence in this line is unclear: Did you include 27 species of dendrobatids AND IN ADDITION representatives of the main undefended clades, or did these 27 species INCLUDE representatives of the main undefended clades?

      We now present a brief overview of sampling in the last paragraph of the introduction (section 1c). We clarified sampling of the species: “In total we surveyed 104 animals representing 32 species of Neotropical frogs including 28 dendrobatid species, two bufonids, one leptodactylid, and one eleutherodactylid (see Methods). Each of the major undefended clades in Dendrobatidae (Fig. 1, Table 1) is represented in our dataset, with a total of 14 undefended dendrobatid species surveyed.” We also reviewed and clarified similar language in other places in the text (e.g., section 2b).

      Line 177. "undefended lineages" - of dendrobatids or of frogs in general? Given that you also include non-dendrobatids.

      Dendrobatids. The sentence now reads “Overall, we detected alkaloids in skins from 13 of 14 undefended dendrobatid species included in our study, although often with less diversity and relatively lower quantities than in defended lineages (Fig. 2, Table 1, Table S3, Table S4).”

      Line 188: "defe" should probably be changed to "defended"?

      Corrected.

      Table 1. The taxon sampling clearly focuses on dendrobatids, with only a few other taxa. This is fine, however, it does not allow to test the hypothesis that something "special" predisposes dendrobatids to passive accumulation and alkaloid resistance. For this, a wider taxon sampling of other frog families would have been necessary to have a larger number of "control" data. Again, this is fine for the purpose of the study and is discussed later (line 399) but only very briefly. I feel it should be mentioned earlier on.

      Thank you for the suggestion. We now address this point earlier in the manuscript so that readers will not have the impression that there are sufficient data to infer that dendrobatids are predisposed to passive accumulation. We propose several phylogenetic alternatives, making it clear that determining the number and timing of origins of passive accumulation is not possible with our data (section 2c), ultimately noting that “discriminating a single origin [of passive accumulation] – no matter the timing – from multiple ones would require better phylogenetic resolution and more extensive alkaloid surveys, as we only assessed four non-dendrobatid species”.

      Reviewer #2 (Recommendations For The Authors):

      P2L60 - The description of figure 1 is somewhat confusing, as it first focuses on the graph in the bottom panel, then moves to describing aspects of the table (top panel), then back to the graph. I think it might make more sense to describe these two panels separately and in order.

      Thank you for the suggestion. We revised the figure (now Fig. 3) and its legend for clarity.

      P3L94 - Saying that three transitions makes this group "ideal" for studying complex phenotypic transitions is a bit hyperbolic, in my opinion. I suggest toning down this description.

      Thank you for the suggestion. We changed “ideal” to “suitable.”

      P3L101 - "We propose that changes in toxin metabolism through selection on mechanisms of toxin resistance likely play a major role in the evolution of acquired chemical defenses." This hypothesis appears to be a combination of earlier ideas, with a somewhat different emphasis. The authors acknowledge this and go through some of the earlier ideas, in the legend of figure 2. I would have preferred to see more discussion of this (particularly with reference to the history of the idea in reference to poison frogs) in the main body of the text.

      Thank you for the suggestion. We now more extensively discuss these prior studies in the introduction (section 1b and 1c). We also revised this figure (now Fig. 1) to focus on what was previously figure 2 panel D.

      P3L102 - Figure 2 - the phrase "Resistance to consuming some alkaloids" seems inappropriate - perhaps "Resistance to alkaloid poisoning after consumption" (or something similar) would be more accurate?

      We changed this to “Low alkaloid resistance”.

      P4L153 - "Accumulation of alkaloids in skin glands could help to prevent alkaloids from reaching their targets". This could be true, but why would skin glands be a preferred location of sequestration to avoid toxicity? The authors should explain why such glands would be particularly likely to serve as places of sequestration.

      Thank you for pointing out this ambiguity. We decided to remove our discussion of sequestration into skin glands, because it is challenging to discuss this process in toxin resistance without too much speculation.

      P4L154 - "Although direct evidence is lacking, some poison frogs may biotransform alkaloids into less toxic forms until they can be eliminated from the body, e.g., using cytochrome p450s". This would seem to contradict the argument of this process being a precursor to accumulating effective toxins.

      We agree that these processes seem contradictory. However, a few papers are starting to suggest that metabolic detoxification may be initially useful for lineages that eventually evolve toxin sequestration. This is because detoxification or elimination (clearance) of toxins allows increased intake of toxins. Because there is some delay in the removal of toxins from an animal’s body, increased consumption ultimately leads to higher toxin exposure and possible toxin diffusion into various body cavities, which can increase selective pressure to evolve other kinds of resistance mechanisms. This pattern was shown in an experiment with toxin-resistant fruit flies (Douglas et al., 2022). Many toxin-sequestering species still metabolize some toxins even if they sequester the majority – as we argue, the defense phenotype is the result of a balance among intake, elimination, and accumulation, all of which can interact simultaneously. In poison frogs specifically there is some evidence that p450s are upregulated after toxin consumption (Caty et al. 2019). One possible prediction is that the type of resistance that an animal has changes as toxin sequestration evolves. We talk a bit more about these patterns in section 2e.
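
      The intake and elimination argument above can be illustrated with a toy calculation. The sketch below is not the model used in the manuscript: it assumes a single body compartment, a constant intake rate, and first-order elimination, and all function and parameter names are hypothetical. Under these assumptions the body burden settles at intake divided by the elimination rate constant, so an animal that can consume more toxin carries a proportionally higher internal exposure even though it keeps eliminating the toxin.

```python
# Toy one-compartment sketch (an assumption for illustration, not the authors' model):
#   d(burden)/dt = intake_rate - elimination_rate * burden

def simulate_body_burden(intake_rate, elimination_rate, duration=100.0, dt=0.01):
    """Integrate the body burden with a simple forward Euler scheme."""
    burden = 0.0
    steps = int(round(duration / dt))
    for _ in range(steps):
        burden += dt * (intake_rate - elimination_rate * burden)
    return burden


def steady_state_burden(intake_rate, elimination_rate):
    """Analytical steady state of the same toy model: intake / elimination."""
    return intake_rate / elimination_rate


if __name__ == "__main__":
    # Hypothetical rates in arbitrary units: doubling intake doubles exposure.
    for intake in (1.0, 2.0):
        print(intake, simulate_body_burden(intake, 0.5), steady_state_burden(intake, 0.5))
```

      In this simplified picture, a higher elimination capacity lowers exposure at a fixed intake, but it does not prevent exposure from rising when intake rises, which is the situation described in the response above.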

      P5L186 - Table 1 legend - change "defe" to "defended"

      Corrected.

      P12L414 - "do not appear to suffer substantially from doing so as it is part of their regular diet". I don't think this claim has been properly tested, as of yet. It would require looking at the effects of a diet with and without toxins over the lifespan of the frogs, and the impact of that difference on both survival and fertility.

      Reviewer 1 also made this important observation, which we address above.

      P12L432 - "for toxin-resistant organisms, there is little cost to accumulating a toxin, yet there may be benefits in doing so." Yet toxin resistance may itself be a continuous trait, so there may be a cost that depends on the degree of toxin resistance. I don't see why the authors are proposing toxin resistance as a discrete trait when their main point is that toxin accumulation is not.

      We agree and removed this statement.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Yang et al. investigated the locations and hierarchies of NFATc1+ and PDGFRα+ cells in dental and periodontal mesenchyme. By combining intersectional and exclusive reporters, they attempted to distinguish among NFATc1+PDGFRα+, NFATc1+PDGFRα-, and NFATc1- PDGFRα+ cells. Using tissue clearing and serial section-based 3D reconstruction, they mapped the distribution atlas of these cell populations. Through DTA-induced ablation of PDGFRα+ cells, they demonstrated the crucial role of PDGFRα+ cells in the formation of the odontoblast cell layer and periodontal components.

      Thank you for your valuable comments and suggestions, which have greatly enhanced the quality of this research article. The manuscript has been significantly revised in accordance with the reviewers’ comments. All necessary experimental conditions and required data have been included, and all the questions and considerations have been well-addressed in the revised manuscript and supporting information.

      Main issues:

      (1) The authors did not quantify the contribution of PDGFRα+ cells or NFATc1+ cells to dental and periodontal lineages in PDGFRαCreER; Nfatc1DreER; LGRT mice. Zsgreen+ cells represented PDGFRα+ cells and their lineages. Tomato+ cells represented NFATc1+ cells and their lineages. Tomato+Zsgreen+ cells represented NFATc1+PDGFRα+ cells and their lineages. Conducting immunostaining experiments with lineage markers is essential to determine the physiological contributions of these cells to dental and periodontal homeostasis.

      Thank you for your question; we apologize for the insufficient explanation. Figure S9 provides a statistical analysis of the number of PDGFR-α+ cells, NFATc1+ cells, and PDGFR-α+&NFATc1+ cells in the dental pulp and periodontal ligament (PDL). The results allow for a clear comparison of the contributions of single-positive and double-positive cells to both tissues. Additionally, the tracing results show whether these three cell populations have the capacity to produce progeny cells. We further supplemented the analysis with immunofluorescence results of double-positive cells to identify their cell types, selecting AlphaV as the marker for mesenchymal stem cells (MSCs) and CD45 as the marker for hematopoietic cells. This point is further discussed in the manuscript, as shown below:

      Page 14-15 in the revised manuscript, “To identify the population of PDGFR-α+ and NFATc1+ co-expressing cells in the pulp and periodontal ligament (PDL), we generated Pdgfr-aCreER; Nfatc1DreER; R26-LSL-RSR-tdT-DTR (LRTD) mice... Strong tdTomato signals were detected in both the PDL (Figure S22B) and pulp (Figure S22C). With respect to the MSC-specific marker AlphaV, we observed AlphaV+tdTomato+ cells in both regions. Additionally, CD45+ (hematopoietic marker) tdTomato+ cells were also present in these areas (Figure S22B, C). These findings suggest that the population of PDGFR-α+ and NFATc1+ co-expressing cells is heterogeneous.”

      (2) The authors attempted to use PDGFRαCreER; Nfatc1DreER;IR1 mice to illustrate the hierarchies of NFATc1+ and PDGFRα+ cells. According to the principle of the IR1 reporter, it requires sequential induction of PDGFRα-CreER and Nfatc1-DreER to investigate their genetic relationship. Upon induction by tamoxifen, NFATc1+PDGFRα- cells and NFATc1-PDGFRα+ cells were labeled by Tomato and Zsgreen, respectively. However, the reporter expression of NFATc1+PDGFRα+ cells was uncertain, most likely random. Therefore, the hierarchical relationship of NFATc1+ and PDGFRα+ cells cannot be reliably determined from PDGFRαCreER; Nfatc1DreER; IR1 mice.

      Thank you for your question. We have supplemented the control group (Pdgfr-αCreER; IR1) experimental data (Figure 8). By comparing these with the results of the Pdgfr-αCreER; Nfatc1DreER; LGRT tracing assays, we confirmed that the expression pattern and range of PDGFR-α+ cells in the pulp and PDL of Pdgfr-αCreER; IR1 mice are consistent with those observed in Pdgfr-αCreER; Nfatc1DreER; LGRT mice (Figure 6), and the same applies to NFATc1+ cells. All of our experimental results have been repeated multiple times. In addition, the IR1 system was initially developed by Professor Bin Zhou's lab and was validated for feasibility and stability in a paper published in Nature Medicine in 2017 (https://doi.org/10.1038/nm.4437). Moreover, Professor Zhou Bo O's team applied IR1 dual recombinases for bone lineage tracing in a 2021 paper published in Cell Stem Cell, which also confirmed its feasibility and stability (DOI: 10.1016/j.stem.2021.08.010).

      Reviewer #2 (Public Review):

      Summary:

      Yang et al. present an article investigating the spatiotemporal atlas of NFATc1+ and PDGFR-α+ cells within the dental and periodontal mesenchyme. The study explores their capacity for progeny cell generation and their relationships - both inclusive and hierarchical - under homeostatic conditions. Utilizing the Cre/loxP-Dre/Rox system to construct tool mice, combined with tissue transparency and continuous tissue slicing for 3D reconstruction, the researchers effectively mapped the distribution of NFATc1+ and PDGFR-α+ cells. Additionally, in conjunction with DTA mice, the study provides preliminary validation of the impact of PDGFR-α+ cells on dental pulp and periodontal tissues. Primarily, this study offers an in-situ distribution atlas for NFATc1+ and PDGFR-α+ cells but provides limited information regarding their origin, fate differentiation, and functionality.

      We would like to thank the reviewer for setting a high value on our study. Following the many constructive suggestions, the manuscript has been revised to improve the quality of this study. All the necessary discussions have also been added, and all the questions and concerns have been addressed in the revised manuscript. The point-by-point reply to the comments is listed below:

      Strengths:

      (1) Tissue transparency techniques and continuous tissue slicing for 3D reconstruction, combined with transgenic mice, provide high-quality images and rich, reliable data.

      (2) The Cre/loxP and Dre/Rox systems used by the researchers are powerful and innovative.

      (3) The IR1 lineage tracing model is significantly important for investigating cellular differentiation pathways.

      (4) This study provides effective spatial distribution information of NFATc1+/PDGFR-α+ cell populations in the dental and periodontal tissues of adult mice.

      Weaknesses:

      (1) In the functional experiment section, the investigation into the role of NFATc1+/PDGFR-α+ cell populations is somewhat lacking.

      Thank you so much for your comments and suggestions. We have supplemented the analysis with immunofluorescence results of double-positive cells to identify NFATc1+&PDGFR-α+ cell populations, selecting AlphaV as the marker for mesenchymal stem cells (MSCs) and CD45 as the marker for hematopoietic cells. This part was shown as below:

      Page 14-15 in the revised manuscript, “To identify the population of PDGFR-α+ and NFATc1+ co-expressing cells in the pulp and periodontal ligament (PDL), we generated Pdgfr-aCreER; Nfatc1DreER; R26-LSL-RSR-tdT-DTR (LRTD) mice… Strong tdTomato signals were detected in both the PDL (Figure S22B) and pulp (Figure S22C). With respect to the MSC-specific marker AlphaV, we observed AlphaV+tdTomato+ cells in both regions. Additionally, CD45+ (hematopoietic marker) tdTomato+ cells were also present in these areas (Figure S22B, C). These findings suggested that the population of PDGFR-a+ and NFATc1+ co-expressing cells is heterogeneous.”

      We also supplemented the discussion regarding the role of the PDGFR-α+ population on page 17. Its potential role in pulp and periodontal formation is suggested there as well.

      Page 17 in the revised manuscript, “After ablating PDGFR-α+ cells, we observed damage to the odontoblast layer and shrinkage of the pulp core in dental pulp tissue, indicating that PDGFR-α+ cells contribute to the composition of dental pulp tissue, particularly the odontoblast layer (Figure. 9C, D). In the periodontal ligament, we noted a reduction and destruction of collagen fibers, suggesting a role for PDGFR-α+ cells in periodontal tissue structure (Figure. 9E, F).”

      (2) The author mentions that 3D reconstruction of consecutive tissue slices can provide more detailed information on cell distribution, so what is the significance of using tissue-clearing techniques in this article?

      Thank you for your insightful comment, and we are sorry for the insufficient statement here. In our study, the utilization of tissue clearing techniques was to address some of the shortcomings associated with the 3D reconstruction of consecutive tissue slices, such as the compromised integrity of samples due to section layering, leading to discontinuities along the z-axis and potential loss of positive signals (Fig. S5, S13). Additionally, unavoidable tissue damage during the sectioning process may result in the loss of some information. As one of the most advanced imaging technologies currently available, tissue clearing/imaging allows for direct observation of the spatial location and relationships of fluorescently labeled cells within the intact tissue, which is more persuasive. Also, evolving beyond the analysis of structural and molecular biology of selected tissue sections, and expanding the focus to entire organs and organisms, is a trend in the development of the biomedical field (Nat Methods. 2024 Jul;21(7):1153-1165; Nat Commun. 2024 Feb 26;15(1):1764). Admittedly, no method is flawless; thus, our employment of two advanced imaging approaches aims to answer questions regarding the spatial positioning and relationships of PDGFR-α single-positive, NFATc1 single-positive cells, and PDGFR-α+ NFATc1+ cells from multiple perspectives. This is done to enhance the credibility and persuasiveness of our results.

      We greatly appreciate your suggestions, which have significantly complemented the content of our article. The corresponding statements have been added in the revised manuscript as below:

      Page 6 in the revised manuscript, “As one of the most advanced imaging technologies currently available, tissue clearing/imaging allows for direct observation of the spatial location and relationships of fluorescently labeled cells within the intact tissue. Therefore, according to the existing SUMIC tissue deep clearing (TC) methods, we modified and improved a rapid and efficient procedure, which enable rapid single-cell resolution and quantitative panoptic 3D light-sheet imaging.”

      (3) After reading the entire article, it is confusing whether the purpose of the article is to explore the distribution and function of NFATc1+/PDGFR-α+ cells in teeth and periodontal tissues, or to compare the differences between tissue clearing techniques and 3D reconstruction of continuous histological slices using NFATc1+/PDGFR-α+ cells?

      We sincerely appreciate your question and apologize for any ambiguous descriptions.

      The purpose of our study is to map the atlas of NFATc1+/PDGFR-α+ inclusive, exclusive and hierarchical distribution in dental and periodontal mesenchyme. Under this premise, the two advanced imaging techniques were employed merely as means to address this question. Indeed, in the previous manuscript, we overemphasized the comparison and description of the differences between tissue clearing techniques and 3D reconstruction of continuous slices, which led to unnecessary misunderstandings, for which we apologize. Consequently, in this version of the manuscript, we have reduced the descriptions comparing their advantages and disadvantages, focusing instead on exploring the importance of NFATc1+/PDGFR-α+ cells. We appreciate your suggestions once again.

      Page 6 in the revised manuscript, “These two 3D-reconstruction and imaging technologies complement each other to jointly address the spatial positioning and hierarchical relationships of PDGFR-α+, NFATc1+, and PDGFR-α+ NFATc1+ cells from multiple perspectives.”

      (4) The researchers did not provide a clear definition of the cell types of NFATc1+/PDGFR-α+ cells in teeth and periodontal tissues.

      Thanks for your suggestions. We discovered through cell ablation experiments that the removal of PDGFR-α+ cells resulted in the destruction of the odontoblast layer in the dental pulp, shrinkage of the pulp core, and disruption of collagen fibers in the periodontal ligament. Combined with the results from lineage tracing, we conclude that PDGFR-α+ cells primarily constitute the mesenchymal cells that form the supporting tissues in both the dental pulp and periodontal ligament (Part 4.1). Through immunofluorescence staining, with AlphaV as the marker for mesenchymal stem cells (MSCs) and CD45 as the marker for hematopoietic cells, we observed that the double-positive cell population is heterogeneous, containing both MSCs and hematopoietic cells (Part 4.2).

      (5) In studies related to long bones, the author defines the NFATc1+/PDGFR-α+ cell population as SSCs, which as a stem cell group should play an important role in tooth development or injury repair. However, the distribution patterns and functions of the NFATc1+/PDGFR-α+ cell population in these two conditions have not been discussed in this study.

      Thanks for your suggestions. The NFATc1+/PDGFR-α+ cell population was identified as playing an important role in tissue regeneration, especially in oral and maxillofacial tissues. Our research primarily focuses on the identification of NFATc1+ and PDGFR-α+ cells within dental and periodontal mesenchyme, highlighting their contribution to tissue homeostasis and regeneration. Although the NFATc1+/PDGFR-α+ cells were characterized in the context of other tissue types, their detailed role in tooth development and injury repair remains an area for further exploration.

      This part was further discussed on page 17-18 in the revised manuscript, “Cell ablation and immunofluorescence staining experiments further characterized the types and functions of PDGFR-α+/PDGFR-α+&NFATc1+ populations. After ablating PDGFR-α+ cells, we observed damage to the odontoblast layer and shrinkage of the pulp core in dental pulp tissue, indicating that PDGFR-α+ cells contribute to the composition of dental pulp tissue, particularly the odontoblast layer (Figure. 9C, D). In the periodontal ligament, we noted a reduction and destruction of collagen fibers, suggesting a role for PDGFR-α+ cells in periodontal tissue structure (Figure. 9E, F). Previous results confirmed the presence of double-positive cells in both dental pulp and periodontal tissues and provided insights into their hierarchical relationships in the periodontal ligament (Figure. 8). To further investigate the double-positive cell population, we developed an inducible dual-editing enzyme reporter system to label these cells with tdTomato signals. Using AlphaV as a marker for mesenchymal stem cells (MSCs) and CD45 for hematopoietic cells, we found that double-positive cells included components of both MSCs and hematopoietic cells (Figure S22B, C), indicating a heterogeneous population. Further experiments are necessary to determine whether the predominant role in this co-positive MSC population is played by PDGFR-α+ or NFATc1+ and to clarify the specific functions of these cells in the future.”

      Reviewer #3 (Public Review):

      Summary:

      This groundbreaking study provided the most advanced transgenic lineage tracing and advanced imaging techniques in deciphering dental/periodontal mesenchyme cells. In this study, authors utilized CRISPR/Cas9-mediated transgenic lineage tracing techniques to concurrently demonstrate the inclusive, exclusive, and hierarchical distributions of NFATc1+ and PDGFR-α+ cells and their lineage commitment in dental and periodontal mesenchyme.

      Strengths:

      In cooperating with tissue clearing-based advanced imaging and three-dimensional slices reconstruction, the distribution and hierarchical relationship of NFATc1+ and PDGFR-α+ cells and progeny cells plainly emerged, which undoubtedly broadens our understanding of their in vivo fate trajectories in craniomaxillofacial tissue. Also, the experiment design is comprehensive and well-executed, and the results are convincing and compelling.

      Weaknesses:

      Minor modifications could be made to the paper, including more details on the advantages of the methodology used by the authors in this study, compared to other studies.

      Thanks for your constructive comments and advice on how to improve the quality of this research article. We have thoroughly and carefully corrected the manuscript based on your suggestion, and all the necessary data have been added to support our claims. Meanwhile, all the questions and concerns have been well-addressed in the revised manuscript and the revised supplementary information. Thus, we believe that the quality of this paper has been significantly enhanced. We thank you again for your great efforts.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Line 134, the authors categorized the reporter systems into three types: intersectional reporters, exclusive reporters, and nested reporters. However, Figure 1A does not depict the nested reporters.

      Thanks for your helpful recommendation to improve the quality of this manuscript, and we are sorry for the mistake. In this revised manuscript, we have modified the content of Figure 1A, as displayed below:

      (2) Line 238, the authors mentioned that NFATc1 is expressed in the mandible and periodontal tissues based on their previous sequencing analyses. It would be better to cite the related reference or display the expression of NFATc1 in the Supplemental Figures.

      Thanks for your suggestions. We sincerely apologize for the typo that occurred during the writing process and have revised the original text on page 9 to:

      “The previous sequencing analyses have reported the expression of NFATc1 in mandible and periodontal tissues20. (DOI: 10.1177/00220345221074356)”

      (3) Line 264, the figure callout "Figure 5E" does not exist, and the figure legends of Figure 5 contain the same error.

      We greatly appreciate your rigor and diligence, and we have corrected this error.

      (4) Line 280, the figure callout "Figure S12" is incorrect.

      Thank you for your efforts, and we are sorry for our negligence. The corresponding descriptions have been amended as below:

      Page 10 in the revised manuscript, “Consistent with the quantification of TC-based imaging results (Figure S9), the number of PDGFR-α+ cells and NFATc1+ cells were significantly higher than that in pulse group.”

      (5) Line 301, the figure callout "Figure 4" is erroneous.

      Thank you for your efforts, and we are sorry for our negligence. The corresponding descriptions have been amended as below:

      Page 11 in the revised manuscript, “After 11 days tracing, the number of PDGFR-α+ & NFATc1+ cells and PDGFR-α+NFATc1+ cells increased significantly (Figure 7)…”

      (6) Line 306, the sentence "Our previous study identified the presence of NFATc1+ cells in the cranium by single-cell sequencing (unpublished data)" could be improved by referencing specific data or findings.

      Thanks for your suggestions, and we are sorry for our negligence. The corresponding citation has been amended as below:

      Page 11 in the revised manuscript, “As a part of craniomaxillofacial hard tissue, we also intended to explore whether the presence of NFATc1+ and PDGFR-α+ cells in cranial bone tissue/suture is different from dental and periodontal tissue (our previous study has identified the presence of NFATc1+ cells in the cranium by single-cell sequencing28”

      (7) Line 341, the statement "Moreover, no PDGFR-α+ cells were detected in the Nfatc1DreER; IR1 group," needs further explanation or context.

      Thanks for your suggestions. The corresponding descriptions have been amended as below:

      Page 13 in the revised manuscript,  “Moreover, since the recombinase recognition sites are interleaved (loxP–rox–loxP–rox), recombination by one system will naturally remove a recognition site of the other system, rendering its reporter gene inactive for further recombination. The results showed no tdTomato+ cells or ZsGreen+ cells were detected in the Pdgfr-αCreER; IR1 or Nfatc1DreER; IR1 group respectively demonstrating the feasibility and accuracy of the IR1 system.”
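
      The mutual exclusivity described in this passage can be sketched in a few lines of code. The cassette layout below (loxP, rox, STOP, loxP, ZsGreen, STOP, rox, tdTomato) is an assumed simplification based on the interleaved loxP–rox–loxP–rox description and on the reporter colors named in these responses; it is not taken from the construct map, and the function names are hypothetical.

```python
# Toy model of an interleaved dual-recombinase reporter (assumed layout, see lead-in).
SITES = {"Cre": "loxP", "Dre": "rox"}
REPORTERS = {"ZsGreen", "tdTomato"}

# Assumed simplified cassette downstream of the promoter.
IR1 = ["loxP", "rox", "STOP", "loxP", "ZsGreen", "STOP", "rox", "tdTomato"]


def recombine(cassette, recombinase):
    """Excise the segment between the first two matching sites, leaving one site."""
    site = SITES[recombinase]
    positions = [i for i, element in enumerate(cassette) if element == site]
    if len(positions) < 2:
        # Fewer than two sites remain, so this recombinase can no longer act.
        return list(cassette)
    first, second = positions[0], positions[1]
    return cassette[: first + 1] + cassette[second + 1 :]


def expressed(cassette):
    """Return the first reporter reached before any STOP, or None."""
    for element in cassette:
        if element == "STOP":
            return None
        if element in REPORTERS:
            return element
    return None


if __name__ == "__main__":
    after_cre = recombine(IR1, "Cre")
    after_dre = recombine(IR1, "Dre")
    print(expressed(after_cre), expressed(recombine(after_cre, "Dre")))
    print(expressed(after_dre), expressed(recombine(after_dre, "Cre")))
```

      In this toy layout, whichever recombinase acts first removes one recognition site of the other, so the second recombinase finds only a single site and the reporter chosen by the first event is retained.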

      (8) Several statements in this text were duplicated. For instance, lines 365 to 376 are identical to lines 497 to 508. This redundancy should be addressed to improve the manuscript's clarity and conciseness.

      We greatly appreciate your suggestions, and we are sorry for the misunderstanding we may have caused. We have revised and integrated the entire Results 4 section (including lines 365 to 376 of the original manuscript) into the Discussion section to avoid unnecessary redundancy and misunderstandings. This adjustment also emphasizes that the goal of using two imaging techniques is to draw more credible conclusions from multiple perspectives, thereby mitigating the shortcomings of relying on a single advanced imaging method. The revised content is as follows:

      Page 18 in the revised manuscript, “TC-based advanced imaging procedure can clearly visualize its 3D structure, reconstruct the whole across latitudes, and understand the spatial position and expression of each structure, which could avoid the bias of traditional single-layer slicing may cause, and provides a more intuitive and objective description of the existing situation. However, our results demonstrated TC still has some limitations…”

      Page 19 in the revised manuscript, “The 3D sections reconstruction results, however, effectively addressed the issue of weak tdTomato signal and provide a clearer visualization of the distribution of ZsGreen and tdTomato signals. For example, the tdTomato signal in the root pump, which was almost completely unobservable by TC-based imaging, can be clearly seen using confocal imaging and 3D reconstruction (Figure 3C-D, Figure 6C-D, and Figure S4, Figure S12). However, compared to TC, the quality of 3D reconstruction of sections still relies on the angle and quality of the sections, with the section angle having a significant impact on the reconstruction outcome. In addition, because the slice itself has a certain thickness (10 μM in this study), which leads to the appearance of discontinuous in the final reconstructed image, and the aesthetics and accuracy could be affected to a certain extent. Also, unavoidable tissue damage during the sectioning process may result in the loss of some information. Therefore, a variety of different information could be obtained through two different imaging technologies, which prompt us to use the advanced experimental procedure according to the actual purpose.”

      Reviewer #2 (Recommendations For The Authors):

      (1) It should be further highlighted in the article what cell type the NFATc1+/PDGFR-α+ cells should be defined as in teeth and periodontal tissues.

      Thank you so much for your suggestions. We have supplemented the analysis with immunofluorescence results of double-positive cells to identify NFATc1+&PDGFR-α+ cell populations, selecting AlphaV as the marker for mesenchymal stem cells (MSCs) and CD45 as the marker for hematopoietic cells.

      This part was on page 14-15 in the revised manuscript, “To identify the population of PDGFR-α+ and NFATc1+ co-expressing cells in the pulp and periodontal ligament (PDL), we generated Pdgfr-aCreER; Nfatc1DreER; R26-LSL-RSR-tdT-DTR (LRTD) mice… Strong tdTomato signals were detected in both the PDL (Figure S22B) and pulp (Figure S22C). With respect to the MSC-specific marker AlphaV, we observed AlphaV+tdTomato+ cells in both regions. Additionally, CD45+ (hematopoietic marker) tdTomato+ cells were also present in these areas (Figure S22B, C). These findings suggested that the population of PDGFR-a+ and NFATc1+ co-expressing cells is heterogeneous.”

      We also supplemented the discussion of the role of the PDGFR-α+ population on page 17, where its potential role in pulp and periodontal formation is suggested as well:

      Page 17 in the revised manuscript: “After ablating PDGFR-α+ cells, we observed damage to the odontoblast layer and shrinkage of the pulp core in dental pulp tissue, indicating that PDGFR-α+ cells contribute to the composition of dental pulp tissue, particularly the odontoblast layer (Figure. 9C, D). In the periodontal ligament, we noted a reduction and destruction of collagen fibers, suggesting a role for PDGFR-α+ cells in periodontal tissue structure (Figure. 9E, F).”

      (2) The authors are advised to supplement the description of the cellular origin and the differentiation trajectory of NFATc1+/PDGFR-α+ cells in teeth and periodontal tissues.

      Thank you for your suggestion. Our study currently focuses on mapping the distribution atlas of NFATc1+PDGFRα+, NFATc1+PDGFRα-, and NFATc1-PDGFRα+ cells in adult homeostatic mice. As a next step, we plan to explore the differentiation trajectory of NFATc1+/PDGFRα+ cells during development using single-cell sequencing and other methods.

      (3) It is recommended to add figure labels to Figure 1B to facilitate reader comprehension.

      Thank you for your valuable suggestion to improve the quality of this manuscript. We have modified Figure 1B in the revised manuscript as follows:

      (4) Why compare 3D images from tissue clearing with 3D reconstructions of confocal imaging after consecutive tissue slicing?

      Thanks for your important and helpful comments to improve the quality of this manuscript, and we are sorry for the insufficient statement.

      The original intention of comparing the two methods was to draw more credible conclusions from multiple perspectives, thereby minimizing the limitations inherent in relying on a single advanced imaging technique. Indeed, the description in the previous manuscript could lead to misunderstandings among readers. Therefore, in the revised manuscript, we have modified and integrated the content of the Results 4 section into the Discussion section to eliminate unnecessary verbosity and potential confusion.

      Page 18 in the revised manuscript, “TC-based advanced imaging procedure can clearly visualize its 3D structure, reconstruct the whole across latitudes, and understand the spatial position and expression of each structure, which could avoid the bias of traditional single-layer slicing may cause, and provides a more intuitive and objective description of the existing situation. However, our results demonstrated TC still has some limitations…”

      Page 19 in the revised manuscript, “The 3D sections reconstruction results, however, effectively addressed the issue of weak tdTomato signal and provide a clearer visualization of the distribution of ZsGreen and tdTomato signals. For example, the tdTomato signal in the root pump, which was almost completely unobservable by TC-based imaging, can be clearly seen using confocal imaging and 3D reconstruction (Figure 3C-D, Figure 6C-D, and Figure S4, Figure S12). However, compared to TC, the quality of 3D reconstruction of sections still relies on the angle and quality of the sections, with the section angle having a significant impact on the reconstruction outcome. In addition, because the slice itself has a certain thickness (10 μM in this study), which leads to the appearance of discontinuous in the final reconstructed image, and the aesthetics and accuracy could be affected to a certain extent. Also, unavoidable tissue damage during the sectioning process may result in the loss of some information. Therefore, a variety of different information could be obtained through two different imaging technologies, which prompt us to use the advanced experimental procedure according to the actual purpose.”

      (5) The experimental results section does not specify the age of the mice used, which lacks clarity for the reader and makes it difficult to determine at what developmental stage the observed distribution of NFATc1+/PDGFR-α+ cells occurs.

      Thank you for your suggestion. We apologize for overlooking this point; the age of the mice was shown only in some of the figures. All the transgenic mice discussed in this article are adults of around 12-14 weeks of age. We have added the specific ages in weeks to the main text.

      (6) What is the rationale behind selecting day 1, day 3, and day 5 as the experimental time points in Figure 2B?

      Thanks for your questions. 48 hours after injection, TAM can be metabolized in the body and converted into 4-OHT, which then distributes thoroughly to various tissue systems through the bloodstream. Therefore, we chose to administer a booster dose 48 hours after the initial injection to ensure timely replenishment and achieve high labeling efficiency. This drug administration scheme has already been validated for feasibility in our preliminary studies.

      (7) In Figure 2E, why is there a large area of red signal visible in the tooth enamel?

      Thanks for your valuable comments and advice on how to improve the quality of this research article and our future work. As we discussed in the main text, the existing TC-based imaging techniques cannot capture tdTomato signals as conspicuously as ZsGreen signals, which may be due to the following: 1) the editing efficiency of the DNA recombinase-mediated lineage-tracing system is limited; 2) the lower abundance of NFATc1+ cells in the region of interest (ROI) results in weak tdTomato signals; 3) the TC method as described may result in poor penetration of tdTomato fluorescence signals. Therefore, to display the NFATc1+ cells in the ROI (periodontal ligament, pulp, and alveolar bone) as clearly as possible, we increased the excitation intensity of the 561 channel of the light-sheet fluorescence microscope, which led to a large area of unrelated red signal in non-target areas (tooth enamel). In future work, we will further improve the TC procedure to shorten the sample processing time and develop other transgenic mice to address this issue. Thanks again.

      (8) In the text at Line 249, the author notes that PDGFRα+ cells are widely distributed, and NFATc1+ cells are primarily located in the pulp horns. What is the relevance of their distribution to their function?

      Thank you very much for your suggestion. We found that PDGFRα+ cells are widely distributed in dental pulp tissue. Combined with the results from the subsequent cell ablation experiments, this finding revealed that PDGFRα+ cells contribute to the formation of the odontoblast layer and the pulp core. In our supplementary data, immunofluorescence staining showed that double-positive cells co-expressed AlphaV in the dental pulp, indicating that they include MSC components. We need to further investigate the relationship between their distribution and function in the future.

      (9) In Line 301 of the text, there is a mislabeling of Figure 4. Please verify this carefully throughout the document.

      Thank you for your efforts, and we are sorry for our negligence. We have made the necessary corrections and have meticulously reviewed the entire manuscript to ensure that there were no similar mistakes. The corresponding descriptions have been amended as below:

      Page 11 in the revised manuscript, “After 11 days tracing, the number of PDGFR-α+ & NFATc1+ cells and PDGFR-α+NFATc1+ cells increased significantly (Figure 7)…”

      (10) Between Lines 323 to 325, the author states: "the wider range of PDGFR-α+ cells than NFATc1+ cells were observed, which laid the foundation for our conjecture that NFATc1+ cells may contribute as subpopulation of PDGFR-α+ cells." This statement is inaccurate.

      Thank you for your suggestions. We apologize for the inaccuracies in our description and have made corrections in the original text.

      Page 12 in the revised manuscript, “the wider range of PDGFR-α+ cells than NFATc1+ cells were observed, we speculate that there may be a hierarchical relationship between the two.”

      (11) The author is advised to combine the use of single-cell sequencing data for cell trajectory analysis to corroborate the differentiation relationships between NFATc1+/PDGFR-α+ cells, discussing their specific origins and final differentiation fates.

      Thank you for your suggestion; it is very meaningful to us and will be the focus of our future research work.

      (12) In the Results 4 section, the comparison between tissue clearing imaging and 3D reconstruction of consecutive tissue slices could be discussed in the discussion section.

      We greatly appreciate your suggestions. We have revised and integrated the entire Results 4 section into the Discussion section to avoid unnecessary redundancy and misunderstandings. This adjustment also emphasizes that the goal of using two imaging techniques is to draw more credible conclusions from multiple perspectives, thereby mitigating the shortcomings of relying solely on existing advanced imaging methods. The revised content is as follows:

      Page 18 in the revised manuscript, “TC-based advanced imaging procedure can clearly visualize its 3D structure, reconstruct the whole across latitudes, and understand the spatial position and expression of each structure, which could avoid the bias of traditional single-layer slicing may cause, and provides a more intuitive and objective description of the existing situation. However, our results demonstrated TC still has some limitations…”

      Page 19 in the revised manuscript, “The 3D sections reconstruction results, however, effectively addressed the issue of weak tdTomato signal and provide a clearer visualization of the distribution of ZsGreen and tdTomato signals. For example, the tdTomato signal in the root pump, which was almost completely unobservable by TC-based imaging, can be clearly seen using confocal imaging and 3D reconstruction (Figure 3C-D, Figure 6C-D, and Figure S4, Figure S12). However, compared to TC, the quality of 3D reconstruction of sections still relies on the angle and quality of the sections, with the section angle having a significant impact on the reconstruction outcome. In addition, because the slice itself has a certain thickness (10 μM in this study), which leads to the appearance of discontinuous in the final reconstructed image, and the aesthetics and accuracy could be affected to a certain extent. Also, unavoidable tissue damage during the sectioning process may result in the loss of some information. Therefore, a variety of different information could be obtained through two different imaging technologies, which prompt us to use the advanced experimental procedure according to the actual purpose.”

      (13) The article only demonstrates the impact of removing PDGFR-α+ cells on the dental pulp and periodontal tissues of adult mice. What would be the impact of removing NFATc1α cells on teeth and periodontal tissues?

      Thank you for your suggestions. Our lab has been investigating the role of NFATc1+ cells in PDL and dental pulp tissues, and this work is currently under submission to another journal, so please forgive us for not being able to present the data here. The ablation assays showed that NFATc1+ cells may be involved in the formation of the odontoblast layer in dental pulp and in promoting osteogenic differentiation in the periodontal ligament.

      (14) The effects of removing PDGFR-α+ cells on the teeth and periodontal tissues of adult mice are shown in the article. What would be the impact on teeth and periodontal tissues if PDGFR-α cells were removed during early development?

      Thank you for your question. Our current research has not yet focused on the impact of PDGFR-α+ cells on the formation of periodontal ligaments and dental pulp tissue during the developmental stage. In our literature search, we found articles indicating that PDGFR-α is expressed at all stages of tooth development, and that PDGFR-α signaling is crucial for regulating the growth of the tooth apex and the proper extension of the palatal shelves during palatal fusion. Disruption of PDGFR-α signaling interferes with apex growth and the critical extension of palatal shelves during craniofacial development. In the future, we would like to focus on the role of PDGFR-α+ cells during tooth development.

      (15) If the data on the skull are not presented in this paper, it is suggested not to overly describe it in the results section, or to include related skull data in supplementary figures.

      We appreciate your attention to detail and your suggestions for improving the clarity and presentation of our work. The corresponding results for the cranium and cranial suture region are shown in Video S7-9 in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      We sincerely appreciate your thorough review and positive feedback on our manuscript. In accordance with your recommendations, all the questions and concerns have been addressed in the revised manuscript. We believe these revisions further enhance the clarity and quality of our work. The point-by-point reply to the comments is listed below:

      (1) In line 181, the author claimed that "we modified and improved a rapid and efficient procedure...this ultrafast clearing technique could minimize the impact on transgenic mice." However, there is no mention in the main text of the amount of time required for other methods. How can the "rapid" element of your improved method be reflected? The author should briefly list a few other studies and discuss them.

      Thanks for your important and helpful comments, and we are sorry for the insufficient statement. In recent years, a variety of tissue clearing methods have emerged. Here is a summary of the methods and durations used for hard tissue clearing as published in several authoritative journals:

      Author response table 1.

      In comparison, our approach requires only approximately two days, thereby minimizing potential damage to the tissue itself. Additionally, because the study employs lineage-tracing transgenic mice, the shorter processing time also keeps the impact on the fluorescence of the positive cells to a minimum.

      (2) In Figure S6, the author mentioned the use of another 3D reconstruction method-DICOM-3D. What is the advantage of this methodology? Is the conclusion drawn the same as the previous approaches? The author should propose corresponding discussions in this section.

      We sincerely appreciate your comments. The purpose of employing DICOM-3D reconstruction for the serial section images is to validate the reconstruction results obtained with Imaris. This method is based on sequential 2D DICOM images and utilizes 3D reconstruction and visualization technology to generate a stereoscopic 3D image with intuitive effects. Compared to Imaris reconstruction, this method offers a more straightforward and time-efficient approach. Regardless of the reconstruction method employed in this study, the ultimate goal remains the same: to jointly address the spatial positioning and hierarchical relationships of PDGFR-α+, NFATc1+, and PDGFR-α+NFATc1+ cells from multiple perspectives, and thereby to enhance the credibility and persuasiveness of our results. We have also included the corresponding description in the revised manuscript as follows:

      Page 8-9 in the revised manuscript, “To enhance the comprehensive and accurate display of the reconstruction results and to mitigate the potential errors that may arise from relying on single reconstruction method, we employed an alternative 3D reconstruction method—DICOM-3D. This method is based on sequential 2D DICOM images and utilizes 3D reconstruction and visualization technology to generate a stereoscopic 3D image with intuitive effects, which was a comparatively straightforward and highly efficient approach. We transformed the serial IF images into DICOM format and subsequently reconstruct it, and the same conclusion can be drawn, namely, PDGFR-α+ cells almost constituted the whole structure of pulp and PDL, with NFATc1+ cells as subpopulation (Figure S6).”
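
      As a rough illustration of the slice-stacking idea behind this kind of reconstruction, the sketch below stacks serial 2D section images into a 3D volume and computes a maximum-intensity projection. It is not the authors' DICOM-3D pipeline: plain NumPy arrays stand in for DICOM pixel data, the function names `stack_sections` and `max_intensity_projection` are hypothetical, the in-plane pixel size is an assumed placeholder, and only the 10 μm slice spacing reflects the section thickness mentioned in these responses.

```python
import numpy as np


def stack_sections(sections, slice_thickness_um=10.0, pixel_size_um=1.0):
    """Stack equally sized 2D section images into a volume indexed (z, y, x).

    pixel_size_um is an assumed placeholder; slice_thickness_um is the
    spacing between consecutive sections along z.
    """
    volume = np.stack([np.asarray(s, dtype=float) for s in sections], axis=0)
    spacing_um = (slice_thickness_um, pixel_size_um, pixel_size_um)
    return volume, spacing_um


def max_intensity_projection(volume, axis=0):
    """Collapse the volume along one axis, keeping the brightest voxel."""
    return volume.max(axis=axis)


# Toy data: three 4x4 sections with two bright "cells" (hypothetical values).
sections = [np.zeros((4, 4)) for _ in range(3)]
sections[0][1, 1] = 5.0
sections[2][2, 3] = 7.0

volume, spacing_um = stack_sections(sections)
mip = max_intensity_projection(volume)
```

      Because the spacing along z (the slice thickness) is usually much coarser than the in-plane pixel size, the voxels of such a volume are anisotropic, which is one way to see why reconstructions from serial sections can look discontinuous along the z-axis.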

      (3) Line 292: Why was the tdTomato signal in confocal-based reconstruction more conspicuous than the TC procedure? Some descriptions would be beneficial for readers' understanding.

      Thank you very much for your comments. We hypothesize that current light-sheet systems have inherent limitations in capturing tdTomato signals in intact tissue, which become more evident in tissues with inherently low fluorescence intensity (in this work, the limited editing efficiency of the DNA recombinase-mediated lineage-tracing system resulted in a weaker tdTomato signal compared to ZsGreen). In contrast, traditional confocal imaging techniques do not encounter such issues. The corresponding descriptions in the revised manuscript are shown as follows:

      Page 11 in the revised manuscript, “We hypothesize that the current light-sheet systems for intact tissue-imaging have inherent limitations in capturing tdTomato signals, which become more evident in tissues with inherently low fluorescence strengths (in this work, due to the limitations of editing efficiency in DNA recombinase mediated lineage-tracing system, which guaranteed weaker tdTomato signal compared to ZsGreen). In contrast, traditional confocal imaging techniques do not encounter such issues.”

      (4) Part 2.2, line 305: What is the purpose of analyzing the cranium and cranial sutures region through TC technology?

      Thank you for your comments. There are three main purposes of this part of the experiment. First, our research group has long been committed to studying the distribution and role of NFATc1+ SSCs in a variety of hard tissues, and our previous study identified the presence of NFATc1+ cells in the cranium by single-cell sequencing. Therefore, in this work, we also intended to investigate the spatiotemporal atlas of NFATc1+ and PDGFR-α+ cells in the cranium and cranial suture region based on transgenic lineage-tracing techniques. Second, because the cranium is part of the craniomaxillofacial hard tissue, we intended to explore whether the presence of NFATc1+ and PDGFR-α+ cells in cranial bone tissue/suture differs from that in dental and periodontal tissue. Third, the results in Video S7-9 further demonstrate that the improved tissue clearing procedure in this work is applicable to a variety of hard tissues, which lays a foundation for our future research.

      Page 11 in the revised manuscript, “As a part of craniomaxillofacial hard tissue, we also intended to explore whether the presence of NFATc1+ and PDGFR-α+ cells in cranial bone tissue/suture is different from dental and periodontal tissue (our previous study has identified the presence of NFATc1+ cells in the cranium by single-cell sequencing28”

      (5) Some images before & after the tissue-clearing procedure need to be provided in the supplemental file.

      Thanks for your important and helpful comments to improve the quality of this manuscript. We have included the corresponding description and photographs in the main text and the supplemental file as follows:

      Page 7 in the revised manuscript, “As shown in Figure S1A-B, we recorded bright-field images of the maxilla before and after clearing, and our procedure achieved high transparency of the whole tissue. On this basis, whole-tissue imaging can be achieved, with the observation of different cell type distribution in spatial 3D structure.”

      (6) In part 5, line 394, the author investigated the consequences of the ablation of PDGFR-α+ cells in dental pulp and periodontal mesenchymal tissues, but some research objectives and mechanisms need to be discussed here, regarding: "why choosing to ablation PDGFR-α+ cells instead of NFATc1+ cells? Was the hierarchical relationship between PDGFR-α+ cells and NFATc1+ cells considered during the experimental design?", etc.

      Thank you very much for your suggestion, it has been very helpful. We chose PDGFR-α+ cells as the subject for the cell ablation experiments based on the results from the previous lineage tracing and hierarchical relationship studies. We have included the corresponding description and photographs in the main text and the supplemental file as follows:

      Page 13 in the revised manuscript, “The results from the aforementioned lineage tracing experiments showed that PDGFR-α+ cells constitute a significant component of both dental pulp and periodontal tissues. Additionally, the hierarchical relationship experiments revealed that a portion of NFATc1+ cells in the periodontal ligament derives from PDGFR-α+ progenitor cells. Therefore, investigating the role of PDGFRα+ cells in dental pulp and periodontal tissues has become more urgent.”

      (7) Some claims in the main text lack literature citations, such as those in lines 207 and 234.

      Thank you very much for your comments. We are deeply sorry for the mistakes. We have added the relevant references at the appropriate locations in the main text as follows:

      (1) line 207 of previous manuscript (page 8, line 206 in the revised manuscript): We sincerely apologize for the typo that occurred during the writing process and have revised the original text to: which was consistent with RNA-sequencing results in the previous study20. (DOI: 10.1177/00220345221074356)

      (2) line 234 of previous manuscript (page 9, line 234 in the revised manuscript): “we employed an alternative 3D reconstruction method—DICOM-3D27.” (DOI: 10.1177/09544119211020148)

      (8) What were the specific reasons for the conspicuous tdTomato signal in the reconstructed images obtained by traditional serial section-based confocal imaging, which were not as evident in TC imaging?

      Thank you very much for your comments. Traditional sectioning and subsequent confocal imaging can clearly display fluorescence signals on a single plane (Figure 3B, Figure 6B, Figure S3, S8, S11, S16, S19). Therefore, the 3D reconstruction built from multiple planes still has a high resolution (Figure 3, 4, 7, 8). For TC imaging, however, current light-sheet systems have inherent limitations in capturing tdTomato signals in intact tissue, which become more evident in tissues with inherently low fluorescence intensity (in this work, the limited editing efficiency of the DNA recombinase-mediated lineage-tracing system resulted in a weaker tdTomato signal compared to ZsGreen). In contrast, traditional confocal imaging techniques do not encounter such issues.

      (9) In tissue clearing techniques, do the chemical reagents and procedures used affect the signal intensity of tdTomato and Zsgreen?

      We appreciate your helpful comment. In this work, we modified and improved a rapid and efficient tissue deep clearing (TC) procedure based on the existing SUMIC method (Nature Cardiovascular Research, 2024, 3, 474–491; Cell, 2023, 186, 382-397.e24.). These studies have confirmed that the chemical reagents used in this method do not affect the inherent fluorescence signal of transgenic animals. With our improvements, we minimized the sample processing time as much as possible to avoid any potential adverse effects. The results in Figure 2, Figure 5, and Figure S1 indicate that after the TC procedure, the tissue exhibits significant ZsGreen signals and some tdTomato signals, which sufficiently support our conclusions.

      (10) How did you address the issue of sample integrity and discontinuities in the z-axis caused by the stratification of slices in your reconstructions?

      We greatly appreciate your comments. Currently, reconstruction techniques based on continuous sectioning cannot fully eliminate the discontinuities in the z-axis. For this reason, we compensate for this deficiency by imaging the whole tissue through the TC procedure. These two 3D reconstruction and imaging technologies complement each other to jointly address the spatial positioning and hierarchical relationships of PDGFR-α+, NFATc1+, and PDGFR-α+NFATc1+ cells from multiple perspectives. Additionally, this deficiency can be reduced by improving technical skill, reducing section thickness, and minimizing tissue loss during sectioning, which we will pursue in future work.

      (11) In Figure 2B, the schematic representation of the operational principle "Cre-loxp/Dre-loxp" does not correspond to the genotype "CreER/DreER". Please correct it.

      Thanks for your important comments. We are sincerely sorry for the mistake. We have modified Figure 2B in the revised manuscript as below:

      (12) Line 450, the specific distribution and differences of PDGFR-α+, NFATc1+, and PDGFR-α+&NFATc1+ cells in pulp and periodontal tissues need to be further described and explained.

      Thank you for your question. We have described this part on page 16 in the revised manuscript, “In PDL tissue, pulse data demonstrated widespread and abundant expression of PDGFR-α single-positive cells as well as NFATc1 single-positive cells, with no significant alteration in expression pattern or quantity after lineage tracing. Consequently, we conclude that in periodontal ligament and dental pulp tissues, PDGFR-α single-positive and NFATc1 single-positive cells primarily label intrinsic periodontal mesenchyme in PDL. Conversely, PDGFR-α+&NFATc1+ cells exhibited a more confined localization in PDL. The tracing data clearly illustrated that PDGFR-α+&NFATc1+ cells successfully gave rise to numerous progenies, which become predominant constituents within the periodontal ligament. In pulp tissue, the distribution of PDGFR-α single-positive cells was similar as that in PDL, primarily labeled odontoblast cell layer and there was not a significant increase in ZsGreen signal after tracing assay.”

      (13) In Figure S9, the sparse presence of NFATc1+ cells in pulp and periodontal tissue raises questions about the plasticity and differentiation potential of these cells. The author should include relevant discussions in this section.

      Thanks for your suggestion. Considering the plasticity and differentiation potential of NFATc1+ cells, we conducted immunofluorescence staining and found that the PDGFR-α+&NFATc1+ cell lineage in dental pulp and periodontal tissues represents a heterogeneous population. This population includes non-terminally differentiated mesenchymal stem cells (MSCs) as well as hematopoietic cells, indicating significant heterogeneity. We have also added this part of the discussion on page 17 of the manuscript.

      Page 17 in the revised manuscript, “Cell ablation and immunofluorescence staining experiments further characterized the types and functions of PDGFR-α+/PDGFR-α+&NFATc1+ populations. After ablating PDGFR-α+ cells, we observed damage to the odontoblast layer and shrinkage of the pulp core in dental pulp tissue, indicating that PDGFR-α+ cells contribute to the composition of dental pulp tissue, particularly the odontoblast layer (Figure. 9C, D). In the periodontal ligament, we noted a reduction and destruction of collagen fibers, suggesting a role for PDGFR-α+ cells in periodontal tissue structure (Figure. 9E, F). Previous results confirmed the presence of double-positive cells in both dental pulp and periodontal tissues and provided insights into their hierarchical relationships in the periodontal ligament (Figure. 8). To further investigate the double-positive cell population, we developed an inducible dual-editing enzyme reporter system to label these cells with tdTomato signals. Using AlphaV as a marker for mesenchymal stem cells (MSCs) and CD45 for hematopoietic cells, we found that double-positive cells included components of both MSCs and hematopoietic cells (Figure S22B, C), indicating a heterogeneous population. Further experiments are necessary to determine whether the predominant role in this co-positive MSC population is played by PDGFR-α+ or NFATc1+ and to clarify the specific functions of these cells in the future.”

      (14) Part 3, line 351, the authors were unable to confirm the hierarchical relationship between PDGFR-α+ and NFATc1+ cells in the dental pulp region. Could this be due to limitations in experimental design or technical methods? Have you considered other factors that might explain these results?

      Thank you for your question. We believe that the possible reason was that PDGFR-α+ cells were a widely distributed constitutive component of dental pulp tissue, while NFATc1+ cells had a more limited expression range, resulting in a significant difference between the two. Therefore, we were unable to calculate the differences. In the future, we could further investigate the hierarchical relationship between the two by increasing the sample size or through in vitro experiments such as immunoprecipitation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Public Reviewer Comments:

      Reviewer 1:

      In this work, Veseli et al. present a computational framework to infer the functional diversity of microbiomes in relation to microbial diversity directly from metagenomic data. The framework reconstructs metabolic modules from metagenomes and calculates the per-population copy number of each module, resulting in the proportion of microbes in the sample carrying certain genes. They applied this framework to a dataset of gut microbiomes from 109 inflammatory bowel disease (IBD) patients, 78 patients with other gastrointestinal conditions, and 229 healthy controls. They found that the microbiomes of IBD patients were enriched in a high fraction of metabolic pathways, including biosynthesis pathways such as those for amino acids, vitamins, nucleotides, and lipids. Hence, they had higher metabolic independence compared with healthy controls. To an extent, the authors also found a pathway enrichment suggesting higher metabolic independence in patients with gastrointestinal conditions other than IBD indicating this could be a signal for a general loss in host health. Finally, a machine learning classifier using high metabolic independence in microbiomes could predict IBD with good accuracy. Overall, this is an interesting and well-written article and presents a novel workflow that enables a comprehensive characterization of microbiome cohorts.

      We thank the reviewer for their interest in our study, their summary of its findings, and their kind words about the manuscript quality.

      Reviewer 2:

      This study builds upon the team's recent discovery that antibiotic treatment and other disturbances favour the persistence of bacteria with genomes that encode complete modules for the synthesis of essential metabolites (Watson et al. 2023). Veseli and collaborators now provide an in-depth analysis of metabolic pathway completeness within microbiomes, finding strong evidence for an enrichment of bacteria with high metabolic independence in the microbiomes associated with IBD and other gastrointestinal disorders. Importantly, this study provides new open-source software to facilitate the reconstruction of metabolic pathways, estimate their completeness and normalize their results according to species diversity. Finally, this study also shows that the metabolic independence of microbial communities can be used as a marker of dysbiosis. The function-based health index proposed here is more robust to individuals' lifestyles and geographic origin than previously proposed methods based on bacterial taxonomy.

      The implications of this study have the potential to spur a paradigm shift in the field. It shows that certain bacterial taxa that have been consistently associated with disease might not be harmful to their host as previously thought. These bacteria seem to be the only species that are able to survive in a stressed gut environment. They might even be important to rebuild a healthy microbiome (although the authors are careful not to make this speculation).

      This paper provides an in-depth discussion of the results, and limitations are clearly addressed throughout the manuscript. Some of the potential limitations relate to the use of large publicly available datasets, where sample processing and the definition of healthy status varies between studies. The authors have recognised these issues and their results were robust to analyses performed on a per-cohort basis. These potential limitations, therefore, are unlikely to have affected the conclusions of this study.

      Overall, this manuscript is a magnificent contribution to the field, likely to inspire many other studies to come.

      We thank the reviewer for their endorsement of our study and their precision regarding the evaluation of its strengths. We also appreciate their high expectations for its impact in the field.

      Reviewer 3:

      The major strength of this manuscript is the "anvi-estimate-metabolism" tool, which is already accessible online, extensively documented, and potentially broadly useful to microbial ecologists.

      We thank the reviewer for their recognition of the computational advances in this study. We also thank the reviewer for their suggestions that we have addressed below, which allowed us to strengthen our manuscript.

      However, the context for this tool and its validation is lacking in the current version of the manuscript. It is unclear whether similar tools exist; if so, it would help to benchmark this new tool against prior methods.

      The reviewer brings up a very good point about the lack of context for the `anvi-estimate-metabolism` program. While our efforts that led to the emergence of this software included detailed benchmarking efforts, a formal assessment of its performance and accuracy was indeed lacking. We are thankful to our reviewer for pointing this out, which motivated us to perform additional analyses to address such concerns. Our revision contains a new, 34-page long supplementary information file (Supplementary File 2) that includes a section titled “Comparison of anvi-estimate-metabolism to existing tools for metabolism reconstruction”. The text therein describes the landscape of currently available software for metabolism reconstruction and describes the features that make `anvi-estimate-metabolism` unique – namely, (1) its implementation of metrics that make it suitable for metagenome-level analyses (i.e., pathway copy number and stepwise interpretation of pathway definitions) and (2) its ability to process user-defined metabolic pathways rather than exclusively relying on KEGG. As described in that section, there is currently no other tool that can compute copy numbers of metabolic pathways from metagenomic data. Hence, it is not quite possible to benchmark the copy number methodology used in our study against prior methods; however, our benchmarking of this functionality with synthetic genomes and metagenomes (described later in this document) does provide necessary quantitative insights into its accuracy and efficiency.

      While comparison of the copy number calculations to other tools was not possible due to the unique nature of this functionality, it was possible to benchmark our gene function annotation methodology against existing tools that also annotate genes with KEGG KOfams, which is a step commonly used by various tools that aim to estimate metabolic potential in genomes and metagenomes. In the anvi’o software ecosystem the annotation of genes for metabolic reconstruction is implemented in `anvi-run-kegg-kofams`, and represents a step that is required by `anvi-estimate-metabolism`. As our comparisons were quite extensive and involved additional researchers, we described them in another study which we titled “Adaptive adjustment of significance thresholds produces large gains in microbial gene annotations and metabolic insights” (doi:10.1101/2024.07.03.601779) that is now cited from within our revision in the appropriate context. Briefly, our comparison of anvi’o, Kofamscan, and MicrobeAnnotator using 396 publicly-available bacterial genomes from 11 families demonstrated that `anvi-run-kegg-kofams` is able to identify an average of 12.8% more KO annotations per genome than the other tools, especially in families commonly found in the gut environment (Figure 1). Furthermore, anvi’o recovered the highest proportion of annotations that were independently validated using eggNOG-mapper. Our comparisons also showed that annotations from anvi’o yield at least 11.6% more complete metabolic modules than Kofamscan or MicrobeAnnotator, including the identification of butyrate biosynthesis in Lachnospiraceae genomes at rates similar to manual identification of this pathway in this clade (Figure 2a). Overall, our findings that are now described extensively in DOI:10.1101/2024.07.03.601779 show that our method captures high-quality annotations for accurate downstream metabolism estimates.

      We hope these new data help increase the reviewer’s confidence in our results.

      Simulated datasets could be used to validate the approach and test its robustness to different levels of bacterial richness, genome sizes, and annotation level.

      We thank the reviewer for this suggestion. It was an extremely useful exercise that not only helped us elucidate the nuances of our approach, but also enabled us to further highlight its strengths in our manuscript. We created simulated datasets including a total of 409 synthetic metagenomes that we used to test the robustness of our approach to different genome sizes, community sizes, and levels of diversity. Overall, our tests with these synthetic metagenomes demonstrated that our approach of computing PPCN values to summarize the metabolic capacity within a metagenomic community is accurate and robust to differences in all three critical variables. Most of these variables were weakly correlated with PPCN or PPCN accuracy, and the few correlations that were stronger in fact further supported our original hypothesis that we generated from our comparisons of healthy and IBD gut metagenomes. The methods and results of our validation efforts are explained in detail in our new Supplementary File 2 (see the section titled “Validation of per-population copy number (PPCN) approach on simulated metagenomic data”), but we copy here the subsection that summarizes our findings for the reviewer’s convenience:

      Overall impact on the comparison between healthy and IBD gut metagenomes

      “In summary, our validation strategy revealed good accuracy at estimating metagenome-level metabolic capacity relative to our genome-level knowledge in the simulated data. While it often underestimated average genomic completeness by ignoring partial copies of metabolic pathways and often overestimated average genomic copy number due to the effect of pathway complementarity between different community members, the magnitude of error was overall limited in range and the error distributions were centered at or near 0. Furthermore, we observed these broad error trends in all cases we tested, and therefore we expect that they would also apply to both sample groups in our comparative analysis. Thus, we next considered how the PPCN approach might have influenced our analyses that considered metagenomes from healthy individuals and from those who have IBD – two groups that differed from one another with respect to some of the variables considered in our tests.

      Most of the correlations between PPCN or PPCN accuracy and sample parameters were weak, yet significant (Table 1). They showed that community size and diversity level have limited influence on the PPCN calculation, while genome size does not influence its accuracy. The only exception was the moderate correlation between PPCN and genome size, particularly for the subset of IBD-enriched pathways. It was a negative correlation with the proportion of small genomes in a metagenome, indicating that PPCN values for these pathways are larger when there are more large genomes in the community and suggesting that these pathways tend to occur frequently in larger genomes. This is in line with our observation that IBD communities contain more large genomes and therefore confirms our interpretation that the populations surviving in the IBD gut microbiome are those with the genomic space to encode more metabolic capacities.

      If we consider even the weak correlations, two of those relationships indicate that our approach would be more accurate for IBD metagenomes than for healthy metagenomes. For instance, PPCN accuracy was slightly higher for smaller communities (as in IBD samples), with a weakly positive correlation between PPCN error and community size. It was also slightly more accurate for less diverse communities (as in IBD samples), with a weakly positive correlation between PPCN error and number of phyla. The only opposing trend was the weakly positive correlation between PPCN error and proportion of smaller genomes, which favors higher accuracy in communities with smaller genomes (as in healthy samples). Given that our analysis focuses on the pathways enriched in IBD samples, an overall higher accuracy in IBD samples would increase the confidence in our enrichment results.

      We also examined the accuracy of our method to predict the number of populations within a metagenome based on the distribution and frequency of single-copy core genes (i.e., the denominator in the calculation of PPCN). Our benchmarks show that the estimates are overall accurate, where most errors reflect a negligible amount of underestimations of the actual number of populations. Errors occurred more frequently for the realistic synthetic assemblies generated from simulated short read data than for the ideal synthetic assemblies generated from the combination of genomic contigs. The correlations between estimation accuracy and sample parameters indicated that the population estimates are more accurate for smaller communities and communities with more large genomes, as in IBD samples (Table 2). Thus, this method is more likely to underestimate the community size in healthy samples, and these errors could lead to overestimation of PPCN in healthy samples relative to IBD samples. Thus, the enrichment of a given pathway in the IBD samples would have to overcome its relative overestimation in the healthy sample group, making it more likely that we identified pathways that were truly enriched in the IBD communities.

      Overall, the consideration of our simulations in the context of healthy vs IBD metagenomes suggests that slight biases in our estimates as a function of unequal diversity with sample groups should have driven PPCN calculations towards a conclusion that is opposite of our observations under neutral conditions. Thus, clear differences between healthy vs IBD metagenomes that overcome these biases suggest that biology, and not potential bioinformatics artifacts, is the primary driver of our observations.”

      Accordingly, we have added the following sentence summarizing the validation results to our paper:

      “Our validation of this method on simulated metagenomic data demonstrated that it is accurate in capturing metagenome-level metabolic capacity relative to genome-level metabolic capacity estimated from the same data (Supplementary File 2, Supplementary Table 6).”

      Early in this process of validation, we identified and fixed two minor bugs in our codebase. The bugs did not affect the results of our paper and therefore did not warrant a re-analysis of our data. The first bug, which is detailed in the GitHub issue https://github.com/merenlab/anvio/issues/2231 and fixed in the pull request https://github.com/merenlab/anvio/pull/2235, led to the overestimation of the number of microbial populations in a metagenome when the metagenome contains both Bacteria and Archaea. None of the gut metagenomes analyzed in our paper contained archaeal populations, so this bug did not affect our community size estimates.

      The second bug, which is detailed in the GitHub issue https://github.com/merenlab/anvio/issues/2217 and fixed in the pull request https://github.com/merenlab/anvio/pull/2218, caused inflation of stepwise copy numbers for a specific type of metabolic pathway in which the definition contained an inner parenthetical clause. This bug affected only 3 pathways in the KEGG MODULE database we used for our analysis: M00083, M00144, and M00149. It is worth noting that one of those pathways, M00083, was identified as an IBD-enriched module in our analysis. However, the copy number inflation resulting from this bug would have occurred equivalently in both the healthy and IBD sample groups and thus should not have impacted our comparative analysis.

      Regardless, we are grateful for the suggestion to validate our approach since it enabled us to identify and eliminate these minor issues.

      The concept of metabolic independence was intriguing, although it also raises some concerns about the overinterpretation of metagenomic data. As mentioned by the authors, IBD is associated with taxonomic shifts that could confound the copy number estimates that are the primary focus of this analysis. It is unclear if the current results can be explained by IBD-associated shifts in taxonomic composition and/or average genome size. The level of prior knowledge varies a lot between taxa; especially for the IBD-associated gamma-Proteobacteria.

      The reviewer brings up an important point, and we are thankful for the opportunity to clarify the impact of taxonomy on our analysis. Though IBD has been associated with taxonomic shifts in the gut microbiome, a major problem with such associations is that the taxonomic signal is extremely variable, leading to inconsistency in the observed shifts across different studies (doi:10.3390/pathogens8030126). Indeed, one of the most comprehensive prior studies into this topic demonstrated that inter-individual variation is the largest contributor to all multi-omic measurements aiming to differentiate the gut microbiome of individuals with IBD from that of healthy individuals, including taxonomy (doi:10.1038/s41586-019-1237-9). We therefore took a different approach to study this question that is independent of taxonomy, by focusing on metabolic potential estimated directly from metagenomes to elucidate an ecological explanation behind the reduced diversity of the IBD gut microbiome, which studies of taxonomic composition alone are not able to provide. Furthermore, the variability inherent to taxonomic profiles of the gut microbiome makes it unlikely that taxonomic shifts could confound our analysis, especially given our large sample set encompassing a variety of individuals with different origins, ages, and genders.

      We agree with the reviewer that our level of prior knowledge varies substantially across taxa. Regardless, the only prior knowledge with any bearing on our ability to estimate metabolic capacity in a taxonomy-independent manner is the extent of sequence diversity captured by our annotation models for the enzymes used in metabolic pathways. During our analysis, we had observed that metagenomes in the healthy group had fewer gene annotations than those in the IBD group and we therefore shared the reviewer’s concern about potential annotation bias, whereby less-studied genomes are not always incorporated into the Hidden Markov Models for annotating KEGG Orthologs, perhaps making it more likely for us to miss annotations in these genomes (and leading to lower completeness scores for metabolic pathways in the healthy samples). Our annotation method partially addresses this limitation by taking a second look at any unannotated genes and mindfully relaxing the bit score similarity thresholds to capture annotations for any genes that are slightly too different from reference sequences for annotation with default thresholds. As mentioned previously, our recent preprint demonstrates the efficacy of this strategy (doi:10.1101/2024.07.03.601779). To further address this concern, we also investigated the extent of distant homology in these metagenomes using AGNOSTOS (doi:10.7554/eLife.67667), which showed a higher proportion of unknown genes in the healthy metagenomes and suggested that a substantial portion of the unannotated genes are not distant homologs of known enzymes that we failed to annotate due to lack of prior knowledge about them, but rather are completely novel functions. To describe these results, we added the following paragraph and two accompanying figures (Supplementary Figure 4g-h) to the section “Differential annotation efficiency between IBD and Healthy samples” in Supplementary File 1:

      “To understand the potential origins of the reduced annotation rate in healthy metagenomes, we ran AGNOSTOS (Vanni et al. 2022) to classify known and unknown genes within the healthy and IBD sample groups. AGNOSTOS clusters genes to contextualize them within an extensive reference dataset and then categorizes each gene as ‘known’ (has homology to genes annotated with Pfam domains of known function), ‘genomic unknown’ (has homology to genes in genomic reference databases that do not have known functional domains), or ‘environmental unknown’ (has homology to genes from metagenomes or MAGs that do not have known functional domains). The resulting classifications confirm that healthy metagenomes contain fewer ‘known’ genes than metagenomes in the IBD sample group – the proportion of ‘known’ genes classified by AGNOSTOS is about 3.0% less in the healthy metagenomes than in the IBD sample group, which is similar to the ~3.5% decrease in the proportion of ‘unannotated’ genes observed by simply counting the number of genes with at least one functional annotation (Supplementary Figure 4g-h, Supplementary Table 1e). Furthermore, the majority of the unannotated genes in either sample group were categorized by AGNOSTOS as ‘genomic unknown’ (Supplementary Figure 4g), suggesting that the unannotated sequences are genes without biochemically-characterized functions currently associated with them and are thus legitimately lacking a functional annotation in our analysis, rather than representing distant homologs of known protein families that we failed to annotate. Based upon the classifications, a systematic technical bias is unlikely to be driving the annotation discrepancy between the sample groups.”

      Furthermore, we have already discussed this limitation and its implications in our manuscript (see section “Key biosynthetic pathways are enriched in microbial populations from IBD samples”). To further clarify that our approach is independent of taxonomy, we have now also amended the following statement in our introduction:

      “Here we implemented a high-throughput, taxonomy-independent strategy to estimate metabolic capabilities of microbial communities directly from metagenomes and investigate whether the enrichment of populations with high metabolic independence predicts IBD in the human gut.”

      Finally, the reviewer is also correct that genome size is a part of the equation, as genome size and level of metabolic capacity are inextricable. In fact, we observed this in our analysis, as already stated in our paper:

      “HMI genomes were on average substantially larger (3.8 Mbp) than non-HMI genomes (2.9 Mbp) and encoded more genes (3,634 vs. 2,683 genes, respectively)”

      Since larger genomes have the space to encode more functional capacity, it follows that having higher metabolic independence would require a microbe to have a larger genome. The validation of our method on simulated metagenomic data supported this idea by demonstrating that the IBD-enriched metabolic pathways are commonly identified in large genomes. The validation also proved that genome size does not influence the accuracy of our approach (Supplementary File 2).

      It can be difficult to distinguish genes for biosynthesis and catabolism just from the KEGG module names and the new normalization tool proposed herein markedly affects the results relative to more traditional analyses.

      We agree with the reviewer that KEGG module names do not clearly indicate the presence of biosynthetic genes of interest. That said, KEGG is a commonly-used and extensively-curated resource, and many biologists (including ourselves) trust their categorization of genes into pathways. We hope that readers who are interested in specific genes within our results would make use of our publicly-available datasets (which include gene annotations) to conduct a targeted analysis based on their expertise and research question.

      However, we would like to respectfully note that the ability to distinguish the genes within each KEGG module may not be very useful to most readers, and is unlikely to have a meaningful impact on our findings. As the reviewer most likely appreciates, the presence of individual genes in isolation can be insufficient to indicate biosynthetic capacity, considering that 1) most biosynthetic pathways involve several biochemical conversions requiring a series of enzymes, 2) enzymes are often multi-functional rather than exclusive to one pathway, and 3) different organisms in a community may utilize enzymes encoded by different genes to perform the same or similar biochemical reaction in a pathway. We therefore made the choice to analyze metabolic capacity at the pathway level, because this would better reflect the biosynthetic abilities encoded by the multiple microbial populations within each metagenome.

      The reviewer also suggests that our novel normalization method affects our results, yet we believe that this normalization strategy is one of the strengths of our study in comparison to ‘more traditional analyses’ as it enables an appropriate comparison between metagenomes describing microbial communities of dramatically different degrees of richness. Indeed, we suspect that the lack of normalization in more traditional analyses may be one reason why prior analyses have so far failed to uncover any mechanistic explanation for the loss of diversity in the IBD gut microbiome. We hope that our validation efforts were sufficiently convincing in demonstrating the suitability of our approach, and copy here a particularly illuminating section of the validation results that we have added to Supplementary Information File 2:

      “As expected, we observed a significant positive correlation between metagenomic copy number (the numerator of PPCN) and community size in each group, likely driven by the increase in the copy number of core metabolic pathways in larger communities (Supplementary Figure 18). Interestingly, this correlation was much stronger for the subset of IBD-enriched pathways (0.49 <= R <= 0.67) than for all modules (0.12 <= R <= 0.13).

      “However, the correlation was much weaker and often nonsignificant for the normalized PPCN data in both groups of modules (all modules: 0.01 < R < 0.04, enriched modules: 0.04 < R < 0.09, Supplementary Table 6b, Supplementary Figure 19), which demonstrates the suitability of our normalization method to remove the effect of community size in comparisons of metagenome-level metabolic capacity.”
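
      The effect of the normalization described above can be illustrated with a toy calculation. The sketch below uses invented numbers (not data from the study), hypothetical variable names, and a hand-written Pearson correlation. It only shows why dividing a raw module copy number by community size can remove the dependence on community size.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data (invented for illustration): six communities of increasing size
# and the raw copy number of one module in each community.
community_sizes = [10, 20, 30, 40, 50, 60]
raw_copy_numbers = [5, 8, 18, 24, 20, 30]

# Per-population copy number: raw copy number divided by community size.
ppcn_values = [c / n for c, n in zip(raw_copy_numbers, community_sizes)]

# Raw copy number tracks community size strongly in this toy example,
# whereas the normalized values do not.
raw_r = pearson(community_sizes, raw_copy_numbers)
ppcn_r = pearson(community_sizes, ppcn_values)
print(f"raw copy number vs. size:  R = {raw_r:.2f}")
print(f"PPCN vs. size:             R = {ppcn_r:.2f}")
```

      In this constructed example the fraction of populations carrying the module varies independently of community size, so the correlation disappears after normalization. Real data would retain whatever biological signal remains.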

      As such, it seems safer to view the current analysis as hypothesis-generating, requiring additional data to assess the degree to which metabolic dependencies are linked to IBD.

      We certainly agree with the reviewer that our study, similar to the vast majority of studies published every year, is a hypothesis-generating work. Any idea proposed in any scientific study in life sciences will certainly benefit from additional data analyses, and therefore we respectfully do not accept this as a valid criticism of our work. The inception of this study is linked to an earlier work that hypothesized high metabolic independence as a determinant of microbial fitness in stressed gut communities (doi:10.1186/s13059-023-02924-x), which lacked validation on larger sets of data. Our study tests this original hypothesis using a large number of metagenomes, and lends further support for it with approaches that are now better validated. Furthermore, there are other studies that agree with our interpretation of the data (doi:10.1101/2023.02.17.528570, doi:10.1038/s41540-021-00178-6), and we look forward to more computational and/or experimental work in the future to generate more evidence to evaluate these insights further.

      Response to Recommendations for the Authors

      Reviewer 1:

      My main comments include:

      - From the results reported in lines 178-185, it seems that metabolic pathways in general were enriched in IBD microbiomes, not specifically biosynthetic pathways. Can we really say then that the signal is specific for biosynthesis capabilities?

      We apologize for the confusion here. When we read the text again, we ourselves were confused by our phrasing.

      The reviewer is correct that a similar proportion of both biosynthetic and non-biosynthetic pathways had elevated per-population copy number (PPCN) values in the IBD samples. However, the low microbial diversity associated with IBD and the larger average genome size of individual populations both contribute to this relative enrichment of the majority of metabolic modules. To remove this bias and identify specific modules whose enrichment was highly conserved across microbial populations associated with IBD, we implemented two criteria: 1) we selected modules that passed a high statistical significance threshold in our enrichment test (Wilcoxon Rank Sum Test, FDR-adjusted p-value < 2e-10), and 2) we accounted for effect size by ranking these modules according to the difference between their median PPCN in IBD samples and their median PPCN in healthy samples, and keeping only those in the top 50% (which translated to an effect size threshold of > 0.12).

      This analysis revealed a set of metabolic modules that were consistently and highly significantly enriched in microbial communities associated with IBD. The majority of these metabolic modules encode biosynthesis pathways. Our use of the terms “elevated”, “enriched”, and “significantly enriched” in the previous version of the text was confusing to the reader. We thank the reviewer for pointing this out, and we hope that our revision of the text clarifies the analysis strategy and observations:

      “To gain insight into potential metabolic determinants of microbial survival in the IBD gut environment, we assessed the distribution of metabolic modules within samples from each group (IBD and healthy) with and without using PPCN normalization. Without normalizing, module copy numbers were overall higher in healthy samples (Figure 2a) and modules exhibited weak differential occurrence between cohorts (Figure 2b, 2c, Supplementary Figure 3). The application of PPCN reversed this trend, and most metabolic modules were elevated in IBD (Supplementary Figure 5). This observation is influenced by two independent aspects of the healthy and IBD microbiota. The first one is the increased representation of microbial organisms with smaller genomes in healthy individuals (Watson et al. 2023), which increases the likelihood that the overall copy number of a given metabolic module is below the actual number of populations. In contrast, one of the hallmarks of the IBD microbiota is the generally increased representation of organisms with larger genomes (Watson et al. 2023). The second aspect is that the generally higher diversity of microbes in healthy individuals increases the denominator of the PPCN. This results in a greater reduction in the PPCN of metabolic modules that are not shared across all members of the diverse gut microbial populations in health.

      To go beyond this general trend and identify modules that were highly conserved in the IBD group, we first selected those that passed a relatively high statistical significance threshold in our enrichment test (Wilcoxon Rank Sum Test, FDR-adjusted p-value < 2e-10). We then accounted for effect size by ranking these modules according to the difference between their median PPCN in IBD samples and their median PPCN in healthy samples, and keeping only those in the top 50% (which translated to an effect size threshold of > 0.12). This stringent filtering revealed a set of 33 metabolic modules that were significantly enriched in metagenomes obtained from individuals diagnosed with IBD (Figure 2d, 2e), 17 of which matched the modules that were associated with high metabolic independence previously (Watson et al. 2023) (Figure 2f). This result suggests that the PPCN normalization is an important step in comparative analyses of metabolisms between samples with different levels of microbial diversity.”

      Lines 178-185 from our original submission have been removed to avoid further confusion. These results can be found in Supplementary File 1 (section “Module enrichment without consideration of effect size leads to nonspecific results”).
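
      The two filtering criteria described in this response (a significance threshold followed by an effect-size ranking) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the raw p-values are taken as precomputed inputs (for example, from a Wilcoxon rank-sum test), the FDR adjustment is assumed to be Benjamini-Hochberg, the rounding of the "top 50%" cutoff is assumed, and the function names, module names, and numbers are all hypothetical.

```python
import math
from statistics import median


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_enriched(modules, alpha=2e-10, keep_fraction=0.5):
    """Filter modules by adjusted p-value, then keep the top effect sizes.

    `modules` maps a module name to (raw_p, ibd_ppcn, healthy_ppcn), where
    the last two items are lists of per-sample PPCN values.
    """
    names = list(modules)
    adjusted = benjamini_hochberg([modules[name][0] for name in names])

    # Criterion 1: adjusted p-value below the significance threshold.
    significant = []
    for name, q in zip(names, adjusted):
        if q < alpha:
            _, ibd, healthy = modules[name]
            # Effect size: difference between group medians.
            significant.append((median(ibd) - median(healthy), name))

    # Criterion 2: keep the top fraction when ranked by effect size.
    significant.sort(reverse=True)
    n_keep = math.ceil(len(significant) * keep_fraction)
    return [name for _, name in significant[:n_keep]]


# Toy input (all values invented for illustration).
toy_modules = {
    "module_A": (1e-12, [0.9, 0.8, 0.7], [0.5, 0.6, 0.4]),
    "module_B": (1e-11, [0.6, 0.55, 0.5], [0.5, 0.5, 0.45]),
    "module_C": (1e-13, [0.7, 0.75, 0.8], [0.6, 0.55, 0.65]),
    "module_D": (1e-2, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
}
enriched = select_enriched(toy_modules)
print(enriched)
```

      In this toy input, three modules pass the significance threshold and the two with the largest median differences are retained.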

      It is not entirely clear to me what is meant by PPCN normalization. Normalize the number of copy numbers to the overall number of genes?

      The idea behind using per-population copy number (PPCN) is to normalize the prevalence of each metabolic module found in an environment with the number of microbial populations within the same sample. PPCN achieves this by dividing the pathway copy numbers by the number of microbial populations in a given metagenome, which we estimate from the frequency of bacterial single-copy core genes. We have updated the description of the per-population copy number (PPCN) calculation to clarify its use:

      “Briefly, the PPCN estimates the proportion of microbes in a community with a particular metabolic capacity (Figure 1, Supplementary Figure 2) by normalizing observed metabolic module copy numbers with the ‘number of microbial populations in a given metagenome’, which we estimate using the single-copy core genes (SCGs) without relying on the reconstruction of individual genomes.”

      We also note that the equation for PPCN is shown in Figure 1.
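
      The PPCN calculation described above can be sketched in a few lines. This is a minimal illustration, not the anvi'o implementation: it assumes the number of populations is taken as the most frequent hit count across single-copy core genes (one plausible way to use SCG frequencies), and the function names, gene names, module names, and counts are hypothetical.

```python
from collections import Counter


def estimate_num_populations(scg_hit_counts):
    """Estimate the number of populations from SCG hit counts.

    Assumption for this sketch: the estimate is the mode of the per-gene
    hit counts. Ties are resolved in favor of the value seen first.
    """
    counts = Counter(scg_hit_counts.values())
    return counts.most_common(1)[0][0]


def per_population_copy_number(module_copy_numbers, num_populations):
    """Divide each module's metagenome-level copy number by community size."""
    if num_populations <= 0:
        raise ValueError("number of populations must be positive")
    return {
        module: copies / num_populations
        for module, copies in module_copy_numbers.items()
    }


# Toy metagenome (invented values): hits per single-copy core gene.
scg_hits = {
    "Ribosomal_L1": 4,
    "Ribosomal_L2": 4,
    "Ribosomal_S3": 5,
    "Ribosomal_S7": 4,
    "Ribosomal_L6": 3,
}
# Toy metagenome-level copy numbers for two hypothetical modules.
module_copies = {"module_A": 3, "module_B": 1}

n_populations = estimate_num_populations(scg_hits)
ppcn = per_population_copy_number(module_copies, n_populations)
print(n_populations, ppcn)
```

      Under these toy numbers, a PPCN of 0.75 for `module_A` would be read as roughly three of the four estimated populations carrying that module.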

      It is also not clear to me how the classifier predicts stress on microbiomes rather than dysbiosis.

      The reviewer asks an interesting question since it is true that we could also use the term “dysbiosis” rather than “stress”. Yet we refrained from the use of dysbiosis as it is considered a poorly-defined term to describe an altered microbiome often associated with a specific disease (doi:10.3390/microorganisms10030578), such as IBD, relative to another poorly-defined state, “healthy microbiome” (doi:10.1002/phar.2731). We do consider that stress is not necessarily a term that is less vague than dysbiosis, yet it has the advantage of being more common in studies of ecology compared to dysbiosis. Our relatively neutral stance towards which term to use has shifted dramatically due to one critical observation in our study: the identical patterns of enrichment of HMI microbes in individuals diagnosed with IBD as well as in healthy individuals treated with antibiotics. We appreciate that the observed changes in the antibiotics case can also fulfill the definition of “dysbiosis”, but the term “stress response” more accurately describes what the classifier identifies in our opinion.

      What is the advantage of using the estimate-metabolism pipeline presented in this article over workflows such as those using genome-scale models, which are repeatedly cited and discussed?

      Genome-scale models are often appropriate for a big-picture view of metabolism, and especially when the capability to perform quantitative simulations like flux-balance analysis is needed. For our investigation, we wanted a more specific and descriptive summary of metabolic capacity, so we focused on individual KEGG modules, which qualitatively describe subsets of the vast metabolic network with pathway names that all readers can understand, rather than working with an abstract model of the entire network. Furthermore, genome-scale models would have prevented us from assessing the redundancy (copy number) of metabolic pathways, as these networks usually focus on the presence-absence of gene annotations for enzymes in the network rather than the copy number of these annotations. The copy number metric has been critical for our analyses, considering that we are focusing on metabolic capacity at the community level and require the ability to normalize this metabolic capacity by the size of the community described by each metagenome. Finally, assessing a discrete set of metabolic pathways yielded a corresponding set of features that we used to create the machine learning classifier, whereas data from genome-scale models would not be as easily transferable into classifier features.

      Minor comments:

      Figure 2d and e are mentioned in the text before Figure 2a.

      We thank the reviewer for catching this. We have rewritten the section as follows to put the figure references in numerical order:

      “To gain insight into potential metabolic determinants of microbial survival in the IBD gut environment, we assessed the distribution of metabolic modules within samples from each group (IBD and healthy) with and without using PPCN normalization. Without normalizing, module copy numbers were overall higher in healthy samples (Figure 2a) and modules exhibited weak differential occurrence between cohorts (Figure 2b, 2c, Supplementary Figure 3). After the application of PPCN, most metabolic modules were elevated in IBD (Supplementary Figure 5). This observation is a product of two independent aspects of the healthy and IBD microbiota. The first one is the increased representation of microbial organisms with smaller genomes in healthy individuals (Watson et al. 2023), which increases the likelihood that the overall copy number of a given metabolic module is below the actual number of populations. In contrast, one of the hallmarks of the IBD microbiota is the generally increased representation of organisms with larger genomes (Watson et al. 2023). The second aspect is that the generally higher diversity of microbes in healthy individuals increases the denominator of the PPCN due to the higher number of populations detected in these samples. This results in a greater reduction in the PPCN of metabolic modules that are not shared across all members of the diverse gut microbial populations in health. To go beyond this general trend and identify modules that were highly conserved in the IBD group, we first selected those that passed a relatively high statistical significance threshold in our enrichment test (Wilcoxon Rank Sum Test, FDR-adjusted p-value < 2e-10). We then accounted for effect size by ranking these modules according to the difference between their median PPCN in IBD samples and their median PPCN in healthy samples, and keeping only those in the top 50% (which translated to an effect size threshold of > 0.12). This stringent filtering revealed a set of 33 metabolic modules that were significantly enriched in metagenomes obtained from individuals diagnosed with IBD (Figure 2d, 2e), 17 of which matched the modules that were associated with high metabolic independence previously (Watson et al. 2023) (Figure 2f). This result suggests that the PPCN normalization is an important step in comparative analyses of metabolisms between samples with different levels of microbial diversity.”
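      The two-step filter described in the passage above (significance first, then effect-size ranking) could be sketched in Python roughly as follows. This is a minimal sketch, not the authors' code: it assumes the FDR-adjusted p-values have already been computed elsewhere, rounds the top 50% down when the number of significant modules is odd, and uses hypothetical module labels and toy PPCN values.

```python
from statistics import median


def select_enriched_modules(ppcn_ibd, ppcn_healthy, adj_pvalues,
                            p_threshold=2e-10, keep_fraction=0.5):
    """Keep significant modules, then the top fraction ranked by effect size.

    Effect size is the median PPCN in IBD minus the median PPCN in healthy
    samples. Adjusted p-values are taken as given (precomputed upstream).
    """
    # Step 1: significance filter on the adjusted p-values.
    significant = [m for m, p in adj_pvalues.items() if p < p_threshold]
    # Step 2: rank by difference of medians and keep the top fraction.
    effect = {m: median(ppcn_ibd[m]) - median(ppcn_healthy[m])
              for m in significant}
    ranked = sorted(significant, key=lambda m: effect[m], reverse=True)
    n_keep = int(len(ranked) * keep_fraction)  # assumption: round down
    return [(m, effect[m]) for m in ranked[:n_keep]]


# Hypothetical toy data: per-sample PPCN values for five modules.
ppcn_ibd = {
    "mod_A": [0.9, 0.8, 1.0], "mod_B": [0.6, 0.5, 0.7],
    "mod_C": [0.4, 0.5, 0.45], "mod_D": [0.3, 0.3, 0.3],
    "mod_E": [0.9, 0.9, 0.9],
}
ppcn_healthy = {
    "mod_A": [0.5, 0.6, 0.4], "mod_B": [0.5, 0.4, 0.45],
    "mod_C": [0.4, 0.4, 0.4], "mod_D": [0.3, 0.2, 0.25],
    "mod_E": [0.1, 0.1, 0.1],
}
adj_pvalues = {"mod_A": 1e-12, "mod_B": 1e-11, "mod_C": 1e-15,
               "mod_D": 1e-20, "mod_E": 1e-3}

selected = select_enriched_modules(ppcn_ibd, ppcn_healthy, adj_pvalues)
selected_names = [name for name, _ in selected]
```

      In this toy example, `mod_E` has the largest difference in medians but is dropped at the first step because its adjusted p-value does not pass the threshold, which shows why the order of the two steps matters.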

      How much preparation is needed for users that want to apply the estimate-metabolism pipeline to their own datasets? From the documentation at anvi'o, it still seems like a significant effort.

      We thank the reviewer for this important question. The use of anvi-estimate-metabolism is simple, but the concept it makes available and the means it offers its users to interact with their data are not basic, thus its use requires some effort. Anvi’o provides users with the ability to directly interact with their data at each step of the analysis to have full control over the analysis and to make informed decisions along the way. In comparison to pre-defined analysis pipelines that often require no additional input from the user, this approach requires some level of involvement of the user throughout the process – namely, they must run a few programs in series rather than running just one pipeline command that quietly handles everything on their behalf. The most basic workflow for using `anvi-estimate-metabolism` is quite straightforward and requires four simple steps following the installation of anvi’o: 1. Run the program `anvi-setup-kegg-data` to download the KEGG data. 2. Convert the assembly FASTA file into an anvi’o-compatible database format with gene calls by running `anvi-gen-contigs-database`. 3. Annotate genes with KOs with the program `anvi-run-kegg-kofams`. 4. Get module completeness scores and copy numbers by running `anvi-estimate-metabolism`. In addition, we provide simple tutorials (such as the one at https://anvio.org/tutorials/fmt-mag-metabolism/) and reproducible bioinformatics workflows online (including for this study at https://merenlab.org/data/ibd-gut-metabolism/), which help early career researchers to apply similar strategies to their own datasets. We are happy to report that we have been using this tool in our undergraduate education, and observed that students with no background in computation were able to apply it to their questions without any trouble.
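      For readers who prefer to script the four steps listed above, a minimal Python sketch might look like the following. The file names `assembly.fa` and `CONTIGS.db` are hypothetical, and the short flags shown (`-f`, `-o`, `-c`) should be checked against the help output of the installed anvi’o version; any additional options, such as those controlling copy number output, are deliberately left out. By default the sketch only prints the commands rather than running them.

```python
import shlex
import subprocess


def build_workflow(fasta_path, contigs_db):
    """Return the four commands, in order, as argument lists."""
    return [
        # 1. Download the KEGG data.
        ["anvi-setup-kegg-data"],
        # 2. Convert the assembly FASTA into a contigs database with gene calls.
        ["anvi-gen-contigs-database", "-f", fasta_path, "-o", contigs_db],
        # 3. Annotate genes with KOs.
        ["anvi-run-kegg-kofams", "-c", contigs_db],
        # 4. Estimate module completeness.
        ["anvi-estimate-metabolism", "-c", contigs_db],
    ]


def run_workflow(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the series at the first failing step.
            subprocess.run(cmd, check=True)


# Hypothetical input and output paths.
commands = build_workflow("assembly.fa", "CONTIGS.db")
run_workflow(commands, dry_run=True)
```

      Keeping each step as a separate command mirrors the point made in the response: the user can inspect the intermediate database between steps instead of relying on a single opaque pipeline call.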

      Reviewer 2:

      Congratulations on this great work, the manuscript is a pleasure to read. Minor questions that the authors might want to clarify:

      L 275: Why use reference genomes from the GTDB (for only 3 phyla) instead of using MAGs reconstructed from the data? I understand that assemblies based on individual samples would probably not yield enough complete MAGs, but I would expect that co-binning the assemblies for the entire dataset would.

      We thank the reviewer for their kind words. We certainly agree that metagenome-assembled genomes (MAGs) reconstructed directly from the assemblies would by nature represent the populations in these communities better than reference genomes. However, one of our aims in this study was to avoid the often error-prone and time-consuming step of reconstructing MAGs. Most automatic binning algorithms inevitably make mistakes, and especially for metabolism estimation, low-quality MAGs can introduce a bias in the analysis. At the same time, the manual curation of each bin to remove any contamination would require a substantial effort and make the workflow less accessible for others to use. As an example, in our previous work (doi:10.1186/s13059-023-02924-x), careful refinement of MAGs from just two co-assemblies took two months. Here, we developed the PPCN workflow as a more scalable, assembly-level analysis to avoid the need for binning in the first place.

      To supplement and confirm the metagenome-level results, we decided to run a genome-level analysis. We used the GTDB since it represents the most comprehensive, dereplicated collection of reference genomes across the tree of life. We chose those 3 phyla in particular because of their ecological relevance in the human gut environment. Bacteroidetes and Firmicutes together represent the majority (up to ~90%) of the populations in healthy individuals (doi:10.1038/nature07540), and Proteobacteria represent the next most abundant phylum on average (2% ± 10%) (doi:10.1371/journal.pone.0206484).

      L 403: Should the Franzosa and Papa papers be referenced as numbers?

      Thanks for pointing this out. The rogue numerical citation was actually an artifact of the submission and was corrected to a long-format citation in the online version of the manuscript on the eLife website.

      Reviewer 3:

      The lack of any experimental validation contributes to the tentative nature of the conclusions that can be drawn at this time. Numerous studies have looked at the metabolism of gut bacterial species during in vitro growth, which could be mined to test if the in silico predictions of metabolism can be supported. Alternatively, the authors could isolate key strains of interest and study them in culture or in mouse models of IBD.

      We appreciate these suggestions and agree with the reviewer that experimental validation is important. However, we do not agree that either the use of mouse models or the isolation of individual microbial strains would be an appropriate experimental test in this case. The use of humanized gnotobiotic mice has critical limitations (see doi:10.1016/j.cell.2019.12.025 and references within the section on “human microbiota-associated murine models”). As it is not possible to establish a mouse model whose gut microbiota fully reflect the human gut microbiome, such an approach would neither be appropriate to validate our findings, nor would it have been possible to produce the insights we have gained based on environmental data. We are not sure how exactly a mouse model, even when ignoring the well-established limitations, could improve or validate a comprehensive analysis of a large “environmental” dataset that resulted in highly significant signals.

      We are also not sure that we understand how the reviewer believes that the isolation of individual strains would aid in validating our findings. While we appreciate that not all relevant genes are captured by the available annotation routines and that some genes may be misannotated, the large dataset used here renders these concerns negligible. Isolating a small subset of bacterial populations would hardly lead to a representative sample and testing their metabolic capacities in vitro would not improve the reliability of our analysis.

      Boilerplate suggestions as vague as “isolate key strains of interest” or “experiment in mouse models of IBD” neither add to nor detract from our findings. Our findings and hypotheses are well supported by our data and extensive analyses.

      Line 9 - not sure this approach is hypothesis testing in the traditional sense, you might reword.

      Hypothesis testing occurs when one makes an observation, develops a hypothesis that explains the observation, and then gathers and analyzes data to investigate whether additional data support or disprove the hypothesis. We are not convinced a reword is necessary.

      Line 40 - the lack of consistent differences in IBD and healthy individuals does not mean that the microbiome doesn't impact disease. It's important to consider all the mechanistic studies in animal models and other systems.

      Our study does not claim that the microbiome has no impact on the course of disease.

      Line 50 - this seemed out of place and undercuts the current findings. Upon checking Ref. 31, the analysis seems distinct enough to not mention in the introduction.

      We disagree. Ref 31 uses genome-scale metabolic models to identify the loss of cross-feeding interactions in the gut microbiome of individuals with IBD, which is another way of saying that the microbes in IBD no longer rely on their community for metabolic exchange – in other words, they are metabolically independent. This is an independent observation that is parallel to our results and confirms our analysis; hence, it is important to keep in our introduction.

      Line 55 - Ref. 32 looked at FMT, which should be explicitly stated here.

      The reviewer’s suggestion is not helpful. Ref 32 has a significant focus on IBD as it compares a total of 300 MAGs generated from individuals with IBD to 264 MAGs from healthy individuals and shows differences in metabolic enrichment between healthy and IBD samples independent of taxonomy, thus setting the stage for our current work. What model has been used to generate the initial insights that led to the IBD-related conclusion in Ref 32 has no significance in this context.

      Lines 92-107 - this text is out of place in the Results section and reads more like a review article. Please trim it down and move it to the introduction.

      We would like to draw the reviewer’s attention to the fact that this is a “Results and Discussion” section. In this specific case it is important for readers to appreciate the context for our new tool, as the reviewer commented in the public review. We respectfully disagree with the reviewer’s suggestion to remove this text as that would diminish the context.

      Line 107 - is "selection" the word you meant to use?

      If the frequency of a given metabolic module remains the same or increases despite the decreasing diversity of the microbial community, it is reasonable to assume that its enrichment indicates the presence of a selective process to which the module responds. It is indeed the word we meant to use.

      Line 110 - this is the first mention of this new method, need to add it to the abstract and introduction.

      The reviewer must have overlooked the text passages in which we mention the strategy we developed within the abstract:

      “Here, we tested this hypothesis on a large scale, by developing a software framework to quantify the enrichment of microbial metabolisms in complex metagenomes as a function of microbial diversity.”

      And in the last paragraph of the introduction:

      “Here we implemented a high-throughput, taxonomy-independent strategy to estimate metabolic capabilities of microbial communities directly from metagenomes…”

      Figure 1 - a nice summary, but no data is shown to support the validity of this model. Consider shrinking the cartoon and adding validation with simulated datasets.

      We hope we have addressed this recommendation with the extensive validation efforts summarized above.

      Line 134 - need to state the FDR and effect size cutoffs used.

      We have reworded this sentence as follows to clarify which thresholds were used:

      “We identified significantly enriched modules using an FDR-adjusted p-value threshold of p < 2e-10 and an effect size threshold of > 0.12 from a Wilcoxon Rank Sum Test comparing IBD and healthy samples.”

      I'm also concerned about the simple comparison of IBD to healthy without adjusting for confounders like study, geographical location, age, sex, drug use, diet, etc. More text is needed to explain the nature of these data, how much metadata is available, and which other variables distinguish IBD from healthy.

      The reviewer is correct that there is a large amount of interindividual variation between samples due to host and environmental factors. However, the lack of adjusting for confounders was intentional, and in fact one of the critical strengths of our study. We observe a clear signal between healthy individuals and individuals diagnosed with IBD, despite the amount of interindividual variation in our diverse set of samples from 13 different studies (details of which are summarized in Supplementary Table 1). The clear increase in predicted metabolic capacity that we consistently observe in IBD patients using both metagenomes and genomes across diverse cohorts points to metabolic independence as a high-level trend that is predictive of microbial prevalence in stressed gut environments irrespective of host factors.

      Line 145 - calling PPCN normalization an "essential step" is a huge claim and requires a lot more data to back it up. Might be best to qualify this statement.

      We hope we have addressed this recommendation with our validation efforts. Supplementary Figures 18 and 19 in particular show evidence for the necessity of the normalization step. It is indeed an essential step if the purpose is to compare metabolic enrichment between cohorts of highly different microbial diversity.

      Figure 2a - the use of a 1:1 trend line seems potentially misleading. I would replace it with a best-fit line.

      Our purpose here was not to show the best fit. Instead, the 1:1 trend line separates the modules based on their relative abundance distribution between healthy individuals and individuals diagnosed with IBD. If the module is to the left of the line, it has a higher median copy number in healthy individuals and if the module is to the right, it has a higher median copy number in individuals with IBD. The line also helps to demonstrate the shift that occurs between the unnormalized data in Figure 2a and the normalized data in Figure 2d. Without the normalization, more modules occur to the left of the 1:1 line as a result of the higher raw copy numbers in healthy metagenomes, which simply contain more microbial populations. With the normalization (Figure 2d), more modules fall on the right side of the 1:1 line due to higher PPCN values. A best-fit line would not serve well for these purposes.

      The text should be revised to state that this analysis actually did find many significant differences and to discuss whether they were the same modules identified in Figure 2d.

      We apologize for the confusion and thank the reviewer for bringing this issue to our attention. As mentioned above, the disparate levels of microbial diversity between healthy individuals and individuals with IBD resulted in much larger copy numbers of metabolic modules in healthy samples reflecting the often much larger communities. Hence, we ran statistical tests only on normalized (PPCN) data. The p-values associated with each module in Figure 2a, as well as the colors of each point, are based on the PPCN data in Figure 2d. We aimed to improve the clarity of the visual comparison between normalized and unnormalized results by identifying the same set of IBD-enriched modules in plots a-c and plots d-f.

      That being said, the reviewer’s comment made us realize the potential for confusion when using the normalized data’s statistical results in Figure 2a that otherwise shows results from unnormalized data. We have now run the same statistical test on the unnormalized (raw copy number) data and re-generated Figure 2a with the new FDR-adjusted p-values and points colored based on the statistical tests using unnormalized data. We’ve also removed the arrow connecting to Figure 2b (since we no longer show the same set of IBD-enriched modules in Figures 2a and 2b), and added a dashed line to indicate the effect size threshold (similar to the one in Figure 2d). We have updated the legend for Figure 2a-d to reflect these changes:

      When we used the same p-value threshold (p < 2e-10) as before and also filtered for an effect size larger than the mean (the same strategy used to set our effect size threshold for the normalized data), there are 10 modules that are significantly enriched based on the unnormalized data. Of course, it is difficult to gauge the relevance of these 10 modules to microbial fitness in the IBD gut environment since their raw copy numbers do not tell us anything about the relative proportion of community members that harbor these modules. Therefore, we are reluctant to add these modules to the results text. For the record, only 3 of those modules were also significantly enriched based on the normalized PPCN values: M00010 (Citrate cycle, first carbon oxidation), M00053 (Pyrimidine deoxyribonucleotide biosynthesis), and M00121 (Heme biosynthesis).

      Figure 2c,f - these panels raise a lot of concerns given that the choice of method inverts the trend. Without additional data/validation, it's hard to know which method is right.

      We hope we have addressed this recommendation with the extensive validation efforts summarized above. Inversion of the trend is an expected outcome, because the raw copy numbers of most metabolic modules are much lower in the IBD sample group due to lower community sizes.

      Line 167 - Need to take the KEGG names with a grain of salt, just because it says "biosynthesis" doesn't mean that the pathway goes in that direction in your bacterium of interest.

      We believe the reviewer is under a misapprehension regarding the general reversibility of KEGG metabolic modules, or indeed of metabolic pathways. Most metabolic pathways have one or several (practically) irreversible reactions. To demonstrate this for the 33 IBD-enriched modules, we evaluated their reversibility based upon their corresponding KEGG Pathway Maps, which indicate reaction reversibility via double-sided arrows. Aside from the signature modules M00705 and M00627, in 26 out of 31 pathway modules one or more irreversible reactions render these pathways one-directional. Indeed, on average the majority (54%) of the reactions in a given module are irreversible. When focusing on the 23 “biosynthesis” modules, 22 out of 23 (96%) modules have at least one irreversible reaction, and on average 64% of a given module’s reactions are irreversible. These data (which can be accessed at doi:10.6084/m9.figshare.27203226 for the reviewer’s convenience) challenge the reviewer’s notion that pathway directionality is free to change arbitrarily, since the presence of even one irreversible reaction effectively blocks the flux in the opposing direction. Thus, “biosynthesis” is indeed a meaningful term in KEGG module names.

      That said, KEGG Pathway Maps, though highly curated, are likely not the final word on whether a given reaction in a metabolic pathway can be considered reversible or irreversible in each microbial population and under all conditions. And our analysis, like many others that rely on metagenomic data, does not consider the environmental conditions in the gut such as temperature or metabolite concentrations that might influence the Gibbs free energy and thus the directionality of these reactions in vivo. However, even assuming general reversibility of metabolic pathways, this would not invalidate the fact that these microbes have the metabolic capacity to synthesize the respective molecules. In other words, the potential reversibility of pathways is irrelevant to our analysis since we are describing metabolic potential. The lac operon in E. coli might only be expressed in the absence of glucose, but E. coli always has the capability to degrade lactose regardless of whether that pathway is active. Thus, our overall conclusion that gut microbes associated with IBD are metabolically self-sufficient (encoding the enzymatic capability to synthesize certain key metabolites) remains valid irrespective of fixed or flexible pathway directionality.

      It's also important to be careful not to conflate KEGG modules (small subsets of a pathway) with the actual metabolic pathway. It's possible to have a module change in abundance while not altering the full pathway. Inspection of the individual genes could help in this respect - are they rate-limiting steps for biosynthesis or catabolism?

      The reviewer is absolutely correct that KEGG modules do not necessarily represent full pathways. We have updated the language in our manuscript to explicitly refer to “modules” rather than “pathways” whenever appropriate, to restrict the scope of the analysis to metabolic modules rather than full pathways.

      That said, we do not see how “inspection of individual genes” would improve our analysis. The strength of looking at complete modules rather than individual genes is that we can gain conclusive insights into a certain metabolic capacity. Of course, no pathway or module stands alone. However, the enrichment of metabolic modules does conclusively indicate that these modules are beneficial under the given conditions, such as stress caused by inflammation or antibiotic use. Whether a certain step in a module or pathway is rate limiting is completely irrelevant for this analysis.

      Line 177 - I'm not a big fan of the HMI acronym. Is there a LMI group? It seems simplistic to lump all of metabolism into dependent or independent, which in reality will differ depending on the specific substrate, the growth condition, and the strain.

      While we are sorry that our study failed to provide the reviewer with a term they could be a fan of, their input did not change our view that HMI, an acronym we have adapted from a previously peer-reviewed study (doi:10.1186/s13059-023-02924-x), is a powerfully simplistic means to describe a phenomenon we observe and demonstrate in multiple different ways with our extensive analyses. The argument that HMI or LMI status will differ given the growth condition, substrate availability, or strain differences is not helping this case either: our analyses cut across a large number of humans and naturally occurring microbial systems in their guts that are exposed to largely variable ‘growth conditions’ and ‘substrates’ and composed of many strain variants of similar populations. Yet, we observe a clear role for HMI despite all these differences. Perhaps it is because HMI simply describes a higher metabolic capacity based on a defined subset of largely biosynthetic pathways that we observe to be consistently enriched in a large dataset covering a large variety of host, environmental and diet factors and indicates that a population has a higher metabolic capacity to not rely on ecosystem services. We show in our analysis that in the inflamed gut these capacities are indeed required, which is why HMI populations are enriched in IBD samples. HMI has no relation to any of the constraints mentioned by the reviewer, which is one of the major strengths of this metric.

      Line 198 - It seems like a big assumption to state that efflux and drug resistance are unrelated to biosynthesis, as they could be genetically or even phenotypically linked.

      We agree with the reviewer and are thankful for their input. We have weakened the assertion in this statement.

      “These capacities may provide an advantage since antibiotics are a common treatment for IBDs (Nitzan et al. 2016), but are not necessarily related to the systematic enrichment of biosynthesis modules that likely provide resilience to general environmental stress rather than to a specific stressor such as antibiotics.”

      Lines 202-218 - I'd suggest removing this paragraph. The "non-IBD" data introduces even more complications to the meta-analysis and seems irrelevant to the current study.

      We thank the reviewer for this suggestion. Non-IBD data is important, but its relevance to the primary aims of the study is indeed negligible. We now have moved this paragraph to Supplementary File 1 (under the section “‘Non-IBD’ samples are intermediate to IBD and healthy samples”).

      The health gradient is particularly problematic, putting cancer closer to healthy than IBD.

      We took the reviewer’s advice and have swapped the order of the studies in Supplementary Figure 6 to place the cancer samples from Feng et al. closer to the IBD samples, on the other side of the non-IBD samples from the IBD studies.

      Lines 235-257 - should trim this down and move to the discussion.

      As mentioned above, we have opted for a “Results and Discussion” format for our manuscript, so we believe this discussion is in the correct place. We find it important to clearly highlight the limitations and potential biases of our work, and trimming this text would take away from that goal.

      Figure 3 - panels are out of order. Need to put the current panel D below current panel C. Also, relabel panel letters to go top to bottom (the bottom panel should be D). Could change current panel 3D to a violin plot to match current 3C.

      We have updated Figure 3 by converting panel A into a new supplementary figure (Supplementary Figure 8), moving panels C and D below panel B, and relabeling the panels accordingly.

      Figure 3B - this panel was incredibly useful and quite surprising to me in many respects. I would have assumed that the Bacteroides would be in the "HMI" bin. Is this a function of the specific strains included here? Was B. theta or B. fragilis included?

      The reviewer makes an excellent observation that has been keeping us awake at night, yet somehow was not appropriately discussed in the text until their input. We are very thankful for their attention to detail here.

      It is indeed true that Bacteroides genomes are often detected with increased abundance in individuals with IBD and likely have a survival advantage in the IBD gut environment, with Bacteroides fragilis and Bacteroides thetaiotaomicron being some of the most dominant residents of the IBD gut. Their non-HMI status is not a function of which strains were included, since all taxa here are represented by the representative genomes in the publicly available Genome Taxonomy Database. Their non-HMI status comes from the fact that they have HMI scores of around 24 to 26, which fall slightly below the threshold score of 26.4 that we used to classify genomes as HMI. This threshold is back-calculated from the metabolic completion requirement of at least 80% average completion of all 33 metabolic modules that are significantly enriched in IBD. So these genomes are right there at the edge, but not quite over it.
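      The arithmetic behind the threshold described above can be made concrete with a short Python sketch. It assumes that completeness is expressed as a fraction between 0 and 1 per module, so that the score is the sum over the 33 IBD-enriched modules and an 80% average corresponds to 0.80 × 33 = 26.4. The function names, the choice of `>=` at the boundary, and the toy genome are hypothetical.

```python
def hmi_score(completeness):
    """Sum of fractional completeness values (0.0 to 1.0), one per module."""
    return sum(completeness)


def hmi_threshold(n_modules=33, min_avg_completeness=0.80):
    """Score that corresponds to a minimum average completeness."""
    return n_modules * min_avg_completeness


def is_hmi(completeness, min_avg_completeness=0.80):
    """Classify a genome as HMI when its score reaches the threshold.

    Assumption: a score exactly at the threshold counts as HMI.
    """
    threshold = hmi_threshold(len(completeness), min_avg_completeness)
    return hmi_score(completeness) >= threshold


# Hypothetical genome: 25 of the 33 modules fully complete, 8 absent.
toy_genome = [1.0] * 25 + [0.0] * 8

score = hmi_score(toy_genome)
hmi_at_80 = is_hmi(toy_genome, min_avg_completeness=0.80)
hmi_at_75 = is_hmi(toy_genome, min_avg_completeness=0.75)
```

      A genome with a score of 25, similar to the Bacteroides scores mentioned above, falls short of the 80% requirement but would pass a 75% requirement (0.75 × 33 = 24.75), which illustrates how sensitive the classification is to the chosen cutoff.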

      Thanks to this comment by our reviewer, we started wondering whether we should follow a more ‘literature-driven’ approach to set the threshold for HMI, rather than the 80% cutoff, and in fact attempted to lower the HMI score threshold to see if we could include more of the IBD-associated Bacteroides in the HMI bin. Author response table 1 below shows the relevant subset of our new Supplementary Table 3h, which describes the data from our tests on different thresholds.

      Author response table 1.

      Number and proportion of Bacteroides genomes classified as HMI at each HMI score threshold. There were 20 total Bacteroides genomes in the set of 338 gut microbes identified from the GTDB. The HMI score is computed by adding the percent completeness of all 33 IBD-enriched KEGG modules. The full table can be viewed in Supplementary Table 3h.

      Lowering the threshold to 24.75, which corresponds to an average of 75% completeness in the 33 IBD-enriched modules, enabled the classification of 6 Bacteroides genomes as HMI, including B. fragilis, B. intestinalis, B. theta, and B. faecis. However, it also identified several microbes that are not IBD-associated as HMI, including 75 genomes from the Lachnospiraceae family and 18 genomes from the Ruminococcaceae family. In the latter family, several Faecalibacterium genomes, including 10 representatives of Faecalibacterium prausnitzii, were considered HMI using this threshold. These microbes are empirically known to decrease in abundance during inflammatory gastrointestinal conditions (doi:10.3390/microorganisms8040573, doi:10.1093/femsre/fuad039), and therefore these genomes should not be considered HMI – at least not under the working definition of HMI used in our study. To avoid including such a large number of obvious false positives in the HMI bin, we decided to maintain a higher threshold despite the exclusion of Bacteroides genomes.

      This outcome demonstrates that our reductionist approach does not successfully capture every microbial population that is associated with IBD. Nevertheless, and in our opinion very surprisingly, the metric does capture a very large proportion of genomes with increased detection and abundance in IBD samples, as demonstrated by the peaks of detection/abundance that match HMI status (Author response image 1).

      Author response image 1.

      Screenshots of Figure 3 that demonstrate the overlapping signal between HMI status and genome detection/abundance in IBD.

      Furthermore, the violin plots in Figure 3B (formerly Figure 3C) clearly reflect the increased representation of HMI populations in IBD metagenomes. Although our classification method is imperfect, it still demonstrates the predictive power of metabolic competencies in identifying which microbes will survive in stressful gut environments. To ensure that readers recognize the crude nature of this classification strategy and the possibility that high metabolic independence can be achieved in different ways, we have added the following sentences to the relevant section of our manuscript:

      “Given the number of ways a genome can pass or fail this threshold, this arbitrary cut-off has significant shortcomings, which was demonstrated by the fact that several species in the Bacteroides group were not classified as HMI despite their frequent dominance of the gut microbiome of individuals with IBD (Saitoh et al. 2002; Wexler 2007; Vineis et al. 2016) (Supplementary File 1). That said, the genomes that were classified as HMI by this approach were consistently higher in their detection and abundance in IBD samples (Figure 3a). It is likely that there are multiple ways to have high metabolic independence which are not fully captured by the 33 IBD-enriched metabolic modules identified in this study.”

      We have also included a discussion of these findings in Supplementary Information File 1 (see section “Examining the impact of different HMI score thresholds on genome-level results”).

      This panel also makes it clear that many of these modules are widespread in all genomes and thus unlikely to meaningfully differ in the microbiome. It would be interesting to use this type of analysis to identify a subset of KEGG modules with high variability between strains.

      The figure makes it ‘look like’ many of these modules are widespread in all genomes and thus unlikely to meaningfully differ in the microbiome, but our quantitative analyses clearly demonstrate that these modules indeed differ meaningfully between microbiomes of healthy individuals and those diagnosed with IBD. For instance, the classifier that we built relying exclusively upon these modules’ PPCN values was able to reliably distinguish between the healthy and IBD sample groups in our dataset. The fact that the differentiating signal does not rely on rare metabolic or signature modules is what makes the classifier powerful enough to differentiate between “healthy” and “stressed” microbiomes in 86% of cases. Modules that are by nature less common could not serve this purpose. That said, we do agree with the reviewer that it might be interesting to study variability of KEGG modules as a function of variability between strains. This does not fall into the scope of this work, but we hope to assist others with the technical aspects of such work.

      Considering the entirety of the exchange in this section, perhaps there is a broader discussion to be had around this topic. In retrospect, not being able to perfectly split microbes into two groups that completely recapitulate their enrichment in healthy or IBD samples by a crude metric and an arbitrary threshold is not surprising at all. What is surprising is that such a crude metric in fact works for the vast majority of microbes and predicts their increased presence in the IBD gut by only considering their genetic makeup. In some respects, we believe that the inability of this cutoff to propose a perfect classifier is similar to the limited power of the metabolic independence concept and the classes of HMI or LMI to capture and fully explain microbial fitness in health and disease. What is again surprising here is that these almost offensively simple classes do capture more than what one would expect. We can envision a few ways to implement a more sophisticated HMI/LMI classifier, and it is certainly an important task that is achievable. However, we are hopeful that this technical work can also be done better by others in our field, and that step forward, along with further scrutinizing the relevance of HMI/LMI classes to understand metabolic factors that contribute to the biodiversity of stressful environments, will have to remain as future work.

      We thank the reviewer again for their comment here and for pushing us to think more carefully and address the oddity regarding the poor representation of Bacteroides as HMI by our cutoff.

      Given that a lot of the gaps are in the Firmicutes, this panel also makes me more concerned about annotation bias. How many of these gaps are real?

      Analyses relying on gene annotations all suffer equally from the potential for misannotation or missing annotations, which primarily result from limitations in our reference databases for functional data. For instance, the Hidden Markov models for microbial genes in the KEGG Ortholog database are generated from a curated set of gene sequences primarily originating from cultivable microorganisms and particularly from commonly-used model organisms; hence, they do not capture the full extent of sequence diversity observed in populations that are less well-represented in reference databases – a category which includes several Firmicutes, as the reviewer points out. For KEGG KOfams in particular, the precomputed bit score thresholds for distinguishing between ‘good’ and ‘bad’ matches to a given model are often too stringent to enable annotation of genes that are just slightly too divergent from the set of known sequences, thus resulting in missing annotations. Based on our experience with these sorts of issues, we implemented a heuristic that reduces the number of missing annotations for KOs and captures significantly more homologs than other state-of-the-art approaches, as described in doi:10.1101/2024.07.03.601779. We refer the reviewer to our response to the related public comment about annotation bias above, which includes additional details about our investigations of annotation bias in our data. In comparison to the current standard, the heuristic we implemented improves functional annotation results. However, neither our nor any other bioinformatic study that relies on functional gene annotation can exclude the potential for annotation bias.

      Figure 3B plotting issues - need to use the full names of the modules; for example, M00844 is "arginine biosynthesis, ornithine => arginine", which changes the interpretation. Need a key for the heatmap on the figure. The tree is difficult to see, needs a darker font.

      We have darkened the lines of the tree and dendrogram, and added a legend for the heatmap gradient (see new version of Figure 3 above). Unfortunately, we could not fit the full names of the modules into the figure due to space constraints. However, the full module name and other relevant information can be found in Supplementary Table 2a, and the matrix of pathway completeness scores in these genomes (e.g., the values plotted in the heatmap) can be found in Supplementary Table 3b. We are not sure what the reviewer refers to when stating that “for example, M00844 is "arginine biosynthesis, ornithine => arginine", which changes the interpretation”. There is no ambiguity regarding the identity of KEGG module M00844, which is arginine biosynthesis from ornithine.

      Line 321 - more justification for the 80% cutoff is needed along with a sensitivity analysis to see if this choice matters for the key results.

      Inspired by this comment, and the one above regarding the classification of Bacteroides genomes, we tested several HMI score thresholds ranging from 75% to 85% average completeness of the 33 IBD-enriched modules. For each threshold, we computed all the key statistics reported in this section of our paper, including the statistical tests. We found that the choice of HMI score threshold does not influence the overall conclusions drawn in this section of our manuscript. Author response table 2 below shows the relevant subset of our new Supplementary Table 3h, which describes the results for each threshold:

      Author response table 2.

      Key genome-level results at each HMI score threshold. The HMI score is computed by adding the percent completeness of all 33 IBD-enriched KEGG modules. WRS – Wilcoxon Rank Sum test; KW – Kruskal-Wallis test. The full table can be viewed in Supplementary Table 3h.

      We’ve summarized these findings in a new section of Supplementary File 1 entitled “Examining the impact of different HMI score thresholds on genome-level results”. We copy below the relevant text for the reviewer’s convenience:

      “Determining the HMI status of a given genome required us to set a threshold for the HMI score above which a genome would be considered to have high metabolic independence. We tested several different thresholds by varying the average percent completeness of the 33 IBD-enriched metabolic modules that we expected from the ‘HMI’ genomes from ≥ 75% (corresponding to an HMI score of ≥ 24.75) to ≥ 85% (corresponding to an HMI score of ≥ 28.05). For each threshold, we computed the same statistics and ran the same statistical tests as those reported in our main manuscript to assess the impact of these thresholds on the results (Supplementary Table 3h). At the highest threshold we tested (HMI score ≥ 28.05), a small proportion of the reference genomes (7%, or n = 24) were classified as HMI, so we did not test higher thresholds.

      We found that the results from comparing HMI genomes to non-HMI genomes are similar regardless of which HMI score threshold is used to classify genomes into either group. No matter which HMI score threshold was used, the mean genome size and mean number of genes were higher for HMI genomes than for non-HMI genomes. On average, the HMI genomes were about 1 Mb larger and had 1,032 more gene calls than non-HMI genomes. We ran two Wilcoxon Rank Sum statistical tests to assess the following null hypotheses: (1) HMI genomes do not have higher detection in IBD samples than non-HMI genomes, and (2) HMI genomes do not have higher detection in healthy samples than non-HMI genomes. For both tests, the p-values decreased (grew more significant) as the HMI score threshold decreased due to the inclusion of more genomes in the HMI bin. The first test for higher detection of HMI genomes than non-HMI genomes in IBD samples yielded p-values less than α = 0.05 at all HMI score thresholds. The second test for higher detection of HMI genomes than non-HMI genomes in healthy samples yielded p-values less than α = 0.05 for the three lowest HMI score thresholds (HMI score ≥ 24.75, ≥ 25.08, or ≥ 25.41). However, irrespective of significance threshold and HMI score threshold, there was always far stronger evidence to reject the first null hypothesis than the second, given that the p-value for the first test in IBD samples was 1 to 5 orders of magnitude lower (more significant) than the p-value for the second test in healthy samples.

      IBD samples harbored a significantly higher fraction of genomes classified as HMI than healthy or non-IBD samples, regardless of HMI score threshold (p < 1e-15, Kruskal-Wallis Rank Sum test). The p-values for this test increased (grew less significant) as the HMI score threshold decreased. This suggests that, at higher thresholds, relatively more genomes drop out of the HMI fraction in healthy/non-IBD samples than in IBD samples, thereby leading to larger differences and more significant p-values. Consequently, the HMI scores of genomes detected in IBD samples must be higher than the HMI scores of genomes detected in the other sample groups – indeed, the average HMI score of genomes detected within at least one IBD sample is 24.75, while the average score of genomes detected within at least one healthy sample is 22.78. Within a given sample, the mean HMI score of genomes detected within that sample is higher for the IBD group than in the healthy group: the average per-sample mean HMI score is 25.14 across IBD samples compared to the average of 23.00 across healthy samples.”

      Lines 357 and 454 - I would remove the discussion of the "gut environment" which isn't really addressed here. The observed trends could just as easily relate to microbial interactions or the effects of diet and pharmaceuticals. Perhaps the issue is the vague nature of this term, which I read to imply changes in the mammalian host. Given the level of evidence, I'd opt to keep the options open and discuss what additional data would help resolve these questions.

      We are in complete agreement with the reviewer that microbial interactions are likely an important driver of our observations. In healthy communities, microbial cross-feeding enables microbes with lower metabolic independence to establish and increase microbial diversity. This is exactly why we state that “Community-level signal translates to individual microbial populations and provides insights into the microbial ecology of stressed gut environments”.

      Diet or usage of prescription drugs on the other hand, as discussed previously, likely varies substantially over the various cohorts investigated, and is thus not a driver of the observed trends. Instead, HMI works as a high level indicator that is not influenced by these variable host habits.

      Lines 354-394 - Could remove or dramatically trim down this text. Too much discussion for a results section.

      We kindly remind the reviewer that our manuscript is written following a “Results and Discussion” format. This section provides necessary context and justification for our classifier implementation, so we have left it as-is.

      Lines 395-441 - This section raised a lot of issues and could be qualified or even removed. The model was trained on modules that were IBD-associated in the same dataset, so it's not surprising that it worked. An independent test set would be required to see if this model has any broader utility.

      The point that we selected the IBD-enriched modules as features should not raise any concerns, as these modules would have emerged as the most important (i.e., most highly weighted) features in our model even if we had included all modules in our training data. This is because machine learning classifiers by design pick out the features that best distinguish between classes, and the 33 IBD-associated modules are a selective subset of these (if they were not, they would not have been significantly enriched in the IBD sample group). That said, a carefully conducted feature selection process prior to model training is a standard best practice in machine learning; thus, if anything, this should be interpreted as a point of confidence rather than a concern. Furthermore, we evaluated our model using cross-validation, a standard practice in the machine learning field that assesses the stability of model performance by training and testing the model on different subsets of the data. This effort established that the model is robust across different inputs as demonstrated by the per-fold confusion matrix and the ROC curve. These are all standard approaches in machine learning to quantify the model tradeoff between bias and variance. As for the independent test set, we went above and beyond and applied our model to the antibiotic time-series dataset described later in this section, which, in our opinion, and likely also in the opinion of many experts, serves as one of the most convincing ways to test the utility of any model. Classification results here show that our hypothesis concerning the relevance of metabolic independence to microbial survival in stressed gut environments applies beyond the IBD case and includes antibiotic use, which is indeed a stronger validation for this hypothesis than any test we could have done on other IBD-related datasets.

      Regardless, we agree that any ‘broader’ utility of our model, such as its applications in clinical settings for diagnostic purposes, is something we certainly cannot make strong claims about without more data. We have therefore qualified this section by adding the following sentence:

      “Determining whether such a model has broader utility as a diagnostic tool requires further research and validation; however, these results demonstrate the potential of HMI as an accessible diagnostic marker of IBD.”

      The application to the antibiotic intervention data raises additional concerns, as the model will predict IBD (labeled "stress" in Figure 5) where none exists.

      We apologize for this misunderstanding. The label “stress” actually means stress, not IBD. The figure the reviewer is referring to demonstrates that metabolic modules enriched in the gut microbiome of IBD patients are also temporarily enriched in the gut microbiome of healthy individuals treated with antibiotics for the duration of the treatment. While the classifier uses PPCN values for 33 metabolic modules enriched in microbiomes of IBD patients, it does not mean that this enrichment is exclusive to IBD. The classifier will distinguish between metagenomes in which the PPCN values for those 33 metabolic modules are higher and metagenomes in which the PPCN values are lower. Hence, our analysis demonstrates that during antibiotic usage in healthy individuals, the PPCN values of these 33 metabolic modules spike in a similar fashion to how they would in the gut community of a person with IBD. This points to a more general trend of high metabolic independence as a factor supporting microbial survival in conditions of stress; that is, the increase in metabolic independence is not specific to the IBD condition but rather a more generic ecological response to perturbations in the gut microbial community. We have clarified this point with the following addition to the paragraph summarizing these results:

      “All pre-treatment samples were classified as ‘healthy’ followed by a decline in the proportion of ‘healthy’ samples to a minimum 8 days post-treatment, and a gradual increase until 180 days post treatment, when over 90% of samples were classified as ‘healthy’ (Figure 5, Supplementary Table 4b). In other words, the increase in the HMI metric serves as an indicator of stress in the gut microbiome, regardless of whether that stress arises from the IBD condition or the application of antibiotics. These observations support the role of HMI as an ecological driver of microbial resilience during gut stress caused by a variety of environmental perturbations and demonstrate its diagnostic power in reflecting gut microbiome state.”

      We’ve also added the following sentence to the end of the legend for Figure 5:

      “Samples classified as ‘healthy’ by the model were considered to have ‘no stress’ (blue), while samples classified as ‘IBD’ were considered to be under ‘stress’ (red).”

      Figure S5A - should probably split this into 2 graphs since different data is analyzed.

      It is true that different sets of modules are used in either half of the figure; however, there is a significant amount of overlap between the sets (17 modules), which is why there are lines connecting the points for the same module as described in the figure legend. We are using this figure to make the point that the median PPCN value of each module increases, in both sets of modules, from the healthy sample group to the IBD sample group. Therefore, we believe the current presentation is appropriate.

      Figure S6A – this shows a substantial study effect and raises concerns about reproducibility.

      We examined potential batch effects in Supplementary Information File 1 (see section “Considerations of Batch Effect”), and found that any study effect was minor and overcome by the signal between groups:

      “The similar distribution of the median normalized copy number for each of the 33 IBD-enriched metabolic modules (summarized across all samples within a given study), across all studies within a given sample group (Supplementary Figure 6b), confirms that the sample group explains more of the trend than the study of origin.”

      Furthermore, within Supplementary Figure 6a, there is a clear increase between the non-IBD controls from Franzosa et al. 2018 and the IBD samples from the same study, as well as between the non-IBD controls from Schirmer et al. 2018 and the IBD samples from that study. As there is no study effect influencing those two comparisons, this reinforces the evidence that there is a true increase in the normalized copy numbers of these modules when comparing samples from more healthy individuals to those from less healthy individuals.

      Figure S7B - check numbers, which I think should sum to 33.

      The numbers should not sum to 33. In this test to determine whether the two largest studies had excessive influence on the identity of the IBD-enriched modules, we repeated our strategy to obtain 33 IBD-enriched modules (those with the 33 smallest p-values from the statistical test) from each set of samples – either (1) samples from Le Chatelier et al. 2013 and Vineis et al. 2016, or (2) samples that are not from those two studies. The two sets, containing 33 modules each, give us a total of 66 IBD-enriched modules. By comparing those two sets, we found that 20 modules were present in both sets – hence the value of 20 in the center of the Venn Diagram. In each set, 13 modules were unique – hence the value of 13 on either side. 13 + 13 + 2*20 = 66 total modules.
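
      The arithmetic of the Venn diagram can be checked with a small sketch. The module identifiers below are hypothetical placeholders; only the set sizes mirror the comparison described above.

```python
# Hypothetical identifiers: 20 modules shared by both sets, 13 unique to each.
shared = {f"shared_{i}" for i in range(20)}
set_a = shared | {f"a_only_{i}" for i in range(13)}  # two largest studies
set_b = shared | {f"b_only_{i}" for i in range(13)}  # all other studies

only_a = set_a - set_b
only_b = set_b - set_a
both = set_a & set_b

print(len(only_a), len(both), len(only_b))  # 13 20 13
print(len(set_a) + len(set_b))              # 66 (shared modules counted twice)
print(len(set_a | set_b))                   # 46 distinct modules
```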

      We again thank our reviewers for their time and interest, and invaluable input.

    1. Author response:

      ANALYTICAL

      (1) Figure 3 shows that the relationship between learning rate and informativeness for our rats was very similar to that shown with pigeons by Gibbon and Balsam (1981). We used multiple criteria to establish the number of trials to learn in our data, with the goal of demonstrating that the correspondence between the data sets was robust. To establish that they are effectively the same does require using an equivalent decision criterion for our data as was used for Gibbon and Balsam’s data. However, the criterion they used (at least one peck at the response key on at least 3 out of 4 consecutive trials) cannot be sensibly applied to our magazine entry data because rats make magazine entries during the inter-trial interval (whereas pigeons do not peck at the response key in the inter-trial interval). Therefore, evidence for conditioning in our paradigm must involve comparison between the response rate during CS and the baseline response rate. There are two ways one could adapt the Gibbon and Balsam criterion to our data. One way is to use a non-parametric signed rank test for evidence that the CS response rate exceeds the pre-CS response rate, and to adopt a statistical criterion equivalent to Gibbon and Balsam’s 3-out-of-4 consecutive trials (p<.3125). The second method estimates the nDkl for the criterion used by Gibbon and Balsam. This could be done by assuming there are no responses in the inter-trial interval and a response probability of at least 0.75 during the CS (their criterion). This would correspond to an nDkl of 2.2 (odds ratio 27:1). The obtained nDkl could then be applied to our data to identify when the distribution of CS response rates has diverged by an equivalent amount from the distribution of pre-CS response rates.
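
      One way to see where a value of .3125 can come from is the binomial tail probability of observing at least 3 successes in 4 trials when each trial is a coin flip. The sketch below assumes that chance-level null (success probability 0.5 per trial); the function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# At least 3 of 4 trials by chance: (4 + 1) / 16
print(prob_at_least(3, 4))  # 0.3125
```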

      (2) A single regression line, as shown in Figure 6, is the simplest possible model of the relationship between response rate and reinforcement rate and it explains approximately 80% of the variance in response rate. Fixing the log-log slope at 1 yields the maximally simple model. (This regression is done in the logarithmic domain to satisfy the homoscedasticity assumption.) When transformed into the linear domain, this model assumes a truly scalar relation (linear, intercept at the origin) and assumes the same scale factor and the same scalar variability in response rates for both sets of data (ITI and CS). Our plot supports such a model. Its simplicity is its own motivation (Occam’s razor).
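
      The one-parameter model can be sketched as follows. With the log-log slope fixed at 1, the only free parameter is the intercept, which is the mean of log(response rate) minus log(reinforcement rate). The function name and the rates below are hypothetical and are not the data in Figure 6.

```python
from math import exp, log


def fit_scalar_model(reinforcement_rates, response_rates):
    """Fit log(response) = log(k) + 1 * log(reinforcement).

    Returns the scale factor k and R^2 computed in the log domain.
    With the slope fixed, R^2 is not guaranteed to be non-negative.
    """
    log_x = [log(x) for x in reinforcement_rates]
    log_y = [log(y) for y in response_rates]
    n = len(log_x)
    # Least-squares intercept when the slope is constrained to 1.
    intercept = sum(ly - lx for lx, ly in zip(log_x, log_y)) / n
    mean_y = sum(log_y) / n
    ss_res = sum((ly - (intercept + lx)) ** 2 for lx, ly in zip(log_x, log_y))
    ss_tot = sum((ly - mean_y) ** 2 for ly in log_y)
    return exp(intercept), 1 - ss_res / ss_tot


# Hypothetical reinforcement rates and response rates (arbitrary units).
rates = [0.01, 0.02, 0.05, 0.1, 0.2]
responses = [0.11, 0.18, 0.52, 1.05, 1.9]
k, r2 = fit_scalar_model(rates, responses)
print(k, r2)
```

      In the linear domain this corresponds to response rate = k × reinforcement rate, a line through the origin.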

      If regression lines are fitted to the CS and ITI data separately, there is a small increase in explained variance (R² = 0.82). We leave it to further research to determine whether such a complex model, with 4 parameters, is required. However, we do not think the present data warrant comparing the simplest possible model, with one parameter, to any more complex model for the following reasons:

      · When a brain—or any other machine—maps an observed (input) rate to a rate it produces (output rate), there is always an implicit scalar. In the special case where the produced rate equals the observed rate, the implicit scalar has value 1. Thus, there cannot be a simpler model than the one we propose, which is, in and of itself, interesting.

      · The present case is an intuitively accessible example of why the MDL (Minimum Description Length) approach to model complexity (Barron, Rissanen, & Yu, 1998; Grünwald, Myung, & Pitt, 2005; Rissanen, 1999) can yield a very different conclusion from the conclusion reached using the Bayesian Information Criterion (BIC) approach. The MDL approach measures the complexity of a model when given N data specified with precision of B bits per datum by computing (or approximating) the sum of the maximum-likelihoods of the model’s fits to all possible sets of N data with B precision per datum. The greater the sum over the maximum likelihoods, the more complex the model, that is, the greater its measured wiggle room, its capacity to fit data. Recall that von Neumann remarked to Fermi that with 4 parameters he could fit an elephant. His deeper point was that multi-parameter models bring neither insight nor predictive power; they explain only post-hoc, after one has adjusted their parameters in the light of the data. For realistic data sets like ours, the sums of maximum likelihoods are finite but astronomical. However, just as the Stirling approximation allows one to work with astronomical factorials, it has proved possible to develop readily computable approximations to these sums, which can be used to take model complexity into account when comparing models. Proponents of the MDL approach point out that the BIC is inadequate because models with the same number of parameters can have very different amounts of wiggle room. A standard illustration of this point is the contrast between a logarithmic model and a power-function model. Log regressions must be concave, whereas power function regressions can be concave, linear, or convex, yet they have the same number of parameters (one or two, depending on whether one counts the scale parameter that is always implicit). The MDL approach captures this difference in complexity because it measures wiggle room; the BIC approach does not, because it only counts parameters.

      · In the present case, one is comparing a model with no pivot and no vertical displacement at the boundary between the black dots and the red dots (the 1-parameter unilinear model) to a bilinear model that allows both a change in slope and a vertical displacement for both lines. The 4-parameter model is superior if we use the BIC to take model complexity into account. However, the 4-parameter model has ludicrously more wiggle room. It will provide excellent fits (high maximum likelihood) to data sets in which the red points have slope > 1, slope 0, or slope < 0 and in which it is also true that the intercept for the red points lies well below or well above the black points (non-overlap in the marginal distribution of the red and black data). The 1-parameter model, on the other hand, will provide terrible fits to all such data (very low maximum likelihoods). Thus, we believe the BIC does not properly capture the immense actual difference in complexity between the 1-parameter model (unilinear with slope 1) and the 4-parameter model (bilinear with neither the slope nor the intercept fixed in the linear domain).

      · In any event, because the pivot (change in slope between black and red data sets), if any, is small and likewise for the displacement (vertical change), it suffices for now to know that the variance captured by the 1-parameter model is only marginally improved by adding three more parameters. Researchers using the properly corrected measured rate of head poking to measure the rate of reinforcement a subject expects can therefore assume that they have an approximately scalar measure of the subject’s expectation. Given our data, they won’t be far wrong even near the extremes of the values commonly used for rates of reinforcement. That is a major advance in current thinking, with strong implications for formal models of associative learning. It implies that the performance function that maps from the neurobiological realization of the subject’s expectation is not an unknown function. On the contrary, it’s the simplest possible function, the scalar function. That is a powerful constraint on brain-behavior linkage hypotheses, such as the many hypothesized relations between mesolimbic dopamine activity and the expectation that drives responding in Pavlovian conditioning (Berridge, 2012; Jeong et al., 2022; Y.  Niv, Daw, Joel, & Dayan, 2007; Y. Niv & Schoenbaum, 2008).
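
      The idea of measuring complexity as a sum of maximum likelihoods over all possible data sets can be illustrated with a toy case that is far simpler than our regression problem. The sketch below assumes binary data of length N and compares a Bernoulli model with no free parameter (p fixed at 0.5) to one whose p is fitted to each data set. The function names are hypothetical, and this is not the computation applied to the data in Figure 6.

```python
from math import comb, log2


def complexity_fixed(n, p=0.5):
    """Sum over all binary sequences of length n of the likelihood under fixed p.

    Sequences with the same number of ones (k) share a likelihood,
    so they are grouped with the binomial coefficient.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))


def complexity_fitted(n):
    """Sum over all binary sequences of the maximized likelihood (p_hat = k / n)."""
    total = 0.0
    for k in range(n + 1):
        p_hat = k / n
        total += comb(n, k) * p_hat**k * (1 - p_hat) ** (n - k)
    return total


for n in (1, 10, 100):
    # The fixed model sums to 1 (no wiggle room); the fitted model sums to more.
    print(n, complexity_fixed(n), log2(complexity_fitted(n)))
```

      The model with nothing to adjust has a sum of 1 for every N, whereas the sum for the fitted model grows with N; the logarithm of that sum is the complexity penalty in this toy setting.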

      The data in Figure 6 are taken from the last 5 sessions of training. The exact number of sessions was somewhat arbitrary but was chosen to meet two goals: (1) to capture asymptotic responding, which is why we restricted this to the end of the training, and (2) to obtain a sufficiently large sample of data to estimate reliably each rat’s response rate. We have checked what the data look like using the last 10 sessions, and can confirm it makes very little difference to the results.

      Finally, as noted by the reviews, the relationship between the contextual rate of reinforcement and ITI responding should also be evident if we had measured context responding prior to introducing the CS. However, there was no period in our experiment when rats were given unsignalled reinforcement (such as is done during “magazine training” in some experiments). Therefore, we could not measure responding based on contextual conditioning prior to the introduction of the CS. This is a question for future experiments that use an extended period of magazine training or “poor positive” protocols in which there are reinforcements during the ITIs as well as during the CSs. The learning rate equation has been shown to predict reinforcements to acquisition in the poor-positive case (Balsam, Fairhurst, & Gallistel, 2006).

      (3) One of us (CRG) has earlier suggested that responding appears abruptly when the accumulated evidence that the CS reinforcement rate is greater than the contextual rate exceeds a decision threshold (C.R.  Gallistel, Balsam, & Fairhurst, 2004). The new more extensive data require a more nuanced view. Evidence about the manner in which responding changes over the course of training is to some extent dependent on the analytic method used to track those changes. We presented two different approaches. The approach shown in Figures 7 and 8, extending on that developed by Harris (2022), assumes a monotonic increase in response rate and uses the slope of the cumulative response rate to identify when responding exceeds particular milestones (percentiles of the asymptotic response rate). This analysis suggests a steady rise in responding over trials. Within our theoretical model, this might reflect an increase in the animal’s certainty about the CS reinforcement rate with accumulated evidence from each trial. While this method should be able to distinguish between a gradual change and a single abrupt change in responding (Harris, 2022) it may not distinguish between a gradual change and multiple step-like changes in responding and cannot account for decreases in response rate.<br /> The other analytic method we used relies on the information theoretic measure of divergence, the nDkl (Gallistel & Latham, 2023), to identify each point of change (up or down) in the response record. With that method, we discern three trends. First, the onset tends to be abrupt in that the initial step up is often large (an increase in response rate by 50% or more of the difference between its initial value and its terminal value is common and there are instances where the initial step is to the terminal rate or higher). 
Second, there is marked within-subject variability in the response rate, characterised by large steps up and down in the parsed response rates following the initial step up, but this variability tends to decrease with further training (there tend to be fewer and smaller steps in both the ITI response rates and the CS response rate as training progresses). Third, the overall trend, seen most clearly when one averages across subjects within groups, is to a moderately higher rate of responding later in training than after the initial rise. We think that the first tendency reflects an underlying decision process whose latency is controlled by diminishing uncertainty about the two reinforcement rates and hence about their ratio. We think that decreasing uncertainty about the true values of the estimated rates of reinforcement is also likely to be an important part of the explanation for the second tendency (decreasing within-subject variation in response rates). It is less clear whether diminishing uncertainty can explain the trend toward a somewhat greater difference in the two response rates as conditioning progresses. It is perhaps worth noting that the distribution of the estimates of the informativeness ratio is likely to be heavy tailed and have peculiar properties (as witness, for example, the distribution of the ratio of two gamma distributions with arbitrary shape and scale parameters), but we are unable at this time to propound an explanation of the third trend.

      (4) There is an error in the description provided in the text. The pre-CS period used to measure the ITI responding was 10 s rather than 20 s. There was always at least a 5-s gap between the end of the previous trial and the start of the pre-CS period.

      (5) Details about model fitting will be added in a revision. The question about fitting a single model or multiple models to the data in Figure 6 is addressed in response 2 above. In Figure 6, each rat provides 2 behavioural data points (ITI response rate and CS response rate) and 2 values for reinforcement rate (1/C and 1/T). There is a weak but significant correlation between the ITI and CS response rates (r = 0.28, p < 0.01; log transformed to correct for heteroscedasticity). By design, there is no correlation between the log reinforcement rates (r = 0.06, p = .404).

      CONCEPTUAL

      (1) It is important for the field to realize that the RW model cannot be used to explain the results of Rescorla’s (Rescorla, 1966; Rescorla, 1968, 1969) contingency-not-pairing experiments, despite what was claimed by Rescorla and Wagner (Rescorla & Wagner, 1972; Wagner & Rescorla, 1972) and has subsequently been claimed in many modelling papers and in most textbooks and reviews (Dayan & Niv, 2008; Y. Niv & Montague, 2008). Rescorla programmed reinforcements with a Poisson process. The defining property of a Poisson process is its flat hazard function; the reinforcements were equally likely at every moment in time when the process was running. This makes it impossible to say when non-reinforcements occurred and, a fortiori, to count them. The non-reinforcements are causal events in the RW algorithm and subsequent versions of it. Their effects on associative strength are essential to the explanations proffered by these models. Non-reinforcements (failures to occur, updates when reinforcement is set to 0, hence also the lambda parameter) can have causal efficacy only when the successes may be predicted to occur at specified times (during “trials”). When reinforcements are programmed by a Poisson process, there are no such times. Attempts to apply the RW formula to reinforcement learning soon foundered on this problem (Gibbon, 1981; Gibbon, Berryman, & Thompson, 1974; Hallam, Grahame, & Miller, 1992; L.J. Hammond, 1980; L. J. Hammond & Paynter, 1983; Scott & Platt, 1985). The enduring popularity of the delta-rule updating equation in reinforcement learning depends on “big-concept” papers that don’t fit models to real data and discretize time into states while claiming to be real-time models (Y. Niv, 2009; Y. Niv, Daw, & Dayan, 2005).
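
      The flat hazard function can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: the rate of one reinforcement per 60 s and the function name are hypothetical choices made only for illustration. It computes the probability of a reinforcement in the next second, given that none has occurred so far, and shows that this probability does not depend on how much time has already elapsed.

```python
import math

def prob_reinforcement_in_next(dt, elapsed, rate):
    """P(reinforcement within the next dt | none during the first `elapsed` seconds)
    for a Poisson process with the given rate (events per second)."""
    survive_to_now = math.exp(-rate * elapsed)           # S(t)
    survive_to_later = math.exp(-rate * (elapsed + dt))  # S(t + dt)
    return (survive_to_now - survive_to_later) / survive_to_now

# Hypothetical schedule: on average one reinforcement per 60 s.
rate = 1.0 / 60.0

# Conditional probability of a reinforcement in the next second,
# evaluated after 0, 10, 100 and 1000 s without reinforcement.
conditional = [prob_reinforcement_in_next(1.0, t, rate)
               for t in (0.0, 10.0, 100.0, 1000.0)]
```

      Every entry of `conditional` equals `1 - exp(-rate)` up to rounding error, so no moment is a more likely time for reinforcement than any other, and there is no identifiable moment at which a reinforcement "failed" to occur.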

      The information-theoretic approach to associative learning, which sometimes historically travels as RET (rate estimation theory), is unabashedly and inescapably representational. It assumes a temporal map and arithmetic machinery capable in principle of implementing any implementable computation. In short, it assumes a Turing-complete brain. It assumes that whatever the material basis of memory may be, it must make sense to ask of it how many bits can be stored in a given volume of material. This question is seldom posed in associative models of learning, nor by neurobiologists committed to the hypothesis that the Hebbian synapse is the material basis of memory. Many, including the new Nobelist Geoffrey Hinton, would agree that the question makes no sense. When you assume that brains learn by rewiring themselves rather than by acquiring and storing information, it makes no sense.

      When a subject learns a rate of reinforcement, it bases its behavior on that expectation, and it alters its behavior when that expectation is disappointed. Subjects also learn probabilities when they are defined. They base some aspects of their behavior on those expectations, making computationally sophisticated use of their representation of the uncertainties (Balci, Freestone, & Gallistel, 2009; Chan & Harris, 2019; J. A. Harris, 2019; J.A. Harris & Andrew, 2017; J. A. Harris & Bouton, 2020; J. A. Harris, Kwok, & Gottlieb, 2019; Kheifets, Freestone, & Gallistel, 2017; Kheifets & Gallistel, 2012; Mallea, Schulhof, Gallistel, & Balsam, 2024 in press).

      (2) Rate estimation theory is oblivious to the temporal order in which experience with different predictors occurs. The matrix computation finds the additive solution, if it exists, to the data so far observed, on the assumption that predicted rates have remained the same. This is the stationarity assumption, which is implicit in a rate computation and was made explicit in the formulation of RET (C.R. Gallistel, 1990). When the additive solution does not exist, the RET algorithm treats the compound of two predictors as a third predictor, and computes the additive solution to the 3-predictor problem. Because it is oblivious to the order in which the data have been acquired, it predicts one-trial overshadowing and retroactive blocking and unblocking (C.R. Gallistel, 1990 pp 439 & 452-455).
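
      The additive computation can be sketched in a few lines. This is an illustrative reading of the rate computation, not the published implementation: it assumes that the expected number of reinforcements observed while predictor i is present equals the sum, over predictors j, of the rate credited to j multiplied by the cumulative time that i and j were present together. The durations, counts and function name below are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a * x = b by Gauss-Jordan elimination
    with partial pivoting (pure Python, no external libraries)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system is singular; rates are not uniquely determined")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Hypothetical totals. Predictor 0 is the context (always present, 1000 s);
# predictor 1 is a CS present for 100 s of that time.
# joint_time[i][j] = cumulative seconds during which i and j were both present.
joint_time = [[1000.0, 100.0],
              [100.0, 100.0]]
# counts[i] = reinforcements delivered while predictor i was present.
# All 10 reinforcements occurred during the CS (and hence also in the context).
counts = [10.0, 10.0]

context_rate, cs_rate = solve_linear(joint_time, counts)
```

      With these totals the solution credits the CS with 0.1 reinforcements per second and the context with essentially none. Only cumulative durations and counts enter the computation, which is why the order in which the experience was acquired cannot affect the result.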

      The RET algorithm is but one component of the information-theoretic model of associative learning (aka TATAL, The Analytic Theory of Associative Learning; Wilkes & Gallistel, 2016). It solves the assignment-of-credit problem, not the change-detection problem. Because rates of reinforcement do sometimes change, the stationarity assumption, which is essential to the RET algorithm, must be tested when each new reinforcement occurs and when the interval since the last reinforcement has become longer than would be expected or the number of reinforcements has become significantly fewer than would be expected given the current estimate of the probability of reinforcement (C. R. Gallistel, Krishan, Liu, Miller, & Latham, 2014). In the information-theoretic approach to associative learning, detecting non-stationarity is done by an information-theoretic change-detecting algorithm. The algorithm correctly predicts that omitted reinforcements to extinction will be a constant (C.R. Gallistel, 2024 under review; Gibbon, Farrell, Locurto, Duncan, & Terrace, 1980). To put the prediction another way, unreinforced trials to extinction will increase in proportion to the trials/reinforcement during training (C.R. Gallistel, 2012; Wilkes & Gallistel, 2016). In other words, it predicts the best and most systematic data on the partial reinforcement extinction effect (PREE) known to us. The profound challenge to neo-Hullian delta-rule updating models that is posed by the PREE has been recognized for the better part of a century. To the best of our knowledge, no other formalized model of associative learning has overcome this challenge (Dayan & Niv, 2008; Mellgren, 2012). Explaining extinction algorithmically is straightforward when one adopts an information-theoretic perspective, because computing reinforcement-by-reinforcement the Kullback-Leibler divergence of a sequence of earlier rate (or probability!) estimates from the most recent estimate and multiplying the vector of divergences by the vector of effective sample sizes (C. R. Gallistel & Latham, 2022) detects and localizes changes in rates and probabilities of reinforcement (C.R. Gallistel, 2024 under review). The computation presupposes the existence of a temporal map, a time-stamped record of past events. This supposition is strongly resisted by neuroscience-oriented reinforcement-learning modelers, who try to substitute the assumption of decaying eligibility traces.
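
      A toy version of this change-detection computation, for the probability case, might look as follows. It is a sketch under stated assumptions rather than the published algorithm: it takes the divergence in the direction D(earlier || most recent), and it uses the number of trials behind each earlier estimate as a stand-in for the effective sample size, whose exact definition in the cited work may differ. The function names and the 20-reinforced-then-20-unreinforced record are hypothetical.

```python
import math

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence D(p || q), in nats, between Bernoulli(p)
    and Bernoulli(q). Assumes 0 < q < 1 unless p == q."""
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total

def weighted_divergences(outcomes):
    """For each trial n, return n * D(estimate after n trials || final estimate)."""
    latest = sum(outcomes) / len(outcomes)  # most recent estimate of p
    scores = []
    successes = 0
    for n, outcome in enumerate(outcomes, start=1):
        successes += outcome
        earlier = successes / n             # estimate based on the first n trials
        scores.append(n * kl_bernoulli(earlier, latest))
    return scores

# Hypothetical record: 20 reinforced trials followed by 20 unreinforced trials.
outcomes = [1] * 20 + [0] * 20
scores = weighted_divergences(outcomes)

# The trial at which the weighted divergence peaks localizes the change.
change_after_trial = scores.index(max(scores)) + 1
```

      In this record the weighted divergence peaks at trial 20, the last reinforced trial, and falls to zero at the final trial, where the earlier and most recent estimates coincide.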

      The very interesting Pearce-Ganesan findings (Ganesan & Pearce, 1988) are not predicted by RET, but nor do they run counter to its predictions. RET has nothing to say about how subjects categorize appetitive reinforcements; nor, at this time, does the information-theoretic approach to an understanding of associative learning have anything to say about that.

      The same is not true for the Betts, Brandon & Wagner results (Betts, Brandon, & Wagner, 1996). They pretrained a blocking cue that predicted a painful paraorbital shock to one eye of a rabbit. This cue elicited an anticipatory blink in the threatened eye. It also potentiated the startle reflex made to a loud noise in one ear. A new cue was then introduced, which always occurred in compound with the pretrained blocking cue. In one group, the painful shock continued to be delivered to the same eye as before; in another group, it was delivered to the skin around the other eye. In the group that continued to receive the shock to the same eye, the old cue effectively blocked conditioning of the new cue for both the eyeblink and the potentiated startle response. However, in the group for which the location of the shock changed to the other eye, the old cue did not block conditioning of the eyeblink response to the new cue but did block conditioning of the startle response to the new cue. The information-theoretic analysis of associative learning focusses on the encoding of measurable predictive temporal relationships, rather than on general and, to our mind, vague notions like CS processing and US processing. A painful shock elicits fear in a rabbit no matter where on the body surface it is experienced, because fear is a reaction to a very broad category of dangers, and fear potentiates the startle reflex regardless of the threat that causes fear. Once the prediction of such a threat is encoded, redundant cues will not be encoded that same way because the RET algorithm blocks the encoding of redundant predictions. A painful shock near an eye elicits a blink of the threatened eye as well as the fear that potentiates the startle. An appropriate encoding for the eye blink must specify the location of the threat. 
RET will attribute prediction of the threat to the new eye to the new cue (and not to the old cue, the pretrained blocker) while continuing to attribute to the old cue the prediction of a fear-causing threat, because the change in location does not alter that prediction. Therefore, the new cue will be encoded as predicting the new location of the threat to the eye, but not as predicting the large category of non-specific threats that elicit fear and the potentiation of the startle, because that prediction remains valid. Changing that prediction would violate the stationarity assumption; predictive relations do not change unless the data imply that they must have changed. Unless we have made a slip in our logic, this would seem to explain Betts et al.’s (1996) results. It does so with no free parameters, unlike AESOP, which has a notoriously large number of free parameters.

      Balci, F., Freestone, D., & Gallistel, C. R. (2009). Risk assessment in man and mouse. Proceedings of the National Academy of Sciences USA, 106(7), 2459-2463. doi:10.1073/pnas.0812709106

      Balsam, P. D., Fairhurst, S., & Gallistel, C. R. (2006). Pavlovian contingencies and temporal information. Journal of Experimental Psychology: Animal Behavior Processes, 32, 284-294.

      Barron, A., Rissanen, J., & Yu, B. (1998). The minimum description length principle in coding and modeling. IEEE Transactions on Information Theory, 44(6), 2743-2760.

      Berridge, K. C. (2012). From prediction error to incentive salience: Mesolimbic computation of reward motivation. European Journal of Neuroscience.

      Betts, S. L., Brandon, S. E., & Wagner, A. R. (1996). Dissociation of the blocking of conditioned eyeblink and conditioned fear following a shift in US locus. Animal Learning and Behavior, 24(4), 459-470.

      Chan, C. K. J., & Harris, J. A. (2019). The partial reinforcement extinction effect: The proportion of trials reinforced during conditioning predicts the number of trials to extinction. Journal of Experimental Psychology: Animal Learning and Cognition, 45(1). doi:10.1037/xan0000190

      Dayan, P., & Niv, Y. (2008). Reinforcement learning: The good, the bad and the ugly. Current Opinion in Neurobiology, 18(2), 185-196.

      Gallistel, C. R. (1990). The organization of learning. Cambridge, MA: Bradford Books/MIT Press.

      Gallistel, C. R. (2012). Extinction from a rationalist perspective. Behav Processes, 90, 66-88. doi:10.1016/j.beproc.2012.02.008

      Gallistel, C. R. (2024 under review). Reconceptualized associative learning. Perspectives on Behavioral Science (Special Issue for SQAB 2024).

      Gallistel, C. R., Balsam, P. D., & Fairhurst, S. (2004). The learning curve: Implications of a quantitative analysis. Proceedings of the National Academy of Sciences, 101(36), 13124-13131.

      Gallistel, C. R., Krishan, M., Liu, Y., Miller, R. R., & Latham, P. E. (2014). The perception of probability. Psychological Review, 121, 96-123. doi:10.1037/a0035232

      Gallistel, C. R., & Latham, P. E. (2022). Bringing Bayes and Shannon to the Study of Behavioral and Neurobiological Timing. Timing & Time Perception, 1-61. doi:10.1163/22134468-bja10069

      Ganesan, R., & Pearce, J. M. (1988). Effect of changing the unconditioned stimulus on appetitive blocking. Journal of Experimental Psychology: Animal Behavior Processes, 14, 280-291.

      Gibbon, J. (1981). The contingency problem in autoshaping. In C. M. Locurto, H. S. Terrace, & J. Gibbon (Eds.), Autoshaping and conditioning theory (pp. 285-308). New York: Academic.

      Gibbon, J., & Balsam, P. (1981). Spreading association in time. In C. M. Locurto, H. S. Terrace, & J. Gibbon (Eds.), Autoshaping and conditioning theory (pp. 219-253). New York: Academic Press.

      Gibbon, J., Berryman, R., & Thompson, R. L. (1974). Contingency spaces and measures in classical and instrumental conditioning. Journal of the Experimental Analysis of Behavior, 21(3), 585-605. doi:10.1901/jeab.1974.21-585

      Gibbon, J., Farrell, L., Locurto, C. M., Duncan, H. J., & Terrace, H. S. (1980). Partial reinforcement in autoshaping with pigeons. Animal Learning and Behavior, 8, 45-59. doi:10.3758/BF03209729

      Grünwald, P. D., Myung, I. J., & Pitt, M. A. (2005). Advances in minimum description length: theory and applications. Cambridge, MA: MIT Press.

      Hallam, S. C., Grahame, N. J., & Miller, R. R. (1992). Exploring the edges of Pavlovian contingency space: An assessment of contingency theory and its various metrics. Learning and Motivation, 23, 225-249.

      Hammond, L. J. (1980). The effect of contingency upon the appetitive conditioning of free operant behavior. Journal of the Experimental Analysis of Behavior, 34, 297-304. doi:10.1901/jeab.1980.34-297

      Hammond, L. J., & Paynter, W. E. (1983). Probabilistic contingency theories of animal conditioning: A critical analysis. Learning and Motivation, 14, 527-550. doi:10.1016/0023-9690(83)90031-0

      Harris, J. A. (2019). The importance of trials. Journal of Experimental Psychology: Animal Learning and Cognition, 45(4).

      Harris, J. A. (2022). The learning curve, revisited. Journal of Experimental Psychology: Animal Learning and Cognition, 48, 265-280.

      Harris, J. A., & Andrew, B. J. (2017). Time, Trials and Extinction. Journal of Experimental Psychology: Animal Learning and Cognition, 43(1), 15-29.

      Harris, J. A., & Bouton, M. E. (2020). Pavlovian conditioning under partial reinforcement: The effects of non-reinforced trials versus cumulative CS duration. The Journal of Experimental Psychology: Animal Learning & Cognition, 46, 256-272.

      Harris, J. A., Kwok, D. W. S., & Gottlieb, D. A. (2019). The partial reinforcement extinction effect depends on learning about nonreinforced trials rather than reinforcement rate. Journal of Experimental Psychology: Animal Learning and Cognition, 45(4). doi:10.1037/xan0000220

      Jeong, H., Taylor, A., Floeder, J. R., Lohmann, M., Mihalas, S., Wu, B., . . . Namboodiri, V. M. K. (2022). Mesolimbic dopamine release conveys causal associations. Science. doi:10.1126/science.abq6740

      Kheifets, A., Freestone, D., & Gallistel, C. R. (2017). Theoretical Implications of Quantitative Properties of Interval Timing and Probability Estimation in Mouse and Rat. Journal of the Experimental Analysis of Behavior, 108(1), 39-72. doi:10.1002/jeab.261

      Kheifets, A., & Gallistel, C. R. (2012). Mice take calculated risks. Proceedings of the National Academy of Sciences, 109, 8776-8779. doi:10.1073/pnas.1205131109

      Mallea, J., Schulhof, A., Gallistel, C. R., & Balsam, P. D. (2024 in press). Both probability and rate of reinforcement can affect the acquisition and maintenance of conditioned responses. Journal of Experimental Psychology: Animal Learning and Cognition.

      Mellgren, R. (2012). Partial reinforcement extinction effect. In N. M. Seel (Ed.), Encyclopedia of the Sciences of Learning. Boston, MA: Springer.

      Niv, Y. (2009). Reinforcement learning in the brain. Journal of Mathematical Psychology, 53, 139-154.

      Niv, Y., Daw, N. D., & Dayan, P. (2005). How fast to work: response vigor, motivation and tonic dopamine. In Y. Weiss, B. Schölkopf, & J. R. Platt (Eds.), NIPS 18 (pp. 1019–1026). Cambridge, MA: MIT Press.

      Niv, Y., Daw, N. D., Joel, D., & Dayan, P. (2007). Tonic dopamine: Opportunity costs and the control of response vigor. Psychopharmacology, 191(3), 507-520.

      Niv, Y., & Montague, P. R. (2008). Theoretical and empirical studies of learning. In P. W. Glimcher et al. (Eds.), Neuroeconomics: Decision-Making and the Brain (pp. 329–349). New York: Academic Press.

      Niv, Y., & Schoenbaum, G. (2008). Dialogues on prediction errors. Trends in Cognitive Sciences, 12(7), 265-272. doi:10.1016/j.tics.2008.03.006

      Rescorla, R. A. (1966). Predictability and the number of pairings in Pavlovian fear conditioning. Psychonomic Science, 4, 383-384.

      Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 66(1), 1-5. doi:10.1037/h0025984

      Rescorla, R. A. (1969). Conditioned inhibition of fear resulting from negative CS-US contingencies. Journal of Comparative and Physiological Psychology, 67, 504-509.

      Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II (pp. 64-99). New York: Appleton-Century-Crofts.

      Rissanen, J. (1999). Hypothesis selection and testing by the MDL principle. The Computer Journal, 42, 260–269. doi:10.1093/comjnl/42.4.260

      Scott, G. K., & Platt, J. R. (1985). Model of response-reinforcement contingency. Journal of Experimental Psychology: Animal Behavior Processes, 11(2), 152-171.

      Wagner, A. R., & Rescorla, R. A. (1972). Inhibition in Pavlovian conditioning: Application of a theory. In R. A. Boakes & S. Halliday (Eds.), Inhibition and learning. New York: Academic.

      Wilkes, J. T., & Gallistel, C. R. (2016). Information Theory, Memory, Prediction, and Timing in Associative Learning (original long version).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study addresses how 3' splice site choice is modulated by the conserved spliceosome-associated protein Fyv6. The authors provide compelling evidence that Fyv6 functions to enable selection of 3' splice sites distal to a branch point and in doing so antagonizes more proximal, suboptimal 3' splice sites. The study would be improved through a more nuanced discussion of alternative possibilities and models, for instance in discussing the phenotypic impact of Fyv6 deletion.

      We thank the editors and reviewers for their supportive comments and assessment of this manuscript. We have improved the discussion at several points as suggested by the reviewers to include discussion of alternative possibilities.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      A key challenge at the second chemical step of splicing is the identification of the 3' splice site of an intron. This requires recruitment of factors dedicated to the second chemical step of splicing and exclusion of factors dedicated to the first chemical step of splicing. Through the highest resolution cryoEM structure of the spliceosome to date, the authors show the binding site for Fyv6, a factor dedicated to the second chemical step of splicing, is mutually exclusive with the binding site for a distinct factor dedicated to the first chemical step of splicing, highlighting that splicing factors bind to the spliceosome at a specific stage not only by recognizing features specific to that stage but also by competing with factors that bind at other stages. The authors further reveal that Fyv6 functions at the second chemical step to promote selection of 3' splice sites distal to a branch point and thereby discriminate against proximal, suboptimal 3' splice sites. Lastly, the authors show by cryoEM that Fyv6 physically interacts with the RNA helicase Prp22 and by genetics that Fyv6 functionally interacts with this factor, implicating Fyv6 in 3'SS proofreading and mRNA release from the spliceosome. The evidence for this study is robust, with the inclusion of genomics, reporter assays, genetics, and cryoEM. Further, the data overall justify the conclusions, which will be of broad interest.

      Strengths:

      (1) The resolution of the cryoEM structure of Fyv6-bound spliceosomes at the second chemical step of splicing is exceptional (2.3 Angstroms at the catalytic core; 3.0-3.7 Angstroms at the periphery), providing the best view of this spliceosomal intermediate in particular and the core of the spliceosome in general.

      (2) The authors observe by cryoEM three distinct states of this spliceosome, each distinguished from the next by progressive loss of protein factors and/or RNA residues. The authors appropriately refrain from overinterpreting these states as reflecting distinct states in the splicing cycle, as too many cryoEM studies are prone to do, and instead interpret these observations to suggest interdependencies of binding. For example, when Fyv6, Slu7, and Prp18 are not observed, neither are the first and second residues of the intron, which otherwise interact, suggesting an interdependence between 3' splice site docking on the 5' splice site and binding of these second step factors to the spliceosome.

      (3) Conclusions are supported from multiple angles.

      (4) The interaction between Fyv6 and Syf1, revealed by the cryoEM structure, was shown to account for the temperature-sensitive phenotypes of a fyv6 deletion, through a truncation analysis.

      (5) Splicing changes were observed in vivo both by indirect copper reporter assays and directly by RT-PCR.

      (6) Changes observed by RNA-seq are validated by RT-PCR.

      (7) The authors go beyond simply observing a general shift to proximal 3'SS usage in the fyv6 deletion by RNA-seq by experimentally varying branch point to 3' splice site distance in a reporter and demonstrating in a controlled system that Fyv6 promotes distal 3' splice sites.

      (8) The importance of the Fyv6-Syf1 interaction for 3'SS recognition is demonstrated by truncations of both Fyv6 and of Syf1.

      (9) In general, the study was executed thoroughly and presented clearly.

      We thank the reviewer for their recognition of the strengths of our multi-faceted approach that led to highly supported conclusions.

      Weaknesses:

      (1) Despite the authors' restraint in interpreting the three states of the spliceosome observed by cryoEM as sequential intermediates along the splicing pathway, it would be helpful to the general reader to explicitly acknowledge the alternative possibility that the different states simply reflect decomposition from one intermediate during isolation of the complex (i.e., the loss of protein is an in vitro artifact, if an informative one).

      We thank the reviewer for noticing our restraint in interpreting these structures, and we agree that the scenario described by the reviewer is a possibility. We have now explicitly mentioned this in the Discussion on lines 755-757.

      (2) The authors acknowledge that for prp8 suppressors of the fyv6 deletion, suppression may be indirect, as originally proposed by the Query and Konarska labs - that is, that defects in the second step conformation of the spliceosome can be indirectly suppressed by compensating, destabilizing mutations in the first step spliceosome. Whereas some of the other suppressors of the fyv6 deletion can be interpreted as impacting directly the second step spliceosome (e.g., because the gene product is only present in the second step conformation), it seems that many more suppressors beyond prp8 mutants, especially those corresponding to bulky substitutions, which would more likely destabilize than stabilize, could similarly act indirectly by destabilization of first step conformation. The authors should acknowledge this where appropriate (e.g., for factors like Prp8 that are present in both first and second step conformations).

      We agree that this is also a possibility and have now included this on lines 480-486.

      Reviewer #2 (Public Review):

      In this manuscript, Senn, Lipinski, and colleagues report on the structure and function of the conserved spliceosomal protein Fyv6. Pre-mRNA splicing is a critical gene expression step that occurs in two steps, branching and exon ligation. Fyv6 had been recently identified by the Hoskins lab as a factor that aids exon ligation (Lipinski et al., 2023), yet the mechanistic basis for Fyv6 function was less clear. Here, the authors combine yeast genetics, transcriptomics, biochemical assays, and structural biology to reveal the function of Fyv6. Specifically, they describe that Fyv6 promotes the usage of distal 3'SSs by stabilizing a network of interactions that include the RNA helicase PRP22 and the spliceosome subunit SYF1. They discuss a generalizable mechanism for splice site proofreading by spliceosomal RNA helicases that could be modulated by other, regulatory splicing factors.

      This is a very high quality study, which expertly combines various approaches to provide new insights into the regulation of 3'SS choice, docking, and undocking. The cryo-EM data is also of excellent quality, which substantially extends on previous yeast P complex structures. This is also supported by the authors' use of the latest data analysis tools (Relion-5, AlphaFold2 multimer predictions, Modelangelo). The authors re-evaluate published EM densities of yeast spliceosome complexes (B*, C, C*, P) for the presence or absence of Fyv6, substantiate Fyv6 as a 2nd step specific factor, confirm it as the homolog of the human protein FAM192A, and provide a model for how Fyv6 may fit into the splicing pathway. The biochemical experiments on probing the splicing effects of BP to 3'SS distances after Fyv6 KO, genetic experiments to probe Fyv6 and Syf1 domains, and the suppressor screening add substantially to the study and are well executed. The manuscript is clearly written and we particularly appreciated the nuanced discussions, for example for an alternative model by which Prp22 influences 3'SS undocking. The research findings will be of great interest to the pre-mRNA splicing community.

      We thank the reviewer for their positive comments on our manuscript.

      We have only a few comments to improve an already strong manuscript.

      Comments:

      (1) Can the authors comment on how they justify K+ ion positions in their models (e.g. the K+ ion bridging G-1 and G+1 nucleotides)? How do they discriminate e.g. in the 'G-1 and G+1' case K+ from water?

      The assignment of K+ at this position is justified by both longer coordination distances and relatively high cryo-EM density compared to structured water molecules in the same vicinity. We have added a panel to Figure 3-figure supplement 4C to show the density for the G-1/G+1 bridging K+ ion and to show the adjacent density for putative water molecules which coordinate the ion. The K+ ion density is larger and has stronger signal than the adjacent water molecules. The coordination distances are also longer than would be expected for a Mg2+ ion. For these reasons and because K+ was present in the purification buffer, we modelled the density as K+.

      (2) The authors comment on Yju2 and Fyv6 assignments in all yeast structures except for the ILS. Can the authors comment on if they have also looked into the assignment of Yju2 in the yeast ILS structure in the same manner? While it is possible that Fyv6 could dissociate and Yju2 reassociate at the P to ILS transition, this would merit a closer look given that in the yeast P complex Yju2 had been misassigned previously.

      We thank the reviewer for pointing out this very interesting topic! We have used ModelAngelo to analyze the S. cerevisiae ILS structure for support of density assignment as Yju2 (and not Fyv6). This analysis supports the assignment as Yju2 in this structure and we have no evidence to doubt its presence in those particular purified spliceosomes. We have updated Figure 4-figure supplement 1B accordingly.

      That being said, we do think that this issue should be studied more carefully in the future. The S. cerevisiae ILS structure (5Y88) was determined by purifying spliceosome complexes with a TAP-tag on Yju2. So the conclusion that Yju2 is part of the ILS spliceosome involves some circular logic: Yju2 is part of ILS spliceosome complexes because it is present in ILS complexes purified with Yju2. We also note that Yju2 was absent in ILS complexes recently determined from metazoans by the Plaschka group. We have added some additional nuance to the Discussion to raise this important mechanistic point at lines 711-718.

      (3) For accessibility to a general reader, figures 1c, d, e, 2a, b, would benefit from additional headings or labels, to immediately convey what is being displayed. It is also not clear to us if Fig 1e might fit better in the supplement and be instead replaced by Supplementary Figure 1a (wt), b (delta upf1), and a new c (delta fyv6) and new d (delta upf1, delta fyv6). This may allow the reader to better follow the rationale of the authors' use of the Fyv6/Upf1 double deletion.

      We thank the reviewer for the suggestion and have updated Figures 1 C-E to include additional information in the headings and labels. We have not changed the labels in Figures 2A, B but have added additional clarifying language to the legend.

      In terms of rearranging the figures, we thank the reviewer for the suggestion but have decided that the figures are best left in their current ordering.

      (4) The authors carefully interpret the various suppressor mutants, yet to a general reader the authors may wish to focus this section on only the most critical mutants for a better flow of the text.

      We thank the reviewer for this suggestion. While this section of the manuscript does contain (to quote Reviewer #3) “extensive new information regarding functional interactions”, it was a bit long. We have reduced this section of the manuscript by ~200 words for a more focused presentation for general readers.

      Reviewer #3 (Public Review):

      In this manuscript the authors expand their initial identification of Fyv6 as a protein involved in the second step of pre-mRNA splicing to investigate the transcriptome-wide impact of Fyv6 on splicing and gain a deeper understanding of the mechanism of Fyv6 action.

      They first use deep sequencing of transcripts in cells depleted of Fyv6 together with Upf1 (to limit loss of mis-spliced transcripts) to identify broad changes in the transcriptome due to loss of Fyv6. This includes both changes in overall gene expression, which are not deeply discussed, and alterations in the choice of 3' splice sites, which is the focus of the rest of the manuscript.

      They next provide the highest resolution structure of the post-catalytic spliceosome to date; providing unparalleled insight into details of the active site and peripheral components that haven't been well characterized previously.

      Using this structure they identify functionally critical interactions of Fyv6 with Syf1 but not Prp22, Prp8 and Slu7. Finally, a suppressor screen additionally provides extensive new information regarding functional interactions between these second step factors.

      Overall this manuscript reports new and essential information regarding molecular interactions within the spliceosome that determine the use of the 3' splice site. It would be helpful, especially to the non-expert, to summarize these in a table, figure or schematic in the discussion.

      We thank the reviewer for the positive comments and suggestions. We did include a summary figure in panel 7H. However, it was a bit buried. To highlight the summary figure more clearly, we have moved panel 7H to its own figure (Fig. 8).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The resolution of some panels is poor, nearly illegible (e.g., Supp Fig 1A, B).

      The resolution of panels in supplemental figure 1 has been increased. However, the poor resolution may have been an artifact of the PDF conversion process. We will pay attention to this during the publication process.

      (2) Panel S6B: 6HYU is a structure of DHX8, not DDX8

      We have corrected DDX8 to DHX8 in Supplemental Fig. S6D and associated figure legend.

      (3) The result that Syf1 truncations can suppress the Fyv6 deletion is impressive. The subsequent discussion seems muddled. A discussion of Fyv6 binding at the first step, instead of Yju2, doesn't seem relevant here (though worthy of consideration in the discussion), given that the starting mutation is the Fyv6 deletion. Further, conjuring rebinding of Yju2 based on the data in the paper seems unnecessarily speculative (assumes that biochemical state III is on pathway), unless I am unaware of some other evidence for such rebinding. Instead, a simpler explanation would seem to be that in the absence of Fyv6, Syf1 inappropriately binds Yju2 instead at the second step and that deletion of the common Fyv6/Yju2 binding site on Syf1 suppresses this defect. In this case, the ts phenotype of the Fyv6 deletion would result from inappropriate binding of Yju2, and the splicing defect would be due to loss of Fyv6 activity. Alternatively, especially considering the work of the labs of Query and Konarska, the authors should consider the possibility that i) the Fyv6 deletion destabilizes the second step conformation, shifting an equilibrium to the first step conformation, and that ii) the Syf1 truncation destabilizes binding of Yju2, thereby restoring the equilibrium. In this case the ts phenotype of the Fyv6 deletion is due to a disturbed equilibrium and the splicing defect is due to the failure of Fyv6 to function at the second step.

      We believe the reviewer is specifically referencing the final paragraph of this Results section (the paragraph that comes just before the section “Mutations in many different splicing factors…”). In retrospect, we agree that our discussion was convoluted. In particular, we emphasized rebinding of Yju2 based on its presence in the cryo-EM structure of the yeast ILS complex. However, given some uncertainties about whether or not Yju2 is a bona fide ILS component (as discussed above), we don’t think it is appropriate to over-emphasize rebinding of Yju2 and have decided to incorporate the elegant mechanisms proposed by the reviewer. This paragraph has now been edited accordingly (lines 386-395).

      (4) The authors imply they have performed biochemical studies, which I think is misleading. Of course, RT-PCR and primer extension assays for example are performed in vitro, but these are an analysis of RNA events that occurred in vivo. In my view a higher threshold should be used for defining "biochemistry". To me "biochemistry" would imply that the authors have, for example, investigated 3' splice site usage in splicing extracts of the fyv6 deletion or engaged in an analysis of the Syf1-Fyv6 interaction involving the expression of the interacting domains in bacteria followed by a binding analysis in the test tube.

      We disagree with the reviewer on this point. Biochemistry is defined as the “branch of sciences concerned with the chemical substances, reactions, and physico chemical processes which occur within living organisms; biological or physical chemistry.” (Oxford English Dictionary). Biochemical studies are not defined by whether or not they take place in vitro, in vivo, or even in silico. Indeed, much of the history of biochemistry (especially in studies of metabolism, for example) involved experiments occurring in vivo that reported on the molecular properties and mechanisms of biological processes. We think many of our experiments fall into this category including our structure/function analysis of splicing factors and the use of the ACT1-CUP1 reporter substrate.

      (5) The monovalents are shown; inositol phosphate is shown; is the binding of Prp22 to RNA shown?

      We have added a panel to Figure 3-figure supplement 4D showing density for the 3' exon within Prp22.

      (6) The authors invoke undocking of the 3'SS in the P complex. Where is the 3'SS in the ILS? The author's model predicts: undocked.

      In all ILS structures to date, the 3′ SS is undocked, in agreement with this prediction. We have now noted this observation in line 760.

      (7) Would be helpful to show fyv6 deletion in Fig 1b.

      We have included growth data for an additional fyv6 deletion strain (in a cup1Δ background) in Figure 1b. The results are quite similar to the upf1Δ background except with slightly worse growth at 23°C.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments

      (1) Fig.3b is the arrow indicating the right rotation?

      This typo has been fixed.

      (2) Fig.4b, panel H is annotated, which should read 'F'.

      This typo has been fixed.

      (3) Line 178: "Finally, we analyzed the sequence features of the alternative 3ʹ SS activated by loss of Fyv6." We would suggest 'used after' instead of 'activated by'.

      We have replaced ‘activated by’ with ‘with increased use after’.

      (4) In Line 544, the authors speculate on a Slu7 requirement for 3'SS docking and on 3'SS docking maintenance. In the results section (Line 265) they however only mention the latter possibility. These statements should be consistent.

      We thank the reviewer for pointing this out. We have added a reference to docking maintenance to the results section at line 325.

      (5) Line 476: "Unexpectedly, Prp22 I1133R was actually deleterious when Fyv6 was present for this reporter." We suggest removing "actually".

      We have removed ‘actually’.

      (6) The authors describe the observed changes in splicing events in absolute numbers (e.g. in Fig 1c). To better assess for the reader whether these numbers reflect large or small effects of Fyv6 in defining mRNA isoforms, it would be more useful to state these as percent changes of total events or to provide a reference number for how many introns are spliced in S.c. See for example the statements in Lines 132 and 145.

      We have added a percentage at line 138 that indicates ~20% of introns in yeast showed splicing changes.

      Reviewer #3 (Recommendations For The Authors):

      Do the authors have a proposed explanation for the observed DGE in non-intron containing genes in the Fyv6 depleted cells?

      The simplest explanation is that this is an indirect effect due to splicing changes occurring in other genes (such as transcription factors, ribosomal protein genes, etc.). It is possible that this can be further dissected in the future using shorter-term knockdown of Fyv6 using Anchors Away or AID-tagging. However, that is beyond the scope of the current manuscript, and we do not wish to comment on these non-intron containing genes further at present.

      Figure 2A - What is going on with the events that show no FAnS value under one condition (i.e. are up against the X or Y axis)? These are of interest as most on the Y- axis are blue.

      The events along one of the axes denote alternative splice sites that are only detected under one condition (either when Fyv6 is present or when it is absent). At this stage, we do not wish to interpret these events further since most have a relatively low number of reads overall.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study reports single-cell RNA sequencing results of lung adenocarcinoma, comparing 4 treatment-naive and 5 post-neoadjuvant chemotherapy tumor samples. The authors claim that there is metabolic reprogramming in tumor cells as well as stromal and immune cells after chemotherapy.

      The most significant findings concern the macrophages: there are more pro-tumorigenic cells, i.e. CD45+CD11b+ARG+ cells, after chemotherapy. In the treatment-naive samples, more anti-tumorigenic CD45+CD11b+CD86+ macrophages are found. They sorted each population and performed functional analyses.

      Strengths:

      Comparison of the treatment-naive and post-chemotherapy samples of lung adenocarcinoma.

      Weaknesses:

      (1) Lengthy descriptive clustering analysis, with indistinct direct comparisons between the treatment-naive and the post-chemotherapy samples.

      Thank you for your detailed review and valuable feedback. We have simplified the descriptive clustering analysis by removing redundant parts and retaining only the key content relevant to our findings. This should help readers to more easily grasp and focus on the main results.

      (2) No statistical analysis was performed for the comparison.

      We appreciate your constructive feedback and are committed to improving our research methodology and reporting to enhance the scientific rigor of our studies.

      (3) Difficult to match data to the text.

      Thank you for your feedback. We understand that there were difficulties in matching the data to the text. We have reviewed the manuscript carefully to ensure that all data points are clearly linked to the corresponding sections in the text.

      (4) ARG1 is a cytosolic enzyme that can be detected by intracellular staining after fixation. It is unclear how the staining and sorting was performed to measure function of sorted cells.

      We apologize for the error caused by miscommunication within our research team. We are currently using both ARG1 and CD206 antibodies in our studies. Due to a communication error, the technician mistakenly assumed ARG1 was another name for CD206 (MRC1), resulting in the incorrect labeling of CD206 as ARG1 in our experimental records. In reality, we used the CD206 antibody, which is consistent with the same surface marker shown in figure 6e. We have made corrections in the manuscript and experimental figures. Thank you for pointing this out, and we regret any misunderstanding this may have caused.

      Reviewer #2 (Public Review):

      In this study, Huang et al. performed a scRNA-seq analysis of lung adenocarcinoma (LUAD) specimens from 9 human patients, including 5 who received neoadjuvant chemotherapy (NCT), and 4 without treatment (control). The new data was produced using 10x Genomics technology and comprises 83622 cells, of which 50055 and 33567 cells were derived from the NCT and control groups, respectively. Data was processed via the R Seurat package, and various downstream analyses were conducted, including CNV, GSVA, functional enrichment, cell-cell interaction, and pseudotime trajectory analyses. Additionally, the authors performed several experiments for in vitro and in vivo validation of their findings, such as immunohistochemistry, immunofluorescence, flow cytometry, and animal experiments.

      The study extensively discusses the heterogeneity of cell populations in LUAD, comparing the samples with and without chemotherapy. However, there are several shortcomings that diminish the quality of this paper:

      • The number of cells included in the dataset is limited, and the number of patients from different groups is low, which may reduce the attractiveness of the dataset for other researchers to reuse. Additionally, there is no metadata on patients' clinical characteristics, such as age, sex, history of smoking, etc., which would be valuable for future studies.

      Thank you for your insightful feedback. We recognize that the limited number of cells and the small number of patients from different groups in our dataset may affect its appeal for reuse by other researchers. Additionally, we acknowledge the absence of metadata on patients' clinical characteristics, such as age, sex, and smoking history, which would indeed be valuable for future studies. We have compiled the patients' metadata and other information in Supplementary Table 2.

      We appreciate your suggestions and will consider incorporating these aspects in future research to enhance the dataset's utility and attractiveness.

      • Several crucial details about the data analysis are missing: How many PCs were used for reduction? Which versions of Seurat/inferCNV/other packages were used? Why monocle2 was used and not monocle3 or other packages? Also, the authors use R version 3.6.1, and the current version is 4.3.2.

      Thank you for your detailed review and valuable suggestions. Below are our responses to the points you raised:

      Principal Components (PCs) Used for Reduction: We used the first 20 principal components (PCs) for dimensionality reduction. This choice was based on preliminary tests showing that 20 PCs captured the major variation in our data effectively.

      Versions of Packages: The versions of the packages used are as follows:

      Seurat: Version 4.0.1

      inferCNV: Version 1.18.1

      monocle2: Version 2.14.0

      Choice of monocle2 over monocle3 or Other Packages: We chose monocle2 because it performed better on our specific dataset, and its algorithms suited our research needs. Additionally, we are more familiar with the functionalities and outputs of monocle2, which allowed us to better interpret and apply the results.

      R Version: We used R version 3.6.1 at the beginning of our study to ensure consistency and reproducibility throughout the analysis. Although the current version of R is 4.3.2, we maintained the same version throughout our research. We will consider upgrading to the latest version of R and re-testing for compatibility and performance in future studies.

      We appreciate your attention to these details and will include this information in the revised manuscript.

      • It seems that the authors may lack a fundamental understanding of scRNA-seq data processing and the functions of Seurat. For instance, they state, 'Next, we classified cell types through dimensional reduction and unsupervised clustering via the Seurat package.' However, dimensional reduction and unsupervised clustering are not methods for cell classification. Typically, cell types are classified using marker genes or other established methods.

      Thank you for your insightful comments. We appreciate your guidance on the proper understanding and application of scRNA-seq data processing and the functions of Seurat.

      You are correct in noting that dimensional reduction and unsupervised clustering are not methods for cell classification. We apologize for the confusion in our original statement. What we intended to convey was that we performed dimensional reduction and unsupervised clustering using the Seurat package as preliminary steps in our analysis. Following these steps, we classified cell types based on established marker genes.
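      As an illustration only, the marker-based step described above can be sketched as follows. This is not the authors' Seurat code: the function name `annotate_clusters`, the toy expression matrix, and the marker lists are all assumptions made for this sketch. Each cluster from unsupervised clustering is labeled with the cell type whose marker genes have the highest mean expression in that cluster.

```python
import numpy as np

def annotate_clusters(expr, genes, clusters, markers):
    """Label each cluster with the cell type whose markers score highest.

    expr     : (n_cells, n_genes) normalized expression values
    genes    : gene names matching the columns of expr
    clusters : (n_cells,) integer cluster id per cell
    markers  : dict mapping cell type -> list of marker genes (hypothetical lists)
    """
    expr = np.asarray(expr, dtype=float)
    clusters = np.asarray(clusters)
    col = {g: i for i, g in enumerate(genes)}
    labels = {}
    for cid in np.unique(clusters):
        # Mean expression of every gene across the cells of this cluster
        mean_expr = expr[clusters == cid].mean(axis=0)
        # Score each cell type by the mean of its markers present in the data
        scores = {
            cell_type: float(np.mean([mean_expr[col[g]] for g in gs if g in col]))
            for cell_type, gs in markers.items()
            if any(g in col for g in gs)
        }
        labels[int(cid)] = max(scores, key=scores.get)
    return labels

# Toy data: 6 cells, 4 genes, 3 clusters (illustrative values only)
genes = ["CD3E", "CD8A", "CD68", "EPCAM"]
expr = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [4.0, 3.0, 0.1, 0.0],
    [0.0, 0.0, 5.0, 0.2],
    [0.1, 0.0, 4.0, 0.0],
    [0.0, 0.0, 0.3, 6.0],
    [0.0, 0.2, 0.0, 5.0],
])
clusters = [0, 0, 1, 1, 2, 2]
markers = {
    "T cell": ["CD3E", "CD8A"],
    "Macrophage": ["CD68"],
    "Epithelial": ["EPCAM"],
}
cluster_labels = annotate_clusters(expr, genes, clusters, markers)
print(cluster_labels)  # {0: 'T cell', 1: 'Macrophage', 2: 'Epithelial'}
```

      The point of the sketch is the separation of concerns raised by the reviewer: clustering only groups cells, while the cell-type label comes from a separate marker-based step.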

      "Therefore, to identify subclusters within each of these nine major cell types, we performed principal component analysis" (Line 127). Principal component analysis is a method for dimensionality reduction, not cell clustering.

      The authors did not mention the normalization or scaling of the data, which are crucial steps in scRNA-seq data preprocessing.

      Thank you for your insightful comments. We apologize for any confusion caused by our description in the manuscript. You are correct that principal component analysis (PCA) is primarily a method for dimensionality reduction rather than cell clustering. To clarify, we used PCA to reduce the dimensionality of our single-cell RNA-seq (scRNA-seq) data, which is a preliminary step before clustering the cells.

      In the revised manuscript, we have provided a more detailed description of our data preprocessing pipeline, including the normalization and scaling steps that are indeed crucial for scRNA-seq data analysis. Specifically, we performed the following steps:

      Normalization: We normalized the gene expression data to account for differences in sequencing depth and other technical variations.

      Scaling: We scaled the normalized data to ensure that each gene contributes equally to the PCA, which mitigates the effect of highly variable genes dominating the analysis.

      Following these preprocessing steps, we conducted PCA to reduce the dimensionality of the data, which facilitated the subsequent clustering of cells into subclusters.
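      The order of these steps can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' Seurat pipeline: the function names, the scale factor of 10,000, and the random toy count matrix are all chosen for the example, and 20 components mirrors the number of PCs mentioned earlier in this response.

```python
import numpy as np

def log_normalize(counts, scale_factor=10_000):
    """Divide each cell by its total counts, rescale, and log1p-transform.

    counts: (n_cells, n_genes) array of raw counts.
    """
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=1, keepdims=True)
    depth[depth == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / depth * scale_factor)

def scale_genes(x):
    """Center each gene to mean 0 and scale it to unit variance."""
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant genes stay at zero after centering
    return (x - mean) / std

def pca_scores(x, n_components=20):
    """Project cells onto the top principal components via SVD."""
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    k = min(n_components, s.size)
    return u[:, :k] * s[:k]

# Toy count matrix: 200 cells x 50 genes (random, for illustration only)
rng = np.random.default_rng(0)
toy_counts = rng.poisson(lam=2.0, size=(200, 50))

embedding = pca_scores(scale_genes(log_normalize(toy_counts)), n_components=20)
print(embedding.shape)  # (200, 20)
```

      The reduced matrix (cells by components) is what a clustering algorithm would then take as input; PCA itself assigns no cluster labels.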

      We hope this addresses your concerns, and we appreciate your valuable feedback that helped us improve the clarity and accuracy of our manuscript.

      • Numerous style and grammar mistakes are present in the main text. For instance, certain sections of the methods are written in the present tense, suggesting that parts of a protocol were copied without text editing. Furthermore, some sections of the introduction are written in the past tense when the present tense would be more suitable. Clusters are inconsistently referred to by numbers or cell types, leading to confusion. Additionally, the authors frequently use the term "evolution" when describing trajectory analysis, which may not be appropriate. Overall, significant revisions to the main text are required.

      Thank you for your detailed review and valuable feedback on our manuscript. We highly appreciate your suggestions and have made the following revisions to address the issues you pointed out:

      Tense Consistency: We have thoroughly reviewed and corrected the use of tenses throughout the manuscript. The Methods section now consistently uses the past tense, while the Introduction section uses the present tense where appropriate, ensuring coherence and consistency.

      Cluster Naming Consistency: We have standardized the naming conventions for clusters, consistently using either numbers or cell types to avoid any confusion.

      Appropriate Terminology: We have reviewed our use of the term "evolution" in the context of trajectory analysis. Where necessary, we have replaced it with more accurate terms such as "trajectory progression" or "developmental pathway" to better convey the intended meaning.

      • Some figures are not mentioned in order or are not referenced in the text at all, such as Figure 5l (where it is also unclear how the authors selected the root cells). Additionally, many figures have text that is too small to be read without zooming in. Overall, the quality of the figures is inconsistent and sometimes very poor.

      Thank you for your detailed review and valuable feedback on our manuscript. We have addressed the issues you raised as follows:

      Unreferenced Figures in the Text:

      We acknowledge the oversight regarding Figure 5l not being mentioned in the text. In the revised version, we will ensure that all figures are properly referenced and discussed within the relevant sections of the manuscript.

      Text Size in Figures:

      We understand the difficulty in reading small text within the figures. We will redesign all figures to ensure that text and annotations are legible at normal viewing sizes. This will involve increasing the resolution and text size in all figures to enhance readability.

      Inconsistent Quality of Figures:

      To address the inconsistency in figure quality, we will standardize the formatting of all figures and ensure they meet a high standard of clarity and presentation. This will improve the overall visual quality and professionalism of the manuscript.

      The results section lacks clarity on several points:

      • The authors state that "myofibroblasts exclusively originated from the control group". However, pathways up-regulated in myofibroblasts (such as glycolysis) were enhanced after chemotherapy, as indicated by GSVA score. Similarly, why are some clusters of TAMs from the control group associated with pathways enriched in chemotherapy group?

      Thank you for your insightful comments and questions regarding our manuscript. We appreciate the opportunity to clarify these points.

      Regarding the statement that "myofibroblasts exclusively originated from the control group," we acknowledge the confusion and would like to provide a more detailed explanation. While the initial identification indicated that myofibroblasts were predominantly found in the control group, subsequent analyses, including the Gene Set Variation Analysis (GSVA), revealed that certain pathways up-regulated in myofibroblasts, such as glycolysis, were indeed enhanced following chemotherapy. This suggests that chemotherapy may induce or enhance specific functional states in these cells that are not initially apparent from their origin alone.

      Similarly, the observation that some clusters of Tumor-Associated Macrophages (TAMs) from the control group are associated with pathways enriched in the chemotherapy group can be explained by the dynamic nature of cellular responses to treatment. TAMs, like other immune cells, can exhibit plasticity and adapt to the tumor microenvironment altered by chemotherapy. This plasticity may result in the activation of pathways typically associated with a chemotherapy response, even in cells originating from the control group.

      We will revise the manuscript to better articulate these findings and include additional data to support our explanations. This will help clarify the observed discrepancies and provide a more comprehensive understanding of the cellular dynamics in response to chemotherapy.

      • Further explanation is necessary regarding the distinctions between malignant and non-malignant cells, as well as regarding the upregulation of metabolism-related pathways in fibroblasts from the NCT group. Additionally, clarification is needed regarding why certain TAMs from the control group are associated with pathways enriched in the chemotherapy group.

      Thank you for your detailed review and for highlighting the areas that require further clarification. We appreciate the opportunity to provide additional explanations and improve our manuscript.

      We recognize the need to more clearly differentiate between malignant and non-malignant cells in our manuscript. We will include additional details on the criteria and markers used to distinguish these cell types. Specifically, we will elaborate on the molecular and phenotypic characteristics that were used to identify malignant cells, such as specific genetic mutations, aberrant signaling pathways, and distinct cell surface markers, as opposed to those used for identifying non-malignant cells.

      As mentioned above, the association of certain TAMs from the control group with pathways enriched in the chemotherapy group can be attributed to the inherent plasticity and adaptability of TAMs. We will provide a more detailed explanation of how TAMs can exhibit different functional states based on microenvironmental cues. This will include a discussion on the potential pre-existing heterogeneity within TAM populations and how even in the absence of direct chemotherapy exposure, some TAMs may display pathway activities similar to those seen in the chemotherapy group due to microenvironmental influences or intrinsic properties.

      • In the section titled 'Chemo-driven Pro-mac and Anti-mac Metabolic Reprogramming Exerted Diametrically Opposite Effects on Tumor Cells': The markers selected to characterize the anti- and pro-macrophages are commonly employed for describing M1 or M2 polarization. It is uncertain whether this new classification into anti- and pro-macrophages is necessary. Additionally, it should be noted that pro-macrophages are anti-inflammatory, while anti-macrophages are pro-inflammatory, which could lead to confusion. M2 macrophages are already recognized for their role in stimulating tumor relapse after chemotherapy.

      Thank you for your feedback. We appreciate the opportunity to clarify the rationale behind our terminology and the focus on functional phenotypic changes in macrophages before and after chemotherapy.

      Our intention in introducing the terms "pro-macrophages" and "anti-macrophages" was to highlight the distinct functional phenotypic changes in macrophages observed before and after chemotherapy. These terms were chosen to emphasize the functional roles these macrophages play in the tumor microenvironment in response to chemotherapy, rather than strictly adhering to the conventional M1/M2 polarization paradigm.

      We acknowledge that M2 macrophages are well-documented in stimulating tumor relapse after chemotherapy. Our use of "pro-macrophages" is intended to build on this established knowledge by providing a more nuanced understanding of their role in the post-chemotherapy tumor microenvironment. Similarly, "anti-macrophages" highlight the macrophages' role in mounting an anti-tumor response.

      • The authors suggest that there is "reprogramming of CD8+ cytotoxic cells" following chemotherapy (Line 409). It remains unclear whether they imply the reprogramming of other CD8+ T cells into cytotoxic cells. While it is indicated that cytotoxic cells from the control group differ from those in the NCT group and that NCT cytotoxic T cells exhibit higher cytotoxicity, the authors did not assess the expression of NK and NK-like T cell markers (aside from NKG7), which may possess greater cytotoxic potential than CD8+ cytotoxic cells. This could also elucidate why cytotoxic cells from the NCT and control groups are positioned on separate branches in trajectory analysis. Overall, with 22.5k T cells in the dataset, only 3 subtypes were identified, suggesting a need for improved cell annotations by the authors.

      Thank you for your valuable feedback regarding the classification and characterization of CD8+ cytotoxic cells following chemotherapy, and the need for improved cell annotations.

      We appreciate your point on the potential ambiguity around the "reprogramming of CD8+ cytotoxic cells" post-chemotherapy. In our study, we observed that CD8+ T cells from the control and NCT groups differ significantly in their cytotoxic profiles, with the NCT group's cytotoxic T cells displaying enhanced cytotoxicity. However, we did not imply the reprogramming of other CD8+ T cells into cytotoxic cells. Instead, our findings suggest a shift in the functional state of existing CD8+ cytotoxic cells, driven by chemotherapy, which aligns with the upregulation of genes associated with cytotoxic functions.

      We acknowledge that the expression of NK and NK-like T cell markers (apart from NKG7) was not comprehensively assessed. We agree that these markers may possess greater cytotoxic potential and could elucidate the separation observed in the trajectory analysis between cytotoxic cells from the NCT and control groups. This distinction may be attributed to differential cytotoxic potentials and functional states induced by chemotherapy.

      Furthermore, with 22,530 T cells in the dataset, only three subtypes were initially identified. We recognize the need for more refined cell annotations to capture the full spectrum of T cell diversity. This could involve a deeper analysis of additional markers to distinguish between various cytotoxic populations, including NK and NK-like T cells, and their respective roles in the tumor microenvironment post-chemotherapy.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I would recommend simplifying the manuscript and focusing on the differences between the treatment-naive and post-chemotherapy samples.

      Thank you for your valuable feedback on our manuscript. We greatly appreciate your suggestions and have carefully considered the proposed modifications.

      Upon re-evaluating our manuscript, we believe that the current structure and content most effectively convey our research findings. Our study aims to not only compare the treatment-naive and post-chemotherapy samples but also to highlight several important secondary findings that are integral to the overall research.

      Nevertheless, we understand your recommendation to simplify the manuscript. To address this, we have made some subtle adjustments to improve the readability and conciseness of the text. Additionally, we have included a section in the discussion that more explicitly highlights the differences between the treatment-naive and post-chemotherapy samples.

      IRB number for the human sample collection as well as animal experiments need to be provided.

      Thank you for your thorough review and for highlighting the need for the inclusion of the IRB number for the human sample collection and animal experiments.

      We apologize for this oversight and appreciate your attention to this important detail. The Institutional Review Board (IRB) approval number for the human sample collection is [B2019-436].

      This number has been added to the Methods section of our revised manuscript to ensure compliance with ethical standards and to provide transparency for our research.

      I put a question on the macrophage sorting experiment in the public review. Please clarify how the ARG1 staining was achieved with the preservation of cell viability.

      We apologize for the error caused by miscommunication within our research team. We are currently using both ARG1 and CD206 antibodies in our studies. Due to a communication error, the technician mistakenly assumed ARG1 was another name for CD206 (MRC1), resulting in the incorrect labeling of CD206 as ARG1 in our experimental records. In reality, we used the CD206 antibody, which is consistent with the same surface marker shown in figure 6e. We have made corrections in the manuscript and experimental figures. Thank you for pointing this out, and we regret any misunderstanding this may have caused.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      • Line 65- "Chemotherapy drugs, however, are very toxic and are prone to invalid". Line 75-77: "This heterogeneity in the TME includes the differences between tumor cells and tumor cells and the differences between various stromal cells and immune cells. Actively exploring the changes of multiple cells in the TME of LUAD after chemotherapy may finally find an excellent way to overcome chemotherapy resistance for LUAD." Please rewrite these parts.

      Thank you for your valuable comment. We have revised the manuscript according to your suggestion:

      Original (Line 65): "Chemotherapy drugs, however, are very toxic and are prone to invalid." Revised: "However, chemotherapy drugs are highly toxic and can often become ineffective."

      Original (Line 75-77): "This heterogeneity in the TME includes the differences between tumor cells and tumor cells and the differences between various stromal cells and immune cells. Actively exploring the changes of multiple cells in the TME of LUAD after chemotherapy may finally find an excellent way to overcome chemotherapy resistance for LUAD."

      Revised: "The heterogeneity within the tumor microenvironment (TME) encompasses not only the variations between different tumor cells but also among various stromal and immune cell types. Investigating the dynamic changes in multiple cell populations within the TME of LUAD following chemotherapy may provide crucial insights into overcoming chemotherapy resistance in LUAD."

      • Line 87: "The internal processes of the cells respectively drive immune cells and cancer cells to obtain glucose and glutamine preferentially."-> The internal metabolic changes in the cells drive...

      Thank you for your valuable comment. We have revised the manuscript according to your suggestion:

      Original (Line 87): "The internal processes of the cells respectively drive immune cells and cancer cells to obtain glucose and glutamine preferentially."

      Revised: "The internal metabolic changes in the cells drive immune cells and cancer cells to preferentially obtain glucose and glutamine."

      • Line 93: "an essential feature that affects the effect of chemotherapy"-> an essential feature that affects chemotherapy.

      Thank you for your valuable comment. We have revised the manuscript according to your suggestion:

      Original (Line 93): "Metabolic reprogramming in various cell types in the tumor microenvironment after undergoing chemotherapy may be an essential feature that affects the effect of chemotherapy."

      Revised: "Metabolic reprogramming in various cell types in the tumor microenvironment after undergoing chemotherapy may be an essential feature that affects chemotherapy."

      • Line 84: What do the immune cells depend on glucose for?

      Thank you for your valuable comment. We have revised the manuscript according to your suggestion:

      Original (Line 84): "However, recent studies have shown that tumor-infiltrating immune cells depend on glucose and immune cells especially macrophages consume more glucose than malignant cells."

      Revised: "However, recent studies have shown that tumor-infiltrating immune cells rely on glucose for their energy needs and functionality, with immune cells, particularly macrophages, consuming more glucose than malignant cells."

      • Line 223: "According to previous research, myofibroblast has been described"-> myofibroblasts have been described.

      Thank you for your valuable comment. We have revised the manuscript according to your suggestion:

      Original (Line 223): "According to previous research, myofibroblast has been described as a cancer-associated fibroblast that participated in extensive tissue remodeling, angiogenesis, and tumor progression."

      Revised: "According to previous research, myofibroblasts have been described as cancer-associated fibroblasts that participate in extensive tissue remodeling, angiogenesis, and tumor progression."

      • Line 239: "Considering the essential fibroblasts"-> Considering the essential role of fibroblasts.

      Thank you for your valuable comment. We have revised the manuscript according to your suggestion:

      Original (Line 239): "Considering the essential fibroblasts and their complicated function in shaping the tumor microenvironment..."

      Revised: "Considering the essential role of fibroblasts and their complicated function in shaping the tumor microenvironment..."

      • Line 251: "Further in vitro studies were required to elucidate these notable fibroblasts' potential function..." -> are required.

      Thank you for your valuable comments. We have revised the manuscript according to your suggestions:

      Original (Line 251): "Further in vitro studies were required to elucidate these notable fibroblasts' potential function..."

      Revised: "Further in vitro studies are required to elucidate these notable fibroblasts' potential function..."

      • Line 309: "Interestingly, we found that two subtypes, Anti-mac and Mix, can be converted to Pro-mac through pseudotime time analysis." -> via trajectory analysis we found that two subtypes...

      Thank you for your valuable comments. We have revised the manuscript according to your suggestions:

      Original (Line 309): "Interestingly, we found that two subtypes, Anti-mac and Mix, can be converted to Pro-mac through pseudotime time analysis."

      Revised: "Interestingly, via trajectory analysis we found that two subtypes, Anti-mac and Mix, can be converted to Pro-mac."

      • Line 458: "the interactions between malignant and macrophages"-> the interactions between malignant cells and macrophages.

      Thank you for your valuable comments. We have revised the manuscript according to your suggestions:

      Original (Line 458): "the interactions between malignant and macrophages"

      Revised: "the interactions between malignant cells and macrophages."

      • Line 486: "The 5-year survival rate is still gloomy" -> The 5-year survival rate is still low.

      Thank you for your valuable comments. We have revised the manuscript according to your suggestions:

      Original (Line 486): "The 5-year survival rate is still gloomy."

      Revised: "The 5-year survival rate is still low."

      • Line 491: "More and more efforts are devoted to targeted metabolism to overcome chemoresistance" -> More efforts are devoted to target cell metabolism...

      Thank you for your valuable comments. We have revised the manuscript according to your suggestions:

      Original (Line 491): "More and more efforts are devoted to targeted metabolism to overcome chemoresistance."

      Revised: "More efforts are devoted to targeting cell metabolism to overcome chemoresistance."

      • Line 594: "Repeat the above steps twice" -> This procedure was repeated twice.

      Thank you for your valuable comments. We have revised the manuscript according to your suggestions:

      Original (Line 594): "Repeat the above steps twice."

      Revised: "This procedure was repeated twice."

      • Line 620: How were the new potential markers verified? List the exact genes and experiments or a reference to a Figure.

      Thank you for your valuable comments. We have provided detailed information on how the new potential markers were verified, including the exact genes involved and the specific experiments conducted. A reference to the relevant Figure has also been added to the manuscript.

      • Line 637: Which immune cells were used as a background in CNV analysis? All immune cells or just T cells?

      Thank you for your valuable comments. In this study, all immune cells were used as background control cells.

      • Line 658: in a single cell

      Thank you for your valuable comments. We have revised the manuscript according to your suggestions.

      • Line 672: "a variety of environmental factors potentially affect" -> potentially affects/ may potentially affect.

      Thank you for your valuable comments. We have revised the manuscript according to your suggestions:

      Original (Line 672): "a variety of environmental factors potentially affect"

      Revised: "A variety of environmental factors may potentially affect"

      • Line 683: Which metabolites were tested?

      The metabolites tested included those related to glycolysis and oxidative phosphorylation (OXPHOS), such as glucose and various metabolites indicative of mitochondrial activity. The contents of these metabolites were analyzed to verify consistency with gene expression levels as mentioned in the analysis of metabolic pathways section.

      • Line 718: Required or acquired?

      The correct term should be "acquired" in the context of discussing drug resistance in tumor cells. The sentence likely refers to the "acquired drug resistance" of tumor cells, which is a common challenge in chemotherapy.

      • Line 726: What are the A549 cells?

      A549 cells are a human lung adenocarcinoma cell line commonly used in cancer research, particularly for studying lung cancer. In this study, A549 cells were used in animal experiments, mixed with tumor-associated macrophages (TAMs), and implanted into nude mice to study tumor formation and progression.

      • Line 631: "we set the following cut-off thresholds to reveal the marker genes of each cluster: adjusted P-value <0.01 and multiple changes >0.5." What metric is "multiple changes"? Commonly used measures are adjusted P-value and average Log2FC.

      Thank you for your valuable comment. We have revised the manuscript according to your suggestion. The term "multiple changes" was indeed a misstatement. The correct metric should be "log2 fold change (Log2FC)," which is a commonly used measure in gene expression studies. We have updated the manuscript to reflect this, using "adjusted P-value <0.01 and average Log2FC > 0.5" instead of "multiple changes > 0.5."
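
      The corrected cut-offs can be illustrated with a minimal sketch. The gene names and values below are hypothetical placeholders rather than results from the study, and the helper name select_markers is an assumption made for illustration only; it simply applies the two thresholds stated above (adjusted P-value < 0.01 and average Log2FC > 0.5).

```python
# Hypothetical differential-expression rows:
# (gene, adjusted P-value, average log2 fold change).
# Names and numbers are illustrative only, not data from the study.
rows = [
    ("GENE_A", 0.001, 1.20),
    ("GENE_B", 0.020, 2.00),   # fails the adjusted P-value cut-off
    ("GENE_C", 0.005, 0.30),   # fails the Log2FC cut-off
    ("GENE_D", 0.0001, 0.75),
]

P_ADJ_MAX = 0.01    # adjusted P-value must be below this
LOG2FC_MIN = 0.5    # average Log2FC must be above this


def select_markers(rows, p_adj_max=P_ADJ_MAX, log2fc_min=LOG2FC_MIN):
    """Return genes passing both thresholds, in input order."""
    return [
        gene
        for gene, p_adj, log2fc in rows
        if p_adj < p_adj_max and log2fc > log2fc_min
    ]


markers = select_markers(rows)
print(markers)
```

      Both conditions must hold at once, so a gene with a large fold change but a weak adjusted P-value (GENE_B above) is not retained.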

      • Figure 1f: "Samplied" -> Samples. What do the numbers on the left side of each column mean?

      Thank you for your valuable comment. The term "Samplied" was indeed a typographical error and has been corrected to "Samples". The numbers on the left side of each column likely represent cluster IDs or sample identifiers corresponding to the different patient samples or clusters analyzed in the study. We have clearly labeled these numbers in the figure to avoid any confusion.

      • Figure 2b: Please add a scale.

      Thank you for your valuable comment. We agree that adding a scale bar is crucial for accurately interpreting the size of the cells or structures shown in the figure. We have now included an appropriate scale bar during the figure preparation stage to provide this reference.

      • Figure 3d/4c: What is the matrix_27/3 metric? Is it average expression?

      Thank you for your valuable comment. The term "matrix_27/3" refers to a specific metric used in our analysis. This metric indeed represents the average expression levels of genes within a particular subset of the dataset. We will clarify this in the figure legend and the methods section to ensure that readers have a clear understanding of what the metric represents. Additionally, we will make sure that all such metrics are consistently and accurately described throughout the manuscript.

      • Figure 6e: Why CD206 staining is shown instead of ARG if ARG was chosen as the main gene for classification of Pro-macrophages?

      We apologize for the confusion regarding the use of CD206 staining in Figure 6e. This issue arose due to a miscommunication within our research team. While ARG1 was initially intended as the primary marker for Pro-macrophages, the technician mistakenly assumed ARG1 was another name for CD206 (MRC1), leading to the incorrect labeling of CD206 as ARG1 in our experimental records. In actuality, CD206 was used for the staining, which is consistent with the surface marker shown in Figure 6e. We have corrected this error in the manuscript and updated the experimental figures accordingly. We sincerely apologize for any misunderstanding this may have caused and appreciate the reviewer for bringing this to our attention.

      • Figures 6h and k: Please explain why do NCT Anti-macrophages show higher glucose and lactate uptake than the Anti-macrophages from the control group, while the size of tumors is the lowest in NCT Anti-macrophages in vivo?

      Thank you for your insightful comment. The observation that NCT Anti-macrophages exhibit higher glucose and lactate uptake while the tumor size is lowest could be attributed to the metabolic reprogramming induced by chemotherapy. It is possible that the enhanced metabolic activity in Anti-macrophages, characterized by increased glucose and lactate uptake, is linked to a more aggressive anti-tumor response in the NCT group. This heightened metabolic activity could reflect an increased energy demand necessary for sustaining enhanced immune functions, ultimately contributing to the reduction in tumor size. We will expand upon this explanation in the revised manuscript to provide a clearer interpretation of these findings.

      • The supplementary Table 1 needs a better legend/more explanation.

      Thank you for your valuable feedback. We have revised the legend for Supplementary Table 1 to provide a more detailed explanation of its contents.

      • No tSNE plot showing epithelial cells colored by patient, which may be important for observation of cell heterogeneity, especially in the epithelial cell population.

      Thank you for pointing this out. We agree that a tSNE plot showing epithelial cells colored by patient would be valuable for observing cell heterogeneity within the epithelial population.

      • Several acronyms not explained in the text (for example GSVA, NMF).

      Thank you for bringing this to our attention. We have ensured that all acronyms, including GSVA (Gene Set Variation Analysis) and NMF (Non-negative Matrix Factorization), are clearly defined in the text at their first mention.

      • Availability of data and material section: Please describe "other experimental data" in more detail.

      Thank you for your suggestion. We have expanded the "Availability of Data and Material" section to provide a more detailed description of the "other experimental data" referenced. This will include specific types of data generated, their formats, and how they can be accessed by other researchers. This clarification will enhance transparency and facilitate the reuse of our data by the research community.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study examined the associations of a healthy lifestyle with comprehensive and organ-specific biological ages defined using common blood biomarkers and body measures. Its large sample size, longitudinal design, and robust statistical analysis provide solid support for the findings, which will be of interest to epidemiologists and clinicians.

      Thank you very much for your thoughtful review of our manuscript. Your valuable comments have greatly helped us improve it. We have carefully considered all the comments and suggestions made by the reviewers and have revised the manuscript to address each point. Below, we provide detailed responses to each of the reviewers' comments. Please note that the line numbers mentioned in the following responses correspond to the line numbers in the clean version of the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study was to examine the associations of a healthy lifestyle with comprehensive and organ-specific biological ages. It emphasized the importance of lifestyle factors in biological ages, which were defined using common blood biomarkers and body measures.

      Strengths:

      The data were from a large cohort study and defined comprehensive and six-specified biological ages.

      Weaknesses:

      (1) Since only 8.5% of participants from the CMEC (China Multi-Ethnic Cohort Study) were included in the study, has any selection bias happened?

      Thank you for your valuable question. We understand the concern regarding the potential selection bias due to only 8.5% of participants being included in the study. The baseline survey of the China Multi-Ethnic Cohort Study (CMEC) employed a rigorous multi-stage stratified cluster sampling method, and the repeat survey re-evaluated approximately 10% of baseline participants through community-based cluster random sampling. Therefore, the sample of the repeat survey is representative. The second reason for the loss of sample size was the availability of biomarkers for BA calculation. We have compared the characteristics of the overall population with those of the participants included in and excluded from this study. Most characteristics were similar, but participants included in this study fared better on some health-related variables; one potential reason is that healthier individuals were more likely to complete the follow-up survey. In conclusion, we believe that the impact of selection bias is limited.

      Author response table 1.

      Baseline characteristics of participants included and not included in the study

      BA, biological age; BMI, body mass index; CVD, cardiovascular disease; HLI, healthy lifestyle indicator.

      1 Data are presented as median (25th, 75th percentile) for continuous variables and count (percentage) for categorical variables.

      2 For HLI, "healthy" corresponds to a score of 4-5.

      3 Information on each validated BA has been reported. BA acceleration is the difference between each BA and CA in the same survey.

      (2) The authors should specify the efficiency of FFQ. How can FFQ genuinely reflect the actual intake? Moreover, how was the aMED calculated?

      Thank you for the comments and questions. We appreciate the opportunity to clarify these aspects of our study. For the first question, we evaluated the FFQ's reproducibility and validity by conducting repeated FFQs and 24-hour dietary recalls at the baseline survey. Intraclass correlation coefficients (ICC) for reproducibility ranged from 0.15 for fresh vegetables to 0.67 for alcohol, while deattenuated Spearman rank correlations for validity ranged from 0.10 for soybean products to 0.66 for rice. More details are provided in our previous study (Lancet Reg Health West Pac, 2021). We have added the corresponding content in both the main text and the supplementary materials.

      Methods, Page 8, lines 145-146: “The FFQ's reproducibility and validity were evaluated by conducting repeated FFQs and 24-hour dietary recalls.”

      Supplementary methods, Dietary assessment: “We evaluated the FFQ's reproducibility and validity by conducting repeated FFQs and 24-hour dietary recalls. Intraclass correlation coefficients for reproducibility ranged from 0.15 for fresh vegetables to 0.67 for alcohol, while deattenuated Spearman rank correlations for validity ranged from 0.10 for soybean products to 0.66 for rice.”

      For the second question, we apologize for any confusion. To avoid taking up too much space in the main text, we decided not to include the detailed aMED calculation (as described in Circulation, 2009) there and instead placed it in the supplementary materials:

      “Our calculated aMED score incorporates eight components: vegetables, legumes, fruits, whole grains, fish, the ratio of monounsaturated fatty acids (MUFA) to saturated fatty acids (SFA), red and processed meats, and alcohol. Each component's consumption was divided into sex-specific quintiles. Scores ranging from 1 to 5 were assigned based on quintile rankings to each component, except for red and processed meats and alcohol, for which the scoring was inverted. The alcohol criteria for the aMED was defined as moderate consumption. Since the healthy lifestyle index (HLI) already contained a drinking component, we removed the drinking item in the aMED, which had a score range of 7-35 with a higher score reflecting better adherence to the overall Mediterranean dietary pattern. We defined individuals with aMED scores ≥ population median as healthy diets.”
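
      The scoring rule quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the component keys, the function names, and the tie handling at quintile boundaries (Python's default statistics.quantiles cut points) are all choices made for this example. The function amed_totals is meant to be applied to participants of one sex at a time, so that the quintiles are sex-specific.

```python
from bisect import bisect_left
from statistics import median, quantiles

# Seven components remain once the alcohol item is dropped (keys are hypothetical).
COMPONENTS = [
    "vegetables", "legumes", "fruits", "whole_grains",
    "fish", "mufa_sfa_ratio", "red_processed_meat",
]
INVERTED = {"red_processed_meat"}  # higher intake earns a lower score


def quintile_scores(values, inverted=False):
    """Score each value 1-5 by quintile; reverse the scale if inverted."""
    cuts = quantiles(values, n=5)  # four cut points separating five groups
    scores = [bisect_left(cuts, v) + 1 for v in values]
    return [6 - s for s in scores] if inverted else scores


def amed_totals(intakes):
    """intakes: dict mapping component -> intakes for participants of ONE sex."""
    per_component = [
        quintile_scores(intakes[c], inverted=(c in INVERTED)) for c in COMPONENTS
    ]
    # Sum the seven component scores per participant (possible range 7-35).
    return [sum(person) for person in zip(*per_component)]


def healthy_diet_flags(totals):
    """Flag totals at or above the median of the supplied totals."""
    cutoff = median(totals)
    return [t >= cutoff for t in totals]


# Toy data: ten participants of one sex with identical intake ranks everywhere.
toy = {c: list(range(1, 11)) for c in COMPONENTS}
totals = amed_totals(toy)
flags = healthy_diet_flags(totals)
print(totals, flags)
```

      In practice the totals from both sexes would be pooled before taking the median, since the quoted definition refers to the population median.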

      Reference:

      (1) Xiao X, Qin Z, Lv X, Dai Y, Ciren Z, Yangla Y, et al. Dietary patterns and cardiometabolic risks in diverse less-developed ethnic minority regions: results from the China Multi-Ethnic Cohort (CMEC) Study. Lancet Reg Health West Pac. 2021;15:100252. doi: 10.1016/j.lanwpc.2021.100252.

      (2) Fung TT, Rexrode KM, Mantzoros CS, Manson JE, Willett WC, Hu FB. Mediterranean diet and incidence of and mortality from coronary heart disease and stroke in women. Circulation. 2009;119(8):1093-100. doi: 10.1161/circulationaha.108.816736.

      (3) HLI (range) and HLI (category) should be clearly defined.

      Thank you for the comment. We have added the definition of HLI (range) and HLI (category) in the methods section:

      Methods P9 lines 165-170: “The HLI was calculated by directly adding up the five lifestyle scores, ranging from 0-5, with a higher score representing an overall healthier lifestyle, denoted as HLI (range) in the following text. We then transformed HLI into a dichotomous variable in this study, denoted as HLI (category), where a score of 4-5 for HLI was considered a healthy lifestyle, and a score of 0-3 was considered an unfavorable lifestyle that could be improved.”
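
      The two HLI variables defined above can be expressed compactly. The following is a minimal sketch that assumes each of the five lifestyle factors has already been coded as 0 or 1; the function names and the category labels are illustrative and are not taken from the manuscript.

```python
def hli_range(lifestyle_scores):
    """Sum five binary lifestyle scores into HLI (range), from 0 to 5."""
    if len(lifestyle_scores) != 5 or any(s not in (0, 1) for s in lifestyle_scores):
        raise ValueError("expected exactly five scores, each 0 or 1")
    return sum(lifestyle_scores)


def hli_category(score):
    """Dichotomize HLI: 4-5 is a healthy lifestyle, 0-3 is unfavorable."""
    if not 0 <= score <= 5:
        raise ValueError("HLI must lie between 0 and 5")
    return "healthy" if score >= 4 else "unfavorable"


# Example: a participant meeting four of the five lifestyle criteria.
example_score = hli_range([1, 1, 1, 0, 1])
example_category = hli_category(example_score)
print(example_score, example_category)
```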

      (4) The comprehensive rationale and each specific BA construction should be clearly defined and discussed. For example, can cardiopulmonary BA be reflected only by using cardiopulmonary status? I do not think so.

      Thank you for the opportunity to clarify. We constructed the comprehensive BA based on all the available biochemical data from the CMEC study, selecting aging-related markers (J Gerontol A Biol Sci Med Sci, 2021), and further constructed organ-specific BAs based on these selected biomarkers. The KDM algorithm does not specify biomarker types but requires them to be correlated with chronological age (CA) (Mech Ageing Dev, 2006). Existing studies typically construct BA based on available biomarkers. We included 15 biomarkers in this study, which could be considered comprehensive and extensive compared to previous research (J Transl Med. 2023; J Am Heart Assoc. 2024; Nat Cardiovasc Res. 2024). To select the biomarkers for each organ-specific BA, we categorized biomarkers primarily based on their relevance to the structure and function of each organ system according to the classification in previous studies (Nat Med, 2023; Cell Rep, 2022). Since the biomarkers we used came from clinical-lab data sets, they were categorized based on the clinical interpretation of blood chemistry tests following the methods outlined in the two referenced papers (Nat Med, 2023; Cell Rep, 2022). We only used biomarkers directly related to each specific system to minimize overlap between the indicators used for different BAs, thereby preserving the distinctiveness of organ-specific BAs. We acknowledge the limitations of this approach: a few biomarkers may not fully capture the complete aging process of a system, and certain indicators may be missing due to data constraints. However, the multi-organ BAs we constructed are cost-effective, easy to implement, and have been validated, making them valuable despite the limitations.

      Reference:

      (1) Verschoor CP, Belsky DW, Ma J, Cohen AA, Griffith LE, Raina P. Comparing Biological Age Estimates Using Domain-Specific Measures From the Canadian Longitudinal Study on Aging. J Gerontol A Biol Sci Med Sci. 2021;76(2):187-94. doi: 10.1093/gerona/glaa151.

      (2) Klemera P, Doubal S. A new approach to the concept and computation of biological age. Mech Ageing Dev. 2006;127(3):240-8. doi: 10.1016/j.mad.2005.10.004

      (3) Zhang R, Wu M, Zhang W, Liu X, Pu J, Wei T, et al. Association between life's essential 8 and biological ageing among US adults. J Transl Med. 2023;21(1):622. doi: 10.1186/s12967-023-04495-8.

      (4) Forrester SN, Baek J, Hou L, Roger V, Kiefe CI. A Comparison of 5 Measures of Accelerated Biological Aging and Their Association With Incident Cardiovascular Disease: The CARDIA Study. J Am Heart Assoc. 2024;13(8):e032847. doi: 10.1161/jaha.123.032847.

      (5) Jiang M, Tian S, Liu S, Wang Y, Guo X, Huang T, Lin X, Belsky DW, Baccarelli AA, Gao X. Accelerated biological aging elevates the risk of cardiometabolic multimorbidity and mortality. Nat Cardiovasc Res. 2024;3(3):332-42. doi: 10.1038/s44161-024-00438-8.

      (6) Tian YE, Cropley V, Maier AB, Lautenschlager NT, Breakspear M, Zalesky A. Heterogeneous aging across multiple organ systems and prediction of chronic disease and mortality. Nat Med. 2023;29(5):1221-31. doi: 10.1038/s41591-023-02296-6.

      (7) Nie C, Li Y, Li R, Yan Y, Zhang D, Li T, et al. Distinct biological ages of organs and systems identified from a multi-omics study. Cell Rep. 2022;38(10):110459. doi: 10.1016/j.celrep.2022.110459.

      (5) The lifestyle index is defined based on an equal-weight approach, but this does not reflect reality and cannot fully answer the research questions it raises.

      Thank you very much for your valuable suggestion. We used an equal-weight healthy lifestyle index (HLI) partly to facilitate comparisons with other studies. The equal-weight approach to constructing the HLI is commonly used in current research (BMJ, 2021; Diabetes Care. 2022; Arch Gerontol Geriatr. 2022). The equal-weight HLI can demonstrate the average benefit of adopting each additional healthy lifestyle and avoids assumptions about the relative importance of different behaviors, which may vary depending on the population. To further clarify the importance of each lifestyle factor, we conducted quantile G-computation analysis, which can reflect the weight differences between lifestyle factors (PLoS Med, 2020; Clin Epigenetics, 2022).

      Reference:

      (1) Zhang YB, Chen C, Pan XF, Guo J, Li Y, Franco OH, Liu G, Pan A. Associations of healthy lifestyle and socioeconomic status with mortality and incident cardiovascular disease: two prospective cohort studies. BMJ. 2021;373:n604. doi: 10.1136/bmj.n604.

      (2) Han H, Cao Y, Feng C, Zheng Y, Dhana K, Zhu S, Shang C, Yuan C, Zong G. Association of a Healthy Lifestyle With All-Cause and Cause-Specific Mortality Among Individuals With Type 2 Diabetes: A Prospective Study in UK Biobank. Diabetes Care. 2022;45(2):319-29. doi: 10.2337/dc21-1512.

      (3) Jin S, Li C, Cao X, Chen C, Ye Z, Liu Z. Association of lifestyle with mortality and the mediating role of aging among older adults in China. Arch Gerontol Geriatr. 2022;98:104559. doi: 10.1016/j.archger.2021.104559.

      (4) Chudasama YV, Khunti K, Gillies CL, Dhalwani NN, Davies MJ, Yates T, Zaccardi F. Healthy lifestyle and life expectancy in people with multimorbidity in the UK Biobank: A longitudinal cohort study. PLoS Med. 2020;17(9):e1003332. doi: 10.1371/journal.pmed.1003332.

      (5) Kim K, Zheng Y, Joyce BT, Jiang H, Greenland P, Jacobs DR, Jr., et al. Relative contributions of six lifestyle- and health-related exposures to epigenetic aging: the Coronary Artery Risk Development in Young Adults (CARDIA) Study. Clin Epigenetics. 2022;14(1):85. doi: 10.1186/s13148-022-01304-9.

      Reviewer #2 (Public Review):

      This interesting study focuses on the association between lifestyle factors and comprehensive and organ-specific biological aging in a multi-ethnic cohort from Southwest China. It stands out for its large sample size, longitudinal design, and robust statistical analysis.

      Some issues deserve clarification to enhance this paper:

      (1) How were the biochemical indicators for organ-specific biological ages chosen, and are these indicators appropriate? Additionally, a more detailed description of the multi-organ biological ages should be provided to help understand the distribution and characteristics of BAs.

      We thank you for raising this point. As explained in our response to the fourth question from the first reviewer, we constructed the comprehensive BA based on all the available biochemical data from the CMEC study, selecting aging-related markers (J Gerontol A Biol Sci Med Sci, 2021), and further constructed organ-specific BAs based on these selected biomarkers. The KDM algorithm does not specify biomarker types but requires them to be correlated with chronological age (CA) (Mech Ageing Dev, 2006). Existing studies typically construct BA based on available biomarkers. We included 15 biomarkers in this study, which could be considered comprehensive and extensive compared to previous research (J Transl Med. 2023; J Am Heart Assoc. 2024; Nat Cardiovasc Res. 2024). To select the biomarkers for each organ-specific BA, we categorized biomarkers primarily based on their relevance to the structure and function of each organ system according to the classification in previous studies (Nat Med, 2023; Cell Rep, 2022). Since the biomarkers we used came from clinical-lab data sets, they were categorized based on the clinical interpretation of blood chemistry tests (Nat Med, 2023). We only used biomarkers directly related to each specific system to minimize overlap between the indicators used for different BAs, thereby preserving the distinctiveness of organ-specific BAs.

      We have added a descriptive table for the comprehensive and organ systems BAs in the supplementary materials to provide a more detailed understanding of the distribution and characteristics of BAs:

      Author response table 2.

      Description of BA and BA acceleration1

      BA, biological age

      1 Data are presented as mean (standard deviation).

      (2) The authors categorized the HLI score into a dichotomous variable, which may cause a loss of information. How did the authors address this potential issue?

      Thank you for raising this concern. We categorized each lifestyle factor into a binary variable based on relevant guidelines and studies, which recommend assigning a score of 1 if the guideline or study recommendations are met (BMJ, 2021; J Am Heart Assoc, 2023). While dichotomization may lead to some loss of information, it allows for a clearer interpretation and comparison of adherence to ideal healthy lifestyle behaviors. Another advantage of this approach is that it allows for easy comparison with other studies. We categorized the HLI score into a dichotomous variable to enhance the practical relevance of the results (J Gerontol A Biol Sci Med Sci, 2021). Additionally, we conducted analyses using the continuous HLI score to ensure that our findings were robust, and the results were consistent with those obtained using the dichotomous HLI.

      Reference:

      (1) Verschoor CP, Belsky DW, Ma J, Cohen AA, Griffith LE, Raina P. Comparing Biological Age Estimates Using Domain-Specific Measures From the Canadian Longitudinal Study on Aging. J Gerontol A Biol Sci Med Sci. 2021;76(2):187-94. doi: 10.1093/gerona/glaa151.

      (2) Klemera P, Doubal S. A new approach to the concept and computation of biological age. Mech Ageing Dev. 2006;127(3):240-8. doi: 10.1016/j.mad.2005.10.004

      (3) Zhang R, Wu M, Zhang W, Liu X, Pu J, Wei T, et al. Association between life's essential 8 and biological ageing among US adults. J Transl Med. 2023;21(1):622. doi: 10.1186/s12967-023-04495-8.

      (4) Forrester SN, Baek J, Hou L, Roger V, Kiefe CI. A Comparison of 5 Measures of Accelerated Biological Aging and Their Association With Incident Cardiovascular Disease: The CARDIA Study. J Am Heart Assoc. 2024;13(8):e032847. doi: 10.1161/jaha.123.032847.

      (5) Jiang M, Tian S, Liu S, Wang Y, Guo X, Huang T, Lin X, Belsky DW, Baccarelli AA, Gao X. Accelerated biological aging elevates the risk of cardiometabolic multimorbidity and mortality. Nat Cardiovasc Res. 2024;3(3):332-42. doi: 10.1038/s44161-024-00438-8.

      (6) Tian YE, Cropley V, Maier AB, Lautenschlager NT, Breakspear M, Zalesky A. Heterogeneous aging across multiple organ systems and prediction of chronic disease and mortality. Nat Med. 2023;29(5):1221-31. doi: 10.1038/s41591-023-02296-6.

      (7) Nie C, Li Y, Li R, Yan Y, Zhang D, Li T, et al. Distinct biological ages of organs and systems identified from a multi-omics study. Cell Rep. 2022;38(10):110459. doi: 10.1016/j.celrep.2022.110459.

      (3) Because lifestyle data are self-reported, they may suffer from recall bias. This issue needs to be addressed in the limitations section.

      Thank you for your valuable suggestion. We acknowledge that the use of self-reported lifestyle data in our study may introduce recall bias, potentially affecting the accuracy of the information collected. We have added the following statement to the limitations section of our manuscript:

      Discussion, Page 22, lines 463-464: “Fifth, assessment of lifestyle factors was based on self-reported data collected through questionnaires, which may be subject to recall bias.”

      (4) It should be clarified whether the adjusted CA is the baseline value of CA. Additionally, why did the authors choose models with additional adjustments for time-invariant variables as their primary analysis? This approach does not align with standard FEM analysis (Lines 261-263).

      Thank you for the opportunity to clarify. We have changed the sentence to “baseline CA”. For the second question, in a standard fixed effects model (FEM), only time-varying variables are typically included. However, to enhance the flexibility of our models and account for potential variations in the association of time-invariant variables with CA, as has been commonly done in previous studies, we additionally adjusted for time-invariant variables and the baseline value of CA (BMC Med Res Methodol, 2024; Am J Clin Nutr, 2020). Moreover, sensitivity analyses using the standard FEM were conducted in this study, and robust results were obtained.

      Reference:

      (1) Tang D, Hu Y, Zhang N, Xiao X, Zhao X. Change analysis for intermediate disease markers in nutritional epidemiology: a causal inference perspective. BMC Med Res Methodol. 2024;24(1):49. doi: 10.1186/s12874-024-02167-9.

      (2) Trichia E, Luben R, Khaw KT, Wareham NJ, Imamura F, Forouhi NG. The associations of longitudinal changes in consumption of total and types of dairy products and markers of metabolic risk and adiposity: findings from the European Investigation into Cancer and Nutrition (EPIC)-Norfolk study, United Kingdom. Am J Clin Nutr. 2020;111(5):1018-26. doi: 10.1093/ajcn/nqz335.
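
      The statement that a standard FEM includes only time-varying variables can be illustrated with a small sketch. It assumes the usual within-person (demeaning) transformation and uses invented values for one hypothetical participant; it is not the model fitted in the manuscript, which additionally adjusts for time-invariant variables and baseline CA.

```python
from statistics import mean


def within_transform(values):
    """Subtract the person-specific mean from each wave's value."""
    m = mean(values)
    return [v - m for v in values]


# Hypothetical participant observed at two waves (values are invented).
sex = [1.0, 1.0]        # time-invariant covariate
exercise = [0.0, 1.0]   # time-varying covariate

sex_demeaned = within_transform(sex)            # every entry becomes 0
exercise_demeaned = within_transform(exercise)  # within-person variation remains
print(sex_demeaned, exercise_demeaned)
```

      Because the demeaned time-invariant covariate is zero at every wave, its coefficient cannot be estimated in the standard FEM; it only enters a model that departs from the pure within-person specification.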

      (5) How is the relative contribution calculated in the QGC analysis? The relative contribution of some lifestyle factors is not shown in Figure 2 and the supplementary figures, such as Supplementary Figure 7. These omissions should be explained.

      Thanks for the questions. The QGC obtains causal relationships and estimates weights for each component, which has been widely used in epidemiological research. More details about QGC can be found in the supplementary methods. The reason some results are not displayed is that we assumed all healthy lifestyle changes would have a protective effect on BA acceleration. However, the effect size of some lifestyle factors did not align with this assumption and lacked statistical significance. Because positive and negative weights were calculated separately in QGC, with all positive weights summing to 1 and all negative weights summing to 1, these factors would have had large positive weights. To avoid potential misunderstandings, we chose not to include these results in the figures. We have added explanations to the figure legends where applicable:

      “The blue bars represent results that are statistically significant in the FEM analysis, while the gray bars represent results in the FEM analysis that were not found to be statistically significant and positive weights were not shown.”
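
      The weight calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: the function name and the coefficient values are hypothetical, and the only rule taken from the response is that positive and negative weights are normalised separately so that each group sums to 1.

```python
def qgc_weights(coefs):
    """Split coefficients by sign and normalise each group to sum to 1."""
    pos = {k: v for k, v in coefs.items() if v > 0}
    neg = {k: -v for k, v in coefs.items() if v < 0}
    pos_total = sum(pos.values())
    neg_total = sum(neg.values())
    pos_w = {k: v / pos_total for k, v in pos.items()}
    neg_w = {k: v / neg_total for k, v in neg.items()}
    return pos_w, neg_w


# Hypothetical coefficients for the change in BA acceleration (invented values).
coefs = {"smoking": -0.30, "diet": -0.15, "exercise": -0.05, "alcohol": 0.02}
pos_w, neg_w = qgc_weights(coefs)
print(pos_w)  # the single small positive coefficient receives the whole positive weight
print(neg_w)
```

      In this invented example, the one factor with a small positive coefficient receives a positive weight of 1, which shows how a non-significant estimate in the unexpected direction can appear to carry a large weight.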

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      To enhance this paper, some issues deserve clarification:

      (1) How were the biochemical indicators for organ-specific biological ages chosen, and are these indicators appropriate? Additionally, please provide a more detailed description of the multi-organ biological ages to help understand the distribution and characteristics of the BAs.

      (2) The authors categorized the HLI score into a dichotomous variable, which may cause a loss of information. How did the authors address this potential issue?

      (3) Because lifestyle data are self-reported, they may suffer from recall bias. This issue needs to be addressed in the limitations section.

      (4) Lines 261-263: Please clarify if the adjusted CA is the baseline value of CA. Additionally, why did you choose models with additional adjustments for time-invariant variables as your primary analysis? This approach does not align with standard FEM analysis.

      (5) How is the relative contribution calculated in the QGC analysis? The relative contribution of some lifestyle factors is not shown in Figure 2 and the supplementary figures, such as Supplementary Figure 7. Please explain these omissions.

      The above five issues overlap with those raised by Reviewer #2 (Public Review). Please refer to the responses provided earlier.

      Minor revision:

      Line 50: The expression "which factors" should be changed to "which lifestyle factor."

      Thank you for the suggestion. As suggested, we have used “which lifestyle factor” instead.

      Lines 91-92: "Aging exhibits variations across and with individuals" appears to be a clerical error. According to the context, it should be "Aging exhibits variations across and within individuals."

      We thank the reviewer for the correction. We have updated the text to read:

      “Aging exhibits variations across and within individuals.”

      Line 154: The authors mentioned "Considering previous studies" but lacked references. Please add the appropriate citations.

      Thank you for pointing this out. We apologize for the oversight. We have now added the appropriate citations to support the statement "Considering previous studies" in the revised manuscript.

      Lines 170-171: "regular exercise ("12 times/week", "3-5 times/week," or "daily or almost every day")"; the first item in parentheses should be "1-2 times/week"? Please verify and correct if necessary. Additionally, check the entire text carefully to avoid confusion caused by clerical errors.

      Thank you for your careful review. We have changed the sentence to "1-2 times/week." We have thoroughly checked the entire manuscript to ensure that no other clerical errors remain.

      Clarifications for Table 1:

      i. The expression "HLI=0" is difficult to understand. Please provide a more straightforward explanation or rephrase it.

      Thank you for your feedback. We have removed the confusing expression and provided a clearer explanation in the table legend for better understanding:

      “For HLI (category), "healthy" corresponds to a score of 4-5, while "unfavorable" corresponds to a score of 0-3.”

      ii. The baseline age is presented as an integer, but the follow-up age is not. Please clarify this discrepancy.

      Thank you for pointing out this discrepancy. We calculated the precise chronological age from participants' survey dates and birth dates for the biological age calculations. Initially, the table presented age as integers, but we have now updated it to show the precise ages.
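
      As a minimal sketch of the HLI categorization given in the Table 1 legend above, the mapping from score to category could be written as follows. The function name and the assumption that the score is an integer from 0 to 5 are illustrative; only the 4-5 and 0-3 cut-offs come from the legend.

```python
def hli_category(score):
    """Map an HLI score (assumed to be an integer from 0 to 5) to its category."""
    if not 0 <= score <= 5:
        raise ValueError("HLI score is assumed to lie between 0 and 5")
    return "healthy" if score >= 4 else "unfavorable"


for s in range(6):
    print(s, hli_category(s))
```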

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1:

      (1) Given that this is one of the first studies to report the mapping of longitudinal intactness of proviral genomes in the globally dominant subtype C, the manuscript would benefit from placing these findings in the context of what has been reported in other populations, for example, how decay rates of intact and defective genomes compare with that of other subtypes where known.  

      Most published studies are of men living with HIV-1 subtype B and do not cover the hyperacute infection phase, so a direct head-to-head comparison with the FRESH study is difficult. However, we can contrast our study with a few examples from acute infection studies, as follows.

      a. Peluso et al., JCI, 2020, showed that in Caucasian men (SCOPE study) with subtype B infection who initiated ART during chronic infection, intact viral genomes decayed at a rate of 15.7% per year, while defective genomes decayed at a rate of 4% per year. In our study, we showed that in chronic treated participants, genomes decreased at a rate of 25% (intact) and 3% (defective) per month for the first 6 months of treatment.

      b. White et al., PNAS, 2021, demonstrated that in a cohort of African, white and mixed-race American men treated during acute infection, the rate of decay of intact viral genomes in the first phase of decay was <0.3 log copies in the first 2-3 weeks following ART initiation. In the FRESH cohort, our data from acute treated participants show a comparable decay rate of 0.31 log copies per month for intact viral genomes.

      c. A study in Thailand (Leyre et al., 2020, Science Translational Medicine) of predominantly HIV-1 CRF01-AE subtype compared HIV reservoir levels in participants starting ART at the earliest stages of acute HIV infection (in the RV254/SEARCH 010 cohort) and participants initiating ART during chronic infection (in the SEARCH 011 and RV304/SEARCH 013 cohorts). In keeping with our study, they showed that the frequency of infected cells with integrated HIV DNA remained stable in participants who initiated ART during chronic infection, while there was a sharp decay in these infected cells in all acutely treated individuals during the first 12 weeks of therapy. Rates of decay were not provided, and therefore a direct comparison with our data from the FRESH cohort is not possible.

      d. A study by Bruner et al., Nat. Med. 2016, described the composition of proviral populations in an acute treated (within 100 days) and chronic treated (>180 days), predominantly male subtype B cohort. In comparison to the FRESH chronic treated group, they showed that in chronic treated infection 98% (87% in FRESH) of viral genomes were defective, 80% (60% in FRESH) had large internal deletions and 14% (31% in FRESH) were hypermutated. In acute treated infection, 93% (48% in FRESH) were defective and 35% (7% in FRESH) were hypermutated. The differences in the frequency of hypermutation could be explained by differences in the timing of treatment initiation, specifically in the acute treated groups, where FRESH participants initiate ART at a median of 1 day after infection. It is also possible that sex- or race-based differences in immunological factors that impact the reservoir play a role.

      This study also showed that large deletions are non-random and occur at hotspots in the HIV-1 genome. The design of the subtype B IPDA assay (Bruner et al., Nature, 2019) is based on optimal discrimination between intact and deleted sequences, obtained with a 5′ amplicon in the Ψ region and a 3′ amplicon in Envelope. This suggests that Envelope is a hotspot for large deletions, while Ψ is the site of frequent small deletions and is included in larger 5′ deletions. In the FRESH cohort of HIV-1 subtype C, genome deletions were most frequently observed between Integrase and Envelope relative to Gag (p<0.0001–0.001).

      e. In 2017, Heiner et al., in Cell Rep, also described genetic characteristics of the latent HIV-1 reservoir in 3 acute treated and 3 chronic treated male study participants with subtype B HIV. Their data were similar to those of Bruner et al. above, showing proportions of intact proviruses of 6% (94% defective) in participants who initiated therapy during acute/early infection and 3% (97% defective) in those who initiated it during chronic infection. In contrast, the frequencies in FRESH were 52% intact and 48% defective in acute treated participants, and 13% intact and 87% defective in chronic infection. These differences could be attributed to the timing of treatment initiation: in the aforementioned study, early treatment ranged from 0.6-3.4 months after infection.
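
      The decay rates quoted in the examples above are expressed in different units (percent per year, percent per month, log copies per month). The sketch below shows one way to put them on a common scale. It assumes a constant proportional (single-phase exponential) decline, which none of the cited studies necessarily assumes, and the function names are illustrative. It aligns units only and does not make the cohorts or time windows comparable.

```python
import math


def percent_per_year_to_month(pct_year):
    """Monthly percent decline equivalent to a yearly percent decline."""
    return (1 - (1 - pct_year / 100) ** (1 / 12)) * 100


def log10_change_per_month(pct_month):
    """log10 change per month implied by a monthly percent decline."""
    return math.log10(1 - pct_month / 100)


def half_life_months(pct_month):
    """Half-life in months implied by a constant monthly percent decline."""
    return math.log(2) / -math.log(1 - pct_month / 100)


# Rates quoted in the response above.
print(percent_per_year_to_month(15.7))  # intact genomes, Peluso et al., per month
print(percent_per_year_to_month(4.0))   # defective genomes, Peluso et al., per month
print(log10_change_per_month(25.0))     # FRESH chronic treated, intact genomes
print(half_life_months(25.0))           # implied half-life under the stated assumption
```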

      (2) Indeed, in the abstract, the authors indicate that treatment was initiated before the peak. The use of the term 'peak' viremia in the hyperacute-treated group could perhaps be replaced with 'highest recorded viral load'. The statistical comparison of this measure in the two groups is perhaps more relevant with regards to viral burden over time or area under the curve viral load as these are previously reported as correlates of reservoir size.

      We have edited the manuscript text to describe the term peak viraemia in hyperacute treated participants more clearly (lines 443-444). We have now performed an analysis of area under the curve to compare viral burden in the two study groups and found associations with proviral DNA levels after one year. This has been added to the results section (lines 162-163).
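
      The response does not state how the area under the curve was computed. One common choice is the trapezoidal rule applied to log10 viral load over time, sketched below with invented measurements; the function name, the time unit, and the data are assumptions for illustration.

```python
def auc_trapezoid(times, values):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (values[i] + values[i - 1]) / 2
    return area


# Hypothetical participant: days since ART initiation and log10 viral load.
days = [0, 7, 14, 28]
log10_vl = [5.0, 4.0, 3.0, 2.0]
print(auc_trapezoid(days, log10_vl))  # log10 copies/ml x days
```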

      Reviewer #2:

      (1) Other factors also deserve consideration and include age, and environment (e.g. other comorbidities and coinfections.)

      We agree that these factors could play a role; however, participants in this study were of similar age (18-23), and information on comorbidities and coinfections is not available.

      Reviewer #3:

      (1) The word reservoir should not be used to describe proviral DNA soon after ART initiation. It is generally agreed upon that there is still HIV DNA from actively infected cells (phase 1 & 2 decay of RNA) during the first 6-12 months of ART. Only after a full year of uninterrupted ART is it really safe to label intact proviral HIV DNA as an approximation of the reservoir. This should be amended throughout.

      We agree and where appropriate have amended the use of the word reservoir to only refer to the proviral load after full viral suppression, i.e., undetectable viral load.

      (2) All raw, individualized data should be made available for modelers and statisticians. It would be very nice to see the RNA and DNA data presented in a supplementary figure by an individual to get a better grasp of intra-host kinetics.

      We will make all relevant data available and accessible to interested parties on request. We have now added a section on data availability (lines 489-491).

      (3) The legend of Supplementary Figure 2 should list when samples were taken.

      The data in this figure represents an overall analysis of all sequences available for each participant at all time points.  This has now been explained more clearly in the figure legend.

      Recommendations for The Authors:

      Reviewer #1:

      (1) It is recommended that the introduction includes information to set the scene regarding what is currently reported on the composition of the reservoir for those not in the immediate field of study i.e., the reported percentage of defective genomes and in which settings/populations genome intactness has been mapped, as this remains an area of limited information.

      We have now included a summary of other reported findings in the field in the introduction (lines 89-92, 94-98) and discussion (lines 345-350). A more detailed overview has been provided in the response to public reviews.

      (2) It may be beneficial to state in the main text of the paper what the purpose of the Raltegravir was and that it was only administered post-suppression. Looking at Table 1, only the hyperacute treatment group received Raltegravir and this could be seen as a confounder as it is an integrase inhibitor. Therefore, this should be explained.

      Once Raltegravir became available in South Africa, all new acute infections in the study cohort had an intensified 4-drug regimen that included Raltegravir.  A more detailed explanation has now been included in the methods section (lines 435-437).

      (3) Can the authors explain why the viral measures at 6 months post-ART are not shown for chronic-treated individuals in Figure 1 or reported on in the text?

      The 6 months post-ART time point has been added to Figure 1.

      (4) Can the authors indicate in the discussion, how the breakdown of proviral composition compares to subtype B as reported in the literature, for example, are the common sites of deletion similar, or is the frequency of hypermutation similar?

      Added to discussion (lines 345-350).

      (5) Do the numbers above the bars in Figure 3 represent the number of sampled genomes? If so, this should be stated.

      Yes, the numbers above the bars represent the number of sampled genomes. This has been added to the Figure 3 legend.

      (6) In the section starting on line 141, the introduction implies a comparison with immunological features, yet what is being compared are markers of clinical disease progression rather than immune responses. This should be clarified/corrected.

      This has been corrected (line 153).

      (7) Line 170 uses the term 'immediately' following infection, however, was this not 1-3 days after?

      We have changed the word “immediately” to “1-3 days post-detection” (line 181).

      (8) Can the sampling time-points for the two groups be given for the longitudinal sequencing analysis?

      The sequencing time points for each group are depicted in Figure 2.

      (9) Line 183 indicates that intact genomes contributed 65% of the total sequence pool, yet it's given as 35% in the paragraph above. Should this be defective genomes?

      Yes, this was a typographical error.  Now corrected to read “defective genomes” (line 193).

      (10) The section on decay kinetics of intact and defective genomes seems to overlap with the section above and would flow better if merged.

      Well noted; however, we have chosen to keep these sections separate.

      (11) Some references in the text are given in writing instead of numbering.

      This has been corrected.

      (12) In the clonal expansion results section, can it be indicated between which two time-points expansion was measured?

      This analysis was performed with all sequences available for each participant at all time points.  We have added this explanation to the respective Figure legend.

      Reviewer #2:

      (1) The statement on line 384 "Our data showed that early ART...preserves innate immune factors" - what innate immune factors are being referred to?

      We have removed this statement.

      (2) HLA genotyping methods are not included in the Methods section

      Now included and referenced (lines 481-483).

      (3) Are CD4:CD8 ratios available for the cohorts? This could be another informative clinical parameter to analyse in relation to HIV-1 proviral load after 1 year of ART – as done for the other variables (peak VL, and the CD4 measures).

      Yes, CD4:CD8 ratios are available. We performed the recommended analysis but found no associations with HIV-1 proviral load after 1 year of ART. We have added this to the results section (lines 163-164).

      (4) Reference formatting: Paragraph starting at line 247 (Contribution of clonal expansion...) - the two references in this paragraph are not cited according to the numbering system as for the rest of the manuscript. The Lui et al, 2020 reference is missing from the reference list - so will change all the numbering throughout.

      This has been corrected.

      Reviewer #3:

      (1) To allow comparison to past work. I suggest changing decay using % to half-life. I would also mention the multiple studies looking at total and intact HIV DNA decay rates in the intro.

      We do not have enough data points to get a good estimate of the half-life and therefore report decay as percentage per month for the first 6 months.

      (2) Line 73: variability is the wrong word as inter-individual variability is remarkably low. I think the authors mean "difference" between intact and total.

      We have changed the word variability to difference as suggested.

      (3) Line 297: I am personally not convinced that there is data that definitively shows total HIV DNA impacting the pathophysiology of infection. All of this work is deeply confounded by the impact of past viremia. The authors should talk about this in more detail or eliminate this sentence.

      We have reworded the statement to read “Total HIV-1 DNA is an important biomarker of clinical outcomes.” (Lines 308-309).

      (4) Line 317: There is no target cell limitation for reservoir cells. The vast majority of CD4+ T cells during suppressive ART are uninfected. The mechanism limiting the number of reservoir cells is necessarily not target cell limitation.

      We agree. The statement this refers to has been reworded as follows: “Considering, that the majority of CD4 T cells remain uninfected it is likely that this does not represent a higher number of target cells, and this warrants further investigation.” (lines 325-326).

      (5) Line 322: Some people in the field bristle at the concept of total HIV DNA being part of the reservoir as defective viruses do not contribute to viremia. Please consider rephrasing. 

      We acknowledge that there are differing opinions regarding total HIV DNA being part of the reservoir, as defective viruses do not contribute to viremia; however, defective HIV proviruses may contribute to persistent immune dysfunction and T cell exhaustion, which are associated with comorbidities and adverse clinical outcomes in people living with HIV. We have explained in the text that total HIV-DNA does not distinguish between replication-competent and -defective viruses that contribute to the viral reservoir.

      (6) Line 339: The under-sampling statement is an understatement. The degree of under-sampling is massive and biases estimates of clonality and sensitivity for intact HIV. Please see and consider citing work by Dan Reeves on this subject.

      We agree and have cited work by Dan Reeves (line 358).

      (7) Line 351: This is not a head-to-head comparison of biphasic decay as the Siliciano group's work (and others) does not start to consider HIV decay until one year after ART. I think it is important to not consider what happens during the first year of ART to be reservoir decay necessarily.

      Well noted.

      (8) Line 366-371: This section is underwritten. In nearly all PWH studies to date, observed reservoirs are highly clonal.

      We agree that observed reservoirs are highly clonal but have not added anything further to this section.

      (9) It would be nice to have some background in the intro & discussion about whether there is any a priori reason that clade C reservoirs, or reservoirs in South African women, might differ (or not) from clade B reservoirs observed in different study participants.

      We have now added this to the introduction (lines 94-103).

      (10) Line 248: This sentence is likely not accurate. It is probable that most of the reservoir is sustained by the proliferation of infected CD4+ T cells. 50% is a low estimate due to under-sampling leading to false singleton samples. Moreover, singletons can also be part of former clones that have contracted, which is a natural outcome for CD4+ T cells responding to antigens &/or exhibiting homeostasis. The data as reported is fine but more complex ecologic methods are needed to truly probe the clonal structure of the reservoir given severe under sampling.

      Well noted.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their time and thoughtful comments on our manuscript. 

      We realised that a preliminary version of Figure 2 was initially submitted, which we are now replacing with a new version. The differences between the two figures are: 1) the schematic in Figure 2a was replaced with a new one in line with that of Figure 3a; 2) in Figure 2c, details about the statistical analysis were removed from the legend, and one datapoint that was erroneously removed at day 5 for the ΔMYR1-Luc condition was included. These changes do not affect the results or the conclusions initially drawn.

      Public Reviews:

      Reviewer #1 (Public review): 

      Previous studies have highlighted some of these paracrine activities of Toxoplasma - and Rasogi et al (mBio, 2020) used a single cell sequencing approach of cells infected in vitro with the WT or MYR KO parasites - and one of their conclusions was that MYR-1 dependent paracrine activities counteract ROP-dependent processes.

      Similarly, Chen et al (JEM 2020) highlighted that a particular rhoptry protein (ROP16) could be injected into uninfected macrophages and move them to an anti-inflammatory state that might benefit the parasite. 

      We are aware of both these studies, where the injection of rhoptry proteins into cells that the parasite does not invade alters the host transcriptional profile establishing a permissive environment. However, here we propose a different paracrine effect that goes beyond the injected/uninfected cell. Specifically, we propose that one or more MYR1-dependent effectors alter the cytokine secretion profile of infected cells, which leads to overall changes in the immune response such as cell types recruited to the site of infection, or the activation state. 

      There are caveats around immunity and as yet no insight into how this works. In Figure 2 there is a marked defect in the ability of the parasites to expand at day 2 and day 5. Together, these data sets suggest that this paracrine effect mediated by MYR-1 works early - well before the development of adaptive responses. 

      Yes, we also hypothesise an early effect based on the data. Growth continues until day 5 at least, and then plateaus towards day 7, which makes us believe that the effect takes place within the first 5 days. We agree with the reviewer that the MYR1-mediated rescue acts before the involvement of the adaptive immune response, which is supported by our results obtained in Rag2-/- mice shown in Figure 3e. 

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript by Torelli et al., the authors propose that the major function of MYR1 and MYR1-dependent secreted proteins is to contribute to parasite survival in a paracrine manner rather than to protect parasites from cell-autonomous immune response. The authors conclude that these paracrine effects rescue ∆MYR1 or knockouts of MYR1-dependent effectors within pooled in vivo CRISPR screens. 

      Strengths: 

      The authors raised a more general concern that pooled CRISPR screens (not only in Toxoplasma but also other microbes or cancers) would miss important genes by "paracrine masking effect". Although there is no doubt that pooled CRISPR screens (especially in vivo CRISPR screens) are powerful techniques, I think this topic could be of interest to those fields and researchers. 

      Weaknesses: 

      In this version, the reviewer is not entirely convinced of the 'paracrine masking effect' because the in vivo experiments should include appropriate controls (see major point 2). 

      (1) It is convincing that co-infection of WT and ∆MYR1 parasites could rescue the growth of ∆MYR1 in mice shown by in vivo luciferase imaging. Also, this is consistent with ∆MYR1 parasites showing no in vivo fitness defect in the in vivo CRISPR screens conducted by several groups. Meanwhile, it has been reported previously and shown in this manuscript that ∆MYR1 parasites have an in vitro growth defect; however, ∆MYR1 parasites show no in vitro fitness defect in the in vitro pooled CRISPR screen. The authors show that the competition defect of ∆MYR1 parasites cannot be rescued by co-infection with WT parasites in Figure 1c, which might indicate that no paracrine rescue occurred in an in vitro environment. The authors seem not to mention these discrepancies between in vitro CRISPR screens and in vitro competition assays. Why do ∆MYR1 parasites possess neutral in vitro fitness scores in in vitro CRISPR screens? Could the authors describe a reasonable hypothesis? 

      The reviewer raises a very interesting point, which, at this stage, we cannot fully explain. One technical explanation could be that the relatively small growth defect detected for clean KOs is not well represented in the CRISPR screens because of variability between guides: smaller differences in growth are not reliably captured and are hidden within the noise of the assays. Another technical explanation may be median-centering: if the majority of KOs in the pool have a small growth defect, median-centering would push these towards zero. We observed and reported this phenomenon in Young et al., 2019 for libraries containing a larger fraction of genes with a negative fitness score. In the library used here, which focuses on secreted proteins, we have not observed a strong trend towards negative fitness scores, but we cannot exclude smaller shifts. Because we have no solid basis on which to favour any of the above-mentioned explanations, we have decided not to speculate too much on this in the manuscript. However, we wanted to show all the data, as the difference between these results may not be technical but biological, which could inform future studies or results by us and others.
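
      The median-centering explanation can be illustrated with a toy example. The gene names and fitness scores below are invented, and the function is a simplified stand-in for the normalisation step rather than the screen analysis pipeline.

```python
from statistics import median


def median_center(scores):
    """Subtract the pool median from every fitness score."""
    m = median(scores.values())
    return {gene: s - m for gene, s in scores.items()}


# Hypothetical raw fitness scores: most knockouts carry a small growth defect.
raw = {"geneA": -0.5, "geneB": -0.4, "geneC": -0.6, "geneD": 0.0, "geneE": -0.5}
centered = median_center(raw)
print(centered)
```

      In this invented pool, a knockout with a raw score of -0.5 ends up at zero after centering and would be read as neutral, while the one truly neutral knockout appears to have a fitness advantage.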

      (2) The authors developed a mixed infection assay with an inoculum containing a 20:80 ratio of ΔMYR1-Luc parasites with either WT parasites or ΔMYR1 mutants not expressing luciferase, showing that the in vivo growth defect of ∆MYR1 parasites is rescued by the presence of WT parasites. Since this experiment lacks appropriate controls, interpretation could be difficult. Is this phenomenon specific to MYR1? If a co-inoculum of ∆GRA12-Luc with either WT parasites or GRA12 parasites not expressing luciferase is included, the data could be appropriately interpreted. 

      We are not quite sure what appropriate controls the reviewer refers to. We show here in Figures 3c and 3f that increasing parasite load by co-infecting mice with ∆MYR1 parasites is not sufficient to rescue ∆MYR1-Luc parasite growth. Co-infection with WT parasites, however, does result in increased ∆MYR1-Luc parasitaemia at day 7 p.i., indicating that MYR1 competence is required for the in vivo trans-rescue we describe. As ∆GRA12 parasites have a very strong cell-autonomous restriction in vitro and severe growth defect in vivo (Torelli et al., BioRxiv), these parasites would be rapidly depleted, which is also observed in all CRISPR screens from various laboratories. Therefore we do not think that co-infection with GRA12-deficient parasites would be an informative experiment here. We do speculate that mutant parasites for other proteins required for export (i.e. MYR 2, 3, 4, ROP17) could also be trans-rescued in addition to mutants for other MYR-dependent proteins such as GRA24 and GRA28, which remodel cytokine secretion and could individually, or synergistically, affect host cell immunity. Dissecting which Toxoplasma factor/s and host cytokine signalling pathways drive this trans-rescue effect is highly interesting, but beyond the scope of this manuscript. Here, we focused on the basic concept that an individual mutant can be rescued in trans in vivo, which we think is of importance beyond the field of Toxoplasma research. 

      (3) In the Discussion part, the authors argue that the rescue phenotype of mixed infection is not due to co-infection of host cells (lines 307-310). This data is important to support the authors' paracrine hypothesis and should be shown in the main figure.

      We understand the reviewer’s concern about rescue by co-infection of the same cell, but we largely exclude this hypothesis, as Toxoplasma cell-autonomous effectors, such as GRA12 and ROP18, would also be rescued if that were to happen on a larger scale. We previously performed an in vivo experiment to assess co-infection rates of peritoneal exudate cells (PECs) by imaging, using infection doses comparable to those used in the trans-rescue experiments. The total infection rate of PECs was 2.3%, so the overall number of infected cells per image was low and not suitable for publication purposes. We tried to capture more cells using FACS analysis; however, PECs are highly autofluorescent in the yellow/green channels, which prevented us from drawing adequate conclusions using our GFP and mCherry strains. Because we see no rescue of GRA12 or ROP18 in CRISPR screens, and the overall in vivo co-infection rates were very low as observed by imaging, we did not think that generating strains expressing different fluorochromes compatible with standard FACS analysis, and then performing more in vivo experiments, was the best use of resources at the time. 

      (4) In the Discussion part, the authors assume that the rescue phenotype is the result of multiple MYR1-dependent effectors. I admit that this hypothesis could be possible since a recently published paper described the concerted action of numerous MYR1-dependent or independent effectors contributing to the hypermigration of infected cells (Ten Hoeve et al., mBio, 2024). I think this paragraph would be kind of overstated since the authors did not test any of the candidate effectors. Since the authors possess ∆IST parasites, they can test whether IST is involved in the "paracrine masking effect" or not to support their claim. 

      MYR1 deletion impairs the export of multiple Toxoplasma effectors into the host cell, including GRA16, GRA24, GRA28, HCE1/TEEGR, etc., many of which can influence cytokine levels. As such, we speculate that a combination of multiple effector proteins is responsible for the trans-rescue. As stated above, which parasite effectors, host cell types and cytokines are involved in the phenotype we describe are part of ongoing and future studies. Here, we wanted to focus on the key message that, in in vivo CRISPR screens, paracrine rescue of individual mutants can occur. While we will test IST mutants, IST is probably not the top candidate as it only prevents upregulation of ISGs after exposure to IFN-γ, and probably has no role in already stimulated cells. As we still observe strong rescue past day 3, when IFN-γ levels are already elevated (Nishiyama 2020 Parasitol Int), IST probably plays no dominant role. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) Figure 1 - it's not obvious what concentration of IFN-gamma is being used in these assays (sorry if this is stated somewhere else). 

      All in vitro experiments were performed with 100 U/ml IFN-γ, as stated in the Material & Methods section; we have now also added this information to the legend of Figure 1.

      (2) Figure 3 This reviewer wonders if earlier differences are buried in the data sets. In Figure 3b it looks like there are early differences but this is lost in the collated data analysis in 3c. An early difference is quite apparent in Figure 2. 

      We agree with the reviewer that a difference is visible at days 3 and 5 in Figure 3b; however, differences between experimental groups became statistically significant only at day 7 in Figure 3c (N = 4 biological replicates). We cannot compare results between Figure 3c and Figure 2c as the latter reports 100% WT or ΔMYR1 infections and not 20:80 mixes.

      (3) The authors conclude from their in vitro studies that MYR-1 is not required for in vitro growth in IFN-g activated macrophages. Given that the WT parasites still rescue MYR KO parasites in RAG mice it does imply that this paracrine effect would impact early innate responses. Since RAG mice do have a strong ILC/NK cell response that leads to the local production of IFN-g it would seem like a reasonable candidate. Do the authors know if the MYR KO have improved growth in the absence of IFN-g in vivo? This could be done using KO mice or with IFN-g neutralization. 

      MYR1 displayed a neutral score in CRISPR screens in IFN-γ KO mice (Tachibana et al Cell Reports 2023), suggesting that lack of IFN-γ does not specifically improve MYR1 mutant growth compared to other mutants in a pool. We believe that the rescue is rather driven by other cytokines that have been shown to be altered in a MYR1-dependent manner (i.e., CCL2, IL-6, IL-12). But as laid out before, this is the subject of future studies.

      This is a submission that might benefit from a graphical model of how the authors view this system working. 

      We agree with the reviewer and we added a graphical model to the manuscript. 

      Reviewer #2 (Recommendations for the authors): 

      The authors previously published a study that combines CRISPR screens in Toxoplasma and host transcriptome by scRNA-seq (Butterworth et al., Cell Host Microbe 2023). I think the authors possess transcriptome of ∆MYR1-infected HFFs. Although I understand this screen is conducted in in-vitro culture and human fibroblasts, are there any differentially expressed genes or pathways that could explain the paracrine rescue phenomenon described in this manuscript?

      We thank the reviewer for this insightful comment, which is however hard to address.  Thousands of host cell genes within multiple pathways are affected by MYR1 deletion (Naor et al. mBio 2018; Butterworth et al. Cell Host Microbe 2023). Therefore the PerturbSeq dataset is not helpful to pinpoint specific immune mechanisms of rescue, and is speculative without any experimentation to back it up. However, we added a sentence in line 350 of the discussion to highlight known MYR1-related effects on immune-related pathways. “Individual MYR-related effectors that may be responsible for the paracrine rescue have not been investigated here and we hypothesise that the phenotype is likely the concerted result of multiple effectors that affect cytokine secretion. For example, previous studies showed that both GRA18 and GRA28 can induce release of CCL22 from infected cells (He 2018 eLife; Rudzki 2021 mBio), while GRA16 and HCE1/TEEGR impair NF-kB signalling and the potential release of pro-inflammatory cytokines such as IL-6, IL-1β and TNF (Seo 2020 Int J Mol Sci; Braun 2019 Nat Microbiol). Regardless of the effector(s), our results highlight an important novel function of MYR1-dependent effectors by establishing a supportive environment in trans for Toxoplasma growth within the peritoneum.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths and weaknesses:

      Although the revised manuscript has significantly improved in the quality of pictures, there seems to be still a discrepancy in Figure 2A: quantification result suggested that NIC (1um) treatment increased the number of colonies from 300 to around 450 (1.5 folds), whereas representative picture shown that the difference was 3 to 12 living organoids (4 folds).

      As the reviewer points out, the selected picture was not a representative image of the “control” group in Figure 2A. We have replaced it with a new representative image in this revised version.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      A minor point to be corrected:

      Please consider removing "In consistent with this notion", which is repetitive with "Similarly".

      " NIC is supposed to activate Wnt signaling via Hippo-YAP/TAZ and Notch signaling. In consistent with this notion. Similarly, the expression of target proteins (Sox9, TCF4 and, C-myc)..."

      We corrected it according to the reviewer’s suggestion.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Tiedje et al. investigated the transient impact of indoor residual spraying (IRS) followed by seasonal malaria chemoprevention (SMC) on the Plasmodium falciparum parasite population in a high transmission setting. The parasite population was characterized by sequencing the highly variable DBLα tag as a proxy for var genes, a method known as varcoding. Varcoding presents a unique opportunity due to the extraordinary diversity observed as well as the extremely low overlap of repertoires between parasite strains. The authors also present a new Bayesian approach to estimating individual multiplicity of infection (MOI) from the measured DBLα repertoire, addressing some of the potential shortcomings of the approach that have been previously discussed. The authors also present a new epidemiological endpoint, the so-called "census population size", to evaluate the impact of interventions. This study provides a nice example of how varcoding technology can be leveraged, as well as the importance of using diverse genetic markers for characterizing populations, especially in the context of high transmission. The data are robust and clearly show the transient impact of IRS in a high transmission setting, however, some aspects of the analysis are confusing.

      (1) Approaching MOI estimation with a Bayesian framework is a well-received addition to the varcoding methodology that helps to address the uncertainty associated with not knowing the true repertoire size. It's unfortunate that while the authors clearly explored the ability to estimate the population MOI distribution, they opted to use only MAP estimates. Embracing the Bayesian methodology fully would have been interesting, as the posterior distribution of population MOI could have been better explored. 

      We thank the reviewer for appreciating the extension of varcoding we present here. We believe the comment on maximum a posteriori (MAP) refers to the way we obtained population-level MOI from the individual MOI estimates. We would like to note that reliance on MAP was only one of two approaches we described, although we then presented only MAP.  Having calculated both, we did not observe major differences between the two, for this data set.  Nonetheless, we revised the manuscript to include the result based on the mixture distribution, which considers all the individual MOI distributions, in Figure supplement 6.
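
      The two ways of obtaining a population-level MOI distribution mentioned above can be illustrated with a minimal Python sketch. It assumes each host's posterior is available as a dictionary mapping MOI values to probabilities; the function names and the three toy posteriors are hypothetical and are not taken from the manuscript or its code.

```python
from collections import Counter


def pooled_map_distribution(posteriors):
    """Population MOI distribution built from each host's MAP (most probable) MOI."""
    map_estimates = [max(p, key=p.get) for p in posteriors]
    counts = Counter(map_estimates)
    n = len(posteriors)
    return {moi: counts[moi] / n for moi in sorted(counts)}


def mixture_distribution(posteriors):
    """Population MOI distribution as the equal-weight average of all host posteriors."""
    n = len(posteriors)
    total = Counter()
    for p in posteriors:
        for moi, prob in p.items():
            total[moi] += prob / n
    return {moi: total[moi] for moi in sorted(total)}


# Hypothetical posteriors P(MOI = k | data) for three hosts (illustrative values only).
posteriors = [
    {1: 0.7, 2: 0.3},
    {1: 0.4, 2: 0.6},
    {2: 0.2, 3: 0.8},
]

map_dist = pooled_map_distribution(posteriors)
mix_dist = mixture_distribution(posteriors)
```

      The MAP version discards each host's uncertainty, whereas the mixture keeps it; with sharply peaked individual posteriors the two should be close, which would be consistent with the authors' remark that they saw no major difference for this data set.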

      (2) The "census population size" endpoint has unclear utility. It is defined as the sum of MOI across measured samples, making it sensitive to the total number of samples collected and genotyped. This means that the values are not comparable outside of this study, and are only roughly comparable between strata in the context of prevalence where we understand that approximately the same number of samples were collected. In contrast, mean MOI would be insensitive to differences in sample size, why was this not explored? It's also unclear in what way this is a "census". While the sample size is certainly large, it is nowhere near a complete enumeration of the parasite population in question, as evidenced by the extremely low level of pairwise type sharing in the observed data. 

      We consider the quantity a census in that it is a total enumeration or count of infections in a given population sample and over a given time period. In this sense, it gives us a tangible notion of the size of the parasite population, in an ecological sense, distinct from the formal effective population size used in population genetics. Given the low overlap between var repertoires of parasites (as observed in monoclonal infections), the population size we have calculated translates to a diversity of strains or repertoires.  But our focus here is in a measure of population size itself.  The distinction between population size in terms of infection counts and effective population size from population genetics has been made before for pathogens (see for example Bedford et al. for the seasonal influenza virus and for the measles virus (Bedford et al., 2011)), and it is also clear in the ecological literature for non-pathogen populations (Palstra and Fraser, 2012). 

      We completely agree with the dependence of our quantity on sample size. We used it for comparisons across time of samples of the same depth, to describe the large population size characteristic of high transmission which persists across the IRS intervention. Of course, one would like to be able to use this quantity across studies that differ in sampling depth and the reviewer makes an insightful and useful suggestion.  It is true that we can use mean MOI, and indeed there is a simple map between our population size and mean MOI (as we just need to divide or multiply by sample size, respectively) (Table supplement 7).  We can go further, as with mean MOI we can presumably extrapolate to the full sample size of the host population, or to the population size of another sample in another location. What is needed for this purpose is a stable mean MOI relative to sample size.  We can show that indeed in our study mean MOI is stable in that way, by subsampling to different depths our original sample (Figure supplement 8 in the revised manuscript). We now include in the revision discussion of this point, which allows an extrapolation of the census population size to the whole population of hosts in the local area.
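
      As a rough illustration of the relationship described above, the following Python sketch computes the census population size as the sum of individual MOI values, recovers mean MOI by dividing by sample size, and checks the stability of mean MOI by subsampling at different depths. The MOI values, function names, and the assumed count of infected hosts used for extrapolation are all hypothetical.

```python
import random


def census_population_size(moi_values):
    """Total number of infections (genomes) in the sample: the sum of individual MOI."""
    return sum(moi_values)


def mean_moi(moi_values):
    """Census population size divided by the number of sampled isolates."""
    return sum(moi_values) / len(moi_values)


def subsampled_mean_moi(moi_values, depth, n_reps=200, seed=0):
    """Mean MOI of random subsamples (without replacement) of a given depth."""
    rng = random.Random(seed)
    return [mean_moi(rng.sample(moi_values, depth)) for _ in range(n_reps)]


def extrapolated_census(moi_values, n_infected_hosts):
    """Scale mean MOI to an assumed number of infected hosts in the target population."""
    return mean_moi(moi_values) * n_infected_hosts


# Hypothetical individual MOI estimates for ten isolates.
moi_values = [1, 1, 2, 3, 1, 4, 2, 2, 1, 3]

census = census_population_size(moi_values)
average = mean_moi(moi_values)
half_depth_means = subsampled_mean_moi(moi_values, depth=5)
scaled = extrapolated_census(moi_values, n_infected_hosts=100)
```

      If the subsampled means cluster tightly around the full-sample mean across depths, mean MOI can be treated as stable with respect to sampling effort, which is the condition the response states is needed before extrapolating.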

      We have also clarified the time denominator: Given the typical duration of infection, we expect our population size to be representative of a per-generation measure.

      (3) The extraordinary diversity of DBLα presents challenges to analyzing the data. The authors explore the variability in repertoire richness and frequency over the course of the study, noting that richness rapidly declined following IRS and later rebounded, while the frequency of rare types increased, and then later declined back to baseline levels. The authors attribute this to fundamental changes in population structure. While there may have been some changes to the population, the observed differences in richness as well as frequency before and after IRS may also be compatible with simply sampling fewer cases, and thus fewer DBLα sequences. The shift back to frequency and richness that is similar to pre-IRS also coincides with a similar total number of samples collected. The authors explore this to some degree with their survival analysis, demonstrating that a substantial number of rare sequences did not persist between timepoints and that rarer sequences had a higher probability of dropping out. This might also be explained by the extreme stochasticity of the highly diverse DBLα, especially for rare sequences that are observed only once, rather than any fundamental shifts in the population structure.

      We thank the reviewer for raising this question, which led us to consider whether the change in the number of DBLα types over the course of the study (and intervention) follows simply from sampling fewer P. falciparum cases. We interpreted this question as basically meaning that one can predict the former from the latter in a simple way, and that therefore, tracking the changes in DBLα type diversity would be unnecessary.  A simple map would be for example a linear relationship (a given proportion of DBLα types lost given genomes lost), and even more trivially, a linear loss with a slope of one (same proportion).  Note, however, that for such expectations, one needs to rely on some knowledge of strain structure and gene composition. In particular, we would need to assume a complete lack of overlap and no gene repeats in a given genome. We have previously shown that immune selection leads to selection for minimum overlap and distinct genes in repertoires at high transmission (see for example He et al., 2018, for theoretical and empirical evidence of both patterns). Also, since the size of the gene pool is very large, even random repertoires would lead to limited overlap (even though the empirical overlap is even smaller than that expected at random (Day et al., 2017)). Despite these considerations, we cannot a priori assume a pattern of complete non-overlap and distinct genes, and ignore plausible complexities introduced by the gene frequency distribution.

      To examine this insightful question, we simulated the loss of a given proportion of genomes from baseline in 2012 and examined the resulting loss of DBLα types. We specifically accumulated the loss of infections in individuals until it reached a given proportion (we can do this on the basis of the estimated individual MOI values). We repeated this procedure 500 times for each proportion, as the random selection of the individual infections to be removed introduces some variation. Author response image 1 below shows that the relationship is nonlinear, and that one quantity is not a simple proportion of the other.  For example, the loss of half the genomes does not result in the loss of half the DBLα types. 

      Author response image 1.

      Non-linear relationship between the loss of DBLα types and the loss of a given proportion of genomes. The graph shows that the removal of parasite genomes from the population through intervention does not lead to the loss of the same proportion of DBLα types, as the initial removal of genomes mostly involves the loss of rare DBLα types, whereas common DBLα types persist until a high proportion of genomes are lost. The survey data (pink dots) used for this subsampling analysis were sampled at the end of the wet/high transmission season in Oct 2012 in Bongo District, northern Ghana. We used the Bayesian formulation of the varcoding method proposed in this work to calculate the multiplicity of infection of each isolate and thereby obtain the total number of genomes. The randomized surveys (black dots) were obtained using the “curveball algorithm” (Strona et al., 2014), which preserves isolate lengths and the type frequency distribution.
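
      The subsampling procedure described above can be sketched in Python as follows. This is a simplified illustration, not the authors' code: it assumes each isolate is given as a pair (estimated MOI, set of DBLα types), and it removes whole isolates in random order until the removed MOI total reaches the target proportion of genomes. The toy isolates and the function name are hypothetical.

```python
import random


def simulate_type_loss(isolates, proportion, n_reps=500, seed=0):
    """Fraction of DBLα types lost after removing ~`proportion` of genomes.

    isolates: list of (moi, set_of_types) pairs.
    Returns one fraction per replicate.
    """
    rng = random.Random(seed)
    total_genomes = sum(moi for moi, _ in isolates)
    all_types = set().union(*(types for _, types in isolates))
    target = proportion * total_genomes
    fractions_lost = []
    for _ in range(n_reps):
        order = isolates[:]
        rng.shuffle(order)  # random choice of which infections are removed
        removed, idx = 0, 0
        while idx < len(order) and removed < target:
            removed += order[idx][0]  # accumulate removed genomes via MOI
            idx += 1
        surviving = set()
        for _, types in order[idx:]:
            surviving |= types
        fractions_lost.append(1 - len(surviving) / len(all_types))
    return fractions_lost


# Hypothetical isolates: (estimated MOI, observed DBLα types).
isolates = [
    (1, {"a", "b", "c"}),
    (2, {"a", "d", "e", "f"}),
    (1, {"a", "b", "g"}),
    (3, {"a", "c", "h", "i", "j"}),
]

loss_none = simulate_type_loss(isolates, 0.0, n_reps=5)
loss_half = simulate_type_loss(isolates, 0.5, n_reps=50)
loss_all = simulate_type_loss(isolates, 1.0, n_reps=5)
```

      Plotting the mean fraction of types lost against the proportion of genomes removed, for the observed isolates, would give the kind of curve the caption describes; a type shared by many isolates (such as "a" in the toy data) survives until almost all genomes are removed.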

      We also investigated whether the resulting pattern changed significantly if we randomized the composition of the isolates.  We performed such randomization with the “curveball algorithm” (Strona et al., 2014). This algorithm randomizes the presence-absence matrix, with rows corresponding to the isolates and columns to the different DBLα types; importantly, it preserves the DBLα type frequency and the length of isolates. We generated 500 randomizations and repeated the simulated loss of genomes as above. The data presented in Author response image 1 above show that the pattern is similar to that obtained for the empirical data presented in this study in Ghana. We interpret this to mean that the number of genes is so large that the reduced overlap relative to random due to immune selection (see (Day et al., 2017)) does not play a key role in this specific pattern. 
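
      For readers unfamiliar with the curveball approach, the sketch below shows the core pair-trading step in Python, with the presence-absence matrix stored as one set of DBLα types per isolate. It is an illustrative reimplementation under stated assumptions, not the code used in the study: the number of trades (five per isolate by default), the toy isolates, and the helper names are hypothetical choices.

```python
import random
from collections import Counter


def curveball(isolates, n_trades=None, seed=0):
    """Randomize isolate composition while preserving isolate lengths and type frequencies."""
    rng = random.Random(seed)
    rows = [set(r) for r in isolates]
    if n_trades is None:
        n_trades = 5 * len(rows)  # assumed default; more trades mix more thoroughly
    for _ in range(n_trades):
        i, j = rng.sample(range(len(rows)), 2)
        shared = rows[i] & rows[j]
        only_i = rows[i] - shared
        only_j = rows[j] - shared
        # Pool the types unique to either isolate and redistribute them at random,
        # giving each isolate back exactly as many types as it contributed.
        pool = sorted(only_i | only_j)
        rng.shuffle(pool)
        k = len(only_i)
        rows[i] = shared | set(pool[:k])
        rows[j] = shared | set(pool[k:])
    return rows


def type_frequencies(isolates):
    """Number of isolates in which each type occurs."""
    return Counter(t for iso in isolates for t in iso)


# Hypothetical isolates, each a set of DBLα type labels.
observed = [
    {"a", "b", "c"},
    {"a", "d", "e", "f"},
    {"a", "b", "g"},
    {"c", "h", "i", "j", "k"},
]
randomized = curveball(observed, seed=1)
```

      Because types already shared by both isolates in a pair are never traded, no isolate ever receives a duplicate type, so both the row totals (isolate lengths) and column totals (type frequencies) remain unchanged after every trade.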

      Reviewer #2 (Public Review):  

      In this manuscript, Tiedje and colleagues longitudinally track changes in parasite numbers across four time points as a way of assessing the effect of malaria control interventions in Ghana. Some of the study results have been reported previously, and in this publication, the authors focus on age-stratification of the results. Malaria prevalence was lower in all age groups after IRS. Follow-up with SMC, however, maintained lower parasite prevalence in the targeted age group but not the population as a whole. Additionally, they observe that diversity measures rebounds more slowly than prevalence measures. Overall, I found these results clear, convincing, and well-presented. They add to a growing literature that demonstrates the relevance of asymptomatic reservoirs.  There is growing interest in developing an expanded toolkit for genomic epidemiology in malaria, and detecting changes in transmission intensity is one major application. As the authors summarize, there is no one-size-fits-all approach, and the Bayesian MOIvar estimate developed here has the potential to complement currently used methods. I find its extension to a calculation of absolute parasite numbers appealing as this could serve as both a conceptually straightforward and biologically meaningful metric. However, I am not fully convinced the current implementation will be applied meaningfully across additional studies. 

      (1) I find the term "census population size" problematic as the groups being analyzed (hosts grouped by age at a single time point) do not delineate distinct parasite populations. Separate parasite lineages are not moving through time within these host bins. Rather, there is a single parasite population that is stochastically divided across hosts at each time point. I find this distinction important for interpreting the results and remaining mindful that the 2,000 samples at each time point comprise a subsample of the true population. Instead of "census population size", I suggest simplifying it to "census count" or "parasite lineage count".  It would be fascinating to use the obtained results to model absolute parasite numbers at the whole population level (taking into account, for instance, the age structure of the population), and I do hope this group takes that on at some point even if it remains outside the scope of this paper. Such work could enable calculations of absolute---rather than relative---fitness and help us further understand parasite distributions across hosts.

      Lineages moving exclusively through a given type of host or “patch” are not a necessary requirement for enumerating the total number of infections in such a subset.  It is true that what we have is a single parasite population, but we are enumerating for the season the respective size in host classes (children and adults). This is akin to enumerating subsets of a population in ecological settings where one has multiple habitat patches, with individuals able to move across patches.

      Remaining mindful that the count is relative to sample size is an important point. Please see our response to comment (2) of reviewer 1, also for the choice of terminology. We prefer not to adopt “census count” as a census in our mind is a count, and we are not clear on the concept of lineage for these highly recombinant parasites.  Also, census population size has been adopted already in the literature for both pathogens and non-pathogens, to make a distinction with the notion of effective population size in population genetics (see our response to reviewer 1) and is consistent with our usage as outlined in the introduction. 

      Thank you for the comment on an absolute number which would extrapolate to the whole host population.  Please see again our response to comment (2) of reviewer 1, on how we can use mean MOI for this purpose once the sampling is sufficient for this quantity to become constant/stable with sampling effort.

      (2) I'm uncertain how to contextualize the diversity results without taking into account the total number of samples analyzed in each group. Because of this, I would like a further explanation as to why the authors consider absolute parasite count more relevant than the combined MOI distribution itself (which would have sample count as a denominator). It seems to me that the "per host" component is needed to compare across age groups and time points---let alone different studies.

      Again, thank you for the insightful comment. We provide this number as a separate quantity and not a distribution, although it is clearly related to the mean MOI of such distribution. It gives a tangible sense for the actual infection count (different from prevalence) from the perspective of the parasite population in the ecological sense. The “per host” notion which enables an extrapolation to any host population size for the purpose of a complete count, or for comparison with another study site, has been discussed in the above responses for reviewer 1 and now in the revision of the discussion.

      (3) Thinking about the applicability of this approach to other studies, I would be interested in a larger treatment of how overlapping DBLα repertoires would impact MOIvar estimates. Is there a definable upper bound above which the method is unreliable? Alternatively, can repertoire overlap be incorporated into the MOI estimator? 

      This is a very good point and one we now discuss further in our revision. There is no predefined upper bound one can present a priori. Intuitively, the approach to estimate MOI would appear to break down as overlap moves away from extremely low values, and therefore for locations with low transmission intensity.  Interestingly, we have observed that this is not the case in our paper by Labbé et al. (Labbé et al., 2023), where we used model simulations in a gradient of three transmission intensities, from high to low values. The original varcoding method performed well across the gradient. This robustness may arise from a nonlinear and fast transition from low to high overlap that is accompanied by MOI changing rapidly from primarily multiclonal (MOI > 1) to monoclonal (MOI = 1). This matter clearly needs to be investigated further, including ways to extend the estimation to explicitly include the distribution of overlap.

      Smaller comments:

      - Figure 1 provides confidence intervals for the prevalence estimates, but these aren't carried through on the other plots (and Figure 5 has lost CIs for both metrics). The relationship between prevalence and diversity is one of the interesting points in this paper, and it would be helpful to have CIs for both metrics when they are directly compared. 

      Based on the reviewer’s advice we have revised both Figure 4 and Figure 5, to include the missing uncertainty intervals. The specific approach for each quantity is described in the corresponding caption.

      Reviewer #3 (Public Review): 

      Summary: 

      The manuscript coins a term "the census population size" which they define from the diversity of malaria parasites observed in the human community. They use it to explore changes in parasite diversity in more than 2000 people in Ghana following different control interventions. 

      Strengths: 

      This is a good demonstration of how genetic information can be used to augment routinely recorded epidemiological and entomological data to understand the dynamics of malaria and how it is controlled. The genetic information does add to our understanding, though by how much is currently unclear (in this setting it says the same thing as age-stratified parasite prevalence), and its relevance moving forward will depend on the practicalities and cost of the data collection and analysis. Nevertheless, this is a great dataset with good analysis and a good attempt to understand more about what is going on in the parasite population. 

      Census population size is complementary to parasite prevalence, where the former gives a measure of the “parasite population size” and the latter describes the “proportion of infected hosts”.  The reason we see similar trends for the “genetic information” (i.e., census population size) and “age-specific parasite prevalence” is that we identify all samples for varcoding based on microscopy (i.e., all microscopy-positive P. falciparum isolates). But what is more relevant here is the relative percentage change in parasite prevalence and census population size following the IRS intervention. To make this point clearer in the revised manuscript, we have updated Figure 4 and included additional panels plotting this percentage change from the 2012 baseline, for both census population size and prevalence (Figure 4EF). Overall, we see a greater percentage change in 2014 (and 2015), relative to the 2012 baseline, for census parasite population size vs. parasite prevalence (Figure 4EF) as a consequence of the significant changes in distributions of MOI following the IRS intervention (Figure 3). As discussed in the Results, following the deployment of IRS in 2014, census population size decreased by 72.5% relative to the 2012 baseline survey (pre-IRS), whereas parasite prevalence only decreased by 54.5%. 

      With respect to the reviewer’s comment on “practicalities and cost”, varcoding has been used to successfully amplify P. falciparum DNA collected as DBS that have been stored for more than 5 years, from both clinical and lower-density asymptomatic infections, without the additional step and added cost of sWGA ($8 to $32 USD per isolate; for costing estimates see (LaVerriere et al., 2022; Tessema et al., 2020)), which is currently required by other molecular surveillance methods (Jacob et al., 2021; LaVerriere et al., 2022; Oyola et al., 2016). Varcoding involves a single PCR per isolate using degenerate primers, where a large number of isolates can be multiplexed into a single pool for amplicon sequencing.  Thus, the overall costs for incorporating molecular surveillance with varcoding are mainly driven by the number of PCRs/clean-ups, the number of samples indexed per sequencing run, and the NGS technology used (discussed in more detail in our publication Ghansah et al. (Ghansah et al., 2023)). Previous work has shown that varcoding can be used both locally and globally for molecular surveillance, without the need to be customized or updated; thus it can be fairly easily deployed in malaria-endemic regions (Chen et al., 2011; Day et al., 2017; Rougeron et al., 2017; Ruybal-Pesántez et al., 2022, 2021; Tonkin-Hill et al., 2021).

      Weaknesses: 

      Overall the manuscript is well-written and generally comprehensively explained. Some terms could be clarified to help the reader and I had some issues with a section of the methods and some of the more definitive statements given the evidence supporting them. 

      Thank you for the overall positive assessment. However, it is impossible to address the “issues with a section of the methods” and “some of the more definitive statements given the evidence supporting them” without an explicit indication of which methods and statements the reviewer is referring to. Hopefully, the answers to the detailed comments and questions of reviewers 1 and 2 address any methodological concerns (i.e., in the Materials and Methods and Results). To the issue of “definitive statements”, etc., we are unable to respond without further information.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 273: there is a reference to a figure which supports the empirical distribution of repertoire given MOI = 1, but the figure does not appear to exist.

      We have now included the correct figure for the repertoire size distribution as Figure supplement 3 (previously published in Labbé et al., 2023). This figure was accidentally omitted when the manuscript was submitted for review; we thank the reviewer for bringing this to our attention.

      Line 299: while this likely makes little difference, an insignificant result from a Kolmogorov-Smirnov test doesn't tell you if the distributions are the same, it only means there is not enough evidence to determine they are different (i.e. fail to reject the null). Also, what does the "mean MOI difference" column in supplementary table 3 mean? 

      The mean MOI difference is the difference in the mean value between the pairwise comparison of the true population-level MOI distribution, that of the population-level MOI estimates from either pooling the maximum a posteriori (MAP) estimates per individual host or the mixture distribution, or that of the population-level MOI estimates from different prior choices. This is now clarified as requested in the Table supplements 3 - 6. 

      Figure 4: how are the confidence intervals for the estimated number of var repertoires calculated? Also should include horizontal error bars for prevalence measures.

      The confidence intervals were calculated based on a bootstrap approach. We re-sampled 10,000 replicates from the original population-level MOI distribution with replacement. Each resampled replicate is the same size as the original sample. We then derive the 95% CI based on the distribution of the mean MOI of those resampled replicates. This is now clarified as requested in the Figure 4 caption (as well as Table supplement 7 footnotes). In addition, we have also updated Figure 4AB and have included the 95% CI for all measures for clarity. 
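
      The bootstrap described above can be sketched in a few lines of Python. The sketch assumes a percentile interval (the response does not state which interval construction was used), and the MOI values and function name are hypothetical.

```python
import random


def bootstrap_mean_moi_ci(moi_values, n_reps=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean MOI.

    Each replicate resamples the MOI values with replacement,
    at the same size as the original sample.
    """
    rng = random.Random(seed)
    n = len(moi_values)
    means = sorted(sum(rng.choices(moi_values, k=n)) / n for _ in range(n_reps))
    lower = means[int((alpha / 2) * n_reps)]
    upper = means[int((1 - alpha / 2) * n_reps) - 1]
    return lower, upper


# Hypothetical population-level MOI values, one per isolate.
moi_values = [1, 1, 2, 3, 1, 4, 2, 2, 1, 3]

ci_low, ci_high = bootstrap_mean_moi_ci(moi_values)
sample_mean = sum(moi_values) / len(moi_values)

# Assumed scaling: an interval for the number of var repertoires (census size)
# follows by multiplying the mean MOI bounds by the number of isolates.
census_ci = (ci_low * len(moi_values), ci_high * len(moi_values))
```

      Fixing the random seed makes the interval reproducible; in practice the number of replicates mainly affects how smoothly the interval endpoints are estimated.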

      Reviewer #2 (Recommendations For The Authors): 

      -  I would like to see a plot like Supplemental Figure 8 for the upsA DBLα repertoire size. 

      The upsA repertoire size for each survey and by age group has now been provided as requested in Figure supplement 5AB. 

      -  Supplemental Table 2 is cut off in the pdf. 

      We have now resolved this issue so that the Table supplement 2 is no longer cut off.  

      Reviewer #3 (Recommendations For The Authors): 

      The manuscript terms the phrase "census population size". To me, the census is all about the number of individuals, not necessarily their diversity. I appreciate that there is no simple term for this, and I imagine the authors have considered many alternatives, but could it be clearer to say the "genetic census population size"? For example, I found the short title not particularly descriptive "Impact of IRS and SMC on census population size", which certainly didn't make me think of parasite diversity.

      Please see our response to comment (2) of reviewer 1. We prefer not to add “genetic” to the phrase as the distinction from effective population size from population genetics is important, and the quantity we are after is an ecological one. 

      The authors do not currently say much about the potential biases in the genetic data and how this might influence results. It seems likely that because (i) patients with sub-microscopic parasitaemia were not sampled and (ii) because a moderate number of (likely low density) samples failed to generate genetic data, that the observed MOI is an overestimate. I'd be interested to hear the authors' thoughts about how this could be overcome or taken into account in the future. 

      We thank the reviewer for this comment and agree that this is an interesting area for further consideration. However, based on research from the Day Lab that is currently under review (Tan et al. 2024, under review), the MOI estimated using the Bayesian approach is likely not an “overestimate” but rather an “underestimate”. In this research by Tan et al. (2024), isolate MOI was estimated and compared using different initial whole blood volumes (e.g., 1, 10, 50, 100 μL) for the gDNA extraction. Using _var_coding and comparing these different volumes, it was found that MOI was significantly “underestimated” when small blood volumes were used for the gDNA extraction, i.e., there was a ~3-fold increase in median MOI between 1 μL and 100 μL blood. Ultimately, these findings will allow us to make computational corrections so that more accurate estimates of MOI can be obtained from the DBS in the future.

      The authors do not make much of LLIN use and for me, this can explain some of the trends. The first survey was conducted soon after a mass distribution whereas the last was done at least a year after (when fewer people would have been using the nets which are older and less effective). We have also seen a rise in pyrethroid resistance in the mosquito populations of the area which could further diminish the LLIN activity. This difference in LLIN efficacy between the first and last survey could explain similar prevalence, yet lower diversity (in Figures 4B/5). However, it also might mean that statements such as Line 478 "This is indicative of a loss of immunity during IRS which may relate to the observed loss of var richness, especially the many rare types" need to be tapered as the higher prevalence observed in this age group could be caused by lower LLIN efficacy at the time of the last survey, not loss of immunity (though both could be true).  

      We thank the reviewer for this question and agree that (i) LLIN usage and (ii) pyrethroid resistance are important factors to consider. 

      (i) Over the course of this study, self-reported LLIN usage the previous night remained high across all age groups in each of the surveys (≥ 83.5%); in fact, more participants reported sleeping under an LLIN in 2017 (96.8%), following the discontinuation of IRS, than in the 2012 baseline survey (89.1%). This increase in LLIN usage in 2017 is likely a result of several factors, including a rebound in the local vector population making LLINs necessary again and increased community education and/or awareness of the importance of using LLINs, among others. Information on the LLINs distributed (i.e., PermaNet 2.0, Olyset, or DawaPlus 2.0) and participant-reported usage the previous night has now been included in the Materials and Methods, as requested by the reviewer.

      (ii) As to the reviewer’s question on the increase in pyrethroid resistance in Ghana over the study period, research undertaken by our entomology collaborators (Noguchi Memorial Institute for Medical Research: Profs. S. Dadzie and M. Appawu; and Navrongo Health Research Centre: Dr. V. Asoala) has shown that pyrethroid resistance is a major problem across the country, including the Upper East Region. Preliminary studies in Bongo District (2013 - 2015) were undertaken to monitor for mutations in the voltage-gated sodium channel gene that have been associated with knockdown resistance to pyrethroids and DDT in West Africa (kdr-w). Through this analysis, the homozygote resistance kdr-w allele (RR) was found in 90% of An. gambiae s.s. samples tested from Bongo, providing evidence of high pyrethroid resistance in Bongo District dating back to 2013, i.e., prior to the IRS intervention (S. Dadzie, M. Appawu, personal communication). Although we do not have kdr-w data from Bongo District for 2017 (i.e., post-IRS), we can hypothesize that pyrethroid resistance likely did not decline in the area, given the widespread deployment and use of LLINs.

      Given that (i) self-reported LLIN usage remained high in all surveys (≥ 83.5%) and (ii) there was evidence of high pyrethroid resistance in 2013 (i.e., kdr-w (RR) ~90%), the rebound in prevalence observed for the older age groups (i.e., adolescents and adults) in 2017 is best explained by a loss of immunity.

      I must confess I got a little lost with some of the Bayesian model section methods and the figure supplements. Line 272 reads "The measurement error is simply the repertoire size distribution, that is, the distribution of the number of non-upsA DBLα types sequenced given MOI = 1, which is empirically available (Figure supplement 3)." This does not appear correct as this figure is measuring kl divergence. If this is not a mistake in graph ordering please consider explaining the rationale for why this graph is being used to justify your point. 

      We have now included the correct figure for the repertoire size distribution as Figure supplement 3 (previously published in Labbé et al., 2023). This figure was accidentally omitted when the manuscript was submitted for review; we thank the reviewer for bringing this matter to our attention. We hope that the inclusion of this figure, together with a more detailed description of the Bayesian approach, helps to make this section of the Materials and Methods clearer for the reader.

      I was somewhat surprised that the choice of prior for estimating the MOI distribution at the population level did not make much difference. To me, the negative binomial distribution makes much more sense. I was left wondering, as you are only measuring MOI in positive individuals, whether you used zero truncated Poisson and zero truncated negative binomial distributions, and if not, whether this was a cause of a lack of difference between uniform and other priors. 

      Thank you for the relevant question. We have indeed considered different priors and the robustness of our estimates to this choice, and have now better described this in the text. We focused on individuals who had a confirmed microscopic asymptomatic P. falciparum infection for our MOI estimation, as median P. falciparum densities were overall low in this population during each survey (i.e., median ≤ 520 parasites/µL, see Table supplement 1). Thus, we used either a uniform prior excluding zero or a zero-truncated negative binomial distribution when exploring the impact of priors on the final population-level MOI distribution. A uniform prior and a zero-truncated negative binomial distribution with parameters within the range typical of high-transmission endemic regions (higher mean MOI with tails around higher MOI values) produce similar MOI estimates at both the individual and population level. However, when the parameter range of the zero-truncated negative binomial is set to that of low-transmission endemic regions, where the empirical MOI distribution centers around monoclonal infections with the majority of MOI = 1 or 2 (mean MOI ≈ 1.5, no tail around higher MOI values), the final population-level MOI distribution deviates more from that obtained under the aforementioned prior and parameter choices. The final individual- and population-level MOI estimates are not sensitive to the specifics of the prior MOI distribution as long as this distribution captures the tail around higher MOI values with above-zero probability.
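
      To make the comparison of priors concrete, the Python sketch below builds a uniform prior excluding zero and a zero-truncated negative binomial prior over MOI, and combines each with a likelihood through Bayes' rule. It is a toy illustration only: the repertoire size distribution, the negative binomial parameters (`size`, `prob`), the maximum MOI, and all function names are hypothetical, and the likelihood assumes, purely for illustration, that repertoires in a host do not overlap, so that the size distribution for MOI = m is the m-fold convolution of the MOI = 1 distribution.

```python
import math


def uniform_prior(max_moi):
    """Uniform prior over MOI = 1..max_moi (zero excluded)."""
    return [1.0 / max_moi] * max_moi


def zero_truncated_negbin_prior(max_moi, size, prob):
    """Negative binomial pmf restricted to MOI = 1..max_moi and renormalised."""
    def log_pmf(k):
        return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
                + size * math.log(prob) + k * math.log(1.0 - prob))

    weights = [math.exp(log_pmf(k)) for k in range(1, max_moi + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve(a, b):
    """Discrete convolution of two probability vectors indexed from 0."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def posterior_moi(n_types, single_dist, prior):
    """Posterior over MOI = 1..len(prior) given the number of types observed.

    single_dist[s] is P(s types | MOI = 1), indexed by the number of types.
    """
    dist = [1.0]  # sum of zero repertoires: point mass at 0 types
    unnormalised = []
    for prior_m in prior:
        dist = convolve(dist, single_dist)
        likelihood = dist[n_types] if n_types < len(dist) else 0.0
        unnormalised.append(prior_m * likelihood)
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observed count is impossible under every MOI considered")
    return [u / total for u in unnormalised]


# Hypothetical toy repertoire size distribution: 3, 4 or 5 types per genome.
toy_single = [0.0, 0.0, 0.0, 0.2, 0.6, 0.2]
priors = {
    "uniform": uniform_prior(4),
    "zero-truncated negative binomial": zero_truncated_negbin_prior(4, size=2.0, prob=0.4),
}
for name, prior in priors.items():
    post = posterior_moi(9, toy_single, prior)
    map_moi = max(range(len(post)), key=post.__getitem__) + 1
    print(name, "MAP MOI =", map_moi)
```

      In this toy case both priors give the same MAP estimate (MOI = 2), because the likelihood dominates and both priors place non-zero probability on every MOI considered.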

      The high MOI in children <5yrs in 2017 (immediately after SMC) is very interesting. Any thoughts on how/why? 

      This result indicates that although the prevalence of asymptomatic P. falciparum infections remained significantly lower for the younger children targeted by SMC in 2017 compared to 2012, they still carried multiclonal infections, as the reviewer has pointed out (Figure 3B). Importantly, this upward shift in the MOI distributions (and median MOI) was observed in all age groups in 2017, not just the younger children, and provides evidence that transmission intensity in Bongo had rebounded in 2017, 32 months after the discontinuation of IRS. This increase in MOI for younger children may at first glance seem surprising, but it likely shows the limitations of SMC in clearing and/or suppressing the establishment of newly acquired infections, particularly at the end of the transmission season following the final cycle of SMC (i.e., end of September 2017 in Bongo District; NMEP/GHS, personal communication), when the post-treatment prophylactic effects of SMC would have waned (Chotsiri et al., 2022).

      Line 521 in the penultimate paragraph says "we have analysed only low density...." should this not be "moderate" density, as low density infections might not be detected? The density range itself is not reported in the manuscript so could be added. 

      In Table supplement 1 we have provided the median, including the inter-quartile range, across each survey by age group. For the revision we have now provided the density min-max range, as requested by the reviewer. Finally, we have revised the statement in the discussion so that it now reads “….we have analysed low- to moderate-density, chronic asymptomatic infections (see Table supplement 1)……”.   

      Data availability - From the text the full breakdown of the epidemiological survey does not appear to be available, just a summary of defined age bounds in the SI. Provision of these data (with associated covariates such as parasite density and host characteristics linked to genetic samples) would facilitate more in-depth secondary analyses. 

      To address this question, we have updated the “Data availability statement” section with the following statement: “All data associated with this study are available in the main text, the Supporting Information, or upon reasonable request for research purposes to the corresponding author, Prof. Karen Day (karen.day@unimelb.edu.au).”  

      REFERENCES

      Bedford T, Cobey S, Pascual M. 2011. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol 11. doi:10.1186/1471-2148-11-220

      Chen DS, Barry AE, Leliwa-Sytek A, Smith T-AA, Peterson I, Brown SM, Migot-Nabias F, Deloron P, Kortok MM, Marsh K, Daily JP, Ndiaye D, Sarr O, Mboup S, Day KP. 2011. A molecular epidemiological study of var gene diversity to characterize the reservoir of Plasmodium falciparum in humans in Africa. PLoS One 6:e16629. doi:10.1371/journal.pone.0016629

      Chotsiri P, White NJ, Tarning J. 2022. Pharmacokinetic considerations in seasonal malaria chemoprevention. Trends Parasitol. doi:10.1016/j.pt.2022.05.003

      Day KP, Artzy-Randrup Y, Tiedje KE, Rougeron V, Chen DS, Rask TS, Rorick MM, Migot-Nabias F, Deloron P, Luty AJF, Pascual M. 2017. Evidence of Strain Structure in Plasmodium falciparum Var Gene Repertoires in Children from Gabon, West Africa. PNAS 114:E4103–E4111. doi:10.1073/pnas.1613018114

      Ghansah A, Tiedje KE, Argyropoulos DC, Onwona CO, Deed SL, Labbé F, Oduro AR, Koram KA, Pascual M, Day KP. 2023. Comparison of molecular surveillance methods to assess changes in the population genetics of Plasmodium falciparum in high transmission. Frontiers in Parasitology 2:1067966. doi:10.3389/fpara.2023.1067966

      He Q, Pilosof S, Tiedje KE, Ruybal-Pesántez S, Artzy-Randrup Y, Baskerville EB, Day KP, Pascual M. 2018. Networks of genetic similarity reveal non-neutral processes shape strain structure in Plasmodium falciparum. Nat Commun 9:1817. doi:10.1038/s41467-018-04219-3

      Jacob CG, Thuy-nhien N, Mayxay M, Maude RJ, Quang HH, Hongvanthong B, Park N, Goodwin S, Ringwald P, Chindavongsa K, Newton P, Ashley E. 2021. Genetic surveillance in the Greater Mekong subregion and South Asia to support malaria control and elimination. Elife 10:1–22.

      Labbé F, He Q, Zhan Q, Tiedje KE, Argyropoulos DC, Tan MH, Ghansah A, Day KP, Pascual M. 2023. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19:e1010816. doi:10.1101/2022.06.27.497801

      LaVerriere E, Schwabl P, Carrasquilla M, Taylor AR, Johnson ZM, Shieh M, Panchal R, Straub TJ, Kuzma R, Watson S, Buckee CO, Andrade CM, Portugal S, Crompton PD, Traore B, Rayner JC, Corredor V, James K, Cox H, Early AM, MacInnis BL, Neafsey DE. 2022. Design and implementation of multiplexed amplicon sequencing panels to serve genomic epidemiology of infectious disease: A malaria case study. Mol Ecol Resour 2285–2303. doi:10.1111/1755-0998.13622

      Oyola SO, Ariani CV, Hamilton WL, Kekre M, Amenga-Etego LN, Ghansah A, Rutledge GG, Redmond S, Manske M, Jyothi D, Jacob CG, Ogo TD, Rockett K, Newbold CI, Berriman M, Kwiatkowski DP. 2016. Whole genome sequencing of Plasmodium falciparum from dried blood spots using selective whole genome amplification. Malar J 15:1–12. doi:10.1186/s12936-016-1641-7

      Palstra FP, Fraser DJ. 2012. Effective/census population size ratio estimation: A compendium and appraisal. Ecol Evol 2:2357–2365. doi:10.1002/ece3.329

      Rougeron V, Tiedje KE, Chen DS, Rask TS, Gamboa D, Maestre A, Musset L, Legrand E, Noya O, Yalcindag E, Renaud F, Prugnolle F, Day KP. 2017. Evolutionary structure of Plasmodium falciparum major variant surface antigen genes in South America: Implications for epidemic transmission and surveillance. Ecol Evol 7:9376–9390. doi:10.1002/ece3.3425

      Ruybal-Pesántez S, Sáenz FE, Deed S, Johnson EK, Larremore DB, Vera-Arias CA, Tiedje KE, Day KP. 2021. Clinical malaria incidence following an outbreak in Ecuador was predominantly associated with Plasmodium falciparum with recombinant variant antigen gene repertoires. medRxiv.

      Ruybal-Pesántez S, Tiedje KE, Pilosof S, Tonkin-Hill G, He Q, Rask TS, Amenga-Etego L, Oduro AR, Koram KA, Pascual M, Day KP. 2022. Age-specific patterns of DBLa var diversity can explain why residents of high malaria transmission areas remain susceptible to Plasmodium falciparum blood stage infection throughout life. Int J Parasitol 20:721–731.

      Strona G, Nappo D, Boccacci F, Fagorini S, San-Miguel-Ayanz J. 2014. A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals. Nat Commun 5. doi:10.1038/ncomms5114

      Tessema SK, Hathaway NJ, Teyssier NB, Murphy M, Chen A, Aydemir O, Duarte EM, Simone W, Colborn J, Saute F, Crawford E, Aide P, Bailey JA, Greenhouse B. 2020. Sensitive, highly multiplexed sequencing of microhaplotypes from the Plasmodium falciparum heterozygome. Journal of Infectious Diseases 225:1227–1237.

      Tonkin-Hill G, Ruybal-Pesántez S, Tiedje KE, Rougeron V, Duffy MF, Zakeri S, Pumpaibool T, Harnyuganakorn P, Branch OH, Ruiz-Mesía L, Rask TS, Prugnolle F, Papenfuss AT, Chan Y, Day KP. 2021. Evolutionary analyses of the major variant surface antigen-encoding genes reveal population structure of Plasmodium falciparum within and between continents. PLoS Genet 7:e1009269. doi:10.1371/journal.pgen.1009269

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their manuscript, Gomez-Frittelli and colleagues characterize the expression of cadherin6 (and -8) in colonic IPANs of mice. Moreover, they found that these cdh6-expressing IPANs are capable of initiating colonic motor complexes in the distal colon, but not proximal and midcolon. They support their claim by morphological, electrophysiological, optogenetic, and pharmacological experiments.

      Strengths:

      The work is very impressive and involves several genetic models and state-of-the-art physiological setups including respective controls. It is a very well-written manuscript that truly contributes to our understanding of GI-motility and its anatomical and physiological basis. The authors were able to convincingly answer their research questions with a wide range of methods without overselling their results.

      We greatly appreciate the reviewer’s time, careful reading and support of our study.

      Weaknesses:

      The authors put quite some emphasis on stating that cdh6 is a synaptic protein (in the title and throughout the text), which interacts in a homophilic fashion. They deduce that cdh6 might be involved in IPAN-IPAN synapses (line 247ff.). However, Cdh6 does not only interact in synapses and is expressed by non-neuronal cells as well (see e.g., expression in the proximal tubules of the kidney). Moreover, cdh6 does not only build homodimers, but also heterodimers with Cdh9 as well as Cdh7, -10, and -14 (see e.g., Shimoyama et al. 2000, DOI: 10.1042/0264-6021:3490159). It would therefore be interesting to assess the expression pattern of cdh6-proteins using immunostainings in combination with synaptic markers to substantiate the authors' claim or at least add the possibility of cell-cell-interactions other than synapses to the discussion. Additionally, an immunostaining of cdh6 would confirm if the expression of tdTomato in smooth muscle cells of the cdh6-creERT model is valid or a leaky expression (false positive).

      We agree with the reviewer that Cdh6 could be mediating some other cell-cell interaction besides synapses between IPANs, and will include more on this in the discussion. Cdh6 primarily forms homodimers but, as the reviewer points out, has been known to also form heterodimers with some other cadherins. We performed RNAscope in the colonic myenteric plexus with Cdh7 and found no expression (data not shown). Cdh10 is suggested to have very low expression (Drokhlyansky et al., 2020), possibly in putative secretomotor vasodilator neurons, and Cdh14 has not been assayed in any RNAseq screens. We attempted to visualize Cdh6 protein via antibody staining (Duan et al., 2018), but our efforts did not result in sufficient signal or resolution to identify synapses in the ENS, which remain broadly challenging to assay. Similarly, neither immunostaining with the Cdh6 antibody nor RNAscope was able to confirm Cdh6 expression in tdT-expressing muscle cells. We will address these caveats in the discussion section.

      (1) E. Drokhlyansky, C. S. Smillie, N. V. Wittenberghe, M. Ericsson, G. K. Griffin, G. Eraslan, D. Dionne, M. S. Cuoco, M. N. Goder-Reiser, T. Sharova, O. Kuksenko, A. J. Aguirre, G. M. Boland, D. Graham, O. Rozenblatt-Rosen, R. J. Xavier, A. Regev, The Human and Mouse Enteric Nervous System at Single-Cell Resolution. Cell 182, 1606-1622.e23 (2020).

      (2) X. Duan, A. Krishnaswamy, M. A. Laboulaye, J. Liu, Y.-R. Peng, M. Yamagata, K. Toma, J. R. Sanes, Cadherin Combinations Recruit Dendrites of Distinct Retinal Neurons to a Shared Interneuronal Scaffold. Neuron 99, 1145-1154.e6 (2018).

      Reviewer #2 (Public review):

      Summary:

      Intrinsic primary afferent neurons are an interesting population of enteric neurons that transduce stimuli from the mucosa, initiate reflexive neurocircuitry involved in motor and secretory functions, and modulate gut immune responses. The morphology, neurochemical coding, and electrophysiological properties of these cells have been relatively well described in a long literature dating back to the late 1800's but questions remain regarding their roles in enteric neurocircuitry, potential subsets with unique functions, and contributions to disease. Here, the authors provide RNAscope, immunolabeling, electrophysiological, and organ function data characterizing IPANs in mice and suggest that Cdh6 is an additional marker of these cells.

      Strengths:

      This paper would likely be of interest to a focused enteric neuroscience audience and increase information regarding the properties of IPANs in mice. These data are useful and suggest that prior data from studies of IPANs in other species are likely translatable to mice.

      We appreciate the reviewer’s support of our study and insightful critiques for its improvement.

      Weaknesses:

      The advance presented here beyond what is already known is minimal. Some of the core conclusions are overstated and there are multiple other major issues that limit enthusiasm. Key control experiments are lacking and data do not specifically address the properties of the proposed Cdh6+ population.

      Major weaknesses:

      (1) The novelty of this study is relatively low. The main point of novelty suggests an additional marker of IPANs (Cdh6) that would add to the known list of markers for these cells. How useful this would be is unclear. Other main findings basically confirm that IPANs in mice display the same classical characteristics that have been known for many years from studies in guinea pigs, rats, mice and humans.

      We appreciate the already existing markers for IPANs in the ENS and the existing literature characterizing these neurons. The primary intent of this study was to use these well established characteristics of IPANs in both mice and other species to characterize Cdh6-expressing neurons in the mouse myenteric plexus and confirm their classification as IPANs.

      (2) Some of the main conclusions of this study are overstated and claims of priority are made that are not true. For example, the authors state in lines 27-28 of the abstract that their findings provide the "first demonstration of selective activation of a single neurochemical and functional class of enteric neurons". This is certainly not true since Gould et al (AJP-GIL 2019) expressed ChR2 in nitrergic enteric neurons and showed that activating those cells disrupted CMC activity. In fact, prior work by the authors themselves (Hibberd et al., Gastro 2018) showed that activating calretinin neurons with ChR2 evoked motor responses. Work by other groups has used chemogenetics and optogenetics to show the effects of activating multiple other classes of neurons in the gut.

      We believe our phrasing in this sentence was misleading. Whilst single neurochemical classes of enteric neurons have been manipulated to alter gut functions, no such instance to date represents manipulation of a single functional class of enteric neurons. In the given examples, NOS and calretinin are each expressed to varying degrees across putative motor neurons, interneurons and IPANs. In contrast, Cdh6 is restricted to IPANs and therefore this study is the first optogenetic investigation of enteric neurons from a single putative functional class. We will alter this segment in the revised manuscript to emphasize this point and differentiate this study from previous ones.

      (3) Critical controls are needed to support the optogenetic experiments. Control experiments are needed to show that ChR2 expression a) does not change the baseline properties of the neurons, b) that stimulation with the chosen intensity of light elicits physiologically relevant responses in those neurons, and c) that stimulation via ChR2 elicits comparable responses in IPANs in the different gut regions focused on here.

      We completely agree that controls are essential. However, our paper is not the first to express ChR2 in enteric neurons. Authors of our paper showed in Hibberd et al. 2018 that expression of ChR2 in a heterogeneous population of myenteric neurons did not change network properties of the myenteric plexus. This was demonstrated by the lack of change in control CMC characteristics in mice expressing ChR2 under basal conditions (without blue light exposure). Regarding question (b), whether stimulation with the chosen intensity of light elicits physiologically relevant responses in those neurons: we show the restricted expression of ChR2 in IPANs and that motor responses (to blue light) are blocked by selective nerve conduction blockade.

      Regarding question (c), whether stimulation via ChR2 elicits comparable responses in IPANs in the different gut regions: we would not expect each region of the gut to behave comparably. This is because the different gut regions (i.e. proximal, mid, distal) are very different anatomically, as is the anatomy of the myenteric plexus and myenteric ganglia between each region, including the density of IPANs within each ganglion, in addition to the presence of different patterns of electrical and mechanical activity [Spencer et al., 2020]. Hence, it is difficult to expect that stimulation of ChR2 should induce similar physiological responses across regions. The motor output we record in our study (CMCs) is a unified motor program that involves the temporal coordination of hundreds of thousands of enteric neurons and a complex neural circuit that we have previously characterized [Spencer et al., 2018]. However, no study until now has been able to selectively stimulate a single functional class of enteric neurons (with light) and so avoid indiscriminate activation of other classes of neurons.

      (1) T. J. Hibberd, J. Feng, J. Luo, P. Yang, V. K. Samineni, R. W. Gereau, N. Kelley, H. Hu, N. J. Spencer, Optogenetic Induction of Colonic Motility in Mice. Gastroenterology 155, 514-528.e6 (2018).

      (2) N. J. Spencer, L. Travis, L. Wiklendt, T. J. Hibberd, M. Costa, P. Dinning, H. Hu, Diversity of neurogenic smooth muscle electrical rhythmicity in mouse proximal colon. American Journal of Physiology-Gastrointestinal and Liver Physiology 318, G244–G253 (2020).

      (3) N. J. Spencer, T. J. Hibberd, L. Travis, L. Wiklendt, M. Costa, H. Hu, S. J. Brookes, D. A. Wattchow, P. G. Dinning, D. J. Keating, J. Sorensen, Identification of a Rhythmic Firing Pattern in the Enteric Nervous System That Generates Rhythmic Electrical Activity in Smooth Muscle. J. Neurosci. 38, 5507–5522 (2018).

      (4) The electrophysiological characterization of mouse IPANs is useful but this is a basic characterization of any IPAN and really says nothing specifically about Cdh6+ neurons. The electrophysiological characterization was also only done in a small fraction of colonic IPANs, and it is not clear if these represent cell properties in the distal colon or proximal colon, and whether these properties might be extrapolated to IPANs in the different regions. Similarly, blocking IH with ZD7288 affects all IPANs and does not add specific information regarding the role of the proposed Cdh6+ subtype.

      Our electrophysiological characterization was guided to be within a subset of Cdh6+ neurons by Hb9:GFP expression. As in the prior comment (1) above, we used these experiments to confirm classification of Cdh6+ (Hb9:GFP+) neurons in the distal colon as IPANs. We will clarify that these experiments were performed in the distal colon and agree that we cannot extrapolate that these properties are also representative of IPANs in the proximal colon. We apologize that this was confusing. Finally, we agree with the reviewer that ZD7288 affects all IPANs in the ENS and will clarify this in the text.

      (5) Why SMP IPANs were not included in the analysis of Cdh6 expression is a little puzzling. IPANs are present in the SMP of the small intestine and colon, and it would be useful to know if this proposed marker is also present in these cells.

      We agree with the reviewer. In addition to characterizing Cdh6 in the myenteric plexus, it would be interesting to query if sensory neurons located within the SMP also express Cdh6. Our preliminary data (n=2) show ~6-12% tdT/Hu neurons in Cdh6-tdT ileum and colon (data not shown). We will add a sentence to the discussion.

      (6) The emphasis on IH being a rhythmicity indicator seems a bit premature. There is no evidence to suggest that IH and IT are rhythm-generating currents in the ENS.

      Regarding the statement that there is no evidence to suggest that IH and IT are rhythm-generating currents in the ENS: we agree with the reviewer that rhythm generation by IH and IT in the ENS has not been explicitly confirmed. We are confident the reviewer agrees that an absence of evidence is not evidence of absence, and we note that the presence of IH has been well described in enteric neurons. We will modify the text in the results to indicate more clearly that IH and IT are known to participate in rhythm generation in thalamocortical circuits, though their roles in the ENS remain unknown. Our treatment of the potential role of IH or IT in rhythm generation or oscillatory firing in the ENS is confined to speculation in the discussion section of the text.

      (7) As the authors point out in the introduction and discuss later on, Type II Cadherins such as Cdh6 bind homophilically to the same cadherin at both pre- and post-synapse. The apparent enrichment of Cdh6 in IPANs would suggest extensive expression in synaptic terminals that would also suggest extensive IPAN-IPAN connections unless other subtypes of neurons express this protein. Such synaptic connections are not typical of IPANs and raise the question of whether or not IPANs actually express the functional protein and if so, what might be its role. Not having this information limits the usefulness of this as a proposed marker.

      We agree with the reviewer that the proposed IPAN-IPAN connection is novel although it has been proposed before (Kunze et al., 1993). As detailed in our response to Reviewer #1, we attempted to confirm Cdh6 protein expression, but were unsuccessful, due to insufficient signal and resolution. We therefore discuss potential IPAN interconnectivity in the discussion, in the context of contrasting literature.

      (1) W. A. A. Kunze, J. B. Furness, J. C. Bornstein, Simultaneous intracellular recordings from enteric neurons reveal that myenteric AH neurons transmit via slow excitatory postsynaptic potentials. Neuroscience 55, 685–694 (1993).

      (8) Experiments shown in Figures 6J and K use a tethered pellet to drive motor responses. By definition, these are not CMCs as stated by the authors.

      The reviewer makes a valid criticism as to the terminology, since tethered pellet experiments do not record propagation. We believe the periodic bouts of propulsive force on the pellet are triggered by the same activity underlying the CMC. In our experience, these activities have similar periodicity and force, and identical pharmacological properties. Consistent with this, we also tested full colons (n = 2) set up for typical CMC recordings by multiple force transducers, finding that CMCs were abolished by ZD7288, similar to fixed pellet recordings (data not shown).

      (9) The data from the optogenetic experiments are difficult to understand. How would stimulating IPANs in the distal colon generate retrograde CMCs and stimulating IPANs in the proximal colon do nothing? Additional characterization of the Cdh6+ population of cells is needed to understand the mechanisms underlying these effects.

      We agree that the different optogenetic responses in the proximal and distal colon are challenging to interpret, but perhaps not surprising in the wider context. It is possible that the different optogenetic responses in this study reflect not only regional differences in the Cdh6+ neuronal populations, but also differences in neural circuits within these gut regions. A study some time ago by the authors showed that electrical stimulation of the proximal mouse colon was unable to evoke a retrograde (aborally) propagating CMC (Spencer, Bywater, 2002), but stimulation of the distal colon was readily able to. We concluded that at the oral lesion site there is a preferential bias of descending inhibitory nerve projections, since the ascending excitatory pathways have been cut off. In contrast, stimulation of the distal colon was readily able to activate an ascending excitatory neural pathway, and hence induce the complex CMC circuits required to generate an orally propagating CMC. Indeed, other recent studies have added to a growing body of evidence for significant differences in the behaviors and neural circuits of the two regions (Li et al., 2019, Costa et al., 2021a, Costa et al., 2021b, Nestor-Kalinoski et al., 2022). We will expand this discussion.

      (1) N. J. Spencer, R. A. Bywater, Enteric nerve stimulation evokes a premature colonic migrating motor complex in mouse. Neurogastroenterology & Motility 14, 657–665 (2002).

      (2) Li Z, Hao MM, Van den Haute C, Baekelandt V, Boesmans W, Vanden Berghe P (2019) Regional complexity in enteric neuron wiring reflects diversity of motility patterns in the mouse large intestine. Elife 8.

      (3) Costa M, Keightley LJ, Hibberd TJ, Wiklendt L, Dinning PG, Brookes SJ, Spencer NJ (2021a) Motor patterns in the proximal and distal mouse colon which underlie formation and propulsion of feces. Neurogastroenterol Motil e14098.

      (4) Costa M, Keightley LJ, Hibberd TJ, Wiklendt L, Smolilo DJ, Dinning PG, Brookes SJ, Spencer NJ (2021b) Characterization of alternating neurogenic motor patterns in mouse colon. Neurogastroenterol Motil 33:e14047.

      (5) Nestor-Kalinoski A, Smith-Edwards KM, Meerschaert K, Margiotta JF, Rajwa B, Davis BM, Howard MJ (2022) Unique Neural Circuit Connectivity of Mouse Proximal, Middle, and Distal Colon Defines Regional Colonic Motor Patterns. Cell Mol Gastroenterol Hepatol 13:309-337.e303.

    1. Author response:

      We are very pleased to see these positive reviews of our preprint.

      Reviewers 1 and 3 raise issues around PIP-PP1 interactions.

      (1) Role of the “RVxF-ΦΦ-R-W string”

      Most PIPs interact with the globular PP1 catalytic core through short linear interaction motifs (SLiMs), and Choy et al (PNAS 2014) previously showed that many PIPs interact with PP1 through a conserved trio of SLiMs, RVxF-ΦΦ-R, which is also present in the Phactrs.

      Previous structural analysis showed that the trajectory of the PPP1R15A/B, Neurabin/Spinophilin (PPP1R9A/B), and PNUTS (PPP1R10) PIPs across the PP1 surface encompasses not only the RVxF-ΦΦ-R trio, but also additional sequences C-terminal to it (Chen et al, eLife, 2015). This extended trajectory is maintained in the Phactr1-PP1 complex (Fedoryshchak et al, eLife, 2020). Based on structural alignment we proposed the existence of an additional hydrophobic “W” SLiM that interacts with the PP1 residues I133 and Y134.

      The extended “RVxF-ΦΦ-R-W” interaction brings sequences C-terminal to the “W” SLiM into the vicinity of the hydrophobic groove that adjoins the PP1 catalytic centre. In the Phactr1/PP1 complex, these sequences remodel the groove, generating a novel pocket that facilitates sequence-specific substrate recognition.

      This raises the possibility that sequences C-terminal to the extended “RVxF-ΦΦ-R-W string” in the other complexes also confer sequence-specific substrate recognition, and our study aims to test this hypothesis. Indeed, the hydrophobic groove structures of the Neurabin/Spinophilin/PP1 and Phactr1/PP1 complexes differ significantly (Ragusa et al, 2010; see Fedoryshchak et al 2020, Fig2 FigSupp1).

      (2) Orientation of the W side chain

      Reviewer 1 points out that in the substrate-bound PP1/PPP1R15A/Actin/eIF2 pre-dephosphorylation complex the W sidechain is inverted with respect to its orientation in the PP1-PPP1R15B complex (Yan et al, NSMB 2021). The authors proposed that this may reflect the role of actin in assembly of the quaternary complex. This does not necessarily invalidate the notion that sequences C-terminal to the “W” motif might play a role in actin-independent substrate recognition, and we therefore consider our inclusion of the R15A/B fusions in our analysis to be reasonable.

      (3) Conservation of W

      The motif ‘W’ does not mandate tryptophan: Phactrs and PPP1R15A/B indeed have W at this position, but Neurabin/Spinophilin contain VDP, which makes similar interactions. Similarly, the “RVxF” motifs in Phactr1, Neurabin/Spinophilin, PPP1R15A/B and PNUTS are LIRF, KIKF, KV(R/T)F and TVTW respectively.

      In our revision, we will present comparisons of the differentially remodelled/modified PP1 hydrophobic groove in the various complexes, and discuss the different orientations of the tryptophan in the previously published PPP1R15A/PP1 and PPP1R15B/PP1 structures. We will also address the other issues raised by the referees.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ma, Yang et al. report a new investigation aimed at elucidating one of the key nutrients S. Typhimurium (STM) utilizes in the nutrient-poor intracellular niche within the macrophage, focusing on the amino acid beta-alanine. From these data, the authors report that beta-alanine plays an important role in mediating STM infection and virulence. The authors employ a multidisciplinary approach that includes some mouse studies and ultimately propose a mechanism by which panD, involved in B-Ala synthesis, mediates the regulation of zinc homeostasis in Salmonella. The impact of this work is questionable. There are already many studies reporting Salmonella-effector interactions, and while this adds to that knowledge it is not a significant advance over previous studies. While the authors are investigating an interesting question, the work has two important weaknesses; if addressed, the conclusions of this work and broader relevance to bacterial pathogenesis would be enhanced.

      Strengths:

      This reviewer appreciates the multidisciplinary nature of the work. The overall presentation of the figure graphics is clear and organized.

      Weaknesses:

      First, this study is very light on mechanistic investigations, even though a mechanism is proposed. Zinc homeostasis in cells, and roles in bacterial infections, are complex processes with many players. The authors have not thoroughly investigated the mechanisms underlying the roles of B-Ala and panD in impacting STM infection such that other factors cannot be ruled out. Defining the cellular content of Zn2+ in STM in vivo would be one such route. Without further mechanistic studies, the possibility cannot be ruled out that the authors have simply deleted two important genes and seen an infection defect - this may not relate directly to Zn2+ acquisition.

      Thank you for your patient and thoughtful reading as well as the constructive comments and advice about our manuscript. We will revise the manuscript based on your comments and suggestions.

      You are right that this work has not thoroughly investigated the mechanisms underlying the roles of β-Ala, panD and zinc in impacting Salmonella infection. We will perform additional experiments to measure zinc content during Salmonella infection in vivo and in vitro, according to your suggestions.

      We agree that other unknown mechanism(s) are also involved in the virulence regulation by β-Ala in Salmonella, as our results showed that the double mutant Δ_panD_Δ_znuA_ (unable to synthesize β-Ala or take up zinc) is more attenuated than the single mutant Δ_znuA_ (Figure 5D), suggesting that the contribution of β-Ala to the virulence of Salmonella is only partially dependent on zinc acquisition. We will reword the related description throughout the manuscript for clarity.

      Second, the authors hint at their newly described mechanism/pathway being important for disease and possibly a target for therapeutics. This claim is not justified given that they have employed a single STM strain, which was isolated from chickens and is not even a clinical isolate. The authors could enhance the impact of their findings and relevance to human disease by demonstrating it occurs in human clinical isolates and possibly other serovars. Further, the use of mouse macrophage as a model, and mice, have limited translatability to human STM infections.

      We thank you for your comments and advice regarding our manuscript and are delighted to accept them.

      You are right that our current findings are relatively limited and not sufficient to support therapeutic claims. We will reword the related description throughout the manuscript. Based on this comment, we will also use Salmonella Typhi and human macrophages to perform additional experiments to extend our findings. Salmonella Typhi is a human-restricted Salmonella serovar and the cause of typhoid fever, a severe, lethal systemic disease. Salmonella Typhimurium (STM) causes a systemic disease in mice that resembles typhoid fever in humans, and it has been widely used to explore the pathogenesis of Salmonella.

      Reviewer #2 (Public review):

      Summary:

      Salmonella exploits host- and bacteria-derived β-alanine to efficiently replicate in host macrophages and cause systemic disease. β-alanine executes this by increasing the expression of zinc transporter genes and therefore the uptake of zinc by intracellular Salmonella.

      Strengths:

      The experiments designed are thorough and the claims made are directly related to the outcome of the experiments. No overreaching claims were made.

      Weaknesses:

      A little deeper insight was expected, particularly towards the mechanistic aspects. For example, zinc transport was found to be the cause of the b-alanine-mediated effect on Salmonella intracellular replication. It would have been very interesting to see which are the governing factors that may get activated or inhibited due to Zn accumulation that supports such intracellular replication.

      We appreciate your review and advice. We will design and perform additional experiments to further investigate the mechanisms by which β-Ala, panD and zinc influence Salmonella infection, according to your suggestions. For example, we will detect the content of zinc during Salmonella infection in vivo and in vitro.

      Reviewer #3 (Public review):

      Summary:

      Salmonella is interesting due to its life within a compact compartment, which we call SCV or Salmonella containing vacuole in the field of Salmonella. SCV is a tight-fitting vacuole where the acquisition of nutrients is a key factor for Salmonella. Among many nutrients, the authors focused on beta-alanine. It is also known from many other studies that Salmonella requires beta-alanine. The authors have done in vitro RAW macrophage infection assays and in vivo mouse infection assays to see the life of Salmonella in the presence of beta-alanine. They concluded that beta-alanine modulates the expression of many genes, including zinc transporters, which are required for pathogenesis.

      Strengths:

      This study made a couple of knockouts in Salmonella and did a transcriptomic investigation to understand the global gene expression pattern.

      Weaknesses:

      The following questions are unanswered:

      (1) It is not clear how the exogenous beta-alanine is taken up by macrophages.

      We thank the reviewer for the question. It is reported that β-alanine is delivered to eukaryotic cells through the TauT (SLC6A6) and PAT1 (SLC36A1) transporters (Am J Physiol Cell Physiol. 2020 Apr 1;318(4):C777-C786; Br J Pharmacol 161: 589–600, 2010; Biochim Biophys Acta 1194: 44–52, 1994). We will add this information to the revised manuscript.

      (2) It is not clear how the Beta-alanine from the cytosol of the macrophage enters the SCV.

      Thank you for pointing this out. You are right that this question remains unresolved. We will do our best to address it by reviewing the literature and by designing and performing additional experiments.

      (3) It is not clear how the beta-alanine from SCV enters the bacterial cytosol.

      Thank you for the question. We have attempted to find the transporter of β-alanine in Salmonella, but we found that the CycA transporter transports β-alanine in Escherichia coli but not in Salmonella, even though Salmonella is closely related to E. coli.

      According to your suggestion, we will perform additional experiments to verify whether BasC is involved in the transport of β-alanine into Salmonella cytosol.

      (4) There is no clarity on the utilization of exogenous beta-alanine of the host and the de novo synthesis of beta-alanine by panD of Salmonella.

      Thank you for the question. Our results showed that β-alanine concentrations were downregulated in the Salmonella-infected RAW264.7 cells, and the replication of Salmonella in RAW264.7 cells was significantly increased with the addition of β-alanine to the culture medium (RPMI) of RAW264.7 cells, implying that intracellular Salmonella use host-derived β-alanine for growth. Unfortunately, we have not found the transporter of exogenous β-alanine into Salmonella cytosol. We will perform additional experiments to verify whether BasC is involved in the transport of β-alanine into Salmonella cytosol, or search for other transporters that are responsible for the uptake of β-alanine into Salmonella.

      Upon confirming the β-alanine transporter in Salmonella, we will compare the intracellular replication and virulence of WT and the transporter mutant strain, via cell and mouse infection assays. A decrease in the replication ability and virulence of the mutant strain relative to WT would suggest that Salmonella takes up exogenous beta-alanine from the host to enhance its intracellular replication and its virulence in mice.

      We have found that the replication of the Salmonella panD mutant in macrophages and its virulence in mice were significantly decreased relative to WT, suggesting that the de novo synthesis of β-alanine is important for Salmonella intracellular replication and virulence. To further confirm that both uptake of host-derived β-alanine and de novo synthesis of β-alanine are critical for the full virulence of Salmonella, we will generate a double mutant of panD and the β-alanine transporter gene. A decrease in the replication ability and virulence of the double mutant compared with each of the single mutants would suggest that Salmonella relies on both exogenous beta-alanine from the host and de novo synthesis of β-alanine for full virulence.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The paper by Tolossa et al. presents classification studies that aim to predict the anatomical location of a neuron from the statistics of its in-vivo firing pattern. They study two types of statistics (ISI distribution, PSTH) and try to predict the location at different resolutions (region, subregion, cortical layer).

      Strengths:

      This paper provides a systematic quantification of the single-neuron firing vs location relationship.

      The quality of the classification setup seems high.

      The paper uncovers that, at the single neuron level, the firing pattern of a neuron carries some information on the neuron's anatomical location, although the predictive accuracy is not high enough to rely on this relationship in most cases.

      Thank you for your thoughtful feedback. The level of predictive accuracy offered by our current approach, while far above chance, is insufficient for electrode localization in most cases. However, we speculate that our results represent a lower limit on possible performance; future improvements are almost certain as larger datasets are generated, more diverse features of neural activity are employed, and more advanced ML tools are implemented. We note that the current performance indicates a far more reliable embedding of anatomy in spiking than would be expected from the modest statistical significance previously described in the literature. It would have been impossible to achieve this without the tremendous resources provided by the Allen Institute. In our revision, we will clarify that major performance improvements are both possible and probable.

      Weaknesses:

      As the authors mention in the Discussion, it is not clear whether the observed differences in firing are epiphenomenal. If the anatomical location information is useful to the neuron, to what extent can this be inferred from the vicinity of the synaptic site, based on the neurotransmitter and neuromodulator identities? Why would the neuron need to dynamically update its prediction of the anatomical location of its pre-synaptic partner based on activity when that location is static, and if that information is genetically encoded in synaptic proteins, etc (e.g., the type of the synaptic site)? Note that the neuron does not need to classify all possible locations to guess the location of its pre-synaptic partner because it may only receive input from a subset of locations.  If an argument on activity-based estimation being more advantageous to the neuron than synaptic site-based estimation cannot be made, I believe limiting the scope of the paper (e.g., in the Introduction) to an epiphenomenal observation and its quantification will improve the scientific quality.

      In short, in response to both reviewers, we will minimize our discussion of this question in the revision. However, given that our results are either epiphenomenal or functional, we feel that it is important to indicate these possibilities, even if this indication is succinct and conservative.

      In pursuit of a more concise revision, we will not expand our discussion to accommodate this interesting conversation with the reviewer, but we are excited to briefly offer our perspective here.

      Regarding the epiphenomenal nature of our observations: this is a complex question that would be challenging but not impossible to validate experimentally. It has been previously established that neurons, especially those that integrate inputs from a variety of regions and are involved in diverse functions, could benefit from mechanisms for dynamically parsing inputs (Gutig, Sompolinsky 2006). Neurotransmitter and neuromodulator identities may indeed convey some information about presynaptic neuron location (e.g., NE may originate from the locus coeruleus). However, hypothetically, the binding of a neurotransmitter only bears on the postsynaptic neuron via ionic current, or second messenger activity. Postsynaptic neurons do not consume or otherwise endocytose the neurotransmitter, thus the ability of a neuron to “know” the presynaptic identity is a function of induced postsynaptic activity. Certainly, there are multiple streams of information that can provide insight into anatomical location all taking the ultimate form of neural activity and membrane dynamics. This would be broadly consistent with (for example) reward prediction error which is evident in dopamine release, firing rates, spiking patterns, and oscillatory rhythms.

      We could imagine a possible role for the embedding of location in spiking patterns. It is important to note that many neurons in neighboring areas share common neurotransmitters (e.g., glutamate, GABA). Neurons receiving input from multiple regions with similar neurotransmitter profiles could benefit from additional information in the spiking patterns for distinguishing input sources, especially for multimodal integration. For instance, an inferior parietal lobule neuron or microcircuit could be downstream from both auditory cortex (listening) and Broca’s area (speaking). Imagine an individual is in a crowded coffee shop waiting for their drink order to be called while speaking to their friend. In this scenario, it may be important to recognize region-specific activity and thus selectively attend to it. Thus, it is unlikely that neurons actively update a “location prediction,” but rather that location-related information is passively embedded in spike patterning and this might be dynamically leveraged in computation. We emphasize that this is a simplified conceptual example and not a hypothesis that we test in the paper. This conversation, however, is a wonderful example of the thought experiments that we hope will grow from this type of work.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Tolossa et al. analyze Inter-spike intervals from various freely available datasets from the Allen Institute and from a dataset from Steinmetz et al. They show that they can modestly decode between gross brain regions (Visual vs. Hippocampus vs. Thalamus), and modestly separate sub-areas within brain regions (DG vs. CA1 or various visual brain areas).

      Strengths:

      The paper is reasonably well written, and the definitions are quite well done. For example, the authors clearly explained transductive vs. inductive inference in their decoders. E.g., transductive learning allows the decoder to learn features from each animal, whereas inductive inference focuses on withheld animals and prioritizes the learning of generalizable features.

      Thank you!

      Weaknesses:

      However, even with some of these positive aspects, I still found the manuscript to be a laundry list of results, where some results are overly explained and not particularly compelling or interesting, whereas interesting results are not strongly described or emphasized. The overall problem is that the study is not cohesive, and the authors need to either come up with a tool or demonstrate a scientific finding. The current version attempts to split the middle and thus is not as impactful as it could be.

      In our revision, we will endeavor to present our results in line with your suggestions. Thank you for the careful and thorough feedback that will improve the readability of our manuscript. We strove to be complete in establishing the logic leading to our ultimate finding: that a robust code for anatomical location can be extracted from single neuron spike trains, but not from more traditional descriptions of neural activity. Our detection of this code, albeit not perfect in performance, is, in most cases, both far above chance levels and robust to animal identity and laboratory of origin. Our presentation of these results is cohesive inasmuch as we sequentially establish a series of results that build towards a concluding set of experiments. We start by establishing a baseline via standard measurements and then explore more challenging problems through more complex models that build toward our final test. Based on your feedback, we will contract and expand elements of this sequence.

      While our findings raise the possibility of developing a computational tool for electrode localization, pending additional features and/or datasets, our current focus is on establishing the neurobiological principle of anatomical embedding in spike trains. The purpose of briefly mentioning a possible application is that we hope to encourage those engaged in machine-learning on multi-modal neural data that this problem is tractable, yet still open. Based on your feedback, we will clarify that the focus of our current work is not an introduction of a new tool.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Mitotic kinesins carry out crucial roles in intracellular motility and mitotic spindle organization. Although many mitotic kinesins have been extensively studied, a few conserved mitotic motors remain poorly explored, including chromosome-associated kinesins. Here, Furusaki et al reconstitute recombinant chromosome-associated kinesin or chromokinesin (Kid) and reveal processive plus-end motility along microtubules. The authors purify multiple versions of Kid, revealing dimeric organization and their processive microtubule plus-ended motility which depends on their conserved motor domains, neck linkers, and coiled-coil regions. The study reveals for the first time that KID can recruit and transport duplex DNA along microtubules using its conserved C-terminal DNA binding domain. The work provides crucial revised thinking about the mechanisms of chromokinesins in mitosis as physical processive motors that mobilize chromosomes towards the microtubule plus ends in early metaphase.

      Strengths: 

      The authors reconstitute multiple chromosome-associated kinesin (KID) orthologs from Xenopus and humans with microtubules and determine their oligomerization. The study shows how coiled-coil and neck linker regions of KID are essential for its function as their deletion leads to non-processive motility. Chimeras placing the KID coiled-coil and neck linker on the KIF1A motor domain led to the production of a processive recombinant motor supporting the compatibility of their motility mechanisms. The KID C-terminal tail binds and transports only double-stranded DNA and its deletion or single-stranded DNA leads to defects in this activity.

      Thank you very much.

      Weaknesses: 

      A minor weakness in the studies is that they do not resolve the mechanisms of KID in binding large duplex DNA molecules or condensed chromatin. The authors suggest a model in which KID forms multimers along large chromosomes that lead to their transport, but this model was not directly tested. 

      Thank you very much for your suggestion.

      We will attempt to observe the movement of longer dsDNA and/or DNA-bead complexes and compare their motility with that of a single KID motor to elucidate the cooperativity of the motor protein.

      Reviewer #2 (Public review): 

      Summary: 

      Previous work in the field highlighted the role of the kinesin-10 motor protein Kid (KIF22) in the polar ejection force during prometaphase. However, the biochemical and biophysical properties of Kid that enabled it to serve in this role were unclear. The authors demonstrate that human and Xenopus Kid proteins are processive kinesins that function as homodimeric molecules. The data are solid and support the findings although the text could use some editing to improve clarity.

      Strengths: 

      A highlight of the work is the reconstitution of DNA transport in vitro. 

      A second highlight is the demonstration that the monomer vs dimer state is dependent on protein concentration. 

      Thank you very much.

      Weaknesses: 

      The authors make several assumptions of the monomer vs dimer state of various Kid constructs without verifying the protein state using e.g. size exclusion chromatography and/or nanophotometry. They also make statements about monomer-to-dimer transitions on the microtubule without showing or quantifying the data. 

      As the reviewer suggests, the monomer-to-dimer transition on the microtubule is speculative. What we can measure in our hands are (1) the monomer-to-dimer ratio in solution and (2) particle movement on microtubules. At pmol/L concentrations, Kid is monomeric in solution but exhibits processive movement on microtubules. Dimerization is generally required for processivity. Therefore, we suggest that Kid forms a dimer on microtubules.

      To show that Kid forms a dimer on microtubules, we will perform photobleaching assays and measure the fluorescent intensities of each particle on microtubules to determine their oligomeric state.

      The discussion needs to better put the work into context regarding the ability of non-processive motors to work in teams (formerly thought to be the case for Kid) and how their findings on Kid change this prevailing view in the case of polar ejection force. 

      We will look for examples of non-processive motors that work in teams and cite them in the Discussion. As described by this reviewer, Kid was originally thought to be a non-processive motor. We hope that our current work will change that view.

      The authors also do not mention previous work on kinesins with non-conventional neck linker/neck coil regions that have been shown to move processively. Their work on Kid needs to be put into this context.

      We had thought that most kinesins belonging to the cargo-transport classes have conserved neck linker and neck coil domains, with Kid being an exception. We will search for more citations, including non-transport classes of kinesins, and rewrite the Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      First of all, we would like to thank the reviewers for their very constructive comments, which helped us to improve the manuscript! In response to the raised issues, we have performed new experiments and made the necessary changes to the manuscript.

      eLife Assessment

      The study describes a valuable new technology in the field of targeted protein degradation that allows identification of E3-ubiquitin ligases that target a protein of interest. The presented data are convincing, however, it is unclear whether the proposed system can be successfully used in high throughput applications. This technology will serve the community in the initial stages of developing targeted protein degraders.

      We thank the eLife editors for the positive assessment and have clarified the scalability of our system for high throughput applications in the revised manuscript (see our response to both reviewers’ comments on weakness point 1).

      Reviewer #1 (Public Review):

      Summary:

      PROTACs are heterobifunctional molecules that utilize the Ubiquitin Proteasome System to selectively degrade target proteins within cells. Upon introduction to the cells, PROTACs capture the activity of the E3 ubiquitin ligases for ubiquitination of the targeted protein, leading to its subsequent degradation by the proteasome. The main benefit of PROTAC technology is that it expands the "druggable proteome" and provides numerous possibilities for therapeutic use. However, there are also some difficulties, including the one addressed in this manuscript: identifying suitable target-E3 ligase pairs for successful degradation. Currently, only a few out of about 600 E3 ligases are used to develop PROTAC compounds, which creates the need to identify other E3 ligases that could be used in PROTAC synthesis. Testing the efficacy of PROTAC compounds has been limited to empirical tests, leading to lengthy and often failure-prone processes. This manuscript addressed the need for faster and more reliable assays to identify the compatible pairs of E3 ligases-target proteins. The authors propose using the RiPA assay, which depends on rapamycin-induced dimerization of FKBP12 protein with FRB domain. The PROTAC technology is advancing rapidly, making this manuscript both timely and essential. The RiPA assay might be useful in identifying novel E3 ligases that could be utilized in PROTAC technology. Additionally, it could be used at the initial stages of PROTAC development, looking for the best E3 ligase for the specific target.

      The authors described an elegant assay that is scalable, easy-to-use, and applicable to a wide range of cellular models. This method allows for the quantitative validation of the degradation efficacy of a given pair of E3 ligase-target proteins, using luciferase activity as a measure. Importantly, the assay also enables the measurement of kinetics in living cells, enhancing its practicality.

      Strengths:

      (1) The authors have addressed the crucial needs that arise during PROTAC development. In the introduction, they nicely describe the advantages and disadvantages of the PROTAC technology and explain why such an assay is needed.

      (2) The study includes essential controls in experiments (important for generating new assay), such as using the FRB vector without E3 ligase as a negative control, testing different linkers (which may influence the efficacy of the degradation), and creating and testing K-less vectors to exclude the possibility of luciferase or FKBP12 ubiquitination instead of WDR5 (the target protein). Additionally, the position of the luc in the FKBP12 vector and the position of VHL in the FRB vector are tested. Different E3 ligases are tested using previously identified target proteins, confirming the assay's utility and accuracy.

      (3) The study identified a "new" E3 ligase that is suitable for PROTAC technology (FBXL).

      We greatly appreciate the reviewer’s positive feedback on our work. To evaluate our system further, in our revised manuscript we have conducted additional analysis on KRASG12D degradation via VHL and CRBN within our K-less system. Consistent with previous findings of VHL-harnessing PROTACs, our assay demonstrated that VHL mediated efficient degradation of KRASG12D while CRBN induced only a minor effect. This new data is presented in Figure 2 - figure supplement 1C of the revised manuscript.

      Weaknesses:

      · It is not clear how feasible it would be to adapt the assay for high-throughput screens.

      Our study is designed around a well-based assay. It is therefore possible, but not realistic, to evaluate all 600+ human E3 ligases. Nonetheless, for those interested in all E3 ligases, our assay could be adapted for pooled experimental strategies, as demonstrated in Poirson, J., Cho, H., Dhillon, A. et al., Nature 628, 878–886 (2024).

      Our system offers several advantages over pooled screens, including the generation of more quantitative data and faster testing of selected candidates. Pooled screens, by contrast, require more time due to the necessity of next-generation sequencing and bioinformatics analysis. Moreover, in response to the reviewers’ comments, we have included a schematic in the revised manuscript (Figure 4 - figure supplement 1A) that outlines the assay duration and hands-on time for target and E3 ligase candidates.

      · In some experiments, the efficacy of WDR5 degradation tested by immunoblotting appears to be lower than luciferase activity (e.g., Figure 2G and H).

      We concur with the reviewer that in some instances, the degradation observed via immunoblotting appears lower than that indicated by luciferase activity. We have therefore quantified the western blots and added the quantification to the respective blots. This discrepancy may result from the non-linearity of western blots.

      Reviewer #2 (Public Review):

      Summary:

      Adhikari and colleagues developed a new technique, rapamycin-induced proximity assay (RiPA), to identify E3-ubiquitin (ub) ligases of a protein target, aiming at identifying additional E3 ligases that could be targeted for PROTAC generation or ligases that may degrade a protein target. The study is timely, as expanding the landscape of E3-ub ligases for developing targeted degraders is a primary direction in the field.

      Strengths:

      The study's strength lies in its practical application of the FRB:FKBP12 system. This system is used to identify E3-ub ligases that would degrade a target of interest, as evidenced by the reduction in luminescence upon the addition of rapamycin. This approach effectively mimics the potential action of a PROTAC.

      We are delighted with this assessment of our work by the reviewer. To evaluate our system further, in our revised manuscript we have conducted additional analysis on KRASG12D degradation via VHL and CRBN within our K-less system. Consistent with previous findings of VHL-harnessing PROTACs, our assay demonstrated that VHL mediated efficient degradation of KRASG12D while CRBN induced only a minor effect. This new data is presented in Figure 2 - figure supplement 1C of the revised manuscript.

      Weaknesses:

      (1) While the technique shows promise, its application in a discovery setting, particularly for high-throughput or unbiased E3-ub ligase identification, may pose challenges. The authors should provide more detailed insights into these potential difficulties to foster a more comprehensive understanding of RiPA's limitations.

      Our study is designed around a well-based assay. It is therefore possible, but not realistic, to evaluate all 600+ human E3 ligases. Nonetheless, for those interested in all E3 ligases, our assay could be adapted for pooled experimental strategies, as demonstrated in Poirson, J., Cho, H., Dhillon, A. et al., Nature 628, 878–886 (2024).

      Our system offers several advantages over pooled screens, including the generation of more quantitative data and faster testing of selected candidates. Pooled screens, by contrast, require more time due to the necessity of next-generation sequencing and bioinformatics analysis. Moreover, in response to the reviewers’ comments, we have included a schematic in the revised manuscript (Figure 4 - figure supplement 1A) that outlines the assay duration and hands-on time for target and E3 ligase candidates.

      We also added the following sentences to the Limitations of the study section of the revised manuscript (line 322-326): “While our system offers easy testing of different tagging approaches and due to its simple workflow facilitates the rapid characterization of novel E3 ligases across multiple targets, it is currently not optimized for high-throughput evaluation of all 600+ E3 ligases. Achieving such scale would necessitate further adaptations, including the incorporation of pooled experimental strategies.”

      (2) While RiPA will help identify E3 ligases, PROTAC design would still be empirical. The authors should discuss this limitation. Could the technology be applied to molecular glue generation?

      We agree with the reviewer that our assay rationalizes the choice of E3 ligases but that PROTAC design (“linkerology”) is still mostly empirical. To address this, we included the following line in the Limitations of the study section of our initial manuscript (line 327-330): “Conversely, it is also conceivable that an E3 ligase that can efficiently decrease the levels of a particular target in the RiPA setting may be less suitable for PROTACs, since PROTACs that mimic the steric interaction of the target/E3 pair may not be easily identified in the chemical space.”

      Regarding molecular glues, our assay could also be instrumental in identifying suitable E3 ligases for a target protein prior to screening for molecular glues, provided that the screening system specifically screens E3 ligase and target pairs. However, as most molecular glue screens are currently agnostic to specific E3 ligases or targets, our system may not be applicable in those cases. We have elaborated on this in the discussion section of the revised manuscript (line 271-274): “We envision that this setting will be valuable for identifying the most suitable E3 ligase candidates for PROTACs aimed at specific proteins, and for guiding E3 ligase selection when screening for molecular glues targeting specific E3 ligase and protein pairs.”

      (3) Controls to verify the intended mechanism of action are missing, such as using a proteasome inhibitor or VHL inhibitors/siRNA to verify on-target effects. Verification of the target E3 ligase complex after rapamycin addition via orthogonal approaches, such as IP, should be considered.

      We thank the reviewer for the comment. VHL siRNA in particular is not informative in this setup, as we overexpress the E3 ligase rather than relying on the endogenous protein.

      To verify the mechanism of action, we performed additional experiments in the presence of the proteasomal inhibitor MG132 and the neddylation inhibitor MLN4924 with the target KRASG12D and the E3 ligase VHL. The results are shown in Figure 2H of the revised manuscript.

      Minor concern:

      The graphs in Figure 1E are missing.

      We thank the reviewer for pointing this out. We corrected the figure in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      •  Optionally, the authors could add control experiments with Aurora B and Crb vectors (there shouldn't be any degradation) and experiments confirming that the degradation occurs via the proteasome. For example, the addition of proteasome inhibitors (such as bortezomib) should decrease the efficiency of the target degradation and confirm that targets are degraded via the proteasome system.

      Regarding Aurora-B degradation, as far as we know, there are no specific Aurora-B PROTACs reported. Thus, there is no definitive evidence that CRBN could not degrade Aurora-B. Nevertheless, we performed assays with Aurora-B and VHL, CRBN, or FRB, and observed more effective degradation of Aurora-B by VHL than CRBN. This data is now included in Figure 2 - figure supplement 1B of the revised manuscript.

      • It would also be helpful to provide a possible explanation for why the ratio 1:1 of vectors did not induce the degradation (regarding Figure 1D).

      We believe the lack of degradation with 1:1 vector ratio is due to the differential expression levels of endogenous FKBP12 and mTOR in HEK293 cells. According to Human Protein Atlas, the normalized protein-coding transcripts per million (nTPM) for FKBP12 and mTOR in HEK293 cells are 160 and 24 respectively, indicating that FKBP12 is expressed at levels approximately 6.7 times higher than mTOR. This disparity likely limits the heterodimerization of exclusively fusion proteins upon rapamycin addition. To increase the likelihood of FKBP12 and FRB fusion protein dimerization, we used a higher ratio of the FRB component during transfection, considering the higher endogenous expression of FKBP12.
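
      The fold difference quoted above follows directly from the two transcript values. A minimal check, using only the nTPM numbers given in the response (the variable names are illustrative):

```python
# Transcript levels quoted in the response (Human Protein Atlas, HEK293 cells).
fkbp12_ntpm = 160
mtor_ntpm = 24

# Fold difference in endogenous transcript abundance (FKBP12 relative to mTOR).
fold_difference = fkbp12_ntpm / mtor_ntpm
print(f"FKBP12 / mTOR = {fold_difference:.1f}")
```

      Note that this compares transcript abundance only; it says nothing about protein levels or about the expression of the transfected fusion constructs.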

      • It would be helpful to add more explanation for the data in Figure 1F, including whether there is a difference between vectors with different positions of VHL and FRB and why the FRB-VHL vector is less expressed without rapamycin.

      We thank the reviewer for the comment. Regarding the vector orientations of VHL/FRB and WDR5/Luc/FKBP12, we have consistently observed different migration behaviors for WDR5 and VHL constructs, despite their identical molecular weights. This observation aligns with literature reports where differential running behavior is noted when FRB or FKBP12 (or their mutants) are tagged to the N- or C-terminus of a protein (Bondeson, D.P., Mullin-Bernstein, Z., Oliver, S. et al. Nat Commun 13, 5495 (2022); Mabe, S., Nagamune, T. & Kawahara, M. Sci Rep 4, 6127 (2014)). We have now included the following explanation in the figure legend of Figure 1F of the revised manuscript: “WDR5 and VHL fusion proteins tagged at the N- and C-terminal show different migration behaviors despite having same molecular weight.”

      Additionally, the stabilizing effect of rapamycin on FRB (or its mutants), FRB fusion proteins, and FRB-containing proteins has been documented (Stankunas, K., Bayle, J.H., Havranek, J.J. et al. ChemBioChem, 8(10), 1162-1169 (2007); Stankunas, K., Bayle, J.H., Gestwicki J.E. et al. Mol Cell, 12(6), 1615–1624 (2003); Zhang, C., Cui, M., Cui, Y. et al. J. Vis. Exp. (150), e59656 (2019)). We believe that the degree of stabilization by rapamycin could differ between N- and C-terminal FRB fusion proteins.

      • Finally, the mistake in Figure 2G (where the lanes are wrongly labelled, BRBN-FRB and FRB) should be corrected. Also please correct the graph in Figure 1E (there seems to be a problem with bars for 1:100). There are some typos, such as in lines 38, 277, and 288.

      Thank you for bringing this to our attention. We have corrected all the mentioned errors.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In this work, the authors examine the activity and function of D1 and D2 MSNs in dorsomedial striatum (DMS) during an interval timing task. In this task, animals must first nose poke into a cued port on the left or right; if not rewarded after 6 seconds, they must switch to the other port. Thus, this task requires animals to estimate if at least 6 seconds have passed after the first nose poke. After verifying that animals estimate the passage of 6 seconds, the authors examine striatal activity during this interval. They report that D1-MSNs tend to decrease activity, while D2-MSNs increase activity, throughout this interval. They suggest that this activity follows a drift-diffusion model, in which activity increases (or decreases) to a threshold after which a decision is made. The authors next report that optogenetically inhibiting D1 or D2 MSNs, or pharmacologically blocking D1 and D2 receptors, increased the average wait time. This suggests that both D1 and D2 neurons contribute to the estimate of time, with a decrease in their activity corresponding to a decrease in the rate of 'drift' in their drift-diffusion model. Lastly, the authors examine MSN activity while pharmacologically inhibiting D1 or D2 receptors. The authors observe that most recorded MSNs decrease their activity over the interval, with the rate decreasing with D1/D2 receptor inhibition.

      We appreciate the careful read by this reviewer. 

      Major strengths: 

      The study employs a wide range of techniques - including animal behavioral training, electrophysiology, optogenetic manipulation, pharmacological manipulations, and computational modeling. The question posed by the authors - how striatal activity contributes to interval timing - is of importance to the field and has been the focus of many studies and labs. This paper contributes to that line of work by investigating whether D1 and D2 neurons have similar activity patterns during the timed interval, as might be expected based on prior work based on striatal manipulations. However, the authors find that D1 and D2 neurons have distinct activity patterns. They then provide a decision-making model that is consistent with all results. The data within the paper is presented very clearly, and the authors have done a nice job presenting the data in a transparent manner (e.g., showing individual cells and animals). Overall, the manuscript is relatively easy to read and clear, with sufficient detail given in most places regarding the experimental paradigm or analyses used. 

      We are glad that our main points come clearly through.

      Major weaknesses: 

      One weakness to me is the impact of identifying whether D1 and D2 had similar or different activity patterns. Does observing increasing/decreasing activity in D2 versus D1, or different activity patterns in D1 and D2, support one model of interval timing over another, or does it further support a more specific idea of how DMS contributes to interval timing? 

      This is a great point - we were not clear. We observed distinct patterns of D2-MSN and D1-MSN activity, yet disrupting either D2-MSNs or D1-MSNs led to increased response time. This supports a model in which D2-MSN and D1-MSN ensemble activity represents temporal evidence. This is a very specific model that can be rigorously tested in future work. We have now made this very clear in the abstract (Page 2).

      “We found that D2-MSNs and D1-MSNs exhibited distinct dynamics over temporal intervals as quantified by principal component analyses and trial-by-trial generalized linear models. MSN recordings helped construct and constrain a four-parameter drift-diffusion computational model in which MSN ensemble activity represented the accumulation of temporal evidence. This model predicted that disrupting either D2-MSNs or D1-MSNs would increase interval timing response times and alter MSN firing. In line with this prediction, we found that optogenetic inhibition or pharmacological disruption of either D2-MSNs or D1-MSNs increased interval timing response times.”

      And in the results on Page 18:  

      “Because both D2-MSNs and D1-MSNs accumulate temporal evidence, disrupting either MSN type in the model changed the slope. The results were obtained by simultaneously decreasing the drift rate $D$ (equivalent to lengthening the neurons’ integration time constant) and lowering the level of network noise $\sigma$: $D = 0.129$, $\sigma = 0.043$ for D2-MSNs in Fig 4A (in red; changes in noise had to accompany changes in drift rate to preserve switch response time variance. See Methods); and $D = 0.122$, $\sigma = 0.043$ for D1-MSNs in Fig 4B (in blue). The model predicted that disrupting either D2-MSNs or D1-MSNs would increase switch response times (Fig 4C and Fig 4D) and would shift MSN dynamics.”
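
      To make the role of the drift rate concrete, the following is a minimal simulation sketch, not the authors' implementation. The response does not state the model equation, so the form dx = (F - x)·D·dt + σ·dW with x(0) = 0 and a response at threshold b is an assumption, chosen because it makes 1/D act as an integration time constant. The function name, time step, and trial cap are illustrative; D, σ, F, and b are the values quoted in the response.

```python
import numpy as np


def simulate_switch_times(D, sigma, F=1.0, b=0.52, dt=0.01, t_max=60.0,
                          n_trials=500, seed=0):
    """Simulate first-passage ("switch") times of a drift-diffusion variable.

    ASSUMED form (not stated in the response): dx = (F - x) * D * dt + sigma * dW,
    with x(0) = 0 and a response when x first reaches the threshold b.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_max / dt))
    x = np.zeros(n_trials)
    times = np.full(n_trials, np.nan)          # NaN = threshold never reached
    active = np.ones(n_trials, dtype=bool)     # trials still accumulating
    for step in range(1, n_steps + 1):
        # Draw noise for every trial so that runs sharing a seed are paired.
        noise = rng.standard_normal(n_trials) * np.sqrt(dt)
        x[active] += (F - x[active]) * D * dt + sigma * noise[active]
        crossed = active & (x >= b)
        times[crossed] = step * dt
        active &= ~crossed
        if not active.any():
            break
    return times


if __name__ == "__main__":
    d2_disrupted = simulate_switch_times(D=0.129, sigma=0.043)
    d1_disrupted = simulate_switch_times(D=0.122, sigma=0.043)
    print("mean switch time, D2-MSN disruption:", np.nanmean(d2_disrupted))
    print("mean switch time, D1-MSN disruption:", np.nanmean(d1_disrupted))
```

      Under this assumed form, lowering D slows the approach to the threshold, so simulated switch times lengthen, which is the direction of the effect described in the quoted passage.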

      And in the discussion (Page 30): 

      “Striatal MSNs are critical for temporal control of action (Emmons et al., 2017; Gouvea et al., 2015; Mello et al., 2015). Three broad models have been proposed for how striatal MSN ensembles represent time: 1) the striatal beat frequency model, in which MSNs encode temporal information based on neuronal synchrony (Matell and Meck, 2004); 2) the distributed coding model, in which time is represented by the state of the network (Paton and Buonomano, 2018); and 3) the DDM, in which neuronal activity monotonically drifts toward a threshold after which responses are initiated (Emmons et al., 2017; Simen et al., 2011; Wang et al., 2018). While our data do not formally resolve these possibilities, our results show that D2-MSNs and D1-MSNs exhibit opposing changes in firing rate dynamics in PC1 over the interval. Past work by our group and others has demonstrated that PC1 dynamics can scale over multiple intervals to represent time (Emmons et al., 2020, 2017; Gouvea et al., 2015; Mello et al., 2015; Wang et al., 2018). We find that low-parameter DDMs account for interval timing behavior with both intact and disrupted striatal D2- and D1-MSNs. While other models can capture interval timing behavior and account for MSN neuronal activity, our model does so parsimoniously with relatively few parameters (Matell and Meck, 2004; Paton and Buonomano, 2018; Simen et al., 2011). We and others have shown previously that ramping activity scales to multiple intervals, and DDMs can be readily adapted by changing the drift rate (Emmons et al., 2017; Gouvea et al., 2015; Mello et al., 2015; Simen et al., 2011). Interestingly, decoding performance was high early in the interval; indeed, animals may have been focused on this initial interval (Balci and Gallistel, 2006) in making temporal comparisons and deciding whether to switch response nosepokes.”

      Regarding the reviewer’s specific question – it is not clear why D1-MSNs and D2-MSNs have opposing patterns of activity, as integration of temporal evidence can certainly be achieved by increasing or decreasing firing rates alone. These patterns have been seen in motor control. Prefrontal neurons, which control striatal ramping, also ramp up and down. We have now included a paragraph on Page 30 explicitly discussing these ideas; however, future experiments will be required to investigate the source of the divergent patterns of activity among D2-MSNs and D1-MSNs.

      “D2-MSNs and D1-MSNs play complementary roles in movement. For instance, stimulating D1-MSNs facilitates movement, whereas stimulating D2-MSNs impairs movement (Kravitz et al., 2010). Both populations have been shown to have complementary patterns of activity during movements with MSNs firing at different phases of action initiation and selection (Tecuapetla et al., 2016). Further dissection of action selection programs reveals that opposing patterns of activation among D2-MSNs and D1-MSNs suppress and guide actions, respectively, in the dorsolateral striatum (Cruz et al., 2022). A particular advantage of interval timing is that it captures a cognitive behavior within a single dimension: time. When projected along the temporal dimension, it was surprising that D2-MSNs and D1-MSNs had opposing patterns of activity. Ramping activity in the prefrontal cortex can increase or decrease; and prefrontal neurons project to and control striatal ramping activity (Emmons et al., 2020, 2017; Wang et al., 2018). It is possible that differences in D2-MSNs and D1-MSNs reflect differences in cortical ramping, which may themselves reflect more complex integrative or accumulatory processes. Further experiments are required to investigate these differences. Past pharmacological work from our group and others has shown that disrupting D2- or D1-MSNs slows timing (De Corte et al., 2019b; Drew et al., 2007, 2003; Stutt et al., 2024) and are in agreement with pharmacological and optogenetic results in this manuscript. Computational modeling predicted that disrupting either D2-MSNs or D1-MSNs increased self-reported estimates of time, which was supported by both optogenetic and pharmacological experiments.”

      I found the results presented in Figures 2 and 3 to be a little confusing or misleading. In Figure 2, the authors appear to claim that D1 neurons decrease their activity over the time interval while D2 neurons increase activity. The authors use this result to suggest that D1/D2 activity patterns are different. In Figure 3, a different analysis is done, and this time D2 neurons do not significantly increase their activity with time, conflicting with Figure 2. While in both figures, there is a significant difference between the mean slopes across the population, the secondary effect of positive/negative slope for D2/D1 neurons changes. I find this especially confusing as the authors refer back to the positive/negative slope for D2/D1 neurons result throughout the rest of the text.  

      We were not clear. First, we attempted to quantify these differences based on PCA and slope. We have rephrased our characterization of these differences by changing the text (Page 9) to:

      “These PETHs revealed that for the 6-second interval immediately after trial start, many putative D2-MSN neurons appeared to ramp up while many putative D1-MSNs appeared to ramp down. For 32 putative D2-MSNs average PETH activity increased over the 6-second interval immediately after trial start, whereas for 41 putative D1-MSNs, average PETH activity decreased. Accordingly, D2-MSNs and D1-MSNs had differences in activity early in the interval (0-5 seconds; F = 4.5, p = 0.04 accounting for variance between mice) but not late in the interval (5-6 seconds; F = 1.9, p = 0.17 accounting for variance between mice). Examination of a longer interval of 10 seconds before to 18 seconds after trial start revealed the greatest separation in D2-MSN and D1-MSN dynamics during the 6-second interval after trial start (Fig S2). Strikingly, these data suggest that D2-MSNs and D1-MSNs might display distinct dynamics during interval timing.” 

      We have rephrased our discussion on PCA to quantify differences in Fig 2G-H using data-driven methods (Page 12): 

      “To quantify differences between D2-MSNs vs D1-MSNs in Fig 2G-H, we turned to principal component analysis (PCA), a data-driven tool to capture the diversity of neuronal activity (Kim et al., 2017a). Work by our group and others has uniformly identified PC1 as a linear component among corticostriatal neuronal ensembles during interval timing (Bruce et al., 2021; Emmons et al., 2020, 2019, 2017; Kim et al., 2017a; Narayanan et al., 2013; Narayanan and Laubach, 2009; Parker et al., 2014; Wang et al., 2018). We analyzed PCA calculated from all D2-MSN and D1-MSN PETHs over the 6-second interval immediately after trial start. PCA identified time-dependent ramping activity as PC1 (Fig 3A), a key temporal signal that explained 54% of variance among tagged MSNs (Fig 3B; variance for PC1 p = 0.009 vs 46 (44-49)% for any pattern of PC1 variance derived from random data; Narayanan, 2016). Consistent with population averages from Fig 2G&H, D2-MSNs and D1-MSNs had opposite patterns of activity with negative PC1 scores for D2-MSNs and positive PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-2.8 – 4.9); F = 8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F = 0.44, p = 0.51) or switching direction (F = 1.73, p = 0.19)).”
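
      The PC1 analysis described in this passage can be illustrated with a small sketch on synthetic data. This is an assumption-laden toy example, not the authors' pipeline: the PETHs here are simulated linear ramps with Gaussian noise, the function name is hypothetical, and the sign of PC1 returned by an SVD is arbitrary, so only the relative sign of the two groups' scores is meaningful.

```python
import numpy as np


def pca_first_component(peths):
    """PCA via SVD on a (n_neurons, n_time_bins) matrix of PETHs.

    Returns the PC1 time course, each neuron's PC1 score, and the fraction
    of variance explained by PC1. Columns (time bins) are mean-centered
    across neurons, the usual convention when neurons are observations.
    """
    centered = peths - peths.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    pc1 = vt[0]
    scores = centered @ pc1
    return pc1, scores, explained[0]


# Synthetic stand-in data (NOT recorded data): 30 up-ramping and
# 30 down-ramping neurons over a 6-second interval, plus noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 6.0, 60)
ramp = t / 6.0 - 0.5
up = ramp + 0.1 * rng.standard_normal((30, t.size))
down = -ramp + 0.1 * rng.standard_normal((30, t.size))

pc1, scores, frac = pca_first_component(np.vstack([up, down]))
print("fraction of variance explained by PC1:", frac)
print("mean PC1 score, up-ramping group:", scores[:30].mean())
print("mean PC1 score, down-ramping group:", scores[30:].mean())
```

      In this toy case the two groups receive PC1 scores of opposite sign, which is the kind of separation the passage reports between D2-MSNs and D1-MSNs.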

      And finally, we directly investigate the heart of the reviewer’s question by explicitly comparing PC1 scores – a data-driven analysis of neuronal patterns that explain the most variance – and show that they are less than 0 for D2-MSNs (i.e., negatively correlated with a down-ramping pattern, or ramping up), and greater than 0 for D1-MSNs (i.e., positively correlated with a down-ramping pattern, or ramping down):

      “Importantly, PC1 scores for D2-MSNs were significantly less than 0 (signrank D2-MSN PC1 scores vs 0: p = 0.02), implying that because PC1 ramps down, D2-MSNs tended to ramp up. Conversely, PC1 scores for D1-MSNs were significantly greater than 0 (signrank D1-MSN PC1 scores vs 0: p = 0.05), implying that D1-MSNs tended to ramp down. Thus, analysis of PC1 in Fig 3A-C suggested that D2-MSNs (Fig 2G) and D1-MSNs (Fig 2H) had opposing ramping dynamics.”

      We interpret these data on Page 16: 

      “Our analysis of average activity (Fig 2G-H) and PC1 (Fig 3A-C) suggested that D2-MSNs and D1-MSNs might have opposing dynamics. However, past computational models of interval timing have relied on drift-diffusion dynamics that increases over the interval and accumulates evidence over time (Nguyen et al., 2020; Simen et al., 2011).”

      The reviewer mentions our analysis of ‘mean slopes across the population’, which we clarify as trial-by-trial slope analysis, distinct from the population averages in 2G-H and 3A-C. We have now made this clear (Page 12).

      “To interrogate these dynamics at a trial-by-trial level, we calculated the linear slope of D2-MSN and D1-MSN activity over the first 6 seconds of each trial using generalized linear modeling (GLM) of effects of time in the interval vs trial-by-trial firing rate (Latimer et al., 2015).  Note that this analysis focuses on each trial rather than population averages in Fig 2G-H and Fig 3A-C.”

      Finally, as the reviewer suggests, we have removed the term ‘slope’ from the rest of the paper, as the increasing/decreasing comes from averages and analyses of PC1.  We have removed all discussion of ‘opposing’ slope or ‘increasing/decreasing’ slope. 

      It is a bit unclear to me how the authors chose the parameters for the model, and how well the model explains behavior is quantified. It seems that the authors didn't perform cross-validation across trials (i.e., they chose parameters that explained behavior across all trials combined, rather than choosing parameters from a subset of trials and determining whether those parameters are robust enough to explain behavior on held-out trials). I think this would increase the robustness of the result. 

      In addition, it remains a bit unclear to me how the authors changed the specific parameters they did to model the optogenetic manipulation. It seems these parameters were chosen because they fit the manipulation data. This makes me wonder if this model is flexible enough that there is almost always a set of parameters that would explain any experimental result; in other words, I'm not sure this model has high explanatory power. 

      We are glad the reviewer raised these points.  First, we have now included a complete exploration of the parameter space, exactly as the reviewer recommends.  These are described in the methods (Page 41): 

      “Selection of DDMs parameters. Our goal was to build DDMs with dynamics that produce “response times” according to the observed distribution of mice switch times. The selection of parameter values in Fig 4 was done in three steps. First, we fit the distribution of the mice behavioral data with a Gamma distribution and found its fitting values for shape $\alpha_M$ and rate $\beta_M$ (Table S2 and Fig S8; $R^2$ Data vs Gamma $\geq 0.94$). We recognized that the mean $\mu_M$ and the coefficient of variation $CV_M$ are directly related to the shape and rate of the Gamma distribution by formulas $\mu_M = \alpha_M/\beta_M$ and $CV_M = 1/\sqrt{\alpha_M}$. Next, we fixed parameters $F$ and $b$ in DDM (e.g., for D2-MSNs: $F = 1$, $b = 0.52$) and simulated the DDM for a range of values for $D$ and $\sigma$. For each pair $(D, \sigma)$, one computational “experiment” generated 500 response times with mean $\mu$ and coefficient of variation $CV$. We repeated the “experiment” 10 times and took the group median of $\mu$ and $CV$ to obtain the simulation-based statistical measures $\mu_S$ and $CV_S$. Last, we plotted $E_\mu = |(\mu_S - \mu_M)/\mu_M|$ and $E_{cv} = |CV_S - CV_M|$, the respective relative error and the absolute error to data (Fig S7). We considered that parameter values $(D, \sigma)$ provided a good DDM fit of mice behavioral data whenever $E_\mu \leq 0.05$ and $E_{cv}$

      And included a new Fig S7 which shows the parameter space: 

      These new data clearly comment on the parameter space of our model. 
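
      The two summary statistics and the two error measures in the parameter-selection procedure above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the shape and rate values are placeholders rather than the fitted values in Table S2, and Gamma-distributed samples stand in for DDM-simulated response times.

```python
import numpy as np


def gamma_mean_cv(shape, rate):
    """Mean and coefficient of variation of a Gamma(shape, rate) distribution."""
    return shape / rate, 1.0 / np.sqrt(shape)


def fit_errors(simulated_runs, mu_m, cv_m):
    """Relative error in the mean and absolute error in the CV.

    simulated_runs: 2-D array with one row per computational "experiment".
    The simulation-based mean and CV are group medians across experiments.
    """
    means = simulated_runs.mean(axis=1)
    cvs = simulated_runs.std(axis=1, ddof=1) / means
    mu_s, cv_s = np.median(means), np.median(cvs)
    return abs((mu_s - mu_m) / mu_m), abs(cv_s - cv_m)


# Placeholder Gamma parameters (hypothetical, NOT the values from Table S2).
shape_m, rate_m = 4.0, 0.5
mu_m, cv_m = gamma_mean_cv(shape_m, rate_m)

# Stand-in for 10 "experiments" of 500 simulated response times each.
rng = np.random.default_rng(0)
runs = rng.gamma(shape_m, scale=1.0 / rate_m, size=(10, 500))

e_mu, e_cv = fit_errors(runs, mu_m, cv_m)
print("relative error in mean:", e_mu)
print("absolute error in CV:", e_cv)
```

      In a full parameter sweep, `runs` would be regenerated from the DDM for each candidate pair of drift rate and noise, and the two errors would be mapped over that grid.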

      Finally, the reviewer mentions cross-validation. We performed this extensively on our model and data fits. We used 10-fold cross-validation because fitlm needs enough data for the individual fits. We found that the fit was extremely stable – i.e., we ended up with standard deviations in R² < 0.004 for all comparisons. Thus, we added the following point to the methods on Page 41:

      “10-fold cross-validation revealed highly stable fits between gamma, models and data.”
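
      For readers unfamiliar with the procedure, a generic 10-fold cross-validation of a straight-line fit might look like the sketch below. The authors used MATLAB's fitlm; this example substitutes NumPy's `polyfit`, scores each fold on held-out points, and runs on synthetic data, so the exact quantities being cross-validated are assumptions and the printed numbers carry no relation to the reported results.

```python
import numpy as np


def kfold_r2(x, y, k=10, seed=0):
    """Held-out R^2 of a straight-line fit for each of k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    scores = []
    for held_out in np.array_split(idx, k):
        train = np.setdiff1d(idx, held_out)
        # Fit y = slope * x + intercept on the training folds only.
        slope, intercept = np.polyfit(x[train], y[train], 1)
        pred = slope * x[held_out] + intercept
        ss_res = np.sum((y[held_out] - pred) ** 2)
        ss_tot = np.sum((y[held_out] - y[held_out].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return np.array(scores)


# Synthetic, nearly linear data (illustrative only).
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 200)
y = x + 0.01 * rng.standard_normal(x.size)

r2 = kfold_r2(x, y)
print("mean held-out R^2:", r2.mean())
print("standard deviation across folds:", r2.std(ddof=1))
```

      A small standard deviation across folds is what "highly stable fits" would correspond to in this kind of analysis.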

      Lastly, the results are based on a relatively small dataset (tens of cells). 

      This is an important point.  Although it is a small optogenetically-tagged dataset, we have adequate statistical power and large effect sizes, which we now detail in the text on Page 12:

      “Consistent with population averages from Fig 2G&H, D2-MSNs and D1-MSNs had opposite patterns of activity with negative PC1 scores for D2-MSNs and positive PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-2.8 – 4.9); F = 8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F = 0.44, p = 0.51) or switching direction (F = 1.73, p = 0.19)).”

      And:  

      “GLM analysis also demonstrated that D2-MSNs had significantly different slopes (0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.47 – 0.06); Fig 3D; F = 8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98; no reliable effect of sex (F = 0.02, p = 0.88) or switching direction (F = 1.72, p = 0.19)).”

      And we have included the reviewer’s point as a limitation on Page 33:

      “Second, although we had adequate statistical power and medium-to-large effect sizes, optogenetic tagging is low-yield, and it is possible that recording more of these neurons would afford greater opportunity to identify more robust results and alternative coding schemes, such as neuronal synchrony.”

      Impact: 

      The task and data presented by the authors are very intriguing, and there are many groups interested in how striatal activity contributes to the neural perception of time. The authors perform a wide variety of experiments and analysis to examine how DMS activity influences time perception during an interval-timing task, allowing for insight into this process. However, the significance of the key finding -- that D1 and D2 activity is distinct across time -- remains somewhat ambiguous to me. 

      Again, we are glad that the reviewer appreciated our main point, and we very much appreciate the additional points about interpretation, model parameters, and statistical power. If there is any way we can clarify the text further we are happy to do so.  

      Reviewer #2 (Public Review):  

      (1) Regarding the results in Figure 2 and Figure 5: for the heatmaps in Fig.2F and Fig.2E, the overall activity pattern of D1 and D2 MSNs looks very similar; both D1 and D2 MSNs contain neurons showing decreasing or increasing activity during interval timing. And the optogenetic and pharmacologic inhibition of either D1 or D2 MSNs resulted in similar behavior outcomes. To me, the D1 and D2 MSN activities were more complementary than opposing.

      This is a great point. In our last revision, R3 suggested that complementary means opposing – and suggested we change the title to reflect this.  Our original title was ‘Complementary cognitive roles for D2-MSNs and D1-MSNs during interval timing’ – and we have changed the title back to this. We have clarified what we meant by complementary in the abstract (Page 2):

      “Together, our findings demonstrate that D2-MSNs and D1-MSNs had opposing dynamics yet played complementary cognitive roles, implying that striatal direct and indirect pathways work together to shape temporal control of action.”

      And on Page 30: 

      “These data, when combined with our model predictions, demonstrate that despite opposing dynamics, D2-MSNs and D1-MSNs contribute complementary temporal evidence to controlling actions in time.”

      If the authors want to emphasize the opposing side of D1 and D2 MSNs, then the manipulation experiments need to be re-designed, since the average activity of D2 MSNs increased, while D1 MSNs decreased during interval timing, instead of using inhibitory manipulations in both pathways, the authors should use inhibitory manipulation in D2-MSNs, while using optogenetic or pharmacology to activate D1-MSNs. In this way, the authors can demonstrate the opposing role of D1 and D2 MSNs and the functions of increased activity in D2-MSNs and decreased activity in D1-MSNs. 

      These are great ideas, which we agree with. We would like to emphasize the complementary nature as noted in our original title, and not the opposing side of D1/D2 MSNs. The experiments proposed by the reviewer are certainly worth doing, but finding stimulation parameters that affect timing without affecting movement would likely be quite complex. We have now included them as an important limitation / future direction (Page 33):

      “Fifth, we did not deliver stimulation to the striatum because our pilot experiments triggered movement artifacts or task-specific dyskinesias (Kravitz et al., 2010). Future stimulation approaches carefully titrated to striatal physiology may affect interval timing without affecting movement.”

      (2) Regarding the results in Figure 3 C and D, Figure 6 H and Figure 7 D, what is the sample size? From the single data points in the figures, it seems that the authors were using the number of cells to do statistical tests and plot the figures. For example, in Figure 3 C, if the authors use n = 32 D2 MSNs and n = 41 D1 MSNs to do the statistical test, even a small difference could become statistically significant. The authors should use the number of mice to do the statistical tests.

      These are important points that were discussed at length in the prior review.  First, for the sample size, we now have detailed in our Table 1: 

      Second, we have detailed our statistical approach which explicitly deals with repeated observations of neurons across mice (Page 43):

      “Statistics. All data and statistical approaches were reviewed by the Biostatistics, Epidemiology, and Research Design Core (BERD) at the Institute for Clinical and Translational Sciences (ICTS) at the University of Iowa. All code and data are made available at http://narayanan.lab.uiowa.edu/article/datasets. We used the median to measure central tendency and the interquartile range to measure spread. We used Wilcoxon nonparametric tests to compare behavior between experimental conditions and Cohen’s d to calculate effect size. Analyses of putative single-unit activity and basic physiological properties were carried out using custom routines for MATLAB. For all neuronal analyses, variability between animals was accounted for using generalized linear-mixed effects models and incorporating a random effect for each mouse into the model, which allows us to account for inherent between-mouse variability. We used fitglme in MATLAB and verified main effects using lmer in R. We accounted for variability between MSNs in pharmacological datasets in which we could match MSNs between saline, D2 blockade, and D1 blockade. P values < 0.05 were interpreted as significant.”

      We have formally reviewed this approach with professional biostatisticians at the University of Iowa.
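
      To make the structure of this approach concrete, a minimal sketch of a random-intercept model for a neuron-level measure such as the PC1 score is given below. The notation is illustrative and is an assumption on our part rather than the exact specification in the manuscript: we assume an identity link, a single random intercept per mouse, and a binary indicator for cell type; sex and switching direction could be added as further fixed effects.

      ```latex
      % Illustrative random-intercept model (notation assumed, not taken from the manuscript)
      % i indexes neurons, j indexes mice; D1_{ij} = 1 for a D1-MSN and 0 for a D2-MSN
      y_{ij} = \beta_0 + \beta_1\,\mathrm{D1}_{ij} + u_j + \varepsilon_{ij},
      \qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
      \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
      ```

      In this sketch, the per-mouse term u_j absorbs between-mouse variability, so neurons recorded from the same mouse are allowed to be correlated rather than treated as fully independent observations, and the cell-type effect is assessed through the fixed effect β_1.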

      Finally, we note that we have adequate statistical power and large effect sizes for the analyses in Fig 3C and D, which we now detail in the text on Page 12:

      “Consistent with population averages from Fig 2G&H, D2-MSNs and D1-MSNs had opposite patterns of activity with negative PC1 scores for D2-MSNs and positive PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-2.8 – 4.9); F = 8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F = 0.44, p = 0.51) or switching direction (F = 1.73, p = 0.19)).”

      And, on Page 12:  

      “GLM analysis also demonstrated that D2-MSNs had significantly different slopes (0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.47– 0.06; Fig 3D; F = 8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98; no reliable effect of sex (F = 0.02, p = 0.88) or switching direction (F = 1.72, p = 0.19)).”

      And we have included the reviewer’s point as a limitation on Page 33:

      “Second, although we had adequate statistical power and medium-to-large effect sizes, optogenetic tagging is low-yield, and it is possible that recording more of these neurons would afford greater opportunity to identify more robust results and alternative coding schemes, such as neuronal synchrony.”

      (3) Regarding the results in Figure 5, what is the reason for the increase in the response times? The authors should plot the position track during intervals (0-6 s) with or without optogenetic or pharmacologic inhibition. The authors can check Figures 3, 5, and 6 in the paper https://doi.org/10.1016/j.cell.2016.06.032 for reference to analyze the data.

      These are key points, and we are glad the reviewer raised them. Our interpretation is that response time increases without reliable changes in other task-specific movements such as nosepoke reaction time or traversal time (Fig S9). This was lacking in our prior manuscript. We have now added this to Page 30:

      “Our interpretation is that because the activity of D2-MSN and D1-MSN ensembles represents the accumulation evidence, pharmacological/optogenetic disruption of D2-MSN/D1-MSN activity slows this accumulation process, leading to slower interval timing-response times (Fig 5) without changing other task-specific movements (Fig S9).  These results provide new insight into how opposing patterns of striatal MSN activity control behavior in similar ways and show that they play a complementary role in elementary cognitive operations.”

      Regarding the tracking of velocity, we unfortunately do not have this information reliably across all conditions. This citation is a beautiful landmark paper, and we are working on collecting this information in our new datasets going forward.  We have included this as a major limitation (Page 34): 

      “Still, future work combining motion tracking/accelerometry with neuronal ensemble recording and optogenetics and including bisection tasks may further unravel timing vs. movement in MSN dynamics (Robbe, 2023; Tecuapetla et al., 2016).”

      Once again, we are appreciative of the thoughtful points raised by this reviewer.  

      Reviewer #3 (Public Review): 

      Summary: 

      The cognitive striatum, also known as the dorsomedial striatum, receives input from brain regions involved in high-level cognition and plays a crucial role in processing cognitive information. However, despite its importance, the extent to which different projection pathways of the striatum contribute to this information processing remains unclear. In this paper, Bruce et al. conducted a study using various causal and correlational techniques to investigate how these pathways collectively contribute to interval timing in mice. Their results were consistent with previous research, showing that the direct and indirect striatal pathways perform opposing roles in processing elapsed time. Based on their findings, the authors proposed a revised computational model in which two separate accumulators track evidence for elapsed time in opposing directions. These results have significant implications for understanding the neural mechanisms underlying cognitive impairment in neurological and psychiatric disorders, as disruptions in the balance between direct and indirect pathway activity are commonly observed in such conditions. 

      Strengths: 

      The authors employed a well-established approach to study interval timing and employed optogenetic tagging to observe the behavior of specific cell types in the striatum. Additionally, the authors utilized two complementary techniques to assess the impact of manipulating the activity of these pathways on behavior. Finally, the authors utilized their experimental findings to enhance the theoretical comprehension of interval timing using a computational model. 

      We very much appreciate the considered read and comments by the reviewer, and recognition of the breadth of techniques in this manuscript. 

      Weaknesses: 

      The behavioral task used in this study is best suited for investigating elapsed time perception, rather than interval timing. Timing bisection tasks are often employed to study interval timing in humans and animals. In the optogenetic experiment, the laser was kept on for too long (18 seconds) at high power (12 mW). This has been shown to cause adverse effects on population activity (for example, through heating the tissue) that are not necessarily related to their function during the task epochs. Given the systemic delivery of pharmacological interventions, it is difficult to conclude that the effects are specific to the dorsomedial striatum. Future studies should use the local infusion of drugs into the dorsomedial striatum. 

      These are important points.  We agree with them completely and have now included responses to them.  First, bisection tasks certainly have advantages – we have justified our approach in the discussion (Page 32):

      “Our task version has been used extensively to study interval timing in mice and humans (Balci et al., 2008; Bruce et al., 2021; Stutt et al., 2024; Tosun et al., 2016; Weber et al., 2023). However, temporal bisection tasks, in which animals hold during a temporal cue and respond at different locations depending on cue length, have advantages in studying how animals time an interval because animals are not moving while estimating cue duration (Paton and Buonomano, 2018; Robbe, 2023; Soares et al., 2016). Our interval timing task version – in which mice switch between two response nosepokes to indicate their interval estimate has elapsed – has been used extensively in rodent models of neurodegenerative disease (Larson et al., 2022; Weber et al., 2024, 2023; Zhang et al., 2021), as well as in humans (Stutt et al., 2024). This version of interval timing involves motor timing, which engages executive function and has more translational relevance for human diseases than perceptual timing or bisection tasks (Brown, 2006; Farajzadeh and Sanayei, 2024; Nombela et al., 2016; Singh et al., 2021).  Furthermore, because many therapeutics targeting dopamine receptors are used clinically, these findings help describe how dopaminergic drugs might affect cognitive function and dysfunction. Future studies of D2-MSNs and D1-MSNs in temporal bisection and other timing tasks may further clarify the relative roles of D2- and D1-MSNs in interval timing and time estimation.”

      Second, we have included an explicit control using the same laser parameters and the same illumination epoch as in the experimental animals, and find no effects. This is now detailed in the methods (Page 37):

      “To control for heating and nonspecific effects of optogenetics, we performed control experiments in mice without opsins using identical laser parameters in D2-cre or D1-cre mice (Fig S6).”

      And in the results (Page 21): 

      “To control for heating and nonspecific effects of optogenetics, we performed control experiments in D2-cre mice without opsins using identical laser parameters; we found no reliable effects for opsin-negative controls (Fig S6).”

      And on Page 21:

      “As with D2-MSNs, we found no reliable effects with opsin-negative controls in D1-MSNs (Fig S6).”

      We have now detailed these results in Figure S6:

      Regarding focal pharmacology, we performed this experiment with focal infusion of D1/D2 antagonists in our prior work, which we have now cited (Page 4):

      “Similar behavioral effects were found with systemic (Stutt et al., 2024) or focal infusion of D2 or D1 antagonists locally within the dorsomedial striatum (De Corte et al., 2019a).”

      Comments on revised version: 

      Thank you for the comprehensive revisions. Most of my (addressable) concerns were addressed. The current version of your manuscript appears significantly improved. 

      Once again, we appreciate the reviewer’s constructive and insightful comments and careful review of our manuscript.  Their comments have been extremely helpful.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editor for their positive view and their constructive and valuable comments on the manuscript. Below, we address the suggestions of the reviewers.

      Reviewer #1 (Public Review):

      (1) It will be interesting to monitor the levels of another MIM insertase namely, OXA1. This will help to understand whether some of the observed changes in levels of OXPHOS subunits are related to alterations in the amounts of this insertase.

      OXA1 was not detected in the untargeted mass spectrometry analysis, most likely due to the fact that it is a polytopic membrane protein, spanning the membrane five times (1,2). Consequently, we measured OXA1 levels with immunoblotting, comparing patient fibroblast cells to the HC. No significant change in OXA1 steady state levels was observed.

      These results are now displayed (Fig. S3B and C) and discussed in the revised manuscript.

      Figure 3: How do the authors explain that although TIMM17 and TIMM23 were found to be significantly reduced by Western analysis they were not detected as such by the Mass Spec. method?

      The untargeted mass spectrometry in the current study failed to detect TIMM17 in both patient fibroblasts and mouse neurons, while TIMM23 was detected only in mouse neurons, where it showed a decrease that was not significant. This is most likely due to the fact that TIMM17 and TIMM23 are both polytopic membrane proteins, spanning the membrane four times, which makes it difficult to extract them in quantities suitable for MS detection (2,3).

      (2) How do the authors explain the higher levels of some proteins in the TIMM50 mutated cells?

      The levels of fully functional TIM23 complex are decreased in patients' fibroblasts. Therefore, the mechanism by which the steady state level of some TIM23 substrate proteins is increased can only be explained by events that occur outside the mitochondria. These could include increases in transcription, translation or post-translational modifications, all of which may increase their steady state level, albeit with a decrease in the steady state level of the import complex.

      (3) Can the authors elaborate on why mutated cells are impaired in their ability to switch their energetic emphasis to glycolysis when needed?

      Cellular regulation of the metabolic switch to glycolysis occurs via two known pathways: 1) Activation of AMP-activated protein kinase (AMPK) by increased levels of AMP/ADP (4). 2) Inhibition of pyruvate dehydrogenase (PDH) complexes by pyruvate dehydrogenase kinases (PDK) (5). Therefore, changes in the steady state levels of any of these regulators could push the cells towards anaerobic energy production, when needed. In our model systems, we did not observe changes in any of the AMPK, PDH or PDK subunits that were detected in our untargeted mass spectrometry analysis (see volcano plots below, no PDK subunits were detected in patient fibroblasts). Although this doesn’t directly explain why the cells have an impaired ability to switch their energetic emphasis, it does possibly explain why the switch did not occur de facto.

      Author response image 1.

      Reviewer #2 (Public Review):

      (1) The authors claim in the abstract, the introduction, and the discussion that TIMM50 and the TIM23 translocase might not be relevant for mitochondrial protein import in mammals. This is misleading and certainly wrong!!!

      Indeed, it was not our intention to claim that the TIM23 complex might not be relevant. We have now rewritten the relevant parts to convey the correct message:

      Abstract –

      Line 25 - “Strikingly, TIMM50 deficiency had no impact on the steady state levels of most of its putative substrates, suggesting that even low levels of a functional TIM23 complex are sufficient to maintain the majority of complex-dependent mitochondrial proteome.”

      Introduction –

      Line 87 - Surprisingly, functional and physiological analysis points to the possibility that low levels of TIM23 complex core subunits (TIMM50, TIMM17 and TIMM23) are sufficient for maintaining steady-state levels of most presequence-containing proteins. However, the reduced TIM23CORE component levels do affect some critical mitochondrial properties and neuronal activity.

      Discussion –

      Line 339 – “…surprising, as normal TIM23 complex levels are suggested to be indispensable for the translocation of presequence-containing mitochondrial proteins…”

      Line 344 – “…it is possible that unlike what occurs in yeast, normal levels of mammalian TIMM50 and TIM23 complex are mainly essential for maintaining the steady state levels of intricate complexes/assemblies.”

      Line 396 – “In summary, our results suggest that even low levels of TIMM50 and TIM23CORE components suffice in maintaining the majority of mitochondrial matrix and inner membrane proteome. Nevertheless, reductions in TIMM50 levels led to a decrease of many OXPHOS and MRP complex subunits, which indicates that normal TIMM50 levels might be mainly essential for maintaining the steady state levels and assembly of intricate complex proteins.”

      Reviewer #1 (Recommendations For The Authors):

      (1) Lines 25-26: The authors write "Strikingly, TIMM50 deficiency had no impact on the steady state levels of most of its substrates". Since the current data challenges the definition of some proteins as substrates of TIMM50, I suggest using the term "putative substrates".

      Changed as suggested

      (2) Line 27: It is not clear whether the wording "general import role of TIM23" it refers to the TIM23 protein or the TIM23 complex. This should be clarified.

      Clarified. It now states "TIM23 complex".

      (3) Line 72: should be "and plays".

      Changed as suggested.

      (4) It will be helpful to include in Figure 1 a small scheme of TIMM50 and to indicate in which domain the T252M mutation is located.

      We predicted the AlphaFold human TIMM50 structure and indicated the mutation site and the different TIMM50 domains. The structure is included in Fig. 1A.

      (5) I suggest labelling the "Y" axis in Fig. 1B as "Protein level (% of control)".

      Changed as suggested in Fig. 1C (previously Fig. 1B) and in Fig. 2C.

      (6) Line 179: since the authors tested here only about 10 mitochondrial proteins (out of 1500), I think that the word "many" should be replaced by "several representative" resulting in "steady state levels of several representative mitochondrial proteins".

      Changed as requested.

      (7) Line 208: correct typo.

      Typo was corrected.

      (8) Figure 4 is partially redundant as its data is part of Figure 3. The authors can consider combining these two figures. Accordingly, large parts of the legend of Figure 4 are repeating information in the legend to Figure 3 and can refer to it.

      We revamped Figures 3 and 4. Figure 3 now shows the analysis of fibroblasts proteomics while Figure 4 focuses on neurons proteomics. We also modified the legend of Figure 4.

      Reviewer #2 (Recommendations For The Authors):

      (1) Abstract: 'Strikingly, TIMM50 deficiency had no impact on the steady state levels of most of its substrates, challenging the currently accepted import dogma of the essential general import role of TIM23 and suggesting that fully functioning TIM23 complex is not essential for maintaining the steady state level of the majority of mitochondrial proteins'. This sentence needs to be rephrased. The data do not challenge any dogma! The authors only show that lower levels of functional TIM23 are sufficient.

      We have rewritten all the relevant sentences as suggested (details are also mentioned in response to reviewer 2 public review point 1)

      (2) Introduction: 'Surprisingly, functional and physiological analysis points to the possibility that TIMM50 and a fully functional TIM23 complex are not essential for maintaining steady-state levels of most presequence-containing proteins'. This again needs to be rephrased.

      Rewritten as suggested (details mentioned in response to reviewer 2 public review point 1)

      (3) Discussion: 'In summary, our results challenge the main dogma that TIMM50 is essential for maintaining the mitochondrial matrix and inner membrane proteome, as steady state level of most mitochondrial matrix and inner membrane proteins did not change in either patient fibroblasts or mouse neurons following a significant decrease in TIMM50 levels.' This again needs to be rephrased.

      Rewritten as suggested (details mentioned in response to reviewer 2 public review point 1)

      (4) The analysis of the proteomics experiment should be improved. The authors show in Figures 3 and 4 several times the same volcano plots in which different groups of proteins are indicated. It would be good to add (a) a principal component analysis to show that the replicates from the mutant samples are consistently different from the controls, (b) a correlation plot that compares the log-fold-change of P1 to that of P2 to show which of the proteins are consistently changed in P1 and P2 and (c) a GO term analysis to show in an unbiased way whether mitochondrial proteins are particular affected upon TIMM50 depletion.

      Figures 3 and 4 have been changed to avoid redundancy. Figure 3 now focuses on fibroblast proteomics (with additional analysis), while Figure 4 focuses on neuron proteomics. PCA analysis was added in Fig S1, showing that the proteomics replicates of both patients (P1 and P2) are consistently different from the healthy control (HC) replicates. Correlation plots were added in Figure 3C and D, showing high correlation of the downregulated and upregulated mitochondrial proteins between P1 and P2. These plots further highlight that MIM proteins are more affected than matrix proteins and that the OXPHOS and MRP systems comprise the majority of significantly downregulated proteins in both patients. GO term analysis was performed for all the detected proteins that were significantly downregulated in both patients. The GO term analysis is displayed in Figure S3A, and shows that mitochondrial proteins, mainly of the OXPHOS and MRP machineries, are particularly affected.

      (5) Figure 1. The figure shows the levels of TIM and TOM subunits in two mutant samples. The quantifications suggest that the levels of TIMM21, TOMM40, and mtHsp60 are not affected. However, from the figure, it seems that there are increased levels of TIMM21 and reduced levels of TOMM40 and mtHsp60. Unfortunately, in the figure most of the signals are overexposed. Since this is a central element of the study, it would be good to load dilutions of the samples to make sure that the signals are indeed in the linear range and do scale with the amounts of samples loaded.

      The representative WB panels display the Actin loading control of the representative TIMM50 repeat (the top panel). However, each protein was tested separately, at least three times, and was normalized to its own Actin loading control.

      (6) Figure 2B. All panels are shown in color except the panel for TIMM17B which is grayscale. This should be changed to make them look equal.

      All the western blot panels were changed to grayscale.

      (7) Discussion: 'Despite being involved in the import of the majority of the mitochondrial proteome, no study thus far characterized the effects of TIMM50 deficiency on the entire mitochondrial proteome.' This sentence is not correct as proteomic data were published previously, for example for Trypanosomes (PMID: 34517757) and human cells (PMID: 38828998).

      We have corrected the statement to “Despite being involved in the import of the majority of the mitochondrial proteome, little is known about the effects of TIMM50 deficiency on the entire mitochondrial proteome.”

      (8) A recent study on a very similar topic was published by Diana Stojanovski's group that needs to be cited: PMID: 38828998. The results of this comprehensive study also need to be discussed!!!

      We have added the following in the discussion:

      Line 362 – “These observations are similar to the recent analysis of patient-derived fibroblasts which demonstrated that TIMM50 mutations lead to severe deficiency in the level of TIMM50 protein (6,7). Notably, this decrease in TIMM50 was accompanied by a decrease in the levels of the other two core subunits, TIMM23 and TIMM17. However, unexpectedly, proteomics analysis in our study and that conducted by Crameri et al., 2024 indicate that steady state levels of most TIM23-dependent proteins are not affected despite a drastic decrease in the levels of the TIM23CORE complex (7). The most affected proteins are constituents of intricate complexes, such as the OXPHOS and MRP machineries. Thus, both these studies indicate a surprising possibility that even reduced levels of the TIM23CORE components are sufficient for maintaining the steady state levels of most presequence-containing substrates.”

      (1) Homberg B, Rehling P, Cruz-Zaragoza LD. The multifaceted mitochondrial OXA insertase. Trends Cell Biol. 2023;33(9):765–72.

      (2) Carroll J, Altman MC, Fearnley IM, Walker JE. Identification of membrane proteins by tandem mass spectrometry of protein ions. Proc Natl Acad Sci U S A. 2007;104(36):14330–5.

      (3) Ting SY, Schilke BA, Hayashi M, Craig EA. Architecture of the TIM23 inner mitochondrial translocon and interactions with the matrix import motor. J Biol Chem [Internet]. 2014;289(41):28689–96. Available from: http://dx.doi.org/10.1074/jbc.M114.588152

      (4) Trefts E, Shaw RJ. AMPK: restoring metabolic homeostasis over space and time. Mol Cell [Internet]. 2021;81(18):3677–90. Available from: https://doi.org/10.1016/j.molcel.2021.08.015

      (5) Zhang S, Hulver MW, McMillan RP, Cline MA, Gilbert ER. The pivotal role of pyruvate dehydrogenase kinases in metabolic flexibility. Nutr Metab. 2014;11(1):1–9.

      (6) Reyes A, Melchionda L, Burlina A, Robinson AJ, Ghezzi D, Zeviani M. Mutations in TIMM50 compromise cell survival in OxPhos-dependent metabolic conditions. EMBO Mol Med. 2018;

      (7) Crameri JJ, Palmer CS, Stait T, Jackson TD, Lynch M, Sinclair A, et al. Reduced Protein Import via TIM23 SORT Drives Disease Pathology in TIMM50-Associated Mitochondrial Disease. Mol Cell Biol [Internet]. 2024;0(0):1–19. Available from: https://doi.org/10.1080/10985549.2024.2353652

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      A subset of fibroblast growth factor (FGF) proteins (FGF11-FGF14; often referred to as fibroblast growth factor homologous factors because they are not thought to be secreted and do not seem to act as growth factors) have been implicated in modulating neuronal excitability, however, the exact mechanisms are unclear. In part, this is because it is unclear how different FGF isoforms alter ion channel activity in different neuronal populations. In this study, the authors explore the role of FGF 13 in epilepsy using a variety of FGF13 knock-out mouse models, including several targeted cell-type specific conditional knockout mouse lines. The study is intriguing as it indicates that FGF13 plays an especially important role in inhibitory neurons. Furthermore, although FGF13 has been studied as a regulator of neuronal voltage-gated sodium channels, the authors present data indicating that FGF13 knockout in inhibitory neurons induces seizures not by altering sodium current properties but by reducing voltage-gated potassium currents in inhibitory neurons. While intriguing, the data are incomplete in several aspects and thus the mechanisms by which various FGF13 variants induce Developmental and Epileptic Encephalopathies are not resolved by the data presented. 

      Strengths: 

      A major strength is the array of techniques used to assess the mice and the electrical activity of the neurons. 

      The multiple mouse knock-out models utilized are a strength, clearly demonstrating that FGF13 expression in inhibitory neurons, and possibly specific sub-populations of inhibitory neurons, is critically important. 

      The data on the increased sensitivity to febrile seizures in KO mice are very nice and provide clear evidence for regulation of excitability in inhibitory neurons by FGF13.

      The Gad2Fgf13-KO mice indicated that several Fgf13 splice variants may be expressed in inhibitory neurons and suggest that the Fgf13-VY splice variants may have previously unrecognized specific roles in regulating neuronal excitability. 

      The data on males and females from the various KO mice lines indicates a clear gene dosage effect for this X-linked gene. 

      The unbiased metabolomic analysis supports the assertion that Fgf13 expression in inhibitory neurons is important in regulating seizure susceptibility. 

      Weaknesses: 

      The knockout approach can be powerful but also has distinct limitations. Multiple missense mutations in FGF13-S have been identified. The knockout models employed here are not appropriate for understanding how these missense variants lead to altered neuronal excitability. While the data show that complete loss of Fgf13 from excitatory forebrain neurons is not sufficient to induce seizure susceptibility, it does not rule out that specific variants (e.g., R11C) might alter the excitability of forebrain neurons. The missense variants may alter excitatory and/or inhibitory neuron excitability in distinct ways from a full FGF13 knockout. 

      We agree with this overall interpretation of our data and have updated our language in the Discussion to make the distinction between mechanisms attributable to a knockout compared to a missense variant. We note, however, that the proposed mechanism by which missense variants (e.g., R11C) drive seizures is through loss of long-term inactivation in excitatory neurons and our excitatory knockout model shows loss of long-term inactivation in excitatory neurons. Thus, our knockout model demonstrates that the mechanism(s) by which the missense variants alter neuronal excitability in excitatory neurons must exclude long-term inactivation, thereby providing some clarity regarding the proposed mechanism for those missense variants.

      The electrophysiological experiments are intriguing but not comprehensive enough to support all of the conclusions regarding how FGF13 modulates neuronal excitability. 

      We agree and have updated the language in our Discussion to distinguish speculation from conclusions that are directly supported by data.

      Another concern is the use of different ages of neurons for different experiments. For example, sodium currents in Figures 2 and 5 (and Supplemental Figures 2 and 7) are recorded from cultured neurons, which may have very different properties (including changes in sodium channel complexes) from neurons in vivo that drive the development of seizure activity. 

      We agree and acknowledge the important differences between neurons examined in culture and in vivo, yet the in vitro vs in vivo preparations were necessitated by the specific experiments. While these differences are important, previous gene profiling studies comparing primary hippocampal neurons with developing mouse hippocampus have found that although gene expression is accelerated in vitro, gene expression profiles in vitro and in vivo are similar (PMID: 11438693). Moreover, the relative immaturity of the cultured neurons is balanced at least in part because the in vivo experiments were performed on very young animals (~P12), which also have relatively immature neurons. Thus, we predict that sodium channel complexes studied in vitro are informative for the in vivo aspects of this investigation.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors address three primary questions: 

      (1) how FGF13 variants confer seizure susceptibility, 

      (2) the specific cell types involved, and 

      (3) the underlying mechanisms, particularly regarding Nav dysfunction. 

      They use different Cre drivers to generate cell type-specific knockouts (KOs). First, using Nestin-Cre to create a whole-brain Fgf13 KO, they observed spontaneous seizures and premature death. While KO of Fgf13 in excitatory neurons does not lead to spontaneous seizures, KO in inhibitory neurons recapitulates the seizures and premature death observed in the Nestin-Cre KO. They further narrow down the critical cell type to MGE-derived interneurons (INs), demonstrating that MGE-neuron-specific KO partially reproduces the observed phenotypes. "All interneuron" KOs exhibit deficits in synaptic transmission and interneuron excitability, not seen in excitatory neuron-specific KOs. Finally, they rescue the defects in the interneuron-specific KO by expressing specific Fgf13 isoforms. This is an elegant and important study adding to our knowledge of mechanisms that contribute to seizures. 

      Strengths 

      • The study provides much-needed cell type-specific KO models. 

      • The authors use appropriate Cre lines and characterize the phenotypes of the different KOs. 

      • The metabolomic analysis complements the rest of the data effectively. 

      • The study confirms and extends previous research using improved approaches (KO lines vs. in vitro KD or antibody infusion). 

      • The methods and analyses are robust and well-executed. 

      Weaknesses 

      • One weakness lies in the use of the Nkx2.1 line (instead of Nkx2.1CreER) in the paper. As a result, some answers to key questions are incomplete. For instance, it remains unclear whether the observed effects are due to Chandelier cells or NGFCs, potentially both MGE and CGE derived, explaining why Nkx2.1 alone does not fully replicate the overall inhibitory KO. Using Nkx2.1CreER could have helped address the cell specificity. With the Nkx2.1 line used in the paper, the answer is partial. 

      We agree that while our data is consistent with the possibility of a role for Fgf13 in chandelier function, the current Cre driver does not provide sufficient direct evidence. We performed preliminary experiments (unpublished) using a Nkx2.1CreER driver, with late embryonic induction with a tamoxifen dosage validated for sparse labeling of chandelier cells (30846310). While we successfully replicated sparse labeling of neocortical chandelier cells (using a Cre-dependent Ai9 reporter), we were unable to determine if there was a significant loss of FGF13 as measured by immunohistochemistry since FGF13+ cells are only a small subset of the already sparse cells. Because multiple snRNA-seq studies identified Fgf13 as a marker for chandelier cells, we speculated—now more carefully circumspect—about the role of chandelier cells vs NGFCs.

      • While the mechanism behind the reduced inhibitory drive in the IN-specific KO is suggested to be presynaptic, the chosen method does not allow them to exactly identify the mechanisms (spontaneous vs mEPSC/mIPSC), and whether it is a loss of inhibitory synapses (potentially axo-axonic) or release probability. 

      We agree that this is an important limitation of our work, and that we are unable to identify the exact mechanism behind the reduced inhibitory drive. We are continuing to explore this question in a follow-up study.

      • Some supporting data (e.g. Supplemental Figure 7 and 8) appear to come from only one (or two) WT and one (or two) KO mice. Supplementary data, like main data, should come from at least three mice in total to be considered complete/solid (even if the statistical analysis is done with cells). 

      All panels in the manuscript, including supplementary data, except Supplemental Figures 7D and 8A, have N(mouse)≥3. Time limitations (graduating student) prevented us from obtaining a larger N. Because those supplementary data are not critical for supporting our conclusions, we removed them.

      General Assessment 

      The general conclusions of this paper are supported by data. As it is, the claim that "these results enhance our understanding of the molecular mechanisms that drive the pathogenesis of Fgf13-related seizures" is partially supported. A more cautious term may be more appropriate, as the study shows the mechanism is not Nav-mediated and suggests alternative mechanisms without unambiguously identifying them. The conclusion that the findings "expand our understanding of FGF13 functions in different neuron subsets" is supported, although somewhat overstated, as the work is not conclusive about the exact neuron subtypes. However, it does indeed show differential functions for specific neuronal classes, which is a significant result. 

      Impact and Utility 

      This paper is undoubtedly valuable. Understanding that excitatory neurons are not the primary contributors to the observed phenotypes is crucial. The finding that the effects are not MGE-unique is also important. This work provides a solid foundation for further research and will be a useful resource for future studies. 

      Reviewer #3 (Public Review): 

      Summary: 

      The authors aimed to determine the mechanism by which seizures emerge in Developmental and Epileptic Encephalopathies caused by variants in the gene FGF13. Loss of FGF13 in excitatory neurons had no effect on seizure phenotype as compared to the loss of FGF13 in GABAergic interneurons, which in contrast caused a dramatic proseizure phenotype and early death in these animals. They were able to show that Fgf13 ablation and consequent loss of FGF13-S and FGF13-VY reduced overall inhibitory input from Fgf13-expressing interneurons onto hippocampal pyramidal neurons. This was shown to occur not via disruption to voltage-gated sodium channels but rather by reducing potassium currents and action potential repolarisation in these interneurons. 

      Strengths: 

      The authors employed multiple well-validated, novel mouse lines with FGF13 knocked out in specific cell types including all neurons, all excitatory cells, all GABAergic interneurons, or a subset of MGE-derived interneurons, including axo-axonic chandelier cells. The phenotypes of each of these four mouse lines were carefully characterised to reveal clear differences with the most fundamental being that Interneuron-targeted deletion of FGF13 led to perinatal mortality associated with extensive seizures and impaired the hippocampal inhibitory/excitatory balance while deletion of FGF13 in excitatory neurons caused no detectable seizures and no survival deficits. 

      The authors made excellent use of western blotting and in situ hybridisation of the different FGF13 isoforms to determine which isoforms are expressed in which cell types, with FGF13-S predominantly in excitatory neurons and FGF13-VY and FGF13-V predominantly in GABAergic neurons. 

      The authors performed a highly detailed electrophysiological analysis of excitatory neurons and GABAergic interneurons with FGF13 deficits using whole-cell patch clamp. This enabled them to show that FGF13 removal did not affect voltage-gated sodium channels in interneurons, but rather reduced the action of potassium channels, with the resultant effect of making it more likely that interneurons enter depolarisation block. These findings were strengthened by the demonstration that viral re-expression of different Fgf13 splice isoforms could partially rescue deficits in interneuron action potential output and restore K+ channel current size. 

      Additionally, the discussion was nuanced and demonstrated how the current findings resolved previous apparent contradictions in the field involving the function of FGF13. 

      These findings will have a significant impact on our understanding of how FGF13 causes seizures and death in DEEs, and the action of different FGF13 isoforms within different neuronal cell types, particularly GABAergic interneurons. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The limitations of the KO model should be fully discussed in the discussion. It should be clear that knocking out FGF13 does not provide insight into how missense mutations such as R11C may alter excitatory and/or inhibitory neuron excitability. 

      We agree with this overall interpretation of our data and have updated our language in the Discussion to make the distinction between mechanisms attributable to a knockout compared to a missense variant. We note, however, that the proposed mechanism by which missense variants (e.g., R11C) drive seizures is through loss of long-term inactivation in excitatory neurons and our excitatory knockout model shows loss of long-term inactivation in excitatory neurons. Thus, our knockout model demonstrates that the mechanism(s) by which the missense variants alter neuronal excitability in excitatory neurons must exclude long-term inactivation, thereby providing some clarity regarding the proposed mechanism for those missense variants.

      It is important to know what sodium channel isoforms are expressed in the cultured neurons used in the experiments for Figures 2 and 5. Are Nav1.1, Nav1.2, Nav1.3, and Nav1.6 expressed at appropriate levels in the cultures? 

      We agree it is important to know that the sodium channel isoforms expressed in our hippocampal neurons are expressed at physiologically relevant levels, for further validation of our primary culture system. We have added RT-qPCR data from our hippocampal neuron cultures (Supplemental Figure 2B) showing the relative levels of SCN1A, SCN2A, SCN3A, and SCN8A, which are similar to the relative levels of voltage-gated sodium channel isoforms found in rodent and human forebrain in early development (Figure 1 in PMID: 35031483).

      The electrophysiological experiments are intriguing but limited. One, it would be helpful to report if there were any changes in resting membrane potential for the cells reported in Figure 5. It is also inappropriate to unequivocally state that "Nav currents were not significantly affected by Fgf13 knockout in Gad2Fgf13 KO neurons" as only a sampling of properties was investigated. Recovery from inactivation and persistent current amplitudes were not evaluated. Furthermore, while it looks like long-term inactivation is not altered, only one specific protocol was used and currents measured from cultured neurons may not be fully representative of neuronal properties in vivo. 

      We agree that we performed a selective analysis of Nav currents—selected because those are the major parameters that have been associated with FGF13 modulation. Because we did not observe significant differences in NaV currents, we therefore hypothesized that FGF13 affected other currents, as previously observed, and consequently assessed potassium currents, for which we did observe a difference. Further, we note that our sodium current and potassium current results are consistent with, and supportive of, our action potential data in which we find no deficit in AP initiation, but rather a deficit in AP repolarization. We revised the text to reflect the more limited analysis of Nav currents. Regarding long-term inactivation, we also agree that measurements in cultured neurons may not fully represent neuronal properties in vivo; however, we note that regulation of long-term inactivation by FGF13 has previously been assessed only in cultured cells (and not in neurons). Thus, our protocols were designed to query that modulation previously reported.

      The first sentence of the results section is misleading: "To determine how FGF13 variants contribute to seizure disorders, we developed genetic mouse models that eliminate Fgf13 in specific neuronal cell types." The knockouts do not target specific splice isoforms and do not help determine how missense variants contribute to DEE. This should be modified to reflect better what is actually being tested. 

      We agree and have revised our text to state that our goal was to assess how FGF13 contributes to neuronal excitability and thereby accurately reflect the cell type-specific, but not isoform specific, targeting.

      Reviewer #2 (Recommendations For The Authors): 

      • The sentence in the introduction stating "an unusual example of differential expression of an alternatively spliced neuronal gene in excitatory vs. inhibitory neurons" is factually incorrect, especially for transcripts regulating intrinsic properties like FGF13. Refer to PMID: 31451803 for more details and consider rephrasing this statement. 

      We updated our text to reflect the similarity of Fgf13’s cell type-specific alternative splicing to other genes known to control synaptic interactions and neuronal architecture and added the suggested reference.

      • Consistency is needed in the manuscript regarding the term "BASEscope" or "basescope"; the correct version is "BaseScope." 

      We corrected the text accordingly.

      • In the discussion, the term "reduced overall inhibitory drive" might be more appropriate than "input." 

      We updated the text accordingly.

      • The authors should refer to the Fgf13 data in the database from Furlanis et al., which complements their findings: https://scheiffele-splice.scicore.unibas.ch/

      We agree and now incorporate this reference.

      • The phrase "Fgf13 silencing in Nkx2.1 expressing neurons" should be clarified to include the use of CreER, which was crucial and effectively resulted in the labeling of a different subtype of interneurons, see PMID: 23180771. 

      We agree and have updated our text accordingly.

      • Be more cautious when discussing the role of FGF13 in chandelier function; while it seems probable, the current Cre driver used provides no direct evidence. 

      We agree (as noted above) that while our data are consistent with the possibility of a role for Fgf13 in chandelier function, the current Cre driver used is insufficient to offer direct evidence and therefore updated our text in the discussion.

      • The gene dosage effect is interesting, it would be interesting to explore it further in the future. 

      We agree. Because our data suggest that seizures result from loss of inhibitory neuron input, we hypothesize that the gene dosage effect derives from further loss of inhibitory neuron input and thus more hyperexcitability.

      • Another critical aspect not addressed here and of interest for the future is the distinction between the role of FGF13 in interneuron development versus general maintenance. Using Nkx2.1CreER could have helped address both cell specificity and developmental roles. 

      We agree that there may be an interesting distinction between the role of Fgf13 in development versus general maintenance. We have piloted an Nkx2.1-CreER targeted deletion of Fgf13 from cortical interneurons but have been unable to achieve significant deletion of Fgf13, likely because the Nkx2.1-CreER strategy targets only a sparse subset of interneurons and FGF13 is expressed in only a subset of total interneurons. Thus, use of the Nkx2.1-CreER strategy is challenging. We are looking for ways to optimize it.

      Reviewer #3 (Recommendations For The Authors): 

      This was a truly fabulous paper, with an exceptional quantity of beautiful data. I would like to congratulate the authors on their superb work. 

      In the discussion, the authors correctly draw attention to the fact that the clear pro-seizure phenotype they see when FGF13 was knocked out more specifically in a subset of interneurons including chandelier cells, adds to our understanding of the role of FGF13 in chandelier cells. More than that though, given that FGF13 is reducing excitability in these cells AND this results in a strong pro-seizure phenotype, they may want to postulate that this lends further weight to the argument that chandelier cells are likely powerful regulators of network excitability despite suggestions in the field that they could potentially have a proexcitatory function (see Szabadics et al. Science 2006). 

      We agree this is interesting and have elaborated on our discussion of chandelier cells to include this point while also addressing the important caveats noted by reviewer 2.

      A minor point: 

      On page 26 the sentence: 

      "Here, we were able to assess FGF13-S and FGF13-VY, chosen because they are most abundantly expressed isoforms in the adult mouse brain, but the inability to rescue electrophysiological consequences completely with either isoform alone leaves open the possibility that other isoforms (e.g., FGF13-U, FGF13-V, and FGF13-VY) also make critical contributions." Should the last "FGF13-VY" be removed? 

      We thank the reviewer for noticing the error and have updated the text accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This useful study reports how neuronal activity in the prefrontal cortex maps time intervals during which animals have to wait until reaching a reward and how this mapping is preserved across days. However, the evidence supporting the claims is incomplete as these sequential neuronal patterns do not necessarily represent time but instead may be correlated with stereotypical behavior and restraint from impulsive decision, which would require further controls (e.g. behavioral analysis) to clarify the main message. The study will be of interest to neuroscientists interested in decision making and motor control.

      We thank the editors and reviewers for the constructive comments. In light of the questions mentioned by the reviewers, we have performed additional analyses in our revision, particularly aiming to address issues related to single-cell scalability, and effects of motivation and movement. We believe these additional data will greatly improve the rigor and clarity of our study. We are grateful for the review process of eLife.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary:

      This paper investigates the neural population activity patterns of the medial frontal cortex in rats performing a nose poking timing task using in vivo calcium imaging. The results showed neurons that were active at the beginning and end of the nose poking and neurons that formed sequential patterns of activation that covaried with the timed interval during nose poking on a trial-by-trial basis. The former were not stable across sessions, while the latter tended to remain stable over weeks. The analysis on incorrect trials suggests the shorter non-rewarded intervals were due to errors in the scaling of the sequential pattern of activity.

      Strengths:

      This study measured stable signals using in vivo calcium imaging during experimental sessions that were separated by many days in animals performing a nose poking timing task. The correlation analysis on the activation profile to separate the cells in the three groups was effective and the functional dissociation between beginning and end, and duration cells was revealing. The analysis on the stability of decoding of both the nose poking state and poking time was very informative. Hence, this study dissected a neural population that formed sequential patterns of activation that encoded timed intervals. 

      We thank the reviewer for the positive comments.

      Weaknesses:

      It is not clear whether animals had enough simultaneously recorded cells to perform the analyses of Figures 2-4. In fact, rat 3 had 18 responsive neurons, which is probably not enough to get robust neural sequences for the trial-by-trial analysis and the correct and incorrect trial analysis. 

      We thank the reviewer for the comment. Our imaging data generally yielded 50-150 cells in each session. The 18 neurons mentioned by the reviewer are from the duration cell category. We have now provided the number of imaged cells from each rat in the new Supplementary figure 1D. In addition, we have plotted the duration cells’ sequential activity of individual trials for each rat in new Supplementary figure 1B and 1C. These data demonstrate robust sequential activities from the duration cells.

      In addition, the analysis of behavioral errors could be improved. The analysis in Figure 4A could be replaced by a detailed analysis on the speed, and the geometry of neural population trajectories for correct and incorrect trials.

      We thank the reviewer for the suggestions. We have now performed analyses of the neural population trajectories as the reviewer suggested. We have calculated the neural population trajectories using the first two principal components of the neural activities during nose poke events. While both correct and incorrect trials show similar shapes of the trajectories, correct trials show more expanded paths, with longer lengths on average. These new results are now included in the updated Figure 4. Type I or type II errors would likely generate trajectories that do not follow the general direction, which differs from our observations. These results are therefore consistent with our conclusion that scaling errors contribute to the incorrect behavior timing in these rats.
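
      The trajectory comparison described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names `fit_pcs` and `path_length` are hypothetical, the principal components are assumed to be fitted on pooled activity and shared across trials, and each trial is assumed to be a (time, cells) array of calcium activity.

```python
import numpy as np

def fit_pcs(pooled, n_components=2):
    """Fit principal axes on pooled (time, cells) activity (assumed shared across trials)."""
    mean = pooled.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    return mean, vt[:n_components]

def path_length(trial, mean, axes):
    """Project one trial onto the fitted axes and sum the step lengths along its path."""
    projected = (trial - mean) @ axes.T
    steps = np.diff(projected, axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())
```

      Under these assumptions, a trial whose trajectory is more expanded in the principal-component plane yields a larger `path_length`, which is the quantity compared between correct and incorrect trials.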

      In the case of Figure 4G, it is not clear why the density of errors formed two clusters instead of having a linear relation with the produced duration. It would be advisable to compute the scaling factor on neuronal population trajectories and single cell activity, or to compute the center of mass, to test the type III errors. 

      To clarify the original Figure 4G, the correct trials tended to show positive time estimation errors while the incorrect trials showed negative time estimation errors. We believe that the polarity switch between these two types suggests a possible use of this neural mechanism to time the action of the rats.

      In addition, we have performed the analysis suggested by the reviewer in our revision. We calculated two types of scaling factors. At the individual cell level, we compared the peak positions in individual trials to the expected positions from the averaged template. At the neural population level, we searched for a scaling multiplier to resample the calcium activity data and minimized the differences between the scaled activity and the expected template. Using these two factors, we found that correct trials show significantly larger scaling compared to incorrect trials, consistent with our original interpretation that behavior errors are primarily correlated with scaling errors in the neural activities (type III error). These new results are now incorporated in Figure 4 and we have also updated the descriptions in the main text.
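
      The population-level search described above can be illustrated with a short sketch. It is not the analysis code itself: the names `rescale` and `best_scaling`, the use of linear interpolation, the grid of candidate factors, and the squared-error criterion are all assumptions made for illustration, and the factor is defined here as the stretch applied to a trial to map it onto the template.

```python
import numpy as np

def rescale(activity, factor):
    """Resample a (time, cells) array so its time axis is stretched by `factor`."""
    n_time = activity.shape[0]
    t = np.arange(n_time)
    # np.interp is 1-D, so resample one cell at a time.
    # Query times beyond the last sample are clamped to the final value.
    return np.column_stack([np.interp(t / factor, t, activity[:, c])
                            for c in range(activity.shape[1])])

def best_scaling(trial, template, factors):
    """Return the candidate factor whose rescaled trial is closest to the template."""
    errors = [np.sum((rescale(trial, f) - template) ** 2) for f in factors]
    return float(factors[int(np.argmin(errors))])
```

      With this convention, a trial whose sequence is stretched relative to the template returns a factor below 1, so the reciprocal of the returned value gives the stretch of the trial itself.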

      Due to the slow time resolution of calcium imaging, it is difficult to perform robust analysis on ramping activity. Therefore, I recommend downplaying the conclusion that: "Together, our data suggest that sequential activity might be a more relevant coding regime than the ramping activity in representing time under physiological conditions." 

      We agree with the reviewer, and have now modified this sentence in the abstract.

      Reviewer #2 (Public Review):

      In this manuscript, Li and collaborators set out to investigate the neuronal mechanisms underlying "subjective time estimation" in rats. For this purpose, they conducted calcium imaging in the prefrontal cortex of water-restricted rats that were required to perform an action (nosepoking) for a short duration to obtain drops of water. The authors provided evidence that animals progressively improved in performing their task. They subsequently analyzed the calcium imaging activity of neurons and identified start, duration, and stop cells associated with the nose poke. Specifically, they focused on duration cells and demonstrated that these cells served as a good proxy for timing on a trial-by-trial basis, scaling their pattern of activity in accordance with changes in behavioral performance. In summary, as stated in the title, the authors claim to provide mechanistic insights into subjective time estimation in rats, a function they deem important for various cognitive conditions.

      This study aligns with a wide range of studies in system neuroscience that presume that rodents solve timing tasks through an explicit internal estimation of duration, underpinned by neuronal representations of time. Within this framework, the authors performed complex and challenging experiments, along with advanced data analysis, which undoubtedly merits acknowledgement. However, the question of time perception is a challenging one, and caution should be exercised when applying abstract ideas derived from human cognition to animals. Studying so-called time perception in rats has significant shortcomings because, whether acknowledged or not, rats do not passively estimate time in their heads. They are constantly in motion. Moreover, rats do not perform the task for the sake of estimating time but to obtain their rewards, as they are water restricted. Their behavior will therefore reflect their motivation and urgency to obtain rewards. Unfortunately, it appears that the authors are not aware of these shortcomings. These alternative processes (motivation, sensorimotor dynamics) that occur during task performance are likely to influence neuronal activity. Consequently, my review will be rather critical. It is not however intended to be dismissive. I acknowledge that the authors may have been influenced by numerous published studies that already draw similar conclusions. Unfortunately, all the data presented in this study can be explained without invoking the concept of time estimation. Therefore, I hope the authors will find my comments constructive and understand that as scientists, we cannot ignore alternative interpretations, even if they conflict with our a priori philosophical stance (e.g., duration can be explicitly estimated by reading neuronal representation of time) and anthropomorphic assumptions (e.g., rats estimate time as humans do). While space is limited in a review, if the authors are interested, they can refer to a lengthy review I recently published on this topic, which demonstrates that my criticism is supported by a wide range of timing experiments across species (Robbe, 2023). In addition to this major conceptual issue that casts doubt on most of the conclusions of the study, there are also several major statistical issues.

      Main Concerns

      (1) The authors used a task in which rats must poke for a minimal amount of time (300 ms and then 1500 ms) to be able to obtain a drop of water delivered a few centimeters right below the nosepoke. They claim that their task is a time estimation task. However, they forget that they work with thirsty rats that are eager to get water sooner rather than later (there is a reason why they start with a short duration!). This task is mainly probing the animals' ability to wait (that is, impulse control) rather than time estimation per se. Second, the task does not require estimating time precisely because there appear to be no penalties when the nosepokes are too short or when they exceed the required duration. So it is unclear whether the variation in nosepoke duration reflects motivational changes rather than time estimation changes. The fact that this behavioral task is a poor assay for time estimation and rather reflects impulse control is shown by the tendency of animals to perform nose-pokes that are too short, the very slow improvement in their performance (Figure 1, with most of the mice making short responses), and the huge variability. Not only do the behavioral data not support the claim of the authors in terms of what the animals are actually doing (estimating time), but this also completely annihilates the interpretation of the Ca++ imaging data, which can be explained by motivational factors (changes in neuronal activity occurring while the animals nose poke may reflect a growing sense of urgency to check if water is available). 

      We would like to respond to the reviewer’s comments 1, 2 and 4 together, since they all focus on the same issue. We thank the reviewer for the very thoughtful comments and for sharing his detailed reasoning from a recently published review (Robbe, 2023). Much of this discussion goes beyond the scope of this study, and we agree that whether there is an explicit representation of time (an internal clock) in the brain is a difficult question to answer, particularly by using animal behaviors. In fact, even with fully conscious humans and elaborate task design, we think it is still questionable whether the neural substrate of “timing” can be clearly dissociated from “motor”. In the end, it may well be that, as the reviewer cited from Bergson’s article, the experience of time cannot be measured.

      Studying the neural representation of any internal state may suffer from the same ambiguity. With all due respect, however, we would like to limit our response to the scope of our results. According to the reviewer, two alternative interpretations of the task-related sequential activity exist: 1, duration cells may represent fidgeting or orofacial movements and 2, duration cells may represent motivation or motion plan of the rats. To test the first alternative interpretation, we have now performed a more comprehensive analysis of the behavior data for all the limbs and visible body parts of the experimental rats during nose poke and analyzed its periodicity among different trials. We found that the activities of the coding cells (including duration, start and end cells) were not modulated by these motions, arguing against this possibility. These data are now included in the new Supp. Figure 2, and we have added corresponding text in the manuscript.

      Regarding the second alternative interpretation, we think our data in the original Figure 4G argue against it. In this graph, we plotted the decoding error of time using the duration cells’ activity against the actual duration of the trials. If the sequential activity of duration cells only represents motivation, then the errors should be linearly modulated by trial durations. The unimodal distribution we observed (Figure 4G and see graph below for a re-plot without signs) suggests that the scaling factor of the sequential activity represents information related to time. And the fact that this unimodal distribution is centered at the time threshold of the task provides strong evidence for the active use of the scaling factor for time estimation.

      In order to further test the relationship to motivation, we have measured the time interval between exiting the nose poke and the start of licking the water reward as an independent measurement of motivation for each trial. We found that this reward-seeking time was positively correlated with the trial durations, suggesting that the durations were correlated with motivation to some degree. When we scaled the activities of the duration cells by this reward-seeking time, we found that the patterns of the sequential activities were largely diminished, and showed a significantly lower peak entropy compared to the same activities scaled by trial durations. The remaining sequential pattern may be due to the correlation between trial durations and motivation (Supp. Figure 2), and the sequential pattern reflects timing more prominently. These analyses provide further evidence that the sequential activities were not coding motivation. These data are included in Figure 2F, 2K and Supp. Figure 3 in the revised manuscript.
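
      The scaling comparison described above can be illustrated with a minimal sketch. It is not the analysis code used in the study. It assumes that "scaling" means resampling each trial's trace onto a fixed number of bins spanning the chosen interval, and that "peak entropy" is the Shannon entropy of the distribution of peak positions across cells. Both definitions, and the names `rescale_trace` and `peak_entropy`, are assumptions made for illustration only.

```python
import math
from collections import Counter


def rescale_trace(trace, n_bins):
    """Resample one trial's activity trace onto n_bins equal fractions of its length.

    Assumed form of temporal scaling: each bin holds the mean of the samples
    falling in that fraction of the trial (hypothetical helper).
    """
    n = len(trace)
    scaled = []
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = max((b + 1) * n // n_bins, lo + 1)  # keep every bin non-empty
        segment = trace[lo:hi]
        scaled.append(sum(segment) / len(segment))
    return scaled


def peak_entropy(scaled_traces, n_bins):
    """Shannon entropy (bits) of peak-bin positions across cells.

    Assumed definition: peaks that tile the scaled interval evenly give high
    entropy; peaks that pile into few bins give low entropy.
    """
    peaks = [max(range(n_bins), key=lambda b: t[b]) for t in scaled_traces]
    counts = Counter(peaks)
    total = len(peaks)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Four hypothetical cells whose peaks tile four bins, versus four cells that
# all peak in the same bin.
tiled = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
collapsed = [[0, 1, 0, 0]] * 4
print(peak_entropy(tiled, 4), peak_entropy(collapsed, 4))
```

      Under these assumptions, a scaling variable that aligns the sequence across trials keeps the peaks spread over the interval, whereas a poorly matched scaling variable blurs them, which lowers the entropy.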

      Author response image 1.

      Regarding whether the scaling sequential activity we report represents behavioral timing or true time estimation, we did not have evidence on this point. However, a previous study has shown that PFC silencing led to disruption of the mouse’s timing behavior without affecting the execution of the task (PMID: 24367075), arguing against the behavior timing interpretation. The main surprising finding of our present study is that these duration cells are different from the start and end cells in terms of their coding stability. Thus, future studies dissecting the anatomical microcircuit of these duration cells may provide further clues regarding whether they are connected with reward-related or motion-related brain regions. This may help partially resolve the “time” vs. “motor” debate the reviewer mentioned.

      (2) A second issue is that the authors seem to assume that rats are perfectly immobile and perform like some kind of robots that would initiate nose pokes, maintain them, and remove them in a very discretized manner. However, in this kind of task, rats are constantly moving from the reward magazine to the nose poke. They also move while nose-poking (either their body or their mouth), and when they come out of the nose poke, they immediately move toward the reward spout. Thus, there is a continuous stream of movements, including fidgeting, that will covary with timing. Numerous studies have shown that sensorimotor dynamics influence neural activity, even in the prefrontal cortex. Therefore, the authors cannot rule out that what the records reflect are movements (and the scaling of movement) rather than underlying processes of time estimation (some kind of timer). Concretely, start cells could represent the ending of the movement going from the water spout to the nosepoke, and end cells could be neurons that initiate (if one can really isolate any initiation, which I doubt) the movement from the nosepoke to the water spout. Duration cells could reflect fidgeting or orofacial movements combined with an increasing urgency to leave the nose pokes.

      (3) The statistics should be rethought for both the behavioral and neuronal data. They should be conducted separately for all the rats, as there is likely interindividual variability in the impulsivity of the animals.

      We thank the reviewer for the comment, yet we are not quite sure what specifically the reviewer is asking for. It appears that the reviewer is requesting that we conduct our analysis for each rat individually. In our revised manuscript, we have conducted and reported analyses for individual rats in the original Figure 1C, Figure 2C, G, K, and Figure 4F.

      (4) The fact that neuronal activity reflects an integration of movement and motivational factors rather than some abstract timing appears to be well compatible with the analysis conducted on the error trials (Figure 4), considering that the sensorimotor and motivational dynamics will rescale with the durations of the nose poke. 

      (5) The authors should mention upfront in the main text (result section) the temporal resolution allowed by their Ca2+ probe and discuss whether it is fast enough with regard to the behavioral dynamics occurring in the task.

      We thank the reviewer for the suggestion. We originally mentioned the caveat of calcium imaging in the interpretation of our results, and we have now added more text on this point in the revision. In terms of behavioral dynamics (start and end of nose poke in this case), we think calcium imaging provides sufficient kinetics. However, the more refined dynamics related to the reproducibility of the sequential activity, or the precise representation of individual cells on the scaled duration, may benefit from improved time resolution.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors): 

      (1) Please refer explicitly to the three types of cells in the abstract. 

      We have now modified the abstract as suggested during revision.

      (2) Please refer to the work of Betancourt et al., 2023 Cell Reports, where a trial-by-trial analysis on the correlation between neural trajectory dynamics in MPC and timing behavior is reported. In that same paper the stability of neural sequences across task parameters is reported.

      We have now cited and discussed the study in the discussion section of the revised manuscript.

      (3) Please state the number of studied animals at the beginning of the results section. 

      We have now provided this information as requested. The numbers of rats are also plotted in Figure 1D for each analysis.

      (4) Why do the middle and right panels of Figure 2E show duration cells?

      Figure 2E was intended to show examples of duration cells’ activity. We included different examples of cells that peak at different points in the scaled duration. We believe these multiple examples give readers a straightforward impression of these cells’ activity patterns.

      (5) Which behavioral sessions of Figure 1B were analyzed further?

      We have now labeled the analyzed sessions in Figure 1B with red color in the revised manuscript.

      (6) In Figure 3A-C please increase the time before the beginning of the trial in order to visualize properly the activation patterns of the start cells.

      We thank the reviewer for the suggestion and have now modified the figure accordingly in the revised manuscript.

      (7) Please state what could be the behavioral and functional effect of the ablation of the cortical tissue on top of mPFC.

      We thank the reviewer for the question. In our experience, mice with a lens implanted in the mPFC showed no observable difference from mice without surgery in the acquisition of the task or the distribution of the nose-poke durations. In our dataset, rats with lens implantation showed nose-poking behavior similar to that of rats without lens implantation (Figure 1B). Thus, the effect of the ablation, if any, appears to be quite limited within the scope of our task.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      Building on their own prior work, the authors present valuable findings that add to our understanding of cortical astrocytes, which respond to synaptic activity with calcium release in subcellular domains that can proceed to larger calcium waves. The proposed concept of a spatial "threshold" is based on solid evidence from in vivo and ex vivo imaging data and the use of mutant mice. However, details of the specific threshold should be taken with caution and appear incomplete unless supported by additional experiments with higher resolution in space and time.

      We thank the reviewers and editors for the positive assessment of our work as containing valuable findings that add to our understanding of cortical astrocytes. We also appreciate their positive appraisal of the proposed concept of a spatial threshold supported by solid evidence. 

      Regarding their specific comments, we truly appreciate them because they have helped to clarify issues and to improve the study. Point-by-point responses to these comments are provided below. Regarding the general comment on the spatial and temporal resolution of our study, we would like to clarify that the spatial and temporal resolution used in the current study (i.e., 2 - 5 Hz framerate using a 25x objective with 1.7x digital zoom with pixels on the order of 1 µm²) is within the norm in the field, does not compromise the results, nor diminish the main conceptual advancement of the study, namely the existence of a spatial threshold for astrocyte calcium surge.

      We respect the thoughtfulness of the reviewers and editors towards improving the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Lines et al. provide evidence for a sequence of events in vivo in adult anesthetized mice that begin with a footshock driving activation of neural projections into layer 2/3 somatosensory cortex, which in turn triggers a rise in calcium in astrocytes within "domains" of their "arbor". The authors segment the astrocyte morphology based on SR101 signal and show that the timing of "arbor" Ca2+ activation precedes somatic activation and that somatic activation only occurs if ≥22.6% of the total segmented astrocyte "arbor" area is active. Thus, the authors frame this ≥22.6% activation as a spatial property (spatial threshold) with certain temporal characteristics - i.e., must occur before soma and global activation. The authors then elaborate on this spatial threshold by providing evidence for its intrinsic nature - it is not set by the level of neuronal stimulus and is dependent on whether IP3R2, which drives Ca2+ release from the endoplasmic reticulum (ER) in astrocytes, is expressed. Lastly, the authors suggest a potential physiologic role for this spatial threshold by showing ex vivo how exogenous activation of layer 2/3 astrocytes by ATP application can gate glutamate gliotransmission to layer 2/3 cortical neurons - with a strong correlation between the number of active astrocyte Ca2+ domains and the slow inward current (SIC) frequency recorded from nearby neurons as a readout of glutamatergic gliotransmission. This is interesting and would potentially be of great interest to readers within and outside the glia research community, especially in how the authors have tried to systematically deconstruct some of the steps underlying signal integration and propagation in astrocytes. Many of the conclusions posited by the authors are potentially important but we think their approach needs experimental/analytical refinement and elaboration.

      We thank the reviewer for her/his positive appraisal and comments, which have helped us to improve the study. In response to their insights, we address the key points raised below:

      (1) Sequence of Events: We acknowledge the reviewer's interest in our findings regarding the sequence of events. We have provided a more detailed description of the methods and results to clarify the relationships between domain activation, spatiotemporal clustering, and centripetal and centrifugal calcium propagation in relation to soma activation.

      (2) Spatial Threshold: The reviewer accurately identifies our characterization of a spatial threshold (≥22.6% activation) with temporal characteristics as a crucial aspect of our study. We have expanded upon this concept by offering a clearer illustration of how this threshold relates to somatic and global activation.

      (3) Intrinsic Nature of Spatial Threshold: The reviewer's insightful observation regarding the intrinsic nature of the spatial threshold, which is not set by the level of neuronal stimulus, is noteworthy. We have provided additional details to substantiate this claim, shedding more light on the fundamental nature of this phenomenon.

      (4) Physiological Implications: The reviewer rightly highlights the potential physiological significance of our findings, particularly in relation to gliotransmission in cortical neurons. We have enhanced our discussion by elaborating on the implications of these observations.

      The primary issue for us, and which we would encourage the authors to address, relates to the low spatial-temporal resolution of their approach. This issue does not necessarily compromise the concept of a spatial threshold, but more refined observations and analyses are likely to provide more reliable quantitative parameters and a more comprehensive view of the mode of Ca2+ signal integration in astrocytes.

      We agree with the reviewer that our spatial-temporal resolution (2 – 5 Hz framerate using a 25x objective and 1.7x digital zoom with pixels on the order of 1 µm) does not compromise the proposed concept of the existence of a spatial threshold for the intracellular calcium expansion.

      For this reason, and because their observations might be perceived as both a conceptual and numerical standard in the field, we believe that the authors should proceed with both experimental and analytical refinement. Notably, we have difficulty with the reported mean delays of astrocyte Ca2+ elevations upon sensory stimulation. The 11s delay for response onset in "arbor" and 13s in the soma are extremely long, and we do not think they represent a true physiologic latency for astrocyte responses to the sensory activity. Indeed, such delays appear to be slower even than those reported in the initial studies of sensory stimulation in anesthetized mice with limited spatial-temporal resolution (Wang et al. Nat Neurosci., 2006) - not to say of more recent and refined ones in awake mice (Stobart et al. Neuron, 2018) that identified even sub-second astrocyte Ca2+ responses, largely preserved in IP3R2KO mice. Thus, we are inclined to believe that the slowness of responses reported here is an indicator of experimental/analytical issues. There can be several explanations of such slowness that the authors may want to consider for improving their approach: (a) The authors apparently use low zoom imaging for acquiring signals from several astrocytes present in the FOV: do all of these astrocytes respond homogeneously in terms of delay from sensory stimulus? Perhaps some are faster responders than others and only this population is directly activated by the stimulus. Others could be slower in activation because they respond secondarily to stimuli. In this case, the authors could focus their analysis specifically on the "fast-responding population". (b) By focusing on individual astrocytes and using higher zoom, the authors could unmask more subtle Ca2+ elevations that precede those reported in the current manuscript. 
These signals have been reported to occur mainly in regions of the astrocyte that are GCaMP6-positive but SR101-negative and constitute a large percentage of its volume (Bindocci et al., 2017). By restricting analysis to the SR101-positive part of the astrocyte, the authors might miss the fastest components of the astrocyte Ca2+ response likely representing the primary signals triggered by synaptic activity. It would be important if they could identify such signals in their records, and establish if none/few/many of them propagate to the SR-101-positive part of the astrocyte. In other words, if there is only a single spatial threshold, the one the authors reported, or two or more of them along the path of signal propagation towards the cell soma that leads eventually to the transformation of the signal into a global astrocyte Ca2+ surge. 

      We thank the reviewer for these excellent and important comments. The long mean delays of astrocyte activation result from averaging together astrocyte responses to a 20-second stimulus. Astrocyte responses are heterogeneous, and many astrocytes respond much more quickly, as can be seen in the example traces in Figs. 1D, 1G, and 3C. Variability exists in any biological system; here, however, we take the averaged responses in order to identify a general property of astrocyte calcium dynamics: the existence of a spatial threshold for astrocyte calcium surge. We have now included a paragraph on this subject in the Discussion section on P15, L16-22:

      “We were able to discover this general phenomenon of astrocyte physiology through the use of a novel computational tool that allowed us to combine almost 1000 astrocyte responses. Variation is rife in biological systems, and there are sure to be eccentricities within astrocyte calcium responses. Here, we focused on grouped data to better understand what appears to be an intrinsic property of astrocyte physiology. We used different statistical examinations and tested our hypothesis in vivo and in situ, and all these methods together provide a more complete picture of the existence of a spatial threshold for astrocyte calcium surge.”

      The specialized work of Stobart et al. 2018 focused more on the fast activation of microdomain subpopulations than on the induction of later somatic activation. Indeed, Stobart et al. 2018 and Wang et al. 2006 also found that somatic responses of astrocytes were delayed in the range of seconds. Importantly, Wang et al., 2006 describe that the activation of astrocytes is frequency dependent, that is, the higher the frequency, the faster and higher the activation. In the present work, we stimulated at just 2 Hz to better investigate the spatial threshold. Excitingly, the results shown by Stobart et al., 2018 agree with ours and with those of Rupprecht et al. 2024 and Fedotova et al. 2023: there is a sequence of activation from the domains to the somas, which could be due to the time required for the summation of the initial microdomain signal to reach a threshold capable of activating the soma. The studies referenced above have many similarities with our own but differ in the underlying scientific question, which led to diverging methodology. However, we want to stress that we agree with the reviewers that our methods provide sufficient evidence for the cell-scale phenomenon that we are studying, which is the spatial threshold for astrocyte calcium surge. Finally, we have included an additional figure (new Figure 5) that only looks at the calcium dynamics of early responding cells and found no significant difference in the spatial threshold in this population compared to our original quantification.

      In this context, there is another concept that we encourage the authors to better clarify: whether the spatial threshold that they describe is constituted by the enlargement of a continuous wavefront of Ca2+ elevation, e.g. in a single process, that eventually reaches 22.6% of the segmented astrocyte, or can it also be constituted by several distinct Ca2+ elevations occurring in separate domains of the arbor, but overall totaling 22.6% of the segmented surface? Mechanistically, the latter would suggest the presence of a general excitability threshold of the astrocyte, whereas the former would identify a driving force threshold for the centripetal wavefront. In light of the above points, we think the authors should use caution in presenting and interpreting the experiments in which they use SIC as a readout. Their results might lead some readers to bluntly interpret the 22.6% spatial threshold as the threshold required for the astrocyte to evoke gliotransmitter release. Indeed, SIC are robust signals recorded somatically from a single neuron and likely integrate activation of many synapses all belonging to that neuron. On the other hand, an astrocyte impinges on a myriad of synapses belonging to several distinct neurons. In our opinion, it is quite possible that more local gliotransmission occurs at lower Ca2+ signal thresholds (see above) that may not be efficiently detected by using SIC as a readout; a more sensitive approach, such as the use of a gliotransmitter sensor expressed all along the astrocyte plasma-membrane could be tested to this aim.

      The reviewer raised an excellent point. Whether the spatial threshold of 22.6% must be reached by a continuous wavefront or can be reached by activity in separate domains of the arbor is an important question, and we address it with a novel analysis shown in the new figure (new Figure 5) in the revised version of the manuscript. In this new analysis, we demonstrate that the average distance between active domains is not significantly different between subthreshold activity and the activity that precedes or follows suprathreshold cellular activation. In contrast, we do find a significant difference in the average time between domain activations between subthreshold activity and the activity that precedes and follows suprathreshold activation. We go further with a generalized linear model to show that the percent area of active domains and temporal clustering, but not spatial clustering, are related to soma activation. This suggests that domain activation does not need to be spatially clustered to induce soma activation and subsequent calcium surge; more importantly, domain activation must exceed the spatial threshold and occur within a limited timeframe. This has been added to the Results on P10, L2-40:

      “Our results demonstrate the relationship between the percentage of active domains and soma activation and subsequent calcium surge. Next, we were interested in the spatiotemporal properties of domain activity leading up to and during calcium surge. Because we imaged groups of astrocytes, we were able to constrain our analyses to fast responders (onset < median population onset) in order to evaluate astrocytes that were more likely to respond to neuronal-evoked sensory stimulation and not nearby astrocyte activation (Figure 5A). In this population the spatial threshold was 23.8% within the 95% confidence intervals of [21.2%, 24.0%]. First, we created temporal maps, where each domain is labeled as its onset relative to soma activation, of individual astrocyte calcium responses to study the spatiotemporal profile of astrocyte calcium surge (Bindocci et al., 2017; Rupprecht et al., 2024) (Figure 5B). Using temporal maps, we quantified the spatial clustering of responding domains by measuring the average distance between active domains. We found that the average distance between active domains in subthreshold astrocyte responses were not significantly different from pre-soma suprathreshold activity (16.3 ± 0.4 µm in No-soma cells versus 16.2 ± 0.3 µm in Pre-soma cells, p = 0.75; n = 286 No-soma vs n = 326 Pre-soma, 30 populations and 3 animals; Figure 5C). Following soma activation, astrocyte calcium surge was marked with no significant change in the average distance between active domains (16.0 ± 0.3 µm in Post-soma cells versus 16.3 ± 0.4 µm in No-soma cells, p = 0.57 and 16.2 ± 0.3 µm in Pre-soma cells, p = 0.31; n = 326 soma active and n = 286 no soma active, 30 populations and 3 animals; Figure 5C). Taken together this suggests that on average domain activation happens in a nonlocal fashion that may illustrate the underlying nonlocal activation of nearby synaptic activity.
Next, we interrogated the temporal patterning of domain activation by quantifying the average time between domain responses, and found that the average time between domain responses was significantly decreased in pre-soma suprathreshold activity compared to subthreshold activities without subsequent soma activation (9.4 ± 0.3 s in No-soma cells versus 4.4 ± 0.2 s in Pre-soma cells, p < 0.001; n = 326 soma active vs n = 286 not soma active, 30 populations and 3 animals; Figure 5D). The average time between domain activation was even less after the soma became active during calcium surge (2.1 ± 0.1 s in Post-soma versus 9.4 ± 0.3 s in No-Soma cells, p < 0.001 and 4.4 ± 0.1 s in Pre-soma cells, p < 0.001; n = 326 soma active and n = 286 not soma active, 30 populations and 3 animals; Figure 5D). This corroborates our findings in Figure S2 and highlights the difference in temporal profiles between subthreshold activity and astrocyte calcium surge. 

      We then tested the contribution of each of our three variables describing domain activation (percent area, average distance and time) to elicit soma activation by creating a general linear model. We found that overall, there was a significant relationship between these variables and the soma response (p = 5.5e-114), with the percent area having the largest effect (p = 3.5e-70) followed by the average time (p = 3.6e-7), and average distance having no significant effect (p = 0.12). Taken together this suggests that the overall spatial clustering of active domains has no effect on soma activation, and the percent area of active domains within a constrained time window having the largest effect.”
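
      The three descriptors of domain activation named in the quoted passage (percent area, average distance and average time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each domain is summarized by a centroid, an area and an onset time, that "average distance" is the mean pairwise centroid distance between active domains, and that "average time" is the mean gap between consecutive onsets. The function name `domain_descriptors` and the dictionary keys are hypothetical; only the 22.6% threshold comes from the text.

```python
import math
from itertools import combinations

SPATIAL_THRESHOLD_PERCENT = 22.6  # value reported in the text


def domain_descriptors(domains, total_area):
    """Summarize one astrocyte's domain activity.

    domains: list of dicts with keys 'x', 'y' (centroid, µm), 'area' (µm²)
             and 'onset' (seconds, or None if the domain stayed inactive).
    Returns (percent_area, mean_pairwise_distance, mean_onset_gap).
    """
    active = [d for d in domains if d["onset"] is not None]

    # Percent of the segmented arbor area occupied by active domains.
    percent_area = 100.0 * sum(d["area"] for d in active) / total_area

    # Spatial clustering: mean distance over all pairs of active domains.
    pairs = list(combinations(active, 2))
    if pairs:
        mean_distance = sum(
            math.hypot(a["x"] - b["x"], a["y"] - b["y"]) for a, b in pairs
        ) / len(pairs)
    else:
        mean_distance = float("nan")

    # Temporal clustering: mean gap between consecutive domain onsets.
    onsets = sorted(d["onset"] for d in active)
    gaps = [t2 - t1 for t1, t2 in zip(onsets, onsets[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else float("nan")

    return percent_area, mean_distance, mean_gap


# Hypothetical astrocyte: three active domains and one inactive domain.
example = [
    {"x": 0.0, "y": 0.0, "area": 10.0, "onset": 1.0},
    {"x": 3.0, "y": 4.0, "area": 10.0, "onset": 2.0},
    {"x": 0.0, "y": 8.0, "area": 10.0, "onset": 4.0},
    {"x": 9.0, "y": 9.0, "area": 10.0, "onset": None},
]
pct, dist, gap = domain_descriptors(example, total_area=100.0)
print(pct >= SPATIAL_THRESHOLD_PERCENT, dist, gap)
```

      In a model like the one described in the quoted passage, these three values would be the predictors and soma activation the binary response; the fitting step is omitted here.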

      Regarding comments on SIC, we fully agree with the reviewer. In the revised version of the manuscript, we have included text in the discussion to ensure the correct interpretation of the results, i.e., the observed 22.6% spatial threshold for the SIC does not necessarily indicate an intrinsic property of gliotransmitter release; rather, since SICs have been shown to be calcium-dependent, it is not surprising that their presence, monitored at the whole-cell soma, matches the threshold for the intracellular calcium extension. We have added to the Discussion P16, L15-30:

      “Astrocyte calcium activity induces multiple downstream signaling cascades, such as the release of gliotransmitters (Araque et al., 2014; de Ceglia et al., 2023). Using patch-clamp recordings of a single nearby neuron we showed that a nearby population of astrocyte calcium surge is also correlated to the increase in slow inward currents (SICs), previously demonstrated to be dependent on astrocytic vesicular release of glutamate (Araque et al., 2000; Durkee et al., 2019; Fellin et al., 2004). The increase of SICs we observed from patching a single neuron is likely the integration of gliotransmitter release onto synapses from a group of nearby astrocytes. Indeed, subthreshold astrocyte calcium increases alone can trigger activity in contacted dendrites (Di Castro et al., 2011). An exciting avenue of future research would be to observe the impact of a single astrocyte calcium surge on nearby neurons (Refaeli and Goshen, 2022). How many neurons would be affected, and would this singular event be observable through patch clamp from a single neuron? The output of astrocyte calcium surge is equally important to network communication as the labeling of astrocyte calcium surge, as it identifies a biologically relevant effect onto nearby neurons. Many downstream signaling mechanisms may be activated following astrocyte calcium surge, and the effect of locally concentrated domain activity vs astrocyte calcium surge should be studied further on different astrocyte outputs.”

      Additional considerations are that the authors propose an event sequence as follows: stimulus - synaptic drive to L2/3 - arbor activation - spatial threshold - soma activation - post soma activation - gliotransmission. This seems reminiscent of the sequence underlying neuronal spike propagation - from dendrite to soma to axon, and the resulting vesicular release. However, there is no consensus within the glial field about an analogous framework for astrocytes. Thus, "arbor activation", "soma activation", and "post soma activation" are not established "terms of art". Similarly, the way the authors use the term "domain" contrasts with how others have (Agarwal et al., 2017; Shigetomi et al., 2013; Di Castro et al., 2011; Grosche et al., 1999) and may produce some confusion. The authors could adopt a more flexible nomenclature or clarify that their terms do not have a defined structural-functional basis, being just constructs that they justifiably adapted to deal with the spatial complexity of astrocytes in line with their past studies (Lines et al., 2020; Lines et al., 2021).

      We agree there is no consensus within the glial field about this event sequence. One major difference between this sequence of events and neuronal spike propagation is directionality from dendrite to soma to axon. It is unknown whether directionality of the calcium signal exists in astrocytes. However, our finding in Figure 5E suggests a directionality of centripetal propagation from the arborization to the soma to elicit calcium surge that leads to centrifugal propagation. In the Results on P10-11, L41-8:

      “Recent work studying astrocyte integration has suggested a centripetal model of astrocyte calcium, where more distal regions of the astrocyte arborization become active initially and activation flows towards the soma (Fedotova et al., 2023; Rupprecht et al., 2024). Here, we confirm this finding, where activated domains located distal from the soma respond sooner than domains more proximal to the soma (linear correlation: p < 0.05, R2 = 0.67; n = 30 populations, 3 animals; Figure 4E). Next, we build upon this result to also demonstrate that following soma activation, astrocyte calcium surge propagates outward in a centrifugal pattern, where domains proximal to the soma become active prior to distal domains (linear correlation: p < 0.01, R2 = 0.89; n = 30 populations, 3 animals; Figure 4E). Together these results detail that intracellular astrocyte calcium follows a centripetal model until the soma is activated leading to a calcium surge that flows centrifugally. This suggests that astrocytes have the capabilities to integrate the nearby local synaptic population, and if this activity exceeds the spatial threshold then it leads to a whole-cell response that spreads outward.” 
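
      The direction of propagation described in the quoted passage can be read from the sign of the correlation between each domain's distance from the soma and its onset time. The sketch below illustrates that reasoning only; it assumes per-domain distances and onsets are already available, the function names are hypothetical, and it reports the sign of Pearson's r without the significance test used in the manuscript.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def propagation_direction(distances_um, onsets_s):
    """Classify propagation from the sign of the distance-onset correlation.

    Negative r: distal domains respond first (centripetal, toward the soma).
    Positive r: proximal domains respond first (centrifugal, away from it).
    """
    r = pearson_r(distances_um, onsets_s)
    if r < 0:
        return "centripetal"
    if r > 0:
        return "centrifugal"
    return "undetermined"


# Hypothetical domains at increasing distance from the soma.
distances = [5.0, 10.0, 15.0, 20.0]
print(propagation_direction(distances, [4.0, 3.0, 2.0, 1.0]))  # before soma activation
print(propagation_direction(distances, [1.0, 2.0, 3.0, 4.0]))  # after soma activation
```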

      And in the Discussion P15, L3-15:

      “Close examinations of the calcium surge uncovered distinct propagations whether before or after soma activation. Firstly, our analysis found that temporal clustering changed before and after calcium surge, with both being above subthreshold activity, and that this characteristic was absent when assessing spatial clustering. When comparing the percent area, spatial and temporal clustering of active domains using a GLM, we found that the percent area was the most significant parameter describing a threshold to soma activation. We then compared the delay of domain activation and its distance from the soma, and recreated previous results that suggest a centripetal model of astrocytic calcium responses from the distal arborizations to the soma (Fedotova et al., 2023; Rupprecht et al., 2023). Here, we went a step further and discovered that soma activation switches this directionality for astrocytic calcium surge to propagate outward in a centrifugal manner away from the soma. Taken together, these results demonstrate the integrative potential of astrocyte calcium responses and characterize further the astrocyte calcium surge to relay this to other parts of the astrocyte.”

      The term “microdomain” is used in the references above to define distal subcellular domains in contact with synapses, and in order to dissociate from this term we adopt the nomenclature “domain” to define all subcellular domains in the astrocyte arborization. These items have been discussed and clarified in the revised version of the manuscript on P5, L17-19:

      “The concept of domain to define all subcellular domains in the astrocyte arborization should not be confused with the concept of microdomain, that usually refers to the distal subcellular domains in contact with synapses.”

      Our previous points suggest that the paper would be significantly strengthened by new experimental observations focusing on single astrocytes and using acquisitions at higher spatial and temporal resolution. If the authors will not pursue this option, we encourage them to at least improve their analysis, and at the same time recognize in the text some limitations of their experimental approach as discussed above. We indicate here several levels of possible analytical refinement.

      We believe our spatial (25x objective and 1.7x digital zoom with pixels on the order of 1µm) and temporal (2 – 5 Hz framerate) resolution is within the range used in the glial field. In any case the existence of a spatial threshold for astrocyte calcium surge is not compromised with the use of this imaging resolution.

      The first relates to the selection of astrocytes being analyzed, and the need to focus on a much narrower subpopulation than (for example) 987 astrocytes used for the core data. This selection would take into greater consideration the aspects of structure and latency. With the structural and latency-based criteria for selection, the number of astrocytes to analyze might be reduced by 10-fold or more, making our second analytical recommendation much more feasible.

      We agree that individual differences exist, however, establishing a general concept requires the sampling of many astrocytes. Nevertheless, we have included a new figure (new Figure 5) that analyzes early responders.

      For structure-based selection - Genetically-encoded Ca2+ indicators such as GCaMP6 are in principle expressed throughout an astrocyte, even in regions that are not labelled by SR101. Moreover, astrocytes form independent 3D territories, so one can safely assume that the GCaMP6 signal within an astrocyte volume belongs to that specific astrocyte (this is particularly evident if the neighboring astrocytes are GCaMP6-negative). Therefore, authors could extend their analysis of Ca2+ signals in individual astrocytes to the regions that are SR101-negative and try to better integrate fast signals in their spatial threshold concept. Even if they decided to be conservative on their methods, and stick to the astrocyte segmentation based on the SR-101 signal, they should acknowledge that SR101 dye staining quality can vary considerably between individual astrocytes within a FOV - some astrocytes will have much greater structural visibility in the distal processes than others. This means that some astrocytes may have segmented domains extending more distally than others and we think that authors should privilege such astrocytes for analysis. However, cases like the representative astrocytes shown in Figure 4A or Figure S1B, have segmented domains localized only to proximal processes near the soma. Accordingly, given the reported timing differences between "arbor" and "soma" activation, one might expect there to be comparable timing differences between domains that are distal vs proximal to the soma as well. Fast signals in peripheral regions of astrocytes in contact with synapses are largely IP3R2-independent (Stobart et al., 2018). However, the quality of SR101 staining has implications for interpreting the IP3R2 KO data. There is evidence IP3R2 KO may preferentially impact activity near the soma (Srinivasan et al., 2015). Thus, astrocytes with insufficient staining - visible only in the soma and proximal domains - might show a biased effect for IP3R2 KO. While not necessarily disrupting the core conclusions made by the authors based on their analysis of SR101-segmented astrocytes, we think results would be strengthened if astrocytes with sufficient SR101 staining - i.e. more consistent with previous reports of L2/3 astrocyte area (Lanjakornsiripan et al., 2018) - were only included. This could be achieved by using max or cumulative projections of individual astrocytes in combination with SR101 staining to construct more holistic structural maps (Bindocci et al., 2017).

      We agree with the ideas concerning SR101, and indeed there could be variability in the origins of the astrocyte calcium signal. Astrocyte territory boundaries can be difficult to discern when both astrocytes express GCaMP6. Also, SR101-negative domains could encapsulate an area that is only partially that of astrocyte territory, including also extracellular space. Here we take a conservative approach to constrain ROIs to SR101-positive astrocyte territory outlines without invading neighboring cells or extracellular space in order to reduce error in the estimate of a spatial threshold. The effect of IP3R2 KO preferentially impacting activity near the soma is interesting, and in line with our conclusions. We agree that the findings from SR101-negative pixels would not necessarily disrupt the core conclusions of the study, and the additional analysis suggested would further strengthen results. We have since included a statement on the limitations of the study in the Discussion P15, L31-37:

      “In this study, we chose to limit our examinations of calcium activity that was within the bounds determined by SR101 staining. Much work has shown that astrocyte territories are more akin to sponge-like morphology with small microdomains making up the end feet of their distal arborizations (Baldwin et al., 2024). Here, we took a conservative approach to not incorporate these fine morphological processes and only take SR101-positive pixels for analysis in order to reduce the possible error of including a neighboring astrocyte or extracellular space in our analyses. Much work can be done to extend these results.”

      For latency-based selection - The authors record calcium activity within a FOV containing at least 20+ astrocytes over a period of 60s, during which a 2Hz hindpaw stimulation at 2mA is applied for 20s. As discussed above, presumably some astrocytes in a FOV are the first to respond to the stimulus series, while others likely respond with longer latency to the stimulus. For the shorter-latency responders <3s, it is easier to attribute their calcium increases as "following the sensory information" projecting to L2/3. In other cases, when "arbor" responses occur at 10s or later, only after 20 stimulus events (at 2Hz), it is likely they are being activated by a more complex and recurrent circuit containing several rounds of neuron-glia crosstalk etc., which would be mechanistically distinct from astrocytes responding earlier. We suggest that authors focus more on the shorter latency response astrocytes, as they are more likely to have activity corresponding to the stimulus itself.

      We agree that different times of astrocyte calcium increases may be due to different mechanisms outside of the astrocyte. We believe the spatial threshold will be intrinsic to these external variables; yet we believe that longer latency responses are physiological and may carry important information for determining the astrocyte calcium responses. Indeed, we have performed the spatial threshold analysis on early responders (first half of responding cells), and found that the spatial threshold in that population (23.8%) is within the 95% confidence interval [21.2%, 24.0%]. Additionally, the spatial threshold of the slow responders (22.6%) was also within the confidence interval.

      The second level of analysis refinement we suggest relates specifically to the issue of propagation and timing for the activity within "arbor", "soma" and "post-soma". Currently, the authors use an ROI-based approach that segments the "arbor" into domains. We suggest that this approach could be supplemented by a more robust temporal analysis. This could for example involve starting with temporal maps that take pixels above a certain amplitude and plot their timing relative to the stimulus-onset, or (better) the first active pixel of the astrocyte. This type of approach has become increasingly used (Bindocci et al., 2017; Wang et al., 2019; Rupprecht et al., 2022) and we think its use can greatly help clarify both the proposed sequence and better characterize the spatial threshold. We think this analysis should specifically address several important points:

      We agree that the creation of temporal maps from our own data would be interesting, and we provide the results of the suggested analysis within the new figure (new Figure 5) in the revised version of the manuscript. In this analysis we show that subthreshold, pre-soma and post-soma dynamics are significantly different in time. These added results of including temporal maps strengthen our claim of a spatial threshold, by quantifying the distinct temporal and spatial dynamics of domain activation before and after the spatial threshold is met (i.e. soma activation), and highlights differences in subthreshold and suprathreshold activity.
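      The temporal-map idea discussed above (take pixels above a certain amplitude and time them relative to the first active pixel) can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline used in the manuscript: the function name `temporal_map`, the list-of-frames data layout, the amplitude cutoff, and the toy movie are all hypothetical.

```python
def temporal_map(movie, amplitude, frame_rate_hz):
    """Map each pixel to its onset time (s) relative to the first active pixel.

    movie: list of frames; each frame is a list of rows of dF/F values
           (hypothetical layout chosen for this sketch).
    Pixels that never exceed `amplitude` are marked None.
    """
    n_rows, n_cols = len(movie[0]), len(movie[0][0])
    onset = [[None] * n_cols for _ in range(n_rows)]
    for t, frame in enumerate(movie):
        for r in range(n_rows):
            for c in range(n_cols):
                # Record only the first frame in which the pixel crosses the cutoff.
                if onset[r][c] is None and frame[r][c] > amplitude:
                    onset[r][c] = t
    active = [v for row in onset for v in row if v is not None]
    if not active:
        return onset  # no pixel ever crossed the cutoff
    first = min(active)
    return [
        [None if v is None else (v - first) / frame_rate_hz for v in row]
        for row in onset
    ]


# Hypothetical 2x2 movie sampled at 2 Hz, for illustration only.
toy_movie = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.9, 0.0], [0.0, 0.0]],
    [[0.9, 0.8], [0.0, 0.0]],
    [[0.9, 0.8], [0.7, 0.0]],
]
toy_map = temporal_map(toy_movie, amplitude=0.5, frame_rate_hz=2.0)
```

      Timing relative to stimulus onset, the other option mentioned by the reviewer, would only require subtracting the stimulus frame instead of the first active frame.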

      (1) Where/when does the astrocyte activation begin? Understanding the beginning is very important, particularly because another potential spatial threshold - preceding the one the authors describe in the paper - could gate the initial activation of more distal processes, as discussed above. This sequentially earlier spatial threshold could (for example) rely on microdomain interaction with synaptic elements and (in contrast) be IP3R2 independent (Srinivasan et al., 2015, Stobart et al., 2018). We would be interested to know whether, in a subset of astrocytes that meet the structure and latency criteria proposed above and can produce global activation, there is an initial local GCaMP6f response of a minimal size that must occur before propagation towards the soma begins. The data associated with varying stimulus parameters could potentially be useful here and reveal stimulus intensity/duration-dependent differences.

      This is a very important point. It is difficult to pinpoint the beginning of the signal, which is why we rely on the average of responses. The additional analysis we provide based on temporal maps (new Figure 5) shows a very interesting result in that there is no significant difference between the spatial clustering of, or average distance between, activated domains in subthreshold and pre-soma suprathreshold activity. This result, along with the General Linear Model, suggests that there is not another subcellular potential spatial threshold, as the activity is the same. Instead, the main difference between activity in the domains that leads to soma activation or not is the overall percentage of domains active and not necessarily how that spatial activity is organized. We have also added this point in the Discussion section to highlight the importance of this result. P15, L3-8:

      “Close examinations of the calcium surge uncovered distinct propagations whether before or after soma activation. Firstly, our analysis found that temporal clustering changed before and after calcium surge, with both being above subthreshold activity, and that this characteristic was absent when assessing spatial clustering. When comparing the percent area, spatial and temporal clustering of active domains using a GLM, we found that the percent area was the most significant parameter describing a threshold to soma activation.”

      (2) Whether the propagation in the authors' experimental model is centripetal? This is implied throughout the manuscript but never shown. We think establishing whether (or not) the calcium dynamics are centripetal is important because it would clarify whether spatially adjacent domains within the "arbor" need to be sequentially active before reaching the threshold and then reaching the soma. More broadly, visualizing propagation will help to better visualize summation, which is presumably how the threshold is first reached (and overcome).

      The alternative hypothesis of a general excitability threshold, as discussed above, would be challenged here and possibly rejected, thereby clarifying the nature of the Ca2+ process that needs to reach a threshold for further expansion to the soma and other parts of the astrocyte.

      We agree that our view is centripetal when considering activity leading up to soma activation. Indeed, we have found arborization activity precedes soma activity (Figure 3), soma activity appears to rely on the percent area of domain activity (Figure 4), and pre-soma domain activity comes online earlier in domains distal from the soma (new Figure 5). However, whether this is intrinsic or due to the fact that synapses are more likely to occur in the periphery requires further studies. Our new results in the new Figure 5 demonstrating that subthreshold activity has a spatial organization that is not significantly different than pre-soma activity in suprathreshold cases argues in favor of a general excitability threshold hypothesis. However, we do not see these hypotheses as mutually exclusive. Excitingly, we have also found that following soma activation, calcium surge appears to follow a centrifugal propagation. We have since added the topic of a centripetal-centrifugal experimental model to the Discussion P15, L8-15:

      “We then compared the delay of domain activation and its distance from the soma, and recreated previous results that suggest a centripetal model of astrocytic calcium responses from the distal arborizations to the soma (Fedotova et al., 2023; Rupprecht et al., 2024). Here, we went a step further and discovered that soma activation switches this directionality for astrocytic calcium surge to propagate outward in a centrifugal manner away from the soma. Taken together, these results demonstrate the integrative potential of astrocyte calcium responses and characterize further the astrocyte calcium surge to relay this to other parts of the astrocyte.”

      (3) In complement to the previous point: we understand that the spatial threshold does not per se have a location, but is there some spatial logic underlying the organization of active domains before the soma response occurs? One can easily imagine multiple scenarios of sparse heterogeneous GCaMP6f signal distributions that correspond to ≥22.6% of the arborization, but that would not be expected to trigger soma activation. For example, the diagram in Figure 4C showing the astrocyte response to 2Hz stim (which lacks a soma response) underscores this point. It looks like it has ≥22.6% activation that is sparsely localized throughout the arborization. If an alternative spatial distribution for this activity occurred, such that it localized primarily to a specific process within the arbor, would it be more likely to trigger a soma response?

      This is an interesting point, and our new spatiotemporal analysis in the new figure (new Figure 5) aims to shed some light on it, as answered above. To our knowledge, there is no mechanism in astrocytes to impose directionality on calcium propagation, like rectifying voltage-gated sodium channels in neuronal voltage propagation. We found that the delay of domain activation compared to soma onset is significantly correlated to the distance from the soma (new Figure 5E). In addition, spatial clustering in pre-soma activity is not significantly different from that in non-responders or in post-soma activity. Together this suggests that centripetal propagation may be occurring throughout the entire cell and not in a local clustered way. Our findings also suggest that following soma activation astrocyte calcium surge follows a mostly centrifugal pattern (new Figure 5E).
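      The relationship between domain delay and distance from the soma described above can be illustrated with a short sketch. It assumes a sign convention that is not stated in this response: delay is domain onset minus soma onset, so a negative correlation with distance is read as centripetal and a positive one as centrifugal. The function names and the toy values are hypothetical, and the sketch does not test significance.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def propagation_direction(distance_um, delay_s):
    """Label the propagation pattern from the sign of the delay-distance correlation.

    Assumed convention: delay_s = domain onset - soma onset.
    Distal domains leading (r < 0) is labelled centripetal;
    distal domains lagging (r > 0) is labelled centrifugal.
    """
    r = pearson_r(distance_um, delay_s)
    if r < 0:
        return "centripetal", r
    if r > 0:
        return "centrifugal", r
    return "none", r


# Hypothetical domains at increasing distance from the soma (in micrometres).
distance = [5.0, 10.0, 20.0, 40.0]
pre_soma_delay = [-0.5, -1.0, -2.0, -4.0]   # distal domains activate first
post_soma_delay = [0.5, 1.0, 2.0, 4.0]      # distal domains activate last

pre_label, pre_r = propagation_direction(distance, pre_soma_delay)
post_label, post_r = propagation_direction(distance, post_soma_delay)
```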

      (4) Does "pre-soma" activation predict the location and onset time of "post-soma" activation? For example, are arbor domains that were part of the "pre-soma" response the first to exhibit GCaMP6f signal in the "post-soma" response?

      Please see above comments.

      Reviewer #2 (Public Review):

      Lines et al investigated the integration of calcium signals in astrocytes of the primary somatosensory cortex. Their goal was to better characterize the mechanisms that govern the spatial characteristics of calcium signals in astrocytes. In line with previous reports in the field, they found that most events originated and stayed localized within microdomains in distal astrocyte processes, occasionally coinciding with larger events in the soma, referred to as calcium surges. As a single astrocyte communicates with hundreds of thousands of synapses simultaneously, understanding the spatial integration of calcium signals in astrocytes and the mechanisms governing the latter is of tremendous importance to deepen our understanding of signal processing in the central nervous system. The authors thus aimed to unveil the properties governing the emergence of calcium surges. The main claim of this manuscript is that there would be a spatial threshold of ~23% of microdomain activation above which a calcium surge, i.e. a calcium signal that spreads to the soma, is observed. Although the study provides data that is highly valuable for the community, the conclusions of the current version of the manuscript seem a little too assertive and general compared with what can be deduced from the data and methods used.

      The major strength of this study is the experimental approach that allowed the authors to obtain numerous and informative calcium recordings in vivo in the somatosensory cortex in mice in response to sensory stimuli as well as in situ. Notably, they developed an interesting approach to modulating the number of active domains in peripheral astrocyte processes by varying the intensity of peripheral stimulation (its amplitude, frequency, or duration).

      We thank the reviewer for their kind and thoughtful review of our study.

      The major weakness of the manuscript is the method used to analyze and quantify calcium activity, which mostly relies on the analysis of averaged data and overlooks the variability of the signals measured. As a result, the main claims from the manuscript seem to be incompletely supported by the data. The choice of the use of a custom-made semi-automatic ROI-based calcium event detection algorithm rather than established state-of-the-art software, such as the event-based calcium event detection software AQuA (DOI: 10.1038/s41593-019-0492-2), is insufficiently discussed and may bias the analysis. Some references on this matter include: Semyanov et al, Nature Rev Neuro, 2020 (DOI: 10.1038/s41583-020-0361-8); Covelo et al 2022, J Mol Neurosci (DOI: 10.1007/s12031-022-02006-w) & Wang et al, 2019, Nat Neuroscience (DOI: 10.1038/s41593-019-0492-2). Moreover, the ROIs used to quantify calcium activity are based on structural imaging of astrocytes, which may not be functionally relevant.

      Unfortunately, there is no general consensus on calcium analysis in the astrocyte or neuronal field, and many groups use either custom-made software developed in-house or published packages such as GECIquant, STARDUST, AQuA or AQuA2. Although AQuA is event-based calcium event detection software, omitting inactive domains that are SR101-positive could underestimate the spatial threshold for calcium surge. Our analysis is based not on functional events but on calcium signals within the structural constraints of a single astrocyte. This is crucial to properly determine the ratio of active vs inactive pixels within a single astrocyte.

      For the reasons listed above, the manuscript would probably benefit from some rephrasing of the conclusions and a discussion highlighting the advantages and limitations of the methodological approach. The question investigated by this study is of great importance in the field of neuroscience as the mechanisms dictating the spatio-temporal properties of calcium signals in astrocytes are poorly characterized, yet are essential to understand their involvement in the modulation of signal integration within neural circuits.

      We thank the reviewer for their suggestions to benefit the conclusions and discussion. We have now included a paragraph outlining the limitations of the study in the Discussion P15, L23-37:

      “The investigation of the spatial threshold could be improved in the future in a number of ways. One being the use of state-of-the-art imaging in 3D (Bindocci et al., 2017). While the original publication using 3D imaging to study astrocyte physiology does not necessarily imply that there would be different calcium dynamics in one axis over another, the three-dimensional examination of the spatial threshold could refine the findings we present here. To better control the system, mice imaged here were under anesthesia, and this is a method that has been used to characterize many foundational physiological results in the field (Hubel and Wiesel, 1962; Mountcastle et al., 1957). However, assessing the spatial threshold in awake freely moving animals would be the next logical step. In this study, we chose to limit our examinations of calcium activity that was within the bounds determined by SR101 staining. Much work has shown that astrocyte territories are more akin to sponge-like morphology with small microdomains making up the end feet of their distal arborizations (Baldwin et al., 2024). Here, we took a conservative approach to not incorporate these fine morphological processes and only take SR101-positive pixels for analysis in order to reduce the possible error of including a neighboring astrocyte or extracellular space in our analyses. Much work can be done to extend these results.”

      Reviewer #3 (Public Review):

      Summary:

      The study aims to elucidate the spatial dynamics of subcellular astrocytic calcium signaling. Specifically, they elucidate how subdomain activity above a certain spatial threshold (~23% of domains being active) heralds a calcium surge that also affects the astrocytic soma. Moreover, they demonstrate that processes on average are included earlier than the soma and that IP3R2 is necessary for calcium surges to occur. Finally, they associate calcium surges with slow inward currents.

      Strengths:

      The study addresses an interesting topic that is only partially understood. The study uses multiple methods including in vivo two-photon microscopy, acute brain slices, electrophysiology, pharmacology, and knockout models. The conclusions are strengthened by the same findings in both in vivo anesthetized mice and in brain slices.

      We thank the reviewer for the positive assessment of the study and his/her comments.

      Weaknesses:

      The method that has been used to quantify astrocytic calcium signals only analyzes what seems to be a small proportion of the total astrocytic domain on the example micrographs, where a structure is visible in the SR101 channel (see for instance Reeves et al. J. Neurosci. 2011, demonstrating to what extent SR101 outlines an astrocyte). This would potentially heavily bias the results: from the example illustrations presented it is clear that the calcium increases in what is putatively the same astrocyte goes well beyond what is outlined with automatically placed small ROIs. The smallest astrocytic processes are an order of magnitude smaller than the resolution of optical imaging and would not be outlined by either SR101 or with the segmentation method judged by the ROIs presented in the figures. Completely ignoring these very large parts of the spatial domain of an astrocyte, in particular when making claims about a spatial threshold, seems inappropriate. Several recent methods published use pixel-by-pixel event-based approaches to define calcium signals. The data should have been analyzed using such a method within a complete astrocyte spatial domain in addition to the analyses presented. Also, the authors do not discuss how two-dimensional sampling of calcium signals from an astrocyte that has processes in three dimensions (see Bindocci et al, Science 2017) may affect the results: if subdomain activation is not homogeneously distributed in the three-dimensional space within the astrocyte territory, the assumptions and findings between a correlation between subdomain activation and somatic activation may be affected.

      In order to reduce noise from individual pixels, we chose to segment astrocyte arborizations into domains of several pixels. As pointed out previously, including pixels outside of the SR101-positive territory runs the risk of including a pixel that may be from a neighboring cell or mostly comprised of extracellular space, and we chose the conservative approach to avoid this source of error. We agree that the results have limitations from being acquired in 2D instead of 3D, but it is reasonable to assume that activity is homogeneously distributed across the 3D astrocyte and that the 2D plane is representative of the whole astrocyte. Indeed, no dimensional effects were reported in Bindocci et al, Science 2017. We have included a paragraph in the discussion to address this limitation in our study on P15, L23-27:

      “The investigation of the spatial threshold could be improved in the future in a number of ways. One being the use of state-of-the-art imaging in 3D (Bindocci et al., 2017). While the original publication using 3D imaging to study astrocyte physiology does not necessarily imply that there would be different calcium dynamics in one axis over another, the three-dimensional examination of the spatial threshold could refine the findings we present here.”

      The experiments are performed either in anesthetized mice, or in slices. The study would have come across as much more solid and interesting if at least a small set of experiments were performed also in awake mice (for instance during spontaneous behavior), given the profound effect of anesthesia on astrocytic calcium signaling and the highly invasive nature of preparing acute brain slices. The authors mention the caveat of studying anesthetized mice but claim that the intracellular machinery should remain the same. This explanation appears a bit dismissive as the response of an astrocyte not only depends on the internal machinery of the astrocyte, but also on how the astrocyte is stimulated: for instance synaptic stimulation or sensory input likely would be dependent on brain state and concurrent neuromodulatory signaling which is absent in both experimental paradigms. The discussion would have been more balanced if these aspects were dealt with more thoroughly.

      Yes, we agree that this is a limitation, and we acknowledge this in the Discussion P15, L27-31:

      “To better control the system, mice imaged here were under anesthesia, and this is a method that has been used to characterize many foundational physiological results in the field (Hubel and Wiesel, 1962; Mountcastle et al., 1957). However, assessing the spatial threshold in awake freely moving animals would be the next logical step.”

      The study uses a heaviside step function to define a spatial 'threshold' for somata either being included or not in a calcium signal. However, Fig 4E and 5D showing how the method separates the signal provide little understanding for the reader. The most informative figure that could support the main finding of the study, namely a ~23% spatial threshold for astrocyte calcium surges reaching the soma, is Fig. 4G, showing the relationship between the percentage of arborizations active and the soma calcium signal. A similar plot should have been presented in Fig 5 as well. Looking at this distribution, though, it is not clear why ~23% would be a clear threshold to separate soma involvement, one can only speculate how the threshold for a soma event would influence this number. Even if the analyses in Fig. 4H and the fact that the same threshold appears in two experimental paradigms strengthen the case, the results would have been more convincing if several types of statistical modeling describing the continuous distribution of values presented in Fig. 4E (in addition to the heaviside step function) were presented.

      We agree with the reviewer and have added to the paper a discussion for our justification on the use of the Heaviside step function, and have included this in the methods section. We chose the Heaviside step function to represent the on/off situation that we observed in the data that suggested a threshold in the biology. We agree with the reviewer that Fig. 4G is informative and demonstrates that under 23% most of the soma fluorescence values are clustered at baseline. We agree that a different statistical model describing the data would be more convincing and confirmed the spatial threshold with the use of a confidence interval in the text and supported the use of percent domains active for this threshold over other properties such as spatial or temporal clustering using a general linear model. P18-19, L34-2:

      “Heaviside step function

      The Heaviside step function below in equation 4 is used to mathematically model the transition from one state to the next and has been used in simple integrate and fire models (Bueno-Orovio et al., 2008; Gerstner, 2000).

      The Heaviside step function 𝐻(𝑎) is zero everywhere before the threshold area (𝑎 ) and one everywhere afterwards. From the data shown in Figure 4E where each point (𝑆(𝑎)) is an individual astrocyte response with its percent area (𝑎) domains active and if the soma was active or not denoted by a 1 or 0 respectively. To determine 𝑎 in our data we iteratively subtracted 𝐻(𝑎) from  𝑆(𝑎) for all possible values of 𝑎 to create an error term over 𝑎. The area of the minimum of that error term was denoted the threshold area.”
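      The threshold search described in the quoted methods text can be sketched as follows. Equation 4 is not reproduced in this response, so the sketch assumes the standard step function (0 below the threshold area, 1 at or above it) and uses the summed absolute difference between S(a) and H(a) as the error term. The function names, the 0.1% candidate grid, and the toy data are hypothetical and are not taken from the manuscript.

```python
def heaviside(a, threshold):
    """Return 0 below the threshold area and 1 at or above it."""
    return 1 if a >= threshold else 0


def fit_spatial_threshold(percent_active, soma_active, candidates=None):
    """Find the threshold area that minimizes the mismatch between H(a) and S(a).

    percent_active: percent of domains active for each astrocyte response (a).
    soma_active:    1 if the soma was active in that response, else 0 (S(a)).
    candidates:     candidate thresholds; defaults to 0..100 in 0.1 steps
                    (an assumed grid for this sketch).
    """
    if candidates is None:
        candidates = [i / 10 for i in range(0, 1001)]
    best_threshold, best_error = None, float("inf")
    for t in candidates:
        # Error term: number of responses misclassified by a step at t.
        error = sum(
            abs(s - heaviside(a, t))
            for a, s in zip(percent_active, soma_active)
        )
        if error < best_error:  # ties keep the lowest candidate
            best_threshold, best_error = t, error
    return best_threshold, best_error


# Hypothetical toy responses, for illustration only.
toy_area = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 40.0, 60.0]
toy_soma = [0, 0, 0, 0, 1, 1, 1, 1]
toy_threshold, toy_error = fit_spatial_threshold(toy_area, toy_soma)
```

      With perfectly separable data the minimum is not unique: any candidate between the largest soma-inactive area and the smallest soma-active area gives the same error, and this sketch simply returns the lowest one.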

      The description of methods should have been considerably more thorough throughout. For instance which temperature the acute slice experiments were performed at, and whether slices were prepared in ice-cold solution, are crucial to know as these parameters heavily influence both astrocyte morphology and signaling. Moreover, no monitoring of physiological parameters (oxygen level, CO2, arterial blood gas analyses, temperature etc) of the in vivo anesthetized mice is mentioned. These aspects are critical to control for when working with acute in vivo two-photon microscopy of mice; the physiological parameters rapidly decay within a few hours with anesthesia and following surgery.

      We have made the methods section more thorough, in particular noting that body temperature and respiration were indeed monitored throughout anesthesia.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      (1) We think it would improve the paper if the authors provided a frame-by-frame example over (for example) 10-15 frames showing the spatiotemporal evolution of responses, where each frame represents 1s or 2s. This could be included with the temporal maps we proposed above.

      We agree that this is a useful example and have included it in our new figure (new Figure 5, specifically see Figure 5A) that uses temporal maps to analyze the spatiotemporal properties of calcium dynamics (Figure 5B).

      (2) Concerning the evidence in the present manuscript, we are not clear on what "populations" means. Can the authors clarify in methods? It is our understanding that 987 astrocytes from 30 populations from 3 mice were the source for the core data in the paper. What are the 30 populations, and how were the 987 astrocytes distributed across the populations? Are they roughly 10 FOVs per mouse? If so, please clarify roughly how far apart FOVs from the same mouse were, and how much delay between stim protocol application there was when a FOV was changed to a new FOV. Also, if for example, the 10th FOV from mouse 1 "saw" 9 rounds of stimulation before recording the response to the 10th stim round. To this point, was there any indication of response differences in populations that were recorded earlier vs later in the experimental sequence for each mouse?

      Descriptions of data will be included with the uploaded datasets following acceptance.

      (3) The description of the results on page 6 is a bit confusing for us. In lines 1-4, are the authors saying that 57.7% of astrocytes in a FOV exhibited responses within their soma and arborization, while 15.1% had responses only in arborization? If so, this is not clear to us from Figure 2C, where we count ~25 astrocytes in the FOV, maybe 8 or 9 astrocytes with activity in the arborization + soma (after stimulation), and 8 or 9 astrocytes with responses only in arborization. Is there something we do not understand, or is the second panel simply not representative of the group data?

      Figure 2D is representative of the group data and does indeed show that 57.7% of the population responds within both the soma and arborization, and that 15.1% of astrocytes respond only in their arborizations. It is not possible to tell from this image whether an entire arborization is active or whether activity increases in only one or a few domains, because there may not be enough activity to be detected when sampling over the entire arborization.

      (4) In the second part of page 6 - when the authors apply linear regression - are they saying that there is a linear relationship between the amount (area) of activity measured in the arborization versus the soma, where populations of astrocytes with 50% activation of the arborization also tend to have 50% activation in their somas? If so, then this is not apparent by the map provided in Figure 2C, where it looks like soma activation (within the subpopulation) is 100% irrespective of the apparent activity in the arborization. This needs to be clarified. If not, and what they mean is that the probability of finding an active soma is related to the amount of activation within the arborization, this needs to be stated more clearly.

      When testing the linear relationship between somas active vs arborizations active, we find a significant linear correlation (p < 0.001, R² = 0.90).

      (5) In the experiments where stimulation duration, frequency, and intensity were varied to determine the percentage of domains that were on, it would be helpful to better understand the protocol in terms of sequence. In the methods it seems that hindpaw stimulation intensity was first pseudo-randomly varied at 2Hz for 10s, followed by pseudorandomly varied stimulation frequency and then pseudo-randomly varied duration - both at 2mA for 10s. Is this correct?

      We have since updated the methods section to better describe the experimental protocol.

      (6) In Figure 3E the alignment of the "arbor" to the somatic response is a bit misleading. The signals being averaged for the "arbor" are composed of temporally heterogeneous sources (from distal and proximal domains) and when averaged will produce an artificially slow rise time. In contrast, the averaged somatic signals are composed of much more homogenous sources (arising from a more singular event) and therefore have a sharp rise time. It would make more sense to align their kinetics relative to the stimulus onset. It would also make more sense to compare the somatic response of astrocytes to the "arbor" of astrocytes which respond rapidly vs slowly to the foot-shock.

      Aligning the responses to the stimulus onset would exacerbate the artificially slow rise time for the soma and arborization as not all cells come online at the same time from stimulus onset.

      Reviewer #2 (Recommendations For The Authors):

      Data availability

      It seems that the data is not shared on a public repository, while it appears to be necessary according to eLife's general principles (see https://elife-rp.msubmit.net/html/eliferp_author_instructions.html#dataavailability).

      We will upload raw data to a repository upon acceptance of the manuscript.

      Data analysis

      - Why did the authors choose the heaviside step function to characterize conditions for somatic event initiation? It seems that this approach is averaging very heterogeneous data (some cells do not display somatic events even with ~50% domains active while some display somatic events with < 5 it seems).

      Please see the discussion of variability in our responses to the public reviews. We have since included more discussion on the use of the Heaviside step function in the Methods section.
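      As a rough illustration of the kind of step-function fit discussed here, the sketch below fits a Heaviside step to synthetic data by a grid search over candidate thresholds. The data, the function name `fit_heaviside_threshold`, the 5% label noise, and the least-squares criterion are all assumptions made for illustration; this is not the procedure described in the manuscript's Methods.

```python
import numpy as np

def fit_heaviside_threshold(x, y, candidates):
    """Return the threshold theta that minimizes the squared error of y ~ H(x - theta)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_theta, best_sse = None, np.inf
    for theta in candidates:
        pred = (x >= theta).astype(float)  # Heaviside step: 0 below theta, 1 at or above
        sse = np.sum((y - pred) ** 2)
        if sse < best_sse:
            best_theta, best_sse = theta, sse
    return best_theta, best_sse

# Synthetic data only (hypothetical): percent of active domains per cell and
# a binary "soma active" label generated from an assumed threshold of 25%.
rng = np.random.default_rng(0)
pct_domains = rng.uniform(0, 100, size=500)
true_theta = 25.0
soma_active = (pct_domains >= true_theta).astype(float)
flip = rng.random(500) < 0.05          # 5% of cells deviate from the step rule
soma_active[flip] = 1 - soma_active[flip]

theta_hat, sse = fit_heaviside_threshold(
    pct_domains, soma_active, np.arange(0, 100.5, 0.5)
)
print(f"estimated threshold: {theta_hat:.1f}% of domains active")
```

      Cells that deviate from the step rule (somatic events below threshold, or none above it) raise the residual error without necessarily moving the estimated threshold, which is one way to see how a single population-level threshold can coexist with heterogeneous single-cell responses.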

      - Averaging of the data. It seems that the approach chosen to quantify calcium activity overlooks the variability of the signals measured ("Astrocyte calcium quantifications were averaged over all astrocytes of a single video and these values were used in statistical testing.", l.22-23, page 15). What is the variability of the measured features between different astrocytes? Between different animals? To what extent does this averaging strategy overlook the variability of the signals/how much information do we expect to lose? The manuscript would probably benefit from a more advanced statistical approach to analyze the data.

      Is it possible to extract information from the data that would indicate mechanisms allowing somatic activity when the percentage of domain activation was lower than the threshold? How about the opposite (i.e when no global event was triggered even when the percentage of domain activation was high)?

      We are indeed combining responses from many diverse astrocytes, and we see this as a strength of the paper. Variation is a hallmark of biology, and we have added this to the discussion. In the rare cases where astrocyte somas do not come online when the active percentage of the arborization is over threshold, or the opposite when somas activate with little domain activation, we would say this is most likely due to imaging in 2D rather than the entire 3D cell. We have also added this to our discussion.

      - Here are a few suggestions for additional analysis that might be of interest to the community:

      - Measuring calcium activity in domains depending on their distance from the soma. This would allow us to better understand the spatial integration of the signals and notably answer the following question: Does the emergence of somatic events depend on the spatial distribution of active domains? (and does a smaller domain-soma distance facilitate the emergence of a calcium surge with a lower percentage of active domains?) These measurements could be visualized with plots of xy position of the domains (domain-soma distance) = f(time) with a colormap reflecting dF/F0, for example, at different times pre- and post-somatic events. Instead of DF/F0, these plots could also display the correlation between domain activities.

      We have performed this analysis, and it is now in the new figure (new Figure 5).

      - Adding temporality to the data analysis. It seems that calcium activity is "concatenated" during the whole duration prior to the somatic event (pre-soma) and after (post-soma). However, it is unclear how long the domains remained active and how many domains were still active at the onset of the somatic event. Adding a finer temporal analysis might help answer questions such as the potential need for some degree of synchronization of domain activity to trigger calcium surges.

      It could notably be interesting to measure the level of synchrony of events as a function of their distance from the soma and to analyze how it correlates with the properties of the somatic event.

      We have now included a temporal analysis of the astrocyte calcium surge in our new figure (new Figure 5). While we did see examples of spatially clustered domain activation in our data, those examples usually included other non-clustered domain activity, and when including all of the active domains within an astrocyte's arborization, we found no difference in the distance between activated domains before and after soma activation, even when comparing to subthreshold domain activity.

      Experiments

      - Would it be possible to apply different levels of stimulation to a given cell in order to discriminate whether the "no-soma" cells can display somatic events when neuronal activity is enhanced?

      Increased sensory stimulation does increase soma activity (please see Lines et al., Nature Communications, 2020). An example of increased stimulation leading to somatic activation that was not present with weaker stimuli can be seen in Figure 4A-C.

      - Why choose a stimulation of 2 mA, 2 Hz for 20 sec in the experiments on IP3R2-/- mice?

      Has the same set of various stimulation protocols featured in Figure 4 been applied to IP3R2-/- mice? If so, were more domains activated as stimulation intensity (amplitude; duration, or frequency) increased? Could it trigger somatic events? This information seems necessary to be able to assert that calcium surges rely on the IP3R2 pathway.

      These experiments were not performed.

      - Adding intermediary values of ATP pulse duration to Figure 6 (e.g. 50 ms and 75 ms) might strengthen the claim that the linear increase of SIC frequency with ATP application duration is only observed above the ~23% threshold.

      Agreed, however these experiments were not performed.

      Minor corrections to the text and figures.

      Methods

      The reader might benefit from a little more detail regarding the analysis of calcium signals. Notably, what was the duration of the calcium recordings? Was it constant across the different conditions tested in the study? Was it different in slice experiments versus in vivo experiments? What were the durations of the pre- and post- soma recordings and their variability? Was the calcium activity normalized for each astrocyte or animal? If not, why not consider normalizing the post-stimulation activity with pre-stimulation baseline activity?

      Similarly, some information on the stimulation protocol seems to be lacking: what was the frequency and intensity of the stimulus in the experiments where stimulus duration varied? Concurrently, what were the duration and intensity when frequency varied? What were the duration and frequency when the intensity varied?

      It might be beneficial to add further information on the algorithm of the Calsee software. What is it performing? How was it tested? Why is it referred to as "semi"-automatic, i.e. what might the user be needing to do manually? The segmentation seems to be omitting some branches connecting distal ROIs to the soma (see e.g. Fig S1.E). How would this influence the analysis and results?

      Results

      - Some assessments in the manuscript seem a bit too assertive/general compared to what can be deduced from the evidence presented in the figures. It could be beneficial to the reader to rephrase the latter. Some examples are listed below:

      - "These results indicate that astrocyte responses occurred initially in the arborizations, which is consistent with the idea that synapses are likely to be accessed at the astrocyte arborization ", l.11-12 page 7. The fact that the time to peak is lower in the arborization does not necessarily mean that signals initiate there. It could be because the kinetics/pathways in those compartments are different or there could be a dilution effect in the soma. Indeed, an influx of the same amount of calcium ions in the soma vs in a small domain will not correspond to the same DF/F0 in those compartments and might thus remain undetected in the soma.

      - "Using transgenic IP3R2-/- mice, we found that the activation of type-2 IP3 receptors is necessary for the generation of astrocyte calcium surge" (page 4, line 1-2), "present data further demonstrate that IP3R2 are necessary for the propagation of astrocyte calcium surge." (l. 18-19 page 13) -> As discussed above, the evidence does not seem to be strong enough to assert that IP3R2 is necessary to trigger somatic events. The results indicate that the IP3R2 pathway seems to facilitate the emergence of somatic events. As astrocytes differ strongly in terms of morphology and expression profiles depending on physiological conditions, the conclusions of this study might only apply to the specific experimental conditions used: region studied, age of the animal, type of sensory stimuli performed, and so on.

      - "These results indicate that spatial threshold of the astrocyte calcium surge has a functional impact on gliotransmission, which have important consequences on the spatial extension of the astrocyte-neuron communication and synaptic regulation", l.41-48 page 11. Figure 6 seems to indicate a correlation between the proportion of astrocyte domains activated and the frequency of SICs. The data seems insufficient to conclude that there is a causal relationship between calcium surge in the astrocyte and gliotransmission or SIC frequency.

      - "These results indicate that, on average, subcellular calcium events located in astrocyte arborizations are related to soma activation.", page 6 l 15-16. It may be more informative to specify the correlation measured: i.e., the larger the arborization activity, the larger the percentage of active somas.

      Figures

      Figure 2: Adding more details in the figure legend explaining how the different parameters are calculated might be useful to the reader. Notably, what does soma active (%) refer to?

      Figure 3: Could it be possible to add individual traces of calcium activity in the soma and arborization of individual cells to provide a glimpse of the variability of the signals measured?

      Fig4. B-C: Could it be possible to add in the legend information on the timeline between stimulation and calcium signal recording? (and the duration of the latter).

      Fig4 D-E: Why is the maximum number of active domains in panel D ~50-60% but goes up to ~100% in panel E? Could it be that plotting SEM rather than STD might misrepresent the variability in the percentage of active domains for each stimulus property?

      Fig4F: It seems that the threshold changes with the frequency of the stimulus: e.g. at 10 Hz, the threshold seems larger than 22.6%. What would that mean?

      Fig4G: - Why do some data points display a soma amplitude < 0 DF/F0 ?

      - Why choose a sigmoid fit? What are the statistics associated to the fit? Is it in accordance with the threshold of 23%? Would a linear fit provide a good fit?

      Fig5F: - It seems that a few IP3R2-/- astrocytes displayed somatic events? If so, it might be interesting to mention this in the discussion section and to speculate on why that might be. - It seems that panel 5F displays the average percentage of somas that got activated rather than the probability of somatic events.

      - Is it possible that the effect seen in domains vs arborization is due to statistical effects (as n=2450 vs 112)?

      Fig S1: Panel D legend: double labeling of the radius used for each plot might be useful, notably for colorblind readers as the colors might be hard to see.

      Discussion

      - The discussion section might benefit from a discussion on the similitude between the data presented here and previous reports that reported similar results, i.e that most calcium signals in astrocytes were located in the distal processes, forming microdomains that rarely propagated to the soma. These include Bindocci et al 2017 Science (DOI:10.1126/science.aai8185) and Georgiou et al, Science Advances, 2022 (DOI: 10.1126/sciadv.abe5371).

      Thank you for the suggestions. We have now changed portions of the Methods, Results, and Discussion sections.

      Reviewer #3 (Recommendations For The Authors):

      The text could potentially be improved somewhat.

      Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      There is a long-standing idea that choices influence evaluation: options we choose are re-evaluated to be better than they were before the choice. There has been some debate about this finding, and the authors developed several novel methods for detecting these re-evaluations in task designs where options are repeatedly presented against several alternatives. Using these novel methods the authors clearly demonstrate this re-evaluation phenomenon in several existing datasets.

      Strengths:

      The paper is well-written and the figures are clear. The authors provided evidence for the behaviour effect using several techniques and generated surrogate data (where the ground truth is known) to demonstrate the robustness of their methods.

      Weaknesses:

      The description of the results of the fMRI analysis in the text is not complete: weakening the claim that their re-evaluation algorithm better reveals neural valuation processes.

      We appreciate the reviewer’s comment regarding the incomplete account of the fMRI results. In response, we implemented Reviewer #2's suggestion to run additional GLM models for a clearer interpretation of our findings. We also took this opportunity to apply updated preprocessing to the fMRI data and revise the GLM models, making them both simpler and more comprehensive. The results section is thus substantially revised, now including a new main figure and several supplemental figures that more clearly present our fMRI findings. Additionally, we have uploaded the statistical maps to NeuroVault, allowing readers to explore the full maps interactively rather than relying solely on the static images in the paper. The new analyses strengthen our original conclusion: dynamic values (previously referred to as revalued values, following the reviewer’s suggestion) better explain BOLD activity in the ventromedial prefrontal cortex, a region consistently associated with valuation, than static values (values reported prior to the choice phase in the auction procedure).

      Reviewer #2 (Public Review):

      Summary:

      Zylberberg and colleagues show that food choice outcomes and BOLD signal in the vmPFC are better explained by algorithms that update subjective values during the sequence of choices compared to algorithms based on static values acquired before the decision phase. This study presents a valuable means of reducing the apparent stochasticity of choices in common laboratory experiment designs. The evidence supporting the claims of the authors is solid, although currently limited to choices between food items because no other goods were examined. The work will be of interest to researchers examining decision-making across various social and biological sciences.

      Strengths:

      The paper analyses multiple food choice datasets to check the robustness of its findings in that domain.

      The paper presents simulations and robustness checks to back up its core claims.

      Weaknesses:

      To avoid potential misunderstandings of their work, I think it would be useful for the authors to clarify their statements and implications regarding the utility of item ratings/bids (e-values) in explaining choice behavior. Currently, the paper emphasizes that e-values have limited power to predict choices without explicitly stating the likely reason for this limitation given its own results or pointing out that this limitation is not unique to e-values and would apply to choice outcomes or any other preference elicitation measure too. The core of the paper rests on the argument that the subjective values of the food items are not stored as a relatively constant value, but instead are constructed at the time of choice based on the individual's current state. That is, a food's subjective value is a dynamic creation, and any measure of subjective value will become less accurate with time or new inputs (see Figure 3 regarding choice outcomes, for example). The e-values will change with time, choice deliberation, or other experiences to reflect the change in subjective value. Indeed, most previous studies of choice-induced preference change, including those cited in this manuscript, use multiple elicitations of e-values to detect these changes. It is important to clearly state that this paper provides no data on whether e-values are more or less limited than any other measure of eliciting subjective value. Rather, the paper shows that a static estimate of a food's subjective value at a single point in time has limited power to predict future choices. Thus, a more accurate label for the e-values would be static values because stationarity is the key assumption rather than the means by which the values are elicited or inferred.

      Thank you for this helpful comment. We changed the terminology following the reviewer’s suggestion. The “explicit” values (e-values or ve) are now called “static” values (s-values or vs). Accordingly, we also changed the “Reval” values (r-values or vr) to “dynamic” values (d-values or vd).

      We also address the reviewer's more general point about the utility of item ratings/bids (s-values) and whether our results are likely to hold with other ways of eliciting subjective values. We added a new sub-section in Discussion addressing this and other limitations of our study. To address the reviewer’s point, we write:

      “One limitation of our study is that we only examined tasks in which static values were elicited from explicit reports of the value of food items. It remains to be determined if other ways of eliciting subjective values (e.g., Jensen and Miller, 2010) would lead to similar results. We think so, as the analysis of trials with identical item pairs (Fig. 3) and the difference between forward and backward Reval (Fig. 7) are inconsistent with the notion that values are static, regardless of their precise value. It also remains to be determined if our results will generalize to non-food items whose value is less sensitive to satiety and other dynamic bodily states. Perceptual decisions also exhibit sequential dependencies, and it remains to be explored whether these can be explained as a process of value construction, similar to what we propose here for the food-choice task (Gupta et al., 2024; Cho et al., 2002; Zylberberg et al., 2018; Abrahamyan et al., 2016).”

      There is a puzzling discrepancy between the fits of a DDM using e-values in Figure 1 versus Figure 5. In Figure 1, the DDM using e-values provides a rather good fit to the empirical data, while in Figure 5 its match to the same empirical data appears to be substantially worse. I suspect that this is because the value difference on the x-axis in Figure 1 is based on the e-values, while in Figure 5 it is based on the r-values from the Reval algorithm. However, the computation of the value difference measure on the two x-axes is not explicitly described in the figures or methods section and these details should be added to the manuscript. If my guess is correct, then I think it is misleading to plot the DDM fit to e-values against choice and RT curves derived from r-values. Comparing Figures 1 and 5, it seems that changing the axes creates an artificial impression that the DDM using e-values is much worse than the one fit using r-values.

      We agree with the reviewer that this way of presenting the DDM fits could be misleading. In the previous version of the manuscript, we included the two fits in the same figure panel to make it clear that the sensitivity (slope) of the choice function is greater when we fit the data using the r-values (now d-values) than when we fit them using the e-values (now s-values). In the revised version of Figure 5, we include the data points already shown in Figure 1, so that each DDM fit is shown with its corresponding data points. Thus we avoid giving the false impression that the DDM model fit using the s-values is much worse than the one fit using the d-values. That said, the fit is indeed worse, as we now show with the formal model comparison suggested by the reviewer (next comment).

      Relatedly, do model comparison metrics favor a DDM using r-values over one using e-values in any of the datasets tested? Such tests, which use the full distribution of response times without dividing the continuum of decision difficulty into arbitrary hard and easy bins, would be more convincing than the tests of RT differences between the categorical divisions of hard versus easy.

      We now include the model comparison suggested by the reviewer. The comparison shows that the DDM model using dynamic values explains the choice and response time data better than one using static values. One potential caveat of this comparison, which explains why we did not include it in the original version of the manuscript, is that the d-values are obtained from a fit to the choice data, which could bias the subsequent DDM comparison. We control for this in three ways: (1) by calculating the difference in Bayesian Information Criterion (BIC) between the models, penalizing the DDM model that uses the d-values for the additional parameter (δ); (2) by comparing the difference in BIC against simulations of a model in which the choice and RT data were obtained assuming static values; this analysis shows that if values were static, the DDM using static values would be favored in the comparison despite having one fewer parameter; (3) by ignoring the DDM fit to the choices in the model comparison, and just comparing how well the two models explain the RTs; this comparison is unbiased because the δ values are fit only to the choice data, not the RTs. These analyses are now included in Figure 5 and Figure 5–Figure supplement 2.
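      For readers less familiar with the penalty described in point (1), the following minimal sketch computes a BIC difference between two models, using the standard definition BIC = k ln(n) − 2 ln(L). All numbers (log-likelihoods, parameter counts, trial count) are hypothetical placeholders chosen for illustration, not values from the manuscript.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical values for one participant (illustration only).
n_trials = 210
ll_static, k_static = -520.0, 5      # DDM fit with static values
ll_dynamic, k_dynamic = -505.0, 6    # DDM fit with dynamic values; +1 parameter for delta

# Positive delta_bic favors the dynamic-value model even after the extra-parameter penalty.
delta_bic = bic(ll_static, k_static, n_trials) - bic(ll_dynamic, k_dynamic, n_trials)
print(f"BIC(static) - BIC(dynamic) = {delta_bic:.2f}")
```

      With these placeholder numbers the extra parameter costs ln(210) ≈ 5.35 BIC units, so the dynamic-value model is favored only if its log-likelihood gain more than offsets that penalty.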

      Revaluation and reduction in the imprecision of subjective value representations during (or after) a choice are not mutually exclusive. The fact that applying Reval in the forward trial order leads to lower deviance than applying it in the backwards order (Figure 7) suggests that revaluation does occur. It doesn't tell us if there is also a reduction in imprecision. A comparison of backwards Reval versus no Reval would indicate whether there is a reduction in imprecision in addition to revaluation. Model comparison metrics and plots of the deviance from the logistic regression fit using e-values against backward and forward Reval models would be useful to show the relative improvement for both forms of Reval.

      We agree with the reviewer that the occurrence of revaluation does not preclude other factors from affecting valuation. Following the reviewer’s suggestion we added a panel to Figure 6 (new panel B), in which we show the change in the deviance from the logistic regression fits between Reval (forward direction) and no-Reval. The figure clearly shows that the difference in deviance for the data is much larger than that obtained from simulations of choice data generated from the logistic fits to the static values (shown in red).

      Interestingly, we also observe that the deviance obtained after applying Reval in the backward direction is lower than that obtained using the s-values. We added a panel to Figure 7 showing this (Fig. 7B). This observation, however, does not imply that there are factors affecting valuation besides revaluation (e.g., “reduction in imprecision”). Indeed, as we now show in a new panel in Figure 11 (panel F), the same effect (lower deviance for backward Reval than no-Reval) is observed in simulations of the ceDDM.

      Besides the new figure panels (Fig. 6B, 7B, 11F), we mention in Discussion (new subsection, “Limitations...”, paragraph #2) the possibility that there are other non-dynamic contributions to the reduction in deviance for Backward Reval compared to no-Reval:

      “Another limitation of our study is that, in one of the datasets we analyzed (Sepulveda et al. 2020), applying Reval in the forward direction was no better than applying it in the backward direction (Fig. 10). We speculate that this failure is related to idiosyncrasies of the experimental design, in particular, the use of alternating blocks of trials with different instructions (select preferred vs. select non-preferred). More importantly, Reval applied in the backward direction led to a significant reduction in deviance relative to that obtained using the static values. This reduction was also observed in the ceDDM, suggesting that the effect may be explained by the changes in valuation during deliberation. However, we cannot discard a contribution from other, non-dynamic changes in valuation between the rating and choice phase including contextual effects (Lichtenstein and Slovic, 2006), stochastic variability in explicit value reporting (Polania et al., 2019), and the limited range of numerical scales used to report value.”

      Did the analyses of BOLD activity shown in Figure 9 orthogonalize between the various e-value- and r-value-based regressors? I assume they were not because the idea was to let the two types of regressors compete for variance, but orthogonalization is common in fMRI analyses so it would be good to clarify that this was not used in this case. Assuming no orthogonalization, the unique variance for the r-value of the chosen option in a model that also includes the e-value of the chosen option is the delta term that distinguishes the r and e-values. The delta term is a scaled count of how often the food item was chosen and rejected in previous trials. It would be useful to know if the vmPFC BOLD activity correlates directly with this count or the entire r-value (e-value + delta). That is easily tested using two additional models that include only the r-value or only the delta term for each trial.

      We did not orthogonalize the static value and dynamic value regressors. We have included this detail in the revised methods. We thank the reviewer for the suggestion to run additional models to improve our ability to interpret our findings. We have substantially revised all fMRI-related sections of the paper. We took this opportunity to apply standardized and reproducible preprocessing steps implemented in fmriprep, present whole-brain corrected maps on a reconstructed surface of a template brain, and include links to the full statistical maps so that the reader can navigate the full map rather than rely on the static image in the figures. We implemented four models in total: model 1 includes both static value (Vs) obtained during the auction procedure prior to the choice phase and dynamic value (Vd) output by the revaluation algorithm (similar to the model presented in the first submission); model 2 includes only delta = Vd - Vs; model 3 includes only Vs; model 4 includes only Vd. All models included the same confound and nuisance regressors. We found that Vd was positively related to BOLD in vmPFC when accounting for Vs, correcting for familywise error rate at the whole-brain level. Interestingly, the relationship between delta and vmPFC BOLD did not survive whole-brain correction. In addition, the effect size of the relationship between Vd and vmPFC BOLD in model 4 was larger than the effect size of the relationship between Vs and vmPFC BOLD in model 3, and it survived correction at the whole-brain level, encompassing more of the vmPFC. Together, these findings bolster our claim that Vd better accounts for BOLD variability in vmPFC, a brain region reliably linked to valuation.
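      The reviewer's point about unique variance can be checked numerically. The sketch below is a simplified ordinary least-squares example on synthetic data (no hemodynamic convolution, confounds, or nuisance regressors; all variable names and coefficients are assumptions). It shows that, without orthogonalization, the coefficient on Vd in a model containing both Vs and Vd is identical to the coefficient on delta = Vd - Vs in a model containing Vs and delta.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
vs = rng.normal(size=n)               # synthetic static values
delta = 0.5 * rng.normal(size=n)      # synthetic dynamic-minus-static term
vd = vs + delta                       # synthetic dynamic values
y = 0.3 * vs + 0.8 * delta + rng.normal(size=n)  # synthetic signal, not real BOLD

def ols(columns, y):
    """Least-squares coefficients for an intercept plus the given regressors."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

beta_both = ols([vs, vd], y)       # [intercept, Vs, Vd], regressors not orthogonalized
beta_delta = ols([vs, delta], y)   # [intercept, Vs, delta]

# a*Vs + b*Vd == (a + b)*Vs + b*delta, so the Vd and delta coefficients coincide.
print(beta_both[2], beta_delta[2])
```

      This equivalence holds only when Vs is also in the model; a model containing delta alone (as in model 2 above) or Vd alone (model 4) answers a different question, which is why the separate models are informative.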

      Please confirm that the correlation coefficients shown in Figure 11 B are autocorrelations in the MCMC chains at various lags. If this interpretation is incorrect, please give more detail on how these coefficients were computed and what they represent.

      We added a paragraph in Methods explaining how we compute the correlations in Figure 11B (last paragraph of the sub-section “Correlated-evidence DDM” in Methods):

      “The correlations in Fig. 11B were generated using the best-fitting parameters for each participant to simulate 100,000 Markov chains. We generate Markov chain samples independently for the left and right items over a 1-second period. To illustrate noise correlations, the simulations assume that the static value of both the left and right items is zero. We then calculate the difference in dynamic value (𝑥) between the left and right items at each time (𝑡) and for each of the Markov chains (𝑖). Pearson's correlation is computed between these differences at time zero, 𝑥𝑖(𝑡 = 0), and at time 𝑡 = τ, 𝑥𝑖(𝑡 = τ), for different time lags τ. Correlations were calculated independently for each participant. Each trace in Fig. 11B represents a different participant.”
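      The computation described in this paragraph can be sketched as follows. Because the ceDDM's sampling process is not specified in this response, the sketch substitutes a simple first-order autoregressive process for each item's value; that substitution, the function name `lagged_correlations`, the parameter `phi`, and the reduced number of chains are all assumptions for illustration.

```python
import numpy as np

def lagged_correlations(n_chains, n_steps, phi, lags, seed=0):
    """Pearson correlation, across chains, between x_i(t=0) and x_i(t=lag)."""
    rng = np.random.default_rng(seed)

    def simulate_item():
        # Stand-in AR(1) process with unit stationary variance (not the ceDDM itself).
        v = np.empty((n_chains, n_steps))
        v[:, 0] = rng.normal(size=n_chains)
        noise_sd = np.sqrt(1.0 - phi ** 2)
        for t in range(1, n_steps):
            v[:, t] = phi * v[:, t - 1] + noise_sd * rng.normal(size=n_chains)
        return v

    # Left and right items are simulated independently; x is their difference.
    x = simulate_item() - simulate_item()
    return np.array([np.corrcoef(x[:, 0], x[:, lag])[0, 1] for lag in lags])

lags = [1, 5, 20]
corr = lagged_correlations(n_chains=20000, n_steps=21, phi=0.9, lags=lags)
for lag, r in zip(lags, corr):
    print(f"lag {lag:2d}: r = {r:.3f}")
```

      For this stand-in process the correlation decays approximately as phi raised to the lag, so each trace obtained this way summarizes how quickly the simulated value difference decorrelates over time.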

      The paper presents the ceDDM as a proof-of-principle type model that can reproduce certain features of the empirical data. There are other plausible modifications to bounded evidence accumulation (BEA) models that may also reproduce these features as well or better than the ceDDM. For example, a DDM in which the starting point bias is a function of how often the two items were chosen or rejected in previous trials. My point is not that I think other BEA models would be better than the ceDDM, but rather that we don't know because the tests have not been run. Naturally, no paper can test all potential models and I am not suggesting that this paper should compare the ceDDM to other BEA processes. However, it should clearly state what we can and cannot conclude from the results it presents.

      Indeed, the ceDDM should be interpreted as a proof-of-principle model, which shows that drifting values can explain many of our results. It is definitely wrong in the details, and we are open to the possibility that a different way of introducing sequential dependencies between decisions may lead to a better match to the experimental data. We now mention this in a new subsection of Discussion, “Limitations...” paragraph #3:

      “Finally, we emphasize that the ceDDM should be interpreted as a proof-of-principle model used to illustrate how stochastic fluctuations in item desirability can explain many of our results. We chose to model value changes following an MCMC process. However, other stochastic processes or other ways of introducing sequential dependencies (e.g., variability in the starting point of evidence accumulation) may also explain the behavioral observations. Furthermore, there likely are other ways to induce changes in the value of items other than through past decisions. For example, attentional manipulations or other experiences (e.g., actual food consumption) may change one's preference for an item. The current version of the ceDDM does not allow for these influences on value, but we see no fundamental limitation to incorporating them in future instantiations of the model.”

      This work has important practical implications for many studies in the decision sciences that seek to understand how various factors influence choice outcomes. By better accounting for the context-specific nature of value construction, studies can gain more precise estimates of the effects of treatments of interest on decision processes.

      Thank you!

      That said, there are limitations to the generalizability of these findings that should be noted.

      These limitations stem from the fact that the paper only analyzes choices between food items and the outcomes of the choices are not realized until the end of the study (i.e., participants do not eat the chosen item before making the next choice). This creates at least two important limitations. First, preferences over food items may be particularly sensitive to mindsets/bodily states. We don't yet know how large the choice deltas may be for other types of goods whose value is less sensitive to satiety and other dynamic bodily states. Second, the somewhat artificial situation of making numerous choices between different pairs of items without receiving or consuming anything may eliminate potential decreases in the preference for the chosen item that would occur in the wild outside the lab setting. It seems quite probable that in many real-world decisions, the value of a chosen good is reduced in future choices because the individual does not need or want multiples of that item. Naturally, this depends on the durability of the good and the time between choices. A decrease in the value of chosen goods is still an example of dynamic value construction, but I don't see how such a decrease could be produced by the ceDDM.

      These are all great points. The question of how generalizable our results are to other domains is wide open. We do have preliminary evidence suggesting that in a perceptual decision-making task with two relevant dimensions (motion and color; Kang, Loffler et al. eLife 2021), the dimension that was most informative to resolve preference in the past is prioritized in future decisions. We believe that a similar process underlies the apparent change in value in value-based decisions. We decided not to include this experiment in the manuscript, as it would make the paper much longer and the experimental designs are very different. Exploring the question of generality is a matter for future studies.

      We also agree that food consumption is likely to change the value of the items. For example, after eating something salty we are likely to want something to drink. We mention in the revised manuscript that time, choice deliberation, attentional allocation and other experiences (including food consumption) are likely to change the value of the alternatives and thus affect future choices and valuations.

      The ceDDM captures only sequential dependencies that can be attributed to values that undergo diffusion-type changes during deliberation. While the ceDDM captures many of the experimental observations, the value of an item may change for reasons not captured by the ceDDM. For example, food consumption is likely to change the value of items (e.g., wanting something to drink after eating something salty). The reviewer is correct that the current version of ceDDM could not account for these changes in value. However, we see no fundamental limitation to extending the ceDDM to account for them.

      We discuss these issues in a new subsection in Discussion (“Limitations...” paragraph #3).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Summary

      The authors address assumptions of bounded accumulation of evidence for value-based decision-making. They provide convincing evidence that subjects drift in their subjective preferences across time and demonstrate valuable methods to detect these drifts in certain task designs.

      My specific comments are intended to assist the authors with making the paper as clear as possible. My only major concern is with the reporting of the fMRI results.

      Thank you, please see our responses above for a description of the changes we made to the fMRI analyses.

      Specific comments

      - In the intro, I would ask the authors to consider the idea that things like slow drift in vigilance/motivation or faster drifts in spatial attention could also generate serial dependencies in perceptual tasks. I think the argument that these effects are larger in value-based tasks is reasonable, but the authors go a bit too far (in my opinion) arguing that similar effects do not exist *at all* in perceptual decision-making.

      We added a sentence in the Discussion (new section on Limitations, paragraph #1) mentioning some of the literature on sequential dependencies in perceptual tasks and asking whether there might be a common explanation for such dependencies for perceptual and value-based decisions. We tried including this in the Introduction, but we thought it disrupted the flow too much.

      - Figure 1: would it not be more clear to swap the order of panels A and B? Since B comes first in the task?

      We agree, we swapped the order of panels A and B.

      - Figure 2: the label 'simulations' might be better as 'e-value simulations'

      Yes, we changed the label ‘simulations’ to ‘simulations with s-values’ (we changed the term explicit value to static value, following a suggestion by Reviewer #2).

      - For the results related to Figure 2, some citations related to gaps between "stated versus revealed preferences" seem appropriate.

      We added a few relevant citations where we explain the results related to Figure 2.

      - Figure 3: in addition to a decrease in match preferences over the session, it would be nice to look at other features of the task which might have varied over the session. e.g. were earlier trials more likely to be predicted by e-value?

      We do see a trend in this direction, but the effect is not significant. The following figure shows the consistency of the choices with the stated values, as a function of the |∆value|, for the first half (blue) and the second half (red) of the trials. The x-axis discretizes the absolute value of the difference in static value between the left and right items, binned in 17 bins of approximately equal number of trials.

      Author response image 1.

      The slope is shallower for the second half, but a logistic regression model revealed that the difference is not significant:

      ,

      where Ilate is an indicator variable that takes a value of 1 for the second half of the trials and zero otherwise.

      As expected from the figure β2 was negative (-0.15) but the effect was not significant (p-value =0.32, likelihood ratio test).

      We feel we do not have much to say about this result, which may be due to lack of statistical power, so we would rather not include this analysis in the revised manuscript.

      It is worth noting that if we repeat the analysis using the dynamic values obtained from Reval instead of the static values, the consistency is overall much greater and little difference is observed between the first and second halves of the experiment:

      Author response image 2.

      - The e-value DDM fit in Figure 1C/D goes through the points pretty well, but the e-value fits in 5A do not because of a mismatch with the axis. The x-axis needs to say whether the value difference is the e-value or the r-value. Also, it seems only fair to plot the DDM for the r-value on a plot with the x-axis being the e-value.

      Thank you for this comment, we have now changed Figure 5A, such that both sets of data points are shown (data grouped by both e-values and by r-values). We agree that the previous version made it seem as if the fits were worse for the DDM fit to the e-values. The fits are indeed worse, as revealed by a new DDM model comparison (Figure 5–Figure supplement 2), but the effect is more subtle than the previous version of the figure implied.

      - How is Figure 5B "model free" empirical support? The fact that the r-value model gives better separation of the RTs on easy and hard trials doesn't seem "model-free" and also it isn't clear how this directly relates to being a better model. It seems that just showing a box-plot of the R2 for the RT of the two models would be better?

      We agree that “model free” may not be the best expression, since the r-values (now d-values) are derived from a model (Reval). Our intention was to make clear that because Reval only depends on the choices, the relationship between RT and ∆vdynamic is a prediction. We no longer use the term, model free, in the caption. We tried to clarify the point in Results, where we explain this figure panel. We have also included a new model comparison (Figure 5–Figure supplement 2), showing that the DDM model fit to the d-values explains choice and RT better than one fit to the s-values.

      This said, we do consider the separation in RTs between easy and hard trials to be a valid metric to compare the accuracy of the static and dynamic values. The key assumption is that there is a monotonically decreasing relationship between value difference, ∆v, and response time. The monotonic relationship does not need to hold for individual trials (due to the noisiness of the RTs) but should hold if one were to average a large enough number of trials for each value of ∆v.

      Under this assumption, the more truthful a value representation is (i.e., the closer the value we infer is to the true subjective value of the item on a given trial, assuming one exists), the greater the difference in RTs between trials judged to be difficult and those considered easy. To illustrate this with an extreme case, if an experimenter’s valuation of the items is very inaccurate (e.g., done randomly), then on average there will be no difference between easy and difficult RTs as determined by this scoring.
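      The argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the paper: the link between |∆v| and mean RT, the noise levels, and the function name `rt_separation` are all hypothetical. Trials are split at the median of the inferred |∆v| into "hard" and "easy" halves, and the difference in mean RT between the halves is returned.

```python
import random


def rt_separation(value_noise_sd, n_trials=4000, seed=1):
    """Mean RT of 'hard' minus 'easy' trials under a median split on inferred |dv|."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        true_dv = rng.gauss(0.0, 1.0)
        # Assumed monotonic link: mean RT decreases as |dv| grows, plus trial noise.
        rt = 1.0 + 1.0 / (1.0 + abs(true_dv)) + rng.gauss(0.0, 0.3)
        # The experimenter's estimate of dv is the true value corrupted by noise.
        inferred_dv = true_dv + rng.gauss(0.0, value_noise_sd)
        trials.append((abs(inferred_dv), rt))
    trials.sort(key=lambda trial: trial[0])
    half = n_trials // 2
    hard = [rt for _, rt in trials[:half]]   # smallest inferred |dv|
    easy = [rt for _, rt in trials[half:]]   # largest inferred |dv|
    return sum(hard) / len(hard) - sum(easy) / len(easy)


accurate = rt_separation(value_noise_sd=0.0)     # inferred values equal the true ones
inaccurate = rt_separation(value_noise_sd=10.0)  # inferred values are close to random
```

      With accurate values the hard-minus-easy RT difference is clearly positive, whereas with nearly random values it shrinks toward zero, which is the extreme case described above.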

      - Line 189: Are the stats associated with Eq 7, was the model fit subject by subject? Combining subjects? A mixed-effects model? Why not show a scatter plot of the coefficients of Δvₑ and Δvᵣ (1 point/subject).

      The model was not fit separately for each subject. Instead, we concatenated trials from all subjects, allowing each subject to have a different bias term (β0,i ).

      We have now replaced it with the analysis suggested by the reviewer. We fit the logistic regression model independently for each participant. The scatter plot suggested by the reviewer is shown in Figure 5–Figure supplement 1. Error bars indicate the s.e. of the regression coefficients:

      It can be seen that the result is consistent with what we reported before: βd is significantly positive for all participants, while βs is not.

      - I think Figure S1 should be a main figure.

      Thank you for this suggestion, we have now included the former Figure S1 as an additional panel in Figure 5.

      - Fig 9 figure and text (line 259) don't exactly match. In the text it says that the BOLD correlated with vᵣ and not vₑ, but the caption says there were correlations with vᵣ after controlling for vₑ. Is there really nothing in the brain that correlated with vₑ? This seems hard to believe given how correlated the two estimates are. In the methods, 8 regressors are described. A more detailed description of the results is needed.

      Thank you for pointing out the inconsistency in our portrayal of the results in the main text and in the figure caption. We have substantially revised all fMRI methods, re-ran fMRI data preprocessing and implemented new, simpler, and more comprehensive GLM models following Reviewer #2's suggestion. Consequently, we have replaced Figure 9, added Figure 9 — Figure Supplement 1, and uploaded all maps to NeuroVault. These new models and maps allow for a clearer interpretation of our findings. More details about the fMRI analyses in the methods and results are included in the revision. We took care to use similar language in the main text and in the figure captions to convey the results and interpretation. The new analyses strengthen our original conclusion: dynamic values better explain BOLD activity in the ventromedial prefrontal cortex, a region consistently associated with valuation, than static values.

      - It's great that the authors reanalyzed existing datasets (fig 10). I think the ΔRT plots are the least clear way to show that _reval_ is better. Why not a figure like Figure 6a and Figure 7 for the existing datasets?

      We agree with the reviewer. We have replaced Fig. 10 with a more detailed version. For each dataset, we show the ΔRT plots, but we also show figures equivalent to Fig. 6a, Fig. 7a, and the new Fig. 6b (Deviance with and without Reval).

      Reviewer #2 (Recommendations For The Authors):

      I assume that the data and analysis code will be made publicly and openly available once the version of record is established.

      Yes, the data and analysis code is now available at: https://github.com/arielzylberberg/Reval_eLife_2024

      We added a Data Availability statement to the manuscript.

    1. Author response:

      Joint Public Review:

      In the microglia research community, it is accepted that microglia change their shape both gradually and acutely along a continuum that is influenced by external factors both in their microenvironments and in circulation. Ideally, a given morphological state reflects a functional state that provides insight into a microglia's role in physiological and pathological conditions. The current manuscript introduces MorphoCellSorter, an open-source tool designed for automated morphometric analysis of microglia. This method adds to the many programs and platforms available to assess the characteristics of microglial morphology; however, MorphoCellSorter is unique in that it uses Andrew's plotting to rank populations of cells together (in control and experimental groups) and presents "big picture" views of how entire populations of microglia alter under different conditions. Notably, MorphoCellSorter is versatile, as it can be used across a wide array of imaging techniques and equipment. For example, the authors use MorphoCellSorter on images of fixed and live tissues representing different biological contexts such as embryonic stages, Alzheimer's disease models, stroke, and primary cell cultures.

      This manuscript outlines a strategy for efficiently ranking microglia beyond the classical homeostatic vs. active morphological states. The outcome offers only a minor improvement over the already available strategies that have the same challenge: how to interpret the ranking functionally.

      We would like to thank the reviewers for their careful reading and constructive comments and questions. While MorphoCellSorter currently does not rank cells functionally based on their morphology, its broad range of application, ease of use and capacity to handle large datasets provide a solid foundation. Combined with advances in single-cell transcriptomics, MorphoCellSorter could potentially enable the future prediction of cell functions based on morphology.

      Strengths and Weaknesses:

      (1) The authors offer an alternative perspective on microglia morphology, exploring the option to rank microglia instead of categorizing them with means of clusterings like k-means, which should better reflect the concept of a microglia morphology continuum. They demonstrate that these ranked representations of morphology can be illustrated using histograms across the entire population, allowing the identification of potential shifts between experimental groups. Although the idea of using Andrews curves is innovative, the distance between ranked morphologies is challenging to measure, raising the question of whether the authors oversimplify the problem. 

      We have access to the distance between cells through the Andrew’s score of each cell. However, the challenge is that these distances are relative values and specific to each dataset. While we believe that these distances could provide valuable information, we have not yet determined the most effective way to represent and utilize this data in a meaningful manner.
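      For readers unfamiliar with Andrews plots, the sketch below evaluates the standard Andrews curve, f(t) = x₁/√2 + x₂ sin t + x₃ cos t + x₄ sin 2t + x₅ cos 2t + …, for a feature vector and ranks a few cells by the curve's value at one angle. How MorphoCellSorter derives its Andrew's score from such curves is not specified here, so the choice of a single angle `t`, the feature vectors, and the function names are assumptions made for illustration only.

```python
import math


def andrews_value(features, t):
    """Value of the standard Andrews curve for one feature vector at angle t."""
    total = features[0] / math.sqrt(2.0)
    for idx, x in enumerate(features[1:], start=1):
        harmonic = (idx + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        if idx % 2 == 1:
            total += x * math.sin(harmonic * t)
        else:
            total += x * math.cos(harmonic * t)
    return total


def rank_cells(cells, t):
    """Indices of cells sorted by increasing Andrews value at angle t (assumed scoring)."""
    scores = [andrews_value(features, t) for features in cells]
    order = sorted(range(len(cells)), key=lambda i: scores[i])
    return order, scores


# Hypothetical cells described by three principal-component coordinates each.
cells = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.4], [-0.8, 0.9, 0.1]]
order, scores = rank_cells(cells, t=1.0)
```

      The gaps between consecutive scores depend on the scaling of the input features, which is one way to see why such distances are relative and specific to each dataset.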

      Also, the discussion about the pipeline's uniqueness does not go into the details of alternative models. The introduction remains weak in outlining the limitations of current methods (L90). Acknowledging this limitation will be necessary.

      Thank you for these insightful comments. Alternative methods were already covered in the Discussion (L586-598), but in response to the reviewers' request we have revised the Introduction and Discussion sections to address the limitations of current methods more clearly and to discuss the uniqueness of the pipeline. Additionally, we have reorganized Figure 1 to more effectively highlight the main caveats associated with clustering, the primary method currently in use.

      (2) The manuscript suffers from several overstatements and simplifications, which need to be resolved. For example:

      a) L40: The authors talk about "accurately ranked cells". Based on their results, the term "accuracy" is still unclear in this context.

      Thank you for this comment. Our use of the term "accurately" was intended to convey that the ranking was correct based on comparison with human experts, though we agree that it may have been overstated. We have removed "accurately" and propose to replace it with "properly" to better reflect the intended meaning.

      b) L50: Microglial processes are not necessarily evenly distributed in the healthy brain. Depending on their embedded environment, they can have longer process extensions (e.g., frontal cortex versus cerebellum).

      Thank you for bringing this point to our attention. We removed “evenly” so that this introductory sentence is more inclusive of the various morphologies of microglial cells.

      c) L69: The term "metabolic challenge" is very broad, ranging from glycolysis/FAO switches to ATP-mediated morphological adaptations, and it needs further clarification about the author's intended meaning.

      Thank you for this comment. We clarified that we were referring to the metabolic challenge triggered by ischemia, and we added a reference.

      d) L75: Is morphology truly "easy" to obtain? 

      Yes, it is in comparison to other parameters such as transcripts or metabolism, but we understand the reviewer's point and have rephrased the sentence. As an alternative we propose: “morphology is an indicator accessible through…”

      e) L80: The sentence structure implies that clustering or artificial intelligence (AI) are parameters, which is incorrect. Furthermore, the authors should clarify the term "AI" in their intended context of morphological analysis.

      We apologize for this confusing writing, we reformulated the sentence as follows: “Artificial intelligence (AI) approaches such as machine learning have also been used to categorize morphologies (Leyh et al., 2021)”.

      f) L390f: An assumption is made that the contralateral hemisphere is a non-pathological condition. How confident are the authors about this statement? The brain is still exposed to a pathological condition, which does not stop at one brain hemisphere.

      We did not say that the contralateral hemisphere is non-pathological, but that its microglial cells have a non-pathological morphology, which is a slightly different claim. The contralateral side in ischemic experiments is classically used as a control (Rutkai et al 2022). Differences in transcript levels have been reported between sham-operated animals and the contralateral hemisphere in tMCAO mice (Filippenkov et al 2022; https://doi.org/10.3390/ijms23137308), showing that the contralateral side is indeed in a different state than sham controls, but no differences in terms of morphology have been reported.

      We have removed “non-pathological” to avoid misinterpretations.

      g) Methodological questions:

      a) L299: An inversion operation was applied to specific parameters. The description needs to clarify the necessity of this since the PCA does not require it.

      Indeed, we are sorry for this lack of explanation. Some morphological indexes rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, simplifying data interpretation. This clarification has been added to the revised manuscript as follows:

      “Lacunarity, roundness factor, convex hull radii ratio, processes cell areas ratio and skeleton processes ratio were subjected to an inversion operation in order to homogenize the parameters before conducting the PCA: indeed, some parameters rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, thus simplifying data interpretation.”
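      A minimal sketch of the idea follows, assuming that the inversion is a reciprocal (1/x) applied to strictly positive parameters; the exact operation used in MorphoCellSorter may differ, and a sign flip after standardization would serve the same purpose. The parameter name `ramification_index`, the numerical values, and the direction in which each parameter varies with ramification are hypothetical.

```python
def invert_parameters(cells, to_invert):
    """Return copies of the cell records with the listed parameters replaced by 1/x."""
    return [
        {name: (1.0 / value if name in to_invert else value)
         for name, value in cell.items()}
        for cell in cells
    ]


def rank_by(cells, name):
    """Indices of the cells sorted by increasing value of one parameter."""
    return sorted(range(len(cells)), key=lambda i: cells[i][name])


# Hypothetical values: ramification_index grows with ramification,
# roundness_factor shrinks with it, so the two rank the cells in opposite orders.
cells = [
    {"ramification_index": 1.2, "roundness_factor": 0.9},
    {"ramification_index": 3.5, "roundness_factor": 0.4},
    {"ramification_index": 2.1, "roundness_factor": 0.6},
]

aligned = invert_parameters(cells, to_invert={"roundness_factor"})
```

      After the inversion both parameters order the three cells identically, so their contributions to a subsequent PCA point in the same direction.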

      b) Different biological samples have been collected across different species (rat, mouse) and disease conditions (stroke, Alzheimer's disease). Sex is a relevant component in microglia morphology. At first glance, information on sex is missing for several of the samples. The authors should always refer to Table 1 in their manuscript to avoid this confusion. Furthermore, how many biological animals have been analyzed? It would be beneficial for the study to compare different sexes and see how accurate Andrew's ranking would be in ranking differences between males and females. If they have a rationale for choosing one sex, this should be explained.

      As reported in the literature, we acknowledge the presence of sex differences in microglial cell morphology. Due to ethical considerations and our commitment to reducing animal use, we did not conduct dedicated experiments specifically for developing MorphoCellSorter. Instead, we relied on existing brain sections provided by collaborators, which were already prepared and included tissue from only one sex—either female or male—except in the case of newborn pups, whose sex is not easily determined. Consequently, we were unable to evaluate whether MorphoCellSorter is sensitive enough to detect morphological differences in microglia attributable to sex. Although assessing this aspect is feasible, we are uncertain if it would yield additional insights relevant to MorphoCellSorter’s design and intended applications.

      To address this, we have included additional references in Table 1 of the revised manuscript and clearly indicated the sex of the animals from which each dataset was obtained.

      c) In the methodology, the slice thickness has been given in a range. Is there a particular reason for this variability? 

      We could not find any range in the text; we usually used 30 µm thick sections in order to capture entire or nearly entire microglial cells.

      Although the section thickness was identical for all sections of a given dataset, only the planes containing the cells of interest were selected during imaging for both ischemic stroke models. This explains why the range of planes acquired varies depending on how the cell is distributed in Z.

      Also, the slice thickness is inadequate to cover the entire microglia morphology. How do the authors include this limitation of their strategy? Did the authors define a cut-off for incomplete microglia? 

      We found that 30 µm sections provide an effective balance, capturing entire or nearly entire microglial cells (consistent with what we observe in vivo) while allowing sufficient antibody penetration to ensure strong signal quality, even at the section's center. In our segmentation process, we excluded microglia located near the section edges (i.e., cells with processes visible on the first or last plane of image acquisition, as well as those close to the field of view’s boundary). Although our analysis pipeline should also function with thicker sections (>30 µm), we confirmed that thinner sections (15 µm or less) are inadequate for detecting morphological differences, as tested initially on the AD model. Segmented, incomplete microglia lack the structural information needed to reflect morphological differences accurately, which impairs the detection of existing differences.

      c) The manuscript outlines that the authors have used different preprocessing pipelines, which is great for being transparent about this process. Yet, it would be relevant to provide a rationale for the different imaging processing and segmentation pipelines and platform usages (Supplementary Figure 7). For example, it is not clear why the Z maximum projection is performed at the end for the Alzheimer's Disease model, while it's done at the beginning of the others.

      The same holds true for cropping, filter values, etc. Would it be possible to analyze the images with the same pipelines and compare whether a specific pipeline should be preferable to others?

      The pre-processing steps depend on the quality of the images in each dataset. For example, in the AD dataset, images acquired with a wide-field microscope were considerably noisier compared to those obtained via confocal microscopy. In this case, reducing noise plane-by-plane was more effective than applying noise reduction on a Z-projection, as we would typically do for confocal images. Given that accurate segmentation is essential for reliable analysis in MorphoCellSorter, we chose to tailor the segmentation approach for each dataset individually. We recommend future users of MorphoCellSorter take a similar approach. This clarification has been added to the discussion.

      On a side note, Matlab is not open-access.

      This is correct. We are currently translating this Matlab script into Python; the Python version will be available soon on GitHub.

      https://github.com/Pascuallab/MorphCellSorter.

      This also includes combining the different animals to see which insights could be gained using the proposed pipelines.

      As explained above, a common segmentation process for very diverse types of acquisitions (magnification, resolution and type of images) is not optimal in terms of segmentation and accuracy of the analysis. Although we could feed MorphoCellSorter with all of these data from a single segmentation pipeline, the results might be very difficult to interpret.

      d) L227: Performing manual thresholding isn't ideal because it implies the preprocessing could be improved. Additionally, it is important to consider that morphology may vary depending on the thresholding parameters. Comparing different acquisitions that have been binarized using different criteria could introduce biases.

      As noted earlier, segmentation is not the main focus of this paper, and we leave it to users to select the segmentation method best suited to their datasets. Although we acknowledge that automated thresholding would in theory be ideal, we were confronted with image acquisitions that were not uniform, even within the same sample. For instance, in ischemic brain samples, lipofuscin from cell death introduces background noise that can artificially impact threshold levels. We tested global and local algorithms to binarize the cells automatically, but these approaches often resulted in imperfect segmentation that was not optimized for every cell. In our experience, manually adjusting the threshold provides a more accurate, reliable, and comparable selection of cellular elements, even though it introduces some subjectivity. To ensure consistency in segmentation, we recommend that the same person performs the analysis across all conditions. This clarification has been added to the discussion.

      e) Parameter choices: L375: When using k-means clustering, it is good practice to determine the number of clusters (k) using silhouette or elbow scores. Simply selecting a value of k based on its previous usage in the literature is not rigorous, as the optimal number of clusters depends on the specific data structure. If they are seeking a more objective clustering approach, they could also consider employing other unsupervised techniques, (e.g. HDBSCAN) (L403f).

      We agree with the referee’s comment, but the purpose of the k-means clustering we used was only to illustrate that the clusters generated are artificial and do not correspond to the reality of the continuum of microglia morphology. In the course of the study we used the elbow score to determine k, but this did not work well because no clear elbow was visible in some datasets (probably because of the continuum of microglia morphologies). In any case, the choice of k does not change the underlying problem: the clusters are artificial and their boundaries arbitrary, whether k is determined manually or mathematically.
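      The absence of a clear elbow on continuum-like data can be reproduced with a toy example. The sketch below is an illustration under stated assumptions, not the analysis run for the manuscript: it uses a simple one-dimensional Lloyd's algorithm with a deterministic quantile start, hypothetical data, and a hypothetical helper `elbow_ratio` that reports how much the within-cluster sum of squares drops when going from k − 1 to k clusters.

```python
import random


def kmeans_1d_inertia(values, k, n_iter=100):
    """Lloyd's algorithm in one dimension; returns the within-cluster sum of squares."""
    data = sorted(values)
    n = len(data)
    # Deterministic start: centers at evenly spaced quantiles of the data.
    centers = [data[int((2 * j + 1) * n / (2 * k))] for j in range(k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: (v - centers[j]) ** 2)
            groups[nearest].append(v)
        # Keep the previous center if a cluster ends up empty.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min((v - c) ** 2 for c in centers) for v in data)


def elbow_ratio(values, k):
    """Inertia with k - 1 clusters divided by inertia with k clusters."""
    return kmeans_1d_inertia(values, k - 1) / kmeans_1d_inertia(values, k)


# Continuum: 300 values spread evenly, with no gaps between "types".
continuum = [20.0 * i / 299 for i in range(300)]

# Three well-separated groups, for comparison.
rng = random.Random(0)
clustered = [rng.gauss(mu, 0.5) for mu in (0.0, 10.0, 20.0) for _ in range(100)]

for k in range(2, 6):
    print(k, round(elbow_ratio(continuum, k), 2), round(elbow_ratio(clustered, k), 2))
```

      For the separated groups the ratio spikes at k = 3, which is the elbow; for the continuum it declines smoothly with k, so no particular value of k stands out.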

      L373: A rationale for the choice of the 20 non-dimensional parameters as well as a detailed explanation of their computation such as the skeleton process ratio is missing. Also, how strongly correlated are those parameters, and how might this correlation bias the data outcomes?

      Thank you for raising this point. There is no specific rationale beyond our goal of being as exhaustive as possible, incorporating most of the parameters found in the literature, as well as some additional ones that we believed could provide a more thorough description of microglial morphology.

      Indeed, some of these parameters are correlated. Initially, we considered this might be problematic, but we quickly found that these correlations essentially act as factors that help assign more weight to certain parameters, reflecting their likely greater importance in a given dataset. Rather than being a limitation, the correlated parameters actually enhance the ranking. We tested removing some of these parameters in earlier versions of MorphoCellSorter, and found that doing so reduced the accuracy of the tool.

      Differences between circularity and roundness factors are not coming across and require further clarification. 

      These are two distinct ways of characterizing morphological complexity, and we borrowed these parameters and kept the name from the existing literature, not necessarily in the context of microglia. In our case, these parameters are used to describe the overall shape of the cell. The advantage of using different metrics to calculate similar parameters is that, depending on the dataset, one method may be better suited to capture specific morphological features of a given dataset. MorphoCellSorter selects the parameter that best explains the greatest dispersion in the data, allowing for a more accurate characterization of the morphology.

      One is applied to the soma and the other to the cell, but why is neither circularity nor roundness factor applied to both?

      None of the parameters concerns the cell body by itself; the cell body is always considered relative to one or more other metrics. Because these parameters and what they represent do not seem to be very clear, we will add a graphic representation of the types of measurements involved and the measures they provide in the revised version of the manuscript.

      f) PCA analysis:

      The authors spend a lot of text to describe the basic principles of PCA. PCA is mathematically well-described and does not require such depth in the description and would be sufficient with references.

      Thank you for this comment. Indeed, the description of PCA may be too exhaustive; we will simplify the text.

      Furthermore, there are the following points that require attention:

      L321: PC1 is the most important part of the data could be an incorrect statement because the highest dispersion could be noise, which would not be the most relevant part of the data. Therefore, the term "important" has to be clarified.

      We are not sure that, in the case of segmented images, noise would represent most of the data, since segmentation also removes most of the noise; but perhaps the reviewer is concerned about another type of noise? Nonetheless, we thank the reviewer for the comment and propose the following change, which should resolve this potential issue.

      “PC_1 is the direction in which data is most dispersed.”

      L323: As before, it's not given that the first two components hold all the information.

      Thank you for this comment. We modified this statement as follows: “The first two components represent most of the information (about 70%), hence we can consider the plane (PC_1, PC_2) as the principal plane, reducing the dataset to a two-dimensional space.”
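
      What "direction in which data is most dispersed" and "share of the information" mean can be shown with a minimal two-variable sketch. It is an illustration only: the function `pca_2d` and the toy points are assumptions, and the real analysis involves 20 parameters rather than two. The share of variance carried by a component is its eigenvalue divided by the sum of all eigenvalues of the covariance matrix.

```python
import math


def pca_2d(points):
    """Return (angle of PC_1 in radians, fraction of total variance along PC_1)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    root = math.sqrt(max(trace ** 2 - 4 * det, 0.0))
    lam1, lam2 = (trace + root) / 2, (trace - root) / 2
    # Orientation of the eigenvector that belongs to the larger eigenvalue.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return angle, lam1 / (lam1 + lam2)


# Toy data: points close to the line y = x with a small alternating offset.
points = [(float(t), t + 0.1 * (-1) ** i) for i, t in enumerate(range(10))]
pc1_angle, pc1_share = pca_2d(points)
print(math.degrees(pc1_angle), pc1_share)
```

      Here PC_1 lies close to the diagonal and carries nearly all of the variance. With noisier or higher-dimensional data the share is smaller, which is why a figure such as "about 70%" needs to be stated explicitly rather than assumed.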

      L327 and L331 contain mistakes in the nomenclature: Mix up of "wi" should be "wn" because "i" does not refer to anything. The same for "phi i = arctan(yn/wn)" should be "phi n".

      Thanks a lot for these comments. We have made the changes in the text as proposed by the reviewer.

      L348: Spearman's correlation measures monotonic correlation, not linear correlation. Either the authors used Pearson Correlation for linearity or Spearman correlation for monotonic. This needs to be clarified to avoid misunderstandings.

      Sorry for the misunderstanding. We did use the Spearman correlation, which measures monotonic association, and have therefore replaced "linear" with "monotonic" in the text. Thanks a lot for the careful reading.
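
      The distinction between the two coefficients can be made concrete with a short sketch. The data are hypothetical and the helper names are illustrative; the Spearman coefficient is computed here as the Pearson coefficient of the ranks, which is its standard definition.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: measures linear association."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks (monotonic association)."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: y increases strictly with x, but far from linearly.
xs = list(range(1, 11))
ys = [math.exp(x) for x in xs]
r_pearson = pearson(xs, ys)
r_spearman = spearman(xs, ys)
print(r_pearson, r_spearman)
```

      For a strictly increasing but non-linear relationship the Spearman coefficient reaches 1 while the Pearson coefficient stays well below it, which is why describing a Spearman result as "linear" would be misleading.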

      g) If the authors find no morphological alteration, how can they ensure that the algorithm is sensitive enough to detect them? When morphologies are similar, it's harder to spot differences. In cases where morphological differences are more apparent, like stroke, classification is more straightforward.

      We are not entirely sure we fully understand the reviewer's comment. When data are similar or nearly identical, MorphoCellSorter performs comparably to human experts (see Table 1). However, the advantage of MorphoCellSorter is that it ranks cells much faster while achieving accuracy similar to that of human experts, and it assigns each cell a value on an axis (the Andrews score), which a human expert cannot do. For example, in the case of mouse embryos, MorphoCellSorter’s ranking was as accurate as that made by human experts. Based on this ranking, the distributions were similar, suggesting that the morphologies are generally consistent across samples.

      The algorithm itself does not detect anything—it simply ranks cells according to the provided parameters. Therefore, it is unlikely that sensitivity is an issue; the algorithm ranks the cells based on existing data. The most critical factor in the analysis is the segmentation step, which is not the focus of our paper. However, the more accurate the segmentation, the more distinct the parameters will be if actual differences exist. Thus, sensitivity concerns are more related to the quality of image acquisition or the segmentation process rather than the ranking itself. Once MorphoCellSorter receives the parameters, it ranks the cells accordingly. When cells are very similar, the ranking process becomes more complex, as reflected in the correlation values comparing expert rankings to those from MorphoCellSorter (Table 1). 

      Moreover, MorphoCellSorter does not only provide a ranking: the morphological indexes automatically computed offer useful information to compare the cells’ morphology between groups.

      h) Minor aspects:

      % notation requires to include (weight/volume) annotation.

      This has been done in the revised version of the manuscript.

      Citation/source of the different mouse lines should be included in the method sections (e.g. L117).

      The reference of the mouse line has been added (RRID:IMSR_JAX:005582) to the revised version of the manuscript.

      L125: The length of the single housing should be specified to ensure no variability in this context.

      The mice were housed individually for 24 h; this is now stated in the text.

      L673: Typo to the reference to the figure.

      This has been corrected, thank you for your thoughtful reading.

    1. Author response:

      We thank the editor and the reviewers for the positive evaluation of our manuscript and the thoughtful comments. Below we provide a provisional reply to the reviewers’ comments, which we will address in more detail in the revised manuscript.

      Reviewer 1 highlights three important alternative interpretations of our results: (1) sustained suppression, (2) enhancement followed by suppression, and (3) priming. We believe that these alternatives need to be addressed to improve the conclusions we can draw from the available data.

      (1) Sustained suppression: As outlined by R1, it is possible that participants suppressed the HPDL throughout the entire experiment, instead of proactively instantiating suppression on each trial. While possible, we believe that this account is unlikely to explain the present results, given the utilized analysis approach, a voxel-wise GLM fit to the BOLD data per run (see Materials and Methods for details). Specifically, we derived parameter estimates from this GLM per location to estimate the relative suppression. Sustained suppression would modulate BOLD responses throughout the run, i.e. also during the implicit baseline period used to estimate the contrast parameter estimates. Hence, a sustained suppression should not result in a differential modulation between locations, as the BOLD response at the HPDL during the baseline period would be equally suppressed as during the trial. We will discuss this important aspect in the revised manuscript.
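
      The logic of this argument can be sketched with toy numbers. This is an illustration under strong simplifying assumptions (purely additive effects, arbitrary units, hypothetical values), not a model of the actual GLM: a suppression that is present during both trial and baseline cancels in a trial-versus-baseline contrast, whereas a suppression present only during trials does not.

```python
def contrast_vs_baseline(trial_signal, baseline_signal):
    """Trial response expressed relative to the implicit baseline."""
    return trial_signal - baseline_signal


# Hypothetical signal levels (arbitrary units) at a neutral location.
neutral_trial, neutral_baseline = 3.0, 1.0
neutral = contrast_vs_baseline(neutral_trial, neutral_baseline)

# Sustained suppression: the HPDL is lowered by the same amount at all times.
sustained = 0.5
hpdl_sustained = contrast_vs_baseline(neutral_trial - sustained, neutral_baseline - sustained)

# Trial-locked suppression: the HPDL is lowered only during trials.
trial_locked = 0.5
hpdl_trial_locked = contrast_vs_baseline(neutral_trial - trial_locked, neutral_baseline)

print(neutral, hpdl_sustained, hpdl_trial_locked)
```

      Under these assumptions only the trial-locked case yields a contrast estimate at the HPDL that differs from the neutral location.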

      (2) Enhancement followed by suppression: R1 correctly points out that BOLD data, given the poor temporal resolution, do not allow for the detection of potential transient enhancements at the HPDL followed by a later and more pronounced suppression (akin to “search and destroy”). We agree with this assessment. However, we would also argue that a transient enhancement followed by sustained suppression before search onset constitutes proactive suppression in line with our interpretation, because suppression would still arise proactively (i.e., before search and hence distractor onset). Whether brief enhancement precedes suppression cannot be elucidated by our data, but we believe that it constitutes an interesting avenue for future studies using time-resolved and spatially specific recording methods. We will address this important addition in the updated manuscript.

      (3) Priming: It is possible that participants particularly suppress locations which on previous trials contained a distractor. This account constitutes a different perspective than statistical learning integrating across many trials. We believe that it is likely that both accounts contribute to the observed effect to some degree, as both the distant (but often repeated) and the most recent past should inform our priors. Indeed, arguably recent trials should be particularly informative for our predictions as natural environments vary across time, and hence the statistical learning system should remain sensitive to potential changes in the environment. In short, we agree with R1 that the n-1 trial may impact suppression, and therefore charting the potential contributions of this type of priming compared to statistical learning is a relevant addition to the manuscript. We will perform the suggested analysis; however, we also note that dividing trials based on the n-1 trial will significantly reduce the reliability of the parameter estimates (e.g. only ~1/3 of trials follow omissions).

      Reviewer 2 had two valuable suggestions to advance the inferences we can draw from the available data. In particular, R2 proposed two additional analyses, which we will consider during revision.

      First, R2 suggests separating the utilized early visual cortex (EVC) ROI mask into the three retinotopic areas comprising EVC (V1, V2, V3) and performing the key analyses in surface space for each ROI separately. We agree that exploring distractor suppression across V1, V2 and V3 separately is an interesting extension to our results. Our reasoning for combining early visual areas into one mask was as follows: we did not have an a priori reason to expect distinct neural suppression between these early ROIs. Therefore, we did not acquire retinotopy data to reliably separate V1, V2 and V3, instead opting to increase the number of search task trials. The lack of retinotopy data naturally limits the reliability of the resulting cortical segmentation. However, we believe that separating EVC into its constituent areas using anatomical data is nonetheless a promising addition to our primary analyses. Therefore, during revision we will explore the main suppression analyses split into V1, V2, and V3.

      Second, R2 highlights that behavioral facilitation and neural suppression could be correlated across participants. The rationale is that should neural suppression in EVC relate to the facilitation of behavioral responses, we may expect a positive relationship between neural suppression at the HPDL and RTs across participants. We agree with R2’s suggestion and will perform the analysis accordingly. However, we note that any results should be interpreted with caution, as the present sample size of n=28 is small for an across participant correlation analysis involving neural and behavioral difference scores.
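
      The caution about sample size can be quantified with a minimal sketch based on the Fisher z-transformation. The observed correlation of 0.3 is a hypothetical value chosen for illustration, and the interval assumes a Pearson correlation with approximately bivariate normal data; only n = 28 comes from the text.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)  # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical observed correlation with the sample size reported in the text.
low, high = fisher_ci(0.3, 28)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

      With 28 participants, a moderate hypothetical correlation is compatible both with no relationship and with a strong one, which is the reason any across-participant result should be interpreted cautiously.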

      In summary, we believe that addressing the reviewers' suggestions will substantially improve our manuscript, particularly regarding the interpretation and scope of our findings.

    1. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      The manuscript presents a significant and rigorous investigation into the role of CHMP5 in regulating bone formation and cellular senescence. The study provides compelling evidence that CHMP5 is essential for maintaining endolysosomal function and controlling mitochondrial ROS levels, thereby preventing the senescence of skeletal progenitor cells. 

      Strengths: 

      The authors demonstrate that the deletion of Chmp5 results in endolysosomal dysfunction, elevated mitochondrial ROS, and ultimately enhanced bone formation through both autonomous and paracrine mechanisms. The innovative use of senolytic drugs to ameliorate musculoskeletal abnormalities in Chmp5-deficient mice is a novel and critical finding, suggesting potential therapeutic strategies for musculoskeletal disorders linked to endolysosomal dysfunction. 

      Weaknesses: 

      The manuscript requires a deeper discussion or exploration of CHMP5's roles and a more refined analysis of senolytic drug specificity and effects. This would greatly enhance the comprehensiveness and clarity of the manuscript. 

      We thank the reviewer for these insightful comments. The tissue-specific roles of CHMP5 and the specificity of quercetin and dasatinib treatments in Chmp5-deficient mice will be further discussed and clarified in the revised manuscript. 

      Reviewer #2 (Public review): 

      Summary: 

      The authors try to show the importance of CHMP5 for skeletal development. 

      Strengths: 

      The findings of this manuscript are interesting. The mouse phenotypes are well done and are of interest to a broader (bone) field. 

      Weaknesses: 

      The mechanistic insights are mediocre, and the cellular senescence aspect poor. 

      In total, it has not been shown that there are actual senescent cells that are reduced after D+Q-treatment. These statements need to be scaled back substantially. 

      We thank the reviewer for these suggestive comments. Although multiple hallmarks of cell senescence were shown in CHMP5-deficient skeletal progenitors, we will detect and add additional markers of cell senescence in the revised manuscript. 

      In addition, the effects and specificity of the Q+D treatment will be further discussed and clarified with the revision.

      Reviewer #3 (Public review): 

      Summary: 

      In this study, Zhang et al. reported that CHMP5 restricts bone formation by controlling endolysosome-mitochondrion-mediated cell senescence. The effects of CHMP5 on osteoclastic bone resorption and bone turnover have been reported previously (PMID: 26195726); in that study, an aberrant bone phenotype was observed in the CHMP5ctsk-CKO mouse model. Using the same mouse model, Zhang et al. report a novel role of CHMP5 in osteogenesis through its effect on cell senescence. Overall, it is an interesting study and provides new insights in the field of cell senescence and bone. 

      Strengths: 

      The authors analyzed the bone phenotype of the CHMP5-periskeletal progenitor-CKO mouse model and found a novel role of senescent cells in osteogenesis and migration. 

      Weaknesses: 

      (1) Many papers have reported that senescence impairs osteogenesis of skeletal stem cells. In this study, the authors claim that Chmp5 deficiency induces skeletal progenitor cell senescence and enhances osteogenesis. Can the authors explain these conflicting results? 

      Different skeletal stem cell populations in time and space have been identified and reported. This study shows that Chmp5 deficiency in periskeletal and endosteal skeletal progenitors causes cell senescence and aberrant bone formation. Although cell senescence during aging can impair osteogenesis of certain skeletal stem cells, which contributes to diseases with low bone mass such as osteoporosis, aging can also increase heterotopic mineralization/calcification in musculoskeletal soft tissues such as ligaments and tendons, which is consistent with our results in this study. These reflect out-of-order musculoskeletal mineralization during aging. We will expand the discussion and clarify the results of CHMP5-regulated cell senescence in osteogenesis in the revised manuscript.

      (2) Co-culture of Chmp5-KO periskeletal progenitors with WT ones should be conducted to detect the migration and osteogenesis of WT cells in response to Chmp5-KO-induced senescent cells. In addition, the co-culture of WT periskeletal progenitors with senescent cells induced by H2O2, radiation, or from aged mice would provide more information.

      Increased osteogenesis of WT skeletal progenitors in the periskeletal lesion was shown to be a paracrine mechanism of abnormal bone formation in Chmp5Ctsk mice. The coculture experiment will help confirm the effect of Chmp5-deficient skeletal progenitors on the osteogenesis of neighboring WT skeletal progenitors.

      Notably, the cause and outcome of cell senescence are highly heterogeneous, and different causes of cell senescence can cause significantly different outcomes. Although the coculture of WT periskeletal progenitors with senescent cells induced by H2O2, radiation, or from aged mice would be very interesting, these are beyond the scope of the current study.

      (3) Many EVs were secreted from Chmp5-deleted periskeletal progenitors, compared to the rarely detected EVs around WT cells. Since EVs of BMSCs or osteoprogenitors show strong effects of promoting osteogenesis, did the EVs contribute to the enhanced osteogenesis induced by Chmp5 deficiency? 

      The WT skeletal progenitor cells from Chmp5Ctsk mice have an increased capacity of osteogenesis compared to the corresponding cells from control animals, suggesting that the EVs of the Chmp5-deleted periskeletal progenitors could promote osteogenesis of the WT skeletal progenitors, which represents a paracrine mechanism of abnormal bone formation in Chmp5 deficient animals. We will discuss and clarify these results in the revised manuscript.

      (4) EVs secreted from senescent cells propagate senescence and impair osteogenesis. Why do EVs secreted from senescent cells induced by Chmp5 deficiency have opposite effects on osteogenesis? 

      The question is similar to comment #1. The functional heterogeneity of cellular senescence will be discussed in further detail and clarified in the revised manuscript.

      (5) The Chmp5-ctsk mice show accelerated aging-related phenotypes, such as hair loss and joint stiffness. Did Ctsk also label cells in hair follicles or joint tissue? 

      This is an interesting question. Although we did not check the expression of CHMP5 in hair follicles, which is outside the scope of the present study, the result in Fig. 1E showed the expression of CHMP5 in joint ligaments. Notably, abnormal periskeletal bone formation occurs predominantly at the joint ligament insertion site in Chmp5Ctsk mice, which will be elucidated and discussed in the revised manuscript.

      (6) Fifteen proteins were found to increase and five proteins to decrease in the cell supernatant of Chmp5Ctsk periskeletal progenitors. How about SASP factors in the secretory profile? 

      As mentioned above, the SASP phenotype and related factors of senescent cells could be highly heterogeneous depending on inducers, cell types, and timing of senescence. Most of the proteins we identified in the secretome analysis have previously been reported in the secretory profile of osteoblasts. Although we were also interested in the change of some common SASP factors, such as inflammatory cytokines, the experiment did not detect these factors because of their small molecular weights and the technical limitations of mass spec analysis. 

      (7) D+Q treatment mitigates musculoskeletal pathologies in Chmp5 conditional knockout mice. In the previously published paper (CHMP5 controls bone turnover rates by dampening NF-κB activity in osteoclasts), inhibition of osteoclastic bone resorption rescues the aberrant bone phenotype of the Chmp5 conditional knockout mice. Are the effects of D+Q on bone overgrowth due to the inhibition of bone resorption? 

      Although in Chmp5Ctsk mice we cannot exclude the effect of D+Q on osteoclasts, the effect of D+Q on osteoblast lineage cells, which is the focus of the current study, was verified in Chmp5Dmp1 mice. We will expand the discussion and make these results clearer with the revision.

      (8) The role of VPS4A in cell senescence should be measured to support the conclusion that CHMP5 regulates osteogenesis by affecting cell senescence. 

      We agree that additional experiments examining the role of VPS4A in cell senescence will provide more mechanistic insights. The focus of the current study is to report that CHMP5 restricts abnormal bone formation by preventing endolysosome-mitochondrion-mediated cell senescence. The roles of VPS4A in cell senescence and skeletal biology will be explored in separate studies.

      (9) Cell senescence with markers, such as p21 and H2AX, co-stained with GFP should be performed in the mouse models to indicate the effects of Chmp5 on cell senescence in vivo. 

      We will examine additional markers of cell senescence, as the reviewers suggest, in the revised manuscript.

      (10) ATDC5 cells, as an osteochondroma cell line, are not a good cell model of periskeletal progenitors. Primary periskeletal progenitor cells may be a better choice. 

      We were aware that ATDC5 cells are typically used as a chondrocyte progenitor cell line. However, our previous study showed that ATDC5 cells could also be used as a reasonable cell model for periskeletal progenitors. Furthermore, the corresponding results from primary periskeletal progenitors were shown. We will further clarify this in the revision.

      In general, the comments of these reviewers will help clarify our results and further strengthen our conclusion. We will address these comments and questions point to point in more detail in the revised manuscript.

    1. Author response:

      We sincerely thank the reviewers for their constructive feedback and the editor for facilitating this thorough review. We found the suggestions insightful and valuable for refining our manuscript. We would like to clarify a few points in an initial response before presenting the fully updated manuscript. First of all, we would like to emphasize the multi-scale nature of our approach, in which we derived insights from both atomistic and coarse-grained simulations. The reviewers focused mostly on the coarse-grained simulations, whose drawbacks we are aware of and which were a strong motivation for starting with the atomistic approach. Reviewer 1 mentioned the lack of a proposed mechanism for the increased condensate-forming propensity at 300 K vs. 290 K. We feel we had clearly pointed to aromatic contacts as the mechanism, but we will make sure to clarify this further in the revision. Furthermore, Reviewer 1 was critical of our use of the 10% adjustment to Martini protein-water interactions, which has previously been thoroughly presented and assessed in the literature (see for example Tesei et al., JCTC 2022). For our specific system we were also encouraged by the favorable comparison of our Martini simulations to the atomistic simulations, e.g. for radius of gyration, contact propensity, and solvent accessibility. We will make sure to emphasize this more clearly in the revision. Finally, we are grateful for the feedback from both reviewers and will use their comments as a guide to incorporate additional analyses and extended simulations to strengthen our conclusions in an upcoming revision.

    1. Author response:

      We thank the reviewers for their thoughtful comments. 

      Based on their suggestions we will: 

      (1) Use more accurate language to describe the hypothalamus regions under investigation in this study. While we aimed primarily to investigate the medial preoptic area (MPOA), our dissections and sequencing data capture several regions of the anterior hypothalamus, including the anteroventral periventricular (AVPV), paraventricular (PVN), supraoptic (SON), and suprachiasmatic (SCN) nuclei, and more. We will revise the language in our manuscript to reflect that our study investigates the cellular evolution of the anterior hypothalamus across behaviorally divergent deer mice.

      (2) Revise our language to clarify that while our study provides a rich dataset for generating hypotheses about which cell types may contribute to behavioral differences, it does not provide any evidence of causal relationships. We hope to investigate this further in future work.

      (3) Clarify specific methodological choices for which reviewers had questions, especially about the hypothalamic regions for which we did histology to validate cell abundance differences and methodological choices related to mapping our cell clusters to Mus cell types.

      Our responses to each reviewer’s specific comments are below.

      Reviewer #1:

      The major limitation of the study is the absence of causal experiments linking the observed changes in MPOA cell types to species-specific social behaviors. While the study provides valuable correlational data, it lacks functional experiments that would demonstrate a direct relationship between the neuronal differences and behavior. For instance, manipulating these cell types or gene expressions in vivo and observing their effects on behavior would have strengthened the conclusions, although I certainly appreciate the difficulty in this, especially in non-musculus mice. Without such experiments, the study remains speculative about how these neuronal differences contribute to the evolution of social behaviors.

      Yes, we agree the study lacks functional experiments. We hope that the dataset is of value for generating hypotheses about how hypothalamic neuronal cell types may govern species-specific social behaviors, and for these hypotheses to be functionally tested by us and others in future work.

      Reviewer #2:

      Some methodology could be further explained, like the decision of a 15% cutoff value for cell type assignment per cluster, or the necessity of a multi-step analysis pipeline for gene enrichment studies.

      A 15% cutoff value for cell type assignment was chosen to include all known homology correspondences between our dataset and the Mus atlas. For example, i14:Avp/Cck cells from the Mus atlas represent Avp cells from the suprachiasmatic nuclei (SCN). Though only 17.3% of cluster 15 maps to i14:Avp/Cck, we know these two clusters correspond based on the expression of Avp and additional SCN marker genes in cluster 15 (Supp Fig 6). We will further explain this cutoff in the revised manuscript.
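
      The cutoff rule described above can be sketched as follows. This is an illustration, not the pipeline used in the study: the function name, the choice to report only the best-matching reference type, and every label and number other than the 17.3% mapping of cluster 15 to i14:Avp/Cck are assumptions.

```python
def assign_reference_type(mapping_fractions, cutoff=0.15):
    """Return the best-matching reference type if it reaches the cutoff, else None."""
    if not mapping_fractions:
        return None
    best_type = max(mapping_fractions, key=mapping_fractions.get)
    return best_type if mapping_fractions[best_type] >= cutoff else None


# Only the 17.3% value for cluster 15 comes from the text; all other labels and
# numbers are hypothetical placeholders.
cluster_15 = {"i14:Avp/Cck": 0.173, "hypothetical_type_A": 0.09, "hypothetical_type_B": 0.05}
diffuse_cluster = {"hypothetical_type_A": 0.08, "hypothetical_type_B": 0.07}

assigned_15 = assign_reference_type(cluster_15)
assigned_diffuse = assign_reference_type(diffuse_cluster)
assigned_strict = assign_reference_type(cluster_15, cutoff=0.20)
print(assigned_15, assigned_diffuse, assigned_strict)
```

      In this sketch a stricter cutoff of 20% would leave cluster 15 unassigned even though its marker expression supports the correspondence, which is the kind of known homology the 15% value was chosen to retain.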

      Our gene enrichment study includes a multi-step analysis pipeline because we wanted to control for confounders that may be introduced because of gene expression level. Genes that are more highly expressed are more accurately quantified and thus more likely to be identified as differentially expressed. Therefore, we wanted to test for gene enrichments in our set of DE genes against a background of genes with similar expression levels. We will clarify this motivation in the revised manuscript.
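
      One way to build an expression-matched background is sketched below. It is a hedged illustration rather than the pipeline used in the study: the quantile binning, the number of bins, sampling with replacement, and all gene names and expression values are assumptions. Each differentially expressed (DE) gene is paired with a randomly drawn non-DE gene from the same expression bin, so that enrichment is tested against genes with similar expression levels.

```python
import random
from collections import defaultdict


def expression_bins(mean_expr, n_bins=10):
    """Map each gene to a quantile bin of its mean expression (0 = lowest)."""
    ranked = sorted(mean_expr, key=mean_expr.get)
    return {gene: i * n_bins // len(ranked) for i, gene in enumerate(ranked)}


def matched_background(de_genes, mean_expr, n_bins=10, seed=0):
    """Draw one non-DE gene from the same expression bin for every DE gene."""
    rng = random.Random(seed)
    bins = expression_bins(mean_expr, n_bins)
    de_set = set(de_genes)
    pool = defaultdict(list)
    for gene in sorted(mean_expr):
        if gene not in de_set:
            pool[bins[gene]].append(gene)
    background = []
    for gene in de_genes:
        candidates = pool[bins[gene]]
        if candidates:  # skip DE genes whose bin has no non-DE gene
            background.append(rng.choice(candidates))
    return background


# Hypothetical data: 100 genes with increasing expression; DE genes are highly expressed.
toy_expr = {f"g{i}": float(i) for i in range(100)}
toy_de = ["g90", "g92", "g94", "g96", "g98"]
toy_bins = expression_bins(toy_expr)
background = matched_background(toy_de, toy_expr)
print(background)
```

      Because the background genes are drawn from the same bins as the DE genes, any enrichment that remains cannot be attributed simply to highly expressed genes being easier to quantify.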

      The authors should exercise strong caution in making inferences about these differences being the basis of parental behavior. It is possible, given connections to relevant research, but without direct intervention, direct claims should be avoided. There should be clear distinctions of what to conclude and what to propose as possibilities for future research.

      Yes, we agree that we are unable to make direct claims about neuronal differences being the basis of parental behavior. We will revise our language to be clearer about which relationships we are hypothesizing and what we propose as possibilities for future research.

      Histology is not performed on all regions included in the sequencing analysis.

      We apologize that our language describing the hypothalamic regions included in the sequencing analysis and those included in the histology is unclear. We aimed to dissect the medial preoptic region for the sequencing analysis, but additionally captured parts of the anterior hypothalamus, including the paraventricular (PVN), supraoptic (SON), and suprachiasmatic (SCN) nuclei, and more. Our histology was performed across the entire hypothalamus and includes all regions included in the sequencing data. We will revise the manuscript to more accurately describe the hypothalamic regions that we investigated.

      Reviewer #3:

      My primary concern is that the dataset is limited: 52,121 neuronal nuclei across 24 samples, which does not provide many cells per cluster to analyze comparatively across sex and species, particularly given the heterogeneity of the region dissected. The Supplementary table reports lower UMIs/genes per cell than is typically seen as well. Perhaps additional information could be obtained from the data by not restricting the analyses to cells that can be assigned to Mus types. A direct comparison of the two Peromyscus species could be valuable as would a more complete Peromyscus POA atlas.

      Our dataset reports ~1,500 genes and ~1,000 UMIs per nucleus, which is indeed lower than is typically reported in other single-nucleus datasets. Some of this discrepancy is due to the lower quality of the genome and annotated transcriptome available for Peromyscus compared to Mus musculus, which results in a lower mapping rate than is typically reported in Mus studies. However, our dataset was sufficient to identify known peptidergic cell types (Supp Fig 6) and to map homology to Mus cell types for 34 (64%) of our 53 clusters. Additionally, although some of our clusters contain small numbers of cells, our differential abundance analysis accounts for the variance in cell numbers observed across samples and should be robust against any increase in variance due to small numbers. In fact, even differential abundance of very small cell clusters such as oxytocin neurons (cell type 40) was validated by histology. 

      We would like to clarify that all analyses were performed on all cell clusters, regardless of whether or not they could be assigned homology to a Mus cell type. All the cell types that we identified as differentially abundant or as containing significant sex differences happened to be cell types for which homology to a Mus cell type could be defined. This may arise for a relatively uninteresting reason: cell types that have more distinct transcriptional signatures will be more accurately clustered, leading to more accurate identification of homology as well as more accurate measurements of differential abundance and expression. We will revise the language in our manuscript to make this clearer.

      In Supplement 7, it appears that most neurons can be assigned as excitatory or inhibitory, but then so many of these cells remain in the unassigned "gray blob" seen in panel 1E. Clustering of excitatory and inhibitory neurons separately, as in prior cited work in Mus POA (refs 31 and 57) may boost statistical power to detect sex and species differences in cell types. Perhaps the cells that cannot be assigned to Mus contain too few reads to be useful, in which case they should be filtered out in the QC. The technical challenges of a comparative single-cell approach are considerable, so it benefits the scientific community to provide transparency about them.

      We are not certain why we are unable to cluster and assign homology to many of our cells (i.e. cells in the unassigned “gray blob”). However, we note that even in the Mus atlas, many cells did not belong to obvious clusters by UMAP visualization, and several clusters lacked notable marker genes and were designated simply as “Gaba” and “Glut” clusters. Therefore, it is unsurprising that our own dataset also contains cells that lack the transcriptional signatures needed to be clustered and/or mapped to Mus cell types. We do know, however, that the median number of reads per nucleus is uniform across cell clusters and does not explain why some clusters could not be assigned to Mus. We will add this information to our revised manuscript. 

      We do not think that a two-stage clustering (i.e. clustering first by excitatory vs. inhibitory neurons) is expected to gain power to resolve cell types in this case. Excitatory vs. inhibitory neurons are clearly separable on our UMAP (Supp Fig 7) so that information is already being used by our clustering procedure. However, we will explore this further in our revised manuscript to see if doing so will boost statistical power.

      The Calb1 dimorphism as observed by immunostaining, appears much more extensive in P. maniculatus compared to P. polionotus (Figures 3 E and F). This finding is not reflected in the counts of the i20:Gal/Moxd1 cluster. The use of Calb1 staining as a proxy for the Gal/Moxd1 cluster would be strengthened if the number of POA Calb1+ neurons that are found in each cluster was apparent. There may be additional Calb+ neurons in the cells that are not annotated to a Mus cluster. This clarification would add support to the overall conclusion that there is reduced sexual dimorphism in P. polionotus.

      From the Mus MPOA atlas (which includes both single-cell sequencing data and imaging-based spatial information), it is known that the i20:Gal/Moxd1 cluster comprises sexually dimorphic cells that make up both the BNST and the SDN-POA. These sexually dimorphic cells are well-studied and known to be marked by Calb1, which we used in immunostaining as a proxy for i20:Gal/Moxd1. 

      However, we would like to clarify that in our study, the immunostaining of Calb1+ neurons and the sequencing counts of the i20:Gal/Moxd1 cluster are not completely reflective of each other because our sequencing dataset only captured the ventral portion of the BNST. Therefore, our i20:Gal/Moxd1 counts contain a combination of some Calb1+ BNST cells and likely all Calb1+ SDN-POA cells and are difficult to interpret on their own. Our histology, however, covers the entire hypothalamus and is more reliable for identifying sex and species differences in each region. We will clarify this in the revised manuscript.

      The relationship between the sex steroid receptor expression and the sex bias in gene expression would be improved if the sex bias in sex steroid receptor expression was included in Supplementary Figure 10.

      We will include this in the revised manuscript. 

      There is no explanation for the finding that there is a female bias in gene expression across all cell types in P. polionotus.

      We also find this observation interesting but do not have a good explanation for it at this point. We plan to follow this up in future work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      We thank Reviewer #1 for the relevant and insightful comments on our paper. Please find our detailed answers below in the Recommendations to the Authors section.

      Summary: 

      The researchers examined how individuals who were born blind or lost their vision early in life process information, specifically focusing on the decoding of Braille characters. They explored the transition of Braille character information from tactile sensory inputs, based on which hand was used for reading, to perceptual representations that are not dependent on the reading hand. 

      They identified tactile sensory representations in areas responsible for touch processing and perceptual representations in brain regions typically involved in visual reading, with the lateral occipital complex serving as a pivotal "hinge" region between them.

      In terms of temporal information processing, they discovered that tactile sensory representations occur prior to cognitive-perceptual representations. The researchers suggest that this pattern indicates that even in situations of significant brain adaptability, there is a consistent chronological progression from sensory to cognitive processing. 

      Strengths: 

      By combining fMRI and EEG, and focusing on the diagnostic case of Braille reading, the paper provides an integrated view of the transformation processing from sensation to perception in the visually deprived brain. Such a multimodal approach is still rare in the study of human brain plasticity and allows us to discern the nature of information processing in blind people's early visual cortex, as well as the time course of information processing in a situation of significant brain adaptability. 

      Weaknesses: 

      The lack of a sighted control group limits the interpretations of the results in terms of profound cortical reorganization, or simple unmasking of the architectural potentials already present in the normally developing brain. 

      We thank the reviewer for raising this important point! We acknowledge that our claims regarding the unmasking of architectural potentials in both the normally developing and visually deprived brain are limited by the study design we employed. However, we note that defining an appropriate control group and assessing non-visual reading in sighted participants is far from straightforward. We discuss these issues in our response to the Public Review of Reviewer 2.

      Moreover, the conclusions regarding the behavioral relevance of the sensory and perceptual representations in the putatively reorganized brain are limited due to the behavioral measurements adopted.

      We agree with the reviewer that the relation between behavior and neural representations as established via perceived similarity judgments is task-dependent, and that a richer assessment of behavior would be valuable. Please note, however, that this limitation pertains to any experimental task used to assess behavior in the laboratory. Our major goal was to assess whether the identified neural representations are suitably formatted to be used by the brain for at least one behavior rather than being epiphenomenal. We found that the representations are suitably formatted for similarity judgments, thus establishing that they are relevant for at least this behavior. We also argue that judging similarity is a complex task that may underlie many other relevant behaviors. We discuss this point further in response to the Recommendations to the Authors.

      Reviewer #2 (Public Review): 

      We thank the reviewer for the considerate and thoughtful suggestions. Please find a detailed description of the implemented changes below.

      Summary: 

      Haupt and colleagues performed a well-designed study to test the spatial and temporal gradient of perceiving braille letters in blind individuals. Using cross-hand decoding of the read letters, and comparing it to the decoding of the read letter for each hand, they defined perceptual and sensory responses. Then they compared where (using fMRI) and when (using EEG) these were decodable. Using fMRI, they showed that low-level tactile responses specific to each hand are decodable from the primary and secondary somatosensory cortex as well as from IPS subregions, the insula, and LOC. In contrast, more abstract representations of the braille letter independent from the reading hand were decodable from several visual ROIs, LOC, VWFA, and surprisingly also EVC. Using a parallel EEG design, they showed that sensory hand-specific responses emerge in time before perceptual braille letter representations. Last, they used RSA to show that the behavioral similarity of the letter pairs correlates to the neural signal of both fMRI (for the perceptual decoding, in visual and ventral ROIs) and EEG (for both sensory and perceptual decoding). 

      Strengths: 

      This is a very well-designed study and it is analyzed well. The writing clearly describes the analyses and results. Overall, the study provides convincing evidence from EEG and fMRI that the decoding of letter identity across the reading hand occurs in the visual cortex in blindness. Further, it addresses important questions about the visual cortex hierarchy in blindness (whether it parallels that of the sighted brain or is inverted) and its link to braille reading. 

      Weaknesses: 

      Although I have some comments and requests for clarification about the details of the methods, my main comment is that the manuscript could benefit from expanding its discussion. Specifically, I'd appreciate the authors drawing clearer theoretical conclusions about what this data suggests about the direction of information flow in the reorganized visual system in blindness, the role VWFA plays in blindness (revised from the original sighted role or similar to it?), how information arrives to the visual cortex, and what the authors' predictions would be if a parallel experiment would be carried out in sighted people (is this a multisensory recruitment or reorganization?). The data has the potential to speak to a lot of questions about the scope of brain plasticity, and that would interest broad audiences. 

      We thank the reviewer for the opportunity to provide clearer theoretical conclusions from our data. We elaborate on each of the points raised by the reviewer in the discussion section.

      Concerning the direction of information flow in the reorganized visual system in blindness, we focus on information arrival to EVC and information flow beyond EVC.

      p. 11, ll. 376-386, Discussion 4.1:

      “Overall, identifying braille letter representations in widespread brain areas raises the question of how information flow is organized in the visually deprived brain. Functional connectivity studies report deprivation-driven changes of thalamo-cortical connections which could explain both arrival of information to and further flow of information beyond EVC. First, the coexistence of early thalamic connections to both S1 and V1 (Müller et al., 2019) would enable EVC to receive from different sources and at different timepoints. Second, potentially overlapping connections from both sensory cortices to other visual or parietal areas (Ioannides et al., 2013) could enable the visually deprived brain to process information in a widespread and interconnected array of brain areas. In such a network architecture, several brain areas receive and forward information at the same time. In contrast to information discretely traveling from one processing unit to the next in the sighted brain’s processing cascade, we can rather picture information flowing in a spatially and functionally more distributed and overlapping fashion.”

      Regarding the role of VWFA, we propose that the functional organization of VWFA is modality-independent.

      p. 10, ll. 346-348, Discussion 4.1:

      “Second, we found that VWFA contains perceptual but not sensory braille letter representations. By clarifying the representational format of language representations in VWFA, our results support previous findings of the VWFA being functionally selective for letter and word stimuli in the visually deprived brain (Reich et al., 2011; Striem-Amit et al., 2012; Liu et al., 2023). Together, these findings suggest that the functional organization of the VWFA is modality-independent (Reich et al., 2011), depicting an important contribution to the ongoing debate on how visual experience shapes representations along the ventral stream (Bedny et al., 2021).”

      Lastly, we would like to share our thoughts about carrying out a parallel experiment in sighted people.

      In general, we agree that it seems insightful to conduct a parallel, analogous experiment in sighted participants with the aim to disentangle whether the effects seen in blind participants are due to multisensory recruitment or reorganization. However, before making predictions regarding the outcome, we would have to define an analogous experiment in sighted participants that taps into the same mechanisms. This, however, is difficult to do as it is unclear what counts as analogous. For example, if we compare braille reading to reading visually presented braille dot arrays or Roman letters, we will assess visual object processing, a different mechanism from that involved in braille reading. Alternatively, if we compare braille reading to sighted participants reading embossed Roman letters haptically or ideally even reading Braille after extensive training, we still face the inherent problem that sighted participants have visual experiences and could use visual imagery strategies in these nonvisual tasks. As we cannot experimentally ensure that sighted participants do not use visual strategies to solve a task, this would always complicate drawing conclusions about the underlying processes. More specifically, we could never pinpoint whether differences between sighted and blind participants are due to measuring different mechanisms or measuring the same mechanism and unravelling underlying changes (i.e., multisensory recruitment or reorganization). Finally, apart from potential confounds due to visual imagery, considering populations of sighted readers and Braille readers as only differing with regard to their input modality and otherwise being comparable is problematic: In general, blind populations are more heterogeneous than most typical samples due to various factors such as aetiologies, onset and severity (Merabet & Pascual-Leone, 2010). Even when carrying out studies in highly specific population subsamples, such as in congenitally blind braille readers, vast within-group differences remain, e.g., the quality and quantity of their braille education, as well as across braille and print readers, e.g., different passive exposure to braille versus written letters during childhood (Englebretson et al., 2023). Hence, to fully match the groups in terms of learning experience we would, for example, have to teach sighted infants braille reading in childhood and follow them up until a comparable age. This approach does not seem feasible.

      p. 10, ll. 328-341, Discussion 4.1:

      “We note that our findings contribute additional evidence but cannot conclusively distinguish between the competing hypotheses that visually deprived brains dynamically adjust to the environmental constraints versus that they undergo a profound cortical reorganization. Resolving this debate would require an analogous experiment in sighted people which taps into the same mechanisms as the present study. Defining a suitable control experiment is, however, difficult. Any other type of reading would likely tap into different mechanisms than braille reading. Further, whenever sighted participants are asked to perform a haptic reading task, outcomes can be confounded by visual imagery driving visual cortex (Dijkstra et al., 2019). Thus, the results would remain ambiguous as to whether observed differences between the groups index different mechanisms or plastic changes in the same mechanisms. Last, matching groups of sighted readers and braille readers such that they only differ with regard to their input modality seems practically unfeasible: There are vast differences within the blind population in general, e.g., aetiologies, onset and severity, and the subsample of congenitally blind braille readers more specifically, e.g., the quality and quantity of their braille education, as well as across braille and print readers, e.g., different passive exposure to braille versus written letters during childhood (Englebretson et al., 2023; Merabet & Pascual-Leone, 2010).”

      While we appreciate that the conclusions we can draw from our results are limited by our sample and defining an appropriate parallel experiment in sighted participants is difficult for the reasons discussed above, we would still like to share our speculations regarding the process underlying our result pattern. We think that our results, taken together with results of previous studies, suggest that EVC does not undergo fundamental reorganization in the case of visual deprivation. Rather, it can flexibly adjust to given processing requirements. This flexibility is not infinite; adjustments are limited by the area’s architectural and computational capacity. Importantly, we think that this claim refers to an unmasking of preexisting potential rather than multisensory recruitment.

      To aid in drawing even more concrete conclusions about the flow of information, I suggest that the authors also add at least another early visual ROI to plot more clearly whether EVC's response to braille letters arrives there through an inverted cortical hierarchy, intermediate stages from VWFA, or directly, as found in the sighted brain for spoken language. 

      We thank the reviewer for this comment. However, EVC here consists of V1 to V3, and we also already assess V4, LOC, VWFA and LFA. Thus, we assess regions at all levels of processing, from low- over mid- to high-level, and cannot add a further interim ROI. Our results using this ROI set do not allow us to arbitrate between the hypotheses raised by the reviewer.

      Similarly, it may be informative to look specifically at the occipital electrodes' time differences between decoding for the different parameters and their correlation to behavior.

      We thank the reviewer for this suggestion. However, the spatial resolution of EEG measurements is limited, and we cannot convincingly determine the neural source of signals recorded from specific electrodes, e.g., occipital ones. When we reduce the number of electrodes before analysis, we primarily see comparable qualitative trends in the data, albeit with a reduction in signal-to-noise ratio.

      To illustrate, we repeated the EEG time decoding and the EEG-behavior RSA with only occipital and parieto-occipital electrodes (n=8) instead of all electrodes (n=63) and added the results to the Supplementary Material (see Supplementary Figures 3 and 4). Overall, we observe a reduction in signal-to-noise ratio. This is not surprising given that the EEG searchlight decoding results (Figure 3b) reveal that the sources of the decoding signals extend beyond occipital and parieto-occipital electrodes.

      In the EEG time decoding analysis, we see a comparable trend to the whole brain EEG analysis but do not find a significant difference in onsets of sensory and perceptual representation. 

      In the behavior-EEG RSA, we do find that the correlations between behavior and sensory representations emerge significantly earlier than correlations between behavior and perceptual representations (N = 11, 1,000 bootstraps, one-tailed bootstrap test against zero, P < 0.001). This result is in line with the whole brain EEG analysis.

      Regarding the methods, further detail on the ability to read with both hands equally and any residual vision of the participants would be helpful.

      We thank the reviewer for raising this point. We assessed participants’ letter reading capabilities in a short screening task prior to the experiment. Participants read letters with both hands separately and we used the same presentation time as in the experiment. The results showed that average performance for recognizing letters with the left hand (89%) and right hand (88%) was comparable. We did not measure continuous reading in the present study, and we did not assess further information about participants’ ability to read equally well with both hands.

      While the information about the screening task was previously included in Methods section 5.3.2 EEG experiment, we now moved it into a separate section 5.3.3 Braille screening task to make the information better accessible. 

      p. 14, ll. 529-533, Methods 5.3.3:

      “Prior to the experiment, participants completed a short screening task during which each letter of the alphabet was presented for 500ms to each hand in random order. Participants were asked to verbally report the letter they had perceived to assess their reading capabilities with both hands using the same presentation time as in the experiment. The average performance for the left hand was 89% correct (SD = 10) and for the right hand it was 88% correct (SD = 13).”

      We thank the reviewer for the suggestion to include information regarding participants’ residual vision. We have now added information about participants’ residual light perception to Supplementary Table 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) ROI vs Searchlight Results: Figures 2 b and c do not seem to match. The ROI results (b) should be somehow consistent with the whole brain results (c), but "perceptual" decoding in the searchlight (in green) seems localized in sensorimotor areas while for the same classification, no sensorimotor ROI is significant. Can the authors clarify this difference?

      Similarly, perceptual decoding does not emerge in EVC with the searchlight analysis, whereas it is quite strong in the ROI analysis.

      We agree that the results of the ROI and searchlight decoding do not show a direct match. We think that this difference is due to methodological reasons. For example, ROI decoding can be more sensitive when ROIs follow functionally relevant boundaries in the brain, in comparison to spheres used in searchlight decoding that do not. In turn, searchlight decoding may be more sensitive when information is distributed across functional boundaries that would be captured in different ROIs rather than combined, or when ROI definition is difficult (such as here in the visual system of blind participants).

      However, we point out that the primary goal of our searchlight decoding was to show that no other areas beyond our hypothesized ROIs contained braille letter representations, rather than reproducing the ROI results.

      Decoding accuracies are tested against chance (50% for pairwise classifications) according to the methods. In the case of "sensory and perceptual" and "perceptual" classification, this is straightforward. In the case of the analysis that isolates "sensory" representations, though, the difference is computed between "sensory and perceptual" and "perceptual" decoding accuracies; the accuracies resulting from this difference should thus be centered around 0.

      Are the accuracies tested against 0 in this case? This is not specified in the methods. Furthermore, the data reported in Figure 2 and Figure 3 seem to have 0% as a baseline and the label states "decoding accuracy". Can the authors clarify whether the reported data are the difference in accuracy with an estimated empirical baseline or an expected baseline of 50%?

      The reviewer is correct in stating that we tested “sensory and perceptual” and “perceptual” against chance level and the difference score “sensory” against 0 and that this information was missing in the methods section.

      We now specify in the methods that we are testing the accuracies for the “sensory” analysis against 0.

      p. 16, ll. 625-627, Methods 5.6:

      “We conducted subject-specific braille letter classification in two ways. First, we classified between letter pairs presented to one reading hand, i.e., we trained and tested a classifier on brain data recorded during the presentation of braille stimuli to the same hand (either the right or the left hand). This yields a measure of hand-dependent braille letter information in neural measurements. We refer to this analysis as within-hand classification. Second, we classified between letter pairs presented to different hands in that we trained a classifier on brain data recorded during the presentation of stimuli to one hand (e.g., right), and tested it on data related to the other hand (e.g., left). This yields a measure of hand-independent braille letter information in neural measurements. We refer to this analysis as across-hand classification. We tested both within-hand and across-hand pairwise classification accuracies against a chance level of 50%. We also calculated a within-across hand classification score which we compared against 0.”
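
      To make the three scores and their baselines concrete, the following is a minimal sketch in Python, not the authors' analysis code. The function name `decoding_scores` and the accuracy values are hypothetical; the sketch only assumes, as stated above, that within-hand and across-hand accuracies are compared against 50% and that their difference is compared against 0.

```python
from statistics import mean

CHANCE = 50.0  # chance level (%) for pairwise classification


def decoding_scores(within_hand, across_hand):
    """Summarise one participant's pairwise decoding accuracies (in %).

    within_hand: accuracies when training and testing on the same hand.
    across_hand: accuracies when training on one hand and testing on the other.
    """
    within = mean(within_hand)
    across = mean(across_hand)
    return {
        # Chance-corrected scores: subtracting 50% lets them be compared to 0.
        "sensory_and_perceptual": within - CHANCE,
        "perceptual": across - CHANCE,
        # Difference score: already centered on 0 if there is no hand-specific information.
        "sensory": within - across,
    }


# Hypothetical accuracies for illustration only (not data from the study).
scores = decoding_scores([62.0, 58.0, 60.0], [54.0, 52.0, 56.0])
print(scores)
```

      Expressed this way, all three scores share 0 as the reference value, which is the standardization of the y-axes described for Figures 2 and 3.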

      Regarding Figures 2 and 3, we plot the results as decoding accuracies minus chance level to standardize the y-axes for all three analyses, i.e., compare them to 0. We have corrected the y-axis labels accordingly. 

      In our analyses, we assumed an expected baseline of 50%. But in the response below we provide evidence that our results remain stable whether using an expected or empirical baseline.

      If my understanding is correct, a potential problem persists. The different analyses may not be comparable, because in the "sensory" analysis the baseline is empirically defined, being the classification accuracies of the "perceptual" decoding, while in the other two analyses, the baseline is set at 50%. There are suggestions in the literature to derive empirically defined baselines by randomly shuffling the trial labels and recomputing the classification accuracies [Grootswagers 2017]. In the context of the present work, its use will make the different statistical analyses more comparable. I would thus suggest the authors define the baseline empirically for all their analyses or, given the high computational demand of this analysis, provide evidence that the results are not affected by this difference in the baseline.

      We thank the reviewer for raising this point. As the reviewer correctly stated, the “sensory” analysis has an empirically defined baseline because it is a difference score while in the other two analyses the baseline is set at 50%.

      To provide evidence that our results are not affected by this difference in baseline, we have now re-run the EEG time decoding. We derived null distributions from the empirical data for all three analyses, following the guidelines from Grootswagers 2017 (page 688, section “Evaluation of Classifier Performance and Group Level Statistical Testing”):

      “Another popular alternative is the permutation test, which entails repeatedly shuffling the data and recomputing classifier performance on the shuffled data to obtain a null distribution, which is then compared against observed classifier performance on the original set to assess statistical significance (see, e.g., Kaiser et al., 2016; Cichy et al., 2014; Isik et al., 2014). Permutation tests are especially useful when no assumptions about the null distribution can be made (e.g., in the case of biased classifiers or unbalanced data), but they take much longer to run (e.g., repeating the analysis 10,000 times).”

      Running a sign permutation test with 10,000 repetitions, we show that the results are comparable to the previously reported results based on one-sided Wilcoxon signed rank tests. We are, therefore, confident that our reported results are not affected by this difference in baseline. We now added this control analysis to the results section and supplementary material (see Supplementary Figure 5).
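
      For readers unfamiliar with sign permutation tests, a minimal sketch of the general technique follows. It is an illustration under stated assumptions rather than the authors' implementation: the function name `sign_permutation_p`, the one-sided form, the per-participant input and the example values are all hypothetical.

```python
import random
from statistics import mean


def sign_permutation_p(values, n_perm=10000, seed=0):
    """One-sided sign permutation test of whether mean(values) exceeds 0.

    values: one chance-corrected score per participant.
    Under the null hypothesis each score is equally likely to be positive
    or negative, so the sign of every score is flipped at random.
    """
    rng = random.Random(seed)
    observed = mean(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        if mean(flipped) >= observed:
            at_least_as_large += 1
    # Adding 1 to numerator and denominator keeps p strictly above 0.
    return (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical chance-corrected accuracies for 11 participants (illustration only).
effect = [4.1, 2.7, 5.3, 1.9, 3.8, 0.6, 2.2, 4.9, 3.1, 1.4, 2.8]
p_value = sign_permutation_p(effect)
print(p_value)
```

      Because the null distribution is built from the data themselves, the same procedure applies to the chance-corrected scores and to the difference score alike.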

      p. 7-8, ll. 213-215, Results 3.2: 

      “Importantly, the temporal dynamics of sensory and perceptual representations differed significantly. Compared to sensory representations, the significance onset of perceptual representations was delayed by 107ms (21-167ms) (N = 11, 1,000 bootstraps, one-tailed bootstrap test against zero, P = 0.012). This result pattern was consistent when defining the analysis baseline empirically (see Supplementary Figure 5).”

      (2) According to the authors, perceptual rather than sensory braille letter representations identified in space are suitably formatted to guide behavior. However, they acknowledge that this finding is likely to be task-dependent because it is based on subject similarity ratings.

      Maybe they could use a more objective similarity measurement of Braille letters similarity?

      For instance, they can compare letters using Jaccard similarity (See for instance: Bottini et al. 2022). 

      We thank the reviewer for the opportunity to clarify. We acknowledge that our findings regarding the behavioral relevance of the identified neural representations are task-dependent. But, importantly, this is not because we use perceived similarity ratings as a measurement, but because we only use one measurement while there are infinitely many other potential tasks to assess behavior. This means that the same limitation holds when using another similarity measure like Jaccard similarity. We now clarify this in the Discussion section: 

      p. 12, ll. 419-420, Discussion 4.3:

      “Our results clarified that perceptual rather than sensory braille letter representations identified in space are suitably formatted to guide behavior. However, we only use one specific task to assess behavior and, therefore, acknowledge that this finding is task-dependent.”

      Nevertheless, we calculated Jaccard similarity based on the definition used in Bottini et al. There are no significant correlations for the EEG-behavior or fMRI-behavior RSA when we use the Jaccard matrix and subject-specific EEG or fMRI RDMs (see Supplementary Figure 6).

      This demonstrates that braille letter similarity ratings are significantly correlated with neural representations in space and time but Jaccard similarity of braille dot overlaps is not. 
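
      As an illustration of what a dot-overlap measure captures, the sketch below computes the standard set-based Jaccard index on braille dot patterns. It is a sketch under assumptions: the exact definition in Bottini et al. may differ in detail, the five letters are chosen only for illustration and need not match the study's stimulus set, and the names `BRAILLE_DOTS`, `jaccard_similarity` and `jaccard_dissimilarity_matrix` are hypothetical.

```python
# Raised dot positions (1-6) for a few letters of the standard braille alphabet.
# The choice of letters is illustrative and not taken from the study.
BRAILLE_DOTS = {
    "a": {1},
    "b": {1, 2},
    "c": {1, 4},
    "d": {1, 4, 5},
    "e": {1, 5},
}


def jaccard_similarity(x, y):
    """Shared raised dots divided by all raised dots of the two letters."""
    return len(x & y) / len(x | y)


def jaccard_dissimilarity_matrix(letters):
    """Pairwise 1 - Jaccard similarity, comparable in form to an RDM."""
    return [
        [1.0 - jaccard_similarity(BRAILLE_DOTS[p], BRAILLE_DOTS[q]) for q in letters]
        for p in letters
    ]


letters = ["a", "b", "c", "d", "e"]
rdm = jaccard_dissimilarity_matrix(letters)
print(jaccard_similarity(BRAILLE_DOTS["c"], BRAILLE_DOTS["d"]))
```

      Such a matrix depends only on the physical dot layout, whereas similarity ratings also reflect how participants perceive the letters.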

      (3) If the primacy of perceptual similarity holds also with more objective measures of letter similarity, I think the authors should spend a few more words characterizing the results in fMRI and EEG that are rather divergent (concerning this analysis). Indeed, EEG analysis shows a significant correlation between similarity ratings and within-hand classification accuracy, although this correlation does not emerge in the "sensory" ROIs. I think these findings can be put together, hypothesizing that sensory-based similarity correlates with behavior but only in perceptual ROIs. However, why so? Can the authors provide a more mechanistic explanation? Am I missing something? 

      We thank the reviewer for this intriguing idea. We now speculate about how we could harmonize the results from the behavior-EEG and behavior-fMRI RSAs in the discussion section. 

      p. 12, ll. 438-442, Discussion 4.3:

      “Similarity ratings and sensory representations as captured by EEG are correlated, and so are similarity ratings and representations in perceptual ROIs, but not sensory ROIs. This might be interpreted as suggesting a link between the sensory representations captured in EEG and the representations in perceptual ROIs. However, we do not have any evidence towards this idea. Differing signal-to-noise ratios for the different ROIs and sensory versus perceptual analysis could be an alternative explanation.”

      (4) In the methods they state that EEG decoding is tested against chance at each time point but these results are not reported, only latency analysis is reported. Can the authors report the significant time points of the EEG time series decoding?  

      We thank the reviewer for catching this inconsistency! We have now added this information to Figure 3a.

      (5) In fMRI ROI definition procedure, the top 321 voxels of each anatomical ROI that had the highest functional activation were selected. The number of voxels is based on the smaller ROI, which to my understanding means that for this ROI all the voxels were selected potentially introducing noise and impacting the comparison between ROIs. Can the authors clarify which ROI was the smallest? 

      Thank you for the question! The smallest ROI was V4. This indeed means that for this ROI all voxels were selected. This could have led to our results being noisy in V4 but should not influence the results in other ROIs. We now added this information to the methods section.

      p. 15, ll. 592, Methods 5.4.4:

      “The smallest mask was V4 which included 321 voxels.”
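
      The consequence of fixing the voxel count to the size of the smallest ROI can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function `select_top_voxels`, the flat voxel indexing and the activation values are hypothetical.

```python
def select_top_voxels(activation, roi_mask, n_voxels):
    """Return indices of the n_voxels most activated voxels inside an ROI.

    activation: one functional activation value per voxel.
    roi_mask: True for voxels that belong to the anatomical ROI.
    If the ROI holds n_voxels or fewer voxels, all of them are returned.
    """
    in_roi = [i for i, inside in enumerate(roi_mask) if inside]
    ranked = sorted(in_roi, key=lambda i: activation[i], reverse=True)
    return sorted(ranked[:n_voxels])


# Hypothetical toy data: six voxels, five of them inside the ROI.
activation = [0.2, 1.5, 0.9, 3.0, 0.1, 2.2]
roi_mask = [True, True, False, True, True, True]

top3 = select_top_voxels(activation, roi_mask, 3)
all5 = select_top_voxels(activation, roi_mask, 5)
print(top3, all5)
```

      In the second call the requested count equals the ROI size, so no voxel is excluded on the basis of activation. This mirrors the situation described for V4, where weakly activated voxels remain in the selection.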

      (6) Finally, the authors suggest that: "Importantly, higher-level computations are not limited to the EVC in visually deprived brains. Natural sound representations 41 and language activations 53 are also located in EVC of sighted participants. This suggests that EVC, in general, has the capacity to process higher-level information 54. Thus, EVC in the visually deprived brain might not be undergoing fundamental changes in brain organization 53. This promotes a view of brain plasticity in which the cortex is capable of dynamic adjustments within pre-existing computational capacity limits 4,53-55." - The presence of a sighted control group would have strengthened this claim.

      We agree with the reviewer and now discuss the limitations of our approach in the discussion section (see response to weaknesses raised by Reviewer 2 in the Public Review above).

      Reviewer #2 (Recommendations For The Authors): 

      (1) Can the authors comment on the reaction time of the two reading hands? Completely ambidextrous reading is not necessarily common, so any differences in ability or response time across the hands may affect the EEG results. Alternatively, do the authors have any additional behavioral data about the participants' ability to read well with both hands? 

      We thank the reviewer for these questions! We did not assess reaction times and acknowledge this as a limitation. We did, however, measure accuracies and would have expected to see a speed-accuracy trade-off if reaction times differed between hands, i.e., we would have expected lower accuracy for the hand with higher RTs. But this was not the case: our participants had comparable accuracy values when reading letters with both hands (see methods section 5.3.3 and answer to Public Review above). This measure indicated that participants recognized Braille letters presented for 500ms equally well with both index fingers.

      (2) Please add information about any residual sight in the blind participants (or are they all without light perception?)

      We have now added information about residual light perception in Supplementary Table 1 (see above in response to Public Review).

      (3) Is active tactile exploration involved, or are the participants not moving their fingers at all over the piezo-actuators? Can the authors elaborate more on how the participants used this passive input?

      We thank the reviewer for the opportunity to clarify. Our experimental setup does not involve tactile exploration or sliding motions. Instead, participants rest their index fingers on the piezo-actuators and feel the static sensation of dots pushing up against their fingertips. We assume that participants used the passive input of specific dot stimulation location on fingers to perceive a dot array which, in turn, led to the percept of a braille letter.

      We now specify this information in the methods section.

      p. 13, ll. 474-475, Methods 5.2:

      “The modules were taped to the clothes of a participant for the fMRI experiment and on the table for the EEG and behavioral experiment. This way, participants could read in a comfortable position with their index fingers resting on the braille cells to avoid motion confounds. Importantly, our experimental setup did not involve tactile exploration or sliding motions. We instructed participants to read letters regardless of whether the pins passively stimulated their immobile right or left index finger.”

      (4) I appreciated the RSA analysis, but remain curious about what the ratings were based on. Do the authors know what parameters participants used to rate for? Were these consistent across participants? That would aid in interpreting the results.

      We thank the reviewer for the interest in our representational similarity analyses linking the neural representations to behavior. 

      We do not know which parameters participants explicitly used to rate the similarity between letters. We instructed participants to freely compare the similarity of pairs of braille letters without specifying which parameters they should use for the similarity assessment. We speculate that participants used a mixture of low-level features such as stimulation location on fingers and higher-level features such as linguistic similarity between letters. We now clarify the free comparison of braille letter pairs in the methods section:

      p. 14, ll. 538-539, Methods 5.3.4:

      “Each pair of letters was presented once, and participants compared them with the same finger. We instructed participants to freely compare the similarity of pairs of Braille letters without specifying which parameters they should use for the similarity assessment. The rating was without time constraints, meaning participants decided when they rated the stimuli. Participants were asked to verbally rate the similarity of each pair of braille letters on a scale from 1 = very similar to 7 = very different and the experimenter noted down their responses.”
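
      To make the link between such ratings and neural data concrete, the sketch below treats the 1 to 7 ratings directly as dissimilarities and rank-correlates them with a neural dissimilarity vector. The letter pairs, ratings, and neural values are invented for illustration, and the use of a Spearman correlation is an assumption; the authors' exact RSA procedure is not described in this passage.

```python
# Sketch only: rank-correlate behavioural dissimilarity ratings with neural dissimilarities.
# All data and names below are hypothetical.

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# One entry per letter pair (hypothetical pairs and values).
pairs = [("B", "C"), ("B", "D"), ("B", "M"), ("C", "D"), ("C", "M"), ("D", "M")]
behaviour = [2, 3, 6, 2, 5, 7]                    # 1 = very similar, 7 = very different
neural = [0.52, 0.55, 0.71, 0.50, 0.66, 0.80]     # e.g. pairwise decoding accuracies

rho = spearman(behaviour, neural)
```

      A rank-based measure is used here because the rating scale is ordinal; it makes no assumption about which features participants relied on.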

      (5) Can the authors provide confusion matrices for the decoding analyses in the supplementary materials? This could be informative in understanding what pairs of letters are most discernable and where. 

      We have added confusion matrices for within- and between-hand decoding for all ROIs and for the time points 100ms, 200ms, 300ms and 400ms to the Supplementary Material (see Supplementary Figures 7-10).

      (6) Was slice time correction done for the fMRI data? This is not reported. 

      We have now added this information to the methods section: our fMRI preprocessing pipeline did not include slice timing correction.

      p. 14, l. 554, Methods 5.4.2:

      “We did not apply high or low-pass temporal filters and did not perform slice time correction.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #3 (Public review):

      Summary:

      Juan Liu et al. investigated the interplay between habitat fragmentation and climate-driven thermophilization in birds in an island system in China. They used extensive bird monitoring data (9 surveys per year per island) across 36 islands of varying size and isolation from the mainland covering 10 years. The authors use extensive modeling frameworks to test a general increase of the occurrence and abundance of warm-dwelling species and vice versa for cold-dwelling species using the widely used Community Temperature Index (CTI), as well the relationship between island fragmentation in terms of island area and isolation from the mainland on extinction and colonization rates of cold- and warm-adapted species. They found that indeed there was thermophilization happening during the last 10 years, which was more pronounced for the CTI based on abundances and less clearly for the occurrence based metric. Generally, the authors show that this is driven by an increased colonization rate of warm-dwelling and an increased extinction rate of cold-dwelling species. Interestingly, they unravel some of the mechanisms behind this dynamic by showing that warm-adapted species increased while cold-dwelling decreased more strongly on smaller islands, which is - according to the authors - due to lowered thermal buffering on smaller islands (which was supported by air temperature monitoring done during the study period on small and large islands). They argue, that the increased extinction rate of cold-adapted species could also be due to lowered habitat heterogeneity on smaller islands. With regards to island isolation, they show that also both thermophilization processes (increase of warm and decrease of cold-adapted species) was stronger on islands closer to the mainland, due to closer sources to species populations of either group on the mainland as compared to limited dispersal (i.e. range shift potential) in more isolated islands.
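
      For readers unfamiliar with the Community Temperature Index, the sketch below uses the common definition of CTI as the mean species temperature index (STI) of a community, either weighted by abundance or computed over the species present. The species names, STI values, and counts are invented, and the authors' exact computation may differ in detail.

```python
# Sketch only: abundance-based and occurrence-based Community Temperature Index (CTI).
# Species, STI values (degrees C), and counts are hypothetical.

def cti_abundance(sti, abundance):
    """Abundance-weighted mean STI of the community."""
    total = sum(abundance.values())
    return sum(sti[species] * n for species, n in abundance.items()) / total


def cti_occurrence(sti, abundance):
    """Unweighted mean STI of the species that are present."""
    present = [species for species, n in abundance.items() if n > 0]
    return sum(sti[species] for species in present) / len(present)


sti = {"cold_sp": 12.0, "mid_sp": 16.0, "warm_sp": 20.0}
year_1 = {"cold_sp": 10, "mid_sp": 5, "warm_sp": 5}
year_10 = {"cold_sp": 4, "mid_sp": 5, "warm_sp": 11}

shift_abundance = cti_abundance(sti, year_10) - cti_abundance(sti, year_1)
shift_occurrence = cti_occurrence(sti, year_10) - cti_occurrence(sti, year_1)
```

      In this toy community the abundance-based CTI rises while the occurrence-based CTI is unchanged, because no species is lost or gained. This is one way an abundance-based signal can be clearer than an occurrence-based one.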

      The conclusions drawn in this study are sound, and mostly well supported by the results. Only few aspects leave open questions and could quite likely be further supported by the authors themselves thanks to their apparent extensive understanding of the study system.

      Strengths:

      The study questions and hypotheses are very well aligned with the methods used, ranging from field surveys to extensive modeling frameworks, as well as with the conclusions drawn from the results. The study addresses a complex question on the interplay between habitat fragmentation and climate-driven thermophilization which can naturally be affected by a multitude of additional factors than the ones included here. Nevertheless, the authors use a well balanced method of simplifying this to the most important factors in question (CTI change, extinction, colonization, together with habitat fragmentation metrics of isolation and island area). The interpretation of the results presents interesting mechanisms without being too bold on their findings and by providing important links to the existing literature as well as to additional data and analyses presented in the appendix.

      Weaknesses:

      The metric of island isolation based on distance to the mainland seems a bit too oversimplified as in real-life the study system rather represents an island network where the islands of different sizes are in varying distances to each other, such that smaller islands can potentially draw from the species pools from near-by larger islands too - rather than just from the mainland. Although the authors do explain the reason for this metric, backed up by earlier research, a network approach could be worthwhile exploring in future research done in this system. The fact, that the authors did find a signal of island isolation does support their method, but the variation in responses to this metric could hint on a more complex pattern going on in real-life than was assumed for this study.

      Thank you again for this suggestion. Building on the previous revision, we expanded the discussion of the importance of incorporating the island network into future research. The paragraph is now on Lines 294-304:

      “As a caveat, we only consider the distance to the nearest mainland as a measure of fragmentation, consistent with previous work in this system (Si et al., 2014), but we acknowledge that other distance-based metrics of isolation that incorporate inter-island connections and island size could hint on a more complex pattern going on in real-life than was assumed for this study, thus reveal additional insights on fragmentation effects. For instance, smaller islands may also potentially utilize species pools from nearby larger islands, rather than being limited solely to those from the mainland. The spatial arrangement of islands, like the arrangement of habitat, can influence niche tracking of species (Fourcade et al., 2021). Future studies should use a network approach to take these metrics into account to thoroughly understand the influence of isolation and spatial arrangement of patches in mediating the effect of climate warming on species.”
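
      One possible form of such a network-aware isolation metric is sketched below purely as an illustration. It assumes a connectivity score in which each neighbouring island contributes its area, discounted exponentially with distance; the functional form, the `scale_km` parameter, and the coordinates are all assumptions and are not part of the study.

```python
# Sketch only: an area- and distance-weighted connectivity score for one island.
# The exponential kernel, `scale_km`, and all island data are hypothetical.
import math


def connectivity(target, islands, scale_km=1.0):
    """Sum of neighbour areas, each discounted by exp(-distance / scale_km)."""
    x0, y0, _ = islands[target]
    total = 0.0
    for name, (x, y, area) in islands.items():
        if name == target:
            continue
        distance = math.hypot(x - x0, y - y0)
        total += area * math.exp(-distance / scale_km)
    return total


# name: (x in km, y in km, area in ha)
islands = {
    "large": (0.0, 0.0, 100.0),
    "small_near_large": (0.5, 0.0, 1.0),
    "small_remote": (5.0, 0.0, 1.0),
}

near = connectivity("small_near_large", islands)
remote = connectivity("small_remote", islands)
```

      Under this score, a small island next to a large one is far better connected than an equally small remote island, even if both were at the same distance from the mainland.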

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Great job on the revision! The new version reads well and in my opinion all comments were addressed appropriately. A few additional comments are as follows:

      Thank you very much for your further review and recognition. We have carefully modified the manuscript according to all recommendations.

      (1) L 62: replace shifts with process

      Done. We also added the word “transforming” to match this revision. The new sentence is now on Lines 61-63:

      “Habitat fragmentation, usually defined as the process of transforming continuous habitat into spatially isolated and small patches”

      (2) L 363: Your metric for habitat fragmentation is isolation and habitat area and I think this could be introduced already in the introduction, where you somewhat define fragmentation (although it could be clearer still). You could also discuss this in the discussion more, that other measures of fragmentation may be interesting to look at.

      Thank you for this suggestion. We have now introduced the metrics of habitat fragmentation in the Introduction, directly after habitat fragmentation is defined. The sentence is now on Lines 64-66:

      “Among the various ways in which habitat fragmentation is conceptualized and measured, patch area and isolation are two of the most used measures (Fahrig, 2003).”

      (3) L 384: replace for with because of

      Done.

      (4) L 388: "Following this filtering, 60 ...."

      Done.

      (5) Figure 1: In panels b-d you use different terms (fragmented, small, isolated) but aiming to describe the same thing. I would highly recommend to either use fragmented islands or isolated islands for all panels. Although I see that in your study fragmentation includes both, habitat loss and isolation. So make this clear in the figure caption too...

      Thank you very much for this suggestion. It is important to maintain consistency in using “fragmentation”. We changed “fragmented, small, isolated” to “Fragmented patches” in the caption of panels b-d. The modified caption is now on Line 771.

      (6) L 783: replace background with habitat (or landscape) and exhibit with exemplify

      Done. The new sentence is now on Lines 782-784:

      “The three distinct patches signify a fragmented landscape and the community in the middle of the three patches was selected to exemplify colonization-extinction dynamics in fragmented habitats.”

      (7) One bigger thing is the definition of fragmentation in your study for which you used habitat area (from habitat loss process) and isolation. This could still be clarified a bit more, especially in the figures. In Fig. 1 the smaller panels b-d could all be titled fragmented islands as this is what the different terms describe in your study (small, isolated) and thus the figure would become even clearer. Otherwise I'm happy with the changes made.

      Thank you for raising this important question. Yes, “habitat fragmentation” in our research includes both habitat loss and fragmentation per se. We have clarified the caption of b-d in Figure 1 as suggested by Recommendation (5). We believe this can make it clearer to the readers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      This study explores the neural control of muscle by decomposing the firing activity of constituent motor units from the grid of surface electromyography (EMG) in the Tibialis Anterior (TA) and Vastus Lateralis (VL) during isometric contractions. The study involves extensive samples of motor units across the broadest range of voluntary contraction intensities up to 80% of MVC. The authors examine the rate coding of the population of motor units, which describes the instantaneous firing rate of each motor unit as a function of muscle force. This relationship is characterized by a natural logarithm function that delineates two distinct phases: an initial phase with a steep acceleration in firing rate, particularly pronounced in low-threshold motor units, and a subsequent modest linear increase in firing rate, more significant in high-threshold motor units.

      Strengths: 

      The study makes a significant contribution to the field of neuromuscular physiology by providing a detailed analysis of motor unit behavior during muscle contractions in a few ways.

      (1) The significance lies in its comprehensive framework of motor unit activity during isometric contractions in a broad range of intensities, providing insights into the non-linear relationship between the firing rate and the muscle force. The extensive sample of motor units across the pool confirms the observation in animal studies in which the spinal motoneuron exhibits a discharge consisting of distinct phases in response to synaptic currents, under the influence of persistent inward currents. As such, it is now reasonable to state the human motor units across the pool are also under the control of gain modulation via some neuromodulatory effects in addition to synaptic inputs arising from ionotropic effects.

      (2) The firing scheme across the entire motoneuron pool revealed in this study reconciles the discrepancy in firing organization under debate; i.e., whether it is 'onion skin' like or not (Heckman and Enoka 2012). The onion skin like model states that the low threshold motor units discharge higher than high threshold motor units and have been held for a long time because the firing behaviors were examined in a partial range of contraction force range due to technical limitations. This reconciliation is crucial because it is fundamental to modelling the organization of motor unit recruitment and rate coding to achieve a desired force generation to advance our understanding of motor control.

      (3) The extensive data collection with a novel blind source separation algorithm on the expanded number of channels of surface EMG signal provides a robust dataset that enhances the reliability and validity of findings, setting a new standard for empirical studies in the field. 

      Collectively, this study fills several knowledge gaps in the field and advances our understanding of the mechanism underlying the isometric force generation.

      We thank the reviewer for their positive appreciation of our work.

      Weaknesses: 

      Although the findings and claims based on them are mostly well aligned, some accounts of the methods and claims need to be clarified.

      (1) The authors examine the input-output function of a motor unit by constructing models, using force as an input and discharge rate as an output. It sounds circular, or the other way around to use the muscle force as an input variable, because the muscle force is the result of motor unit discharges, not the cause that elicits the discharges. More specifically, as a result of non-linear interactions of synchronous and/or asynchronous discharges of a population of a given motoneuron pool that give rise to transient increase/maintenance in twitch force, the gross muscle force is attained. I acknowledge that it is extremely challenging experimentally to measure synaptic currents impinging upon the spinal motoneurons in human subjects and the author has an assumption that the force could be used as a proxy of synaptic currents. However, it is necessary to explicitly provide the caveats and rationale behind that. Force could be used as the input variable for modelling.

      Force is indeed used in this study as a proxy of the common excitatory synaptic currents as their direct measurement is not possible in vivo in humans. It is worth noting that this approach has been extensively used in the past by many groups to study rate coding (e.g., Monster & Chan, De Luca’s, Heckman’s, and Fuglevand’s groups). Heckman’s, Gorassini’s, Fuglevand’s groups and others have considered the non-linearities in the relation between motor unit firing rates and muscle force in humans as an indicator of the impact of neuromodulation on motor unit behaviour and changes of the intrinsic properties of motoneurons.

      One could also use the cumulative spike train as a more direct estimate of common excitatory inputs, assuming that it is possible to identify a group of motor units not influenced by PICs, as done when selecting a reference low-threshold motor neuron in the delta F method (Gorassini et al., 1998), or the cumulative spike train of low-threshold motor neurons (Afsharipour et al., 2020). However, this approach was not possible in our study as we did not have the same units across contractions to estimate cumulative spike trains. It was therefore not possible to pool the data across contractions as we did to generate force/firing rate relations on the widest range of force.

      We added a sentence in the discussion to highlight this limitation (P19, L470):

      ‘This result must be confirmed with a more direct proxy of the net synaptic drive, such as the firing rate of a reference low-threshold motor neuron used in the delta F method (Gorassini et al., 1998), or the cumulative spike train of low-threshold motor neurons (Afsharipour et al., 2020)’.
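
      As an illustration of the alternative proxy mentioned here, the sketch below builds a cumulative spike train by summing binary spike trains across units and then smooths it with a moving average. The spike trains, the window length, and the function names are hypothetical; published implementations typically use other smoothing filters.

```python
# Sketch only: cumulative spike train (CST) of several motor units and a smoothed version.
# The binary spike trains and the 3-sample window are hypothetical.

def cumulative_spike_train(spike_trains):
    """Sample-by-sample sum of equally long binary spike trains."""
    return [sum(samples) for samples in zip(*spike_trains)]


def moving_average(signal, window):
    """Centred moving average; the window is truncated at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


unit_a = [1, 0, 0, 1]
unit_b = [0, 1, 0, 1]

cst = cumulative_spike_train([unit_a, unit_b])
smoothed_cst = moving_average(cst, window=3)
```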

      (2) The authors examine the firing organizations in TA and VL in this study without explicit purposes and rationale for choosing these muscles. The lack of accounts makes it hard for the readers to interpret the data presented, particularly in terms of comparing the results from the different muscles.

      We wanted to compare the rate coding of pools of motor units from proximal (VL) and distal (TA) muscles within the lower limb. Indeed, distal and proximal muscles exhibit differences in rate coding and spatial recruitment (De Luca et al., 1982, J Physiol), potentially due to different levels of recurrent inhibition (Cullheim & Kellerth, 1978, J Physiol; Rossi & Mazzocchio, 1991, Exp Brain Res; Edgley et al., 2021, J Neurosci) or different levels of neuromodulation depending on their involvement (or not) in postural control (Hounsgaard et al., 1988, J Physiol; Kim et al., 2020, J Neurophysiol).

      We added a paragraph at the beginning of the result section to support our muscle choice (P6; L137): ‘16 participants performed either isometric dorsiflexion (n = 8) or knee extension tasks (n = 8) while we recorded the EMG activity of the tibialis anterior (TA - dorsiflexion) or the vastus lateralis (VL – knee extension) with four arrays of 64 surface electrodes (256 electrodes per muscle). The motoneuron pools of these two muscles of the lower limb receive a large part of common input (Laine et al., 2015; Negro et al., 2016a), constraining the recruitment of their motor units in a fixed order across tasks. They are therefore good candidates for an accurate description of rate coding. Moreover, we wanted to determine whether differences in rate coding observed between proximal and distal muscles in the upper limb (De Luca et al., 1982) were also present in the lower limb.’.

      Another factor that guided our muscle choice was the low risk of crosstalk. To check this, we verified with ultrasound that our arrays of 256 electrodes only covered the muscle of interest, staying away from the neighbouring muscles. This was possible as superficial muscles from the leg are bulkier than those from the upper limb. Given the small diameter of each electrode (2 mm), it is unlikely that the motor units from the neighbouring muscles were in the recorded muscle volume (Farina et al., 2003, IEEE Trans Biomed Eng).

      (3) In the methods, the author described the manual curation process after applying the blind source separation algorithm. For the readers to understand the whole process of decomposition and to secure rigor and robustness of the analyses, it would be necessary to provide details on what exact curation is performed with what criteria. 

      The manual curation of EMG decomposition with blind source separation is different from what is classically done with intramuscular EMG and template-matching algorithms. 

      In short, our decomposition algorithm uses fast independent component analysis (fastICA) to retrieve motor unit spike trains from the EMG signals. For this, it iteratively optimises a set of weights, i.e., a separation vector, for each motor unit. The projection of the EMG signals on this separation vector generates a sparse motor unit pulse train, with most of its samples close to zero and only a few samples close to one (Figure 1B). The discharge times are estimated from this motor unit pulse train using a peak detection function and a k-means classification with two classes to separate the high peaks (spikes) from the low peaks (noise and other motor units).

      The manual curation consists of inspecting the automatic detection of the peaks of the motor unit pulse train and manually adding missed peaks (missed discharge times) or removing wrongly detected peaks. Then, the separation vector is updated using the corrected discharge times and the motor unit pulse train is recalculated. This procedure generally improves the distance between the discharge times and the noise, which confirms the accuracy of the manual curation. If that is not the case, the motor unit is discarded from the analyses.
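
      The peak detection and two-class classification step described above can be sketched as follows. This is a simplified, standard-library illustration on an invented pulse train, with a hand-written one-dimensional two-means routine; it is not the authors' decomposition code, and the function names are hypothetical.

```python
# Sketch only: find local peaks in a motor unit pulse train and keep the high ones.
# The pulse train and all function names are hypothetical.

def local_peaks(signal):
    """Indices of samples higher than the previous sample and not lower than the next."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def two_means_1d(values, iterations=50):
    """Centroids of a low and a high cluster (Lloyd's algorithm with k = 2)."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        boundary = (low + high) / 2
        low_vals = [v for v in values if v <= boundary]
        high_vals = [v for v in values if v > boundary]
        if not high_vals:  # all peak heights identical: nothing to separate
            break
        new_low = sum(low_vals) / len(low_vals)
        new_high = sum(high_vals) / len(high_vals)
        if (new_low, new_high) == (low, high):
            break
        low, high = new_low, new_high
    return low, high


def detect_discharges(pulse_train):
    """Indices of peaks assigned to the high (spike) cluster."""
    peaks = local_peaks(pulse_train)
    if not peaks:
        return []
    low, high = two_means_1d([pulse_train[i] for i in peaks])
    boundary = (low + high) / 2
    return [i for i in peaks if pulse_train[i] > boundary]


pulse_train = [0.0, 0.1, 0.0, 0.9, 0.05, 0.15, 0.0, 1.0, 0.1, 0.0, 0.2, 0.0, 0.95, 0.0]
discharge_samples = detect_discharges(pulse_train)
```

      Manual editing then amounts to adding or removing entries in the detected list before the separation vector is recomputed.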

      We added a section on manual editing in the methods (P23, L615):

      ‘At the end of these automatic steps, all the motor unit pulse trains and identified discharge times were visually inspected, and manual editing was performed to correct the false identification of artifacts or the missed discharge times (Del Vecchio et al., 2020; Hug et al., 2021; Avrillon et al., 2023). The manual editing consisted of i) removing the spikes causing erroneous discharge rates (outliers), ii) adding the discharge times clearly separated from the noise, iii) recalculating the separation vector, iv) reapplying the separation vector on the entire EMG signals, and v) repeating this procedure until the selection of all the discharge times is achieved. The manual editing of potential missed discharge times and falsely identified discharge times was never immediately accepted. Instead, the procedure was consistently followed by the application of the updated motor unit separation vector on the entire EMG signals to generate a new motor unit pulse train. Then, the manual editing was only accepted when the silhouette value increased or stayed well above the threshold of 0.9 quantified with the silhouette value (Negro et al., 2016b). Only these motor units were retained for further analysis.’
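
      The acceptance rule in this excerpt can be illustrated with a silhouette-like separation score. The formula below (squared distances of spike peaks to the spike centroid versus to the noise centroid, normalised by the larger of the two) is an assumption made for illustration and should be checked against Negro et al. (2016b); the peak heights are invented.

```python
# Sketch only: a silhouette-like score for how well spike peaks separate from noise peaks.
# The formula is an assumed simplification; the data and function names are hypothetical.

def separation_score(spike_heights, noise_heights):
    """(between - within) / max(between, within), computed over the spike peaks."""
    spike_centroid = sum(spike_heights) / len(spike_heights)
    noise_centroid = sum(noise_heights) / len(noise_heights)
    within = sum((h - spike_centroid) ** 2 for h in spike_heights)
    between = sum((h - noise_centroid) ** 2 for h in spike_heights)
    return (between - within) / max(between, within)


def accept_edit(score_before, score_after, threshold=0.9):
    """Literal reading of the rule: the score increased, or it remains above the threshold."""
    return score_after > score_before or score_after > threshold


clean = separation_score([0.9, 1.0, 0.95], [0.1, 0.15, 0.2])
noisy = separation_score([0.5, 0.9, 0.7], [0.3, 0.4, 0.35])
```

      With these toy values the well-separated unit scores above 0.9 and the noisy unit scores below it, so only the first would pass a 0.9 threshold.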

      (4) In Figure 3, the early recruited units tend to become untraceable in the higher range of contraction. This is more pronounced in the muscle VL. This limitation would ambiguate the whole firing curve along the force axis and therefore limitation and the applicability in the different muscles needs to be discussed. 

      The loss of low-threshold motor units in the higher range of contractions was caused either by the decrease in signal-to-noise ratio for small motor units when many larger ones are recruited, or by the cancellation of the surface action potentials of the small units in the interference electromyographic signal, or by the recruitment of a motor unit with a very similar spatio-temporal filter (an example is shown in the figure below). In the latter case, the motor unit pulse train contains peaks that represent the discharge times of both motor units (green and red dots in the simulated example below), making them indistinguishable to the operator during manual editing.

      Author response image 1.

      This was discussed in the results (P7; L190):

      ‘On average, we tracked 67.1 ± 10.0% (25th–75th percentile: 53.9 – 80.1%) of the motor units between consecutive contraction levels (10% increments, e.g., between 10% and 20% MVC) for TA and 57.2 ± 5.1% (25th–75th percentile: 46.6 – 68.3%) of the motor units for VL (Figure S2). There are two explanations for the inability to track all motor units across consecutive contraction levels. First, some motor units are recruited at higher targets only. Second, it is challenging to track small motor units beyond a few contraction levels due to a lower signal-to-noise ratio for the small motor units when larger motor units are recruited, or signal cancellation (Keenan et al., 2005; Farina et al., 2014a).’

      However, we believe that it had a limited impact on the output of the paper, as the non-linear portion of the rate coding/force relation due to the persistent inward currents occurs during the first seconds after recruitment, before plateauing (for a review see Binder et al., 2020, Physiology).

      (5) It is unclear how commonly the notion "the long-held belief that rate coding is similar across motor units from the same pool" is held among the community without a reference. Different firing organizations have been modelled and discussed in the seminal paper by Fuglevand et al. (1993) and as far as I understand, the debate has not converged to a specific consensus. As such, any reference would be required to support the claim the notion is widely recognized.

      In the paper of Fuglevand et al. (1993, J Neurophysiol), all the motor units had the same rate coding pattern relative to the excitatory input, though the authors changed the slope of the relations and the saturation threshold of motor units between simulations. This is similar to the paper of De Luca & Contessa (2012, J Neurophysiol), where the equation used to simulate the rate coding was non-linear, but consistent across motor units.

      We added these citations to the text:

      ‘Overall, we found that motor units within a pool exhibit distinct rate coding with changes in force level (Figure 2 and 3), which contrasts with the long-held belief that rate coding is similar across motor units from the same pool (Fuglevand et al., 1993; De Luca and Contessa, 2012).’

      (6) The authors claim that the firing behavior as a function of force is well characterized by a natural logarithmic function, which consists of initial steep acceleration followed by a modest increase in firing rate. Arguably the gain modulation in firing rate could be attributed to a neuromodulatory effect on the spinal motoneuron, which has been suggested by a number of animal studies. However, the complexity of the interactions between ionotropic and neuromodulatory inputs to motoneurons may require further elucidation to fully understand the mechanisms of neural control; it is possible to consider the differential acceleration among different threshold motor units as a differential combinatory effect of ionotropic and neuromodulatory inputs, but it is not trivially determined how differentially or systematically the inputs are organized. Likewise, the authors make an account for the difference in firing rate between TA and VL in terms of different amounts or balances of excitatory and inhibitory inputs to the motoneuron pool, but again this could be explained by other factors, such as a different extent of neuromodulatory effects. To determine the complexity of the interactions, further studies will be warranted.

      We appreciate the reviewer’s view on this point, as we indeed only indirectly inferred the combination of neuromodulatory and ionotropic inputs to motoneurons in this study. A more direct manipulation of the sources of neuromodulatory and ionotropic inputs will be required in the future to directly highlight the mechanisms responsible for these variations in rate coding within pools. However, it is also worth noting that the acceleration in firing rate, the increase in firing rate during the ramp-up, and the hysteresis between the ramp-up and ramp-down have been used to infer the distribution of ionotropic and neuromodulatory inputs from the firing rate/force relations (Johnson et al., 2017; Beauchamp et al., 2023; Chardon et al., 2023). This approach has been validated with hundreds of thousands of simulations using a biophysical model of motor neurons (Chardon et al., 2023). There is also a series of studies in humans showing how the absence of neuromodulation, obtained via inhibitory inputs (Revill & Fuglevand, 2017) or medication blocking serotonin receptors (Goodlich et al., 2023), impacts the non-linearity of the firing rate/force relation. Therefore, we are confident that the differences observed within and between pools are linked to different distributions of excitatory/inhibitory inputs and neuromodulation.

      We added a sentence in the discussion to highlight this point (P18; L435):

      ‘Taken together, these results show how ionotropic and neuromodulatory inputs to motoneurons uniquely combine to generate distinct rate coding across the pool, even if a more direct manipulation of the sources of neuromodulatory and ionotropic inputs will be required to directly estimate their interactions.’
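
      To make the shape of a natural logarithm relation concrete, the sketch below fits rate = a * ln(force) + b by ordinary least squares on ln(force). The model form follows the reviewer's description; the data are synthetic, and the parameter values and function names are hypothetical rather than taken from the manuscript.

```python
# Sketch only: least-squares fit of firing rate = a * ln(force) + b.
# Data are synthetic (generated with a = 4, b = 6); names are hypothetical.
import math


def fit_log(force, rate):
    """Return (a, b) minimising the squared error of rate = a * ln(force) + b."""
    x = [math.log(f) for f in force]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(rate) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, rate))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def local_gain(a, force):
    """Slope of the fitted curve at a given force: d(rate)/d(force) = a / force."""
    return a / force


force = [1, 2, 5, 10, 20, 40, 80]              # % MVC
rate = [6 + 4 * math.log(f) for f in force]    # pulses per second

a, b = fit_log(force, rate)
gain_low, gain_high = local_gain(a, 2), local_gain(a, 60)
```

      The slope a / force is large at low force and small at high force, which reproduces the steep initial acceleration followed by a modest increase.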

      (7) It is unclear with the account " ... the bandwidth of muscle force is < 10Hz during isometric contraction" in the manuscript alone, and therefore, it is difficult to understand the following claim. It appears very interesting and crucial for motor unit discharge and force generation and maintenance because it would pose a question of why the discharge rate of most motor units is higher than 10Hz, despite the bandwidth being so limited, but needs to be elaborated.

      We described the slow fluctuations in smoothed firing rates associated with the variations in force observed during isometric contractions. The bandwidth of muscle force is lower than 10Hz due to the contractile properties of muscle tissues (Baldissera et al., 1998, J Physiol). Having an average firing rate higher than this bandwidth enables the pool of motor neurons to effectively transmit the common inputs (the main determinant of muscle force) over this bandwidth without distortion (Farina et al., 2014, J Physiol). Increasing the firing rate beyond the muscle bandwidth also increases the power of the spike train at the direct current frequency (frequency equal to 0) since this power is related to the number of spikes per second. Thus, increasing the firing rate well beyond the muscle bandwidth still has a clear effect on force. To illustrate this point, note that electrical stimuli delivered at 100 Hz can lead to an increase in muscle force.
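
      The statement about the direct current component can be checked with a small numerical sketch: for a binary spike train, the discrete Fourier transform at frequency 0 equals the number of spikes. The sampling rate, the perfectly regular spike trains, and the function names below are hypothetical simplifications.

```python
# Sketch only: the 0 Hz DFT component of a binary spike train equals its spike count.
# Regular spike trains and the 1000 Hz sampling rate are hypothetical simplifications.
import cmath
import math


def dft_component(signal, k):
    """k-th coefficient of the discrete Fourier transform of `signal`."""
    n = len(signal)
    return sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(signal))


def regular_spike_train(rate_hz, duration_s, fs):
    """Binary train with one spike every fs // rate_hz samples (integer arguments)."""
    period = fs // rate_hz
    return [1 if i % period == 0 else 0 for i in range(duration_s * fs)]


fs = 1000
train_10hz = regular_spike_train(10, 1, fs)
train_20hz = regular_spike_train(20, 1, fs)

dc_10hz = abs(dft_component(train_10hz, 0))
dc_20hz = abs(dft_component(train_20hz, 0))
```

      Doubling the firing rate doubles the 0 Hz component, so a higher rate still changes the mean drive even though both rates lie above the stated force bandwidth.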

      Reviewer #2 (Public Review):  

      Summary: 

      The motivation for this study is to provide a comprehensive assessment of motor unit firing rate responses of entire pools during isometric contractions. The authors have used new quantitative methods to extract more unique motor units across contractions than prior studies. This was achieved by recording muscle fibre action potentials from four high-density surface electromyogram (HDsEMG) arrays (Caillet et al., 2023), quantifying residual EMG comparing the recorded and data-based simulation (Figure 1A-B), and developing a metric to compare the spatial identification for each motor unit (Figure 1D-E). From identified motor units, the authors have provided a detailed characterization of recruitment and firing rate responses during slow voluntary isometric contractions in the vastus lateralis and tibialis anterior muscles up to 80% of maximum intensity. In the lower limb, it is interesting how lower threshold motor units have firing rate responses that saturate, whereas higher threshold units that presumably produce higher muscle contractile forces continue to increase their firing rate. In many ways, these results agree with the rate coding of motor units in the extensor digitorum communis muscle (Monster and Chan, 1977). The paper is detailed, and the analyses are well explained. However, there are several points that I think should be addressed to strengthen the paper.

      We thank the reviewer for their positive appreciation of our work.

      General comments: 

      (1) The authors claim they have measured the complete rate coding profiles of motor units in the vastus lateralis and tibialis anterior muscles. However, this study quantified rate coding during slow and prolonged voluntary isometric contractions whereas the function of rate coding during movements (Grimby and Hannerz, 1977) or more complex isometric contractions (Cutsem and Duchateau, 2005; Marshall et al., 2022) remains unexplored. For example, supraspinal inputs may not scale the same way across low and higher threshold motor units, or between muscles (Devanne et al., 1997), making the response of firing rates to increasing isometric contraction force less clear. 

      We agree with the reviewer that rate coding strategies may vary with the velocity and the type of contractions (Duchateau & Enoka, 2008, J Physiol). It is thus likely that the firing rate would increase during the first milliseconds of fast contractions, with the occurrence of doublets (Cutsem and Duchateau, 2005, J Physiol; Del Vecchio et al., 2019, J Physiol), or that motor unit firing rate may be lower during lengthening than shortening contractions (Duchateau & Enoka, J Physiol). 

      However, the decomposition of EMG signals in non-stationary conditions remains challenging, and is still limited to slowly varying patterns of force (Chen et al., 2000; Oliveira & Negro, 2021; Mendez Guerra et al., 2024; Yeung et al., 2024). Future methodological developments will be required to expand our findings to other patterns of force.

      Conceptually, the authors focus on the literature on intrinsic motoneurone properties, but in vivo, other possibilities are that descending supraspinal drive, spinal network dynamics, and afferent inputs have different effects across motor unit sizes, muscles, and types of contractions. Also, the influence from local muscles that act as synergists (e.g., vastii muscles for the vastus lateralis, and peroneal muscles that evert the foot for the tibialis anterior) or antagonists (coactivation during higher contraction intensities would stiffen the joint) may provide differential forms of proprioceptive feedback across motor pools. 

      The reviewer is right that differences in spinal network dynamics and afferent inputs may explain the differences in rate coding observed between the two muscles. Indeed, computational models have shown how the pattern of inhibitory inputs may affect the increase in firing rate during a linear increase in force (Powers & Heckman, 2017, J Neurophysiol; Chardon et al., 2023, Elife). Specifically, the difference observed between proportional inhibitory inputs and a push-pull pattern mirrors the differences observed here between the TA (push-pull like pattern) and the VL (proportional pattern). This difference may reflect the impact of various pathways of inhibition, such as reciprocal inhibition or recurrent inhibition from homonymous motor units or motor units from synergistic muscles. 

      These points have been further discussed in the manuscript (P19; L475):

      ‘The increase in firing rate was also significantly greater for TA motor units than for those in VL. This difference may reflect a varying balance between excitatory/inhibitory synaptic inputs and neuromodulation due to multiple spinal circuits (Heckman and Binder, 1993; Heckman et al., 2008; Johnson et al., 2017; Powers and Heckman, 2017; Chardon et al., 2023; Škarabot et al., 2023). Specifically, the strength of recurrent and reciprocal inhibitory inputs to motoneurons innervating VL and TA, and their proportional or inverse covariation with excitatory inputs, respectively, may explain the differences in rate limiting and maximal firing rates (Heckman and Binder, 1993; Heckman et al., 2008; Johnson et al., 2017; Powers and Heckman, 2017; Chardon et al., 2023; Škarabot et al., 2023). Thus, the motor units from the VL may receive more recurrent inhibition than those of distal muscles, though direct evidence of these differences remains to be found in humans (Windhorst, 1996). Interestingly, similar differences in rate coding were previously observed between proximal and distal muscles of the upper limb (De Luca et al., 1982). However, other muscles that serve different functions within the human body, such as muscles from the face, have different rate coding characteristics with much higher firing rates (Kirk et al., 2021). Future work should investigate those muscles and other to reveal the myriads of rate coding strategies in human muscles.’

      (2) The evidence that the entire motor unit pool was recorded per muscle is not clear. There appears to be substantial residual EMG (Figure 1B), signal cancellation of smaller motor units (lines 172-176), some participants had fewer than 20 identified motor units, and contractions never went above 80% of MVC. Also, to my understanding, there remains no gold-standard in awake humans to estimate the total motor unit number in order to determine if the entire pool was decomposed. 

      The reviewer is right that we did not decode the full pool of motor units. As indicated in the initial version of the manuscript (e.g. title, introduction), we considered that we identified an extensive sample of motor units representative of the dynamics of the pool. This claim was supported by the identification of motor units with recruitment thresholds ranging from 0 to 75% of the maximal force. 

      This statement was in the introduction (P4; L109): ‘We were able to identify up to ~200 unique active motor units per muscle and per participant in two human muscles in vivo, yielding extensive samples of motor units that are representative of the entire motoneuron pools (Caillet et al., 2023a).’

      Furthermore, using four HDsEMG arrays also raises questions about how some channels were placed over non-target muscles, and if motor units were decomposed from surrounding synergists.

      A factor that guided our muscle choice was the low risk of crosstalk. For this, we verified with ultrasound that our arrays of 256 electrodes only covered the muscle of interest, staying away from the neighbouring muscles. This was possible as superficial muscles from the leg are bulkier than those from the upper limb. Given the small diameter of each electrode (2 mm), it is unlikely that the motor units from the neighbouring muscles were in the recorded muscle volume.

      (3) The authors claim (Abstract L51; Discussion L376) that a commonly held view in the field is that rate coding is similar across motor units from the same pool. Perhaps this is in reference to some studies that have carefully assessed lower threshold motor units during lower force ramp contractions (e.g., Fuglevand et al., 2015; Revill and Fuglevand, 2017). However, a more complete integration of the literature exploring motor unit firing rate responses during rapid isometric contractions, comparing different muscles and contraction intensities would be helpful. From Figure 3, the range of rate coding in the tibialis anterior (~7-40 Hz) is greater than the vastus lateralis (~5-22 Hz) muscle across contraction levels. In agreement with other studies, the range of rate coding within some muscles is different than others (Kirk et al., 2021) and during maximal intensity (Bellemare et al., 1983) or rapid contractions (Desmedt and Godaux, 1978). Likewise, within a motor pool, there is a diversity of firing rate responses across motor units of different sizes as a function of isometric force (Monster and Chan, 1977; Desmedt and Godaux, 1977; Kukula and Clamann, 1981; Del Vecchio et al., 2019; Marshall et al., 2022). A strength of this paper is how firing rate responses are quantified across a wide range of motor unit recruitment thresholds and between two muscles. I suggest improving clarity for the general reader, especially in the motivation for testing two lower limb muscles, and elaborating on some of the functional implications.

      We thank the reviewer for their input on this question. We have added references to these works and lines of research in the discussion:

      (P18; L449): ‘In addition, rate coding patterns should also vary with the pattern of contractions, with fast contractions lowering the range of recruitment thresholds within motoneuron pools (Desmedt and Godaux, 1977b, 1979; van Bolhuis et al., 1997). The variability in rate coding observed here between motor units from the same pool could lead to small deviations from the size principle sometimes observed between pairs of units during isometric contractions with various patterns of force (Desmedt and Godaux, 1979; Marshall et al., 2022) or during the derecruitment phase (Bracklein et al., 2022).’

      (P19; L487): ‘However, other muscles that serve different functions within the human body, such as muscles from the face, have different rate coding characteristics with much higher firing rates (Kirk et al., 2021). Future work should investigate those muscles and other to reveal the myriads of rate coding strategies in human muscles.’

      In addition to the responses above, we have added a section at the beginning of the results to motivate the choice of the muscles (P6; L137):

      ‘16 participants performed either isometric dorsiflexion (n = 8) or knee extension tasks (n = 8) while we recorded the EMG activity of the tibialis anterior (TA - dorsiflexion) or the vastus lateralis (VL – knee extension) with four arrays of 64 surface electrodes (256 electrodes per muscle). The motoneuron pools of these two muscles of the lower limb receive a large part of common input (Laine et al., 2015; Negro et al., 2016a), constraining the recruitment of their motor units in a fixed order across tasks. They are therefore good candidates for an accurate description of rate coding. Moreover, we wanted to determine whether differences in rate coding observed between proximal and distal muscles in the upper limb (De Luca et al., 1982) were also present in the lower limb.’.

      Reviewer #3 (Public Review): 

      Summary: 

      This is an interesting manuscript that uses state-of-the-art experimental and simulation approaches to quantify motor unit discharge patterns in the human TA and VL. The non-linear profiles of motor unit discharge were calculated and found to have an initial acceleration phase followed by an attenuation phase. Lower threshold motor units had a larger gain of the initial acceleration whereas the higher threshold motor unit had a higher gain in the attenuation phase. These data represent a technical feat and are important for understanding how humans generate and control voluntary force. 

      Strengths: 

      The authors used rigorous, state-of-the-art analyses to decompose and validate their motor unit data during a wide range of voluntary efforts.

      The analyses are clearly presented, applied, and visualized. 

      The supplemental data provides important transparency. 

      We thank the reviewer for their positive appreciation of our work.

      Weaknesses: 

      The number of participants and muscles tested are quite small - particularly given the constraints on yield. It is unclear if this will translate to other motor pools. The justification for TA and VL should be provided.

      One strength of our study is to provide relations between key-parameters of rate coding (acceleration in firing rate, increase in firing rate, hysteresis) and the recruitment thresholds of motor units within two different pools, and for each individual participant. These relations were consistent across all the participants (Figures 2 to 4), making us confident that increasing the sample size would not change the conclusions of the study.

      It is likely that the differences observed here between the VL and TA will also appear between other muscles of the leg, due to differences in the arrays of excitatory and inhibitory inputs they receive, the pattern of inhibitory inputs during increases in force (recurrent/reciprocal inhibition), and different levels of neuromodulation (Johnson et al., 2017, J Neurophysiol; Beauchamp et al., 2023, J Neural Eng). We have added a paragraph in the results to motivate our choice of muscles (P6; L137):

      ‘16 participants performed either isometric dorsiflexion (n = 8) or knee extension tasks (n = 8) while we recorded the EMG activity of the tibialis anterior (TA - dorsiflexion) or the vastus lateralis (VL – knee extension) with four arrays of 64 surface electrodes (256 electrodes per muscle). The motoneuron pools of these two muscles of the lower limb receive a large part of common input (Laine et al., 2015; Negro et al., 2016a), constraining the recruitment of their motor units in a fixed order across tasks. They are therefore good candidates for an accurate description of rate coding. Moreover, we wanted to determine whether differences in rate coding observed between proximal and distal muscles in the upper limb (De Luca et al., 1982) were also present in the lower limb.’.

      While an impressive effort was made to identify and track motor units across a range of contractions, it appears that a substantial portion of muscle force was not identified. Though high-intensity contractions are challenging to decompose - the authors are commended for their technical ability to record population motor unit discharge times with recruitment thresholds up to 75% of a participant's maximal voluntary contractions. However, previous groups have seen substantial recruitment of motor units above 80% and even 90% maximum activation in the soleus. Given the innervation ratios of higher threshold motor units, if recruitment continued to 100%, the top quartile would likely represent a substantial portion of the traditional fast-fatigable motor units. It would be highly interesting to understand the recruitment and rate coding of the highest threshold motor units. At a minimum, I would suggest using terms other than "entire range" or "full spectrum of recruitment thresholds".

      Motor units were indeed identified between 0 and 80% of the maximal force in this study. This is due to the requirements of the decomposition algorithm, which needs sustained and stable contractions to converge toward a set of separation vectors that generate sparse spike trains. Our participants could not sustain contractions above 80% MVC without developing fatigue.

      However, it is important to note that only a few motor units are recruited above 80% of the maximal force in the TA (Van Cutsem et al., 1998, J Physiol), as well as in other muscles of the lower limb (Oya et al., 2009, J Physiol; Aeles et al., 2020, J Neurophysiol). Thus, we may have only missed a few motor units recruited above 80% of the maximal force. Nevertheless, we removed the terms ‘full spectrum of recruitment thresholds’ and ‘entire range’ from the manuscript to now read ‘most of the spectrum of recruitment thresholds observed in humans.’.

      The quantification of hysteresis using torque appears to make self-evident the observation that lower threshold motor units demonstrate less hysteresis with respect to torque. If there is motor unit discharge there will be force. I believe this limitation goes beyond the floor effects discussed in the manuscript. Traditionally, individuals have used the discharge of a lower threshold unit as the measure on which to apply hysteresis analyses to infer ion channel function in human spinal motoneurons.

      We agree with the reviewer that the hysteresis is classically estimated using the firing rate of a ‘reporter unit’ with the delta F method (introduced in humans by Gorassini et al., 1998) or, more recently, with advances in motor unit identification, using the cumulative spike train of the identified motor units. Researchers use these data as a proxy for the synaptic drive, and compare its values at the recruitment and derecruitment thresholds of the ‘test unit’. 
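
      For readers less familiar with the delta F method, the calculation can be sketched as follows. This is a minimal illustration under stated assumptions: the reporter unit's smoothed firing rate is given as sampled time/rate pairs, the test unit's recruitment and derecruitment times are already known, and the function names are hypothetical. It was not used in our study.

```python
def interpolate(times, values, t):
    """Linearly interpolate `values` (sampled at increasing `times`) at time t."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for i in range(1, len(times)):
        if t <= times[i]:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def delta_f(reporter_times, reporter_rates, t_recruit, t_derecruit):
    """Reporter-unit firing rate at test-unit recruitment minus its rate
    at test-unit derecruitment (both in Hz)."""
    rate_at_recruitment = interpolate(reporter_times, reporter_rates, t_recruit)
    rate_at_derecruitment = interpolate(reporter_times, reporter_rates, t_derecruit)
    return rate_at_recruitment - rate_at_derecruitment

# Made-up example: a triangular reporter firing rate profile (5 -> 15 -> 5 Hz over 20 s),
# with a test unit recruited at 4 s and derecruited at 18 s.
times = [0.0, 10.0, 20.0]
rates = [5.0, 15.0, 5.0]
print(delta_f(times, rates, t_recruit=4.0, t_derecruit=18.0))
```

      A positive value means that the test unit kept firing at a lower level of estimated synaptic drive than was needed to recruit it, which is the signature of hysteresis this method targets. The numbers in the example are invented for illustration only.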

      As mentioned above in response to reviewer 1, this approach was not possible in our study as we did not have the same units across contractions to estimate cumulative spike trains. It was therefore not possible to pool the data across contractions as we did here to generate force/firing rate relations on the widest range of force. This limitation is now highlighted in the discussion section (P19; L470): ‘This result must be confirmed with a more direct proxy of the net synaptic drive, such as the firing rate of a reference low-threshold motor neuron used in the delta F method (Gorassini et al., 1998), or the cumulative spike train of low-threshold motor neurons (Afsharipour et al., 2020).’.

      The main findings are not entirely novel. See Monster and Chan 1977 and Kanosue et al 1979. 

      We agree with the reviewer that the results of the paper are remarkably aligned with previous experimental findings in humans, in animals, or with in vitro and in silico models. However, we believe that our study shows in humans the incredible variety of rate coding patterns within a pool of motor units that span most of the spectrum of recruitment thresholds observed in humans. It also highlights the variability of rate coding patterns between motor neurons that have a similar recruitment threshold. Finally, we observe differences between pools of motor neurons innervating two different muscles in the lower limb, mirroring what has been done in the past in upper limb muscles. 

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors): 

      The wording 'decode' across the manuscript may sound somewhat unsuitable for the context, because 'decode' would involve interpreting the signals and activities to understand how they relate to specific variables or proxies of behavior. Here in this study it does not necessarily involve the interpretation, but sounds to be used for decomposing the signal into the constituent motor units. As such, it might be appropriate to use other words such as decompose, read out, or extract.

      ‘Decode’ was removed from the manuscript to now read motor unit ‘identification’.

      Reviewer #2 (Recommendations For The Authors): 

      Figures 1 and 2 are informative and interesting. Figures 3 and 4 are harder to interpret. For example, in Figure 4, data plotted along the diagonal is overplotted and not as informative.

      For the sake of clarity, we separated the lines of the fits and the scatter plots in the right panels in Figure 3. In Figure 4, we removed the scatter plots and only reported the lines of the fits for each participant. 

      Do you think the different durations of the isometric plateau across contraction intensities influenced motor unit derecruitment? Longer duration in lower threshold motor units would have resulted in a larger effect of PICs?

      We did not find an effect of the duration of the plateau on the derecruitment threshold. Notably, a computational study found that the duration of the plateau may impact the delta F, due to the combination of PICs, spike threshold accommodation and spike frequency adaptation (Revill & Fuglevand, 2011, J Neurophysiol). However, we did not use the delta F value here to estimate the effect of PICs on the hysteresis. 

      L703. For the measure of firing rate hysteresis the difference between recruitment and derecruitment was calculated, but why not use the delta-F method? This is more commonly used to assess hysteresis as a rough estimate of intrinsic dynamics.

      As further discussed above, this approach was not possible in our study as we did not have the same units across contractions to estimate cumulative spike trains. It was therefore not possible to pool the data across contractions as we did here to generate force/firing rate relations on the widest range of force.

      This was mentioned in the discussion (P19; L470):

      ‘This result must be confirmed with a more direct proxy of the net synaptic drive, such as the firing rate of a reference low-threshold motor neuron used in the delta F method (Gorassini et al., 1998), or the cumulative spike train of low-threshold motor neurons (Afsharipour et al., 2020).’

      L144. The standard deviation seems high. Some participants had fewer than 20 motor units and your number of participants per muscle was eight, could you state the complete range?

      A table was added in the results section to indicate the yields of the decomposition per contraction.

      If other studies are able to randomly sample motor units with intramuscular electrodes does this also represent an estimate of rate coding from the 'entire' pool? One criticism of HDsEMG arrays is that they are biased towards decomposing superficial larger motor units and in the male sex. 

      The decomposition of EMG signals recorded with arrays of surface electrodes is indeed biased toward the identification of motor units with the larger action potentials in the signal (large and superficial; Farina & Holobar, 2016, Proceedings of the IEEE). We took advantage of this limitation by performing successive contractions at different levels of force with the objective of identifying the last recruited motor units (larger units according to the size principle), while tracking the smaller ones. In that way, we were able to sequentially identify motor units recruited from 0% to 75% of the maximal force. A similar approach could be applied to selective intramuscular electrodes. However, because identifying motor units up to maximal force requires a highly selective pair of fine wires or needle electrodes, the procedure described above would have to be repeated hundreds of times to reach the same samples as those obtained in our study.

      L151-161. The ratio between simulated and decomposed surface EMG reached 55% for the TA and 70% for the VL. How does this provide support that the "entire" MU pool was sampled?

      As said above, we do not identify all the motor units during each contraction, but rather the larger ones with the larger action potentials within the EMG signals. However, we used here a sequential approach to identify new motor units during each trial while tracking smaller units. In that way, we were able to sequentially identify on average 130 motor units per muscle.

      To avoid any confusion, we removed the references to ‘entire’ pools in the manuscript.  

      L266. How is it possible that in some participants no motor units were recruited below 5% of MVC? Do the authors suspect they produced force from synergist muscles or that the decomposition failed to identify these presumably smaller and deeper motor units?

      This mostly results from the limitations of the decomposition algorithm. In these participants, it is likely that the decomposition was biased toward motor units only active during the plateau of force or recruited at the end of the ramp.

      Figure 2B. Do the higher threshold motor units with linear responses receive more inhibitory input (coactivation) or are devoid of large PIC effects?

      Were antagonist muscles recorded? During higher contraction intensities, greater antagonist coactivation in some trials or participants may have linearized the firing rate profiles (e.g., Revill and Fuglevand, 2017).

      L427. This is a neat finding that higher threshold motor units are less likely to have the functional hallmark of a strong PIC effect and may therefore be more representative of extrinsic inputs. Could this be an advantage to increase the precision of stronger contractions or reduce the fatigability of muscle fibres during repeated strong contractions?

      Synaptic contacts with Renshaw cells (Fyffe, 1991, J Neurophysiol) and Ia inhibitory interneurons (Heckman & Binder, 1991, J Neurophysiol) are widespread within pools of motor units, which induces homogeneously distributed inhibitory inputs. However, the amplitude of these inhibitory inputs can increase with muscle force. We found that the EMG amplitude of the soleus and the gastrocnemius medialis recorded with bipolar EMG during dorsiflexion increased with the force. Therefore, the higher inhibition at higher force may also contribute to the linearisation of the force/firing rate relations observed with high-threshold motor neurons, as suggested by Revill and Fuglevand (2017, J Physiol). 

      We discussed this point in the new version of the manuscript (P17; L415):

      ‘The level of recurrent and reciprocal inhibition has also probably increased with the increase in force during the ramp up, progressively blunting the effect of persistent inward currents for late-recruited motor units (Kuo et al., 2003; Hyngstrom et al., 2007; Revill and Fuglevand, 2017). This may also explain the larger percentage of high-threshold motor units with a linear fit for the firing rate/force relation (Figure 2), as the integration of larger inhibitory inputs should linearise the firing rate/force relation (Revill and Fuglevand, 2017).’. 

      In Figure 2B, it makes sense that linear firing rate responses occur later in the ramp contraction when myotendinous slack is lower. Do the authors think contractile dynamics are matched to the firing rate profiles?

      To our knowledge, there are no direct data on the link between the linearity of the force/firing rate relation and the stiffness of the tendon. Recent work by Mazzo et al. (2021, J Physiol) has shown that repeated stretches of calf muscles, which induce a decrease in their stiffness, induced an increase in motor unit firing rate at low levels of force. This indicates that the contractile properties of the muscle may also impact the profile of rate coding when considered as a function of force. 

      We added this point in the discussion (P20; L512):

      ‘On a different note, the steep increase in firing rate over the first percentages of the ramp-up may also enable the motor units to produce the required level of force despite having a more compliant muscle-tendon unit (Mazzo et al., 2021).’

      L371. It is likely that Marshall et al., 2022, recorded over 100 unique motor units from the same animal.

      The reviewer is right that Marshall et al. may have identified hundreds of motor units across sessions in one non-human primate. However, there is no way to verify this statement as they used fine wire electrodes inserted in different locations in each session, which made it impossible to verify the uniqueness of each identified unit. Conversely, we verified in our study that all the motor units were unique using the distribution of their surface action potentials across the 256 surface electrodes.

      L378. What do the authors mean by "rate coding is similar"? I find this statement confusing. Is this regarding the absolute firing rate range, response to force increases, hysteresis, or how they scale with contraction intensity?

      This statement was removed from the discussion to avoid any confusion.

      Reviewer #3 (Recommendations For The Authors): 

      The authors may want to consider other mechanisms of the linearization of discharge rates of medium and high threshold motor units. Monica's work may suggest that, over time, there is a subthreshold activation of the PIC, which serves to linearize the eventual suprathreshold activation underlying repetitive discharge. Additionally, Andy has shown that inhibitory drive from cutaneous inputs can linearize the initial acceleration of low threshold motor units - cutaneous inputs, or even Ib inputs, may be greater later in the contraction and serve to linearize discharge rates. 

      We thank the reviewer for their input on the discussion, where we now discuss this point:

      ‘The level of recurrent and reciprocal inhibition has also probably increased with the increase in force during the ramp up, progressively blunting the effect of persistent inward currents for late-recruited motor units (Kuo et al., 2003; Hyngstrom et al., 2007; Revill and Fuglevand, 2017). This may also explain the larger percentage of high-threshold motor units with a linear fit for the firing rate/force relation (Figure 2), as the integration of larger inhibitory inputs should linearise the firing rate/force relation (Revill and Fuglevand, 2017).’. 

      Lines 433 - intrinsic properties, in particular the afterhyperpolarization, will likely influence maximal discharge rate and provide a ceiling to the change in firing rate.

      This point is now discussed in the draft (P17; L428):

      ‘This difference may be explained by smaller excitatory synaptic inputs onto low- than high-threshold motoneurons (Powers and Binder, 2001; Heckman and Enoka, 2012), lower synaptic driving potential of the dendritic membrane (Powers and Binder, 2000; Cushing et al., 2005; Fuglevand et al., 2015), and longer and larger afterhyperpolarisation phase in low- than high-threshold motoneurons (Bakels and Kernell, 1993; Gardiner, 1993; Deardorff et al., 2013; Caillet et al., 2022).’

      The actual yield per contraction is not entirely clear. Figure S2 is quite nice in this regard, but a table with this and other information on it may be helpful. This would help with the beginning of the abstract and discussion when it is stated that, on average over 100 motor units were identified per person. 

      We added a table in the results to give the number of motor units identified per contraction.

      Are the thin film units represented in S2 and S3?

      Only motor units identified from signals recorded with arrays of surface electrodes are presented in figures S2 and S3.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Otero-Coronel et al. address an important question for neuroscience - how does a premotor neuron capable of directly controlling behavior integrate multiple sources of sensory inputs to inform action selection? For this, they focused on the teleost Mauthner cell, long known to be at the core of a fast escape circuit. What is particularly interesting in this work is the naturalistic approach they took. Classically, the M-cell was characterized, both behaviorally and physiologically, using a unimodal sensory space. Here the authors make the effort (substantial!) to study the physiology of the M-cell taking into account both the visual and auditory inputs. They performed well-informed electrophysiological approaches to decipher how the M-cell integrates the information of two sensory modalities depending on the strength and temporal relation between them.

      Strengths:

      The empirical results are convincing and well-supported. The manuscript is well-written and organized. The experimental approaches and the selection of stimulus parameters are clear and informed by the bibliography. The major finding is that multisensory integration increases the certainty of environmental information in an inherently noisy environment.

      Weaknesses:

      Even though the manuscript and figures are well organized, I found myself struggling to understand key points of the figures.

      For example, in Figure 1 it is not clear what the Tonic and Phasic components actually are. The figure would benefit from more details on this matter. Then, in Figure 4, labels for the traces in panel A are needed, since I was not able to pick up that they were coming from different sensory pathways.

      We added an inset to Figure 1 showing how the tonic and phasic components are measured. We now use solid colors instead of transparencies, and the color scheme was modified for consistency. We added labels to the traces used as examples in Figure 4 panel A.

      In line 338 it should be optic tectum and not "optical tectum".

      We replaced two instances of the term “optical tectum” with “optic tectum”.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Otero-Coronel and colleagues use a combination of acoustic stimuli and electrical stimulation of the tectum to study MSI in the M-cells of adult goldfish. They first perform a necessary piece of groundwork in calibrating tectal stimulation for maximal M-cell MSI, and then characterize this MSI with slightly varying tectal and acoustic inputs. Next, they quantify the magnitude and timing of FFI that each type of input has on the M-cell, finding that both the tectum and the auditory system drive FFI, but that FFI decays more slowly for auditory signals. These are novel results that would be of interest to a broader sensory neuroscience community. By then providing pairs of stimuli separated by 50ms, they assess the ability of the first stimulus to suppress responses to the second, finding that acoustic stimuli strongly suppress subsequent acoustic responses in the M-cell, that they weakly suppress subsequent tectal stimulation, and that tectal stimulation does not appreciably inhibit subsequent stimuli of either type. Finally, they show that M-cell physiology mirrors previously reported behavioural data in which stronger stimuli underwent less integration.

      The manuscript is generally well-written and clear. The discussion of results is appropriately broad and open-ended. It's a good document. Our major concerns regarding the study's validity are captured in the individual comments below. In terms of impact, the most compelling new observation is the quantification of the FFI from the two sources and the logical extension of these FFI dynamics to M-cell physiology during MSI. It is also nice, but unsurprising, to see that the relationship between stimulus strength and MSI is similar for M-cell physiology to what has previously been shown for behavior. While we find the results interesting, we think that they will be of greatest interest to those specifically interested in M-cell physiology and function.

      Strengths:

      The methods applied are challenging and appropriate and appear to be well executed. Open questions about the physiological underpinnings of M-cell function are addressed using sound experimental design and methodology, and convincing results are provided that advance our understanding of how two streams of sensory information can interact to control behavior.

      Weaknesses:

      Our concerns about the manuscript are captured in the following specific comments, which we hope will provide a useful perspective for readers and actionable suggestions for the authors.

      Comment 1 (Minor):

      Line 124. Direct stimulation of the tectum to drive M-cell-projecting tectal neurons not only bypasses the retina, it also bypasses intra-tectal processing and inputs to the tectum from other sources (notably the thalamus). This is not an issue with the interpretation of the results, but this description gives the (false) impression that bypassing the retina is sufficient to prevent adaptation. Adding a sentence or two to accurately reflect the complexity of the upstream circuitry (beyond the retina) would be welcome.

      The reviewer is right that direct tectal stimulation bypasses all upstream neural processing, not only that produced in the retina, and that the tectum does not exclusively process visual information. The revised version now acknowledges the complexity of the system (lines 245-252, revised manuscript).

      Comment 2 (Major): The premise is that stimulation of the tectum is a proxy for a visual stimulus, but the tectum also carries the auditory, lateral line, and vestibular information. This seems like a confound in the interpretation of this preparation as a simple audio-visual paradigm. Minimally, this confound should be noted and addressed. The first heading of the Results should not refer to "visual tectal stimuli".

      We changed the heading of the corresponding section of the Results as requested and also omitted the term “optic” when we did not specifically refer to tectal circuits that process optic information.

      Comment 3 (Major): Figure 1 and associated text.

      It is unclear and not mentioned in the Methods section how phasic and tonic responses were calculated. It is clear from the example traces that there is a change in tonic responses and the accumulation of subthreshold responses. Depending on how tonic responses were calculated, perhaps the authors could overlay a low-passed filtered trace and/or show calculations based on the filtered trace at each tectal train duration.

      The revised version of the manuscript now includes a description of how the phasic and tonic components were calculated (lines 163-172). We also modified the color scheme and the inset of Figure 1A to clarify how these two components were defined. Because we quantified the response in a 12 ms window, we did not include an overlaid low-pass filtered trace, as it might cause confusion about the metric used.

      Comment 4 (Minor): Figure 3 and associated text.

      This is a lovely experiment. Although it is not written in text, it provides logic for the next experiment in choosing a 50ms time interval. It would be great if the authors calculated the first timepoint at which the percentage of shunting inhibition is not significantly different from zero. This would provide a convincing basis for picking 50ms for the next experiment. That said, I suspect that this time point would be earlier than 50 ms. This may explain and add further complexity to why the authors found mostly linear or sublinear integration, and perhaps the basis for future experiments to test different stimulus time intervals. Please move calculations to Methods.

      We moved calculations to the Methods section (lines 201-208). We mention the rationale for selecting the 50 ms interval in the next experiment (Figure 4, lines 369-371) and discuss in detail the potential contribution of FFI to the complexity of the integration taking place in the M-cell circuit (Discussion, lines 512-535).

      Comment 5 (Major): Figure 4C and lines 398-410.

      These are beautiful examples of M-cell firing, but the text suggests that they occurred rarely and nowhere close to significantly above events observed from single modalities. We do not see this as a valid result to report because there is insufficient evidence that the phenomenon shown is consistent or representative of your data.

      Our experimental conditions required anesthesia and paralysis, conditions designed to reduce neuronal firing and suppress motor output. We think it is valuable to report that we still see that simultaneous presentation of subthreshold unisensory stimuli can add up to become suprathreshold, paralleling behavioral observations. We do not claim that those examples are representative of our recording conditions, but they are likely to be more representative of the multisensory integration process taking place in freely moving fish. The revised manuscript adds context to these example traces to justify their inclusion (lines 420-426).

      Reviewer #2 (Recommendations For The Authors):

      Methods

      The Methods section on "Auditory stimuli" contains a long background on the biophysics of the M-cell and its inputs. This does not belong in Methods. The same is true, to a lesser degree, in the next heading. The argument that direct stimulation of the tectum is necessary to bypass adaptation should be in Results, not Methods.

      Following the reviewer’s recommendation, we have moved both paragraphs to the Results section.

      Figure 1 and associated text.

      Visually, the use of transparency to differentiate phasic and tonic calculations is difficult to read. Example traces are also cut off at the top and bottom at random sizes.

      We changed the color scheme to avoid the use of transparency and modified the inset of Figure 1A to clarify how the phasic and tonic components were calculated. We also modified the dimensions of the clipping mask used to trim the stimulation artifacts of sample traces to make them more similar while still enabling clear observation of the phasic and tonic components of the response.

      Line 338 "optical tectum" is not correct. "optic tectum" is more common, or better still, just "tectum".

      We apologize for the error. The two instances of “optical tectum” were replaced by the correct term (“optic tectum”).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Comments:

      (1) We find it interesting that the reshaped model showed decreased firing rates of the projection neurons. We note that maximizing the entropy <-ln p(x)> with a regularizing term -\lambda <\sum _i f(x_i)>, which reflects the mean firing rate, results in \lambda _i = \lambda for all i in the Boltzmann distribution. In other words, in addition to the homeostatic effect of synaptic normalization which is shown in Figures 3B-D, setting all \lambda_i = 1 itself might have a homeostatic effect on the firing rates. It would be better if the contribution of these two homeostatic effects be separated. One suggestion is to verify the homeostatic effect of synaptic normalization by changing the value of \lambda.
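
      A sketch of the argument in this comment, under assumed notation: let $f_i(x)$ denote the output of projection $i$ for population pattern $x$ (the comment writes $f(x_i)$), and let $\mu$ be a Lagrange multiplier enforcing normalization. Neither symbol choice is taken from the manuscript, and the sign convention of the exponent is an assumption. Maximizing the penalized entropy

      $$ \mathcal{L}[p] = -\sum_x p(x)\ln p(x) \;-\; \lambda \sum_x p(x) \sum_i f_i(x) \;-\; \mu\Big(\sum_x p(x) - 1\Big) $$

      and setting the derivative with respect to each $p(x)$ to zero gives

      $$ -\ln p(x) - 1 - \lambda \sum_i f_i(x) - \mu = 0 \quad\Longrightarrow\quad p(x) = \frac{1}{Z}\exp\Big(-\lambda \sum_i f_i(x)\Big), \qquad Z = \sum_x \exp\Big(-\lambda \sum_i f_i(x)\Big). $$

      Compared with the general form $p(x) = \frac{1}{Z}\exp\big(-\sum_i \lambda_i f_i(x)\big)$, this corresponds to $\lambda_i = \lambda$ for all $i$, so a single shared coefficient acts as a uniform penalty on the mean activity of the projections.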

      This is an interesting question and we, therefore, explored the effects of different values of $\lambda$ on the performance of unconstrained reshaped RP models and their firing rates. The new supp. Figure 2B presents the results of this exploration: We found that for models with a small set of projections, a high value of $\lambda$ yields better performance than a low value, while for models with a large set of projections we find the opposite relation. The mean firing rates of the projection neurons for models with different values of $\lambda$ show a clear trend, where higher $\lambda$ values result in lower mean firing rates.

      Thus, these results suggest an interplay between the optimal size of the projection set and the value of $\lambda$ one should pick. For the population sizes and projection sets we have used here, $\lambda=1$ is a good choice, but, for different population sizes or data sets a different value of $\lambda$ might be better.

      In addition to supp. Figure 2B, we added the following to the main text:

      “An additional set of parameters that might affect the Reshaped RP models are the coefficients $\lambda$, that weigh each of the projections. Above, we used $\lambda=1$ for all projections, here we investigated the effect of the value of $\lambda$ on the performance of the Reshaped RP models (supp. Figure 2B). We find that for models with a small projection set, high $\lambda$ values result in better performance than models with low values. We find an opposite relation for models with large projection sets. (We submit that the performance decrease of Reshaped RP models with high value of $\lambda$, as the number of projections grows, is a reflection of the non-convex nature of the Reshaped RP optimization problem).

      The mean firing rates of the projection neurons for models with different values of $\lambda$ show a clear trend: higher $\lambda$ values result in lower mean firing rates. Thus, we conclude that there is an interplay between the number of projections and the value of $\lambda$ we should pick. For the sizes of projection sets we have used here, $\lambda=1$ is a good choice, but, we note that in general, one should probably seek the appropriate value of $\lambda$ for different population sizes or data sets.”

      In addition, we explored the effect of synaptic normalization on models with different values of $\lambda$ (supp. Figure 3). We found that homeostatic Reshaped RP models are superior to the non-homeostatic Reshaped RP models: For low values of $\lambda$, the homeostatic and Reshaped RP models show similar performance in terms of log-likelihood, whereas the homeostatic models are more efficient. For high values of $\lambda_i$ homeostatic models are not only more efficient but also show better performance. These results indicate that the benefit of the homeostatic model is insensitive to the specific choice of $\lambda$.

      In addition to supp. Figure 3, we added the following to the main text:

      “Exploring the effect of synaptic normalization on models with different values of $\lambda$ (supp. Figure 3), we find that homeostatic Reshaped RP models are superior to the non-homeostatic Reshaped RP models: For low values of $\lambda$, the homeostatic and Reshaped RP models show similar performance in terms of log-likelihood, whereas the homeostatic models are more efficient. Importantly, for high values of $\lambda_i$ homeostatic models are not only more efficient but also show better performance. We conclude that the benefit of the homeostatic model is insensitive to the specific choice of $\lambda$.”

      (2) As far as we understand, \theta_i (thresholds of the neurons) are fixed to 1 in the article. Optimizing the neural threshold as well as synaptic weights is a natural procedure (both biologically and engineeringly), and can easily be computed by a similar expression to that of a_ij (equation 3). Do the results still hold when changing \theta _i is allowed as well? For example,

      a. If \theta _i becomes larger, the mean firing rates will decrease. Does the backprop model still have higher firing rates than the reshaped model when \theta _i are also optimized?

      b. Changing \theta _i affects the dynamic range of the projection neurons, thus could modify the effect of synaptic constraints. In particular, does it affect the performance of the bounded model (relative to the homeostatic input models)?

      We followed the referee’s suggestion and extended our analysis by adding threshold optimization to the Reshape and Backpropagation models, which is now shown in supp. Figure 2A. Comparing the performance and properties of these models to ones with fixed thresholds, we found that this addition had a small effect on the performance of the models in terms of their likelihood (supp. Figure 2A). We further find that backpropagation models with tuned thresholds show lower firing rates compared to backpropagation models with fixed threshold, while reshaped RP models with optimized thresholds show higher firing rates compared to models with fixed threshold. These differences are, again, rather small, and both versions of the reshaped RP models show lower firing rates compared to both versions of the backpropagation models.

      In addition to supp. Figure 2A, we added the following to the main text:

      “The projections' threshold $\theta_i$, which is analogous to the spiking threshold of the projection neurons, strongly affects the projections' firing rates. We asked how, in addition to reshaping the coefficients of each projection, we can also change $\theta_i$ to optimize the reshaped RP and backpropagation models.

      We find that this addition has a small effect on the performance of the models in terms of their likelihood (supp. Figure 2A).

      We also find that this has a small effect on the firing rates of the projection neurons: backpropagation models with tuned thresholds show lower firing rates compared to backpropagation models with fixed threshold, whereas reshaped RP models with optimized thresholds show higher firing rates compared to models with fixed threshold. Yet, both versions of the reshaped RP models show lower firing rates compared to both versions of the backpropagation models. Given the small effect of tuning threshold on models' performance and their internal properties, we will, henceforth, focus on Reshaped RP models with fixed thresholds.”

      (3) In Figure 1, the authors claim that the reshaped RP model outperforms the RP model. This improved performance might be partly because the reshaped RP model has more parameters to be optimized than the RP model. Indeed, let the number of projections N and the in-degree of the projections K, then the RP model and the reshaped RP model have N and KN parameters, respectively. Does the reshaped model still outperform the original one when only (randomly chosen) N weights (out of a_ij) are allowed to be optimized and the rest is fixed? (or, does it still outperform the original model with the same number of optimized parameters (i.e. N/K neurons)?)

      Indeed, the number of tuned parameters in the reshaped RP model is much larger compared to the number of tuned parameters in an RP model with the same projection set size. Yet, we submit that the larger number of tuned parameters is not the reason for the improved performance of the reshaped RP model: Maoz et al. [30] have already shown that by optimizing an RP model with a small projection set using the pruning and replacement of projections (P&R), one can reach high accuracy with almost an order of magnitude fewer projections. Thus, we argue that the improved performance stems from the properties of the projections in the model.

      Accordingly, we added supp. Figure 2B that shows the performance of P&R sigmoid RP model compared to RP and reshaped RP models. We added the following to the main text:

      “Because reshaping may change all the existing synapses of each projection, the number of parameters is the number of projections times the projections in-degree. While this is much larger than the number of parameters that we learn for the RP model (one for each projection), we suggest that the performance of the reshaped models is not a naive result of having more parameters. In particular, we have seen that RP models that use a small set of projections can be very accurate when the projections are optimized using the pruning and replacement process [30] (see also supp. Figure 1B). Thus, it is really the nature of the projections that shapes the performance. Indeed, our results here show that a small fixed connectivity projection set with weight tuning is enough for accurate performance which is on par or better than an RP model with more projections.”

      (4) In Figure 2, the authors have demonstrated that the homeostatic synaptic normalization outperforms the bounded model when the allowed synaptic cost is small. One possible hypothesis for explaining this fact is that the optimal solution lies in the region where only a small number of |a_ij| is large and the rest is near 0. If it is possible to verify this idea by, for example, exhibiting the distribution of a_ij after optimization, it would help the readers to better understand the mechanism behind the superiority of the homeostatic input model.

      We modified supp. Figure 4 and made the following change in the relevant part in the main text to address the reviewer comment about the distribution of the $a_{ij}$ values:

      “Figure 5E shows the mean rotation angle over 100 homeostatic models as a function of synaptic cost -- reflecting that the different forms of homeostatic regulation result in different reshaped projections. We show in Supp. Figure 4C the histogram of the rotation angles of several different homeostatic models, as well as the unconstrained Reshape model.

      Analyzing the distribution of the synaptic weights $a_{ij}$ after learning leads to a similar conclusion (supp. Figure 4D): The peak of the histograms is at $a_{ij} = 0$, implying that during reshaping most synapses are effectively pruned. While the distribution is broader for models with higher synaptic budget, it is asymmetric, showing local maxima at different values of $a_{ij}$.

      The diversity of solutions that the different model classes and parameters show implies a form of redundancy in model choice or learning procedure. This reflects a multiplicity of ways to learn or optimize such networks that biology could use to shape or tune neural population codes.”

      (5) In Figures 5D and 5E, the authors present how different reshaping constraints result in different learning processes ("rotation"). We find these results quite intriguing, but it would help the readers understand them if there is more explanation or interpretation. For example,

      a. In the "Reshape - Hom. circuit 4.0" plot (Fig 5D, upper-left), the rotation angle between the two models is almost always the same. This is reasonable since the Homeostatic Circuit model is the least constrained model and could be almost irrelevant to the optimization process. Is there any similar interpretation to the other 3 plots of Figure 5D?

      We added a short discussion of this difference to the main text, but do not have a geometric or other intuitive explanation for the nature of these differences.

      b. In Figure 5E, is there any intuitive explanation for why the three models take minimum rotation angle at similar global synaptic cost (~0.3)?

      We added a discussion of this issue to the main text, and the histogram of the rotation angles in Supp. Figure 4C shows that they are not identical. However, we do not have an intuitive explanation for why the mean values are so similar.

      Recommendations for the authors:

      (1) Some claims on the effect of synaptic normalization on the reshaped model sound a little overstated since the presented evidence does not clearly show the improvement of the computational performance (in comparison to the vanilla reshaped model) in terms of maximizing the likelihood of the inputs. Here are some examples of such claims: "Incorporating more biological features and utilizing synaptic normalization in the learning process, results in even more efficient and accurate models." (in Abstract), "Thus, our new scalable, efficient, and highly accurate population code models are not only biologically-plausible but are actually optimized due to their biological features." (in Abstract), or "in our Reshaped RP models, homeostatic plasticity optimizes the performance of network models" (in Discussion).

      We changed the wording according to the reviewers’ suggestions.

      (2) In equation (1) and the following sentence, \theta _j (threshold) should be \theta _i.

      Fixed

      (3) While the authors mention that "reshaping with normalization or without it drives the projection neurons to converge to similar average firing rate values (Figure 3B)", they also claim that "reshaping with normalization implies lower firing rates as well as... (Figure 3E)". These two claims look a little inconsistent to us. Besides, it is not very clear from Figure 3E that the normalization decreases the firing rate (it is clear from Figure 3B, though). How about just deleting "lower firing rates as well as"?

      We changed the wording according to the reviewers’ suggestion.

      (4) The captions of Figures 4D and 4E should be exchanged.

      Fixed

      (5) Typo in Figure 4F: "normalized in-dgreree".

      Fixed

      (6) In Figure 5D (upper left plot) the choice of "Reshape" and "Bounded3.0" looks a bit weird. Is this a typo for "Hom. circuit 4.0"?

      There is no typo in the figure labels. We discussed the results of figure 5D in our response to point (5) in the public comments list and addressed the upper left panel of figure 5D in the main text.

      (7) In the paper, the letter \theta represents (1) the threshold of the projection neurons (eq. 1), (2) the "ceiling" value of the bounded model, and (3) the rotation angle of projections (Figure 5). We find this notation a bit confusing and recommend using different notations for different entities.

      Thanks for the suggestion, we changed the confusing notations: (1) The threshold of each projection neuron is still $\theta$, following the notation of the original RP model formulation [30]. (2) The notation of the “ceiling” value of the bounded model is now $\omega$. (3) The rotation angle of the projections during reshape is now marked by $\alpha$.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank you for the time you took to review our work and for your feedback! The main changes to the manuscript are: 

      (1) We have added additional analysis of running onsets in closed and open loop conditions for audiomotor (Figure 2H) and visuomotor (Figure 3H) coupling.  

      (2) We have also added analysis of running speed and pupil dilation upon mismatch presentation (Figures S2A and S2B, S4A and S4B, and S5A and S5B).

      (3) We have expanded on the discussion of the nature of differences between audiomotor and visuomotor mismatches.

      Reviewer #1:

      The manuscript presents a short report investigating mismatch responses in the auditory cortex, following previous studies focused on the visual cortex. By correlating the mouse locomotion speed with acoustic feedback levels, the authors demonstrate excitatory responses in a subset of neurons to halts in expected acoustic feedback. They show a lack of responses to mismatch in the visual modality. A subset of neurons show enhanced mismatch responses when both auditory and visual modalities are coupled to the animal's locomotion. 

      While the study is well-designed and addresses a timely question, several concerns exist regarding the quantification of animal behavior, potential alternative explanations for recorded signals, correlation between excitatory responses and animal velocity, discrepancies in reported values, and clarity regarding the identity of certain neurons. 

      Strengths: 

      (1) Well-designed study addressing a timely question in the field. 

      (2) Successful transition from previous work focused on the visual cortex to the auditory cortex, demonstrating generic principles in mismatch responses. 

      (3) The correlation between mouse locomotion speed and acoustic feedback levels provides evidence for a prediction signal in the auditory cortex. 

      (4) Coupling of visual and auditory feedback shows putative multimodal integration in the auditory cortex. 

      Weaknesses: 

      (1) Lack of quantification of animal behavior upon mismatches, potentially leading to alternative interpretations of recorded signals. 

      (2) Unclear correlation between excitatory responses and animal velocity during halts, particularly in closed-loop versus playback conditions. 

      (3) Discrepancies in reported values in a few figure panels raise questions about data consistency and interpretation. 

      (4) Ambiguity regarding the identity of the [AM+VM] MM neurons. 

      The manuscript is a short report following up on a series of papers focusing on mismatch responses between sensory inputs and predicted signals. While previous studies focused on the visual modality, here the authors moved to the auditory modality. By pairing mouse locomotion speed to the sound level of the acoustic feedback, they show that a subpopulation of neurons displays excitatory responses to halts in the (expected) acoustic feedback. These responses were lower in the open-loop state, when the feedback was uncorrelated to the animal locomotion. 

      Overall it is a well-designed study, with a timely and well-posed question. I have several concerns regarding the nature of the MM responses and their interpretations. 

      - One lacks quantification of the animal behavior upon mismatches. Behavioral responses may trigger responses in the mouse auditory cortex, and this would be an alternative explanation to the recorded signals. 

      What is the animal speed following closed-loop halts (we only have these data for the playback condition)? 

      We have quantified the running speed of the mouse following audiomotor and visuomotor mismatches. We found no evidence of a change in running speed. We have added this to Figures S2A and S4A, respectively.

      Is there any pupillometry to quantify possible changes in internal states upon halts (both closed-loop and playback)?

      The term 'internal state' may be somewhat ambiguous in this context. We assume the reviewer is asking whether we have any evidence for possible neuromodulatory changes. We know that there are noradrenergic responses in visual cortex to visuomotor mismatches (Jordan and Keller, 2023), but no cholinergic responses (Yogesh and Keller, 2023). Pupillometry, however, is likely not always sensitive enough to pick up these responses. With very strong neuromodulatory responses (e.g. to air puffs, or other startling stimuli), pupil dilation is of course detected, but this effect is likely at best threshold linear. Looking at changes in pupil size following audiomotor and visuomotor mismatch responses, we found no evidence of a change. We have added this to Figures S2B and S4B, respectively. Note, we suspect this is also strongly experience-dependent. The first audio- or visuomotor mismatch the mouse encounters is likely a more salient stimulus (to the rest of the brain, not necessarily to auditory or visual cortex), than the following ones.  

      These quantifications must be provided for the auditory mismatches but also for the VM or [AM+VM] mismatches.  

      During the presentation of multimodal mismatches [AM + VM], mice did not exhibit significant changes in running speed or pupil diameter. These data have now been added to Figures S5A and S5B.

      - AM MM neurons supposedly receive a (excitatory) locomotion-driven prediction signal. Therefore the magnitude of the excitation should depend on the actual animal velocity. Does the halt-evoked response in a closed loop correlate with the animal speed during the halt? Is the correlation less in the playback condition? 

      This is indeed what one would expect. We fear, however, that we don’t have sufficient data to address this question properly. Moreover, there is an important experimental caveat that makes the interpretation of the results difficult. In addition to the sound we experimentally couple to the locomotion speed of the mouse, the mouse self-generates sound by running (the treadmill rotating, changes to the airflow of the air-supported treadmill, footsteps, etc.). These sources of sound all also correlate in intensity with running speed. Thus, it is not entirely clear how our increase in sound amplitude with increasing running speed relates to the increase in self-generated sounds on the treadmill. This is one of the key reasons we usually do this type of experiment in the visual system where experimental control of visual flow feedback (in a given retinotopic location) is straightforward. 

      Having said that, if we look at how mismatch responses change as a function of locomotion speed across the entire population of neurons, there appears to be no systematic change with running speed (and the effects are highly dependent on the speed bins we choose). However, just looking at the most audiomotor mismatch responsive neurons, we find a trend for increased responses with increasing running speed (Author response image 1). We analyzed the top 5% of cells that showed the strongest response to mismatch (MM) and divided the MM trials into three groups based on running speed: slow (10-20 cm/s), middle (20-30 cm/s), and fast (>30 cm/s). Given the fact that we have on average 14 mismatch events in total per neuron, we don’t have sufficient data to analyze this.

      Author response image 1.

      The average response of strongest AM MM responders to AM mismatches as a function of running speed (data are from 51 cells, 11 fields of view, 6 mice). 
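
      The speed-binned analysis described above can be sketched as follows. This is a minimal illustration and not the analysis code used for the paper: the function name, the array layout (one shared set of trials for all cells), the use of the mean response to rank cells, and the half-open bin edges are all assumptions made for the example.

```python
import numpy as np

def bin_mismatch_by_speed(responses, speeds, top_fraction=0.05):
    """Average mismatch responses of the most responsive cells per speed bin.

    responses: array (n_cells, n_trials) of mismatch responses (e.g. % dF/F).
    speeds:    array (n_trials,) of running speed at mismatch onset, in cm/s.
    Assumes (for illustration only) that all cells share the same trials.
    """
    responses = np.asarray(responses, dtype=float)
    speeds = np.asarray(speeds, dtype=float)

    # Rank cells by mean mismatch response and keep the top fraction.
    mean_per_cell = responses.mean(axis=1)
    n_top = max(1, int(round(top_fraction * responses.shape[0])))
    top_cells = np.argsort(mean_per_cell)[-n_top:]

    # Speed bins as named in the text; the half-open edges are an assumption.
    bins = {"slow": (10.0, 20.0), "middle": (20.0, 30.0), "fast": (30.0, np.inf)}
    averages = {}
    for name, (low, high) in bins.items():
        in_bin = (speeds >= low) & (speeds < high)
        # NaN marks a bin without trials, as opposed to a zero response.
        if in_bin.any():
            averages[name] = responses[np.ix_(top_cells, in_bin)].mean()
        else:
            averages[name] = np.nan
    return averages
```

      With roughly 14 mismatch events per neuron split over three bins, each bin holds only a handful of trials, which is why any trend from such an analysis should be read with caution.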

      Values in Figure 2H are way higher than what can be observed in Figures 2C, and D. Could you explain the mismatch in values? Same for 3H and 4F. 

      In Figure 2H (now Figure S2F), we display responses from 4 755 individual neurons. Since most recorded neurons did not exhibit significant responses to mismatch presentations, their responses cluster around zero, significantly contributing to the final average shown in panel D. To clarify how individual neurons contribute to the overall population activity, we have added a histogram showing the distribution of neurons responding to audiomotor mismatch and sound playback halts. We hope this addition clarifies how individual neuron responses affect the final population activity. 
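
      The dilution effect described above can be illustrated with a toy calculation. Only the neuron count (4755) is taken from the text; the responsive fraction and the response amplitude below are hypothetical values chosen for illustration, and the function name is likewise made up for this sketch.

```python
import numpy as np

def population_summary(n_total, n_responsive, responsive_amplitude):
    """Return (population mean, largest single-neuron response).

    Models a population in which only `n_responsive` neurons respond with
    `responsive_amplitude` (e.g. % dF/F) and all other neurons respond with zero.
    """
    responses = np.zeros(n_total)
    responses[:n_responsive] = responsive_amplitude
    return responses.mean(), responses.max()

# 4755 neurons as in the text; the 5% responsive fraction (238 neurons) and
# the 20% dF/F amplitude are hypothetical numbers, not measured values.
mean_resp, max_resp = population_summary(4755, 238, 20.0)
print(f"population mean: {mean_resp:.2f}, largest single neuron: {max_resp:.1f}")
```

      Under these assumed numbers the population mean is about 1% dF/F even though individual responsive neurons reach 20% dF/F, which is the kind of gap between scatter-plot values and population averages that the reviewer asked about.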

      Furthermore, neurons exhibiting suppression upon closed-loop halts (Figure 2C) show changes in deltaF/F of the same order of magnitude as the AM MM neurons (with excitatory responses). I cannot picture where these neurons are found in the scatter plot of Figure 2H. 

      This is caused by a ceiling effect. While we could adjust the scale of the heat map to capture neurons with very high responses (e.g. [-50 50], Author response image 2), doing so would obscure the response dynamics of most neurons. Note that the number of neurons on the y-axis far exceeds the resolution of this figure and thus there are also aliasing issues that mask the strong responses. 

      Author response image 2.

      Responses of all L2/3 ACx neurons to audiomotor mismatches. Same as Figure 2C with different color scale [-50 50] which does not capture most of the neural activity.  

      - Are [AM+VM] MM neurons AM neurons? 

      Many of the [AM+VM] and [AM] neurons overlap, but they are not exactly the same population. This is partially visible in Figure 4F. There is a subset of neurons (13.7%; red dots, Figure 4F) that selectively responded to the concurrent [AM+VM] mismatch, while a different subset of neurons (11.2%; yellow dots, Figure 4F) selectively responded to mismatches presented in isolation. The [VM] response contributes only little to the sum of the two responses [AM] + [VM]. 

      Please do not use orange in Figure 4F, it is perceptually too similar to red. 

      We have now changed it to yellow. 

      Reviewer #2 (Public Review): 

      In this study, Solyga and Keller use multimodal closed-loop paradigms in conjunction with multiphoton imaging of cortical responses to assess whether and how sensorimotor prediction errors in one modality influence the computation of prediction errors in another modality. Their work addresses an important open question pertaining to the relevance of non-hierarchical (lateral cortico-cortical) interactions in predictive processing within the neocortex. 

      Specifically, they monitor GCaMP6f responses of layer 2/3 neurons in the auditory cortex of head-fixed mice engaged in VR paradigms where running is coupled to auditory, visual, or audio-visual sensory feedback. The authors find strong auditory and motor responses in the auditory cortex, as well as weak responses to visual stimuli. Further, in agreement with previous work, they find that the auditory cortex responds to audiomotor mismatches in a manner similar to that observed in visual cortex for visuomotor mismatches. Most importantly, while visuomotor mismatches by themselves do not trigger significant responses in the auditory cortex, simultaneous coupling of audio-visual inputs to movement non-linearly enhances mismatch responses in the auditory cortex. 

      Their results thus suggest that prediction errors within a given sensory modality are non-trivially influenced by prediction errors from another modality. These findings are novel, interesting, and important, especially in the context of understanding the role of lateral cortico-cortical interactions and in outlining predictive processing as a general theory of cortical function. 

      In its current form, the manuscript lacks sufficient description of methodological details pertaining to the closed-loop training and the overall experimental design. In several scenarios, while the results per se are convincing and interesting, their exact interpretation is challenging given the uncertainty about the actual experimental protocols (more on this below). Second, the authors are laser-focused on sensorimotor errors (mismatch responses) and focus almost exclusively on what happens when stimuli deviate from the animal's expectations. 

      While the authors consistently report strong running-onset responses (during open-loop) in the auditory cortex in both auditory and visual versions of the task, they do not discuss their interpretation in the different task settings (see below), nor do they analyze how these responses change during closed-loop i.e. when predictions align with sensory evidence. 

      However, I believe all my concerns can be easily addressed by additional analyses and incorporation of methodological details in the text. 

      Major concerns: 

      (1) Insufficient analysis of audiomotor mismatches in the auditory cortex: 

      Lack of analysis of the dependence of audiomotor mismatches on the running speed: it would be helpful if the authors could clarify whether the observed audiomotor mismatch responses are just binary or scale with the degree of mismatch (i.e. running speed). Along the same lines, how should one interpret the lack of dependence of the playback halt responses on the running speed? Shouldn't we expect that during playback, the responses of mismatch neurons scale with the running speed? 

      Regarding the scaling of AM mismatch responses with running speed, please see our response to reviewer 1 above to the same question. 

      Regarding the playback halt response and dependence on running speed, we would not expect there to be a dependence. The playback halt response (by design) measures the strength of the sensory response to a cessation of a stimulus (think OFF response). These typically are less strong in cortex than the corresponding ON responses but need to be controlled for (else a mismatch response might just be an OFF response – the prediction error is quantified as the difference between AM mismatch response and playback halt response). Given that sound onset responses only have a small dependence on running state, we would similarly expect sound offset (playback halt) responses to exhibit only minimal dependence on running state. 
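
      The control logic described above, in which the prediction error is taken as the difference between the AM mismatch response and the playback halt response, can be sketched as follows. This is a minimal illustration under assumptions: the function name, the trial-averaged input arrays, and the response window given in samples are hypothetical and not taken from the methods.

```python
import numpy as np

def prediction_error_estimate(mismatch, playback_halt, window):
    """Per-neuron mismatch response minus playback halt response.

    mismatch, playback_halt: arrays (n_neurons, n_timepoints) of trial-averaged
        responses aligned to event onset (e.g. % dF/F).
    window: (start, stop) sample indices of the response window (assumed).
    """
    mismatch = np.asarray(mismatch, dtype=float)
    playback_halt = np.asarray(playback_halt, dtype=float)
    start, stop = window

    # Average each trace over the response window.
    mm = mismatch[:, start:stop].mean(axis=1)
    pb = playback_halt[:, start:stop].mean(axis=1)

    # Subtracting the sensory OFF-type response leaves the component that
    # depends on the coupling between locomotion and sound.
    return mm - pb
```

      A neuron with equal responses in both conditions yields a value near zero, consistent with a purely sensory offset response rather than a mismatch response.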

      Slow temporal dynamics of audiomotor mismatches: despite the transient nature of the mismatches (1s), auditory mismatch responses last for several seconds. They appear significantly slower than previous reports for analogous visuomotor mismatches in V1 (by the same group, using the same methods) and even in comparison to the multimodal mismatches within this study (Figure 4C). What might explain this sustained activity? Is it due to a sustained change in the animal's running in response to the auditory mismatch? 

      This is correct, neither the AM nor the [AM+VM] mismatch response returns to baseline in the 3 seconds following onset. VM mismatch responses in visual cortex also do not return to baseline in that time window (see e.g. Figure 1E in (Attinger et al., 2017), or Figure 1F in (Zmarz and Keller, 2016)). What the origin or computational significance of this sustained calcium response is, we do not know. In intracellular signals, we do not see this sustained response (Jordan and Keller, 2020). Also peculiar is indeed the fact that in the case of AM mismatch the sustained response is similar in strength to the initial response. But here too, we do not know why this would be the case. It is conceivable that the initial and the sustained calcium responses have different origins. If the sustained response amplitude is all or nothing, the fact that the AM mismatch response is the smallest of the three could explain why sustained and initial responses are closer than for [AM+VM] or VM (in visual cortex) mismatch responses. All sustained responses appear to be roughly 1% dF/F. There are no apparent changes in running speed or pupil dilation that would correlate with the sustained activity (new panel A in Figure S2). 

      (2) Insufficient analysis and discussion of running onset responses during audiomotor sessions: The authors report strong running-onset responses during open-loop in identified mismatch neurons. They also highlight that these responses are in agreement with their model of subtractive prediction error, which relies on subtracting the bottom-up sensory evidence from top-down motor-related predictions. I agree, and, thus, assume that running-onset responses during the open loop in identified 'mismatch' neurons reflect the motor-related predictions of sensory input that the animal has learned to expect. If this is true, one would expect that such running-onset responses should dampen during closed-loop, when sensory evidence matches expectations and therefore cancels out this prediction. It would be nice if the authors test this explicitly by analyzing the running-related activity of the same neurons during closed-loop sessions. 

      Thank you for the suggestion. We now show running onset responses in both closed and open loop conditions for audiomotor and visuomotor coupling (new Figures 2H and 3H). In closed loop, we observe only a transient running onset response. In the open loop condition, running onset responses are sustained. For the visuomotor coupling, running onset responses are sustained in both closed and open loop conditions. This would be consistent with a slightly delayed cancellation of sound and motor related inputs in the audiomotor closed loop condition but not otherwise. 

      (3) Ambiguity in the interpretation of responses in visuomotor sessions. 

      Unlike for auditory stimuli, the authors show that there are no obvious responses to visuomotor mismatches or playback halts in the auditory cortex. However, the interpretation of these results is somewhat complicated by the uncertainty related to the training history of these mice. Were these mice exclusively trained on the visuomotor version of the task or also on the auditory version? I could not find this info in the Methods. From the legend for Figure 4D, it appears that the same mice were trained on all versions of the task. Is this the case? If yes, what was the training sequence? Were the mice first trained on the auditory and then the visual version? 

      The training history of the animals is important to outline the nature of the predictions and mismatch responses that one should expect to observe in the auditory cortex during visuomotor sessions.

      Depending on whether the mice in Figure 3 were trained on visual only or both visual and auditory tasks, the open-loop running onset responses may have different interpretations. 

      a) If the mice were trained only on the visual task, how should one interpret the strong running onset responses in the auditory cortex? Are these sensorimotor predictions (presumably of visual stimuli) that are conveyed to the auditory cortex? If so, what may be their role? 

      b) If the mice were also trained on the auditory version, then a potential explanation of the running-onset responses is that they are audiomotor predictions lingering from the previously learned sensorimotor coupling. In this case, one should expect that in the visual version of the task, these audiomotor predictions (within the auditory cortex) would not get canceled out even during the closed-loop periods. In other words, mismatch neurons should constantly be in an error state (more active) in the closed-loop visuomotor task. Is this the case? 

      If so, how should one then interpret the lack of a 'visuomotor mismatch' aligned to the visual halts, over and above this background of continuous errors? 

      As such, the manuscript would benefit from clearly stating in the main text the experimental conditions such as training history, and from discussing the relevant possible interpretations of the responses. 

      Mice were not trained on either audiomotor or visuomotor coupling and were reared normally. Prior to the recording day, the mice were habituated to running on the air-supported treadmill without any coupling for up to 5 days. On the first recording day, the mice experienced all three types of sessions (audiomotor, visuomotor, or combined coupling) in a random order for the first time. We have clarified this in the methods. 

      Regarding the question of how one should interpret the strong running onset responses in the auditory cortex, this is complicated by the fact that mice (unless they are raised visually or auditorily deprived) always have life-long experience with visuomotor or audiomotor coupling. The visuomotor coupling they experience in VR is geometrically matched to what they would experience by moving in the real world. For the audiomotor coupling the exact relationship is less clear, but there is a diverse set of sound sources that scale in loudness with increasing running speed. Hence running onset responses reflect either such learned associations (as the reviewer also speculates), or spurious input. Rearing mice without coupling between movement and visual feedback does not abolish movement related responses in visual cortex (Attinger et al., 2017); to the contrary, it enhances them considerably. We suspect this reflects visual cortex being recruited for other functions in the absence of visual input. But given the data we have, we cannot distinguish the different possible sources of running related responses. It is very likely that any “training” related effect we could achieve in a few hours pales in comparison to the life-long experience the mouse has in the world. 

      Regarding the lack of a 'visuomotor mismatch' aligned to the visual halts, we are not sure we understand. Our interpretation is that there are no (or only a very small - we speculate that any nonzero VM mismatch response is just inherited from visual cortex) VM mismatch responses in auditory cortex above chance. Our data are consistent with the interpretation that there is no opposition of bottom up visual and top down motor related input in auditory cortex, hence no VM mismatch responses (independent of how strong the top-down motor related input is). This is of course not surprising – this is more of a sanity check and becomes relevant in the context of interpreting AM+VM responses. 

      (4) Ambiguity in the interpretation of responses in multimodal versus unimodal sessions. 

      The authors show that multimodal (auditory + visual) mismatches trigger stronger responses than unimodal mismatches presented in isolation (auditory only or visual only). Further, they find that even though visual mismatches by themselves do not evoke a significant response, co-presentation of visual and auditory stimuli non-linearly augments the mismatch responses suggesting the presence of non-hierarchical interactions between various predictive processing streams. 

      In my opinion, this is an important result, but its interpretation is nuanced given insufficient details about the experimental design. It appears that responses to unimodal mismatches are obtained from sessions in which only one stimulus is presented (unimodal closed-loop sessions). Is this actually the case? An alternative and perhaps cleaner experimental design would be to create unimodal mismatches within a multimodal closed-loop session while keeping the other stimulus still coupled to the movement. 

      This is correct, unimodal mismatches were acquired in unimodal coupling. Testing unimodal mismatch responses in multimodally coupled VR is an interesting idea that we had initially even pursued. However, halting visual flow in a condition in which both visual flow and sound amplitude are coupled to running speed has an additional complication. Introducing an audiomotor mismatch in this coupling inherently also creates an audiovisual (AV) mismatch, and the same applies to visuomotor mismatches, which cause a concurrent visuoaudio (VA) mismatch (Author response image 3). This assumes that there are cross modal predictions from visual cortex to auditory cortex as there are from auditory cortex to visual cortex (Garner and Keller, 2022). There are interesting differences between the different types of mismatches, but with all the necessary passive controls this quickly exceeded the amount of data we could reasonably acquire for this paper. This remains an interesting question for future research. 

      Author response image 3.

      Rationale of unimodal mismatches introduced within multimodal paradigm. 

      Given the current experiment design (if my assumption is correct), it is unclear if the multimodal potentiation of mismatch responses is a consequence of nonlinear interactions between prediction/error signals exchanged across visual and auditory modalities. Alternatively, could this result from providing visual stimuli (coupled or uncoupled to movement) on top of the auditory stimuli? If it is the latter, would the observed results still be evidence of non-hierarchical interactions between various predictive processing streams? 

      Mice are not in complete darkness during the AM mismatch experiments (the VR is off, but there is low ambient light in the experimental rooms, primarily from computer screens), so we can rule out the possibility that the difference comes from having “no” visual input during AM mismatch responses. Addressing the question of whether it is this particular stimulus that causes the increase would require an experiment in which we couple sound amplitude but keep visual flow open loop. We did not do this, but we also think this is highly unlikely. However, as described above, we did do an experiment in which we coupled both sound amplitude and visual flow to running, and then either halted visual flow, or sound amplitude, or both. Comparing the [AM+VM] and [AM+AV] mismatch responses, we find that [AM+VM] responses are larger than [AM+AV] responses, as one would expect from an interaction between [AM] and [VM] responses (Author response image 4). Finally, the conclusion that there are non-hierarchical interactions of prediction error computations holds either way: if any visual stimulus (either visuomotor mismatch, or visual flow responses) influences audiomotor mismatch responses, this is evidence of non-hierarchical interactions. 

      Author response image 4.

      Average population response of all L2/3 neurons to concurrent [AM + VM] or [AM+AV] mismatch. Gray shading indicates the duration of the stimulus.

      Along the same lines, it would be interesting to analyze how the coupling of visual as well as auditory stimuli to movement influences responses in the auditory cortex in closed-loop in comparison to auditory-only sessions. Also, do running onset responses change in open-loop in multimodal vs. unimodal playback sessions? 

      We agree, and this is why we started out doing the experiments described above. We stopped, however, because it quickly became a combinatorial nightmare. We will leave addressing the question of how different types of coupling influence responses in auditory cortex to brave future neuroscientists. 

      Regarding the question of running onset responses, in both the multimodal and auditory only paradigms, running onset responses are transient; bottom-up sensory evidence is quickly subtracted from top-down motor-related prediction (Author response image 5). While there appears to be a small difference in the dynamics of running onset responses between these two paradigms, it was not significant. Note, we also have much less data than we would like here for this type of analysis. 

      Author response image 5.

      Running onset responses recorded in unimodal and multimodal closed loop sessions (1903 neurons, 16 fields of view, 8 mice)

      We also compared running onsets in open loop sessions and did not find any significant differences between unimodal and multimodal sessions (Author response image 6). We found only six sessions in which animals performed at least two running onsets in each session type, therefore, we do not have enough data to include it in the manuscript. 

      Author response image 6.

      Running onset responses recorded within unimodal and multimodal open loop sessions (659 cells, 6 fields of view, 5 mice).

      Minor concerns and comments:

      (1) Rapid learning of audiomotor mismatches: It is interesting that auditory mismatches are present even on day 1 and do not appear to get stronger with learning (same on day 2). The authors comment that this could be because the coupling is learned rapidly (line 110). How does this compare to the rate at which visuomotor coupling is learned? Is this rapid learning also observable in the animal's behavior i.e. is there a change in running speed in response to the mismatch? 

      In the visual system this is a bit more complicated. If you look at visuomotor mismatch responses in a normally reared mouse, responses are present from the first mismatch (as far as we can tell given the inherently small dataset with just one response per mouse). However, this is of course confounded by the fact that a normally reared mouse has visuomotor coupling throughout life from eye-opening. Raising mice in complete darkness, we have shown that approximately 20 min of coupling are sufficient to establish visuomotor mismatch responses (Attinger et al., 2017). 

      Regarding the behavioral changes that correlate with learning, we are not sure what the reviewer would expect. We cannot detect a change in mismatch responses and hence would also not expect to see a change in behavior.

      (2) The authors should clarify whether the sound and running onset responses of the auditory mismatch neurons in Figure 2E were acquired during open-loop. This is most likely the case, but explicitly stating it would be helpful. 

      Both responses were measured in isolation (i.e. VR off, just sound and just running onset), not in an open-loop session. We have clarified in the figure legend that these are the same data as in Figure 1H and N. 

      (3) In lines 87-88, the authors state 'Visual responses also appeared overall similar but with a small increase in strength during running ...'. This statement would benefit from clarification. From Figure S1 it appears that when the animal is sitting there are no visual responses in the auditory cortex. But when the animal is moving, small positive responses are present. Are these actually 'visual' responses - perhaps a visual prediction sent from the visual cortex to the auditory cortex that is gated by movement? If so, are they modulated by features of visual stimuli eg. contrast, intensity? Or, do these responses simply reflect motor-related activity (running)? Would they be present to the same extent in the same neurons even in the dark? 

      This was wrong indeed - we have rephrased the statement as suggested. Regarding the source of visual responses, we use the term “visual response” operationally here agnostic to what pathway might be driving it (i.e. it could be a prediction triggered by visual input). 

      We did not test if recorded visual responses are modulated by contrast or intensity. However, testing whether they are would not help us distinguish whether the responses are ‘visual’ or ‘visual predictions’. Finally, regarding the question about whether they are motor-related responses, this might be a misunderstanding. These are responses to visual stimuli while the mouse is already running (i.e. there is no running onset), hence we cannot test whether these responses are present in the dark (this would be the equivalent of looking at random triggers in the dark while the mouse is running).  

      (4) The authors comment in the text (lines 106-107) about cessation of sound amplitude during audiomotor mismatches as being analogous to halting of visual flow in visuomotor mismatches. However, sound amplitude versus visual flow are quite different in nature. In the visuomotor paradigm, the amount of visual stimulation (photons per unit time) does not necessarily change systematically with running speed. Whereas, in the audiomotor paradigm, the SNR of the stimulus itself changes with running speed which may impact the accuracy of predictions. On a broader note, under natural settings, while the visual flow is coupled to movement, sound amplitude may vary more idiosyncratically with movement. 

      This is a question of coding space. The coding space of visual cortex of the mouse is probably visual flow (or change in image) not number of photons. This already starts in the retina. The demonstration of this is quite impressive. A completely static image on the retina will fade to zero response (even though the number of photons remains constant). This is also why most visual physiologists use dynamic stimuli – e.g. drifting gratings, not static gratings – to map visual responses in visual cortex. If responses were linear in number of photons, this would make less of a difference. The correspondence we make is between visual flow (which we assume is the main coding space of mouse V1 – this is not established fact, but probably implicitly the general consensus of the field) and sound amplitude. Responses in auditory cortex are probably more linear in sound amplitude than visual cortex responses are linear in number of photons, but whether that is the correct coding space is still unclear, and as far as we can tell there is no clear consensus in the field. We did consider coupling running speed to frequency, which may work as well, but given the possible equivalence (as argued above) and the fact that we could see similar responses with sound amplitude coupling we did not explore frequency coupling. 

      If visual speed is the coding space of V1, SNR should behave equivalently in both cases. 

      Perhaps such differences might explain why unlike in the case of visual cortex experiments, running speed does not affect the strength of playback responses in the auditory cortex. 

      Possible, but the more straightforward framing of this point is that sensory responses are enhanced by running in visual cortex while they are not in auditory cortex. A playback halt response (by design) is just a sensory response. Why running does not generally increase sensory responses in auditory cortex (L2/3 neurons), but does so in visual cortex, would be the more general version of the same question.

      We fear we have no intelligent answer to this question.  

      Reviewer #3 (Public Review): 

      This study explores sensory prediction errors in the sensory cortex. It focuses on the question of how these signals are shaped by non-hierarchical interactions, specifically multimodal signals arising from same-level cortical areas. The authors used 2-photon imaging of mouse auditory cortex in head-fixed mice that were presented with sounds and/or visual stimuli while moving on a ball. First, responses to pure tones, visual stimuli, and movement onset were characterized. Then, the authors made the running speed of the mouse predictive of sound intensity and/or visual flow. Mismatches were created through the interruption of sound and/or visual flow for 1 second while the animal moved, disrupting the expected sensory signal given the speed of movement. As a control, the same sensory stimuli triggered by the animal's movement were presented to the animal decoupled from its movement. The authors suggest that auditory responses to the unpredicted silence reflect mismatch responses. That these mismatch responses were enhanced when the visual flow was congruently interrupted, indicates the cross-modal influence of prediction error signals. 

      This study's strengths are the relevance of the question and the design of the experiment. The authors are experts in the techniques used. The analysis explores neither the full power of the experimental design nor the population activity recorded with 2-photon, leaving open the question of to what extent what the authors call mismatch responses are not sensory responses to sound interruption. The auditory system is sensitive to transitions and indeed responses to the interruption of the sound are similar in quality, if not quantity, in the predictive and the control situation. The pattern they observe is different from the visuomotor mismatch responses the authors found in V1 (Keller et al., 2012), where the interruption of visual flow did not activate neuronal activity in the decoupled condition. 

      Just to add brief context to this. The reviewer is correct here, the (Keller et al., 2012) paper reports finding no responses to playback halt. However, this was likely a consequence of indicator sensitivity (these experiments were done with what now seems like a pre-historic version of GCaMP). Experiments performed with more modern indicators do find playback halt responses in visual cortex (see e.g. (Zmarz and Keller, 2016)). 

      The auditory system is sensitive to transitions, also those to silence. See the work of the Linden or the Barkat labs on on-off responses, and also that of the Mesgarani lab (Khalighinejad et al., 2019) on responses to transitions 'to clean' (Figure 1c) in the human auditory cortex. Since the responses described in the current work are modulated by movement and the relationship between movement and sound is more consistent during the coupled sessions, this could explain the difference in response size between coupled and uncoupled sessions. There is also the question of learning. Prediction signals develop over a period of several days and are frequency-specific (Schneider et al., 2018). From a different angle, in Keller et al. 2012, mismatch responses decrease over time as one might expect from repetition. 

      Also for brief context, this might be a misconception. We don’t find a decrease of mismatch responses in the (Keller et al., 2012) paper – we assume what the reviewer is referring to is the fact that mismatch responses decrease in open-loop conditions (they normally do not in closed-loop conditions). This is the behavior one would expect if the mouse learns that movement no longer predicts visual feedback. 

      It would help to see the responses to varying sound intensity as a function of previous intensity, and to plot the interruption response as a function of both transition and movement in both conditions. 

      Given the large populations of neurons recorded and the diversity of the responses, from clearly negative to clearly positive, it would be interesting to understand better whether the diversity reflects the diversity of sounds used or a diversity of cell types, or both. 

      Comments and questions: 

      Does movement generate a sound and does this change with the speed of movement? It would be useful to have this in the methods. 

      There are three ways to interpret the question. Below are the answers to all three:

      (1) Running speed is experimentally coupled to sound amplitude of a tone played through a loudspeaker. Tone amplitude is scaled with running speed of the mouse in a closed loop fashion. We assume this is not what the reviewer meant, as this is described in the methods (and the results section). 

      (2) Movements of the mouse naturally generate sounds (footsteps, legs moving against fur, etc.). Most of these sounds trivially scale with the frequency of leg movements – we assume this is also not what the reviewer meant. 

      (3) Finally, there are experimental sounds related to the rotation speed of the air supported treadmill that increase with running speed of the mouse. We have added this to the methods as suggested. 

      Figures 1a and 2a. The mouse is very hard to see. Focus on mouse, objective, and sensory stimuli? The figures are generally very clear though. 

      We have enlarged the mouse as suggested. 

      1A-K was the animal running while these responses were measured? 

      We did not restrict this analysis to running or sitting and pooled responses over both conditions.  We have made this more explicit in the results section.  

      Data in Figure 1: Since the modulation of sensory responses by movement is relevant for the mismatch responses, I would move this analysis from S1 to Figure 1 and analyze the responses more finely in terms of running speed relative to sound and gratings. I would include here a more thorough analysis of the responses to 8kHz at varying intensities, for example in the decoupled sessions. Does the response adapt? Does it follow the intensity? 

      We agree that these are interesting questions, but they do not directly pertain to our conclusions here. The key point Figure S1 addresses is whether auditory responses are generally enhanced by running (as they are e.g. in visual cortex) – the answer, on average, is no. We have tried emphasizing this more, but it changes the flow of the paper away from our main message, hence we have left the panels in the supplements. 

      Regarding the 8 kHz modulation, there is a general increase of the suppression of activity with increasing sound amplitude (Author response image 7 and Author response image 8). However, because the amplitude of the stimulus varies continuously, we do not have sufficient data (or do not know how to use the data we have) to address questions of adaptation. We assume there is some form of adaptation. Either way, we don’t see how this would change our conclusions. 

      Author response image 7.

      Neural activity as a function of sound level in an AM open loop session. 

      Author response image 8.

      The average sound evoked population response of all ACx layer 2/3 neurons to 60 dB or 75 dB 8 kHz pure tones. Stimulus duration was 1 s (gray shading).

      2C-D why not talk of motor modulation? Paralleling what happens in response to auditory and visual stimuli? 

      This is correct, a mismatch response (we use mismatch here to operationally describe the stimulus – not the interpretation) can be described either as a prediction error (this is the interpretation) or a stimulus specific motor modulation. Note, the key here is “stimulus specific”. It is stimulus specific as there is an approximately 3x change between mismatch and playback halt (the same sensory stimulus with and without locomotion), but basically no change for sound onsets (Figure S1). Having said that, one explanation (prediction error) has predictive power (and hence is testable – see e.g. (Vasilevskaya et al., 2023) for an extensive discussion on exactly this argument for mismatch responses in visual cortex), while the other does not (a “stimulus specific” motor modulation has no predictive value or computational theory behind it and is simply a description). Thus, we choose to interpret it as a prediction error. Note, this finding does not stand in isolation and many of the testable predictions of the predictive processing interpretation have turned out to be correct (see e.g. (Keller and Mrsic-Flogel, 2018) for a review). 

      Note, we try to only use the interpretation of “prediction error” when motivating why we do the experiments, and in the discussion, but not directly in the description of the results (e.g. in Figure 2).  

      How does the mismatch affect the behavior of the mouse? Does it stop running? This could also influence the size of the response. 

      We quantified animal behavior during audiomotor mismatches and did not find any significant acceleration or slowing down upon mismatch events. Thus, neural responses recorded during AM mismatches are unlikely to be explained by changes in animal behavior. These data have been added in Figure S2A and Figure S4A.

      Figure 3. What about neurons that were positively modulated by both grating and movement? How do these neurons respond to the mismatch? 

      Neurons positively modulated by both grating and movement were slightly more responsive to MM than the rest of the population, though this difference was not significant (Author response image 9). This is also visible in Figure 3G – the high VM mismatch responsive neurons are randomly distributed in regard to correlation with running speed and visual flow speed. 

      Author response image 9.

      Responses to visuomotor mismatches of neurons positively modulated by both grating and movement, and of the remainder of the population.

      Line 176. The authors say 'Thus, in the case of a [AM + VM] mismatch both the halted visual flow and the halted sound amplitude are predicted by running speed' but the mismatch (halted flow and amplitude) is not predicted by the speed, correct? Please rephrase. 

      Thank you for pointing this out – this was indeed phrased incorrectly. We have corrected this. 

      How was the sound and/or visual flow interruption triggered? Did the animal have to run at a minimum speed in order for it to happen?

      Sound and visual flow interruptions were triggered randomly, independent of the animal's running speed. However, for the analysis, only MM presentations during which animals were running at a speed of at least 0.3 cm/s were included. The 0.3 cm/s was simply the (arbitrary) threshold we used to determine if the mouse was running. In a completely stationary mouse a mismatch event will not have any effect (sound amplitude/visual flow speed are already at 0). This is described in the methods section.
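
      The inclusion criterion described above can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the function name, the input format (one running-speed value per mismatch event), and the example speeds are all hypothetical. Only the 0.3 cm/s threshold comes from the response.

```python
# Threshold stated in the response; everything else here is an assumption.
RUNNING_THRESHOLD_CM_S = 0.3


def select_running_events(event_speeds_cm_s, threshold=RUNNING_THRESHOLD_CM_S):
    """Return the indices of mismatch events that pass the running criterion.

    event_speeds_cm_s: one running speed (cm/s) per mismatch event. How that
    speed is measured for each event is not specified in the response, so a
    single value per event is assumed here.
    """
    return [i for i, speed in enumerate(event_speeds_cm_s) if speed >= threshold]


# Hypothetical speeds for five randomly triggered mismatch events.
speeds = [0.0, 0.3, 5.2, 0.1, 12.4]
kept = select_running_events(speeds)
print(kept)
```

      Events triggered while the mouse is stationary are dropped at analysis time rather than prevented at trigger time, which matches the description that interruptions were triggered independently of running speed.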

    1. Author response:

      The following is the authors’ response to the previous reviews.

      (1) We agreed that there was insufficient evidence for the authors' conclusion that Myc-overexpressing clones lacking Fmi become losers. We request that the authors change the text to discuss that suppression of Myc clone growth through Fmi depletion is reminiscent of a cell acquiring loser status, although at this point in the manuscript there is no clear demonstration whether this is mostly driven by growth suppression and/or an increase in apoptosis.

      We agree that at the point in the manuscript where we have only described the clone sizes, one cannot make firm conclusions about competition, so we have changed the language to reflect this. We argue that after showing our apoptosis data, those conclusions become firm. Please see the more lengthy responses to reviewers below.

      (2) We agreed that the apoptosis assay, data and interpretation need to be improved. The graphs in Fig. 4O and P should be better discussed in the text and in the legend. Additionally, the graphs are lacking the red lines that are written in the text.

      We regret that we did not adequately explain the data displayed in these two graphs. Supercompetition tends to cause apoptosis in both winners and losers, with the ratio between WT and super-competitor cells being critical in deciding the outcome of competition. We wanted to represent this visually but failed to properly explain our analysis. We have rewritten the figure legend and our discussion in the main text, hopefully making it clearer. 

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper is focused on the role of Cadherin Flamingo (Fmi) in cell competition in developing Drosophila tissues. A primary genetic tool is monitoring tissue overgrowths caused by making clones in the eye disc that express activated Ras (RasV12) and that are depleted for the polarity gene scribble (scrib). The main system that they use is ey-flp, which makes continuous clones in the developing eye-antennal disc beginning at the earliest stages of disc development. It should be noted that RasV12, scrib-i (or lgl-i) clones only lead to tumors/overgrowths when generated by continuous clones, which presumably creates a privileged environment that insulates them from competition. Discrete (hs-flp) RasV12, lgl-i clones are in fact out-competed (PMID: 20679206), which is something to bear in mind. They assess the role of fmi in several kinds of winners, and their data support the conclusion that fmi is required for winner status. However, they make the claim that loss of fmi from Myc winners converts them to losers, and the data supporting this conclusion is not compelling.

      Strengths:

      Fmi has been studied for its role in planar cell polarity, and its potential role in competition is interesting.

      Weaknesses:

      I have read the revised manuscript and have found issues that need to be resolved. The biggest concern is the overstatement of the results that loss of fmi from Myc-overexpressing clones turns them into losers. This is not shown in a compelling manner in the revised manuscript and the authors need to tone down their language or perform more experiments to support their claims. Additionally, the data about apoptosis is not sufficiently explained.

      We take issue with this reviewer’s framing of their criticism. First, the reviewer is selectively reporting the results published in PMID: 20679206. They correctly state that those authors show that small discrete clones of RasV12 lgl are eliminated (Fig. 3B), but they omit the fact that the authors also show that larger RasV12 lgl clones induce apoptosis in the surrounding wild type cells, and therefore behave as winners (Fig. 3C). Hence, the size of the clone appears to determine its winner/loser status. Of course, lgl is not scrib, and it is not a certainty that they would behave similarly, but they also show that large RasV12 scrib clones induce considerable apoptosis of the neighboring wild type cells. 

      The reviewer then discusses “continuous” clones induced by ey-flp, as we use in our manuscript. Here, the term “continuous” is probably misleading; because ey is expressed ubiquitously in the disc from early in development, it is most likely the case that the majority of cells have flipped relatively early, resulting in ~half the cells becoming clone and the other ~half twin spot. The clone cells then likely fuse to make larger clones. We show that ey-flp induced RasV12 scrib clones also behave as winners. It is logical to conclude that this is because they are large. The reviewer talks about “a privileged environment that insulates them from competition,” but if they were insulated from competition, how could they become winners? Because they occupy more territory than the wild type cells, and because they induce apoptosis in the wild type neighbors, they are winners. 

      Having shown that ey-flp induced RasV12 scrib clones behave as winners, we then remove Fmi from these clones, and show that they behave as losers by the same criteria: they occupy less area than the wild type cells (our Fig. 1 and Fig. 1 Supp 2), and they induce apoptosis in the wild type cells (our Fig 4A-H). 

      With respect to the comment that additional experiments are needed to support the claim that loss of Fmi from Myc winners converts them to losers, we’re not sure what additional data the reviewer would want. As for the tumor clones, we show that >>Myc clones get bigger than the twin control clones (Fig. 2), and we measure similar low levels of apoptosis in each (Fig. 4I-K, O). In contrast, >>Myc fmi clones are out-grown by wild type clones, and apoptosis is higher in the >>Myc fmi clones than in the wild type clones (Fig. 4L-N, P-S). We therefore believe it is correct to say that >>Myc clones become losers when Fmi is removed.

      In additional comments, the reviewer takes issue with using winner and loser language at the point in the manuscript where we have only shown the clone sizes but not yet the apoptosis data, and about this we agree. We have changed the language accordingly. 

      Re explanation of the apoptosis data, see the response to reviewer #3.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bosch et al. reveal Flamingo (Fmi), a planar cell polarity (PCP) protein, is essential for maintaining 'winner' cells in cell competition, using Drosophila imaginal epithelia as a model. They argue that tumor growth induced by scrib-RNAi and RasV12 competition is slowed by Fmi depletion. This effect is unique to Fmi, not seen with other PCP proteins. Additional cell competition models are applied to further confirm Fmi's role in 'winner' cells. The authors also show that Fmi's role in cell competition is separate from its function in PCP formation.

      Strengths:

      (1) The identification of Fmi as a potential regulator of cell competition under various conditions is interesting.

      (2) The authors demonstrate that the involvement of Fmi in cell competition is distinct from its role in planar cell polarity (PCP) development.

      Weaknesses:

      (1) The authors provide a superficial description of the related phenotypes, lacking a mechanistic understanding of how Fmi regulates cell competition. While induction of apoptosis and JNK activation are commonly observed outcomes in various cell competition conditions, it is crucial to determine the specific mechanisms through which they are induced in fmi-depleted clones. Furthermore, it is recommended that the authors utilize the power of fly genetics to conduct a series of genetic epistasis analyses.

      We agree that it is desirable to have a mechanistic understanding of Fmi’s role in competition, but that is beyond the scope of this manuscript. Here, our goal is to report the phenomenon. We understand and share with the reviewer the interest in better understanding the relationship between Fmi and JNK signaling in competition. The role of JNK in competition, tumorigenesis and cell death is infamously complex. In some preliminary experiments, we explored some epistasis experiments, but these were inconclusive so we elected to not report them here. In the future, we will continue with additional analyses to gain a better understanding of the mechanism by which Fmi affects competition.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Bosch and colleagues describe an unexpected function of Flamingo, a core component of the planar cell polarity pathway, in cell competition in Drosophila wing and eye disc. While Flamingo depletion has no impact on tumour growth (upon induction of Ras and depletion of Scribble throughout the eye disc), and no impact when depleted in WT cells, it specifically tunes down winner clone expansion in various genetic contexts, including the overexpression of Myc, the combination of Scribble depletion with activation of Ras in clones or the early clonal depletion of Scribble in eye disc. Flamingo depletion reduces proliferation rate and increases the rate of apoptosis in the winner clones, hence reducing their competitiveness up to forcing their full elimination (hence becoming now "loser"). This function of Flamingo in cell competition is specific to Flamingo as it cannot be recapitulated with other components of the PCP pathway, does not rely on interaction of Flamingo in trans, nor on the presence of its cadherin domain. Thus, this function is likely to rely on a non-canonical function of Flamingo which may rely on downstream GPCR signaling.

      This unexpected function of Flamingo is by itself very interesting. In the framework of cell competition, these results are also important as they describe, to my knowledge, one of the only genetic conditions that specifically affect the winner cells without any impact when depleted in the loser cells. Moreover, Flamingo depletion does not just suppress the competitive advantage of winner clones, but even turns them into putative losers. This specificity, while not clearly understood at this stage, opens a lot of exciting mechanistic questions, but also a very interesting long term avenue for therapeutic purposes, as targeting Flamingo should then affect very specifically the putative winner/oncogenic clones without any impact in WT cells.

      The data and the demonstration are very clean and compelling, with all the appropriate controls, proper quantifications and backed-up by observations in various tissues and genetic backgrounds. I don't see any weakness in the demonstration and all the points raised and claimed by the authors are all very well substantiated by the data. As such, I don't have any suggestions to reinforce the demonstration.

      While not necessary for the demonstration, documenting the subcellular localisation and levels of Flamingo in these different competition scenarios may have been relevant and provide some hints on a putative mechanism (specifically by comparing its localisation in winner and loser cells).

      While we did not perform a thorough analysis, our current revision of the manuscript shows Fmi staining results that do not support a change in subcellular localization of Fmi. In our images, Fmi seemed to localize similarly along the winner-loser clone boundaries, and inside and outside the clones. We cannot rule out that a subtle change in localization is taking place that could perhaps be detected with higher resolution imaging.

      Also, on a more interpretative note, the absence of impact of Flamingo depletion on JNK activation does not exclude some interesting genetic interactions. JNK output can be very contextual (for instance depending on Hippo pathway status), and it would be interesting in the future to check if Flamingo depletion could somehow alter the effect of JNK in the winner cells and promote downstream activation of apoptosis (which might normally be suppressed). It would be interesting to check if Flamingo depletion could have an impact in other contexts involving JNK activation or upon mild activation of JNK in clones.

      See our comment to Reviewer 2 regarding JNK.

      Strengths:

      A clean and compelling demonstration of the function of Flamingo in winner cells during cell competition

      One of the rare genetic conditions that affects very specifically winner cells without any impact in losers, and then can completely switch the outcome of competition (which opens an interesting therapeutic perspective on the long term)

      Weaknesses:

      The mechanistic understanding obviously remains quite limited at this stage especially since the signaling does not go through the PCP pathway.

      We agree that in the future, it will be desirable to gain a mechanistic understanding of Fmi’s role in competition.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have read the revised manuscript and have found issues that need to be resolved. The biggest concern is the overstatement of the results that loss of fmi from Myc-overexpressing clones turns them into losers. This is not shown in a compelling manner in the revised manuscript and the authors need to tone down their language or perform more experiments to support their claims.

      (1) I do not agree with the language used by the authors in the last paragraph of p. 4 stating that loss of fmi from Myc supercompetitors (Fig. 2) makes them losers. At this point in the paper, they only use clone size as a readout. By definition, losers in imaginal discs die by apoptosis, which is not measured in this figure. As such, the authors do not prove that fmi-mutant Myc over-expressing clones are now losers at this point in the manuscript. The authors should discuss this in the results section regarding Fig. 2.

      We have modified the language in text and figure legend to acknowledge that the clone size data alone do not demonstrate competition.

      (2) Related to point #1, I do not agree with the language in the legend of Fig. 2H that the graph is measuring "supercompetition". They are only measuring clone ratios, not apoptosis. Growing to a smaller size does not make a clone have loser status without also assessing cell death.

      (a) I suggest that the authors remove the sentence "A ratio over 0 indicates supercompetition of nGFP+ clones, and below 0 indicates nGFP+ cells are losers." in the legend to Fig. 2H. Instead, they should describe the assay in terms of clone ratios.

      The reviewer raises a valid point, as at this point in the manuscript we did not quantify cell death and proliferation. However, based on decades of knowledge of supercompetition, Myc clones are classified as super-competitors in every instance they’ve been studied. (Myc clones show apoptosis when competing with WT cells, while at the same time they eliminate WT neighbors by apoptosis to become winners. Their faster proliferation rate may be what ultimately makes them winners.) We changed the language to address this distinction. 

      (3) In Fig. 4, they do attempt to monitor apoptosis, which is the fate of bona fide losers in imaginal tissue. However, I have several concerns about these data (panels 4I-K, O and P have been added to the revised manuscript.)

      (a) In Fig. 4I-K, why is there no death of WT cells which would be expected based on de la Cova Cell 2004? The authors need to comment on this.

      (b) Cell death should also be observed in the Myc over-expressing clones but none is seen in this disc (see de la Cova 2004 and PMID: 18257071 Fig. 4). The authors need to comment on this.

      We do not understand why the reviewer raises these two points. We see some cell death in >Myc eye discs both in winners and losers, as displayed in the graph. In our hands, the levels were on average very low. The example shown is representative of the analysis and shows apoptosis both in WT and >Myc cells, highlighted by the arrows in 4J. We added a mention of the arrows to the figure legend to make it clearer. In the main text, we already compared our observations to the same publication the reviewer mentions (De la Cova 2004). 

      (c) The data in panel 4O is not explained sufficiently in the legend or results section. What do the lines between the data points in the left side of the panel mean? Why is there a bunch of clustered data points in the right part of the Fig. 4O, when two different genotypes are listed below? I would have expected two clusters of points. The authors need to comment on this.

      We intended to convey as much information as possible in an informative manner in these graphs, and we regret not explaining better the analysis shown. We modified the legends for the apoptosis analysis to better explain the displayed data.

      (d) What is the sample size (n) for the genotypes listed in this figure? The authors need to comment on this and explicitly list the sample size in the legend.

      We added the n for both conditions to the figure. 

      (e) In panels 4L-N, why is the death occurring in the apparent center of the fmiE59>>Myc clone? If these clones are truly losers as the authors claim, then apoptosis should be seen at the boundaries between the fmiE59>>Myc clone and the WT clones. The results in this figure are not compelling, yet this is the critical piece of data to support their claim that fmiE59>>Myc clones are losers. The authors need to comment on this.

      The majority of cell death in this example is observed 1-3 cells away from the clone boundary. In some cases, we observe cell death farther from the boundary, but those cells were not counted in our analyses. As described in our methods, we only considered for the analysis cells at the clone boundary or in the vicinity, as those are the ones that most probably have apoptosis triggered by the neighboring clone.

      (f) There is no red line in Fig. 4O and 4P, in contrast to what is written in the legend in the revised manuscript. This should be corrected.

      We thank the reviewer for catching the error about the line. We have now simplified the graph by removing the line at Y=0 and leaving just one dashed line, representing the mean difference between WT and >>Myc cells.

      (4) On p. 10, the reference Harvey and Tapon 2007 to support hpo-/- supercompetitor status is incorrect. The references are Ziosi 2010 and Neto-Silva 2010. This should be changed.

      We thank the reviewer for the correction. While the review we provided discusses the role of the Hpo pathway in proliferation and cancer, it does not discuss competition. The reference we intended to include here was Ziosi 2010. We now cite both in the revised manuscript.

      (5) The legend for Fig. 3A-H is missing from the revised manuscript. This needs to be added.

      This was likely a copy-edit glitch. The missing parts of the legend have been restored.

      (6) Material and methods is missing details on the hs-induced clones. The authors need to specifically state when the clones were generated and when they were analyzed in hours after egg laying.

      The timing of the heat-shock and analysis was described in the methods: “Heat-shock was performed on late first instar and early second instar larvae, 48 hrs after egg laying (AEL). Vials were kept at 25ºC after heat-shock until larvae were dissected”. And additionally, in the dissection methods: “Third instar wandering larvae (120 hrs AEL) were dissected…” We have included in this revision the length of the heat-shock (15 min). 

      I have read the rebuttal and some of my concerns are not sufficiently addressed.

      (8) I raised the point of continuously-generated clones becoming large enough to evade competition, and I disagree with the authors' reply. I think that RasV12, scrib (or lgl) competition largely depends on the size of the clone, which is de facto larger when generated by continuous expression of flp (such as eyeless or tubulin promoters used in this study). I think that at that point, we are at an impasse with respect to this issue, but I wanted to register my disagreement for the record. Related to this, one possible reason for the fragmentation of the fmi-mutant Myc overexpressing clones in the wing disc is because they were not continuously generated and hence did not merge with other clones.

      Please see the discussion above in the public comments. We remain unclear about what, exactly, the reviewer disagrees with. As stated above, we think they are correct that the size of the clone is critical in determining winner vs loser status.

      Reviewer #2 (Recommendations for the authors):

      Although the authors have addressed some of my concerns, I still feel that a detailed mechanistic understanding is essential. I hope the authors will conduct additional experiments to solve this issue.

      We also consider the mechanism of interest and will pursue this in the future. To test our hypotheses we require a set of genetic mutants that are still in the making that will help us dissect the function and potential partners of Fmi, and we hope to have these results in a future publication.

      Reviewer #3 (Recommendations for the authors):

      - There is no clear demonstration that the relative decrease of clone size in UASMyc/Fmi mutant is mostly driven by either a context-dependent suppression of growth and/or an increase of apoptosis (the latter being the more classic feature of loser phenotype).

      We believe that it is driven by both, and refrain from making assumptions about the magnitude of contribution from each. This question is something that we will be interested to explore in the future.

      The distribution of cell death in Fmi/UAS-Myc mutant is somehow surprising and may not fit with most of the competition scenarios where death is mostly restricted to clone periphery (although this may be quite variable and would require much more quantification to be clear).

      While we observe some cell death far from clone boundaries, most of the dying cells are a few cells away from a clone boundary. In other publications quantifying cell death, examples of cell death farther from the boundary are not rare (See for example Moreno and Basler 2004 Fig 6, De la Cova et al. Fig 2, Meyer et al 2014 Fig 2). We did not count cells dying far from clone boundaries in our analysis.

      I just noticed a few mistakes in the legend :

      Figure 3M legend is missing (it would be useful to know at which stage the quantification is performed)

      Another reviewer brought to our attention the problems with Fig 3 legend. We restored the missing parts.

      It would be good to give an estimate of the number of larvae observed when showing the representative cases in Figure 1 .

      This is a good point. We now include these numbers in the figure legend.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Recommendations for the authors): 

      Discussion, page 28. The argument that the authors put forward justifying the (small) size of the spontaneous EPSCs seems reasonable. Nonetheless, it would be good to have an amplitude distribution constructed with voltage-evoked EPSCs to compare with that of spontaneous EPSCs. Not the large initial EPSC, obtained upon IHC depolarization but rather EPSCs occurring later during the longer pulses (figure 4). The authors made the claim that upon IHC depolarization, EPSCs sizes increased, but this is not backed with data. 

      Following the reviewer recommendation, we have analyzed the voltage-evoked EPSCs occurring during the last 20 ms of the Masker stimulus. We compared the cumulative distribution of the amplitude of these eEPSCs to the cumulative distribution of the amplitude of the sEPSCs (Figure 1-figure supplement 1, panel G) from the same synapses. The two distributions are significantly different (p < 0.0001, Kolmogorov-Smirnov test), with evoked EPSCs having larger amplitudes (average sEPSC amplitude of -97.28 ± 2.22 pA [median 82.10 pA] vs average eEPSC amplitude of 135.8 ± 3.24 pA [median 120.0 pA]).
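
      For readers unfamiliar with the comparison, the two-sample Kolmogorov-Smirnov statistic is the largest vertical distance between two empirical cumulative distributions. The minimal Python sketch below computes only that statistic; it is not the authors' analysis pipeline, the function name is hypothetical, and the amplitude values are made-up illustration numbers rather than data from the study. A p-value, as reported above, would normally come from a statistics library (for example `scipy.stats.ks_2samp`), which this sketch does not reproduce.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute difference between the empirical
    cumulative distribution functions (ECDFs) of the two samples.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The ECDFs only change at observed values, so checking those is enough.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        ecdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


# Hypothetical absolute EPSC amplitudes in pA (illustration only).
spontaneous = [60.0, 75.0, 82.0, 90.0, 110.0]
evoked = [95.0, 115.0, 120.0, 140.0, 170.0]
print(ks_statistic(spontaneous, evoked))
```

      A statistic near 0 means the two cumulative distributions nearly overlap, while a value near 1 means they are almost completely separated.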

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study investigates protein-protein interactions (PPIs) within the nuage, a germline-specific organelle essential for piRNA biogenesis in Drosophila melanogaster, using AlphaFold2 to predict interactions among 20 nuage-localizing proteins. The authors identify five novel interaction candidates and experimentally validate three of them, including Spindle-E and Squash, through co-immunoprecipitation assays. They confirm the functional significance of these interactions by disrupting salt bridges at the Spn-E_Squ interface. The study further expands its scope to analyze approximately 430 oogenesis-related proteins, validating three additional interaction pairs. A comprehensive screen of around 12,000 Drosophila proteins for interactions with the key piRNA pathway player, Piwi, identifies 164 potential binding partners. Overall, the research demonstrates that in silico approaches using AlphaFold2 can link bioinformatics predictions with experimental validation, streamlining the identification of novel protein interactions and reducing the reliance on extensive experimental efforts. The manuscript is commendably clear and easy to follow; however, areas for improvement should be addressed to enhance its clarity and rigor.

      Major Concerns:

      (1) While AlphaFold2 was developed and trained primarily for predicting protein structures and their interactions, applying it to predict protein-protein interactions is an extrapolation of its intended use. This introduces several important considerations and risks. First, it assumes that AlphaFold's accuracy in structure prediction extends to interactions, despite not being explicitly trained for this task. Additionally, the assumption that high-scoring models with structural complementarity imply biologically relevant interactions is not always valid. Experimental validation is essential to address these uncertainties, as over-reliance on computational predictions without such validation can lead to false positives and inaccurate conclusions. The authors should expand on the assumptions, limitations, and risks associated with using AlphaFold2 for predicting protein-protein interactions.

      We appreciate the reviewer's point. The prediction of protein-protein interactions using AlphaFold2 relies on the number of conserved homologous sequences and previous conformational data. We shall add limitations and risks to the AlphaFold2 prediction method in the revised manuscript.

      (2) The authors experimentally validated three interactions, out of five predicted interactions, using co-immunoprecipitation (co-IP). They attributed the lack of validation for the other two predictions to the limitations of the co-IP method. However, further clarification on the potential limitations of the co-immunoprecipitation behind the negative results would strengthen the conclusions. While co-IP is a widely used technique, it may not detect weak or transient interactions, which could explain the failure to validate some predictions. Suggesting alternative validation methods such as FRET or mass spectrometry could further substantiate the results. On the other hand, AlphaFold2 predictions are not infallible and may generate false positives, particularly when dealing with structurally plausible but biologically irrelevant interactions. By acknowledging both the potential limitations of co-IP and the possibility of false positives from AlphaFold2, the authors can provide a more balanced interpretation of their findings.

      We appreciate the reviewer's point of view. We have used the co-IP method to detect interactions in this study. However, as the reviewer pointed out, it is likely that weak and transient interactions may not be detected. We plan to add a note on the detection limits of the co-IP method and the possibility that AlphaFold2 method produces false positives in the revised manuscript.

      (3) In line 143, the authors state that "This approach identified 13 pairs; seven of these were already known to form complexes, confirming the effectiveness of AlphaFold2 in predicting complex formations (Table 2). The highest pcScore pair was the Zuc homodimer, possibly because AlphaFold2 had learned from Zuc homodimer's crystal structure registered in the database." While the authors mentioned the presence of the Zuc homodimer's crystal structure, they do not provide a systematic bioinformatics analysis to evaluate pairwise sequence identity or check for the presence of existing structures for all the proteins or protein pairs (or their homologs) in databases such as the Protein Data Bank (PDB) or Swiss-Model. Conducting such an analysis is critical, as it significantly impacts the novelty and reliability of AlphaFold2 predictions. For instance, high sequence identity between the query proteins could lead to high-scoring models for biologically irrelevant interactions. Including this information would strengthen the conclusions regarding the accuracy and utility of the predictions.

      We appreciate the reviewer's critical point. The AlphaFold2 method generates a high confidence score when the 3D structure of the protein of interest, or of proteins with very similar sequences, is solved. We will investigate whether the proteins used in this study are included in the 3D structure database and add the information to the revised manuscript.

      (4) While the manuscript successfully identifies novel protein interactions, the broader biological significance of these interactions remains underexplored. The manuscript could benefit from elaborating on how these findings may contribute to understanding the piRNA pathway and its implications on germline development, transposon repression, and oogenesis.

      We plan to add a discussion of the potential biological significance of the novel protein-protein interactions presented here to the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors use AlphaFold2 to identify potential binding partners of nuage localizing proteins.

      Strengths:

      The main strength of the paper is that the authors experimentally verify a subset of the predicted interactions.

      Many studies have been performed to predict protein-protein interactions in various subsets of proteins. The interesting story here is that the authors (i) focus on an organelle that contains quite some intrinsically disordered proteins and (ii) experimentally verify some (but not all) predictions.

      Weaknesses:

      Identification of pairwise interactions is only a first step towards understanding complex interactions. It is pretty clear from the predictions that some (but certainly not all) of the pairs could be used to build larger complexes. AlphaFold easily handles proteins of up to 4,000-5,000 residues, so this should be possible. I suggest that the authors do this to provide more biological insights.

      We thank the reviewer for the kind suggestions. Although only dimer structure predictions were made in this manuscript, a protein that is predicted to interact with two other proteins may form a complex of three proteins. We plan to add such trimer predictions to the revised manuscript.

      Another weakness is the use of a non-standard name for "ranking confidence" - the author calls it the pcScore - while the name used in AlphaFold (and many other publications) is ranking confidence.

      We take the reviewer’s point and will revise the text accordingly.
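
      For readers unfamiliar with the term, the ranking confidence reported by AlphaFold-Multimer is a weighted combination of the interface pTM (ipTM) and pTM scores. The short Python sketch below illustrates that weighting; the function name `ranking_confidence` and the example scores are assumptions for illustration and are not taken from the manuscript.

```python
def ranking_confidence(iptm: float, ptm: float) -> float:
    """AlphaFold-Multimer style model confidence: 0.8 * ipTM + 0.2 * pTM.

    Both inputs are expected to lie in [0, 1].
    """
    for name, value in (("iptm", iptm), ("ptm", ptm)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return 0.8 * iptm + 0.2 * ptm


if __name__ == "__main__":
    # Hypothetical scores for a predicted dimer.
    print(ranking_confidence(iptm=0.9, ptm=0.5))
```

      Because the interface term dominates, a pair can receive a low ranking confidence even when each monomer is folded confidently.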

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Through a series of CRISPR-Cas9 screens, the GPX4 antioxidant pathway was identified as a critical suppressor of cold-induced cell death in hibernator-derived cells. Hamster BHK-21 cells exposed to repeated cold and rewarming cycles revealed five genes (Gpx4, Eefsec, Pstk, Secisbp2, and Sepsecs) as critical components of the GPX4 pathway, which protects against cold-induced ferroptosis. A second screen with continuous cold exposure confirmed the essential role of GPX4 in prolonged cold tolerance. GPX4 knockout lines exhibited complete cell death within four days of cold exposure, and pharmacological inhibition of GPX4 further increased cell death, underscoring the necessity of GPX4's catalytic activity in cold conditions.

      An additional CRISPR screen in human cold-sensitive K562 cells identified 176 genes for cold survival. The GPX4 pathway was found to confer significant resistance to cold in hibernators and human cells, with GPX4 loss significantly increasing cold-induced cell death.

      Comparing hamster and human GPX4, overexpression of GPX4 in human K562 cells, whether hamster or human GPX4, dramatically improved cold tolerance, while catalytically dead mutants showed no such effect. These findings suggest that GPX4 abundance is a key limiting factor for cold tolerance in human cells, and primary cell types show strong sensitivity to GPX4 loss, highlighting that differences in cold tolerance across species may be due to varying GPX4-mediated protection.

      Strengths:

      (1) Innovative Approach: The study employs a series of unbiased genome-wide CRISPR-Cas9 screens in both hibernator- and non-hibernator-derived cells to investigate the mechanisms controlling cellular cold tolerance. Notably, this is the first genome-scale CRISPR-Cas9 screen conducted in cells derived from a hibernator, the Syrian hamster.

      (2) Identification of the GPX4 Pathway: Identifying glutathione peroxidase 4 (GPX4) as a critical suppressor of cold-induced cell death significantly contributes to the field. Recently, GPX4 was also reported as a potent regulator of cold tolerance through overexpression screening (Sone et al.) in hamsters, which further supports this finding.

      (3) Improved Cold Viability Assessment: The study identifies an important technical artifact in using trypan blue to assess cell viability following cold exposure. It reveals that cells stained immediately after cold exposure retain the dye, inaccurately indicating cell death. By introducing a brief rewarming period before viability assessment, the authors significantly improve the accuracy of detecting cold-induced cell death. This refinement in methodology ensures more reliable results and sets a new standard for future research on cold stress in cells.

      Weaknesses:

      (1) Mechanisms Regulating GPX4 Levels: While the study highlights GPX4 levels as a major determinant of cellular cold tolerance, it does not discuss how these levels are regulated or why they differ between hibernators and non-hibernators. This omission leaves an important aspect of GPX4's role in cold tolerance unexplored.

      (2) Generalizability Across Species: Although the study demonstrates the role of GPX4 in several mammalian species, it does not investigate whether this mechanism extends to other vertebrates (e.g., fish and amphibians) that also face cold challenges. This limitation could restrict the broader evolutionary claims made by the study.

      (3) Variability in Cold Sensitivity Across Human Cell Lines: The study observes significant variability in cold tolerance among different human cell lines but does not explain these differences clearly. This leaves a key aspect of human cell cold sensitivity insufficiently addressed.

      We thank the reviewer for the positive evaluation and thoughtful comments on the manuscript. We acknowledge that our study does not delve into the mechanisms regulating GPX4 levels, including differences between hibernators and non-hibernators, differences between cell types, or the possibility that GPX4 levels are dynamically regulated by environmental conditions. We consider these as interesting open questions that could be addressed in future studies.

      While our study focused entirely on mammalian species, we agree that examining cold tolerance mechanisms across a broader range of vertebrates, including fish and amphibians, could enhance our evolutionary perspective. Interestingly, previous work has indicated that C. elegans adapts to cold temperatures through ferritin-mediated Fe2+ detoxification. This suggests that cold induces Fe2+-mediated toxicity in C. elegans as well as in mammalian cells, but that the mechanisms through which distantly related species counteract cold-mediated cell death may vary.

      Finally, we agree that the variability in cold sensitivity across human cell lines could be further explored, and we will strongly consider conducting follow-up experiments to examine the extent to which this variability is driven by levels of GPX4.

      We are grateful for these insightful comments, as they highlight important avenues for future research. Addressing these questions will enable a more comprehensive understanding of GPX4's role in cold tolerance and its evolutionary significance across diverse organisms.

      Reviewer #2 (Public review):

      Summary:

      Lam et al. present a very intriguing whole-genome CRISPR screen in Syrian hamster cells as well as K562 cells to identify key genes involved in hypothermia-rewarming tolerance. Survival screens were performed by exposing cells to 4°C in a cooled CO2 incubator followed by a rewarming period of 30 minutes prior to survival analysis. In this paradigm, Syrian hamster-derived cell lines exhibit more robust survival than human cell lines (BHK-21 and HaK vs HT1080, HeLa, RPE1, and K562). A genome-wide Syrian hamster CRISPR library was created targeting all annotated genes with 10 guides/gene. LV transduction of the library was performed in BHK-21 cells and the survival screen procedures involved 3 cycles of 4°C cold exposure for 4 days followed by 2 days of rewarming.

      When compared to controls maintained at 37°C, 9 genes were required for BHK-21 survival of cold cycling conditions and 5 of these 9 are known components of the GPX4 antioxidant pathway. GPX4 KO BHK-21 cells had reduced cell growth at 37°C and profoundly worse cold tolerance, which could be rescued by GPX4 expression. GPX4 inhibitors also reduced survival in cold. CRISPR KO screens and GPX4 KO in K562 cells revealed comparable results (though intriguingly glutathione biosynthesis genes were more critical to K562 cells than BHK-21 cells). Human or Syrian hamster GPX4 overexpression improved cold tolerance.

      Strengths:

      This is a very nicely written paper that clearly communicates in figures and text complicated experimental manipulations and in vitro genetic screening and cell survival data. The focus on GPX4 is interesting and relatively novel. The converging pharmacologic, loss-of-function, and gain-of-function experiments are also a strength.

      Weaknesses:

      A recently published article (Reference 43, Sone et al.) also independently explored the role of GPX4 in Syrian hamster cold tolerance through gain-of-function screening. Further exploration of the GPX4 species-specific mechanisms would be of great interest, but this is considered a minor weakness given the already very comprehensive and compelling data presented.

      We greatly appreciate the reviewer’s compliments and thoughtful comments on our manuscript. We agree with the reviewer that our approach (dual unbiased genome-scale screens in human and hamster cells) and the recent investigation by Sone et al (gain-of-function screening involving the insertion of hamster cDNA into human cells) mutually strengthen the importance of GPX4 in cold tolerance across cell types and species.

      Reviewer #3 (Public review):

      Summary:

      This work aims to address a fundamental biological question: how do mammalian cells achieve/lose tolerance to cold exposure? The authors first tried to establish an experimental system for cell cold exposure and evaluation of cell death and then performed genome-scale CRISPR-Cas9 screening on immortalized cell lines from Syrian Hamster (BHK-21) and human (K562) for key genes that are associated with cell survival during prolonged cold exposure. From these screenings, they focused on glutathione peroxidase 4 (GPX4). Using genetic modifications or pharmacological interventions, and multiple cell models including primary cells from various mammalian species, they showed that GPX4 proteins are likely to retain their activities at 4°C, functioning to prevent cold-induced cell ferroptosis.

      Strengths:

      (1) This paper is neatly written and hence easy to follow.

      (2) Experiments are well designed.

      (3) The data showing the overall good cell survival after a prolonged cold exposure or repeated cold-warm cycles are helpful to show the advantages of the experimental instruments and methods the authors used, and hence the validity of their results.

      (4) The CRISPR-Cas9 screening is a great attempt.

      (5) Multiple cell types from hibernating mammals (cold tolerant) and cold-intolerant species are used to test their findings.

      (6) Although some may argue that other labs have published works with different approaches that have pointed out the importance of GPX4 and ferroptosis in hamster cell survival from anoxia-reoxygenation or cold exposure models, hence hurting the novelty of this work, this reviewer thinks that it is highly valuable to have independent research groups and different methods/systems to validate an important concept.

      Weaknesses:

      (1) Only cell death was robustly surveyed; though cell proliferation was evaluated too in some experiments, other cellular functions, such as mitochondrial ATP production vs. glycolysis, and the extent of lipid peroxidation, could have been measured to reflect cellular physiology.

      Validations on complex tissues or in vivo systems would have further strengthened the work and its impact.

      CRISPR-Cas9 screening may have technical limitations, as knock-out of some essential genes/pathways may lead to cell lethality during screening, and hence the relevance of these genes/pathways to cell cold tolerance may not be noted. From the data presented in this study, this reviewer thinks that the GPX4 pathway is likely a conserved mechanism for long-term cold survival, but not for cold sensitivity or acute cell death from cold exposure. In line with this speculation, their CRISPR-Cas9 screening revealed genes in the GPX4 pathway from a relatively cold-sensitive human cell line, but the endogenous GPX4 pathway is seemingly operational in this cold-sensitive cell line. Also, these cells are viable after GPX4 knock-out. Dead cells from the acute cold exposure phase may have detached, or their genomic DNA may have been severely damaged by the time of sample collection, hence not giving any meaningful sequencing reads. Crippling other factors/pathways such as FOXO1 (PMID: 38570500) or 5-aminolevulinic acid (ALA) metabolism (PMID: 35401816) has been shown to severely aggravate cold-induced cell death, including TUNEL-revealed DNA damage, within a much shorter time scale, whilst loss-of-function knockouts of FOXO1 or ALA Synthase 1 (ALAS1) are usually cell lethal. Thus, they and other possible essential genes may not be screenable with the current experimental protocol. These important points need to be taken into consideration by the authors.

      We thank the reviewer for highlighting the novelty of using genome-scale CRISPR-Cas9 screens and the validation of GPX4 function across cell types and mammalian species. 

      We acknowledge that our study primarily focused on measuring cell death using Trypan Blue dye exclusion. To validate the Trypan Blue assay, cell survival was orthogonally measured using LDH release assays (Fig. 1g). The proliferation potential of putatively live cells was assessed by counting the increase in live cells following 24 h at 37°C (Fig. 1b). Prompted by your question, we will add additional data to the final version of the manuscript in which we show that following 1 day at 4°C, K562 cells rapidly restarted their cell cycle and doubled in number every 21 hours (Author response image 1). This rate is indistinguishable from the replication rate of cells that were not previously exposed to 4°C, suggesting that the cells following cold exposure are both alive and functionally capable of replicating.

      Author response image 1.

      Population doubling time of K562 cells cultured at 37°C (pink) and cells that are rewarmed to 37°C following 1 day of 4°C exposure
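
      For reference, a population doubling time can be estimated from two cell counts under the assumption of exponential growth, using elapsed time × ln 2 / ln(N_end / N_start). The Python sketch below is a generic illustration of that formula, not the authors' analysis; the function name `doubling_time_hours` and the example counts are hypothetical.

```python
import math


def doubling_time_hours(n_start: float, n_end: float, elapsed_hours: float) -> float:
    """Estimate the population doubling time, assuming exponential growth.

    n_start and n_end are live-cell counts at the beginning and end of
    the interval; elapsed_hours is the length of that interval.
    """
    if n_start <= 0 or elapsed_hours <= 0:
        raise ValueError("n_start and elapsed_hours must be positive")
    if n_end <= n_start:
        raise ValueError("n_end must exceed n_start for a growing population")
    return elapsed_hours * math.log(2) / math.log(n_end / n_start)


if __name__ == "__main__":
    # Hypothetical counts: a culture that quadruples over 42 hours.
    print(doubling_time_hours(1e5, 4e5, 42.0))
```

      Comparing this estimate between previously cooled and never-cooled cultures is one simple way to ask whether replication rate has recovered after rewarming.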

      We agree that assessing additional cellular functions, such as mitochondrial ATP production, glycolysis, lipid metabolism and peroxidation could provide a more comprehensive understanding of cellular physiology under cold stress and would be valuable future studies. Similarly, we appreciate the suggestion to validate our findings in complex tissues or in vivo models. We recognize that such validation could strengthen the implications of our study and enhance its translational potential; however, due to their complexity, we believe that these additional studies are beyond the scope of our current study.

      We agree with the reviewer that CRISPR-Cas9 screens have limitations. For example, our screen was designed to identify genes that are preferentially required for cellular fitness at 4°C versus 37°C. There are many genes that are required for cellular survival at 4°C as well as 37°C that are not discussed (Table S2, S5). Also, given that the screen is designed to disrupt a single gene per cell, genes that have redundant functions in cold tolerance will likely be missed. Given the reviewer’s questions, we will expand the discussion of the paper to highlight limitations of the screen.

      We apologize for any lack of clarity about the methods we employed during the screen and will expand the methods section to provide further details. For example, for the BHK-21 screen we eliminated dead cells by sequencing cells that reattached after rewarming to 37°C for either 30 minutes (15-day cold exposure screen) or 24 hours (4°C cycling screen). Indeed, at the point of cell collection for both BHK-21 and K562 screens, the fraction of live cells was greater than 92% and 95%, respectively. We respectfully disagree with the reviewer that our screens would miss genes that affect acute cold tolerance. Any cells that died either early or late during cold exposure would not have been sequenced, and thus the sgRNAs targeting a specific gene in those cells would appear depleted, regardless of whether these cells died early/acutely or later during cold exposure.
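
      The depletion logic described above can be illustrated with a minimal Python sketch that compares normalized sgRNA read counts between a control and a cold-exposed population. This is a generic illustration under stated assumptions, not the pipeline used in the study: the function name `log2_fold_changes`, the guide names, and the read counts are all hypothetical.

```python
import math


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Per-guide log2 fold change of treated vs control read counts.

    Counts are first scaled to reads per million so that sequencing
    depth does not drive the comparison. A pseudocount avoids taking
    the log of zero for guides that are completely lost.
    """
    control_total = sum(control.values())
    treated_total = sum(treated.values())
    result = {}
    for guide, count in control.items():
        c_rpm = count / control_total * 1e6 + pseudocount
        t_rpm = treated.get(guide, 0) / treated_total * 1e6 + pseudocount
        result[guide] = math.log2(t_rpm / c_rpm)
    return result


if __name__ == "__main__":
    # Hypothetical read counts. Only surviving cells are sequenced, so a
    # guide whose target is needed in the cold loses reads whether its
    # cells died on day 1 or on day 15.
    control_counts = {"sgGeneA_1": 500, "sgNonTargeting_1": 500}
    cold_counts = {"sgGeneA_1": 50, "sgNonTargeting_1": 950}
    for guide, lfc in log2_fold_changes(control_counts, cold_counts).items():
        print(guide, round(lfc, 2))
```

      Because the readout is taken only from surviving cells at the end of the screen, the timing of cell death does not enter the calculation; only the final representation of each guide does.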

      We thank the reviewer for pointing out two additional highly relevant studies. Interestingly, the genes implicated in cold tolerance in these studies, FOXO1 and ALAS1, did not appear essential for survival at 37°C or 4°C in BHK-21 or K562 cells. There are several possibilities that could explain this finding: 1) our screen may not have successfully knocked out these genes, 2) other proteins may have compensated for their loss, or 3) these pathways may regulate cold tolerance in some but not all cell types. We apologize that in the current version of the manuscript we did not reflect on these recent studies. We will expand our discussion to include their findings.

      Once again, we are grateful for the reviewer’s insights, which have highlighted key areas for further exploration as well as pointed to specific ways to improve our manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This study addresses a question in sensory ethology and active sensing in particular. It links the production of a specific signal - electrosensory chirps - to various contexts and conditions to argue that the main function is to enhance conspecific localization rather than communication as previously believed. The study provides a lot of valuable data, but the methods section is incomplete making it difficult to evaluate the claims.

      We have now added to the Methods a new paragraph describing in more detail the analysis done to prepare the data used in Figure 7. The figure itself has been substantially changed: we now show EOD fields and electric images using voltage instead of current, and we have better illustrated the comparisons between chirps and beats using statistical analysis.

      Finally, we are equally grateful to all Reviewers for the constructive criticism and for the time spent in evaluating our manuscript. It certainly helped to improve both the quality of the data presented and the readability of the text.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate the role of chirping in a species of weakly electric fish. They subject the fish to various scenarios and correlate the production of chirps with many different factors. They find major correlations between the background beat signals (continuously present during any social interactions) or some aspects of social and environmental conditions with the propensity to produce different types of chirps. By analyzing more specifically different aspects of these correlations they conclude that chirping patterns are related to navigation purposes and the need to localize the source of the beat signal (i.e. the location of the conspecific).

      The study provides a wealth of interesting observations of behavior and much of this data constitutes a useful dataset to document the patterns of social interactions in these fish. Some data, in particular the high propensity to chirp in cluttered environments, raises interesting questions. Their main hypothesis is a useful addition to the debate on the function of these chirps and is worth being considered and explored further.

      After the initial reviewers' comments, the authors performed a welcome revision of the way the results are presented. Overall the study has been improved by the revision. However, one piece of new data is perplexing to me. The new figure 7 presents the results of a model analysis of the strength of the EI caused by a second fish to localize when the focal fish is chirping. From my understanding of this type of model, EOD frequency is not a parameter in the model since it evaluates the strength of the field at a given point in time. Therefore the only thing that matters is the phase relationship and strength of the EOD. Assuming that the second fish's EOD is kept constant and the phase relationship is also the same, the only difference during a chirp that could affect the result of the calculation is the potential decrease in EOD amplitude during the chirp. It is indeed logical that if the focal fish decreased its EOD amplitude the target fish's EOD becomes relatively stronger. Where things are harder to understand is why the different types of chirps (e.g. type 1 vs type 2) lead to the same increase in signal even though they are typically associated with different levels of amplitude modulations. Also, it is hard to imagine that a type 2 chirp that is barely associated with any decrease in EOD amplitude (0-10% maybe), would cause a doubling of the EI strength. There might be something I don't understand but the authors should provide a lot more details on how this result is obtained and convince us that it makes sense.

      We hope we have now resolved the Reviewer’s concerns by applying major edits to Figure 7. We now use voltage - not current - to quantify the impact of chirps on electric images. The effect of chirps is here estimated using the integral of the beat AM, as a broad measure of the potential effects chirping may have on electroreceptors. We underline in the text that this analysis does not represent proof for any type of processing occurring in the fish brain, but we only express in hypothetical terms that - based on the beat perturbations measured - additional spatial information may potentially be available in electric images, as a consequence of chirping. Whether the fish uses this information, or not, needs to be assessed through electrophysiology in future studies.

      Finally, the reviewer is concerned about this sentence in the rebuttal - "The methods section has been edited to clarify the approach (not yet)". This section is unfinished, which suggests that it is difficult to explain the modeling results from a logical point of view. Thus the reviewer's major concern from the previous review remains unresolved. To summarize, the model calculates field strengths at an instant in time and integrates over time with a 500 ms window. This window is 10 times longer than the small chirps, while the longer chirps cover a much larger proportion of the window. Yet, the small chirps have a bigger impact on discriminability than the longer chirps. The authors should attempt to explain this seemingly contradictory result. This remains a major issue because this analysis was the most direct evidence that chirping could impact localization accuracy.

      We added a new Methods section describing the new figure, which we hope explains more clearly how the effect of chirps is calculated. Since most P-units are affected by the cyclic AMs of the beat, any change in the electric image caused by a chirp will result in changes in transcutaneous voltage, i.e. the voltage measurable at the receptor level. Overall, this added analysis is not a central point of the manuscript; it is part of an attempt to hint at the physiological mechanisms involved, which cannot be explored in the current study. We do not mean to propose that these estimates represent alternatives to electrophysiological recordings, but rather theoretical evidence that could support this type of investigation.
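
      To make the notion of "the integral of the beat AM" concrete, the Python sketch below computes the envelope of two summed sinusoids and integrates it over a fixed time window. It is a generic textbook illustration and not the authors' boundary element model: the function names, the amplitudes, the frequency difference, and the use of a 0.5 s window (taken from the reviewer's description) are assumptions for this example.

```python
import math


def beat_envelope(t, a1, a2, df):
    """Envelope at time t (s) of two summed sinusoids.

    a1 and a2 are the two signal amplitudes and df is their frequency
    difference in Hz, which sets the beat rate.
    """
    return math.sqrt(a1 * a1 + a2 * a2
                     + 2.0 * a1 * a2 * math.cos(2.0 * math.pi * df * t))


def integrate_envelope(a1, a2, df, window_s=0.5, steps=5000):
    """Trapezoidal integral of the beat envelope over [0, window_s]."""
    dt = window_s / steps
    total = 0.0
    for k in range(steps):
        t0 = k * dt
        t1 = t0 + dt
        total += 0.5 * (beat_envelope(t0, a1, a2, df)
                        + beat_envelope(t1, a1, a2, df)) * dt
    return total


if __name__ == "__main__":
    # Hypothetical values: a strong own signal, a weak neighbor, 10 Hz beat.
    print(integrate_envelope(a1=1.0, a2=0.1, df=10.0))
```

      In such a sketch, a chirp would appear as a transient change in `df` (and possibly `a1`) inside the window, so how much the integral changes depends on both the size and the duration of that perturbation relative to the window length.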

      Reviewer #2 (Public Review):

      Studying Apteronotus leptorhynchus (the weakly electric brown ghost knifefish), the authors provide evidence that 'chirps' (brief modulations in the frequency and amplitude of the ongoing wave-like electric signal) function in active sensing (specifically homeoactive sensing) rather than communication. Chirping is a behavior that has been well studied, including numerous studies on the sensory coding of chirps and the neural mechanisms for chirp generation. Chirps are largely thought to function in communication behavior, so this alternative function is a very exciting possibility that should have a great impact on the field.

      The authors provide convincing evidence that chirps may function in homeoactive sensing. In particular, the evidence showing increased chirping in more cluttered environments and a relationship between chirping and movement are especially strong and suggestive. Their evidence arguing against a role for chirps in communication is not as strong. However, based on an extensive review of the literature, the authors conclude, I think fairly, that the evidence arguing in favor of a communication function is limited and inconclusive. Thus, the real strength of this study is not that it conclusively refutes the communication hypothesis, but that it calls this hypothesis into question while also providing compelling evidence in favor of an alternative function.

      In summary, although the evidence against a role for chirps in communication is not as strong as the evidence for a role in active sensing, this study presents very interesting data that is sure to stimulate discussion and follow-up studies. The authors acknowledge that chirps could function as both a communication and homeoactive sensing signal, and the language arguing against a communication function is appropriately measured. A given electrical behavior could serve both communication and homeoactive sensing. I suspect this is quite common in electric fish (not just in gymnotiforms such as the species studied here, but also in the distantly related mormyrids), and perhaps in other actively sensing species such as echolocating animals.

      We are grateful to the Reviewer for the kind assessment.

      Reviewer #3 (Public Review):

      Summary:

      This important paper provides the best-to-date characterization of chirping in weakly electric fish using a large number of variables. These include environment (free vs divided fish, with or without clutter), breeding state, gender, intruder vs resident, social status, locomotion state and social and environmental experience, without and with playback experiments. It applies state-of-the-art methods for reducing the dimensionality of the data and finding patterns of correlation between different kinds of variables (factor analysis, K-means). The strength of the evidence, collated from a large number of trials with many controls, leads to the conclusion that the traditionally assumed communication function of chirps may be secondary to its role in environmental assessment and exploration that takes social context into account. Based on their extensive analyses, the authors suggest that chirps are mainly used as probes that help detect beats caused by other fish as well as objects.

      Strengths:

      The work is based on completely novel recordings using interaction chambers. The amount of new data and associated analyses is simply staggering, and yet, well organized in presentation. The study further evaluates the electric field strength around a fish (via modelling with the boundary element method) and how its decay parallels the chirp rate, thereby relating the above variables to electric field geometry. The BEM modelling also convincingly predicts how the electric image of a receiver conspecific on a sending fish is enhanced by a chirp.

      The main conclusions are that the lack of any significant behavioural correlates for chirping, and the lack of temporal patterning in chirp time series, cast doubt on a primary communication goal for most chirps. Rather, the key determinants of chirping are the difference in frequency between two interacting conspecifics as well as individual subjects' environmental and social experience. The paper concludes that there is a lack of evidence for stereotyped temporal patterning of chirp time series, as well as of sender-receiver chirp transitions beyond the known increase in chirp frequency during an interaction. The authors carefully submit that the new putative echolocation function of chirps is not mutually exclusive with a possible communication function.

      These conclusions by themselves will be very useful to the field. They will also allow scientists working on other "communication" systems to perhaps reconsider and expand the goals of the probes used in those senses. A lot of data are summarized in this paper, with thorough referencing to past work.

      The alternative hypotheses that arise from the work are that chirps are mainly used as environmental probes for better beat detection and processing and object localization, and in this sense are self-directed signals. This led to their prediction that environmental complexity ("clutter") should increase chirp rate, which in fact was revealed by their new experiments. The authors also argue that waveform EODs have less power across high spatial frequencies compared to pulse-type fish, with a resulting relatively impoverished power of resolution. Chirping in wave-type fish could temporarily compensate for the lower frequency resolution while still being able to resolve EOD perturbations with a good temporal definition (which pulse-type fish lack due to low pulse rates).

      The authors also advance the interesting idea that the sinusoidal frequency modulations caused by chirps are the electric fish's solution to the minute (and undetectable by neural wetware) echo-delays available to it, due to the propagation of electric fields at the speed of light in water. The paper provides a number of experimental avenues to pursue in order to validate the non-communication role of chirps.

      We are grateful to the Reviewer for the kind assessment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors have tried to dissect the functions of Proteasome activator 28γ (PA28γ) which is known to activate proteasomal function in an ATP-independent manner. Although there are multiple works that have highlighted the role of this protein in tumours, this study specifically tried to develop a correlation with Complement C1q binding protein (C1QBP) that is associated with immune response and energy homeostasis.

      Strengths:

      The observations of the authors hint that beyond PA28y's association with the proteasome, it might also stabilize certain proteins such as C1QBP which influences energy metabolism.

      Weaknesses:

      The strength of the work also becomes its main drawback. That is, how PA28y stabilizes C1QBP or how C1QBP elicits its pro-tumourigenic role under PA28y OE.

      In most of the experiments, the authors have been dependent on the parallel changes in the expression of both the proteins to justify their stabilizing interaction. However, this approach is indirect at best and does not confirm the direct stabilizing effect of this interaction. IP experiments do not indicate direct interaction and have some quality issues. The upregulation of C1QBP might be indirect at best. It is quite possible that PA28y might be degrading some secondary protein/complex that is responsible for C1QBP expression. Since the core idea of the work is PA28y direct interaction with C1QBP stabilizing it, the same should be demonstrated in a more convincing manner.

      Thank you very much for the important comments. Using AlphaFold 3, we found that interaction between PA28γ and C1QBP may depend on amino acids 1-167 and 1-213 (Revised Appendix Figure 1D-H), which was confirmed by our immunoprecipitation (Revised Figure 1I). In the future, we will use nuclear magnetic resonance spectroscopy to analyze protein-protein interaction between PA28γ and C1QBP and demonstrate it by GST pull down in vitro experiments.

      In all of the assays, C1QBP has been detected as doublet. However, the expression pattern of the two bands varies depending on the experiment. In some cases, the upper band is intensely stained and in some the lower bands. Do C1QBP isoforms exist and are they differentially regulated depending on experiment conditions/tissue types?

      Thank you very much for the important comments. We have rechecked the experimental results showing two bands; the doublet may have been caused by the polyclonal C1QBP antibody (Abcam: ab101267). We therefore repeated the experiment with a monoclonal C1QBP antibody (Cell Signaling Technology: #6502) and replaced the corresponding images in the revised figures (Revised Figure 1E and Revised Appendix Figure 3D).

      Problems with the background of the work: Line 76. This statement is far-fetched. There are presently a number of works of literature that have dealt with the metabolic programming of OSCC including identification of specific metabolites. Moreover, beyond the estimation of OCR, the authors have not conducted any experiments related to metabolism. In the Introduction, the significance of this study and how it will extend our understanding of OSCC needs to be elaborated.

      Thank you very much for the important comments. Based on your suggestion, we have revised the content and updated the references (“Introduction”, Paragraph 2, Line 13-17 and Paragraph 4, Line 5-8). In addition, we plan to conduct experiments to investigate the regulation of metabolism by PA28γ and C1QBP and update our data in the future.

      The modified content is as follows:

      “Current research on metabolic reprogramming in OSCC primarily focused on mechanism of glycolytic metabolism and metabolic shift from glycolysis to oxidative phosphorylation (OXPHOS) of oral squamous cell carcinoma, which lays the groundwork for novel therapeutic interventions to counteract OSCC (Chen et al., 2024; Zhang et al., 2020).”

      “It is the first study to describe the undiscovered role of PA28γ in promoting the malignant progression of OSCC by elevating mitochondrial function, providing new clinical insights for the treatment of OSCC.”

      Reviewer #2 (Public review):

      Summary:

      The authors tried to determine how PA28g functions in oral squamous cell carcinoma (OSCC) cells. They hypothesized it may act through metabolic reprogramming in the mitochondria.

      Strengths:

      They found that the genes of PA28g and C1QBP are in an overlapping interaction network after an analysis of a genome database. They also found that the two proteins interact in coimmunoprecipitation and pull-down assays using the lysate from OSCC cells with or without expression of the exogenous genes. They used truncated C1QBP proteins to map the interaction site to the N-terminal 167 residues of C1QBP protein. They observed the levels of the two proteins are positively correlated in the cells. They provided evidence for the colocalization of the two proteins in the mitochondria, the effect on mitochondrial form and function in vitro and in vivo OSCC models, and the correlation of the protein expression with the prognosis of cancer patients.

      Weaknesses:

      Many data sets are shown in figures that cannot be understood without more descriptions, either in the text or the legend, e.g., Figure 1A. Similarly, many abbreviations are not defined.

      Thank you very much for the important comments. We have revised the descriptions in the legend to make it easier to understand.

      Some of the pull-down and coimmunoprecipitation data do not support the conclusion about the PA28g-C1QBP interaction. For example, in Appendix Figure 1B the Flag-C1QBP was detected in the Myc beads pull-down when the protein was expressed in the 293T cells without the Myc-PA28g, suggesting that the pull-down was not due to the interaction of the C1QBP and PA28g proteins. In Appendix Figure 1C, assume the SFB stands for a biotin tag, then the SFB-PA28g should be detected in the cells expressing this protein after pull-down by streptavidin; however, it was not. The Western blot data in Figure 1E and many other figures must be quantified before any conclusions about the levels of proteins can be drawn.

      Thank you very much for the meticulous review. We have rechecked the experimental results and found that we had made a mistake in labeling the image. We have corrected it in the revised figure (Revised Appendix Figure 1B, C). In addition, we have quantified the gray values of the western blot bands with ImageJ software to confirm the results.

      The immunoprecipitation method is flawed as it is described. The antigen (PA28g or C1QBP) should bind to the respective antibody that in turn should bind to Protein G beads. The resulting immunocomplex should end up in the pellet fraction after centrifugation and be analyzed further by Western blot for coprecipitates. However, the method in the Appendix states that the supernatant was used for the Western blot.

      Thank you very much for the careful review. We have corrected it in the revised appendix file (“Supplemental Materials and Methods”, Part “Immunoprecipitation assay”, Line 4-6).

      The modified content is as follows:

      The sample was shaken on a horizontal shaker for 4 h, after which the deposit was collected for western blotting.

      To conclude that PA28g stabilizes C1QBP through their physical interaction in the cells, one must show whether a protease inhibitor can substitute for PA28g and prevent C1QBP degradation, and show whether a mutation that disrupts the PA28g-C1QBP interaction can reduce the stability of C1QBP. In Figure 1F, all cells expressed Myc-PA28g. Therefore, the conclusion that PA28g prevented C1QBP degradation cannot be reached. Instead, since more Myc-PA28g was detected in the cells expressing Flag-C1QBP compared to the cells not expressing this protein, a conclusion would be that the C1QBP stabilized the PA28g. Figure 1G is a quantification of Western blot data that should be shown.

      Thank you very much for the meticulous review. We have rechecked the experimental results, and we made a mistake in the labeling of the image. Therefore, we have corrected it in the revised figure. Compared with the control group, the presence of Myc-PA28γ significantly increased the expression level of Flag-C1QBP (Revised Figure 1F). Gray value analysis showed that in cells transfected with Myc-PA28γ, the decay rate of Flag-C1QBP was significantly slower than that of the control group (Revised Figure 1G), suggesting that PA28γ can delay the protein degradation of C1QBP and stabilize its protein level. This indicates that an increase in the level of PA28γ protein can significantly enhance the expression level of C1QBP protein, while PA28γ can slow down the degradation rate of C1QBP and improve its stability. In addition, we plan to conduct experiments to investigate the effects of protease inhibitors and PA28γ mutants on the stability of C1QBP and update our data in the future.

      The binding site for PA28g in C1QBP was mapped to the N-terminal 167 residues using truncated proteins. One caveat would be that some truncated proteins did not fold correctly in the absence of the sequence that was removed. Thus, the C-terminal region of the C1QBP with residues 168-283 may still bind to the PA28g in the context of full-length protein. In Figure 1I, more Flag-C1QBP 1-167 was pulled down by Myc-PA28g than the full-length protein or the Flag-C1QBP 1-213. Why?

      Thank you very much for the important comments. Immunoprecipitation is a qualitative experiment. Using AlphaFold 3, we found that interaction between PA28γ and C1QBP may depend on amino acids 1-167 and 1-213 (Revised Appendix Figure 1D-H), which was confirmed by our immunoprecipitation (Revised Figure 1I).

      The interaction site in PA28g for C1QBP was not mapped, which prevents further analysis of the interaction. Also, if the interaction domain can be determined, structural modeling of the complex would be feasible using AlphaFold2 or other programs. Then, it is possible to test point mutations that may disrupt the interaction and if so, the functional effect.

      Thank you very much for the important comments. Based on your suggestion, we have added relevant content to the revised appendix figure. (Revised Appendix Figure 1D-H).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) There are a lot of typos in the figure and manuscript that need to be addressed.

      Thank you very much for the important comments. We have corrected the typos in the revised figure and manuscript.

      (2) Figure 1A: The amount of protein that has been immunoprecipitated is more than the actual amount present in the lysate. The authors should calculate the efficiency of the precipitation to support their results.

      Thank you very much for the important comments. Immunoprecipitation is a qualitative experiment. Moreover, it can enrich specific proteins and their binding partners, increase their concentration in the sample, and thus improve the sensitivity of detection.

      (3) Figure 1D: The relative expression levels of C1QBP look similar in almost all cell lines except for HN12. It seems that the relation of PA28y with C1QBP is more of a cell type-specific effect. It would be better if the blots were quantified, and the differences were statistically determined.

      Thank you very much for the important comments. We have quantified the gray values of the western blot bands with ImageJ software to confirm the results.

      (4) Figure 1E: How do the authors quantify the expression of the protein in absolute terms? From the methods, it is understood that the flag-tagged construct is stably expressed. Under such conditions, how the authors observed the variable expression of the protein should be elaborated.

      Thank you very much for the important comments. We transfected 293T cells with 0 μg, 0.5 μg, 1 μg, and 2 μg of Flag-PA28γ plasmid. After collecting the protein for Western blot, we found that the protein expression of Flag-PA28γ gradually increased. Moreover, C1QBP protein expression increased in step with Flag-PA28γ expression, which indicated a dose-dependent relationship between the two proteins.

      (5) Figures 1F, G: The data does not correlate with the arguments presented in the text. The authors propose that interaction with PA28y increases the stability of C1QBP. However, the experiment lacks appropriate controls. Ideally, the expression of C1QBP should be tested in the presence and absence of PA28y. Moreover, the observed difference in expression between lanes 1-4 and 5-8 for myc-PA28y needs to be explained. Are the samples from different sources with variable PA28y expression? Figure 1G quantification for C1QBP does not correlate with the figure presented in F since the expression of the protein in the first four lanes is undetectable.

      Thank you very much for the meticulous review. We have rechecked the experimental results, and we made a mistake in the labeling of the image. Therefore, we have corrected it in the revised figure. Compared with the control group, the presence of Myc-PA28γ significantly increased the expression level of Flag-C1QBP (Revised Figure 1F). Gray value analysis showed that in cells transfected with Myc-PA28γ, the decay rate of Flag-C1QBP was significantly slower than that of the control group (Revised Figure 1G), suggesting that PA28γ can delay the protein degradation of C1QBP and stabilize its protein level. This indicates that an increase in the level of PA28γ protein can significantly enhance the expression level of C1QBP protein, while PA28γ can slow down the degradation rate of C1QBP and improve its stability. In addition, we plan to conduct experiments to investigate the effects of protease inhibitors and PA28γ mutants on the stability of C1QBP and update our data in the future.

      (6) Appendix Figure 1B: Lane 1 does not express Myc-tagged protein but pull-down has been performed using Myc beads. Then how come flag-C1qbp is getting pulled down in lane 1 if there is no PA28y? This indicates a non-specific interaction of C1qbp with the substrata under the experimental conditions used. Similarly, in Figure 1C SFB-PA28y is expressed in both lanes but is reflected only in lane 2 and not in lane 1 even when pull-down is being performed using SFB beads, again reflecting the non-specificity of the interactions shown through immunoprecipitation.

      Thank you very much for the meticulous review. We have rechecked the experimental results, and we made a mistake in the labeling of the image. Therefore, we have corrected it in the revised figure (Revised Appendix Figure 1B, C).

      (7) Figure 2A: The co-localization of PA28y with C1QBP in mitochondria is not very convincing. The authors are urged to provide high-resolution images for the same along with quantification of co-localization coefficients.

      Thank you very much for the important comments. We plan to obtain high-resolution images of co-localization of PA28γ with C1QBP in mitochondria and add the quantification analysis. We will update our data in the future.

      (8) Figure 2C: Mitochondria dynamics is an interplay of multiple factors. From the images, it seems that PA28y OE elevates mitochondria biogenesis in general which is having an umbrella effect on mitochondria fusion/fission and OCR. Images also do not convincingly indicate changes in mitochondrial length. The role of PA28y on mitochondria dynamics requires further justification. However, the presented data does not underline whether the changes in mitochondria behaviour are a consequence of PA28y and C1QBP interaction. Correlating higher mitochondria respiration with ROS generation is a far-fetched conclusion since, at present, there are multiple reports that suggest otherwise.

      Thank you very much for the important comments. We plan to knock out the interaction regions between PA28γ and C1QBP (like amino acids 1-167 and 1-213) to confirm whether PA28γ affects mitochondrial function through C1QBP and update our data in the future.

      (9) Line 157: The presented data does not substantiate the claims made that Pa28y regulates mitochondrial function through C1QBP.

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Results”, Part “PA28γ and C1QBP colocalize in mitochondria and affect mitochondrial functions”, Paragraph 3, Line 1-2).

      The modified content is as follows:

      “Collectively, these data suggest that PA28γ, which co-localizes with C1QBP in mitochondria, may involve in regulating mitochondrial morphology and function.”

      (10) Line 159: From the past data it is not very clear how PA28y upregulates C1QBP, hence the statement is not well supported. The presented data indicates the presence of a functional association between the two proteins.

      Thank you very much for the important comments. We detected the expression of C1QBP in two PA28γ-overexpressing OSCC cells (UM1 and 4MOSC2) and found an increase in C1QBP expression (Revised Figure 4B). Based on the results of the protein levels of the mitochondrial respiratory chain complex and other mitochondrial functional proteins, we believe that PA28γ regulates mitochondrial function by upregulating C1QBP.

      (11) Figure 4A, B: Given the mitochondrial role of C1QBP, the lesser levels of mitochondrial proteins upon C1QBP silencing are expected. Does it get phenocopied upon PA28y silencing? Similarly, all the subsequent mitochondrial phenotypes in D should be seen in a PA28y-depleted background.

      Thank you very much for the important comments. We plan to detect the mitochondrial protein expressions and OCRs of PA28γ-silenced OSCC cells. We will update our data in the future.

      (12) Line 198: The presented data do indicate a functional association between these two proteins but it does not provide a solid evidence for the same.

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Discussion”, Paragraph 1, Line 9-10).

      The modified content is as follows:

      “Excitingly, we found the evidence that PA28γ interacts with and stabilizes C1QBP.”

      (13) Line 218-220: In this work, the authors highlight the non-degradome role of PA28y and hence, this fact should be treated appropriately in discussion in line with the presented data.

      Thank you very much for the important comments. Based on your suggestion, we have added relevant content to the revised manuscript (“Discussion”, Paragraph 2, Line 16-19).

      The modified content is as follows:

      “In addition, PA28γ can also play as a non-degradome role on tumor angiogenesis. For example, PA28γ can regulate the activation of NF-κB to promote the secretion of IL-6 and CCL2 in OSCC cells, thus promoting the angiogenesis of endothelial cells ( S. Liu et al., 2018).”

      (14) Line 236-240: Although the authors' statement on organ heterogeneity being the cause for getting the contrasting result is justifiable but here there is no direct evidence of PA28y involvement in regulation of OXPHOS and its impact on cellular metabolism (glycolysis, metabolic signalling, etc).

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Discussion”, Paragraph 3, Line 7-9).

      The modified content is as follows:

      “Therefore, PA28γ's regulation of OXPHOS may impact cellular energy metabolism.”

      (15) Line 249: No conclusive data supporting this statement.

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Discussion”, Paragraph 5, Line 1-3).

      The modified content is as follows:

      “Furthermore, our study reveals that PA28γ can regulate C1QBP and influence mitochondrial morphology and function by enhancing the expression of OPA1, MFN1, MFN2 and the mitochondrial respiratory complex.”

      Reviewer #2 (Recommendations for the authors):

      (1) The images shown in Figure 2A need to be quantified before the conclusion about the mitochondrial colocalization of the two proteins can be drawn. In Figure 2B and Appendix Figure 2A, the mitochondrial vacuoles and ridge should be indicated for general readers, and quantification should be performed before the conclusion is drawn.

      Thank you very much for the important comments. We will update our data in the future.

      (2) The OCR data from two cell lines are shown in Figure 2E and F. Which is which? The sentence, "The results indicated ... compared to control cells" in lines 130-132, was confusing; perhaps, it would be clear if "were significantly greater" could be deleted.

      Thank you very much for the important comments. We have re-labeled Figure 2E and F to make them clear (Revised Figure 2E, F). Based on your suggestion, we have deleted the words in the revised manuscript (“Results”, Part “PA28γ and C1QBP colocalize in mitochondria and affect mitochondrial functions”, Paragraph 1, Line 9-11).

      The modified content is as follows:

      “The results indicated significantly higher basal respiration, maximal OCRs and ATP production in PA28γ-overexpressing cells compared to control cells (Fig. 2G-I and Appendix Fig. 2B-D).”

      (3) Figures 4E-H show the migration, invasive, and proliferation capabilities of the cells. Which for which?

      Thank you very much for the important comments. We have re-labeled Figure 4F-H to make them clear (Revised Figure 4F-H).

      (4) In the Discussion, lines 198-201, it states that "C1QBP enhances ... function of OPA1, MNF1, MFN2..." What is the evidence? In lines 222-224, it says that "the binding sites ... may mask the specific ... modification sites". Please justify. In lines 253-254, "fuse" and "fuses" are misleading. Did the authors mean "localize" and "localizes"?

      Thank you very much for the important comments. Based on your suggestion, we have made some modifications to make it more accurate (“Discussion”, Paragraph 1, Line 9-13, Paragraph 2, Line 20-23, and Paragraph 5, Line 3-6).

      The modified content is as follows:

      “Excitingly, we found the evidence that PA28γ interacts with and stabilizes C1QBP. We speculate that aberrantly accumulated C1QBP enhances the function of mitochondrial OXPHOS and leads to the production of additional ATP and ROS by activating the expression and function of OPA1, MNF1, MFN2 and mitochondrial respiratory chain complex proteins.”

      “Our study reveals that PA28γ interacts with C1QBP and stabilizes C1QBP at the protein level. Therefore, we speculate that the binding sites of PA28γ and C1QBP may mask the specific post-translational modification sites of C1QBP and inhibit its degradation.”

      “Mitochondrial fusion, crucial for oxidative metabolism and cell proliferation, is regulated by MFN1, MFN2, and OPA1. The first two fuse with the outer mitochondrial membrane, while the last fuses with the inner mitochondrial membrane (Westermann, 2010).”

      (5) Figure 6 was not referred to in the text. In this figure, PA28g and C1QBP are located in the inner membrane and matrix. Has this been determined? What are the blue ovals that are intermediaries of PA28g/C1QBP and OPA1/MFN1/MFN2?

      Thank you very much for the important comments. According to our immunofluorescence assay (Figure 2A), PA28γ is in both the nucleus and cytoplasm. A recent study has demonstrated that PA28γ can shuttle between the nucleus and cytoplasm, participating in various cellular processes. Furthermore, GeneCards information indicates that the subcellular localization of PA28γ includes the nucleus, cytoplasm and mitochondria (Author response image 1). In this article, we mainly focus on the functions of PA28γ and C1QBP located in the cytoplasm. Therefore, Figure 6 mainly displays PA28γ and C1QBP in the cytoplasm. Based on your suggestion, we have modified the revised figure to make it more accurate (Revised Figure 6).

      Author response image 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript by Poltavski and colleagues describes the discovery of previously unreported enteric neural crest-derived cells (ENCDC) which are marked by Pax2 and originating from the Placodes. By creating multiple conditional mouse mutants, the authors demonstrate these cells are a distinct population from the previously reported ENCDCs which originate from the Vagal neural crest cells and express Wnt1.

      These Pax2-positive ENCDCs are affected due to the loss of both Ret and Ednrb highlighting that these cells are also ultimately part of the canonical processes governing ENCDC and enteric nervous system (ENS) development. The authors also make explant cultures from the mouse GI tract to detect how Ednrb signaling is important for Ret signaling pathways in these cells and rediscovers the interactions between these 2 pathways. One important observation the authors make is that CGRP-positive neurons in the adult distal colon seem to be primarily derived from these Pax2-positive ENCDCs, which are significantly reduced in the Ednrb mutants, thus highlighting the role of Ednrb in maintaining this neuronal type.

      I appreciate the amount of work the authors have put into generating the mouse models to detect these cells, but there isn't any new insight on either the nature of ENCDC development or the role of Ret and Ednrb. Also, there are sophisticated single-cell genomics methods to detect rare cell type/states these days and the authors should either employ some of those themselves in these mouse models or look at extensively publicly available single-cell datasets of the developing wildtype and mutant mouse and human ENS to map out the global transcriptional profile of these cells. A more detailed analysis of these Pax2-positive cells would be really helpful to both the ENS community as well as researchers studying gut motility disorders.

      We would like to point out that the reviewer’s comments in both Public Review and in some cases reiterated in Recommendations for the Authors are rooted in several misunderstandings. The reviewer writes “Pax2-positive ENCDCs”, as if the Pax2 lineage (properly, the Pax2Cre-labeled lineage) of the ENS is a subset of neural crest, and states that “there isn’t any new insight” from our study on ENS development. Our conclusion is quite different, that the Pax2Cre lineage (placode-derived) is distinct from the neural crest-derived cell lineage. The reviewer may not have appreciated that our study establishes a fundamental reinterpretation of the very long-standing dogma that the ENS is derived solely from neural crest. We believe that finding and characterizing the unique contribution of an independent cell lineage to the ENS provides critical new perspectives into ENS development and the etiology of Hirschsprung disease. One feature of the Pax2Cre (placodal) lineage is as the source of CGRP-positive mechanosensory neurons in the colon (as the reviewer mentioned), but this is one feature of the larger conceptual discovery of the existence of a separate lineage contribution to the ENS, not the most important observation in and of itself.

      The reviewer continues by saying that we “rediscovered” the interaction between Ednrb and Ret in ENS development. In our study we show that the two lineages (placode-derived and neural crest-derived) employ Ednrb and Ret signaling in distinct ways. This isn’t simply rediscovery, this is new insight. To the extent that both lineages utilize both signaling axes (albeit with mechanistic differences) is a primary reason why the unique placodal lineage contribution to the ENS remained unsuspected until now. We have revised the text to make these points more clear in our revised manuscript.

      The reviewer also suggests single cell genomic methods, which is addressed below in our response to the reviewer’s first recommendation.

      Reviewer #2 (Public Review):

      This manuscript by Poltavski and colleagues explores the relative contributions of Pax2- and Wnt1-lineage-derived cells in the enteric nervous system (ENS) and how they are each affected by disruptions in Ret and Ednrb signaling. The current understanding of ENS development in mice is that vagal neural crest progenitors derived from a Wnt1+ lineage migrate into and colonize the developing gut. The sacral neural crest was thought to make a small contribution to the hindgut in addition but recent work has questioned that contribution and shown that the ENS is entirely populated by the vagal crest (PMID: 38452824). GDNF-Ret and Endothelin3-Ednrb signaling are both known to be essential for normal ENS development and loss of function mutations are associated with a congenital disorder called Hirschsprung's disease. The transcription factor Pax2 has been studied in CNS and cranial placode development but has not been previously implicated in ENS development. In this work, the authors begin with the unexpected observation that conditional knockout of Ednrb in Pax2-expressing cells causes a similar aganglionosis, growth retardation, and obstructed defecation as conditional knockout of Ednrb in Wnt1-expressing cells. The investigators then use the Pax2 and Wnt1 Cre transgenic lines to lineage-trace ENS derivatives and assess the effects of loss of Ret or Ednrb during embryonic development in these lineages. Finally, they use explants from the corresponding embryos to examine the effects of GDNF on progenitor outgrowth and differentiation.

      Strengths:

      -  The manuscript is overall very well illustrated with high-resolution images and figures. Extensive data are presented.

      -  The identification of Pax2 expression as a lineage marker that distinguishes a subset of cells in the ENS that may be distinct from cells derived from Wnt1+ progenitors is an interesting new observation that challenges the current understanding of ENS development.

      -  Pax2 has not been previously implicated in ENS development - this manuscript does not directly test that role but hints at the possibility.

      -  Interrogation of two distinct signaling pathways involved in ENS development and their relative effects on the two purported lineages.

      The reviewer provided a succinct and accurate summary of our analysis. We correct just the one statement that the ENS is entirely populated by vagal crest. The paper cited by the reviewer (PMID: 38452824) used Wnt1DreERT2 to lineage label the NC population, so of course only looked at neural crest (comparing vagal vs. sacral NC). The advance in our study is to newly document the independent contribution of the placodal lineage.

      Weaknesses:

      -  The major challenge with interpreting this work is the use of two transgenic lines, rather than knock-ins, Wnt1Cre and Pax2-Cre, which are not well characterized in terms of fidelity to native gene expression and recombination efficiency in the ENS. If 100% of cells that express Wnt1 do not express this transgene or if the Pax2 transgene is expressed in cells that do not normally express Pax2, then these observations would have very different interpretations and not support the conclusions made. The two lineages are never compared in the same embryo, which also makes it difficult to assess relative contributions and renders the evidence more circumstantial than definitive.

      We do not agree that the Cre lines being transgenics rather than knock-ins changes the utility of these reagents or the interpretation of the results; there are also potential problems with knock-in alleles. Wnt1Cre has been in use for 25 years as a pan-neural crest lineage cell marker with exceptional efficiency and specificity (including numerous studies of the ENS), so we disagree that it is not well characterized. Pax2Cre of course has not previously been studied in the ENS, but it has been broadly used in other contexts (e.g., craniofacial, kidney). That said, and as noted in our original manuscript, we are aware that an issue of this study is the uniqueness of the recombination domains of the two Cre lines. As we wrote, Wnt1Cre and Pax2Cre cannot be combined into the same embryo because they are both Cre lines, and we do not have a suitable non-Cre recombinase line to substitute for either. Instead, we demonstrate that the two lines recombine in distinct territories of the early embryonic ectoderm, and that the two lineages thus labeled are distinct in marker expression at the initial onset of their delamination, utilize Edn3-Ednrb and GDNF-Ret in distinct ways during their migration to the hindgut, and contribute to different terminal cell fates in the colon. We think this evidence of the distinct nature of the two lineages from start to finish is compelling rather than merely circumstantial.

      -  Visualization of the Pax2-Cre and Wnt-1Cre induced recombination in cross-sections at postnatal ages would help with data interpretation. If there is recombination induced in the mesenchyme, this would particularly alter the interpretation of Ednrb mutant experiments, since that pathway has been shown to alter gut mesenchyme and ECM, which could indirectly alter ENS colonization.

      We have several thoughts about this comment. First, we are uncertain why postnatal analysis would be informative, as ENS colonization occurs (or fails to occur in mutants) during embryogenesis. The reviewer might be thinking of an additional juvenile-stage contribution to the ENS, which is addressed below (responses to Recommendations for the Authors) but, as we discuss there, is not relevant to our analysis. Second, we did examine recombination in the distal hindgut at E12.5 during ENS colonization (Fig. 1f and 1h) and did not see overlap between either Cre recombination domain and Edn3 mRNA expression (which is expressed by the non-ENS mesenchyme). Furthermore, Ednrb is not expressed in the gut mesenchyme during ENS colonization (Fig. 7-figure supplement 1), thus ectopic mesenchymal Cre expression, if any, by either line would have no impact in Cre/Ednrb mutants. Lastly, the reviewer’s idea could have been a plausible hypothesis at the onset of the project, but here we show positive evidence for a different explanation. We do not rigorously exclude the reviewer’s hypothesis, nor other theoretically possible models, but we think we have provided a strong case to support the direct involvement of Ret and Ednrb in ENS progenitors rather than in surrounding non-neural mesenchyme.

      -  No consideration of glia - are these derived from both lineages?

      To properly address this question would require new reagents and analyses that we have not yet initiated. While an interesting question from a developmental biology standpoint, we don’t think that this investigation would change any of the interpretations that we make in the manuscript.

      -  No discussion of how these observations may fit in with recent work that suggests a mesenchymal contribution of enteric neurons (PMID: 38108810).

      The recent paper cited by the reviewer is very explicit in describing this mesenchymal contribution to the ENS as occurring after postnatal day P11. Other than the terminal Hirschsprung phenotype, all of our analysis of cell lineage migration and fate and colonic aganglionosis was conducted at embryonic or early (P9) postnatal stages. We therefore do not see a relation of our work to this study. In light of this paper, however, we do agree that it would be worthwhile in a future study to explore Wnt1Cre and Pax2Cre lineage dynamics in the ENS of older mice.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should reanalyze multiple single-cell RNA-seq datasets available now, to see if these cells are detected in those studies and then look at the global transcriptional profile of these Pax2-positive cells compared to the other vagal neural crest-derived ENCDCs. Some of these datasets can be found here - PMIDs: 33288908, 37585461, and https://www.gutcellatlas.org/.

      We disagree that the datasets from previous studies provide additional insights that are relevant to the current study. It must be appreciated that Wnt1Cre and Pax2Cre are genetic lineage tracers and that migratory ENS progenitor cells labeled with these reagents do not maintain expression of Wnt1 and Pax2 mRNA or protein. The Wnt1 and Pax2 genes are only transiently expressed within their distinct regions of the ectoderm, and their expression turns off as cells delaminate and begin migration. Thus, Pax2Cre-labeled ENS progenitor cells are not Pax2-positive thereafter. The single cell RNA-Seq studies suggested by the reviewer were collected from older embryos and postnatal mice, and do not represent the E10.5-E11.5 period that accounts for genesis of Ret-mediated and Ednrb-mediated Hirschsprung disease pathology. Even the most recent work by Zhou et al (Dev Cell, 2024), which included E10.5 cells, evaluated only neural crest-derived Sox10Cre lineage cells, which do not include the placode-derived Pax2Cre lineage (as we show explicitly in Fig. 2-figure supplement 2). Consequently, it would not be possible to find the “Pax2-positive cells” in these datasets. Performing a new transcriptomic analysis by isolating Pax2Cre-lineage and Wnt1Cre-lineage cells at the appropriate developmental time points could be the basis of a future study, but we think this is beyond the scope of the present paper.

      (2) Even in their current quantification method of using immunofluorescent cells in a microscopic field, the authors count very few cells. The quantification in Figures 2v-2z is only from 4 embryos and is in the hundreds. This leads to misrepresentation of cell numbers and is best reflected in Figure 2x, where Wnt1Cre/Ret GI tracts have 0 Ret +ve cells, which we now know is not true even in ubiquitous Ret null embryos, where Ret null cells are detected as late as E14.5 (PMID 37585461)

      Because of the reviewer’s comment, we recognize that the specific detail about cell numbers wasn’t properly written. We did not count a few hundred cells in total; we counted a few hundred cells per embryo. Exact numbers are provided in the revised figure legend where “cells/embryo” is now explicitly stated. Multiplied by the number of embryos, this means that we evaluated approx. 1000 total cells per genotype and time point in cases where Ret+ and/or GFP+ (lineage+) cells were found. The total absence of such cells in Wnt1Cre/Ret mutants is a rigorous conclusion. Our results neither misrepresent nor contradict the study by Vincent et al (PMID 37585461). Our analyses were performed on gut tissue isolated at E10.5 and E11.5 stages, which is long before Schwann cell precursors (SCPs, the primary focus of the Vincent et al study) colonize the gut (E14.5; Uesaka et al, 2015. PMID: 26156989). Indeed, as the reviewer notes, SCPs migrate into the gut in a Ret-independent manner. Because our analysis addresses a much earlier time point, our focus is on the cranial ectoderm sources of ENS progenitors. We have adjusted the text associated with Fig. 2 to make this clearer.

      (3) There are multiple sections in the manuscript that rehash already known facts, like the whole section about Wnt1 conditional Ret null mice which show failure of migration of ENCDCs. This has been shown multiple times and doesn't add anything to the author's story.

      We think this comment stems from the reviewer’s perception that the Pax2Cre lineage is a subset of neural crest. The Wnt1Cre data (including Ret-deficient and Ednrb-deficient embryos) presented in the manuscript are not intended to rehash what is already known but to establish important similarities and differences between the newly identified placode-derived and the well-established neural crest-derived ENS progenitor cells. Regarding the reviewer’s suggestion #8 below to move the Wnt1Cre lineage analysis to a supplement, this information remains in the main text to provide proper comparison to the Pax2Cre-lineage profile. We think we were fair in the text to the legacy of work on neural crest and ENS development and were explicit in using our Wnt1Cre analysis to compare to the Pax2Cre lineage. Finally, we point out that our analysis was conducted on a different genetic background (outbred ICR) compared to previous studies, and there are strain-specific differences in Hirschsprung-associated lethality between our background and previous studies, so it was possible that the behavior of the neural crest cell lineage in the ICR background could differ from past observations on other backgrounds. Although we did not identify any major differences, it is important that the information on NC behavior in this background be presented.

      (4) Also, the conclusion drawn for Figure 5C "this indicates that the Wnt1Cre-derived cells do not harbor a cell-autonomous response to GDNF" seems to suggest the authors are not very well versed with the ENS literature. GDNF as well as EDN3 are expressed from surrounding mesenchyme and are cell non-autonomous.

      The reviewer seems to have misread or misunderstood the specific statement as well as the more important broader conclusion of the experiment. First, of course the source of GDNF ligand in vivo is the mesenchyme. The explant assay was designed to eliminate this source and to substitute experimentally supplied GDNF. The focus of the experiment was to address the response to GDNF, not the source of GDNF. But more importantly, the experiment revealed a surprising outcome that the reviewer did not appreciate. In Pax2Cre/Ret mutants, the Wnt1Cre lineage still expresses Ret, yet does not grow out from the gut explant when provided with GDNF. This shows that the neural crest lineage requires Ret function in placode-derived cells in order to respond to GDNF. In other words, despite expressing Ret, the NC lineage does not harbor a cell-autonomous response to GDNF, as we wrote. Because this might be confusing to some readers, we have revised the description of this analysis to make it clearer.

      (5) The fact that Ret and Ednrb signaling pathways interact is not a novel finding and has been reported multiple times in Ret and Ednrb mutant mice and cell lines (PMID: 12355085, 12574515, 27693352, 31818953), potentially through shared transcription factors (PMID: 31313802). It would have been more relevant if the authors could show how the specific tyrosine residue (Y 1015) in Ret is phosphorylated in the presence of Ednrb.

      The observation that human mutations in RET and EDNRB both cause Hirschsprung disease is decades old, and of course numerous studies in human, mouse, and cells have addressed the relation between the two signaling pathways. We did not mean to imply that we were the first to discover that Ret and Ednrb signaling pathways interact. The reviewer cites a number of papers all from the Chakravarti lab that address this phenomenon; while these are a valuable contribution to the field, there is still more to be learned. The model elaborated in PMID: 31313802, in which Ret and Ednrb are both enmeshed in a common gene regulatory network, does not readily explain why each has a different phenotypic manifestation and doesn’t take into account the importance of the placodal lineage. The main new contributions of our paper are the existence of a new cell lineage that contributes to the ENS, and that the placodal and neural crest lineages utilize Ret and Ednrb signaling differently. The clarification of how these elements are differentially used by the two lineages explains long-segment and short-segment Hirschsprung disease (Ret and Ednrb mutants, respectively) far better than in past studies. The reviewer unfortunately dismisses these insights and seems to feel that a biochemical exploration of one specific component of the signaling interaction (Y1015 phosphorylation) would be more relevant. Such an exploration could be the basis of future studies but is beyond the scope of the new findings reported in the present paper.

      (6) What is the mechanism of the presence of Y1015 phosphorylation in 33% of Ednrb deficient Pax2Cre cells? It appears to me what the authors report as absent phosphorylation in the 67% of cells could be just weak staining or cells missing in prep.

      The reviewer, referring to Fig. 7q, presumably meant to say Wnt1Cre rather than Pax2Cre. The reviewer overlooked that we provided an explanation for this observation in our original manuscript. This sentence reads “Because Ednrb is expressed only in a subset of Wnt1Cre-derived enteric progenitor cells (Figure 7 – figure supplement 1), the residual Y1015 phosphorylation observed in Wnt1Cre/Ednrb mutant cells is likely to occur in the Ednrb-negative Wnt1Cre-derived cell population”. The sentence is retained unchanged in the revised manuscript. The explanation is not because of weak staining or problems with tissue preparation.

      (7) The references the authors cite regarding the previous discovery of Ret expression in the nucleus are incorrect. The review articles the authors cite do not mention anything about Ret expression in the nucleus. The evidence of nuclear localization of Ret previously comes from overexpression studies in HEK293 cells (PMID: 25795775). Such overexpression studies are fraught with generating noisy data for well-documented reasons. But if this observation is correct, the authors miss a great opportunity to identify what the Ret protein is doing in the nucleus. Is it in direct contact with its known transcription factors like Sox10 and Rarb? This would shed a lot of light on the possible mechanism of Ret LoF observed in Ret mutant mice

      The reviewer overlooked that one of the review articles that we cited (Chen, Hsu, & Hung, 2020) has a dedicated paragraph for RET (section 3.14), which summarizes the work by Bagheri-Yarmand et al (PMID: 25795775), the very paper noted by the reviewer in the comment above. The reviewer also somewhat misstated the results of the Bagheri-Yarmand et al study. By immunostaining, this paper showed nuclear localization of endogenous Ret, albeit a version of Ret with a disease-associated mutation that makes it constitutively active by constitutive autophosphorylation. Nonetheless, this was endogenous Ret. The paper also used overexpression of GFP-tagged RET in HEK293 cells to show that wildtype RET can behave in a similar manner, at least under these circumstances. Our point is simply that Ret (and other receptor tyrosine kinases) can be found in the nucleus in certain biological contexts, and our observations are consistent with this precedent.

      The reviewer also suggests a biochemical follow-up analysis related to this observation, which we agree would be of interest. Such an investigation however is beyond the scope of the present study.

      (8) The manuscript could benefit from a major rewrite by reorganizing sections to make it easy for the readers to follow the narrative.

      Many sections about the role of Ret and Ednrb in Wnt1cre-derived ENCDCs can be moved to a supplement. These facts are well-documented and have been proven before.

      This was addressed in our response to comment #3 of this reviewer. The figures have been kept as main figures in the revised manuscript to allow side-by-side comparison to parallel analysis of the Pax2Cre lineage.

      - The observation that only a handful of Pax2Cre cells at E10.5 express Ret and the observation that conditional Ret null abrogates these cells at E11.5, are not presented together and makes connecting these two facts difficult.

      Ret expression at E10.5 and at E11.5 is shown in the same figure (Fig. 2). In the presentation of these results, we first describe that, in normal development, Ret is expressed differently in E10.5 ENS progenitors of the Pax2Cre and Wnt1Cre lineages. This is additional support for the argument that the two lineages are molecularly distinct. Then comes evaluation of postnatal fates with different markers before we return to embryonic Ret expression. We acknowledge that this can make it difficult to connect these observations. We decided to retain the original organization in order not to lose this important conclusion. However, we have revised the text to make the connection between these sections clearer.

      Reviewer #2 (Recommendations For The Authors):

      - The labeling of some as "figure supplements" is really hard to follow in the text and confusing to interpret when a main figure or supplemental figure is being referenced, and which one.

      We understand this comment, but this is journal style and outside of our control. We have kept the journal format in the revised manuscript.

      - The data in Figures 3b-c is well established in the field and somewhat misinterpreted. NOS1 neurons in the mouse ENS and their projections have been well described (Sang and Young, 1996, and other studies). CGRP immunoreactivity would reflect both ENS CGRP-expressing neurons and visceral afferents from DRG.

      There of course is a history of analysis of NOS1, CGRP, and other markers in the ENS. The focus of the analysis in Fig. 3 is to demonstrate how the cells that express these markers are impacted by gene manipulation in the Wnt1Cre and Pax2Cre lineages. For the giant migrating contractions that are associated with defecation, ample past electrophysiological studies have established that mechanosensory CGRP+ neurons trigger NOS+ inhibitory neurons (and ACh+ excitatory neurons) of the myenteric plexus to propel colonic contents. Thus, these are the relevant markers to explain the lack of colonic peristalsis in Ednrb-deficient mice. To our awareness, our results with NOS1 do not contradict any past study, including the Sang and Young 1996 description. Regarding CGRP, indeed the reviewer is correct that this marker is expressed by both neuronal subtypes. Two arguments support the specific derivation of ENS mechanosensory neurons from the Pax2 lineage. First, the ENS and DRG neurons can be distinguished by the location of their cell bodies and their axon extensions in the gut wall; only the ENS neurons are deficient in Pax2Cre/Ednrb mutants (as documented in Fig. 3). Second, the DRG population is derived from neural crest and is not labeled by Pax2Cre. If this population of CGRP+ neurons had functional relevance to colonic peristalsis, this would not be altered in Pax2Cre/Ednrb mutants. Indeed, the CGRP+ afferent nerve endings of DRG origin in the distal colon are mechanical distension sensors but do not modulate either ENS or autonomic nervous system activity (PMID: 37541195). We believe that our interpretation is correct.

      - The evidence in Figure 3 supporting the claim that NOS1 and CGRP-expressing enteric neurons come from distinct lineages is weak. IHC for CGRP is notoriously poor at labeling soma in the ENS. IHC for tdTomato to ensure the detection of low levels of Tomato expression and quantification of observations would strengthen this claim.

      CGRP is a vesicular peptide that is stored and transported in vesicles; therefore, the antibody against CGRP labels vesicular particles in the soma and synaptic vesicles along the axons of CGRP-producing neurons. It is not expected to label the entire cytoplasm (or the range of subcellular organelles) as the NOS antibody does. We did include quantification of data in Figure 3-figure supplement 1 in the manuscript to support the claim of lineage derivation. As described in the Methods section of the manuscript, we used binary threshold selection for Tomato+ cell counts using Fiji/ImageJ, which detects both TomatoHigh and TomatoLow cells as Tomato+; we feel this is equal to or even superior to IHC for this analysis.
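
      As a rough illustration of the scoring logic only (this is not the Fiji/ImageJ procedure used in the study; the function name, the per-cell intensity values, and the cutoff below are all hypothetical), a single binary threshold set just above background scores dim and bright cells alike as Tomato+:

```python
def count_positive(intensities, threshold):
    """Score cells with one binary cutoff; return (n_positive, n_total).

    Any cell at or above the threshold counts as positive, so dim
    (TomatoLow) and bright (TomatoHigh) cells are treated identically.
    """
    n_positive = sum(1 for value in intensities if value >= threshold)
    return n_positive, len(intensities)


# Hypothetical per-cell mean intensities (arbitrary units), not study data.
cells = [5, 8, 40, 45, 210, 230, 12, 38]

# Assumed cutoff separating background from any Tomato signal.
background_cutoff = 30

positive, total = count_positive(cells, background_cutoff)
print(f"{positive}/{total} cells scored Tomato+")
```

      In this sketch the outcome depends only on where the cutoff sits relative to background, not on how bright a positive cell is, which is the property we rely on when counting low-expressing lineage-labeled cells.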

      - IHC panels in Figures 3h-o are largely uninterpretable. Most of the signal seems to be non-specific background staining in the mucosa and quantification of mucosal signal in this context does not seem meaningful.  

      We disagree with the reviewer’s comment. As described in the response above, CGRP+ mechanosensory neurons send their peripheral axon projections to innervate mucosa (sensory epithelial cells), and NOS+ inhibitory motor axons innervate the circular muscle. Thus, panels h-o of Fig. 3 focus on the axonal profile and are not intended to visualize soma, which is why sagittal views are presented instead of flatmount views. All of the controls were performed side-by-side to confirm that the signal is real and interpretable.

      Note also that the colon does not have villi so this annotation should be revised.

      We appreciate that the reviewer brought this misstatement to our attention. We corrected this error in the revised manuscript.

      - Phospho-RET staining in Figure 7 is difficult to discern and interpret with high background. Positive and negative controls would strengthen these data.

      Fig. 7 shows phospho Ret-Y1015 staining in lineage-labeled Wnt1Cre/Ednrb/R26nTnG mutants. The strength of the signal to noise in the figure is a matter of Ret expression level and the quality of the anti-pY1015 antibody. We are not aware of a meaningful positive control that has been validated in the literature that we could use for comparison. The ideal negative control would be to perform the same analysis in Wnt1Cre/Ret/R26nTnG mutants, but because this manipulation eliminates the entire NC cell lineage from the colon, there would be no NC cells in which to visualize background staining in this lineage with this antibody when Ret protein is not present. We note that anti-pY1096 did not show a difference in staining between control and mutant, which supports the interpretation of a specific impact on pY1015. We also point out here, as in the text, that we do not yet have any validation that phosphorylation of Y1015 is functionally important in NC migration to the distal colon. Clearly, more work to address this role and to demonstrate the mechanism of phosphorylation of this specific residue in response to Edn3-Ednrb signaling will be needed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      The work introduces a valuable new method for depleting the ribosomal RNA from bacterial single-cell RNA sequencing libraries and shows that this method is applicable to studying the heterogeneity in microbial biofilms. The evidence for a small subpopulation of cells at the bottom of the biofilm which upregulates PdeI expression is solid. However, more investigation into the unresolved functional relationship between PdeI and c-di-GMP levels with the help of other genes co-expressed in the same cluster would have made the conclusions more significant. 

      Many thanks for eLife’s assessment of our manuscript and the constructive feedback. We are encouraged by the recognition of our bacterial single-cell RNA-seq methodology as valuable and its efficacy in studying bacterial population heterogeneity. We appreciate the suggestion for additional investigation into the functional relationship between PdeI and c-di-GMP levels. We concur that such an exploration could substantially enhance the impact of our conclusions. To address this, we have implemented the following revisions: We have expanded our data analysis to identify and characterize genes co-expressed with PdeI within the same cellular cluster (Fig. 3F, G, Response Fig. 10); We conducted additional experiments to validate the functional relationships between PdeI and c-di-GMP, followed by detailed phenotypic analyses (Response Fig. 9B). Our analysis reveals that while other marker genes in this cluster are co-expressed, they do not significantly impact biofilm formation or directly relate to c-di-GMP or PdeI. We believe these revisions have substantially enhanced the comprehensiveness and context of our manuscript, thereby reinforcing the significance of our discoveries related to microbial biofilms. The expanded investigation provides a more thorough understanding of the PdeI-associated subpopulation and its role in biofilm formation, addressing the concerns raised in the initial assessment.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In this manuscript, Yan and colleagues introduce a modification to the previously published PETRI-seq bacterial single-cell protocol to include a ribosomal depletion step based on a DNA probe set that selectively hybridizes with ribosome-derived (rRNA) cDNA fragments. They show that their modification of the PETRI-seq protocol increases the fraction of informative non-rRNA reads from ~4-10% to 54-92%. The authors apply their protocol to investigating heterogeneity in a biofilm model of E. coli, and convincingly show how their technology can detect minority subpopulations within a complex community. 

      Strengths: 

      The method the authors propose is a straightforward and inexpensive modification of an established split-pool single-cell RNA-seq protocol that greatly increases its utility, and should be of interest to a wide community working in the field of bacterial single-cell RNA-seq. 

      Weaknesses: 

      The manuscript is written in a very compressed style and many technical details of the evaluations conducted are unclear and processed data has not been made available for evaluation, limiting the ability of the reader to independently judge the merits of the method. 

      Thank you for your thoughtful and constructive review of our manuscript. We appreciate your recognition of the strengths of our work and the potential impact of our modified PETRI-seq protocol on the field of bacterial single-cell RNA-seq. We are grateful for the opportunity to address your concerns and improve the clarity and accessibility of our manuscript.

      We acknowledge your feedback regarding the compressed writing style and lack of technical details, which are constrained by the requirements of the Short Report format in eLife. We have addressed these issues in our revised manuscript as follows:

      (1) Expanded methodology section: We have provided a more comprehensive description of our experimental procedures, including detailed protocols for the ribosomal depletion step (lines 435-453) and data analysis pipeline (lines 471-528). This will enable readers to better understand and potentially replicate our methods.

      (2) Clarification of technical evaluations: We have elaborated on the specifics of our evaluations, including the criteria used for assessing the efficiency of ribosomal depletion (lines 99-120), and the methods employed for identifying and characterizing subpopulations (lines 155-159, 161-163 and 163-167).

      (3) Data availability: We apologize for the oversight in not making our processed data readily available. We have deposited all relevant datasets, including raw and source data, in appropriate public repositories (GEO: GSE260458) and provide clear instructions for accessing this data in the revised manuscript.

      (4) Supplementary information: To maintain the concise nature of the main text while providing necessary details, we have included additional supplementary information. This covers extended methodology (lines 311-318, 321-323, 327-340, 450-453, 533, and 578-589), detailed statistical analyses (lines 492-493, 499-501 and 509-528), and comprehensive data tables to support our findings.

      We believe these changes have significantly improved the clarity and reproducibility of our work, allowing readers to better evaluate the merits of our method.

      Reviewer #2 (Public Review): 

      Summary: 

      This work introduces a new method of depleting the ribosomal reads from the single-cell RNA sequencing library prepared with one of the prokaryotic scRNA-seq techniques, PETRI-seq. The advance is very useful since it allows broader access to the technology by lowering the cost of sequencing. It also allows more transcript recovery with fewer sequencing reads. The authors demonstrate the utility and performance of the method for three different model species and find a subpopulation of cells in the E. coli biofilm that express a protein, PdeI, which causes elevated c-di-GMP levels. These cells were shown to be in a state that promotes persister formation in response to ampicillin treatment. 

      Strengths: 

      The introduced rRNA depletion method is highly efficient, with the depletion for E. coli resulting in over 90% of reads containing mRNA. The method is ready to use with existing PETRI-seq libraries which is a large advantage, given that no other rRNA depletion methods were published for split-pool bacterial scRNA-seq methods. Therefore, the value of the method for the field is high. There is also evidence that a small number of cells at the bottom of a static biofilm express PdeI which is causing the elevated c-di-GMP levels that are associated with persister formation. Given that PdeI is a phosphodiesterase, which is supposed to promote hydrolysis of c-di-GMP, this finding is unexpected. 

      Weaknesses: 

      With the descriptions and writing of the manuscript, it is hard to place the findings about the PdeI into existing context (i.e. it is well known that c-di-GMP is involved in biofilm development and is heterogeneously distributed in several species' biofilms; it is also known that E. coli diesterases regulate this second messenger, i.e. https://journals.asm.org/doi/full/10.1128/jb.00604-15). 

      There is also no explanation for the apparently contradictory upregulation of c-di-GMP in cells expressing higher PdeI levels. Perhaps the examination of the rest of the genes in cluster 2 of the biofilm sample could be useful to explain the observed association. 

      Thank you for your thoughtful and constructive review of our manuscript. We are pleased that the reviewer recognizes the value and efficiency of our rRNA depletion method for PETRI-seq, as well as its potential impact on the field. We would like to address the points raised by the reviewer and provide additional context and clarification regarding the function of PdeI in c-di-GMP regulation.

      We acknowledge that c-di-GMP’s role in biofilm development and its heterogeneous distribution in bacterial biofilms are well studied. We appreciate the reviewer's observation regarding the seemingly contradictory relationship between increased PdeI expression and elevated c-di-GMP levels. This is indeed an intriguing finding that warrants further explanation.

      PdeI is predicted to function as a phosphodiesterase involved in c-di-GMP degradation, based on sequence analysis demonstrating the presence of an intact EAL domain, which is known for this function. However, it is important to note that PdeI also harbors a divergent GGDEF domain, typically associated with c-di-GMP synthesis. This dual-domain structure indicates that PdeI may play complex regulatory roles. Previous studies have shown that knocking out the major phosphodiesterase PdeH in E. coli results in the accumulation of c-di-GMP. Moreover, introducing a point mutation (G412S) in PdeI's divergent GGDEF domain within this PdeH knockout background led to decreased c-di-GMP levels (ref. 2). This finding implies that the wild-type GGDEF domain in PdeI contributes to maintaining or increasing cellular c-di-GMP levels.

      Importantly, our single-cell experiments demonstrated a positive correlation between PdeI expression levels and c-di-GMP levels (Figure 4D). In this revision, we also constructed a PdeI(G412S)-BFP mutant strain. Notably, in this strain c-di-GMP levels remained constant despite an increase in BFP fluorescence, which serves as a proxy for PdeI(G412S) expression levels (Figure 4D). This experimental evidence, coupled with domain analyses, suggests that PdeI may also contribute to c-di-GMP synthesis, rebutting the notion that it acts solely as a phosphodiesterase. HPLC LC-MS/MS analysis further confirmed that the overexpression of PdeI, induced by arabinose, resulted in increased c-di-GMP levels (Fig. 4E). These findings strongly suggest that PdeI plays a pivotal role in upregulating c-di-GMP levels.

      Our further analysis indicated that PdeI contains a CHASE (cyclases/histidine kinase-associated sensory) domain. Combined with our experimental results showing that PdeI is a membrane-associated protein, we hypothesize that PdeI acts as a sensor, integrating environmental signals with c-di-GMP production under complex regulatory mechanisms.

      We understand your interest in the other genes present in cluster 2 of the biofilm and their potential relationship to PdeI and c-di-GMP. Upon careful analysis, we have determined that the other marker genes in this cluster do not significantly impact biofilm formation, nor have we identified any direct relationship between these genes, c-di-GMP, or PdeI. Our focus on PdeI within this cluster is justified by its unique and significant role in c-di-GMP regulation and biofilm formation, as demonstrated by our experimental results. While other genes in this cluster may be co-expressed, their functions appear unrelated to the PdeI-c-di-GMP pathway we are investigating. Therefore, we opted not to elaborate on these genes in our main discussion, as they do not contribute directly to our understanding of the PdeI-c-di-GMP association. However, we can include a brief mention of these genes in the manuscript, indicating their lack of relevance to the PdeI-c-di-GMP pathway. This addition will provide a more comprehensive view of the cluster's composition while maintaining our focus on the key findings related to PdeI and c-di-GMP.

      We have also included the aforementioned explanations and supporting experimental data within the manuscript to clarify this important point (lines 193-217). Thank you for highlighting this apparent contradiction, allowing us to provide a more detailed explanation of our findings.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Overall, I found the main text of the manuscript well written and easy to understand, though too compressed in parts to fully understand the details of the work presented, some examples are outlined below. The materials and methods appeared to be less carefully compiled and could use some careful proof-reading for spelling (e.g. repeated use of "minuts" for minutes, "datas" for data) and grammar and sentence fragments (e.g. "For exponential period E. coli data." Line 333). In general, the meaning is still clear enough to be understood. I also was unable to find figure captions for the supplementary figures, making these difficult to understand. 

      We appreciate your careful review, which has helped us improve the clarity and quality of our manuscript. We acknowledge that some parts of the main text may have been overly compressed due to the Short Report format in eLife. We have thoroughly reviewed the manuscript and expanded on key areas to provide more comprehensive explanations. We have also carefully revised the Materials and Methods section: we corrected all spelling errors, including "minuts" to "minutes" and "datas" to "data", and fixed grammatical issues and sentence fragments throughout the section. We sincerely apologize for the omission of captions for the supplementary figures. We have now added detailed captions for all supplementary figures to ensure they are easily understandable. We believe these revisions address your concerns and enhance the overall readability and comprehension of our work.

      General comments: 

      (1) To evaluate the performance of RiboD-PETRI, it would be helpful to have more details in general, particularly to do with the development of the sequencing protocol and the statistics shown. Some examples: How many reads were sequenced in each experiment? Of these, how many are mapped to the bacterial genome? How many reads were recovered per cell? Have the authors performed some kind of subsampling analysis to determine if their sequencing has saturated the detection of expressed genes? The authors show e.g. correlations between classic PETRI-seq and RiboD-PETRI for E. coli in Figure 1, but also have similar data for C. crescentus and S. aureus - do these data behave similarly? These are just a few examples, but I'm sure the authors have asked themselves many similar questions while developing this project; more details, hard numbers, and comparisons would be very much appreciated. 

      Thank you for your valuable feedback. To address your concerns, we have added a table in the supplementary material that clarifies the details of sequencing.

      The correlation between the PETRI-seq and RiboD-PETRI data in C. crescentus is relatively good. However, the correlation between the PETRI-seq and RiboD-PETRI data in S. aureus (SA) is lower. The reason is that the sequencing depths of RiboD-PETRI and PETRI-seq differ, resulting in much higher gene expression counts in the RiboD-PETRI sequencing results than in PETRI-seq, and the calculated correlation coefficient is only about 0.47. This indicates that there is some positive correlation between the two sets of data, but it is not particularly strong. However, we counted the expression of 2763 genes in total, and even though the calculated correlation coefficient is relatively low, it still shows some consistency between the two groups of samples.

      Author response image 1.

      Assessment of the effect of rRNA depletion on transcriptional profiles of (A) C. crescentus (CC) and (B) S. aureus (SA). The Pearson correlation coefficient (r) of UMI counts per gene (log2 UMIs) between RiboD-PETRI and PETRI-seq was calculated for 4097 genes (A) and 2763 genes (B). The "ΔΔ" label represents the RiboD-PETRI protocol; the "Ctrl" label represents the classic PETRI-seq protocol we performed. Each point represents a gene.
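
      The per-gene comparison described in this caption can be sketched as follows. This is an illustrative sketch only, not the code used in the study: the function name, the pseudocount of 1 added before the log2 transform, and the example UMI totals are all assumptions introduced here.

```python
import math


def pearson_log2(umis_a, umis_b, pseudocount=1.0):
    """Pearson r between two per-gene UMI vectors after a log2 transform.

    The pseudocount (an assumption of this sketch) keeps genes with
    zero UMIs from producing log2(0).
    """
    if len(umis_a) != len(umis_b):
        raise ValueError("both protocols must be summarised over the same genes")
    xs = [math.log2(u + pseudocount) for u in umis_a]
    ys = [math.log2(u + pseudocount) for u in umis_b]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene UMI totals, listed in the same gene order.
ribod_petri = [120, 15, 0, 48, 300]
petri_seq = [30, 4, 0, 10, 70]
r = pearson_log2(ribod_petri, petri_seq)
```

      Each list position corresponds to one gene, so each pair of values is one point in the scatter plot.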

      (2) Additionally, I think it is critical that the authors provide processed read counts per cell and gene in their supplementary information to allow others to investigate the performance of their method without going back to raw FASTQ files, as this can represent a significant hurdle for reanalysis. 

      Thank you for your suggestion. However, it's important to clarify that reads and UMIs (Unique Molecular Identifiers) are distinct concepts in single-cell RNA sequencing. Reads can be influenced by PCR amplification during library construction, making their quantity less stable. In contrast, UMIs serve as a more reliable indicator of the number of mRNA molecules detected after PCR amplification. Throughout our study, we primarily utilized UMI counts for quantification. To address your concern about data accessibility, we have included the UMI counts per cell and gene in our supplementary materials (Tables S7-15; some of the files are too large and are therefore deposited in GEO under accession GSE260458). This approach provides a more accurate representation of gene expression levels and allows for robust reanalysis without the need to process raw FASTQ files.
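
      The distinction between reads and UMIs can be illustrated with a minimal sketch. It assumes that each read has already been assigned a cell barcode, a gene, and a UMI sequence, and that UMIs are collapsed by exact match; real pipelines may also merge UMIs that differ by sequencing errors. The function name and the example records are hypothetical.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads into UMI counts per (cell barcode, gene).

    `reads` is an iterable of (cell, gene, umi) tuples. Reads that share
    all three fields are treated as PCR duplicates of one molecule.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical reads: the first two are PCR duplicates of one molecule.
reads = [
    ("cell1", "geneA", "AACGT"),
    ("cell1", "geneA", "AACGT"),
    ("cell1", "geneA", "GGTCA"),
    ("cell2", "geneA", "AACGT"),
]
umi_counts = count_umis(reads)
```

      Here four reads collapse to three UMIs in total, which is why UMI counts are less sensitive to amplification than raw read counts.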

      (3) Finally, the authors should also discuss other approaches to ribosomal depletion in bacterial scRNA-seq. One of the figures appears to contain such a comparison, but it is never mentioned in the text that I can find, and one could read this manuscript and come away believing this is the first attempt to deplete rRNA from bacterial scRNA-seq. 

      We have addressed this concern by including a comparison of different methods for depleting rRNA from bacterial scRNA-seq in Table S4 and adding a short text comparison as follows: “Additionally, we compared our findings with other reported methods (Fig. 1B; Table S4). The original PETRI-seq protocol, which does not include an rRNA depletion step, exhibited an mRNA detection rate of approximately 5%. The MicroSPLiT-seq method, which utilizes Poly A Polymerase for mRNA enrichment, achieved a detection rate of 7%. Similarly, M3-seq and BacDrop-seq, which employ RNase H to digest rRNA post-DNA probe hybridization in cells, reported mRNA detection rates of 65% and 61%, respectively. MATQ-DASH, which utilizes Cas9-mediated targeted rRNA depletion, yielded a detection rate of 30%. Among these, RiboD-PETRI demonstrated superior performance in mRNA detection while requiring the least sequencing depth.” We have added this content in the main text (lines 110-120), specifically in relation to Figure 1B and Table S4. This addition provides context for our method and clarifies its position among existing techniques.

      Detailed comments: 

      Line 78: the authors describe the multiplet frequency, but it is not clear to me how this was determined, for which experiments, or where in the SI I should look to see this. Often this is done by mixing cultures of two distinct bacteria, but I see no evidence of this key experiment in the manuscript. 

      The multiplet frequency we discuss in the manuscript was not determined by experimentally mixing distinct bacterial cultures. The PETRI-seq and microSPLiT studies performed experiments mixing two libraries to determine the single-cell rate, and both reported good results. Our technique is derived from these two methods (mainly PETRI-seq), and the main difference lies in the later RiboD step, so we did not repeat this experiment separately. The multiplet frequencies reported here are therefore theoretical predictions based on our sequencing results, calculated using a Poisson distribution. We have made this distinction clearer in our manuscript (lines 93-97). The method is described in the Materials and Methods section (lines 520-528), and the data are available in Table S2. To elaborate:

      To assess the efficiency of single-cell capture in RiboD-PETRI, we calculated the multiplet frequency using a Poisson distribution based on our sequencing results.

      (1) Definition: In our study, multiplet frequency is defined as the probability of a non-empty barcode corresponding to more than one cell.

      (2) Calculation Method: We use a Poisson distribution-based approach to calculate the predicted multiplet frequency. The process involves several steps:

      We first calculate the proportion of barcodes corresponding to zero cells: P(0) = e^(-λ). Then, we calculate the proportion corresponding to one cell: P(1) = λe^(-λ). We derive the proportion for more than zero cells: P(≥1) = 1 - P(0). And for more than one cell: P(≥2) = 1 - P(1) - P(0). Finally, the multiplet frequency is calculated as: multiplet frequency = P(≥2) / P(≥1).

      (3) Parameter λ: This is the ratio of the number of cells to the total number of possible barcode combinations. For instance, when detecting 10,000 cells, λ = 10,000 / (total number of possible barcode combinations).
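
      The Poisson-based calculation described in points (1) to (3) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name is hypothetical, P(0) and P(1) are the standard Poisson probabilities for zero and one event, and the barcode space of 96^3 combinations (three rounds of 96 barcodes) is an assumption used only for the example.

```python
import math


def multiplet_frequency(n_cells, n_barcode_combinations):
    """Predicted fraction of non-empty barcodes that hold more than one cell."""
    if n_cells <= 0 or n_barcode_combinations <= 0:
        raise ValueError("cell and barcode counts must be positive")
    lam = n_cells / n_barcode_combinations  # mean cells per barcode
    p0 = math.exp(-lam)                     # P(0): barcode holds no cell
    p1 = lam * math.exp(-lam)               # P(1): barcode holds one cell
    p_ge1 = 1.0 - p0                        # P(>=1): non-empty barcode
    p_ge2 = 1.0 - p1 - p0                   # P(>=2): barcode holds a multiplet
    return p_ge2 / p_ge1


# Assumed barcode space: three barcoding rounds with 96 barcodes each.
N_BARCODES = 96 ** 3
freq = multiplet_frequency(10_000, N_BARCODES)
```

      For small λ the result is close to λ/2, so under these assumptions the predicted multiplet frequency grows roughly linearly with the number of cells recovered.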

      Line 94: the concept of "percentage of gene expression" is never clearly defined. Does this mean the authors detect 99.86% of genes expressed in some cells? How is "expressed" defined - is this just detecting a single UMI? 

      The term "percentage of gene expression" refers to the proportion of genes in the bacterial strain that were detected as expressed in the sequenced cell population. Specifically, in this context, it means that 99.86% of all genes in the bacterial strain were detected as expressed in at least one cell in our sequencing results. To define "expressed" more clearly: a gene is considered expressed if at least one UMI (Unique Molecular Identifier) is detected for it in a cell in the population. This definition allows for the detection of even low-level gene expression. To enhance clarity in the manuscript, we have rephrased the sentence as “transcriptome-wide gene coverage across the cell population”.
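
      The definition above, together with the related per-cell measure of detected genes, can be sketched as follows. This is a hedged illustration: the nested-dictionary layout of the UMI counts, the function names, and the toy values are assumptions, not the format of the supplementary tables.

```python
def gene_coverage_percent(counts, n_genes_in_genome):
    """Percentage of genes with at least one UMI in at least one cell.

    `counts` maps cell -> {gene: UMI count}.
    """
    detected = {
        gene
        for genes in counts.values()
        for gene, n_umis in genes.items()
        if n_umis >= 1
    }
    return 100.0 * len(detected) / n_genes_in_genome


def genes_detected_per_cell(counts):
    """Number of genes with at least one UMI, reported for each cell."""
    return {
        cell: sum(1 for n_umis in genes.values() if n_umis >= 1)
        for cell, genes in counts.items()
    }


# Toy example with a hypothetical four-gene genome.
counts = {
    "cell1": {"geneA": 2, "geneB": 0},
    "cell2": {"geneB": 1, "geneC": 3},
}
coverage = gene_coverage_percent(counts, n_genes_in_genome=4)
per_cell = genes_detected_per_cell(counts)
```

      The two measures answer different questions: coverage is pooled over the whole population, whereas genes per cell reflects per-cell sensitivity.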

      Line 98: The authors discuss the number of recovered UMIs throughout this paragraph, but there is no clear discussion of the number of detected expressed genes per cell. Could the authors include a discussion of this as well, as this is another important measure of sensitivity? 

      We appreciate your suggestion to include a discussion on the number of detected expressed genes per cell, as this is indeed another important measure of sensitivity. We would like to clarify that we have actually included statistics on the number of genes detected across all cells in the main text of our paper. This information is presented as percentages. However, we understand that you may be looking for a more detailed representation, similar to the UMI statistics we provided. To address this, we have now added a new analysis showing the number of genes detected per cell (lines 132-133, 138-139, 144-145 and 184-186, Fig. 2B, 3B and S2B). This additional result complements our existing UMI data and provides a more comprehensive view of the sensitivity of our method. We have included this new gene-per-cell statistical graph in the supplementary materials.

      Figure 1B: I presume ctrl and delta delta represent the classic PETRI-seq and RiboD protocols, respectively, but this is not specified. This should be clarified in the figure caption, or the names changed. 

      We appreciate you bringing this to our attention. We acknowledge that the labeling in the figure could have been clearer. We have now clarified this information in the figure caption. To provide more specificity: the "ΔΔ" label represents the RiboD-PETRI protocol; the "Ctrl" label represents the classic PETRI-seq protocol we performed. We have updated the figure caption to include these details, which should help readers better understand the protocols being compared in the figure.

      Line 104: the authors claim "This performance surpassed other reported bacterial scRNA-seq methods" with a long number of references to other methods. "Performance" is not clearly defined, and it is unclear what the exact claim being made is. The authors should clarify what they're claiming, and further discuss the other methods and comparisons they have made with them in a thorough and fair fashion. 

      We appreciate your request for clarification, and we acknowledge that our definition of "performance" should have been more explicit. We would like to clarify that in this context, we define performance primarily in terms of the proportion of mRNA captured. Our improved method demonstrates a significantly higher rate of rRNA removal compared to other bacterial single-cell library construction methods. This results in a higher proportion of mRNA in our sequencing data, which we consider a key performance metric for single-cell RNA sequencing in bacteria. Additionally, when compared to our previous method, PETRI-seq, our improved approach not only enhances rRNA removal but also reduces library construction costs. This dual improvement in both data quality and cost-effectiveness is what we intended to convey with our performance claim.

      We recognize that a more thorough and fair discussion of other methods and their comparisons would be beneficial. We have summarized the comparison in Table S4 and added a short discussion in the main text (lines 106-120). This addition provides context for our method and clarifies its position among existing techniques.

      Figure 1D: Do the authors have any explanation for the relatively lower performance of their C. crescentus depletion? 

      We appreciate your attention to detail and the opportunity to address this point. The lower efficiency of rRNA removal in C. crescentus compared to other species can be attributed to inherent differences between species. It's important to note that a single method for rRNA depletion may not be universally effective across all bacterial species due to variations in their genetic makeup and rRNA structures. Different bacterial species can have unique rRNA sequences, secondary structures, or associated proteins that may affect the efficiency of our depletion method. This species-specific variation highlights the challenges in developing a one-size-fits-all approach for bacterial rRNA depletion. While our method has shown high efficiency across several species, the results with C. crescentus underscore the need for continued refinement and possibly species-specific optimizations in rRNA depletion techniques. We thank you for bringing attention to this point, as it provides valuable insight into the complexities of bacterial rRNA depletion and areas for future improvement in our method.

      Line 118: The authors claim RiboD-PETRI has a "consistent ability to unveil within-population heterogeneity", however the preceding paragraph shows it detects potential heterogeneity, but provides no evidence this inferred heterogeneity reflects the reality of gene expression in individual cells. 

      We appreciate your careful reading and the opportunity to clarify this point. We acknowledge that our wording may have been too assertive given the evidence presented. We acknowledge that the subpopulations of cells identified in other species have not undergone experimental verification. Our intention in presenting these results was to demonstrate RiboD-PETRI's capability to detect “potential” heterogeneity consistently across different bacterial species, showcasing the method's sensitivity and potential utility in exploring within-population diversity. However, we agree that without further experimental validation, we cannot definitively claim that these detected differences represent true biological heterogeneity in all cases. We have revised this section to reflect the current state of our findings more accurately, emphasizing that while RiboD-PETRI consistently detects potential heterogeneity across species, further experimental validation would be required to confirm the biological significance of the observations (lines 169-171).

      Figure 1 H&I: I'm not entirely sure what I am meant to see in these figures, presumably some evidence for heterogeneity in gene expression. Are there better visualizations that could be used to communicate this? 

      We appreciate your suggestion for improving the visualization of gene expression heterogeneity. We have explored alternative visualization methods in the revised manuscript. Specifically, for the expression levels of marker genes shown in Figure 1H (which is Figure 2D now), we have created violin plots (Supplementary Fig. 4). These plots offer a more comprehensive view of the distribution of expression levels across different cell populations, making it easier to discern heterogeneity. However, due to the number of marker genes and the resulting volume of data, these violin plots are quite extensive and would occupy a significant amount of space. Given the space constraints of the main figure, we propose to include these violin plots as Fig. S4, immediately following Figure 1 H&I (which is Figure 2D&E now). This arrangement will allow readers to access more detailed information about these marker genes while maintaining the concise style of the main figure.

      Regarding the pathway enrichment figure (Figure 2E), we have also considered your suggestion for improvement. We attempted to use a dot plot to display the KEGG pathway enrichment of the genes. However, our analysis revealed that the genes were only enriched in a single pathway. As a result, the visual representation using a dot plot still did not produce a particularly aesthetically pleasing or informative figure.

      Line 124: The authors state no significant batch effect was observed, but in the methods on line 344 they specify batch effects were removed using Harmony. It's unclear what exactly S2 is showing without a figure caption, but the authors should clarify this discrepancy. 

      We apologize for any confusion caused by the lack of a clear figure caption for Figure S2 (which is Figure S3D now). To address your concern, in addition to adding captions for the supplementary figures, we would also like to provide more context about the batch effect analysis. In Supplementary Fig. S3, Panel C represents the results without using Harmony for batch effect removal, while Panel D shows the results after applying Harmony. In both panels A and B, the distributions of samples one and two do not show substantial differences. Based on this observation, we concluded that there was no significant batch effect between the two samples. However, we acknowledge that even subtle batch effects could potentially influence downstream analyses. Therefore, out of an abundance of caution and to ensure the highest quality of our results, we decided to apply Harmony to remove any potential minor batch effects. This approach aligns with best practices in single-cell analysis, where even small technical variations are often accounted for to enhance the robustness of the results.

      To improve clarity, we have revised our manuscript to better explain this nuanced approach: 1. We have updated the statement to reflect that while no major batch effect was observed, we applied batch correction as a precautionary measure (lines 181-182). 2. We have added a detailed caption to Figure S3, explaining the comparison between non-corrected and batch-corrected data. 3. We have modified the methods section to clarify that Harmony was applied as a precautionary step, despite the absence of obvious batch effects (lines 492-493).

      Figure 2D: I found this panel fairly uninformative, is there a better way to communicate this finding? 

      Thank you for your feedback regarding Figure 2D. We have explored alternative ways to present this information, using a dot plot to display the enrichment pathways, as this is often an effective method for visualizing such data. Meanwhile, we also provided a more detailed textual description of the enrichment results in the main text, highlighting the most significant findings.

      Figure 2I: the figure itself and caption say GFP, but in the text and elsewhere the authors say this is a BFP fusion. 

      We appreciate your careful review of our manuscript and figures. We apologize for any confusion this may have caused. To clarify: Both GFP (Green Fluorescent Protein) and BFP (Blue Fluorescent Protein) were indeed used in our experiments, but for different purposes: 1. GFP was used for imaging to observe the location of PdeI in bacteria and persister cell growth, which is shown in Figure 4C and 4K. 2. BFP was used for cell sorting, imaging of location in the biofilm, and detecting the proportion of persister cells, which is shown in Figure 4D and 4F-J. To address this inconsistency and improve clarity, we have made the following corrections: 1. We have reviewed the main text to ensure that references to GFP and BFP are accurate and consistent with their respective uses in our experiments. 2. We have added a note in the figure caption for Figure 4C to explicitly state that this particular image shows GFP fluorescence for the location of PdeI. 3. In the methods section, we have provided a clear explanation of how both fluorescent proteins were used in different aspects of our study (lines 326-340).

      Line 156: The authors compare prices between RiboD and PETRI-seq. It would be helpful to provide a full cost breakdown, e.g. in supplementary information, as it is unclear exactly how the authors came to these numbers or where the major savings are (presumably in sequencing depth?) 

      We appreciate your suggestion to provide a more detailed cost breakdown, and we agree that this would enhance the transparency and reproducibility of our cost analysis. In response to your feedback, we have prepared a comprehensive cost breakdown that includes all materials and reagents used in the library preparation process. Additionally, we've factored in the sequencing depth (50G) and the unit price for sequencing (25¥/G). These calculations allow us to determine the cost per cell after sequencing. As you correctly surmised, a significant portion of the cost reduction is indeed related to sequencing depth. However, there are also savings in the library preparation steps that contribute to the overall cost-effectiveness of our method. We propose to include this detailed cost breakdown as a supplementary table (Table S6) in our paper. This table will provide a clear, itemized list of all expenses involved, including: 1. Reagents and materials for library preparation 2. Sequencing costs (depth and price per G) 3. Calculated cost per cell.
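
      The arithmetic behind a per-cell cost estimate can be sketched as follows. Only the sequencing depth (50 G) and unit price (25 ¥/G) come from the response above; the library preparation cost and the number of cells recovered are hypothetical placeholders, and the real values belong in Table S6.

```python
def cost_per_cell(library_cost, depth_g, price_per_g, n_cells):
    """Total cost (library preparation plus sequencing) divided by cells recovered."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    sequencing_cost = depth_g * price_per_g
    return (library_cost + sequencing_cost) / n_cells


# Values stated in the response: 50 G of sequencing at 25 yuan per G.
DEPTH_G = 50
PRICE_PER_G = 25
sequencing_cost = DEPTH_G * PRICE_PER_G  # 1250 yuan

# Hypothetical placeholders, NOT taken from the manuscript.
library_cost = 1000.0   # yuan spent on reagents and materials
n_cells = 10_000        # cells recovered after sequencing

per_cell = cost_per_cell(library_cost, DEPTH_G, PRICE_PER_G, n_cells)
```

      Because sequencing cost scales with depth, a method that needs fewer reads per cell lowers the per-cell cost even when reagent costs are unchanged.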

      Line 291: The design and production of the depletion probes are not clearly explained. How did the authors design them? How were they synthesized? Also, it appears the authors have separate probe sets for E. coli, C. crescentus, and S. aureus - this should be clarified, possibly in the main text.

      Thank you for your important questions regarding the design and production of our depletion probes. We included the detailed probe information in Supplementary Table S1; however, we did not clarify this information in the main text due to the constraints of the Short Report format in eLife. We appreciate the opportunity to provide clarifications.

      The core principle behind our probe design is that the probe sequences are reverse complementary to the r-cDNA sequences. This design allows for specific recognition of r-cDNA. The probes are then bound to magnetic beads, allowing the r-cDNA-probe-bead complexes to be separated from the rest of the library. To address your specific questions: 1. Probe Design: We designed separate probe sets for E. coli, C. crescentus, and S. aureus. Each set was specifically constructed to be reverse complementary to the r-cDNA sequences of its respective bacterial species. This species-specific approach ensures high efficiency and specificity in rRNA depletion for each organism. The hybrid DNA complex was then removed by Streptavidin magnetic beads. 2. Probe Synthesis: The probes were synthesized based on these design principles. 3. Species-Specific Probe Sets: You are correct in noting that we used separate probe sets for each bacterial species. We have clarified this important point in the main text to ensure readers understand the specificity of our approach. To further illustrate this process, we have created a schematic diagram showing the principle of rRNA removal (Fig. 1A) and clarified the design principle in its figure legend.
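
      The reverse-complement principle behind the probe design can be sketched as follows. The target sequence is a made-up example rather than a real r-cDNA fragment, and the function name is hypothetical; the actual probe sequences are those listed in Supplementary Table S1.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical r-cDNA fragment, used only for illustration.
target = "AGGCTTAC"
probe = reverse_complement(target)
```

      A probe built this way base-pairs with its target along its full length, and taking the reverse complement of the probe returns the original target sequence.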

      Line 362: I didn't see a description of the construction of the PdeI-BFP strain, I assume this would be important for anyone interested in the specific work on PdeI. 

      Thank you for your astute observation regarding the construction of the PdeI-BFP strain. We appreciate the opportunity to provide this important information. The PdeI-BFP strain was constructed as follows: 1. We cloned the pdeI gene along with its native promoter region (250bp) into a pBAD vector. 2. The original promoter region of the pBAD vector was removed to avoid any potential interference. 3. This construction enables the expression of the PdeI-BFP fusion protein to be regulated by the native promoter of pdeI, thus maintaining its physiological control mechanisms. 4. The BFP coding sequence was fused to the pdeI gene to create the PdeI-BFP fusion construct. We have added a detailed description of the PdeI-BFP strain construction to our methods section (lines 327-334).

      Reviewer #2 (Recommendations For The Authors): 

      (1) General remarks: 

      Reconsider using 'advanced' in the title. It is highly generic and misleading. Perhaps 'cost-efficient' would be a more precise substitute. 

      Thank you for your valuable suggestion. After careful consideration, we have decided to use "improved" in the title. Firstly, our method presents an efficient solution to a persistent challenge in bacterial single-cell RNA sequencing, specifically addressing rRNA abundance. Secondly, it facilitates precise exploration of bacterial population heterogeneity. We believe our method encompasses more than just cost-effectiveness, justifying the use of the term "improved."

      Consider expanding the introduction. The introduction does not explain the setup of the biological question or basic details such as the organism(s) for which the technique has been developed, or which species biofilms were studied. 

      Thank you for your valuable feedback regarding our introduction. We acknowledge that our writing was compressed due to the constraints of the Short Report format in eLife. We appreciate the opportunity to expand this crucial section, which will improve the clarity and impact of our manuscript.

      We revised our introduction (lines 53-80) according to the following principles:

      (1) Initial Biological Question: We explained the initial biological question that motivated our research—understanding the heterogeneity in E. coli biofilms—to provide essential context for our technological development.

      (2) Limitations of Existing Techniques: We briefly described the limitations of current single-cell sequencing techniques for bacteria, particularly regarding their application in biofilm studies.

      (3) Introduction of Improved Technique: We introduced our improved technique, initially developed for E. coli.

      (4) Research Evolution: We highlighted how our research has evolved, demonstrating that our technique is applicable not only to E. coli but also to Gram-positive bacteria and other Gram-negative species, showcasing the broad applicability of our method.

      (5) Specific Organisms Studied: We provided examples of the specific organisms we studied, encompassing both Gram-positive and Gram-negative bacteria.

      (6) Potential Implications: Finally, we outlined the potential implications of our technique for studying bacterial heterogeneity across various species and contexts, extending beyond biofilms.

      (2) Writing remarks: 

      43-45 Reword: "Thus, we address a persistent challenge in bacterial single-cell RNA-seq regarding rRNA abundance, exemplifying the utility of this method in exploring biofilm heterogeneity.". 

      Thank you for highlighting this sentence and requesting a rewording. We appreciate the opportunity to improve the clarity and impact of our statement. We have reworded the sentence as: "Our method effectively tackles a long-standing issue in bacterial single-cell RNA-seq: the overwhelming abundance of rRNA. This advancement significantly enhances our ability to investigate the intricate heterogeneity within biofilms at unprecedented resolution." (lines 47-50)

      49 "Biofilms, comprising approximately 80% of chronic and recurrent microbial infections in the human body..." - probably meant 'contribute to'. 

      Thank you for catching this imprecision in our statement. We have reworded the sentence as: "Biofilms contribute to approximately 80% of chronic and recurrent microbial infections in the human body..."

      54-55 Please expand on "this". 

      Thank you for your request to expand on the use of "this" in the sentence. You're right that more clarity would be beneficial here. We have revised and expanded this section in lines 54-69.

      81-84 Unclear why these species samples were either at exponential or stationary phases. The growth stage can influence the proportion of rRNA and other transcripts in the population. 

      Thank you for raising this important point about the growth phases of the bacterial samples used in our study. We appreciate the opportunity to clarify our experimental design. To evaluate the performance of RiboD-PETRI, we assessed rRNA depletion efficiency under diverse physiological conditions, specifically contrasting exponential and stationary phases. This approach allows us to understand how these different growth states affect rRNA depletion efficacy. Additionally, we included a variety of bacterial species, encompassing both Gram-negative and Gram-positive organisms, to ensure that our findings are broadly applicable across different types of bacteria. By incorporating these variables, we aim to provide insights into the robustness and reliability of the RiboD-PETRI method in various biological contexts. We have included this rationale in our Results section (lines 99-106), providing readers with a clear understanding of our experimental design choices.

      86 "compared TO PETRI-seq " (typo). 

      We have corrected this typo in our manuscript.

      94 "gene expression collectively" rephrase. Probably this means coverage of the entire gene set across all cells. Same for downstream usage of the phrase. 

      Thank you for pointing out this ambiguity in our phrasing. Your interpretation of our intended meaning is accurate. We have rephrased the sentence as “transcriptome-wide gene coverage across the cell population”.

      97 What were the median UMIs for the 30,000 cell library {greater than or equal to}15 UMIs? Same question for the other datasets. This would reflect a more comparable statistic with previous studies than the top 3% of the cells for example, since the distributions of the single-cell UMIs typically have a long tail. 

      Thank you for this insightful question and for pointing out the importance of providing more comparable statistics. We agree that median values offer a more robust measure of central tendency, especially for datasets with long-tailed distributions, which are common in single-cell studies. The suggestion to include median Unique Molecular Identifier (UMI) counts would indeed provide a more comparable statistic with previous studies. We have analyzed the median UMIs for our libraries as follows and revised our manuscript according to the analysis (lines 126-130, 133-136, 139-142 and 175-180).

      (1) Median UMI count in Exponential Phase E. coli:

      Total: 102 UMIs per cell

      Top 1,000 cells: 462 UMIs per cell

      Top 5,000 cells: 259 UMIs per cell

      Top 10,000 cells: 193 UMIs per cell

      (2) Median UMI count in Stationary Phase S. aureus:

      Total: 142 UMIs per cell

      Top 1,000 cells: 378 UMIs per cell

      Top 5,000 cells: 207 UMIs per cell

      Top 8,000 cells: 167 UMIs per cell

      (3) Median UMI count in Exponential Phase C. crescentus:

      Total: 182 UMIs per cell

      Top 1,000 cells: 2,190 UMIs per cell

      Top 5,000 cells: 662 UMIs per cell

      Top 10,000 cells: 225 UMIs per cell

      (4) Median UMI count in Static E. coli Biofilm:

      Total of Replicate 1: 34 UMIs per cell

      Total of Replicate 2: 52 UMIs per cell

      Top 1,621 cells of Replicate 1: 283 UMIs per cell

      Top 3,999 cells of Replicate 2: 239 UMIs per cell
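      The difference between the two kinds of statistic listed above (median over all cells passing the filter versus median over the top-ranked cells) can be illustrated with a minimal sketch. The function name, the toy counts, and the use of a single per-cell UMI list are assumptions made for illustration; this is not the analysis pipeline used in the study.

```python
from statistics import median

def median_umis(umis_per_cell, min_umis=15, top_n=None):
    """Median UMI count over cells with at least `min_umis` UMIs.

    If `top_n` is given, only the `top_n` cells with the highest
    UMI counts (among those passing the filter) are considered.
    """
    # Keep cells passing the filter, ranked from highest to lowest count.
    kept = sorted((u for u in umis_per_cell if u >= min_umis), reverse=True)
    if top_n is not None:
        kept = kept[:top_n]
    if not kept:
        raise ValueError("no cells pass the UMI filter")
    return median(kept)

# Toy counts with a long tail (illustrative only, not data from the study).
toy_counts = [3, 15, 20, 40, 90, 400, 2500]
overall = median_umis(toy_counts)         # median over all filtered cells
top3 = median_umis(toy_counts, top_n=3)   # median over the 3 highest cells
```

      With a long-tailed distribution such as the toy list, the top-N median is far higher than the overall median, which is why the overall median is the more comparable statistic across studies.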

      104-105 The performance metric should again be the median UMIs of the majority of the cells passing the filter (15 mRNA UMIs is reasonable). The top 3-5% are always much higher in resolution because of the heavy tail of the single-cell UMI distribution. It is unclear if the performance surpasses the other methods using the comparable metric. Recommend removing this line. 

      We appreciate your suggestion regarding the use of median UMIs as a more appropriate performance metric, and we agree that comparing the top 3-5% of cells can be misleading due to the heavy tail of the single-cell UMI distribution. We have removed the line in question (104-105) that compares our method's performance based on the top 3-5% of cells in the revised manuscript. Instead, we focused on presenting the median UMI counts for cells passing the filter (≥15 mRNA UMIs) as the primary performance metric. This will provide a more representative and comparable measure of our method's performance. We have also revised the surrounding text to reflect this change, ensuring that our claims about performance are based on these more robust statistics (lines 126-130, 133-136, 139-142 and 175-180).

      106-108 The sequencing saturation of the libraries (in %), and downsampling analysis should be added to illustrate this point. 

      Thank you for your valuable suggestion. Your recommendation to add sequencing saturation and downsampling analysis is highly valuable and will help better illustrate our point. Based on your feedback, we have revised our manuscript by adding the following content:

      To provide a thorough evaluation of our sequencing depth and library quality, we performed sequencing saturation analysis on our sequencing samples. The findings reveal that our sequencing saturation is 100% (Fig. 8A & B), indicating that our sequencing depth is sufficient to capture the diversity of most transcripts. To further illustrate the impact of our filtering on the datasets, we show the data distribution before and after applying our filtering criteria (Fig. S1B & C). These figures visualize the influence of our filtering process on data quality and distribution. After filtering, we obtain a more refined dataset with reduced noise and fewer outliers, which enhances the reliability of our downstream analyses.

      We have also ensured that a detailed description of the sequencing saturation method is included in the manuscript to provide readers with a comprehensive understanding of our methodology. We appreciate your feedback and believe these additions significantly improve our work.
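      For readers unfamiliar with the terms, sequencing saturation and downsampling can be sketched as follows. The sketch assumes one commonly used definition, saturation = 1 - (unique molecules / total reads), with each read represented as a hypothetical (cell, gene, UMI) tuple; the exact method used in the manuscript is described there and may differ.

```python
import random

def saturation(reads):
    """Fraction of reads that are duplicates of an already-seen molecule.

    Each read is a (cell_barcode, gene, umi) tuple; identical tuples are
    treated as repeated observations of the same molecule.
    """
    if not reads:
        return 0.0
    return 1.0 - len(set(reads)) / len(reads)

def downsample_curve(reads, fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    """Saturation after randomly subsampling the reads at each fraction."""
    rng = random.Random(seed)
    curve = []
    for frac in fractions:
        k = int(len(reads) * frac)
        subset = rng.sample(reads, k)  # sample without replacement
        curve.append((frac, saturation(subset)))
    return curve

# Toy library: two molecules, each sequenced four times (illustrative only).
toy_reads = [("cell1", "geneA", "AAA")] * 4 + [("cell1", "geneB", "CCC")] * 4
full_saturation = saturation(toy_reads)
curve = downsample_curve(toy_reads)
```

      A curve that flattens as the sampled fraction approaches 1.0 suggests that additional sequencing would recover few new molecules.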

      122: Please provide more details about the biofilm setup, including the media used. I did not find them in the methods. 

      We appreciate your attention to detail, and we agree that this information is crucial for the reproducibility of our experiments. We propose to add the following information to our methods section (lines 311-318):

      "For the biofilm setup, bacterial cultures were grown overnight. The next day, we diluted the culture 1:100 in a petri dish containing 2 ml of LB medium. If the bacteria contained a plasmid, the appropriate antibiotic was added to the LB. The petri dish was then incubated statically in a growth chamber for 24 hours. After incubation, we performed imaging directly under the microscope. The petri dishes used were glass-bottom dishes from Biosharp (catalog number BS-20-GJM), allowing for direct microscopic imaging without the need for cover slips or slides. This setup allowed us to grow and image the biofilms in situ, providing a more accurate representation of their natural structure and composition."

      125: "sequenced 1,563 reads" missing "with" 

      Thank you for correcting our grammar. We have revised the phrase as “sequenced with 1,563 reads”.

      126: "283/239 UMIs per cell" unclear. 283 and 239 UMIs per cell per replicate, respectively? 

      Thank you for pointing out this ambiguity. We have revised the phrase as “283 and 239 UMIs per cell per replicate, respectively” (line 184).

      Figure 1D: Please indicate where the comparison datasets are from. 

      We appreciate your question regarding the source of the comparison datasets in Figure 1D. All data presented in Figure 1D are from our own sequencing experiments. We did not use data from other publications for this comparison. Specifically, we performed sequencing on E. coli cells in the exponential growth phase using three different library preparation methods: RiboD-PETRI, PETRI-seq, and RNA-seq. The data shown in Figure 1D represent a comparison of UMIs and/or reads correlations obtained from these three methods. All sequencing results have been uploaded to the Gene Expression Omnibus (GEO) database. The accession number is GSE260458. We have updated the figure legend for Figure 1D to clearly state that all datasets are from our own experiments, specifying the different methods used.

      Figure 1I, 2D: Unable to interpret the color block in the data. 

      We apologize for any confusion regarding the interpretation of the color blocks in Figures 1I and 2D (now Figures 2E and 3E). The color blocks in these figures represent the p-values of the data points. The color scale ranges from red to blue. Red indicates smaller p-values, that is, higher statistical significance. Blue indicates larger p-values, that is, lower statistical significance. We have updated the figure legends for both Figure 2E and Figure 3E to include this explanation of the color scale. Additionally, we have added a color legend to each figure to make the interpretation more intuitive for readers.

      Figure1H and 2C: Gene names should be provided where possible. The locus tags are highly annotation-dependent and hard to interpret. Also, a larger size figure should be helpful. The clusters 2 and 3 in 2C are the most important, yet because they have few cells, very hard to see in this panel. 

      We appreciate your suggestions for improving the clarity and interpretability of Figures 1H and 2C (now Figures 2D and 3D). We have replaced the locus tags with gene names where possible in both figures. We have increased the size of both figures to improve visibility and readability. We have also made Clusters 2 and 3 in Figure 3D more prominent in the revised figure. Despite their smaller cell count, we recognize their importance and have adjusted the visualization to ensure they are clearly visible. We believe these modifications will significantly enhance the clarity and informativeness of Figures 2D and 3D.

      (3) Questions to consider further expanding on, by more analyses or experiments and in the discussion: 

      What are the explanations for the apparently contradictory upregulation of c-di-GMP in cells expressing higher PdeI levels? How could a phosphodiesterase lead to increased c-di-GMP levels? 

      We appreciate the reviewer's observation regarding the seemingly contradictory relationship between increased PdeI expression and elevated c-di-GMP levels. This is indeed an intriguing finding that warrants further explanation.

      PdeI was predicted to be a phosphodiesterase responsible for c-di-GMP degradation. This prediction is based on sequence analysis, in which PdeI contains an intact EAL domain known for degrading c-di-GMP. However, it is noteworthy that PdeI also contains a divergent GGDEF domain, which is typically associated with c-di-GMP synthesis (Fig. S8). This dual-domain architecture suggests that PdeI may engage in complex regulatory roles. Previous studies have shown that the knockout of the major phosphodiesterase PdeH in E. coli leads to the accumulation of c-di-GMP. Further, a point mutation in PdeI's divergent GGDEF domain (G412S) in this PdeH knockout strain resulted in decreased c-di-GMP levels (2), implying that the wild-type GGDEF domain in PdeI contributes to the maintenance or increase of c-di-GMP levels in the cell. Importantly, our single-cell experiments showed a positive correlation between PdeI expression levels and c-di-GMP levels (Response Fig. 9B). In this revision, we also constructed a PdeI(G412S)-BFP mutant strain. Notably, our observations of this strain revealed that c-di-GMP levels remained constant despite increasing BFP fluorescence, which serves as a proxy for PdeI(G412S) expression levels (Fig. 4D). This experimental evidence, along with domain analysis, suggests that PdeI could contribute to c-di-GMP synthesis, rebutting the notion that it solely functions as a phosphodiesterase. HPLC LC-MS/MS analysis further confirmed that PdeI overexpression, induced by arabinose, led to an upregulation of c-di-GMP levels (Fig. 4E). These results strongly suggest that PdeI plays a significant role in upregulating c-di-GMP levels. Our further analysis revealed that PdeI contains a CHASE (cyclases/histidine kinase-associated sensory) domain. Combined with our experimental results demonstrating that PdeI is a membrane-associated protein, we hypothesize that PdeI functions as a sensor that integrates environmental signals with c-di-GMP production under complex regulatory mechanisms.

      We have also included this explanation (lines 193-217) and the supporting experimental data (Fig. 4D & 4J) in our manuscript to clarify this important point. Thank you for highlighting this apparent contradiction, as it has allowed us to provide a more comprehensive explanation of our findings.

      What about the rest of the genes in cluster 2 of the biofilm? They should be used to help interpret the association between PdeI and c-di-GMP. 

      We understand your interest in the other genes present in cluster 2 of the biofilm and their potential relationship to PdeI and c-di-GMP. After careful analysis, we have determined that the other marker genes in this cluster do not have a significant impact on biofilm formation. Furthermore, we have not found any direct relationship between these genes and c-di-GMP or PdeI. Our focus on PdeI in this cluster is due to its unique and significant role in c-di-GMP regulation and biofilm formation, as demonstrated by our experimental results. While the other genes in this cluster may be co-expressed, their functions appear to be unrelated to the PdeI and c-di-GMP pathway we are investigating. We chose not to elaborate on these genes in our main discussion as they do not contribute directly to our understanding of the PdeI and c-di-GMP association. Instead, we could include a brief mention of these genes in the manuscript, noting that they were found to be unrelated to the PdeI-c-di-GMP pathway. This would provide a more comprehensive view of the cluster composition while maintaining focus on the key findings related to PdeI and c-di-GMP.

      Author response image 2.

      Protein-protein interactions of marker genes in cluster 2 of the 24-hour static E. coli biofilm data.

      A verification is needed that the protein fusion to PdeI functional/membrane localization is not due to protein interactions with fluorescent protein fusion. 

      We appreciate your concern regarding the potential impact of the fluorescent protein fusion on the functionality and membrane localization of PdeI. It is crucial to verify that the observed effects are attributable to PdeI itself and not an artifact of its fusion with the fluorescent protein. To address this matter, we have incorporated a control group expressing only the fluorescent protein BFP (without the PdeI fusion) under the same promoter. This experimental design allows us to differentiate between effects caused by PdeI and those potentially arising from the fluorescent protein alone.

      Our results revealed the following key observations:

      (1) Cellular Localization: The GFP alone exhibited a uniform distribution in the cytoplasm of bacterial cells, whereas the PdeI-GFP fusion protein was specifically localized to the membrane (Fig. 4C).

      (2) Localization in the Biofilm Matrix: BFP-positive cells were distributed throughout the entire biofilm community. In contrast, PdeI-BFP positive cells localized at the bottom of the biofilm, where cell-surface adhesion occurs (Fig. 4F).

      (3) c-di-GMP Levels: Cells with high levels of BFP displayed no increase in c-di-GMP levels. Conversely, cells with high levels of PdeI-BFP exhibited a significant increase in c-di-GMP levels (Fig. 4D).

      (4) Persister Cell Ratio: Cells expressing high levels of BFP showed no increase in persister ratios, while cells with elevated levels of PdeI-BFP demonstrated a marked increase in persister ratios (Fig. 4J).

      These findings from the control experiments have been included in our manuscript (lines 193-244, Fig. 4C, 4D, 4F, 4G and 4J), providing robust validation of our results concerning the PdeI fusion protein. They confirm that the observed effects are indeed due to PdeI and not merely artifacts of the fluorescent protein fusion.

      (1) Vrabioiu, A. M. & Berg, H. C. Signaling events that occur when cells of Escherichia coli encounter a glass surface. Proceedings of the National Academy of Sciences of the United States of America 119 (2022). https://doi.org/10.1073/pnas.2116830119

      (2) Reinders, A. et al. Expression and Genetic Activation of Cyclic Di-GMP-Specific Phosphodiesterases in Escherichia coli. J Bacteriol 198, 448-462 (2016). https://doi.org/10.1128/JB.00604-15

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study attempts to resolve an apparent paradox of rapid evolutionary rates of multi-copy gene systems by using a theoretical model that integrates two classic population models. While the conceptual framework is intuitive and thus useful, the specific model is perplexing and difficult to penetrate for non-specialists. The data analysis of rRNA genes provides inadequate support for the conclusions due to a lack of consideration of technical challenges, mutation rate variation, and the relationship between molecular processes and model parameters.

      Overall Responses:

      Since the eLife assessment succinctly captures the key points of the reviews, the reply here can be seen as the overall responses to the summed criticisms. We believe that the overview should be sufficient to address the main concerns, but further details can be found in the point-by-point responses below. The overview covers the same grounds as the provisional responses (see the end of this rebuttal) but is organized more systematically in response to the reviews. The criticisms together fall into four broad areas. 

      First, the lack of engagement with the literature, particularly concerning Cannings models and non-diffusive limits. This is the main rebuttal of the companion paper (eLife-RP-RA-2024-99990). The literature in question is all in the WF framework and with modifications, in particular, with the introduction of V(K). Nevertheless, all WF models are based on population sampling. The Haldane model is an entirely different model of genetic drift, based on gene transmission. Most importantly, the WF models and the Haldane model differ in the ability to handle the four paradoxes presented in the two papers. These paradoxes are all incompatible with the WF models.

      Second, the poor presentation of the model that makes the analyses and results difficult to interpret. In retrospect, we fully agree and thank all the reviewers for pointing them out. Indeed, we have unnecessarily complicated the model. Even the key concept that defines the paradox, which is the effective copy number of rRNA genes, is difficult to comprehend. We have streamlined the presentation now. Briefly, the complexity arose from the general formulation permitting V(K) ≠ E(K) even for single copy genes. (It would serve the same purpose if we simply let V(K) = E(K) for single copy genes.) The sentences below, copied from the new abstract, should clarify the issue. The full text in the Results section has all the details.

      “On average, rDNAs have C ~ 150 - 300 copies per haploid in humans. While a neutral mutation of a single-copy gene would take 4N generations (N being the population size of an ideal population) to become fixed, the time should be 4NC* generations for rRNA genes (C* being the effective copy number). Note that C* >> 1, but C* < (or >) C would depend on the drift strength. Surprisingly, the observed fixation time in mouse and human is < 4N, implying the paradox of C* < 1.”

      Third, the confusion about which rRNA gene is being compared with which homology, as there are hundreds of them. We should note that the effective copy number C* indicates that the rRNA gene arrays do not correspond with the “gene locus” concept. This is at the heart of the confusion we failed to remove clearly. We now use the term “pseudo-population” to clarify the nature of rDNA variation and evolution. The relevant passage is reproduced from the main text shown below.

      “The pseudo-population of ribosomal DNA copies within each individual

      While a human haploid with 200 rRNA genes may appear to have 200 loci, the concept of "gene loci" cannot be applied to the rRNA gene clusters. This is because DNA sequences can spread from one copy to others on the same chromosome via replication slippage. They can also spread among copies on different chromosomes via gene conversion and unequal crossovers (Nagylaki 1983; Ohta and Dover 1983; Stults, et al. 2008; Smirnov, et al. 2021). Replication slippage and unequal crossovers would also alter the copy number of rRNA genes. These mechanisms will be referred to collectively as the homogenization process. Copies of the cluster on the same chromosome are known to be nearly identical in sequences (Hori, et al. 2021; Nurk, et al. 2022). Previous research has also provided extensive evidence for genetic exchanges between chromosomes (Krystal, et al. 1981; Arnheim, et al. 1982; van Sluis, et al. 2019).

      In short, rRNA gene copies in an individual can be treated as a pseudo-population of gene copies. Such a pseudo-population is not Mendelian but its genetic drift can be analyzed using the branching process (see below). The pseudo-population corresponds to the "chromosome community" proposed recently (Guarracino, et al. 2023). As seen in Fig. 1C, the five short arms harbor a shared pool of rRNA genes that can be exchanged among them. Fig. 1D presents the possible molecular mechanisms of genetic drift within individuals whereby mutations may spread, segregate or disappear among copies. Hence, rRNA gene diversity or polymorphism refers to the variation across all rRNA copies, as these genes exist as paralogs rather than orthologs. This diversity can be assessed at both individual and population levels according to the multi-copy nature of rRNA genes.”

      Fourth, the lack of consideration of many technical challenges. We have responded to the criticisms point-by-point below. One of the main criticisms is about mutation rate differences between single-copy and rRNA genes. We did in fact allude to the parity in mutation rate between them in the original text but should have presented this property more prominently, as is done now. The passage below is copied from the revised text:

      “We now consider the evolution of rRNA genes between species by analyzing the rate of fixation (or near fixation) of mutations. Polymorphic variants are filtered out in the calculation. Note that Eq. (3) shows that the mutation rate, m, determines the long-term evolutionary rate, l. Since we will compare the l values between rRNA and single-copy genes, we have to compare their mutation rates first by analyzing their long-term evolution. As shown in Table S1, l falls in the range of 50-60 (differences per Kb) for single copy genes and 40 – 70 for the non-functional parts of rRNA genes. The data thus suggest that rRNA and single-copy genes are comparable in mutation rate. Differences between their l values will have to be explained by other means.”

      While the overview should address the key issues, we now present the point-by-point response below. 

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Wang et al is, like its companion paper, very unusual in the opinion of this reviewer. It builds off of the companion theory paper's exploration of the "Wright-Fisher Haldane" model but applies it to the specific problem of diversity in ribosomal RNA arrays.

      The authors argue that polymorphism and divergence among rRNA arrays are inconsistent with neutral evolution, primarily stating that the amount of polymorphism suggests a high effective size and thus a slow fixation rate, while we, in fact, observe relatively fast fixation between species, even in putatively non-functional regions.

      They frame this as a paradox in need of solving, and invoke the WFH model.

      The same critiques apply to this paper as to the presentation of the WFH model and the lack of engagement with the literature, particularly concerning Cannings models and non-diffusive limits. However, I have additional concerns about this manuscript, which I found particularly difficult to follow.

      Response 1: We would like to emphasize that, despite the many modified WF models, there has not been a model for quantifying genetic drift in multi-copy gene systems, due to the complexity of two levels of genetic drift – within individuals as well as between individuals of the population. We will address this question in the revised manuscript (Ruan, et al. 2024) and have included a mention of it in the text as follows:

      “In the WF model, gene frequency is governed by 1/N (or 1/2N in diploids) because K would follow the Poisson distribution whereby V(K) = E(K). As E(K) is generally ~1, V(K) would also be ~1. In this backdrop, many "modified WF" models have been developed (Der, et al. 2011), most of them permitting V(K) ≠ E(K) (Karlin and McGregor 1964; Chia and Watterson 1969; Cannings 1974). Nevertheless, paradoxes encountered by the standard WF model apply to these modified WF models as well because all WF models share the key feature of gene sampling (see below and (Ruan, et al. 2024)).”

      My first, and most major, concern is that I can never tell when the authors are referring to diversity in a single copy of an rRNA gene compared to when they are discussing diversity across the entire array of rRNA genes. I admit that I am not at all an expert in studies of rRNA diversity, so perhaps this is a standard understanding in the field, but in order for this manuscript to be read and understood by a larger number of people, these issues must be clarified.

      Response 2: We appreciate the reviewer’s feedback and acknowledge that the distinction between the diversity of individual rRNA gene copies and the diversity across the entire array of rRNA genes may not have been clearly defined in the original manuscript. Diversity in our manuscript refers to the genetic diversity of the population of rRNA genes in the cell. To address this concern, we have revised the relevant paragraph in the text:

      “Hence, rRNA gene diversity or polymorphism refers to the variation across all rRNA copies, as these genes exist as paralogs rather than orthologs. This diversity can be assessed at both individual and population levels according to the multi-copy nature of rRNA genes.”

      Additionally, we have updated the Methods section to include a detailed description of how diversity is measured as follows:

      “All mapping and analysis are performed among individual copies of rRNA genes.

      Each individual was considered as a pseudo-population of rRNA genes and the diversity of rRNA genes was calculated using this pseudo-population of rRNA genes.”

      The authors frame the number of rRNA genes as roughly equivalent to expanding the population size, but this seems to be wrong: the way that a mutation can spread among rRNA gene copies is fundamentally different than how mutations spread within a single copy gene. In particular, a mutation in a single copy gene can spread through vertical transmission, but a mutation spreading from one copy to another is fundamentally horizontal: it has to occur because some molecular mechanism, such as slippage, gene conversion, or recombination resulted in its spread to another copy. Moreover, by collapsing diversity across genes in an rRNA array, the authors are massively increasing the mutational target size.   

      For example, it's difficult for me to tell if the discussion of heterozygosity at rRNA genes in mice starting on line 277 is collapsed or not. The authors point out that Hs per kb is ~5x larger in rRNA than the rest of the genome, but I can't tell based on the authors' description if this is diversity per single copy locus or after collapsing loci together. If it's the first one, I have concerns about diversity estimation in highly repetitive regions that would need to be addressed, and if it's the second one, an elevated rate of polymorphism is not surprising, because the mutational target size is in fact significantly larger.

      Response 3: As addressed in Response 2, the diversity or heterozygosity of rRNA genes is consistently measured by combining copies, as there is no concept of a single gene locus for rDNAs. We agree that by combining the diversity across multiple rRNA gene copies into one measurement, the mutational target size is effectively increased, leading to higher observed levels of diversity than for a single gene. This is in line with our text:

      “If we use the polymorphism data, it is as if rDNA array has a population size 5.2 times larger than single-copy genes. Although the actual copy number on each haploid is ~ 110, these copies do not segregate like single-copy genes and we should not expect N* to be 100 times larger than N. The HS results confirm the prediction that rRNA genes should be more polymorphic than single-copy genes.”

      Given this agreement, the reviewer points out that having a large number of rRNA genes is not equivalent to having a larger population size, because the spreading of mutations among rDNA copies within a species involves two stages: within individuals (horizontal transmission) and between individuals (vertical transmission). Let us examine how the mutation-spreading mechanisms influence the population size of rRNA genes.

      First, an increase in the copy number of rRNA genes does increase the actual population size (CN) of rRNA genes. If the reviewer is referring to the effective population size of rRNA genes in the context of diversity (N* = CN/V*(K)), then an increase in C would also increase N*. In addition, the linkage among copies would reduce the drift effect, leading to increased diversity. Conversely, homogenization mechanisms, such as gene conversion and unequal crossing-over, would reduce genetic variation between copies and increase V*(K), leading to lower diversity. Therefore, C* = C/V*(K) in mice is only about 5 times larger for rRNA genes than for the rest of the genome (which consists mainly of single-copy genes), even though the actual copy number is about 110, indicating a high homogenization rate.
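      The relationship among these quantities can be made concrete with a short worked calculation. It uses only the figures quoted in this response (C of about 110 copies per haploid in mice and a polymorphism-based C* of about 5.2); the resulting value of V*(K) is arithmetic implied by those two numbers and is an illustration, not a value reported in the manuscript.

```latex
\begin{align*}
C^{*} &= \frac{C}{V^{*}(K)}, & N^{*} &= \frac{CN}{V^{*}(K)} = C^{*}N, \\
V^{*}(K) &= \frac{C}{C^{*}} \approx \frac{110}{5.2} \approx 21 .
\end{align*}
```

      Under these assumptions, homogenization would inflate the variance in transmission roughly twentyfold relative to the Poisson expectation V(K) = E(K) ≈ 1, which is how a system with about 110 copies can behave like a population only about 5 times larger.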

      Even if these issues were sorted out, I'm not sure that the authors' framing, in terms of variance in reproductive success, is a useful way to understand what is going on in rRNA arrays. The authors explicitly highlight homogenizing forces such as gene conversion and replication slippage but then seem to just want to incorporate those as accounting for variance in reproductive success. However, don't we usually want to dissect these things in terms of their underlying mechanism? Why build a model based on variance in reproductive success when you could instead explicitly model these homogenizing processes? That seems more informative about the mechanism, and it would also serve significantly better as a null model, since the parameters would be able to be related to in vitro or in vivo measurements of the rates of slippage, gene conversion, etc.

      In the end, I find the paper in its current state somewhat difficult to review in more detail, because I have a hard time understanding some of the more technical aspects of the manuscript while so confused about high-level features of the manuscript. I think that a revision would need to be substantially clarified in the ways I highlighted above.

      Response 4: We appreciate your perspective on modeling the homogenizing processes of rRNA gene arrays.

      We employ the WFH model to track the drift effect of the multi-copy gene system. In the context of the Haldane model, the term K is often referred to as reproductive success, but it might be more accurate to interpret it as “transmission rate” in this study. As stated in the caption of Figure 1D, two new mutations can have very large differences in individual output (K) when transmitted to the next generation through the homogenization process.

      Regarding why we did not explicitly model different mechanisms of homogenization, previous elegant models of multigene families have involved mechanisms like unequal crossing over (Smith 1974a; Ohta 1976; Smith 1976) or gene conversion (Nagylaki 1983; Ohta 1985) for concerted evolution, or have used conversion to approximate the joint effect of conversion and crossing over (Ohta and Dover 1984). However, even when the gene conversion mechanism is simplified, modeling remains challenging due to controversial assumptions, such as a uniform homogenization rate across all gene members (Dover 1982; Ohta and Dover 1984). No model can fully capture the extreme complexity of these factors, yet these unbiased mechanisms all act as genetic drift forces that contribute to changes in mutant transmission. Therefore, we opted for a more simplified and collective approach using V*(K) to see the overall strength of genetic drift.

      We have discussed the reason for using V*(K) to collectively represent the homogenization effect in the Discussion. As stated in our manuscript:

      “There have been many rigorous analyses that confront the homogenizing mechanisms directly. These studies (Smith 1974b; Ohta 1976; Dover 1982; Nagylaki 1983; Ohta and Dover 1983) modeled gene conversion and unequal cross-over head on. Unfortunately, on top of the complexities of such models, the key parameter values are rarely obtainable. In the branching process, all these complexities are wrapped into V*(K) for formulating the evolutionary rate. In such a formulation, the collective strength of these various forces may indeed be measurable, as shown in this study.”

      Reviewer #2 (Public Review):

      Summary:

      Multi-copy gene systems are expected to evolve slower than single-copy gene systems because it takes longer for genetic variants to fix in the large number of gene copies in the entire population. Paradoxically, their evolution is often observed to be surprisingly fast. To explain this paradox, the authors hypothesize that the rapid evolution of multi-copy gene systems arises from stronger genetic drift driven by homogenizing forces within individuals, such as gene conversion, unequal crossover, and replication slippage. They formulate this idea by combining the advantages of two classic population genetic models -- adding the V(k) term (which is the variance in reproductive success) in the Haldane model to the Wright-Fisher model. Using this model, the authors derived the strength of genetic drift (i.e., reciprocal of the effective population size, Ne) for the multi-copy gene system and compared it to that of the single-copy system. The theory was then applied to empirical genetic polymorphism and divergence data in rodents and great apes, relying on comparison between rRNA genes and genome-wide patterns (which mostly are single-copy genes). Based on this analysis, the authors concluded that neutral genetic drift could explain the rRNA diversity and evolution patterns in mice but not in humans and chimpanzees, pointing to a positive selection of rRNA variants in great apes.

      Strengths:

      Overall, the new WFH model is an interesting idea. It is intuitive, efficient, and versatile in various scenarios, including the multi-copy gene system and other cases discussed in the companion paper by Ruan et al.

      Weaknesses:

      Despite being intuitive at a high level, the model is a little unclear, as several terms in the main text were not clearly defined and connections between model parameters and biological mechanisms are missing. Most importantly, the data analysis of rRNA genes is extremely over-simplified and does not adequately consider biological and technical factors that are not discussed in the model. Even if these factors are ignored, the authors' interpretation of several observations is unconvincing, as alternative scenarios can lead to similar patterns. Consequently, the conclusions regarding rRNA genes are poorly supported. Overall, I think this paper shines more in the model than the data analysis, and the modeling part would be better presented as a section of the companion theory paper rather than a stand-alone paper. My specific concerns are outlined below.

      Response 5: We appreciate the reviewer’s feedback and recognize the need for clearer definitions of key terms. We have made revisions to ensure that each term is properly defined upon its first use.

      Regarding the model’s simplicity, as noted in Response 4, our intention was to create a framework that captures the essence of how mutant copies spread by chance within a population, relying on the variance in transmission rates for each copy (V(K)). By doing so, we aimed to incorporate the various homogenization mechanisms that do not affect single-copy genes, highlighting the substantially stronger genetic drift observed in multi-copy systems compared to single-copy genes. We believe that simplifying the model was necessary to make it more accessible and practical for real-world data analysis, and that it provides a useful approximation that can be applied broadly. It clearly underestimates the actual rate, as some forces with canceling effects might not have been accounted for.

      (1) Unclear definition of terms

      Many of the terms in the model or the main text were not clearly defined the first time they occurred, which hindered understanding of the model and observations reported. To name a few:

      (i) In Eq(1), although C* is defined as the "effective copy number", it is unclear what it means in an empirical sense. For example, Ne could be interpreted as "an ideal WF population with this size would have the same level of genetic diversity as the population of interest" or "the reciprocal of strength of allele frequency change in a unit of time". A few factors were provided that could affect C*, but specifically, how do these factors impact C*? For example, does increased replication slippage increase or decrease C*? How about gene conversion or unequal cross-over? If we don't even have a qualitative understanding of how these processes influence C*, it is very hard to make interpretations based on inferred C*. How to interpret the claim on lines 240-241 (If the homogenization is powerful enough, rRNA genes would have C*<1)? Please also clarify what C* would be, in a single-copy gene system in diploid species.

      Response 6: We apologize for the confusion caused by the lack of clear definitions in the initial manuscript. We recognize that this has led to misunderstandings regarding the concept we presented. Our aim was to demonstrate the concerted evolution in multi-copy gene systems, involving two levels of “effective copy number” relative to single-copy genes: first, homogenization within populations, then divergence between species. We used C* and Ne* to designate the two levels driven by the same homogenization force, which complicated the evolutionary pattern.

      To address these issues, we have simplified the model and revised the abstract to prevent any misunderstandings:

      “On average, rDNAs have C ~ 150 - 300 copies per haploid in humans. While a neutral mutation of a single-copy gene would take 4N (N being the population size) generations to become fixed, the time should be 4NC* generations for rRNA genes where 1 << C* (C* being the effective copy number; C* < C or C* > C would depend on the drift strength). However, the observed fixation time in mouse and human is < 4N, implying the paradox of C* < 1. Genetic drift that encompasses all random neutral evolutionary forces appears as much as 100 times stronger for rRNA genes than for single-copy genes, thus reducing C* to < 1.”

      Thus, it should be clear that the fixation time as well as the level of polymorphism represent the empirical measures of C*. We have also revised the relevant paragraph in the text to define C* and V*(K) and removed Eq. 2 for clarity:

      “Below, we compare the strength of genetic drift in rRNA genes vs. that of single-copy genes using the Haldane model (Ruan, et al. 2024). We shall use * to designate the equivalent symbols for rRNA genes; for example, E(K) vs. E*(K). Both are set to 1, such that the total number of copies in the long run remains constant.

      For simplicity, we let V(K) = 1 for single-copy genes. (If we permit V(K) ≠ 1, the analyses will involve the ratio of V*(K) and V(K) to reach the same conclusion but with unnecessary complexities.) For rRNA genes, V*(K) ≥ 1 may generally be true because K for rDNA mutations is affected by a host of homogenization factors including replication slippage, unequal cross-over, gene conversion and other related mechanisms not operating on single-copy genes. Hence,

      N* = NC/V*(K)

      where C is the average number of rRNA genes in an individual and V*(K) reflects the homogenization process on rRNA genes (Fig. 1D). Thus,

      C* = C/V*(K)

      represents the effective copy number of rRNA genes in the population, determining the level of genetic diversity relative to single-copy genes. Since C is in the hundreds and V*(K) is expected to be > 1, the relationship of 1 << C* ≤ C is hypothesized. Fig. 1D is a simple illustration that the homogenizing process may enhance V*(K) substantially over the WF model.

      In short, genetic drift of rRNA genes would be equivalent to single copy genes in a population of size NC* (or N*). Since C* >> 1 is hypothesized, genetic drift for rRNA genes is expected to be slower than for single copy genes.”
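
      The fixation-time argument quoted above (4N generations for a single-copy gene, 4NC* for rRNA genes) can be illustrated with a short sketch. The function names and the population size N = 10,000 are hypothetical values chosen only for illustration; they are not taken from the manuscript.

```python
def expected_fixation_time(pop_size: float, effective_copies: float = 1.0) -> float:
    """Expected fixation time (in generations) of a neutral mutation: 4 * N * C*.

    effective_copies = 1 recovers the single-copy value of 4N.
    """
    return 4.0 * pop_size * effective_copies


def implied_effective_copies(observed_fixation_time: float, pop_size: float) -> float:
    """Solve T = 4 * N * C* for C*, given an observed fixation time T."""
    return observed_fixation_time / (4.0 * pop_size)


N = 10_000  # hypothetical population size, for illustration only

single_copy_time = expected_fixation_time(N)        # 4N
rrna_time = expected_fixation_time(N, 5.2)          # 4NC* with C* ~ 5.2
c_star_if_fast = implied_effective_copies(30_000, N)

print(single_copy_time, rrna_time)
print(c_star_if_fast < 1)  # a fixation time shorter than 4N implies C* < 1
```

      The last line restates the paradox in the revised abstract: whenever the observed fixation time is below 4N, the implied C* falls below 1, whatever the actual copy number C.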

      (ii) In Eq(1), what exactly is V*(K)? Variance in reproductive success across all gene copies in the population? What factors affect V*(K)? For the same population, what is the possible range of V*(K)/V(K)? Is it somewhat bounded because of biological constraints? Are V*(K) and C*(K) independent parameters, or does one affect the other, or are both affected by an overlapping set of factors?

      Response 7: - In Eq(1), what exactly is V*(K)?  In Eq(1), V*(K) refers to the variance in the number of progeny to whom the gene copy of interest is transmitted (K) over a specific time interval. When considering evolutionary divergence between species, this time interval may correspond to the divergence time.

      - What factors affect V*(K)? For the same population, what is the possible range of V*(K)/V(K)? Is it somewhat bounded because of biological constraints?  “V*(K) for rRNA genes is likely to be much larger than V(K) for single-copy genes, because K for rRNA mutations may be affected by a host of homogenization factors including replication slippage, unequal cross-over, gene conversion and other related mechanisms not operating on single-copy genes. For simplicity, we let V(K) = 1 (as in a WF population) and V*(K) ≥ 1.” Thus, V*(K)/V(K) = V*(K) can potentially reach values in the hundreds, and may even exceed C, resulting in C* (= C/V*(K)) values less than 1. Biological constraints that could limit this variance include the minimum copy number within individuals, sequence constraints in functional regions, and the susceptibility of chromosomes with large arrays to intrachromosomal crossover (which may lead to a reduction in copy number) (Eickbush and Eickbush 2007), potentially reducing the variability of K.

      - Are V*(K) and C*(K) independent parameters, or does one affect the other, or are both affected by an overlapping set of factors?  There is no C*(K); C* is defined as follows in the text:

      “C* = C/V*(K) represents the effective copy number of rRNA genes, reflecting the level of genetic diversity relative to single-copy genes. Since C is in the hundreds and V*(K) is expected to be > 1, the relationship of 1 << C* ≤ C is hypothesized.” The factors influencing V*(K) directly affect C* due to this relationship.

      (iii) In the multi-copy gene system, how is fixation defined? A variant found at the same position in all copies of the rRNA genes in the entire population?

      Response 8: We appreciate the reviewer's suggestion and have now provided a clear definition of fixation in the context of multi-copy genes within the manuscript.

      “For rDNA mutations, fixation must occur in two stages – fixation within individuals and among individuals in the population. (Note that a new mutation can be fixed via homogenization, thus making rRNA gene copies in an individual a pseudo-population.)”

      The evolutionary dynamics of multi-copy genes differ from those of single-copy (Mendelian) genes, which mutate, segregate and evolve independently in the population. Fixation in multi-copy genes, such as rRNA genes, is influenced by their ability to transfer genetic information among their copies through nonreciprocal exchange mechanisms, like gene conversion and unequal crossover (Ohta and Dover 1984). These processes can cause fluctuations in the number of mutant copies within an individual's lifetime and facilitate the spread of a mutant allele across all copies, even on non-homologous chromosomes. Over time, this can result in the mutant allele replacing all preexisting alleles throughout the population, leading to fixation (Ohta 1976), meaning that the same variant will eventually be present at the corresponding position in all copies of the rRNA genes across the entire population. Without such homogenization processes, fixation would be unlikely to occur in multi-copy genes.

      (iv) Lines 199-201, HI, Hs, and HT are not defined in the context of a multi-copy gene system. What are the empirical estimators?

      Response 9: We appreciate the reviewer's comment and would like to clarify the definitions and empirical estimators for HI, HS and HT within the context of a multi-copy gene system in the text:

      “A standard measure of genetic drift is the level of heterozygosity (H). At the mutation-selection equilibrium

      where μ is the mutation rate of the entire gene and Ne is the effective population size. In this study, Ne = N for single-copy gene and Ne = C*N for rRNA genes. The empirical measure of nucleotide diversity H is given by

      where L is the gene length (for each copy of rRNA gene, L ~ 43kb) and p_i is the variant frequency at the i-th site.

      We calculate H of rRNA genes at three levels – within-individual, within-species and then, within total samples (HI, HS and HT, respectively). HS and HT are standard population genetic measures (Hartl, et al. 1997; Crow and Kimura 2009). In calculating HS, all sequences in the species are used, regardless of the source individuals. A similar procedure is applied to HT. The HI statistic is adopted for multi-copy gene systems for measuring within-individual polymorphism. Note that copies within each individual are treated as a pseudo-population (see Fig. 1 and text above). With multiple individuals, HI is averaged over them.”
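
      The three levels of heterozygosity described in this passage can be sketched as follows. The estimator used here, H = (1/L) Σ 2·p_i·(1 − p_i), is an assumption: it is a common per-site form consistent with the symbols L and p_i defined above, but it is not copied from the manuscript. All function names and the toy frequencies are hypothetical.

```python
from statistics import mean


def heterozygosity(site_freqs, gene_length):
    """Assumed estimator: H = (1/L) * sum over sites of 2 * p_i * (1 - p_i)."""
    return sum(2.0 * p * (1.0 - p) for p in site_freqs) / gene_length


def h_within_individuals(per_individual_freqs, gene_length):
    """H_I: compute H inside each individual (a pseudo-population), then average."""
    return mean(heterozygosity(freqs, gene_length) for freqs in per_individual_freqs)


def h_pooled(per_unit_freqs, gene_length):
    """H_S (units = individuals) or H_T (units = species).

    Frequencies are first averaged per site across units with equal weight,
    then H is computed on the averaged frequencies.
    """
    pooled = [mean(site) for site in zip(*per_unit_freqs)]
    return heterozygosity(pooled, gene_length)


# Toy data: two individuals, two variable sites, gene length of 10 sites.
individuals = [[0.5, 0.0], [0.5, 1.0]]
h_i = h_within_individuals(individuals, gene_length=10)
h_s = h_pooled(individuals, gene_length=10)
print(h_i, h_s)
```

      In this toy case the second site is fixed for different alleles in the two individuals, so it contributes nothing to H_I but does contribute to H_S; this between-individual component is what F_IS later quantifies.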

      (v) Line 392-393, f and g are not clearly defined. What does "the proportion of AT-to-GC conversion" mean? What are the numerator and denominator of the fraction, respectively?

      Response 10: We appreciate the reviewer's comment and have revised the relevant text for clarity as well as improved the specific calculation methods for f and g in the Methods section.

      “We first designate the proportion of AT-to-GC conversion as f and the reciprocal, GC-to-AT, as g. Specifically, f represents the proportion of fixed mutations where an A or T nucleotide has been converted to a G or C nucleotide (see Methods). Given f ≠ g, this bias is true at the site level.”

      Methods:

      “Specifically, f represents the proportion of fixed mutations where an A or T nucleotide has been converted to a G or C nucleotide. The numerator for f is the number of fixed mutations from A-to-G, T-to-C, T-to-G, or A-to-C. The denominator is the total number of A or T sites in the rDNA sequence of the species lineage.

      Similarly, g is defined as the proportion of fixed mutations where a G or C nucleotide has been converted to an A or T nucleotide. The numerator for g is the number of fixed mutations from G-to-A, C-to-T, C-to-A, or G-to-T. The denominator is the total number of G or C sites in the rDNA sequence of the species lineage.

      The consensus rDNA sequences for the species lineage were generated by Samtools consensus (Danecek, et al. 2021) from the bam file after alignment. The following command was used:

      ‘samtools consensus -@ 20 -a -d 10 --show-ins no --show-del yes input_sorted.bam output.fa’.”
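
      The definitions of f and g given in the Methods text above can be written as a small calculation. This is a minimal sketch, assuming the consensus sequence is available as a plain string and the fixed mutations as (ancestral, derived) base pairs; the function name and the toy inputs are hypothetical.

```python
AT = set("AT")
GC = set("GC")


def conversion_proportions(consensus, fixed_mutations):
    """Return (f, g) as defined in the Methods text.

    f = (# fixed AT-to-GC mutations) / (# A or T sites in the consensus)
    g = (# fixed GC-to-AT mutations) / (# G or C sites in the consensus)
    A<->T and G<->C changes count toward neither numerator.
    """
    seq = consensus.upper()
    n_at = sum(base in AT for base in seq)
    n_gc = sum(base in GC for base in seq)
    at_to_gc = sum(1 for anc, der in fixed_mutations if anc in AT and der in GC)
    gc_to_at = sum(1 for anc, der in fixed_mutations if anc in GC and der in AT)
    f = at_to_gc / n_at if n_at else float("nan")
    g = gc_to_at / n_gc if n_gc else float("nan")
    return f, g


# Toy example: 6 A/T sites, 4 G/C sites, and four fixed mutations.
toy_consensus = "AATTGGCCAT"
toy_mutations = [("A", "G"), ("T", "C"), ("G", "A"), ("A", "T")]
f, g = conversion_proportions(toy_consensus, toy_mutations)
print(f, g)
```

      In the toy example the A-to-T change is ignored, so f = 2/6 and g = 1/4. Because the denominators are site counts rather than mutation counts, f ≠ g indicates a bias at the site level, as stated in the revised text.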

      (2) Technical concerns with rRNA gene data quality

      Given the highly repetitive nature and rapid evolution of rRNA genes, myriads of things could go wrong with read alignment and variant calling, raising great concerns regarding the data quality. The data source and methods used for calling variants were insufficiently described at places, further exacerbating the concern.

      (i) What are the accession numbers or sample IDs of the high-coverage WGS data of humans, chimpanzees, and gorillas from NCBI? How many individuals are in each species? These details are necessary to ensure reproducibility and correct interpretation of the results.

      Response 11: We apologize for not including the specific details of the sample information in the main text. All accession numbers and sample IDs for the WGS data used in this study, including mice, humans, chimpanzee, and gorilla, are already listed in Supplementary Tables S4-S5. We have revised the table captions and referenced them at the appropriate points in the Methods to ensure clarity.

      “The genome sequences of human (n = 8), chimpanzee (n = 1) and gorilla (n = 1) were sourced from National Center for Biotechnology Information (NCBI) (Supplementary Table 4). … Genomic sequences of mice (n = 13) were sourced from the Wellcome Sanger Institute’s Mouse Genome Project (MGP) (Keane, et al. 2011).”

      The concern regarding the number of individuals needed to support the results will be addressed in Response 13.

      (ii) Sequencing reads from great apes and mice were mapped against the human and mouse rDNA reference sequences, respectively (lines 485-486). Given the rapid evolution of rRNA genes, even individuals within the same species differ in copy number and sequences of these genes. Alignment to a single reference genome would likely lead to incorrect and even failed alignment for some reads, resulting in genotyping errors. Differences in rDNA sequence, copy number, and structure are even greater between species, potentially leading to higher error rates in the called variants. Yet the authors provided no justification for the practice of aligning reads from multiple species to a single reference genome nor evidence that misalignment and incorrect variant calling are not major concerns for the downstream analysis.

      Response 12: While the copy number of rDNA varies among individuals, the sequence identity among copies is typically very high (median identity of 98.7% (Nurk, et al. 2022)). Therefore, all rRNA genes were aligned against the species-specific reference sequences, where the consensus nucleotide typically accounts for >90% of the gene copies in the population. To minimize genotyping errors, our analysis focused exclusively on single nucleotide variants (SNVs) with only two alleles, discarding other mutation types.

      Regarding sequence divergence between species, where sequence variation may be greater, we excluded unmapped regions with high-quality read coverage below 10. In calculating the substitution rate, we accounted for the mapping length (L), as shown in column 3 of Tables 3-5.

      We appreciate the reviewer’s comments and have provided details in the Methods.

      (vi) It is unclear how variant frequency within an individual was defined conceptually or computed from data (lines 499-501). The population-level variant frequency was calculated by averaging across individuals, but why was the averaging not weighted by the copy number of rRNA genes each individual carries? How many individuals are sampled for each species? Are the sample sizes sufficient to provide an accurate estimate of population frequencies?

      Response 13: Each individual was considered a pseudo-population of rRNA genes, and the variant frequency within an individual was the proportion of the mutant allele in this pseudo-population. The calculation of variant frequency is based on the number of supporting reads in each individual.

      The reason for calculating population-level variant frequency by averaging across individuals is relevant in the calculation of FIS and FST. In calculating FST, the standard practice is to weigh each population equally. So, when we show FST in humans, we do not consider whether there are more Africans, Caucasians or Asians. There is a reason for not weighing them even though the population sizes could be orders of magnitude different, say, in the comparison between an ethnic minority and the main population. In the case of FIS, the issue is moot. Although copy number may range from 150 to 400 per haploid, most people have 300 – 500 copies with two haploids.
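
      The two steps described in this response, a within-individual variant frequency from read counts and an unweighted average across individuals, can be sketched as below. The function names and read counts are hypothetical, and the read-weighted value is shown only for contrast with the equal-weight practice described above.

```python
from statistics import mean


def within_individual_frequency(alt_reads, total_reads):
    """Proportion of reads supporting the variant within one individual."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def population_frequency(per_individual_freqs):
    """Unweighted mean across individuals (each individual counts equally)."""
    return mean(per_individual_freqs)


# Hypothetical (variant-supporting reads, total reads) at one site.
reads = [(30, 100), (10, 200)]

freqs = [within_individual_frequency(alt, total) for alt, total in reads]
unweighted = population_frequency(freqs)

# For contrast only: pooling reads weights individuals by depth or copy number.
read_weighted = sum(alt for alt, _ in reads) / sum(total for _, total in reads)

print(freqs, unweighted, read_weighted)
```

      Here the unweighted estimate is 0.175 while pooling reads gives about 0.133, which shows how weighting by depth or copy number would shift the estimate toward individuals contributing more reads.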

      As for the concern regarding the number of individuals needed to support the results:

      Considering the nature of multi-copy genes, where gene members undergo continuous exchanges at a much slower rate compared to the rapid rate of random distribution of chromosomes at each generation of sexual reproduction, even a few variant copies that arise during an individual's lifetime would disperse into the gene pool in the next generation (Ohta and Dover 1984). Thus, there is minimal difference between individuals. Our analysis also aligns with this theory, particularly in the human population (FIS = 0.059), where each individual carries the majority of the population's genetic diversity. Therefore, even a single chimpanzee or gorilla individual carries sufficient diversity with its hundreds of gene copies to calculate divergence from humans.

      (vii) Fixed variants are operationally defined as those with a frequency>0.8 in one species. What is the justification for this choice of threshold? Without knowing the exact sample size of the various species, it's difficult to assess whether this threshold is appropriate.

      Response 14: First, the mutation frequency distribution is strongly bimodal (see Figure below) with one peak at zero and the other at 1. The high-frequency peak starts to rise slowly at 0.8, similar to the FST distribution in Figure 4C. That is why we use it as the cutoff, although we would get similar results at the cutoff of 0.90 (see Table below). Second, the sample size for the calculation of mutant frequency is based on the number of reads, which is usually in the tens of thousands. Third, it does not matter if the mutation frequency calculation is based on one individual or multiple individuals because 95% of the genetic diversity of the population is captured by the gene pool within each individual.

      Author response image 1.

      Author response table 1.

      The A/T to G/C and G/C to A/T changes in apes and mouse.

      New mutants with a frequency >0.9 within an individual are considered as (nearly) fixed, except for humans, where the frequency was averaged over 8 individuals in the Table 2.

      The X-squared values for each species are as follows: 58.303 for human, 7.9292 for chimpanzee, and 0.85385 for M. m. domesticus.

      (viii) It is not explained exactly how FIS, FST, and divergence levels of rRNA genes were calculated from variant frequency at individual and species levels. Formulae need to be provided to explain the computation.

      Response 15: Now that HI, HS, and HT are clearly defined in Response 9, understanding FIS and FST becomes straightforward.

      “Given the three levels of heterozygosity, there are two levels of differentiation. First, FIS is the differentiation among individuals within the species, defined by

      FIS = [HS - HI]/HS  

      FIS is hence the proportion of genetic diversity in the species that is found only between individuals. We will later show FIS ~ 0.05 in human rDNA (Table 2), meaning 95% of rDNA diversity is found within individuals.

      Second, FST is the differentiation between species within the total species complex, defined as

      FST = [HT – HS]/HT 

      FST is the proportion of genetic diversity in the total data that is found only between species.”
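
      The two differentiation statistics quoted above follow directly from the three heterozygosity levels. The sketch below implements the stated formulae, FIS = [HS - HI]/HS and FST = [HT - HS]/HT; the function names and the numerical inputs are hypothetical, chosen so that FIS matches the value of about 0.059 reported for human rDNA.

```python
def f_is(h_i, h_s):
    """F_IS = (H_S - H_I) / H_S: share of species diversity found only between individuals."""
    if h_s <= 0:
        raise ValueError("H_S must be positive")
    return (h_s - h_i) / h_s


def f_st(h_s, h_t):
    """F_ST = (H_T - H_S) / H_T: share of total diversity found only between species."""
    if h_t <= 0:
        raise ValueError("H_T must be positive")
    return (h_t - h_s) / h_t


# Hypothetical heterozygosity values, for illustration only.
h_i, h_s, h_t = 0.00941, 0.0100, 0.0125

fis = f_is(h_i, h_s)
fst = f_st(h_s, h_t)
within_individual_share = 1.0 - fis

print(round(fis, 3), round(fst, 3), round(within_individual_share, 3))
```

      With these inputs about 94% of the within-species diversity sits inside individuals, which is the interpretation given to FIS of roughly 0.05 in the quoted text.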

      (3) Complete ignorance of the difference in mutation rate between rRNA genes and the genome-wide average

      Nearly all data analysis in this paper relied on comparison between rRNA genes with the rest (presumably single-copy part) of the genome. However, mutation rate, a key parameter determining the diversity and divergence levels, was completely ignored in the comparison. It is well known that mutation rate differs tremendously along the genome, with both fine and large-scale variation. If the mutation rate of rRNA genes differs substantially from the genome average, it would invalidate almost all of the analysis results. Yet no discussion or justification was provided.

      Response 16: We appreciate the reviewer's observation regarding the potential impact of varying mutation rates across the genome. To address this concern, we compared the long-term substitution rates on rDNA and single-copy genes between human and rhesus macaque, which diverged approximately 25 million years ago. Our analysis (see Table S1 below) indicates that the substitution rate in rDNA is actually slower than the genome-wide average. This finding suggests that rRNA genes do not experience a higher mutation rate compared to single-copy genes, as stated in the text:

      “Note that Eq. (3) shows that the mutation rate, μ, determines the long-term evolutionary rate, λ. Since we will compare the λ values between rRNA and single-copy genes, we have to compare their mutation rates first by analyzing their long-term evolution. As shown in Table S1, λ falls in the range of 50-60 (differences per Kb) for single-copy genes and 40 – 70 for the non-functional parts of rRNA genes. The data thus suggest that rRNA and single-copy genes are comparable in mutation rate. Differences between their λ values will have to be explained by other means.”

      However, given the divergence time (Td) being equal to or smaller than Tf, even if the mutation rate per nucleotide is substantially higher in rRNA genes, these variants would not become fixed after the divergence of humans and chimpanzees without the help of strong homogenization forces. Thus, the presence of divergence sites (Table 5) still supports the conclusion that rRNA genes undergo much stronger genetic drift compared to single-copy genes.

      Related to mutation rate: given the hypermutability of CpG sites, it is surprising that the evolution/fixation rate of rRNA estimated with or without CpG sites is so close (2.24% vs 2.27%). Given the 10 - 20-fold higher mutation rate at CpG sites in the human genome, and 2% CpG density (which is probably an under-estimate for rDNA), we expect the former to be at least 20% higher than the latter.

      Response 17: While it is true that CpG sites exhibit a 10-20-fold higher mutation rate, the close evolution/fixation rates of rDNA with and without CpG sites (2.24% vs 2.27%) may be attributed to the fact that fixation rates during short-term evolutionary processes are less influenced by mutation rates alone. As observed in the Human-Macaque comparison in the table above, the substitution rate of rDNA in non-functional regions with CpG sites is 4.18%, while it is 3.35% without CpG sites, aligning with your expectation of 25% higher rates where CpG sites are involved.
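
      The 25% figure cited in this response follows from the two Human-Macaque substitution rates. A minimal check, with a hypothetical function name:

```python
def relative_excess(rate_with_cpg, rate_without_cpg):
    """Fractional increase of the rate with CpG sites over the rate without them."""
    return (rate_with_cpg - rate_without_cpg) / rate_without_cpg


# Human-Macaque substitution rates (%) quoted in the response.
excess = relative_excess(4.18, 3.35)
print(f"{excess:.1%}")  # 24.8%
```

      The long-term rate with CpG sites is therefore about 25% higher than without them, in line with the reviewer's expectation of at least 20%.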

      This discrepancy between the expected and observed fixation rates may be due to strong homogenization forces, which can rapidly fix or eliminate variants, thereby reducing the overall impact of higher mutation rates at CpG sites on the observed fixation rate. This suggests that the homogenization mechanisms play a more dominant role in the fixation process over short evolutionary timescales, mitigating the expected increase in fixation rates due to CpG hypermutability.

      Among the weaknesses above, concern (1) can be addressed with clarification, but concerns (2) and (3) invalidate almost all findings from the data analysis and cannot be easily alleviated with a complete revamp work.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Both reviewers found the manuscript confusing and raised serious concerns. They pointed out a lack of engagement with previous literature on modeling and the presence of ill-defined terms within the model, which obscure understanding. They also noted a significant disconnection between the modeling approach and the biological processes involved. Additionally, the data analysis was deemed problematic due to the failure to consider essential biological and technical factors. One reviewer suggested that the modeling component would be more suitable as a section of the companion theory paper rather than a standalone paper. Please see their individual reviews for their overall assessment.

      Reviewer #2 (Recommendations For The Authors):

      Beyond my major concerns, I have numerous questions about the interpretation of various findings:

      Lines 62-63: Please explain under what circumstance Ne=N/V(K) is biologically nonsensical and why.

      Response 18: “Biologically non-sensical” is the term used in (Chen, et al. 2017). We now use the term “biologically untenable” but the message is the same. How does one get V(K) ≠ E(K) in the WF sampling? It is untenable under the WF structure. Kimura may be the first one to introduce V(K) ≠ E(K) into the WF model and subsequent papers use the same sort of modifications that are mathematically valid but biologically dubious. As explained extensively in the companion paper, the modifications add complexities but do not give the WF models powers to explain the paradoxes.

      Lines 231-234: The claim about a lower molecular evolution rate (lambda) is inaccurate - under neutrality, the molecular evolution rate is always the same as the mutation rate. It is true that when the species divergence Td is not much greater than fixation time Tf, the observed number of fixed differences would be substantially smaller than 2*mu*Td, but the lower divergence level does not mean that the molecular evolution is slower. In other words, in calculating the divergence level, it is the time term that needs to be adjusted rather than the molecular evolution rate.

      Response 19: Thanks, we agree that the original wording was not accurate. It is indeed the substitution rate rather than the molecular evolution rate that is affected when species divergence time Td is not much greater than the fixation time Tf. We have revised the relevant text in the manuscript to correct this and ensure clarity.

      Lines 277-279: Hs for rRNA is 5.2-fold higher than the genome average. This could be roughly translated as Ne*/Ne=5.2. According to Eq 2: (1/Ne*)/(1/Ne)= Vh/C*, it can be derived that Ne*/Ne=C*/Vh. Then why do the authors conclude "C*=N*/N~5.2" in line 278? Wouldn't it mean that C*/Vh is roughly 5.2?

      Response 20: We apologize for the confusion. To prevent misunderstandings, we have revised Equation 1 and deleted Equation 2 from the manuscript. Please refer to Response 6 for further details.

      Lines 291-292: What does "a major role of stage I evolution" mean? How does it lead to lower FIS?

      Response 21: We apologize for the lack of clarity in our original description, and we have revised the relevant content to make it more direct.

      “In this study, we focus on multi-copy gene systems, where the evolution takes place in two stages: both within (stage I) and between individuals (stage II).”

      FIS for rDNA among 8 human individuals is 0.059 (Table 2), much smaller than 0.142 in M. m. domesticus mice, indicating minimal genetic differences across human individuals and high level of genetic identity in rDNAs between homologous chromosomes among human population. … Correlation of polymorphic sites in IGS region is shown in Supplementary Fig. 1. The results suggest that the genetic drift due to the sampling of chromosomes during sexual reproduction (e.g., segregation and assortment) is augmented substantially by the effects of homogenization process within individual. Like those in mice, the pattern indicates that intra-species polymorphism is mainly preserved within individuals.”

      Lines 297-300: Why does the concentration at very low allele frequency indicate rapid homogenization across copies? Suppose there is no inter-copy homogenization, and each copy evolves independently, wouldn't we still expect the SFS to be strongly skewed towards rare variants? It is completely unclear how homogenization processes are expected to affect the SFS.

      Response 22: We appreciate the reviewer’s insightful comments and apologize for any confusion in our original explanation. To clarify:

      If there is no inter-copy homogenization and each copy evolves independently, it would effectively result in an equivalent population size that is C times larger than that of single-copy genes. However, given the copies are distributed on five chromosomes, if the copies within a chromosome were fully linked, there would be no fixation at any sites. Considering the data presented in Table 4, where the substitution rate in rDNA is higher than in single-copy genes, this suggests that additional forces must be acting to homogenize the copies, even across non-homologous chromosomes.

      Regarding the specific data presented in Figure 3, the allele frequency spectrum is based on human polymorphic sites and is a folded spectrum, as the ancestral state of the alleles was not determined. High levels of homogenization would typically push variant mutations toward the extremes of the SFS, leading to fewer intermediate-frequency alleles and reduced heterozygosity. The statement that "allele frequency spectrum is highly concentrated at very low frequency within individuals" was intended to emphasize the localized distribution of variants and the high identity at each site. However, we recognize that it does not accurately reflect the role of homogenization and that this conclusion cannot be directly inferred from the figure as presented. Therefore, we have removed the sentence from the text.

      The evidence of gBGC in rRNA genes in great apes does not help explain the observed accelerated evolution of rDNA relative to the rest of the genome. Evidence of gBGC has been clearly demonstrated in a variety of species, including mice. It affects not only rRNA genes but also most parts of the genome, particularly regions with high recombination rates. In addition, gBGC increases the fixation probability of W>S mutations but suppresses the fixation of S>W mutations, so it is not obvious how gBGC will increase or decrease the molecular evolution rate overall.

      Response 23: We have thoroughly rewritten the last section of Results. The earlier writing has misplaced the emphasis, raising many questions (as stated above). To answer them, we would have to present a new set of equations thus adding unnecessary complexities to the paper. Here is the streamlined and more logical flow of the new section.

      First, Tables 4 and 5 have shown the accelerated evolution of the rRNA genes. We have now shown that rRNA genes do not have higher mutation rates. Below is copied from the revised text:

      “We now consider the evolution of rRNA genes between species by analyzing the rate of fixation (or near fixation) of mutations. Polymorphic variants are filtered out in the calculation. Note that Eq. (3) shows that the mutation rate, μ, determines the long-term evolutionary rate, λ. Since we will compare the λ values between rRNA and single-copy genes, we have to compare their mutation rates first by analyzing their long-term evolution. As shown in Table S1, λ falls in the range of 50-60 (differences per Kb) for single-copy genes and 40-70 for the non-functional parts of rRNA genes. The data thus suggest that rRNA and single-copy genes are comparable in mutation rate. Differences between their λ values will have to be explained by other means.”

      Second, we have shown that the accelerated evolution in mice is likely due to genetic drift, resulting in faster fixation of neutral variants. We also show that this is unlikely to be true in humans and chimpanzees; hence selection is the only possible explanation. The section below is copied from the revised text. It shows the different patterns of gene conversions between mice and apes, in agreement with the results of Tables 4 and 5. In essence, it shows that the GC ratio in apes is shifting to a new equilibrium, which is equivalent to a new adaptive peak. Selection is driving the rDNA genes to move to the new adaptive peak.

      Revision - “Thus, the much accelerated evolution of rRNA genes between humans and chimpanzees cannot be entirely attributed to genetic drift. In the next and last section, we will test if selection is operating on rRNA genes by examining the pattern of gene conversion. 

      3) Positive selection for rRNA mutations in apes, but not in mice – Evidence from gene conversion patterns

      For gene conversion, we examine the patterns of AT-to-GC vs. GC-to-AT changes. While it has been reported that gene conversion would favor AT-to-GC over GC-to-AT conversion (Jeffreys and Neumann 2002; Meunier and Duret 2004) at the site level, we are interested at the gene level by summing up all conversions across sites. We designate the proportion of AT-to-GC conversion as f and the reciprocal, GC-to-AT, as g. Both f and g represent the proportion of fixed mutations between species (see Methods). So defined, f and g are influenced by the molecular mechanisms as well as natural selection. The latter may favor a higher or lower GC ratio at the genic level between species. As the selective pressure is distributed over the length of the gene, each site may experience rather weak pressure.

      Let p be the proportion of AT sites and q be the proportion of GC sites in the gene. The flux of AT-to-GC would be pf and the flux in reverse, GC-to-AT, would be qg. At equilibrium, pf = qg. Given f and g, the ratio of p and q would eventually reach p/q = g/f. We now determine if the fluxes are in equilibrium (pf = qg). If they are not, the genic GC ratio is likely under selection and is moving to a different equilibrium.

      In these genic analyses, we first analyze the human lineage (Brown and Jiricny 1989; Galtier and Duret 2007). Using chimpanzees and gorillas as the outgroups, we identified the derived variants that became nearly fixed in humans with frequency > 0.8 (Table 6). The chi-square test shows that the GC variants had a significantly higher fixation probability compared to AT. In addition, this pattern is also found in chimpanzees (p < 0.001). In M. m. domesticus (Table 6), the chi-square test reveals no difference in the fixation probability between GC and AT (p = 0.957). Further details can be found in Supplementary Figure 2. Overall, a higher fixation probability of the GC variants is found in human and chimpanzee, whereas this bias is not observed in mice.

      Tables 6-7 here

      Based on Table 6, we could calculate the value of p, q, f and g (see Table 7). Shown in the last row of Table 7, the (pf)/(qg) ratio is much larger than 1 in both the human and chimpanzee lineages. Notably, the ratio in mouse is not significantly different from 1. Combining Tables 4 and 7, we conclude that the slight acceleration of fixation in mice can be accounted for by genetic drift, due to gene conversion among rRNA gene copies. In contrast, the different fluxes corroborate the interpretations of Table 5 that selection is operating in both humans and chimpanzees.”
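
      The flux argument in the quoted revision can be illustrated with a minimal sketch. It assumes p + q = 1 (every site is either AT or GC), so that pf = qg implies an equilibrium GC proportion of q = f / (f + g). The function names and the numerical values below are hypothetical and are not taken from Tables 6-7.

```python
def equilibrium_gc_fraction(f: float, g: float) -> float:
    """Equilibrium GC proportion q when the AT-to-GC flux p*f equals the GC-to-AT flux q*g.

    Assumes p + q = 1, so p*f = q*g gives p/q = g/f and q = f / (f + g).
    """
    if f < 0 or g < 0 or f + g == 0:
        raise ValueError("f and g must be non-negative and not both zero")
    return f / (f + g)


def flux_ratio(p: float, f: float, g: float) -> float:
    """Return (p*f)/(q*g) with q = 1 - p; a value near 1 means the fluxes balance."""
    q = 1.0 - p
    return (p * f) / (q * g)


# Illustrative numbers only (hypothetical, not from the manuscript).
f, g = 0.03, 0.01
q_eq = equilibrium_gc_fraction(f, g)
print("equilibrium GC proportion:", q_eq)
print("flux ratio at equilibrium:", flux_ratio(1.0 - q_eq, f, g))
# A gene currently at 50% GC with the same f and g has pf > qg,
# i.e. its GC proportion is still moving toward the equilibrium.
print("flux ratio at p = 0.5:", flux_ratio(0.5, f, g))
```

      A ratio well above 1, as reported for the human and chimpanzee lineages, corresponds to the last case: the current composition has not yet reached the equilibrium implied by f and g.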

      References

      Arnheim N, Treco D, Taylor B, Eicher EM. 1982. Distribution of ribosomal gene length variants among mouse chromosomes. Proc Natl Acad Sci U S A 79:4677-4680.

      Brown T, Jiricny J. 1989. Repair of base-base mismatches in simian and human cells. Genome / National Research Council Canada = Génome / Conseil national de recherches Canada 31:578-583.

      Cannings C. 1974. The latent roots of certain Markov chains arising in genetics: A new approach, I. Haploid models. Advances in Applied Probability 6:260-290.

      Chen Y, Tong D, Wu CI. 2017. A New Formulation of Random Genetic Drift and Its Application to the Evolution of Cell Populations. Mol Biol Evol 34:2057-2064.

      Chia AB, Watterson GA. 1969. Demographic effects on the rate of genetic evolution I. constant size populations with two genotypes. Journal of Applied Probability 6:231-248.

      Crow JF, Kimura M. 2009. An Introduction to Population Genetics Theory: Blackburn Press.

      Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, et al. 2021. Twelve years of SAMtools and BCFtools. Gigascience 10.

      Datson NA, Morsink MC, Atanasova S, Armstrong VW, Zischler H, Schlumbohm C, Dutilh BE, Huynen MA, Waegele B, Ruepp A, et al. 2007. Development of the first marmoset-specific DNA microarray (EUMAMA): a new genetic tool for large-scale expression profiling in a non-human primate. Bmc Genomics 8:190.

      Der R, Epstein CL, Plotkin JB. 2011. Generalized population models and the nature of genetic drift. Theoretical Population Biology 80:80-99.

      Dover G. 1982. Molecular drive: a cohesive mode of species evolution. Nature 299:111-117.

      Eickbush TH, Eickbush DG. 2007. Finely orchestrated movements: evolution of the ribosomal RNA genes. Genetics 175:477-485.

      Galtier N, Duret L. 2007. Adaptation or biased gene conversion? Extending the null hypothesis of molecular evolution. Trends in Genetics 23:273-277.

      Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, Remington KA, Strausberg RL, Venter JC, Wilson RK, et al. 2007. Evolutionary and Biomedical Insights from the Rhesus Macaque Genome. Science 316:222-234.

      Guarracino A, Buonaiuto S, de Lima LG, Potapova T, Rhie A, Koren S, Rubinstein B, Fischer C, Abel HJ, Antonacci-Fulton LL, et al. 2023. Recombination between heterologous human acrocentric chromosomes. Nature 617:335-343.

      Hartl DL, Clark AG, Clark AG. 1997. Principles of population genetics: Sinauer associates Sunderland.

      Hori Y, Shimamoto A, Kobayashi T. 2021. The human ribosomal DNA array is composed of highly homogenized tandem clusters. Genome Res 31:1971-1982.

      Jeffreys AJ, Neumann R. 2002. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat Genet 31:267-271.

      Karlin S, McGregor J. 1964. Direct Product Branching Processes and Related Markov Chains. Proceedings of the National Academy of Sciences 51:598-602.

      Keane TM, Goodstadt L, Danecek P, White MA, Wong K, Yalcin B, Heger A, Agam A, Slater G, Goodson M, et al. 2011. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature 477:289-294.

      Krystal M, D'Eustachio P, Ruddle FH, Arnheim N. 1981. Human nucleolus organizers on nonhomologous chromosomes can share the same ribosomal gene variants. Proceedings of the National Academy of Sciences of the United States of America 78:5744-5748.

      Meunier J, Duret L. 2004. Recombination drives the evolution of GC-content in the human genome. Molecular Biology and Evolution 21:984-990.

      Nagylaki T. 1983. Evolution of a large population under gene conversion. Proc Natl Acad Sci U S A 80:5941-5945.

      Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, et al. 2022. The complete sequence of a human genome. Science 376:44-53.

      Ohta T. 1985. A model of duplicative transposition and gene conversion for repetitive DNA families. Genetics 110:513-524.

      Ohta T. 1976. Simple model for treating evolution of multigene families. Nature 263:74-76.

      Ohta T, Dover GA. 1984. The Cohesive Population Genetics of Molecular Drive. Genetics 108:501-521.

      Ohta T, Dover GA. 1983. Population genetics of multigene families that are dispersed into two or more chromosomes. Proc Natl Acad Sci U S A 80:4079-4083.

      Ruan Y, Wang X, Hou M, Diao W, Xu S, Wen H, Wu C-I. 2024. Resolving Paradoxes in Molecular Evolution: The Integrated WF-Haldane (WFH) Model of Genetic Drift. bioRxiv:2024.2002.2019.581083.

      Smirnov E, Chmúrčiaková N, Liška F, Bažantová P, Cmarko D. 2021. Variability of Human rDNA. Cells 10.

      Smith GP. 1976. Evolution of Repeated DNA Sequences by Unequal Crossover. Science 191:528-535.

      Smith GP. 1974a. Unequal crossover and the evolution of multigene families. Cold Spring Harbor symposia on quantitative biology 38:507-513.

      Smith GP. 1974b. Unequal Crossover and the Evolution of Multigene Families.  38:507-513.

      Stults DM, Killen MW, Pierce HH, Pierce AJ. 2008. Genomic architecture and inheritance of human ribosomal RNA gene clusters. Genome Res 18:13-18.

      van Sluis M, Gailín M, McCarter JGW, Mangan H, Grob A, McStay B. 2019. Human NORs, comprising rDNA arrays and functionally conserved distal elements, are located within dynamic chromosomal regions. Genes Dev 33:1688-1701.

      Wall JD, Frisse LA, Hudson RR, Di Rienzo A. 2003. Comparative linkage-disequilibrium analysis of the beta-globin hotspot in primates. Am J Hum Genet 73:1330-1340.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Summary:

      The authors set out to measure the diffusion of small drug molecules inside live cells. To do this, they selected a range of fluorescent drugs, as well as some commonly used dyes, and used FRAP to quantify their diffusion. The authors find that drugs diffuse and localize within the cell in a way that is weakly correlated with their charge, with positively charged molecules displaying dramatically slower diffusion and a high degree of subcellular localization. The study is important because it points at an important issue related to the way drugs behave inside cells beyond the simple "IC50" metric (a decidedly mesoscopic/systemic value). The authors conclude, and I agree, that their results point to nuanced effects that are governed by drug chemistry that could be optimized to make them more effective. 

      We are grateful to the reviewer for summarizing the work and appreciate him/her pointing out that it is high time to consider the drug aggregation and high degree of subcellular localization while optimizing to make them more effective beyond the mesoscopic value like "IC50".

      Strengths: 

      The work examines an understudied aspect of drug delivery. 

      The work uses well-established methodologies to measure diffusion in cells 

      The work provides an extensive dataset, covering a range of chemistries that are common in small molecule drug design 

      The authors consider several explanations as to the origin of changes in cellular diffusion

      We are grateful to the reviewer for pointing out the strengths of the manuscript.

      Weaknesses: 

      The results are described qualitatively, despite quantitative data that can be used to infer the strength of the proposed correlations. 

      The statistical treatment of the data is not rigorous and not visualized according to best practices, making it difficult for readers to assess the significance of the findings. 

      Some important aspects of drug behavior are not discussed quantitatively, such as the cell-to-cell or subcellular variability in concentration. 

      It is unclear if the observed behavior of each drug in the cell actually relates to its efficacy - though this is clearly beyond the scope of this specific work.

      We have addressed the weaknesses found by the reviewer (see below in Reviewer #1 Recommendations For The Authors). Concerning the last point, it would indeed have been very valuable to find a relation between the drugs' observable behavior and their efficacy, but as the reviewer indicates, it is beyond the scope of this work.

      Reviewer #2 (Public Review): 

      Summary:

      Blocking a weak base compound's protonation increased intracellular diffusion and fractional recovery in the cytoplasm, which may improve the intracellular availability and distribution of weakly basic, small molecule drugs and be impactful in future drug development. 

      We are thankful to the reviewer for summarizing our work and acknowledging that the points raised above can be impactful in future drug development.

      Strengths: 

      (1) The intracellular distribution of drugs and the chemical properties that drive their distribution are much needed in the literature. Thus, the idea behind this paper is of relevance. 

      (2) The study used common compounds that were relevant to others. 

      (3) Altering a compound's pKa value and measuring cytosolic diffusion rates certainly is insightful on how weak base drugs and their relatively high pKa values affect distribution and pharmacokinetics. This particular experiment demonstrated relevance to drug targeting and drug development. 

      (4) The manuscript was fairly well written. 

      We are thankful to the reviewer for pointing out the strengths of the manuscript like the intracellular distribution of drugs and properties that drive it, which are missing in the literature.

      Weaknesses: 

      (1) Small sample sizes. 2 acids and 1 neutral compound vs 6 weak bases (Figure 1). 

      We fully agree with the reviewer on this point. However, the major limitation we faced is the small number of drugs or drug-like molecules that fluoresce with sufficiently high quantum yields. For this study, we initially screened 1600 drugs for fluorescence in the visible spectrum and for penetration into cells, which yielded 16 drugs. Of those, only a small number were suitable for FRAP, because the others had low quantum yields. For some of the molecules (Mitoxantrone, Primaquine), recovery was minimal, making them challenging to study. We added this information to the Materials and Methods section under “Selection of drugs used in this study” (p. 10).

      (2) A comparison between the percentage of neutral and weak base drug accumulation in lysosomes would have helped indicate weak base ion trapping. Such a comparison would have strengthened this study. 

      For weakly basic compounds, the ionic and non-ionic forms of the molecule always remain in equilibrium. The direction of the equilibrium depends on the pH of the medium, which determines the major form of the drug in solution. Our example of the GSK3 inhibitor (a neutral compound, pKa ~7.0, as predicted by Chemaxon) shows behaviour very similar to that of the other basic drugs (pKa > 8) inside cells. As lysosomal pH is about 5.0, the neutral drug also becomes protonated inside the lysosomes, as the colocalization study reveals (Figure 4). We added Fig. S16C-D, which shows colocalization of three drugs within the lysosomes: all three weak base drugs colocalize with acidic lysosomes, from moderately to extensively. See also p. 11 under the “Confocal microscopy and FRAP Analysis” section.
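
      The pH dependence described in this response follows the Henderson-Hasselbalch relation. A minimal sketch, assuming a monoprotic weak base and a cytosolic pH of 7.2 (an assumed typical value that is not stated in the response), gives the protonated fraction at each pH:

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a monoprotic weak base present in its protonated (cationic) form.

    From Henderson-Hasselbalch: [BH+]/([B] + [BH+]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


LYSOSOME_PH = 5.0  # "about 5.0", as stated in the response
CYTOSOL_PH = 7.2   # assumption for illustration; not given in the response

# pKa 7.0 is the predicted value quoted for the GSK3 inhibitor;
# pKa 8.0 stands in for the "pKa > 8" basic drugs (lower bound only).
for pka in (7.0, 8.0):
    in_cytosol = protonated_fraction(pka, CYTOSOL_PH)
    in_lysosome = protonated_fraction(pka, LYSOSOME_PH)
    print(f"pKa {pka}: protonated in cytosol {in_cytosol:.2f}, in lysosome {in_lysosome:.2f}")
```

      Under these assumptions a compound with pKa near 7.0 is mostly uncharged in the cytosol yet almost fully protonated at lysosomal pH, which is consistent with the response's point that the "neutral" compound can still be trapped in lysosomes.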

      (3) When cytosolic diffusion rates of compounds were measured, were the lysosomes extracted from the image using Imaris to determine a realistic cytosolic value? In real-time, lysosomes move through the cytosol at different rates. Because weak base drugs get trapped, it is likely the movement of a weak base in the lysosome being measured rather than the movement of a weak base itself throughout the cytosol. This was unclear in the methods. Please explain.

      We want to thank the reviewer for pointing this out. To clarify the point, we added the following text to the Materials and Methods section on p. 13: “When the areas of bleach were selected in the drug-treated cell cytoplasm, we avoided the lysosomes as much as possible, within the resolution limits of the confocal microscope. Lysosomes themselves were measured to move within the cytoplasm with a diffusion coefficient of 0.03-0.071 µm² s⁻¹ (Bandyopadhyay et al., 2014), which is much slower than the diffusion measured for even the slowest compounds using fast Line FRAP, further validating that we did not measure lysosome diffusion.” In addition, we show that in cells after Bafilomycin A1 or Na-Azide treatments the number of lysosomes was reduced drastically (Figures S8 & S9, and Figure 7), while the rates of diffusion remained very slow, similar to those measured without lysosomal inhibitors.
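
      To put the quoted rates on a common footing, the sketch below converts each value into the time needed for the mean squared displacement to reach 1 µm², using the free-diffusion relation ⟨r²⟩ = 2·d·D·t with d = 2 dimensions. This is only an order-of-magnitude illustration: it treats Dconfocal as if it were an absolute diffusion coefficient, which is an assumption, and the function name is hypothetical.

```python
def time_to_diffuse(distance_um: float, d_um2_per_s: float, dims: int = 2) -> float:
    """Time (s) for the mean squared displacement to reach distance_um**2.

    Uses <r^2> = 2 * dims * D * t for free Brownian motion.
    """
    if d_um2_per_s <= 0:
        raise ValueError("diffusion coefficient must be positive")
    return distance_um ** 2 / (2 * dims * d_um2_per_s)


# Values quoted in the responses, in um^2/s.
rates = {
    "lysosome (lower bound)": 0.03,
    "lysosome (upper bound)": 0.071,
    "GSK3 inhibitor + sodium azide": 0.45,
    "GSK3 inhibitor": 0.6,
    "quinacrine + sodium azide": 1.8,
    "quinacrine": 2.4,
}

for name, d in rates.items():
    print(f"{name:32s} D = {d:5.3f}  t(1 um) = {time_to_diffuse(1.0, d):6.2f} s")

# Fold difference between the slowest drug value and the fastest lysosome value.
print(f"slowest drug / fastest lysosome: {0.45 / 0.071:.1f}x")
```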

      (4) Because weak base drugs can be protonated in the cytoplasm, the authors need to elaborate on why they thought that inhibiting lysosome accumulation of weak bases would increase cytosolic diffusion rates. Ion trapping is different than "micrometers per second" in the cytosol. Moreover, treating cells with sodium azide de-acidifies lysosomes and acidifies the cytosol; thus, more protons in the cytosol means more protonation of weak base drugs. The diffusion rates were slowed down in the presence of lysosome inhibition (Figure 7), which is more fitting of the story about blocking protonation increases diffusion rates, but in this case, increasing cytosolic protonation via lysosome de-acidification agents decreases diffusion rates. Please elaborate.

      We thank the reviewer for the comment. We added the following to the results on p. 7 (top): “While we selected bleach spots to be small and located outside of lysosomes, this does not assure that some of the bleached area does not include smaller lysosomes. Therefore, we investigated whether inhibiting lysosomal trapping will eliminate slow diffusion of cationic drugs.” In addition, we added the following to the results on p. 7-8: “Comparative FRAP profiles and diffusion coefficients (Figure 7B-D and 7F-H) were slow, but, in contrast to Bafilomycin, sodium azide treatment did cause a further reduction in rates, from Dconfocal 2.4±0.1 µm² s⁻¹ to 1.8±0.1 µm² s⁻¹ for quinacrine and from 0.6 to 0.45 µm² s⁻¹ for the GSK3 inhibitor (Figure 7C and G). Both Bafilomycin and sodium azide treatments resulted in elimination of drug confinement in the lysosome, and the small difference in diffusion rates may be a result of the de-acidification of the lysosomes by sodium azide, which may increase the protons in the cytosol upon treatment.”

      Reviewer : A discussion of the likely impact: 

      The manuscript certainly adds another dimension to the field of intracellular drug distribution, but the manuscript needs to be strengthened in its current form. Additional experiments need to be included, and there are clarifications in the manuscript that need to be addressed. Once these issues are resolved, then the manuscript, if the conclusions are further strengthened, is much needed and would be insightful to drug development.

      Reviewer #1 (Recommendations For The Authors):

      Major issues: 

      The paper suffers from poor statistical treatment of the data. FRAP recovery curves should be shown for each repeat, overlaid by an average with SDs as errorbars or shaded regions shown. In bar plots, SEMs should be eliminated in favor of StdDevs. All datapoints should be shown for each bar in Figs. 3-8. To show differences in D_confocal appropriate statistical tests should be conducted. In addition it is unclear what an "independent repeat" is. Does this mean 30 separate imaging sessions/drug treatments/etc? Is it 30 cells on the same coverslip? Is it a combination of both? All reported errors, SD or SEM, should have a single significant digit. Guidelines and best practices for representing quantitative imaging data are all described and visualized in detail in Lord et al. JBS 2020. 

      We improved the statistics, added the individual progression curves, and performed the statistical analysis on them as requested. See Figure S2 for individual FRAP curves of fluorescein, GSK3 inhibitor and quinacrine. Statistical analysis of the individual FRAP curves is in Figures 3B, 4B, 5B, 7C and G. For details, see the figure legends and Materials and Methods p. 13 in “Determination of Dconfocal from FRAP results”. Line FRAP was done on cells taken from different plates, treated independently (see text p. 13).
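
      The distinction the reviewer draws between SD and SEM, and the request to report errors with a single significant digit, can be sketched as follows. The helper names and the example values are hypothetical and are not measurements from the manuscript.

```python
import math
import statistics


def summarize(values):
    """Return (mean, sample SD, SEM) for a list of repeated measurements."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)       # sample standard deviation (n - 1)
    sem = sd / math.sqrt(len(values))   # SEM shrinks with n; SD does not
    return mean, sd, sem


def round_to_one_sig(x: float) -> float:
    """Round a reported error to a single significant digit."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, -exponent)


# Hypothetical repeated measurements, for illustration only.
example = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, sd, sem = summarize(example)
print(f"mean = {mean:.1f}, SD = {round_to_one_sig(sd)}, SEM = {round_to_one_sig(sem)}")
```

      Because the SEM depends on the number of repeats, a bar plot showing SEM can make the same spread of data look tighter simply by pooling more cells, which is why the reviewer asks for SD together with all individual data points.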

      The extensive (and commendable!) dataset the authors have collected can be put to better use than what is currently done. The main text figures in the current form of the preprint are mostly descriptive and their discussion is qualitative, to the point where the author's conclusions are supported only anecdotally. Instead, I would much rather see panels that collate the entire dataset (both protein and drugs) numerically, comparing diffusion values in buffer/cytoplasm/nucleus for all drugs (Like Fig. S6, which is in my opinion the most important in the paper but for some reason relegated to the SI). In addition I would like to see correlations within the dataset, such as D_confocal vs. pKa, vs. concentration (as measured by overall fluorescence signal, see my comment below), vs. mw, or vs. specific chemical moieties (number of charges, aromatic rings, etc). Such correlations should be discussed in terms of a correlation coefficient if conclusions were to be drawn from them, and include errors if available. 

      We want to thank the reviewer for these suggestions. We have now made new Figures 9 and S16 to compare multiple parameters. Figure 9C shows a clear relation between pKa and Dconfocal, but no relation was found between logP, MW or number of aromatic rings and Dconfocal. Fig. S3 also shows the relation between drug concentration and Dconfocal values. These data are now discussed in the discussion section on p. 9 (bottom). 

      The drug sequestration hypothesis and other conclusions brought forth by the authors could be further tested by looking at the concentration dependence of the drugs inside each cell and/or its partitioning between different subcellular compartments. The concentration dependence of these drugs is discussed in a very anecdotal fashion using two concentrations - and despite some cases showing an effect no further studies were done. Drug concentrations in this experiment can vary between cells between repeats or even within a single repeat as a result of drug chemistry and delivery methods (microinjection/passive permeability). This is especially important since it is unclear what clinically-relevant concentrations are for each drug (or at least an IC50 for the cell types tested here). I would like to see a quantitative measure of concentrations as another metric to compare diffusion behavior (see my comment above as well). 

      And maybe one thing to consider in addition would be some discussion in the paper about what sub-cellular distributions might actually mean in the context of drug efficacy (asking for myself as well!) - a paragraph describing recent works on the topic with some references could be instructive. 

      We want to thank the reviewer for the suggestion. We have now added Figure S3, showing the relation between fluorescence intensity in each cell (which is directly related to the concentration of the compound) and FRAP rates and percent recovery for fluorescein, GSK3 inhibitor and Quinacrine. The results show no relation between drug concentration and FRAP rates, and some relation with percent recovery. These data are now discussed in the main text (p. 4 bottom and p. 6) and in the discussion (p. 9, bottom).

      Minor issues: 

      Readers could benefit from a schematic showing the line FRAP method. It is difficult to understand from the text.

      We show now in Figure 2 the line-FRAP method, and discuss it in the introduction (p. 3 top).

      Have the authors considered enrichment in the cell membrane? Summed intensity projections or co-labeling with membrane dyes could prove useful to identify if the membrane is enriched in fluorescence.

      The microscopy slides, including the super-resolution image in Figure S15 do not show enrichment of membranes.

      Cell extracts obtained by chemical lysis are problematic because they contain surfactants. This comparison might not be meaningful. 

      The reviewer is correct about surfactants; However, this is only for illustration to show the crowd density of the cell extracts compared to live cells.

      Unclear why "Bleach size" plots are shown. They are not discussed in the main text. 

      We show now a bleach size plot in Figure 2, where we explain the method. We removed them from the other figures.

      Some figure panels have a strange aspect ratio, causing text to look distorted. 

      We corrected the figure distortion in the revised manuscript.

      How are the values of D_confocal in buffer compared with past literature? Should these not all be diffusion limited? BCECF - larger than many of the drugs used here - shows ~ 100 μm^2/s in buffer (Verkman TiBS 2002).

      We discussed this in our previous work (Ref. 13, iScience 2022, Dey et al.). Dconfocal is a relative diffusion rate and should not be confused with single-molecule diffusion coefficients. FRAP cannot measure diffusion faster than 100 μm^2/s in buffer. Moreover, comparing apparent FRAP rates between different fluorophores is not quantitative, because the bleach radius strongly affects the measured rates. The proper quantity to compare is the rate constant normalized by bleach radius^2, i.e., our Dconfocal (Ref. JMB 2021, iScience 2022 by Dey et al.).

      Reviewer #2 (Recommendations For The Authors): 

      Recommendations: 

      (1) Page 3 at the bottom of the Introduction states, "...sodium azide (Hiruma et al., 2007) inhibited accumulation in lysosomes, cellular diffusion...increased only slightly." However, Figure 7C, F shows a sodium azide-induced decrease in the Dconfocal cellular diffusion. Please clarify.

      Thank you for pointing this out; we corrected it in the revised version, including adding statistics.

      (2) Page 6 states, "Quinacrine accumulation in the lysosome was observed also immediately after micro-injection, with aggregation increasing over time. Dconfocal of 4.2±0.2 µm² s⁻¹ was calculated from line-FRAP immediately after micro-injection, slowing to 2.2±0.1 µm² s⁻¹ following 2 hours incubations, with fractional recoveries of 0.63 and 0.57 respectively." If lysosome sequestration does not have an effect on cytosolic diffusion rates as the manuscript concludes, why do the authors think the diffusion rate decreased here within 2 hours? A solid conclusion would strengthen the conclusions of this manuscript rather than passing over it.

      Thank you for pointing this out. We added the following text to page 7: “It is notable that the Dconfocal for Quinacrine remained consistent regardless of Bafilomycin treatment, 2 hours after incubation (Fig. S9D, 2.4±0.1 µm² s⁻¹). However, when measured immediately after injection, the diffusion coefficient was higher at 4.2 µm² s⁻¹ (Fig. S5D). This result does not support the notion that the faster diffusion measured immediately after cellular injection relates to lysosomal aggregation, and would better support self-aggregation, or aggregation with other molecules in the cell, which increases over time. This notion is further supported by the almost complete lack in FRAP observed 24 hours after injection (Fig. S5C).”

      (3) In the Results section, the subheading states, "Inhibition of lysosomal sequestration is only slightly increasing diffusion in cells", but the conclusion for bafilomycin was "...Dconfocal values were not altered by Bafilomycin A1", and the conclusion for sodium azide was "diffusion coefficients (Figure 7B-C and 7E-F) were not much changed for the two drugs and stayed low... similarly to what was observed with Bafilomycin." The clear question is: what is the result? Slightly increased diffusion, decreased diffusion, or no significant effect at all? Please clarify the wording in the manuscript to accurately describe the results.

      Indeed, a small difference is observed between the two treatments. We have now added statistical significance to Fig. 7D and H and to Fig. S8 and S9. In addition, we clarified this point in the text on p. 7-8: “Comparative FRAP profiles and diffusion coefficients (Figure 7B-D and 7F-H) were slow, but conversely to Bafilomycin, sodium azide treatment did cause a further reduction in rates from Dconfocal 2.4±0.1 µm2s-1 to 1.8±0.1 µm2s-1 for quinacrine and from 0.6 to 0.45 µm2s-1 for the GSK3 inhibitor (Figure 7C and G). Both Bafilomycin and sodium azide treatments resulted in elimination of drug confinement in the lysosome, and the small difference in diffusion rates may be a result of the de-acidification of the lysosomes by sodium azide, which may increase the protons in the cytosol upon treatment.”

      (4) In Figure 8B, why was the Dconfocal for AM-fluorescein with or without sodium azide not included here? Besides consistency, the results might demonstrate significance. Please elaborate on the exclusion of this data.

      Fraction recovery after FRAP of AM-fluorescein was very low. Calculating Dconfocal rates with such low fraction recovery is meaningless, because only a small fraction recovered within the measurement time. Therefore, we calculated Dconfocal only when fraction recovery was at least 0.5.
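
      The acceptance rule described in this answer can be sketched as follows. This is a minimal illustration, assuming the common definition of fraction recovery as (plateau − post-bleach) / (pre-bleach − post-bleach); the function names and the intensity values are hypothetical, and only the 0.5 threshold and the quinacrine values (fraction recovery 0.63, Dconfocal 4.2 µm2 s-1) come from the text above.

```python
from typing import Optional

MIN_FRACTION_RECOVERY = 0.5  # threshold stated in the authors' answer


def fraction_recovery(pre_bleach: float, post_bleach: float, plateau: float) -> float:
    """Fraction of the bleached signal that recovers (assumed standard definition)."""
    bleach_depth = pre_bleach - post_bleach
    if bleach_depth <= 0:
        raise ValueError("post-bleach intensity must be below pre-bleach intensity")
    return (plateau - post_bleach) / bleach_depth


def dconfocal_if_reliable(fraction: float, dconfocal: float) -> Optional[float]:
    """Return Dconfocal only when enough of the signal recovered, else None."""
    return dconfocal if fraction >= MIN_FRACTION_RECOVERY else None


# Quinacrine immediately after micro-injection: recovery 0.63, so the rate is kept.
print(dconfocal_if_reliable(0.63, 4.2))
# A hypothetical low-recovery trace is rejected rather than assigned a rate.
print(dconfocal_if_reliable(fraction_recovery(1.0, 0.2, 0.3), 4.2))
```

      Returning None instead of a number keeps low-recovery measurements out of any downstream averages, which matches the stated decision not to report Dconfocal for AM-fluorescein.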

      (5) Throughout the Results section, the ideas and experiments are of relevance, but the suggestions/conclusions at the end of each paragraph of this section seem lightly thought out. For example, as stated on Page 8, "...however, this did not contribute new information to the puzzle." For a chemistry paper, a chemical suggestion strengthens the manuscript. 

      We want to thank the reviewer for these suggestions. We have now made new Figures 9 and S16 to compare multiple parameters. Figure 9C shows a clear relation between pKa and Dconfocal, but no relation was found between logP, MW or number of aromatic rings and Dconfocal. Fig. S16 also shows the relation between drug concentration and Dconfocal values. We revised the discussion section to give more weight to these quantitative assessments. These data are now discussed on p. 9.

      In conclusion, the manuscript's ideas are needed, but the conclusions drawn from the experiments need to be strengthened, more explanatory, and consistent with the main conclusion of the manuscript.

      See answer to point 5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful manuscript shows a set of interesting data including the first cryo-EM structures of human PIEZO1 as well as structures of disease-related mutants in complex with the regulatory subunit MDFIC, which generate different inactivation phenotypes. The molecular basis of PIEZO channel inactivation is of great interest due to its association with several pathologies. This manuscript provides some structural insights that may help to ultimately build a molecular picture of PIEZO channel inactivation. While the structures are of use and clear conformational differences can be seen in the presence of the auxiliary subunit MDFIC, the strength of the evidence supporting the conclusions of the paper, especially the proposed role for pore lipids in inactivation, is incomplete and there is a lack of data to support them.

      We thank the editors and reviewers for taking the time and effort to review our manuscript.  The evidence supporting the key role of pore lipids in hPIEZO1 activation is as follows. i. Compared with wild-type hPIEZO1, the hydrophobic acyl chain tails of the pore lipids retracted from the hydrophobic pore region in slower inactivating mutant hPIEZO1-A1988V (Fig. 7a-b). ii. Previous electrophysiological functional studies revealed that substituting this hydrophobic pore formed by I2447, V2450, and F2454 with a hydrophilic pore prolongs the inactivation time for both PIEZO1 and PIEZO2 channels (PMID: 30628892). iii. In the structure of the HX channelopathy mutant R2456H, the interaction between the hydrophilic phosphate group head of pore lipids and R2456 is disrupted, remodeling the blade and pore module and resulting in a significantly slow-inactivating rate. iv. The interaction between pore lipids and lipidated-MDFIC stabilizes the pore lipids to reseal the pore upon activation of the hPIEZO1-MDFIC complex.

      According to previously proposed models for the role of pore lipids in mechanosensitive ion channels, such as MscS (PMID: 33568813), MS K2P (PMID: 25500157) and OSCA channels (PMID: 37402734), pore lipids seal the channel pore in the closed state and can be removed in the open state by membrane deformation induced by mechanical force, in keeping with the force-from-lipids principle. Therefore, in our putative model, the pore lipids seal the hydrophobic pore of hPIEZO1 in the closed state. Upon activation of hPIEZO1, the pore lipids retract from the hydrophobic pore and interact with multi-lipidated MDFIC, which stabilizes the inactivated state. In the mild channelopathy mutants, the pore lipids retract from the hydrophobic pore, making the channel harder to close upon activation. In the severe channelopathy mutant, the interaction between the pore lipids and R2456 is disrupted, resulting in loss of the pore lipids and markedly slower inactivation. We fully understand the concern about the role of pore lipids in our proposed model and have therefore toned down our putative model.

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      This manuscript by Shan, Guo, Zhang, Chen et al., shows a raft of interesting data including the first cryo-EM structures of human PIEZO1. Clearly, the molecular basis of PIEZO channel inactivation is of great interest and as such this manuscript provides some valuable extra information that may help to ultimately build a molecular picture of PIEZO channel inactivation. However, the current manuscript does not provide any compelling evidence for a detailed mechanism of PIEZO inactivation.

      Strengths:

      This manuscript documents the first cryo-EM structures of human PIEZO1 and the gain of function mutants associated with hereditary anaemia. It is also the first evidence showing that PIEZO1 gain of function mutants are also regulated by the auxiliary subunit MDFIC.

      We thank reviewer #1 for the encouragement.

      Weaknesses:

      While the structures are interesting and clear differences can be seen in the presence of the auxiliary subunit MDFIC, the major conclusions and central tenets of the paper, especially a role for pore lipids in inactivation, lack data to support them. The post-translational modification of the PIEZO1 auxiliary subunit MDFIC is not modelled as a covalent interaction.

      We fully understand the concern about the role of pore lipids in our proposed model. Therefore, we have toned down our putative model.

      The lipid densities of the post-translational modification of the PIEZO1 auxiliary subunit MDFIC are shown below. Because these densities are not well resolved, we represent them with single-chain lipids only. The lipidation of MDFIC was demonstrated in the MDFIC identification paper.

      Author response image 1.

      Reviewer #2 (Public review):

      Summary:

      Mechanically activated ion channels PIEZOs have been widely studied for their role in mechanosensory processes like touch sensation and red blood cell volume regulation. PIEZO in vivo roles are further exemplified by the presence of gain-of-function (GOF) or loss-of-function (LOF) mutations in humans that lead to disease pathologies. Hereditary xerocytosis (HX) is one such disease, caused by GOF mutations in human PIEZO1 that are characterized by slow inactivation kinetics (inactivation being the ability of a channel to close in the presence of a stimulus). But how these mutations alter PIEZO1 inactivation or even the underlying mechanisms of channel inactivation remains unknown. Recently, MDFIC (myoblast determination family inhibitor proteins) was shown to directly interact with mouse PIEZO1 as an auxiliary subunit to prolong inactivation and alter gating kinetics. Furthermore, while lipids are known to play a role in the inactivation and gating of other mechanosensitive channels, whether this mechanism is conserved in PIEZO1 is unknown. Thus, the structural basis for PIEZO1 inactivation mechanism, and whether lipids play a role in these mechanisms represent important outstanding questions in the field and have strong implications for human health and disease.

      To get at these questions, Shan et al. use cryogenic electron microscopy (Cryo-EM) to investigate the molecular basis underlying differences in inactivation and gating kinetics of PIEZO1 and human disease-causing PIEZO1 mutations. Notably, the authors provide the first structure of human PIEZO1 (hPIEZO1), which will facilitate future studies in the field. They reveal that hPIEZO1 has a more flattened shape than mouse PIEZO1 (mPIEZO1) and has lipids that insert into the hydrophobic pore region. To understand how PIEZO1 GOF mutations might affect this structure and the underlying mechanistic changes, they solve structures of hPIEZO1 as well as two HX-causing mild GOF mutations (A1988V and E756del) and a severe GOF mutation (R2456H). Unable to glean too much information due to poor resolution of the mutant channels, the authors also attempt to resolve MDFIC-bound structures of the mutants. These structures show that MDFIC inserts into the pore region of hPIEZO1, similar to its interaction with mPIEZO1, and results in a more curved and contracted state than hPIEZO1 on its own. The authors use these structures to hypothesize that differences in curvature and pore lipid position underlie the differences in inactivation kinetics between wild-type hPIEZO1, hPIEZO1 GOF mutations, and hPIEZO1 in complex with MDFIC.

      Strengths:

      This is the first human PIEZO1 structure. Thus, these studies become the stepping stone for future investigations to better understand how disease-causing mutations affect channel gating kinetics.

      We thank reviewer #2 for the positive comments.

      Weaknesses:

      Many of the hypotheses made in this manuscript are not substantiated with data and are extrapolated from mid-resolution structures.

      We fully understand the concern about the role of pore lipids in our proposed model. Therefore, we have toned down our putative model.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors used structural biology approaches to determine the molecular mechanism underlying the inactivation of the PIEZO1 ion channel. To this end, the authors presented structures of human PIEZO1 and its slow-inactivating mutants. The authors also determined the structures of these PIEZO1 constructs in complexes with the auxiliary subunit MDFIC, which substantially slows down PIEZO1 inactivation. From these structures, the authors suggested an anti-correlation between the inactivation kinetics and the resting curvature of PIEZO1 in detergent. The authors also observed a unique feature of human PIEZO1 in which the lipid molecules plugged the channel pore. The authors proposed that these lipid molecules could stabilize human PIEZO1 in a prolonged inactivated state.

      We thank reviewer #3 for the summary.

      Strengths:

      Notably, this manuscript reported the first structures of a human PIEZO1 channel, its channelopathy mutants, and their complexes with MDFIC. The evidence that lipid molecules could occupy the channel pore of human PIEZO1 is solid. The authors' proposals to correlate PIEZO1 resting curvature and pore-resident lipid molecules with the inactivation kinetics are novel and interesting.

      Thanks for the positive comments.

      Weaknesses:

      However, in my opinion, additional evidence is needed to support the authors' proposals.

      (1) The authors determined the apo structure of human PIEZO1, which showed a more flattened architecture than that of the mouse PIEZO1. Functionally, the inactivation kinetics of human PIEZO1 is faster than its mouse counterpart. From this observation (and some subsequent observations such as the complex with MDFIC), the authors proposed the anti-correlation between curvature and inactivation kinetics. However, the comparison between human and mouse PIEZO1 structure might not be justified. For example, the human and mouse structures were determined in different detergent environments, and the choice of detergent could influence the resting curvature of the PIEZO structures.

      We apologize for the misleading statement about the anti-correlation between curvature and inactivation kinetics of PIEZOs. Based on the structural studies and electrophysiological assays, we cannot conclude that the curvature difference between mPIEZO1 and hPIEZO1 is related to their inactivation kinetics. What we intend to report is the structural difference between mPIEZO1 and hPIEZO1. To avoid this confusion, we have revised the manuscript.

      Regarding the concern about detergent, we cannot fully exclude its influence on the curvature of PIEZOs. However, previously reported structures of mPiezo1 (PDB: 7WLT, 5Z10, 6B3R) were determined in different detergent environments or in a lipid bilayer, yet the curvature of mPiezo1 is similar in all of them, as shown below. Considering the high sequence similarity between mPiezo1 and hPiezo1, we hypothesize that the curvature of both hPiezo1 and mPiezo1 may be unaffected by the detergent.

      Author response image 2.

      Overall structural comparison of curved mPIEZO1 in the lipid bilayer (PDB: 7WLT), mPiezo1 in CHAPS (PDB: 6B3R) and mPiezo1 in Digitonin (PDB: 5Z10).

      (2) Related to point 1), the 3.7 Å structure of the A1988V mutant presented by the authors showed a similar curvature as the WT but has a slower inactivating kinetics.

      Based on the structural comparison between hPIEZO1 and its A1988V mutant, the retraction of pore lipids from the hydrophobic center pore in hPIEZO1-A1988V is mainly responsible for its slower inactivating kinetics.

      (3) Related to point 1), the authors stated that human PIEZO1 might not share the same mechanism as mouse PIEZO1 due to its unique properties. For example, MDFIC only modifies the curvature of human PIEZO1, and lipid molecules were only observed in the pore of the human PIEZO1. Therefore, it may not be justified to draw any conclusions by comparing the structures of PIEZO1 from humans and mice.

      Thanks for the constructive suggestion. To avoid this confusion, we have revised the manuscript.

      (4) Related to point 1), it is well established that PIEZO1 opening is associated with a flattened structure. If the authors' proposal were true, in which a more flattened structure led to faster inactivation, we would have the following prediction: more opening is associated with faster inactivation. In this case, we would expect a pressure-dependent increase in the inactivation kinetics.

      Could the authors provide such evidence, or provide other evidence along this direction?

      We appreciate the reviewer’s comment. We are not claiming a relationship between the flattened structure and activation/inactivation. We only present the results of the structure of wild-type/mutant PIEZO1.

      (5) In Figure S2, the authors showed representative experiments of the inactivation kinetics of PIEZO1 using whole-cell poking. However, poking experiments have high cell-to-cell variability.

      The authors should also show statistics of experiments obtained from multiple cells.

      We have shown the statistics of representative electrophysiology experiments obtained from multiple cells in Figure S2.

      (6) In Figure 2 and Figure 5, when the authors show the pore diameter, it could be helpful to also show the side chain densities of the pore lining residues.

      We appreciate the reviewer’s suggestion. The side chains of the pore-lining restriction residues are shown in Figure 2 and Figure 5, and the densities of the pore domain are shown in Figures S4 and S14. Interestingly, the pore-lining restriction residues in mPIEZO1 and hPIEZO1 are highly conserved.

      (7) The authors observed pore-plugging lipids in slow inactivating conditions such as channelopathy mutations or in complex with MDFIC. The authors propose that these lipid molecules stabilize a "deep resting state" of PIEZO1, making it harder to open and harder to inactivate once opened. This will lead to the prediction that the slow-inactivating conditions will lead to a higher activation threshold, such as the mid-point pressure in the activation curve. Is this true?

      Yes, it is true. In Figure S2, the MDFIC-induced slow-inactivation conditions in hPIEZO1-MDFIC, hPIEZO1-A1988V-MDFIC, hPIEZO1-E756del-MDFIC and hPIEZO1-R2456H-MDFIC result in larger half-activation thresholds than hPIEZO1, hPIEZO1-A1988V, hPIEZO1-E756del and hPIEZO1-R2456H, respectively.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I document the major issues below:

      (1) Mouse vs Human inactivation

      Line 21- "than the slower inactivating curved mouse PIEZO1 (mPIEZO1)."

      Where is the data in this paper or any other paper that human PIEZO1 inactivates faster than mouse PIEZO1? This is central to the way the authors present the paper. In fact, the tau quoted for the hPIEZO1 of ~10 ms is similar to that often measured for mPIEZO1. The reference in the discussion for mouse vs human inactivation times is a review of mechanotransduction. Either the authors need to directly compare the tau of mP1 vs hP1 or quote the relevant primary literature if it exists.

      As measured in HEK-P1KO cells transfected with mPiezo1, the inactivation time of mPiezo1 is 13 ± 1 ms (PMID: 29261642) at -80 mV.

      The tau is also voltage-dependent. For mPIEZO1, the tau exceeds 20 ms at -60 mV (PMID: 20813920), whereas for hPIEZO1 it is still around 10 ms.

      (2) MDFIC-lipidation

      Without seeing the PDB or EMDB I can't guarantee this, but from Figure 6d it seems like the S-acylation in the distal C-terminus of MDFIC is not modelled as a covalent interaction. These lipids are covalently added to the Cys residues in S-acylation via zDHHC enzymes. This should be modelled correctly.

      Thanks for this suggestion. Because the lipid densities of the post-translational modification of the PIEZO auxiliary subunit MDFIC are not well resolved, we represent them with single-chain lipids only. The lipidation of MDFIC was demonstrated in the MDFIC identification paper (PMID: 37590348).

      (3) Pore lipids and inactivation

      The lipids close to the pore are interesting and the density for a lipid is also seen in the mouse MDFIC-PIEZO1 complex from Zhou, Ma et al, 2023. However, there is no data provided by the authors that the lipid is functionally relevant to anything. There is not even a correlation with inactivation in Figure 7. P1+MDFIC inactivates slowest yet the lipids are present within the pore. Second, there is no evidence for what these structures are: closed, or inactivated? In fact, the Xiao lab is now interpreting the 7WLU structure as inactivated.

      The evidence supporting the key role of pore lipids in hPIEZO1 activation is as follows. i. Compared with wild-type hPIEZO1, the hydrophobic acyl chain tails of the pore lipids retracted from the hydrophobic pore region in slower inactivating mutant hPIEZO1-A1988V (Fig. 7a-b). ii. Previous electrophysiological functional studies revealed that substituting this hydrophobic pore formed by I2447, V2450, and F2454 with a hydrophilic pore prolongs the inactivation time for both PIEZO1 and PIEZO2 channels (PMID: 30628892). iii. In the structure of the HX channelopathy mutant R2456H, the interaction between the hydrophilic phosphate group head of pore lipids and R2456 is disrupted, remodeling the blade and pore module and resulting in a significantly slow-inactivating rate. iv. The interaction between pore lipids and lipidated-MDFIC stabilizes the pore lipids to reseal the pore upon activation of the hPIEZO1-MDFIC complex. Overall, the pore lipid is involved in inactivation, and we have toned down the statement.

      (4) Cytosolic plug

      There is additional cytosolic density for the human PIEZO1 that the authors intimate could be from a different binding partner. Is it possible to refine this density? Is it from the PIEZO1-tag? At the very least a little more information about this density should be given if it is going to be mentioned like this.

      Our purification result shows that the protein is tag-free. We are also curious about the extra cytosolic density, but we do not know what it is.

      (5) Reduced sensitivity of PIEZO1 in the presence of MDFIC and its regulatory mechanism

      This was reported in the first article however no data is presented by the authors to support MDFIC increasing the mechanical energy required to open PIEZO1. The sentence in the discussion; "MDFIC enables hPIEZO1 to respond to different forces by modifying the pore module through lipid interactions." is not supported by any functional data and seems to be an over-interpretation of the structures.

      We appreciate this suggestion. The half-activation thresholds of hPIEZO1 and hPIEZO1-MDFIC were measured to be 7 μm and 9 μm, respectively (Fig. S2). In addition, the amplitude of the mechanically activated current of hPIEZO1-MDFIC is extremely small compared to that of WT, which reaches the nA level (Fig. S2). Therefore, the less mechanosensitive hPIEZO1-MDFIC may require more mechanical energy to open than PIEZO1 WT.

      (6) Both referencing of the PIEZO1 literature and prose could be improved.

      Thanks for the suggestion. We have improved the referencing and prose.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors speculate that the difference in curvature between human and mouse PIEZO1 results in its fast inactivation but do not provide experimental evidence to support this idea. This claim would have been bolstered by showing that the GOF human mutations have a more curved structure, but these proved too structurally unstable to be solved at high resolution. However, the authors state that the 3.7 angstrom map solved for hPIEZO1-A1988V does have an overall similar architecture as wild-type hPIEZO1; thus, contradicting their hypothesis.

      We apologize for the misleading statement. In our revised manuscript, we do not claim a relationship between the flattened structure and activation/inactivation. We only present the results of the structure of wild-type/mutant PIEZO1.

      The structure comparison between the A1988V mutant and WT shows a similar architecture but a different occupancy pattern of pore lipids. Therefore, we suggested that the A1988V mutant has slightly slower inactivation kinetics, mainly due to the exit of pore lipids from the pore.

      (2) The authors show that interaction with MDFIC alters hPIEZO1 structure to be more curved and use this to support their idea that changing the curvature of the protein underlies the prolonged inactivation kinetics. It has been previously shown that MDFIC does not change the structure of mPIEZO1 but does alter its inactivation and gating kinetics. How does this discrepancy fit into the inactivation model proposed by the authors? Similarly, their claim that MDFIC slows hPIEZO1 inactivation and weakens mechanosensitivity just by affecting the pore module and changing blade curvature is made based on observation and no experimental data to test it.

      We have revised the manuscript to avoid implying a misleading relationship between the curvature and the inactivation kinetics of hPIEZO1. It was previously reported that substitution of the hydrophobic pore, formed by I2447, V2450, and F2454, with a hydrophilic pore prolongs the inactivation time for both PIEZO1 and PIEZO2 channels (PMID: 30628892). In addition, in the severe HX channelopathy mutant R2456H, the interaction between the hydrophilic phosphate head group and R2456 is disrupted, which leads to remodeling of the blade and pore module. Indeed, our observations are limited, and further experiments will be performed to support our model.

      (3) How does their model fit in cell types that have PIEZO1 (or GOF mutant PIEZO1) but not MDFIC?

      In cell types that express PIEZO1 or GOF mutant PIEZO1 but not MDFIC, the channel may inactivate faster than when it is bound to MDFIC. This is supported by the observation that overexpressed PIEZOs exhibit faster inactivation kinetics than those in some native cell types with MDFIC expression (PMID: 20813920, 30132757).

      (4) Figure S2 is missing quantification of the electrophysiology data. The authors should show summary data in addition to their representative traces including the Imax for all conditions, tau for data shown in b, and sample size for all conditions, and related statistics. The text claims that MDFIC decreases mechanosensitivity (line 156) but there is no data to support this.

      For the electrophysiological assay in Figure S2, we referred to previously reported mPIEZO1 mutants (PMID: 23487776, 28716860). We confirmed that the slower inactivation phenotypes of these mutations of hPIEZO1 are similar to those of mPIEZO1.

      The half-activation thresholds of hPIEZO1 and hPIEZO1-MDFIC were measured to be 7 μm and 9 μm, respectively. This tendency toward an increased half-activation threshold of hPIEZO1 upon binding MDFIC is also seen in the electrophysiological results for the hPIEZO1 channelopathy mutants.

      (5) In line 144, the authors mention that they were able to validate the MDFIC density with multilipidated cysteines on the C-terminal amphipathic helix, but they do not show the density with fitted lipids. While individual densities for some of the lipids are shown in extended Figure 12, it would be helpful to include a figure where they show the map for MDFIC with fitted lipids in it.

      Thanks for the valuable suggestion. Because the lipid densities of the post-translational modification of the PIEZO auxiliary subunit MDFIC are not well resolved, we represent them with single-chain lipids only. The lipidation of MDFIC was demonstrated in the MDFIC identification paper.

      (6) The authors show that R2456 interacts with a lipid at the pore module and hypothesize that this underlies the fast inactivation of hPIEZO1. While they did not obtain a high-resolution structure of this mutant, this hypothesis could be tested by substituting R for side chains with different charges and performing electrophysiology to determine the effects on inactivation.

      Thanks for the constructive suggestion. We will perform the electrophysiology assay for R2456 mutants with different side chains.

      (7) Figure 4 shows overall structure of hPIEZO1 GOF mutations A1988V and E756del in complex with MDFIC. Other than showing an overall similar structure to wildtype hPIEZO1, the authors do not show how the human mutations A1988V alter the structure of the protein at the site of change. Understanding how these mutations affect the local architecture of the protein has important relevance for human physiology.

      Because the GOF channelopathy mutant hPIEZO1-A1988V is structurally unstable, the density at the A1988V site is too weak to resolve the local interactions in the structure of this mutant.

      Minor comment:

      In general, the manuscript will benefit from heavy copy editing. For example, the word cartoon is misspelled in many of the figure legends.

      We apologize for the mistake. The manuscript has been checked and revised.

      Reviewer #3 (Recommendations for the authors):

      Some portions of this manuscript were not well written. For example, at the end of the 3rd paragraph in the introduction, the authors talked about HX mutations and their correlation with malaria infection and plasma iron. This is irrelevant information and will only distract the readers. It would be ideal if the authors could go through the entire manuscript and improve its clarity.

      Thanks for the suggestion. We have revised the sentences about HX mutations as suggested and improved the entire manuscript.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Mehmet Mahsum Kaplan et al. demonstrate that Meis2 expression in neural crest-derived mesenchymal cells is crucial for whisker follicle (WF) development, as WF fails to develop in wnt1-Cre;Meis2 cKO mice. Advanced imaging techniques effectively support the idea that Meis2 is essential for proper WF development and that nerves, while affected in Meis2 cKO, are dispensable for WF development and not the primary cause of WF developmental failure. The study also reveals that although Meis2 significantly downregulates Foxd1 in the mesenchyme, this is not the main reason for WF development failure. The paper presents valuable data on the role of mesenchymal Meis2 in WF development. However, further quantification and analysis of the WF developmental phenotype would be beneficial in strengthening the claim that Meis2 controls early WF development rather than causing a delay or arrest in development. A deeper sequencing data analysis could also help link Meis2 to its downstream targets that directly impact the epithelial compartment.

      Strengths:

      (1) The authors describe a novel molecular mechanism involving Mesenchymal Meis2 expression, which plays a crucial role in early WF development.

      (2) They employ multiple advanced imaging techniques to illustrate their findings beautifully.

      (3) The study clearly shows that nerves are not essential for WF development.

      We thank the reviewer for valuable comments that will help improve our study.

      Weaknesses:

      (1) The authors claim that Meis2 acts very early during development, as evidenced by a significant reduction in EDAR expression, one of the earliest markers of placode development. While EDAR is indeed absent from the lower panel in Figure 3C of the Meis2 cKO, multiple placodes still express EDAR in the upper two panels of the Meis2 cKO. The authors also present subsequent analysis at E13.5, showing one escaped follicle positive for SHH and Sox9 in Figures 1 and 3. Does this suggest that follicles are specified but fail to develop? Alternatively, could there be a delay in follicle formation? The increase in Foxd1 expression between E12.5 and E13.5 might also indicate delayed follicle development, or as the authors suggest, follicles that have escaped the phenotype. The paper would significantly benefit from robust quantification to accompany their visual data, specifically quantifying EDAR, Sox9, and Foxd1 at different developmental stages. Additionally, analyzing later developmental stages could help distinguish between a delay or arrest in WF development and a complete failure to specify placodes.

      The earliest DC (Foxd1) and placodal (EDAR, Lef1) markers tested in this study were observed only in the escaped WFs, whereas these markers were missing at the expected WF sites in mutants. This was also reflected in the loss of typical placodal morphology in the mutant epithelium. On the other hand, escaped WFs developed normally, as shown by the analysis in Supp Fig 1A-B showing their normal size. These data suggest that development of escaped WFs is not delayed, because delayed WFs would appear smaller in size. To strengthen this conclusion, we will analyze whiskers at E18.5 in Meis2 cKO mice by staining Edar, Foxd1, Sox9 and/or Lef1, and the results will be added to the revised manuscript. The two-week period for this provisional response is too short to gather all these data. As far as quantification is concerned, we have already quantified the number of whiskers in controls and mutants at E12.5 and E13.5 in all whole mount experiments we did, i.e. Shh ISH and Sox9 or EDAR whole mount IFC. We pooled all these numbers together and calculated the whisker number reduction to 5.7+/-2.0% at E12.5 and 17.1+/-5.9% at E13.5 (page 3, row 114). We will also quantify the whisker number at E15.5 and E18.5 in the revised manuscript.

      (2) The authors show that single-cell sequencing reveals a reduction in the pre-DC population, reduced proliferation, and changes in cell adhesion and ECM. However, these changes appear to affect most mesenchymal cells, not just pre-DCs. Moreover, since E12.5 already contains WFs at different stages of development, as well as pre-DCs and DCs, it becomes challenging to connect these mesenchymal changes directly to WF development. Did the authors attempt to re-cluster only Cluster 2 to determine if a specific subpopulation is missing in Meis2 cKO? Alternatively, focusing on additional secreted molecules whose expression is disrupted across different clusters in Meis2 cKO could provide insights, especially since mesenchymal-epithelial communication is often mediated through secreted molecules. Did the authors include epithelial cells in the single-cell sequencing, can they look for changes in mesenchyme-epithelial cell interactions (Cell Chat) to indicate a possible mechanism?

      We agree with the reviewer that the effects of Meis2 on cell proliferation and on the expression of cell adhesion and ECM markers are more general, because they take place in the whole underlying mesenchyme. Our genetic tools did not allow specific targeting of DCs or pre-DCs. Nonetheless, we trust that our data show that mesenchymal Meis2 is required for the initial steps of WF development, including Pc formation. As far as bioinformatics data are concerned, this data set was taken from the large dataset GSE262468 covering the whole craniofacial region, which led to very limited cell numbers in cluster 2 (DC): WT_E12_2: 28, WT_E13_2: 131, MUT_E12_2: 19, MUT_E13_2: 28. Unfortunately, such small cell numbers did not allow further sub-clustering, efficient normalization, integration, or conclusions from their transcriptional profiles. Although a number of interesting differentially expressed genes were identified (see supplementary datasets), none of them convincingly pointed to a plausible secreted-molecule candidate.

      We agree with the reviewer that CellChat analysis could provide a robust indication of mesenchymal-epithelial communication. However, our datasets included only the mesenchymal cell population (Wnt1-Cre2 progeny), and epithelial cells were excluded by FACS prior to scRNA-seq (Hudacova et al. https://doi.org/10.1016/j.bone.2024.117297).

      (3) The authors aim to link Meis2 expression in the mesenchyme with epithelial Wnt signaling by analyzing Lef1, bat-gal, Axin1, and Wnt10b expression. However, the changes described in the figures are unclear, and the phenotype appears highly variable, making it difficult to establish a connection between Meis2 and Wnt signaling. For instance, some follicles and pre-condensates are Lef1 positive in Meis2 cKO. Including quantification or providing a clearer explanation could help clarify the relationship between mesenchymal Meis2 and Wnt signaling in both epidermal and mesenchymal cells. Did the authors include epithelial cells in the sequencing? Could they use single-cell analysis to demonstrate changes in Wnt signaling?

      We have now analyzed changes in Lef1 staining intensity in the epithelium and in the upper dermis. According to these quantifications, we observed a considerable decline in the number of Lef1+ placodes in the epithelium, which corresponds to the lower number of placodes. On the other hand, Lef1 intensity in the ‘escaped’ placodes was similar between controls and mutants. The Lef1 signal in the upper dermis is very strong overall, and its quantification did not reveal any changes in the DC and non-DC regions of the upper dermis. These data corroborate our conclusion that Meis2 in the mesenchyme is not crucial for dermal Wnt signaling but is required for induction of Lef1 expression in the epithelium. However, once ‘escaper’ placodes appear, they display normal Wnt signaling in Pc, DC and subsequent development. These quantification data will be added to the revised manuscript.

      (4) Existing literature, including studies on Neurog KO and NGF KO, as well as the references cited by the authors, suggest that nerves are unlikely to mediate WF development. While the authors conduct a thorough analysis of WF development in Neurog KO, further supporting this notion, this point may not be central to the current work. Additionally, the claim that Meis2 influences trigeminal nerve patterning requires further analysis and quantification for validation.

      We agree with the reviewer that analysis of the Neurogenin knockout mice should not be central to this report. Nonetheless, a thorough analysis of WF development in Neurog1 KO was needed to distinguish between two possible mechanisms: the whisker phenotype in Meis2 cKO results either from (1) impaired nerve branching or from (2) the function of Meis2 in the mesenchyme. We will modify the text accordingly to make this clearer to readers. We also agree that nerve branching was not extensively analyzed in the current study, but two samples from mutant mice were provided (Fig1 and Supp Videos), reflecting the consistency of the phenotype (see also Machon et al. 2015). This section was not central to this report either, but it led us to focus fully on the mesenchyme. We think that Meis2 function in cranial nerve development is very interesting and deserves a separate study.

      (5) Meis2 expression seems reduced but has not entirely disappeared from the mesenchyme. Can the authors provide quantification?

      In the revised manuscript, we will provide wt/mut quantification of Meis2 expression in the dermis.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Kaplan et al. study mesenchymal Meis2 in whisker formation and the links between whisker formation and sensory innervation. To this end, they used conditional deletion of Meis2 using the Wnt1 driver. Whisker development was arrested at the placode induction stage in Meis2 conditional knockouts leading to the absence of expression of placodal genes such as Edar, Lef1, and Shh. The authors also show that branching of trigeminal nerves innervating whisker follicles was severely affected but that whiskers did form in the complete absence of trigeminal nerves.

      Strengths:

      The analysis of Meis2 conditional knockouts convincingly shows a lack of whisker formation and all epithelial whisker/hair placode markers were analyzed. Using Neurog1 knockout mice, the authors show equally convincingly that whiskers and teeth develop in the complete absence of trigeminal nerves.

      We thank the reviewer for valuable comments that will help improve our study.

      Weaknesses:

      The manuscript does not provide much mechanistic insight as to why mesenchymal Meis2 leads to the absence of whisker placodes. Using a previously generated scRNA-seq dataset they show that two early markers of dermal condensates, Foxd1 and Sox2, are downregulated in Meis2 mutants. However, given that placodes and dermal condensates do not form in the mutants, this is not surprising and their absence in the mutants does not provide any direct link between Meis2 and Foxd1 or Sox2. (The absence of a structure evidently leads to the absence of its markers.)

      We apologize for the unclear explanation of our data. We meant that Meis2 is functionally upstream of Foxd1 because Foxd1 is reduced upon Meis2 deletion. This means that during WF formation, Meis2 operates before Foxd1 induction; it does not necessarily mean that Meis2 directly controls expression of Foxd1. Yes, we agree with the reviewer’s note that Foxd1 and Sox2, as known DC markers, decline because the number of WFs declines. We wanted to convince readers that Meis2 operates very early in the GRN hierarchy during WF development. We also admit that we provide limited mechanistic insight into Meis2 function as a transcription factor. We think that this weak point does not lower the value of the report, which shows the indispensable role of Meis2 in WFs and possibly all HFs.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review): 

      [...] Strengths: 

      The method the authors propose is a straightforward and inexpensive modification of an established split-pool single-cell RNA-seq protocol that greatly increases its utility, and should be of interest to a wide community working in the field of bacterial single-cell RNA-seq. 

      Weaknesses: 

      The manuscript is written in a very compressed style and many technical details of the evaluations conducted are unclear and processed data has not been made available for evaluation, limiting the ability of the reader to independently judge the merits of the method. 

      Thank you for your thoughtful and constructive review of our manuscript. We appreciate your recognition of the strengths of our work and the potential impact of our modified PETRI-seq protocol on the field of bacterial single-cell RNA-seq. We are grateful for the opportunity to address your concerns and improve the clarity and accessibility of our manuscript.

      We acknowledge your feedback regarding the compressed writing style and lack of technical details, which are constrained by the requirements of the Short Report format in eLife. We will address these issues in our revised manuscript as follows:

      (1) Expanded methodology section: We will provide a more comprehensive description of our experimental procedures, including detailed protocols for the ribosomal depletion step and data analysis pipeline. This will enable readers to better understand and potentially replicate our methods.

      (2) Clarification of technical evaluations: We will elaborate on the specifics of our evaluations, including the criteria used for assessing the efficiency of ribosomal depletion and the methods employed for identifying and characterizing subpopulations within the E. coli biofilm model.

      (3) Data availability: We apologize for the oversight in not making our processed data readily available. We have deposited all relevant datasets, including raw and source data, in appropriate public repositories (GEO number: GSE260458) and provide clear instructions for accessing this data in the revised manuscript.

      (4) Supplementary information: To maintain the concise nature of the main text while providing necessary details, we will include additional supplementary information. This will cover extended methodology, detailed statistical analyses, and comprehensive data tables to support our findings.

      (5) Discussion of limitations: We will include a more thorough discussion of the potential limitations of our modified protocol and areas for future improvement.

      We believe these changes will significantly improve the clarity and reproducibility of our work, allowing readers to better evaluate the merits of our method.

      Reviewer #2 (Public Review): 

      [...] Strengths: 

      The introduced rRNA depletion method is highly efficient, with the depletion for E. coli resulting in over 90% of reads containing mRNA. The method is ready to use with existing PETRI-seq libraries which is a large advantage, given that no other rRNA depletion methods were published for split-pool bacterial scRNA-seq methods. Therefore, the value of the method for the field is high. There is also evidence that a small number of cells at the bottom of a static biofilm express PdeI which is causing the elevated c-di-GMP levels that are associated with persister formation. Given that PdeI is a phosphodiesterase, which is supposed to promote hydrolysis of c-di-GMP, this finding is unexpected.

      Weaknesses: 

      With the descriptions and writing of the manuscript, it is hard to place the findings about the PdeI into existing context (i.e. it is well known that c-di-GMP is involved in biofilm development and is heterogeneously distributed in several species' biofilms; it is also known that E. coli diesterases regulate this second messenger, i.e. https://journals.asm.org/doi/full/10.1128/jb.00604-15). There is also no explanation for the apparently contradictory upregulation of c-di-GMP in cells expressing higher PdeI levels. Perhaps the examination of the rest of the genes in cluster 2 of the biofilm sample could be useful to explain the observed association.

      Thank you for your thoughtful and constructive review of our manuscript. We are pleased that the reviewer recognizes the value and efficiency of our rRNA depletion method for PETRI-seq, as well as its potential impact on the field. We would like to address the points raised by the reviewer and provide additional context and clarification regarding the function of PdeI in c-di-GMP regulation.

      We acknowledge that c-di-GMP’s role in biofilm development and its heterogeneous distribution in bacterial biofilms are well studied. We appreciate the reviewer's observation regarding the seemingly contradictory relationship between increased PdeI expression and elevated c-di-GMP levels. This is indeed an intriguing finding that warrants further explanation.

      PdeI was predicted to be a phosphodiesterase responsible for c-di-GMP degradation. This prediction is based on sequence analysis where PdeI contains an intact EAL domain known for degrading c-di-GMP. However, it is noteworthy that PdeI also contains a divergent GGDEF domain, which is typically associated with c-di-GMP synthesis. This dual-domain architecture suggests a potential for complex regulatory roles. As reported, the knockout of the major phosphodiesterase PdeH in E. coli leads to the accumulation of c-di-GMP. Further, a point mutation on PdeI's divergent GGDEF domain (G412S) in this PdeH knockout strain resulted in decreased c-di-GMP levels, implying that the wild-type GGDEF domain in PdeI has a role in maintaining or increasing c-di-GMP levels in the cell. Additionally, PdeI contains a CHASE (cyclases/histidine kinase-associated sensory) domain. Combined with our experimental results demonstrating that PdeI is a membrane-associated protein, we predict that PdeI functions as a sensor that integrates environmental signals with c-di-GMP production under complex regulatory mechanisms. The experimental evidence, along with domain analysis, suggests that PdeI could contribute to c-di-GMP synthesis, rebutting the notion that it solely functions as a phosphodiesterase. Furthermore, our single-cell experiments showed a positive correlation between PdeI expression levels and c-di-GMP levels (Fig. 2J). HPLC LC-MS/MS analysis further confirmed that PdeI overexpression (induced by arabinose) upregulated c-di-GMP levels (Fig. 2K). Importantly, in our HPLC LC-MS/MS analysis, we compared the PdeI overexpression strain with the wild-type MG1655 strain, thereby excluding the influence of other genes in cluster 2. 

      In summary, while PdeI is predicted to be a phosphodiesterase based on its sequence and the presence of an EAL domain, the additional presence of a divergent GGDEF domain and experimental evidence suggests that PdeI has a function in upregulating c-di-GMP levels. These findings support the hypothesis that PdeI may have both synthetic and regulatory roles in c-di-GMP metabolism.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I appreciate the efforts the authors made to clarify and justify their statements and methodology, respectively. I additionally appreciate the efforts they made to provide me with detailed information - including figures - to aid my comprehension. However, there are two things I nevertheless recommend the authors to include in the main manuscript.

      (1) Statement about animal wellbeing: The authors state that they were constrained in their imaging session duration not because of a commonly reported technical limitation, such as photobleaching (which I honestly assumed), but rather the general wellbeing of the animals, who exhibited signs of distress after longer imaging periods. I find this to be a critical issue and perhaps the best argument against performing longer imaging experiments (which would have increased the number of trials, thus potentially boosting the performance of their model). To say that they put animal welfare above all other scientific and technical considerations speaks to a strong ethical adherence to animal welfare policy, and I believe this should be somehow incorporated into the methods.

      We have now included this at the top of page 26:

      “Mice fully recovered from the brief isoflurane anesthesia, showing a clear blinking reflex, whisking and sniffing behaviors and normal body posture and movements, immediately after head fixation. In our experimental conditions, mice were imaged in sessions of up to 25 min since beyond this time we started observing some signs of distress or discomfort. Thus, we avoided longer recording times at the expense of collecting larger trial numbers, in strong adherence of animal welfare and ethics policy. A pilot group of mice were habituated to the head fixed condition in daily 20 min sessions for 3 days, however we did not observe a marked contrast in the behavior of habituated versus unhabituated mice beyond our relatively short 25 min imaging sessions. In consequence imaging sessions never surpassed a maximum of 25 min, after which the mouse was returned to its home cage.”

      (2) Author response image 2: I sincerely thank the authors for providing us reviewers with this figure, which compares the performance of the naïve Bayesian classifier they ultimately use in the study with other commonly implemented models. Also here I falsely assumed that other models, which take correlated activity into account, did not generally perform better than their ultimate model of choice. Although dwelling on it would be distracting (and outside the primary scope of the study), I would encourage the authors to include it as a figure supplement (and simply mention these controls en passant when they justify their choice of the naïve Bayesian classifier).

      This figure was now included in the revised manuscript as supplemental figure 3.

      Page 10 now reads:

      “We performed cross-validated, multi-class classification of the single-trial population responses (decoding, Fig. 2A) using a naive Bayes classifier to evaluate the prediction errors as the absolute difference between the stimulus azimuth and the predicted azimuth (Fig. 2A). We chose this classification algorithm over others due to its generally good performance with limited available data. We visualized the cross-validated prediction error distribution in cumulative plots where the observed prediction errors were compared to the distribution of errors for random azimuth sampling (Fig. 2B). When decoding all simultaneously recorded units, the observed classifier output was not significantly better (shifted towards smaller prediction errors) than the chance level distribution (Fig. 2B). The classifier also failed to decode complete DCIC population responses recorded with neuropixels probes (Fig. 3A). Other classifiers performed similarly (Suppl. Fig. 3A).”
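
      As an illustration of the decoding procedure quoted above, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian naive Bayes classifier, leave-one-out cross-validation, and synthetic responses from two hypothetical units to three azimuths; all function names and parameter values are illustrative. The prediction error is the absolute difference between the stimulus azimuth and the predicted azimuth, as in the text.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-3):
    """Estimate per-class mean, variance and prior, treating units as independent."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple of responses per unit
        means = [sum(col) / n for col in columns]
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (means, variances, n / len(X))
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one trial."""
    best_label, best_score = None, -math.inf
    for label, (means, variances, prior) in model.items():
        score = math.log(prior)
        for v, m, s2 in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def loo_prediction_errors(X, y):
    """Leave-one-out errors: |stimulus azimuth - predicted azimuth| per trial."""
    errors = []
    for i in range(len(X)):
        model = fit_gaussian_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors.append(abs(y[i] - predict(model, X[i])))
    return errors


# Synthetic data (assumption): unit 0 is tuned to azimuth, unit 1 is untuned.
rng = random.Random(0)
azimuths = [-90, 0, 90]
X, y = [], []
for az in azimuths:
    for _ in range(10):
        X.append([rng.gauss(az / 18.0, 0.5), rng.gauss(0.0, 0.5)])
        y.append(az)

errors = loo_prediction_errors(X, y)
mean_error = sum(errors) / len(errors)
print(f"mean absolute prediction error: {mean_error:.1f} deg")
```

      Plotting the sorted errors as a cumulative distribution, next to the errors obtained when azimuths are drawn at random, would give the kind of comparison to chance level that the quoted passage describes.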

      The bottom paragraph in page 19 now reads:

      “To characterize how the observed positive noise correlations could affect the representation of stimulus azimuth by DCIC top ranked unit population responses, we compared the decoding performance obtained by classifying the single-trial response patterns from top ranked units in the modeled decorrelated datasets versus the acquired data (with noise correlations). With the intention to characterize this with a conservative approach that would be less likely to find a contribution of noise correlations as it assumes response independence, we relied on the naive Bayes classifier for decoding throughout the study. Using this classifier, we observed that the modeled decorrelated datasets produced stimulus azimuth prediction error distributions that were significantly shifted towards higher decoding errors (Fig. 6B, C) and, in our imaging datasets, were not significantly different from chance level (Fig. 6B). Altogether, these results suggest that the detected noise correlations in our simultaneously acquired datasets can help reduce the error of the IC population code for sound azimuth. We observed a similar, but not significant tendency with another classifier that does not assume response independence (KNN classifier), though overall producing larger decoding errors than the Bayes classifier (Suppl. Fig. 3B).”
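
      The effect of removing noise correlations can be illustrated with a short sketch. This is not the authors' modeling approach, which is not described in this response; it swaps in a common alternative, shuffling trial order independently for each unit within a stimulus condition, which preserves each unit's response distribution while destroying trial-by-trial covariation. The data and names below are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def shuffle_within_stimulus(responses, rng):
    """Permute trial order independently per unit, separately for each stimulus.

    responses: dict mapping stimulus -> list of trials, each trial a list
    with one response per unit.
    """
    out = {}
    for stim, trials in responses.items():
        columns = [list(col) for col in zip(*trials)]  # one list per unit
        for col in columns:
            rng.shuffle(col)
        out[stim] = [list(row) for row in zip(*columns)]
    return out


# Synthetic data (assumption): two units share a common noise source.
rng = random.Random(1)
trials = []
for _ in range(200):
    shared = rng.gauss(0.0, 1.0)
    trials.append([shared + rng.gauss(0.0, 0.3), shared + rng.gauss(0.0, 0.3)])
data = {30: trials}  # a single stimulus azimuth of 30 degrees

decorrelated = shuffle_within_stimulus(data, rng)
r_original = pearson([t[0] for t in data[30]], [t[1] for t in data[30]])
r_shuffled = pearson([t[0] for t in decorrelated[30]],
                     [t[1] for t in decorrelated[30]])
print(f"noise correlation: original {r_original:.2f}, shuffled {r_shuffled:.2f}")
```

      Decoding the original and the shuffled responses with the same classifier then isolates the contribution of the correlations, since single-unit tuning is identical in both datasets.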

      Reviewer #3 (Recommendations for the authors):

      I am generally happy with the response to the reviews.

      I find the Author response image 3 quite interesting. The neuropixel data looks somewhat like I expected (especially for mouse #3 and maybe mouse #4). I find the distribution of weights across units in the imaging dataset compared to in the pixel dataset intriguing (though it probably is just the dimensionality of the data being so much higher).

      I'm not too familiar with facial movements but is it the case that the DCIC would be more modulated by ipsilateral movement compared to contralateral movements? Are face movements in mice conjugate or do both sides of the face move more or less independently? If not it may be interesting in future work to record bilaterally and see if that provides more information about DCIC responses.

      We sincerely thank the editors and reviewers for their careful appraisal, commendation of our effort and helpful constructive feedback which greatly improved the presentation of our study. Below in green font is a point by point reply to the comments provided by the reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: In this study, the authors address whether the dorsal nucleus of the inferior colliculus (DCIC) in mice encodes sound source location within the front horizontal plane (i.e., azimuth). They do this using volumetric two-photon Ca2+ imaging and high-density silicon probes (Neuropixels) to collect single-unit data. Such recordings are beneficial because they allow large populations of simultaneous neural data to be collected. Their main results and the claims about those results are the following:

      (1) DCIC single-unit responses have high trial-to-trial variability (i.e., neural noise);

      (2) approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth;

      (3) single-trial population responses (i.e., the joint response across all sampled single units in an animal) encode sound source azimuth "effectively" (as stated in title) in that localization decoding error matches average mouse discrimination thresholds;

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus (as stated in Abstract);

      (5) evidence of noise correlation between pairs of neurons exists;

      and (6) noise correlations between responses of neurons help reduce population decoding error.

      While simultaneous recordings are not necessary to demonstrate results #1, #2, and #4, they are necessary to demonstrate results #3, #5, and #6.

      Strengths:

      - Important research question to all researchers interested in sensory coding in the nervous system.

      - State-of-the-art data collection: volumetric two-photon Ca2+ imaging and extracellular recording using high-density probes. Large neuronal data sets.

      - Confirmation of imaging results (lower temporal resolution) with more traditional microelectrode results (higher temporal resolution).

      - Clear and appropriate explanation of surgical and electrophysiological methods. I cannot comment on the appropriateness of the imaging methods.

      Strength of evidence for claims of the study:

      (1) DCIC single-unit responses have high trial-to-trial variability - The authors' data clearly shows this.

      (2) Approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth - The sensitivity of each neuron's response to sound source azimuth was tested with a Kruskal-Wallis test, which is appropriate since response distributions were not normal. Using this statistical test, only 8% of neurons (median for imaging data) were found to be sensitive to azimuth, and the authors noted this was not significantly different than the false positive rate. The Kruskal-Wallis test was not performed on electrophysiological data. The authors suggested that low numbers of azimuth-sensitive units resulting from the statistical analysis may be due to the combination of high neural noise and relatively low number of trials, which would reduce statistical power of the test. This may be true, but if single-unit responses were moderately or strongly sensitive to azimuth, one would expect them to pass the test even with relatively low statistical power. At best, if their statistical test missed some azimuth-sensitive units, they were likely only weakly sensitive to azimuth. The authors went on to perform a second test of azimuth sensitivity (a chi-squared test) and found 32% (imaging) and 40% (e-phys) of single units to have statistically significant sensitivity. This feels a bit like fishing for a lower p-value. The Kruskal-Wallis test should have been left as the only analysis. Moreover, the use of a chi-squared test is questionable because it is meant to be used between two categorical variables, and neural response had to be binned before applying the test.

      The determination of what is a physiologically relevant “moderate or strong azimuth sensitivity” is not trivial, particularly when comparing tuning across different relays of the auditory pathway like the CNIC, auditory cortex, or in our case DCIC, where physiologically relevant azimuth sensitivities might be different. This is likely the reason why azimuth sensitivity has been defined in diverse ways across the bibliography (see Groh, Kelly & Underhill, 2003 for an early discussion of this issue). These diverse approaches include reaching a certain percentage of maximal response modulation, like used by Day et al. (2012, 2015, 2016) in CNIC, and ANOVA tests, like used by Panniello et al. (2018) and Groh, Kelly & Underhill (2003) in auditory cortex and IC respectively. Moreover, the influence of response variability and biases in response distribution estimation due to limited sampling has not been usually accounted for in the determination of azimuth sensitivity.

      As Reviewer #1 points out, in our study we used an appropriate ANOVA test (Kruskal-Wallis) as a starting point to study response sensitivity to stimulus azimuth at DCIC. Please note that the alpha = 0.05 used for this test is not based on experimental evidence about physiologically relevant azimuth sensitivity but instead is an arbitrary p-value threshold. Using this test on the electrophysiological data, we found that ~21% of the simultaneously recorded single units reached significance (n = 4 mice). Nevertheless, these percentages, in our small sample (n = 4), were not significantly different from our false positive detection rate (p = 0.0625, Mann-Whitney; see Author response image 1). In consequence, for both our imaging (Fig. 3C) and electrophysiological data, we could not ascertain whether the neurons reaching significance in these ANOVA tests were indeed meaningfully sensitive to azimuth or whether this was due to chance.

      Author response image 1.

      Percentage of the neuropixels recorded DCIC single units across mice that showed significant median response tuning, compared to false positive detection rate (α = 0.05, chance level).

      We reasoned that the observed markedly variable responses from DCIC units, which frequently failed to respond in many trials (Fig. 3D, 4A), in combination with the limited number of trial repetitions we could collect, results in under-sampled response distribution estimations. This under-sampling can bias the determination of stochastic dominance across azimuth response samples in Kruskal-Wallis tests. We would like to highlight that we decided not to implement resampling strategies to artificially increase the azimuth response sample sizes with “virtual trials”, in order to avoid “fishing for a smaller p-value”, when our collected samples might not accurately reflect the actual response population variability.

      As an alternative to hypothesis testing based on ranking and determining stochastic dominance of one or more azimuth response samples (Kruskal-Wallis test), we evaluated the overall statistical dependency to stimulus azimuth of the collected responses. To do this we implemented the Chi-square test by binning neuronal responses into categories. Binning responses into categories can reduce the influence of response variability to some extent, which constitutes an advantage of the Chi-square approach, but we note the important consideration that these response categories are arbitrary.
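
      To make the binning step concrete, here is a minimal sketch of a chi-square test of dependence between binned responses and stimulus azimuth. It is an illustration under stated assumptions, not the analysis code used in the study: the bin edge, the toy responses and all function names are hypothetical, and the p-value is obtained by permuting stimulus labels rather than from the chi-square distribution, so that the example needs only the standard library.

```python
import random
from collections import Counter


def bin_response(value, edges):
    """Map a response to a category index given ascending bin edges."""
    return sum(value > e for e in edges)


def chi_square_statistic(categories, stimuli):
    """Pearson chi-square statistic for a stimulus x response-category table."""
    n = len(categories)
    row_totals = Counter(stimuli)
    col_totals = Counter(categories)
    observed = Counter(zip(stimuli, categories))
    stat = 0.0
    for s in row_totals:
        for c in col_totals:
            expected = row_totals[s] * col_totals[c] / n
            stat += (observed[(s, c)] - expected) ** 2 / expected
    return stat


def permutation_p_value(categories, stimuli, n_perm, rng):
    """Fraction of label permutations with a statistic at least as large."""
    observed = chi_square_statistic(categories, stimuli)
    shuffled = list(stimuli)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square_statistic(categories, shuffled) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Toy unit (assumption): 14 trials per azimuth, frequent response failures,
# and more responses at +90 degrees than at the other two azimuths.
stimuli = [-90] * 14 + [0] * 14 + [90] * 14
responses = ([0.0] * 12 + [1.0] * 2) * 2 + [0.0] * 3 + [1.0] * 11
categories = [bin_response(r, edges=[0.5]) for r in responses]

stat = chi_square_statistic(categories, stimuli)
p_value = permutation_p_value(categories, stimuli, n_perm=2000,
                              rng=random.Random(0))
print(f"chi-square = {stat:.1f}, permutation p = {p_value:.4f}")
```

      Changing the entries of `edges` changes the categories and can change the outcome, which is the arbitrariness of the response categories noted above.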

      Altogether, we acknowledge that our Chi-square approach to define azimuth sensitivity is not free of limitations and despite enabling the interrogation of azimuth sensitivity at DCIC, its interpretability might not extend to other brain regions like CNIC or auditory cortex. Nevertheless we hope the aforementioned arguments justify why the Kruskal-Wallis test simply could not “have been left as the only analysis”.

      (3) Single-trial population responses encode sound source azimuth "effectively" in that localization decoding error matches average mouse discrimination thresholds - If only one neuron in a population had responses that were sensitive to azimuth, we would expect that decoding azimuth from observation of that one neuron's response would perform better than chance. By observing the responses of more than one neuron (if more than one were sensitive to azimuth), we would expect performance to increase. The authors found that decoding from the whole population response was no better than chance. They argue (reasonably) that this is because of overfitting of the decoder model (too few trials used to fit too many parameters) and provide evidence from decoding combined with principal components analysis which suggests that overfitting is occurring. What is troubling is the performance of the decoder when using only a handful of "top-ranked" neurons (in terms of azimuth sensitivity) (Fig. 4F and G). Decoder performance seems to increase when going from one to two neurons, then decreases when going from two to three neurons, and doesn't get much better for more neurons than for one neuron alone. It seems likely there is more information about azimuth in the population response, but decoder performance is not able to capture it because spike count distributions in the decoder model are not being accurately estimated due to too few stimulus trials (14, on average). In other words, it seems likely that decoder performance is underestimating the ability of the DCIC population to encode sound source azimuth.

      To get a sense of how effective a neural population is at coding a particular stimulus parameter, it is useful to compare population decoder performance to psychophysical performance. Unfortunately, mouse behavioral localization data do not exist. Therefore, the authors compare decoder error to mouse left-right discrimination thresholds published previously by a different lab. However, this comparison is inappropriate because the decoder and the mice were performing different perceptual tasks. The decoder is classifying sound sources to 1 of 13 locations from left to right, whereas the mice were discriminating between left or right sources centered around zero degrees. The errors in these two tasks represent different things. The two data sets may potentially be more accurately compared by extracting information from the confusion matrices of population decoder performance. For example, when the stimulus was at -30 deg, how often did the decoder classify the stimulus to a lefthand azimuth? Likewise, when the stimulus was +30 deg, how often did the decoder classify the stimulus to a righthand azimuth?

      The azimuth discrimination error reported by Lauer et al. (2011) comes from engaged and highly trained mice, which is a very different context to our experimental setting with untrained mice passively listening to stimuli from 13 random azimuths. Therefore we did not perform analyses or interpretations of our results based on the behavioral task from Lauer et al. (2011) and only made the qualitative observation that the errors match for discussion.

      We believe it is further important to clarify that Lauer et al. (2011) tested the ability of mice to discriminate between a positively conditioned stimulus (reference speaker at 0º center azimuth associated with a liquid reward) and a negatively conditioned stimulus (coming from one of five comparison speakers positioned at 20º, 30º, 50º, 70º and 90º azimuth, associated with an electrified lickport) in a conditioned avoidance task. In this task, mice are not precisely “discriminating between left or right sources centered around zero degrees”, making further analyses to compare the experimental design of Lauer et al. (2011) and ours even more challenging for valid interpretation.
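
      For readers who wish to explore the reviewer's suggestion on their own data, the collapse of a multi-way confusion matrix into hemifield classification rates can be sketched as below. This is only an illustration under stated assumptions and is not an analysis performed in the study: the function name `same_side_rates`, the row/column layout of the matrix and the toy counts are hypothetical.

```python
def same_side_rates(confusion, azimuths):
    """Fraction of trials decoded to the same hemifield as the stimulus.

    confusion[i][j] is assumed to hold the number of trials whose true
    azimuth was azimuths[i] and whose decoded azimuth was azimuths[j].
    Negative azimuths are taken as left, positive as right (an assumption).
    """
    rates = {}
    for i, true_az in enumerate(azimuths):
        if true_az == 0:
            continue  # the midline has no "correct" side
        row = confusion[i]
        total = sum(row)
        if total == 0:
            continue  # no trials were presented at this azimuth
        # Count decoded azimuths that share the sign of the true azimuth.
        same_side = sum(n for n, az in zip(row, azimuths) if az * true_az > 0)
        rates[true_az] = same_side / total
    return rates


# Toy 3-location example (hypothetical counts, not data from the study).
toy_azimuths = [-30, 0, 30]
toy_confusion = [
    [2, 1, 1],  # stimulus at -30
    [1, 2, 1],  # stimulus at 0
    [0, 1, 3],  # stimulus at +30
]
toy_rates = same_side_rates(toy_confusion, toy_azimuths)
```

      Note that such hemifield rates would still come from passively listening, untrained mice, so the caveats raised above about comparing them with conditioned avoidance behavior continue to apply.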

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus - It is unclear what exactly the authors mean by this statement in the Abstract. There are major differences in the encoding of azimuth between the two neighboring brain areas: a large majority of neurons in the CNIC are sensitive to azimuth (and strongly so), whereas the present study shows a minority of azimuth-sensitive neurons in the DCIC. Furthermore, CNIC neurons fire reliably to sound stimuli (low neural noise), whereas the present study shows that DCIC neurons fire more erratically (high neural noise).

      Since sound source azimuth is reported to be encoded by population activity patterns at CNIC (Day and Delgutte, 2013), we refer to a population activity pattern code as the “similar format” in which this information is encoded at DCIC. Please note that this is a qualitative comparison and we do not claim this is the “same format”, due to the differences the reviewer precisely describes in the encoding of azimuth at CNIC where a much larger majority of neurons show stronger azimuth sensitivity and response reliability with respect to our observations at DCIC. By this qualitative similarity of encoding format we specifically mean the similar occurrence of activity patterns from azimuth sensitive subpopulations of neurons in both CNIC and DCIC, which carry sufficient information about the stimulus azimuth for a sufficiently accurate prediction with regard to the behavioral discrimination ability.

      (5) Evidence of noise correlation between pairs of neurons exists - The authors' data and analyses seem appropriate and sufficient to justify this claim.

      (6) Noise correlations between responses of neurons help reduce population decoding error - The authors show convincing analysis that performance of their decoder was better when simultaneously measured responses were tested (which include noise correlation) than when scrambled-trial responses were tested (eliminating noise correlation). This makes it seem likely that noise correlation in the responses improved decoder performance. The authors mention that the naïve Bayesian classifier was used as their decoder for computational efficiency, presumably because it assumes no noise correlation and, therefore, assumes responses of individual neurons are independent of each other across trials to the same stimulus. The use of a decoder that assumes independence seems key here in testing the hypothesis that noise correlation contains information about sound source azimuth. The logic of using this decoder could be more clearly spelled out to the reader. For example, if the null hypothesis is that noise correlations do not carry azimuth information, then a decoder that assumes independence should perform the same whether population responses are simultaneous or scrambled. The authors' analysis showing a difference in performance between these two cases provides evidence against this null hypothesis.

      We sincerely thank the reviewer for this careful and detailed consideration of our analysis approach. Following the reviewer’s constructive suggestion, we justified the decoder choice in the results section at the last paragraph of page 18:

      “To characterize how the observed positive noise correlations could affect the representation of stimulus azimuth by DCIC top ranked unit population responses, we compared the decoding performance obtained by classifying the single-trial response patterns from top ranked units in the modeled decorrelated datasets versus the acquired data (with noise correlations). With the intention to characterize this with a conservative approach that would be less likely to find a contribution of noise correlations as it assumes response independence, we relied on the naive Bayes classifier for decoding throughout the study.

      Using this classifier, we observed that the modeled decorrelated datasets produced stimulus azimuth prediction error distributions that were significantly shifted towards higher decoding errors (Fig. 5B, C) and, in our imaging datasets, were not significantly different from chance level (Fig. 5B). Altogether, these results suggest that the detected noise correlations in our simultaneously acquired datasets can help reduce the error of the IC population code for sound azimuth.”
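
      The trial-scrambling control discussed above can be sketched as follows. This is a minimal illustration rather than the study's code: it assumes that decorrelation is achieved by permuting the trial order independently for each unit within each azimuth, which preserves every unit's response distribution per azimuth while breaking the trial-to-trial co-variation between units. The function name `shuffle_within_stimulus`, the list-of-lists data layout and the `seed` parameter are hypothetical.

```python
import random


def shuffle_within_stimulus(responses, labels, seed=0):
    """Return a trial-shuffled copy of a population response matrix.

    responses: list of trials, each a list with one response per unit.
    labels:    stimulus azimuth of each trial (same length as responses).
    For every azimuth and every unit, responses are reassigned among the
    trials of that azimuth using an independent random permutation.
    """
    rng = random.Random(seed)
    n_units = len(responses[0])
    shuffled = [list(trial) for trial in responses]  # leave the input intact
    for azimuth in sorted(set(labels)):
        trial_idx = [i for i, lab in enumerate(labels) if lab == azimuth]
        for unit in range(n_units):
            source = trial_idx[:]
            rng.shuffle(source)  # a fresh permutation for each unit
            for dst, src in zip(trial_idx, source):
                shuffled[dst][unit] = responses[src][unit]
    return shuffled


# Toy data: 2 units, 3 trials at each of 2 azimuths (hypothetical values).
toy_responses = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]]
toy_labels = [-30, -30, -30, 30, 30, 30]
toy_shuffled = shuffle_within_stimulus(toy_responses, toy_labels, seed=1)
```

      Because each unit's responses stay attached to the correct azimuth, any change in decoding error between the original and shuffled matrices can only come from the disrupted pairing of responses across units within a trial.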

      Minor weakness:

      - Most studies of neural encoding of sound source azimuth are done in a noise-free environment, but the experimental setup in the present study had substantial background noise. This complicates comparison of the azimuth tuning results in this study to those of other studies. One is left wondering if azimuth sensitivity would have been greater in the absence of background noise, particularly for the imaging data where the signal was only about 12 dB above the noise. The description of the noise level and signal + noise level in the Methods should be made clearer. Mice hear from about 2.5 - 80 kHz, so it is important to know the noise level within this band as well as specifically within the band overlapping with the signal.

      We agree with the reviewer that this information is useful. In our study, the background R.M.S. SPL during imaging across the mouse hearing range (2.5-80kHz) was 44.53 dB and for neuropixels recordings 34.68 dB. We have added this information to the methods section of the revised manuscript.
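
      As a reminder of how such levels relate to measured sound pressure, a short sketch is given below. It assumes the standard reference pressure for SPL in air (20 µPa) and that signal and background noise are uncorrelated, so that their powers add. The function names are hypothetical, and the band-pass filtering to the 2.5-80 kHz range is not shown.

```python
import math

P_REF = 20e-6  # reference pressure in pascals (20 µPa) for dB SPL in air


def rms(samples):
    """Root-mean-square value of a sequence of pressure samples (Pa)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def spl_db(pressure_rms):
    """Sound pressure level in dB SPL for an RMS pressure given in Pa."""
    return 20.0 * math.log10(pressure_rms / P_REF)


def combine_levels(*levels_db):
    """Level of the sum of uncorrelated sources, each given in dB SPL."""
    return 10.0 * math.log10(sum(10 ** (lv / 10.0) for lv in levels_db))
```

      For example, `combine_levels(signal_db, 44.53)` would give the signal + noise level during imaging for a hypothetical stimulus level `signal_db`.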

      Reviewer #2 (Public Review):

      In the present study, Boffi et al. investigate the manner in which the dorsal cortex of the inferior colliculus (DCIC), an auditory midbrain area, encodes sound location azimuth in awake, passively listening mice. By employing volumetric calcium imaging (scanned temporal focusing or s-TeFo), complemented with high-density electrode electrophysiological recordings (neuropixels probes), they show that sound-evoked responses are exquisitely noisy, with only a small portion of neurons (units) exhibiting spatial sensitivity. Nevertheless, a naïve Bayesian classifier was able to predict the presented azimuth based on the responses from small populations of these spatially sensitive units. A portion of the spatial information was provided by correlated trial-to-trial response variability between individual units (noise correlations). The study presents a novel characterization of spatial auditory coding in a non-canonical structure, representing a noteworthy contribution specifically to the auditory field and generally to systems neuroscience, due to its implementation of state-of-the-art techniques in an experimentally challenging brain region. However, nuances in the calcium imaging dataset and the naïve Bayesian classifier warrant caution when interpreting some of the results.

      Strengths:

      The primary strength of the study lies in its methodological achievements, which allowed the authors to collect a comprehensive and novel dataset. While the DCIC is a dorsal structure, it extends up to a millimetre in depth, making it optically challenging to access in its entirety. It is also more highly myelinated and vascularised compared to e.g., the cerebral cortex, compounding the problem. The authors successfully overcame these challenges and present an impressive volumetric calcium imaging dataset. Furthermore, they corroborated this dataset with electrophysiological recordings, which produced overlapping results. This methodological combination ameliorates the natural concerns that arise from inferring neuronal activity from calcium signals alone, which are in essence an indirect measurement thereof.

      Another strength of the study is its interdisciplinary relevance. For the auditory field, it represents a significant contribution to the question of how auditory space is represented in the mammalian brain. "Space" per se is not mapped onto the basilar membrane of the cochlea and must be computed entirely within the brain. For azimuth, this requires the comparison of minuscule differences in the timing and intensity of sounds arriving at each ear. It is now generally thought that azimuth is initially encoded in two, opposing hemispheric channels, but the extent to which this initial arrangement is maintained throughout the auditory system remains an open question. The authors observe only a slight contralateral bias in their data, suggesting that sound source azimuth in the DCIC is encoded in a more nuanced manner compared to earlier processing stages of the auditory hindbrain. This is interesting, because it is also known to be an auditory structure to receive more descending inputs from the cortex.

      Systems neuroscience continues to strive for the perfection of imaging novel, less accessible brain regions. Volumetric calcium imaging is a promising emerging technique, allowing the simultaneous measurement of large populations of neurons in three dimensions. But this necessitates corroboration with other methods, such as electrophysiological recordings, which the authors achieve. The dataset moreover highlights the distinctive characteristics of neuronal auditory representations in the brain. Its signals can be exceptionally sparse and noisy, which provide an additional layer of complexity in the processing and analysis of such datasets. This will be undoubtedly useful for future studies of other less accessible structures with sparse responsiveness.

      Weaknesses:

      Although the primary finding that small populations of neurons carry enough spatial information for a naïve Bayesian classifier to reasonably decode the presented stimulus is not called into question, certain idiosyncrasies, in particular the calcium imaging dataset and model, complicate specific interpretations of the model output, and the readership is urged to interpret these aspects of the study's conclusions with caution.

      I remain in favour of volumetric calcium imaging as a suitable technique for the study, but the presently constrained spatial resolution is insufficient to unequivocally identify regions of interest as cell bodies (and are instead referred to as "units" akin to those of electrophysiological recordings). It remains possible that the imaging set is inadvertently influenced by non-somatic structures (including neuropil), which could report neuronal activity differently than cell bodies. Due to the lack of a comprehensive ground-truth comparison in this regard (which to my knowledge is impossible to achieve with current technology), it is difficult to imagine how many such informative units might have been missed because their signals were influenced by spurious, non-somatic signals, which could have subsequently misled the models. The authors reference the original Nature Methods article (Prevedel et al., 2016) throughout the manuscript, presumably in order to avoid having to repeat previously published experimental metrics. But the DCIC is neither the cortex nor hippocampus (for which the method was originally developed) and may not have the same light scattering properties (not to mention neuronal noise levels). Although the corroborative electrophysiology data largely alleviates these concerns for this particular study, the readership should be cognisant of such caveats, in particular those who are interested in implementing the technique for their own research.

      A related technical limitation of the calcium imaging dataset is the relatively low number of trials (14) given the inherently high level of noise (both neuronal and imaging). Volumetric calcium imaging, while offering a uniquely expansive field of view, requires relatively high average excitation laser power (in this case nearly 200 mW), a level of exposure the authors may have wanted to minimise by maintaining a low number of repetitions, but I yield to them to explain.

      We assumed that the levels of heating by excitation light measured in the neocortex by Prevedel et al. (2016) were also representative for the DCIC. Nevertheless, we recognize that this approximation might not be very accurate, given the differences in tissue architecture and vascularization between these two brain areas, to name just a few factors. The limiting factor preventing us from collecting more trials in our imaging sessions was that we observed signs of discomfort or slight distress in some mice after ~30 min of imaging in our custom setup, which we established as a humane end point to prevent distress. Consequently, imaging sessions were kept to 25 min in duration, limiting the number of trials collected. However, we cannot rule out that the imaging sessions could be prolonged without these signs of discomfort after more extensive habituation prior to experiments, nor can we determine whether an influence of our custom setup, such as potential heating of the brain by the illumination light, caused the observed distress. Nevertheless, we note that previous work has shown that ~200 mW average power is a safe regime for imaging in the cortex, keeping brain heating minimal (Prevedel et al., 2016) without producing the lasting damage observed by immunohistochemistry against apoptosis markers above 250 mW (Podgorski and Ranganathan 2016, https://doi.org/10.1152/jn.00275.2016).

      Calcium imaging is also inherently slow, requiring relatively long inter-stimulus intervals (in this case 5 s). This unfortunately renders any model designed to predict a stimulus (in this case sound azimuth) from particularly noisy population neuronal data like these as highly prone to overfitting, to which the authors correctly admit after a model trained on the entire raw dataset failed to perform significantly above chance level. This prompted them to feed the model only with data from neurons with the highest spatial sensitivity. This ultimately produced reasonable performance (and was implemented throughout the rest of the study), but it remains possible that if the model was fed with more repetitions of imaging data, its performance would have been more stable across the number of units used to train it. (All models trained with imaging data eventually failed to converge.) However, I also see these limitations as an opportunity to improve the technology further, which I reiterate will be generally important for volume imaging of other sparse or noisy calcium signals in the brain.

      Transitioning to the naïve Bayesian classifier itself, I first openly ask the authors to justify their choice of this specific model. There are countless types of classifiers for these data, each with their own pros and cons. Did they actually try other models (such as support vector machines), which ultimately failed? If so, these negative results (even if mentioned en passant) would be extremely valuable to the community, in my view. I ask this specifically because different methods assume correspondingly different statistical properties of the input data, and to my knowledge naïve Bayesian classifiers assume that predictors (neuronal responses) are independent within a class (azimuth). As the authors show that noise correlations are informative in predicting azimuth, I wonder why they chose a model that doesn't take advantage of these statistical regularities. It could be because of technical considerations (they mention computing efficiency), but I am left generally uncertain about the specific logic that was used to guide the authors through their analytical journey.

      One of the main reasons we chose the naïve Bayesian classifier is indeed because it assumes that the responses of the simultaneously recorded neurons are independent and therefore it does not assume a contribution of noise correlations to the estimation of the posterior probability of each azimuth. This model would represent the null hypothesis that noise correlations do not contribute to the encoding of stimulus azimuth, which would be verified by an equal decoding outcome from correlated or decorrelated datasets. Since we observed that this is not the case, the model supports the alternative hypothesis that noise correlations do indeed influence stimulus azimuth encoding. We wanted to test these hypotheses with the most conservative approach possible that would be least likely to find a contribution of noise correlations. Another relevant reason that justifies our choice of the naive Bayesian classifier is its robustness against the limited numbers of trials we could collect in comparison to other more “data hungry” classifiers like SVM, KNN, or artificial neural networks. We did perform preliminary tests with alternative classifiers but the obtained decoding errors were similar when decoding the whole population activity (Supplemental figure 3A). Dimensionality reduction following the approach described in the manuscript showed a tendency towards smaller decoding errors observed with an alternative classifier like KNN, but these errors were still larger than the ones observed with the naive Bayesian classifier (median error 45º). Nevertheless, we also observe a similar tendency for slightly larger decoding errors in the absence of noise correlations (decorrelated, Supplemental figure 3B). Sentences detailing the logic of classifier choice are now included in the results section at page 10 and at the last paragraph of page 18 (see responses to Reviewer 1).
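
      The independence assumption described above can be made concrete with a small sketch in which the log posterior of each azimuth is simply a sum of per-unit log-likelihoods, so no covariance term between units ever enters the estimate. This is an illustration only: the Gaussian likelihood, the flat prior, the variance floor and all function names are assumptions made for the example and are not taken from the manuscript.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(responses, labels, var_floor=1e-6):
    """Estimate a (mean, variance) pair per unit for each azimuth."""
    model = {}
    for azimuth in sorted(set(labels)):
        trials = [r for r, lab in zip(responses, labels) if lab == azimuth]
        per_unit = list(zip(*trials))  # one tuple of responses per unit
        # var_floor avoids division by zero for units with constant responses.
        model[azimuth] = [(mean(u), pvariance(u) + var_floor) for u in per_unit]
    return model


def log_posterior(model, response):
    """Unnormalised log posterior of each azimuth under a flat prior.

    Units are treated as independent, so their log-likelihoods add up.
    """
    scores = {}
    for azimuth, params in model.items():
        scores[azimuth] = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            for x, (mu, var) in zip(response, params)
        )
    return scores


def decode(model, response):
    """Return the azimuth with the highest posterior for one trial."""
    scores = log_posterior(model, response)
    return max(scores, key=scores.get)


# Toy data: 2 units, 2 trials per azimuth (hypothetical values).
toy_train = [[1.0, 0.0], [1.2, 0.1], [0.0, 1.0], [0.1, 1.1]]
toy_labels = [-30, -30, 30, 30]
toy_model = fit_gaussian_nb(toy_train, toy_labels)
```

      Since the fitted parameters depend only on each unit's own response distribution per azimuth, shuffling trials within an azimuth leaves the model unchanged; only the single-trial patterns presented to the decoder differ between correlated and decorrelated datasets.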

      That aside, there remain other peculiarities in model performance that warrant further investigation. For example, what spurious features (or lack of informative features) in these additional units prevented the models of imaging data from converging?

      Considering the amount of variability observed throughout the neuronal responses both in imaging and neuropixels datasets, it is easy to suspect that the information about stimulus azimuth carried in different amounts by individual DCIC neurons can be mixed up with information about other factors (Stringer et al., 2019). In an attempt to study the origin of these features that could confound stimulus azimuth decoding we explored their relation to face movement (Supplemental Figure 2), finding a correlation to snout movements, in line with previous work by Stringer et al. (2019).

      In an orthogonal question, did the most spatially sensitive units share any detectable tuning features? A different model trained with electrophysiology data in contrast did not collapse in the range of top-ranked units plotted. Did this model collapse at some point after adding enough units, and how well did that correlate with the model for the imaging data?

      Our electrophysiology datasets were much smaller in size (number of simultaneously recorded neurons) compared to our volumetric calcium imaging datasets, resulting in a much smaller total number of top ranked units detected per dataset. This precluded the determination of a collapse of decoder performance due to overfitting beyond the range plotted in Fig 4G.

      How well did the form (and diversity) of the spatial tuning functions as recorded with electrophysiology resemble their calcium imaging counterparts? These fundamental questions could be addressed with more basic, but transparent analyses of the data (e.g., the diversity of spatial tuning functions of their recorded units across the population). Even if the model extracts features that are not obvious to the human eye in traditional visualisations, I would still find this interesting.

      The diversity of the azimuth tuning curves recorded with calcium imaging (Fig. 3B) was qualitatively larger than the ones recorded with electrophysiology (Fig. 4B), potentially due to the larger sampling obtained with volumetric imaging. We did not perform a detailed comparison of the form and a more quantitative comparison of the diversity of these functions because the signals compared are quite different, as calcium indicator signal is subject to non linearities due to Ca2+ binding cooperativity and low pass filtering due to binding kinetics. We feared this could lead to misleading interpretations about the similarities or differences between the azimuth tuning functions in imaged and electrophysiology datasets. Our model uses statistical response dependency to stimulus azimuth, which does not rely on features from a descriptive statistic like mean response tuning. In this context, visualizing the trial-to-trial responses as a function of azimuth shows “features that are not obvious to the human eye in traditional visualizations” (Fig. 3D, left inset).

      Finally, the readership is encouraged to interpret certain statements by the authors in the current version conservatively. How the brain ultimately extracts spatial neuronal data for perception is anyone's guess, but it is important to remember that this study only shows that a naïve Bayesian classifier could decode this information, and it remains entirely unclear whether the brain does this as well. For example, the model is able to achieve a prediction error that corresponds to the psychophysical threshold in mice performing a discrimination task (~30°). Although this is an interesting coincidental observation, it does not mean that the two metrics are necessarily related. The authors correctly do not explicitly claim this, but the manner in which the prose flows may lead a non-expert into drawing that conclusion.

      To avoid misleading the non-expert readers, we have clarified in the manuscript that the observed correspondence between decoding error and psychophysical threshold is explicitly coincidental.

      Page 13, end of middle paragraph:

      “If we consider the median of the prediction error distribution as an overall measure of decoding performance, the single-trial response patterns from subsamples of at least the 7 top ranked units produced median decoding errors that coincidentally matched the reported azimuth discrimination ability of mice (Fig 4G, minimum audible angle = 31º) (Lauer et al., 2011).”

      Page 14, bottom paragraph:

      “Decoding analysis (Fig. 4F) of the population response patterns from azimuth dependent top ranked units simultaneously recorded with neuropixels probes showed that the 4 top ranked units are the smallest subsample necessary to produce a significant decoding performance that coincidentally matches the discrimination ability of mice (31° (Lauer et al., 2011)) (Fig. 5F, G).”

      We also added to the Discussion sentences clarifying that a relationship between these two variables remains to be determined and it also remains to be determined if the DCIC indeed performs a bayesian decoding computation for sound localization.

      Page 20, bottom:

      “… Concretely, we show that sound location coding does indeed occur at DCIC on the single trial basis, and that this follows a comparable mechanism to the characterized population code at CNIC (Day and Delgutte, 2013). However, it remains to be determined if indeed the DCIC network is physiologically capable of Bayesian decoding computations. Interestingly, the small number of DCIC top ranked units necessary to effectively decode stimulus azimuth suggests that sound azimuth information is redundantly distributed across DCIC top ranked units, which points out that mechanisms beyond coding efficiency could be relevant for this population code.

      While the decoding error observed from our DCIC datasets obtained in passively listening, untrained mice coincidentally matches the discrimination ability of highly trained, motivated mice (Lauer et al., 2011), a relationship between decoding error and psychophysical performance remains to be determined. Interestingly, a primary sensory representation should theoretically be even more precise than the behavioral performance as reported in the visual system (Stringer et al., 2021).”

      Moreover, the concept of redundancy (of spatial information carried by units throughout the DCIC) is difficult for me to disentangle. One interpretation of this formulation could be that there are non-overlapping populations of neurons distributed across the DCIC that each could predict azimuth independently of each other, which is unlikely what the authors meant. If the authors meant generally that multiple neurons in the DCIC carry sufficient spatial information, then a single neuron would have been able to predict sound source azimuth, which was not the case. I have the feeling that they actually mean "complementary", but I leave it to the authors to clarify my confusion, should they wish.

      We observed that the response patterns from relatively small fractions of the azimuth sensitive DCIC units (4-7 top ranked units) are sufficient to generate an effective code for sound azimuth, while 32-40% of all simultaneously recorded DCIC units are azimuth sensitive. In light of this observation, we interpreted that the azimuth information carried by the population should be redundantly distributed across the complete subpopulation of azimuth sensitive DCIC units.

      In summary, the present study represents a significant body of work that contributes substantially to the field of spatial auditory coding and systems neuroscience. However, limitations of the imaging dataset and model as applied in the study muddle concrete conclusions about how the DCIC precisely encodes sound source azimuth, and even more so about sound localisation in a behaving animal. Nevertheless, it presents a novel and unique dataset, which, regardless of secondary interpretation, corroborates the general notion that auditory space is encoded in an extraordinarily complex manner in the mammalian brain.

      Reviewer #3 (Public Review):

      Summary: Boffi and colleagues sought to quantify the single-trial, azimuthal information in the dorsal cortex of the inferior colliculus (DCIC), a relatively understudied subnucleus of the auditory midbrain. They used two complementary recording methods while mice passively listened to sounds at different locations: a large volume but slow sampling calcium-imaging method, and a smaller volume but temporally precise electrophysiology method. They found that neurons in the DCIC were variable in their activity, unreliably responding to sound presentation and responding during inter-sound intervals. Boffi and colleagues used a naïve Bayesian decoder to determine if the DCIC population encoded sound location on a single trial. The decoder failed to classify sound location better than chance when using the raw single-trial population response but performed significantly better than chance when using intermediate principal components of the population response. In line with this, when the most azimuth dependent neurons were used to decode azimuthal position, the decoder performed equivalently to the azimuthal localization abilities of mice. The top azimuthal units were not clustered in the DCIC, possessed a contralateral bias in response, and were correlated in their variability (e.g., positive noise correlations). Interestingly, when these noise correlations were perturbed by inter-trial shuffling decoding performance decreased. Although Boffi and colleagues display that azimuthal information can be extracted from DCIC responses, it remains unclear to what degree this information is used and what role noise correlations play in azimuthal encoding.

      Strengths: The authors should be commended for collection of this dataset. When done in isolation (which is typical), calcium imaging and linear array recordings have intrinsic weaknesses. However, those weaknesses are alleviated when done in conjunction with one another - especially when the data largely recapitulates the findings of the other recording methodology. In addition to the video of the head during the calcium imaging, this data set is extremely rich and will be of use to those interested in the information available in the DCIC, an understudied but likely important subnucleus in the auditory midbrain.

      The DCIC neural responses are complex; the units unreliably respond to sound onset, and at the very least respond to some unknown input or internal state (e.g., large inter-sound interval responses). The authors do a decent job in wrangling these complex responses: using interpretable decoders to extract information available from population responses.

      Weaknesses:

      The authors observe that neurons with the most azimuthal sensitivity within the DCIC are positively correlated, but they use a naïve Bayesian decoder which assumes independence between units. Although this is a bit strange given their observation that some of the recorded units are correlated, it is unlikely to be a critical flaw. At one point the authors reduce the dimensionality of their data through PCA and use the loadings onto these components in their decoder. PCA incorporates the correlational structure when finding the principal components and constrains these components to be orthogonal and uncorrelated. This should alleviate some of the concern regarding the use of the naïve Bayesian decoder because the projections onto the different components are independent. Nevertheless, the decoding results are a bit strange, likely because there is not much linearly decodable azimuth information in the DCIC responses. Raw population responses failed to provide sufficient information concerning azimuth for the decoder to perform better than chance. Additionally, it only performed better than chance when certain principal components or top ranked units contributed to the decoder but not as more components or units were added. So, although there does appear to be some azimuthal information in the recorded DCIC populations - it is somewhat difficult to extract and likely not an 'effective' encoding of sound localization as their title suggests.

      As described in the responses to reviewers 1 and 2, we chose the naïve Bayes classifier as a decoder to determine the influence of noise correlations through the most conservative approach possible, as this classifier would be least likely to find a contribution of correlated noise. We also chose this decoder for its robustness to the limited number of trials collected, in comparison to “data hungry” nonlinear classifiers like KNN or artificial neural networks. Lastly, we observed that small populations of noisy, unreliable (not responding in every trial) DCIC neurons can encode stimulus azimuth in passively listening mice, matching the discrimination error of trained mice. Therefore, while this encoding is definitely not efficient, it can still be considered effective.
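
      To make the independence assumption concrete, the following is a minimal sketch of a naïve Bayes decoder and is not the authors' analysis code. It assumes Gaussian per-unit likelihoods, which the response does not specify; the function names and the variance floor are hypothetical. The key point is that the log-likelihood is a plain sum over units, so any covariance between units is ignored by construction.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-3):
    """Estimate a log-prior and per-unit mean/variance for each class.

    X: list of trials, each a list of per-unit responses.
    y: list of class labels (e.g. stimulus azimuths), one per trial.
    var_floor: hypothetical lower bound that keeps variances positive.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple of trial values per unit
        means = [sum(col) / n for col in columns]
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class maximizing log-prior plus summed per-unit log-likelihood."""
    def log_posterior(params):
        log_prior, means, variances = params
        total = log_prior
        # Independence assumption: each unit contributes its own term,
        # with no cross-unit (covariance) terms.
        for v, m, s2 in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))
```

      With only a handful of trials per class, this model estimates two parameters per unit and class, which is one reason such a classifier tolerates small samples better than models that must estimate a full covariance matrix.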

      Although this is quite a worthwhile dataset, the authors present relatively little about the characteristics of the units they've recorded. This may be due to the high variance in responses seen in their population. Nevertheless, the authors note that units do not respond on every trial but do not report what percentage of trials fail to evoke a response. Is it that neurons are noisy because they do not respond on every trial or is it also that when they do respond they have variable response distributions? It would be nice to gain some insight into the heterogeneity of the responses.

      The limited number of azimuth trial repetitions that we could collect precluded us from making any quantification of the unreliability (failures to respond) and variability in the response distributions from the units we recorded, as we feared they could be misleading. In qualitative terms, “due to the high variance in responses seen” in the recordings and the limited trial sampling, it is hard to make any generalization. In consequence we referred to the observed response variance altogether as neuronal noise. Considering these points, our datasets are publicly available for exploration of the response characteristics.

      Additionally, is there any clustering at all in response profiles or is each neuron they recorded in the DCIC unique?

      We attempted to qualitatively visualize response clustering using dimensionality reduction, observing different degrees of clustering or lack thereof across the azimuth classes in the datasets collected from different mice. It is likely that the limited number of azimuth trials we could collect and the high response variance contribute to an inconsistent response clustering across datasets.

      They also only report the noise correlations for their top ranked units, but it is possible that the noise correlations in the rest of the population are different.

      For this study, since our aim was to interrogate the influence of noise correlations on stimulus azimuth encoding by DCIC populations, we focused on the noise correlations from the top ranked unit subpopulation, which likely carries the bulk of the sound location information. Noise correlations can be defined as correlations in the trial-to-trial response variation of neurons. In this respect, it is hard to ascertain whether the units in the rest of the population, outside the top ranked percentage, are really responding and showing response variation with which to evaluate this correlation, or are simply not responding at all and show unrelated activity altogether. This makes observations about noise correlations from “the rest of the population” potentially hard to interpret.

      It would also be worth digging into the noise correlations more - are units positively correlated because they respond together (e.g., if unit x responds on trial 1 so does unit y) or are they also modulated around their mean rates on similar trials (e.g., unit x and y respond and both are responding more than their mean response rate). A large portion of trials with no response can occlude noise correlations. More transparency around the response properties of these populations would be welcome.

      Due to the limited number of azimuth trial repetitions collected, to evaluate noise correlations we used the nonparametric Kendall tau correlation coefficient, which is a measure of pairwise rank correlation or ordinal association in the responses to each azimuth. Positive rank correlation would represent neurons more likely responding together. Evaluating response modulation “around their mean rates on similar trials” would require assumptions about the response distributions, which we avoided due to the potential biases associated with limited sample sizes.
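
      For reference, the rank correlation described above can be sketched as follows. This is an illustration, not the authors' code: it implements the tie-corrected tau-b variant, and whether the authors used tau-a or tau-b is not stated, so that choice is an assumption, as is the function name. The inputs would be the responses of two units across repeated trials of a single azimuth.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length response sequences.

    Every pair of trials is classified as concordant, discordant, or tied.
    Only the ordering of responses matters, so no distributional
    assumption about response magnitudes is needed.
    """
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both units: excluded from both denominator terms
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")
```

      Trials in which both units fail to respond are tied in both sequences and drop out of the statistic, which is one way a large fraction of non-responsive trials can limit what the coefficient reveals.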

      It is largely unclear what the DCIC is encoding. Although the authors are interested in azimuth, sound location seems to be only a small part of DCIC responses. The authors report responses during inter-sound interval and unreliable sound-evoked responses. Although they have video of the head during recording, we only see a correlation to snout and ear movements (which are peculiar since in the example shown it seems the head movements predict the sound presentation). Additional correlates could be eye movements or pupil size. Eye movements are of particular interest due to their known interaction with IC responses - especially if the DCIC encodes sound location in relation to eye position instead of head position (though much of eye-position-IC work was done in primates and not rodents). Alternatively, much of the population may only encode sound location if an animal is engaged in a localization task. Ideally, the authors could perform more substantive analyses to determine if this population is truly noisy or if the DCIC is integrating un-analyzed signals.

      We unsuccessfully attempted eye tracking and pupillometry in our videos. We suspect that the reason behind this is a generally overly dilated pupil due to the low visible light illumination conditions we used which were necessary to protect the PMT of our custom scope.

      It is likely that DCIC population activity is integrating un-analyzed signals, like the signal associated with spontaneous behaviors including face movements (Stringer et al., 2019), which we observed at the level of spontaneous snout movements. However, investigating if and how these signals are integrated into stimulus azimuth coding requires extensive behavioral testing and experimentation, which is out of the scope of this study. For the purpose of our study, we referred to trial-to-trial response variation as neuronal noise. We note that this definition of neuronal noise can, and likely does, include an influence from un-analyzed signals like the ones from spontaneous behaviors.

      Although this critique is ubiquitous among decoding papers in the absence of behavioral or causal perturbations, it is unclear what - if any - role the decoded information may play in neuronal computations. The interpretation of the decoder means that there is some extractable information concerning sound azimuth - but not if it is functional. This information may just be epiphenomenal, leaking in from inputs, and not used in computation or relayed to downstream structures. This should be kept in mind when the authors suggest their findings implicate the DCIC functionally in sound localization.

      Our study builds upon previous reports by other independent groups relying on “causal and behavioral perturbations” and implicating DCIC in sound location learning induced experience dependent plasticity (Bajo et al., 2019, 2010; Bajo and King, 2012), which altogether argues in favor of DCIC functionality in sound localization.

      Nevertheless, we clarified in the discussion of the revised manuscript that a relationship between the observed decoding error and the psychophysical performance, or the ability of the DCIC network to perform Bayesian decoding computations, both remain to be determined (please see responses to Reviewer #2).

      It is unclear why positive noise correlations amongst similarly tuned neurons would improve decoding. A toy model exploring how positive noise correlations in conjunction with unreliable units that inconsistently respond may anchor these findings in an interpretable way. It seems plausible that inconsistent responses would benefit from strong noise correlations, simply by units responding together. This would predict that shuffling would impair performance because you would then be sampling from trials in which some units respond, and trials in which some units do not respond - and may predict a bimodal performance distribution in which some trials decode well (when the units respond) and poor performance (when the units do not respond).

      In samples with more than two dimensions, the relationship between signal and noise correlations is more complex than in two-dimensional samples (Montijn et al., 2016), which makes constructing interpretable and simple toy models of this challenging. Montijn et al. (2016) provide a detailed characterization and model describing how the accuracy of a multidimensional population code can improve when including “positive noise correlations amongst similarly tuned neurons”. Unfortunately, we could not successfully test their model based on Mahalanobis distances, as we could not verify that the recorded DCIC population responses followed a multivariate Gaussian distribution, due to the limited azimuth trial repetitions we could sample.
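
      The shuffling control mentioned in the reviewer's comment is commonly implemented by permuting trial order independently for each unit within each stimulus class. The sketch below illustrates that general procedure under stated assumptions; it is not the authors' code, and the function name, data layout, and seed argument are hypothetical.

```python
import random


def shuffle_trials_within_class(responses, labels, seed=0):
    """Return a copy of `responses` with noise correlations removed.

    responses: list of trials, each a list of per-unit responses.
    labels: stimulus class (e.g. azimuth) of each trial.
    For every class, each unit's responses are permuted across the trials
    of that class, independently of the other units. Each unit keeps its
    own response distribution per class (signal preserved), while
    trial-by-trial co-fluctuations between units are destroyed.
    """
    rng = random.Random(seed)
    shuffled = [list(trial) for trial in responses]  # do not mutate the input
    n_units = len(responses[0])
    for label in set(labels):
        trial_idx = [i for i, lab in enumerate(labels) if lab == label]
        for unit in range(n_units):
            values = [responses[i][unit] for i in trial_idx]
            rng.shuffle(values)
            for i, value in zip(trial_idx, values):
                shuffled[i][unit] = value
    return shuffled
```

      Comparing decoder accuracy on the original and shuffled responses then gives an estimate of how much the trial-to-trial co-variation contributes to decoding.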

      Significance: Boffi and colleagues set out to parse the azimuthal information available in the DCIC on a single trial. They largely accomplish this goal and are able to extract this information when allowing the units that contain more information about sound location to contribute to their decoding (e.g., through PCA or decoding on top unit activity specifically). The dataset will be of value to those interested in the DCIC and also to anyone interested in the role of noise correlations in population coding. Although this work is a first step into parsing the information available in the DCIC, it remains difficult to interpret if/how this azimuthal information is used in localization behaviors of engaged mice.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      In this study, Alejandro Rosell et al. uncover the immunoregulatory functions of the RAS-p110α pathway in macrophages, including the extravasation of monocytes from the bloodstream and subsequent lysosomal digestion. Disrupting the RAS-p110α pathway by mouse genetic tools or by pharmacological intervention hampers the inflammatory response, leading to delayed resolution and more severe acute inflammatory reactions. The authors proposed that activating p110α using small molecules could be a promising approach for treating chronic inflammation. This study provides insights into the roles and mechanisms of p110α on macrophage function and the inflammatory response, while some conclusions are still questionable because of several issues described below.

      (1) Fig. 1B showed that disruption of RAS-p110α causes a decrease in the activation of NF-κB, which is a crucial transcription factor that regulates the expression of proinflammatory genes. However, the authors observed that disruption of the RAS-p110α interaction results in an exacerbated inflammatory state in vivo, in both localized paw inflammation and systemic inflammatory mediator levels. Also, the authors stated that "this disruption leads to a change in macrophage polarization, favoring a more proinflammatory M1 state" in the introduction, according to reference 12. The conclusions drawn from the signaling and the models seemed contradictory and puzzling. Besides, it is not clear why the protein level of p65 was decreased at 10' and 30'. Was it attributed to the degradation of p65 or experimental variation?

      We thank the reviewer for this insightful comment and apologize for not previously explaining the implications of the observed decrease in NF-κB activation. We found a decrease in NF-κB activation in response to LPS + IFN-γ stimulation in macrophages lacking RAS-PI3K interaction. As the reviewer pointed out, NF-κB is a key transcription factor that regulates the expression of various proinflammatory genes. To better characterize whether the decrease in p-p65 would lead to a reduction in the expression of specific cytokines, we performed a cytokine array using unstimulated and LPS + IFN-γ stimulated macrophages. The results indicated a small number of cytokines with altered expression, validating that RAS-p110α activation of p-p65 regulates the expression of some inflammatory cytokines. These results have been added to the manuscript and to Figure 1 (panels C and D). In brief, the data suggest an impairment in recruitment factors and inflammatory regulators following the disruption of RAS-p110α signaling in macrophages, which aligns with the observed in vivo phenotype. 

      Our findings indicate that the disruption of RAS-p110α signaling has a complex and multifaceted role in BMDMs. Specifically, monocytes lacking RAS-PI3K are unable to reach the inflamed area due to an impaired ability to extravasate, caused by altered actin cytoskeleton dynamics. Consequently, inflammation is sustained over time, continuously releasing inflammatory mediators. Moreover, we have shown that macrophages deficient in RAS-p110α interaction fail to mount a full inflammatory response due to decreased activation of p-p65, leading to reduced production of a set of inflammatory regulators. Additionally, these macrophages are unable to effectively process phagocytosed material and activate the resolutive phase of inflammation. As a result of these defects, an exacerbated and sustained inflammatory response occurs. 

      Our in vivo data, showing an increase in systemic inflammatory mediators, might be a consequence of the accumulation of monocytes produced by bone marrow progenitors in response to sensed inflammatory stimuli, but unable to extravasate.

      Regarding the sentence in the introduction: "this disruption leads to a change in macrophage polarization, favoring a more proinflammatory M1 state" (reference 12), this was observed in an oncogenic context, which might differ from the role of RAS-p110α in a non-oncogenic situation, as analyzed in this work. We introduced these results as an example to establish the role of RAS-p110α in macrophages, demonstrating its participation in macrophage-dependent responses. Together with our study, these findings clearly indicate that p110α signaling is critical when analyzing full immune responses. Previously, little was known about the role of this PI3K isoform in immune responses. Our data, along with those presented by Murillo et al. (ref. 12), demonstrate that p110α plays a significant role in macrophage function in both oncogenic and inflammatory contexts. Additionally, our results suggest that this role is complex and multifaceted, warranting further investigation to fully understand the complexity of p110α signaling in macrophages.

      Regarding the decreased levels of p65 at 10’ and 30’ in RBD cells, we are still uncertain about the molecular mechanism leading to the observed decrease. No changes in p65 mRNA levels were observed after 30 minutes of LPS+IFNγ treatment, as shown in Author response image 1.

      Author response image 1.

      Preliminary data not shown here suggest that treating macrophages with BYL exhibits a similar effect, indicating a potential pathway for investigation. Considering that the decrease in protein levels is not due to lower mRNA expression, we may infer that post-translational mechanisms are leading to early protein degradation in RAS-p110α deficient macrophages. This could explain the observed decrease in protein activation. However, the specific molecular mechanism responsible for this degradation remains unclear, and further research is necessary to elucidate it. 

      (2) In Fig. 3, the authors used bone-marrow derived macrophages (BMDMs) instead of isolated monocytes to evaluate the ability of monocyte transendothelial migration, which is not sufficiently convincing. In Fig. 3B, the authors evaluated the migration in Pik3caWT/- BMDMs, and Pik3caWT/WT BMDMs treated with BYL-719. Given the dose effect of gene expression, the best control is Pik3caWT/- BMDMs treated with BYL-719.

      We thank the reviewer for this comment. We agree that using BMDMs might not be the most conventional approach for studying monocyte migration, but there were several reasons why we still considered them a valid method. While isolated monocytes are the initial cell type involved in transendothelial migration, bone marrow-derived macrophages (BMDMs) provide a relevant and practical model for studying this process. BMDMs are differentiated from the same bone marrow precursors as monocytes and retain the ability to respond to chemotactic signals, adhere to endothelial cells, and migrate through the endothelium. This makes them a suitable tool for examining the cellular and molecular mechanisms underlying monocyte migration and subsequent macrophage infiltration into tissues. Additionally, BMDMs offer experimental consistency and are easier to manipulate in vitro, enabling more controlled and reproducible studies.

      In response to the comment regarding Fig. 3B, we appreciate the suggestion to use Pik3ca WT/- BMDMs treated with BYL-719 as a control. However, our rationale for using Pik3ca WT/WT BMDMs treated with BYL-719 was based on a conceptual approach rather than a purely experimental control. The BYL-719 treatment in Pik3ca WT/WT cells was intended to simulate the inhibition of p110α in a fully functional, wild-type context. This allows us to directly assess the impact of p110α inhibition under normal physiological conditions, which is more representative of what would occur in an organism where the full dose of Pik3ca is present. Using Pik3ca WT/- BMDMs treated with BYL-719 as a control may not accurately reflect the in vivo scenario, where any therapeutic intervention would likely occur in the context of a fully functional, wild-type background. Our approach aims to provide a clearer understanding of how p110α inhibition affects cell functionality in a wild-type setting, which is relevant for potential therapeutic applications. Therefore, we considered the use of Pik3ca WT/WT BMDMs with BYL-719 treatment to be a more appropriate control for testing the effects of p110α inhibition in normal conditions.

      (3) In Fig. 4E-4G, the authors observed elevated levels of serine 3 phosphorylated Cofilin in Pik3caRBD/- BMDMs in both unstimulated and proinflammatory conditions. Since phosphorylation of Cofilin at Ser3 increases actin stabilization, why did disruption of RAS-p110α binding cause a decrease in the F-actin pool in unstimulated BMDMs?

      We thank the reviewer for this insightful comment. During the review process, we have carefully quantified all the Western blots conducted. While we did observe an increase in phospho-Cofilin (Ser3) levels in RBD BMDMs, this increase did not reach statistical significance. As a result, we cannot confidently attribute the observed increase in F-actin to this proposed mechanism. We apologize for any confusion this may have caused. Consequently, we have removed these data from Figure 4G and the associated discussion.

      Unfortunately, we have not yet identified the underlying mechanism responsible for this phenotype. Future experiments will focus on exploring potential alterations in other actin-nucleating, regulating, and stabilizing proteins that could account for the observed changes in F-actin levels.

      Reviewer #2 (Public Review): 

      Summary: 

      Cell intrinsic signaling pathways controlling the function of macrophages in inflammatory processes, including in response to infection, injury or in the resolution of inflammation are incompletely understood. In this study, Rosell et al. investigate the contribution of RAS-p110α signaling to macrophage activity. p110α is a ubiquitously expressed catalytic subunit of PI3K with previously described roles in multiple biological processes including in epithelial cell growth and survival, and carcinogenesis. While previous studies have already suggested a role for RAS-p110α signaling in macrophages function, the cell intrinsic impact of disrupting the interaction between RAS and p110α in this central myeloid cell subset is not known. 

      Strengths: 

      Exploiting a sound, previously described genetic mouse model that allows tamoxifen-inducible disruption of the RAS-p110α pathway and using different readouts of macrophage activity in vitro and in vivo, the authors provide data consistent with their conclusion that alteration in RAS-p110α signaling impairs the function of macrophages in a cell intrinsic manner. The study is well designed and clearly written, with overall high-quality figures.

      Weaknesses: 

      My main concern is that for many of the readouts, the difference between wild-type and mutant macrophages in vitro or between wild-type and Pik3caRBD mice in vivo is rather modest, even if statistically significant (e.g. Figure 1A, 1C, 2A, 2F, 3B, 4B, 4C). In other cases, such as for the analysis of the H&E images (Figure 1D-E, S1E), the images are not quantified, and it is hard to appreciate what the phenotype in samples from Pik3caRBD mice is or whether this is consistently observed across different animals. Also, the authors claim there is a 'notable decrease' in Akt activation but 'no discernible change' in ERK activation based on the western blot data presented in Figure 1A. I do not think the data shown support this conclusion.

      We appreciate the reviewer's careful examination of our data and their observation regarding the modest differences between wild-type and mutant macrophages in vitro, as well as between wild-type and Pik3caRBD mice in vivo. While the differences observed in Figures 1A, 1C, 2A, 2F, 3B, 4B, and 4C are statistically significant but modest, our data demonstrate that they are biologically relevant and should be interpreted within the specific nature of our model. Our study focuses on the disruption of the RAS-p110α interaction, but it should be noted that alternative pathways for p110α activation, independent of RAS, remain functional in this model. Additionally, the model retains the expression of other p110 isoforms, such as p110β, p110γ, and p110δ, which are known to have significant roles in immune responses. Given the overlapping functions of these p110 isoforms, and the fact that our model involves a subtle modification that specifically affects the RAS-p110α interaction without completely abrogating p110α activity, it is understandable that only modest effects are observed in some readouts. The redundancy and compensation by other p110 isoforms likely mitigate the impact of disrupting RAS-mediated p110α activation.

      However, despite these modest in vitro differences, it is crucial to highlight that the in vivo effects on inflammation are both clear and consistent. The persistence of inflammation in our model suggests that the RAS-p110α interaction plays a specific, non-redundant role in resolving inflammation, which cannot be fully compensated for by other signaling pathways or p110 isoforms. These findings underscore the importance of RAS-p110α signaling in immune homeostasis and suggest that even subtle disruptions in this pathway can lead to significant physiological consequences over time, particularly in the context of inflammation. The modest differences observed may represent early or subtle alterations that could lead to more pronounced phenotypes under specific stress or stimulation conditions. This could be tested across all the figures mentioned. For instance, in Fig. 1A, the Western blot for AKT has been quantified, demonstrating a significant decrease in AKT levels; in Fig. 1C, although the difference in paw inflammation was only a few millimeters in thickness, considering the size of a mouse paw, those millimeters were very noticeable by eye. Furthermore, pathological examination of the tissue consistently showed an increase in inflammation in RBD mice. In addition, the consistency of the observed differences across different readouts and experimental setups reinforces the reliability and robustness of our findings. Even modest changes that are consistently observed across different assays and conditions are indicative of genuine biological effects. The statistical significance of the differences indicates that they are unlikely to be due to random variation. This statistical rigor supports the conclusion that the observed effects, albeit modest, are real and warrant further exploration.

      Regarding the analysis of H&E images, we have now quantified the changes with the assistance of the pathologist, Mª Carmen García Macías, who has been added to the author list. We removed the colored arrows from the images and instead quantified fibrin and chromatin remnants as markers of inflammation staging. Loose chromatin, which increases as a consequence of cell death, is higher in the early phases of inflammation and decreases as macrophages phagocytose cell debris to initiate tissue healing. Chromatin content was scored on a scale from 1 to 3, where 1 represents the lowest amount and 3 the highest. The scoring was based on the area within the acute inflammatory abscess where chromatin could be found: 3 for less than 30%, 2 for 30-60%, and 1 for over 60%. Graphs corresponding to this quantification have now been added to Figure 1 and an explanation of the scale has been added to Materials and Methods.

      To further substantiate the extent of macrophage function alteration upon disruption of RAS-p110α signaling, the manuscript would benefit from testing macrophage activity in vitro and in vivo across other key macrophage activities such as bacteria phagocytosis, cytokine/chemokine production in response to titrating amounts of different PAMPs, inflammasome function, etc. This would be generally important overall but also useful to determine whether the defects in monocyte motility or macrophage lysosomal function are selectively controlled downstream of RAS-p110α signaling.  

      We thank Reviewer #2 for this comment. In order to better address the role of RAS-PI3K in macrophage function, we have performed additional experiments, some of which have been added to the revised version of the manuscript.

      (1) We have performed cytokine microarrays of RAS-p110α deficient macrophages, both unstimulated and stimulated with LPS + IFN-γ. Results have been added to the manuscript and to Supplementary Figure S1E and S1F. In brief, the data obtained suggest an impairment in recruitment factors, as well as in inflammatory regulators, after disruption of RAS-p110α signaling in macrophages, which aligns with the phenotype observed in vivo.

      (2) We also conducted phagocytosis assays to analyze the ability of RAS-p110α deficient macrophages to phagocytose 1 µm Sepharose beads, Borrelia burgdorferi, and apoptotic cells. The data reveal varied behavior of RAS-p110α deficient bone marrow-derived macrophages (BMDMs) depending on the target: 

      • Engulfment of Non-biological Particles: RAS-p110α deficient macrophages showed a decreased ability to engulf 1 µm Sepharose beads. This suggests that RAS-p110α signaling is important for the effective phagocytosis of non-biological particles. These findings have now been added to the text, and the corresponding figures have been added to Supplementary Fig. S4A.

      • Response to Bacterial Pathogens: When exposed to Borrelia burgdorferi, RAS-p110α deficient macrophages did not exhibit a change in bacterial uptake. This indicates that RAS-p110α may not play a critical role in the initial phagocytosis of this bacterial pathogen. The observed increase in the phagocytic index, although not statistically significant, might imply a compensatory mechanism or a more complex interaction that warrants further investigation. These findings have now been added to the text, and the corresponding figures have been added to Supplementary Fig. S4B. These experiments were performed in collaboration with Dr. Anguita, from CIC bioGUNE (Bilbao, Spain) and, as a consequence, he has been added as an author of the paper.

      • Phagocytosis of Apoptotic Cells: There were no differences in the phagocytosis rate of apoptotic cells between RAS-p110α deficient and control macrophages at early time points. However, the accumulation of engulfed material at later time points suggests a possible delay in the processing and degradation of apoptotic cells in the absence of RAS-p110α signaling.

      These findings highlight the complexity of RAS-p110α's involvement in phagocytic processes and suggest that its role may vary with different types of phagocytic targets. 

      Furthermore, given the key role of other myeloid cells besides macrophages in inflammation and immunity it remains unclear whether the phenotype observed in vivo can be attributed to impaired macrophage function. Is the function of neutrophils, dendritic cells or other key innate immune cells not affected? 

      Thank you for this insightful comment. We understand the key role of other myeloid cells in inflammation and immunity. However, our study specifically focuses on the role of macrophages. Our data show that disruption of RAS-PI3K leads to a clear defect in macrophage extravasation, and our in vitro data demonstrate issues in macrophage cytoskeleton and phagocytosis, aligning with the in vivo phenotype.

      Experiments investigating the role of RAS-PI3K in neutrophils, dendritic cells, or other innate immune cells are beyond the scope of this study. Understanding these interactions would indeed require separate, comprehensive studies and the generation of new mouse models to disrupt RAS-PI3K exclusively in specific cell types.

      Furthermore, during paw inflammation experiments, polymorphonuclear cells were present from the initial phases of the inflammatory response. What caught our attention was the prolonged presence of these cells. In conversation with our in-house pathologist, she mentioned the lack of macrophages to remove dead polymorphonuclear cells in our RAS-PI3K mutant mice. Specific staining for macrophages confirmed the absence of macrophages in the inflamed node of mutant mice.

      We acknowledge that further research is necessary to elucidate the effects on other myeloid cells. However, our current findings provide clear evidence of a decrease in inflammatory monocytes and defective macrophage responses to inflammation, both in vivo and in vitro. We believe these results significantly contribute to understanding the role of RAS-PI3K in macrophage function during inflammation.

      Compelling proof of concept data that targeting RAS-p110α signalling constitutes indeed a putative approach for modulation of chronic inflammation is lacking. Addressing this further would increase the conceptual advance of the manuscript and provide extra support to the authors' suggestion that p110α inhibition or activation constitute promising approaches to manage inflammation. 

      We thank Reviewer #2 for this insightful comment. In our manuscript, we have demonstrated through multiple experiments that the inhibition of p110α, either by disrupting RAS-p110α signaling or through the use of Alpelisib (BYL-719), has a modulatory effect on inflammatory responses. However, we acknowledge that we have not activated the pathway due to the unavailability of a suitable p110α activator until the concluding phase of our study.

      We recognize the importance of this point and are eager to investigate both the inhibition and activation of p110α as potential approaches to managing inflammation in well-established inflammatory disease models. We believe that such comprehensive studies would significantly enhance the conceptual advance and translational relevance of our findings.

      However, it is essential to note that the primary aim of our current work was to demonstrate the role of RAS-p110α in the inflammatory responses of macrophages. We have successfully shown that RAS-p110α influences macrophage behavior and inflammatory signaling. Expanding the scope to include disease models and pathway activation studies would be an extensive project that goes beyond the current objectives of this manuscript. While our present study establishes the foundational role of RAS-p110α in macrophage-mediated inflammatory responses, we agree that further investigation into both p110α inhibition and activation in disease models is crucial. We are keen to pursue this line of research in future studies, which we believe will provide robust evidence supporting the therapeutic potential of targeting RAS-p110α signaling in chronic inflammation.

      Finally, the analysis by FACS should also include information about the total number of cells, not just the percentage, which is affected by the relative change in other populations. On this point, Figure S2B shows a substantial, albeit not significant (with less number of mice analysed), increase in the percentage of CD3+ cells. Is there an increase in the absolute number of T cells or does this apparent relative increase reflect a reduction in myeloid cells? 

      We thank the reviewer for this comment, which we have addressed in the revised version of the manuscript. Regarding the total number of cells analyzed, we have added to the Materials and Methods section that in all our studies, a total of 50,000 cells were analyzed (line 749). The percentages of cells are related to these 50,000 events. Additionally, we have increased the number of mice analyzed by including new mice for CD3+ cell analysis. Despite this, the results remain not significant.
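
      The reviewer's distinction between percentages and absolute numbers can be illustrated with a short sketch. Only the 50,000 acquired events come from the response; the total cell numbers per sample and the population names other than CD3+ are hypothetical values chosen for illustration, not data from the study. The sketch shows how the CD3+ percentage can rise when myeloid cells decrease, even though the absolute number of T cells is unchanged.

```python
# Illustrative only: every count below is hypothetical, except the
# 50,000 acquired events stated in the authors' response.
ACQUIRED_EVENTS = 50_000


def events_in_gate(percent_of_events: float) -> float:
    """Number of acquired events falling in a gate, given its percentage."""
    return percent_of_events / 100.0 * ACQUIRED_EVENTS


def percent_of_total(counts: dict) -> dict:
    """Express each population as a percentage of all cells in the sample."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical absolute cell numbers per sample.
control = {"CD3+": 2.0e6, "myeloid": 6.0e6, "other": 2.0e6}
# Same absolute number of T cells, but half as many myeloid cells.
mutant = {"CD3+": 2.0e6, "myeloid": 3.0e6, "other": 2.0e6}

control_pct = percent_of_total(control)
mutant_pct = percent_of_total(mutant)

for name in control:
    print(f"{name}: control {control_pct[name]:.1f}%, mutant {mutant_pct[name]:.1f}%")
```

      In this hypothetical case the CD3+ percentage increases solely because the denominator shrinks, which is why absolute counts per sample are needed alongside percentages of a fixed number of events.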

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):   

      (1) It is recommended to provide a graphical abstract to summarize the multiple functions of RAS-p110α pathway in monocyte/macrophages that the authors proposed 

      We thank the reviewer for this useful recommendation. A graphical abstract has now been added to the study.

      (2) Western blots in this paper need quantification and a measure of reproducibility 

      We have now added a graph with the quantification of the western blots performed in this work as a measure of reproducibility. 

      (3) Representative flow data and gating strategy should be included

      We have now added a description of the gating strategy to the Materials and Methods section.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      Tian et al. describe how TIPE regulates melanoma progression, stemness, and glycolysis. The authors link high TIPE expression to increased melanoma cell proliferation and tumor growth. TIPE causes dimerization of PKM2, as well as translocation of PKM2 to the nucleus, thereby activating HIF-1alpha. TIPE promotes the phosphorylation of S37 on PKM2 in an ERK-dependent manner. TIPE is shown to increase stem-like phenotype markers. The expression of TIPE is positively correlated with the levels of PKM2 Ser37 phosphorylation in murine and clinical tissue samples. Taken together, the authors demonstrate how TIPE impacts melanoma progression, stemness, and glycolysis through dimeric PKM2 and HIF-1alpha crosstalk.

      Strengths:

      The authors manipulated TIPE expression using both shRNA and overexpression approaches throughout the manuscript. Using these models, they provide strong evidence of the involvement of TIPE in mediating PKM2 Ser37 phosphorylation and dimerization. The authors also used mutants of PKM2 at S37A to block its interaction with TIPE and HIF-1alpha. In addition, an ERK inhibitor (U0126) was used to block the phosphorylation of Ser37 on PKM2. The authors show how dimerization of PKM2 by TIPE causes nuclear import of PKM2 and activation of HIF-1alpha and target genes. Pyridoxine was used to induce PKM2 dimer formation, while TEPP-46 was used to suppress PKM2 dimer formation. TIPE maintains stem cell phenotypes by increasing the expression of stem-like markers. Furthermore, the relationship between TIPE and Ser37 PKM2 was demonstrated in murine and clinical tissue samples.

      Weaknesses:

      The evaluation of how TIPE causes metabolic reprogramming can be better assessed using isotope tracing experiments and improved bioenergetic analysis.

      Thank you immensely for your invaluable suggestions. Regrettably, we encountered a significant obstacle in completing the isotope tracing experiments due to an unfortunate shortage of necessary instruments. Furthermore, despite our efforts to consult with several companies, we were unable to secure their assistance, which unfortunately hindered the completion of these experiments. We deeply apologize for this imperfection in our experimental design and have thoroughly discussed this limitation in our manuscript.

      Additionally, we acknowledge our oversight in the previous versions of our manuscripts, where only three metabolites were presented. To rectify this and provide a more comprehensive understanding of the metabolic reprogramming induced by TIPE, we have conducted routine untargeted metabolomics analysis. We are pleased to announce that we have incorporated the detailed results of this analysis into our work as a new supplementary figure, designated as Figure S3. This figure specifically highlights the notable decrease in the glycolysis pathway, particularly in pyruvate and lactic acid levels, following TIPE interference.

      Reviewer #2 (Public Review):

      In this article, Tian et al present a convincing analysis of the molecular mechanisms underpinning TIPE-mediated regulation of glycolysis and tumor growth in melanoma. The authors begin by confirming TIPE expression in melanoma cell lines and identify "high" and "low" expressing models for functional analysis. They show that TIPE depletion slows tumour growth in vivo, and using both knockdown and over-expression approaches, show that this is associated with changes in glycolysis in vitro. Compelling data using multiple independent approaches is presented to support an interaction between TIPE and the glycolysis regulator PKM2, and the over-expression of TIPE-promoted nuclear translocation of PKM2 dimers. Mechanistically, the authors also demonstrate that PKM2 is required for TIPE-mediated activation of HIF1a transcriptional activity, as assessed using an HRE-promoter reporter assay, and that TIPE-mediated PKM2 dimerization is p-ERK dependent. Finally, the dependence of TIPE activity on PKM2 dimerization was demonstrated on tumor growth in vivo and in the regulation of glycolysis in vitro, and ectopic expression of HIF1a could rescue the inhibition of PKM2 dimerization in TIPE overexpressing cells and reduced induction of general cancer stem cell markers, showing a clear role for HIF1a in this pathway. The main conclusions of this paper are well supported by data, but some aspects of the experiments need clarification and some data panels are difficult to read and interpret as currently presented.

      The detailed mechanistic analysis of TIPE-mediated regulation of PKM2 to control aerobic glycolysis and tumor growth is a major strength of the study and provides new insights into the molecular mechanisms that underpin the Warburg effect in cancer cells. However, despite these strengths, some weaknesses were noted, which if addressed will further strengthen the study.

      (1) The analysis of patient samples should be expanded to more directly measure the relationship between TIPE levels and melanoma patient outcome and progression (primary vs metastasis), to build on the association between TIPE levels and proliferation (Ki67) and hypoxia gene sets that are currently shown.

      Thanks for your suggestions. We have expanded the analysis to include the relationship between TIPE levels and melanoma progression, specifically distinguishing between non-lymph node metastasis and lymph node metastasis. In addition, we added the association between TIPE and Ki67 or LDH levels as you advised, as shown in Figure 7.

      However, the relationship between TIPE levels and melanoma patient outcome is not presented in this article. One reason is that the tissue microarray lacks survival data. Interestingly, the TCGA dataset showed that higher TIPE expression is associated with a favorable prognosis in melanoma, which we also find intriguing. Our follow-up study indicated that TIPE might serve as a positive regulator of PD-L1. Higher TIPE expression may therefore confer greater sensitivity to immunotherapy, resulting in a favorable prognosis in melanoma. The detailed mechanisms will be discussed in a subsequent article, and we hope this will become a continuing research topic for TIPE in melanoma.

      Here we disclose only that TIPE shares a similar survival and immune signature with PD-L1 and PD-1 in melanoma, as follows:

      Author response image 1.

      (2) The duration of the in vivo experiments was not clearly defined in the figures, however, it was clear from the tumor volume measurements that they ended well before standard ethical endpoints in some of the experiments. A rationale for this should be provided because longer-duration experiments might significantly change the interpretation of the data. For example, does TIPE depletion transiently reduce or lead to sustained reductions in tumor growth?

      Thanks for your suggestions. We performed a pilot experiment before the formal experiments, and all time points were chosen based on it. Furthermore, we have added the detailed time points to the figure legends as you suggested.

      (3) The analysis of general cancer stem cell markers is solid and interesting, however inclusion of neural crest stem cell markers that are more relevant to melanoma biology would greatly strengthen this aspect of the study.

      Thanks for your advice. We selected two neural crest stem cell markers, Nestin and Sox10, and tested their expression after overexpression of TIPE in G361 cells or TIPE interference in A375 cells.

      (4) The authors should take care that all data panels are clearly readable in the figures to facilitate appropriate interpretation by the reader.

      Thanks for your suggestions. We have amended the data panels according to your advice to ensure they are clear and professionally presented.

      Reviewer #1 (Recommendations for the authors):

      It would be suggested to improve the image quality of certain panels (please refer to Fig.1A and Fig.S3B-D).

      Thank you for your expert advice. We have optimized the quality of certain panels according to your suggestions.

      Reviewer #2 (Recommendations for the authors):

      Major comments:

      - TCGA survival/patient outcome data relative to TIPE levels should be provided in the supplementary figures, together with TIPE correlation with PKM2.

      - Suggest revising how this point is described in the discussion.

      We have added the results on TIPE expression and prognosis of melanoma patients from the TCGA database, as requested by the reviewer, and discussed them appropriately in the article. In addition, the correlation between TIPE and PKM2 expression has already been described in Supplementary Figure 6.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The work by Joseph et al "Impact of the clinically approved BTK inhibitors on the conformation of full-length BTK and analysis of the development of BTK resistance mutations in chronic lymphocytic leukemia" seeks to comparatively analyze the effect of a range of covalent and noncovalent clinical BTK inhibitors upon BTK conformation. The novel aspect of this manuscript is that it seeks to evaluate the differential resistance mutations that arise distinctly from each of the inhibitors.

      Strengths:

      This is an exciting study that builds upon the fundamental notion of ensemble behavior in solutions for enzymes such as BTK. The HDX-MS and NMR experiments are adequately and comprehensively presented.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      While I commend the novelty of the study, the absence of important controls greatly tempers my enthusiasm for this work. As stated in the abstract, there are no broad takeaways for how resistance mutation bias operated from this study, although the mechanism of action of 2 common resistance mutations is useful. How these 2 resistance mutations connect to ensemble behavior, is not obvious. This is partly because BTK does not populate just binary "open"/"closed" conformations, but there are likely multiple intermediate conformations. Each inhibitor appears to preferentially "select" conformations by the authors' own assessment (line 236) and this carries implications for the emergence of resistance mutations. The most important control that would help is to use ADP or nonhydrolyzable and ATP as a baseline to establish the "inactive" and "active" conformations. All of the HDX-MS and NMR studies use protein that has no nucleotide present. A major question that remains is whether each of the inhibitors preferentially favors/blocks ADP or ATP binding. This then means it is not equivalent to correlate functional kinase assay conditions with either HDX-MS or NMR experiments.

      We thank the reviewer for raising this point. The BTK inhibitors studied here are active site inhibitors that completely prevent (block) nucleotide (both ATP and ADP) binding. We believe the other question being asked here is whether the different BTK inhibitors bind preferentially to the ADP or ATP bound kinase (do the conformational states favored by ADP versus ATP bound BTK affect drug binding). We agree this is an interesting question that deserves further study. Here we are focused on the ligand bound state itself rather than on the conformational state selection mechanism of each inhibitor. Thus, HDX-MS and NMR work to compare ligand bound to apo-, ADP, and ATP bound BTK is beyond the scope of this manuscript. That said, previous work (doi: 10.1038/s41598-017-17703-5) has shown that the related TEC kinase, ITK, preferentially binds ADP when the kinase is in the autoinhibited conformation. Since we have previously shown that BTK adopts the autoinhibited conformation in the nucleotide free form (https://doi.org/10.7554/eLife.89489.2), we suggest that the comparison we have carried out here between drug bound and apo-protein is valid. Future work will carefully address the conformational preferences of all three conditions, apo-, ADP- and ATP-bound.

      Reviewer #2 (Public Review):

      Summary:

      Previous NMR and HDX-MS studies on full-length (FL) BTK showed that the covalent BTKi, ibrutinib, causes long-range effects on the conformation of BTK consistent with disruption of the autoinhibited conformation, based on HDX deuterium uptake patterns and NMR chemical shift perturbations. This study extends the analyses to four new covalent BTKi, acalabrutinib, zanubrutinib, tirabrutinib/ONO4059, and a noncovalent ATP competitive BTKi, pirtobrutinib/LOXO405.

      The results show distinct conformational changes that occur upon binding each BTKi. The findings show consistent NMR and HDX changes with covalent inhibitors, which move helix aC to an 'out' position and disrupt SH3-kinase interactions, in agreement with X-ray structures of the BTKi complexed with the BTK kinase domain. In contrast, the solution measurements show that pirtobrutinib maintains and even stabilizes the helix aC-in and autoinhibited conformation, even though the BTK:pirtobrutinib crystallizes with helix aC-out. This and unexpected variations in NMR and HDX behavior between inhibitors highlight the need for solution measurements to understand drug interactions with the full-length BTK. Overall the findings present good evidence for allosteric effects by each BTKi that induce distal conformational changes which are sensitive to differences in inhibitor structure.

      The study goes on to examine BTK mutants T474I and L528W, which are known to confer resistance to pirtobrutinib, zanubrutinib, and tirabrutinib. T474I reduces and L528W eliminates BTK autophosphorylation at pY551, while both FL-BTK-WT and FL-BTK-L528W increase HCK autophosphorylation and PLCg phosphorylation. These show that mutants partially or completely inactivate BTK and that inactive FL-BTK can activate HCK, potentially by direct BTK-HCK interactions. But they do not explain drug resistance. However, HDX and NMR show that each mutant alters the effects of BTKi binding compared to WT. In particular, T474I alters the effects of all three inhibitors around W395 and the activation loop, while L528W alters interactions around W395 with tirabrutinib and pirtobrutinib, and does not appear to bind zanubrutinib at all. The study concludes that the mutations might block drug efficacy by reducing affinity or altering binding mode.

      Strengths:

      The work presents convincing evidence that BTK inhibitors alter the conformation of regions distal to their binding sites, including those involved in the SH3-kinase interface, the activation loop, and a substrate binding surface between helix aF and helix aG. The findings add to the growing understanding of allosteric effects of kinase inhibitors, and their potential regulation of interactions between kinase and binding proteins.

      We thank the reviewer for these positive comments.

      Weaknesses:

      The interpretation of HDX, NMR, and kinase assays is confusing in some places, due to ambiguity in quantifying how much kinase is bound to the inhibitor. It would be helpful to confirm binding occupancy, in order to clarify if mutants lower the amount of BTK complexed with BTKi as implied in certain places, or if they instead alter the binding mode. In addition, the interpretation of the mutant effects might benefit from a more detailed examination of how each inhibitor occupies the ATP pocket and how substitutions of T474 and L528 with Ile and Trp respectively might change the contacts with each inhibitor.

      We thank the reviewer for these suggestions. As requested we have now modified the manuscript to clearly state the effects of the mutations on inhibitor binding. Additionally, we have included a new figure to discuss the interaction of the inhibitors within the BTK kinase active site to provide a better explanation for the impact of the resistance mutations.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major Comments:

      (1) What is the binding affinity of ATP/ADP to BTK? BTK is purified by the authors as an apoenzyme (by the final purification by SEC, all protein should be completely stripped of nucleotide)- but must toggle between ATP and ADP-bound states. Do the inhibitors completely sterically block nucleotide binding? Do they only block one or the other- ADP/ATP binding? Do they weaken ADP/ATP binding? The authors have an opportunity with NMR to establish a clear baseline to compare the inhibitors' effects on BTK. It is not clear if the authors' assumption is that all BTKi share a common mode of action (Line 114).

      All BTK inhibitors studied in this work (Ibrutinib, Acalabrutinib, Zanubrutinib, Tirabrutinib and Pirtobrutinib) share a common mode of action. They are active site inhibitors that completely block nucleotide (ATP and ADP) binding. The introduction to the manuscript has been updated to add this information (lines 70-71, pg. 4).

      "The covalent BTK inhibitors (Ibrutinib, Acalabrutinib, Zanubrutinib and Tirabrutinib) and the non-covalent BTK inhibitor Pirtobrutinib bind tightly to the BTK active site (Kinact/KI or KD values in the nM range; DOI: 10.1056/NEJMoa2114110). In contrast, previous studies have reported nucleotide affinities for TEC kinases that are lower (KD in the µM range; doi: 10.1038/s41598-017-17703-5). Additionally, the same work has shown that the conformational state of TEC kinases can impact nucleotide binding. The TEC kinases have a higher affinity for ADP (KD ~ 20 µM) than for ATP (affinity ~ 15 fold lower than for ADP) when the full-length protein adopts the autoinhibited conformation. Disruption of the TEC kinase autoinhibited conformation (by mutation) decreases the affinity for ADP, allowing ATP to bind, enabling kinase activity. Nevertheless, regardless of the conformational state of BTK, all the BTK inhibitors studied here block both ADP and ATP binding to the active site."

      (2) Is there an effect of nucleotide binding bias on resistance mutation emergence? Is there a nucleotide binding bias in the resistance mutations characterized in this study? There likely is - BTK L528W is catalytically inactive. It is not clear if this mutant stays bound to ADP or to ATP and cannot transfer the phosphate to its substrate. How does BTK T474I interact with ADP/ATP? This is needed before concluding - in lines 289-291- that mutations cause only minor conformational changes. This needs a qualifier - in the nucleotide-free apo conformation.

      The BTK L528W mutation introduces a bulky sidechain into the BTK kinase active site that sterically impedes both ATP and ADP binding. In fact, previous studies (https://doi.org/10.1016/j.jbc.2022.102555) have confirmed the inability of the BTK L528W mutant to bind ATP.

      The BTK T474I mutation could alter nucleotide binding. However, the BTK T474I mutation lowers the overall activity of BTK, consistent with previous work (https://doi.org/10.1021/acschembio.6b00480). The decrease in overall kinase activity cannot account for the development of resistance (which typically requires increased kinase activity). Hence, a decrease in inhibitor binding is likely driving resistance.

      Line 293 (pg. 14) has been modified to indicate that the conformational changes observed in the BTK mutants are in the absence of nucleotide, as requested.

      (3) What is the half-life of BTK? And does inhibitor binding to BTK change the half-life of the inhibitor?

      BTK has a long half-life of 48-72 h (DOI: https://doi.org/10.1124/jpet.113.203489). Unbound covalent inhibitors are rapidly cleared from the body, with short half-lives on the order of < 4 h. Non-covalent BTK inhibitors typically have a longer half-life, on the order of 20 h. Once bound to BTK, the irreversible nature of binding by covalent inhibitors makes them unavailable to other molecules. CLL patients are typically treated with a once-daily or twice-daily dose of BTK inhibitor. Hence, inhibitor binding to BTK does not alter the half-life of free inhibitor.
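
      The half-life comparison above can be made concrete with simple first-order decay. This is a minimal sketch that assumes single-phase exponential clearance, which is a simplification of real pharmacokinetics, and uses the half-lives quoted in the response as round numbers: 4 h as an upper bound for free covalent inhibitor, 20 h for a non-covalent inhibitor, and 48 h as the lower end of the BTK half-life range. The 24 h interval corresponds to once-daily dosing.

```python
def fraction_remaining(elapsed_h: float, half_life_h: float) -> float:
    """Fraction of a species left after elapsed_h hours of first-order decay."""
    return 0.5 ** (elapsed_h / half_life_h)


DOSING_INTERVAL_H = 24.0  # once-daily dosing

# Half-lives taken as round numbers from the response (assumed single-phase).
half_lives_h = {
    "free covalent inhibitor (upper bound)": 4.0,
    "free non-covalent inhibitor": 20.0,
    "BTK protein (lower end of range)": 48.0,
}

for species, t_half in half_lives_h.items():
    left = fraction_remaining(DOSING_INTERVAL_H, t_half)
    print(f"{species}: {left:.3f} of the starting amount remains after 24 h")
```

      Under these assumptions, less than 2% of free covalent inhibitor remains at the end of a dosing interval, while most of the BTK present at the time of dosing is still there, which is the contrast the response relies on.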

      (4) Are there broad differences between covalent and single non-covalent inhibitors upon resistance mutation bias? And nucleotide binding?

      The biggest difference observed between BTK covalent and non-covalent inhibitors in the emergence of resistance mutations is the occurrence of the C481S mutation in patients treated with covalent inhibitors. This resistance mutation is absent in patients treated with non-covalent BTK inhibitors. Patients that develop mutations in BTK C481 can no longer be treated with any of the approved covalent BTK inhibitors (as they all use BTK C481 for covalent linkage). To ensure BTK inhibition, patients with mutations in C481 can be treated with non-covalent BTK active site inhibitors. All currently approved BTK inhibitors (covalent and non-covalent) are active site inhibitors that compete with nucleotide binding.

      (5) It's unclear why the authors chose to evaluate the impact of inhibitor binding on the linker kinase domain first. This seems unnecessary.

      NMR analysis is easier on the smaller BTK linker kinase domain (LKD) fragment compared to the full-length protein. Hence for practical reasons we used the BTK LKD fragment.

      (6) Line 508 - there seems to be a gap in understanding protein half-lives, inhibitor half-lives, and the emergence of resistance mutations in this manuscript itself. The manuscript falls short of a mechanistic descriptor of variable inhibitors and resistance mutation bias.

      The half-lives of the inhibitors assessed in this study are provided in Table 1 of this manuscript. The emergence of resistance mutations such as C481 is likely a direct consequence of differences in inhibitor half-life, as described in the discussion section of this manuscript (page 23).

      (7) HDX-MS reports the conformational average difference across the ensemble but does not distinguish between the number of intermediary conformations. The authors should clarify that this is a limitation of an average readout method such as HDX-MS. This is currently not addressed.

      A sentence describing this limitation has been added (lines 219-221, pg. 11) as requested.

      Minor Points:

      (1) Some of the qualitative descriptors are unnecessary - line 284 - "Slightly towards....". Line 286 - "Slight stabilizing effect on the conformation..." How slight is slight?

      Qualitative descriptors have been removed from the manuscript as requested.

      (2) The authors should provide SPR data with Kon and Koff values for Pirtobrutinib binding to BTK (in the presence of ATP and ADP).

      SPR analysis of Pirtobrutinib has previously been reported. Pirtobrutinib binds to BTK wild-type with a KD of 0.9 nM (DOI: 10.1056/NEJMoa2114110). As mentioned earlier in response to comment 1, Pirtobrutinib binds to the BTK kinase active site and is competitive with both nucleotides (ATP and ADP, which bind with lower affinity, KD in the µM range).

      (3) In Figure 2, the legend needs to describe the specific time point represented. Same with Figure 5.

      The HDX-MS changes that are mapped onto the structure represent the maximal changes observed at any time point. The figure legends have been modified as requested to clarify this.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 7 is an amazing and impressive finding, but it could use two controls: First a blot of pY551 to show more rigorously that FL-BTK-WT and L528W autophosphorylation is unaffected by zanubrutinib binding, just to eliminate the possibility that elevated pY551 accounts for the enhanced HCK phosphorylation.

      Both BTK FL enzymes (WT and L528W) in this assay are catalytically inactive and do not contribute to autophosphorylation on BTK Y551 (BTK FL WT is inhibited by Zanubrutinib and BTK FL L528W is catalytically dead). Additionally, BTK FL WT and BTK FL L528W are both able to activate HCK. Hence differences in pY551 levels between these BTK proteins cannot explain how both proteins are able to activate HCK.

      Nevertheless, as requested, we probed for pY551 levels on BTK. While BTK cannot autophosphorylate itself on BTK Y551 in this assay, BTK Y551 is able to be phosphorylated by HCK. BTK Y551 phosphorylation levels were higher in BTK FL WT compared to BTK FL L528W likely due to Y551 on the activation loop being less accessible in the BTK L528W mutant (which is more stabilized in the autoinhibited conformation) compared to the WT protein. This data has been added as a new panel in Figure 7a.

      Additionally, we tested the ability of the BTK FL L528W/Y551F double mutant to activate HCK. The BTK FL L528W/Y551F double mutant is able to activate HCK similar to BTK FL L528W single mutant, demonstrating that phosphorylation on Y551 is not necessary for HCK activation by BTK FL L528W. This new data has been added as supplemental figure S2a. Taken together, pY551 levels on BTK do not contribute to enhanced HCK phosphorylation. The results section of the manuscript has been modified to include this additional data (Lines 319-335, pg. 15-16).

      Second, controls performed in the absence of Zanubrutinib are needed for the time courses with HCK alone, HCK + FL-BTK WT, and HCK + FL-BTK-L528W. This would help show that the ability of BTK to increase the phosphorylation of HCK and PLCg1 is (or isn't) dependent on drug interactions with BTK, HCK, or PLCg.

      BTK FL L528W can enhance phosphorylation on PLCg by HCK even in the absence of Zanubrutinib. We have added this data as a new supplemental figure S2b. We have not included BTK FL WT in this analysis as in the absence of Zanubrutinib, we would have two active enzymes (HCK and BTK) in the assay which would complicate the interpretation of the data. The results section of the manuscript has been modified to include this additional data (Lines 333-335, pg. 16).

      And please comment: in cells, does zanubrutinib treatment (or any other drug) increase pY phosphorylation of HCK or PLCg?

      All clinically approved BTK inhibitors (covalent and non-covalent) inhibit BTK WT activity and decrease PLCg phosphorylation in cells. There have been no reports, to our knowledge, of any clinically approved BTK inhibitor causing an increase in HCK activity.

      (2) Sections of the Results discussing Figures 8 and 9 are confusing to read because they variously propose that the mutants (i) reduce inhibitor occupancy, or (ii) alter the inhibitor binding mode. However, some of the results unambiguously show an altered binding mode instead of reduced inhibitor binding.

      a) For example, HDX clearly shows protection by tira, zanu, and pirto, therefore reduced inhibitor binding does not seem to be an option. Therefore, I recommend modifying lines 357-363. "The differences in deuterium exchange for drug binding to WT and mutant BTK suggest that the T474I mutation either causes a reduction in inhibitor binding or otherwise alters the mode of drug interaction in the active site. "

      While the HDX-MS data of BTK T474I shows protection by Tirabrutinib, Zanubrutinib and Pirtobrutinib, the magnitude of the protection is reduced in the BTK T474I mutant compared to WT BTK (Fig. 8e) suggesting a reduction in inhibitor binding. These results are consistent with previous SPR analysis of the BTK T474I mutant which also showed reduced binding to Zanubrutinib, Acalabrutinib and Pirtobrutinib (DOI: 10.1056/NEJMoa2114110). The manuscript (lines 381-383, pg. 18) has been modified to clearly state that the BTK T474I mutation causes a reduction in inhibitor binding.

      b) I recommend modifying lines 370-373.

      "In stark contrast to the BTK T474I mutant, the BTK L528W mutant does not show any change in deuterium incorporation in the presence of Zanubrutinib, Tirabrutinib or Pirtobrutinib, providing strong evidence that the BTK L528W mutant does not bind the inhibitors (Fig.8d)."

      Lines 432-435: "Although the L528W mutation alters binding to both Tirabrutinib and Pirtobrutinib, the NMR data suggests that it retains partial binding unlike the HDX-MS data that suggests complete disruption of binding. The higher inhibitor concentrations used in the NMR experiments compared to the HDX-MS experiments likely explain this discrepancy."

      The discordance in the L528W mutant between the lack of any HDX protection by tira and pirto versus the clear chemical shift of W395 by NMR is worrisome. If the HDX experiments were really done under conditions where binding occupancy was too low, then it seems important to redo these experiments at higher drug concentrations.

      Alternatively, and perhaps more useful would be to report Kd for binding of these inhibitors to the two mutants. That would allow the authors to interpret these results more definitively.

      SPR analysis of inhibitor binding to full-length BTK WT, T474I and L528W has been previously reported (DOI: 10.1056/NEJMoa2114110). The covalent BTK inhibitors (Ibrutinib, Acalabrutinib, and Zanubrutinib) and the non-covalent BTK inhibitor Pirtobrutinib bind tightly to full-length WT BTK (Kinact/KI or KD values in the nM range). The BTK T474I mutation disrupts binding to Zanubrutinib, Acalabrutinib and Pirtobrutinib, but not Ibrutinib and Fenebrutinib. BTK L528W mutation disrupts binding to Zanubrutinib, Acalabrutinib, Ibrutinib and Pirtobrutinib, but not Fenebrutinib. These previously published results are consistent with the HDX-MS and NMR data presented here. The manuscript has been modified to clearly state that the mutations reduce drug binding instead of altered binding.

      c) Recommend adding data to confirm statements in lines 419-421:

      "Spectral overlays of the BTK L528W mutant with and without Zanubrutinib show no chemical shift changes (Fig. 9a, right panel) suggesting that the mutation completely disrupts inhibitor binding in complete agreement with the HDX-MS data (Fig. 8d).

      428-432: The Pirtobrutinib-bound BTK L528W spectrum (Fig. 9c) shows two resonance positions, one of which overlaps with the W395 resonance in the apo protein and the other that corresponds to that of the mutant protein bound to Pirtobrutinib. This data suggests a mixture of inhibitor bound and unbound BTK kinase domain in solution, likely due to a reduction in Pirtobrutinib affinity caused by the L528W mutation."

      Likewise, direct measurements of binding affinity to L528W would be helpful. It is not completely convincing that the effects of this mutant are due to the reduced binding of either inhibitor. The effects of pirtobrutinib may instead reflect a slow exchange of W395 instead of 50% occupancy. For example, what happened in the rest of the spectra? Were other chemical shifts apparent in either case, which might address binding stoichiometry? It would be useful to show the full spectra in Supplemental figures, as well as any titrations that may have been done to confirm that the inhibitors are added at saturating concentration.

      As requested, the full spectrum of Pirtobrutinib bound to BTK L528W has now been added as Supplemental Figure S1c. In the spectrum of BTK L528W bound to Pirtobrutinib, two cross peaks are visible for multiple resonances, one of which overlaps with that of the apo BTK L528W spectrum, suggesting that there is a mixture of apo and inhibitor-bound forms of BTK L528W.

      The clinically approved inhibitors that we are working with here (Ibrutinib, Acalabrutinib, Zanubrutinib, Tirabrutinib and Pirtobrutinib) have reported IC50 values in the nM range (0.5 nM, 3 nM, 0.3 nM, 6.8 nM and 3.68 nM, respectively). All the NMR work presented here was carried out at a 1:1.33 protein:inhibitor ratio (the absolute concentration of the inhibitor was 200 µM). NMR titrations of BTK WT have been carried out with Ibrutinib (https://doi.org/10.7554/eLife.60470) and Tirabrutinib. Complete binding is observed at a 1:1 molar ratio of protein:inhibitor, consistent with the previously reported binding characteristics. Mass spec analysis also shows one covalent inhibitor bound to each BTK WT protein (Fig. 4a). The BTK T474I and L528W mutants were tested at the same protein:inhibitor ratio as WT BTK for ease of comparison.
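
      To make the occupancy question concrete, the following is a minimal sketch, not part of the authors' analysis. It computes the protein concentration implied by the stated 1:1.33 ratio at 200 µM inhibitor, and the fraction of protein expected to be bound under a simple reversible 1:1 binding model with ligand depletion. The function names and the Kd values are hypothetical placeholders chosen for illustration, and the model is not appropriate for covalent inhibitors.

```python
import math


def protein_conc_from_ratio(inhibitor_um: float, inhibitor_per_protein: float) -> float:
    """Protein concentration (µM) implied by a 1:x protein:inhibitor ratio."""
    return inhibitor_um / inhibitor_per_protein


def fraction_bound(protein_um: float, ligand_um: float, kd_um: float) -> float:
    """Fraction of protein in complex for reversible 1:1 binding.

    Uses the exact quadratic solution, which accounts for ligand depletion
    (necessary here because protein and ligand concentrations are similar).
    """
    b = protein_um + ligand_um + kd_um
    complex_um = (b - math.sqrt(b * b - 4.0 * protein_um * ligand_um)) / 2.0
    return complex_um / protein_um


if __name__ == "__main__":
    protein = protein_conc_from_ratio(200.0, 1.33)  # about 150 µM
    # Hypothetical Kd values (µM), for illustration only.
    for kd in (0.001, 1.0, 50.0, 500.0):
        print(f"Kd = {kd:g} µM -> fraction bound = {fraction_bound(protein, 200.0, kd):.3f}")
```

      Under these assumptions, a nanomolar Kd gives essentially complete occupancy, whereas a Kd in the tens of micromolar would leave a substantial apo population, which is the kind of mixture the two W395 resonances would suggest.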

      (3) The Discussion could use a structural perspective on the likely effects of each mutation on inhibitor binding. Both residues occupy positions in beta7 and the hinge, which are commonly found to form hydrophobic and polar contacts with ATP competitive inhibitors in many kinases. This would be useful to discuss and show as a figure, in order to give the non-kinase expert a better understanding of why the mutations might affect inhibitor binding. The variations in structures of each inhibitor and how they contact these two positions might be useful to inspect, and ask why some inhibitors but not others are affected by mutation, and why some inhibitors but not others induce effects over long distances to W395 and the activation loop.

      As requested, we have added a new paragraph in the discussion and a new figure (Fig. 10) to expand on the likely effects of the mutations on inhibitor binding. The allosteric effects of some of the BTK inhibitors, on the other hand, are currently being investigated and are beyond the scope of the current manuscript.

      (4) The authors propose that small differences in Tm and stability of L528W account for its effect on resistance. Does this mutant show elevated expression in patient tumors over those with WT BTK?

      Preliminary data indicate that BTK L528W levels are elevated in one of two patients carrying this resistance mutation. However, due to the low number of patients tested, we have chosen not to include the data in this study but will continue to pursue this question in future work.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      The Editors have assessed your revised submission and rather than issuing a further decision letter we are writing to invite you to make a few small amendments to this version of the paper as listed below.

      We added a summary paragraph at the end of the introduction for clarity.

      (1) RMSD values in Fig 2-source data 1 (and possibly reflected in Fig 2C) appear to be improbably duplicated, specifically ACh runs 1/2, Ebx runs 1/3, and error values for Ebx vs. ACh.

      Thanks for bringing this to our attention. The values are now corrected.

      (2) Shaded area in Fig 2-supplement 5D is inaccurate for depicting loop C.

      The shaded area now reflects residues in loop C, residues 189-198.

      (3) In Fig 2-supplement 4 where an abrupt change in ligand RMSD is implied to represent a cis-trans flip, the accompanying figure showing snapshots misleadingly depicts a different simulation of CCh instead of ACh.

      The snapshot was from the correct ACh simulation. It was mislabeled as CCh in the legend, which now stands corrected.

      (4) Legend to Fig 3 seems misleading regarding colors in the porcupine plots.

      The color pattern indicated in the legend represents the FEL plot and not the porcupine plot. The description of the porcupine plot is not associated with any color.

      (5) Some shaded regions in Fig 6-supplement 2 do not correspond to intervals reported in Fig 4-source data 1.

      Thanks. This is now corrected to match the table.

      Given that some of the above points have remained unaddressed from the prior round of review, the authors should double check that they have addressed any other relevant prior comments not explicitly listed here.

      Finally, the revised first results section has removed the explanation as to why the authors opted to simulate a dimer (i.e., affinity being affected only by local perturbations). The authors should consider reincorporating this explanation for readers, as well as adding a reference to Wang et al. 1997 (PMID: 9222901) in regard to lines 116-119.

      The revised section now includes an added explanation of why a dimer was used in the simulations. Gupta et al., J Gen Physiol. 2017 Jan; 149(1): 85–103 was added, as it includes residues not just from the M1 domain that Wang et al. cover, but also from other TMD regions.

    1. Author response:

      eLife Assessment

      Zhang et al. present important findings that reveal a new role for TET2 in controlling glucose production in the liver, showing that both fasting and a high-fat diet increase TET2 levels, while its absence reduces glucose production. TET2 works with HNF4α to activate the FBP1 gene upon glucagon stimulation, while metformin disrupts TET2-HNF4α interaction, lowering FBP1 levels and improving glucose homeostasis. While the results are solid, more details about the mechanisms and methods are needed to strengthen the study's conclusions.

      Thanks for the positive evaluation and constructive comments, which will significantly improve the quality of the manuscript. We will provide more details about the mechanisms and methods in the revised version.

      Reviewer #1 (Public review):

      Summary:

      Zhang et al. describe a delicate relationship between Tet2 and FBP1 in the regulation of hepatic gluconeogenesis.

      Strengths:

      The studies are very mechanistic, indicating that this interaction occurs via demethylation of HNF4a. Phosphorylation of HNF4a at ser 313 induced by metformin also controls the interaction between Tet2 and FBP1.

      Weaknesses:

      The results are briefly described, and oftentimes, the necessary information is not provided to interpret the data. Similarly, the methods section is not well developed to inform the reader about how these experiments were performed. While the findings are interesting, the results section needs to be better developed to increase confidence in the interpretation of the results.

      We thank the reviewer for the positive evaluation and constructive comments. There is a factual error in the paragraph of “Strengths”. The comment that “The studies are very mechanistic, indicating that this interaction occurs via demethylation of HNF4a. Phosphorylation of HNF4a at ser 313 induced by metformin also controls the interaction between Tet2 and FBP1.” should be revised as follows: “The studies are very mechanistic, indicating that this interaction occurs via demethylation of FBP1. Phosphorylation of HNF4a at ser 313 induced by metformin also controls the interaction between Tet2 and HNF4a.”

      Following the reviewer’s suggestions, we will provide all the necessary information in the methods section to inform the reader about how these experiments were performed, and improve the description of the results in the revised version.

      Reviewer #2 (Public review):

      Summary:

      This study reveals a novel role of TET2 in regulating gluconeogenesis. It shows that fasting and a high-fat diet increase TET2 expression in mice, and TET2 knockout reduces glucose production. The findings highlight that TET2 positively regulates FBP1, a key enzyme in gluconeogenesis, by interacting with HNF4α to demethylate the FBP1 promoter in response to glucagon. Additionally, metformin reduces FBP1 expression by preventing TET2-HNF4α interaction. This identifies an HNF4α-TET2-FBP1 axis as a potential target for T2D treatment.

      Strengths:

      The authors use several methods in vivo (PTT, GTT, and ITT in fasted and HFD mice; and KO mice) and in vitro (in HepG2 and primary hepatocytes) to support the existence of the HNF4alpha-TET-2-FBP-1 axis in the control of gluconeogenesis. These findings uncovered a previously unknown function of TET2 in gluconeogenesis.

      Weaknesses:

      Although the authors provide evidence of an HNF4α-TET2-FBP1 axis in the control of gluconeogenesis, which contributes to the therapeutic effect of metformin on T2D, its role in the pathogenesis of T2D is less clear. The mechanisms by which TET2 is up-regulated by glucagon should be more explored.

      We thank the reviewer for the support and constructive comments, and agree with the reviewer that the current version mainly focused on the function of the HNF4α-TET2-FBP1 axis in the control of gluconeogenesis. We will explore the pathogenesis of T2D and the mechanism by which TET2 is up-regulated by glucagon in the revised version.

      Both reviewers made positive comments and we will address all the reviewers’ concerns either by new experiments or clarifications. We thank editors and reviewers for the constructive comments, which will significantly improve the quality of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their overall positive evaluation of the manuscript and for finding MChIP-C to be a valuable technological advance. To address the reviewers’ helpful comments and recommendations, we performed several additional analyses and improved the text and figures.

      Briefly, we extended and clarified the main text and methods, added analyses of interactions at consensus and method-specific CTCF/DHS sites (Figure S3), added additional comparison tracks to other methods in specific loci (Figure 4), added examples of MChIP-C E-P interactions at previously-verified loci (Figure S2a) and added extensive MChIP-C downsampling analysis (Figure S6).

      Recommendations for authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Provide .HiC and .cool files for the community to explore the data.

      We thank the reviewer for this suggestion. We have uploaded both the raw and processed data to GEO. We note that .cool and .hic formats may be less useful for this type of data, since it includes only promoter-based interactions and thus the resulting interaction matrix is extremely sparse at the relevant resolutions. In addition, we provide an online genomic browser for our data.

      (2) Provide an R or bioconda package for future data processing.

      We thank the reviewer for this suggestion. We have organized and streamlined the relevant code for processing MChIP-C data and it is available as a github repository.

      (3) The authors should avoid using "mln" for "million".

      We thank the reviewer for this suggestion. We have corrected this in the text.

      Reviewer #3 (Recommendations For The Authors):

      (1) Figure 2- A handful of sites identified by MChIP-C should be verified by 3C or 4C to validate they are true interactions using an orthogonal approach.

      We thank the reviewer for this suggestion. As we show in the current manuscript (and supported by several papers using MNase-based C-methods), C-methods based on restriction enzymes are considerably less sensitive than those based on MNase, so using these methods for anecdotal validation may not be adequate. In addition, it is difficult to extract accurate quantitative measurements from 3C and 4C due to challenges in bias normalization. As a large-scale alternative, we analyzed a set of consensus promoter-CTCF and promoter-DHS interactions identified by all 3 methods (PLAC-seq/Micro-C/MChIP-C; Figure S3). We find that MChIP-C shows clearly superior resolution and sensitivity on these consensus sites. In fact, even for sites which were only called by one of the competing methods, we still see better signal in the MChIP-C data (suggesting that our simplistic MChIP-C peak-calling approach could be improved for further gain). However, as this analysis focuses on “easily detectable” consensus sites, we also emphasize the importance of inspecting interactions which are not detected clearly by alternative methods. To this end, we now show in our manuscript interaction profiles for 11 loci (MYC, PTGER3, CITED2, BTG1, ANTXR2, SEMA7A, LMO2, GATA1, HBG2, VEGFA, MYB), each showing high-resolution MChIP-C interactions which coincide with expected genomic features (p300, CTCF, H3K27ac, known enhancers) and are not clearly observable in Micro-C and PLAC-seq. We also note that the extended overlap of detected MChIP-C interactions with functionally validated enhancers (as measured by CRISPRi) provides an additional large-scale orthogonal validation.

      (2) A supplemental table indicating read pair depth, etc, similar to S02, should be added for the datasets used for comparison (HiChIP-etc). Given the age differences between some of the reference data used, it may represent simply an improvement by increasing sequencing depth rather than a true technical advantage.

      We thank the reviewer for this suggestion. We have added the sequencing depths of the relevant datasets in the methods section. We also performed extensive downsampling analyses as explained in response to the next point.

      (3) I would recommend performing a downsampling analysis to determine at what point the MChIP-C data reaches saturation in terms of the number of reads, with a comparison to the HiChIP reference data. This would allow a more objective measure of the sensitivity of the assays with reference to read depth.

      We thank the reviewer for this suggestion. First, we note that downsampling does not affect the high sensitivity and resolution results as shown in aggregate plots (e.g. Figure 2 and Figure S3). However, downsampling can affect individual peak calling. We thus downsampled our data to 50%, approximately matching the number of total informative reads of both PLAC-seq and Micro-C (i.e. ~20M). We also further downsampled our data to 25% and 10%. With respect to prediction of K562 functionally validated enhancer-promoter interactions (Figure S6b), even at 25% downsampling MChIP-C achieves both a higher recall and higher precision than the other methods, with a slightly higher false-positive rate. At 10% sampling, recall is slightly worse than Micro-C and PLAC-seq, but both the precision and false-positive rate are better than the alternatives. With respect to saturation, we plotted the number of unique distal cis read pairs versus the total number of reads (Figure S6c), and find that our MChIP-C data does not yet show saturation. We also show that downsampling our data to 50% maintains ~80% of the called interactions (Figure S6d).
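
      As an illustration of the kind of downsampling and saturation analysis described above, here is a minimal sketch. It is not the authors' pipeline: read pairs are represented as hypothetical `(position_1, position_2)` tuples held in memory, and the function names are invented for this example.

```python
import random


def downsample(read_pairs, fraction, seed=0):
    """Randomly keep the given fraction of read pairs (sampling without replacement)."""
    rng = random.Random(seed)
    k = round(len(read_pairs) * fraction)
    return rng.sample(read_pairs, k)


def saturation_curve(read_pairs, fractions, seed=0):
    """For each sampling fraction, return (total pairs sampled, unique pairs observed)."""
    curve = []
    for fraction in fractions:
        subset = downsample(read_pairs, fraction, seed)
        curve.append((len(subset), len(set(subset))))
    return curve


if __name__ == "__main__":
    # Toy library: 50 distinct pairs, each sequenced 4 times (PCR-like duplicates).
    reads = [(i % 50, 1000 + i % 50) for i in range(200)]
    for total, unique in saturation_curve(reads, [0.1, 0.25, 0.5, 1.0]):
        print(f"{total} sampled pairs -> {unique} unique pairs")
```

      In a curve like this, a library that is not yet saturated keeps gaining unique pairs roughly in proportion to depth, while a saturated library flattens as most newly sampled pairs turn out to be duplicates.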

      (4) "our results suggest that MChIP-C achieves superior sensitivity and resolution compared to C-methods based on standard restriction enzymes." The sensitivity claims are supported by Figure 2, but not the resolution claims. This is particularly challenging when using histone marks since they can be broad. To directly compare the resolution of MChIP-C to other approaches such as ChIA-PET or HiChIP CTCF or a similar DNA binding protein is required.

      We thank the reviewer for this suggestion. We first note that both sensitivity and resolution are in fact relevant for the results shown in Figure 2 and for the signal-to-noise calculations. This is because the low resolution of PLAC-seq can result in very broad peaks that cover the entire area of the interrogated window (5kb on each side), which could seem like low sensitivity. However, we believe that the new Figure S3 may show the higher resolution of MChIP-C more clearly, as do the interaction profile tracks for the 11 loci shown in Figure 2, Figure 4 and Figure S2.

      Public reviews:

      Reviewer #1:

      The authors presented a new MNase-based proximity ligation method called MChIP-C, allowing for the measurement of protein-mediated chromatin interactions at single-nucleosome resolution on a genome-wide scale. With improved resolution and sensitivity, they explored the spatial connectivity of active promoters and identified the potential candidates for establishing/maintaining E-P interactions. Finally, with published CRISPRi screens, they found that most functionally verified enhancers do physically interact with their cognate promoters, supporting the enhancer-promoter looping model.

      The study's experimental approach and findings are interesting. However, several issues need to be addressed.

      (1) The authors described that "the lack of interaction between experimentally-validated enhancers and their cognate promoters in some studies employing C-methods has raised doubts regarding the classical promoter-enhancer looping model", so it's intriguing to see whether the MChIP-C could indeed detect the E-P interactions which were not identified by C-methods as they mentioned (Benabdallah et al., 2019; Gupta et al., 2017). I agree that they identified more E-P interactions using MChIP-C, but specifically, they should show at least 2-3 cases. It's important since this is the main conclusion the authors want to draw.

      We thank the reviewer for this suggestion. As we show in the current manuscript (and supported by several papers using MNase-based C-methods), C-methods based on restriction enzymes are considerably less sensitive than those based on MNase, so using these methods for anecdotal validation may not be useful. In addition, it is difficult to extract accurate quantitative measurements from 3C and 4C due to challenges in bias normalization. As a large-scale alternative, we analyzed a set of consensus promoter-CTCF and promoter-DHS interactions identified by all 3 methods (PLAC-seq/Micro-C/MChIP-C; new Figure S3). We find that MChIP-C shows clearly superior resolution and sensitivity on these consensus sites. However, as this analysis focuses on “easily detectable” consensus sites, we also emphasize the importance of inspecting interactions which are not detected clearly by alternative methods. To this end, we now show in our manuscript interaction profiles for 11 loci (MYC, PTGER3, CITED2, BTG1, ANTXR2, SEMA7A, LMO2, GATA1, HBG2, VEGFA, MYB), each showing high-resolution MChIP-C interactions which coincide with expected genomic features (p300, CTCF, H3K27ac, known enhancers) and are not clearly observable in Micro-C and PLAC-seq. We also note that the extended overlap of detected MChIP-C interactions with functionally validated enhancers (as measured by CRISPRi) provides an additional large-scale orthogonal validation.

      (2) The authors compared their data to those of Chen et al. (Chen et al., 2022), who used PLAC-seq with anti-H3K4me3 antibodies in K562 cells and standard Micro-C data previously reported for K562, concluding that "MChIP-C achieves superior sensitivity and resolution compared to C-methods based on standard restriction enzymes.". This is not convincing since they only compared their data to one dataset. More datasets from other cell lines should be included.

      We thank the reviewer for this suggestion. We would like to clarify that all datasets in the paper are K562 datasets, and this cell line is unique in the availability of CRISPRi screens, PLAC-Seq, Micro-C, and hundreds of ChIP-Seq tracks for it. We would expect datasets from other cell types to have changes in their regulatory interactions, so they would be less adequate for direct comparison. In addition, the general resolution and sensitivity limitations (e.g. due to restriction fragment size) are not dependent on cell type and have been shown in other papers on MNase-based methods.

      (3) The reasons for choosing Chen's data (Chen et al., 2022) and CRISPRi screens (Fulco et al., 2019; Gasperini et al., 2019) should be provided since there are so many out there.

      We thank the reviewer for this comment. We selected these CRISPRi screen datasets since they match the cell type (K562) which we used for MChIP-C, and we selected the PLAC-seq data as it is the only PLAC-seq/HiChIP dataset which matches both the cell type (K562) and the antibody (H3K4me3).

      (4) The authors identify EP300 histone acetyltransferase and the SWI/SNF remodeling complex as potential candidates for establishing and/or maintaining enhancer-promoter interactions, but not RNA polymerase II, mediator complex, YY1, and BRD4. More explanation is needed for this point since they're previously suggested to be associated with E-P interactions.

      We thank the reviewer for this comment. We apologize for this point being unclear: as Figure S5 shows, we actually did identify Pol2, Mediator, YY1 and BRD4 as predictive features, but P300 and SWI/SNF show somewhat higher predictive power. We have now clarified this in the text.

      (5) The limitations of the method should be discussed.

      We thank the reviewer for this suggestion. We have now added to the text a discussion of what we view as the current main limitation of the method, namely its low fraction of informative reads.

      Reviewer #2:

      Summary:

      Golov et al performed the capture of MChIP-C using the H3K4me3 antibody. The new method significantly increases the resolution of Micro-C and can detect clear interactions which are not well described in the previous HiChIP/PLAC-seq method. Overall, the paper represents a significant technological advance that can be valuable to the 3D genomic field in the future.

      Strengths:

      (1) The authors established a novel method to profile the promoter center genomic interactions based on the Micro-C method. Such a method could be very useful to dissect the enhancer promoter interaction which has long been an issue for the popular HiC method.

      (2) With the MChIP-C method the authors are able to find new genomic interactions with promoter regions enriched in CTCF. The author has significantly increased the detection sensitivity of such methods as PLAC-seq, Micro-C, and HiChIP.

      (3) The authors identified a new type of interaction between the CTCF-less promoter and the CTCF binding site. This particular type of interaction could explain the CTCF's function in regulating gene transcription activity as observed in many studies. I personally think the second stripe model of P-CTCF interaction is more likely as this has been proposed for the super-enhancer stripe model before. The author should also discuss this part of the story more.

      Weaknesses:

      (1) The data presentation should include the contact heat map. The current data presentation makes it hard for the readers to have a comprehensive view of pair-wise interactions between promoters and the PIR. In particular, these maps may directly give answers to the proposed model of promoter-CTCF interactions by the authors in Figure 3a.

      We thank the reviewer for this suggestion. We note that since the data mainly includes promoter-based interactions, the resulting interaction matrix is extremely sparse at the relevant resolutions. Specifically with respect to promoter-CTCF interactions, without a good sampling of the entire interaction matrix it is difficult to confidently distinguish between the two models only based on MChIP-C data, as it would require data about interaction between non-promoter regions and CTCF.

      (2) In Fig 3D, there seems a very limited increase of power predicting MChIP-C signal for DHS-promoter pairs beyond the addition of CTCF. This figure could be simplified with fewer factors.

      We thank the reviewer for this suggestion. We agree that the last factors do not add predictive power, but we do not think this overly complicates the figure and we prefer to leave these for the reader to evaluate.

      (3) The current method seems to have a big fraction of unusable reads. How the authors process the data should be included to allow for future reproduction. Ideally, the authors should generate a package on R or Bioconda for this processing.

      We thank the reviewer for this suggestion. We agree that the fraction of informative reads is small with respect to some other methods, and expect future versions of MChIP-C to address this limitation. We have organized and streamlined the relevant code for processing MChIP-C data and it is available as a github repository.

      Reviewer #3:

      Summary:

      This manuscript represents a technological development- specifically a micrococcal nuclease chromatin capture approach, termed MChIP-C to identify promoter-centered chromatin interactions at single nucleosome resolution via a specific protein, similar to HiChIP, ChIA-PET, etc.. In general, the manuscript is technically well done. Two major issues raise concerns that need to be addressed. First, it does not appear that novel chromatin interactions identified by MChIP-C which were missed by other approaches such as HiChIP, were validated. This is central to the argument of "improved" sensitivity, which is one of the key factors to assess sensitivity. Second is the question of resolution. Because the authors focus on a histone mark (H3K4me3) it is unclear whether the resolution of the assay truly exceeds other approaches, especially microC. These two issues are not completely supported by the data provided.

      Strengths:

      The method appears to hold promise to improve both the sensitivity and resolution of protein-centered chromatin capture approaches.

      Weaknesses:

      (1) Specific validation experiments to demonstrate the identification of previously missed novel interactions are missing.

      We thank the reviewer for this suggestion. Given that such interactions are missed by Micro-C and PLAC-seq, it would not make sense to use these methods for validation. We thus propose that MChIP-C interactions can be validated by their overlap with expected genomic features. To this end, we now show in our manuscript interaction profiles for 11 loci (MYC, PTGER3, CITED2, BTG1, ANTXR2, SEMA7A, LMO2, GATA1, HBG2, VEGFA, MYB), each showing high-resolution MChIP-C interactions which coincide with expected genomic features (p300, CTCF, H3K27ac, known enhancers) and are not clearly observable in Micro-C and PLAC-seq. In addition, the higher overlap of MChIP-C interactions with functionally-validated K562 enhancer-promoter interactions (provided by CRISPRi screens) provides further functional validation for novel MChIP-C interactions.

      (2) It is unclear if the resolution is really superior based on the data provided.

      We thank the reviewer for this comment. We first note that both sensitivity and resolution are in fact relevant for the results shown in Figure 2 and for the signal-to-noise calculations. This is because the low resolution of PLAC-seq can result in very broad peaks that cover the entire area of the interrogated window (5kb on each side), which could seem like low sensitivity. However, we believe that the new Figure S3 may show the higher resolution of MChIP-C more clearly, as do the interaction profile tracks for the 11 loci shown in Figure 2, Figure 4 and Figure S2.

      (3) It is unclear how much advantage the approach has, especially compared to existing approaches such as HiChIP since sequencing depth as a variable is not adequately addressed.

      We thank the reviewer for this comment. First, we note that downsampling does not affect the high sensitivity and resolution results as shown in aggregate plots (e.g. Figure 2 and Figure S3). However, downsampling can affect individual peak calling. We thus downsampled our data to 50%, approximately matching the number of total informative reads of both PLAC-seq and Micro-C (i.e. ~20M). We also further downsampled our data to 25% and 10%. With respect to prediction of K562 functionally validated enhancer-promoter interactions (Figure S6b), even at 25% downsampling MChIP-C achieves both a higher recall and higher precision than the other methods, with a slightly higher false-positive rate. At 10% sampling, recall is slightly worse than Micro-C but both the precision and false-positive rate are better than the alternatives.
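
      For readers less familiar with the metrics quoted above, the following minimal sketch shows how precision, recall and false-positive rate can be computed for a set of called interactions against a functionally tested reference set. The enhancer-promoter labels are invented toy data, and the function name is hypothetical; this is not the authors' benchmarking code.

```python
def classification_metrics(called, validated_positive, validated_negative):
    """Precision, recall and false-positive rate of called interactions.

    called:             enhancer-promoter pairs reported by a method
    validated_positive: tested pairs with a functional effect (e.g. in a CRISPRi screen)
    validated_negative: tested pairs without a functional effect
    Pairs that were never tested are ignored.
    """
    called = set(called)
    positives = set(validated_positive)
    negatives = set(validated_negative)

    tp = len(called & positives)  # functional pairs that were called
    fp = len(called & negatives)  # non-functional pairs that were called
    fn = len(positives - called)  # functional pairs that were missed

    nan = float("nan")
    return {
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        "recall": tp / (tp + fn) if (tp + fn) else nan,
        "false_positive_rate": fp / len(negatives) if negatives else nan,
    }


if __name__ == "__main__":
    positives = {("E1", "P1"), ("E2", "P1"), ("E3", "P2"), ("E4", "P2")}
    negatives = {("E5", "P1"), ("E6", "P2"), ("E7", "P3"), ("E8", "P3")}
    called = {("E1", "P1"), ("E2", "P1"), ("E3", "P2"), ("E5", "P1")}
    print(classification_metrics(called, positives, negatives))
```

      Repeating such a calculation on calls made from 50%, 25% and 10% of the reads is one way to compare methods at matched depth.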

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript proposes that 5mC modifications to DNA, despite being ancient and widespread throughout life, represent a vulnerability, making cells more susceptible to both chemical alkylation and, of more general importance, reactive oxygen species. Sarkies et al take the innovative approach of introducing enzymatic genome-wide cytosine methylation system (DNA methyltransferases, DNMTs) into E. coli, which normally lacks such a system. They provide compelling evidence that the introduction of DNMTs increases the sensitivity of E. coli to chemical alkylation damage. Surprisingly they also show DNMTs increase the sensitivity to reactive oxygen species and propose that the DNMT generated 5mC presents a target for the reactive oxygen species that is especially damaging to cells. Evidence is presented that DNMT activity directly or indirectly produces reactive oxygen species in vivo, which is an important discovery if correct, though the mechanism for this remains obscure.

      Strengths:

      This work is based on an interesting initial premise, it is well-motivated in the introduction and the manuscript is clearly written. The results themselves are compelling.

      We thank the reviewer for their positive response to our study. We also really appreciate the thoughtful comments raised. Adding the considerations raised below to the manuscript will considerably strengthen our findings.

      Weaknesses:

      I am not currently convinced by the principal interpretations and think that other explanations based on known phenomena could account for key results. Specific points below.

      (1) As noted in the manuscript, AlkB repairs alkylation damage by direct reversal (DNA strands are not cut). In the absence of AlkB, repair of alkylation damage/modification is likely through BER or other processes involving strand excision and resulting in single stranded DNA. It has previously been shown that 3mC modification from MMS exposure is highly specific to single stranded DNA (PMID:20663718) occurring at ~20,000 times the rate as double stranded DNA. Consequently, the introduction of DNMTs is expected to introduce many methylation adducts genome-wide that will generate single stranded DNA tracts when repaired in an AlkB deficient background (but not in an AlkB WT background), which are then hyper-susceptible to attack by MMS. Such ssDNA tracts are also vulnerable to generating double strand breaks, especially when they contain DNA polymerase stalling adducts such as 3mC. The generation of ssDNA during repair is similarly expected to follow the H2O2 or TET based conversion of 5mC to 5hmC or 5fC neither of which can be directly repaired and depend on single strand excision for their removal. The potential importance of ssDNA generation in the experiments has not been considered.

      We thank the reviewer for this interesting and insightful suggestion.  Our interpretation of our findings is that a subset of MMS-induced DNA damage, specifically 3mC, overlaps with the damage introduced by DNMTs and this accounts for increased sensitivity to MMS when DNMTs are expressed.  However, the idea that the introduction of 3mC by DNMT actually makes the DNA more liable to damage by MMS, potentially through increasing the level of ssDNA, is also a potential explanation, which could operate in addition to the mechanism that we propose.

      (2) The authors emphasise the non-additivity of the MMS + DNMT + alkB experiment but the interpretation of the result is essentially an additive one: that both MMS and DNMT are introducing similar/same damage and AlkB acts to remove it. The non-additivity noted would seem to be more consistent with the ssDNA model proposed in #1. More generally non-additivity would also be seen if the survival to DNA methylation rate is non-linear over the range of the experiment, for example if there is a threshold effect where some repair process is overwhelmed. The linearity of MMS (and H2O2) exposure to survival could be directly tested with a dilution series of MMS (H2O2).

      We thank the reviewer for this point.  As in the response to point #1, the reviewer’s hypothesis of increased potency of MMS, potentially through increased ssDNA, downstream of 3mC induction by DNMT, is a good one.  The reviewer’s suggestion would produce a highly non-linear response to MMS treatment in the AlkB mutant in the DNMT background.  We therefore agree that investigating non-linearity over a wider range, rather than inferring it from the non-additivity of a single point, would be useful in evaluating the results, and we will add a dose-response curve for DNMT-expressing cells to MMS to the revised version of the manuscript.
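
      The threshold argument in point (2) can be illustrated with a toy calculation. The sketch below is not the authors' analysis: the sigmoidal survival function, the repair-capacity threshold, and the damage units assigned to DNMT expression and MMS are all hypothetical assumptions chosen for illustration. It shows that two damage sources whose lesions simply add can still look more than additive at the level of survival once the combined load crosses a threshold.

```python
import math

def survival(damage, threshold=10.0, steepness=1.0):
    """Hypothetical survival fraction: close to 1 below the repair
    threshold and falling towards 0 once damage exceeds it."""
    return 1.0 / (1.0 + math.exp(steepness * (damage - threshold)))

def log_kill(damage):
    """Loss of survival on a log10 scale (larger means more killing)."""
    return -math.log10(survival(damage))

# Arbitrary illustrative damage units (assumptions, not measured values).
DNMT_DAMAGE = 6.0
MMS_DAMAGE = 6.0

kill_dnmt = log_kill(DNMT_DAMAGE)
kill_mms = log_kill(MMS_DAMAGE)
# Lesions from the two sources are simply summed before scoring survival.
kill_both = log_kill(DNMT_DAMAGE + MMS_DAMAGE)

# If the two treatments acted independently, their log-kills would add.
expected_if_independent = kill_dnmt + kill_mms
excess = kill_both - expected_if_independent

def dose_response(doses, background=0.0):
    """Survival across a dilution series, with optional background damage
    (for example, damage contributed by DNMT expression)."""
    return [(dose, survival(dose + background)) for dose in doses]

if __name__ == "__main__":
    print(f"log-kill DNMT alone: {kill_dnmt:.3f}")
    print(f"log-kill MMS alone:  {kill_mms:.3f}")
    print(f"log-kill combined:   {kill_both:.3f}")
    print(f"excess over additive expectation: {excess:.3f}")
```

      Under these assumptions a single-dose comparison cannot separate a shared-lesion model from a threshold effect, whereas comparing `dose_response` curves with and without the background term shows where the curve bends.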

      (3) The substantial transcriptional changes induced by DNMT expression (Supplemental Figure 4) are a cause for concern and highlight that the ectopic introduction of methylation into a complex system is potentially more confounded than it may at first seem. Though the expression analysis shows bulk transcription properties, my concern is that the disruptive influence of methylation in a system not evolved with it adds not just consistent transcriptional changes but transcriptional heterogeneity between cells which could influence net survival in a stressed environment. In practice I don't think this can be controlled for, possibly quantified by single-cell RNA-seq but that is beyond the reasonable scope of this paper.

      We fully agree with the reviewer and, indeed, we are very interested in what is driving the transcriptional changes that we observed.  Work is currently underway in the lab to investigate this further but, as the reviewer suggests, is beyond the scope of this paper.  However, we will include a more extensive comment about the transcriptional changes in the discussion of the revised manuscript.

      (4) Figure 4 represents a striking result. From its current presentation it could be inferred that DNMTs are actively promoting ROS generation from H2O2 and also to a lesser extent in the absence of exogenous H2O2. That would be very surprising and a major finding with far-reaching implications. It would need to be further validated, for example by in vitro reconstitution of the reaction and monitoring ROS production. Rather, I think the authors are proposing that some currently undefined, indirect consequence of DNMT activity promotes ROS generation, especially when exogenous H2O2 is available. It would help if this were clarified.

      We thank the reviewer for picking this up.  In the current version’s discussion, we raised two possible explanations for why DNMT (even without H2O2) increases the ROS levels.  One idea is direct activity of DNMT, and one is through the product of DNMT activity acting as a platform to generate more ROS from endogenous or exogenous sources.  We argued that direct activity is less likely, exactly as the reviewer points out.  It is, however, not impossible and we agree with the reviewer that, if it were to be the case, it would be a striking result.  In the revised version of the manuscript we will include an experiment to test whether DNMTs can generate ROS in vitro, which may provide preliminary evidence to distinguish between the two hypotheses we raised, and we will also edit the text of the discussion to clarify our reasoning. 

      Reviewer #2 (Public review):

      5-methylcytosine (5mC) is a key epigenetic mark in DNA and plays a crucial role in regulating gene expression in many eukaryotes including humans. The DNA methyltransferases (DNMTs) that establish and maintain 5mC, are conserved in many species across eukaryotes, including animals, plants, and fungi, mainly in a CpG context. Interestingly, 5mC levels and distributions are quite variable across phylogenies with some species even appearing to have no such DNA methylation.

      This interesting and well-written paper discusses the continuation of some of the authors' work published several years ago. In that previous paper, the laboratory demonstrated that DNA methylation pathways coevolved with DNA repair mechanisms, specifically with the alkylation repair system. They discovered that DNMTs can introduce alkylation damage into DNA in the form of 3-methylcytosine (3mC). (This appears to be an error in the DNMT enzymatic mechanism, where the generation of 3mC, as opposed to its preferred product 5-methylcytosine (5mC), is caused by the flipped target cytosine binding to the active site pocket of the DNMT in an inverted orientation.) The presence of 3mC is potentially toxic and can cause replication stress, which this paper suggests may explain the loss of DNA methylation in different species. They further showed that the ALKB2 enzyme plays a crucial role in repairing this alkylation damage, further emphasizing the link between DNA methylation and DNA repair.

      The co-evolution of DNMTs with DNA repair mechanisms suggests there can be distinct advantages and disadvantages of DNA methylation to different species, which might depend on their environmental niche. In environments that expose species to high levels of DNA damage, high levels of 5mC in their genome may be disadvantageous. This present paper sets out to examine the sensitivity of an organism to genotoxic stresses such as alkylation and oxidation agents as the consequence of DNMT activity. Since such a study in eukaryotes would be complicated by DNA methylation controlling gene regulation, these authors cleverly utilize Escherichia coli (E. coli) and incorporate into it the DNMTs from other bacteria that methylate the cytosines of DNA in a CpG context like that observed in eukaryotes; the active sites of these enzymes are very similar to eukaryotic DNMTs and basically utilize the same catalytic mechanism (also, this strain of E. coli does not specifically degrade this methylated DNA).

      The experiments in this paper more than adequately show that E. coli expressing these DNMTs (compared to the same strain without the DNMTs) does indeed show increased sensitivity to alkylating agents, and this sensitivity was even greater than expected when a DNA repair mechanism was inactivated. Moreover, they show that E. coli expressing this DNMT is more sensitive to oxidizing agents such as H2O2 and has exacerbated sensitivity when a DNA repair glycosylase is inactivated. Both propensities suggest that DNMT activity itself may generate additional genotoxic stress. Intrigued that DNMT expression itself might induce sensitivity to oxidative stress, the experimenters used a fluorescent sensor to show that H2O2-induced reactive oxygen species (ROS) are markedly enhanced with DNMT expression. Importantly, they show that DNMT expression alone gave rise to increased ROS amounts and that H2O2 addition and DNMT expression together have a greater effect than the linear combination of the two separately. They also carefully checked that the increased sensitivity to H2O2 was not potentially caused by some effect on gene expression of detoxification genes by DNMT expression and activity. Finally, by using mass spectrometry, they show that DNMT expression led to production of the 5mC oxidation derivatives 5-hydroxymethylcytosine (5hmC) and 5-formylcytosine (5fC) in DNA. 5fC is a substrate for base excision repair while 5hmC is not; more 5fC was observed. Introduction of non-bacterial enzymes that produce 5hmC and 5fC into the DNMT expressing bacteria again showed a greater sensitivity than expected. Remarkably, in their assay with addition of H2O2, bacteria showed no growth with this dual expression of DNMT and these enzymes.

      Overall, the authors conduct well thought-out and simple experiments to show that a disadvantageous consequence of DNMT expression leading to 5mC in DNA is increased sensitivity to oxidative stress as well as alkylating agents.

      Again, the paper is well-written and organized. The hypotheses are well-examined by simple experiments. The results are interesting and can impact many scientific areas, from our understanding of the evolutionary pressures that the environment places on an organism to our understanding of how the environment of a malignant cell in the human body may lead to cancer.

      We thank the reviewer for their response to our study, and value the time taken to produce a public review that will aid readers in understanding the key results of our study. 

      Reviewer #3 (Public review):

      Summary:

      Krwawicz et al., present evidence that expression of DNMTs in E. coli results in (1) introduction of alkylation damage that is repaired by AlkB; (2) confers hypersensitivity to alkylating agents such as MMS (and exacerbated by loss of AlkB); (3) confers hypersensitivity to oxidative stress (H2O2 exposure); (4) results in a modest increase in ROS in the absence of exogenous H2O2 exposure; and (5) results in the production of oxidation products of 5mC, namely 5hmC and 5fC, leading to cellular toxicity. The findings reported here have interesting implications for the concept that such genotoxic and potentially mutagenic consequences of DNMT expression (resulting in 5mC) could be selectively disadvantageous for certain organisms. The other aspect of this work which is important for understanding the biological endpoints of genotoxic stress is the notion that DNA damage per se somehow induces elevated levels of ROS.

      Strengths:

      The manuscript is well-written, and the experiments have been carefully executed providing data that support the authors' proposed model presented in Fig. 7 (Discussion, sources of DNA damage due to DNMT expression).

      Weaknesses:

      (1) The authors have established an informative system relying on expression of DNMTs to gauge the effects of such expression and subsequent induction of 3mC and 5mC on cell survival and sensitivity to an alkylating agent (MMS) and exogenous oxidative stress (H2O2 exposure). The authors state (p4) that Fig. 2 shows that "Cells expressing either M.SssI or M.MpeI showed increased sensitivity to MMS treatment compared to WT C2523, supporting the conclusion that the expression of DNMTs increased the levels of alkylation damage." This is a confusing statement and requires revision, as ALL cells shown in Fig. 2 are expressing DNMTs and have been treated with MMS. It is the absence of AlkB together with the expression of DNMTs that causes the MMS sensitivity.

      We thank the reviewer for this and agree that this needs to be clarified with regards to the figure presented and will do so in the revised manuscript. 

      (2) It would be important to know whether the increased sensitivity (toxicity) to DNMT expression and MMS is also accompanied by substantial increases in mutagenicity. The authors should explain in the text why mutation frequencies were not also measured in these experiments.

      This is an important point because it is not immediately obvious that increased sensitivity would be associated with increased mutagenicity (if, for example, 3mC was never a cause of inaccurate DNA repair even in the absence of AlkB).  We will carry out this experiment and include these data in the revised version of the manuscript.  Detailed consideration of the types and sources of mutations is beyond the scope of this manuscript, but we are also working on this and hope to produce data on this in the future. 

      (3) Materials and Methods. ROS production monitoring. The "Total Reactive Oxygen Species (ROS) Assay Kit" has not been adequately described. Who is the Vendor? What is the nature of the ROS probes employed in this assay? Which specific ROS correspond to "total ROS"?

      The ROS measurement was with a kit from ThermoFisher: https://www.thermofisher.com/order/catalog/product/88-5930-74.  The probe is DCFH-DA.  This is a general ROS sensor that is oxidised by a large number of cellular reactive oxygen species hence we cannot attribute the signal to a single species.  Use of a technique with the potential to more precisely identify the species involved is something we plan to do in future, but is beyond what we can do as part of this study.  We will include a comment to this effect in the revised version of the manuscript.

      (4) The demonstration (Fig. 4) that DNMT expression results in elevated ROS and its further synergistic increase when cells are also exposed to H2O2 is the basis for the authors' discussion of DNA damage-induced increases in cellular ROS. S. cerevisiae does not possess DNMTs/5mC, yet exposure to MMS also results in substantial increases in intracellular ROS (Rowe et al, (2008) Free Rad. Biol. Med. 45:1167-1177. PMC2643028). The authors should be aware of previous studies that have linked DNA damage to intracellular increases in ROS in other organisms and should comment on this in the text.

      We thank the reviewer for this point.  We note that the increased ROS that we observed occur in the presence of DNMTs alone and in the presence of H2O2, not in the presence of MMS; however, the point that DNA damage in general can promote increased ROS in some circumstances is well taken and we will include a comment on this in the discussion of the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      Type 1 diabetes mellitus (T1DM) progression is accelerated by oxidative stress and apoptosis. Eugenol (EUG) is a natural compound previously documented as anti-inflammatory, anti-oxidative, and anti-apoptotic. In this manuscript by Jiang et al., the authors study the effects of EUG on T1DM in MIN6 insulinoma cells and a mouse model of chemically induced T1DM. The authors show that EUG increases nuclear factor E2-related factor 2 (Nrf2) levels. This results in a reduction of pancreatic beta-cell damage, apoptosis, oxidative stress markers, and a recovery of insulin secretion. The authors highlight these effects as indicative of the therapeutic potential of EUG in managing T1DM.

      Strengths

      Relevant, timely, and addresses an interesting question in the field. The authors consistently observe enhanced beta cell functionality following EUG treatment, which makes the compound a promising candidate for T1DM therapy.

      Weaknesses

      (1) The in vivo experiments have too few biological replicates. With an n=3 (as all figure legends indicate) in complex mouse studies such as these, drawing robust conclusions becomes challenging. It is important to reproduce these results in a larger cohort, to validate the conclusions of the authors.

      Thanks for your comments. In the figure legends of the first draft, n=3 refers to at least 3 biological replicates, whereas n=30 in the Materials and Methods section refers to the sample size. Each group contained 30 mice, for a total of 150 mice in this study, and the mice were assigned as follows for the in vivo experiments. The relevant information has been added to the revised manuscript.

      Author response image 1.

      (2) Another big concern is the lack of quantifications and statistical analysis throughout the manuscript. Although the authors claim statistical significance in various experiments, the limited information provided makes it difficult to verify. The authors use vague and minimal descriptions of their experiments, which further reduces the reader's comprehension and the reproducibility of the experiments.

      Thanks for your constructive suggestion. We repeated the quantification and statistical analysis for the entire manuscript using GraphPad Prism software. Additionally, we have improved the experimental descriptions in the revised manuscript.

      (3) Finally, the use of Min6 cells as a model for pancreatic beta cells is a strong limitation of this study. Future studies should seek to reproduce these findings in a more translational model and use more relevant in vitro cell systems (eg. Islets).

      Thanks for your professional comments. The mouse insulinoma MIN6 cell line is a permanent cell line isolated from mouse islet β cell tumors, and it can reflect the functional changes of islet β cells. As mature islet cells, MIN6 cells have been widely used in the study of type 1 diabetes mellitus[1-4], so in this study MIN6 cells were used as the in vitro cell model. In our future studies, we will try to validate our findings using more relevant in vitro cell systems (e.g., islets).

      References:

      (1) WU M, CHEN W, ZHANG S, et al. Rotenone protects against β-cell apoptosis and attenuates type 1 diabetes mellitus [J]. Apoptosis, 2019, 24(11-12): 879-91.

      (2) LUO C, HOU C, YANG D, et al. Urolithin C alleviates pancreatic β-cell dysfunction in type 1 diabetes by activating Nrf2 signaling [J]. Nutr Diabetes, 2023, 13(1): 24.

      (3) LAKHTER A J, PRATT R E, MOORE R E, et al. Beta cell extracellular vesicle miR-21-5p cargo is increased in response to inflammatory cytokines and serves as a biomarker of type 1 diabetes [J]. Diabetologia, 2018, 61(5): 1124-34.

      (4) LIN Y, SUN Z. Antiaging Gene Klotho Attenuates Pancreatic β-Cell Apoptosis in Type 1 Diabetes [J]. Diabetes, 2015, 64(12): 4298-311.

      Reviewer #3 (Public Review):

      Summary:

      This study by Jiang et al. aims to establish the streptozotocin (STZ)-induced type 1 diabetes mellitus (T1DM) mouse model in vivo and the STZ-induced pancreatic β cell MIN6 cell model in vitro to explore the protective effects of Eugenol (EUG) on T1DM. The authors tried to elucidate the potential mechanism by which EUG inhibits the NRF2-mediated anti-oxidative stress pathway. Overall, this study is well executed with solid data, offering an intriguing report from animal studies for a potential new treatment strategy for T1DM.

      Strengths:

      The in vivo efficacy study is comprehensive and solid. Given that STZ-induced T1DM is a devastating and harsh model, the in vivo efficacy of this compound is really impressive.

      Weaknesses:

      (1) The mechanism is linked with the anti-oxidant property of the compound, which is common for many natural compounds, such as flavonoids and polyphenols. However, compounds of this kind have rarely been developed successfully into therapeutics for clinical use. Indeed, if that is the case, Vitamin C or Vitamin E could be used here as the positive control.

      Thanks for your comments. In fact, many anti-oxidant drugs are used in the clinic for the treatment of type 1 diabetes mellitus. For example, lipoic acid has been used to treat diabetic peripheral neuropathy[5]. Vitamin E can effectively eliminate free radicals, protect cell membranes, and significantly reduce the risk of cardiovascular disease in patients with SPACE or ICARE diabetes[6]. Glutathione plays crucial roles in the detoxification and anti-oxidant systems of cells and has been used to treat acute poisoning and chronic liver diseases by intravenous injection[7]. Therefore, eugenol enhances the management of type 1 diabetes mellitus by modulating oxidative stress pathways and holds potential as a future therapeutic choice for clinical application. In future studies, we will try to use Vitamin C or Vitamin E as the positive control.

      References:

      (5) ZIEGLER D, PAPANAS N, SCHNELL O, et al. Current concepts in the management of diabetic polyneuropathy [J]. J Diabetes Investig, 2021, 12(4): 464-75.

      (6) VARDI M, LEVY N S, LEVY A P. Vitamin E in the prevention of cardiovascular disease: the importance of proper patient selection [J]. J Lipid Res, 2013, 54(9): 2307-14.

      (7) HONDA Y, KESSOKU T, SUMIDA Y, et al. Efficacy of glutathione for the treatment of nonalcoholic fatty liver disease: an open-label, single-arm, multicenter, pilot study [J]. BMC Gastroenterol, 2017, 17(1): 96.

      Reviewer #1 (Recommendations For The Authors):

      • For each of the figure panels the authors should indicate the exact number of biological replicates (how many mice or how many independent in vitro experiments). For IF panels, the number of mice, the number of histology slides per mouse, number of fields analyzed should be indicated.

      Thanks for your constructive suggestion. These details have been added to the revised manuscript.

      • The methods state n=30 and Figure 1 states n=3. N=3 is too little for such a complex in vivo study and would severely reduce the reliability of the in vivo experiments.

      Thanks for your suggestion. In the figure legends of the first draft, n=3 refers to at least 3 biological replicates, whereas n=30 in the Materials and Methods section refers to the sample size. Each group contained 30 mice, for a total of 150 mice in this study, and the mice were assigned as described above for the in vivo experiments. The in vivo experimental data in Figure 1 have been supplemented in the revised manuscript.

      • Individual data points should be included in each of the graphs from this manuscript.

      Thanks for your reminder. The revised manuscript now shows the individual data points in each of the graphs.

      • The quantifications and statistics in the manuscript need improvement. Several experiments are missing quantifications and/or statistical tests (e.g. Figure 1J). Other experiments show a quantification but without any explanation of replicates (e.g. Figures 2B and 2G). None of the experiments show individual data points, and as in the previous comment, these should be included.

      Thanks for your comments. In the revised manuscript, statistics and repetitions of experimental data have been supplemented, and individual data points were shown in each graph.

      • What is the reason for intragastric administration? The previous studies on which the dosages were based used oral administration (gavage). (Discussed in methods 4.2).

      Thanks for your professional comments. Intervention treatment of T1DM mice is conducted through two methods: oral administration[8] and oral gavage[9-11]. Due to limited experimental conditions, it was not feasible to house and feed each mouse in its own cage, which makes it challenging to precisely control the actual daily intervention dose for each mouse when using oral administration. To ensure that each mouse received the expected dose according to its weight, we used gavage. In addition, oral gavage is more convenient and easier to perform than oral administration. Therefore, the in vivo experiments in this study used eugenol gavage as the treatment method. These details have been added to the revised manuscript.

      References:

      (8) ZHAO H, WU H, DUAN M, et al. Cinnamaldehyde Improves Metabolic Functions in Streptozotocin-Induced Diabetic Mice by Regulating Gut Microbiota [J]. Drug Des Devel Ther, 2021, 15: 2339-55.

      (9) XING D, ZHOU Q, WANG Y, et al. Effects of Tauroursodeoxycholic Acid and 4-Phenylbutyric Acid on Selenium Distribution in Mice Model with Type 1 Diabetes [J]. Biol Trace Elem Res, 2023, 201(3): 1205-13.

      (10) SUDIRMAN S, LAI C S, YAN Y L, et al. Histological evidence of chitosan-encapsulated curcumin suppresses heart and kidney damages on streptozotocin-induced type-1 diabetes in mice model [J]. Sci Rep, 2019, 9(1): 15233.

      (11) YAO H, SHI H, JIANG C, et al. L-Fucose promotes enteric nervous system regeneration in type 1 diabetic mice by inhibiting SMAD2 signaling pathway in enteric neural precursor cells [J]. Cell Commun Signal, 2023, 21(1): 273.

      • Urine volume cannot be specified per mouse (methods 4.4) unless the mice were single-housed or if the different groups were not mixed, both are not ideal study set-ups. Please clarify in the methods section.

      Thanks for your constructive suggestion. After successful modeling of T1DM, the mice were grouped according to Method 4.2 as follows: Control, T1DM, T1DM + EUG (5 mg/kg/day), T1DM + EUG (10 mg/kg/day), and T1DM + EUG (20 mg/kg/day). To ensure consistency among groups, each group consisted of 5 mice and was given the same amount of diet (100 g), the same amount of drinking water (250 mL), and the same environmental conditions. The urine-soaked area for each group was recorded to quantify the urine volume. The conditions were the same for each group. The description of Method 4.4 has been improved in the revised manuscript.

      • OGTT (Figure 1H) of week 2 is missing. This is an important control time point, as it would show the effect of STZ before EUG treatment.

      Thanks for your careful review. OGTT (Figure 1H) of week 2 has been added in the revised manuscript.

      • In Figure 1J, the control group does not follow the expected ITT trajectory. If possible, add the 120-minute time point to see if the blood glucose levels return to baseline in the control group. The graph shows increased basal glucose levels in the experimental groups, but no differences in insulin tolerance. It also misses the AUC calculations. It is probably not significantly different, which should be noted in the text.

      Thanks for your suggestion. T1DM primarily manifests as pancreatic β cell damage and an absolute reduction of insulin secretion, resulting in disordered glucose metabolism in vivo. The oral glucose tolerance test (OGTT) is a series of plasma glucose concentrations measured within 2 h after oral gavage of a defined amount of glucose. It is a standard method to evaluate an individual's ability to regulate blood glucose and to assess the function of islet β cells. Insulin resistance refers to a reduced efficiency of insulin in promoting glucose uptake and utilization, for various reasons; the body's compensatory secretion of excess insulin then leads to hyperinsulinemia in order to keep blood glucose stable. The insulin tolerance test (ITT) is commonly employed to detect insulin resistance in T2DM. However, we found that the ITT experiment is of little relevance to T1DM. Therefore, the ITT experiment of Figure 1J and the related description have been removed from the revised manuscript.
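
      The area under the curve (AUC) that the reviewer asks about is usually obtained from the glucose time course with the trapezoidal rule. The sketch below is illustrative only: the time points, glucose values, and units (minutes, mmol/L) are hypothetical and are not data from the manuscript, and the net incremental AUC shown is one of several conventions for baseline correction.

```python
def trapezoid_auc(times_min, glucose):
    """Total area under a glucose curve by the trapezoidal rule.

    times_min: sampling times in minutes, strictly increasing.
    glucose:   glucose concentration at each time point.
    """
    if len(times_min) != len(glucose) or len(times_min) < 2:
        raise ValueError("need at least two paired time/glucose values")
    area = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        # Area of one trapezoid: interval width times mean of the two heights.
        area += dt * (glucose[i] + glucose[i - 1]) / 2.0
    return area

def net_incremental_auc(times_min, glucose):
    """Total AUC minus the rectangle under the baseline (time-zero) value."""
    baseline_area = glucose[0] * (times_min[-1] - times_min[0])
    return trapezoid_auc(times_min, glucose) - baseline_area

if __name__ == "__main__":
    # Hypothetical OGTT time course for a single animal (not real data).
    times = [0, 15, 30, 60, 90, 120]
    values = [5.0, 10.0, 9.0, 7.0, 6.0, 5.0]
    print(trapezoid_auc(times, values))
    print(net_incremental_auc(times, values))
```

      Computing one AUC per animal and then comparing groups keeps the statistical unit at the level of the biological replicate.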

      • The staining and FACS data on the effects of STZ+EUG+/- ML385 are not convincing (Figure 6 and Figure 7) and do not seem to align with the bar graphs and the conclusions in the text. It would be good to include immunofluorescent staining for insulin to further validate the effects of STZ+EUG+/- ML385 on insulin expression.

      Thanks for your comments.

      (1) We found a mismatch between the statistical results and the pictures, so for the revised manuscript we re-conducted the statistics of the immunofluorescence results of NRF2 and HO-1, as follows:

      (a) NRF2 immunofluorescence staining:

      Author response image 2. Group 1

      Author response image 3. Group 2

      Author response image 4. Group 3

      Author response image 5. Group 4

      Author response image 6. Group 5

      Author response image 7. NRF2 immunofluorescence staining statistics

      (b) HO-1 immunofluorescence staining:

      Author response image 8. Group 1

      Author response image 9. Group 2

      Author response image 10. Group 3

      Author response image 11. Group 4

      Author response image 12. Group 5

      Author response image 13. HO-1 immunofluorescence staining statistics

      (2) The quadrants of the flow cytometry plots are defined as follows: Q1 represents necrotic cells, which are PI positive and Annexin V negative; Q2 represents late apoptotic cells, which are positive for both PI and Annexin V; Q3 represents early apoptotic cells, which are Annexin V positive and PI negative; Q4 represents living cells, which are negative for both PI and Annexin V. In the experiment, the number of apoptotic cells was calculated as the sum of late apoptotic cells in Q2 and early apoptotic cells in Q3. As shown in Figure 9F-G, these results were consistent with those observed in Figure 6G, 6J and Figure 7D-F.

      (3) MIN6 cells, a mouse islet β cell line, secrete insulin. STZ treatment causes an absolute decrease in the number of islet β cells, so insulin immunofluorescence staining would only show a decrease in the number of MIN6 cells in each group. In addition, insulin protein levels are routinely assessed by ELISA of the insulin secreted into the cell supernatant. Figure 6E shows the ELISA results for insulin protein secretion in the cell supernatant.
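
      The apoptosis calculation described in point (2), summing Q2 and Q3, can be sketched as follows. This is a minimal illustration and not the authors' gating pipeline: the gate thresholds and event values are hypothetical, the quadrant numbering follows the response, and the staining patterns follow the conventional Annexin V/PI reading (Annexin V positive and PI negative for early apoptosis, double positive for late apoptosis, double negative for living cells).

```python
from collections import Counter

def classify_event(annexin_v, pi, annexin_cut, pi_cut):
    """Assign one event to a quadrant from its two fluorescence values."""
    annexin_pos = annexin_v >= annexin_cut
    pi_pos = pi >= pi_cut
    if pi_pos and not annexin_pos:
        return "Q1"  # necrotic: PI+ / Annexin V-
    if pi_pos and annexin_pos:
        return "Q2"  # late apoptotic: PI+ / Annexin V+
    if annexin_pos:
        return "Q3"  # early apoptotic: PI- / Annexin V+
    return "Q4"      # living: PI- / Annexin V-

def apoptotic_fraction(events, annexin_cut, pi_cut):
    """Fraction of events in Q2 + Q3 (late plus early apoptotic)."""
    counts = Counter(
        classify_event(annexin_v, pi, annexin_cut, pi_cut)
        for annexin_v, pi in events
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no events to classify")
    return (counts["Q2"] + counts["Q3"]) / total

if __name__ == "__main__":
    # Hypothetical (Annexin V, PI) intensities in arbitrary units.
    events = [(10, 10), (500, 10), (500, 800), (10, 900), (20, 15)]
    print(apoptotic_fraction(events, annexin_cut=100, pi_cut=100))
```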

      • The experimental design for the in vitro experiments was unclear from the text. Consider including a schematic to show when cells were treated with STZ, EUG, and ML385.

      Thanks for your suggestion. The experimental design for the in vitro experiments of this study has been added in Figure 6A of the revised manuscript.

      • As stated in the Discussion, the use of the insulinoma line Min6 as a model instead of primary pancreatic beta cells is a clear limitation of the study. The mechanistic data would be stronger if validated on a more relevant system (eg. untransformed Islets).

      Thanks for your comments. The mouse insulinoma MIN6 cell line is a permanent cell line isolated from mouse islet β cell tumors, and it can reflect the functional changes of islet β cells. As mature islet cells, MIN6 cells have been widely utilized as an in vitro cellular model for diabetes research to investigate the functionality of β cells within pancreatic islets[1, 2, 12]. In this study, MIN6 cells were therefore used as the in vitro cell model. In our future studies, we will try to validate our findings using more relevant in vitro cell systems (e.g., islets).

      References:

      (1) WU M, CHEN W, ZHANG S, et al. Rotenone protects against β-cell apoptosis and attenuates type 1 diabetes mellitus [J]. Apoptosis, 2019, 24(11-12): 879-91.

      (2) LUO C, HOU C, YANG D, et al. Urolithin C alleviates pancreatic β-cell dysfunction in type 1 diabetes by activating Nrf2 signaling [J]. Nutr Diabetes, 2023, 13(1): 24.

      (12) CHEN H, LOU Y, LIN S, et al. Formononetin, a bioactive isoflavonoid constituent from Astragalus membranaceus (Fisch.) Bunge, ameliorates type 1 diabetes mellitus via activation of Keap1/Nrf2 signaling pathway: An integrated study supported by network pharmacology and experimental validation [J]. J Ethnopharmacol, 2024, 322: 117576.

      • The use of small molecule inhibitors such as ML385 can have unspecific effects. Genetic manipulation or the use of siRNAs to inhibit the NRF2 pathway would have been preferable for the in vitro experiments.

      Thanks for your constructive suggestion. ML385 is a commonly used and stable NRF2 inhibitor and has been used in a variety of disease studies[13-15]. The MIN6 cells utilized in this study were cultured under challenging conditions and exhibited a sluggish growth rate. Owing to the cytotoxicity associated with siRNA transfection reagents, a significant proportion of MIN6 cells died following transfection. Consequently, the small molecule inhibitor ML385 was employed in this investigation. In our future studies, we will try to validate our findings using siRNAs.

      References:

      (13) DANG R, WANG M, LI X, et al. Edaravone ameliorates depressive and anxiety-like behaviors via Sirt1/Nrf2/HO-1/Gpx4 pathway [J]. J Neuroinflammation, 2022, 19(1): 41.

      (14) WANG Z, YAO M, JIANG L, et al. Dexmedetomidine attenuates myocardial ischemia/reperfusion-induced ferroptosis via AMPK/GSK-3β/Nrf2 axis [J]. Biomed Pharmacother, 2022, 154: 113572.

      (15) LI J, DENG S H, LI J, et al. Obacunone alleviates ferroptosis during lipopolysaccharide-induced acute lung injury by upregulating Nrf2-dependent antioxidant responses [J]. Cell Mol Biol Lett, 2022, 27(1): 29.

      • The study proposes a mechanism in which EUG-induced disruption of KEAP1 and NRF2 interaction leads to NRF2 translocation to the nucleus and upregulation of proteins required to prevent oxidative stress. In Figure 6H it is unclear whether the nuclear NRF2 increases. Please add quantifications of the immunostainings.

      Thanks for your reminder. Figure 6J shows the quantifications of the immunostainings of NRF2 in the revised manuscript.

      • Some of the figure legends lack important information. In Figure 5A, 6E for instance, what is the protein expression normalized to?

      Thanks for your constructive suggestion. Protein normalization standardizes the signal from samples of different origin and properties so that protein content and expression can be compared across samples, and it is an essential step in Western blot (WB) experiments. β-Actin generally cannot be used as the internal reference for Western blots of nuclear protein. Lamin B was chosen because β-Actin, the usual internal reference, is not considered a nuclear protein. N-NRF2, as a nuclear protein, therefore requires Lamin B as the reference for normalization. The missing normalization information has been added to the WB figure legends in the revised manuscript.
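      The normalization described above can be sketched as a ratio of band intensities. This is a minimal illustration, assuming densitometry values are already available; the function names and all numbers are hypothetical and are not data from the study.

```python
def normalize_band(target_intensity, reference_intensity):
    """Return the target band intensity relative to the loading-control band."""
    if reference_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    return target_intensity / reference_intensity


def fold_change(normalized, control_normalized):
    """Express a normalized value relative to the control group."""
    return normalized / control_normalized


# Hypothetical densitometry readings (arbitrary units), for illustration only.
nuclear_nrf2 = {"Control": 1200.0, "STZ": 600.0}
lamin_b = {"Control": 1000.0, "STZ": 1000.0}  # nuclear loading control

normalized = {g: normalize_band(nuclear_nrf2[g], lamin_b[g]) for g in nuclear_nrf2}
relative = {g: fold_change(v, normalized["Control"]) for g, v in normalized.items()}
```

      Dividing by the reference band first removes loading differences between lanes; the second division only rescales the result so that the control group equals 1.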

      • Please acknowledge previous literature on the effects of EUG/clove oil in diabetes models. The meta-analytical review by Carvalho et al. (DOI: 10.1016/j.phrs.2020.105315) should be cited and discussed.

      Thanks for your suggestion. It has been cited and discussed in the revised manuscript.

      • Consider revising the text for grammar, language mistakes, and readability. The text is not always precise (e.g. in the explanation of gamma-H2AX in the results), does not explain terminology (e.g. the oxidative stress markers - line 204+205), or simplifies conclusions (e.g. "improved islet function" based on glucose tolerance test", line 129).

      Thanks for your comments. The above problems have been addressed in the revised manuscript. In addition, we have sent our manuscript to a professional English language editing company to improve the paper, and the editing certificate has been submitted as a supplementary document.

      • In the current format, some figures are out of focus. Please make sure to upload a high-quality version for publication.

      Thanks for your suggestion. High-quality versions of the figures have been uploaded. The file may have been compressed after upload because of its size, leaving the figures out of focus. All figures in this study have therefore been uploaded separately for download in the review system.

      Reviewer #2 (Recommendations For The Authors):

      Below are specific points of criticism on the experiments presented.

      (1a) There is no comparison among eugenol treatments with regards to fasting weight, blood glucose, water intake, food intake, and, crucially, OGTT. All three treatments appear to show very similar effects but has this been statistically assessed? Shown statistical significance of ketonuria between no and high eugenol treatments seems exaggerated.

      Thanks for your comments. EUG intervention has a dose-dependent effect on T1DM. According to Figure 1B-I, 20 mg/kg EUG has the best effect. Fasting body weight, blood glucose, water intake, food intake, and OGTT were statistically assessed in Figure 1 of the revised manuscript. In addition, we repeated the statistical analysis of ketonuria between the no-eugenol and high-eugenol treatments. We have also revised the description of eugenol's efficacy to be more objective.

      (b) ITT is not used to detect T1DM (line 126).

      Thanks for your suggestion. T1DM primarily manifests as pancreatic β cell damage and an absolute reduction in insulin secretion, resulting in disordered glucose metabolism in vivo. The oral glucose tolerance test (OGTT) measures a series of plasma glucose concentrations within 2 h after oral gavage of a defined amount of glucose. It is a standard method for evaluating an individual's ability to regulate blood glucose and for assessing islet β cell function. Insulin resistance is a reduced efficiency of insulin in promoting glucose uptake and utilization, whatever the cause; to keep blood glucose stable, the body compensates by secreting excess insulin, which leads to hyperinsulinemia. The insulin tolerance test (ITT) is commonly employed to detect insulin resistance in T2DM. However, the ITT experiment was found to have little relevance to T1DM. Therefore, the ITT experiment and the related description have been removed from the revised manuscript.

      (2) Here it is hard to reconcile the gradual increase of Ins protein levels in (STZ) and (STZ + increasing eugenol) samples with (a) results in 1 suggesting that the dose of eugenol does not significantly affect the outcome and (b) Ins expression, which is essentially undetectable in both STZ and STZ+EUG mice. A likely explanation is that EUG just postpones beta cell death. I assume that these analyses were done in week 10 but it is not stated.

      Thanks for your professional suggestion. Perhaps because the file was compressed, the gray values of the WB bands are not obvious, so INS expression cannot be seen clearly. In fact, STZ treatment significantly decreased INS expression compared with the Control group, and EUG treatment alleviated this decrease. However, because INS levels in the STZ and STZ + EUG groups differ greatly from those in the Control group, the INS bands in the STZ and STZ + EUG groups appear faint. As mentioned in Methods sections 4.12-4.13, our WB and PCR samples were from 10-week mice.

      (3) The γH2Ax stainings provided are weak and do not fully correspond to the quantitation - the 5 mg/Kg EUG treatment appears less severe than the 10 mg/Kg. In contrast, changes in the PCD pathway are convincingly demonstrated.

      Thanks for your reminder. γH2AX immunohistochemical staining was assessed only within the islets, and the quantification counted the number of brown-stained β cells rather than the brown area. The zoomed image of γH2AX staining shows that 10 mg/kg EUG produced a greater improvement than 5 mg/kg. γH2AX, as a marker of DNA damage, localizes to the nucleus and is absent from the cytoplasm. Therefore, in Figure 4C-D, we quantified the proportion of cells exhibiting brown staining. In Figure 4C, black arrows highlight the brown-stained islet β cells.

      (4) Is there a reason for looking at mRNA levels of Ho-1 but not KEAP1 or NQO-1 ? What is the expression of Nrf2 itself at the RNA level? Please give in the text what the abbreviations MDA, SOD, CAT GSH-Px stand for. Are these protein levels or activity assays? Units in the y-axis of graphs?

      Thanks for your constructive suggestion. The required KEAP1 and NQO-1 primers have been synthesized, and the relevant data have been added to the revised manuscript. The expression of Nrf2 itself at the RNA level is shown as T-NRF2 (total NRF2). The abbreviations MDA, SOD, CAT, and GSH-Px stand for malondialdehyde, superoxide dismutase, catalase, and glutathione peroxidase, respectively; this information has been added to the revised manuscript. These are activity assays performed on serum, and units have been added to the y-axes of the graphs in the revised manuscript.

      (5) The Ins levels in the culture medium of STZ + ML treated cells are much lower than the levels in STZ treated cells (6D). This is not consistent with the results of Ins cell content or Ins expression as stated (6B and D).

      Thanks for your careful review. The experimental samples in Figure 6C of the revised manuscript are proteins extracted from the cells of each group, while the experimental samples in Figure 6E are the cell supernatants of each group. ML385 is an inhibitor of NRF2 that effectively suppresses the NRF2 signaling pathway and aggravates MIN6 cell damage, resulting in lower INS expression in the STZ+ML385 group than in the STZ group in both Figure 6C and Figure 6E. Although the two sample types differ and the trends vary slightly, INS levels in the STZ+ML385 group are lower overall than those in the STZ group.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      It is evident that studying leukocyte extravasation in vitro is a challenge. One needs to include physiological flow, culture cells and isolate primary immune cells. Timing is of utmost importance and a reproducible setup essential. Extra challenges are met when extravasation kinetics in different vascular beds is required, e.g., across the blood-brain barrier. In this study, the authors describe a reliable and reproducible method to analyze leukocyte TEM under physiological flow conditions, including this analysis. That the software can also detect reverse TEM is a plus.

      Strengths:

      It is quite a challenge to get this assay reproducible and stable, in particular as there is flow included. Also for the analysis, there is currently no clear software analysis program, and many labs have their own methods. This paper gives the opportunity to unify the data and results obtained with this assay under label-free conditions. This should eventually lead to more solid and reproducible results.

      Also, the comparison between manual and software analysis is appreciated.

      We thank the Reviewer for their positive evaluation of our manuscript and for highlighting the value of obtaining more reproducible and unbiased results, as well as the detection of forward and reverse transmigration with UFMTrack.

      Weaknesses:

      The authors stress that it can be done in BBB models, but I would argue that it is much more broadly applicable. This is not necessarily a weakness of the study but more an opportunity to strengthen the method. So I would encourage the authors to rewrite some parts and make it more broadly applicable.

      We thank the Reviewer for this suggestion. In the revised version of our manuscript, we have now emphasized the broader applicability of UFMTrack to analyze the interaction of immune cells with 2-dimensional endothelial monolayers in various contexts in the abstract, introduction, and discussion sections.

      Reviewer #2 (Public Review):

      Summary:

      This paper develops an under-flow migration tracker to evaluate all the steps of the extravasation cascade of immune cells across the BBB. The algorithm is useful and has important applications.

      Strengths:

      Algorithm is almost as accurate as manual tracking and importantly saves time for researchers.

      We thank the Reviewer for this positive evaluation of our work.

      Weaknesses:

      Applicability can be questioned because the device used is 2D and physiological biology is in 3D. Comparisons to other automated tools was not performed by the authors.

      We thank the Reviewer for pointing our attention to these weaknesses in our manuscript.

      We have clarified in the revised manuscript that using 2D endothelial monolayer models in parallel laminar flow chambers is still a state-of-the-art methodology for studying the multi-step extravasation process of immune cells across endothelial monolayers under physiological flow by in vitro live cell imaging. These models provide excellent optical quality that is not yet achieved in 3D models. We have extended the introduction to emphasize the limitations of existing tools that motivated us to establish UFMTrack. We have furthermore extended the discussion section to highlight the features unique to our UFMTrack framework.

      Reviewer #3 (Public Review):

      Summary:

      The authors aimed to establish a faster and more efficient method of tracking steps of T-cell extravasation across the blood brain barrier. The authors developed a framework to visualize, recognize and track the movement of different immune cells across primary human and mouse brain microvascular endothelial cells without the need for fluorescence-based imaging. The authors succinctly describe the basic requirements for tracking in the introduction followed by an in-depth account of the execution.

      We thank the Reviewer for their positive evaluation of our manuscript and highlighting the value of label-free analysis of the multistep immune cell extravasation cascade with UFMTrack.

      Weaknesses and Strengths:

      Materials & methods and results:

      (1) The methods section also lacks details of the microfluidic device that the authors talk about in the paper. Under physiological shear stress, the T-cells detach from the pMBMEC monolayer, and are hence unable to be detected; however, this observation requires an explanation of why it occurs and of potential solutions to circumvent it to ensure physiologically relevant experimental parameters.

      We thank the Reviewer for pointing out this oversight. We have used a custom-made microfluidic device that has been published and described in detail before. This information has now been included in the Methods Section under Point 7, and the two references describing the flow chamber in depth are mentioned below and have been included in the manuscript.  

      Coisne Caroline, Ruth Lyck and Britta Engelhardt. 2013. Live cell imaging techniques to study T cell trafficking across the blood-brain barrier in vitro and in vivo. Fluids and Barriers of the CNS 10:7 doi: 10.1186/2045-8118-10-7; 21 January 2013

      Lyck R, Hideaki Nishihara, Sidar Aydin, Sasha Soldati and Britta Engelhardt. 2022. Modeling brain vasculature immune interactions in vitro. Angiogenesis, 2nd edition. Editors Patricia D’Amore and Diane Bielenberg. Cold Spring Harb Perspect Med doi: 10.1101/cshperspect.a041185

      T cell detachment is a physiologically relevant parameter besides T cell arrest, polarization, crawling, probing, and transmigration during the interaction with an endothelial monolayer. T cell detachment means that post-arrest, the T cell cannot engage adhesion molecules required for subsequent polarization and, eventually, transmigration. 

      (2) The author describes a method for debris exclusion using UFMTrack that eliminates objects of <30 pixels in size from analysis based on a mean pixel size of 400 for T lymphocytes. However, this mean pixel size appears to stem from in-vitro activated CD8 T cells, which rapidly grow and proliferate upon stimulation. In line with this, activated lymphocytes exhibit increased cytoplasmic area, making them appear less dense or “brighter” by phase microscopy compared to naïve lymphocytes, which are relatively compact and subsequently appear dimmer. Given this, it is not clear whether UFMTrack is sufficiently trained to identify naïve human lymphocytes in circulating blood, nor smaller, murine lymphocytes. Analysis of each lymphocyte subtype in terms of pixel size and intensity would be beneficial to strengthen the claim that UFMTrack can identify each of these populations. Additionally, demonstrating that UFMTrack can correctly characterize the behavior of naïve versus activated lymphocytes isolated from murine and human sources would strengthen the claim that UFMTrack can be broadly applied to study lymphocyte dynamics in diverse models without additional training

      We thank the Reviewer for the suggestion to evaluate more precisely the range of cell sizes that can be analyzed by our framework. We have included a visualization of the crawling cell sizes successfully analyzed by UFMTrack in Supplementary Figure 7. It demonstrates that human peripheral blood mononuclear cells, which are almost half the size of the activated mouse CD4 T cells used in these assays, can be successfully segmented, tracked, and analyzed with the UFMTrack framework. Thus, our UFMTrack framework is suitable for broad application to differently sized immune cells during their interaction with the endothelial cell monolayer under flow.
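      The size-based debris exclusion raised in this point (objects of fewer than 30 pixels are discarded) can be sketched as follows. This is a minimal illustration on a binary mask, not the UFMTrack implementation; the function names, the 4-connectivity choice, and the toy mask are assumptions made for the example.

```python
from collections import deque


def label_objects(mask):
    """Group foreground pixels of a binary mask into 4-connected objects.

    mask is a list of rows, each a list of 0/1 values.
    Returns a list of objects, each a list of (row, col) pixels.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append(pixels)
    return objects


def drop_debris(objects, min_area=30):
    """Keep only objects whose area is at least min_area pixels."""
    return [obj for obj in objects if len(obj) >= min_area]


# Toy mask (hypothetical): one 6x6 "cell" (36 px) and one 2x2 speck (4 px).
mask = [[0] * 12 for _ in range(12)]
for r in range(6):
    for c in range(6):
        mask[r][c] = 1
for r, c in ((9, 9), (9, 10), (10, 9), (10, 10)):
    mask[r][c] = 1

objects = label_objects(mask)
cells = drop_debris(objects, min_area=30)
```

      A fixed area threshold is simple, but it has to sit well below the smallest cell population of interest, which is the concern the Reviewer raises for naïve lymphocytes.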

      (3) Average precision was compared to the analysis of UFMTrack but it is unclear how average precision was calculated. This information should have been included in the methods section

      We thank the Reviewer for pointing our attention to the missing information. We have added a subsection, “Performance Analysis”, to the Materials and Methods section, where we describe the statistical methods and the performance metrics used to evaluate the UFMTrack framework.
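      For readers unfamiliar with the metric, one common (non-interpolated) definition of average precision can be sketched as below. The response does not state which variant the manuscript uses, so this is an assumption for illustration only; the "Performance Analysis" subsection of the manuscript is the authoritative description. The function name and the example detections are hypothetical.

```python
def average_precision(detections, n_ground_truth):
    """Non-interpolated average precision.

    detections: list of (confidence_score, is_true_positive) pairs, where a
    detection counts as a true positive if it was matched to a ground-truth
    object. n_ground_truth: total number of ground-truth objects.
    """
    if n_ground_truth <= 0:
        raise ValueError("n_ground_truth must be positive")
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            # Precision at the rank of each correctly detected object.
            precision_sum += true_positives / rank
    return precision_sum / n_ground_truth


# Hypothetical detections: two correct, one false positive, two true objects.
example = [(0.9, True), (0.8, False), (0.7, True)]
ap = average_precision(example, n_ground_truth=2)
```

      Ground-truth objects that are never detected contribute zero to the sum, so missed cells lower the score even when every reported detection is correct.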

      (4) CD4 and CD8 T cells exhibit distinct biology and interaction kinetics driven in part by their MHC molecule affinity and distinct receptor expression profiles. Thus, it is unclear why two distinct mechanisms of endothelial cell activation are needed to see differences between the populations.

      We thank the Reviewer for pointing out that different cytokine stimulations of endothelial cells were used in the assays that tested UFMTrack on CD4 and CD8 T cell interactions with the endothelial monolayer. The Reviewer is correct that CD4 and CD8 T cells use different mechanisms to cross the pMBMEC monolayer, as shown by us (doi: 10.1002/eji.201546251) and others, and that recognition of cognate antigen on MHC class I on pMBMECs will arrest CD8 T cells and lead to CD8 T-cell-mediated apoptosis (doi: 10.1038/s41467-023-38703-2). However, the focus of the present study was not on comparing CD4 and CD8 T cell interactions with the pMBMEC monolayer but on testing the suitability of UFMTrack for studying the different multi-step transmigration of these T cell subsets across the endothelial monolayer.

      (5) The BMECs are barrier tissues but were cultured on µdishes in this study. To study the transmigration of T-cells across the endothelium, the model would have been more relevant on a semi-permeable membrane instead of a closed surface.

      We understand the critique of the Reviewer, but laminar flow chambers with endothelial monolayers still provide a state-of-the-art and established methodology to study immune cell migration across endothelial monolayers by in vitro live cell imaging including endothelial cells forming the blood-brain barrier.  

      (6) Methods are provided for the isolation and expansion of human effector and memory CD4+ T cells. However, there is no mention of specific CD4+ T cell populations used for analysis with UFMTrack, nor a clear breakdown of tracking efficiency for each subpopulation. Further, there is no similar method for the isolation of CD8+ T cell compartments. A clear breakdown of the performance efficiency of UFMTrack with each cell population investigated in this study would provide greater insight into the software’s performance with regard to tracking the behavior and movement of distinct immune populations.

      We thank the Reviewer for this comment. Since a fair performance evaluation requires collecting reliable and consistent manual annotations, in this work we have performed such an analysis only for the mouse CD8 T-cell population migrating on the pMBMEC monolayer. We chose this population as a reference because it differs from the cell population on which the segmentation model was trained. It therefore indicates the level of performance to expect when immune cell types other than those used for model development are studied.

      (7) The results section is quite extensive and discusses details of establishment of the framework while highlighting both the pros and cons of the different aspects of the process, for example the limitation of the two models, 2D and 2D+T were highlighted well. However, the results section includes details which may be more fitting in the methods section.

      We thank the Reviewer for highlighting the extensive work carried out in the development of our UFMTrack framework. We decided to include in the results section only the description of key elements and design decisions taken when developing the framework, such as the need to include a time series of images for successful segmentation of the transmigrated cells. At the same time, the majority of implementational details can be found in the Supplementary Material.

      (8) A few statements in the results section lacked literary support, which was not provided in the discussion either, such as support for increased variance of T-cell instantaneous speed on stimulated vs non-stimulated pMBMECs. Another example is the enhancement of cytokine stimulation directed T-cell movement on the pMBMECs that the authors observed but failed to relay the physiological relevance of it. The authors don’t provide enough references for developments in the field prior to their work which form the basis and need for this technology.

      We thank the Reviewer for this comment and for asking for literature references. However, we cannot provide such references as these are original observations we made by employing the UFMTrack framework.  This shows that UFMTrack observes T-cell behaviors that have previously been overlooked. Their physiological relevance will have to be explored in separate studies. We have extended the introduction section to include the details on the existing methods developed in the field, as well as their weaknesses that motivated the development of the UFMTrack framework.

      (9) The rationale for use of OT-1 and 2D2-derived murine lymphocytes is unclear here. The OT-1 model has been generated to study antigen-specific CD8+ T cell responses, while the 2D2 model has been generated to recapitulate CD4 T cell-specific myelin oligodendrocyte glycoprotein (MOG) responses.

      To establish and test the UFMTrack framework, we have made use of the specific T-cell subsets and endothelial cell models we generally use within our research context. Especially for animal work, this is according to the 3R rules requesting to reduce animal experimentation.  

      Figures and text:

      (1) There are certain discrepancies and misarrangement of figures and text. For example, discussion of the effect of shear flow on T cell attachment as part of the introduction in figure 1 and then mentioning it in the text again in the results section as part of figure 4 is repetitive.

      We thank the Reviewer for pointing our attention to this misarrangement. We have adjusted the label of Figure 4 to emphasize that this effect is correctly captured by the UFMTrack.

      (2) Section IV, subsection 1 of the results section, refers to ‘data acquisition section above’ in line 279, however the said section is part of materials and methods which is provided towards the end of the manuscript.

      We thank the Reviewer for pointing our attention to this misarrangement. We have adjusted the text to reflect the correct chapter order.

      (3) There are figures in the manuscript that have not been referenced in the results section, for example, figure 3A and B. Figure 1 hasn’t been addressed until subsection 7 of materials and methods

      We thank the Reviewer for pointing our attention to this misarrangement. We have adjusted the text to refer to all figure panels and the clarification of the cell multiplicity estimation in the supplementary information section. References to Figure 1 were added in the introduction section to illustrate the in vitro under flow imaging setup as well as the typical T cell behaviors in such experiments.

      (4) A lack of significance but an observed trend of increased variance of T cell instantaneous speed is reported in line 296-298; however, the graph (figure 4G) shows a significant change in instantaneous speed between non-stimulated and TNFα-stimulated systems. This is misleading to the readers.

      We thank the Reviewer for pointing our attention to this discrepancy. We have expanded the text to indicate a low statistical significance for the TNF and no significance but just a trend for the IL1-beta conditions.
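      The quantity under discussion, the variance of T-cell instantaneous speed, can be sketched from track coordinates as follows. This is a minimal illustration, assuming positions in micrometres sampled at a fixed frame interval; the function name, the units, and the example track are hypothetical and do not come from the manuscript.

```python
import math
from statistics import variance


def instantaneous_speeds(track, frame_interval_s):
    """Speed between consecutive frames of a track.

    track: list of (x, y) positions, one per frame (assumed in micrometres).
    frame_interval_s: time between frames in seconds.
    Returns one speed per frame-to-frame step, in position units per second.
    """
    if frame_interval_s <= 0:
        raise ValueError("frame interval must be positive")
    return [
        math.hypot(x1 - x0, y1 - y0) / frame_interval_s
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]


# Hypothetical track: the cell moves, pauses for one frame, then moves again.
track = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (6.0, 8.0)]
speeds = instantaneous_speeds(track, frame_interval_s=10.0)
speed_variance = variance(speeds)  # sample variance of the per-step speeds
```

      A cell that alternates between pausing and moving has the same mean speed as a steadily crawling one but a larger variance, which is why the two statistics are reported separately.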

      (5) The authors talk about three beginner experimenters testing the manual T cell tracking process but figure 5 only showcases data from two experimenters without stating the reason for excluding experimenter 1.

      We thank the Reviewer for pointing our attention to this ambiguity. While both the migration analysis and the manual cell tracking were performed by all three beginner experimenters, the cell tracking data for the first one was unfortunately lost due to a hardware failure.

      Discussion:

      (1) While the discussion captures the major takeaways from the paper, it lacks relevant supporting references to relate the observation to physiological conditions and applicability.

      This study is not about the physiological relevance of the microfluidic devices and immune cells used but rather about advancing methodology to analyze dynamic immune cell behavior on endothelial monolayers under physiological flow. Therefore, the discussion does not extend to comparing the physiological relevance of the specific in vitro models employed in this study.   

      (2) The discussion lacks connection to the results since the figures were not referenced while discussing an observed trend

      We thank the Reviewer for pointing our attention to this misarrangement. We have included the references to the relevant figures as well as supporting references.

      (3) The authors briefly looked into mouse and human BMECs and their individual interaction with T-cells, but don’t discuss the differences between the two, if any, that challenged their framework.

      We thank the Reviewer for pointing our attention to this weakness. We have added to the discussion section clarifications on the challenges of analyzing the T cell interactions with the HBMEC and the BMDM interactions with the pMBMEC monolayer.

      (4) Even though the imaging tool relies on differences in appearance for detection, the authors talk about the lack of feasibility of detecting transmigration of BMDMs due to their significantly different appearance. The statement lacks a problem-solving approach to discuss how and why this was the case.

      We thank the Reviewer for pointing our attention to this weakness and apologize for the misleading explanation of the problem of analyzing the BMDM sample. Since the transmigrated part of the macrophages differs in appearance from a transmigrated part of a T cell, its detection by a Deep Neural Network trained on the T cell data is worse than that for the T cells. At the same time, the detection performance before the transmigration is sufficient for the BMDM migration analysis. The potential approaches to alleviate this are added to the discussion section.

      Relevance to the field:

      Utilizing the framework provided by the authors, the application can be adapted and/or utilized for visualizing a range of different cell types, provided they are different in appearance. However, this would require extensive changes to the script and won’t be adaptable in its current form.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors should announce in the abstract that the software analysis Track is downloadable and free to use for all researchers. They may consider providing some sort of helpdesk, although I realize that that may run into too much time.

      As said above, they stress that it can be done in BBB models, but I would argue that it is much more broadly applicable.

      We thank the Reviewer for these suggestions. We have emphasized the broader applicability of UFMTrack in the abstract and pointed out the public availability of the code and data.

      Can they add an experiment that shows that it also works for neutrophils for example? I understand that on paper yes it should work, but the neutrophils are of course faster etc.

      This is an excellent suggestion, but we tested UFMTrack within the current framework of ongoing research, which does not include the investigation of neutrophil transmigration across endothelial monolayers.  

      Also, the combination of different leukocytes in one TEM assay would really be a step forward. If the software can detect different-sized leukocytes, then this should be possible.

      We thank the Reviewer for this suggestion. We have added Supplementary Figure 7, demonstrating the range of cell sizes that were successfully analyzed by the UFMTrack framework throughout our manuscript. We also added a statement to the discussion that according to this data, “simply by discriminating cells by size, it is possible to extend UFMTrack to study the interaction of several types of immune cells migrating on top of a cellular monolayer under flow.”
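      The idea of discriminating cell types by size can be sketched as a threshold on the median area of each track. This is a minimal illustration and not part of UFMTrack; the function name, the threshold, and the per-frame areas are hypothetical.

```python
from statistics import median


def classify_by_size(track_areas, threshold_px):
    """Assign each track to a size class using its median area.

    track_areas: dict mapping a track id to a list of per-frame areas in pixels.
    threshold_px: area separating the two populations (an assumed value that
    would have to be chosen from the measured size distributions).
    """
    return {
        track_id: "large" if median(areas) >= threshold_px else "small"
        for track_id, areas in track_areas.items()
    }


# Hypothetical per-frame areas for two tracks.
track_areas = {"t1": [380, 410, 400], "t2": [190, 210, 200]}
classes = classify_by_size(track_areas, threshold_px=300)
```

      Using the median over the whole track rather than a single frame makes the assignment less sensitive to frames in which a cell is partly transmigrated or poorly segmented.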

      Extra challenges: can the method also discriminate between paracellular and transcellular migration modes? In particular for T-cells this is known to happen.

      We thank the Reviewer for this suggestion. We have added this to the potential applications of UFMTrack in the discussion section. While this differentiation is not feasible when relying solely on the phase-contrast imaging data, UFMTrack can simplify the analysis by automatically predicting the transmigration locations, which can then be compared with fluorescence data of junctional labels.

      Reviewer #2 (Recommendations For The Authors):

      This paper develops an under-flow migration tracker to evaluate all the steps of the extravasation cascade of immune cells across the BBB. The algorithm is useful and has important applications. There are several points that need to be addressed, particularly about the claims made by the authors.

      Please see the comments below for more details:

      • Lines 88-92: Add a citation for the characteristics of the BBB as a barrier

      We have added two references accordingly.  

      • Lines 94-95: Can the authors indicate what models were used for these studies and how those compare to their in vitro model? In addition, can the authors say whether T cells were manually tracked in this study to translate results to the clinic and whether the results were successful when translated to the clinic? This may enhance the argument that automatic trackers are needed if the translation was not 100% successful

      This introductory paragraph summarizes in vivo and in vitro observations from several laboratories. Although these studies include manual tracking of T cells, they do not necessarily distinguish all sequential steps of the multi-step T cell transmigration cascade. Thus, automated tracking may provide additional insights, allowing for increased translation of findings to the clinic.  

      • Lines 96-98: Citing the work of Roger Kamm and Noo Li Jeon would be helpful here as they pioneered these BBB microfluidic models and have protocol papers on how to build them and how to use them for cancer cell extravasation studies. Roger Kamm has also worked on several extravasation studies with neutrophils, monocytes, and PBMCs from 3D vasculatures in microfluidic devices, under flow using pressurized fluid or recirculating pumps. Mentioning those would be helpful as they are directly related to what the authors are presenting in their paper.

      We thank the Reviewer for this comment, and we consider the work of Roger Kamm and Noo Li Jeon as very valuable for the field. However, these authors have focused on developing functional 3D microfluidic devices, including, e.g., all cells of the neurovascular unit which is not the focus of this present study that solely employed parallel flow chamber devices and endothelial monolayers.  

      • Lines 110-116: Can the authors comment on the use of ImageJ or similar automatic tracking tools and how these compare to the under-flow migration tracker developed in this paper? Several groups use ImageJ to track cellular migration successfully and in an automatic manner with short intervals between each frame. One paper that comes to mind is Chen et al: DOI: 10.1073/pnas.1715932115 where neutrophil migration in 3D was assessed with ImageJ in microfluidic devices of the vasculature. If the authors can highlight differences between their tool and what is currently available and used for automatic tracking (e.g. ImageJ), this would help in understanding the advantages of the migration tracker developed in this paper.

      • Lines 118-121: Add citations for the current state of the art for T cell extravasation tracking

      We thank the Reviewer for these suggestions. We have extended the introduction to add more details on the available tools for tracking migrating immune cells and their limitations, as well as the discussion section to emphasize the features unique to the developed UFMTrack framework.

      • Figure 1: The device used by the authors is considered to be a 2D microfluidic device with a monolayer of mouse brain endothelial cells. I would recommend the authors to carefully revise the claims made in the paper to mention that this is a 2D device as opposed to a 3D device, in order to not mislead readers who may be expecting these analyses to be performed in 3D vasculatures.

      We thank the Reviewer for this suggestion. We now mention the two-dimensional nature of the employed BBB model in the summary.

      • Figure 1: The T cells used in this study are not fluorescently-labeled but the authors mention that this is an issue from current state-of-the-art tools. I would recommend that the authors remove this point as being an issue because it is not addressed in their paper. The T cells are also not labeled in this study so this limitation of other systems is not addressed in this paper.

      We apologize to the Reviewer as we do not understand this question. Many experimental conditions do not allow the study of fluorescently tagged T cells. Therefore, UFMTrack is tailored to follow and analyze T cells and other immune cells during their interaction with endothelial monolayers independently of a fluorescent tag.  

      • Figure 1: Was the shear stress controlled manually with a syringe? Or with the use of a pressure controller? I would clarify this aspect and discuss human errors that can be introduced from manually controlling the pressure applied to the monolayer.

      We thank the Reviewer for pointing our attention to this ambiguity. We have added a mention of the automated syringe pump used to control the shear stress in the text where the values of shear stress applied to the sample are first mentioned.

      • Figure 1: Does T cell attachment occur within the first 5 minutes? Can the authors comment on how they chose this timeline and the percentage of T cells that are washed off at the second step at 1.5 dynes/cm^2? Is 30 seconds enough to ensure all the non-adhered T cells are washed off with 1.5 dynes/cm^2?

      Superfusion of the T cells over the endothelial monolayer is performed under 0.5 dynes/cm^2 to allow the T cells to settle on the endothelial cell monolayer under flow. After increasing to physiological flow, non-adherent T cells detach within 30 seconds, as described by the Reviewer. We have included in the Methods Section under Point 7 the references describing in depth the design of the flow chamber device and the methods used here.  

      • Line 154: How many images were used in the training vs. testing dataset for T cell migrations?

      We thank the Reviewer for pointing our attention to this missing information. We have added the sizes of the training and validation datasets. Specifically, the 226 MPix of available imaging data was split into 154 MPix training and 37 MPix validation sets. The gap in between was introduced to avoid a correlation between validation and training set that would compromise the performance evaluation.
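
      The split described above can be sketched as follows. This is a minimal illustration only, assuming the imaging data are laid out along a single axis measured in MPix and that the unused gap sits between the training and validation regions. The function name `split_with_gap` and the range representation are hypothetical and not part of UFMTrack.

```python
def split_with_gap(total, train, val):
    """Split a contiguous block of data into a training region and a
    validation region separated by an unused guard gap.

    All sizes share one unit (here MPix). The layout along a single axis
    is an assumption made for illustration.
    Returns (train_range, val_range, gap) with half-open ranges.
    """
    gap = total - train - val
    if gap < 0:
        raise ValueError("training and validation sizes exceed the total")
    train_range = (0, train)           # first `train` units
    val_range = (train + gap, total)   # last `val` units, after the gap
    return train_range, val_range, gap


# Sizes quoted in the response above (MPix).
train_range, val_range, gap = split_with_gap(total=226, train=154, val=37)
print(train_range, val_range, gap)
```

      With the sizes quoted above, the unused gap amounts to 226 - 154 - 37 = 35 MPix.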

      • Are the supplementary videos at real speed or accelerated?

      We thank the Reviewer for pointing our attention to this missing information. The videos are sped up by a factor of 96. We have added this information to the Supplementary video descriptions.  

      • Lines 208-216: Can the authors comment on how their initial adhesion timeframe of 30 sec before starting the recording at 5.5 min affects the number of T cells with rapid displacement? 30 seconds may not be enough to ensure T cells have adhered to the endothelium

      Please see our comment above. The methodology used in the present assays has been set up and validated in numerous publications. We have included in the Methods Section under Point 7 the references describing in depth the design of the flow chamber device and the methods used here.  

      • Lines 275-277: Was the number of testing images 18? Can the authors comment on how this compares to training dataset size and whether these numbers are enough to achieve robust results?

      We apologize for this ambiguity in our manuscript. The framework was evaluated on 18 imaging datasets, each corresponding to 32 minutes of recording, not 18 images. We have added this clarification to the “CD4+ T cell analysis” subsection. The total size of these datasets is 18 datasets × 191 timeframes/dataset × 9.9 MPix/frame ≈ 34 GPix.

      • Figure 4B: Can the authors add statistics here? Individual datapoints on the error bars would be helpful too. 

      We thank the Reviewer for pointing our attention to this weakness. The data corresponds to the statistical errors as evaluated based on all cells in the 18 datasets. We have added the total number of cells in each of the endothelium stimulation conditions to the text.

      • Figure 4C-J: Can the authors put individual datapoints here as well and explain whether they considered each T cell to be one datapoint or each endothelium (averaging all T cells) to be one datapoint? 

      We thank the Reviewer for this suggestion. However, adding about one thousand points corresponding to each cell would be impractical. We therefore present the distributions of the metrics evaluated from the data as histograms on the violin plots instead of swarm plots.

      • Figure 4: Did the authors wash the monolayers before introducing T cells? Soluble unbound cytokines may still be present and there are two different questions that would be studied here: “Is the inflamed endothelium affecting T cell migration?” (if washing was performed) or “Is T cell and microenvironmental inflammation affecting T cell migration?” (if no washing was performed)

      The endothelial monolayers are “washed” by starting the flow in the flow chamber device before the T cells are superfused over the endothelial monolayer. We agree that our flow chamber device combined with UFMTrack will allow all these questions to be addressed.

      • Figure 4I: Are all the T cells decelerating? (negative AM speed)

      We thank the Reviewer for this question. The cells are moving along the flow, which, in our experiments, is from left to right. The vector of speed is thus pointing against the x-axis, and thus the AM speed is negative.

      • Lines 302-306: Please explain how this compares to ImageJ or similar trackers that can achieve similar outputs. 

      We thank the Reviewer for this question. We have added a statement in the “T-cell tracking” section emphasizing that standard trackers are incapable of correctly capturing large displacements.

      • Lines 306-309: It is not lower for TNF stimulation though. How do the authors address this? TNF is also a pro-inflammatory cytokine.

      We have previously shown that stimulation of pMBMECs with IL-1 and TNF-α induces different cell surface levels of ICAM-1 and VCAM-1, which will influence T cell behavior on the pMBMEC monolayer.  

      • Lines 313-315: Could this be because the monolayer was not washed and soluble cytokines affected T cell response directly?

      Please see our answer to lines 306-309.  

      • Lines 319: Please cite Roger Kamm and Noo Li Jeon’s papers on BBB models with human BMECs, pericytes and astrocytes in 3D microfluidic devices.

      We thank the Reviewer again for pointing out these studies. As mentioned above, as our present study does not explore 3D models of the BBB, we think it does not fit into the framework of our study to elaborate on 3D models of the BBB. In addition, this would require the inclusion of a discussion of the work of others like, e.g., Peter Searson and others.  

      • Figure 5: Several statistics are missing from parts of the figure. Please add those.

      We apologize – but we do not understand which statistical analysis the Reviewer is missing from this Figure.  

      • Can the authors comment on the number of T cells perfused over the monolayer and if this ratio of T cells to endothelial cells makes physiological sense? Too many T cells may result in endothelium inflammation and increased diapedesis.

      The number of T cells used to superfuse the endothelial monolayer was tested to avoid aggregation of T cells in suspension and thus artificial interactions with the endothelial monolayer. T cell behavior on the pMBMEC monolayer remains the same over a 10-fold dilution range.  

      • Lines 381-383: How does this compare to analyses that look at the cross-section of the endothelium? It is difficult to assess transmigration looking at the top view of the endothelium. Perhaps, cross-section assessments will identify differences in manual vs. automatic tracking.

      There is, to the best of our knowledge, no microscopic device that would allow in vitro live cell imaging of a live endothelial monolayer (that is, in the presence of tissue culture medium) from the side at a resolution sufficient to define transmigration. Our current study rather shows that UFMTrack can distinguish cells moving above or below the endothelial monolayer.  

      • Figure 5J: This is probably the most important argument of the paper. If the authors can show statistical differences in their graph, this would greatly help convince readers that this tool is necessary and actually computationally efficient compared to manual work by researchers.

      We thank the Reviewer for this suggestion. However, comparing a single data point for the automated measurement with those of four manual analysts is not statistically sound. We believe that Figure 5K clearly shows the five-fold difference in analysis speed compared to manual analysis. More importantly, though, the automated analysis consumes machine time, removing the need for the experimenter to invest even 1/5th of the original analysis time.

      • Figure 6: Did the authors use autologous immune cells and endothelial cells? This is particularly relevant with the use of human-derived T cells (line 436) on the BMEC monolayer. Can the authors comment on non-self reactivity by the T cells encountering BMEC from another human subject?

      Autologous T cell interaction with BMECs would only be possible when using hiPSC-derived EECM-BMECs and the T cells from the same individual. All other experimental frameworks will not include autologous interactions. This is the experimental framework used by most authors studying immune cell interactions with commercially available donors. We have not studied alloreactive interactions in our assays and thus cannot further comment.  

      • Figure 6M,N,O: How does this compare to ImageJ for tracking of fluorescent cells? I recommend the authors to try that, at least for this section, as this may enhance their argument for their tool vs. standard tools like ImageJ if success rates are higher for their tool.

      We thank the Reviewer for this suggestion. We have included a note on the analysis of the fluorescent datasets using the TrackMate plugin for ImageJ, performed previously in our lab, in the “Human T cells on immobilized recombinant BBB adhesion molecules” subsection.

      • Figure 6: Please put individual datapoints on the bar or violin plots where they are missing.

      We thank the Reviewer for this suggestion. However, adding about one thousand points corresponding to each cell would be impractical. We therefore present the distributions of the metrics evaluated from the data as histograms on the violin plots instead of swarm plots.

      • Lines 467-471: This argument is important and should be mentioned earlier in the introduction.

      Another point that can be mentioned is the application of this platform to imaging modalities in vivo (mouse or human) given that there is no fluorescent staining in these cases. This review may be relevant: https://doi.org/10.1002/jcb.10454

      We thank the Reviewer for this suggestion. We have clarified in the introduction that UFMTrack does not require fluorescent labels of the imaged migrating cells and relies solely on the phase contrast imaging data.

      • Discussion: Please address a few more potential applications to this study. One can be cancer and immune infiltration.

      We thank the Reviewer for this suggestion. We have elaborated on additional potential applications in the discussion section.

      Reviewer #3 (Recommendations For The Authors):

      (1) Line 327-328: The authors talk about ‘As we have previously shown…pMBMEC monolayers differs between CD4+ and CD8+ cells…’. Where was this shown? If it was in a previously published article, please provide a reference.

      We have added these missing references.  

      (2) Line 353: Please provide clear location on where to find the associated information instead of stating ‘see below’.

      We thank the Reviewer for pointing our attention to this ambiguity. We have corrected the phrase to “see next paragraph”.

      (3) Line 439: Please correct the acronym to BMECs

      We thank the Reviewer for pointing our attention to this typo. We have corrected it.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors employed direct RNA sequencing with nanopores, enhanced by 5' end adaptor ligation, to comprehensively interrogate the human transcriptome at single-molecule and nucleotide resolution. They conclude that cellular stress induces prevalent 5' end RNA decay that is coupled to translation and ribosome occupancy. Contrary to the literature, they found that, unlike typical RNA decay models in normal conditions, stress-induced RNA decay is dependent on XRN1 but does not depend on the removal of the poly(A) tail. The findings presented are interesting but a substantial amount of work is needed to fully establish these paradigm-shifting findings.

      Strengths:

      These are paradigm-shifting observations using cutting-edge technologies.

      Weaknesses:

      The conclusions do not appear to be fully supported by the data presented.

      Our response to the reviewer comments is provided at the end of this document in the section "Recommendations For The Authors"

      Reviewer #2 (Public Review):

      In the manuscript "Full-length direct RNA sequencing uncovers stress-granule dependent RNA decay upon cellular stress", Dar, Malla, and colleagues use direct RNA sequencing on nanopores to characterize the transcriptome after arsenite and oxidative stress. They observe a population of transcripts that are shortened during stress. The authors hypothesize that this shortening is mediated by the 5'-3' exonuclease XRN1, as XRN1 knockdown results in longer transcripts. Interestingly, the authors do not observe a polyA-tail shortening, which is typically thought to precede decapping and XRN1-mediated transcript decay. Finally, the authors use G3BP1 knockout cells to demonstrate that stress granule formation is required for the observed transcript shortening.

      The manuscript contains intriguing findings of interest to the mRNA decay community. That said, it appears that the authors at times overinterpret the data they get from a handful of direct RNA sequencing experiments. To bolster some of the statements additional experiments might be desirable.

      A selection of comments:

      (1) Considering that the authors compare the effects of stress, stress granule formation, and XRN1 loss on transcriptome profiles, it would be desirable to use a single-cell system (and validated in a few more). Most of the direct RNAseq is performed in HeLa cells, but the experiments showing that stress granule formation is required come from U2OS cells, while short RNAseq data showing loss of coverage on mRNA 5'ends is reanalyzed from HEK293 cells. It may be plausible that the same pathways operate in all those cells, but it is not rigorously demonstrated.

      We agree with the reviewer that performing all experiments in a single cell system would be desirable. Presently, our core findings on 5’ RNA shortening are all obtained in HeLa cells: the identification of 5’ RNA shortening, the reliance of shortening on XRN1 (shown through XRN1 silencing), the suppression of shortening by translation inhibition, and now the relationship between 5’ shortening and deadenylation/decapping through experiments described further below. Our use of other cell lines is primarily to show that 5’ shortening is a general phenomenon, and we have now done this for U2OS cells, HEK293 cells, and primary 3T3 cells from mouse. 

      Regarding stress granule formation, we are unfortunately restricted by the lack of available well-characterized resources. The DDG3BP1/2 U2OS is a well-characterized cell line that has been extensively used for stress granule-related experiments. We have therefore opted to use it and performed experiments to verify both the occurrence of stress-induced RNA shortening and its rescue in the absence of stress granules. The reproducibility and breadth of the cell lines used in our analysis make us confident in the generality of our findings.

      (2) An interesting finding of the manuscript is that polyA tail shortening is not observed prior to transcript shortening. The authors would need to demonstrate that their approach is capable of detecting shortened polyA tails. Using polyA purified RNA to look at the status of polyA tail length may not be ideal (as avidity to oligodT beads may increase with polyA tail length and therefore the authors bias themselves to longer tails anyway). At the very least, the use of positive controls would be desirable; e.g. knockdown of CCR4/NOT.

      We thank the reviewer for their comment. Previous studies, using in vitro transcribed RNA molecules, have shown that direct RNA sequencing can capture and quantify poly(A) tails of varying lengths (Krause et al. 2019). Specifically, a range of 10 to 150 nt has been tested and a high concordance between known and dRNA-Seq determined values was observed. Both tailfindR and nanopolish (used in this work) showed high poly(A) tail estimation accuracy.

      Regardless, we agree with the reviewer that our method depends on poly(A) tail capture and thus may be incomplete for fully quantifying poly(A) length changes. We therefore opted to replace these data and instead follow this and other reviewers’ suggestions and perform experiments following knockdown of CCR4/NOT using cells expressing a catalytically inactive CNOT8 (CNOT8*) dominant negative mutant (Chang et al. 2019). Our new data show that stress-induced 5’ end decay is indeed not dependent on prior removal of the poly(A) tail. Specifically, we find that transcript shortening is still observed upon oxidative stress in cells expressing CNOT8* compared to control cells. We present these new results in Fig. 3 and Sup. Fig 3. 

      (3) The authors use a strategy of ligating an adapter to 5' phosphorylated RNA (presumably the breakdown fragments) to be able to distinguish true mRNA fragments from artifacts of abortive nanopore sequencing. This is a fantastic approach to curating a clean dataset. Unfortunately, the authors don't appear to go through with discarding fragments that are not adapter-ligated (presumably to increase the depth of analysis; they do offer Figure 1e that shows similar changes in transcript length for fragments with adapter, compared to Figure 1d). It would be good to know how many reads in total had the adapter. Furthermore, it would be good to know what percentage of reads without adapters are products of abortive sequencing. What percentage of reads had 5'OH ends (could be answered by ligating a different adapter to kinase-treated transcripts). More read curation would also be desirable when building the metagene analysis - why do the authors include every 3'end of sequenced reads (their RNA purification scheme requires a polyA tail, so non-polyadenylated fragments are recovered in a non-quantitative manner and should be discarded).

      We thank the reviewer for appreciating our approach. The reviewer is correct that we do not discard reads that are not adapter-ligated. As the reviewer correctly mentions, this is to increase the sequencing depth. We have found that the ligation efficiency is very low, ~1-2% of total reads (now in Sup. Table 1), across all libraries, and so the percentage of REL5-ligated reads does not directly indicate the total amount of non-artifactual 5’ ends. Instead, we use these REL5-ligated reads as a subset of our data for which we have extremely high confidence in the true 5’ end. Our results show that non-ligated reads display the same length distribution as ligated ones, and that the results are reproducible regardless of read selection (e.g. Fig. 1c, e, Sup. Fig. 1k, l, Fig. 3b, c). This strong concordance between REL5-ligated and non-ligated reads suggests that our conclusions on 5’ end shortening are not substantially influenced by abortive sequencing or other artefactual creation of 5’ shortening. We have modified the text to clarify these points and have added plots using only ligated molecules for the relevant figures where this was not previously done (Sup. Fig. 1l, 3c).
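
      One way to put a number on the concordance between two read-length distributions is sketched below. This is an illustration under stated assumptions, not the comparison used in the manuscript: the two-sample Kolmogorov-Smirnov statistic is our choice of summary, the helper name `ks_statistic` is hypothetical, and the read lengths in the example are made up.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute
    difference between the empirical CDFs of the two samples.
    0.0 means identical empirical distributions, 1.0 means no overlap."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest = 0.0
    # The empirical CDFs only change at observed values, so it is
    # sufficient to evaluate the difference at those points.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest


# Made-up read lengths (nt) for illustration only.
ligated = [850, 900, 1200, 1500]
non_ligated = [840, 910, 1180, 1490, 1510]
print(ks_statistic(ligated, non_ligated))
```

      A value close to 0 would be in line with the two read populations sharing a length distribution; with only ~1-2% of reads ligated, the ligated sample is small, so the statistic should be read together with the sample sizes.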

      We agree with the reviewer that non-polyadenylated reads could be discarded from metagene analysis and we have performed this change in the revised version. Our conclusions following removal of non-polyadenylated reads remain unchanged (Sup. Fig. 1g).

      (4) The authors should come to a clear conclusion about what "transcript shortening" means. Is it exonucleolytic shortening from the 5'end? They cannot say much about the 3'ends anyway (see above). Or are we talking about endonucleolytic cuts leaving 5'P that then can be attacked by XRN1 (again, what is the ratio of 5'P and 5'OH fragments; also, what is the ratio of shortened to full-length RNA)?

      We thank the reviewer for their suggestion. We have performed additional experiments to investigate the role of deadenylation and decapping by expressing dominant negative forms of the NOT8 deadenylase (NOT8*) and the DCP2 decapping enzyme (DCP2*) in HeLa cells. Our results show that neither expression of NOT8* nor DCP2* can inhibit stress-induced transcript shortening following arsenite treatment (Fig. 3e-f). These new data suggest that neither deadenylation nor decapping is required for stress-induced RNA decay. Instead, our data are more compatible with endonucleolytic cleavage as the most likely mechanism for stress-induced RNA decay. We have incorporated these results in the text and present them in Fig. 3 and Sup. Fig. 3.

      (5) The authors should clearly explain how they think the transcript shortening comes about. They claim it does not need polyA shortening, but then do not explain where the XRN1 substrate comes from. Does their effect require decapping? Or endonucleolytic attacks?

      Please also refer to our answer to the previous comment (#4). Collectively, our results from a) the dominant negative expression of NOT8* and DCP2* that show no effect on stress-induced shortening and b) the rescue of transcript length upon translation initiation inhibition, indicate a potential endonucleolytic mechanism as a mediator of stress-induced RNA decay. However, we believe that extensive, further studies currently beyond the scope of this work, will be required to discover the nuclease and to dissect the exact molecular mechanisms that define the 5' ends of mRNAs upon stress-induced decay. We now discuss these points in the discussion.

      (6) XRN1 KD results in lengthened transcripts. That is not surprising as XRN1 is an exonuclease - and XRN1 does not merely rescue arsenite stress-mediated transcript shortening, but results in a dramatic transcript lengthening.

      The reviewer raises an intriguing point. Additional analysis of the data has shown that, in unstressed cells, XRN1 KD in fact leads to a modestly significant reduction in overall transcript length (Fig. 3b, c). This could possibly be the result of an accumulation of intermediate cleavage products normally expected to be degraded by XRN1, as previously described (Pelechano, Wei, and Steinmetz 2015; Ibrahim et al. 2018).

      Instead, we find that under stress, XRN1 KD cells show a transcript length distribution that is almost identical to that of unstressed cells and significantly higher than that of siCTRL stressed cells (Fig. 3b, c). These results indicate that in the absence of XRN1, stress-induced decay is largely abolished. As the reviewer correctly points out, this seems to affect the majority of RNAs, which we believe is evidence of the general lack of specificity in the mechanism. Nevertheless, we find that transcripts that are the primary substrates of stress-induced shortening are substantially more lengthened than all other transcripts (Fig. 3e). This indicates that transcripts primarily affected by stress-induced decay are also lengthened the most in the absence of XRN1, and to an even greater degree than expected from general XRN1 KD effects.

      Reviewer #3 (Public Review):

      The work by Dar et al. examines RNA metabolism under cellular stress, focusing on stress-granule-dependent RNA decay. It employs direct RNA sequencing with a Nanopore-based method, revealing that cellular stress induces prevalent 5' end RNA decay that is coupled to translation and ribosome occupancy but is independent of the shortening of the poly(A) tail. This decay, however, is dependent on XRN1 and enriched in the stress granule transcriptome. Notably, inhibiting stress granule formation in G3BP1/2-null cells restores the RNA length to the same level as wild-type. It suppresses stress-induced decay, identifying RNA decay as a critical determinant of RNA metabolism during cellular stress and highlighting its dependence on stress-granule formation.

      This is an exciting and novel discovery. I am not an expert in sequencing technologies or sequencing data analysis, so I will limit my comments purely to biology and not technical points. The PI is a leader in applying innovative sequencing methods to studying mRNA decay.

      One aspect that appeared overlooked is that poly(A) tail shortening per se does lead to decapping. It is shortening below a certain threshold of 8-10 As that triggers decapping. Therefore, I found the conclusion that poly(A) tail shortening is not required for stress-induced decay to be somewhat premature. For a robust test of this hypothesis, the authors should consider performing their analysis in conditions where CNOT7/8 is knocked down with siRNA.

      We agree with the reviewer. We have now performed experiments in cells expressing a well-characterized catalytically inactive dominant negative NOT8 isoform (NOT8*) (Chang et al. 2019). Our new data show that stress-induced decay still occurs in cells expressing NOT8*. These results confirm our findings that stress-induced decay does not require deadenylation. We present these new results in Fig. 3 and Sup. Fig. 3. 

      Similarly, as XRN1 requires decapping to take place, it necessitates the experiment where a dominant-negative DCP2 mutant is over-expressed.

      We agree with the reviewer and have performed this experiment as requested. Expression of a dominant negative DCP2 (DCP2*) isoform (Loh, Jonas, and Izaurralde 2013) in HeLa cells showed that decapping is also not required for stress-induced decay. We present these new results in Fig. 3 and Sup. Fig. 3.

      Are G3BP1/2 stress granules required for stress-induced decay or simply sites for storage? This part seems unclear. A very worthwhile test here would be to assess in XRN1-null background.

      We thank the reviewer for their comment. Our data show that stress-induced decay is not observed in DDG3BP1/2 U2OS cells, unable to form stress granules (Fig. 6). This result suggests that G3BP1/2 SGs are either a) required for 5’ RNA shortening or b) preserve partially fragmented RNAs that would otherwise be rapidly degraded. We find the second option unlikely for two reasons. First, even if the fragments were rapidly degraded, we would still expect to find evidence of their presence in our data. However, Fig. 6f shows that the length distribution of DDG3BP1/2 U2OS cells, with and without arsenite, are almost identical, thus arguing against the presence of such a pool of rapidly degrading RNAs. Second, if these RNAs were protected by SGs, then they would be expected to be downregulated in the absence of SGs in DDG3BP1/2 U2OS cells treated with arsenite. Our results contradict this hypothesis as no association is found between the level of downregulation in arsenite-treated DDG3BP1/2 U2OS cells and the observed stress-induced fragmentation in WT. Collectively our results point towards G3BP1/2 stress granules being required for stress-induced decay. We have expanded on these points in the manuscript to clarify.

      Finally, the authors speculate that the mechanism of stress-induced decay may have evolved to relieve translational load during stress. But why degrade the 5' end when removing the cap may be sufficient? This returns to the question of assessing the role of decapping in this mechanism.

      The reviewer raises a very interesting point. Our new results, following expression of dominant negative DCP2, show that stress-induced decay does not require decapping. It is therefore plausible that a stress-induced co-translational mechanism cleaves mRNAs endonucleolytically to reduce the translational load. Such a mechanism would have many functional benefits, as it would acutely reduce the translational load, degrade non-essential RNAs, preserve energy, and release ribosomes for translation of the stress response program. We have expanded the discussion to mention these points.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      As you can see from the comments, although the reviewers appreciate the novelty of your findings, there was a consensus opinion from all reviewers that the authors overinterpreted their data, since they only have one assay and did not fully analyze it, as laid out in one of the reviewer's critiques. Some orthogonal validation of the "groundbreaking" claims is necessary. Examination of the effects of upstream events in 5'-to-3' decay, namely deadenylation, and decapping, would be necessary for a better understanding of the phenomena the authors describe. Many tools and approaches for studying this are described well in the literature (CNOT7-KD, dominant negative DCP2 E148Q, XRN1-null cell lines), so it is well within the authors' reach. Overall, while some of the evidence presented is novel and solid, for some of the claims there is only incomplete evidence.

      We thank the reviewers and the editor for their comments and suggestions. We have performed several additional experiments to further support our conclusions. We have notably investigated the role of deadenylation and decapping in the stress-induced decay by expressing dominant negative NOT8 and DCP2, respectively, as suggested. Our results show that neither deadenylation nor decapping is necessary for stress-induced transcript shortening, suggesting an endonucleolytic event. We believe that these additional experiments strengthen the main conclusions of our work. 

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) The experiments were conducted in two unrelated cell lines, HeLa and U2OS. The authors should determine if the 5'end RNA decay in response to stress is also observed in normal human cells such as normal human diploid fibroblasts. Furthermore, it would be important to know if this mechanism is conserved between human and mouse cells. This can be tested in mouse embryonic fibroblasts.

      We thank the reviewer for their suggestion. We have now also performed experiments in the mouse embryonic fibroblast NIH 3T3 cell line. Our new results confirm that stress-induced 5’ end RNA decay is also observed in this primary cell line and is conserved between human and mouse (Sup. Fig. 1k, l).

      (2) The authors state that they monitored cell viability up to 24 hours after Arsenite treatment, but the data is shown up to 240 min (Suppl. 1a). Also, the Y-axis label of this Figure is "Active cells (%)". This should be changed to "Live cells (%)" if this is what they are referring to.

      We thank the reviewer for identifying this mistake. Cell viability was monitored up to 4 hours after arsenite treatment. We have corrected the text and modified the figure according to the reviewer’s suggestion.

      (3) Based on direct Nanopore-based RNA-seq the authors surprisingly found that RNAs in oxidative stress were globally shorter than in unstressed cells. Since Nanopore-based RNA-seq will not detect RNAs that lack a poly A-tail, are they not missing out on RNAs that have already started getting degraded due to the loss of a poly A-tail? Also, I am not sure if they used a spike-in control, which would be critical to claim global changes in RNA expression.

      We agree with the reviewer that our strategy does not capture RNA molecules without a poly(A) tail. Nevertheless, our data do identify shortening upon stress at the 5’ end of RNAs that include poly(A) tails. We considered this as direct evidence that decay at the 5’ end does not require prior removal of the poly(A) tail. Otherwise, these molecules would not have been captured and observed. Indeed, our newly added data from cells expressing a well characterized catalytically inactive dominant negative NOT8 isoform (Chang et al. 2019) show that stress-induced decay occurs even upon silencing of the CCR4-NOT deadenylation complex. We present these results in Fig. 3 and Sup. Fig 3.

      We would like to clarify that in our results we did not use a spike-in control and thus refrain from claiming global changes in RNA expression. Instead, we compare relative ratios of groups of molecules within libraries that are internally normalized, we perform correlative comparisons that are invariant to normalization and we perform differential gene expression using established normalization schemes such as DESeq2 (Love, Huber, and Anders 2014). 
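
      The normalization referred to above can be illustrated with a small sketch. DESeq2's default size-factor estimation is commonly described as a median-of-ratios scheme; the Python below is a simplified, hypothetical re-implementation of that idea on toy counts, not the DESeq2 package itself and not the analysis code used for the manuscript. The function name and the toy matrix are assumptions made for illustration.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Estimate per-sample size factors from a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Keep only genes with nonzero counts in every sample so the logs are finite.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # The per-gene geometric mean across samples acts as a pseudo-reference sample.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor of a sample = median ratio of its counts to the pseudo-reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Toy data (illustrative only): sample 2 is sample 1 sequenced twice as deeply.
toy_counts = np.array([[10, 20], [100, 200], [50, 100], [0, 3]])
size_factors = median_of_ratios_size_factors(toy_counts)
normalized = toy_counts / size_factors
```

      In this toy case the depth difference is absorbed by the size factors, so the normalized counts of the expressed genes agree between the two samples. Without a spike-in, such a scheme supports relative rather than absolute (global) comparisons, which is the distinction drawn in the response above.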

      (4) Many graphs are confusing and inconsistent. For example, samples for Nanopore RNA-seq were prepared in triplicates. Biological or technical? The schematic in Figure 1a shows ISRIB but it appears from Figure 4 onwards. It is missing in the Figure 1 results and the Figure legend. The X-axis labels of many graphs are confusing. For example, Supplementary Figure 1d, 1e, 1g and 1h. It says transcript length but are these nucleotides? P-values are missing from many of these graphs. For some graphs, the authors compared Unstressed vs Arsenite (Figure 1), but in other panels they state No Ars vs 0.5 mM Ars (Fig. 3a) or Control vs Ars (Figure 5c). Likewise, in Figure 1b, Expression change (log2) is unstressed vs Arsenite or Arsenite vs unstressed?

      We thank the reviewer for identifying these inconsistencies in the presentation of our results. The replicates for the nanopore RNA-seq experiments were biological. We have now clarified this point in the text. Furthermore, we have removed “ISRIB” from Fig. 1a to avoid any confusion. We have also made our labelling consistent across all figures, using “unstressed” for no arsenite treatment and “arsenite” or “+ Ars” for arsenite treatment.

      (5) The authors transfected cells with siCTRL or siXRN1 using electroporation and treated the cells 72 hours after transfection. Since XRN1 is an essential gene, it would be important to determine the viability of cells 72 hours after transfection. Along these lines, in Figure 3b, it would be important to determine the effect of XRN1 knockdown in unstressed cells. Currently, there are only 3 comparisons in Figure 3b - unstressed, siCTRL + Ars and siXRN1 + Ars, and this is insufficient to conclude the effects of XRN1 knockdown in the presence of Arsenite.

      We thank the reviewer for their suggestion. We have updated Fig. 3b and the text to show the requested conditions: siCTRL and siXRN1 with and without arsenite. While XRN2 is an essential gene for many organisms, XRN1 is not essential in mammalian cells and no increased cell death has been reported for XRN1-KO or -KD cells (Brothers et al. 2023). We have also tested different concentrations (up to 40 nM) of siRNA and monitored the cells for up to five days after transfection without observing any cell toxicity, as previously reported.

      (6) More broadly, the whole study is somewhat descriptive. The biological effect of 5'end mRNA shortening on gene expression is unclear. There is no data indicating how these changes in RNA lengths impact protein expression. Global quantitative proteomics would be critical to determine this.

      We thank the reviewer for their suggestion. To address this concern we have performed additional experiments using cells expressing catalytically inactive forms of NOT8 (Chang et al. 2019) and DCP2 (Loh, Jonas, and Izaurralde 2013) to inhibit deadenylation and decapping.

      These experiments provide additional mechanistic details for 5’ shortening and suggest endonucleolytic cleavage as a critical step (Fig. 3 and Sup. Fig. 3). We agree that it would be interesting to study the fate of these shortened transcripts notably regarding translation. However, given the complexity of the expected proteome changes also following global translation arrest under stress (Harding et al., 2003; Pakos-Zebrucka et al., 2016), we think that this work is beyond the scope of this manuscript and will be the subject of future studies. 

      Minor comments:

      (1) Some of the affected RNAs can be validated in HeLa and other cell lines.

      We thank the reviewer for their suggestion. We have performed RT-qPCR on 3 different mRNAs that present 5’ shortening upon oxidative stress, using different primer sets located along the mRNA. We hypothesized that the closer the primer set is located to the 5’ end, the less abundant the corresponding region would be in arsenite-treated compared to untreated cells. Our results indeed show that the measured level of these mRNAs depends on the location of the primer set used for the qPCR: the closer the primer set is to the 5’ end, the less abundant the mRNA is upon oxidative stress compared to control cells. We present these data, as well as a schematic representing the positions of the primers, in Sup. Fig. 2d.

      (2) The authors should check whether XRN1 also co-localizes in SGs.

      We thank the reviewer for their suggestion. We have performed immunofluorescence on U2OS and HeLa cells upon oxidative stress and did not observe co-localization of XRN1 with TIA-1, a marker of stress granules (see below). These results are consistent with Kedersha et al. (2005), who showed that XRN1 mainly co-localizes with processing bodies and is only very weakly detectable in SGs in DU145 cells. We think that this result is beyond the scope of this study and thus decided to only include it for the reviewers.

      Author response image 1.

      Representative immunofluorescence merged image of HeLa (left panel) and U2OS (right panel) cells treated with sodium arsenite and labelled with anti-TIA1 (red), anti-XRN1 (green) antibodies and DAPI (blue). Scale bar 50 µm.

      (3) XRN1 should be knocked down with more than one siRNA.

      We thank the reviewer for this suggestion. Our results show that our XRN1 KD specifically rescues the length of the most shortened mRNAs (Fig. 3e). This is a highly specific effect that makes us confident it is not mediated by non-specific siRNA binding; thus, we do not consider it necessary to repeat the experiment.

      (4) There are typos in the text regarding Figure 6d, e, and f. Also, Supplementary Figure 4a.

      We thank the reviewer for identifying these mistakes. We have corrected the typos. 

      Reviewer #3 (Recommendations For The Authors):

      The authors should consider testing their hypotheses by arresting the decay pathway using the approaches I mentioned previously. As it stands, some conclusions are somewhat speculative.

      We have replied to the reviewer comments in the public review section. 

      References:

      • Brothers, William R., Farah Ali, Sam Kajjo, and Marc R. Fabian. 2023. “The EDC4-XRN1 Interaction Controls P-Body Dynamics to Link MRNA Decapping with Decay.” The EMBO Journal, August, e113933.

      • Chang, Chung-Te, Sowndarya Muthukumar, Ramona Weber, Yevgen Levdansky, Ying Chen, Dipankar Bhandari, Catia Igreja, Lara Wohlbold, Eugene Valkov, and Elisa Izaurralde. 2019. “A Low-Complexity Region in Human XRN1 Directly Recruits Deadenylation and Decapping Factors in 5’-3’ Messenger RNA Decay.” Nucleic Acids Research 47 (17): 9282–95.

      • Harding, Heather P., Yuhong Zhang, Huiquing Zeng, Isabel Novoa, Phoebe D. Lu, Marcella Calfon, Navid Sadri, et al. 2003. “An Integrated Stress Response Regulates Amino Acid Metabolism and Resistance to Oxidative Stress.” Molecular Cell 11 (3): 619–33.

      • Ibrahim, Fadia, Manolis Maragkakis, Panagiotis Alexiou, and Zissimos Mourelatos. 2018. “Ribothrypsis, a Novel Process of Canonical MRNA Decay, Mediates Ribosome-Phased MRNA Endonucleolysis.” Nature Structural & Molecular Biology 25 (4): 302–10.

      • Kedersha, Nancy, Georg Stoecklin, Maranatha Ayodele, Patrick Yacono, Jens Lykke-Andersen, Marvin J. Fritzler, Donalyn Scheuner, Randal J. Kaufman, David E. Golan, and Paul Anderson. 2005. “Stress Granules and Processing Bodies Are Dynamically Linked Sites of MRNP Remodeling.” The Journal of Cell Biology 169 (6): 871–84.

      • Krause, Maximilian, Adnan M. Niazi, Kornel Labun, Yamila N. Torres Cleuren, Florian S. Müller, and Eivind Valen. 2019. “Tailfindr: Alignment-Free Poly(A) Length Measurement for Oxford Nanopore RNA and DNA Sequencing.” RNA 25 (10): 1229–41.

      • Loh, Belinda, Stefanie Jonas, and Elisa Izaurralde. 2013. “The SMG5-SMG7 Heterodimer Directly Recruits the CCR4-NOT Deadenylase Complex to MRNAs Containing Nonsense Codons via Interaction with POP2.” Genes & Development 27 (19): 2125–38.

      • Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” Genome Biology 15 (12): 550.

      • Pakos-Zebrucka, Karolina, Izabela Koryga, Katarzyna Mnich, Mila Ljujic, Afshin Samali, and Adrienne M. Gorman. 2016. “The Integrated Stress Response.” EMBO Reports 17 (10): 1374–95.

      • Pelechano, Vicent, Wu Wei, and Lars M. Steinmetz. 2015. “Widespread Co-Translational RNA Decay Reveals Ribosome Dynamics.” Cell 161 (6): 1400–1412.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors report an fMRI investigation of the neural mechanisms by which selective attention allows capacity-limited perceptual systems to preferentially represent task-relevant visual stimuli. Specifically, they examine competitive interactions between two simultaneously-presented items from different categories, to reveal how task-directed attention to one of them modulates the activity of brain regions that respond to both. The specific hypothesis is that attention will bias responses to be more like those elicited by the relevant object presented on its own, and further that this modulation will be stronger for more dissimilar stimulus pairs. This pattern was confirmed in univariate analyses that measured the mass response of a priori regions of interest, as well as multivariate analyses that considered the patterns of evoked activity within the same regions. The authors follow these neuroimaging results with a simulation study that favours a "tuning" mechanism of attention (enhanced responses to highly effective stimuli, and suppression for ineffective stimuli) to explain this pattern.

      Strengths:

      The manuscript clearly articulates a core issue in the cognitive neuroscience of attention, namely the need to understand how limited perceptual systems cope with complex environments in the service of the observer's goals. The use of a priori regions of interest, and the inclusion of both univariate and multivariate analyses as well as a simple model, are further strengths. The authors carefully derive clear indices of attentional effects (for both univariate and multivariate analyses) which makes explication of their findings easy to follow.

      Weaknesses:

      There are some relatively minor weaknesses in presentation, where the motivation behind some of the procedural decisions could be clearer. There are some apparently paradoxical findings reported -- namely, cases in which the univariate response to pairs of stimuli is greater than to the preferred stimulus alone -- that are not addressed. It is possible that some of the main findings may be attributable to range effects: notwithstanding the paradox just noted, it seems that a floor effect should minimise the range of possible attentional modulation of the responses to two highly similar stimuli. One possible limitation of the modelled results is that they do not reveal any attentional modulation at all under the assumptions of the gain model, for any pair of conditions, implying that as implemented the model may not be correctly capturing the assumptions of that hypothesis.

      We thank the reviewer for the constructive comments. In response, we have improved the presentation in the current version of the manuscript. In this letter, we further discuss why the response in paired conditions is in some cases higher than the response to the preferred stimulus. For this, we provide a vector illustration and a supplementary figure of the sum of weights, showing that the weights of the isolated-stimulus responses for each category pair are not bound to the similarity of the two isolated responses.

      Regarding the simulation results, we have clarified that the univariate effect of attention is not the attentional modulation itself, but the change in the amount of attentional modulation in the two paired conditions. We provide an explanation for this in this letter below, and have changed the term “attentional modulation” to “univariate shift” in the manuscript to avoid the confusion.

      Reviewer #2 (Public Review):

      Summary:

      In an fMRI study requiring participants to attend to one or another object category, either when the object was presented in isolation or with another object superimposed, the authors compared measured univariate and multivariate activation from object-selective and early visual cortex to predictions derived from response gain and tuning sharpening models. They observed a consistent result across higher-level visual cortex that more-divergent responses to isolated stimuli from category pairs predicted a greater modulation by attention when attending to a single stimulus from the category pair presented simultaneously, and argue via simulations that this must be explained by tuning sharpening for object categories.

      Strengths:

      - Interesting experiment design & approach - testing how category similarity impacts neural modulations induced by attention is an important question, and the experimental approach is principled and clever.

      - Examination of both univariate and multivariate signals is an important analysis strategy.

      - The acquired dataset will be useful for future modeling studies.

      Weaknesses:

      - The experimental design does not allow for a neutral 'baseline' estimate of neural responses to stimulus categories absent attention (e.g., attend fixation), nor of the combination of the stimulus categories. This seems critical for interpreting results (e.g., how should readers understand univariate results like that plotted in Fig. 4C-D, where the univariate response is greater for 2 stimuli than one, but the analyses are based on a shift between each extreme activation level?).

      We are happy to clarify our research rationale. We aimed to compare responses in paired conditions when the stimuli were kept constant while the attentional target varied. After showing that the change in the attentional target resulted in a response change, we compared the amount of this response change across different stimulus category pairs to investigate the effect of representational similarity between the target and the distractor on the response modulation caused by the attentional shift. While an estimate of the neural responses in the absence of attention might be useful for other modeling studies, it would not provide us with more information than the current data to answer the question of this study.

      Regarding the univariate results in Fig. 4C-D (and other equivalent ROI results in the revised version) and our analyses, we did not impose any limit on the estimated weights of the two isolated responses in the paired response, and thus the sum of the two weights could be any number. We see, however, that the name “weighted average”, which implies that the sum of weights is capped at one, has been misleading. We have now changed the name of this model to “linear combination” to avoid confusion.

      Previous studies (Reddy et al., 2009, Doostani et al., 2023) using a similar approach have shown a related results pattern: the response to multiple stimuli is higher than the average, but lower than the sum of the isolated responses, which is exactly what our results suggest. We have added discussion on this topic in the Results section in lines 409-413 for clarification:

      “Note that the response in paired conditions can be higher or lower than the response to the isolated more preferred stimulus (condition Mat), depending on the voxel response to the two presented stimuli, as previously reported (Doostani et al. 2023). This is consistent with previous studies reporting the response to multiple stimuli to be higher than the average, but lower than the sum of the response to isolated stimuli (Reddy et al. 2009).”

      We are not sure what the reviewer means by “each extreme activation level”. Our analyses are based on all four conditions. The two isolated conditions are used to calculate the distance measures and the two paired conditions are used for calculating the shift index. Please note that either the isolated or the paired conditions could show the highest response, and we see both cases in our data. For example, as shown in Figure 4A in EBA, the isolated Body condition and the paired BodyatCar condition show the highest activation levels for the Body-Car pair, whereas in Figure 4C, the two paired conditions (BodyatCat and BodyCatat) elicit the highest response.

      - Related, simulations assume there exists some non-attended baseline state of each individual object representation, yet this isn't measured, and the way it's inferred to drive the simulations isn't clearly described.

      We agree that the simulations assume a non-attended baseline state, and that we did not measure that state empirically. We needed this non-attended response in the simulations to test which attention mechanism led to the observed results. Thus, we generated the non-attended response using the data reported in previous neural studies of object recognition and attention in the visual cortex (Ni et al., 2012, Bao and Tsao, 2018). Note that the simulations are checking for the profile of the modulations based on category distance. Thus, they do not need to exactly match the real isolated responses in order to show the effect of gain and tuning shift on the results. We include the clarification and the range of neural responses and attention parameters used in the simulations in the revised manuscript in lines 327-333:

      “To examine which attentional mechanism leads to the effects observed in the empirical data, we generated the neural response to unattended object stimuli as a baseline response in the absence of attention, using the data reported by neural studies of object recognition in the visual cortex (Ni et al., 2012, Bao and Tsao, 2018). Then, using an attention parameter for each neuron and different attentional mechanisms, we simulated the response of each neuron to the different task conditions in our experiment. Finally, we assessed the population response by averaging neural responses.”

      - Some of the simulation results seem to be algebraic (univariate; Fig. 7; multivariate, gain model; Fig. 8)

      This is correct. We have used algebraic equations for the effect of attention on neural responses in the simulations. In fact, the gain and tuning shift models lead to the algebraic equations, which in turn logically lead to the observed results if no noise is added to the data. The simulations are helpful for visualizing these logical conclusions. Also, after assigning different noise levels to each condition for each neuron, the results are no longer algebraic, as shown in the updated Figure 7 and Figure 8.

      - Cross-validation does not seem to be employed - strong/weak categories seem to be assigned based on the same data used for computing DVs of interest - to minimize the potential for circularity in analyses, it would be better to define preferred categories using separate data from that used to quantify - perhaps using a cross-validation scheme? This appears to be implemented in Reddy et al. (2009), a paper implementing a similar multivariate method and cited by the authors (their ref 6).

      Thank you for pointing out the missing details about how we used cross-validation. In the univariate analysis, we did use cross validation, defining preferred categories and calculating category distance on one half of the data and calculating the univariate shift on the other half of the data. Similarly, we employed cross-validation for the multivariate analysis by using one half of the data to calculate the multivariate distance between category pairs, and the other half of the data to calculate the weight shift for each category pair. We have now added this methodological information in the revised manuscript.
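
      The split-half logic described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the array layout, the function name `split_half_shift`, and the simple difference used as a stand-in for the univariate shift are all assumptions.

```python
import numpy as np

def split_half_shift(isolated, paired, rng):
    """Define the preferred category on one half of the runs, test on the other.

    isolated: runs x 2 array, mean ROI response to category A or B shown alone.
    paired:   runs x 2 array, paired-stimulus response when attending A or B.
    """
    n_runs = isolated.shape[0]
    order = rng.permutation(n_runs)
    define_half, test_half = order[: n_runs // 2], order[n_runs // 2:]
    # Half 1: decide which category is the more preferred one.
    more_pref = int(isolated[define_half].mean(axis=0).argmax())
    less_pref = 1 - more_pref
    # Half 2: hypothetical shift measure, computed on held-out runs only.
    held_out = paired[test_half].mean(axis=0)
    return more_pref, held_out[more_pref] - held_out[less_pref]

# Toy, noise-free data (illustrative values only).
toy_isolated = np.tile([2.0, 1.0], (8, 1))
toy_paired = np.tile([1.8, 1.4], (8, 1))
more_pref, shift = split_half_shift(toy_isolated, toy_paired, np.random.default_rng(0))
```

      Because the category labels and the shift come from disjoint runs, noise that happens to favour one category in the defining half cannot inflate the measured shift in the held-out half.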

      - Multivariate distance metric - why is correlation/cosine similarity used instead of something like Euclidean or Mahalanobis distance? Correlation/cosine similarity is scale-invariant, so changes in the magnitude of the vector would not change distance, despite this likely being an important data attribute to consider.

      Since we are considering response patterns as vectors in each ROI, there is no major difference between the two measures of similarity. Using Euclidean distance as a measure of distance (i.e., the inverse of similarity), we observed the same relationship between weight shift and category distance. There was a positive correlation between weight shift and the Euclidean category distance in all ROIs (ps < 0.01, ts > 2.9) except for V1 (p = 0.5, t = 0.66). We include this information in the revised manuscript in the Results section, lines 513-515:

      “We also calculated category distance based on the Euclidean distance between response patterns of category pairs and observed a similarly positive correlation between the weight shift and the Euclidean category distance in all ROIs (ps < 0.01, ts > 2.9) except V1 (p = 0.5, t = 0.66).”
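
      The reviewer's point about scale invariance can be made concrete with a short sketch. The patterns below are random toy vectors, not fMRI data, and the helper names are assumptions; the example only shows that rescaling one pattern leaves a correlation-based distance unchanged while the Euclidean distance changes.

```python
import numpy as np

def correlation_distance(a, b):
    """1 minus the Pearson correlation between two response patterns."""
    return 1.0 - np.corrcoef(a, b)[0, 1]

def euclidean_distance(a, b):
    """Euclidean distance between two response patterns."""
    return float(np.linalg.norm(a - b))

rng = np.random.default_rng(0)
pattern_a = rng.normal(size=200)   # toy voxel pattern for category A
pattern_b = rng.normal(size=200)   # toy voxel pattern for category B
scaled_b = 3.0 * pattern_b         # same pattern shape, larger magnitude

corr_before = correlation_distance(pattern_a, pattern_b)
corr_after = correlation_distance(pattern_a, scaled_b)
eucl_before = euclidean_distance(pattern_a, pattern_b)
eucl_after = euclidean_distance(pattern_a, scaled_b)
```

      Reporting both measures, as in the response above, therefore checks whether a conclusion depends on response magnitude or only on pattern shape.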

      - Details about simulations implemented (and their algebraic results in some cases) make it challenging to interpret or understand these results. E.g., the noise properties of the simulated data aren't disclosed, nor are precise (or approximate) values used for simulating attentional modulations.

      We clarify that the average response to each category was based on previous neurophysiology studies (Ni et al., 2012, Bao and Tsao, 2018). The attentional parameter was also chosen based on previous neurophysiology (Ni et al., 2012) and human fMRI (Doostani et al., 2023) studies of visual attention by randomly assigning a value in the range from 1 to 10. We have included the details in the Methods section in lines 357-366:

      “We simulated the action of the response gain model and the tuning sharpening model using numerical simulations. We composed a neural population of 4 × 10^5 neurons in equal proportions body-, car-, cat- or house-selective. Each neuron also responded to object categories other than its preferred category, but to a lesser degree and with variation. We chose neural responses to each stimulus from a normal distribution with a mean of 30 spikes/s and a standard deviation of 10, and each neuron was randomly assigned an attention factor in the range between 1 and 10 using a uniform distribution. These values are comparable with the values reported in neural studies of attention and object recognition in the ventral visual cortex (Ni et al. 2012, Bao and Tsao 2018). We also added Poisson noise to the response of each neuron (Britten et al. 1993), assigned randomly for each condition of each neuron.”
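
      The population set-up quoted above can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the authors' simulation code: the function name, the smaller default population size (the quoted text uses 4 × 10^5 neurons), the clipping of rates at zero, and the way the preferred category is made the strongest response are all assumptions. The gain and tuning sharpening equations themselves are not reproduced here because they are not given in this response.

```python
import numpy as np

CATEGORIES = ("body", "car", "cat", "house")

def simulate_baseline(n_neurons=4000, seed=0):
    """Draw unattended mean rates, attention factors and noisy responses."""
    rng = np.random.default_rng(seed)
    n_cat = len(CATEGORIES)
    # Equal proportions of neurons preferring each category.
    preferred = np.repeat(np.arange(n_cat), n_neurons // n_cat)
    n = preferred.size
    # Mean responses (spikes/s) from a normal distribution, clipped at zero.
    rates = np.clip(rng.normal(30.0, 10.0, size=(n, n_cat)), 0.0, None)
    # Assumption: swap each neuron's largest draw into its preferred category,
    # so other categories respond to a lesser degree and with variation.
    rows = np.arange(n)
    strongest = rates.argmax(axis=1)
    pref_vals = rates[rows, preferred].copy()
    rates[rows, preferred] = rates[rows, strongest]
    rates[rows, strongest] = pref_vals
    # One attention factor per neuron, uniform between 1 and 10.
    attention = rng.uniform(1.0, 10.0, size=n)
    # Independent Poisson noise for each condition of each neuron.
    noisy = rng.poisson(rates).astype(float)
    return preferred, rates, attention, noisy

preferred, rates, attention, noisy = simulate_baseline()
population_mean = noisy.mean(axis=0)  # population response by averaging neurons
```

      An attention model would then transform `rates` using `attention` before the noise step; which transformation counts as gain or as tuning sharpening is defined in the manuscript rather than here.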

      - Eye movements do not seem to be controlled nor measured. Could it be possible that some stimulus pairs result in more discriminable patterns of eye movements? Could this be ruled out by some aspect of the results?

      Subjects were instructed to direct their gaze towards the fixation point. Given the variation in the pose and orientation of the stimuli, it is unlikely that eye movements would help with the task. Eye movements have been controlled in previous experiments with individual stimulus presentation (Xu and Vaziri-Pashkam, 2019) and across attentional tasks in which colored dots were superimposed on the stimuli (Vaziri-Pashkam and Xu, 2017) and no significant difference for eye movement across categories or conditions was observed. As such, we do not think that eye movements would play a role in the results we are observing here.

      - A central, and untested/verified, assumption is that the multivariate activation pattern associated with 2 overlapping stimuli (with one attended) can be modeled as a weighted combination of the activation pattern associated with the individual stimuli. There are hints in the univariate data (e.g., Fig. 4C; 4D) that this might not be justified, which somewhat calls into question the interpretability of the multivariate results.

      If the reviewer is referring to the higher response in the paired compared to the isolated conditions, as explained above, we have not forced the sum of the estimated weights to equal 1 or 2. Therefore, our model is an estimation of a linear combination of the two multivariate patterns in the isolated conditions. In fact, Reddy et al. (2009; reference 6) reported that while the combination is closer to a weighted average than to a weighted sum, the sum of the weights is on average larger than 1. In Figure 4C and 4D the responses in the paired conditions are higher than either of the isolated-condition responses. This suggests that the weights for the linear combination of isolated responses in the multivariate analysis should add up to more than one. This is what we find in our results. We have added a supplementary figure to Figure 6, depicting the sum of weights for different category pairs in all ROIs. The figure illustrates that in each ROI, the sum of weights is greater than 1 for some category pairs. It is, however, noteworthy that we normalized the weights in each condition by the sum of weights to calculate the weight shift in our analysis. The amount of the weight shift was therefore not affected by the absolute value of the weights.
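
      The unconstrained linear combination described above can be illustrated with an ordinary least-squares fit. The sketch below uses noise-free toy patterns and hypothetical names; in particular, the final `shift` line is only one plausible way to express a change in normalized weight between attention conditions and is not taken from the manuscript.

```python
import numpy as np

def fit_linear_combination(paired, isolated_a, isolated_b):
    """Least-squares weights with paired ≈ w_a * isolated_a + w_b * isolated_b."""
    design = np.column_stack([isolated_a, isolated_b])
    weights, *_ = np.linalg.lstsq(design, paired, rcond=None)
    return weights  # the weights are not constrained to sum to 1

def normalized_weights(weights):
    """Divide by the sum of weights so their absolute scale drops out."""
    return weights / weights.sum()

rng = np.random.default_rng(1)
iso_a = rng.normal(size=300)  # toy isolated pattern, category A
iso_b = rng.normal(size=300)  # toy isolated pattern, category B
paired_attend_a = 0.9 * iso_a + 0.4 * iso_b  # weights sum to 1.3
paired_attend_b = 0.5 * iso_a + 0.8 * iso_b  # weights sum to 1.3

w_attend_a = fit_linear_combination(paired_attend_a, iso_a, iso_b)
w_attend_b = fit_linear_combination(paired_attend_b, iso_a, iso_b)
# Hypothetical shift: change in the normalized weight of A with attention.
shift = normalized_weights(w_attend_a)[0] - normalized_weights(w_attend_b)[0]
```

      Here the fitted weights sum to more than 1, yet the normalized shift depends only on their ratio, which mirrors the argument that the weight shift is unaffected by the absolute value of the weights.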

      - Throughout the manuscript, the authors consistently refer to "tuning sharpening", an idea that's almost always used to reference changes in the width of tuning curves for specific feature dimensions (e.g., motion direction; hue; orientation; spatial position). Here, the authors are assaying tuning to the category (across exemplars of the category). The link between these concepts could be strengthened to improve the clarity of the manuscript.

      The reviewer brings up an excellent point. Whereas tuning curves have been extensively used for feature dimensions such as stimulus orientation or motion direction, here, we used the term to describe the variation in a neuron’s response to different object stimuli.

      With a finite set of object categories, as is the case in the current study, the neural response in object space is discrete, rather than a continuous curve illustrated for features such as stimulus orientation. However, since more preferred and less preferred features (objects in this case) can still be defined, we illustrated the neural response using a hypothetical curve in object space in Figure 3 to show how it relates with other stimulus features. Therefore, here, tuning sharpening refers to the fact that the response to the more preferred object categories has been enhanced while the response to the less preferred stimulus categories is suppressed.

      We clarify this point in the revised manuscript in the Discussion section lines 649-659:

      “While tuning curves are commonly used for feature dimensions such as stimulus orientation or motion direction, here, we used the term to describe the variation in a neuron’s response to different object stimuli. With a finite set of object categories, as is the case in the current study, the neural response in object space is discrete, rather than a continuous curve illustrated for features such as stimulus orientation. The neuron might have tuning for a particular feature such as curvature or spikiness (Bao et al., 2020) that is present to different degrees in our object stimuli in a continuous way, but we are not measuring this directly. Nevertheless, since more preferred and less preferred features (objects in this case) can still be defined, we illustrate the neural response using a hypothetical curve in object space. As such, here, tuning sharpening refers to the fact that the response to the more preferred object categories has been enhanced while the response to the less preferred stimulus categories is suppressed.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      a. The authors should address the apparent paradox noted above (and report whether it is seen in other regions of interest as well). On what model would the response to any pair of stimuli exceed that of the response to the preferred stimulus alone? This implies some kind of Gestalt interaction whereby the combined pair generates a percept that is even more effective for the voxels in question than the "most preferred" one?

      The response to a pair of stimuli can exceed the response to each of the stimuli presented in isolation if the voxel is responsive to both stimuli and as long as the voxel has not reached its saturation level. This phenomenon has been reported in many previous studies (Zoccolan et al., 2005, Reddy et al., 2009, Ni et al., 2012, Doostani et al., 2023) and can be modeled using a linear combination model which does not limit the weights of the isolated responses to equal 1 (Doostani et al., 2023). Note that the “most preferred” stimulus does not necessarily saturate the voxel response, thus the response to two stimuli could be more effective based on voxel responsiveness to the second stimulus.

      As for the current study, the labels “more preferred” and “less preferred” are only relatively defined (as explained in the Methods section), meaning that the more preferred stimulus is not necessarily the most preferred stimulus for the voxels. Furthermore, the presented stimuli are semi-transparent and presented at low contrast, which moves the responses further away from the saturation level. Based on reported evidence for multiple-stimulus responses, responses to single stimuli are in many cases sublinearly added to yield the multiple-stimulus response (Zoccolan et al., 2005, Reddy et al., 2009, Doostani et al., 2023). This means that the multiple-stimulus response is lower than the sum of the isolated responses, but not necessarily lower than each of the isolated responses. Therefore, it is not paradoxical to observe higher responses in paired conditions compared to the isolated conditions. We observe similar results in other ROIs, which we provide as supplementary figures to Figure 4 in the revised manuscript.
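
      The argument above can be illustrated with a minimal sketch of a linear combination model whose weights are not constrained to sum to 1. The response values and weights below are hypothetical and chosen only for illustration; they are not estimates from our data.

```python
def paired_response(r_x, r_y, w_x, w_y):
    """Linear combination of two isolated responses; w_x + w_y is unconstrained."""
    return w_x * r_x + w_y * r_y


# Hypothetical isolated responses (arbitrary units), well below saturation.
r_more_preferred = 1.0
r_less_preferred = 0.6

# Hypothetical weights: their sum (1.4) lies between averaging (1.0) and full summation (2.0).
w_more, w_less = 0.8, 0.6

paired = paired_response(r_more_preferred, r_less_preferred, w_more, w_less)
average = (r_more_preferred + r_less_preferred) / 2
total = r_more_preferred + r_less_preferred

# Sublinear summation: above the average, below the sum,
# and here also above the isolated more preferred response.
print(average < paired < total, paired > r_more_preferred)
```

      With these illustrative numbers the paired response exceeds the response to the more preferred stimulus alone while remaining below the sum of the two isolated responses, which is the pattern described above.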

      We address this observation and similar reports in previous studies in the Results section of the revised manuscript in lines 409-413:

      “Note that the response in paired conditions can be higher or lower than the response to the isolated more preferred stimulus (condition Mat), depending on the voxel preference for the two presented stimuli, as previously reported (Doostani et al., 2023). This is consistent with previous studies reporting the response to multiple stimuli to be higher than the average, but lower than the sum of the response to isolated stimuli (Reddy et al., 2009).”

      b. Paradox aside, I wondered to what extent the results are in part explained by range limits. Take two categories that evoke a highly similar response (either mean over a full ROI, or in the multivariate sense). That imposes a range limit such that attentional modulation, if it works the way we think it does, could only move responses within that narrow range. In contrast, the starting point for two highly dissimilar categories leaves room in principle for more modulation.

      We do not believe that the results can be explained by range limits because responses in paired conditions are not limited by the isolated responses, as can be observed in Figure 4. However, to rule out the possibility of the similarity between responses in isolated conditions affecting the range within which responses in paired conditions can change, we turned to the multivariate analysis. We used the weight shift measure as the change in the weight of each stimulus with the change in the attentional target. In this method, no matter how close the two isolated vectors are, the response to the pair could still have a whole range of different weights of the isolated responses. We have plotted an example illustration of two-dimensional vectors for better clarification. Here, the vectors Vxat and Vyat denote the responses to the isolated x and y stimuli, respectively, and the vector Pxaty denotes the response to the paired condition in which stimulus x is attended. The weights a1 and a2 are illustrated in the figure, which are equal to regression coefficients if we solve the equation Pxaty = [a1 a2] [x y]’. While the weight values depend on the amplitude of and the angle between the three vectors, they are not limited by a small angle between Vxat and Vyat.
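
      The regression described above can be sketched as follows. This is a minimal illustration, assuming an ordinary least-squares fit of the paired vector on the two isolated vectors; the function name `isolated_weights` and the two-dimensional example vectors are hypothetical and are not taken from our analysis code.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def isolated_weights(paired, v_x, v_y):
    """Least-squares a1, a2 such that paired is approximately a1*v_x + a2*v_y."""
    gxx, gyy, gxy = dot(v_x, v_x), dot(v_y, v_y), dot(v_x, v_y)
    px, py = dot(paired, v_x), dot(paired, v_y)
    det = gxx * gyy - gxy * gxy
    if det == 0:
        raise ValueError("isolated vectors are collinear; weights are not identifiable")
    # Solve the 2x2 normal equations by Cramer's rule.
    a1 = (px * gyy - py * gxy) / det
    a2 = (py * gxx - px * gxy) / det
    return a1, a2


# Hypothetical isolated response patterns separated by a small angle.
v_x = [1.0, 0.0]
v_y = [0.9, 0.1]

# Hypothetical paired pattern built from known weights (0.8 and 0.3).
paired = [0.8 * x + 0.3 * y for x, y in zip(v_x, v_y)]

a1, a2 = isolated_weights(paired, v_x, v_y)
print(round(a1, 6), round(a2, 6))
```

      Even though the two isolated vectors are nearly parallel in this example, the fit recovers the weights used to construct the paired vector, so the weights are not bounded by the similarity of the isolated patterns. With noisy data, nearly collinear vectors would make the estimates less stable, which this noise-free sketch does not capture.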

      We have updated Figure 2 in the manuscript to avoid confusion. We have also added a figure including the sum of weights for different category pairs in different regions, showing that the sum of weights is not dependent on the similarity between the two stimuli. The conclusions based on the weight shift are therefore not confounded by the similarity between the two stimuli.

      c. Finally, related to the previous point, while including V1 is a good control, I wonder if it is getting a "fair" test here, because the range of responses to the four categories in this region, in terms of (dis)similarity, seems compressed relative to the other categories.

      We believe that V1 is getting a fair test because the single-subject range of category distance in V1 is similar to LO, as can be observed in Author response image 1:

      Author response image 1.

      Range of category distance in each ROI averaged across participants

      The reason that V1 is showing a more compressed distance range on the average plot is that the category distance in V1 is not consistent among participants. Although the average plots are shown in Figure 5 and Figure 6, we tested statistical significance in each ROI based on single-subject correlation coefficients.

      Please also note that a more compressed range of dissimilarity does not necessarily lead to a less strong effect of category distance on the effect of attention. For instance, while LO shows a more compressed dissimilarity range for the presented categories compared to the other object selective regions, it shows the highest correlation between weight shift and category distance. Furthermore, as illustrated in Figure 5, no significant correlation is observed between univariate shift and category distance in V1, even though the range of the univariate distance in V1 is similar to LO and pFs, where we observed a significant correlation between category distance and univariate shift.

      d. In general, the manuscript does a very good job explaining the methods of the study in a way that would allow replication. In some places, the authors could be clearer about the reasoning behind those methodological choices. For example: - How was the sample size determined?

      Estimating conservatively based on the smallest amount of attentional modulation we observed in a previous study (Doostani et al., 2023), we chose a medium effect size (0.3). For a power of 0.8, the minimum number of participants should be 16. We have added the explanation to the Methods section in lines 78-81:

      “We estimated the number of participants conservatively based on the smallest amount of attentional modulation observed in our previous study (Doostani et al., 2023). For a medium effect size of 0.3 and a power of 0.8, we needed a minimum number of 16 participants.”

      - Why did the authors choose those four categories? What was the evidence that would suggest these would span the range of similarities needed here?

      We chose these four categories based on a previous behavioral study reporting the average reaction time of participants when detecting a target from one category among distractors from another category (Xu and Vaziri-Pashkam, 2019). Ideally the experiment should include as many object categories as possible. However, since we were limited by the duration of the experiment, the number of conditions had to be controlled, leading to a maximum of 4 object categories. We chose two animate and two inanimate object categories to include categories that are more similar and more different based on previous behavioral results (Xu and Vaziri-Pashkam, 2019). We included body and house categories because they are both among the categories to which highly responsive regions exist in the cortex. We chose the two remaining categories based on their similarity to body and house stimuli. In this way, for each category there was another category that elicited similar cortical responses, and two categories that elicited different responses. While we acknowledge that the chosen categories do not fully span the range of similarities, they provide an observable variety of similarities in different ROIs which we find acceptable for the purposes of our study.

      We include this information in the Methods section of the revised manuscript in lines 89-94:

      “We included body and house categories because there are regions in the brain that are highly responsive and unresponsive to each of these categories, which provided us with a range of responsiveness in the visual cortex. We chose the two remaining categories based on previous behavioral results to include categories that provided us with a range of similarities (Xu and Vaziri-Pashkam, 2019). Thus, for each category there was a range of responsiveness in the brain and a range of similarity with the other categories.”

      - Why did the authors present the stimuli at the same location? This procedure has been adopted in previous studies, but of course, it does also move the stimulus situation away from the real-world examples of cluttered scenes that motivate the Introduction.

      We presented the stimuli at the same location because we aimed to study the mechanism of object-based attention and this experimental design helped us isolate it from spatial attention. We do not think that our design moves the stimulus situation away from real-world examples in such a way that our results are not generalizable. We include real-world instances, as well as a discussion on this point, in the Discussion section of the revised manuscript, in lines 611-620:

      “Although examples of superimposed cluttered stimuli are not very common in everyday life, they still do occur in certain situations, for example reading text on the cellphone screen in the presence of reflection and glare on the screen or looking at the street through a patterned window. Such instances recruit object-based attention which was the aim of this study, whereas in more common cases in which attended and unattended objects occupy different locations in space, both space-based and object-based attention may work together to resolve the competition between different stimuli. Here we chose to move away from usual everyday scenarios to study the effect of object-based attention in isolation. Future studies can reveal the effect of target-distractor similarity, i.e. proximity in space, on space-based attention and how the effects caused by object-based and space-based attention interact.”

      - While I'm not concerned about this (all relevant comparisons were within-participants) was there an initial attempt to compare data quality from the two different scanners?

      We compared the SNR values of the two groups of participants and observed no significant difference between these values (ps > 0.34, ts < 0.97). We have added this information to the Methods section.

      Regarding the observed effect, we performed a t-test between the results of the participants from the two scanners. For the univariate results, the observed correlation between univariate attentional modulation and category distance was not significantly different for participants of the two scanners in any ROIs (ps > 0.07, ts < 1.9). For the multivariate results, the observed correlation between the weight shift and multivariate category distance was not significantly different in any ROIs (ps > 0.48, ts < 0.71) except for V1 (p = 0.015, t = 2.75).

      We include a sentence about the comparison of the SNR values in the preprocessing section in the revised manuscript.

      e. There are a couple of analysis steps that could be applied to the existing data that might strengthen the findings. For one, the authors have adopted a liberal criterion of p < 0.001 uncorrected to include voxels within each ROI. Why, and to what extent is the general pattern of findings robust over more selective thresholds? Also, there are additional regions that are selective for bodies (fusiform body area) and scenes (occipital place area and retrosplenial cortex). Including these areas might provide more diversity of selectivity patterns (e.g. different responses to non-preferred categories) that would provide further tests of the hypothesis.

      We selected this threshold to allow for selection of a reasonable number of voxels in each hemisphere across all participants. To check whether the effect is robust over more selective thresholds, we redefined the left EBA region, as an example, using p < 0.0001 and p < 0.00001 and observed that the weight shift effect remained equivalent. We have made a note of this analysis in the Results section. As for the additional regions suggested by the reviewer, we chose not to include them because they could not be consistently defined in both hemispheres of all participants. Please note that the current ROIs also show different responses to non-preferred categories (e.g. in LO and pFs). We include this information in the Methods section in lines 206-207:

      “We selected this threshold to allow for selection of a reasonable number of voxels in each hemisphere across all participants.”

      And in the Results section in lines 509-512:

      “We performed the analysis including only voxels that had a significantly positive GLM coefficient across the runs and observed the same results. Moreover, to check whether the effect is robust over more selective thresholds for ROI definition, we redefined the left EBA region with p < 0.0001 and p < 0.00001 criteria. We observed a similar weight shift effect for both criteria.”

      f. One point the authors might address is the potential effect of blocking the paired conditions. If I understood right, the irrelevant item in each paired display was from the same category throughout a block. To what extent might this knowledge shape the way participants attend to the task-relevant item (e.g. by highlighting to them certain spatial frequencies or contours that might be useful in making that particular pairwise distinction)? In other words, are there theoretical reasons to expect different effects if the irrelevant category is not predictable?

      We believe that the participants’ knowledge about the distractor does not significantly affect our results because our results are in agreement with previous behavioral data (Cohen et al., 2014, Xu and Vaziri-Pashkam, 2019), in which the distractor could not be predicted. These reports suggest there is a theoretical reason to expect similar effects if the participants could not predict the distractor. To directly test this, one would need to perform an fMRI experiment using an event-related design, an interesting avenue for future research.

      We have made a note of this point in the Discussion section of the revised manuscript in lines 621-626:

      “Please note that we used a blocked design in which the target and distractor categories could be predicted across each block. While it is possible that the current design has led to an enhancement of the observed effect, previous behavioral data (Cohen et al., 2014, Xu and Vaziri-Pashkam, 2019) have reported the same effect in experiments in which the distractor was not predictable. To study the effect of predictability on fMRI responses, however, an event-related design is more appropriate, an interesting venue for future fMRI studies.”

      g. The authors could provide behavioural data as a function of the specific category pairs. There is a clear prediction here about which pairs should be more or less difficult.

      We provide the behavioral data as a supplementary figure to Figure 1 in the revised manuscript. However, we do not see differences in behavior for the different category pairs. This is because our fMRI task was designed to make sure the participants could properly attend to the target in all conditions. The task was rather easy across all conditions and, due to the ceiling effect, there was no significant difference between behavioral performance for different category pairs. However, the effect of category pair on behavior has been previously tested and reported in a visual search paradigm with the same categories (Xu and Vaziri-Pashkam, 2019), which was in fact the basis for our choice of categories in this study (as explained in response to point “d” above).

      h. Figure 4 shows data for EBA in detail; it would be helpful to have a similar presentation of the data for the other ROIs as well.

      We provide data for all ROIs as figure supplements 1-4 to Figure 4 in the revised manuscript.

      i. For the pFs and LOC ROIs, it would be helpful to have an indication of what proportion of voxels was most/least responsive to each of the four categories. Was this a relatively even balance, or generally favouring one of the categories?

      In LO, the proportion of voxels most responsive to each of the four categories was relatively even for Body (31%) and House (32%) stimuli, which was higher than the proportion of Car- and Cat-preferring voxels (18% and 19%, respectively). In pFs, 40% of the voxels were house-selective, while the proportion was relatively even for voxels most responsive to bodies, cars, and cats with 21%, 17%, and 22% of the voxels, respectively. We include the percentage of voxels most responsive to each of the four categories in each ROI as Appendix 1-table 1.
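
      For clarity, the proportions above can be thought of as the outcome of assigning each voxel to the category that evokes its largest isolated response and counting voxels per category. The sketch below assumes this winner-take-all labeling; the function name and the four example voxels are hypothetical.

```python
from collections import Counter

CATEGORIES = ("Body", "Car", "Cat", "House")


def preference_proportions(voxel_responses):
    """Fraction of voxels whose largest isolated response is to each category."""
    counts = Counter(max(CATEGORIES, key=voxel.__getitem__) for voxel in voxel_responses)
    n_voxels = len(voxel_responses)
    return {category: counts[category] / n_voxels for category in CATEGORIES}


# Hypothetical isolated responses of four voxels (arbitrary units).
voxels = [
    {"Body": 1.2, "Car": 0.4, "Cat": 0.9, "House": 0.3},
    {"Body": 0.2, "Car": 0.5, "Cat": 0.3, "House": 1.1},
    {"Body": 0.8, "Car": 0.1, "Cat": 0.6, "House": 0.2},
    {"Body": 0.3, "Car": 0.9, "Cat": 0.4, "House": 0.5},
]

proportions = preference_proportions(voxels)
print(proportions)
```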

      j. Were the stimuli in the localisers the same as in the main experiment?

      No, we used different sets of stimuli for the localizers and the main experiment. We have added the information in line 146 of the Methods section.

      Reviewer #2 (Recommendations For The Authors):

      (1) Why are specific ROIs chosen? Perhaps some discussion motivating these choices, and addressing the possible overlap between these and retinotopic regions (based on other studies, or atlases - Wang et al, 2015) would be useful.

      Considering that we used object categories, we decided to look at general object-selective regions (LO, pFs) as well as regions that are highly selective for specific categories (EBA, PPA). We also looked at the primary visual cortex as a control region. We have added this clarification in the Methods section lines 128-133:

      “Considering that we used object categories, we investigated five different regions of interest (ROIs): the object-selective areas lateral occipital cortex (LO) and posterior fusiform (pFs) as general object-selective regions, the body-selective extrastriate body area (EBA) and the scene-selective parahippocampal place area (PPA) as regions that are highly selective for specific categories, and the primary visual cortex (V1) as a control region. We chose these regions because they could all be consistently defined in both hemispheres of all participants and included a large number of voxels.”

      (2) The authors should consider including data on the relative prevalence of voxels preferring each category for each ROI (and/or the mean activation level across voxels for each category for each ROI). If some ROIs have very few voxels preferring some categories, there's a chance the observed results are a bit noisy when sorting based on those categories (e.g., if a ROI has essentially no response to a given pair of categories, then there's not likely to be much attentional modulation detectable, because the ROI isn't driven by those categories to begin with).

      We thank the reviewer for the insightful comment.

      We include the percentage of voxels most responsive to each of the four categories in each ROI in the Appendix (Appendix 1-table 1, please see the answer to point “i” of the first reviewer).

      We also provide a table of average activity across voxels for each category in all ROIs as Appendix 1-table 2.

      As shown in the table, voxels show positive activity for all categories in all ROIs except for PPA, where voxels show no response to body and cat stimuli. This might explain why we observed a marginally significant correlation between weight shift and category distance in PPA only. As the reviewer mentions, since this region does not respond to body and cat stimuli, we do not observe a significant change in response due to the shift in attention for some pairs. We include the table in the Appendix and add the explanation to the Results section of the revised manuscript in lines 506-508:

      “Less significant results in PPA might arise from the fact that PPA shows no response to body and cat stimuli and little response to car stimuli (Appendix 1-table 2). Therefore, it is not possible to observe the effect of attention for all category pairs.”

      a. Related - would it make sense to screen voxels for inclusion in analysis based on above-baseline activation for one or both of the categories? [could, for example, imagine you're accidentally measuring from the motor cortex - you'd be able to perform this analysis, but it would be largely nonsensical because there's no established response to the stimuli in either isolated or combined states].

      We performed all the analyses including only voxels that had a significantly positive GLM coefficient across the runs and the results remained the same. We have added the explanation in the Results section in lines 509-510.

      (3) Behavioral performance is compared against chance level, but it doesn't seem that 50% is chance for the detection task. The authors write on page 4 that the 1-back repetition occurred between 2-3 times per block, so it doesn't seem to be the case that each stimulus had a 50% chance of being a repetition of the previous one.

      We apologize for the mistake in our report. We have reported the detection rate for the target-present trials (2-3 per block), not the behavioral performance across all trials. We have modified the sentence in the Results section.

      (4) Authors mention that the stimuli are identical for 2-stimulus trials where each category is attended (for a given pair) - but the cue is different, and the cue appears as a centrally-fixated word for 1 s. Is this incorporated into the GLM? I can't imagine this would have much impact, but the strict statement that the goals of the participant are the only thing differentiating trials with otherwise-identical stimuli isn't quite true.

      The word cue was not incorporated as a separate predictor into the GLM. As the reviewer notes, the signals related to the cue and stimuli are mixed. But given that the cues are brief and in the form of words rather than images, they are unlikely to have an effect on the response in the regions of interest.

      To be more accurate, we have included the clarification in the Methods section in lines 181-182:

      “We did not enter the cue to the GLM as a predictor. The obtained voxel-wise coefficients for each condition are thus related to the cue and the stimuli presented in that condition.”

      And in the Results section in lines 425-428:

      “It is important to note that since the cue was not separately modeled in the GLM, the signals related to the cue and the stimuli were mixed. However, given that the cues were brief and presented in the form of words, they are unlikely to have an effect on the responses observed in the higher-level ROIs.”

      (5) Eq 5: I expected there to be some comparison of a and b directly as ratios (e.g., a_1 > b_1, as shown in Fig. 2). The equations used here should be walked through more carefully - it's very hard to understand what this analysis is actually accomplishing. I'm not sure I follow the explanation of relative weights given by the authors, nor how that maps onto the delta_W quantity in Equation 5.

      We provide a direct comparison of a and b, as well as a more thorough clarification of the analysis, in the Methods section in lines 274-276:

      “We first projected the paired vector on the plane defined by the isolated vectors (Figure 2A) and then determined the weight of each isolated vector in the projected vector (Figure 2B).”

      And in lines 286-297:

      “A higher a1 compared to a2 indicates that the paired response pattern is more similar to Vxat compared to Vyat, and vice versa. For instance, if we calculate the weights of the Body and Car stimuli in the paired response related to the simultaneous presentation of both stimuli, we can write in the LO region: VBodyatCar = 0.81 VBody + 0.31 VCar, VBodyCarat = 0.43 VBody + 0.68 VCar. Note that these weights are averaged across participants. As can be observed, in the presence of both body and car stimuli, the weight of each stimulus is higher when attended compared to the case when it is unattended. In other words, when attention shifts from body to car stimuli, the weight of the isolated body response (VBody) decreases in the paired response. We can therefore observe that the response in the paired condition is more similar to the isolated body response pattern when body stimuli are attended and more similar to the isolated car response pattern when car stimuli are attended.”

      And lines 303-306:

      “As shown here, even when body stimuli are attended, the effect of the unattended car stimuli is still present in the response, shown in the weight of the isolated car response (0.31). However, this weight increases when attention shifts towards car stimuli (0.68 in the attended case).”

      We also provide more detailed clarification for the 𝛥w and the relative weights in lines 309-324:

      “To examine whether this increase in the weight of the attended stimulus was constant or depended on the similarity of the two stimuli in cortical representation, we defined the weight shift as the multivariate effect of attention:

      𝛥w = a1/(a1+a2) – b1/(b1+b2)     (5)

      Here, a1, a2, b1, and b2 are the weights of the isolated responses, estimated using Equation 4. We calculate the weight of the isolated x response once when attention is directed towards x (a1), and a second time when attention is directed towards y (b1). In each case, we calculate the relative weight of the isolated x in the paired response by dividing the weight of the isolated x by the sum of weights of x and y (a1+a2 when attention is directed towards x, and b1+b2 when attention is directed towards y). We then define the weight shift, Δw, as the change in the relative weight of the isolated x response in the paired response when attention shifts from x to y. A higher Δw for a category pair indicates that attention is more efficient in removing the effect of the unattended stimulus in the pair. We used relative weights as a normalized measure to compensate for the difference in the sum of weights for different category pairs. Thus, using the normalized measure, we calculated the share of each stimulus in the paired response. For instance, considering the Body-Car pair, the share of the body stimulus in the paired response was equal to 0.72 and 0.38, when body stimuli were attended and unattended, respectively. We then calculated the change in the share of each stimulus caused by the shift in attention using a simple subtraction (Equation 5: Δw=0.34 for the above example of the Body-Car pair in LO) and used this measure to compare between different pairs.”

      We hope that this clarification makes it easier to understand the multivariate analysis and the weight shift calculation in Equation 5.

      We additionally provide the values of the weights (a1, b1, a2, and b2) for each category pair averaged across participants as Appendix 1-table 4.
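
      As a worked illustration of Equation 5, the sketch below applies it to the participant-averaged Body-Car weights in LO quoted above (a1 = 0.81, a2 = 0.31, b1 = 0.43, b2 = 0.68). The function names are hypothetical. Because the sketch operates on averaged weights, its output can differ in the second decimal from values obtained by computing the shares per participant and then averaging; that per-participant order of operations is an assumption here, not something the sketch reproduces.

```python
def relative_weight(w_x, w_y):
    """Share of the isolated x response in the paired response."""
    return w_x / (w_x + w_y)


def weight_shift(a1, a2, b1, b2):
    """Equation 5: change in the relative weight of x when attention shifts from x to y."""
    return relative_weight(a1, a2) - relative_weight(b1, b2)


# Participant-averaged weights for the Body-Car pair in LO, as quoted in the text.
a1, a2 = 0.81, 0.31  # body attended: weights of VBody and VCar
b1, b2 = 0.43, 0.68  # car attended: weights of VBody and VCar

share_body_attended = relative_weight(a1, a2)
share_body_unattended = relative_weight(b1, b2)
delta_w = weight_shift(a1, a2, b1, b2)

print(round(share_body_attended, 2), round(share_body_unattended, 2), round(delta_w, 2))
```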

      (6) For multivariate analyses (Fig. 6A-E), x axis is normalized (pattern distance based on Pearson correlation), while the delta_W does not seem to be similarly normalized.

      We calculated ΔW by dividing the weights in each condition by the sum of weights in that condition. Thus, we use relative weights which are always in the range of 0 to 1, and ΔW is thus always in the range of -1 to 1. This means that both axes are normalized. Note that even if one axis were not normalized, the relationship between the independent and the dependent variables would remain the same despite the change in the range of the axis.
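
      The last point, that rescaling an axis does not change the relationship, can be checked with a short sketch: the Pearson correlation is unchanged when one variable is multiplied by a positive constant and shifted. The data values below are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical category distances and weight shifts for six category pairs.
category_distance = [0.2, 0.5, 0.7, 0.9, 1.1, 1.3]
weight_shift = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30]

# Same dependent variable expressed on a different, unnormalized scale.
rescaled_shift = [10 * w + 3 for w in weight_shift]

r_original = pearson_r(category_distance, weight_shift)
r_rescaled = pearson_r(category_distance, rescaled_shift)
print(abs(r_original - r_rescaled) < 1e-9)
```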

      (7) Simulating additional scenarios like attention to both categories just increasing the mean response would be helpful - is this how one would capture results like those shown in some panels of Fig. 4?

      We did not have a condition in which participants were asked to attend to both categories. Therefore it was not useful for our simulations to include such a scenario. Please also note that the goal of our simulations is not to capture the exact amount of attentional modulation, but to investigate the effect of target-distractor similarity on the change in attentional modulation (univariate shift and weight shift).

      As for the results in some panels of Figure 4, we have explained the reason underlying higher responses in paired conditions compared to isolated conditions in response to the “weaknesses” section of the second reviewer. We hope that these points satisfy the reviewer’s concern regarding the results in Figure 4 and our simulations.

      (8) Lines 271-276 - the "latter" and "former" are backwards here I think.

      We believe that the sentence was correct, but confusing. We have rephrased the sentence to avoid confusion in lines 371-376 of the revised manuscript:

      “We modeled two neural populations: a general object-selective population in which each voxel shows preference to a particular category and voxels with different preferences are mixed in with each other (similar to LO and pFS), and a category-selective population in which all voxels have a similar preference for a particular category (similar to EBA and PPA).”

      (9) Line 314 - "body-car" pair is mentioned twice in describing the non-significant result in PPA ROI.

      Thank you for catching the typo. We have changed the second Body-Car to Body-Cat.

      (10) Fig. 5 and Fig. 6 - I was expecting to see a plot that demonstrated variability across subjects rather than across category pairs. Would it be possible to show the distribution of each pair's datapoints across subjects, perhaps by coloring all (e.g.) body-car datapoints one color, all body-cat datapoints another, etc? This would also help readers better understand how category preferences (which differ across ROIs) impact the results.

      We demonstrated variability across category pairs rather than subjects because we aimed to investigate how the variation in the similarity between categories (i.e. category distance) affected the univariate and multivariate effects of attention. The variability across subjects is reflected in the error bars in the bar plots of Figure 5 and Figure 6.

      Here we show the distribution of each category pair’s data points across subjects by using a different color for each pair:

      Author response image 2.

      Univariate shift versus category distance including single-subject data points in all ROIs.

      Author response image 3.

      Weight shift versus category distance including single-subject data points in all ROIs.

      As can be observed in the figures, category preference has little impact on the results. Rather, the similarity in the preference (in the univariate case) or the response pattern (in the multivariate case) to the two presented categories is what impacts the amount of the univariate shift and the weight shift, respectively. For instance, in EBA we observe a low amount of attentional shift both for the Body-Cat pair, with two stimuli for which the ROI is highly selective, and the Car-House pair, including stimuli to which the region shows little response. A similar pattern is observed in the object-selective regions LO and pFs which show high responses to all stimulus categories.

      We believe that the figures including the data points related to all subjects are not strongly informative. However, we agree that using different colors for each category pair helps the readers better understand that category preference has little impact on the results in different ROIs. We therefore present the colored version of Figure 5 and Figure 6 in the revised manuscript, with a different color for each category pair.

      (11) Fig. 5 and Fig. 6 use R^2 as a dependent variable across participants to conclude a positive relationship. While the positive relationship is clear in the scatterplots, which depict averages across participants for each category pair, it could still be the case that there are a substantial number of participants with negative (but predictive, thus high positive R^2) slopes. For completeness and transparency, the authors should illustrate the average slope or regression coefficient for each of these analyses.

      We concluded the positive relationship and calculated the significance in Figure 5 and Figure 6 using the correlation r rather than r^2. This is why the result was not significantly positive in V1. We acknowledge that the use of r-squared in the bar plot leads to confusion. We have therefore changed the bar plots to show the correlation coefficient instead of the r-squared. Furthermore, we have added a table of the correlation coefficient for all participants in all ROIs for the univariate and weight shift analyses supplemental to Figure 5 and Figure 6, respectively.
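
      The reviewer's concern can be made concrete with a short sketch: a strongly negative relationship yields a high r-squared, so only a signed quantity such as the slope or the correlation coefficient reveals its direction. The data below are hypothetical and deliberately constructed to have a negative slope; they do not represent any participant.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and R^2 for a simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, 1 - ss_res / ss_tot


# Hypothetical data with a clear negative trend.
x = [1, 2, 3, 4, 5, 6]
y = [6.1, 4.9, 4.2, 2.8, 2.1, 0.9]

slope, r_squared = fit_line(x, y)

# R^2 is high although the slope is negative; the sign is lost in R^2.
print(slope < 0, r_squared > 0.9)
```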

      (12) No statement about data or analysis code availability is provided

      Thanks for pointing this out. The fMRI data is available on OSF. We have added a statement about it in the Data Availability section of the revised manuscript in line 669.