  1. Mar 2024
    1. eLife assessment

      This work demonstrates an important regulatory role of the N-terminal disordered tail of small ubiquitin-like modifier (SUMO) proteins, which modulates the function of a variety of proteins in eukaryotic cells. The authors present convincing evidence that the N-terminal region of SUMO inhibits its own interaction with downstream effector proteins and SUMOylation targets. This new discovery significantly advances the field by providing a possible explanation of how SUMO paralogues select their effectors and SUMOylation targets.

    2. Reviewer #1 (Public Review):

      Summary:

      SUMO proteins are processed and then conjugated to other proteins via a C-terminal di-glycine motif. In contrast, the N-terminus of some SUMO proteins (SUMO2/3) contains lysine residues that are important for the formation of SUMO chains. Using NMR studies, the N-terminus of SUMO was previously reported to be flexible (Bayer et al., 1998). The authors are investigating the role of the flexible (referred to as intrinsically disordered) N-terminus of several SUMO proteins. They report experimental findings and modeling data indicating that this intrinsically disordered N-terminus of SUMO1 (and the C. elegans Smo1) regulates the interaction of SUMO with SUMO interacting motifs (SIMs).

      Strengths:

      Among the strongest experimental data suggesting that the N-terminus plays an inhibitory function are their observations that

      (1) SUMO1∆N19 binds more efficiently to SIM-containing Usp25, Tdp2, and RanBp2,

      (2) SUMO1∆N19 shows improved sumoylation of Usp25,

      (3) changing negatively-charged residues, ED11,12KK in the SUMO1 N-terminus increased the interaction and sumoylation with/of USP25.

      The paper is very well-organized, clearly written, and the experimental data are of high quality. There is good evidence that the N-terminus of SUMO1 plays a role in regulating its binding and conjugation to SIM-containing proteins. Therefore, the authors are presenting a new twist in the ever-evolving saga of SUMO, SIMs, and sumoylation.

      Weaknesses:

      Much has been learned about SUMO through structure-function analyses and this study is another excellent example. I would like to suggest that the authors take some extra time to place their findings into the context of previous SUMO structure-function analyses. Furthermore, it would be fitting to place their finding of a potential role of N-terminally truncated Smo1 into the context of the many prior findings that have been made with regard to the C. elegans SUMO field. Finally, regarding their data modeling/simulation, there are questions regarding the data comparisons and whether manipulations of the N-terminus also have an effect on the 70/80 region of the core.

    3. Reviewer #2 (Public Review):

      Summary:

      This very interesting study originated from a serendipitous observation that the deletion of the disordered N-terminal tail of human SUMO1 enhances its binding to its interaction partners. This suggested that the N terminus of SUMO1 might be an intrinsic competitive inhibitor of SUMO-interacting motif (SIM) binding to SUMO1. Subsequent experiments support this mechanism, showing that in humans it is specific to SUMO1 and does not extend to SUMO2 or SUMO3 (except, perhaps, when the N terminus of SUMO2 becomes phosphorylated, as the authors intriguingly suggest - and partially demonstrate). The auto-inhibition of SUMO1 via its N-terminal tail apparently explains the lower binding of SUMO1 compared to SUMO2 to some SIMs and lower SIM-dependent SUMOylation of some substrates with SUMO1 compared to SUMO2, thus adding an important element to the puzzle of SUMO paralogue preference. In line with this explanation, N-terminally truncated SUMO1 was as efficient as SUMO2 in the studied cases. The inhibitory role of SUMO1's N terminus appears conserved in other species including S. cerevisiae and C. elegans, both of which contain only one SUMO. The study also elucidates the molecular mechanism by which the disordered N-terminal region of SUMO1 can exert this auto-inhibitory effect. This appears to depend on the transient, very highly dynamic physical interaction between the N terminus and the surroundings of the SIM-binding groove based mostly on electrostatic interactions between acidic residues in the N terminus and basic residues around the groove.

      Strengths:

      A key strength of this study is the interplay of different techniques, including biochemical experiments, NMR, molecular dynamics simulations, and, at the end, in vivo experiments. The experiments performed with these different techniques inform each other in a productive way and strengthen each other's conclusions. A further strength is the detailed and clear text, which patiently introduces, describes, and discusses the study. Finally, in terms of the message, the study has a clear, mechanistic message of fundamental importance for various aspects of the SUMO field, and also more generally for protein biochemists interested in the functional importance of intrinsically disordered regions.

      Weaknesses:

      Some of the authors' conclusions are similar to those from a recent study by Lussier-Price et al. (NAR, 2022), the two studies likely representing independent inquiries into a similar topic. I don't see it as a weakness by itself (on the contrary), but it seems like a lost opportunity not to discuss at more length the congruence between these two studies in the discussion (Lussier-Price is only very briefly cited). Another point that can be raised concerns the wording of conclusions from molecular dynamics. The use of molecular dynamics simulations in this study has been rigorous and fruitful - indeed, it can be a model for such studies. Nonetheless, parameters derived from molecular dynamics simulations, including kon and koff values, could be more clearly described as coming from simulations and not experiments. Lastly, some of the conclusions - such as enhanced binding to SIM-containing proteins upon N-terminal deletion - could be additionally addressed with a biophysical technique (e.g. ITC) that is more quantitative than gel-based pull-down assays - but I don't think it is a must.

    1. eLife assessment

      This manuscript reports important data on the stability of nucleosomes. Convincing evidence obtained by single-molecule FRET experiments shows that DNA unwrapping is slower when a single CC base pair mismatch is introduced at three different positions. The work is carefully conducted and described clearly, but the biological significance and implications of the findings on cellular DNA metabolism remain unclear.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Ngo et al. report a peculiar effect where a single base mismatch (CC) can enhance the mechanical stability of a nucleosome. In previous studies, the same group used a similar state-of-the-art fluorescence-force assay to study the unwrapping dynamics of 601-DNA from the nucleosome and observed that force-induced unwrapping happens more slowly for DNA that is more bendable because of changes in sequence or chemical modification. This manuscript appears to be a sequel to this line of projects, where the effect of CC is tested. The authors confirmed that CC is the most flexible mismatch using the FRET-based cyclization assay and found that unwrapping becomes slower when CC is introduced at three different positions in the 601 sequence. The CC mismatch only affects the local unwrapping dynamics of the outer turn of nucleosomal DNA.

      Strengths:

      These results are in good agreement with the previously established correlation between DNA bendability and nucleosome mechanical stability by the same group. This well-executed, technically sound, and well-written experimental study contains novel nucleosome unwrapping data specific to the CC mismatch and 601 sequence, the cyclizability of DNA containing all base pair mismatches, and the unwrapping of 601-DNA from Xenopus and yeast histones. Overall, this work will be received with great interest by the biophysics community and is definitely worth attention.

      Weaknesses:

      The scope and impact of this study are somewhat limited due to the lack of sequence variation. Whether the conclusion from this study can be generalized to other sequences and other bendability-enhancing mismatches needs further investigation.

      Major questions:

      (1) As pointed out by the authors, the FRET signal is not sensitive to nucleosome position; therefore, the increasing unwrapping force in the presence of CC can be interpreted as the repositioning of the nucleosome upon perturbation. It is then also possible that CC-containing DNA is not positioned exactly the same as normal DNA from the start upon nucleosome assembly, leading to different unwrapping trajectories. What is the experimental evidence that supports identical positioning of the nucleosomes before the first stretch?

      (2) The authors chose a constant stretching rate in this study. Can the authors provide a more detailed explanation or rationale for why this rate was chosen? At this rate, the authors found hysteresis, which indicates that stretching is faster than quasi-static. But it must have been slow and weak enough to allow for reversible unwrapping and wrapping of a CC-containing DNA stretch longer than one helical turn. Otherwise, such a strong effect of CC at a single location would not be seen. I am also curious about the biological relevance of the magnitude of the force. Can such force arise during nucleosome assembly in vivo?

      (3) In this study, the CC mismatch is the only change made to the 601 sequence. For readers to truly appreciate its unique effect on unwrapping dynamics as a base pair defect, it would be nice to include the baseline effects of other minor changes to the sequence. For example, how robust is the unwrapping force or dynamics against a single-bp change (e.g., AT to GC) at the three chosen positions?

      (4) The last section introduces yeast histones. Based on the theme of the paper, I was expecting to see how the effect of CC is or is not preserved with a different histone source. Instead, the experiment only focuses on differences in the unwrapping dynamics. Although the data presented are important, it is not clear how they fit or support the narrative of the paper without the effect of CC.

      (5) It is stated that tRNA was excluded in experiments with yeast-expressed nucleosomes. What is the reason for excluding it for yeast nucleosomes? Did the authors rule out the possibility that tRNA causes the measured difference between the two nucleosome types?

    3. Reviewer #2 (Public Review):

      Summary:

      Mismatches occur as a result of DNA polymerase errors, chemical modification of nucleotides, during homologous recombination between near-identical partners, as well as during gene editing on chromosomal DNA. Under some circumstances, such mismatches may be incorporated into nucleosomes but their impact on nucleosome structure and stability is not known. The authors use the well-defined 601 nucleosome positioning sequence to assemble nucleosomes with histones on perfectly matched dsDNA as well as on dsDNA with defined mismatches at three nucleosomal positions. They use the R18, R39, and R56 positions situated in the middle of the outer turn, at the junction between the outer turn and inner turn, and in the middle of the inner turn, respectively. Most experiments are carried out with CC mismatches and Xenopus histones. Unwrapping of the outer DNA turn is monitored by single-molecule FRET in which the Cy3 donor is incorporated on the 68th nucleotide from the 5'-end of the top strand and the Cy5 acceptor is attached to the 7th nucleotide from the 5' end of the bottom strand. Force is applied to the nucleosomal DNA as FRET is monitored to assess nucleosome unwrapping. The results show that a CC mismatch enhances nucleosome mechanical stability. Interestingly, yeast and Xenopus histones show different behaviors in this assay. The authors use FRET to measure the cyclization of the dsDNA substrates to test the hypothesis that mismatches enhance the flexibility of the 601 dsDNA fragment and find that CC, CA, CT, TT, and AA mismatches decrease looping time, whereas GA, GG, and GT mismatches had little to no effect. These effects correlate with the results from DNA buckling assays reported by Euler's group (NAR 41, 2013) using the same mismatches as an orthogonal way to measure DNA kinking.

      The authors discuss that substitution rates are higher towards the middle of the nucleosome, suggesting that mismatches/DNA damage at this position are less accessible for repair, consistent with the nucleosome stability results.

      Strengths:

      The single-molecule data show clear and consistent effects of mismatches on nucleosome stability and DNA persistence length.

      Weaknesses:

      It is unclear in the looping assay how the cyclization rate relates to the reported looping time. The biological significance and implications such as the effect on mismatch repair or nucleosome remodelers remain untested. It is unclear whether the mutational pattern reflects the behavior of the different mismatches. Such a correlation could strengthen the argument that the observed effects are relevant for mutagenesis.

    4. Reviewer #3 (Public Review):

      Summary:

      The mechanical properties of DNA wrapped in nucleosomes affect the stability of nucleosomes and may play a role in the regulation of DNA accessibility in eukaryotes. In this manuscript, Ngo and coworkers study how the stability of a nucleosome is affected by the introduction of a CC mismatched base pair, which has been reported to increase the flexibility of DNA. Previously, the group has used a sophisticated combination of single-molecule FRET and force spectroscopy with an optical trap to show that the more flexible half of a 601 DNA segment provides for more stable wrapping as compared to the other half. Here, it is confirmed with a single-molecule cyclization assay that the introduction of a CC mismatch increases the flexibility of a DNA fragment. Consistent with the previous interpretation, it also increased the unwrapping force for the half of the 601 segment in which the CC mismatch was introduced, as measured with single-molecule FRET and force spectroscopy. Enhanced stability was found up to 56 bp into the nucleosome. The intricate role of mechanical stability of nucleosomes was further investigated by comparing force-induced unwrapping profiles of yeast and Xenopus histones. Intriguingly, asymmetric unwrapping was more pronounced for yeast histones.

      Strengths:

      (1) High-quality single-molecule data.

      (2) Novel mechanism, potentially explaining the increased prominence of mutations near the dyads of nucleosomes.

      (3) A clear mechanistic explanation of how mismatches affect nucleosome stability.

      Weaknesses:

      (1) Disconnect between mismatches in nucleosomes and measurements comparing Xenopus and yeast nucleosome stability.

      (2) Convoluted data in cyclization experiments concerning the phasing of mismatches and biotin site.

    1. Reviewer #3 (Public Review):

      Induced pluripotent stem cells, or iPSCs, are cells that scientists can push to become new, more mature cell types like neurons. iPSCs have a high potential to transform how scientists study disease by combining precision medicine gene editing with processes known as high-content imaging and drug screening. However, there are many challenges that must be overcome to realize this overall goal. The authors of this paper solve one of these challenges: predicting cell types that might result from potentially inefficient and unpredictable differentiation protocols. These predictions can then help optimize protocols.

      The authors train advanced computational algorithms to predict single-cell types directly from microscopy images. The authors also test their approach in a variety of scenarios that one may encounter in the lab, including when cells divide quickly and crowd each other in a plate. Importantly, the authors suggest that providing their algorithms with just the right amount of information beyond the cells' nuclei is the best approach to overcome issues with cell crowding.

      The work provides many well-controlled experiments to support the authors' conclusions. However, there are two primary concerns: (1) The model may be relying too heavily on the background and thus technical artifacts (instead of the cells) for making CNN-based predictions, and (2) the conclusion that their nucleocentric approach (including a small area beyond the nucleus) is the best approach is not well supported, and its apparent advantage may just be due to random chance. If the authors were to address these two concerns (through additional experimentation), then the work may influence how the field performs cell profiling in the future.

      Additionally, the impact of this work will be limited, given the authors do not provide a specific link to the public source code that they used to process and analyze their data.

    1. eLife assessment

      This study reports an important finding highlighting the essential role of the putative ion channel, TMC7 (transmembrane channel-like 7) in male fertility, thereby significantly advancing our understanding of the function of the previously uncharacterized protein in sperm development. The evidence supporting TMC7's requirement in acrosome biogenesis during spermatogenesis is solid, and its function as an ion channel requires more study.

    2. Reviewer #1 (Public Review):

      Summary:

      TMC7 knockout mice were generated by the authors and the phenotype was analyzed. They found that Tmc7 is localized to Golgi and is needed for acrosome biogenesis.

      Strengths:

      The phenotype of infertility is clear, and the results of TMC7 localization and the failed acrosome formation are highly reliable. In this respect, they made a significant discovery regarding spermatogenesis.

      Weaknesses:

      There are also some concerns, which are mainly related to the molecular function of TMC7 and Figure 5. It is understandable that TMC7 exhibits some channel activity in the Golgi and somehow affects luminal pH or Ca2+, leading to the failure of acrosome formation. On the other hand, since they are conducting the pH and calcium imaging from the cytoplasm, I do not think that the effect of TMC7 channel function in Golgi is detectable with their methods. Rather, it is more likely that they are detecting apoptotic cells that no longer have normal ion homeostasis. Another concern is that n is only 3 for these imaging experiments.

    3. Reviewer #2 (Public Review):

      Summary:

      This study presents a significant finding that enhances our understanding of spermatogenesis. TMC7 belongs to a family of transmembrane channel-like proteins (TMC1-8), primarily known for their role in the ear. Mutations in TMC1/2 are linked to deafness in humans and mice, and these proteins were originally characterized as auditory mechanosensitive ion channels. However, the function of the other TMC family members remains poorly characterized. In this study, the authors begin to elucidate the function of TMC7 in acrosome biogenesis during spermatogenesis. Through analysis of transcriptomics datasets, they identify TMC7 as a transmembrane channel-like protein with elevated transcript levels in round spermatids in both mouse and human testis. They then generate Tmc7-/- mice and find that male mice exhibit smaller testes and complete infertility. Examination of different developmental stages reveals spermatogenesis defects, including reduced sperm count, elongated spermatids, and large vacuoles. Additionally, abnormal acrosome morphology is observed beginning at the early-stage Golgi phase, indicating TMC7's involvement in proacrosomal vesicle trafficking and fusion. They observed localization of TMC7 in the cis-Golgi and suggest that its presence is required for maintaining Golgi integrity, with Tmc7-/- leading to reduced intracellular Ca2+, elevated pH, and increased ROS levels, likely resulting in spermatid apoptosis. Overall, the work delineates a new function of TMC7 in spermatogenesis and the authors suggest that its ion channel activity is likely important for Golgi homeostasis. This work is of significant interest to the community and is of high quality.

      Strengths:

      The biggest strength of the paper is the phenotypic characterization of the TMC7-/- mouse model, which has clear acrosome biogenesis/spermatogenesis defects. This is the main claim of the paper and it is supported by the data that are presented.

      Weaknesses:

      The claim is that TMC7 functions as an ion channel. It is reasonable to assume this given what has been previously published on the more well-characterized TMCs (TMC1/2), but the data supporting this is preliminary here, and more needs to be done to solidify this hypothesis. The authors are careful in their interpretation and present this merely as a hypothesis.

    4. Reviewer #3 (Public Review):

      Summary:

      In this study, Wang et al. have demonstrated that TMC7, a testis-enriched multipass transmembrane protein, is essential for male reproduction in mice. Tmc7 KO male mice are sterile due to reduced sperm count and abnormal sperm morphology. TMC7 co-localizes with GM130, a cis-Golgi marker, in round spermatids. The absence of TMC7 results in reduced levels of Golgi proteins, elevated abundance of ER stress markers, as well as changes of Ca2+ and pH levels in the KO testis. However, further confirmation is required because the analyses were performed with whole testis samples in spite of the differences in the germ cell composition in WT and KO testis. In addition, the causal relationships between the reported anomalies await thorough interrogation.

      Strengths:

      The microscopic images are of great quality, all figures are properly arranged, and the entire manuscript is very easy to follow.

      Weaknesses:

      Tmc7 KO male mice show multiple anomalies in sperm production and morphogenesis, such as reduced sperm count, abnormal sperm head, and deformed midpiece. Thus, it is confusing that the authors focused solely on impaired acrosome biogenesis. Further investigations are warranted to determine whether the abnormalities reported in this manuscript (e.g., changes in protein, Ca2+, and pH levels) are directly associated with the molecular function of TMC7 or are the byproducts of partially arrested spermiogenesis. Please find additional comments in "Recommendations for the authors".

    1. eLife assessment

      This valuable study addresses how distinct cilia lengths of different cell types in zebrafish are regulated and proposes an interesting mechanism. While the quality of imaging and the resources generated in this study are excellent, key experimental information is not always provided and the strength of evidence to support the proposed hypothesis on ciliary length regulation is currently incomplete.

    2. Reviewer #1 (Public Review):

      Among the many challenges in the cilia field is the question of how multicellular organisms assemble a variety of structurally and functionally specialized cilia, including cilia of different lengths. This study addresses the important question of how ciliary length differences are established in vertebrates. Specifically, the authors analyzed the role of intraflagellar transport (IFT) in ciliary length regulation in zebrafish, exploiting the transparency of the embryos. Zebrafish possess functionally specialized motile and non-motile cilia in a variety of tissues. Expression of GFP-tagged IFT88, a component of the IFT-B subcomplex, in a corresponding mutant, allowed the authors to image IFT in five distinct types of cilia. They note that IFT moves faster in longer cilia. Tagging and imaging of the IFT-A protein IFT43 further support this observation. IFT speed was largely unaffected in knock-outs and morphants targeting the BBSome, various kinesin-2 motors, and the posttranslational modifications of tubulin polyglycylation and polyglutamylation. Using high-resolution STED imaging, the authors observe that IFT signals (likely, corresponding to IFT trains) are smaller in the shorter spinal cord cilia compared to the long cristae cilia. Based on these observations, the authors test the hypothesis that larger IFT trains recruit more motors or coordinate the motors better, resulting in faster trains, and causing cilia to be longer. This is further tested using partial knock-down of IFT88-GFP, which resulted in shorter crista cilia, reduced IFT particle number, size, and velocity. Some parts of the manuscript show "negative" data (e.g., ciliary length and IFT are not affected by the loss of BBS4) but these add beautifully to the overall story and allow for additional conclusions such as the minor role of ttll3 and ccp knockouts on ciliary length in this model. This is an excellent study, which documents IFT in a vertebrate species and explores its regulation.

      The data are of high quality and support most of the conclusions.

      (1) The main hypothesis/conclusion is summarized in the abstract: "Our study presents an intriguing model of cilia length regulation via controlling IFT speed through the modulation of the size of the IFT complex." The data clearly document the remarkable correlation between IFT velocity and ciliary length in the different cells/tissues/organs analyzed. The experimental test of this idea, i.e., the knock-down of GFP-IFT88, further supports the conclusion but needs to be interpreted more carefully. While IFT particle size and train velocity were reduced in the IFT88 morphants, the number of IFT particles is even more decreased. Thus, the contributions of the reduction in train size and velocity to ciliary length are, in my opinion, not unambiguous. Also, the concept that larger trains move faster, likely because they dock more motors and/or better coordinate kinesin-2, and that faster IFT causes cilia to be longer, is, to my knowledge, not further supported by observations in other systems (see below).

      (2) I think the manuscript would be strengthened if the IFT frequency were also analyzed in the five types of cilia. This could be done based on the existing kymographs from the spinning disk videos. As mentioned above, transport frequency in addition to train size and velocity is an important part of estimating the total number of IFT particles, which bind the actual cargoes, entering/moving in cilia.

      (3) Here, the variation in IFT velocity in cilia of different lengths within one species is documented - the results document a remarkable correlation between IFT velocity and ciliary length. These data need to be compared to observations from the literature. For example, the velocity of IFT in the quite long (~100 µm) olfactory cilia of mice is similar to that observed in the rather short cilia of fibroblasts (~0.6 µm/s). In Chlamydomonas, IFT velocity is not different in long flagella mutants compared to controls. Probably data are also available for C. elegans or other systems. Discussing these data would provide a broader perspective on the applicability of the model outside of zebrafish.

    3. Reviewer #2 (Public Review):

      Summary:

      In this study, the authors study intraflagellar transport (IFT) in cilia of diverse organs in zebrafish. They elucidate that IFT88-GFP (an IFT-B core complex protein) can substitute for endogenous IFT88 in promoting ciliogenesis and use it as a reporter to visualize IFT dynamics in living zebrafish embryos. They observe striking differences in cilia lengths and velocity of IFT trains in different cilia types, with smaller cilia lengths correlating with lower IFT speed. They generate several mutants and show that disrupting the function of different kinesin-2 motors and BBSome or altering post-translational modifications of tubulin does not have a significant impact on IFT velocity. They however observe that when the amount of IFT88 is reduced it impacts the cilia length, IFT velocity as well as the number and size of IFT trains. They also show that the IFT train size is slightly smaller in one of the organs with shorter cilia (spinal cord). Based on their observations they propose that IFT velocity determines cilia length and go one step further to propose that IFT velocity is regulated by the size of IFT trains.

      Strengths:

      The main highlight of this study is the direct visualization of IFT dynamics in multiple organs of a living complex multi-cellular organism, zebrafish. The quality of the imaging is really good. Further, the authors have developed phenomenal resources to study IFT in zebrafish which would allow us to explore several mechanisms involved in IFT regulation in future studies. They make some interesting findings in mutants with disrupted function of kinesin-2, BBSome, and tubulin modifying enzymes which are interesting to compare with cilia studies in other model organisms. Also, their observation of a possible link between cilia length and IFT speed is potentially fascinating.

      Weaknesses:

      The manuscript as it stands, has several issues.

      (1) The study does not provide a qualitative description of cilia organization in different cell types, the cilia length variation within the same organ, and IFT dynamics. The methodology is also described minimally and must be detailed with more care such that similar studies can be done in other laboratories.

      (2) They provide remarkable new observations for all the mutants. However, discussion regarding what the findings imply and how these observations align (or contradict) with what has been observed in cilia studies in other organisms is not comprehensive.

      (3) The analysis of IFT velocities, the main parameter they compare between experiments, is not described at all. The IFT velocities appear variable in several kymographs (and movies) and are visually difficult to see in shorter cilia. It is unclear how they make sure that the velocity readout is robust. Perhaps, a more automated approach is necessary to obtain more precise velocity estimates.

      (4) They claim that IFT speeds are determined by the size of IFT trains, based on their observations in samples with a reduced amount of IFT88. If this was indeed the case, the velocity of a brighter IFT train (larger train) would be higher than the velocity of a dimmer IFT train (smaller train) within the same cilium. This is not apparent from the movies and such a correlation should be verified to make their claim stronger.

      (5) They make an even larger claim that the cilia length (and IFT velocity) in different organs is different due to differences in the sizes of IFT trains. This is based on a marginal difference they observe between the cilia of crista and the spinal cord in immunofluorescence experiments (Figure 5C). Inferring that this minor difference is key to the striking difference in cilia length and IFT velocity is incorrect in my opinion.
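
      To illustrate the kind of automated velocity readout raised in point (3), a minimal sketch follows. It is not the authors' method: it assumes that each IFT train has already been traced on a kymograph as a list of (time, position) points, and the function name, units, and example numbers are hypothetical. The velocity is taken as the least-squares slope of position against time, and the standard error of that slope gives a rough measure of how robust each estimate is.

```python
import math


def track_velocity(times_s, positions_um):
    """Estimate the velocity of one kymograph track by ordinary least squares.

    times_s      -- time of each traced point, in seconds (hypothetical input)
    positions_um -- position along the cilium at each time, in micrometres
    Returns (slope in um/s, standard error of the slope).
    """
    n = len(times_s)
    if n != len(positions_um) or n < 3:
        raise ValueError("need at least three matched (time, position) points")

    t_mean = sum(times_s) / n
    x_mean = sum(positions_um) / n
    s_tt = sum((t - t_mean) ** 2 for t in times_s)
    if s_tt == 0:
        raise ValueError("time points must not all be identical")

    # Slope and intercept of the best-fit line x = intercept + slope * t.
    s_tx = sum((t - t_mean) * (x - x_mean)
               for t, x in zip(times_s, positions_um))
    slope = s_tx / s_tt
    intercept = x_mean - slope * t_mean

    # Residual scatter around the line sets the uncertainty of the slope.
    rss = sum((x - (intercept + slope * t)) ** 2
              for t, x in zip(times_s, positions_um))
    std_err = math.sqrt(rss / (n - 2) / s_tt)
    return slope, std_err


# Made-up track for illustration only; these are not data from the study.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
positions = [0.30, 0.90, 1.50, 2.10, 2.70]
velocity, err = track_velocity(times, positions)
print(f"velocity = {velocity:.2f} um/s (standard error {err:.2f})")
```

      Reporting a standard error for every track would make it easier to judge whether the short tracks available in shorter cilia support a reliable velocity estimate.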

      Impact:

      Overall, I think this work develops an exciting new multicellular model organism to study IFT mechanisms. Zebrafish is a vertebrate where we can perform genetic modifications with relative ease. This could be an ideal model to study not just the role of IFT in connection with ciliary function but also ciliopathies. Further, from an evolutionary perspective, it is fascinating to compare IFT mechanisms in zebrafish with unicellular protists like Chlamydomonas, simple multicellular organisms like C. elegans, and primary mammalian cell cultures. Having said that, the underlying storyline of this study is flawed in my opinion and I would recommend that the authors report the striking findings and methodology in more detail while significantly toning down their proposed hypothesis on ciliary length regulation. Given the technological advancements made in this study, I think it is fine if it is a descriptive manuscript and doesn't necessarily need a breakthrough hypothesis based on preliminary evidence.

    4. Reviewer #3 (Public Review):

      Summary:

      A known feature of cilia in vertebrates and many, if not all, invertebrates is the striking heterogeneity of their lengths among different cell types. The underlying mechanisms, however, remain largely elusive. In the manuscript, the authors addressed this question from the angle of intraflagellar transport (IFT), a cilia-specific bidirectional transportation machinery essential to biogenesis, homeostasis, and functions of cilia, by using zebrafish as a model organism. They conducted a series of experiments and proposed an interesting mechanism. Furthermore, they achieved in situ live imaging of IFT in zebrafish larvae, which is a technical advance in the field.

      Strengths:

      The authors initially demonstrated that ectopically expressed Ift88-GFP through a certain heat-shock induction protocol fully sustained the normal development of mutant zebrafish that would otherwise be dead by 7 dpf due to the lack of this critical component of the IFT-B complex. Accordingly, cilia formation was also fully restored in the tissues examined. By imaging the IFT using Ift88-GFP in the mutant fish as a marker, they unexpectedly found that both anterograde and retrograde velocities of IFT trains varied among cilia of different cell types and appeared to be positively correlated with the length of the cilia.

      For insights into the possible cause(s) of the heterogeneity in IFT velocities, the authors assessed the effects of IFT kinesin Kif3b and Kif17, BBSome, and glycylation or glutamylation of axonemal tubulin on IFT and excluded their contributions. They also used a cilia-localized ATP reporter to exclude the possibility of different ciliary ATP concentrations. When they compared the size of Ift88-GFP puncta in crista cilia, which are long, and spinal cord cilia, which are relatively short, by imaging with a cutting-edge super-resolution microscope, they noticed a positive correlation between the puncta size, which presumably reflected the size of IFT trains, and the length of the cilia.

      Finally, they investigated whether it is the size of IFT trains that dictates the ciliary length. They injected a low dose (0.5 ng/embryo) of ift88 MO and showed that, although such a dosage did not induce the body curvature of the zebrafish larvae, crista cilia were shorter and contained fewer Ift88-GFP puncta. The particle size was also reduced. These data collectively suggested mildly downregulated expression levels of Ift88-GFP. Surprisingly, they observed significant reductions in both retrograde and anterograde IFT velocities. Therefore, they proposed that longer IFT trains would facilitate faster IFT and result in longer cilia.

      Weaknesses:

      The current manuscript, however, contains serious flaws that markedly limit the credibility of major results and findings. Firstly, important experimental information is frequently missing, including (but not limited to) developmental stages of zebrafish larvae assayed (Figures 1, 3, and 5), how the embryos or larvae were treated to express Ift88-GFP (Figures 3-5), and descriptions of sample sizes and the number of independent experiments or larvae examined in statistical results (Figures 3-5, S3, S6). For instance, although Figure 1B appears to be the standard experimental scheme, the authors provided results from 30-hpf larvae (Figure 3) that, according to Figure 1B, are supposed to neither express Ift88-GFP nor be genotyped because both the first round of heat shock treatment and the genotyping were arranged at 48 hpf. Similarly, the result that ovl larvae containing Tg(hsp70l:ift88 GFP) (again, because the genotype is not disclosed in the manuscript, one can only deduce) display normal body curvature at 2 dpf after the injection of 0.5 ng of ift88 MO (Figure 5D) is quite confusing because the larvae should also have been negative for Ift88-GFP and thus displayed body curvature. Secondly, some inferences are more or less logically flawed. The authors tend to use negative results on specific assays to exclude all possibilities. For instance, the negative results in Figures 4A-B are not sufficient to "suggest that the variability in IFT speeds among different cilia cannot be attributed to the use of different motor proteins" because the authors have not checked dynein-2 and other IFT kinesins. In fact, in their previous publication (Zhao et al., 2012), the authors actually demonstrated that different IFT kinesins have different effects on ciliogenesis and ciliary length in different tissues. Furthermore, instead of also examining cilia affected by Kif3b or Kif17 mutation, they only examined crista cilia, which are not sensitive to the mutations. Similarly, their results in Figures 4C-G only excluded the importance of tubulin glycylation or glutamylation in IFT. Thirdly, the conclusive model is based on certain assumptions, e.g., constant IFT velocities in a given cell type. The authors, however, do not discuss other possibilities.

    1. eLife assessment

      This fundamental study employs a combination of cryo-electron microscopy, molecular dynamics, and mass spectrometry to elucidate the role of α-tubulin acetylation at the lumenal lysine 40 residue (αK40) within the cilium. Compelling evidence shows αK40 acetylation to impact the structure and stability of doublet microtubules in cilia by affecting the lateral rotational angle. The work will be of relevance to those interested in cytoskeleton and structural biology.

    2. Reviewer #1 (Public Review):

      Summary:

      The study "Effect of alpha-tubulin acetylation on the doublet microtubule structure" by S. Yang et al. employs a multi-disciplinary approach, including cryo-electron microscopy (cryo-EM), molecular dynamics, and mass spectrometry, to investigate the impact of α-tubulin acetylation at the lysine 40 residue (αK40) on the structure and stability of doublet microtubules in cilia. The work reveals that αK40 acetylation exerts a small-scale, but significant, effect by influencing the lateral rotational angle of the microtubules, thereby affecting their stability. Additionally, the study provides an explanation of the relationship between αK40 acetylation and phosphorylation within cilia, although the details remain elusive. Overall, these findings contribute to our understanding of how post-translational modifications can influence the structure, composition, stability, and functional properties of important cellular components like cilia.

      Strengths:

      (1) Multi-Disciplinary Approach: The study employs a robust combination of cryo-electron microscopy (cryo-EM), molecular dynamics, and mass spectrometry, providing a comprehensive analysis of the subject matter.

      (2) Significant Findings: The paper successfully demonstrates the impact of αK40 acetylation on the lateral rotational angles between protofilaments (inter-PF angles) of doublet microtubules in cilia, thereby affecting their stability. This adds valuable insights into the role of post-translational modifications in cellular components.

      (3) Exploration of Acetylation-Phosphorylation Relationship: The study also delves into the relationship between αK40 acetylation and phosphorylation within cilia, contributing to a broader understanding of post-translational modifications.

      (4) High-quality data: The authors are cryo-EM experts in the field and the data quality presented in the manuscript is excellent.

      (5) Depth of analysis: The authors analyzed the effects of αK40 acetylation in excellent depth which significantly improved our understanding of this system.

      Weaknesses:

      I have no major concerns about this paper.

    1. eLife assessment

      This solid study addresses the role of extracellular matrix (ECM) in neuronal migration. The authors showed that the interaction between the ternary complex formed by tenascin-C, the chondroitin sulfate proteoglycan neurocan, and hyaluronic acid is important for the multipolar to bipolar transition in the intermediate zone (IZ) of the developing cortex.

    2. Reviewer #1 (Public Review):

      Summary:

      In the present study, the authors identified the ternary complex formed by NCAN, TNC, and HA as an important factor facilitating the multipolar to bipolar transition in the intermediate zone (IZ) of the developing cortex. NCAN binds HA via the N-terminal Link modules, while TNC cross-links NCAN through the CDL domain at the C-terminus. The expression and correct localization of these three factors facilitate the multipolar-bipolar transition necessary for immature neurons to migrate radially. TNC and NCAN are also involved in neuronal morphology. The authors used a wide range of techniques to study the interaction between these three molecules in the developing cortex. In addition, single and double KO mice for NCAN and TNC were analyzed to decipher the role of these molecules in neuronal migration and morphology.

      Strengths:

      The study of the formation of the cerebral cortex is crucial to understanding the pathophysiology of many neurodevelopmental disorders associated with malformation of the cerebral cortex. In this study, the authors showed, for the first time, that the ternary complex formed by NCAN, TNC, and HA promotes neuronal migration. The results regarding the interaction between the three factors forming the ternary complex are convincing.

    3. Reviewer #2 (Public Review):

      Summary:

      ECM components are prominent constituents of the pericellular environment of CNS cells and form complex and dynamic interactomes in the pericellular spaces. Based on bioinformatic analysis, more than 300 genes have been attributed to the so-called matrisome, many of which are detectable in the CNS. Yet, not much is known about their functions, while increasing evidence suggests important contributions to developmental processes, neural plasticity, and inhibition of regeneration in the CNS. In this respect, the present work offers new insights and adds interesting aspects to the facets of ECM contributions to neural development. This is even more relevant in view of the fact that neurocan has recently been identified as a potential risk gene for neuropsychiatric diseases. Because ECM components occur in the interstitial space and are linked in interactomes, their study is very difficult. A strength of the manuscript is that the authors used several approaches to shed light on ECM function, including proteome studies, the generation of knockout mouse lines, and the analysis of in vivo labeled neural progenitors. This multi-perspective approach made it possible to reveal hitherto unknown properties of the ECM and highlighted its importance for the overall organization of the CNS.

      Strengths:

      Systematic analysis of the ternary complex between neurocan, TNC, and hyaluronic acid; establishment of KO mouse lines to study the function of the complex; use of in utero electroporation to investigate the impact on neuronal migration.

    1. eLife assessment

      This study provides an important insight into the mechanisms of cooperation between Hsp70 and its cochaperones during protein disaggregation. Based on compelling evidence, the authors demonstrate that Hsp110 increases Hsp70 recruitment to protein aggregates. This work is of broad interest to biochemists and cell biologists working in the protein homeostasis field.

    2. Reviewer #1 (Public Review):

      Summary:

      The manuscript by Sztangierska et al explores how the Hsp70 chaperone together with its JDP-NEF cofactors and Hsp104 disentangle aggregated proteins. Specifically, the study provides mechanistic findings that explain what role the NEF class Hsp110 has in protein disaggregation. The results explain several previous observations related to Hsp110 in protein disaggregation. Importantly, the study provides compelling evidence that Hsp110 acts early in the disaggregation process.

      Strengths:

      (1) This is a very well-performed study with multiple in vitro experiments that provide convincing support for the claims.

      (2) An important finding is that the study places the Hsp110 function early in the disaggregation process.

      (3) The study has an important value in that it picks up on a number of observations in the field that have not been explored or directly tested by experiment. The presented results settle questions and controversy regarding Hsp110 function in disaggregation.

      Weaknesses:

      (1) While the key finding of this manuscript is that it places Hsp110 early in the disaggregation process, the other findings are advancing the field less.

      (2) A claim in the paper is that Hsp110 NEFs improve disaggregation by Hsp70 in a manner dependent on the class of JDP (class A vs class B). However, it rather appears that in the experiments class B JDPs support robust disaggregation, while class A JDPs are not as effective. This simple fact may very well underlie the differences and calls into question whether class specificity should be the focus in the interpretation of the data.

      (3) The experiments differ somewhat in regard to the aggregated protein used. For example, in Figure 1A, FFL is used with only limited reactivation (10% reactivated at the last timepoint and the curve is flattening), while in Figure 2B FFL-EGFP is used to monitor microscopically what appears to be complete disaggregation. Does FFL-EGFP behave the same as FFL in assays such as the one in Figure 1A or are there major differences that may impact how the data should be interpreted?

    3. Reviewer #2 (Public Review):

      Sztangierska et al. have investigated the impact of the nucleotide exchange factor (NEF) Hsp110 on the Hsp70-dependent dissolution of amorphous aggregates in the presence of representative members of two classes of J-domain protein.

      The authors find that the nucleotide exchange factor of the Hsp110 family, sse1, stimulates the disaggregation activity of yeast Hsp70, ssa1, in particular in the presence of the J-domain protein sis1. Linking chaperone-substrate interactions as determined by biolayer interferometry (BLI) to activity assays, they show that sse1 facilitates the loading of more ssa1 onto the aggregate substrate and propose that this is due to active remodeling of the protein aggregate which exposes more chaperone binding sites and thus facilitates reactivation. This study highlights two important facets of Hsp70 biology: different Hsp70 functions rely on the functional cooperation of specific co-chaperone combinations and the stoichiometry of the different players of the Hsp70 system is an important parameter in tuning Hsp70 chaperone activity.

      Strengths:

      The manuscript presents a systematic analysis of the functional cooperation of sse1 with a class B J-domain protein sis1 in the disaggregation of two different model aggregate substrates, allowing the authors to draw more general conclusions about Hsp70 disaggregation activity.

      The authors can pinpoint the role of sse1 to the initial remodeling of aggregates, rather than the later stages of refolding, highlighting the functional specificity of Hsp70 co-chaperones.

      They demonstrate the competitive nature of binding to ssa1 between sse1 and sis1 which can explain the poisoning of Hsp70 chaperone activities observed at high NEF concentrations.

      Weaknesses:

      Experimental data concerning the class A JDPs should be interpreted with caution. These experiments show very small reactivation activities for luciferase in the range of 0-1% without the addition of Hsp104 and 0-15% with the addition of Hsp104. Moreover, since the assay is based on the recovery of luciferase activity, it conflates two chaperone activities, namely disaggregation and refolding. It is possible that the small degree of reactivation observed for the class A JDP reflects a minor subpopulation of the aggregated species that is particularly easy to disaggregate/refold and may thus not be representative of bulk behaviour.

      While structural requirements have been identified that allow sse1, in cooperation with sis1, to facilitate the loading of Hsp70 on the amorphous aggregate substrate, how this is achieved on a mechanistic level remains an open question.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors studied the function of Hsp110 co-chaperones (e.g. yeast Sse1) in Hsp70-dependent protein disaggregation reactions. The study builds on former work by the authors (Wyszkowski et al., 2021, PNAS), analyzing the binding of Hsp70 and J-domain protein (JDP) cochaperones to protein aggregates using bio-layer interferometry (BLI). It was shown before by other groups that Hsp110 enhances Hsp70 disaggregation activity. The mechanism of Hsp110-stimulated disaggregation activity, however, remained poorly defined. Here, the authors show that yeast Hsp110 increases Hsp70 recruitment to the surface of protein aggregates. The effect is largely dependent on J-domain protein (JDP) identity and is particularly pronounced for class B JDPs (e.g. yeast Sis1), which are also more effective in disaggregation reactions. The authors also confirm former results, showing inhibition by increased Hsp110 levels, and provide novel evidence that the inhibitory effect is caused by competition between Hsp110 and JDPs for Hsp70 binding.

      Strengths:

      The work represents a very thoroughly executed study, which provides novel insights into the mechanism of Hsp70-mediated protein disaggregation. Key findings established for yeast chaperones are also documented for human counterparts. The observation that Hsp110 might displace JDPs from Hsp70 during the disaggregation reaction is very appealing. It will now become important to validate this initial finding and dissect how it propels the disaggregation reaction.

      Weaknesses:

      How exactly the interplay between JDPs and Hsp110 orchestrates protein disaggregation remains largely speculative and further analysis is required for a deeper mechanistic understanding. Enhanced recruitment of Hsp70 in the presence of Hsp110 was shown for amyloid fibrils before (Beton et al., EMBO J 2022) and should be acknowledged. The assay reporting on the refolding activity of Hsp70 seems problematic due to the high spontaneous refolding of the substrate Luciferase and should be modified.

    1. eLife assessment

      This manuscript highlights single-stranded DNA exo- and endo-nuclease activities of ExoIII as a potential caveat and an underestimated source of decreased efficiency in its use in biosensor assays. The data present convincing evidence for the ssDNA nuclease activity of ExoIII and identify residues that contribute to it. The findings are useful, but the study remains incomplete as the effect on biosensor assays was not established.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors show compelling data indicating that ExoIII has significant ssDNA nuclease activity that is posited to interfere with biosensor assays. This does not come as a surprise as other published works have indeed shown the same, but in this work, the authors provide a deeper analysis of this underestimated activity.

      Strengths:

      The authors used a variety of assays to examine the ssDNA nuclease activity of ExoIII and its origin. Fluorescence-based assays and native gel electrophoresis, combined with MS analysis, clearly indicate that both commercial and laboratory-purified ExoIII contain ssDNA nuclease activity. Mutational analysis identifies the residues responsible for this activity. Of note is the observation in this submitted work that the sites of ssDNA and dsDNA exonuclease activity overlap, suggesting that it may be difficult to identify mutations that affect one activity but not the other. In this regard, the authors' observation that the ssDNA nuclease activity depends on the sequence composition of the ssDNA is of interest, and this may be used as a strategy to suppress this activity when necessary. For example, the authors point out that a 3′ A4-protruding ssDNA could be employed in ExoIII-based assays due to its resistance to digestion. However, this remains an interesting suggestion that the authors do not test; testing it would have strengthened their conclusion.

      Weaknesses:

      The authors provide a wealth of experimental data showing that E. coli ExoIII has ssDNA nuclease activities, both exo- and endo-; however, this work falls short of showing that this activity practically interferes with ExoIII-driven biosensor assays, as suggested by the authors. Furthermore, it is not clear what new information is gained compared to that already gathered in previously published works (e.g. references 20 and 21). Also, the authors show that ssDNA nuclease activity has sequence dependence, but in the context of the observation that this activity is driven by the same site as dsDNA Exo, how does this differ from similar sequence effects observed for the dsDNA Exo? (e.g. see Linxweiler, W. and Horz, W. (1982). Nucl. Acids Res. 10, 4845-4859).

      Because of the claim that the underestimated ssDNA nuclease activity can interfere with commercially available assays, it would have been appropriate to test this. The authors only show that ssDNA activity can be identified in commercial ExoIII-based kits, but they do not assess how this affects the efficiency of a full reaction of the kit. This could have been achieved by exploiting the observed ssDNA sequence dependence of the nuclease activity. In this regard, the work cited in Ref. 20 showed that ExoIII has ssDNA nuclease activity at concentrations as low as 50-fold less than what is tested in this work. Ref. 20 also tested the effect of the ssDNA nuclease activity in Targeted Recycle Assays, rather than just testing for its presence in a kit.

      Given the implications that ssDNA exonuclease activity may have for reactions that are supposed to use only the dsDNA exonuclease activity of ExoIII, it is surprising that this submitted work makes no direct comparison of the two activities. Please provide an experimental determination of how different the specific activities for ssDNA and dsDNA are.

    3. Reviewer #2 (Public Review):

      Summary:

      This paper describes some experiments addressing 3' exonuclease and 3' trimming activity of bacterial exonuclease III. The quantitative activity is in fact very low, despite claims to the contrary. The work is of low interest with regard to biology, but possibly of use for methods development. Thus the paper seems better suited to a methods forum.

      Strengths:

      Technical approaches.

      Weaknesses:

      The purity of the recombinant proteins is critical, but no information on that is provided. The minimum would be silver-stained SDS-PAGE gels, with some samples overloaded in order to detect contaminants.

      Lines 74-76: What is the evidence that BER in E. coli generates multinucleotide repair patches in vivo? In principle, there is no need for the nick to be widened to a gap, as DNA Pol I acts efficiently from a nick. And what would control the extent of the 3' excision?

      Figure 1: The substrates all report only the first phosphodiester cleavage near the 3' end, which is quite a limitation. Do the reported values reflect only the single phosphodiester cleavage? Including the several other nucleotides likely inflates that activity value. And how much is a unit of activity in terms of actual protein concentration? Without that, it's hard to compare the observed activities to the many published studies. As best I know, Exo III was already known to remove a single-nucleotide 3'-overhang, albeit more slowly than the digestion of a duplex, but not zero! We need to be able to calculate an actual specific activity: pmol/min per µg of protein.

      Figures 2 & 3: These address the possible issue of 1-nt excision noted above. However, the question of efficiency is still not addressed in the absence of a more quantitative approach, not just "units" from the supplier's label. Moreover, it is quite common that commercial enzyme preparations contain a lot of inactive material.

      Figure 4D: This gets to the quantitative point. In this panel, we see that around 0.5 pmol/min of product is produced by 0.025 µmol = 25,000 pmol of the enzyme. That is certainly not very efficient, compared to the digestion of dsDNA or cleavage of an abasic site. It's hard to see that as significant.
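      The arithmetic behind this point can be made explicit with a small calculation. The sketch below uses only the two numbers quoted above (0.5 pmol/min of product, 25,000 pmol of enzyme); the molecular weight is an assumed round placeholder of about 31 kDa, not a value taken from the manuscript, and should be replaced with the value for the construct actually used. The function names are hypothetical.

```python
def turnover_per_min(rate_pmol_per_min, enzyme_pmol):
    """Product formed per enzyme molecule per minute (units: 1/min)."""
    return rate_pmol_per_min / enzyme_pmol


def specific_activity(rate_pmol_per_min, enzyme_pmol, mol_weight_g_per_mol):
    """Specific activity in pmol/min per ug of protein."""
    # pmol -> mol is a factor of 1e-12; g -> ug is a factor of 1e6.
    enzyme_ug = enzyme_pmol * 1e-12 * mol_weight_g_per_mol * 1e6
    return rate_pmol_per_min / enzyme_ug


if __name__ == "__main__":
    RATE = 0.5            # pmol/min, as read from Figure 4D in the review
    ENZYME = 25_000       # pmol, i.e. 0.025 umol
    ASSUMED_MW = 31_000   # g/mol, ASSUMPTION: placeholder, not from the manuscript

    print(turnover_per_min(RATE, ENZYME))               # events per enzyme per minute
    print(specific_activity(RATE, ENZYME, ASSUMED_MW))  # pmol/min per ug
```

      With these inputs the turnover is 0.5 / 25,000 = 2 × 10⁻⁵ per minute, i.e. each enzyme molecule would on average need tens of thousands of minutes per cleavage event, which is the basis for calling the activity inefficient.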

      Line 459 and elsewhere: as noted above, the activity is not "highly efficient". I would say that it is not efficient at all.

    4. Reviewer #3 (Public Review):

      Overall:

      ExoIII has been described and commercialized as a dsDNA-specific nuclease. Several lines of evidence, albeit incomplete, have indicated this may not be entirely true. Therefore, Wang et al. comprehensively characterize the endonuclease and exonuclease enzymatic activities of ExoIII on ssDNA. A strength of the manuscript is the testing of popular kits that utilize ExoIII and the proposal and testing of practical solutions (e.g. the addition of SSB proteins, ExoIII variants such as K121A, and varied assay conditions).

      Comments:

      (1) The footprint of ExoIII on DNA is expected to be quite a bit larger than 5-nt, see structure in manuscript reference #5. Therefore, the substrate design in Figure 1A seems inappropriate for studying the enzymatic activity and it seems likely that ExoIII would be interacting with the FAM and/or BHQ1 ends as well as the DNA. Could this cause quenching? Would this represent real ssDNA activity? Is this figure/data necessary for the manuscript?

      (2) Based on the descriptions in the text, it seems there is activity with some of the other nucleases in 1C, 1F, and 1I other than ExoIII and Cas12a. Can this be plotted on a scale that allows the reader to see them relative to one another?

      (3) The sequence alignment in Figure 2N and the corresponding text indicates a region of ExoIII lacking in APE1 that may be responsible for their differences in substrate specificity in regards to ssDNA. Does the mutational analysis support this hypothesis?

    1. eLife assessment

      This valuable study presents findings that suggest the need for postoperative type 2 diabetes screening and that this should be prioritized in colorectal cancer survivors with overweight/obesity regardless of the type of colorectal cancer treatment applied. The evidence supporting the claims of the authors is solid and the authors use a population-based cohort study including all Danish colorectal patients who had undergone colorectal cancer surgery between 2001-2018. The work will be of interest to medical biologists, endocrinologists and oncologists working on colorectal cancer.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, the authors set out to determine whether colorectal cancer surgery site (right, left, rectal) and chemotherapy impact the subsequent risk of developing T2DM in the Danish national health register.

      Strengths:

      - The research question is conceptually interesting
      - The Danish national health register is a comprehensive health database
      - The data analysis was thorough and appropriate
      - The findings are interesting, and a little surprising that there was no impact of chemotherapy on the development of T2DM
      - The authors have addressed my previous clarifications and questions.

      - Regarding the generalizability of this study, as the authors discuss, the prevalence of T2DM and obesity is lower in Denmark than in a number of other high-income countries. Therefore, similar studies in other populations would be of interest.
      - The study includes individuals who filled a prescription for diabetes medication, so it likely includes some individuals with transient hyperglycemia/steroid-induced diabetes during chemotherapy, rather than only those with new-onset long-term T2DM.

      Overall, the authors achieved their aims, and the conclusions are supported by their results as reported.

      The results are unlikely to significantly impact clinical practice or T2DM screening in this population, but are of interest to the community.

    1. eLife assessment

      This study presents a valuable inventory of immune signatures that are correlated with cancer treatment-related pneumonitis. The data were collected and analysed using validated methodology and can be used as a starting point for further prospective studies. The authors have provided a scRNA-Seq analysis with an HD baseline using a publicly available dataset and the evidence for their claims is convincing.

    2. Reviewer #2 (Public Review):

      Yanagihara and colleagues investigated the immune cell composition of bronchoalveolar lavage fluid (BALF) samples in a cohort of patients with malignancy undergoing chemotherapy and with lung adverse reactions including Pneumocystis jirovecii pneumonia (PCP) and immune-checkpoint inhibitor (ICI)- or cytotoxic drug-induced interstitial lung diseases (ILDs). Using mass cytometry, their aim was to characterize the cellular and molecular changes in BAL to improve our understanding of their pathogenesis and identify potential biomarkers and therapeutic targets. In this regard, the authors identify a correlation between CD16 expression in T cells and the severity of PCP and an increased infiltration of CD57+ CD8+ T cells expressing immune checkpoints and FCRL5+ B cells in ICI-ILD patients.

      The conclusions of this paper are mostly well supported by data, but some aspects of the data analysis need to be clarified and extended.

      The authors should elaborate on why different sets of markers were selected for each analysis step. E.g., Different sets of markers were used for UMAP, CITRUS and viSNE in the T cell and myeloid analysis.

    1. eLife assessment

      Pin1 as an essential prolyl cis/trans isomerase has attracted considerable attention because this enzyme family is implicated in cancer and neurodegenerative diseases. However, the requirement for its catalytic function remains a matter of dispute. The authors provide solid evidence that Pin1 modulates the activity of an important cell signaling kinase, Protein Kinase C, by a non-catalytic mechanism, acting as a chaperone to regulate the stability of this kinase.

    2. Reviewer #1 (Public Review):

      When writing a short review on the function of Pin1 some 15 years ago (Lippens et al., FEBS J 2007), we concluded the introduction by the following sentence: "..., it seems that further analysis is required to determine whether binding or catalysis is the primary mechanism through which Pin1 affects cell cycle progression." In the present manuscript, the authors provide experimental evidence for the Pin1/PKC interaction that tips the balance towards interaction and not catalysis.

      Their main data concern the interaction between the V5 domains of two PKC isoenzymes (alpha and betaII) and Pin1. This V5 domain can be further separated into a Turn Motif (TM) and a Hydrophobic Motif (HM), that both can be phosphorylated on specific positions. Phosphorylation in the TM occurs on a TPP motif, and in agreement with previous results on the same motif in Tau, Pin1 cannot isomerize efficiently the TP amide bond when the residue following the proline is another proline. Phosphorylation of the HM is not proline directed but occurs on a serine flanked by 2 aromatic residues (FSF or FSY, according to the isoenzyme). They dissect in detail the interaction of both motifs with the WW and PPIase domains and conclude that the fully phosphorylated V5 peptide binds Pin1 in a directional mode, with the TM binding to the WW domain and the HM to the PPIase domain.

      In the absence of crystals of the complex, they solve a structure by NMR, and use selectively labeled peptides (and probably a lot of NMR time) to obtain a structural model. Finally, they provide functional data by silencing/overexpressing Pin1 and inactive mutants (both at the level of its WW domain and the PPIase domain) in HEK293T cells and evaluating the PKCalpha homeostasis.

      The structural part of this work is interesting, as it is the first structure of Pin1 with a ligand that bridges both domains. They might want to underline this - all other structures in the PDB have a single domain complex, but never both domains by a single longer peptide. I would however question the static representation of this structure - the 90° kink in the peptide when complexed is probably one single snapshot, but I hardly believe the PPIase/WW domain orientation to be static. Unless the authors have additional information to stand by this static structure, this point merits being commented on in the manuscript.

      I would like to point to literature that described, for example, non-canonical binding (Yeh ES, Lew BO & Means AR (2006) The loss of PIN1 deregulates cyclin E and sensitizes mouse embryo fibroblasts to genomic instability. J Biol Chem 281, 241-251. Pin1 recognizes cyclin E via a noncanonical pThr384- Gly385 motif [33] rather than the pThr380-Pro381 motif.). They mention briefly the absence of isomerase activity in similar TPP motifs, but this information might already come in the Results section.

      The weakest part seems to be the in vivo data. Although this is not the main focus of this lab, there are some issues that could be addressed. The expression levels of Pin1 and PKCa are amazingly linear (Fig 7A), but when they overexpress WT Pin1 in a KO line, with 3-4 times higher overexpression, the PKCa levels are hardly higher than in the original WT cell line. Also, the levels in the W34A/R68A/R69A (abolishing both WW and PPIase binding functions) are surprising: why would PKCa levels rise above the level found in the Pin1 KO cells? Finally, if even slight overexpression of the C113S catalytically inactive mutant leads to more efficient PKCa degradation than overexpression of the WT Pin1 (Figure 7C), it is hard to interpret. The conclusion that Pin1-mediated regulation of PKCa requires a bivalent interaction mode of Pin1 with PKCa independent of its catalytic activity does depend on these data, so they merit further analysis.

    3. Reviewer #2 (Public Review):

      Chen, Dixit et al. report on the first structure of a bivalent interaction between Pin1 and a natural interaction partner: the C-terminal tail of PKC phosphorylated at two sites. The biggest strength of the paper is the impressive amount of NMR-based structural data that is sound and clearly reported. The authors strive to propose a novel non-catalytic mechanistic role for Pin1 that is supported by cell culture models and somewhat by the interaction assays; however, in my eyes, they fell short in proving their mechanistic hypothesis. Nevertheless, the potential ways Pin1 may modulate PKC's activity are nicely discussed.

    1. eLife assessment

      This computational study is a valuable empirical investigation into a trait common to neurons in brains and artificial neural networks: responding effectively to both objects and their mirror images. It focuses on uncovering conditions that lead to mirror symmetry in visual networks, and the evidence convincingly demonstrates that learning contributes to expanding mirror symmetry tuning, given its presence in the data. Additionally, the paper delves into the transformation of face patches in the primate visual hierarchy, shifting from view specificity to mirror symmetry to view invariance. It empirically analyzes factors behind similar effects in two network architectures. Key claims highlight the emergence of invariances in architectures with spatial pooling, driven by learning bilateral symmetry discrimination; importantly, these effects extend beyond faces, suggesting broader relevance. Despite strong experiments, some interpretations lack explicit support, and the paper overlooks pre-training emergence of mirror symmetry.

    2. Reviewer #1 (Public Review):

      By using deep convolutional neural networks (CNNs) as a model for the visual system, this study aims at understanding and explaining the emergence of mirror-symmetric viewpoint tuning in the brain.

      Major strengths of the methods and results:

      (1) The paper presents comprehensive, insightful and detailed analyses investigating how mirror-symmetric viewpoint tuning emerges in artificial neural networks, providing significant and novel insights into this complex process.<br /> (2) The authors analyze reflection equivariance and invariance in both trained and untrained CNNs' convolutional layers. This elucidates how object categorization training gives rise to mirror-symmetric invariance in the fully-connected layers.<br /> (3) By training CNNs on small datasets of numbers and a small object set excluding faces, the authors demonstrate mirror-symmetric tuning's potential to generalize to untrained categories and the necessity of view-invariant category training for its emergence.<br /> (4) A further analysis probes the contribution of local versus global features to mirror-symmetric units in the first fully-connected layer of a network. This innovative analysis convincingly shows that local features alone suffice for the emergence of mirror-symmetric tuning in networks.<br /> (5) The results make a clear prediction that mirror-symmetric tuning should also emerge for other bilaterally symmetric categories, opening avenues for future neural studies.

      Major weaknesses of the methods and results:

      (1) The authors propose a mirror-symmetric viewpoint tuning index, which, although innovative, complicates comparison with previous work and this choice is not well motivated. This index is based on correlating representational dissimilarity matrices (RDMs) with their flipped versions, a method differing from previous approaches.<br /> (2) Faces exhibit unique behavior in terms of the progression of mirror-symmetric viewpoint tuning and their training task and dataset dependency. Given that mirror-symmetric tuning has been identified in the brain for faces, it would be beneficial to discuss this observation and provide potential explanations.<br /> (3) Previous work reported critical differences between CNNs and neural representations in area AL indicating that mirror-symmetric viewpoint tuning is less present than view invariance in CNNs compared to area AL. While such findings could potentially limit the usefulness of CNNs as models for mirror-symmetric viewpoint tuning in the brain, they are not addressed in the study.<br /> (4) The study's results, while informative, are qualitative rather than quantitative, and lack direct comparison with neural data. This obscures the implications for neural mechanisms and their relevance to the broader field.
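
      To make weakness (1) concrete, the following is a minimal sketch of an index of the kind described there: a view-by-view RDM is correlated with its horizontally flipped version. The function names, the use of 1 minus Pearson correlation as the dissimilarity, and the masking of the diagonal and anti-diagonal are assumptions made for illustration; the manuscript's exact definition may differ.

```python
import numpy as np


def view_rdm(responses):
    """Build a view-by-view RDM (1 - Pearson r).

    responses: array of shape (n_views, n_units), with views ordered
    from one profile through frontal to the opposite profile.
    """
    return 1.0 - np.corrcoef(np.asarray(responses, dtype=float))


def mirror_symmetry_index(rdm):
    """Correlate an RDM with its horizontally flipped copy.

    Illustrative simplification: the diagonal and anti-diagonal are
    masked out, because the flip maps one onto the other and the
    diagonal is trivially zero.
    """
    rdm = np.asarray(rdm, dtype=float)
    n = rdm.shape[0]
    flipped = rdm[:, ::-1]  # reverse the view order along the columns
    eye = np.eye(n, dtype=bool)
    mask = ~(eye | eye[:, ::-1])
    return np.corrcoef(rdm[mask], flipped[mask])[0, 1]


# Hypothetical usage with synthetic responses (no real data involved).
rng = np.random.default_rng(0)
base = rng.normal(size=(3, 50))
symmetric = base[[0, 1, 2, 1, 0]]      # view i equals view n-1-i
unstructured = rng.normal(size=(5, 50))
idx_sym = mirror_symmetry_index(view_rdm(symmetric))
idx_rand = mirror_symmetry_index(view_rdm(unstructured))
```

      In this toy setting, responses that are identical for mirrored views give an index close to 1, while unstructured responses give a lower value, which is the contrast the index is meant to capture.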

      The study provides compelling evidence that learning to discriminate bilaterally symmetric objects (beyond faces) induces mirror-symmetric viewpoint tuning in the networks, qualitatively similar to the brain. Moreover, the results suggest that this tuning can, in principle, generalize beyond previously trained object categories. Overall, the study provides important conclusions regarding the emergence of mirror-symmetric viewpoint tuning in networks, and potentially the brain. However, the conducted analyses and results do not entirely address the question why mirror-symmetric viewpoint tuning emerges in networks or the brain. Specifically, the results leave open whether mirror-symmetric viewpoint tuning is indeed necessary to achieve view invariance for bilaterally symmetric objects.

      Taken together, this study moves us a step closer to uncovering the origins of mirror-symmetric tuning in networks, and has implications for more comprehensive investigations into this neural phenomenon in the brain. The methods of probing CNNs are innovative and could be applied to other questions in the field. This work will be of broad interest to cognitive neuroscientists, psychologists, and computer scientists.

    3. Reviewer #2 (Public Review):

      Strengths

      (1) The statements made in the paper are precise, separating observations from inferences, with claims that are well supported by empirical evidence. Releasing the underlying code repository further bolsters the credibility and reproducibility. I especially appreciate the detailed discussion of limitations and future work.

      (2) The main claims with respect to the two convolutional architectures are well supported by thorough analyses. The analyses are well-chosen and overall include good controls, such as changes in the training diet. Going beyond "passive" empirical tests, the paper makes use of the fully accessible nature of computational models and includes more "causal" insertion and deletion tests that support the necessity and sufficiency of local object features.

      (3) Based on modeling results, the paper makes a testable prediction: that mirror-symmetric viewpoint tuning is not specific to faces and can also be observed in other bilaterally symmetric objects such as cars and chairs. To test this experimentally in primates (and potentially other model architectures), the stimulus set is available online.

      Weaknesses

      My main concern with this paper is in its choice of the two model architectures AlexNet and VGG. In an earlier study, Yildirim et al. (2020) found an inverse graphics network "EIG" to better correspond to neural and behavioral data for face processing than VGG. All claims in the paper thus relate to a weaker model of the biological effects since this work does not analyze the EIG model. Since EIG follows an analysis-by-synthesis approach rather than standard classification training, it is unclear whether the claims in this paper generalize to this other model architecture. It is also unclear if the claims will hold for: 1) transformer architectures, 2) the HMAX architecture by Leibo et al. (2017) which has also been proposed as a computational explanation for mirror-symmetric tuning, and, as the authors note in the Discussion, 3) deeper architectures such as ResNet-50 which tend to better align to neural and behavioral data in general. These architectures include different computational motifs such as skip connections and a much smaller proportion of fully-connected layers which are a major focus of this work.

      Overall, I thus view the paper's claims as limited to AlexNet- and VGG-like architectures, both of which fall behind state-of-the-art in their alignment to primates in general and also specifically for mirror-symmetric viewpoint tuning.

      Minor weaknesses

      (1) Figure 1A: since the relevance to primate brains is a major motivator of this work, the results from actual neural recordings should be shown and not just schematics. For instance, the mirror symmetry in AL is not as clean as the illustration (compare with Fig. 3 in Yildirim et al. 2020), and in the paper's current form, this is not easily accessible to the reader.

      (2) Figure 4 / L832-845: The claims for the effect of training on mirror-symmetric viewpoint tuning are with respect to the training data only, but there are other differences between the models such as the number of epochs (250 for CIFAR-10 training, 200 for all other datasets), the learning rate (2.5 * 10^-4 for CIFAR-10, 10^-4 for all others), the batch size (128 vs 64), etc. I do not expect these choices to make a major difference for your claims, but it would be much cleaner to keep everything but the training dataset consistent. Especially the different test accuracies worry me a bit (from 81% to 92%, and they appear different from the accuracy numbers in figure S4 e.g. for CIFAR-10 and asymSVHN), at the very least those should be comparable.

      (3) L681-685: The general statement made in the paper that "deeper models lose their advantage as models of cortical representations" is not supported by the cited limited comparison on a single dataset. There are many potential confounds here with respect to prior work, e.g. the recording modality (fMRI vs electrodes), the stimulus set (62 images vs thousands), the models that were tested (9 vs hundreds), etc.

    4. Reviewer #3 (Public Review):

      This study aimed to explore the computational mechanisms of view invariance, driven by the observation that in some regions of monkey visual cortex, neurons show comparable responses to (1) a given face and (2) to the same face but horizontally flipped. Here they study this known phenomenon using AlexNet and other shallow neural networks, using an index for mirror symmetric viewpoint tuning based on representational similarity analyses. They find that this tuning is enhanced at fully connected- or global pooling layers (layers which combine spatial information), and that the invariance is prominent for horizontal- but not vertical- or rotational transformations. The study shows that mirror tuning can be learned when a given set of images are flipped horizontally and given the same label, but *not* if they are flipped and given different labels. They also show that networks learn this tuning by focusing on local features, not global configurations.
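
      The labeling manipulation summarized above (flipped images receiving either the same label or a different label) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the array layout `(N, H, W[, C])`, and the choice to offset flipped labels by the number of classes are hypothetical and are not taken from the manuscript.

```python
import numpy as np


def augment_with_flips(images, labels, n_classes, same_label=True):
    """Append horizontally flipped copies of every image.

    images: array of shape (N, H, W) or (N, H, W, C).
    labels: integer array of shape (N,).
    same_label=True  -> flipped copies keep their label (reflection-invariant task).
    same_label=False -> flipped copies get a new label, offset by n_classes
                        (hypothetical scheme forcing left/right discrimination).
    """
    images = np.asarray(images)
    labels = np.asarray(labels)
    flipped = images[:, :, ::-1]  # reverse the width axis
    flipped_labels = labels if same_label else labels + n_classes
    return (np.concatenate([images, flipped]),
            np.concatenate([labels, flipped_labels]))


# Hypothetical usage on a tiny synthetic batch.
imgs = np.arange(12).reshape(2, 2, 3)
x_sym, y_sym = augment_with_flips(imgs, [0, 1], n_classes=10, same_label=True)
x_asym, y_asym = augment_with_flips(imgs, [0, 1], n_classes=10, same_label=False)
```

      Under the second scheme, a category that looks nearly identical after flipping would be assigned two different labels for almost the same image, which is one way such a task could become harder to fit.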

      I found the study to be a mixed read. Some analyses were fascinating: for example, it was satisfying to see the use of well-controlled datasets to increase or decrease the rate of mirror-symmetry tuning. The insertion and deletion experiments were elegant tests to probe the mechanisms of mirror symmetry, asking if symmetry could arise from (1) global feature configurations (in a holistic sense) vs. (2) local features, with stronger evidence for the latter. These two sets of results were successful and interpretable. They stand in contrast with the first analysis, which relies on observations that do not seem justified. Specifically, Figure 2D shows mirror-symmetry tuning across 11 stages of image processing, from pixel space to fully connected layers. It shows that images from different object categories evoke considerably different tuning index values. The explanation for this result is that some categories, such as "tools," have "bilaterally symmetric structure," but this is not explicitly measured anywhere. "Boats" are described as having "front-back symmetry," more so than flowers. One imagines flowers being extremely symmetric, but perhaps that depends on the metric. What is the metric? At first I thought it was the mirror-symmetric viewpoint tuning index in the image (pixel) space, but this cannot be, as the index for faces and flowers is negative, cars have no symmetry, and boats are positive. To support these descriptions, one must have an independent variable (for object class symmetry) that can be related to the dependent variable (the mirror-symmetric viewpoint tuning index). If it exists, it is not a part of the Results section. This omission undermines other parts of the Results section: "some car models have an approximate front-back symmetry...however, a flower typically does not..." "Some," "typically:" how many in the dataset exactly, and how often? 
The description of CIFAR-10 as having bilaterally symmetric categories - are all these categories equally symmetric? If not, would such variability matter in terms of these results? These assessments of object category symmetry values are made before experiments are presented, so they are not interpretations of the results, and it would be circular to write it otherwise.

      Overall, my bigger concern is that the framing is misleading or at best incomplete. The manuscript successfully showed that if one introduces left-right symmetry to a dataset, the network will develop population-level representations that are also bilaterally symmetric. But the study does not explain that the model's architecture and random weight distribution are sufficient for symmetry tuning to emerge, without training, just to a much more limited degree. Baek et al. showed in 2021 that viewpoint-invariant face-selective units and mirror-symmetric units emerge in untrained networks ("Face detection in untrained deep neural networks"; this current manuscript cites this paper but does not mention that mirror symmetry is a feature of the 2021 study). This current study also used untrained networks as controls (Fig. 3), and while they were useful in showing that learning boosts symmetry tuning, the results also clearly show that horizontal-reflection invariance is far from zero. So, the simple learning-driven explanation for the mirror-symmetric viewpoint tuning for faces is wrong: while (1) network training and (2) pooling are mechanisms that charge the development of mirror-symmetric tuning, the lottery ticket hypothesis is enough for its emergence. Faces and numbers are simple patterns, so the overparameterization of networks is enough to randomly create units that are tuned to these shapes and to wire many of them together. How learning shapes this process is an interesting direction, especially now that this current study has outlined its importance.

      Finally, it would help to cite other previous demonstrations of equivariance and mirror symmetry in neural networks. Chris Olah, Nick Cammarata, Chelsea Voss, Ludwig Schubert, and Gabriel Goh of OpenAI wrote of this phenomenon in 2020 (Distill journal).

      Some other observations that might help:

      - I am enthusiastic about the experiments using different datasets to increase or decrease the rate of mirror-symmetry tuning (sets including CIFAR10, SVHN, symSVHN, asymSVHN); it is worth noting, however, that the lack of a ground truth metric for category symmetry is a problem here too. In the asymSVHN dataset, images are flipped and given different labels. If some categories are naturally symmetric after horizontal flips, such as images containing "0" or "8", then changing the label is likely to disturb training. This would explain why the training loss is larger for this condition (Figure S4D).

      - It is puzzling why greyscale 3D rendered images are used. By using greyscale 3D renders (at least as shown in the figures), the study proceeds as if the units are invariant under color transformations. Unfortunately, this is not true, and using greyscale images impacts the activations of different layers of AlexNet in a way that is not fully defined. Moreover, many units in shallow networks focus on color, and exactly these units could be invariant to other transformations like mirror symmetry, but grey scaling the images makes them inactive.

    1. eLife assessment

      This manuscript presents a detailed phenotyping of the role of dietary iron in a large number of genetically distinct mouse strains. There are exciting and convincing data that could be valuable in their impact on the fields of nutrition, iron metabolism and anemia.

    2. Reviewer #1 (Public Review):

      In this manuscript, the authors perform a very thorough, extensive characterization of the impact of an iron-rich diet on multiple phenotypes in a wide range of inbred mouse strains. While a work of this type does not offer mechanistic insights, the value of the study lies not only in its immediate results but also in what it can offer to future researchers as they explore the genetic basis of iron levels and other related phenotypes in rodent studies. The creation of a web resource and the offer from the authors to share all available samples is particularly laudable, and helps to increase the accessibility of the work to other scientists. There is one shortcoming to the work however. To induce iron overload in mice in the main study in this work, mice were placed on an iron-rich diet that differed in its composition from the baseline diet in more than just iron. This could influence some of the phenotypes observed in this study.

    3. Reviewer #2 (Public Review):

      Here, the authors tried to identify the genes and biological pathways underlying iron overload and its associated pathologies in mice. Several wet lab experiments and measurements alongside many bioinformatic analyses like GWAS, RNA-seq data analysis (DEG), eQTL analysis, TWAS, and gene-set enrichment analysis have been performed. The study design is good enough and the authors tried to validate the results. The data have been submitted (Accession #: GSE230674) but are not public yet.

      (1) The main issue of this manuscript is its length. It's too long, especially the Results section. It's hard for readers to follow the paper. Moreover, you added results about other minerals, mostly copper, which seems too much (considering the fact that this study is about iron). The text doesn't have the required integrity and focus. You should decide where you want to put the focus of this manuscript, and I strongly recommend shortening the manuscript; try to be as short and sweet as you can.<br /> (2) Also, the "Methods" section is long; some parts are over-detailed (mostly wet lab procedures) and some parts are not detailed enough. It seems the "Statistical analyses" part doesn't have extra information. I recommend removing the first paragraph and moving some of the information from the second paragraph to the right place in the Methods section.<br /> (3) Some parts of your Discussion section retell the results. Please discuss your results and compare them with previous findings.<br /> (4) Add detail about your GWAS model. As you had repeated samples from each strain, it's good to mention how you considered this. Also, show how you determined the significance threshold.<br /> (5) The abstract could be better. It also doesn't have a conclusion.<br /> (6) Page 8, lines 4-7: Please remove these lines or move them to the Methods section. The last paragraph of the introduction should clearly explain the goal of the study.<br /> (7) Page 68, line 13: Explain the abbreviation (RINe) before use. Also, most probably it is RIN (RNA Integrity Number).<br /> (8) The heritability estimates seem high and the 1% difference between broad- and narrow-sense heritability means there is almost no dominance and epistatic genetic variance between alleles affecting the studied trait (which is hard to accept). 
I recommend considering a within-group (strain) variance (common environmental effect) component in the model to absorb this source of variation in this component, so the genetic variance and consequently the heritability estimates would be more accurate. You also can consider this source of variance in your GWAS model.
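
      As a rough illustration of the point about within-strain variance in (8), a textbook one-way ANOVA decomposition separates between-strain from within-strain variance before forming a heritability-like ratio. The sketch below assumes a balanced design (equal replicates per strain) and a hypothetical function name; it is not the authors' model, and real analyses would typically use a mixed model that also handles covariates and unbalanced data.

```python
import numpy as np


def anova_heritability(data):
    """Ratio of between-strain variance to total variance.

    data: array of shape (n_strains, n_replicates), balanced design assumed.
    Returns var_between / (var_between + var_within), with the
    between-strain component truncated at zero.
    """
    data = np.asarray(data, dtype=float)
    n_reps = data.shape[1]
    # Mean square between strains and mean square within strains.
    msb = n_reps * data.mean(axis=1).var(ddof=1)
    msw = data.var(axis=1, ddof=1).mean()
    var_between = max((msb - msw) / n_reps, 0.0)
    total = var_between + msw
    return float("nan") if total == 0 else var_between / total


# Hypothetical toy inputs: strains differ with no replicate noise,
# versus replicates vary while strain means are identical.
h2_all_genetic = anova_heritability([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
h2_all_noise = anova_heritability([[1, 2, 3], [1, 2, 3]])
```

      If the within-strain term is left out of the model, replicate-level noise is absorbed elsewhere and the ratio can be inflated, which is the concern raised in the comment.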

    1. eLife assessment

      This valuable study provides new insight into potential subtle dynamics in effector biology. The data presented generally support the claims, but in some cases controls are missing and so the overall work is currently incomplete. If the limitations can be addressed, this work should be of broad relevance for biologists interested in molecular plant-microbe interactions.

    2. Reviewer #1 (Public Review):

      The authors have identified the predicted EBE of PthA4 in the promoter of Cs9g12620, which is induced by Xcc. The authors identified a homolog of Cs9g12620, which has variations in the promoter region. The authors show that PthA4 suppresses Cs9g12620 promoter activity independent of the binding action. The authors also show that CsLOB1 binds to the promoter of Cs9g12620. Interestingly, the authors show that PthA4 interacts with CsLOB1 at the protein level. Finally, they show that Cs9g12620 is important for canker symptoms. Overall, this study has reported some interesting discoveries and the writing is generally well done. However, the discoveries are affected by the reliability of the data and some flaws in the experimental designs.

      Here are some major issues:<br /> The authors have demonstrated that Cs9g12620 contains the EBE of PthA4 in the promoter region. To show that PthA4 controls Cs9g12620, the authors need to compare Cs9g12620 expression between wild-type Xcc and the pthA4 mutant. The data in Figure 1 are not enough.

      The authors confirmed the interaction between PthA4 and the EBE in the promoter of Cs9g12620 using DNA electrophoretic mobility shift assay (EMSA). However, Figure 2B is not convincing. The lane without GST-PthA4 also clearly showed a mobility shift. For the EMSA assay, the authors need also to include a non-labeled probe as a competitor to verify the specificity. The description of the EMSA in this paper suggests that it was not done properly. It is suggested the authors redo this EMSA assay following the protocol: Electrophoretic mobility shift assay (EMSA) for detecting protein-nucleic acid interactions PMID: 17703195.

      The authors also claimed that PthA4 suppresses the promoter activity of Cs9g12620. The data are not convincing and also contradict their own data that overexpression of Cs9g12620 causes canker and silencing of it reduces canker, considering PthA4 is required for canker development. The authors conducted the assays using transient expression of PthA4. It should be done with Xcc wild type, pthA4 mutant, and negative control to inoculate citrus plants to check the expression of Cs9g12620.

      Figure 6 AB is not convincing. There are no apparent differences. The variations shown in B are common in different wild-type samples. It is suggested that the authors conduct transgenic instead of transient overexpression.

      Gene silencing data needs more appropriate controls. Figure D seems to suggest canker symptoms actually happen for the RNAi treated. The authors need to make sure the same amount of Xcc was used for both CTV empty vector and the RNAi. It is suggested a blink test is needed here.

    3. Reviewer #2 (Public Review):

      The following submission titled "Xanthomonas citri subsp. citri type III effector PthA4 directs the dynamical expression of a putative citrus carbohydrate-binding gene for canker formation" by Chen et al. provides evidence that PthA4 binds to PCs9g12620 to downregulate expression potentially for citrus canker disease development. They tackle a relevant, complicated problem about the timing and regulation of an S gene expression and its relationship to disease development. Most often research stops at an S gene that is upregulated. This study aims to define the complexity of TAL effector family proteins beyond their standard activation role. Cs9g12620 encodes a putative carbohydrate-binding protein, and downregulation of this occurs via PthA4-CsLOB1 direct interaction. Silencing of Cs9g12620 leads to reduced virulence of X. citri, highlighting its importance as an S gene target from PthA4-mediated CsLOB1 induction. The authors also hypothesize that PthA4 represses the expression of Cs9g12620, and it seems to depend not on DNA binding by PthA4 but rather CsLOB1 interaction. This provides an interesting mode of action for a TAL effector, which typically is described as a transcription factor. An overall curiosity is that TAL effectors like PthA4 induce gene expression for virulence activity, but the authors do not probe this question with artificial TAL effectors or PthA4 variants to define the domains required for this activity. These tools, which are widely used in TAL effector research, could help determine what domain is responsible for this repression and if it is unique to PthA4 or a general TAL phenomenon. Work is further needed to also demonstrate the repressive role of PthA4 over time because it is not explicitly clear that the time-related suppression is directly attributed to the PthA4-CsLOB1 interactions.

      (1) The authors show that both WT but not WT expressing AvrXa7 induce Cs9g12620 and CsLOB1. They performed an adjacent supportive experiment comparing a Tn5-disrupted pthA4 to WT and saw a similar induction. Do the authors have a Southern blot or genome sequence to show this is the true mutation? Have the authors complemented the Tn5 strain with pthA4 and an artificial TAL effector?

      (2) Figure 2 and "The expression of Cs9g12620 depends on pthA4 during Xcc infection" section: Overall I cannot determine the biological importance as written in the text about examining an ortholog of Cs9g12620 that is not expressed. The title of Figure 2 is: "Cs9g12620 and Cs9g12650 show different profiles of expression owing to the genetic variation in promoter." What is the biological importance of showing that there is promoter variation when the RNA-seq pointed to this target? This is unclear. Now, an interesting experiment would be to create an artificial TAL that activates the expression of Cs9g12650, which was, yes, not expressed in Nicotiana, but this wasn't examined in citrus and could be with an artificial TAL effector. Moreover, if this is about how something is not expressed, this seems out of place in the story before we arrive at the repression aspect of the narrative. Is the lack of expression a typical state of this gene family and do TAL effectors induce this for virulence? Is it also possible that RT alone isn't sensitive enough to detect relevant Cs9g12650 expression? Could the authors rather build on their RNAseq data or maybe use qPCR, a more sensitive approach, to see if this gene is expressed. Overall, this seems like a non-issue still because it isn't clear why this is important to support their narrative. Finally "2 μg of total RNA extracted" seems to be an extremely high input for RT. In summary here, it would be nice to see the hypothesis they tested and how it supports their overall aim because this is unclear.

      (3) Figure 3C: The authors should include a 35S::GUS + 35S::pthA4 control. This control is missing to show that the suppression is not due to overexpressing the two proteins simultaneously.

      (4) Figure 3E&G are just the same but rotated. Please include a separate replicate as this would be more beneficial to examine. With this and concerns on some of the reporting, the raw data and images should be included as supplemental for each replicate and detailed as if they are a regular figure.

      (5) Figure 3G: What is low and high? There are quantifiable values (e.g. RLU) here that correspond to the intensity of the figure legend. There should be a water/buffer infiltrated control.

      (6) Figure 3F: The Y1H data demonstrate that PCs9g12620 is bound by PthA4. The second panel for the gel mobility shift is however lacking a complementary treatment with PCs9g12620 WT. These gel mobility shift assays are always relative to something, and there is no comparison here unfortunately to other treatments. An example to follow as a model for formatting and experimental design could include as seen in Figure 5 by Duan et al. MPP (DOI:10.1111/mpp.12667). These should be performed as a single experiment not separated by panel D. A GST-Tag only should always be an additional control.

      (7) Figure 4: CsLOB1 activates Cs9g12620. Figure 4C: A reasonable control would be to include 35S::GUS and 35S::PthA4.

      (8) Figure 5F: The purpose of this experiment to show the multiplication over time and increase is not clear. It would be expected to see an increase in growth over time during susceptibility; so why was this documented?

      (9) Figure 5: Cs9g12620 expression decreases along with expansion and pectin esterase expression. How do we know that this is not a general downregulation of gene expression more broadly due to cell death or tissue deformation at 10 dpi? To test if this is also PthA4-specific, a specific pthA4 mutant should be tested rather than the TAL effectorless strain, which is already a pretty weak pathogen and does not trigger expression of any tested genes to wild-type levels; this would show whether this is a general trend or specific to PthA4 activity. Finally, why are the color bars switched for time points 5 & 10 dpi for the effectorless strain? This is the finding that led them to suggest the repression. According to the rest of the figure, the gray and black are typically 5 and 10 dpi, respectively, but they seem to be switched to fit the narrative.

      (10) Figure 6 nicely documents the interaction between PthA4 and CsLOB1, but why did the authors not take the additional step to define what domains are required for PthA4 interaction? This is an important curiosity of what mediates this interaction. Was it the repeats or C- or N-terminus? Is this general to TAL effectors or precise to PthA4? This seems like the crux of the story especially since there is a TAL effector binding cited in the promoter.

      (11) Figure 7: RNAi-mediated silencing of Cs9g12620 demonstrates that this gene is a susceptibility target for X. citri as seen by colonization (E). First, the symptoms are not quite clear in A, and the morphological changes are unclear. Are there additional images for these to showcase the difference reproducibly? They hypothesize that there is complexity in Cs9g12620 expression during infection as proposed in Figure 8. It seems pretty important to perturb this in the opposite direction with artificial TAL effectors that either target a) Cs9g12620 for induction and b) CsLOB1 in a 049E background. One would hypothesize that this would not allow for the CsLOB1 interaction because they demonstrate this is PthA4-specific and therefore Cs9g12620 expression would not decrease while CsLOB1 is induced.

      (12) Figure 8: It is unclear if this is an appropriate model. The impact of CsLOB1-PthA4 interaction is depicted as a late phenomenon based on Cs9g12620 expression. However, it is not clear from their data that the CsLOB1-PthA4 interaction does not happen at the early stages of infection. This is not defined by their experiments proposed. As mentioned above, an overall concern is that the authors do not test variants of PthA4 or domains that could examine specifically what permits this suppression. Is this a general TAL effector structure-mediated phenomenon or is it something unique about PthA4 in this family? Does it require both DNA binding and interaction with CsLOB1?

    1. eLife assessment

      This important study reveals the molecular basis of mutualism between a vector insect and a bacterium responsible for the most devastating disease in citrus agriculture worldwide. The evidence supporting the conclusions is compelling, with biochemical and gene expression analysis demonstrating the phenomenon. However, there are concerns related to the presentation, as well as lack of sufficient information about data analysis, both of which should be clarified and/or extended. With these matters addressed, this work will be of great interest to the fields of vector-borne disease control and host-pathogen interaction.

    2. Reviewer #1 (Public Review):

      Summary:

      The manuscript by Jiayun Li and colleagues aims to provide insight into adipokinetic hormone signaling that mediates the fecundity of Diaphorina citri infected by 'Candidatus Liberibacter asiaticus'. CLas-positive D. citri are more fecund than their CLas-negative counterparts and require extra energy expenditure. Using FISH, qRT-PCR, WB, RNAi, and miRNA-related methods, authors found that knockdown of DcAKH and DcAKHR not only resulted in triacylglycerol accumulation and a decline of glycogen but also significantly decreased fecundity and CLas titer in ovaries. miR-34 suppresses DcAKHR expression by binding to its 3' untranslated region, whilst overexpression of miR-34 resulted in a decline of DcAKHR expression and CLas titer in ovaries and caused defects that mimicked DcAKHR knockdown phenotypes. Most of the methods and results are solid and valuable, but I have a number of concerns with this paper, relating to the writing and lack of sufficient information about data analysis.

    3. Reviewer #2 (Public Review):

      Diaphorina citri is the primary vector of Candidatus Liberibacter asiaticus (CLas), but the mechanism of how D. citri maintains a balance between lipid metabolism and increased fecundity after infection with CLas remains unknown. In their study, Li et al. presented convincing methodology and data to demonstrate that CLas exploits AKH/AKHR-miR-34-JH signaling to enhance D. citri lipid metabolism and fecundity, while simultaneously promoting CLas replication. These findings are both novel and valuable: they not only have theoretical implications for expanding our understanding of the interaction between insect vectors and pathogenic microorganisms but also provide new practical targets for controlling D. citri and HLB. The conclusions of this paper are mostly well supported by data, but some aspects of phrasing and data analysis need to be further clarified and extended.

      Key Considerations:

      There are specific instances where additional information would enhance comprehension of the results and their interpretation.

      There seem to be two inconsistencies related to some results depicted in Figures 1, 2, 3 and 5.

      Firstly, Figure 1 shows the effect on CLas infection (CLas+) compared to the control (CLas-), where results show an increase of TAG, Glycogen, lipid droplet size, oviposition period, and fecundity. In Figures 2, 3, and 5, the authors establish the involvement of the genes DcAKH, DcAKHR, and miR34 in this process, by showing that by preventing the function of these three factors the effects of CLas+ are lost. However, while Figure 1 shows the increase of TAG and lipid droplet size in CLas+, Figures 2, 3, and 5 do not show a significant elevation in TAG when comparing CLas- and CLas+.

      Secondly, in addition to the absence of statistical difference in TAG and lipid droplet size observed in Figure 1, Figures 2, 3, and 5 show an increase in TAG and lipid droplet size after dsDcAKH (Figure 2), dsDcAKHR (Figure 3) and agomiR34 (Figure 5) treatments. Considering that AKH, AKHR, and miR34 are important factors for the CLas-induced increase in TAG and lipid droplet size, one might expect a reduction in TAG and lipid droplet size when CLas+ insects are silenced for these factors, contrary to the observed results.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This important study nicely integrates a breadth of experimental and computational data to address fundamental aspects of RNA methylation by RNA methyltransferases (MTases), enzymes that are important for biology and health.

      Strengths:

      The authors offer compelling and strong evidence, based on carefully performed work with appropriate and well-established techniques to shed light on aspects of the methyl transfer mechanism of the methyltransferase-like protein 3 (METTL3), which is part of the methyltransferase-like proteins 3 & 14 (METTL3-14) complex.

      Weaknesses:

      The significance of this foundational work is somewhat diminished, mostly due to less efficient communication of certain aspects of this work. Parts of the manuscript are somewhat uneven and don't quite mesh well with one another. The manuscript could be enhanced by careful revision and significant textual and figure edits. Examples of recommended edits that would improve clarity and allow accessibility to a broader audience are highlighted in some detail below.

      We thank the reviewer for the positive evaluation of our work. We have followed the suggestions and modified the text and figures as detailed further in our answers to the specific recommendations.

      Reviewer #2 (Public Review):

      Summary:

      Caflisch and coworkers investigate the methyltransferase activity of the complex of methyltransferase-like proteins 3 and 14 (METTL3-14). To obtain a high-resolution description of the complete catalytic cycle they have carefully designed a combination of experiments and simulations. Starting from the identification of bisubstrate analogues (BAs) as binders to stabilise a putative transition state of the reaction, they have determined multiple crystal structures and validated relevant interactions by mutagenesis and enzymatic assays.

      Using the resolved structure and classical MD simulations they obtained a kinetic picture of the binding and release of the substrates. Of note, they accumulated very good statistics on these processes using 16 simulation replicates over a time scale of 500 ns. To compare the time scale of the release of the products with that of the catalytic step they performed state-of-the-art QM/MM free energy calculations (testing multiple levels of theory) and obtained a free energy barrier that indicates how the release of the product is slower than the catalytic step.

      Strengths:

      All the work proceeds through clear hypothesis testing based on a combination of literature and new results. Eventually, this allows them to present in Figure 10 a detailed step-by-step description of the catalytic cycle. The work is very well crafted and executed.

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      To fulfill its potential of guiding similar studies for other systems as well as to allow researchers to dig into their vast work, the authors should share the results of their simulations (trajectories, key structures, input files, protocols, and analysis) using repositories like Zenodo, the plumed-nest, figshare or alike.

      The reviewer is right. We have uploaded the simulation materials to Zenodo: the MD simulation data (trajectories, pdb files, parameter files), and the PLUMED file that was used for the DFTB3/MM metadynamics simulations. We provide the link in the “Data availability” section.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Coberski et al. describes a combined experimental and computational study aimed to shed light on the catalytic mechanism in a methyltransferase that transfers a methyl group from S-adenosylmethionine (SAM) to a substrate adenosine to form N6-methyladenosine (m6A).

      Strengths:

      The authors determine crystal structures in complex with so-called bi-substrate analogs that can bridge across the SAM and adenosine binding sites and mimic a transition state or intermediate of the methyltransfer reaction. The crystal structures suggest dynamical motions of the substrate(s) that are examined further using classical MD simulations. The authors then use QM/MM calculations to study the methyl-transfer process. Together with biochemical assays of ligand/substrate binding and enzyme turnover, the authors use this information to suggest what the key steps are in the catalytic cycle. The manuscript is in most places easy to read.

      We thank the reviewer for the positive evaluation of our work.

      Weaknesses:

      My main suggestion for the authors is that they show better how their conclusions are supported by the data. This includes how the electron density maps for example support the key interactions and water molecules in the active site and a better error analysis of the computational analyses.

      We thank the reviewer for the comments and suggestions. We have followed the suggestions and added error analysis of the computational results as well as additional figures (in the supplementary information) that illustrate key interactions and water molecules in the active site supported by the electron density.

      Reviewer #1 (Recommendations For The Authors):

      • The phrasing of the second sentence in the introduction is difficult to read. I am not sure it is necessary to define the DRACH motif if you are also giving the exact consensus sequence unless providing more context for other instances of the DRACH motif. Referring to this motif instead as "consensus sequence GGACU" may be more effective.

      The reviewer is right. We corrected the sentence accordingly.

      • In the second paragraph of the introduction, a further short description of how METTL3-14 is "involved" in diseases would be appreciated.

      We thank the reviewer for the comment. We made that clearer by including “by promoting the translation of genes involved in cell growth, differentiation, and apoptosis” together with a reference.

      • Is there any evidence that inhibiting METTL3-14 doesn't negatively impact healthy cells?

      We thank the reviewer for the question. Yes, there is such evidence and we added to the sentence “but not in normal non-leukaemic haemopoietic cells” together with a reference to make this point clearer.

      • Bringing up the MACOM complex in the third paragraph of the introduction is perhaps not necessary unless further discussing the MACOM complex later.

      The reviewer is right. We removed the mention of the MACOM complex.

      • Figure 1B: Color coding is difficult to distinguish on a screen and print out. More contrasting colors would be helpful.

      We thank the reviewer for the suggestion. We removed the transparency from the protein cartoon representation that was the reason for the low contrast.

      • The level of detail in the "MD simulations for mechanistic studies of RNA MTases" is not advised. Would strongly encourage condensing this section to improve clarity and accessibility to a larger audience.

      The reviewer is right. We removed non-essential parts of this paragraph.

      • Confirming the role of the hydroxyl in Y406 would be better supported by a Y406 -> F406 mutant because the A406 mutant could bind differently due to a loss of pi-stacking interactions.

      The purpose of the Y406A mutant was to eliminate the interaction of the aromatic sidechain with adenosine as seen from the structure with BA4. Since there is no involvement of the Y406-OH group with adenosine, mutating to F did not seem sufficient. Furthermore, by mutating Y406 to alanine, we also eliminate the possibility for a water-mediated hydrogen bond to the W398 backbone. Hence, with the alanine mutant we achieve the strongest possible effect on the enzymatic activity while the integrity of the active site is maintained as seen from the thermal shift assay.

      • For Figure 4D, can the authors justify why SAH was used as a metric for SAM binding instead of using SAM directly? Additionally, referring to the RNA as "ligand" instead of "RNA" in the Figure caption is more confusing than simply calling it RNA.

      We thank the reviewer for the comment. With the TSA, we wanted to show that with the adenosine binding mutants, the integrity of the METTL3 active site is still intact. It was shown that SAH is bound with higher affinity than SAM by METTL3 (DOI: 10.1016/j.celrep.2019.02.100). Since the magnitude of the thermal shift depends also on the affinity, we chose the higher-affinity binder SAH. There is no RNA per se shown in this figure. “Ligands” in the figure caption (A) refers to the three bound molecules that are shown and mentioned in the previous sentence: SAM, BA2, and BA4. “Ligand” in the figure caption (D) refers to “SAH” that was used in the experiment described and mentioned just after, but is now removed.

      • Figure 5D is very difficult to interpret. Removing the ribbons representing Y406 movement may make it easier to see. Color coding the Supplementary Movie 1 to match would be also helpful.

      The reviewer is right. We have changed the figure to make the different conformations of METTL3 and its Y406 sidechain clearer. However, we left the coloring of the different conformations as the colors are connected to different time points of the simulation. Following the suggestion of the reviewer we changed the coloring of SAM and AMP to match that of the supplementary movie.

      • Figure 10 is overwhelming as is. Removing the grey area around the binding sites and toning down the color of the substrate binding sites would help with visibility. The size of the chemical structures and illustrations is currently too small to easily be made out. A full page-sized figure may be beneficial for this figure.

      We agree with the reviewer and have changed the figure to make each reaction step clearer and better recognizable.

      Minor edits

      • Change "Despite the growing knowledge on the diverse pathways" to "Despite growing knowledge of the diverse pathways involving METTL3-14".

      We corrected the sentence.

      • Perhaps use "redundant active site" instead of "degenerate active site".

      We changed the word as suggested.

      • Consider moving "The METTL3 MTase domain has the catalytically active SAM binding site and adopts a Rossmann fold that is characteristic of Class I SAM-dependent MTases" to before "METTL14 also has an MTase domain, however, with a degenerate active site of hitherto unknown function, and so-called RGG repeats at its C-terminus essential for RNA binding" to keep information about METTL3 together.

      We shifted the part of the text as suggested.

      • "Molecular dynamics studies have mainly focused on protein and bacterial MTases"? Does this mean bacterial MTases that methylate proteins?

      We thank the reviewer for the comment. This means bacterial MTases in general. The example that we mention is of a bacterial MTase that methylates a chemical precursor. We changed the sentence slightly to make that clearer.

      • In "Bisubstrate analogues bind in the METTL3 active site", please consider the following:

      • Change "and to investigate" to "and investigated".

      • Briefly describe the enzymatic assay in the main text.

      • Either more clearly defining "least potent" or change to "have the highest IC50 values".

      We made all the suggested changes to improve the description of the assay and its outcomes.

      • In Figure 3, remove some of the amino acid labels from panels A, C, and E for clarity, especially since panels B, D, and F more clearly demonstrate the interactions.

      We removed amino acids that were not involved in polar contacts and adapted the figure caption accordingly.

      • In panels 3D, 3F, and 4B, the lightning bolts are too small to make out as lightning bolts. An asterisk or other symbol may be easier to distinguish.

      We made the lightning bolts more than double the size to make them easier to recognize.

      • In Figure 4C, no units are provided on the y-axis. Additionally, I do not believe the arrows indicating "Loss of activity" are necessary.

      These are arbitrary units as it is a ratio which is explained in the materials and methods section. We removed the arrows following the suggestion of the reviewer.

      • While demonstrating mutants with no activity still retain SAM binding is suggestive of the mutant impacting RNA binding, this would still be better supported with RNA binding studies. Electrophoretic mobility shift assays would be sufficient if Tm studies are time-consuming. While these experiments could be informative, we also acknowledge that they may be outside the scope of this current report.

      We thank the reviewer for suggesting these experiments and acknowledging that they would be outside of the scope of the current study. Such RNA binding experiments can turn out to be very time consuming, both in TSA and EMSA. The main reason is this: the RNA substrate must be chosen such that it binds sufficiently strongly to the WT to cause an effect (thermal shift or electrophoretic mobility shift), but also such that a clear difference in binding between WT and mutant proteins can be observed. Since many more residues of METTL3 and METTL14 contribute to RNA binding, the effects of individual mutants on affinity might be too small to be confidently detected in TSA or EMSA. In particular, we only identified the substrate adenosine binding residues, and mutating them, and hence preventing adenosine binding alone, might not have a big effect on overall RNA binding affinity. The enzymatic assay that we used, on the other hand, is more sensitive since the detection is fluorescence based and quantifies the conversion of A to m6A in an RNA substrate, and more factors than just affinity play a role for enzymatic activity, such as correct orientation and stability of the adenosine in the active site and stabilization of the transition state.

      • A written narrative to accompany Supplementary Movie 1 would make it much more accessible to those unfamiliar with modeling and simulations.

      We thank the reviewer for the comment. We expanded the caption to the movie with a narrative describing different events at different time points in the movie.

      • Table 3 could be made clearer to those without MD experience by defining/indicating the top row as different computational models.

      The reviewer is right. We have added a footnote to Table 3 to clearly indicate the different density functional theory and semi-empirical density functional tight binding method used in this study. We also added another line in the table.

      • In the conclusion, the authors state "the height of the QM/MM free-energy barrier indicates that the methyl transfer step is not rate-determining." How does this compare to experimental data? Additional kinetic assays to demonstrate this experimentally would go a long way in convincing the reader of this conclusion.

      We thank the reviewer for the question. Kinetic assays have been performed for METTL3-14 and we mention and reference them in the text. We believe that further kinetic experiments would be outside of the scope of this study. Furthermore, the METTL3 mutants that we made show no activity in our enzymatic assay and hence kinetic studies would probably be impossible to do with them.

      As we show from QM/MM and describe in the text, the methyl cation in the SAM cofactor is transferred directly to the N6 position of the adenosine substrate. DFTB3/MM free energy simulations show that this mechanism has an energetic barrier of 15-16 kcal/mol. The turnover as published based on an enzymatic assay is 0.2-0.6 min⁻¹ at ambient temperature, which implies a barrier of ~20 kcal/mol. This value is higher than that determined for the methyl transfer alone by QM/MM. Hence, in the overall mechanism, there must be a step that is slower than the methyl transfer, and we conclude that the methyl transfer is not the rate-limiting step.
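
      The conversion from turnover to barrier quoted in this response can be checked with a short calculation. The sketch below is not taken from the manuscript; it assumes the standard Eyring expression k = (kB·T/h)·exp(-ΔG‡/RT) with a transmission coefficient of 1, and it assumes that "ambient temperature" means 298.15 K. The function name is illustrative only.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J*s
R_KCAL = 1.987204e-3    # gas constant, kcal/(mol*K)


def eyring_barrier(rate_per_s, temperature_k=298.15):
    """Free-energy barrier (kcal/mol) implied by a rate constant.

    Assumes k = (kB*T/h) * exp(-dG/RT), i.e. transition state theory
    with a transmission coefficient of 1. The 298.15 K default is an
    assumption standing in for "ambient temperature".
    """
    prefactor = K_B * temperature_k / H  # about 6.2e12 s^-1 at 298 K
    return R_KCAL * temperature_k * math.log(prefactor / rate_per_s)


# Turnover range quoted in the response, converted from min^-1 to s^-1.
for turnover_per_min in (0.2, 0.6):
    barrier = eyring_barrier(turnover_per_min / 60.0)
    print(f"{turnover_per_min} min^-1 -> {barrier:.1f} kcal/mol")
```

      Under these assumptions both ends of the quoted turnover range give a barrier slightly above 20 kcal/mol, which is several kcal/mol higher than the 15-16 kcal/mol computed for the methyl transfer step.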

      Reviewer #3 (Recommendations For The Authors):

      I only have a few comments about the work.

      (1) It would be good if the authors could show more of the data that is used as the basis for their conclusions. For example, IC50 values are presented (Table 1) without error estimates or an indication of the quality of the data that is used to estimate the data.

      We thank the reviewer for the suggestion. We included errors of the IC50 values and show the dose-response curves from the enzymatic assay with the BAs as inhibitors in a new Supplementary Figure S1.

      (2) More substantially, it would be good to have a more detailed analysis of the crystal structures in terms of the properties that are mentioned/analysed. While the structures are relatively good (2.1 Å-2.5 Å), it is not clear to the reader how this data supports the interactions that are proposed. For example, the authors pinpoint a number of hydrogen bonding interactions and water molecules in the complexes. They might consider showing support for some of these in the electron density maps. Similarly, it would be good to show the densities that support the substantial differences of the Ade in the BA2 and BA4 complexes. These might be supplementary files. I note also that the structures are not yet released or available for analysis [which of course is a valid choice but also means that I cannot inspect the maps myself].

      We have added supplementary figures supporting the conformations of the BAs and their interactions with METTL3 with electron density, for BA1 and BA6 in Supplementary Figure S2, and for BA2 and BA4 in a new Supplementary Figure S3.

      (3) It would be useful with an error analysis of the off-rates estimated from the MD simulations and a discussion of the accuracy of these estimates. Even the slower dissociation events seem quite fast. What are the rough affinities of these molecules and how fast would the binding need to be to be compatible with the affinity and estimated off-rates?

      We expanded upon this in the results paragraph concerning the MD simulations. The affinities of METTL3-14 binding to AMP or m6AMP can be expected to be very low, with Kd values in the millimolar range. We have not measured these Kd values, nor have we found any published data, but we have conducted thermal shift assays with A and m6A and did not observe any significant thermal shifts in the melting temperature of METTL3-14 at high micromolar concentrations of these compounds, indicative of a very low binding affinity. This is to be expected because METTL3-14 should not methylate adenosines unspecifically but rather in the GGACU motif of substrate mRNA.
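
      The reviewer's question about how fast binding would need to be can be framed with the two-state relation Kd = k_off / k_on. The sketch below only illustrates that arithmetic: the residence times and Kd values are hypothetical placeholders, not numbers from the manuscript, and a simple two-state binding model is assumed.

```python
def implied_on_rate(residence_time_s, kd_molar):
    """Association rate constant (M^-1 s^-1) implied by a mean residence
    time and a dissociation constant, assuming two-state binding where
    k_off = 1 / residence_time and Kd = k_off / k_on."""
    k_off = 1.0 / residence_time_s  # s^-1
    return k_off / kd_molar


# Hypothetical inputs chosen only to show the arithmetic.
for tau_ns in (50.0, 500.0):
    for kd_mm in (1.0, 10.0):
        k_on = implied_on_rate(tau_ns * 1e-9, kd_mm * 1e-3)
        print(f"tau = {tau_ns:5.0f} ns, Kd = {kd_mm:4.1f} mM -> k_on = {k_on:.1e} M^-1 s^-1")
```

      Comparing the implied k_on with a typical diffusion-limited association rate is one way to judge whether an estimated off-rate and an assumed affinity are mutually compatible.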

      (4) The authors use QM/MM simulations with metadynamics to estimate the energy profile of the methyl transfer reaction. They find a barrier of ca. 15 kcal/mol and suggest this to be compatible with the enzymatic turnover rate of ca. 0.3/min. Here it would be good with a clearer description of the possible sources of error and assumptions in making these statements. First, what is the error on the estimated energy profile from the metadynamics? The authors mention the analysis of progression of the PMF as a function of time, but that is in itself not a strong test for convergence (the PMF may stay constant if there is little sampling). What does the time series of the CV look like? Second, it seems as if the authors are assuming a large pre-exponential factor (10^9/s ?). Is that correct, and how sure are they of this value? Finally, when linking the barrier of the methyl-transfer reaction to the overall turnover rate it sounds like they assume that other parts of the reaction do not affect the turnover rate. Is that correctly understood, and what is the evidence for that? It sounds like the authors are saying that step 5 in the cycle (Figure 10) is limiting.

      We thank the reviewer for the questions. Accordingly, we have carried out additional simulations and statistical error analyses.

      (i) We have carried out two additional sets of multi-walker metadynamics simulations with the same setup as the original calculation, except for using different initial random seeds. Using the three independent sets of metadynamics simulations, we can better estimate the statistical uncertainty for the computed potential of mean force (PMF). We have updated the PMF in Fig. 8b, in which the solid curve represents the result averaged over three independent runs, and the shaded area represents the standard error of the mean of the three replicas. The figure caption of Fig. 8b is revised accordingly.
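
      The averaging described in (i) can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes each replica's PMF has already been evaluated on the same collective-variable grid and shifted to a common reference, and the numbers are made-up placeholders.

```python
import math
from statistics import mean, stdev


def pmf_mean_and_sem(replica_pmfs):
    """Point-wise mean and standard error of the mean across replicas.

    replica_pmfs: list of equally long lists, one PMF (kcal/mol) per
    independent run, all on the same collective-variable grid.
    """
    n_replicas = len(replica_pmfs)
    columns = list(zip(*replica_pmfs))  # one tuple per grid point
    means = [mean(col) for col in columns]
    sems = [stdev(col) / math.sqrt(n_replicas) for col in columns]
    return means, sems


# Hypothetical PMF values for three runs on a five-point grid.
replicas = [
    [0.0, 7.5, 15.2, 6.0, -3.1],
    [0.0, 8.1, 15.9, 5.4, -2.6],
    [0.0, 7.8, 14.8, 6.3, -3.4],
]
means, sems = pmf_mean_and_sem(replicas)
for m, s in zip(means, sems):
    print(f"{m:6.2f} +/- {s:.2f} kcal/mol")
```

      With only three replicas the standard error is itself a rough estimate, so the shaded band is best read as an indication of run-to-run spread rather than a tight confidence interval.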

      (ii) To further illustrate the convergence behavior of the metadynamics simulations, we have included the following supplementary files: (1). Potentials of mean force computed with different numbers of deposited Gaussians are compared. (2). As suggested by the reviewer, we show the time series of the collective variable (CV) sampled by the 24 independent walkers during one set of metadynamics simulations. These results clearly indicate that the CV exhibits diffusive behaviors between the reactant and product regions, further supporting the adequate sampling and convergence of our metadynamics simulations.

      (iii) Regarding the issue of pre-factor used in the rate estimate, we have indeed used the common approximation of kT/h as in the regular transition state theory. Many studies in the literature support the use of this expression for very localized chemical reactions in enzymes. We have included several representative references along this line: (1) M. Garcia-Viloca, J. Gao, M. Karplus, D. G. Truhlar, How enzymes work: Analysis by modern rate theory and computer simulations, Science, 303, 186-195 (2004) (2) D. R. Glowacki, J. N. Harvey, A. J. Mulholland, Taking Ockham’s razor to enzyme dynamics and catalysis, Nat. Chem. 4, 169-176 (2012)
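
      How much the choice of pre-exponential factor matters can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the authors' analysis: it takes T = 298.15 K, uses the 15-16 kcal/mol barrier range from the response, and compares the kT/h prefactor with the 10^9 s^-1 value that the reviewer guessed.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)
TEMPERATURE_K = 298.15  # assumed ambient temperature
KT_OVER_H = 1.380649e-23 * TEMPERATURE_K / 6.62607015e-34  # s^-1


def rate_from_barrier(barrier_kcal, prefactor_per_s, temperature_k=TEMPERATURE_K):
    """Rate constant (s^-1) from k = A * exp(-dG / RT)."""
    return prefactor_per_s * math.exp(-barrier_kcal / (R_KCAL * temperature_k))


for label, prefactor in (("kT/h", KT_OVER_H), ("1e9 s^-1", 1e9)):
    for barrier in (15.0, 16.0):
        k = rate_from_barrier(barrier, prefactor)
        print(f"A = {label:9s} barrier = {barrier:.0f} kcal/mol -> "
              f"{k:.3g} s^-1 ({k * 60:.3g} min^-1)")
```

      Under these assumptions the kT/h prefactor gives rates of tens per second for a 15-16 kcal/mol barrier, far above the measured turnover, whereas a 10^9 s^-1 prefactor would bring the same barrier close to the measured range. This is why the choice of prefactor affects which step is inferred to be rate-limiting.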

      (iv) Regarding the nature of the rate-limiting event, please see our response to reviewer 1.

      (5) The authors should ideally make the input files for their simulations available and deposit the plumed files in for example plumed-nest (as indicated in their reference 100).

      We agree with the reviewer. Accordingly, we have uploaded the PLUMED file that we have used for the DFTB3/MM metadynamics simulations (plumed.dat) together with the MD simulation trajectories to Zenodo.

      Minor

      (1) Many of the details in Figure 10 are very small and difficult to read without zooming in. Consider whether some parts could be made larger.

      The reviewer is right. We have changed the figure to make each reaction step clearer and better recognizable.

    1. eLife assessment

      This study presents valuable findings that expand our view of dopamine release in different brain regions and show that dopamine release in the lateral hypothalamus is related to the activity of orexin neurons. The evidence supporting the claims of the authors is solid, although inclusion of tests that directly assess causality of the novel pathways would have been even more conclusive. The work will be of interest to neuroscientists who study the neural basis of motivation.

    2. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Mice can learn to associate sensory cues (sound and light) with a reward or activation of dopamine neurons in the ventral tegmental area (VTA), and then anticipate the reward from the sensory cue only. Using this paradigm, Harada et al. showed that after learning, the cue is able to induce dopamine release in the projection targets of the VTA, namely the nucleus accumbens and lateral hypothalamus (LH). Within the LH, dopamine release from VTA neurons (either by presentation of the cue or direct optical stimulation of VTA neurons) activates orexin neurons, measured as an increase in intracellular calcium levels.

      Strengths:

      This study utilized genetically encoded optical tools to selectively stimulate dopamine neurons and to monitor dopamine release in target brain areas and the calcium response of orexin neurons. This allowed a direct assessment of the relationship between the behavioral response of the animals, the release of a key neurotransmitter in select brain areas, and its effect on target cells, with a precision previously not possible. The results shed light on the mechanism underlying reward-related learning and expectation.

      Weaknesses:

      • The Ca increase in orexin neurons in response to optical stimulation of VTA DA neurons is convincing. However, there is an accumulated body of literature indicating that dopamine inhibits orexin neurons through D2 receptors, particularly at high concentrations both directly and indirectly (PMID 15634779, 16611835, 26036709, 30462527; but note that synaptic effects at low conc are excitatory - PMID 30462527, 26036709). There should be a clear acknowledgment of these previous studies and a discussion directly addressing the discrepancy. Furthermore, there are in-vivo studies that investigated the role of dopamine in the LH involving orexin neurons in different behavioral contexts (e.g. PMID 24236888). The statement found in the introduction "whether and how dopamine release modulates orexin neuronal activity has not been investigated vigorously" (3rd para of Introduction) is an understatement of these previous reports.

      We thank the Reviewer for pointing out that we missed several important citations. We added the references mentioned, and the discrepancy of concern is addressed in the discussion section.

      • Along these lines, previous reports of concentration-dependent bidirectional dopaminergic modulation of orexin neurons suggest that high and low levels of DA would affect orexin neurons differently. Is there any way to estimate the local concentration of DA released by the laser stimulation protocol used in this study? Could there be a dose dependency in the Intensity of laser stimulation and orexin neuron response?

      We agree that this is an interesting point. However, one limitation of our study, and of intensity-based genetically-encoded sensors in general, is that the estimation of the concentration is technically difficult. The sensor effectively reports changes in extra-synaptic levels of neurotransmitters, but to get the absolute value other modalities would be needed such as fast scan voltammetry. This limitation is now included in the discussion section.

      • The transient dip in DA signal during omission sessions in Fig2C (approx 1% decrease from baseline) is similar in amplitude to the decrease seen in non-laser trials shown in Fig 1C right panel (although the time course of the latter is unknown as the data are truncated). The authors should clarify whether those dips are a direct effect of the cue itself or indeed reward prediction error.

      Thanks for raising this important point. Indeed, there is a dip in the signal during non-stimulation trials. On day 1, the delivery of the cue triggered a dip; on day 10, there was a slight increase in the signal followed by the dip. The data are difficult to interpret, but our hypothesis is that two components trigger this dip. One is the aversiveness of the cue. Because a relatively loud sound (90 dB) was used for the cue, it would not be surprising if the auditory cue was slightly aversive to the experimental animals. It has been shown that aversive stimuli induce a dip of dopamine in the NAc, although it is specific to NAc subregions. The second component is reward prediction error. Although the non-laser-paired cue never triggered the laser stimulation, it is similar to the laser-paired one: both consist of a loud tone and a visual cue of the same color (presented at a different location). We think it is possible that the reward-related neuronal circuit was slightly activated by the non-laser-paired cue. In line with this interpretation, a small increase in the signal was observed on day 10 but not on day 1. If our hypothesis is true, the signal reflects two components, which unfortunately makes further analysis difficult.

      • There seem to be orexin-negative-GCaMP6 positive cells (Fig. 4B), suggesting that not all cells were phenotypically orexin+ at the time of imaging. The proportion of GCaMP6 cells that were ORX+ or negative and whether they responded differently to the stimuli should be indicated.

      While we acknowledge the observation of orexin-negative-GCaMP6 positive cells in Figure 4B, it's important to note that this phenomenon is consistent with the characteristics of the hOX-GCaMP virus used in prior experiments. The virus has undergone thorough characterization, and it has been reported to exhibit over 90% specificity, as demonstrated in prior work conducted in the laboratory of one of our contributing authors (PMID: 27546579). To address the concern raised by the reviewer, we have included Supplemental Figure 4 confirming that all mice consistently exhibited qualitatively similar hOX-GCaMP transients upon dopaminergic terminal stimulation. This additional evidence supports the reliability and specificity of our experimental approach.

      • Laser stimulation of DA neurons at the level of cell bodies (in VTA) induces an increase in DA release within the LH (Fig. 3C, D), however, there is no corresponding Ca signal in orexin neurons (Fig.4C).

      We realize that the figures were not clear, which led the reviewer to conclude that there was no corresponding Ca signal; however, a signal is present. We have now added Supplemental Figure 3 to show that a Ca signal is already present on day 1.

      In contrast, stimulating DA terminals within the LH induces a robust, long-lasting Ca signal (> 30s) in orexin neurons (Fig. 5). The initial peak is blocked by raclopride but the majority of Ca signal is insensitive to DA antagonists (please add a positive control or cite references indicating that the dose of antagonists used was sufficient; also the timing of antagonist administration should be indicated).

      This is now included in the discussion section. Also, the timing and dose of the antagonists are now described in the methods section.

      Taken together, these results seem to suggest that DA does not directly increase Ca signal in orexin neurons. What could be mediating the remaining component?

      This point has been included in the discussion section.

      • Similarly, there is an elevation of Ca signal in orexin neurons that remains significantly higher after the cue/laser stimulation (Fig. 4F). It appears that it is this sustained component that is missing in omission trials. This can be analyzed further.

      It is true that there is a sustained component in stimulation trials, that is missing in omission trials. Most likely that is evoked by the stimulation of dopamine neurons. We argue that this component is isolated in Fig 5 and analyzed as much as we can.

      • Mice of both sexes were used in this study; it would be interesting to know whether sex differences were observed or not.

      We agree that this is an important point. However, our sample number is not high enough to make a meaningful comparison between male and female.

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting and well-written study assessing the role of dopaminergic inputs from the VTA on orexin cell responses in an opto-pavlovian conditioning task. These data are consistent with a possible role of this system in reward expectation and are surprisingly one of the first demonstrations of a role for dopamine in this phenomenon.

      Strengths:

      The study has used an interesting opto-Pavlovian approach combined with fibre photometry.

      Weaknesses:

      It is unclear what n size was used or analysed, particularly for AUC measures e.g. Figures 1 D/E and 3 G. The number of trials reflected and the animal numbers need clarification.

      The sample size is indicated in the legend section.

      The study focused on opto-stim omissions - this work would be significantly strengthened by a comparison to a real-world examination where animals are trained for a natural reward (food pellet).

      We agree that this would be an important experiment. This experiment has been partially done in the laboratory of one of the contributing authors (doi.org/10.1101/2022.04.13.488195) and would be one of our follow-up studies.

      Have the authors considered the role of orexin in the opposing situation i.e. a surprise addition of reward?

      That would be an interesting experiment. To do that, a natural reward, not optical stimulation, should be used as the reinforcer. This could be part of our follow-up study.

      Similarly, there remains some conjecture regarding the role of these systems in reward and aversion - have the authors considered aversive learning paradigms - fear, or fear extinction - to further explore the roles of this system? There are some (important) discussions about the possible role of orexin in negative reinforcement. Further studies to address this could be warranted.

      It is true that dopamine also plays a significant role in aversive learning. Therefore, this would be an interesting experiment. The discussion section now includes this point.

      I think some further discussion of the work by Linehan concerning the interesting bidirectional actions of D1/D2 receptor signalling on glutamatergic transmission onto orexin neurons is worthwhile. While this work is currently cited, the nuance and perhaps relevance to D1 and D2 signalling could be contextualised a little more (https://doi.org/10.1152/ajpregu.00150.2018).

      Thanks for the suggestion. The discussion has been expanded.

      Reviewer #3 (Public Review):

      Summary:

      Harada and colleagues describe an interesting set of experiments characterizing the relationship between dopamine cell activity in the ventral tegmental area (VTA) and orexin neuron activity in the lateral hypothalamus (LH). All experiments are conducted in the context of an opto-Pavlovian learning task, in which a cue predicts optogenetic stimulation of VTA dopamine neurons. With training, cues that predict DA stimulation come to elicit dopamine release in LH (a similar effect is seen in accumbens). After training, omission trials (cue followed by no laser) result in a dip (inhibition) of dopamine release in LH, characteristic of reward prediction error observed in the striatum. Across cue training, the activity pattern of orexin neurons in LH mirrors that of LH DA levels. However, unlike the DA signal, orexin neurons do not exhibit a decrease in activity in omission trials. Systemic blockade of D2 but not D1 receptors blocked DA release in LH following VTA DA cell stimulation.

      Strengths: Although much work has been dedicated to examining projections from orexin cells to VTA, less has been done to characterize reciprocal projections and their function. In this way, this paper is a very important addition to the literature. The experiments are technically sound (with some limitations, below) and utilize sophisticated approaches, the manuscript is nicely written, and the conclusions are mostly reasonable based on the data collected.

      Weaknesses:

      I believe the impact of the paper could be enhanced by considering and/or addressing the following:

      Major:

      • I encourage the authors to discuss in the Introduction previous work on DA regulation of orexin neurons. In particular, the authors cite, but do not describe in any detail, the very relevant Linehan paper (2019; Am J Physiol Regul) which shows that DA differentially alters excitatory/inhibitory input onto orexin neurons and that these actions are reversed by D1 vs D2 receptor antagonists. Another paper (Bubser, 2005, EJN) showed that dopamine agonists increase the activity of orexin neurons and that these effects are blocked by D1/D2 antagonists. The current findings should be discussed in the context of these (and any other relevant) papers in the Discussion, too.

      Thanks for the valuable suggestion. This point has been integrated and the introduction and discussion sections have been revised carefully.

      • In the Discussion, the authors provide two (plausible) explanations for why they did not observe a dip in the calcium signal of orexin neurons during omission trials. Is it not possible that these cells do not encode for this type of RPE?

      We completely agree that it is possible. Our current hypothesis is that dopamine in the LH encodes RPE and that this information is transmitted to orexin neurons. Orexin neurons integrate other information and encode something else, which we call ‘multiplexed cognitive information’. What this means exactly is still an open question. This point is now mentioned in the discussion section.

      • Related to the above - I am curious about the authors' thoughts on why there is such redundancy in the system. i.e. why is dopamine doing the same thing in NAC and LH in the context of cue-reward learning?

      Thank you for the question. This is an important point, indeed. Our current hypothesis is described in the discussion section.

      ’Our data indicate that dopamine in both the NAc and LH encodes reward prediction error (RPE). One open question is the existence of such a redundant mechanism. We hypothesize that dopamine in the LH boosts dopamine release via a positive feedback loop between the orexin and dopamine systems. It has already been established that some orexin neurons project to dopaminergic neurons in the VTA, positively modulating firing. On the other hand, our data indicate that dopamine in the LH stimulates orexinergic neurons. These collective findings suggest that when either the orexin or dopamine system is activated, the other system is also activated consequently. Although the current findings align with this idea, the hypothesis should be carefully challenged and scrutinized.’

      • The data, as they stand, are largely correlative and do not indicate that DA recruitment of orexin neurons is necessary for learning to occur. It would be compelling if blocking the orexin cell recruitment affected some behavioral outcomes of learning. Similarly - does raclopride treatment across training prevent learning?

      We appreciate the insightful comment. It is indeed a limitation of our study that we lack behavioral data. However, given the extensive previous research on the crucial role of orexin in motivated behavior, we argue that establishing dopaminergic regulation of the orexin system itself is a valuable contribution. This perspective is thoroughly discussed in the dedicated section of our paper. It's important to note that the injection of D2 antagonists, including raclopride, is known to induce significant sedation. Due to this sedative effect, combining behavioral experiments with these drugs poses considerable challenges.

      • Only single doses of SCH23390 and raclopride were used. How were these selected? It would be nice to use more of a dose range to show that 1) an effect of D1R blockade was not missed, and 2) that the reduction in orexin signal with raclopride was dose-dependent.

      The rationale for the doses has been added to the discussion section. It is reported that these doses block dopamine receptors. Although we agree that it would be nice to have a dose-response curve, we are reluctant to increase the doses, to avoid adverse effects on the experimental animals. The doses we used effectively induced hypo-locomotion, although the data are not shown.

      • Fig 1C, could the effect the authors observed be due to movement?

      We argue this is unlikely. We recorded two channels, one for the control and the other for the signal. Motion-related artifacts are corrected based on the control channel. One example trace around the laser stimulation is shown below. Please note that a typical motion-related artifact is a fast dip of the signal, normally observed in both the 405 and 465 nm channels.
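      The response does not state how the control channel is used for the correction. As an illustration only, a common approach in fibre photometry is to fit the 405 nm control trace to the 465 nm signal trace by least squares and express the signal relative to that fit. The sketch below assumes this approach; the function names `fit_control` and `motion_corrected_dff` are hypothetical and are not taken from the manuscript.

```python
def fit_control(control, signal):
    """Least-squares linear fit of the control trace onto the signal trace.

    Assumption: a single slope and intercept relate the two channels.
    Returns the fitted control trace, scaled to the signal channel.
    """
    n = len(control)
    if n != len(signal) or n < 2:
        raise ValueError("traces must have equal length of at least 2")
    mean_c = sum(control) / n
    mean_s = sum(signal) / n
    sxx = sum((c - mean_c) ** 2 for c in control)
    if sxx == 0:
        raise ValueError("control trace is constant; cannot fit a slope")
    sxy = sum((c - mean_c) * (s - mean_s) for c, s in zip(control, signal))
    slope = sxy / sxx
    intercept = mean_s - slope * mean_c
    return [slope * c + intercept for c in control]


def motion_corrected_dff(control, signal):
    """dF/F of the signal relative to the fitted control trace.

    Fluctuations shared by both channels (e.g. a fast motion-related dip)
    are absorbed by the fit; signal-specific transients remain.
    """
    fitted = fit_control(control, signal)
    return [(s - f) / f for s, f in zip(signal, fitted)]
```

      In this sketch, a dip that appears in both channels largely cancels out, whereas a transient present only in the 465 nm channel survives the correction.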

      Relatedly, what was the behavior like when the cue was on? Did mice orient/approach the cue?

      Although it has been reported that rats approach the cue (PMID: 30038277) in a similar task, it was not obvious in our case. It could be because we used both visual and auditory cues. Mice showed a general increase of locomotion during the cue and the stimulation but the direction was not clear to the experimenter.

      Also, when does the learning about the cue occur? Does it take all 10 days of learning or does this learning/cue-induced increase in dopamine signaling occur in less than 10 days?

      It is hard to say when the learning occurs. Judging from the learning curves in Figures 1, 3 and 4, the response to the cue seems to plateau at day 5, but since we don’t have behavioral data, the assessment relies only on the neuronal signal.

      • Also related to the above, could the observed dopamine signal be a result of just the laser turning on? It would seem important to include mice with a control sensor.

      We recorded two channels, at 405 nm and 465 nm. The 405 nm signal did not show an increase, while the 465 nm signal did. The example trace is shown. In addition, the sensor has already been characterized by the corresponding author, so we argue that this is unlikely.

      Author response image 1.

      Fig 1E, the effect seems to be driven by one mouse which looks like it could be a statistical outlier. The inclusion of additional animals would make these data more compelling.

      We agree that adding more mice would make data more compelling. However, considering the fact that dopamine in the accumbens has been investigated vigorously and our data is in line with the prior studies, we argue that we have enough data to claim our conclusion.

      • For Fig 1C, 3D, 3F, and 4D, could the authors please show the traces for the entire length of laser onset? It would be helpful to see both the rise and the fall of dopamine signals.

      For Fig 1C, one panel has been added. For Figs 3 and 4, a supplemental figure was created to show the signal around laser stimulation.

      • Fig 2C, could the authors comment on how they compared the AUC to baseline? Was this comparison against zero? Because of natural hills and troughs during signals prior to cue (which may not equate to a zero), comparing the omission-induced dip to a zero may not be appropriate. A better baseline might be using the signals prior to the cue.

      The signal immediately before the cue onset was taken as the baseline and was subtracted from the trace. Zero therefore corresponds to the pre-cue baseline in our analysis.
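      To make the procedure concrete, here is a minimal sketch of baseline subtraction followed by an area-under-the-curve (AUC) measurement. It is an illustration under assumptions, not the authors' code: the 1 s baseline window, the trapezoidal rule, and the function names are all hypothetical choices.

```python
def baseline_subtract(trace, times, cue_onset, baseline_window=1.0):
    """Subtract the mean of the samples just before cue onset.

    Assumption: the baseline is the mean over
    [cue_onset - baseline_window, cue_onset).
    """
    base = [y for t, y in zip(times, trace)
            if cue_onset - baseline_window <= t < cue_onset]
    if not base:
        raise ValueError("no samples fall in the baseline window")
    mean_base = sum(base) / len(base)
    return [y - mean_base for y in trace]


def auc(trace, times, start, end):
    """Trapezoidal area under the trace for start <= t <= end."""
    pts = [(t, y) for t, y in zip(times, trace) if start <= t <= end]
    return sum((t1 - t0) * (y0 + y1) / 2
               for (t0, y0), (t1, y1) in zip(pts, pts[1:]))
```

      After subtraction, a negative AUC in the chosen window indicates a dip below the pre-cue baseline, so testing the AUC against zero is equivalent to testing it against that baseline.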

      • Could the authors comment on how they came up with the 4-5.3s window to observe the AUC in Fig 3H?

      Since the kinetics of dopamine in the NAc and LH are different, different time windows were used to observe the dip of dopamine. The analysis of the kinetics has been added.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific feedback to the authors

      • Sample size for each experiment/group could not be found.

      The sample size is now included in the legends.

      • In most figures, the timing of onset for the cue and laser stimulation is unclear. This makes the data interpretation difficult. They should be labeled as in Fig. 3C, for example.

      Panels have been updated to address this point.

      • Please provide the rationale for selecting the time range for the measurement of AUC for different experiments (e.g. Fig. 2C, 3H, 4A, 5F).

      The kinetics of dopamine in the NAc and LH are different. This is now shown in the new Supplemental Figure 2. Based on this difference, different windows were chosen.

      • Fig. 1E, 3G right, 4E right: statistical analysis should use two-way repeated measures ANOVA rather than one-way ANOVA. Fig 1D, 3G left and 4E left panels can also be analyzed by two-way repeated measures ANOVA.

      We realized that those panels were redundant. Some panels have been removed and the analysis has been conducted according to this point.

      Minor comments:

      Fig. 2C can also show non-omission trials as a comparison.

      The panel has been updated.

      • The term "laser cue" is confusing, as the cue itself does not involve a laser.

      ’Laser-paired cue’ is used instead.

      • Color contrast can be improved for some figures, including Fig. 2C right, Fig. 3H right, and green and blue fluorescent fonts.

      The panels have been updated.

      • Figure legends: Tukey's test, rather than Tekey's test.

      This has been fixed.

      • There are some long-winded sentences that are hard to follow.

      Edited.

      • p.2, line 11 from bottom: should read ...the VTA evokes the release of dopamine.

      Edited

      • p.3, line 9: remove e from release.

      This has been addressed.

      Reviewer #3 (Recommendations For The Authors):

      Minor:

      • When discussing the understudied role of dopamine in brain regions other than the striatum in the Introduction, it might be helpful to cite this article: https://elifesciences.org/articles/81980 where the authors characterize dopamine in the bed nucleus of stria terminalis in associative behaviors and reward prediction error.

      The discussion section has been updated accordingly.

      • In the Discussion, it might be better to refrain from describing the results as 'measuring dopamine release' in the LH. Since there was no direct detection of dopamine release, rather a dopamine binding to the dLight receptors, referring to the detection as dopamine signaling/binding/transients is a better alternative.

      This point has been addressed.

      • In the Discussion, without measuring tonic dopamine release, it is difficult to say that there was a tonic dopamine release in the LH prior to negative RPE. In addition, I wouldn't describe the negative RPE as silencing of dopamine neurons projecting to the LH since this was not directly measured and it is hard to say for sure if the dip in dopamine is caused by silencing of the neurons. There certainly seems to be a reduction in extra-synaptic dopamine signaling in LH, however, what occurs upstream is unknown.

      We respectfully disagree with this point. In our opinion, the dopamine transient is more important than the firing of dopamine neurons because what matters for downstream neurons is dopamine concentration. For example, administration of cocaine increases the dopamine concentration extra-synaptically via blockade of DAT, while the firing of dopamine neurons goes down via activation of D2 receptors expressed in dopamine neurons. Administration of cocaine is not known to induce negative RPE.

      • Typo at multiple places: 'Tekey's multiple comparison test'.

      This has been fixed.

    3. Reviewer #1 (Public Review):

      Summary:

      Mice can learn to associate sensory cues (sound and light) with a reward or activation of dopamine neurons in the ventral tegmental area (VTA), and then anticipate the reward from the sensory cue only. Using this paradigm, Harada et al. showed that after learning, the cue is able to induce dopamine release in the projection targets of the VTA, namely the nucleus accumbens and lateral hypothalamus (LH). Within the LH, dopamine release from VTA neurons (either by presentation of the cue or direct optical stimulation of VTA neurons) activates orexin neurons, measured as an increase in intracellular calcium levels.

      Strengths:

      This study utilized genetically encoded optical tools to selectively stimulate dopamine neurons and to monitor dopamine release in target brain areas and the calcium response of orexin neurons. This allowed a direct assessment of the relationship between the behavioral response of the animals, the release of a key neurotransmitter in select brain areas, and its effect on target cells, with a precision previously not possible. The results shed light on the mechanism underlying reward-related learning and expectation.

      Weaknesses:

      Supplementary Fig. 2: While the differences in time course are analyzed and extensively discussed, there is also a large discrepancy in the magnitude of change in DA levels in the two areas that is not mentioned. Specifically, the DA increase is about 90-fold over baseline in the NAc, whereas it is about 2-fold in the LH. This could be because the DA level is either higher during baseline or lower during peak in the LH. Is there a known difference in the DA fiber density or extracellular DA levels in the two areas?

      The DA antagonist i.p. study (Fig. 5E and Suppl. Fig. 4) appears to consist of repeated measurements in the same animals. If so, is it possible that repeated opto-sessions result in desensitization of the response, and therefore the smaller response is not due to the antagonist? Ideally, the order of experiments (i.e. vehicle, SCH23390 and raclopride) would be randomized, and a control group should be shown where DA terminal stimulation induces a consistent response in orexin neurons when applied three times without any antagonists. The result should be assessed using one-way repeated measures ANOVA.

      Importantly, only 5 minutes were allowed for the i.p.-injected drugs to be absorbed and distributed to the brain before DA release was evoked and ORX neuron activity was monitored. Unfortunately, this is too short: in Ref 13, i.p. injection of SCH 23390 was 30 minutes prior to the optogenetics/photometry experiments; in Ref 70, no effect on behavior was detected at 10 min post-i.p. injection of SCH 23390; in Ref 71, the effect of raclopride on behavior was measured 30 min post-i.p. injection.

      Overall, it seems premature to make a conclusion about a role for D2 receptors or lack of involvement of D1 receptors in the observed phenomenon.

      Reciprocal activation of VTA DA neurons and LH orexin neurons is an interesting idea. However, if this is the case, the activity of these two types of cells should show a similar pattern and time course. This manuscript shows that extracellular DA levels decay quickly following the cessation of optical stimulation (Fig. 3B) whereas orexin neuron activity is long-lasting (Fig. 5). Thus, the hypothesis does not seem to be fully supported by the experimental data.

    4. Reviewer #3 (Public Review):

      Summary:

      Harada and colleagues describe an interesting set of experiments characterizing the relationship between dopamine cell activity in ventral tegmental area (VTA) and orexin neuron activity in lateral hypothalamus (LH). All experiments are conducted in the context of an opto-Pavlovian learning task, in which a cue predicts optogenetic stimulation of VTA dopamine neurons. With training, cues that predict DA stimulation come to elicit dopamine release in LH (a similar effect is seen in accumbens). After training, omission trials (cue followed by no laser) result in a dip (inhibition) of dopamine release in LH, characteristic of reward prediction error observed in striatum. Across cue training, the activity pattern of orexin neurons in LH mirrors that of LH DA levels. However, unlike the DA signal, orexin neurons do not exhibit a decrease in activity in omission trials. Systemic blockade of D2 but not D1 receptors blocked DA release in LH following VTA DA cell stimulation.

      Strengths:

      Although much work has been dedicated to examining projections from orexin cells to VTA, less has been done to characterize reciprocal projections and their function. In this way, this paper is a very important addition to the literature. The experiments are technically sound (with some limitations, below) and utilize sophisticated approaches, the manuscript is nicely written, and the conclusions are mostly reasonable based on the data collected.

      Weaknesses:

      I believe the impact of the paper could be enhanced by considering and/or addressing the following:

      Major

      • I encourage the authors to discuss in the Introduction previous work on DA regulation of orexin neurons. In particular, the authors cite, but do not describe in any detail, the very relevant Linehan paper (2019; Am J Physiol Regul) which shows that DA differentially alters excitatory/inhibitory input onto orexin neurons and that these actions are reversed by D1 vs D2 receptor antagonists. Another paper (Bubser, 2005, EJN) showed that dopamine agonists increase activity of orexin neurons and that these effects are blocked by D1/D2 antagonists. The current findings should be discussed in the context of these (and any other relevant) papers in the Discussion, too.

      The revised manuscript addresses these concerns.

      • In the Discussion, the authors provide 2 (plausible) explanations for why they did not observe a dip in calcium signal of orexin neurons during omission trials. Is it not possible that these cells do not encode for this type of RPE?

      The revised manuscript addresses these concerns.

      • Related to the above - I am curious about the authors' thoughts on why there is such redundancy in the system. i.e. why is dopamine doing the same thing in NAC and LH in the context of cue-reward learning?

      The revised manuscript addresses these concerns.

      • The data, as they stand, are largely correlative and do not indicate that DA recruitment of orexin neurons is necessary for learning to occur. It would be compelling if blocking the orexin cell recruitment affected some behavioral outcome of learning. Similarly - does raclopride treatment across training prevent learning?

      I maintain that experiments testing the causality of these effects on learning/behavior would enhance the impact of the paper. However, I recognize that this would require substantial additional experimentation and the data here are interesting regardless.

      • Only single doses of SCH23390 and raclopride were used. How were these selected? It would be nice to use more of a dose range to show that 1) an effect of D1R blockade was not missed, and 2) that the reduction in orexin signal with raclopride was dose-dependent.

      Additional information on dose selection has been included - thank you. Again, these data might be more impactful if the effects of antagonists were found to be dose-dependent.

      • Fig 1C, could the effect the authors observed be due to movement? Relatedly, what was the behavior like when the cue was on? Did mice orient/approach the cue? Also, when does the learning about the cue occur? Does it take all 10 days of learning or does this learning/cue-induced increase in dopamine signaling occur in less than 10 days?

      These have been addressed in the revised manuscript.

      • Also related to above, could the observed dopamine signal be a result of just the laser turning on? It would seem important to include mice with a control sensor.

      The authors note that a control channel was recorded. I agree this is useful, but my concern is that the illumination of the laser itself might startle the animal (promote movement), resulting in dopamine release. Showing that this does not occur with the same laser in ChR2-lacking VTA neurons would help resolve this issue.

      • Fig 1E, the effect seems to be driven by one mouse which looks like it could be a statistical outlier. Inclusion of additional animals would make these data more compelling.

      I would still argue that these data could be strengthened by the addition of more mice. I note that the graph depicting individual data points has been removed from the revised manuscript - I would recommend re-including this figure.

      • For Fig 1C, 3D, 3F, and 4D, could the authors please show the traces for the entire length of laser onset? It would be helpful to see both the rise and the fall of dopamine signals.

      • Fig 2C, could the authors comment on how they compared the AUC to baseline? Was this comparison against zero? Because of natural hills and troughs during signals prior to cue (which may not equate to a zero), comparing the omission-induced dip to a zero may not be appropriate. A better baseline might be using the signals prior to the cue.

      • Could the authors comment on how they came up with the 4-5.3s window to observe the AUC in Fig 3H?

      These have all been addressed.

      Minor

      • When discussing the understudied role of dopamine in brain regions other than the striatum in the Introduction, it might be helpful to cite this article: https://elifesciences.org/articles/81980 where the authors characterize dopamine in the bed nucleus of stria terminalis in associative behaviors and reward prediction error.

      • In Discussion, it might be better to refrain from describing the results as 'measuring dopamine release' in the LH. Since there was no direct detection of dopamine release, rather dopamine binding to the dLight receptors, referring to the detection as dopamine signaling/binding/transients is a better alternative.

      • In Discussion, without measuring tonic dopamine release, it is difficult to say that there was a tonic dopamine release in the LH prior to negative RPE. In addition, I wouldn't describe the negative RPE as silencing of dopamine neurons projecting to the LH since this was not directly measured and it is hard to say for sure if the dip in dopamine is caused by silencing of the neurons. There certainly seems to be a reduction in extrasynaptic dopamine signaling in LH, however what occurs upstream is unknown.

      • Typo at multiple places: 'Tekey's multiple comparison test'.

      These have been addressed.

    1. eLife assessment

      This manuscript claims to have found evidence for coordinated membrane potential oscillations in E. coli biofilms that can be linked to a putative K+ channel and that may serve to enhance photo-protection. The finding of waves of membrane potential would be of interest to a wide audience from molecular biology to microbiology and physical biology. Unfortunately, a major issue with the experimental technique affects the interpretation of the observations: the dye used has been previously shown not to report membrane potential, leaving the evidence inadequate.

    2. Reviewer #1 (Public Review):

      Summary:<br /> Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolisms or to increase antibiotic tolerance. Here, the authors have reported for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      Strengths:<br /> - The authors report original data.<br /> - For the first time, they showed that coordinated oscillations in membrane potential occur in E. coli biofilms.<br /> - The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.<br /> - The authors developed two rigorous models inspired by 1) the Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) the Fire-Diffuse-Fire model for the propagation of the electric signal.<br /> - Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors proposed a functional role for the putative K+ Kch channel: enhancing survival under photo-toxic conditions.

      Weaknesses:<br /> - Since the flow of fresh medium is stopped at the beginning of the acquisition, environmental parameters such as pH and RedOx potential are likely to vary significantly during the experiment. It is therefore important to exclude the contributions of these variations to ensure that the electrical response is only induced by light stimulation. Unfortunately, no control experiments were carried out to address this issue.<br /> - Furthermore, the control parameter of the experiment (light stimulation) is the same as that used to measure the electrical response, i.e. through fluorescence excitation. The use of the PROPS system could solve this problem.<br /> - Electrical signal propagation is an important aspect of the manuscript. However, a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. In addition, it is unclear if the electrical signal propagates within the biofilm during the second peak regime, which is mediated by the Kch channel. This is an important question, given that the fire-diffuse-fire model is presented with emphasis on the role of K+ ions.<br /> - Since deletion of the kch gene inhibits the long-term electrical response to light stimulation (regime II), the authors concluded that K+ ions play a role in the habituation response. However, Kch is a putative K+ ion channel. The use of specific drugs could help to clarify the role of K+ ions.<br /> - The manuscript as such does not allow us to properly conclude on the photo-protective role of the Kch ion channel.<br /> - The link between membrane potential dynamics and mechanosensitivity is not captured in the equation for the Q-channel opening dynamics in the Hodgkin-Huxley model (Supp Eq 2).<br /> - Given the large number of parameters used in the models, it is hard to distinguish between prediction and fitting.

    3. Reviewer #2 (Public Review):

      Summary of what the authors were trying to achieve:<br /> The authors thought they studied membrane potential dynamics in E. coli biofilms. They thought so because they were unaware that the dye they used to report membrane potential in E. coli has previously been shown not to report it. Because of this, the interpretation of the authors' results is not accurate.

      Major strengths and weaknesses of the methods and results:<br /> The strength of this work is that all the data is presented clearly, and accurately, as far as I can tell.

      The major critical weakness of this paper is the use of ThT dye as a membrane potential dye in E. coli. The work is unaware of a publication from 2020 https://www.sciencedirect.com/science/article/pii/S0006349519308793 that demonstrates that ThT is not a membrane potential dye in E. coli. Therefore I think the results of this paper are misinterpreted. The same publication I reference above presents a protocol on how to carefully calibrate any candidate membrane potential dye in any given condition.

      I now go over each results section in the manuscript.

      Result section 1: Blue light triggers electrical spiking in single E. coli cells

      I do not think the title of the result section is correct for the following reasons. The above-referenced work demonstrates the loading profile one should expect from a Nernstian dye (Figure 1). It also demonstrates that ThT does not show that profile and explains why this is so. ThT only permeates the membrane under light exposure (Figure 5). This finding is consistent with blue light peroxidising the membrane (see also following work Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923 on light-induced damage to the electrochemical gradient of protons-I am sure there are more references for this).

      Please note that the loading profile (only observed under light) in the current manuscript in Figure 1B as well as in the video S1 is identical to that in Figure 3 from the above-referenced paper (i.e. https://www.sciencedirect.com/science/article/pii/S0006349519308793), and corresponding videos S3 and S4. This kind of profile is exactly what one would expect theoretically if the light is simultaneously lowering the membrane potential as the ThT is equilibrating, see Figure S12 of that previous work. There, it is also demonstrated by the means of monitoring the speed of bacterial flagellar motor that the electrochemical gradient of protons is being lowered by the light. The authors state that applying the blue light for different time periods and over different time scales did not change the peak profile. This is expected if the light is lowering the electrochemical gradient of protons. But, in Figure S1, it is clear that it affected the timing of the peak, which is again expected, because the light affects the timing of the decay, and thus of the decay profile of the electrochemical gradient of protons (Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923).

      I find Figure S1D interesting. There, the authors load TMRM, which is a membrane voltage dye that has been used extensively (as far as I am aware this is the first reference for that and it has not been cited https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1914430/). As visible from the last TMRM reference I give, TMRM will only load the cells in Potassium Phosphate buffer with NaCl (and often we used EDTA to permeabilise the membrane). It is not fully clear (to me) whether here TMRM was prepared in rich media (it explicitly says so for ThT in Methods but not for TMRM), but it seems so. If this is the case, it likely also loads because of the damage to the membrane done with light, and therefore I am not surprised that the profiles are similar.

      The authors then use CCCP. First, a small correction: the authors state that it quenches membrane potential. CCCP is a protonophore (https://pubmed.ncbi.nlm.nih.gov/4962086/), so it collapses the electrochemical gradient of protons. This means that it is possible, and this will depend on the type of pumps present in the cell, that CCCP collapses the electrochemical gradient of protons, but the membrane potential is equal and opposite in sign to the DeltapH. So using CCCP does not automatically mean membrane potential will collapse (e.g. in some mammalian cells it does not need to be the case, but in E. coli it is https://www.biorxiv.org/content/10.1101/2021.11.19.469321v2). CCCP has also been recently found to be a substrate for TolC (https://journals.asm.org/doi/10.1128/mbio.00676-21), but at the concentrations the authors are using CCCP (100 µM) that should not affect the results. However, the authors then state that because they observed, in Figure S1E, a fast efflux of ions in all cells and no spiking dynamics, this confirms that the observed dynamics are membrane potential related. I do not agree that it does. First, Figure S1E does not appear to show transients; instead, it is visible that after 50 min treatment with 100 µM CCCP, ThT dye shows no dynamics. The action of a Nernstian dye is defined. It is not sufficient that a charged molecule is affected in some way by electrical potential; this needs to be in a very specific way to be a Nernstian dye. Part of the profile of ThT loading observed in https://www.sciencedirect.com/science/article/pii/S0006349519308793 is membrane potential related, but not in a way that is characteristic of a Nernstian dye.
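
      For reference, the equilibrium loading expected from an ideal Nernstian dye follows a Boltzmann distribution across the membrane. The following is a minimal sketch, assuming a monovalent cationic dye at 37 °C; the function name and default values are illustrative and are not taken from the manuscript or from the cited calibration paper.

```python
import math

R = 8.314462618      # gas constant, J/(mol*K)
F = 96485.33212      # Faraday constant, C/mol

def nernstian_ratio(vm_mV, z=1, temp_K=310.15):
    """Equilibrium inside/outside concentration ratio of an ideal Nernstian dye.

    vm_mV  : membrane potential in millivolts (inside relative to outside)
    z      : dye charge (assumed +1 here)
    temp_K : absolute temperature (assumed 37 C)
    """
    return math.exp(-z * F * (vm_mV / 1000.0) / (R * temp_K))

if __name__ == "__main__":
    # Illustrative potentials only; not measurements from any experiment.
    for vm in (0, -60, -120, -180):
        print(f"Vm = {vm:5d} mV -> c_in/c_out = {nernstian_ratio(vm):.1f}")
```

      A dye that behaves in this way accumulates exponentially more as the membrane potential becomes more negative, and its intracellular signal falls when the potential collapses. This sketch covers the equilibrium value only and says nothing about loading kinetics.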

      Result section 2: Membrane potential dynamics depend on the intercellular distance

      In this chapter, the authors report that the time to reach the first intensity peak during ThT loading is different when cells are in microclusters. They interpret this as electrical signaling in clusters because the peak is reached faster in microclusters (as opposed to slower because intuitively in these clusters cells could be shielded from light). However, shielding is one possibility. The other is that the membrane has changed in composition and/or the effective light power the cells can tolerate (with mechanisms to handle light-induced damage, some of which the authors mention later in the paper) is lower. Given that these cells were left in a microfluidic chamber for 2 hours to attach in growth media according to Methods, there is sufficient time for that to happen. In Figure S12 C and D of that same paper from my group (https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf) one can see the effects of peak intensity and timing of the peak on the permeability of the membrane. Therefore I do not think the distance is the explanation for what the authors observe.

      Result section 3: Emergence of synchronized global wavefronts in E. coli biofilms

      In this section, the authors exposed a mature biofilm to blue light. They observe that the intensity peak is reached faster in the cells in the middle. They interpret this as the ion-channel-mediated wavefronts moved from the center of the biofilm. As above, cells in the middle can have different membrane permeability to those at the periphery, and probably even more importantly, there is no light profile shown anywhere in SI/Methods. I could be wrong, but the SI3 A profile is consistent with a potential Gaussian beam profile visible in the field of view. In Methods, I find the light source for the blue light and the type of microscope but no comments on how 'flat' the illumination is across their field of view. This is critical to assess what they are observing in this result section. I do find it interesting that the ThT intensity collapsed from the edges of the biofilms. In the publication I mentioned https://www.sciencedirect.com/science/article/pii/S0006349519308793#app2, the collapse of fluorescence was not understood (other than it is not membrane potential related). It was observed in Figure 5A, C, and F, that at the point of peak, electrochemical gradient of protons is already collapsed, and that at the point of peak cell expands and cytoplasmic content leaks out. This means that this part of the ThT curve is not membrane potential related. The authors see that after the first peak collapsed there is a period of time where ThT does not stain the cells and then it starts again. If after the first peak the cellular content leaks, as we have observed, then staining that occurs much later could be simply staining of cytoplasmic positively charged content, and the timing of that depends on the dynamics of cytoplasmic content leakage (we observed this to be happening over 2h in individual cells). ThT is also a non-specific amyloid dye, and in starving E. coli cells formation of protein clusters has been observed (https://pubmed.ncbi.nlm.nih.gov/30472191/), so such cytoplasmic staining seems possible.

      Finally, I note that authors observe biofilms of different shapes and sizes and state that they observe similar intensity profiles, which could mean that my comment on 'flatness' of the field of view above is not a concern. However, the scale bar in Figure 2A is not legible, so I can't compare it to the variation of sizes of the biofilms in Figure 2C (67 to 280um). Based on this, I think that the illumination profile is still a concern.

      Result section 4: Voltage-gated Kch potassium channels mediate ion-channel electrical oscillations in E. coli

      First I note at this point, given that I disagree that the data presented thus 'suggest that E. coli biofilms use electrical signaling to coordinate long-range responses to light stress' as the authors state, it gets harder to comment on the rest of the results.

      In this result section the authors look at the effect of Kch, a putative voltage-gated potassium channel, on ThT profile in E. coli cells. And they see a difference. It is worth noting that in the publication https://www.sciencedirect.com/science/article/pii/S0006349519308793 it is found that ThT is also likely a substrate for TolC (Figure 4), but that scenario could not be distinguished from the one where TolC mutant has a different membrane permeability (and there is a publication that suggests the latter is happening https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2958.2010.07245.x). Given this, it is also possible that Kch deletion affects the membrane permeability. I do note that in video S4 I seem to see more of, what appear to be, plasmolysed cells. The authors do not see the ThT intensity with this mutant that appears long after the initial peak has disappeared, as they see in WT. It is not clear how long they waited for this, as from Figure S3C it could simply be that the dynamics of this is a lot slower, e.g. Kch deletion changes membrane permeability.

      The authors themselves state that the evidence for Kch being a voltage-gated channel is indirect (line 54). I do not think there is a need to claim function from a ThT profile of E. coli mutants (nor do I believe it's good practice), given how accurate single-channel recordings are currently. To know the exact dependency on the membrane potential, ion channel recordings on this protein are needed first.

      Result section 5: Blue light influences ion-channel mediated membrane potential events in E. coli

      In this chapter the authors vary the light intensity and stain the cells with PI (this dye gets into the cells when the membrane becomes very permeable), and the extracellular environment with K+ dye (I have not yet worked carefully with this dye). They find that different amounts of light influence ThT dynamics. This is in line with previous literature (both papers I have been mentioning: Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923 and https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf especially SI12), but does not add anything new. I think the results presented here can be explained with previously published theory and do not indicate that the ion-channel mediated membrane potential dynamics is a light stress relief process.

      Result section 6: Development of a Hodgkin-Huxley model for the observed membrane potential dynamics

      This results section starts with the authors stating: 'our data provide evidence that E. coli manages light stress through well-controlled modulation of its membrane potential dynamics'. As stated above, I think they are instead observing the process of ThT loading while the light is damaging the membrane and thus simultaneously collapsing the electrochemical gradient of protons. As stated above, this has been modelled before. And then, they observe a ThT staining that is independent from membrane potential.

      I will briefly comment on the Hodgkin Huxley (HH) based model. First, I think there is no evidence for two channels with different activation profiles as authors propose. But also, the HH model has been developed for neurons. There, the leakage and the pumping fluxes are both described by a constant representing conductivity, times the difference between the membrane potential and Nernst potential for the given ion. The conductivity in the model is given as gK*n^4 for potassium, gNa*m^3*h sodium, and gL for leakage, where gK, gNa and gL were measured experimentally for neurons. And, n, m, and h are variables that describe the experimentally observed voltage-gated mechanism of neuronal sodium and potassium channels. (Please see Hodgkin AL, Huxley AF. 1952. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J. Physiol. 116:449-72 and Hodgkin AL, Huxley AF. 1952. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117:500-44).
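
      To make the structure of the original model concrete, the ionic current terms described above can be sketched as follows. The parameter values are the commonly quoted squid-axon values in the modern sign convention and are assumptions for illustration only; nothing here is fitted to bacterial data, and the function name is hypothetical.

```python
# Classic Hodgkin-Huxley ionic currents (squid giant axon, illustrative values).
G_K, G_NA, G_L = 36.0, 120.0, 0.3        # maximal conductances, mS/cm^2
E_K, E_NA, E_L = -77.0, 50.0, -54.4      # reversal (Nernst) potentials, mV

def ionic_currents(v, n, m, h):
    """Return (I_K, I_Na, I_L) in uA/cm^2 for membrane potential v (mV).

    n, m, h are the gating variables (each between 0 and 1).
    Each current is an effective conductance times the driving force (v - E_ion).
    """
    i_k = G_K * n**4 * (v - E_K)          # potassium: four activation gates
    i_na = G_NA * m**3 * h * (v - E_NA)   # sodium: three activation, one inactivation gate
    i_l = G_L * (v - E_L)                 # leak: constant conductance
    return i_k, i_na, i_l
```

      Each term vanishes at the reversal potential of its ion, which is where the linear driving-force form is expected to be most reliable. The gating exponents and conductances encode measured neuronal channel properties, which is why transferring them to bacteria requires independent justification.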

      Thus, in applying the model to describe bacterial electrophysiology one should ensure near equilibrium requirement holds (so that (V-VQ) etc terms in authors' equation Figure 5 B hold), and potassium and other channels in a given bacterium have similar gating properties to those found in neurons. I am not aware of such measurements in any bacteria, and therefore think the pump leak model of the electrophysiology of bacteria needs to start with fluxes that are more general (for example Keener JP, Sneyd J. 2009. Mathematical physiology: I: Cellular physiology. New York: Springer or https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000144)

      Result section 7: Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli.

      The results that Msc channels affect the profile of ThT dye are interesting. It is again possible that the membrane permeability of these mutants has changed and therefore the dynamics have changed, so this needs to be checked first. I also note that our results show that the peak of ThT coincides with cell expansion. For this to be understood a model is needed that also takes into account the link between maintenance of electrochemical gradients of ions in the cell and osmotic pressure.

      A side note is that the authors state that the Msc responds to stress-related voltage changes. I think this is an overstatement. Mscs respond predominantly to membrane tension and are mostly nonspecific (see how their action recovers cellular volume in this publication https://www.pnas.org/doi/full/10.1073/pnas.1522185113). Authors cite references 35-39 to support this statement. These publications still state that these channels are predominantly membrane tension-gated. Some of the references state that the presence of external ions is important for tension-related gating but sometimes they gate spontaneously in the presence of certain ions. Other publications cited don't really look at gating with respect to ions (39 is on clustering). This is why I think the statement is somewhat misleading.

      Result section 8: Anomalous ion-channel-mediated wavefronts propagate light stress signals in 3D E. coli biofilms.

      I am not commenting on this result section, as it would only be applicable if ThT was membrane potential dye in E. coli.

      Aims achieved/results support their conclusions:

      The authors clearly present their data. I am convinced that they have accurately presented everything they observed. However, I think their interpretation of the data and conclusions is inaccurate in line with the discussion I provided above.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      Any other comments:

      I note that while this work studies E. coli, it references papers in other bacteria using ThT. For example, in lines 35-36 authors state that bacteria (Bacillus subtilis in this case) in biofilms have been recently found to modulate membrane potential citing the relevant literature from 2015. It is worth noting that the most recent paper https://journals.asm.org/doi/10.1128/mbio.02220-23 found that ThT binds to one or more proteins in the spore coat, suggesting that it does not act as a membrane potential dye in Bacillus spores. It is possible that it still reports membrane potential in Bacillus cells and the recent results are strictly spore-specific, but these should be kept in mind when using ThT with Bacillus.

    4. Reviewer #3 (Public Review):

      It has recently been demonstrated that bacteria in biofilms show changes in membrane potential in response to changes in their environment, and that these can propagate signals through the biofilm to coordinate bacterial behavior. Akabuogu et al. contribute to this exciting research area with a study of blue light-induced membrane potential dynamics in E. coli biofilms. They demonstrate that Thioflavin-T (ThT) intensity (a proxy for membrane potential) displays multiphasic dynamics in response to blue light treatment. They additionally use genetic manipulations to implicate the potassium channel Kch in the latter part of these dynamics. Mechanosensitive ion channels may also be involved, although these channels seem to have blue light-independent effects on membrane potential as well. In addition, there are challenges to the quantitative interpretation of ThT microscopy data which require consideration. The authors then explore whether these dynamics are involved in signaling at the community level. The authors suggest that cell firing is both more coordinated when cells are clustered and happens in waves in larger, 3D biofilms; however, in both cases evidence for these claims is incomplete. The authors present two simulations to describe the ThT data. The first of these simulations, a Hodgkin-Huxley model, indicates that the data are consistent with the activity of two ion channels with different kinetics; the Kch channel mutant, which ablates a specific portion of the response curve, is consistent with this. The second model is a fire-diffuse-fire model to describe wavefront propagation of membrane potential changes in a 3D biofilm; because the wavefront data are not presented clearly, the results of this model are difficult to interpret. 
Finally, the authors discuss whether these membrane potential changes could be involved in generating a protective response to blue light exposure; increased death in a Kch ion channel mutant upon blue light exposure suggests that this may be the case, but a no-light control is needed to clarify this.

      In a few instances, the paper is missing key control experiments that are important to the interpretation of the data. This makes it difficult to judge the meaning of some of the presented experiments.

      (1) An additional control for the effects of autofluorescence is very important. The authors conduct an experiment where they treat cells with CCCP and see that Thioflavin-T (ThT) dynamics do not change over the course of the experiment. They suggest that this demonstrates that autofluorescence does not impact their measurements. However, cellular autofluorescence depends on the physiological state of the cell, which is impacted by CCCP treatment. A much simpler and more direct experiment would be to repeat the measurement in the absence of ThT or any other stain. This experiment should be performed both in the wild-type strain and in the ∆kch mutant.

      (2) The effects of photobleaching should be considered. Of course, the intensity varies a lot over the course of the experiment in a way that photobleaching alone cannot explain. However, photobleaching can still contribute to the kinetics observed. Photobleaching can be assessed by changing the intensity, duration, or frequency of exposure to excitation light during the experiment. Considerations about photobleaching become particularly important when considering the effect of catalase on ThT intensity. The authors find that the decrease in ThT signal after the initial "spike" is attenuated by the addition of catalase; this is what would be predicted by catalase protecting ThT from photobleaching (indeed, catalase can be used to reduce photobleaching in time lapse imaging).

      (3) It would be helpful to have a baseline of membrane potential fluctuations in the absence of the proposed stimulus (in this case, blue light). Including traces of membrane potential recorded without light present would help support the claim that these changes in membrane potential represent a blue light-specific stress response, as the authors suggest. Of course, ThT is blue, so if the excitation light for ThT is problematic for this experiment the alternative dye tetramethylrhodamine methyl ester perchlorate (TMRM) can be used instead.

      (4) The effects of ThT in combination with blue light should be more carefully considered. In mitochondria, a combination of high concentrations of blue light and ThT leads to disruption of the PMF (Skates et al. 2021 BioRXiv), and similarly, ThT treatment enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). If present in this experiment, this effect could confound the interpretation of the PMF dynamics reported in the paper.

      (5) Figures 4D - E indicate that a ∆kch mutant has increased propidium iodide (PI) staining in the presence of blue light; this is interpreted to mean that Kch-mediated membrane potential dynamics help protect cells from blue light. However, Live/Dead staining results in these strains in the absence of blue light are not reported. This means that the possibility that the ∆kch mutant has a general decrease in survival (independent of any effects of blue light) cannot be ruled out.

      (6) Additionally in Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3AB), this complicates the interpretation of this experiment.

      Throughout the paper, many ThT intensity traces are compared, and described as "similar" or "dissimilar", without detailed discussion or a clear standard for comparison. For example, the two membrane potential curves in Fig. S1C are described as "similar" although they have very different shapes, whereas the curves in Fig. 1B and 1D are discussed in terms of their differences although they are evidently much more similar to one another. Without metrics or statistics to compare these curves, it is hard to interpret these claims. These comparative interpretations are additionally challenging because many of the figures in which average trace data are presented do not indicate standard deviation.
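
      One simple way to put such comparisons on a quantitative footing is to report a similarity metric between normalised traces. The sketch below uses the Pearson correlation and the root-mean-square difference; the function names are hypothetical, and it assumes both traces are sampled at the same time points.

```python
import math

def normalise(trace):
    """Scale a trace to the 0-1 range (min-max normalisation)."""
    lo, hi = min(trace), max(trace)
    if hi == lo:
        raise ValueError("flat trace cannot be normalised")
    return [(x - lo) / (hi - lo) for x in trace]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length traces."""
    if len(a) != len(b):
        raise ValueError("traces must have the same length")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def rms_difference(a, b):
    """Root-mean-square difference between two min-max normalised traces."""
    if len(a) != len(b):
        raise ValueError("traces must have the same length")
    na, nb = normalise(a), normalise(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)) / len(na))
```

      Reporting such a metric with its spread across replicates would let readers judge "similar" and "dissimilar" against a stated threshold. The choice of metric is itself an assumption; a shape-sensitive alternative may suit multiphasic traces better.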

      The differences between the TMRM and ThT curves that the authors show in Fig. S1C warrant further consideration. Some of the key features of the response in the ThT curve (on which much of the modeling work in the paper relies) are not very apparent in the TMRM data. It is not obvious to me which of these traces will be more representative of the actual underlying membrane potential dynamics.

      A key claim in this paper (that dynamics of firing differ depending on whether cells are alone or in a colony) is underpinned by "time-to-first peak" analysis, but there are some challenges in interpreting these results. The authors report an average time-to-first peak of 7.34 min for the data in Figure 1B, but the average curve in Figure 1B peaks earlier than this. In Figure 1E, it appears that there are a handful of outliers in the "sparse cell" condition that likely explain this discrepancy. Either an outlier analysis should be done and the mean recomputed accordingly, or a more outlier-robust method like the median should be used instead. Then, a statistical comparison of these results will indicate whether there is a significant difference between them.
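
      The sensitivity of the mean to a few late-peaking cells can be illustrated with a short sketch. The values below are hypothetical and are not taken from the manuscript; they only show how the mean and median diverge when a handful of outliers are present.

```python
import statistics

def summarise_peak_times(times_min):
    """Return mean, median and interquartile range of time-to-first-peak values."""
    q1, _, q3 = statistics.quantiles(times_min, n=4)
    return {
        "mean": statistics.mean(times_min),
        "median": statistics.median(times_min),
        "iqr": q3 - q1,
    }

# Hypothetical values (minutes): most cells peak near 5 min, two outliers peak late.
sparse_cells = [4.8, 5.1, 5.0, 5.3, 4.9, 5.2, 14.0, 16.5]
summary = summarise_peak_times(sparse_cells)
```

      In this made-up example the two late cells pull the mean well above the median, which stays close to the bulk of the distribution. Reporting the median with the interquartile range would therefore describe the typical cell more faithfully.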

      In two different 3D biofilm experiments, the authors report the propagation of wavefronts of membrane potential; I am unable to discern these wavefronts in the imaging data, and they are not clearly demonstrated by analysis.

      The first data set is presented in Figures 2A, 2B, and Video S3. The images and video are very difficult to interpret because of how the images have been scaled: the center of the biofilm is highly saturated, and the zero value has also been set too high to consistently observe the single cells surrounding the biofilm. With the images scaled this way, it is very difficult to assess dynamics. The time stamps in Video S3 and on the panels in Figure 2A also do not correspond to one another although the same biofilm is shown (and the time course in 2B is also different from what is indicated in 2B). In either case, it appears that the center of the biofilm is consistently brighter than the edges, and the intensity of all cells in the biofilm increases in tandem; by eye, propagating wavefronts (either directed toward the edge or the center) are not evident to me. Increased brightness at the center of the biofilm could be explained by increased cell thickness there (as is typical in this type of biofilm). From the image legend, it is not clear whether the image presented is a single confocal slice or a projection. Even if this is a single confocal slice, in both Video S3 and Figure 2A there are regions of "haze" from out-of-focus light evident, suggesting that light from other focal planes is nonetheless present. This seems to me to be a simpler explanation for the fluorescence dynamics observed in this experiment: cells are all following the same trajectory that corresponds to that seen for single cells, and the center is brighter because of increased biofilm thickness.

      The second data set is presented in Video S6B; I am similarly unable to see any wave propagation in this video. I observe only a consistent decrease in fluorescence intensity throughout the experiment that is spatially uniform (except for the bright, dynamic cells near the top; these presumably represent cells that are floating in the microfluidic and have newly arrived to the imaging region).

      3D imaging data can be difficult to interpret by eye, so it would perhaps be more helpful to demonstrate these propagating wavefronts by analysis; however, such analysis is not presented in a clear way. The legend in Figure 2B mentions a "wavefront trace", but there is no position information included - this trace instead seems to represent the average intensity trace of all cells. To demonstrate the propagation of a wavefront, this analysis should be shown for different subpopulations of cells at different positions from the center of the biofilm. Data is shown in Figure 8 that reflects the velocity of the wavefront as a function of biofilm position; however, because the wavefronts themselves are not evident in the data, it is difficult to interpret this analysis. The methods section additionally does not contain sufficient information about what these velocities represent and how they are calculated. Because of this, it is difficult for me to evaluate the section of the paper pertaining to wave propagation and the predicted biofilm critical size.
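
      The position-resolved analysis suggested above could be sketched as follows. This is a minimal illustration, assuming per-cell intensity traces and cell coordinates are already available from segmentation; the function name, the input layout, and the use of the global intensity maximum as "the peak" are all assumptions.

```python
import math
from collections import defaultdict

def peak_time_by_radius(cells, centre, bin_width_um, dt_min):
    """Mean time-to-peak (minutes) of cells grouped by distance from the centre.

    cells        : list of (x_um, y_um, trace) tuples, trace = intensity per frame
    centre       : (x_um, y_um) of the biofilm centre
    bin_width_um : width of each radial bin in micrometres
    dt_min       : time between frames in minutes
    Returns a dict mapping radial bin index -> mean time of the maximum intensity.
    """
    bins = defaultdict(list)
    for x, y, trace in cells:
        r = math.hypot(x - centre[0], y - centre[1])
        # Frame index of the global maximum of this cell's trace.
        peak_frame = max(range(len(trace)), key=trace.__getitem__)
        bins[int(r // bin_width_um)].append(peak_frame * dt_min)
    return {b: sum(t) / len(t) for b, t in sorted(bins.items())}
```

      A propagating wavefront would appear as a systematic shift of peak time with radial bin, whereas a spatially uniform response would give roughly the same peak time in every bin.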

      There are some instances in the paper where claims are made that do not have data shown or are not evident in the cited data:

      (1) In the first results section, "When CCCP was added, we observed a fast efflux of ions in all cells"- the data figure pertaining to this experiment is in Fig. S1E, which does not show any ion efflux. The methods section does not mention how ion efflux was measured during CCCP treatment.

      (2) In the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, it's hard to distinguish these fluctuations from measurement noise; a no-light control could help clarify this.

      (3) The authors state that the membrane potential dynamics simulated in Figure 7B are similar to those observed in 3D biofilms in Fig. S4B; however, the second peak is not clearly evident in Fig. S4B and it looks very different for the mature biofilm data reported in Fig. 2. I have some additional confusion about this data specifically: in the intensity trace shown in Fig. S4B, the intensity in the second frame is much higher than the first; this is not evident in Video S6B, in which the highest intensity is in the first frame at time 0. Similarly, the graph indicates that the intensity at 60 minutes is higher than the intensity at 4 minutes, but this is not the case in Fig. S4A or Video S6B.

    5. Author Response:

      We would like to sincerely thank the referees and the editor for their time in considering our manuscript. The electrophysiology of bacteria is a fast-moving, complex field and is proving contentious in places. We believe the peer review process of eLife provides an ideal mechanism to address the issues raised on our manuscript in an open and transparent manner. Hopefully we will encourage some more consensus in the field and help understand some of the inconsistencies in the current literature that are hampering progress.

      The editors stress the main issue raised was a single referee questioning the use of ThT as an indicator of membrane potential. We are well aware of the articles by the Pilizota group and we believe them to be scientifically flawed. The authors assume there are no voltage-gated ion channels in E. coli and then attempt to explain motility data based on a simple Nernstian battery model (they assume E. coli are unexcitable matter). This in turn leads them to conclude the membrane dye ThT is faulty, when in fact it is a problem with their simple battery model.

      In terms of the previous microbiology literature, the assumption of no voltage-gated ion channels in E. coli suggested by referee 2 is a highly contentious niche ideology. The majority of gene databases for E. coli have a number of ion-channels annotated as voltage sensitive due to comparative genetics studies e.g. try the https://bacteria.ensembl.org/ database (the search terms ‘voltage-gated coli’ give 2521 hits for genes, similarly you could check www.uniprot.org or www.biocyc.org) and M.M.Kuo, Y.Saimi, C.Kung, ‘Gain of function mutation indicate that E. coli Kch form a functional K+ conduit in vivo’, EMBO Journal, 2003, 22, 16, 4049. Furthermore, recent microbiology reviews all agree that E. coli has a number of voltage-gated ion channels S.D.Beagle, S.W.Lockless, ‘Unappreciated roles for K+ channels in bacterial physiology’, Trends in microbiology, 2021, 29, 10, 942-950. More emphatic experimental data is seen in spiking potentials that have been observed by many groups for E. coli, both directly using microelectrodes and indirectly using genetically expressed fluorophores, ‘Electrical spiking in bacterial biofilms’ E.Masi et al, Journal of the Royal Society Interface, 2015, 12, 102, ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, J.M.Kralj, et al, Science, 2011, 333, 6040, 345 and ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 2023, 120, 3, e2208348120. The only mechanism currently known to cause spiking potentials in cells is due to positive feedback from voltage-gated ion channels (you need a mechanism to induce the oscillations). Indeed, people are starting to investigate the specific voltage-gated ion channels in E. coli and a role is emerging for calcium in addition to potassium e.g. ‘Genome-wide functional screen for calcium transients in E. coli identifies increased membrane potential adaptation to persistent DNA damage’, R.Luder, et al, J.Bacteriology, 2021, 203, 3, e00509.

      In terms of recent data from our own group, electrical impedance spectroscopy (EIS) experiments from E. coli indicate there are large conductivity changes associated with the Kch ion channels (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print). EIS experiments probe the electrical phenomena of bacterial biofilms directly and do not depend on fluorophores i.e. they can’t be affected by ThT.

      Attempts to disprove the use of ThT to measure hyperpolarisation phenomena in E. coli using fluorescence microscopy also seem doomed to failure based on comparative control experiments. A wide range of other cationic fluorophores show similar behaviour to ThT e.g. the potassium sensitive dye used in our eLife article. Thus the behaviour of ThT appears to be generic for a range of cationic dyes and it implies a simple physical mechanism i.e. the positively charged dyes enter cells at low potentials. The elaborate photobleaching mechanism postulated by referee 2 seems most unlikely and is unable to explain our data (see below). ThT is photostable and chemically well-defined and it is therefore used almost universally in fluorescence assays for amyloids.

      A challenge with trying to use flagellar motility to measure intracellular potentials in live bacteria, as per referee 2’s many publications, is that a clutch is known to occur with E. coli e.g. ‘Flagellar brake protein YcgR interacts with motor proteins MotA and FliG to regulate the flagellar rotation speed and direction’, Q.Han et al, Frontiers in Microbiology, 2023, 14. Thus bacteria with high membrane potentials can have low motility when their clutch is engaged. This makes sense, since otherwise bacterial motility would be enslaved to their membrane potentials, greatly restricting their ability to react to their environmental conditions. Without quantifying the dynamics of the clutch (e.g. the gene circuit) it seems challenging to deduce how the motor reacts to Nernstian potentials in vivo. As a result we are not convinced by any of the Pilizota group articles. The quantitative connection between motility and membrane potential is too tenuous.

      In conclusion, the articles questioning the use of ThT are scientifically flawed and based on a niche ideology that E. coli do not contain voltage-gated ion channels. The current work disproves the simple Nernstian battery (SNB) model expounded by Pilizota et al, unpersuasively represented in multiple publications by this one group in the literature (see below for critical synopses) and demonstrates the SNB model needs to be replaced by a model that includes excitability (demonstrating hyperpolarization of the membrane potential).

      In the language of physics, a non-linear oscillator model is needed to explain spiking potentials in bacteria and the simple battery models presented by Pilizota et al do not have the required non-linearities to oscillate (‘Nonlinear dynamics and chaos’, Steve Strogatz, Westview Press, 2014). Such non-linear models are the foundation for describing eukaryotic electrophysiology, e.g. Hodgkin and Huxley’s Nobel prize winning research (1963), but also the vast majority of modern extensions (‘Mathematical physiology’, J.Keener, J.Sneyd, Springer, 2009, ‘Cellular biophysics and modelling: a primer on the computational biology of excitable cells’, G.C.Smith, 2019, CUP, ‘Dynamical systems in neuroscience: the geometry of excitability and bursting’, E.M.Izhikevich, 2006, MIT and ‘Neuronal dynamics: from single neurons to networks and models of cognition’, W.Gerstner et al, 2014, CUP). The Pilizota group is using modelling tools from the 1930s that quickly were shown to be inadequate to describe eukaryotic cellular electrophysiology and the same is true for bacterial electrophysiology (see the ground breaking work of A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 7576, 59 for the use of Hodgkin-Huxley models with bacterial biofilms). Below we describe a critical synopsis of the articles cited by referee 2 and we then directly answer the specific points all the referees raise.

      Critical synopsis of the articles cited by referee 2:

      1) ‘Generalized workflow for characterization of Nernstian dyes and their effects on bacterial physiology’, L.Mancini et al, Biophysical Journal, 2020, 118, 1, 4-14.

      This is the central article used by referee 2 to argue that there are issues with the calibration of ThT for the measurement of membrane potentials. The authors use a simple Nernstian battery (SNB) model and unfortunately it is wrong when voltage-gated ion channels occur. Huge oscillations occur in the membrane potentials of E. coli that cannot be described by the SNB model. Instead a Hodgkin-Huxley model is needed, as shown in our eLife manuscript and multiple other studies (see above). Arrhenius kinetics are assumed in the SNB model for pumping with no real evidence and the generalized workflow involves ripping the flagella off the bacteria! The authors construct an elaborate ‘workflow’ to ensure their ThT results can be interpreted using their erroneous SNB model over a limited range of parameters.

      2) ‘Non-equivalence of membrane voltage and ion-gradient as driving forces for the bacterial flagellar motor at low load’, C.J.Lo, et al, Biophysical Journal, 2007, 93, 1, 294.

      An odd de novo chimeric species is developed using an E. coli chassis which uses Na+ instead of H+ for the motility of its flagellar motor. The relevance to wild type E. coli is not clear, due to the massive physiological perturbations involved. A SNB model is used to fit the data over a very limited parameter range with all the concomitant errors.

      3) ‘Single-cell bacterial electrophysiology reveals mechanisms of stress-induced damage’, E.Krasnopeeva, et al, Biophysical Journal, 2019, 116, 2390.

      The abstract says ‘PMF defines the physiological state of the cell’. This statement is hyperbolic. An extremely wide range of molecules contribute to the physiological state of a cell. PMF does not even define the electrophysiology of the cell e.g. via the membrane potential. There are 0.2 M of K+ compared with 0.0000001 M of H+ in E. coli, so K+ is arguably a million times more important for the membrane potential than H+ and thus the electrophysiology! Equation (1) in the manuscript assumes no other ions are exchanged during the experiments other than H+. This is a very bad approximation when voltage-gated potassium ion channels move the majority ion (K+) around! In our model Figure 4A is better explained by depolarisation due to K+ channels closing than direct irreversible photodamage. Why does the ThT fluorescence increase again for the second hyperpolarization event if the ThT is supposed to be damaged? It does not make sense.

      4) ‘The proton motive force determines E. coli robustness to extracellular pH’, G.Terradot et al, 2024, preprint.

      This article expounds the SNB model once more. It still ignores the voltage-gated ion channels. Furthermore, it ignores the effect of the dominant ion in E. coli, K + . The manuscript is incorrect as a result and I would not recommend publication. In general, an important problem is being researched i.e. how the membrane potential of E. coli is related to motility, but there are serious flaws in the SNB approach and the experimental methodology appears tenuous.

      Answers to specific questions raised by the referees:

      Reviewer #1:

      Summary:<br /> Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolisms or to increase antibiotic tolerance. Here, the authors have reported for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      Strengths:<br /> - The authors report original data.<br /> - For the first time, they showed that coordinated oscillations in membrane potential occur in E. Coli biofilms.<br /> - The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.<br /> - The authors developed two rigorous models inspired by 1) Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) Fire-Diffuse-Fire model for the propagation of the electric signal.<br /> - Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors proposed a functional role for the putative K+ Kch channel : enhancing survival under photo-toxic conditions.

      We thank the referee for their positive evaluations and agree with these statements.

      Weaknesses:<br /> - Since the flow of fresh medium is stopped at the beginning of the acquisition, environmental parameters such as pH and RedOx potential are likely to vary significantly during the experiment. It is therefore important to exclude the contributions of these variations to ensure that the electrical response is only induced by light stimulation. Unfortunately, no control experiments were carried out to address this issue.

      The electrical responses occur almost instantaneously when the stimulation with blue light begins i.e. it is too fast to be a build-up of pH. We are not sure what the referee means by Redox potential since it is an attribute of all chemicals that are able to donate/receive electrons. The electrical response to stress appears to be caused by ROS, since when ROS scavengers are added the electrical response is removed i.e. pH plays a very small minority role if any.

      - Furthermore, the control parameter of the experiment (light stimulation) is the same as that used to measure the electrical response, i.e. through fluorescence excitation. The use of the PROPS system could solve this problem.

      We were enthusiastic at the start of the project to use the PROPS system in E. coli as presented by J.M.Kralj et al, ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, Science, 2011, 333, 6040, 345. However, the people we contacted in the microbiology community said that it had some technical issues and there have been no subsequent studies using PROPS in bacteria after the initial promising study. The fluorescent protein system recently presented in PNAS seems more promising, ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 2023, 120, 3, e2208348120.

      - Electrical signal propagation is an important aspect of the manuscript. However, a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. In addition, it is unclear if the electrical signal propagates within the biofilm during the second peak regime, which is mediated by the Kch channel. This is an important question, given that the fire-diffuse-fire model is presented with emphasis on the role of K+ ions.

      We have presented a more detailed account of the electrical wavefront modelling work and it is currently under review in a physical journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      - Since deletion of the kch gene inhibits the long-term electrical response to light stimulation (regime II), the authors concluded that K+ ions play a role in the habituation response. However, Kch is a putative K+ ion channel. The use of specific drugs could help to clarify the role of K+ ions.

      Our recent electrical impedance spectroscopy publication provides further evidence that Kch is associated with large changes in conductivity as expected for a voltage-gated ion channel (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      - The manuscript as such does not allow us to properly conclude on the photo-protective role of the Kch ion channel.

      That Kch has a photoprotective role is our current working hypothesis. The hypothesis fits with the data, but we are not saying we have proven it beyond all possible doubt.

      - The link between membrane potential dynamics and mechanosensitivity is not captured in the equation for the Q-channel opening dynamics in the Hodgkin-Huxley model (Supp Eq 2).

      Our model is agnostic with respect to the mechanosensitivity of the ion channels, although we deduce that mechanosensitive ion channels contribute to ion channel Q.

      - Given the large number of parameters used in the models, it is hard to distinguish between prediction and fitting.

      This is always an issue with electrophysiological modelling (compared with most heart and brain modelling studies we are very conservative in the choice of parameters for the bacteria). In terms of predicting the different phenomena observed, we believe the model is very successful.

      Reviewer #2:

      Summary of what the authors were trying to achieve:<br /> The authors thought they studied membrane potential dynamics in E.coli biofilms. They thought so because they were unaware that the dye they used to report that membrane potential in E.coli, has been previously shown not to report it. Because of this, the interpretation of the authors' results is not accurate.

      We believe the Pilizota work is scientifically flawed.

      Major strengths and weaknesses of the methods and results:<br /> The strength of this work is that all the data is presented clearly, and accurately, as far as I can tell.

      The major critical weakness of this paper is the use of ThT dye as a membrane potential dye in E. coli. The work is unaware of a publication from 2020 https://www.sciencedirect.com/science/article/pii/S0006349519308793 that demonstrates that ThT is not a membrane potential dye in E. coli. Therefore I think the results of this paper are misinterpreted. The same publication I reference above presents a protocol on how to carefully calibrate any candidate membrane potential dye in any given condition.

      We are aware of this study, but believe it to be scientifically flawed. We do not cite the article because we do not think it is a particularly useful contribution to the literature.

      I now go over each results section in the manuscript.

      Result section 1: Blue light triggers electrical spiking in single E. coli cells

      I do not think the title of the result section is correct for the following reasons. The above-referenced work demonstrates the loading profile one should expect from a Nernstian dye (Figure 1). It also demonstrates that ThT does not show that profile and explains why this is so. ThT only permeates the membrane under light exposure (Figure 5). This finding is consistent with blue light peroxidising the membrane (see also following work Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923 on light-induced damage to the electrochemical gradient of protons; I am sure there are more references for this).

      The Pilizota group invokes some elaborate artefacts to explain the lack of agreement with a simple Nernstian battery model. The model is incorrect not the fluorophore.

      Please note that the loading profile (only observed under light) in the current manuscript in Figure 1B as well as in the video S1 is identical to that in Figure 3 from the above-referenced paper (i.e. https://www.sciencedirect.com/science/article/pii/S0006349519308793 [sciencedirect.com]), and corresponding videos S3 and S4. This kind of profile is exactly what one would expect theoretically if the light is simultaneously lowering the membrane potential as the ThT is equilibrating, see Figure S12 of that previous work. There, it is also demonstrated by the means of monitoring the speed of bacterial flagellar motor that the electrochemical gradient of protons is being lowered by the light. The authors state that applying the blue light for different time periods and over different time scales did not change the peak profile. This is expected if the light is lowering the electrochemical gradient of protons. But, in Figure S1, it is clear that it affected the timing of the peak, which is again expected, because the light affects the timing of the decay, and thus of the decay profile of the electrochemical gradient of protons (Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923 [sciencedirect.com]).

      We think the proton effect is a million times weaker than that due to potassium i.e. 0.2 M K+ versus 10^-7 M H+. We can comfortably neglect the influx of H+ in our experiments.
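      The concentration comparison quoted here can be reproduced with a few lines of arithmetic. This is a minimal sketch that only restates the two numbers given in the response (0.2 M K+ and an H+ concentration corresponding to pH 7); the function name is hypothetical, and the calculation compares concentrations only.

```python
def concentration_ratio(k_molar=0.2, ph=7.0):
    """Ratio of the K+ concentration to the H+ concentration implied by a given pH."""
    h_molar = 10.0 ** (-ph)  # pH 7 corresponds to 1e-7 mol/L of H+
    return k_molar / h_molar


ratio = concentration_ratio()
print(f"[K+]/[H+] = {ratio:.1e}")
```

      With the quoted values the ratio is of order 10^6, which is the figure the response refers to as "a million times".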

      I find Figure S1D interesting. There the authors load TMRM, which is a membrane voltage dye that has been used extensively (as far as I am aware this is the first reference for that and it has not been cited https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1914430/). As visible from the last TMRM reference I give, TMRM will only load the cells in Potassium Phosphate buffer with NaCl (and often we used EDTA to permeabilise the membrane). It is not fully clear (to me) whether here TMRM was prepared in rich media (it explicitly says so for ThT in Methods but not for TMRM), but it seems so. If this is the case, it likely also loads because of the damage to the membrane done with light, and therefore I am not surprised that the profiles are similar.

      The vast majority of cells continue to be viable. We do not think membrane damage is dominating.

      The authors then use CCCP. First, a small correction, as the authors state that it quenches membrane potential. CCCP is a protonophore (https://pubmed.ncbi.nlm.nih.gov/4962086/), so it collapses electrochemical gradient of protons. This means that it is possible, and this will depend on the type of pumps present in the cell, that CCCP collapses electrochemical gradient of protons, but the membrane potential is equal and opposite in sign to the DeltapH. So using CCCP does not automatically mean membrane potential will collapse (e.g. in some mammalian cells it does not need to be the case, but in E. coli it is https://www.biorxiv.org/content/10.1101/2021.11.19.469321v2). CCCP has also been recently found to be a substrate for TolC (https://journals.asm.org/doi/10.1128/mbio.00676-21), but at the concentrations the authors are using CCCP (100uM) that should not affect the results. However, the authors then state because they observed, in Figure S1E, a fast efflux of ions in all cells and no spiking dynamics this confirms that observed dynamics are membrane potential related. I do not agree that it does. First, Figure S1E, does not appear to show transients, instead, it is visible that after 50min treatment with 100uM CCCP, ThT dye shows no dynamics. The action of a Nernstian dye is defined. It is not sufficient that a charged molecule is affected in some way by electrical potential, this needs to be in a very specific way to be a Nernstian dye. Part of the profile of ThT loading observed in https://www.sciencedirect.com/science/article/pii/S0006349519308793 is membrane potential related, but not in a way that is characteristic of Nernstian dye.

      Our understanding of the literature is that CCCP poisons the whole metabolism of the bacterial cells. The ATP driven K+ channels will stop functioning and this is the dominant contributor to membrane potential.
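      For reference, the behaviour usually meant by a "Nernstian dye" in the exchange above is the equilibrium (Boltzmann) distribution of a charged, membrane-permeant molecule across a membrane held at a fixed potential. The sketch below is a generic textbook illustration, not a model from the manuscript or from the cited calibration paper; the function name and the default temperature are assumptions.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_accumulation_ratio(membrane_potential_mv, charge=1, temperature_k=298.15):
    """Equilibrium ratio c_in / c_out for an ion of the given charge.

    membrane_potential_mv is the inside-minus-outside potential in millivolts,
    so a negative (polarised) potential concentrates a cation inside the cell.
    """
    volts = membrane_potential_mv * 1e-3
    return math.exp(-charge * F * volts / (R * temperature_k))


for v_mv in (0.0, -60.0, -120.0, -150.0):
    print(v_mv, nernst_accumulation_ratio(v_mv))
```

      In this idealised picture a monovalent cationic dye is equally distributed at 0 mV and accumulates inside the cell by several hundred-fold at potentials near -150 mV.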

      Result section 2: Membrane potential dynamics depend on the intercellular distance

      In this chapter, the authors report that the time to reach the first intensity peak during ThT loading is different when cells are in microclusters. They interpret this as electrical signalling in clusters because the peak is reached faster in microclusters (as opposed to slower because intuitively in these clusters cells could be shielded from light). However, shielding is one possibility. The other is that the membrane has changed in composition and/or the effective light power the cells can tolerate (with mechanisms to handle light-induced damage, some of which authors mention later in the paper) is lower. Given that these cells were left in a microfluidic chamber for 2 hours to attach in growth media according to Methods, there is sufficient time for that to happen. In Figure S12 C and D of that same paper from my group (https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf) one can see the effects of peak intensity and timing of the peak on the permeability of the membrane. Therefore I do not think the distance is the explanation for what authors observe.

      Shielding would provide the reverse effect, since hyperpolarization begins in the dense centres of the biofilms. For the initial 2 hours the cells receive negligible blue light. Neither of the referee’s comments thus seem tenable.

      Result section 3: Emergence of synchronized global wavefronts in E. coli biofilms

      In this section, the authors exposed a mature biofilm to blue light. They observe that the intensity peak is reached faster in the cells in the middle. They interpret this as the ion-channel-mediated wavefronts moved from the center of the biofilm. As above, cells in the middle can have different membrane permeability to those at the periphery, and probably even more importantly, there is no light profile shown anywhere in SI/Methods. I could be wrong, but the SI3 A profile is consistent with a potential Gaussian beam profile visible in the field of view. In Methods, I find the light source for the blue light and the type of microscope but no comments on how 'flat' the illumination is across their field of view. This is critical to assess what they are observing in this result section. I do find it interesting that the ThT intensity collapsed from the edges of the biofilms. In the publication I mentioned https://www.sciencedirect.com/science/article/pii/S0006349519308793#app2, the collapse of fluorescence was not understood (other than it is not membrane potential related). It was observed in Figure 5A, C, and F, that at the point of peak, electrochemical gradient of protons is already collapsed, and that at the point of peak cell expands and cytoplasmic content leaks out. This means that this part of the ThT curve is not membrane potential related. The authors see that after the first peak collapsed there is a period of time where ThT does not stain the cells and then it starts again. If after the first peak the cellular content leaks, as we have observed, then staining that occurs much later could be simply staining of cytoplasmic positively charged content, and the timing of that depends on the dynamics of cytoplasmic content leakage (we observed this to be happening over 2h in individual cells). ThT is also a non-specific amyloid dye, and in starving E. coli cells formation of protein clusters has been observed (https://pubmed.ncbi.nlm.nih.gov/30472191/), so such cytoplasmic staining seems possible.

      It is very easy to see if the illumination is flat (Köhler illumination) by comparing the intensity of background pixels on the detector. It was flat in our case. Protons have little to do with our work for reasons highlighted before. Differential membrane permittivity is a speculative phenomenon not well supported by any evidence and with no clear molecular mechanism.

      Finally, I note that authors observe biofilms of different shapes and sizes and state that they observe similar intensity profiles, which could mean that my comment on 'flatness' of the field of view above is not a concern. However, the scale bar in Figure 2A is not legible, so I can't compare it to the variation of sizes of the biofilms in Figure 2C (67 to 280um). Based on this, I think that the illumination profile is still a concern.

      The referee now contradicts themselves and wants a scale bar to be more visible. We have changed the scale bar.

      Result section 4: Voltage-gated Kch potassium channels mediate ion-channel electrical oscillations in E. coli

      First I note at this point, given that I disagree that the data presented thus 'suggest that E. coli biofilms use electrical signaling to coordinate long-range responses to light stress' as the authors state, it gets harder to comment on the rest of the results.

      In this result section the authors look at the effect of Kch, a putative voltage-gated potassium channel, on ThT profile in E. coli cells. And they see a difference. It is worth noting that in the publication https://www.sciencedirect.com/science/article/pii/S0006349519308793 [sciencedirect.com] it is found that ThT is also likely a substrate for TolC (Figure 4), but that scenario could not be distinguished from the one where TolC mutant has a different membrane permeability (and there is a publication that suggests the latter is happening https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2958.2010.07245.x [onlinelibrary.wiley.com]). Given this, it is also possible that Kch deletion affects the membrane permeability. I do note that in video S4 I seem to see more of, what appear to be, plasmolysed cells. The authors do not see the ThT intensity with this mutant that appears long after the initial peak has disappeared, as they see in WT. It is not clear how long they waited for this, as from Figure S3C it could simply be that the dynamics of this is a lot slower, e.g. Kch deletion changes membrane permeability.

      The work that TolC provides a possible passive pathway for ThT to leave cells seems slightly niche. It just demonstrates another mechanism for the cells to equilibrate the concentrations of ThT in a Nernstian manner i.e. driven by the membrane voltage.

      The authors themselves state that the evidence for Kch being a voltage-gated channel is indirect (line 54). I do not think there is a need to claim function from a ThT profile of E. coli mutants (nor do I believe it's good practice), given how accurate single-channel recordings are currently. To know the exact dependency on the membrane potential, ion channel recordings on this protein are needed first.

      We have good evidence from electrical impedance spectroscopy experiments that Kch increases the conductivity of biofilms (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      Result section 5: Blue light influences ion-channel mediated membrane potential events in E. coli

      In this chapter the authors vary the light intensity and stain the cells with PI (this dye gets into the cells when the membrane becomes very permeable), and the extracellular environment with K+ dye (I have not yet worked carefully with this dye). They find that different amounts of light influence ThT dynamics. This is in line with previous literature (both papers I have been mentioning: Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923 [sciencedirect.com] and https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf [ars.els-cdn.com] especially SI12), but does not add anything new. I think the results presented here can be explained with previously published theory and do not indicate that the ion-channel mediated membrane potential dynamics is a light stress relief process.

      The simple Nernstian battery model proposed by Pilizota et al is erroneous in our opinion for reasons outlined above. We believe it will prove to be a dead end for bacterial electrophysiology studies.

      Result section 6: Development of a Hodgkin-Huxley model for the observed membrane potential dynamics

      This results section starts with the authors stating: 'our data provide evidence that E. coli manages light stress through well-controlled modulation of its membrane potential dynamics'. As stated above, I think they are instead observing the process of ThT loading while the light is damaging the membrane and thus simultaneously collapsing the electrochemical gradient of protons. As stated above, this has been modelled before. And then, they observe a ThT staining that is independent from membrane potential.

      This is an erroneous niche opinion. Protons have little say in the membrane potential since there are so few of them. The membrane potential is mostly determined by K+.

      I will briefly comment on the Hodgkin Huxley (HH) based model. First, I think there is no evidence for two channels with different activation profiles as the authors propose. But also, the HH model has been developed for neurons. There, the leakage and the pumping fluxes are both described by a constant representing conductivity, times the difference between the membrane potential and Nernst potential for the given ion. The conductivity in the model is given as gK*n^4 for potassium, gNa*m^3*h for sodium, and gL for leakage, where gK, gNa and gL were measured experimentally for neurons. And, n, m, and h are variables that describe the experimentally observed voltage-gated mechanism of neuronal sodium and potassium channels. (Please see Hodgkin AL, Huxley AF. 1952. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J. Physiol. 116:449-72 and Hodgkin AL, Huxley AF. 1952. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117:500-44).
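
      For readers less familiar with the formalism under discussion, the conductance terms described above can be sketched as follows. This is a minimal illustration that assumes the textbook squid-axon parameters (modern sign convention, resting potential near -65 mV); it is not the authors' E. coli model and is not fitted to any bacterial data. The function names, the forward-Euler integrator and the 10 uA/cm^2 stimulus are illustrative choices only.

```python
import math

# Textbook squid-axon values (assumed here for illustration only).
G_NA, G_K, G_L = 120.0, 36.0, 0.3       # maximal conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387   # reversal (Nernst) potentials, mV
C_M = 1.0                               # membrane capacitance, uF/cm^2


def _ratio(x, scale):
    """x / (1 - exp(-x/scale)), using the limit value `scale` at x = 0."""
    if abs(x) < 1e-7:
        return scale
    return x / (1.0 - math.exp(-x / scale))


def rates(v):
    """Voltage-dependent opening (a) and closing (b) rates of m, h, n in 1/ms."""
    a_m = 0.1 * _ratio(v + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * _ratio(v + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n


def ionic_currents(v, m, h, n):
    """Each current is conductance times (membrane potential - Nernst potential)."""
    i_na = G_NA * m ** 3 * h * (v - E_NA)   # sodium: gNa * m^3 * h
    i_k = G_K * n ** 4 * (v - E_K)          # potassium: gK * n^4
    i_l = G_L * (v - E_L)                   # leak: constant gL
    return i_na, i_k, i_l


def simulate(i_ext, t_max=50.0, dt=0.01):
    """Forward-Euler integration; returns the membrane potential trace in mV."""
    v = -65.0
    a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
    # Start the gating variables at their steady-state values for v.
    m, h, n = a_m / (a_m + b_m), a_h / (a_h + b_h), a_n / (a_n + b_n)
    trace = []
    for _ in range(int(t_max / dt)):
        a_m, b_m, a_h, b_h, a_n, b_n = rates(v)
        i_na, i_k, i_l = ionic_currents(v, m, h, n)
        v += dt * (i_ext - i_na - i_k - i_l) / C_M
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
        trace.append(v)
    return trace
```

      With these assumed parameters the model sits near rest when `i_ext` is zero and spikes under a sustained stimulus such as `simulate(10.0)`. Whether comparable gating variables apply to bacterial channels is exactly the point in dispute here.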

      In the 70 years since Hodgkin and Huxley first presented their model, a huge number of similar models have been proposed to describe cellular electrophysiology. We are not being hyperbolic when we state that the HH models for excitable cells are like the Schrödinger equation for molecules. We carefully adapted our HH model to reflect the currently understood electrophysiology of E. coli.

      Thus, in applying the model to describe bacterial electrophysiology one should ensure near equilibrium requirement holds (so that (V-VQ) etc terms in authors' equation Figure 5 B hold), and potassium and other channels in a given bacterium have similar gating properties to those found in neurons. I am not aware of such measurements in any bacteria, and therefore think the pump leak model of the electrophysiology of bacteria needs to start with fluxes that are more general (for example Keener JP, Sneyd J. 2009. Mathematical physiology: I: Cellular physiology. New York: Springer or https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000144 [journals.plos.org])

      The reference is to a slightly more modern version of a simple Nernstian battery model. The model will not oscillate and thus will not help modelling membrane potentials in bacteria. We are unsure where the equilibrium requirement comes from (inadequate modelling of the dynamics?)

      Result section 7: Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli.

      The results that Msc channels affect the profile of ThT dye are interesting. It is again possible that the membrane permeability of these mutants has changed and therefore the dynamics have changed, so this needs to be checked first. I also note that our results show that the peak of ThT coincides with cell expansion. For this to be understood a model is needed that also takes into account the link between maintenance of electrochemical gradients of ions in the cell and osmotic pressure.

      The evidence for permeability changes in the membranes seems to be tenuous.

      A side note is that the authors state that the Msc responds to stress-related voltage changes. I think this is an overstatement. Mscs respond to predominantly membrane tension and are mostly nonspecific (see how their action recovers cellular volume in this publication https://www.pnas.org/doi/full/10.1073/pnas.1522185113 [pnas.org]). Authors cite references 35-39 to support this statement. These publications still state that these channels are predominantly membrane tension-gated. Some of the references state that the presence of external ions is important for tension-related gating but sometimes they gate spontaneously in the presence of certain ions. Other publications cited don't really look at gating with respect to ions (39 is on clustering). This is why I think the statement is somewhat misleading.

      We have reworded the discussion of Mscs since the literature appears to be ambiguous. We will try to run some electrical impedance spectroscopy experiments on the Msc mutants in the future to attempt to remove the ambiguity.

      Result section 8: Anomalous ion-channel-mediated wavefronts propagate light stress signals in 3D E. coli biofilms.

      I am not commenting on this result section, as it would only be applicable if ThT was membrane potential dye in E. coli.

      Ok, but we disagree on the use of ThT.

      Aims achieved/results support their conclusions:

      The authors clearly present their data. I am convinced that they have accurately presented everything they observed. However, I think their interpretation of the data and conclusions is inaccurate in line with the discussion I provided above.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      I do not think this publication should be published in its current format. It should be revised in light of the previous literature as discussed in detail above. I believe presenting it in its current form on eLife pages would create unnecessary confusion.

      We believe many of the Pilizota group articles are scientifically flawed and are causing the confusion in the literature.

      Any other comments:

      I note, that while this work studies E. coli, it references papers in other bacteria using ThT. For example, in lines 35-36 authors state that bacteria (Bacillus subtilis in this case) in biofilms have been recently found to modulate membrane potential citing the relevant literature from 2015. It is worth noting that the most recent paper https://journals.asm.org/doi/10.1128/mbio.02220-23 [journals.asm.org] found that ThT binds to one or more proteins in the spore coat, suggesting that it does not act as a membrane potential in Bacillus spores. It is possible that it still reports membrane potential in Bacillus cells and the recent results are strictly spore-specific, but these should be kept in mind when using ThT with Bacillus.

      ThT was used successfully in previous studies of normal B. subtilis cells (by our own group and A.Prindle, ‘Spatial propagation of electrical signal in circular biofilms’, J.A.Blee et al, Physical Review E, 2019, 100, 052401, J.A.Blee et al, ‘Membrane potentials, oxidative stress and the dispersal response of bacterial biofilms to 405 nm light’, Physical Biology, 2020, 17, 2, 036001, A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 59-63). The connection to low metabolism spore research seems speculative.

      Reviewer #3:

      It has recently been demonstrated that bacteria in biofilms show changes in membrane potential in response to changes in their environment, and that these can propagate signals through the biofilm to coordinate bacterial behavior. Akabuogu et al. contribute to this exciting research area with a study of blue light-induced membrane potential dynamics in E. coli biofilms. They demonstrate that Thioflavin-T (ThT) intensity (a proxy for membrane potential) displays multiphasic dynamics in response to blue light treatment. They additionally use genetic manipulations to implicate the potassium channel Kch in the latter part of these dynamics. Mechanosensitive ion channels may also be involved, although these channels seem to have blue light-independent effects on membrane potential as well. In addition, there are challenges to the quantitative interpretation of ThT microscopy data which require consideration. The authors then explore whether these dynamics are involved in signaling at the community level. The authors suggest that cell firing is both more coordinated when cells are clustered and happens in waves in larger, 3D biofilms; however, in both cases evidence for these claims is incomplete. The authors present two simulations to describe the ThT data. The first of these simulations, a Hodgkin-Huxley model, indicates that the data are consistent with the activity of two ion channels with different kinetics; the Kch channel mutant, which ablates a specific portion of the response curve, is consistent with this. The second model is a fire-diffuse-fire model to describe wavefront propagation of membrane potential changes in a 3D biofilm; because the wavefront data are not presented clearly, the results of this model are difficult to interpret. Finally, the authors discuss whether these membrane potential changes could be involved in generating a protective response to blue light exposure; increased death in a Kch ion channel mutant upon blue light exposure suggests that this may be the case, but a no-light control is needed to clarify this.

      In a few instances, the paper is missing key control experiments that are important to the interpretation of the data. This makes it difficult to judge the meaning of some of the presented experiments.

      1. An additional control for the effects of autofluorescence is very important. The authors conduct an experiment where they treat cells with CCCP and see that Thioflavin-T (ThT) dynamics do not change over the course of the experiment. They suggest that this demonstrates that autofluorescence does not impact their measurements. However, cellular autofluorescence depends on the physiological state of the cell, which is impacted by CCCP treatment. A much simpler and more direct experiment would be to repeat the measurement in the absence of ThT or any other stain. This experiment should be performed both in the wild-type strain and in the ∆kch mutant.

      ThT is a very bright fluorophore (much brighter than a GFP). It is clear from the images of non-stained samples that autofluorescence provides a negligible contribution to the fluorescence intensity in an image.

      2. The effects of photobleaching should be considered. Of course, the intensity varies a lot over the course of the experiment in a way that photobleaching alone cannot explain. However, photobleaching can still contribute to the kinetics observed. Photobleaching can be assessed by changing the intensity, duration, or frequency of exposure to excitation light during the experiment. Considerations about photobleaching become particularly important when considering the effect of catalase on ThT intensity. The authors find that the decrease in ThT signal after the initial "spike" is attenuated by the addition of catalase; this is what would be predicted by catalase protecting ThT from photobleaching (indeed, catalase can be used to reduce photobleaching in time lapse imaging).

      Photobleaching was negligible over the course of the experiments. We employed techniques such as reducing sample exposure time and using the appropriate light intensity to minimize photobleaching.

      3. It would be helpful to have a baseline of membrane potential fluctuations in the absence of the proposed stimulus (in this case, blue light). Including traces of membrane potential recorded without light present would help support the claim that these changes in membrane potential represent a blue light-specific stress response, as the authors suggest. Of course, ThT is blue, so if the excitation light for ThT is problematic for this experiment the alternative dye tetramethylrhodamine methyl ester perchlorate (TMRM) can be used instead.

      Unfortunately the fluorescent baseline is too weak to measure cleanly in this experiment. The collective response of all the bacteria hyperpolarizing at the same time appears to dominate the signal (measurements in the eLife article and new potentiometry measurements).

      4. The effects of ThT in combination with blue light should be more carefully considered. In mitochondria, a combination of high concentrations of blue light and ThT leads to disruption of the PMF (Skates et al. 2021 BioRXiv), and similarly, ThT treatment enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). If present in this experiment, this effect could confound the interpretation of the PMF dynamics reported in the paper.

      We think the PMF plays a minority role in determining the membrane potential in E. coli. For reasons outlined before (H+ is a minority ion in E. coli compared with K+).

      5. Figures 4D - E indicate that a ∆kch mutant has increased propidium iodide (PI) staining in the presence of blue light; this is interpreted to mean that Kch-mediated membrane potential dynamics help protect cells from blue light. However, Live/Dead staining results in these strains in the absence of blue light are not reported. This means that the possibility that the ∆kch mutant has a general decrease in survival (independent of any effects of blue light) cannot be ruled out.

      Both bacterial strains have similar growth curves and both engaged in membrane potential dynamics for the duration of the experiment. We were interested in bacterial cells that displayed membrane potential dynamics in the presence of the stress. Bacterial cells need to be alive to engage in membrane potential dynamics (hyperpolarize) under stress conditions. Cells that engaged in membrane potential dynamics and later stained red were only counted after the entire duration. We believe that the wildtype handles the light stress better than the ∆kch mutant as measured with the PI.

      6. Additionally in Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3AB), this complicates the interpretation of this experiment.

      Kirchhoff & Cypionka 2017 J Microbial Methods, using fluorescence microscopy, suggested that changes in membrane potential dynamics can introduce experimental bias when propidium iodide is used to confirm the viability of the bacterial strains, B subtilis (DSM-10) and Dinoroseobacter shibae, that are starved of oxygen (via N2 gassing) for 2 hours. They attempted to support their findings by using CCCP in stopping the membrane potential dynamics (but never showed any pictorial or plotted data for this confirmatory experiment). In our experiment methodology, cell death was not forced on the cells by introducing an extra burden or via anoxia. We believe that the accumulation of PI in ∆kch mutant is not due to high membrane potential dynamics but is attributed to the PI, unbiasedly showing damaged/dead cells. We think that propidium iodide is good for this experiment. Propidium iodide is a dye that is extensively used in life sciences. PI has also been used in the study of bacterial electrophysiology (https://pubmed.ncbi.nlm.nih.gov/32343961/) and no membrane potential related bias was reported.

      Throughout the paper, many ThT intensity traces are compared, and described as "similar" or "dissimilar", without detailed discussion or a clear standard for comparison. For example, the two membrane potential curves in Fig. S1C are described as "similar" although they have very different shapes, whereas the curves in Fig. 1B and 1D are discussed in terms of their differences although they are evidently much more similar to one another. Without metrics or statistics to compare these curves, it is hard to interpret these claims. These comparative interpretations are additionally challenging because many of the figures in which average trace data are presented do not indicate standard deviation.

      Comparison of small changes in the absolute intensities is problematic in such fluorescence experiments. We mean the shape of the traces is similar and they can be modelled using a HH model with similar parameters.

      The differences between the TMRM and ThT curves that the authors show in Fig. S1C warrant further consideration. Some of the key features of the response in the ThT curve (on which much of the modeling work in the paper relies) are not very apparent in the TMRM data. It is not obvious to me which of these traces will be more representative of the actual underlying membrane potential dynamics.

      In our experiment, TMRM was used to confirm the dynamics observed using ThT. However, ThT appears to be more photostable than TMRM (especially towards the 2nd peak). The most interesting observation is that with both dyes, all phases of the membrane potential dynamics were conspicuous (the first peak, the quiescent period and the second peak). The time periods for these three episodes were also similar.

      A key claim in this paper (that dynamics of firing differ depending on whether cells are alone or in a colony) is underpinned by "time-to-first peak" analysis, but there are some challenges in interpreting these results. The authors report an average time-to-first peak of 7.34 min for the data in Figure 1B, but the average curve in Figure 1B peaks earlier than this. In Figure 1E, it appears that there are a handful of outliers in the "sparse cell" condition that likely explain this discrepancy. Either an outlier analysis should be done and the mean recomputed accordingly, or a more outlier-robust method like the median should be used instead. Then, a statistical comparison of these results will indicate whether there is a significant difference between them.
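
      The outlier-robust summary suggested above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `summarize` and the 1.5 x IQR fence are illustrative choices (the usual Tukey rule), and the numbers in the usage note are made up for demonstration, not taken from the manuscript.

```python
import statistics


def summarize(times):
    """Return the mean, the median and the Tukey-fence outliers of a sample."""
    q1, _, q3 = statistics.quantiles(times, n=4)   # quartiles
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr     # Tukey fences
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "outliers": [t for t in times if t < low or t > high],
    }


# Hypothetical time-to-first-peak values in minutes (illustrative only).
example = [4.0, 4.5, 5.0, 5.0, 5.5, 6.0, 6.5, 7.0, 30.0]
summary = summarize(example)
```

      In this made-up sample a single late-peaking cell pulls the mean well above the median, which is the kind of discrepancy described for Figures 1B and 1E. A rank-based test could then be used to compare the two conditions.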

      The key point is the comparison of standard errors on the standard deviation.

      In two different 3D biofilm experiments, the authors report the propagation of wavefronts of membrane potential; I am unable to discern these wavefronts in the imaging data, and they are not clearly demonstrated by analysis.

      The first data set is presented in Figures 2A, 2B, and Video S3. The images and video are very difficult to interpret because of how the images have been scaled: the center of the biofilm is highly saturated, and the zero value has also been set too high to consistently observe the single cells surrounding the biofilm. With the images scaled this way, it is very difficult to assess dynamics. The time stamps in Video S3 and on the panels in Figure 2A also do not correspond to one another although the same biofilm is shown (and the time course in 2B is also different from what is indicated in 2B). In either case, it appears that the center of the biofilm is consistently brighter than the edges, and the intensity of all cells in the biofilm increases in tandem; by eye, propagating wavefronts (either directed toward the edge or the center) are not evident to me. Increased brightness at the center of the biofilm could be explained by increased cell thickness there (as is typical in this type of biofilm). From the image legend, it is not clear whether the image presented is a single confocal slice or a projection. Even if this is a single confocal slice, in both Video S3 and Figure 2A there are regions of "haze" from out-of-focus light evident, suggesting that light from other focal planes is nonetheless present. This seems to me to be a simpler explanation for the fluorescence dynamics observed in this experiment: cells are all following the same trajectory that corresponds to that seen for single cells, and the center is brighter because of increased biofilm thickness.

      We appreciate the reviewer for this important observation. We have made changes to the figures to address this confusion. The cell cover has no influence on the observed membrane potential dynamics. The entire biofilm was exposed to the same blue light at each time. Therefore all parts of the biofilm received equal amounts of the blue light intensity. The membrane potential dynamics were not influenced by cell density (see Fig 2C).

      The second data set is presented in Video S6B; I am similarly unable to see any wave propagation in this video. I observe only a consistent decrease in fluorescence intensity throughout the experiment that is spatially uniform (except for the bright, dynamic cells near the top; these presumably represent cells that are floating in the microfluidic and have newly arrived to the imaging region).

      A visual inspection of Video S6B shows a fast rise, a decrease in fluorescence and a second rise (supplementary figure 4B). The data for the fluorescence was carefully obtained using the Imaris software. We created a curved geometry on each slice of the confocal stack. We analyzed the surfaces of this curved plane along the z-axis. This was carried out in Imaris.

      3D imaging data can be difficult to interpret by eye, so it would perhaps be more helpful to demonstrate these propagating wavefronts by analysis; however, such analysis is not presented in a clear way. The legend in Figure 2B mentions a "wavefront trace", but there is no position information included - this trace instead seems to represent the average intensity trace of all cells. To demonstrate the propagation of a wavefront, this analysis should be shown for different subpopulations of cells at different positions from the center of the biofilm. Data is shown in Figure 8 that reflects the velocity of the wavefront as a function of biofilm position; however, because the wavefronts themselves are not evident in the data, it is difficult to interpret this analysis. The methods section additionally does not contain sufficient information about what these velocities represent and how they are calculated. Because of this, it is difficult for me to evaluate the section of the paper pertaining to wave propagation and the predicted biofilm critical size.
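
      The position-resolved analysis requested above could take roughly the following form. This is a sketch only: it assumes that each cell already has a measured distance from the biofilm center and an intensity trace sampled at common time points, and the function name `peak_time_by_radius` and its arguments are hypothetical, not part of the authors' pipeline.

```python
def peak_time_by_radius(radii, traces, times, bin_edges):
    """Average the traces of cells in each radial bin and report when each average peaks.

    radii     -- distance of each cell from the biofilm center
    traces    -- one intensity trace per cell, all sampled at `times`
    bin_edges -- increasing radial bin boundaries; bin i covers [edge_i, edge_i+1)
    Returns one peak time per bin, or None for a bin with no cells.
    """
    peak_times = []
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        members = [tr for r, tr in zip(radii, traces) if low <= r < high]
        if not members:
            peak_times.append(None)
            continue
        # Mean intensity across the cells of this bin at every time point.
        mean_trace = [sum(vals) / len(vals) for vals in zip(*members)]
        peak_index = max(range(len(mean_trace)), key=mean_trace.__getitem__)
        peak_times.append(times[peak_index])
    return peak_times
```

      A propagating wavefront would appear as a peak time that shifts systematically with radius, whereas a spatially uniform response would give the same peak time in every bin.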

      The analysis is considered in more detail in a more expansive modelling article, currently under peer review in a physics journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      There are some instances in the paper where claims are made that do not have data shown or are not evident in the cited data:

      1. In the first results section, "When CCCP was added, we observed a fast efflux of ions in all cells"- the data figure pertaining to this experiment is in Fig. S1E, which does not show any ion efflux. The methods section does not mention how ion efflux was measured during CCCP treatment.

      We have worded this differently to properly convey our results.

      2. In the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, it's hard to distinguish these fluctuations from measurement noise; a no-light control could help clarify this.

      The calcium transients observed were not due to noise or artefacts.

      3. The authors state that the membrane potential dynamics simulated in Figure 7B are similar to those observed in 3D biofilms in Fig. S4B; however, the second peak is not clearly evident in Fig. S4B and it looks very different for the mature biofilm data reported in Fig. 2. I have some additional confusion about this data specifically: in the intensity trace shown in Fig. S4B, the intensity in the second frame is much higher than the first; this is not evident in Video S6B, in which the highest intensity is in the first frame at time 0. Similarly, the graph indicates that the intensity at 60 minutes is higher than the intensity at 4 minutes, but this is not the case in Fig. S4A or Video S6B.

      The confusion stated here has now been addressed. Also it should be noted that while Fig 2.1 was obtained with LED light source, Fig S4A was obtained using a laser light source. While obtaining the confocal images (for Fig S4A), the light intensity was controlled to further minimize photobleaching. Most importantly, there is evidence of a slow rise to the 2nd peak in Fig S4B. The first peak, quiescence and slow rise to second peak are evident.

    6. Author Response

      We would like to sincerely thank the referees and the editor for their time in considering our manuscript. The electrophysiology of bacteria is a fast-moving complex field and is proving contentious in places. We believe the peer review process of eLife provides an ideal mechanism to address the issues raised on our manuscript in an open and transparent manner. Hopefully we will encourage some more consensus in the field and help understand some of the inconsistencies in the current literature that are hampering progress.

      The editors stress the main issue raised was a single referee questioning the use of ThT as an indicator of membrane potential. We are well aware of the articles by the Pilizota group and we believe them to be scientifically flawed. The authors assume there are no voltage-gated ion channels in E. coli and then attempt to explain motility data based on a simple Nernstian battery model (they assume E. coli are unexcitable matter). This in turn leads them to conclude the membrane dye ThT is faulty, when in fact it is a problem with their simple battery model.

      In terms of the previous microbiology literature, the assumption of no voltage-gated ion channels in E. coli suggested by referee 2 is a highly contentious niche ideology. The majority of gene databases for E. coli have a number of ion-channels annotated as voltage sensitive due to comparative genetics studies e.g. try the https://bacteria.ensembl.org/ database (the search terms ‘voltage-gated coli’ give 2521 hits for genes, similarly you could check www.uniprot.org or www.biocyc.org) and M.M.Kuo, Y.Saimi, C.Kung, ‘Gain of function mutation indicate that E. coli Kch form a functional K+ conduit in vivo’, EMBO Journal, 2003, 22, 16, 4049. Furthermore, recent microbiology reviews all agree that E. coli has a number of voltage-gated ion channels S.D.Beagle, S.W.Lockless, ‘Unappreciated roles for K+ channels in bacterial physiology’, Trends in microbiology, 2021, 29, 10, 942-950. More emphatic experimental data is seen in spiking potentials that have been observed by many groups for E. coli, both directly using microelectrodes and indirectly using genetically expressed fluorophores, ‘Electrical spiking in bacterial biofilms’ E.Masi et al, Journal of the Royal Society Interface, 2015, 12, 102, ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, J.M.Kralj, et al, Science, 2011, 333, 6040, 345 and ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 2023, 120, 3, e2208348120. The only mechanism currently known to cause spiking potentials in cells is due to positive feedback from voltage-gated ion channels (you need a mechanism to induce the oscillations). Indeed, people are starting to investigate the specific voltage-gated ion channels in E. coli and a role is emerging for calcium in addition to potassium e.g. ‘Genome-wide functional screen for calcium transients in E. coli identifies increased membrane potential adaptation to persistent DNA damage’, R.Luder, et al, J.Bacteriology, 2021, 203, 3, e00509.

      In terms of recent data from our own group, electrical impedance spectroscopy (EIS) experiments from E. coli indicate there are large conductivity changes associated with the Kch ion channels (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print). EIS experiments probe the electrical phenomena of bacterial biofilms directly and do not depend on fluorophores i.e. they can’t be affected by ThT.

      Attempts to disprove the use of ThT to measure hyperpolarisation phenomena in E. coli using fluorescence microscopy also seem doomed to failure based on comparative control experiments. A wide range of other cationic fluorophores show similar behaviour to ThT e.g. the potassium sensitive dye used in our eLife article. Thus the behaviour of ThT appears to be generic for a range of cationic dyes and it implies a simple physical mechanism i.e. the positively charged dyes enter cells at low potentials. The elaborate photobleaching mechanism postulated by referee 2 seems most unlikely and is unable to explain our data (see below). ThT is photostable and chemically well-defined and it is therefore used almost universally in fluorescence assays for amyloids.

      A challenge with trying to use flagellar motility to measure intracellular potentials in live bacteria, as per referee 2’s many publications, is that a clutch is known to occur with E. coli e.g. ‘Flagellar brake protein YcgR interacts with motor proteins MotA and FliG to regulate the flagellar rotation speed and direction’, Q.Han et al, Frontiers in Microbiology, 2023, 14. Thus bacteria with high membrane potentials can have low motility when their clutch is engaged. This makes sense, since otherwise bacterial motility would be enslaved to their membrane potentials, greatly restricting their ability to react to their environmental conditions. Without quantifying the dynamics of the clutch (e.g. the gene circuit) it seems challenging to deduce how the motor reacts to Nernstian potentials in vivo. As a result we are not convinced by any of the Pilizota group articles. The quantitative connection between motility and membrane potential is too tenuous.

      In conclusion, the articles questioning the use of ThT are scientifically flawed and based on a niche ideology that E. coli do not contain voltage-gated ion channels. The current work disproves the simple Nernstian battery (SNB) model expounded by Pilizota et al, unpersuasively represented in multiple publications by this one group in the literature (see below for critical synopses) and demonstrates the SNB model needs to be replaced by a model that includes excitability (demonstrating hyperpolarization of the membrane potential).

      In the language of physics, a non-linear oscillator model is needed to explain spiking potentials in bacteria and the simple battery models presented by Pilizota et al do not have the required non-linearities to oscillate (‘Nonlinear dynamics and chaos’, Steve Strogatz, Westview Press, 2014). Such non-linear models are the foundation for describing eukaryotic electrophysiology, e.g. Hodgkin and Huxley’s Nobel prize winning research (1963), but also the vast majority of modern extensions (‘Mathematical physiology’, J.Keener, J.Sneyd, Springer, 2009, ‘Cellular biophysics and modelling: a primer on the computational biology of excitable cells’, G.C.Smith, 2019, CUP, ‘Dynamical systems in neuroscience: the geometry of excitability and bursting’, E.M.Izhikevich, 2006, MIT and ‘Neuronal dynamics: from single neurons to networks and models of cognition’, W.Gerstner et al, 2014, CUP). The Pilizota group is using modelling tools from the 1930s that quickly were shown to be inadequate to describe eukaryotic cellular electrophysiology and the same is true for bacterial electrophysiology (see the ground breaking work of A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 7576, 59 for the use of Hodgkin-Huxley models with bacterial biofilms). Below we describe a critical synopsis of the articles cited by referee 2 and we then directly answer the specific points all the referees raise.

      Critical synopsis of the articles cited by referee 2:

      (1) ‘Generalized workflow for characterization of Nernstian dyes and their effects on bacterial physiology’, L.Mancini et al, Biophysical Journal, 2020, 118, 1, 4-14.

      This is the central article used by referee 2 to argue that there are issues with the calibration of ThT for the measurement of membrane potentials. The authors use a simple Nernstian battery (SNB) model and unfortunately it is wrong when voltage-gated ion channels occur. Huge oscillations occur in the membrane potentials of E. coli that cannot be described by the SNB model. Instead, a Hodgkin-Huxley model is needed, as shown in our eLife manuscript and multiple other studies (see above). Arrhenius kinetics are assumed in the SNB model for pumping with no real evidence and the generalized workflow involves ripping the flagella off the bacteria! The authors construct an elaborate ‘work flow’ to ensure their ThT results can be interpreted using their erroneous SNB model over a limited range of parameters.

      (2) ‘Non-equivalence of membrane voltage and ion-gradient as driving forces for the bacterial flagellar motor at low load’, C.J.Lo, et al, Biophysical Journal, 2007, 93, 1, 294.

      An odd de novo chimeric species is developed using an E. coli chassis which uses Na+ instead of H+ for the motility of its flagellar motor. The relevance to wild-type E. coli is not clear, due to the massive physiological perturbations involved. An SNB model is used to fit the data over a very limited parameter range with all the concomitant errors.

      (3) ‘Single-cell bacterial electrophysiology reveals mechanisms of stress-induced damage’, E.Krasnopeeva, et al, Biophysical Journal, 2019, 116, 2390.

      The abstract says ‘PMF defines the physiological state of the cell’. This statement is hyperbolic. An extremely wide range of molecules contribute to the physiological state of a cell. PMF does not even define the electrophysiology of the cell e.g. via the membrane potential. There are 0.2 M of K+ compared with 0.0000001 M of H+ in E. coli, so K+ is arguably a million times more important for the membrane potential than H+ and thus the electrophysiology!
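
      The concentration comparison above can be checked with a short calculation. This is only a sketch: the two concentrations are the values quoted in the text, the function name `nernst_mV` and the 25 °C temperature are assumptions introduced here, and the ratio compares concentrations only (the weight of each ion in setting the membrane potential also depends on its permeability, which is not modelled here).

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_mV(c_out, c_in, z=1, temperature_K=298.15):
    """Nernst equilibrium potential (inside relative to outside) in millivolts."""
    return 1000.0 * R * temperature_K / (z * F) * math.log(c_out / c_in)


k_conc = 0.2    # K+ concentration in M, as quoted in the text
h_conc = 1e-7   # H+ concentration in M, as quoted in the text

ratio = k_conc / h_conc
print(f"K+/H+ concentration ratio: {ratio:.1e}")
print(f"Nernst slope per tenfold gradient: {nernst_mV(10.0, 1.0):.2f} mV")
```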

      Equation (1) in the manuscript assumes no other ions are exchanged during the experiments other than H+. This is a very bad approximation when voltage-gated potassium ion channels move the majority ion (K+) around!

      In our model, Figure 4A is better explained by depolarisation due to K+ channels closing than by direct irreversible photodamage. Why does the ThT fluorescence increase again for the second hyperpolarization event if the ThT is supposed to be damaged? It does not make sense.

      (4) ‘The proton motive force determines E. coli robustness to extracellular pH’, G.Terradot et al, 2024, preprint.

      This article expounds the SNB model once more. It still ignores the voltage-gated ion channels. Furthermore, it ignores the effect of the dominant ion in E. coli, K+. The manuscript is incorrect as a result and I would not recommend publication. In general, an important problem is being researched i.e. how the membrane potential of E. coli is related to motility, but there are serious flaws in the SNB approach and the experimental methodology appears tenuous.

      Answers to specific questions raised by the referees

      Reviewer #1 (Public Review):

      Summary: Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolisms or to increase antibiotic tolerance. Here, the authors have reported for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      Strengths:

      • The authors report original data.

      • For the first time, they showed that coordinated oscillations in membrane potential occur in E. coli biofilms.

      • The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.

      • The authors developed two rigorous models inspired by 1) Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) Fire-Diffuse-Fire model for the propagation of the electric signal.

      • Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors proposed a functional role for the putative K+ Kch channel: enhancing survival under photo-toxic conditions.

      We thank the referee for their positive evaluations and agree with these statements.

      Weaknesses:

      • Since the flow of fresh medium is stopped at the beginning of the acquisition, environmental parameters such as pH and RedOx potential are likely to vary significantly during the experiment. It is therefore important to exclude the contributions of these variations to ensure that the electrical response is only induced by light stimulation. Unfortunately, no control experiments were carried out to address this issue.

      The electrical responses occur almost instantaneously when the stimulation with blue light begins, i.e. they are too fast to be caused by a build-up of pH changes. We are not sure what the referee means by Redox potential since it is an attribute of all chemicals that are able to donate/receive electrons. The electrical response to stress appears to be caused by ROS, since when ROS scavengers are added the electrical response is removed, i.e. pH plays a very small minority role, if any.

      • Furthermore, the control parameter of the experiment (light stimulation) is the same as that used to measure the electrical response, i.e. through fluorescence excitation. The use of the PROPS system could solve this problem.

      We were enthusiastic at the start of the project to use the PROPs system in E. coli as presented by J.M.Kralj et al, ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, Science, 2011, 333, 6040, 345. However, the people we contacted in the microbiology community said that it had some technical issues and there have been no subsequent studies using PROPs in bacteria after the initial promising study. The fluorescent protein system recently presented in PNAS seems more promising, ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 120, 3, e2208348120.

      • Electrical signal propagation is an important aspect of the manuscript. However, a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. In addition, it is unclear if the electrical signal propagates within the biofilm during the second peak regime, which is mediated by the Kch channel. This is an important question, given that the fire-diffuse-fire model is presented with emphasis on the role of K+ ions.

      We have presented a more detailed account of the electrical wavefront modelling work and it is currently under review in a physical journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      • Since deletion of the kch gene inhibits the long-term electrical response to light stimulation (regime II), the authors concluded that K+ ions play a role in the habituation response. However, Kch is a putative K+ ion channel. The use of specific drugs could help to clarify the role of K+ ions.

      Our recent electrical impedance spectroscopy publication provides further evidence that Kch is associated with large changes in conductivity as expected for a voltage-gated ion channel (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      • The manuscript as such does not allow us to properly conclude on the photo-protective role of the Kch ion channel.

      That Kch has a photoprotective role is our current working hypothesis. The hypothesis fits with the data, but we are not saying we have proven it beyond all possible doubt.

      • The link between membrane potential dynamics and mechanosensitivity is not captured in the equation for the Q-channel opening dynamics in the Hodgkin-Huxley model (Supp Eq 2).

      Our model is agnostic with respect to the mechanosensitivity of the ion channels, although we deduce that mechanosensitive ion channels contribute to ion channel Q.

      • Given the large number of parameters used in the models, it is hard to distinguish between prediction and fitting.

      This is always an issue with electrophysiological modelling (compared with most heart and brain modelling studies we are very conservative in the choice of parameters for the bacteria). In terms of predicting the different phenomena observed, we believe the model is very successful.

      Reviewer #2 (Public Review):

      Summary of what the authors were trying to achieve:

      The authors thought they studied membrane potential dynamics in E.coli biofilms. They thought so because they were unaware that the dye they used to report that membrane potential in E.coli, has been previously shown not to report it. Because of this, the interpretation of the authors' results is not accurate.

      We believe the Pilizota work is scientifically flawed.

      Major strengths and weaknesses of the methods and results:

      The strength of this work is that all the data is presented clearly, and accurately, as far as I can tell.

      The major critical weakness of this paper is the use of ThT dye as a membrane potential dye in E.coli. The work is unaware of a publication from 2020 https://www.sciencedirect.com/science/article/pii/S0006349519308793 that demonstrates that ThT is not a membrane potential dye in E. coli. Therefore I think the results of this paper are misinterpreted. The same publication I reference above presents a protocol on how to carefully calibrate any candidate membrane potential dye in any given condition.

      We are aware of this study, but believe it to be scientifically flawed. We do not cite the article because we do not think it is a particularly useful contribution to the literature.

      I now go over each results section in the manuscript.

      Result section 1: Blue light triggers electrical spiking in single E. coli cells

      I do not think the title of the result section is correct for the following reasons. The above-referenced work demonstrates the loading profile one should expect from a Nernstian dye (Figure 1). It also demonstrates that ThT does not show that profile and explains why this is so. ThT only permeates the membrane under light exposure (Figure 5). This finding is consistent with blue light peroxidising the membrane (see also following work Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923 on light-induced damage to the electrochemical gradient of protons; I am sure there are more references for this).

      The Pilizota group invokes some elaborate artefacts to explain the lack of agreement with a simple Nernstian battery model. The model is incorrect not the fluorophore.

      Please note that the loading profile (only observed under light) in the current manuscript in Figure 1B as well as in the video S1 is identical to that in Figure 3 from the above-referenced paper (i.e. https://www.sciencedirect.com/science/article/pii/S0006349519308793), and corresponding videos S3 and S4. This kind of profile is exactly what one would expect theoretically if the light is simultaneously lowering the membrane potential as the ThT is equilibrating, see Figure S12 of that previous work. There, it is also demonstrated by the means of monitoring the speed of bacterial flagellar motor that the electrochemical gradient of protons is being lowered by the light. The authors state that applying the blue light for different time periods and over different time scales did not change the peak profile. This is expected if the light is lowering the electrochemical gradient of protons. But, in Figure S1, it is clear that it affected the timing of the peak, which is again expected, because the light affects the timing of the decay, and thus of the decay profile of the electrochemical gradient of protons (Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923).

      We think the proton effect is a million times weaker than that due to potassium, i.e. 0.2 M K+ versus 10^-7 M H+. We can comfortably neglect the influx of H+ in our experiments.

      I find Figure S1D interesting. There the authors load TMRM, which is a membrane voltage dye that has been used extensively (as far as I am aware this is the first reference for that and it has not been cited https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1914430/). As visible from the last TMRM reference I give, TMRM will only load the cells in Potassium Phosphate buffer with NaCl (and often we used EDTA to permeabilise the membrane). It is not fully clear (to me) whether here TMRM was prepared in rich media (it explicitly says so for ThT in Methods but not for TMRM), but it seems so. If this is the case, it likely also loads because of the damage to the membrane done with light, and therefore I am not surprised that the profiles are similar.

      The vast majority of cells continue to be viable. We do not think membrane damage is dominating.

      The authors then use CCCP. First, a small correction, as the authors state that it quenches membrane potential. CCCP is a protonophore (https://pubmed.ncbi.nlm.nih.gov/4962086/), so it collapses electrochemical gradient of protons. This means that it is possible, and this will depend on the type of pumps present in the cell, that CCCP collapses electrochemical gradient of protons, but the membrane potential is equal and opposite in sign to the DeltapH. So using CCCP does not automatically mean membrane potential will collapse (e.g. in some mammalian cells it does not need to be the case, but in E.coli it is https://www.biorxiv.org/content/10.1101/2021.11.19.469321v2). CCCP has also been recently found to be a substrate for TolC (https://journals.asm.org/doi/10.1128/mbio.00676-21), but at the concentrations the authors are using CCCP (100uM) that should not affect the results. However, the authors then state because they observed, in Figure S1E, a fast efflux of ions in all cells and no spiking dynamics this confirms that observed dynamics are membrane potential related. I do not agree that it does. First, Figure S1E, does not appear to show transients, instead, it is visible that after 50min treatment with 100uM CCCP, ThT dye shows no dynamics. The action of a Nernstian dye is defined. It is not sufficient that a charged molecule is affected in some way by electrical potential, this needs to be in a very specific way to be a Nernstian dye. Part of the profile of ThT loading observed in https://www.sciencedirect.com/science/article/pii/S0006349519308793 is membrane potential related, but not in a way that is characteristic of Nernstian dye.

      Our understanding of the literature is CCCP poisons the whole metabolism of the bacterial cells. The ATP driven K+ channels will stop functioning and this is the dominant contributor to membrane potential.

      Result section 2: Membrane potential dynamics depend on the intercellular distance

      In this chapter, the authors report that the time to reach the first intensity peak during ThT loading is different when cells are in microclusters. They interpret this as electrical signalling in clusters because the peak is reached faster in microclusters (as opposed to slower because intuitively in these clusters cells could be shielded from light). However, shielding is one possibility. The other is that the membrane has changed in composition and/or the effective light power the cells can tolerate (with mechanisms to handle light-induced damage, some of which authors mention later in the paper) is lower. Given that these cells were left in a microfluidic chamber for 2 hours to attach in growth media according to Methods, there is sufficient time for that to happen. In Figure S12 C and D of that same paper from my group (https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf) one can see the effects of peak intensity and timing of the peak on the permeability of the membrane. Therefore I do not think the distance is the explanation for what authors observe.

      Shielding would provide the reverse effect, since hyperpolarization begins in the dense centres of the biofilms. For the initial 2 hours the cells receive negligible blue light. Neither of the referee’s comments thus seem tenable.

      Result section 3: Emergence of synchronized global wavefronts in E. coli biofilms

      In this section, the authors exposed a mature biofilm to blue light. They observe that the intensity peak is reached faster in the cells in the middle. They interpret this as the ion-channel-mediated wavefronts moved from the center of the biofilm. As above, cells in the middle can have different membrane permeability to those at the periphery, and probably even more importantly, there is no light profile shown anywhere in SI/Methods. I could be wrong, but the SI3 A profile is consistent with a potential Gaussian beam profile visible in the field of view. In Methods, I find the light source for the blue light and the type of microscope but no comments on how 'flat' the illumination is across their field of view. This is critical to assess what they are observing in this result section. I do find it interesting that the ThT intensity collapsed from the edges of the biofilms. In the publication I mentioned https://www.sciencedirect.com/science/article/pii/S0006349519308793#app2, the collapse of fluorescence was not understood (other than it is not membrane potential related). It was observed in Figure 5A, C, and F, that at the point of peak, electrochemical gradient of protons is already collapsed, and that at the point of peak cell expands and cytoplasmic content leaks out. This means that this part of the ThT curve is not membrane potential related. The authors see that after the first peak collapsed there is a period of time where ThT does not stain the cells and then it starts again. If after the first peak the cellular content leaks, as we have observed, then staining that occurs much later could be simply staining of cytoplasmic positively charged content, and the timing of that depends on the dynamics of cytoplasmic content leakage (we observed this to be happening over 2h in individual cells). ThT is also a non-specific amyloid dye, and in starving E. coli cells formation of protein clusters has been observed (https://pubmed.ncbi.nlm.nih.gov/30472191/), so such cytoplasmic staining seems possible.

      It is very easy to see if the illumination is flat (Köhler illumination) by comparing the intensity of background pixels on the detector. It was flat in our case. Protons have little to do with our work for reasons highlighted before. Differential membrane permeability is a speculative phenomenon not well supported by any evidence and with no clear molecular mechanism.
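
      A background-pixel comparison of the kind described above could be scripted along the following lines. This is a hypothetical sketch rather than the analysis used for the manuscript: the function `background_flatness`, the tiling scheme and the synthetic images are all assumptions made for illustration.

```python
from statistics import mean


def background_flatness(image, is_background, tiles=2):
    """Relative spread, (max - min) / mean, of tile-averaged background intensity.

    image          2D list of pixel intensities (rows of equal length)
    is_background  function (row, col) -> bool selecting cell-free pixels
    tiles          number of tiles along each axis
    """
    rows, cols = len(image), len(image[0])
    tile_means = []
    for ti in range(tiles):
        for tj in range(tiles):
            r0, r1 = ti * rows // tiles, (ti + 1) * rows // tiles
            c0, c1 = tj * cols // tiles, (tj + 1) * cols // tiles
            values = [image[r][c]
                      for r in range(r0, r1)
                      for c in range(c0, c1)
                      if is_background(r, c)]
            if values:  # skip tiles that contain no background pixels
                tile_means.append(mean(values))
    return (max(tile_means) - min(tile_means)) / mean(tile_means)


# Synthetic examples: a uniform field and a field with a left-to-right ramp.
flat_field = [[100.0] * 8 for _ in range(8)]
ramped_field = [[100.0 + 10.0 * c for c in range(8)] for _ in range(8)]
everything = lambda r, c: True  # treat every pixel as background

print("flat field spread:", background_flatness(flat_field, everything))
print("ramped field spread:", background_flatness(ramped_field, everything))
```

      A spread close to zero indicates uniform illumination across the field of view; what counts as acceptably small would need to be set against the detector noise.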

      Finally, I note that authors observe biofilms of different shapes and sizes and state that they observe similar intensity profiles, which could mean that my comment on 'flatness' of the field of view above is not a concern. However, the scale bar in Figure 2A is not legible, so I can't compare it to the variation of sizes of the biofilms in Figure 2C (67 to 280um). Based on this, I think that the illumination profile is still a concern.

      The referee now contradicts themselves and wants a scale bar to be more visible. We have changed the scale bar.

      Result section 4: Voltage-gated Kch potassium channels mediate ion-channel electrical oscillations in E. coli

      First I note at this point, given that I disagree that the data presented thus 'suggest that E. coli biofilms use electrical signaling to coordinate long-range responses to light stress' as the authors state, it gets harder to comment on the rest of the results.

      In this result section the authors look at the effect of Kch, a putative voltage-gated potassium channel, on ThT profile in E. coli cells. And they see a difference. It is worth noting that in the publication https://www.sciencedirect.com/science/article/pii/S0006349519308793 it is found that ThT is also likely a substrate for TolC (Figure 4), but that scenario could not be distinguished from the one where TolC mutant has a different membrane permeability (and there is a publication that suggests the latter is happening https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2958.2010.07245.x). Given this, it is also possible that Kch deletion affects the membrane permeability. I do note that in video S4 I seem to see more of, what appear to be, plasmolysed cells. The authors do not see the ThT intensity with this mutant that appears long after the initial peak has disappeared, as they see in WT. It is not clear how long they waited for this, as from Figure S3C it could simply be that the dynamics of this is a lot slower, e.g. Kch deletion changes membrane permeability.

      The work that TolC provides a possible passive pathway for ThT to leave cells seems slightly niche. It just demonstrates another mechanism for the cells to equilibrate the concentrations of ThT in a Nernstian manner i.e. driven by the membrane voltage.

      The authors themselves state that the evidence for Kch being a voltage-gated channel is indirect (line 54). I do not think there is a need to claim function from a ThT profile of E. coli mutants (nor do I believe it's good practice), given how accurate single-channel recordings are currently. To know the exact dependency on the membrane potential, ion channel recordings on this protein are needed first.

      We have good evidence from electrical impedance spectroscopy experiments that Kch increases the conductivity of biofilms (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      Result section 5: Blue light influences ion-channel mediated membrane potential events in E. coli

      In this chapter the authors vary the light intensity and stain the cells with PI (this dye gets into the cells when the membrane becomes very permeable), and the extracellular environment with K+ dye (I have not yet worked carefully with this dye). They find that different amounts of light influence ThT dynamics. This is in line with previous literature (both papers I have been mentioning: Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923 and https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf especially SI12), but does not add anything new. I think the results presented here can be explained with previously published theory and do not indicate that the ion-channel mediated membrane potential dynamics is a light stress relief process.

      The simple Nernstian battery model proposed by Pilizota et al is erroneous in our opinion for reasons outlined above. We believe it will prove to be a dead end for bacterial electrophysiology studies.

      Result section 6: Development of a Hodgkin-Huxley model for the observed membrane potential dynamics

      This results section starts with the authors stating: 'our data provide evidence that E. coli manages light stress through well-controlled modulation of its membrane potential dynamics'. As stated above, I think they are instead observing the process of ThT loading while the light is damaging the membrane and thus simultaneously collapsing the electrochemical gradient of protons. As stated above, this has been modelled before. And then, they observe a ThT staining that is independent from membrane potential.

      This is an erroneous niche opinion. Protons have little say in the membrane potential since there are so few of them. The membrane potential is mostly determined by K+.

      I will briefly comment on the Hodgkin Huxley (HH) based model. First, I think there is no evidence for two channels with different activation profiles as authors propose. But also, the HH model has been developed for neurons. There, the leakage and the pumping fluxes are both described by a constant representing conductivity, times the difference between the membrane potential and Nernst potential for the given ion. The conductivity in the model is given as gK n^4 for potassium, gNa m^3 h for sodium, and gL for leakage, where gK, gNa and gL were measured experimentally for neurons. And, n, m, and h are variables that describe the experimentally observed voltage-gated mechanism of neuronal sodium and potassium channels. (Please see Hodgkin AL, Huxley AF. 1952. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J. Physiol. 116:449-72 and Hodgkin AL, Huxley AF. 1952. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117:500-44).
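
      For reference, the conductance terms described above enter the standard Hodgkin-Huxley membrane equation as written below. The symbols C_m (membrane capacitance), I_ext (applied current) and the rate functions alpha_x, beta_x are notation introduced here for the classical neuronal model; this is not the adapted equation from the manuscript under review.

```latex
C_m \frac{dV}{dt} = I_{\mathrm{ext}}
  - g_K\, n^4 \,(V - V_K)
  - g_{Na}\, m^3 h \,(V - V_{Na})
  - g_L \,(V - V_L)

\frac{dx}{dt} = \alpha_x(V)\,(1 - x) - \beta_x(V)\,x, \qquad x \in \{n, m, h\}
```

      Each ionic current is a conductance multiplied by the difference between the membrane potential and the Nernst potential of that ion, and the gating variables n, m and h carry the voltage dependence.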

      In the 70 years since Hodgkin and Huxley first presented their model, a huge number of similar models have been proposed to describe cellular electrophysiology. We are not being hyperbolic when we state that the HH models for excitable cells are like the Schrödinger equation for molecules. We carefully adapted our HH model to reflect the currently understood electrophysiology of E. coli.

      Thus, in applying the model to describe bacterial electrophysiology one should ensure near equilibrium requirement holds (so that (V-VQ) etc terms in authors' equation Figure 5 B hold), and potassium and other channels in a given bacterium have similar gating properties to those found in neurons. I am not aware of such measurements in any bacteria, and therefore think the pump leak model of the electrophysiology of bacteria needs to start with fluxes that are more general (for example Keener JP, Sneyd J. 2009. Mathematical physiology: I: Cellular physiology. New York: Springer or https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000144).

      The reference is to a slightly more modern version of a simple Nernstian battery model. The model will not oscillate and thus will not help modelling membrane potentials in bacteria. We are unsure where the equilibrium requirement comes from (inadequate modelling of the dynamics?)

      Result section 7: Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli.

      The results that Msc channels affect the profile of ThT dye are interesting. It is again possible that the membrane permeability of these mutants has changed and therefore the dynamics have changed, so this needs to be checked first. I also note that our results show that the peak of ThT coincides with cell expansion. For this to be understood a model is needed that also takes into account the link between maintenance of electrochemical gradients of ions in the cell and osmotic pressure.

      The evidence for permeability changes in the membranes seems to be tenuous.

      A side note is that the authors state that the Msc responds to stress-related voltage changes. I think this is an overstatement. Mscs respond to predominantly membrane tension and are mostly nonspecific (see how their action recovers cellular volume in this publication https://www.pnas.org/doi/full/10.1073/pnas.1522185113). Authors cite references 35-39 to support this statement. These publications still state that these channels are predominantly membrane tension-gated. Some of the references state that the presence of external ions is important for tension-related gating but sometimes they gate spontaneously in the presence of certain ions. Other publications cited don't really look at gating with respect to ions (39 is on clustering). This is why I think the statement is somewhat misleading.

      We have reworded the discussion of Mscs since the literature appears to be ambiguous. We will try to run some electrical impedance spectroscopy experiments on the Msc mutants in the future to attempt to remove the ambiguity.

      Result section 8: Anomalous ion-channel-mediated wavefronts propagate light stress signals in 3D E. coli biofilms.

      I am not commenting on this result section, as it would only be applicable if ThT was membrane potential dye in E. coli.

      Ok, but we disagree on the use of ThT.

      Aims achieved/results support their conclusions:

      The authors clearly present their data. I am convinced that they have accurately presented everything they observed. However, I think their interpretation of the data and conclusions is inaccurate in line with the discussion I provided above.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      I do not think this publication should be published in its current format. It should be revised in light of the previous literature as discussed in detail above. I believe presenting it in its current form on eLife pages would create unnecessary confusion.

      We believe many of the Pilizota group articles are scientifically flawed and are causing the confusion in the literature.

      Any other comments:

      I note that while this work studies E. coli, it references papers in other bacteria using ThT. For example, in lines 35-36 authors state that bacteria (Bacillus subtilis in this case) in biofilms have been recently found to modulate membrane potential citing the relevant literature from 2015. It is worth noting that the most recent paper https://journals.asm.org/doi/10.1128/mbio.02220-23 found that ThT binds to one or more proteins in the spore coat, suggesting that it does not act as a membrane potential dye in Bacillus spores. It is possible that it still reports membrane potential in Bacillus cells and the recent results are strictly spore-specific, but these should be kept in mind when using ThT with Bacillus.

      ThT was used successfully in previous studies of normal B. subtilis cells (by our own group and A.Prindle, ‘Spatial propagation of electrical signal in circular biofilms’, J.A.Blee et al, Physical Review E, 2019, 100, 052401, J.A.Blee et al, ‘Membrane potentials, oxidative stress and the dispersal response of bacterial biofilms to 405 nm light’, Physical Biology, 2020, 17, 2, 036001, A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 59-63). The connection to low metabolism spore research seems speculative.

      Reviewer #3 (Public Review):

      It has recently been demonstrated that bacteria in biofilms show changes in membrane potential in response to changes in their environment, and that these can propagate signals through the biofilm to coordinate bacterial behavior. Akabuogu et al. contribute to this exciting research area with a study of blue light-induced membrane potential dynamics in E. coli biofilms. They demonstrate that Thioflavin-T (ThT) intensity (a proxy for membrane potential) displays multiphasic dynamics in response to blue light treatment. They additionally use genetic manipulations to implicate the potassium channel Kch in the latter part of these dynamics. Mechanosensitive ion channels may also be involved, although these channels seem to have blue light-independent effects on membrane potential as well. In addition, there are challenges to the quantitative interpretation of ThT microscopy data which require consideration. The authors then explore whether these dynamics are involved in signaling at the community level. The authors suggest that cell firing is both more coordinated when cells are clustered and happens in waves in larger, 3D biofilms; however, in both cases evidence for these claims is incomplete. The authors present two simulations to describe the ThT data. The first of these simulations, a Hodgkin-Huxley model, indicates that the data are consistent with the activity of two ion channels with different kinetics; the Kch channel mutant, which ablates a specific portion of the response curve, is consistent with this. The second model is a fire-diffuse-fire model to describe wavefront propagation of membrane potential changes in a 3D biofilm; because the wavefront data are not presented clearly, the results of this model are difficult to interpret. 
      Finally, the authors discuss whether these membrane potential changes could be involved in generating a protective response to blue light exposure; increased death in a Kch ion channel mutant upon blue light exposure suggests that this may be the case, but a no-light control is needed to clarify this.

      In a few instances, the paper is missing key control experiments that are important to the interpretation of the data. This makes it difficult to judge the meaning of some of the presented experiments.

      (1) An additional control for the effects of autofluorescence is very important. The authors conduct an experiment where they treat cells with CCCP and see that Thioflavin-T (ThT) dynamics do not change over the course of the experiment. They suggest that this demonstrates that autofluorescence does not impact their measurements. However, cellular autofluorescence depends on the physiological state of the cell, which is impacted by CCCP treatment. A much simpler and more direct experiment would be to repeat the measurement in the absence of ThT or any other stain. This experiment should be performed both in the wild-type strain and in the ∆kch mutant.

      ThT is a very bright fluorophore (much brighter than GFP). It is clear from the images of non-stained samples that autofluorescence provides a negligible contribution to the fluorescence intensity in an image.

      (2) The effects of photobleaching should be considered. Of course, the intensity varies a lot over the course of the experiment in a way that photobleaching alone cannot explain. However, photobleaching can still contribute to the kinetics observed. Photobleaching can be assessed by changing the intensity, duration, or frequency of exposure to excitation light during the experiment. Considerations about photobleaching become particularly important when considering the effect of catalase on ThT intensity. The authors find that the decrease in ThT signal after the initial "spike" is attenuated by the addition of catalase; this is what would be predicted by catalase protecting ThT from photobleaching (indeed, catalase can be used to reduce photobleaching in time lapse imaging).

      Photobleaching was negligible over the course of the experiments. We employed techniques such as reducing sample exposure time and using the appropriate light intensity to minimize photobleaching.

      (3) It would be helpful to have a baseline of membrane potential fluctuations in the absence of the proposed stimulus (in this case, blue light). Including traces of membrane potential recorded without light present would help support the claim that these changes in membrane potential represent a blue light-specific stress response, as the authors suggest. Of course, ThT is blue, so if the excitation light for ThT is problematic for this experiment the alternative dye tetramethylrhodamine methyl ester perchlorate (TMRM) can be used instead.

      Unfortunately, the fluorescence baseline is too weak to measure cleanly in this experiment. The collective response of all the bacteria hyperpolarizing at the same time appears to dominate the signal (measurements in the eLife article and new potentiometry measurements).

      (4) The effects of ThT in combination with blue light should be more carefully considered. In mitochondria, a combination of high concentrations of blue light and ThT leads to disruption of the PMF (Skates et al. 2021 BioRXiv), and similarly, ThT treatment enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). If present in this experiment, this effect could confound the interpretation of the PMF dynamics reported in the paper.

      We think the PMF plays a minor role in determining the membrane potential in E. coli, for the reasons outlined before (H+ is a minority ion in E. coli compared with K+).

      (5) Figures 4D - E indicate that a ∆kch mutant has increased propidium iodide (PI) staining in the presence of blue light; this is interpreted to mean that Kch-mediated membrane potential dynamics help protect cells from blue light. However, Live/Dead staining results in these strains in the absence of blue light are not reported. This means that the possibility that the ∆kch mutant has a general decrease in survival (independent of any effects of blue light) cannot be ruled out.

      Both bacterial strains have similar growth curves and also engaged in membrane potential dynamics for the duration of the experiment. We were interested in bacterial cells that exhibited membrane potential dynamics in the presence of the stress. Bacterial cells need to be alive to engage in membrane potential dynamics (hyperpolarize) under stress conditions. Cells that engaged in membrane potential dynamics and later stained red were only counted after the entire duration. We believe that the wild type handles the light stress better than the ∆kch mutant, as measured with PI.

      (6) Additionally in Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3AB), this complicates the interpretation of this experiment.

      Kirchhoff & Cypionka 2017 J Microbial Methods, using fluorescence microscopy, suggested that changes in membrane potential dynamics can introduce experimental bias when propidium iodide is used to confirm the viability of the bacterial strains B. subtilis (DSM-10) and Dinoroseobacter shibae when they are starved of oxygen (via N2 gassing) for 2 hours. They attempted to support their findings by using CCCP to stop the membrane potential dynamics (but never showed any pictorial or plotted data for this confirmatory experiment). In our experimental methodology, cell death was not forced on the cells by introducing an extra burden or via anoxia. We believe that the accumulation of PI in the ∆kch mutant is not due to high membrane potential dynamics but reflects PI staining damaged/dead cells without bias. We think that propidium iodide is appropriate for this experiment. Propidium iodide is a dye that is extensively used in the life sciences. PI has also been used in the study of bacterial electrophysiology (https://pubmed.ncbi.nlm.nih.gov/32343961/), and no membrane potential-related bias was reported.

      Throughout the paper, many ThT intensity traces are compared, and described as "similar" or "dissimilar", without detailed discussion or a clear standard for comparison. For example, the two membrane potential curves in Fig. S1C are described as "similar" although they have very different shapes, whereas the curves in Fig. 1B and 1D are discussed in terms of their differences although they are evidently much more similar to one another. Without metrics or statistics to compare these curves, it is hard to interpret these claims. These comparative interpretations are additionally challenging because many of the figures in which average trace data are presented do not indicate standard deviation.
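
      One way to make "similar" and "dissimilar" concrete is to compare trace shapes with an explicit metric. The sketch below is illustrative only and is not the authors' analysis: it assumes two traces sampled at the same time points, rescales each to the range 0 to 1 so that absolute intensity differences are ignored, and reports the root-mean-square difference. The function names are hypothetical.

```python
import math


def min_max_normalize(trace):
    """Rescale a trace to [0, 1] so only its shape is compared."""
    lo, hi = min(trace), max(trace)
    if hi == lo:
        # A flat trace has no shape information; map it to zeros.
        return [0.0 for _ in trace]
    return [(value - lo) / (hi - lo) for value in trace]


def shape_rmse(trace_a, trace_b):
    """Root-mean-square difference between two min-max normalized traces.

    Assumes both traces were sampled at the same time points.
    0.0 means identical shape; larger values mean more dissimilar shapes.
    """
    if len(trace_a) != len(trace_b):
        raise ValueError("traces must have the same number of time points")
    a = min_max_normalize(trace_a)
    b = min_max_normalize(trace_b)
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(squared / len(a))
```

      Reporting such a value alongside each pair of curves, together with the spread across replicates, would give readers a fixed standard for comparison.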

      Comparison of small changes in the absolute intensities is problematic in such fluorescence experiments. We mean the shape of the traces is similar and they can be modelled using an HH model with similar parameters.

      The differences between the TMRM and ThT curves that the authors show in Fig. S1C warrant further consideration. Some of the key features of the response in the ThT curve (on which much of the modeling work in the paper relies) are not very apparent in the TMRM data. It is not obvious to me which of these traces will be more representative of the actual underlying membrane potential dynamics.

      In our experiment, TMRM was used to confirm the dynamics observed using ThT. However, ThT appears to be more photostable than TMRM (especially towards the 2nd peak). The most interesting observation is that with both dyes, all phases of the membrane potential dynamics were conspicuous (the first peak, the quiescent period and the second peak). The time periods for these three episodes were also similar.

      A key claim in this paper (that dynamics of firing differ depending on whether cells are alone or in a colony) is underpinned by "time-to-first peak" analysis, but there are some challenges in interpreting these results. The authors report an average time-to-first peak of 7.34 min for the data in Figure 1B, but the average curve in Figure 1B peaks earlier than this. In Figure 1E, it appears that there are a handful of outliers in the "sparse cell" condition that likely explain this discrepancy. Either an outlier analysis should be done and the mean recomputed accordingly, or a more outlier-robust method like the median should be used instead. Then, a statistical comparison of these results will indicate whether there is a significant difference between them.
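
      The effect of a few outliers on the mean, and the alternative of using the median, can be sketched as follows. This is a minimal illustration, not the authors' analysis: the values are hypothetical, and the 1.5 x IQR rule for flagging outliers is one common convention chosen here as an assumption.

```python
import statistics


def summarize_time_to_peak(times_min):
    """Return (mean, median, outliers) for time-to-first-peak values in minutes.

    Outliers are flagged with the 1.5 x IQR rule (an assumed convention).
    """
    mean = statistics.mean(times_min)
    median = statistics.median(times_min)
    q1, _, q3 = statistics.quantiles(times_min, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [t for t in times_min if t < low or t > high]
    return mean, median, outliers


# Hypothetical values in minutes (illustrative only, not data from the paper).
example_times = [4.8, 4.9, 5.0, 5.0, 5.1, 5.2, 5.2, 5.3, 5.4, 5.5, 20.0, 22.0]
mean_time, median_time, flagged = summarize_time_to_peak(example_times)
```

      In this toy sample the two late values pull the mean well above the median, which is the kind of discrepancy described above between the reported average and the peak of the average curve.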

      The key point is the comparison of standard errors on the standard deviation.

      In two different 3D biofilm experiments, the authors report the propagation of wavefronts of membrane potential; I am unable to discern these wavefronts in the imaging data, and they are not clearly demonstrated by analysis.

      The first data set is presented in Figures 2A, 2B, and Video S3. The images and video are very difficult to interpret because of how the images have been scaled: the center of the biofilm is highly saturated, and the zero value has also been set too high to consistently observe the single cells surrounding the biofilm. With the images scaled this way, it is very difficult to assess dynamics. The time stamps in Video S3 and on the panels in Figure 2A also do not correspond to one another although the same biofilm is shown (and the time course in 2B is also different from what is indicated in 2B). In either case, it appears that the center of the biofilm is consistently brighter than the edges, and the intensity of all cells in the biofilm increases in tandem; by eye, propagating wavefronts (either directed toward the edge or the center) are not evident to me. Increased brightness at the center of the biofilm could be explained by increased cell thickness there (as is typical in this type of biofilm). From the image legend, it is not clear whether the image presented is a single confocal slice or a projection. Even if this is a single confocal slice, in both Video S3 and Figure 2A there are regions of "haze" from out-of-focus light evident, suggesting that light from other focal planes is nonetheless present. This seems to me to be a simpler explanation for the fluorescence dynamics observed in this experiment: cells are all following the same trajectory that corresponds to that seen for single cells, and the center is brighter because of increased biofilm thickness.

      We thank the reviewer for this important observation. We have made changes to the figures to address this confusion. The cell cover has no influence on the observed membrane potential dynamics. The entire biofilm was exposed to the same blue light at each time. Therefore all parts of the biofilm received equal amounts of the blue light intensity. The membrane potential dynamics were not influenced by cell density (see Fig 2C).

      The second data set is presented in Video S6B; I am similarly unable to see any wave propagation in this video. I observe only a consistent decrease in fluorescence intensity throughout the experiment that is spatially uniform (except for the bright, dynamic cells near the top; these presumably represent cells that are floating in the microfluidic and have newly arrived to the imaging region).

      A visual inspection of Video S6B shows a fast rise, a decrease in fluorescence and a second rise (supplementary figure 4B). The fluorescence data were carefully obtained using the Imaris software. We created a curved geometry on each slice of the confocal stack. We analyzed the surfaces of this curved plane along the z-axis. This was carried out in Imaris.

      3D imaging data can be difficult to interpret by eye, so it would perhaps be more helpful to demonstrate these propagating wavefronts by analysis; however, such analysis is not presented in a clear way. The legend in Figure 2B mentions a "wavefront trace", but there is no position information included - this trace instead seems to represent the average intensity trace of all cells. To demonstrate the propagation of a wavefront, this analysis should be shown for different subpopulations of cells at different positions from the center of the biofilm. Data is shown in Figure 8 that reflects the velocity of the wavefront as a function of biofilm position; however, because the wavefronts themselves are not evident in the data, it is difficult to interpret this analysis. The methods section additionally does not contain sufficient information about what these velocities represent and how they are calculated. Because of this, it is difficult for me to evaluate the section of the paper pertaining to wave propagation and the predicted biofilm critical size.
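
      The position-resolved analysis requested here could take roughly the following form. This is a hedged sketch under stated assumptions, not the method used in the paper: each cell is assumed to have a known distance from the biofilm center and an intensity trace on a shared time axis, and the function name, bin width and toy data are all hypothetical.

```python
from collections import defaultdict


def peak_time_by_radial_bin(cells, times, bin_width):
    """Map each radial bin to the time at which its mean trace peaks.

    cells: list of (radius, trace) pairs, one per cell, where all traces
           share the time axis given by `times`.
    Returns {bin_index: peak_time}, where bin_index = radius // bin_width.
    """
    grouped = defaultdict(list)
    for radius, trace in cells:
        grouped[int(radius // bin_width)].append(trace)

    peak_times = {}
    for bin_index, traces in sorted(grouped.items()):
        # Average the traces of all cells in this bin, time point by time point.
        mean_trace = [sum(values) / len(values) for values in zip(*traces)]
        peak_index = max(range(len(mean_trace)), key=mean_trace.__getitem__)
        peak_times[bin_index] = times[peak_index]
    return peak_times


# Hypothetical toy data (not from the paper): peaks arrive later at larger radii.
toy_times = [0, 1, 2, 3, 4]
toy_cells = [
    (2.0, [0, 5, 1, 0, 0]),
    (12.0, [0, 1, 5, 1, 0]),
    (22.0, [0, 0, 1, 5, 1]),
]
toy_result = peak_time_by_radial_bin(toy_cells, toy_times, bin_width=10.0)
```

      A peak time that shifts systematically with radial bin would indicate a propagating wavefront, whereas identical peak times in every bin would indicate that all cells follow the same trajectory in tandem.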

      The analysis is considered in more detail in a more expansive modelling article, currently under peer review in a physics journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      There are some instances in the paper where claims are made that do not have data shown or are not evident in the cited data:

      (1) In the first results section, "When CCCP was added, we observed a fast efflux of ions in all cells"- the data figure pertaining to this experiment is in Fig. S1E, which does not show any ion efflux. The methods section does not mention how ion efflux was measured during CCCP treatment.

      We have worded this differently to properly convey our results.

      (2) In the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, it's hard to distinguish these fluctuations from measurement noise; a no-light control could help clarify this.

      The calcium transients observed were not due to noise or artefacts.

      (3) The authors state that the membrane potential dynamics simulated in Figure 7B are similar to those observed in 3D biofilms in Fig. S4B; however, the second peak is not clearly evident in Fig. S4B and it looks very different for the mature biofilm data reported in Fig. 2. I have some additional confusion about this data specifically: in the intensity trace shown in Fig. S4B, the intensity in the second frame is much higher than the first; this is not evident in Video S6B, in which the highest intensity is in the first frame at time 0. Similarly, the graph indicates that the intensity at 60 minutes is higher than the intensity at 4 minutes, but this is not the case in Fig. S4A or Video S6B.

      The confusion stated here has now been addressed. Also, it should be noted that while Fig 2.1 was obtained with an LED light source, Fig S4A was obtained using a laser light source. While obtaining the confocal images (for Fig S4A), the light intensity was controlled to further minimize photobleaching. Most importantly, there is evidence of a slow rise to the 2nd peak in Fig S4B. The first peak, quiescence and slow rise to the second peak are evident.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Fig. 3C needs the "still" for the movie of control C. owczarzaki (in Movie S1).

      We have now added a WT control in this figure panel.

      (2) The elongated cell shape is seen infrequently in control cells, and I wonder whether these events are transient inactivation of coHpo or coWts in these cells. Perhaps the authors could comment on this in the discussion.

      This is an interesting possibility and we have now included it in our discussion (Lines 401-403).

      (3) Does C. owczarzaki normally aggregate or this is a lab-specific phenotype? For example, the slime mold Dictyostelium discoideum forms aggregates during its life cycle. Could some additional information about C. owczarzaki be added to the introduction?

      Unfortunately, little is known about Capsaspora “in the wild”, as it was isolated as an endosymbiont from a laboratory strain of snails. However, some related filasterians isolated from natural environments also show aggregative ability, indicating that aggregation is in fact a physiological process in this group of organisms. We have updated our introduction to include this fact (Lines 78-80).

      Reviewer #2 (Recommendations For The Authors):

      The studies on Hippo signalling in Capsaspora are currently limited to genetic experiments and analysis of Yki/YAP localisation. Biochemical evidence that Co Wts phosphorylates Co Yki/YAP on a conserved serine residue(s) would give important further evidence that this essential signalling step in the animal Hippo pathway is conserved in Capsaspora. However, such experiments require antibodies that detect specific phosphorylation events, which might not be available at present. Is mass spectrometry of the phospho-proteome a potential approach that could be employed to investigate this? The benefit of this approach is it would give information on other Hippo pathway proteins and could be used to probe signalling events under different culture conditions (e.g., aggregate, non-aggregate).

      In response to this recommendation, we attempted to detect Phospho-coWts and Phospho-coHpo using commercial antibodies against their mammalian homologs, in the hope of cross-species reactivity. However, we could not detect a signal by Western blot. Thus, better reagents or refinement of techniques beyond the scope of this article may be required to examine the phosphorylation of these Capsaspora proteins. There was a published report of Capsaspora phosphoproteome analysis (Sebé-Pedrós et al., 2016 Dev Cell), although phosphorylation of the conserved sites on coYki, coWts, and coHpo was not reported in this analysis, suggesting more targeted approaches may be needed to examine phosphorylation of these core Hippo pathway components.

      The following statement that Wts LOF is stronger than Hpo LOF in Capsaspora is consistent with overgrowth phenotypes in flies and mammals:

      "Interestingly, we found that coWts-/- cells were significantly more likely to show nuclear mScarlet-coYki localization than coHpo-/- cells (Figure 1D), which is consistent with Hpo/MST independent activity of Wts/LATS previously reported in Drosophila and mammals (Zheng et al., 2015)."

      However, the following statement describes a stronger phenotype in Hpo LOF Capsaspora than Wts LOF:

      "As contractile cells in the coHpo mutant background tended to show a more extreme elongated morphology than the coWts mutant, we focused on the coHpo mutant for further analysis."

      Does this mean that Hpo can regulate actomyosin contractility in both Wts/Yki-dependent and independent manners? A genetic experiment, similar to those that have been performed in Drosophila and mammals could help to address this, e.g., what is the phenotype of Hpo, Yki Capsaspora and Wts, Yki double mutant Capsaspora? Do they phenocopy Yki LOF Capsaspora and are the actomyosin phenotypes associated with Hpo and Wts mutant Capsaspora completely or partially suppressed? The authors indicate that generation of double mutant Capsaspora is not technically possible at present, however.

      Indeed, given available techniques, the generation of such double mutants is not currently possible. With this phenotype (aberrant cytoskeletal dynamics), it is hard to say what a “stronger” phenotype is, and which mutant has the “stronger” phenotype. We have edited this statement to better reflect this point (Lines 208-209).

      Another outstanding question is whether the Hpo/Wts/Yki-related actomyosin phenotypes are linked to regulation of transcription by Yki, or are regulated non-transcriptionally. Indeed, a non-transcriptional role for Drosophila Yki in promoting actomyosin contractility has been reported (Fehon lab, Dev Cell, 2018). Generation of Scalloped/TEAD mutant Capsaspora would allow this question to be investigated. Alternatively, this could be explored using variant Co Yki transgenes, e.g., a Co Yki transgene that does not form a physical complex with Co Sd/TEAD and a Co Yki transgene that is targeted to the cell cortex.

      To address this point, we tested whether a conserved amino acid residue in coYki (F123) that is required for transcriptional activity of human YAP (in this case, F95) is required for the phenotypic effects of the coYki 4SA mutant. We found that, in contrast to expression of coYki 4SA, expression of a coYki 4SA F123A mutant showed no effect on cell or aggregate morphology. These new results, which support a requirement for transcriptional activity for coYki function, have now been added to Figure 7.

      Reviewer #3 (Recommendations For The Authors):

      Repetition from previous publication:

      (1) For example, the last sentences of the abstracts of both works. From Phillips et al. eLife 2022;0:e77598: "Taken together, these findings implicate an ancestral role for the Hippo pathway in cytoskeletal dynamics and multicellular morphogenesis predating the origin of animal multicellularity, which was co-opted during evolution to regulate cell proliferation".

      From this manuscript: "Together, these results implicate cytoskeletal regulation but not proliferation as an ancestral function of the Hippo pathway and uncover a novel role for Hippo signaling in regulating cell density in a proliferation-independent manner."

      Our two papers deal with different components of the Hippo pathway: Yorkie/YAP/coYki in Phillips et al. eLife 2022;0:e77598 and upstream kinases in the current paper. The fact that perturbing different components of the pathway leads to similar conclusions actually strengthens the overall conclusion. Nevertheless, to be more clear about the novelty of the current manuscript, we have now changed the current text from “Hippo pathway” to “Hippo kinase cascade”, to emphasize that the current analysis deals with kinases upstream of Yorkie/YAP/coYki (Lines 35, 368-371).

      (2) The authors claim that the change in localization of coYki in Hpo -/- and Wts -/-, being now able to enter the nucleus, is the demonstration that the nuclear regulation of Yki by the Hippo pathway is ancestral to animals. Nevertheless, the authors had already made this claim in their publication in eLife 2022, when they made a mutant version of Yki with the four conserved phosphorylation sites (Sebé-Pedrós 2012) mutated. Figure 5 A to F in Phillips et al. eLife 2022;0:e77598. In their words "This regulation of coYki nuclear localization, along with the previous finding that coYki can induce the expression of Hippo pathway genes when expressed in Drosophila (Sebé-Pedrós et al., 2012), suggests that the function of coYki as a transcriptional regulator and Hippo pathway effector is conserved between Capsaspora and animals."

      I understand that the localization of Yki in the coHpo-/- and coWts-/- is needed as part of the final proof that Hpo and Wts are the kinases that control Yki phosphorylation in C. owczarzaki, but it does not constitute a completely new message and should be written accordingly. Figure 1C of the current manuscript leads to the same conclusion as Figure 5 A to F in Phillips et al. eLife 2022;0:e77598.

      We think that demonstrating that Hippo and Warts orthologs specifically are responsible for regulation of coYki localization is a very important finding: Many unicellular organisms encode Hippo, Warts, and/or Yorkie’s transcription factor partner Sd, but not Yorkie. Our understanding is that in these earlier-branching unicellular organisms, the Hippo/Warts kinase module and Sd-like proteins functioned in distinct signaling modules. Thus Yorkie has the interesting property of “fusing” these two distinct signaling modules when it emerged. In this framework, it is interesting to show that this “fusion” occurred in Capsaspora, the most distant known relative of animals with a Yorkie ortholog, indicating that this “fusion” event is very ancient. Although fleshing out this idea is beyond the scope of this manuscript and we plan to write about it elsewhere, we have modified our discussion to point out the importance that Hippo and Warts specifically are upstream regulators of coYki.

      In Drosophila, the genes transcriptionally regulated by Yki include the positive regulators of the Hippo pathway, which serve to downregulate Yki production.

      (1) The authors don't explain if these upstream regulators of the Hippo pathway are conserved in C. owczarzaki.

      We have now indicated the conservation of some upstream Hippo pathway components (Line 69-71).

      (2) It would also be important to know how active coYki is in the C. owczarzaki mutant lines coHpo-/- and coWts-/- with respect to WT and also with respect to coYki 4SA, and how this impacts the transcription and protein production of genes downstream of coYki. I think some transcriptional and proteomic data would be informative, at least for those genes related to the cytoskeleton.

      We have now performed RNA-seq on the coHpo and coWts mutants to address the concerns above (See Figure 8 and the final section of Results).

      Related to the above: among the downstream targets of coYki, the authors mentioned in their previous work (Phillips et al. eLife 2022;0:e77598) that B-integrins were upregulated in coYki -/-, suggesting that B-integrins could be behind the stronger cell-substrate attachment observed in the coYki-/- mutant. It would be important to investigate whether the integrin adhesome is now downregulated and how previous and new results are related to the stronger cell-substrate attachment in the coHpo-/- and coWts-/- lines. It would be important that previous results on coYki-/-, a mutant line of the same pathway, are discussed in these two new mutant contexts.

      Two Capsaspora integrin beta genes were previously found to be upregulated in the coYki mutant (CAOG_05058 and CAOG_01283, from Phillips et al., 2022 eLife). In our coWts and coHpo mutant RNAseq data, we see that CAOG_05058 is upregulated in both coHpo and coWts mutants, whereas CAOG_01283 does not show significantly different expression in either the coHpo or coWts mutant. Because the CAOG_05058 expression data seem to go in the “opposite” direction from what one might expect (i.e. not “down regulated” as the reviewer predicts), and because we see no change in expression of CAOG_01283, these results are difficult to interpret. The role of integrins in Capsaspora Hippo pathway mutant phenotypes is therefore still an open question.

      Some cells from the coHpo-/- and coWts-/- mutant lines, show higher attachment to the substrate, which results in an elongated shape while the cell detaches from the substrate. The authors claim this phenotype as a contractile behavior in these cells. This behavior would be caused by changes in cytoskeleton regulation or increased number of microvilli or a change in the distribution of microvilli.

      (1) In my opinion, this phenotype cannot be considered a behavior per se (the cells become round once they are free from the substrate, so the elongation is temporal and the contractile behavior is a consequence of this attachment to the substrate), so I would not say that the Hippo pathway controls a contractile behavior, as the authors state as one of the main conclusions of the manuscript.

      Many cell behaviors are known to depend on external conditions, such as substrates, growth factors, nutrients, chemokines, etc., and are therefore “temporal” by the reviewer’s criteria. We therefore feel that the phenotype we describe here can be considered a cell behavior.

      (2) On the other hand I think that further efforts on microscopy or immunocytochemistry could be performed in order to discern among the different causes; more microvilli? change in microvilli distribution? change in the acto-myosin cytoskeleton? Moreover these options are not mutually exclusive and very likely the explanation is multifactorial.

      (3) coWts-/- has a different phenotype at the periphery of the aggregates than coHpo-/-. The authors use stably transfected lines with NMM-Venus to visualize microvilli. It would be interesting to perform further experiments using this tool in order to visualize putative differences in the cell membrane at the periphery in the two mutant genotypes.

      We have now performed experiments examining filopodia in round vs elongated cells using the NMM-venus marker, as well as differences in filopodial morphology within aggregates in the different genotypes. Our data and conclusions are included in our updated manuscript (Figure 3- figure supplement 1).

      The authors nicely inspect the consequences of the mutant lines coHpo-/- and coWts-/- in the formation of the aggregates. They find that the aggregates in these cases are more densely packed likely due to the higher attachment from microvilli, which they are able to revert by using myosin inhibitors.

      (1) As mentioned above, it would be interesting to perform further experiments using NMM-Venus transfection into the coHpo-/- and coWts-/- genotypes in order to visualize putative differences in the strength and distribution of the microvilli in the aggregates of these two mutant genotypes. These experiments would show whether more or fewer microvilli contacts are created in these lines and would support a mechanical explanation of the denser aggregates in the mutant lines, as the authors now suggest in the discussion.

      We have now performed these experiments, and our data and conclusions are described in the updated manuscript (Figure 5- figure supplement 1).

      (2) On the other hand, myosin inhibition through blebbistatin increases the number of elongated cells in the mutant lines, demonstrating that myosin is necessary for the cells to resolve their substrate attachment and become round. In my view it is confusing that myosin is needed for cells to become round again (wt phenotype) and at the same time myosin inhibition is needed for aggregates to become less dense (wt phenotype). Do they lose density because more elongated cells are now in the aggregate? These results look confusing to me and I think they should be better discussed. Again, the above transfections of NMM-Venus into the coHpo-/- and coWts-/- genotypes could be informative.

      We have attempted to detect cells with an “elongated” morphology within WT and mutant aggregates but so far have been unable to visualize such cells. More advanced microscopy techniques at extended time scales may allow us to observe such things, but we believe such studies are beyond the scope of this manuscript.

      The authors do not connect and discuss their results with a very relevant study done in Drosophila, Xu J et al. Dev Cell. 2018; 46(3): 271-284.e5, where a transcriptionally independent role of Yki is characterized. In Drosophila, Yki has an important role in a positive feedback loop with myosin at the cortical part of the cell, which is especially relevant for cytoskeleton regulation.

      The results obtained by the authors in their previous study with coYki-/- indicated that coYki was important for proper actin dynamics and cell shape in C. owczarzaki. At that time they did not investigate whether this phenotype could be due to the loss of a possible role of coYki at the cortex, and they argued that the phenotype was caused by the lack of transcriptional regulation of genes downstream of coYki, many of which were in fact cytoskeleton related.

      Because the cortex function of Yki is independent of regulation by Hpo and Wts, the authors could use these genotypes, comparing them with WT (where the cortical role of Yki should be the same) and coYki-/-, to investigate whether the cortex role of Yki is conserved in C. owczarzaki. In Drosophila the cortex role of Yki has been suggested to control tension at the cell surface. Drosophila Yki at the cortex activates myosin II through the N-terminal part of the protein and establishes a positive feedback loop by downregulating the Hippo pathway, thereby driving more active DmYki into the nucleus. This mechanism has been proposed by Xu et al. to work as the link between sensing cell tension at the surface and control of tissue proliferation.

      In my opinion these are relevant results in the field that should be addressed in this study or at least well discussed. Actually, I think they could be a great opportunity for investigating if a putative cortex role of Yki is ancestral to its role linked to the Hippo pathway.

      We have now addressed this study in our manuscript; please see our response to reviewer #2’s last comment above.

      It would be informative to understand how stable expression through hygromycin selection is achieved in the transfection experiments. Is the recombinant plasmid integrated in the genome? Or is it stable as an episome?

      We believe that the plasmids stably integrate, as we never lose fluorescent signal once established in a clonal line, even after extended culturing (>6 months). It may be worthwhile to definitively determine integration vs. episome in future studies.

      The authors do not speculate on or discuss how cell tension and cell proliferation differ between a unicellular organism and a (multicellular) tissue, and I think this should be addressed since the contexts are different.

      This is an interesting and important point, which we plan to discuss in detail in an upcoming review article, as a proper discussion of this idea, we think, is beyond the scope of this manuscript.

      Minor point. The study should cite other unicellular holozoans that have also been developed into tractable organisms, such as Monosiga brevicollis (Woznica A, Kumar A, et al. 2021 eLife 10:e70436) and Abeoforma whisleri (Faktorová, D., Nisbet, R.E.R., Fernández Robledo, J.A. et al. Nat Methods 17, 481-494 (2020)), in line 83 of the manuscript. I am sure the authors appreciate how much effort there is behind every non-model organism put forward as experimentally tractable, and this should be properly acknowledged.

      We agree, and we have now included these examples of non-model organism development in our manuscript.

    2. Reviewer #1 (Public Review):

      Summary:

      This Research Advance is an extension of this group's prior eLife paper published in 2022 on the conserved roles of the Hippo pathway effector Yorkie in C. owczarzaki (PMID: 35659869). This species is an amoeba that holds an important phylogenetic position as a close relative of multicellular animals. The prior study used genome editing to delete the C. owczarzaki Yki (termed coYki) and found that Yki is not required for proliferation but instead regulates cell contractility and cell aggregation. In the current study, the authors wanted to address whether Hippo pathway kinases - coHippo (coHpo) and coWarts (coWts) - regulate coYki and whether they are dispensable for proliferation but instead regulate cell contractility and cell aggregation. They used genome editing to delete coHpo and coWts singly in C. owczarzaki. Both mutant strains had increased nuclear localization of transfected coYki (tagged with Scarlet), suggesting that Hippo kinase pathway regulation of Yki is conserved in this organism. Neither kinase is required for proliferation. Each kinase mutant strain had a significantly increased percentage of cells that were elongated, which was relatively rare in a control population. The incidence of elongation could be enhanced in both kinase-mutant and control cells when myosin inhibitors were added to the media. coHpo- and coWts-mutant aggregates were more tightly packed than control cell aggregates, which they hypothesize is due to the increased contractility seen in kinase-mutant cells. They could reduce the density of packing in kinase-mutant aggregates when they treated the cells with myosin or F-actin inhibitors. To test whether the effects observed in kinase-mutant strains were due to increased Yki activation, they generated a coYki with four S to A substitutions (termed coYki4SA), which should produce a dominant-active Yki impervious to phosphorylation and hence inactivation by Hippo kinases. Control C. owczarzaki cells transfected with coYki4SA had increased cell density in aggregates and elongation in adherent cells. These results support their conclusions that coHpo and coWts regulate cell contractility and cell packing through coYki.

      Strengths:

      The major strengths of the paper include high quality data, robust analyses of the data, and a well-written manuscript. The combination of genome editing in C. owczarzaki, transfection of C. owczarzaki, and time-lapse movies of adherent cells generally support the conclusions (1) that control of cell density is an ancient function of the Hippo pathway; (2) that Hippo pathway control of cytoskeletal properties and contractile behavior underlie its regulation of cell density, and (3) that Hippo kinase control of Yki localization is likely an ancient function of the pathway.

      Weaknesses:

      There are no weaknesses.

    3. Reviewer #2 (Public Review):

      The study builds on the work of the Pan group and others which has described the existence of core Hippo pathway proteins in Capsaspora and, more recently, described a role for a Yorkie/YAP homologue in regulation of cell shape and actin, as opposed to proliferation. For this recent study, they developed genetic techniques to mutate genes in Capsaspora, and this technology has been leveraged again in this study. Using loss of function genetic approaches, the authors find that loss of either of the two major kinases in the Hippo pathway core kinase cassette (Warts and Hippo) impact Capsaspora morphology and the actin cytoskeleton. This is phenocopied by overexpression of Capsaspora Yorkie/YAP. In addition, Capsaspora Yorkie/YAP accumulates in the nucleus of organisms lacking Warts or Hippo, as it does in metazoans. While these experiments are not overly surprising, they still provide important verification that core Hippo signaling events are conserved in Capsaspora.

      Subsequently, they show that Capsaspora lacking Warts or Hippo do not overproliferate, which contrasts with many studies in metazoans (flies, mice, fish), particularly in epithelial tissues where loss of Warts or Hippo often causes overproliferation. Rather, the authors show that Capsaspora Warts and Hippo regulate cell morphology and actomyosin-dependent contractile behaviour. They speculate from these findings that Hippo signalling could regulate the density of Capsaspora when they grow in aggregates and draw parallels to the known role of the Hippo pathway in contact inhibition of mammalian cells grown in culture.

      Together with their 2022 paper, this study paints an emerging picture that the ancestral function of the Hippo pathway is to regulate the actin cytoskeleton, not proliferation, which is a significant finding. This also suggests that the ability to control proliferation was something that the Hippo pathway was re-purposed to do at some stage during the evolution of metazoans. These findings are important for the Hippo field, and our understanding of cellular signalling and evolution more broadly.

      In future studies, further biochemical and genetic experiments would allow the authors to more conclusively prove that core features of Hippo signalling are conserved in Capsaspora - e.g., that Capsaspora Hippo/MST activates Warts/LATS by phosphorylation and Warts/LATS represses Yorkie/YAP by phosphorylation of key serine residues. Some of these experiments are challenging or not yet possible due to technical limitations. Higher resolution imaging approaches such as electron microscopy would likely give further mechanistic insights into how Hpo, Wts and Yki modulate actomyosin contractility in Capsaspora. Recent advances in mass spectrometry of the phospho-proteome should provide a valuable way to explore Hippo signalling in Capsaspora. The benefit of this approach is that it has the potential to give information on all Hippo pathway proteins and could be used to probe signalling events under different culture conditions (e.g., aggregate, non-aggregate).

    4. Reviewer #3 (Public Review):

      The authors present in this study the characterization of two mutant lines of the filasterean Capsaspora owczarzaki, a unicellular holozoan with a key phylogenetic position to understand multicellular development in animals. The present study is built on previous work from the same research group on the mutant of the orthologue of the Yorkie gene in C. owczarzaki. By knocking out the two upstream kinases of the same pathway, coHpo-/- and coWts-/-, in single cells and aggregates of C. owczarzaki, they have now mutated the entire pathway in different cellular contexts.

      The authors obtain results in the same direction as the previous work, demonstrating that the Hippo pathway of the unicellular holozoan C. owczarzaki is not involved in the control of cell proliferation but is related to cytoskeletal dynamics through the actin-myosin mechanism.

      In this revised version of the study, the authors have addressed my concerns by providing additional experiments, references and discussing further the points of controversy.

      I think the authors have done a great job improving the robustness of the paper, further supporting some of the claims raised in the previous version of the manuscript.

    1. eLife assessment:

      The study answers the important question of whether the conformational dynamics of proteins are slaved by the motion of solvent water or are intrinsic to the polypeptide. The results from neutron scattering experiments, involving isotopic labelling, carried out on a set of four structurally different proteins are convincing, showing that protein motions are not coupled to the solvent. A strength of this work is the study of a set of proteins using spectroscopy covering a range of resolutions; however, it suffers from some scholarly shortcomings and limited discussion of results. The work is of broad interest to researchers in the fields of protein biophysics and biochemistry.

    2. Reviewer #1 (Public Review):

      Summary:

      Zheng et al. study the 'glass' transitions that occur in proteins at ca. 200 K using neutron diffraction and differential isotopic labeling (hydrogen/deuterium) of the protein and solvent. To overcome limitations in previous studies, this work is conducted in parallel with 4 proteins (myoglobin, cytochrome P450, lysozyme, and green fluorescent protein), and experiments were performed at a range of instrument time resolutions (1 ns - 10 ps). The authors' data look compelling and suggest that transitions in the protein and solvent behavior are not coupled and that, contrary to some previous reports, the apparent water transition temperature is a 'resolution effect', i.e. it is limited by the instrument response. This is likely to be important in the field, as solvent 'slaving' and the role of the hydration shell in protein dynamics should be reassessed in light of these findings.

      Strengths:

      The use of multiple proteins and instruments with a range of energy resolutions/timescales.

      Weaknesses:

      The paper could be organised to better allow the comparison of the complete dataset collected.

      The extent of hydration clearly influences the protein transition temperature. The authors suggest that "water can be considered here as lubricant or plasticizer which facilitates the motion of the biomolecule." This may be the case, but the extent of hydration may also alter the protein structure.

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript entitled "Decoupling of the Onset of Anharmonicity between a Protein and Its Surface Water around 200 K" by Zheng et al. presents a neutron scattering study trying to elucidate if at the dynamical transition temperature water and protein motions are coupled. The origin of the dynamical transition temperature has been highly debated for decades, specifically its relation to hydration.

      Strengths:

      The study is rather well conducted, with a lot of effort to acquire the perdeuterated proteins, and some results are interesting.

      Weaknesses:

      The present work could certainly contribute some arguments, but I have the feeling that not all known facts are properly discussed.

      The points the authors should carefully discuss are the following:

      (1) Daniel et al. (10.1016/S0006-3495(98)77694-5) have shown that enzymes can be functional below the dynamical transition temperature which is at odds with some of the claims of the authors.

      (2) It is not as easy to say that protonated proteins in D2O reflect protein dynamics while perdeuterated proteins in H2O reflect water dynamics. A recent study by Nidriche et al. (PRX LIFE 2, 013005 (2024)) reveals that H <-> D exchange is much faster than usually assumed and has important consequences for such studies.

      (3) A publication by Jasnin et al. (10.1039/b923878f) on heparan sulfate shows a resolution effect.

      (4) The authors should discuss the impact of the chosen q-range on their findings (see Phys. Chem. Chem. Phys., 2012, 14, 4927-4934, where the authors see a huge effect!).

      (5) The authors underline that the dynamical transition is intrinsic to the protein. However, Cupane et al. (ref 12) have shown that it can also be found in a mixture of amino acids without any protein backbone.

      (6) The authors say that they find similar dependences from MSD. They should explain that the MSD is inversely proportional to the summed intensities squared.

      (7) A decoupling between water dynamics and membrane dynamics has already been discussed by K. Wood, G. Zaccai et al.

      (8) The fact that transition temperature in lipid membranes is higher when the membrane is dry is also well known (A.V. Popova, D.K. Hincha, BMC Biophys. 4, 11 (2011)).

      (9) The authors should mention the slope (K/min) they used for DSC and discuss the impact of it on the results.

      (10) In the introduction, the authors should present the different explanations forwarded for the dynamical transition.

    1. eLife assessment

      This is an important study examining the role of prediction error in state allocation of memories. The data are convincing and largely support the conclusion that a gradual change between acquisition and extinction maintains the memory state of acquisition and, thus, results in extinction that is resistant to restoration. This paper is of interest to behavioural and neuroscience researchers studying learning, memory, and the neural mechanisms of those processes, as well as to clinicians using extinction-based therapies in treating anxiety-based disorders.

    2. Reviewer #1 (Public Review):

      Summary:

      In "Prediction error determines how memories are organized in the brain: a study of Pavlovian fear extinction in rats", Kennedy et al. examine how new information is organized in memory. They tested an idea based on latent theory that suggests that a large prediction error leads to the formation of a new memory, whereas a small prediction error leads to memory updating. They directly tested the prediction by extinguishing fear-conditioned rats with gradual extinction. For their experiment, gradual extinction was carried out by progressively reducing the intensity of shocks that were co-terminated with the CS, until the CS was presented alone. Doing so resulted in diminished spontaneous recovery and reinstatement compared to Standard Extinction. The results are compelling, and have important implications for the field of fear learning and memory as well as translation to anxiety-related disorders.

      The authors carried out the Spontaneous Recovery experiment in 2 separate experiments. In one, they found differences between the Gradual and Standard Extinction groups, but in the second, they did not. It seems that their reinstatement test was more robust, and showed significant differences between the Gradual and Standard Extinction groups.

      The authors carried out important controls that enable proper contextualization of the findings. They included a "Home" group, in which rats received fear conditioning, but not extinction manipulation. Relative to this group, the Gradual and Standard extinction groups showed a reduction in freezing.

      In Experiments 3 and 4, the authors essentially carried out clever controls that served to examine whether shock devaluation (Experiment 4) and reduction in shock intensity (rather than a gradual decrease in shock intensity) (Experiment 3) would also yield a decrease in the return of fear. In line with a latent-cause updating explanation for accounting for the Gradual Extinction, they did not.

      In Experiment 5, the authors examined whether a prediction error produced by a change of context might contribute interference to the latent cause updating afforded by the Gradual Extinction. Such a prediction would align with a more flexible interpretation of a latent-cause model, such as those proposed by Redish (2007) and Gershman et al (2017), but not the latent-cause interpretation put forth by the Cochran-Cisler model (2019). Their findings showed that whereas Gradual Extinction carried out in the same context as acquisition resulted in less return of fear than Standard Extinction, it actually yielded a greater degree of return of fear when carried out in a different context, in support of the Redish and Gershman accounts, but not Cochran-Cisler.

      Experiment 6 extended the findings from Experiment 5 in a different state-splitting modality: timing. In this experiment, the authors tested whether a shift in temporal context also influenced the gradual extinction effect. They thus carried out the extinction sessions 21 days after conditioning. They found that while Gradual Extinction was indeed effective when carried out one day after fear conditioning, it was not when conducted 21 days later.

      The authors next carried out an omnibus analysis which included all the data from their 6 experiments, and found that overall, Gradual Extinction resulted in diminished return of fear relative to Standard Extinction. I thought the omnibus analysis was a great idea and an appropriate way to do their data justice.

      Strengths:

      Compelling findings. The data support the conclusions. 6 rigorous experiments were conducted which included clever controls. Data include male and female rats. I really liked the omnibus analysis.

      Weaknesses:

      None noted.

    3. Reviewer #2 (Public Review):

      Summary:

      The present article describes a series of experiments examining how a gradual reduction in unconditional stimulus intensity facilitates fear reduction and reduces relapse (spontaneous recovery and reinstatement) relative to a standard extinction procedure. The experiments provide compelling, if somewhat inconsistent, evidence of this effect and couch the results in a scholarly discussion surrounding how mechanisms of prediction error contribute to this effect.

      Strengths:

      The experiments are theoretically motivated and hypothesis-driven, well-designed, and appropriately conducted and analyzed. The results are clear and appropriately contextualized into the broader relevant literature. Further, the results are compelling and ask fundamental questions regarding how to persistently weaken fear behavior, which has both strong theoretical and real-world implications. I found the 'scrambled' experiment especially important in determining the mechanism through which this reduction in shock intensity persistently weakens fear behavior.

      Weaknesses:

      Overall, I found very few weaknesses in this paper. While some might view the somewhat inconsistent effects on relapse between experiments as a substantial weakness, I appreciate the authors directly confronting this and using it as an opportunity to aggregate data to look at general trends. Further, while Experiment 1 only used males, this was corrected in the rest of the experiments and therefore is not a substantial concern.

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript examined the role of large versus small prediction errors (PEs) in creating a state-based memory distinction between acquisition and extinction. The premise of the paper is based on theoretical claims and empirical findings that gradual changes between acquisition and extinction would lead to the potential overwriting of the acquisition memory with extinction, resulting in a more durable reduction in conditioned responding (i.e. more durable extinction effect). The paper tests the hypotheses in a series of elegant experiments in which the shock intensity is decreased across extinction sessions before non-reinforced CS presentations are given. Additional manipulations include context change, shock devaluation, and controlling for lower shock intensity exposure. The critical comparison was standard non-reinforced extinction training. The critical tests were done in spontaneous recovery and reinstatement.

      Strengths:

      The findings are of tremendous importance in understanding how memories can be updated and reveal a well-defined role of PE in this process. It is well-established that PE is critical for learning, so delineating how PE is critical for generating memory states and the role it serves in keeping memories dissociable (or not) is exciting and clever. As such the paper addresses a fundamental question in the field.

      The studies test clear and defined predictions derived from simulations of the state-belief model of Cochran & Cisler (2019). The designs are excellent: well-controlled and address the question.

      The authors have done an excellent job of explaining the value of the latent state models.

      The authors have studied both sexes in the study presented, providing generality across the sexes in their findings. However, depicting the individual data points in the bar graphs and noting which data represent males and which represent females would be of great value.

      Weaknesses:

      (1) While it seems obvious that delivering a lower intensity shock will generate a smaller PE than say no shock, it would have been nice to see data from say a compound testing procedure that confirms this.

      (2) The devaluation experiment is quite clever, but it also would be strengthened if there was evidence in the paper that this procedure does indeed lead to shock devaluation.

      (3) It would have been very exciting to see even more parametric examinations of this idea, like maintaining shock intensity but gradually reducing shock duration, which would have increased the impact of the paper.

      (4) Individual data points should be represented in the test figures (see above also).

    1. eLife assessment

      Wound infections are very common and can lead to delayed or poor wound healing, which significantly impacts morbidity and overall quality of life for patients. This manuscript uses scRNA-Seq to try to understand the impact of infection on various cell types during wound healing in a mouse model. The methodology is solid and the results provide a valuable 'atlas' of the cellular changes associated with infected and uninfected wounds, which will be of interest to the field.

    2. Reviewer #1 (Public Review):

      Summary:

      This is an interesting study that performs scRNA-Seq on infected and uninfected wounds. The authors sought to understand how infection with E. faecalis influences the transcriptional profile of healing wounds. The analysis demonstrated that there is a unique transcriptional profile in infected wounds with specific changes in macrophages, keratinocytes, and fibroblasts. They also speculated on potential crosstalk between macrophages and neutrophils and macrophages and endothelial cells using NicheNet analysis and CellChat. Overall the data suggest that infection causes keratinocytes to not fully transition which may impede their function in wound healing and that the infection greatly influenced the transcriptional profile of macrophages and how they interact with other cells.

      Strengths:

      It is a useful dataset to help understand the impact of wound infection on the transcription of specific cell types. The analysis is very thorough in terms of transcriptional analysis and uses a variety of techniques and metrics.

      Weaknesses:

      Some drawbacks of the study are the following. First, the fact that it only has two mice per group and only looks at one time point after wounding decreases the impact of the study. Wound healing is a dynamic and variable process, so following the full course of the wound healing response would be very important to understand the impact of infection on the healing wound. Including unwounded skin in the scRNA-Seq would also lend a lot more significance to this study. Another drawback of the study is that mouse punch biopsies are very different from human wounds, as they heal primarily by contraction instead of re-epithelialization like human wounds. So while the conclusions are generally supported, the scope of the work is limited.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors have performed a detailed analysis of the complex transcriptional status of numerous cell types present in wounded tissue, including keratinocytes, fibroblasts, macrophages, neutrophils, and endothelial cells. The comparison between infected and uninfected wounds is interesting and the analysis suggests possible explanations for why infected wounds are delayed in their healing response.

      Strengths:

      The paper presents a thorough and detailed analysis of the scRNAseq data. The paper is clearly written and the conclusions drawn from the analysis are appropriately cautious. The results provide an important foundation for future work on the healing of infected and uninfected wounds.

      Weaknesses:

      The analysis is purely descriptive and no attempt is made to validate whether any of the factors identified are playing functional roles in wound healing. The experimental setup is analyzing a single time point and does not include a comparison to unwounded skin.

    1. eLife assessment

      The paper investigates a potential cause of a type of severe epilepsy that develops in early life because of a defect in a gene called KCNQ2. The significance is fundamental because it substantially advances our understanding of a major research question. The strength of the evidence is convincing because appropriate methods are used that are in line with the state of the art, although there are some revisions/corrections that would strengthen the evidence further.

    2. Reviewer #1 (Public Review):

      Abreo et al. performed a detailed multidisciplinary analysis of a pathogenic variant of the KCNQ2 ion channel subunit identified in a child with neonatal-onset epilepsy and neurodevelopmental disorders. These analyses revealed multiple molecular and cellular mechanisms associated with this variant and provided important insights into what distinguishes distinct pathogenic variants of KCNQ2 associated with self-limited familial neonatal epilepsy versus those leading to developmental and epileptic encephalopathy, and how they may mechanistically differ, to result in different extents of developmental impairment.

      The authors first provide a detailed clinical description of the patient heterozygous for a novel pathogenic variant encoding KCNQ2 G256W. They then model the structure of the G256W variant based on recent cryo-EM structures of KCNQ2 and other ion channel subunits and find that while the affected position is quite distinct from the channel pore, it participates in a novel, evolutionarily conserved set of amino acids that form a network of hydrogen bonds that stabilize the structure of the pore domain.

      They then undertake a series of rigorous and quantitative laboratory experiments in which the KCNQ2 G256W variant is coexpressed exogenously with WT KCNQ2 and KCNQ3 subunits in heterologous cells, and endogenously in novel gene-edited mice generated for this study. This includes detailed electrophysiological analyses in the transfected heterologous cells revealing the dominant-negative phenotype of KCNQ2 G256W. They found altered firing properties in hippocampal CA1 neurons in brain slices from the heterozygous KCNQ2 G256W mice.

      They next showed that the expression and localization of KCNQ channels are altered in brain neurons from heterozygous KCNQ2 G256W mice, suggesting that this variant impacts KCNQ2 trafficking and stability.

      Together, these laboratory studies reveal that the molecular and cellular mechanisms shaping KCNQ channel expression, localization, and function are impacted at multiple levels by the variant encoding KCNQ2 G256W, likely contributing to the clinical features of the child heterozygous for this variant relative to patients harboring distinct KCNQ2 pathogenic variants.

    3. Reviewer #2 (Public Review):

      Summary:

      The paper entitled "Plural molecular and cellular mechanisms of pore domain KCNQ2 encephalopathy" by Abreo et al. is a complex and integrated paper that is well-written with a focus on a single gene variant that causes a severe developmental encephalopathy. The paper collates clinical outcomes from 4 individuals and investigates a variant causing KCNQ2-DEE using a wide range of experimental techniques including structural biology, in vitro electrophysiology, generation of genetically modified animal models, immunofluorescence, and brain slice recordings. The overall results provide a plausible explanation of the pathophysiology of the G256W variant and provide important findings to the KCNQ2-DEE field as well as beginning to separate the understanding between seizures and encephalopathies.

      Strengths:

      (1) The authors describe in detail how the structural biology of the channel with a mutation changes the movement of the protein and adds insights into how one variant can change the function of the M-current. The proposed model linking this change to pathogenic consequences should help pave the way for additional studies to further support this type of approach.

      (2) The multiple co-expression ratio experiments drill down to the complex nature of the assembly of channels in over-expression systems and help to move toward an understanding of heterozygosity. It might have been interesting if TEA was tested as a blocker to better understand the assembly of the transfected subunits or possibly use vectors to force desired configurations.

      (3) The immunofluorescent approach to understanding re-distribution is another component of understanding the function of this critical current. The demonstration that Q2 and Q3 are diminished at the AIS is an important finding and a strength to the totality of the data presented in the paper.

      (4) Brain slice work is an important component of studying genetically modified animals as it brings in the systems approach, and helps to explain seizure generation and EEG recordings. The finding that G265W/+ neurons were more sensitive to current injections is a critical component of the paper.

      (5) The strength of this body of work is how the authors integrated different scientific approaches to knit together a compelling set of experiments to better explain how a single variant, and likely extrapolation to other variants, can cause a severe neonatal developmental encephalopathy with a poor clinical outcome.

      Weaknesses:

      (1) Minor comment: Under the clinical history it is unclear whether the mother was on Levetiracetam for suspected in-utero seizures or if Levetiracetam was given to individual 1. The latter seems more likely, and if so this should be reworded.

      (2) As described in the clinical history of patient 1, treatment with ezogabine was encouraging with rapid onset by a parental global impression with difficulty in weaning off the drug. When studying the genetically modified mice, it would have been beneficial to the paper to talk about any ezogabine effects on the genetically modified mice.

      (3) It is a bit surprising that CA1 pyramidal neurons from the heterozygous G256W mice have no difference in resting membrane potential. The discussion section might explore this in a bit more detail.

      (4) The paper mentions a direct comparison between SLFNE and G256W. However, no such comparison was made in the slice recordings. Data comparing SLFNE to G256W would have made for a fuller story and would have added to the concept of susceptibility to action potential firing.

    4. Reviewer #3 (Public Review):

      Summary:<br /> This manuscript describes the symptoms of patients harboring KCNQ2 mutation G256W, functional changes of the mutant channel in exogenous expression, and phenotypes of G256W/+ mice. The patients presented seizures, the mutation reduced currents of the channel, and the G256W/+ mice showed seizures, increased firing frequency in neurons, reduced KCNQ2 expression, and altered subcellular distribution.

      Strengths:

      This is a large amount of work and all results corroborated the pathogenicity of the mutation in KCNQ2, providing an interesting example of KCNQ2-associated neurological disorder's impact on functions at all levels including molecular, cellular, tissue, animal model, and patients.

      Weaknesses:

      The manuscript described changes associated with the mutation at the molecular and cellular levels and in animal phenotype, but the results in some aspects are not as strong as in others. Nevertheless, the manuscript made overarching conclusions even when the evidence was not sufficiently strong.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      We thank all the reviewers for taking the time to assess and provide valuable feedback on the manuscript. We believe these comments helped clarify the manuscript’s prose, and the suggestions on the functionality and aim of the toolbox were globally incorporated into the following updates of the toolbox. Particularly, we would like to point out some changes that will help all reviewers, independently of their individual comments, to understand the current state of the toolbox and some systematic changes that were made to the manuscript.

      We have received a significant amount of feedback asking for a PyTorch implementation of the toolbox. Consequently, we decided to enact this, and the next version of the toolbox will be exclusively in PyTorch. We will maintain the Application Programming Interface (API) and tutorial documentation for the TensorFlow version of the toolbox on the online website. However, going forward we will focus exclusively on bug-fixing and expanding from the latest version of MotorNet, which will be in PyTorch. We now believe that the greater popularity of PyTorch in the academic community makes that choice more sustainable while helping a greater proportion of research projects.

      These changes led to a significant alteration of the MotorNet structure, which is reflected by changes made throughout the manuscript, most particularly in Figure 3 and Table 1. A beneficial side-effect of this is a much simpler structure for MotorNet which ought to contribute positively toward its usability by researchers in the neuroscience community.

      We also refactored some terminology to be more in line with current computational neuroscience vocabulary:

      • The term “plant”, which comes from industrial engineering and is more niche in neuroscience, has been replaced by “effector”.

      • The term “task” has been replaced by “environment” to match the gymnasium toolbox terminology, which MotorNet is now compatible with. Task objects essentially performed the same function as environment objects from the gymnasium toolbox.

      • The term “controller” has been replaced by “policy” throughout, as this term is more general.

      • The term “motor command” is very specific to the motor control subfield of neuroscience, and therefore is replaced by “action”, which is more commonplace for this modelling component in computational neuroscience and machine learning.

      Reviewer #1 (Public Review):

      Summary:

      Codol et al. present a toolbox that allows simulating biomechanically realistic effectors and training Artificial Neural Networks (ANNs) to control them. The paper provides a detailed explanation of how the toolbox is structured and several examples that demonstrate its usefulness.

      Main comments:

      (1) The paper is well written and easy to follow. The schematics help in understanding how the toolbox works and the examples provide an idea of the results that the user can obtain.

      We thank the reviewer for this comment.

      (2) As I understand it, the main purpose of the paper should be to facilitate the usage of the toolbox. For this reason, I have missed a more explicit link to the actual code. As I see it, researchers will read this paper to figure out whether they can use MotorNet to simulate their experiments, and how they should proceed if they decide to use it. I'd say the paper provides an answer to the first question and assures that the toolbox is very easy to install and use. Maybe the authors could support this claim by adding "snippets" of code that show the key steps in building an actual example.

      This is an important point, which we also considered when writing this paper. We instead decided to focus on the first approach, because it is easier to illustrate the scientific use of the toolbox using code or interactive (Jupyter) notebooks than a publication format. We find the “how to proceed” aspect of the toolbox can more easily and comprehensively be covered using online, interactive tutorials. Additionally, this allows us to update these tutorials as the toolbox evolves over different versions, while it is more difficult to update a scientific article. Consequently, we explicitly avoided code snippets on the article itself. However, we appreciate that the paper would gain in clarity if this was more explicitly stated early. We have modified the paper to include a pointer to where to find tutorials online. We added this at the last paragraph of the introduction section:

      “The interested reader may consult the full API documentation, including interactive tutorials on the toolbox website at https://motornet.org.”

      (3) The results provided in Figures 1, 4, 5 and 6 are useful, because they provide examples of the type of things one can do with the toolbox. I have a few comments that might help improving them:

      (a) The examples in Figures 1 and 5 seem a bit redundant (same effector, similar task). Maybe the authors could show an example with a different effector or task? (see point 4).

      The effectors from figures 1 and 5 are indeed very similar. However, the tasks in figures 1 and 5 present some important differences. The training procedure in figure 1 never includes any perturbations, while the one from figure 5 includes a wide range of perturbations of different magnitudes, timing and directions. The evaluation procedure of figure 1 includes center-out reaches with permanent viscous (proportional to velocity) external dynamics, while that of figure 5 includes fixed, transient, square-shaped perturbations orthogonal to the reach direction. Finally, the networks in figure 1 undergo a second training procedure after evaluation while the networks of figure 5 do not. While we agree that some variation of effectors would be beneficial, we do show examples of a point-mass effector in figure 6. Overall, figure 5 shows a task that is quite different from that of figure 1 with a similar effector, while the opposite is true for figure 6. We have modified the text to clarify this for the reader, by adding the following.

      End of 1st paragraph, section 2.4.

      “Therefore, the training protocol used for this task largely differed from section 2.1 in that the networks are exposed to a wide range of mechanical perturbations with varying characteristics.”

      1st paragraph of section 2.5

      […] this asymmetrical representation of PMDs during reaching movements did not occur when RNNs were trained to control an effector that lacked the geometrical properties of an arm such as illustrated in Figure 4c-e and section 2.1.

      (b) I missed a discussion on the relevance of the results shown in Figure 4. The moment arms are barely mentioned outside section 2.3. Are these results new? How can they help with motor control research?

      We thank the reviewer for this comment. This relates to a point from reviewer 2 indicating that the purpose of each section was sometimes difficult to grasp as one reads. Section 2.3 explains the biomechanical properties that the toolbox implements to improve realism of the effector. They are not new results in the sense that other toolboxes implement these features (though not in differentiable formats) and these properties of biological muscles are empirically well-established. However, they are important to understand what the toolbox provides, and consequently what constraints networks must accommodate to learn efficient control policies. An example of this is the results in figure 6, where a simple effector versus a more biomechanically complex effector will yield different neural representations.

      Regarding the manuscript itself, we agree that more clarity on the goal of every paragraph may improve the reader’s experience. Consequently, we now specify such goals at the start of each section. Particularly, we clarify the purpose of section 2.3 by adding several sentences on this at the end of the first paragraph in that section. We also now clearly state the purpose of section 2.3 with the results of figure 6 and reference figure 4 in that section.

      (c) The results in Figure 6 are important, since one key asset of ANNs is that they provide access to the activity of the whole population of units that produces a given behavior. For this reason, I think it would be interesting to show the actual "empirical observations" that the results shown in Fig. 6 are replicating, hence allowing a direct comparison between the results obtained for biological and simulated neurons.

      These empirical observations are available from previous electrophysiological and modelling work. Particularly, polar histograms across reaching directions like panel C are displayed in figures 2 and 3 of Scott, Gribble, Graham, Cabel (2001, Nature). Colormaps of modelled unit activity across time and reaching directions like panel F are also displayed in figure 2 of Lillicrap, Scott (2013, Neuron). Electrophysiological recordings of M1 neurons during a similar task in non-human primates can also be seen on “Preserved neural population dynamics across animals performing similar behaviour” figure 2 B (https://doi.org/10.1101/2022.09.26.509498) and “Nonlinear manifolds underlie neural population activity during behaviour” figure 2 B as well (https://doi.org/10.1101/2023.07.18.549575). Note that these two pre-prints use the same dataset.

      We have added these citations to the text and made it explicit that they contain visualizations of similar modelling and empirical data for comparison:

      “This heterogeneous set of responses matches empirical observations in non-human primate primary motor cortex recordings (Churchland & Shenoy, 2007; Michaels et al., 2016) and replicate similar visualizations from previously published work (Fortunato et al., 2023; Lillicrap & Scott, 2013; Safaie et al., 2023).”

      (4) All examples in the paper use the arm26 plant as effector. Although the authors say that "users can easily declare their own custom-made effector and task objects if desired by subclassing the base Plant and Task class, respectively", this does not sound straightforward. Table 1 does not really clarify how to do it. Maybe an example that shows the actual code (see point 2) that creates a new plant (e.g. the 3-joint arm in Figure 7) would be useful.

      Subclassing is a Python process more than a MotorNet process, as Python is an object-oriented language. Therefore, there are many Python tutorials on subclassing in the general sense that would be beneficial for that purpose. We have amended the main text to ensure that this is clearer to the reader.

      Subclassing a MotorNet object, in a more specific sense, requires overwriting some methods from the base MotorNet classes (e.g., Effector or Environment classes, which correspond to the original Plant and Task object, respectively). Since we made the decision (mentioned above) to not include code in the main text, we added tutorials to the online documentation, which include dedicated tutorials for MotorNet class subclassing. For instance, this tutorial showcases how to subclass Environment classes:

      https://colab.research.google.com/github/OlivierCodol/MotorNet/blob/master/examples/3-environments.ipynb
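
The subclassing pattern described above can be sketched in plain Python. This is only an illustration of overriding one method of a base class: `BaseEnvironment`, `CentreOutReach`, and `sample_goal` are hypothetical names invented for this example and are not MotorNet's API. The linked tutorial documents the methods that actually need to be overridden.

```python
import math
import random


class BaseEnvironment:
    """Hypothetical stand-in for a toolbox base class (not MotorNet's API)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def sample_goal(self):
        # Default behaviour: a goal drawn uniformly from [-1, 1] x [-1, 1].
        return (self.rng.uniform(-1.0, 1.0), self.rng.uniform(-1.0, 1.0))

    def reset(self):
        # Shared logic lives in the base class and calls the overridable hook.
        self.goal = self.sample_goal()
        return self.goal


class CentreOutReach(BaseEnvironment):
    """Subclass that only overrides the goal-sampling hook."""

    def __init__(self, n_targets=8, radius=0.1, seed=0):
        super().__init__(seed=seed)
        self.n_targets = n_targets
        self.radius = radius

    def sample_goal(self):
        # Goals sit on a circle around the origin, as in a centre-out task.
        angle = 2.0 * math.pi * self.rng.randrange(self.n_targets) / self.n_targets
        return (self.radius * math.cos(angle), self.radius * math.sin(angle))


env = CentreOutReach(n_targets=8, radius=0.1)
goal = env.reset()
```

The base class keeps the shared `reset` logic, so the subclass changes behaviour by replacing a single method rather than rewriting the whole object.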

      (5) One potential limitation of the toolbox is that it is based on Tensorflow, when the field of Computational Neuroscience seems to be, or at least that's my impression, transitioning to pyTorch. How easy would it be to translate MotorNet to pyTorch? Maybe the authors could comment on this in the discussion.

      We have received a significant amount of feedback asking for a PyTorch implementation of the toolbox. Consequently, we decided to enact this, and the next version of the toolbox will be exclusively in PyTorch. We will maintain the Application Programming Interface (API) and tutorial documentation for the TensorFlow version of the toolbox on the online website. However, going forward we will focus exclusively on bug-fixing and expanding from the latest version of MotorNet, which will be in PyTorch. We now believe that the greater popularity of PyTorch in the academic community makes that choice more sustainable while helping a greater proportion of research projects.

      These changes led to a significant alteration of the MotorNet structure, which is reflected by changes made throughout the manuscript, notably in Figure 3 and Table 1.

      (6) Supervised learning (SL) is widely used in Systems Neuroscience, especially because it is faster than reinforcement learning (RL). Thus providing the possibility of training the ANNs with SL is an important asset of the toolbox. However, SL is not always ideal, especially when the optimal strategy is not known or when there are different alternative strategies and we want to know which is the one preferred by the subject. For instance, would it be possible to implement a setup in which the ANN has to choose between 2 different paths to reach a target? (e.g. Kaufman et al. 2015 eLife). In such a scenario, RL seems to be a more natural option. Would it be easy to extend MotorNet so it allows training with RL? Maybe the authors could comment on this in the discussion.

      The new implementation of MotorNet that relies on PyTorch is already standardized to use an API that is compatible with Gymnasium. Gymnasium is a standard and popular interfacing toolbox used to link RL agents to environments. It is very well-documented and widely used, which will ensure that users who wish to employ RL to control MotorNet environments will be able to do so relatively effortlessly. We have added this point to accurately reflect the updated implementation, so users are aware that it is now a feature of the toolbox (new section 3.2.4.).
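
For readers unfamiliar with the Gymnasium convention mentioned above, the interaction loop can be sketched as follows. In that convention, `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`. The `PointMassEnv` class and `pd_policy` function are hypothetical stand-ins written in plain Python for illustration; they import neither Gymnasium nor MotorNet, and a hand-written feedback rule takes the place of an RL agent.

```python
class PointMassEnv:
    """Hypothetical 1-D point mass exposing Gymnasium-style reset/step.

    This is a plain-Python stand-in; it does not import Gymnasium or MotorNet.
    """

    def __init__(self, dt=0.01, max_steps=200, goal=0.5):
        self.dt = dt
        self.max_steps = max_steps
        self.goal = goal

    def reset(self, seed=None, options=None):
        self.pos, self.vel, self.t = 0.0, 0.0, 0
        return (self.pos, self.vel), {}

    def step(self, action):
        # The action is a force on a unit mass; integrate with semi-implicit Euler.
        self.vel += action * self.dt
        self.pos += self.vel * self.dt
        self.t += 1
        reward = -abs(self.goal - self.pos)
        terminated = abs(self.goal - self.pos) < 1e-3 and abs(self.vel) < 1e-2
        truncated = self.t >= self.max_steps
        return (self.pos, self.vel), reward, terminated, truncated, {}


def pd_policy(obs, goal=0.5, kp=40.0, kd=12.0):
    # A hand-written proportional-derivative rule stands in for an RL agent.
    pos, vel = obs
    return kp * (goal - pos) - kd * vel


env = PointMassEnv()
obs, info = env.reset()
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(pd_policy(obs))
    done = terminated or truncated
```

Any agent that speaks this reset/step contract can in principle be swapped in for `pd_policy` without changing the loop.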

      Impact:

      MotorNet aims at simplifying the process of simulating complex experimental setups to rapidly test hypotheses about how the brain produces a specific movement. By providing an end-to-end pipeline to train ANNs on the simulated setup, it can greatly help guide experimenters to decide where to focus their experimental efforts.

      Additional context:

      Being the main result a toolbox, the paper is complemented by a GitHub repository and a documentation webpage. Both the repository and the webpage are well organized and easy to navigate. The webpage walks the user through the installation of the toolbox and the building of the effectors and the ANNs.

      Reviewer #2 (Public Review):

      MotorNet aims to provide a unified interface where the trained RNN controller exists within the same TensorFlow environment as the end effectors being controlled. This architecture provides a much simpler interface for the researcher to develop and iterate through computational hypotheses. In addition, the authors have built a set of biomechanically realistic end effectors (e.g., a 2-joint arm model with realistic muscles) within TensorFlow that are fully differentiable.

      MotorNet will prove a highly useful starting point for researchers interested in exploring the challenges of controlling movement with realistic muscle and joint dynamics. The architecture features a conveniently modular design and the inclusion of simpler arm models provides an approachable learning curve. Other state-of-the-art simulation engines offer realistic models of muscles and multi-joint arms and afford more complex object manipulation and contact dynamics than MotorNet. However, MotorNet's approach allows for direct optimization of the controller network via gradient descent rather than reinforcement learning, which is a compromise currently required when using other simulation engines (as these engines' code cannot be differentiated through).
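
The idea of optimizing a control signal by differentiating through the simulated plant can be illustrated with a toy sketch. This is not MotorNet code: it assumes a damped unit point mass, a single trainable constant force in place of a network, and a hand-rolled forward-mode automatic differentiation class (`Dual`) in place of a deep learning framework. All names and parameter values are invented for the example.

```python
class Dual:
    """Minimal forward-mode autodiff value: a number and its derivative."""

    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.grad + other.grad)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.grad - other.grad)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.val * other.val,
                    self.grad * other.val + self.val * other.grad)

    __rmul__ = __mul__


def simulate(force, goal=0.3, dt=0.01, n_steps=100, damping=1.0):
    """Roll out a damped unit point mass; return squared end-point error and position."""
    pos, vel = Dual(0.0), Dual(0.0)
    for _ in range(n_steps):
        vel = vel + (force - damping * vel) * dt
        pos = pos + vel * dt
    err = pos - goal
    return err * err, pos


force = 0.0
for _ in range(50):
    loss, _ = simulate(Dual(force, 1.0))  # seed d(force)/d(force) = 1
    force -= 2.0 * loss.grad              # plain gradient-descent step

final_loss, final_pos = simulate(Dual(force))
```

Because every operation in the rollout carries its derivative along, the gradient of the end-point error with respect to the force is available directly, with no need for trial-and-error exploration.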

      The paper could be reorganized to provide clearer signposts as to what role each section plays (e.g., that the explanation of the moment arms of different joint models serves to illustrate the complexity of realistic biomechanics, rather than a novel discovery/exposition of this manuscript). Also, if possible, it would be valuable if the authors could provide more insight into whether gradient descent finds qualitatively different solutions to RL or other non gradient-based methods. This would strengthen the argument that a fully differentiable plant is useful beyond improving training time / computational power required (although this is a sufficiently important rationale per se).

      We thank the reviewer for these comments. We agree that more clarity on the section goals may improve the reader’s experience and ensured this is the case throughout the manuscript. Particularly, we added the following on the first paragraph of section 2.3, for which an explicit goal was most missing:

      “In this section we illustrate some of these biomechanical properties displayed by MotorNet effectors using specific examples. These properties are well-characterised in the biology and are often implemented in realistic biomechanical simulation software.”

      Regarding the potential difference in solutions obtained from reinforcement or supervised learning, this would represent a non-trivial amount of work to do so conclusively and so may not be within the scope of the current article. We do appreciate however that in some situations RL may be a more fitting approach to a given task design. In relation to this point we now specify in the discussion that the new API can accommodate interfacing with reinforcement learning toolboxes for those who may want to pursue this type of policy training approach when appropriate (new section 3.2.4.).

      Reviewer #3 (Public Review):

      Artificial neural networks have developed into a new research tool across various disciplines of neuroscience. However, specifically for studying neural control of movement it was extremely difficult to train those models, as they require not only simulating the neural network, but also the body parts one is interested in studying. The authors provide a solution to this problem which is built upon one of the main software packages used for deep learning (Tensorflow). This allows them to make use of state-of-the-art tools for training neural networks.

      They show that their toolbox is able to (re-)produce several commonly studied experiments e.g., planar reaching with and without loads. The toolbox is described in sufficient detail to get an overview of the functionality and the current state of what can be done with it. Although the authors state that only a few lines of code can reproduce such an experiment, they unfortunately don't provide any source code to reproduce their results (nor is it given in the respective repository).

      The possibility of adding code snippets to the article is something we originally considered, and which aligns with comment two from reviewer one (see above). Hopefully this provides a good overview of the motivation behind our choice not to add code to the article.

      The modularity of the presented toolbox makes it easy to exchange or modify single parts of an experiment e.g., the task or the neural network used as a controller. Together with the open-source nature of the toolbox, this will facilitate sharing and reproducibility across research labs.

      I can see how this paper can enable a whole set of new studies on neural control of movement and accelerate the turnover time for new ideas or hypotheses, as stated in the first paragraph of the Discussion section. Having such a low effort to run computational experiments will be definitely beneficial for the field of neural control of movement.

      We thank the reviewer for these comments.

    2. eLife assessment

      This work will be of interest to the motor control community as well as neuroAI researchers interested in how bodies constrain neural circuit function. The authors present "MotorNet", a useful software package to train artificial neural networks to control a biomechanical model of an effector. The manuscript provides solid evidence that MotorNet is easy to use and can reproduce past results in the field, both at the neural and behavioural levels. Validation is limited to planar arm-like plants or point-masses, so future work exploring three-dimensional movements and other types of plants would strengthen the impact of the tool.

    3. Reviewer #1 (Public Review):

      Summary:<br /> Codol et al. present a toolbox that allows simulating biomechanically realistic effectors and training Artificial Neural Networks (ANNs) to control them. The paper provides a detailed explanation of how the toolbox is structured and several examples demonstrating its utility.

      Main comments:<br /> (1) The paper is well-written and easy to follow. The schematics facilitate understanding of the toolbox's functionality, and the examples give insight into the potential results users can achieve.<br /> (2) The toolbox's latest version, developed in PyTorch, is expected to offer greater benefits to the community.<br /> (3) The new API, being compatible with Gymnasium, broadens the toolbox's application scope, enabling the use of Reinforcement Learning for training the ANNs.

      Impact:<br /> MotorNet is designed to simplify the process of simulating complex experimental setups, enabling the rapid testing of hypotheses on how the brain generates specific movements. Implemented in PyTorch and compatible with widely-used machine learning toolboxes, including Gymnasium, it offers an end-to-end pipeline for training ANNs on simulated setups. This can greatly assist experimenters in determining the focus of their subsequent efforts.

      Additional context:<br /> The main outcome of the work, a toolbox, is supplemented by a GitHub repository and a documentation webpage. Both the repository and the webpage are well-organized and user-friendly. The webpage guides users through the toolbox installation process, as well as the construction of effectors and Artificial Neural Networks (ANNs).

    4. Reviewer #2 (Public Review):

      MotorNet aims to provide a unified interface where the trained RNN controller exists within the same TensorFlow environment as the end effectors being controlled. This architecture provides a much simpler interface for the researcher to develop and iterate through computational hypotheses. In addition, the authors have built a set of biomechanically realistic end effectors (e.g., a 2 joint arm model with realistic muscles) within TensorFlow that are fully differentiable.

      MotorNet will prove a highly useful starting point for researchers interested in exploring the challenges of controlling movement with realistic muscle and joint dynamics. The architecture features a conveniently modular design and the inclusion of simpler arm models provides an approachable learning curve. Other state-of-the-art simulation engines offer realistic models of muscles and multi-joint arms and afford more complex object manipulation and contact dynamics than MotorNet. However, MotorNet's approach allows for direct optimization of the controller network via gradient descent rather than reinforcement learning, which is a compromise currently required when using other simulation engines (as these engines' code cannot be differentiated through).

      The paper has been reorganized to provide clearer signposts to guide the reader. Importantly, the software has been rewritten atop PyTorch which is increasingly popular in ML and computational neuroscience research.

      One paragraph in the discussion regarding a "spinal cord" module is a bit perplexing. Quite sensibly, the software architecture partitions motor control into the plant or effector (the physical body being moved) and the controller (a model of the brain and spinal cord). Of course, the authors certainly appreciate this, though a reader from outside of neuro might not realize that control of movement is distributed throughout the central nervous system, spanning a network of spinal, subcortical (cerebellum, basal ganglia, thalamus, brainstem), and cortical brain regions. Casting the spinal cord as a pre-filter within the effector module would seem to belie its complex and dynamic role in these distributed neural circuits. This is particularly noticeable when contrasted with the subsequent paragraph on "Modular policies" (which is excellent). In my view, the spinal cord would be better treated as a module of this policy section rather than as part of the effector. I understand the nuance here, and suspect I'd see eye to eye with the authors for the most part. The choice of controller vs. plant depends on perspective (one could call the arm itself part of the controller, and treat the environment / manipulated object as the plant; similarly, one could treat the brain as controlling the cord rather than the body). However, I fear that someone lacking the appropriate neurophysiological/anatomical context might read the "Spinal Compartment" paragraph, think that it would be fine to introduce a simple filter module as the spinal cord, and then start referring to the MotorNet policy network as a model of motor cortex per se.

    1. Author Response

      This important work presents a new methodology for the statistical analysis of fiber photometry data, improving statistical power while avoiding the bias inherent in the choices that are necessarily made when summarizing photometry data. The reanalysis of two recent photometry data sets, the simulations, and the mathematical detail provide convincing evidence for the utility of the method and the main conclusions. However, the discussion of the re-analyzed data is incomplete and would be improved by a deeper consideration of the limitations of the original data. In addition, consideration of other data sets and photometry methodologies including non-linear analysis tools, as well as a discussion of the importance of the data normalization are needed.

      Thank you for the thorough and positive review of our work! We will incorporate this feedback to strengthen the manuscript. Specifically, we plan to revise the Discussion section to include a deeper consideration of the limitations of the original data, a description of the capacities of our method for conducting non-linear analyses, and the role data normalization plays in applicability of our tool.

      Reviewer 1:

      Strengths:

      The framework the authors present is solid and well-explained. By reanalyzing formerly published data, the authors also further increase the significance of the proposed tool opening new avenues for reinterpreting already collected data.

      Weaknesses:

      However, this also leads to several questions. The normalization method employed for raw fiber photometry data is different from lab to lab. This imposes a significant challenge to applying a single tool of analysis.

      Thank you for the positive feedback, we will address your comments in our revision. We agree that any data pre-processing steps will have down-stream consequences on the statistical inference from our method. Note, though, that this would also be the case with standard analysis approaches (e.g., t-tests, correlations) applied to summary measures like AUCs. For that reason, we do not believe that variability in pre-processing is an impediment to widespread adoption of a standard analysis procedure. Rather, we argue that the sensitivity of analysis results to pre-processing choices underscores the need for establishing statistical techniques that reduce the need for pre-processing, and properly account for structure in the data arising from experimental designs. The reviewer brings up an excellent point that we can further elaborate on how our methods actually reduce the need for such pre-processing steps. Indeed, our method provides smooth estimation results along the functional domain (i.e., across trial timepoints), has the ability to adjust for between-trial and -animal heterogeneity, and provides a valid statistical inference framework that quantifies the resulting uncertainty. For example, adjustment for session-to-session variability in signal magnitudes or dynamics could be accounted for, at least in part, through the inclusion of session-level random effects. This heterogeneity would then influence the width of the confidence intervals. This stands in contrast to “sweeping it under the rug” with a pre-processing step that may have an unknown impact on the final statistical inferences. Similarly, the level of smoothing is at least in part selected as a function of the data, and again is accounted for directly in the equations used to construct confidence intervals. 
      In sum, our method provides both a tool to account for challenges in the data, and a systematic framework to quantify the additional uncertainty that accompanies accounting for those data characteristics.

      Does the method that the authors propose work similarly efficiently whether the data are normalized in a running average dF/F as it is described in the cited papers? For example, trace smoothing using running averages (Jeong et al. 2022) in itself may lead to pattern dilution. The same question applies if the z-score is calculated based on various responses or even baselines.

      This is an important question given how common this practice is in the field. Briefly, application of pre-processing steps will change the interpretation of the results from our analysis method. For example, if one subtracts off a pre-trial baseline average from each trial timepoint, then the “definition of 0”, and the interpretation of coefficients and their statistical significance, changes. Similarly, if one scales the signal (e.g., divides the signal magnitude by a trial- or animal-specific baseline), then this changes the interpretation of the FLMM regression coefficients to be in terms of an animal-specific signal unit as opposed to a raw dF/F. This is, however, not specific to our technique, and pre-processing would have a similar influence on, for example, linear regression (and thus t-tests, ANOVAs and Pearson correlations) applied to summary measures. We agree with the reviewer that explicitly discussing this point will strengthen the paper.
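      As a minimal sketch of the point above, the snippet below fits an ordinary least-squares line at a single trial timepoint to simulated data, before and after two common pre-processing steps. The covariate, baseline value and effect size are made-up assumptions for illustration; this is plain linear regression, not the FLMM fit itself. It suggests that baseline subtraction moves the "definition of 0" (the intercept) while leaving the slope alone, whereas baseline scaling changes the units of the slope.

```python
import numpy as np

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ a + b * x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

rng = np.random.default_rng(0)
n_trials = 200
covariate = rng.normal(size=n_trials)   # hypothetical trial-level covariate
baseline = 2.0                          # hypothetical pre-trial baseline, in dF/F units
signal = baseline + 0.5 * covariate + rng.normal(scale=0.1, size=n_trials)

raw = ols(covariate, signal)
shifted = ols(covariate, signal - baseline)   # baseline subtraction
scaled = ols(covariate, signal / baseline)    # baseline scaling

print("raw      (intercept, slope):", raw)
print("shifted  (intercept, slope):", shifted)
print("scaled   (intercept, slope):", scaled)
```

      The fit is equally valid in all three cases; what changes is what a coefficient of a given size means.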

      While it is difficult to make general claims about the anticipated performance of the method under all the potential pre-processing steps taken in the field, we believe that most common pre-processing strategies will not negatively influence the method’s performance or validity; they would, instead, change the interpretation of the results. We are releasing a series of vignettes to guide analysts through using our method and, to address your comment, we will add a section on interpretation after pre-processing.

      How reliable is the method if the data are non-stationary and the baselines undergo major changes between separate trials?

      This is an excellent question. We believe the statistical inferences will be valid and will properly quantify the uncertainty from non-stationarities, since our framework does not impose stationarity assumptions on the underlying process. It is worth mentioning that non-stationarity and high trial-to-trial variability may increase variance estimates if the model does not include a rich enough set of covariates to capture the source of the heterogeneity across trial baselines. However, this is a feature of our framework, rather than a bug, as it properly conveys to the analyst that high unaccounted for variability in the signal may result in high model uncertainty. Finally, mixed effects modeling provides a transparent, statistically reasonable, and flexible approach to account for between-session, and between-trial variability, a type of non-stationarity. We agree with the reviewer that this should be more explicitly discussed in the paper, and will do so.

      Finally, what is the rationale for not using non-linear analysis methods? Following the paper's logic, non-linear analysis can capture more information that is diluted by linear methods.

      Functional data analysis assumes that the function varies smoothly along the functional domain (i.e., across trial timepoints). It is a type of non-linear modeling technique over the functional domain since we do not assume a linear model (straight line). Therefore, our functional data analysis approach is able to capture more information that is diluted by linear models. While the basic form of our model assumes a linear change in the signal at a fixed trial timepoint, across trials/sessions, our package allows one to easily model changes with non-linear functions of covariates using splines or other basis functions. One must consider, however, the tradeoff between flexibility and interpretability when specifying potentially complex models.
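      To illustrate the last point with a minimal sketch: a model can remain linear in its coefficients while being non-linear in a covariate, simply by expanding that covariate in a basis. The example below uses a polynomial basis as a stand-in for the splines mentioned above (a deliberate simplification), on noiseless made-up data, and does not use the package's own interface.

```python
import numpy as np

def poly_basis(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, N=degree + 1, increasing=True)

def fit_rss(X, y):
    """Least-squares coefficients and residual sum of squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)

x = np.linspace(-1.0, 1.0, 50)   # hypothetical covariate values
y = 1.0 + 2.0 * x**2             # noiseless quadratic relationship

coef_lin, rss_lin = fit_rss(poly_basis(x, 1), y)     # straight line in x
coef_quad, rss_quad = fit_rss(poly_basis(x, 2), y)   # quadratic basis in x

print("straight-line RSS:", rss_lin)
print("quadratic-basis RSS:", rss_quad)
```

      The richer basis recovers the curve, at the cost of coefficients that are harder to interpret individually, which is the flexibility-interpretability tradeoff noted above.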

      Reviewer #2

      Strengths:

      The open-source R package, which uses syntax similar to that of the lme4 package to implement this framework on photometry data, enhances accessibility and usage by other researchers. Moreover, the decreased fitting time of the model in comparison with a similar package on simulated data means the method has the potential to be more easily adopted.

      The reanalysis of two studies using summary statistics on photometry data (Jeong et al., 2022; Coddington et al., 2023) highlights how trial-by-trial analysis at each time-point on the trial can reveal information obscured by averaging across trials. Furthermore, this work also exemplifies how session and subject variability can lead to opposite conclusions when not considered.

      Thank you for the positive assessment of our work!

      Weaknesses:

      Although this work has reanalyzed previous work that used summary statistics, it does not compare with other studies that use trial-by-trial photometry data across time-points in a trial.

      As described by the authors, fitting pointwise linear mixed models and performing t-tests with Benjamini-Hochberg correction, as in Lee et al. (2019), has some caveats. Using joint confidence intervals has the potential to improve statistical robustness; however, this is not directly shown with temporal data in this work. Furthermore, it is unclear how FLMM differs from the pointwise linear mixed modeling used in this work.

      We agree with the reviewers that providing more detail about the drawbacks of the approach applied in Lee et al., 2019 will strengthen the paper. We will add an example analysis applying the method proposed by Lee et al., 2019 to show how the set of timepoints at which coefficient estimates reach statistical significance can vary dramatically depending on the sampling rate one subsamples their data at, a highly undesirable property of this strategy. Our approach is robust to this, and still provides a multiple comparisons correction through the joint confidence intervals.
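      A toy sketch of why a pointwise test followed by Benjamini-Hochberg (BH) correction can depend on the sampling rate: the p-value at a given timepoint is unchanged by subsampling, but the number of tests in the BH family is not. The p-values below are made up for illustration and are not taken from any dataset; the function is a plain implementation of the BH step-up rule, not the code of Lee et al., 2019.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject

# Hypothetical pointwise p-values at 8 consecutive timepoints
p_dense = np.array([0.04, 0.9, 0.8, 0.7, 0.03, 0.6, 0.5, 0.9])
p_sub = p_dense[::4]   # same tests, keeping every 4th timepoint

reject_dense = benjamini_hochberg(p_dense)
reject_sub = benjamini_hochberg(p_sub)

print("rejected at full rate:   ", np.nonzero(reject_dense)[0])
print("rejected after subsample:", np.nonzero(reject_sub)[0] * 4)
```

      In this constructed case the same two timepoints are declared significant only after subsampling, even though nothing about the underlying tests has changed.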

      In this work, FLMM usages included only one or two covariates. However, in complex behavioral experiments, where variables are correlated, more than two may be needed (see Simpson et al. (2023); Engelhard et al. (2019); Blanco-Pozo et al. (2024)). It is not clear from this work how computationally feasible it would be to fit such complex models, which would also include more complex random effects.

      This is a good point. In our experience, the code is still quite fast (often taking seconds to tens of seconds) on a standard laptop when fitting complex models that include, for example, 10 covariates, or complex random effect specifications on dataset sizes common in fiber photometry. In the manuscript, we included results from simpler models with few covariates in an attempt to show results from the FLMM versions of the standard analyses (e.g., correlations, t-tests) applied in Jeong et al., 2022. Our goal was to show that our method reveals effects obscured by standard analyses even in simple cases. Some of our models did, however, include complex nested random effects (e.g., the models described in Section 4.5.2).

      Like other mixed-model based analyses, our method becomes slower when the number of observations in the dataset is on the order of tens of thousands of observations. However, we coded the methods to be memory efficient so that even these larger analyses can be run on standard laptops. We thank the reviewer for this point, as we worked extremely hard to scale the method to be able to efficiently fit models commonly applied in neuroscience. Indeed, challenges with scalability were one of the main motivations for applying the estimation procedure that we did; in the appendix we show that the fit time of our approach is much faster than existing FLMM software such as the refund package function pffr(), especially for large sample sizes. While pffr() appears to scale exponentially with the number of clusters (e.g., animals), our method appears to scale linearly. We will more explicitly emphasize the scalability in the revision, since we agree this will strengthen the final manuscript.

      Reviewer #3

      Strengths:

      The statistical framework described provides a powerful way to analyze photometry data and potentially other similar signals. The provided package makes this methodology easy to implement and the extensively worked examples of reanalysis provide a useful guide to others on how to correctly specify models.

      Modeling the entire trial (function regression) removes the need to choose appropriate summary statistics, removing the opportunity to introduce bias, for example in searching for optimal windows in which to calculate the AUC. This is demonstrated in the re-analysis of Jeong et al., 2022, in which the AUC measures presented masked important details about how the photometry signal was changing.

      Meanwhile, using linear mixed methods allows for the estimation of random effects, which are an important consideration given the repeated-measures design of most photometry studies.

      Thank you for the positive assessment of our work!

      Weaknesses:

      While the availability of the software package (fastFMM), the provided code, and worked examples used in the paper are undoubtedly helpful to those wanting to use these methods, some concepts could be explained more thoroughly for a general neuroscience audience.

      We appreciate this and, to address your and other reviewers’ comments, we are creating a series of vignettes walking users through how to analyze photometry data with our package. We will include algebraic illustrations to help users gain familiarity with the regression modeling here.

      While the methodology is sound and the discussion of its benefits is good, the interpretation and discussion of the re-analyzed results are poor:

      In section 2.3, the authors use FLMM to identify an instance of Simpson's Paradox in the analysis of Jeong et al. (2022). While this phenomenon is evident in the original authors' metrics (replotted in Figure 5A), FLMM provides a convenient method to identify these effects while illustrating the deficiencies of the original authors' approach of concatenating a different number of sessions for each animal and ignoring potential within-session effects. The discussion of this result is muddled. Having identified the paradox, there is some appropriate speculation as to what is causing these opposing effects, particularly the decrease within sessions. In the discussion and appendices, the authors identify (1) changes in satiation/habituation/motivation, (2) the predictability of the rewards (presumably by the click of a solenoid valve) and (3) photobleaching as potential explanations of the decrease within days. Having identified these effects, but without strong evidence to rule all three out, the discussion of whether RPE or ANCCR matches these results is probably moot. In particular, the hypotheses developed by Jeong et al. were for a random (unpredictable) rewards experiment, whereas the evidence points to the rewards being sometimes predictable. The learning of that predictability (e.g. over sessions) and variation in predictability (e.g. by attention level to sounds of each mouse) significantly complicate the analysis. The FLMM analysis reveals the complexity of analyzing what is apparently a straightforward task design.

      While we are disappointed to hear the reviewer felt our initial interpretations and discussion were poor, the reviewer brings up an excellent point that we had not considered. They have convinced us that acknowledging and elaborating on this alternative perspective will strengthen the paper. We agree that the ANCCR/RPE model predictions were made for unpredictable rewards and, as the reviewer rightly points out, there is evidence that the animals sense the reward delivery. Regardless of the learning theory one adopts (RPE, ANCCR or others), we agree that this (potentially) learned predictability alone could account for the increase in signal magnitude across sessions.

      After reading the reviewer’s comments, we consulted with a number of researchers in this area, and several felt that a CS+ can serve as a reward in itself. From this perspective, the rewards in the Jeong et al., 2022 experiment might still be considered unexpected. After discussing extensively with the authors of Jeong et al., 2022, it is clear that they went to enormous trouble to prevent the inadvertent generation of a CS+, and it is likely changes in pressure from the solenoid (rather than a sound) that served as a cue. This underscores the difficulty of preventing perception of reward delivery in practice. As this paper is focused on analysis approaches, we feel that we can contribute most thoughtfully to the dopamine–learning theory conversation by presenting both sides.

      Overall, we agree with the reviewer that future experiments will be needed for testing the accuracy of the models’ predictions for random (unpredicted) rewards. While we understand that our attempt to document our conversations with the Jeong et al., 2022 authors may have room for improvement, we hope the reviewer can appreciate that this was done with the best of intentions. We wish to emphasize that we also consulted with several other researchers in the field when crafting the discussion. The Jeong et al., 2022 authors could easily have avoided acknowledging the potential incompleteness of their theory, by claiming that our results do not invalidate their predictions for a random reward, as the reward was not unpredicted in the experiment (as a result of the inadvertent solenoid CS+). Instead, they went out of their way to emphasize that their experiment did test a random reward, and that our results do present problems for their theory. We think that engagement with re-analyses of one’s data, even when findings are inconvenient, is a good demonstration of open science practice. For that reason as well, we feel providing readers with a perspective on the entire discussion will contribute to the scientific discourse in this area.

      Finally, we would like to reiterate that this conversation is happening because our method, by analyzing the signal at every trial timepoint, revealed a neural signal that appears to indicate that the animals sense reward delivery. Ultimately, this was what we set out to do: help researchers ask questions of their data that they could not ask before. We believe that having a demonstration that we can indeed do this for a “live” issue is the most appropriate way of demonstrating the usefulness of the method.

      It is clear the reviewer put a lot of time into understanding what we did, and was very thoughtful about the feedback. We would like to thank the reviewer again for taking such care in reviewing our paper.
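      For readers less familiar with the Simpson's paradox pattern at issue in this exchange, here is a minimal sketch with constructed numbers (not the Jeong et al., 2022 data). The response falls with trial number inside every session but rises from one session to the next, so a regression that pools all trials reports a positive trend while every within-session trend is negative. The session count, trial count and effect sizes are arbitrary assumptions.

```python
import numpy as np

def slope(x, y):
    """Least-squares slope of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

n_sessions, trials_per_session = 5, 10
trial_in_session = np.arange(trials_per_session)

x_all, y_all, within_slopes = [], [], []
for s in range(n_sessions):
    trial_number = s * trials_per_session + trial_in_session   # cumulative trial count
    response = 3.0 * s - 0.1 * trial_in_session                # up across sessions, down within
    x_all.append(trial_number)
    y_all.append(response)
    within_slopes.append(slope(trial_number, response))

pooled_slope = slope(np.concatenate(x_all), np.concatenate(y_all))

print("within-session slopes:", within_slopes)
print("pooled slope:", pooled_slope)
```

      Which of the two trends is the scientifically relevant one depends on the question being asked, which is why the model needs to represent the session structure explicitly.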

      If this paper is not trying to arbitrate between RPE and ANCCR, as stated in the text, the post hoc reasoning of the authors of Jeong et al 2022 provided in the discussion is not germane.

      While we appreciate that the post hoc reasoning of the authors of Jeong et al., 2022 may not seem germane, we would like to provide some context for its inclusion. As statisticians and computer scientists, our role is to create methods, and this often requires using open source data and recreating past analyses. This usually involves extensive conversation with authors about their data and analysis choices because, if we cannot reproduce their findings using their analysis methods, we cannot verify that results from our own methods are valid. As such, we prefer to conduct method development in a collaborative fashion, and we strive to constructively, and respectfully, discuss our results with the original authors. We feel that giving them the opportunity to suggest analyses, and express their point of view if our results conflict with their original conclusions, is important, and we do not want to discourage authors from making their datasets public. As such, we conducted numerous analyses at the suggestion of Jeong et al., 2022 and discussed the results over the course of many months. Indeed the analyses in the Appendix that the reviewer is referring to were conducted at the suggestion of the authors of Jeong et al., 2022, in an attempt to rule out alternative explanations. We nevertheless appreciate that our interpretations of these results can include some of the caveats suggested by the reviewer, and we will strive to improve these sections.

      Arbitrating between the models likely requires new experimental designs (removing the sound of the solenoid, satiety controls) or more complex models (e.g. with session effects, measures of predictability) that address the identified issues.

      We agree with the reviewer that the results suggest that new experimental designs will likely be necessary to adjudicate between models. It is our hope that, by weighing the different issues and interpretations, our paper might provide useful suggestions into what experimental designs would be most beneficial to rule out competing hypotheses in future data collection efforts. We believe that our methodology will strengthen our capacity to design new experiments and analyses. We will make the reviewer’s suggestions more explicit in the discussion by emphasizing the limitations of the original data.

      Of the three potential causes of within-session decreases, the photobleaching arguments advanced in the discussion and expanded greatly in the appendices are not convincing. The data being modeled is a processed signal (ΔF/F) with smoothing and baseline correction and this does not seem to have been considered in the argument.

      We are disappointed to hear that this extensive set of analyses, much of which was conducted at the suggestion of Jeong et al., 2022, was not convincing. We agree that acknowledging any pre-processing would provide useful context for the reader. We do wish to clarify that we analyzed the data that were made available online (raw data was not available). Moreover, for comparison with the authors’ results, we felt it was important to maintain the same pre-processing steps as they did. These conditions were held constant across analysis approaches; therefore, we think that the changes within-trial are likely not influenced substantially by these pre-processing choices. While we cannot speak definitively to the impact any of the processing conducted by the authors had on the results, we believe that it was likely minor, given that the timing of signals at other points in the trial, and in other experiments, were as expected (e.g., the signal rose rapidly after cue onset in Pavlovian tasks).

      Furthermore, the photometry readout is also a convolution of the actual concentration changes over time, influenced by the on-off kinetics of the sensor, which makes the interpretation of timing effects of photobleaching less obvious than presented here and more complex than the dyes considered in the cited reference used as a foundation for this line of reasoning.

      We appreciate the nuance of this point, and we will add it to our discussion. In response to your criticism, we have consulted with more experts in the field regarding the potential for bleaching in these data, and it is not clear to us why photobleaching would be visible in one time-window of a trial, but not in another (less than a second away), despite high dF/F magnitudes in both time-windows. We do wish to point out that, at the request of the authors, we analyzed many experiments from the same animals and in most cases did not observe other indications of photobleaching. Hence, it is not clear to us why this particular set of experiments would garner additional skepticism regarding the potential for photobleaching to invalidate results. While the role of photobleaching may be more complicated with this sensor than others in the references, that citation was included, at the suggestion of Jeong et al., 2022, simply as a way of acknowledging that non-linearities in photobleaching can occur.

      Within this discussion of photobleaching, the characterization of the background reward experiments used in part to consider photobleaching (appendix 7.3.2) is incorrect. In this experiment (Jeong et al., 2022), background rewards were only delivered in the inter-trial-interval (i.e. not between the CS+ and predicted reward as stated in the text). Both in the authors' description and in the data, there is a 6s before cue onset where rewards are not delivered and while not described in the text, the data suggests there is a period after a predicted reward when background rewards are not delivered. This complicates the comparison of this data to the random reward experiment.

      Thank you for pointing this out! We will remove the parenthetical on page 18 of the appendix that incorrectly stated that rewards can occur between the CS+ and the predicted reward.

      The discussion of the lack of evidence for backpropagation, taken as evidence for ANCCR over RPE, is also weak.

      This point was meant to acknowledge that, although our method yields results that conflict with the conclusions described by Jeong et al., 2022 on data from some experiments, on other experiments our method supports their results. Again, we believe that a critical part of open science is acknowledging both areas where analyses support and conflict with those of the original authors. We agree with the reviewer that qualifying our results so as not to emphasize support for/against RPE/ANCCR will strengthen our paper, and we will make these changes.

      A more useful exercise than comparing FLMM to the methods and data of Jeong et al., 2022, would be to compare against the approach of Amo et al., 2022, which identifies backpropagation (data publicly available: DOI: 10.5061/dryad.hhmgqnkjw). The replication of a positive result would be more convincing of the sensitivity of the methodology than the replication of a negative result, which could be a result of many factors in the experimental design. Given that the Amo et al. analysis relies on identifying systematic changes in the timing of a signal over time, this would be particularly useful in understanding if the smoothing steps in FLMM obscure such changes.

      Thank you for this suggestion, and we agree this could be a useful analysis for the field. Your thoughtful review has convinced us that focusing on our statistical contribution will strengthen the paper, and we will make changes to further emphasize that we are not seeking to adjudicate between RPE/ANCCR. We only had space in the manuscript to include a subset of the analyses conducted on Jeong et al., 2022, and had to relegate the results from the Coddington et al. data to an appendix. Realistically, it would be hard for us to justify analyzing a third dataset. As you may surmise from the one we presented, reanalyzing a new dataset is usually very time consuming, and invariably requires extensive communication with the original authors. We did include numerous examples in our manuscript where we already replicated positive results, in a way that we believe demonstrates the sensitivity of the methodology. We have also been working with five groups at NIH and elsewhere using our approach, in experiments targeting different scientific questions. In fact, one paper that extensively applies our method and compares the results with those yielded by standard analysis of AUCs is already accepted and in press. Hence there should soon be additional demonstrations of what the method can do in less controversial settings. Finally, our forthcoming vignettes include additional analyses, not included in the manuscript, that replicate positive results. We take your point that our description of the data supporting one theory or the other should be qualified, and we will correct that. Again, your review was very thorough, and we appreciate your taking so much time to help us improve our work.

      Reviewer #2 (Recommendations For The Authors):

      First, I would like to commend the authors for the clarity of the paper, and for creating an open-source package that will help researchers more easily adopt this type of analysis.

      Thank you!

      I would suggest the authors consider adding to the manuscript either some evidence or some intuition on how feasible it would be to use FLMM for very complex model specifications, in terms of computational cost and model convergence.

      This is an excellent point and we will make this suggested change in the Methods and Discussion section in the next draft.

      From my understanding, this package might potentially be useful not just for photometry data but also for two-photon recordings for example. If so, I would also suggest the authors add to the discussion this potential use.

      We appreciate your thinking on this point, as it would definitely help expand use of the method. We included a brief point in the Discussion that this package would be useful for other techniques, but we will expand upon this.

      Reviewer #3 (Recommendations For The Authors):

      The authors should define 'function' in context, as well as provide greater detail of the alternate tests that FLMM is compared to in Figure 7. Given the novelty of estimating joint CIs, the authors should be clearer about how this should be reported and how this differs from pointwise CIs (and how this has been done in the past).

      Thank you, this is a very good point and will be critical for helping analysts describe and interpret results. We will add more detail to the Methods section on this point.
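      To make the pointwise-versus-joint distinction concrete, the sketch below uses one common simulation-based construction: draw from the estimated sampling distribution of the coefficient function and take a quantile of the maximum absolute standardized deviation across timepoints. The covariance matrix is a made-up assumption, and this is not necessarily the exact procedure used in the manuscript.

```python
import numpy as np

def joint_critical_value(cov, level=0.95, n_draws=10000, seed=0):
    """Quantile of max_t |Z_t| for Z ~ N(0, corr(cov)), estimated by simulation."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    draws = rng.multivariate_normal(np.zeros(len(sd)), corr, size=n_draws)
    max_abs = np.abs(draws).max(axis=1)
    return float(np.quantile(max_abs, level))

# Hypothetical covariance of a coefficient estimate across 50 trial timepoints:
# constant standard error of 0.2 with smoothly decaying correlation.
t = np.arange(50)
cov = 0.2**2 * 0.9 ** np.abs(t[:, None] - t[None, :])

q_pointwise = 1.959964                 # standard normal 97.5% quantile
q_joint = joint_critical_value(cov)    # wider multiplier for simultaneous coverage

se = np.sqrt(np.diag(cov))
print("pointwise half-width at t=0:", q_pointwise * se[0])
print("joint half-width at t=0:    ", q_joint * se[0])
```

      A pointwise interval is read one timepoint at a time, whereas a joint interval is intended to cover the whole coefficient function at once, so it is wider but supports statements about intervals of time.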

      The authors identify that many photometry studies are complex nested longitudinal designs, using the cohort of 8 animals used in five task designs of Jeong et al. 2022 as an example. The authors miss the opportunity to illustrate how FLMM might be useful in identifying the effects of subject characteristics (e.g. sex, CS+ cue identity).

      This is a great suggestion and we will add this important point to the discussion, especially in light of the factorial designs common in neuroscience experiments.

      In discussing the delay-length change experiment, it would be more accurate to say that proposed versions of RPE and ANCCR do not predict the specific change.

      We will make this change and agree this is a better phrasing.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides valuable insights into how the brain parses the syntactic structure of a spoken sentence. A unique contribution of the work is to use a large language model to quantify how the mental representation of syntactic structure updates as a sentence unfolds in time. Solid evidence is provided that distributive cortical networks are engaged for incremental parsing of a sentence, although the contribution could be further strengthened if the authors would further highlight the main results and clarify the benefit of using a large language model.

      We thank the editors for the overall positive assessment. We have revised our manuscript to further emphasize our main findings and highlight the advantages of using a large language model (LLM) over traditional behavioural and corpus-based data.

      This study aims to investigate the neural dynamics underlying the incremental construction of structured interpretation during speech comprehension. While syntactic cues play an important role, they alone do not define the essence of this parsing process. Instead, this incremental process is jointly determined by the interplay of syntax, semantics, and non-linguistic world knowledge, evoked by the specific words heard sequentially by listeners. To better capture these multifaceted constraints, we derived structural measures from BERT, which dynamically represent the evolving structured interpretation as a sentence unfolds word-by-word.

      Typically, the syntactic structure of a sentence can be represented by a context-free parse tree, such as a dependency parse tree or a constituency-based parse tree, which abstracts away from specific content, assigning a discrete parse depth to each word regardless of its semantics. However, this context-free parse tree merely represents the result rather than the process of sentence parsing and does not elucidate how a coherent structured interpretation is concurrently determined by multifaceted constraints. In contrast, BERT parse depth, trained to approach the context-free discrete dependency parse depth, is a continuous variable. Crucially, its deviation from the corresponding discrete parse depth indicates the preference for the syntactic structure represented by this context-free parse. As BERT processes a sentence delivered word-by-word, the dynamic change of BERT parse depth reflects the incremental nature of online speech comprehension.

      Our results reveal a behavioural alignment between BERT parse depth and human interpretative preference for the same set of sentences. In other words, BERT parse depth could represent a probabilistic interpretation of a sentence’s structure based on its specific contents, making it possible to quantify the preference for each grammatically correct syntactic structure during incremental speech comprehension. Furthermore, both BERT and human interpretations show correlations with linguistic knowledge, such as verb transitivity, and non-linguistic knowledge, like subject noun thematic role preference. Both types of knowledge are essential for achieving a coherent interpretation, in accordance with the “constraint-based hypothesis” of sentence processing.

      Motivated by the observed behavioural alignment between BERT and human listeners, we further investigated BERT structural measures in source-localized EEG/MEG using representational similarity analyses (RSA). This approach revealed the neural dynamics underlying incremental speech comprehension on millisecond scales. Our main findings include: (1) a shift from bi-hemispheric lateral frontal-temporal regions to left-lateralized regions in representing the current structured interpretation as a sentence unfolds, (2) a pattern of sequential activations in the left lateral temporal regions, updating the structured interpretation as syntactic ambiguity is resolved, and (3) the influence of lexical interpretative coherence activated in the right hemisphere over the resolved sentence structure represented in the left hemisphere.
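      For readers unfamiliar with RSA, its core computation can be sketched as follows: build a representational dissimilarity matrix (RDM) over stimuli from the model measures, build another from the neural patterns, and rank-correlate their upper triangles. Everything below is simulated and idealised (the "neural" pattern is an exact rescaled copy of the model features); it does not reproduce the source-localized EEG/MEG pipeline used in the study, and all array names and sizes are assumptions.

```python
import numpy as np

def rdm(patterns):
    """Dissimilarity (1 - Pearson r) between every pair of rows (one row per stimulus)."""
    return 1.0 - np.corrcoef(patterns)

def upper(mat):
    """Off-diagonal upper triangle as a vector."""
    i, j = np.triu_indices(mat.shape[0], k=1)
    return mat[i, j]

def spearman(a, b):
    """Spearman correlation via ranks (assumes no tied values)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

rng = np.random.default_rng(0)
n_sentences, n_features = 20, 8
model_features = rng.normal(size=(n_sentences, n_features))   # hypothetical model measures
neural = 2.0 * np.hstack([model_features, model_features])    # idealised, noise-free "neural" pattern
unrelated = rng.normal(size=(n_sentences, 2 * n_features))    # control with no shared structure

rsa_model = spearman(upper(rdm(model_features)), upper(rdm(neural)))
rsa_control = spearman(upper(rdm(unrelated)), upper(rdm(neural)))

print("model vs neural RDM:  ", rsa_model)
print("control vs neural RDM:", rsa_control)
```

      In practice this comparison is repeated over brain regions and time windows, with appropriate statistical testing, to map where and when a model's representational geometry is reflected in the neural data.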

      From our perspective, the advantages of using an LLM (or deep language model) like BERT are twofold. Conceptually, BERT structural measures offer a deep contextualized structural representation for any given sentence by integrating the multifaceted constraints unique to the specific contents described by the words within that sentence. Modelling this process on a word-by-word basis is challenging to achieve with behavioural or corpus-based metrics. Empirically, as demonstrated in our responses to the reviewers below, BERT measures show better performance compared to behavioural and corpus-based metrics in aligning with listeners’ neural activity. Moreover, when it comes to integrating multiple sources of constraints for achieving a coherent interpretation, BERT measures also show a better fit with the behavioural data of human listeners than corpus-based metrics.

      Taken together, we propose that LLMs, akin to other artificial neural networks (ANNs), can be considered as computational models for formulating and testing specific neuroscientific hypotheses, such as the “constraint-based hypothesis” of sentence processing in this study. However, we by no means overlook the importance of corpus-based and behavioural metrics. These metrics play a crucial role in interpreting and assessing whether and how ANNs simulate human cognitive processes, a fundamental step in employing ANNs to gain new insights into the neural mechanisms of human cognition.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors investigate where and when brain activity is modulated by incoming linguistic cues during sentence comprehension. Sentence stimuli were designed such that incoming words had varying degrees of constraint on the sentence's structural interpretation as participants listened to them unfolding, i.e. due to varying degrees of verb transitivity and the noun's likelihood of assuming a specific thematic role. Word-by-word "online" structural interpretations for each sentence were extracted from a deep neural network model trained to reproduce language statistics. The authors relate the various metrics of word-by-word predicted sentence structure to brain data through a standard RSA approach at three distinct points of time throughout sentence presentation. The data provide convincing evidence that brain activity reflects preceding linguistic constraints as well as integration difficulty immediately after word onset of disambiguating material.

      We thank Reviewer #1 (hereinafter referred to as R1) for their recognition of the objectives of our study and the analytical approaches we have employed in this study.

      The authors confirm that their sentence stimuli vary in degree of constraint on sentence structure through independent behavioral data from a sentence continuation task. They also show a compelling correlation of these behavioral data with the online structure metric extracted from the deep neural network, which seems to pick up on the variation in constraints. In the introduction, the authors argue for the potential benefits of using deep neural network-derived metrics given that it has "historically been challenging to model the dynamic interplay between various types of linguistic and nonlinguistic information". Similarly, they later conclude that "future DLMs (...) may provide new insights into the neural implementation of the various incremental processing operations(...)".

      We appreciate R1’s positive comments on the design, quantitative modelling and behavioural validation of the sentence stimuli used in this experiment.

      By incorporating structural probing of a deep neural network, a technique developed in the field of natural language processing, into the analysis pipeline for investigating brain data, the authors indeed take an important step towards establishing advanced machine learning techniques for researching the neurobiology of language. However, given the popularity of deep neural networks, an argument for their utility should be carefully evidenced.

      We fully concur with R1 regarding the need for cautious evaluation and interpretation of deep neural networks’ utility. In fact, this perspective underpinned our decision to conduct extensive correlation analyses using both behavioural and corpus-based metrics to make sense of BERT metrics. These analyses were essential to interpret and validate BERT metrics before employing them to investigate listeners’ neural activity during speech comprehension. We do not in any way undermine the importance of behavioural or corpus-based data in studying language processing in the brain. On the contrary, as evidenced by our findings, these traditional metrics are instrumental in interpreting and guiding the use of metrics derived from LLMs.

      However, the data presented here don't directly test how large the benefit provided by this tool really is. In fact, the authors show compelling correlations of the neural network-derived metrics with both the behavioral cloze-test data as well as several (corpus-)derived metrics. While this is a convincing illustration of how deep language models can be made more interpretable, it is in itself not novel. The correlation with behavioral data and corpus statistics also raises the question of what is the additional benefit of the computational model? Is it simply saving us the step of not having to collect the behavioral data, not having to compute the corpus statistics or does the model potentially uncover a more nuanced representation of the online comprehension process? This remains unclear because we are lacking a direct comparison of how much variance in the neural data is explained by the neural network-derived metrics beyond those other metrics (for example the main verb probability or the corpus-derived "active index" following the prepositional phrase).

      From our perspective, a primary advantage of using the neural network-derived metrics (or LLMs as computational models of language processing), compared to traditional behavioural and corpus-based metrics, lies in their ability to offer more nuanced, contextualized representations of natural language inputs. There seemed to be no effective way of computationally capturing the distributed and multifaceted constraints within specific contexts until the current generation of LLMs came along. While it is feasible to quantify lexical properties or contextual effects based on the usage of specific words via corpora or behavioural tests, this method appears less effective in modelling the composition of meanings across multiple words at the sentence level. More critically, it struggles to capture how various lexical constraints collectively yield a coherent structured interpretation.

      Accumulating evidence suggests that models designed for context prediction or next-word prediction, such as word2vec and LLMs, outperform classic count-based distributional semantic models (Baroni et al. 2014) in aligning with neural activity during language comprehension (Schrimpf et al. 2021; Caucheteux and King 2022). Relevant to this, we have conducted additional analyses to directly assess the additional variance of neural data explained by BERT metrics, over and above what traditional metrics account for. Specifically, using RSA, we re-tested model RDMs based on BERT metrics while controlling for the contribution from traditional metrics (via partial correlation).
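
      The partial-correlation step described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: it assumes that RDMs are compared through their off-diagonal upper triangles with Spearman correlation, and that the control RDM is removed with the standard first-order partial correlation formula. All function names (`upper_triangle`, `average_ranks`, `spearman`, `partial_spearman`) are hypothetical.

```python
import numpy as np


def upper_triangle(rdm):
    """Vectorize the off-diagonal upper triangle of a square RDM."""
    rows, cols = np.triu_indices(rdm.shape[0], k=1)
    return rdm[rows, cols]


def average_ranks(values):
    """Rank values (1-based); tied values receive their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(values.size, dtype=float)
    ranks[order] = np.arange(1, values.size + 1)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def partial_spearman(data_rdm, model_rdm, control_rdm):
    """Correlation of data and model RDMs after partialling out a control RDM."""
    d, m, c = (
        upper_triangle(np.asarray(rdm, dtype=float))
        for rdm in (data_rdm, model_rdm, control_rdm)
    )
    r_dm, r_dc, r_mc = spearman(d, m), spearman(d, c), spearman(m, c)
    # First-order partial correlation; undefined if the control RDM is
    # perfectly rank-correlated with either of the other two.
    return float((r_dm - r_dc * r_mc) / np.sqrt((1 - r_dc**2) * (1 - r_mc**2)))
```

      In this sketch, `model_rdm` would stand for an RDM built from a BERT metric and `control_rdm` for one built from a behavioural or corpus-based metric; a partial correlation close to the zero-order one would indicate that the control metric explains little of the model's fit.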

      During the first verb (V1) epoch, we tested model RDMs of V1 transitivity based on data from either the behavioural pre-test (i.e., continuations following V1) or massive corpora. Contrasting sharply with the significant model fits observed for BERT V1 parse depth in bilateral frontal and temporal regions, the two metrics of V1 transitivity did not exhibit any significant effects (see Author response image 1).

      Author response image 1

      RSA model fits of BERT structural metrics and behavioural/corpus-based metrics in the V1 epoch. (upper) Model fits of BERT V1 parse depth (relevant to Appendix 1-figure 10A); (middle) Model fits of the V1 transitivity based on the continuation pre-test conducted at the end of V1 (e.g., completing “The dog found …”); (bottom) Model fits of the V1 transitivity based on the corpus data (as described in Methods). Note that verb transitivity is quantified as the proportion of its transitive uses (i.e., followed by a direct object) relative to its intransitive uses.

      In the PP1 epoch, which was aligned to the onset of the preposition in the prepositional phrase (PP), we tested the probability of a PP continuation following V1 (e.g., the probability of a PP after “The dog found…”). While no significant results were found for PP probability, we have plotted the uncorrected results for PP probability (Author response image 2). These model fits have very limited overlap with those of BERT parse depth vector (up to PP1) in the left inferior frontal gyrus (approximately at 360 ms) and the left temporal regions (around 600 ms). It is noteworthy that the model fits of the BERT parse depth vector (up to PP1) remained largely unchanged even when PP probability was controlled for, indicating that the variance explained by BERT metrics cannot be effectively accounted for by the PP probability obtained from the human continuation pre-test.

      Author response image 2

      Comparison between the RSA model fits of BERT structural metrics and behavioural / corpus-based metrics in the PP1 epoch. (upper) Model fits of BERT parse depth vector up to PP1 (relevant to Figure 6B in the main text); (middle) Model fits of the probability of a PP continuation in the pre-test conducted at the end of the first verb; (bottom) Model fits of BERT parse depth vector up to PP1 after partialling out the variance explained by PP probability.

      Finally, in the main verb (MV) epoch, we tested the model RDM based on the probability of a MV continuation following the PP (e.g., the probability after “The dog found in the park…”). When compared with the BERT parse depth vector (up to MV), we observed a similar effect in the left dorsal frontal regions (see Author response image 3). However, this effect did not survive after the whole-brain multiple comparison correction. Subsequent partial correlation analyses revealed that the MV probability accounted for only a small portion of the variance in neural data explained by the BERT metric, primarily the effect observed in the left dorsal frontal regions around 380 ms post MV onset. Meanwhile, the majority of the model fits of the BERT parse depth vector remained largely unchanged after controlling for the MV probability.

      Note that the probability of a PP/MV continuation reflects participants’ predictions based on speech input preceding the preposition (e.g., “The dog found…”) or the main verb (e.g., “The dog found in the park…”), respectively. In contrast, the BERT parse depth vector is designed to represent the structure of the (partial) sentence in the speech already delivered to listeners, rather than to predict a continuation after it. Therefore, in the PP1 and MV epochs, we separately tested BERT parse depth vectors that included the preposition (e.g., “The dog found in…”) and the main verb (e.g., “The dog found in the park was…”) to accurately capture the sentence structure at these specific points in a sentence. Despite the differences in the nature of information captured by these two types of metrics, the behavioural metrics themselves did not exhibit significant model fits when tested against listeners’ neural activity.

      Author response image 3

      Comparison between the RSA model fits of BERT structural metrics and behavioural / corpus-based metrics in the MV epoch. (upper) Model fits of BERT parse depth vector up to MV (relevant to Figure 6C in the main text); (middle) Model fits of the probability of a MV continuation in the pre-test conducted at the end of the prepositional phrase (e.g., “The dog found in the park …”); (bottom) Model fits of BERT parse depth vector up to MV after partialling out the variance explained by MV probability.

      Regarding the corpus-derived interpretative preference, we observed that neither the Active index nor the Passive index showed significant effects in the PP1 epoch. In the MV epoch, while significant model fits of the Passive index were observed, which temporally overlapped with the BERT parse depth vector (up to MV) after the recognition point of the MV, the effects of these two model RDMs emerged in different hemispheres, as illustrated in Figures 6C and 8D in the main text. Consequently, we opted not to pursue further partial correlation analysis with the corpus-derived interpretative preference. Besides, as shown in Figures 8A, 8B and 8C, subject noun thematic role preference and non-directional index exhibit significant model fits in the PP1 or the MV epoch. Interestingly, these effects precede the corresponding effects of BERT metrics in the same epoch (see Figures 6B and 6C), suggesting that the overall structured interpretation emerges after the evaluation and integration of multifaceted lexical constraints.

      In summary, our findings indicate that, in comparison to corpus-derived or behavioural metrics, BERT structural metrics are more effective in explaining neural data, in terms of modelling both the unfolding sentence input (i.e., incremental BERT parse vector) and individual words (i.e., V1) within specific sentential contexts. This advantage of BERT metrics might be due to the hypothesized capacity of LLMs to capture more contextually rich representations. Such representations effectively integrate the diverse constraints present in a given sentence, thereby outperforming corpus-based metrics or behavioural metrics in this respect. Concurrently, it is important to recognize the significant role of corpus-based / behavioral metrics as explanatory variables. They are instrumental not only in interpreting BERT metrics but also in understanding their fit to listeners’ neural activity (by examining the temporal sequence and spatial distribution of model fits of these two types of metrics). Such an integrative approach allows for a more comprehensive understanding of the complex neural processes underpinning speech comprehension.

      With regards to the neural data, the authors show convincing evidence for early modulations of brain activity by linguistic constraints on sentence structure and importantly early modulation by the coherence between multiple constraints to be integrated. Those modulations can be observed across bilateral frontal and temporal areas as well as parts of the default mode network. The methods used are clear and rigorous and allow for a detailed exploration of how multiple linguistic cues are neurally encoded and dynamically shape the final representation of a sentence in the brain. However, at times the consequences of the RSA results remain somewhat vague with regard to the motivation behind different metrics and how they differ from each other. Therefore, some results seem surprising and warrant further discussion, for example: Why does the neural network-derived parse depth metric fit neural data before the V1 uniqueness point if the sentence pairs begin with the same noun phrase? This suggests that the lexical information preceding V1, is driving the results. However, given the additional results, we can already exclude an influence of subject likelihood for a specific thematic role as this did not model the neural data in the V1 epoch to a significant degree.

      As pointed out by R1, model fits of BERT parse depth vector (up to V1) and its mismatch for the active interpretation were observed before the V1 uniqueness point (Figures 6A and 6D). These early effects could be attributed to the inclusion of different subject nouns in the BERT parse depth vectors. In our MEG data analyses, RSA was performed using all LoTrans and HiTrans sentences. Each of the 60 sentence sets contained one LoTrans sentence and one HiTrans sentence, which resulted in a 120 x 120 neural data RDM for each searchlight ROI across the brain within each sliding time window. Although LoTrans and HiTrans sentences within the same sentence set shared the same subject noun, subject nouns varied across sentence sets. This variation was expected to be reflected in both the model RDM of BERT metrics and the data RDM, a point further clarified in the revised manuscript.
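
      The structure of the 120 x 120 model RDM described above can be illustrated with a short sketch. It is only a sketch under stated assumptions: the depth values are random placeholders rather than values from the study, Euclidean distance is assumed as the dissimilarity measure, and, as a deliberate simplification, the determiner and subject noun depths are made identical within a sentence set (actual BERT depths are contextualized and would also differ within a set). The point is that between-set variation in the pre-V1 words alone already produces non-zero dissimilarities.

```python
import numpy as np


def build_rdm(features):
    """Pairwise Euclidean distances between rows (one row per sentence)."""
    features = np.asarray(features, dtype=float)
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
n_sets = 60  # each set holds one LoTrans and one HiTrans sentence

# Placeholder parse depths, NOT values from the study.
# Columns: determiner, subject noun (shared within a set in this sketch).
shared = rng.normal(loc=[2.0, 1.0], scale=0.3, size=(n_sets, 2))
v1_lo = rng.normal(loc=0.5, scale=0.3, size=(n_sets, 1))  # V1 depth, LoTrans
v1_hi = rng.normal(loc=1.5, scale=0.3, size=(n_sets, 1))  # V1 depth, HiTrans

# Rows 0-59 are LoTrans sentences, rows 60-119 their HiTrans counterparts.
depth_vectors = np.vstack([np.hstack([shared, v1_lo]),
                           np.hstack([shared, v1_hi])])

model_rdm = build_rdm(depth_vectors)          # shape (120, 120)
pre_v1_rdm = build_rdm(depth_vectors[:, :2])  # determiner + subject noun only
```

      Here `pre_v1_rdm` is zero for the two sentences of the same set but non-zero across sets, which mirrors why a model RDM built from the full parse depth vector can fit neural data before the uniqueness point of V1.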

      In contrast, when employing a model RDM constructed solely from the BERT V1 parse depth, we observed model fits peaking precisely at the uniqueness point of V1 (see Appendix 1-figure 10). It is important to note that BERT V1 parse depth is a contextualized metric influenced by the preceding subject noun, which could account for the effects of BERT V1 parse depth observed before the uniqueness point of V1.

      Relatedly, in Fig 2C it seems there are systematic differences between HiTrans and LoTrans sentences regarding the parse depth of determiner and subject noun according to the neural network model, while this is not expected according to the context-free parse.

      We thank R1 for pointing out this issue. Relevant to Figure 3D (Figure 2C in the original manuscript), we presented the distributions of BERT parse depth for individual words as the sentence unfolds in Appendix 1-figure 2. Our analysis revealed that the parse depth of the subject noun in high transitivity (HiTrans) and low transitivity (LoTrans) sentences did not significantly differ, except for the point at which the sentence reached V1 (two-tailed two-sample t-test, P = 0.05).

      However, we observed a significant difference in the parse depth of the determiner between HiTrans and LoTrans sentences (two-tailed two-sample t-test, P < 0.05 for all results in Appendix 1-figure 2). Additionally, the parse depth of the determiner was found to covary with that of V1 as the input unfolded to different sentence positions (Pearson correlation, P < 0.05 for all plots in Appendix 1-figure 2). This difference, unexpected in terms of the context-free (dependency) parse used for training the BERT structural probing model, might be indicative of a “leakage” of contextual information during the training of the structural probing model, given the co-variation between the determiner and V1, which was designed to differ in transitivity between the two types of sentences.

      Despite such unexpected differences observed in the BERT parse depths of the determiner, we considered the two sentence types as one group with distributed features (e.g., V1 transitivity) in the RSA, and used the BERT parse depth vector including all words in the sentence input to construct the model RDMs. Moreover, as indicated in Appendix 1-figure 3, compared to the content words, the determiner contributed minimally to the incremental BERT parse depth vector. Consequently, the noted discrepancies in BERT parse depth of the determiner between HiTrans and LoTrans sentences are unlikely to significantly bias our RSA results.

      "The degree of this mismatch is proportional to the evidence for or against the two interpretations (...). Besides these two measures based on the entire incremental input, we also focused on Verb1 since the potential structural ambiguity lies in whether Verb1 is interpreted as a passive verb or the main verb." The neural data fits in V1 epoch differ in their temporal profile for the mismatch metrics and the Verb 1 depth respectively. I understand the "degree of mismatch" to be a measure of how strongly the neural network's hidden representations align with the parse depth of an active or passive sentence structure. If this is correct, then it is not clear from the text how far this measure differs from the Verb 1 depth alone, which is also indicating either an active or passive structure.

      Within the V1 epoch, we tested three distinct types of model RDMs based on BERT metrics: (1) The BERT parse depth vector, representing the neural network’s hidden representation of the incremental sentence structure including all words up to V1. (2) The mismatch metric for either the Active or Passive interpretation, calculated as the distance between the BERT parse depth vector and the context-free parse depth vector for each interpretation. (3) The BERT parse depth of V1, crucial in representing the preferred structural interpretation of the unfolding sentence given its syntactic role as either a passive verb or the main verb.
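
      To make the relationship between metrics (1) and (2) concrete, the mismatch computation can be sketched as below. This is an illustration under assumptions, not the authors' code: Euclidean distance is assumed as the distance measure, the context-free depth templates for “The dog found” are hypothetical values for a dependency parse (V1 as the root in the Active reading, V1 embedded under the subject noun in the Passive reading), and the BERT depths are invented.

```python
import numpy as np


def mismatch(bert_depths, context_free_depths):
    """Distance between a BERT parse depth vector and a context-free template."""
    bert = np.asarray(bert_depths, dtype=float)
    template = np.asarray(context_free_depths, dtype=float)
    if bert.shape != template.shape:
        raise ValueError("depth vectors must cover the same words")
    return float(np.linalg.norm(bert - template))


def preferred_interpretation(bert_depths, templates):
    """Name of the template with the smallest mismatch."""
    return min(templates, key=lambda name: mismatch(bert_depths, templates[name]))


# Hypothetical context-free depths for the words "The", "dog", "found".
TEMPLATES = {
    "active": [2, 1, 0],   # "found" as the main verb (root of the parse)
    "passive": [2, 1, 2],  # "found" as a passive verb modifying "dog"
}

bert_depths = [1.8, 1.1, 0.4]  # invented BERT parse depths, for illustration only
active_mismatch = mismatch(bert_depths, TEMPLATES["active"])
passive_mismatch = mismatch(bert_depths, TEMPLATES["passive"])
```

      In this toy case the two templates differ only at V1, so the mismatch is dominated by the V1 depth; with contextualized depths for every word, the mismatch additionally reflects how the determiner and subject noun deviate from the templates, which the V1 depth alone does not capture.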

      While the BERT parse depth vector per se does not directly indicate a preferred interpretation, its mismatch with the context-free parse depth vectors of the two possible interpretations reveals the favoured interpretation, as significant neural fit is only anticipated for the mismatch with the interpretation being considered. The contextualized BERT depth of V1 is also indicative of the preferred structure given the context-free V1 parse depth corresponding to different syntactic roles. However, compared to the interpretative mismatch, it does not fully capture contributions from other words in the input. Consequently, we expected the interpretative mismatch and the BERT V1 depth to yield different results. Indeed, our analysis revealed that, although both metrics extracted from the same BERT layer (i.e., layer 13) demonstrated early RSA fits in the left fronto-temporal regions, the V1 depth showed relatively more prolonged effects with a notable peak occurring precisely at the uniqueness point of V1 (compare Figure 6C and Appendix 1-figure 10). These complementary results underscore the capability of BERT metrics to align with neural responses, in terms of both an incrementally unfolding sentence and a specific word within it.

      In previous studies, differences in neural activity related to distinct amounts of open nodes in the parse tree have been interpreted in terms of distinct working memory demands (Nelson et al. pnas 2017, Udden et al tics 2020). It seems that some of the metrics, for example the neural network-derived parse depth or the V1 depth may be similarly interpreted in the light of working memory demands. After all, during V1 epoch, the sentences do not only differ with respect to predicted sentence structure, but also in the amount of open nodes that need to be maintained. In the discussion, however, the authors interpret these results as "neural representations of an unfolding sentence's structure".

      We agree with the reviewer that the Active and Passive interpretations differ in terms of the number of open nodes before the actual main verb is heard. Given the syntactic ambiguity in our sentence stimuli (i.e., LoTrans and HiTrans sentences), it is infeasible to determine the exact number of open nodes in each sentence as it unfolds. Nevertheless, the RSA fits observed in the dorsal lateral frontal regions could be indicative of the varying working memory demands involved in building the structured interpretations across sentences. We have added this perspective in the revised manuscript.

      Reviewer #2 (Public Review):

      This article is focused on investigating incremental speech processing, as it pertains to building higher-order syntactic structure. This is an important question because speech processing in general is lesser studied as compared to reading, and syntactic processes are lesser studied than lower-level sensory processes. The authors claim to shed light on the neural processes that build structured linguistic interpretations. The authors apply modern analysis techniques, and use state-of-the-art large language models in order to facilitate this investigation. They apply this to a cleverly designed experimental paradigm of EMEG data, and compare neural responses of human participants to the activation profiles in different layers of the BERT language model.

      We thank Reviewer #2 (hereinafter referred to as R2) for the overall positive remarks on our study.

      Strengths:

      (1) The study aims to investigate an under-explored aspect of language processing, namely syntactic operations during speech processing

      (2) The study is taking advantage of technological advancements in large language models, while also taking linguistic theory into account in building the hypothesis space

      (3) The data combine EEG and MEG, which provides a valuable spatio-temporally resolved dataset

      (4) The use of behavioural validation of high/low transitive was an elegant demonstration of the validity of their stimuli

      We thank R2 for recognizing and appreciating the motivation and the methodology employed in this study.

      Weaknesses:

      (1) The manuscript is quite hard to understand, even for someone well-versed in both linguistic theory and LLMs. The questions, design, analysis approach, and conclusions are all quite dense and not easy to follow.

      To address this issue, we have made dedicated efforts to clarify the key points in our study. We also added figures to visualize our experimental design and methods (see Figure 1, Figure 3C and Figure 5 in the revised main text). We hope that these revisions have made the manuscript more comprehensible and straightforward for the readers.

      (2) The analyses end up seeming overly complicated when the underlying difference between sentence types is a simple categorical distinction between high and low transitivity. I am not sure why tree depth and BERT are being used to evaluate the degree to which a sentence is being processed as active or passive. If this is necessary, it would be helpful for the authors to motivate this more clearly.

      Indeed, as pointed out by R2, the only difference between LoTrans and HiTrans sentences is the first verb (V1), whose transitivity is crucial for establishing an initial preference for either an Active or a Passive interpretation as the sentence unfolds. Nonetheless, in line with the constraint-based approach to sentence processing and supported by previous research findings, a coherent structured interpretation of a sentence is determined by the combined constraints imposed by all words within that sentence. In our study, the transitivity of V1 alone is insufficient to fully explain the interpretative preference for the sentence structure. The overall sentence-level interpretation also depends on the thematic role preference of the subject noun – its likelihood of being an agent performing an action or a patient receiving the action.

      This was evident in our findings, as shown in Author response image 1 above, where the V1 transitivity based on corpus or behavioural data did not fit the neural data during the V1 epoch. In contrast, BERT structural measures [e.g., BERT parse depth vector (up to V1) and BERT V1 parse depth] offered contextualized representations that are presumed to integrate various lexical constraints present in each sentence. These BERT metrics exhibited significant model fits for the same neural data in the V1 epoch. Besides, a notable feature of BERT is its bi-directional attention mechanism, which allows for the dynamic updating of an earlier word’s representation as more of the sentence is heard, which is also challenging to achieve with corpus or behavioural metrics. For instance, the parse depth of the word “found” in the BERT parse depth vector for “The dog found…” differs from its parse depth in the vector for “The dog found in…”. This feature of BERT is particularly advantageous for investigating the dynamic nature of structured interpretation during speech comprehension, as it simulates the continual updating of interpretation that occurs as a sentence unfolds (as shown by Figure 7 in the main text). We have elaborated on the rationale for employing BERT parse depth in this regard in the revised manuscript.

      (3) The main data result figures comparing BERT and the EMEG brain data are hard to evaluate because only t-values are provided, and those, only for significant clusters. It would be helpful to see the full 600 ms time course of rho values, with error bars across subjects, to really be able to evaluate it visually. This is a summary statistic that is very far away from the input data

      We appreciate this suggestion from R2. In Appendix 1 of the revised manuscript, we have provided individual participants’ Spearman’s rho time courses for every model RDM tested in all three epochs (see Appendix 1-figures 8-10 & 14-15). Because RSA was conducted on source-localized E/MEG data, it is infeasible to plot the rho time course for the searchlight at each of the 8196 vertices on the cortical surface mesh. Instead, we plotted the rho time course of each ROI reported in the original manuscript. These plots complement the time-resolved heatmaps of peak t-values in Figures 6-8 in the main text.

      (4) Some details are omitted or not explained clearly. For example, how was BERT masked to give word-by-word predictions? In its default form, I believe that BERT takes in a set of words before and after the keyword that it is predicting. But I assume that here the model is not allowed to see linguistic information in the future.

      In our analyses, we utilized the pre-trained version of BERT (Devlin et al. 2019) as released by Hugging Face (https://github.com/huggingface). It is noteworthy that BERT, as described in the original paper, was initially trained using the Cloze task, involving the prediction of masked words within an input. In our study, however, we neither retrained nor fine-tuned the pre-trained BERT model, nor did we employ it for word-by-word prediction tasks. We used BERT to derive the incremental representation of a sentence’s structure as it unfolded word-by-word.

      Specifically, we sequentially input the text of each sentence into BERT, akin to how a listener would receive the spoken words in a sentence (see Figure 3C in the main text). For each incremental input (such as “The dog found”), we extracted the hidden representations of each word from BERT. These representations were then transformed into their respective BERT parse depths using a structural probing model (which was trained using sentences with annotated dependency parse trees from the Penn Treebank Dataset). The resulting BERT parse depths were subsequently used to create model RDMs, which were then tested against neural data via RSA.

      Crucially, in our approach, BERT was not exposed to any future linguistic information in the sentence. We never tested BERT parse depth of a word in an epoch where this word had not been heard by the listener. For example, the three-dimensional BERT parse depth vector for “The dog found” was tested in the V1 epoch corresponding to “found”, while the four-dimensional BERT parse depth vector for “The dog found in” was tested in the PP1 epoch of “in”.
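
      The incremental procedure can be sketched as follows. This is a schematic under loud assumptions, not the pipeline used in the study: random arrays stand in for BERT hidden states so that the example runs without downloading a model, and the probe follows one common formulation of a structural depth probe (depth as the squared norm of a linearly projected hidden state), which is assumed here rather than taken from the text above. The names `incremental_prefixes`, `probe_depths` and `EPOCH_LAST_WORD` are hypothetical.

```python
import numpy as np


def incremental_prefixes(words):
    """All word-by-word prefixes of a sentence, shortest first."""
    return [words[: i + 1] for i in range(len(words))]


def probe_depths(hidden_states, probe_matrix):
    """Parse depth per word: squared norm of the projected hidden state."""
    projected = hidden_states @ probe_matrix.T
    return (projected ** 2).sum(axis=-1)


words = ["The", "dog", "found", "in", "the", "park", "was"]

# Index of the last word heard in each epoch (an assumption for this example).
EPOCH_LAST_WORD = {"V1": 2, "PP1": 3, "MV": 6}

rng = np.random.default_rng(0)
hidden_dim, probe_rank = 16, 4
probe_matrix = rng.normal(size=(probe_rank, hidden_dim))  # stand-in for a trained probe

depth_vectors = {}
for epoch, last in EPOCH_LAST_WORD.items():
    prefix = words[: last + 1]  # only words already heard, never future ones
    # Stand-in for the hidden states BERT would return for this prefix.
    hidden = rng.normal(size=(len(prefix), hidden_dim))
    depth_vectors[epoch] = probe_depths(hidden, probe_matrix)
```

      Because the hidden states are recomputed for every prefix, the depth of an earlier word such as “found” can change from one epoch to the next, while no epoch ever includes a word the listener has not yet heard.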

      How were the auditory stimuli recorded? Was it continuous speech or silences between each word? How was prosody controlled? Was it a natural speaker or a speech synthesiser?

      Consistent with our previous studies (Kocagoncu et al. 2017; Klimovich-Gray et al. 2019; Lyu et al. 2019; Choi et al. 2021), all auditory stimuli in this study were recorded by a female native British English speaker, ensuring a neutral intonation throughout. We have incorporated this detail into the revised version of our manuscript for clarity.

      It is difficult for me to fully assess the extent to which the authors achieved their aims, because I am missing important information about the setup of the experiment and the distribution of test statistics across subjects.

      We are sorry for the previously omitted details regarding the experimental setup and the results of individual participants. As detailed in our responses above, we have now included the necessary information in the revised manuscript.

      Reviewer #3 (Public Review):

      Syntactic parsing is a highly dynamic process: When an incoming word is inconsistent with the presumed syntactic structure, the brain has to reanalyze the sentence and construct an alternative syntactic structure. Since syntactic parsing is a hidden process, it is challenging to describe the syntactic structure a listener internally constructs at each time moment. Here, the authors overcome this problem by (1) asking listeners to complete a sentence at some break point to probe the syntactic structure mentally constructed at the break point, and (2) using a DNN model to extract the most likely structure a listener may extract at a time moment. After obtaining incremental syntactic features using the DNN model, the authors analyze how these syntactic features are represented in the brain using MEG.

      We extend our thanks to Reviewer #3 (referred to as R3 below) for recognizing the methods we used in this study.

      Although the analyses are detailed, the current conclusion needs to be further specified. For example, in the abstract, it is concluded that "Our results reveal a detailed picture of the neurobiological processes involved in building structured interpretations through the integration across multifaceted constraints". The readers may remain puzzled after reading this conclusion.

      Following R3’s suggestion, we have revised the abstract and refined our conclusions in the main text to explicitly highlight our principal findings. These include: (1) a shift from bihemispheric lateral frontal-temporal regions to left-lateralized regions in representing the current structured interpretation as a sentence unfolds, (2) a pattern of sequential activations in the left lateral temporal regions, updating the structured interpretation as syntactic ambiguity is resolved, and (3) the influence of lexical interpretative coherence activated in the right hemisphere over the resolved sentence structure represented in the left hemisphere.

      Similarly, for the second part of the conclusion, i.e., "including an extensive set of bilateral brain regions beyond the classical fronto-temporal language system, which sheds light on the distributed nature of language processing in the brain." The more extensive cortical activation may be attributed to the spatial resolution of MEG, and it is quite well acknowledged that language processing is quite distributive in the brain.

      We fully agree with R3 on the relatively low spatial resolution of MEG. Our emphasis was on the observed peak activations in specific regions outside the classical brain areas related to language processing, such as the precuneus in the default mode network, which are unlikely to be artifacts due to the spatial resolution of MEG. We have revised the relevant contents in the Abstract.

      The authors should also discuss:

      (1) individual differences (whether the BERT representation is a good enough approximation of the mental representation of individual listeners).

      To address the issue of individual differences which was also suggested by R2, we added individual participants’ model fits in ROIs with significant effects of BERT representations in Appendix 1 of the revised manuscript (see Appendix 1-figures 8-10 & 14-15).

      (2) parallel parsing (I think the framework here should allow the brain to maintain parallel representations of different syntactic structures but the analysis does not consider parallel representations).

      In the original manuscript, we did not discuss parallel parsing because the methods we used do not support a direct test of this hypothesis. In our analyses, we assessed the preference for one of two plausible syntactic structures (i.e., Active and Passive interpretations) based on the BERT parse vector of an incremental sentence input. This assessment was accomplished by calculating the mismatch between the BERT parse depth vector and the context-free dependency parse depth vector representing each of the two structures. However, we only observed one preferred interpretation in each epoch (see Figures 6D-6F) and did not find evidence supporting the maintenance of parallel representations of different syntactic structures in the brain. Nevertheless, in the revised manuscript, we have mentioned this possibility, which could be properly explored in future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Consider fitting the behavioral data from the continuation pre-test to the brain data in order to illustrate the claimed advantage of using a computational model beyond more traditional methods.

      Following R1’s suggestion, we conducted additional RSA using more behavioural and corpus-based metrics. We then directly compared the fits of these traditional metrics to brain data with those of BERT metrics in the same epoch to provide empirical evidence for the advantage of using a computational model like BERT to explain listeners’ neural data (see Appendix 1-figures 11-13).

      Clarify the use of "neural representations": For a clearer assessment of the results, please discuss your results (especially the fits with BERT parse depth) in terms of the potential effects of distinct sentence structure expectations on working memory demands and make clear where these can be disentangled from neural representations of an unfolding sentence's structure.

      In the revised manuscript, we have noted the working memory demands associated with the online construction of a structured interpretation during incremental speech comprehension. As mentioned in our response to the relevant comment by R1 above, our experimental paradigm is not suitable for quantitatively assessing working memory demands, since it is difficult to determine the exact number of open nodes for our stimuli with syntactic ambiguity before the disambiguating point (i.e., the main verb) is reached. Therefore, while we can speculate about the potential contribution of varying working memory demands (which might correlate with BERT V1 parse depth) to RSA model fits, we think it is not possible to disentangle their effects from the neural representation of an unfolding sentence’s structure modelled by BERT parse depths in our current study.

      Please add in methods a description of how the uniqueness point was determined.

      In this study, we defined the uniqueness point of a word as the earliest point in time when this word can be fully recognized after removing all of its phonological competitors. To determine the uniqueness point for each word of interest, we first identified the phoneme by which this word can be uniquely recognized according to CELEX (Baayen et al. 1993). Then, we manually labelled the offset of this phoneme in the auditory file of the spoken sentence in which this word occurred. We have added a description of how the uniqueness point was determined to the Methods section of the revised manuscript.
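
      The phoneme-level step of this definition can be illustrated with a minimal sketch. It is not the authors' procedure: the toy lexicon, the phoneme symbols, and the function name `uniqueness_point` are all hypothetical, and a real analysis would draw the competitor set from CELEX. The sketch only finds the phoneme index at which a word diverges from every competitor; the subsequent manual labelling of that phoneme's offset in the audio file is not covered.

```python
def uniqueness_point(word, lexicon):
    """Return the 1-based index of the phoneme at which `word` no longer
    shares its onset with any other lexicon entry, or None if it never
    becomes unique (e.g. it is the onset of a longer word).

    Words are tuples of phoneme symbols (hypothetical notation).
    """
    competitors = [w for w in lexicon if w != word]
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        # A competitor survives if its first i phonemes match the prefix.
        if not any(c[:i] == prefix for c in competitors):
            return i
    return None


# Toy lexicon, for illustration only (not taken from CELEX).
toy_lexicon = [
    ("k", "ae", "t"),
    ("k", "ae", "p"),
    ("k", "ae", "n", "d", "l"),
    ("d", "o", "g"),
]

up_dog = uniqueness_point(("d", "o", "g"), toy_lexicon)
up_cat = uniqueness_point(("k", "ae", "t"), toy_lexicon)
```

      In this toy lexicon, the second word only separates from its competitors at its final phoneme, whereas the first is already unique at its onset.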

      I found the name "interpretative mismatch" very opaque. Maybe instead consider "preference".

      We chose to use the term “interpretative mismatch” rather than “preference” based on the operational definition of this metric, which is the distance between a BERT parse depth vector and one of the two context-free parse depth vectors representing the two possible syntactic structures, so that a smaller distance value (or mismatch) signifies a stronger preference for the corresponding interpretation.
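
      As a rough illustration of this operational definition, the sketch below computes the distance between a BERT parse depth vector and each of two template vectors. Several details are assumptions rather than the authors' implementation: the response does not name the distance metric, so Euclidean distance is used here; the BERT depth values are invented; and the template values for the determiner and subject noun are illustrative (only the V1 depths of 0 for Active and 2 for Passive are taken from the response).

```python
import numpy as np


def interpretative_mismatch(bert_depths, template_depths):
    """Distance between a BERT parse depth vector and a context-free
    parse depth vector (Euclidean distance is an assumption here)."""
    bert = np.asarray(bert_depths, dtype=float)
    template = np.asarray(template_depths, dtype=float)
    if bert.shape != template.shape:
        raise ValueError("depth vectors must cover the same words")
    return float(np.linalg.norm(bert - template))


# "The dog found ...": one depth per word up to V1.
active_template = [2.0, 1.0, 0.0]   # V1 depth 0 under the Active reading
passive_template = [2.0, 1.0, 2.0]  # V1 depth 2 under the Passive reading
bert_depths = [1.9, 1.1, 0.6]       # hypothetical BERT-derived depths

mismatch_active = interpretative_mismatch(bert_depths, active_template)
mismatch_passive = interpretative_mismatch(bert_depths, passive_template)

# The smaller mismatch marks the preferred interpretation.
preferred = "Active" if mismatch_active < mismatch_passive else "Passive"
```

      Framing the metric as a distance makes the direction of the effect explicit: a value closer to zero means the BERT representation sits closer to that template.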

      In the abstract, the authors describe the cognitive process under investigation as one of incremental combination subject to "multi-dimensional probabilistic constraint, including both linguistic and non-linguistic knowledge". The non-linguistic knowledge is later also referred to as "broad world knowledge". These terms lack specificity and across studies have been operationalized in distinct ways. In the current study, this "world knowledge" is operationalized as the likelihood of a subject noun being an agent or patient and the probability for a verb to be transitive, so here a more specific term may have been the "knowledge about statistical regularities in language".

      In this study, we specifically define “non-linguistic world knowledge” as the likelihood of a subject noun assuming the role of an agent or patient, which relates to its thematic role preference. This type of knowledge is primarily non-linguistic in nature, as exemplified by comparing nouns like “king” and “desk”. Although it could be reflected by statistical regularities in language, thematic role preference hinges more on world knowledge, plausibility, or real-world statistics. In contrast, “linguistic knowledge” in our study refers to verb transitivity, which focuses on the grammatically correct usage of a verb and is tied to statistical regularities within language itself. In the revised manuscript, we have provided clearer operational definitions for these two concepts and have ensured consistent usage throughout the text.

      Please spell out what exactly the "constraint-based hypothesis" is (even better, include an explicit description of the alternative hypothesis?).

      The “constraint-based hypothesis”, as summarized in a review (McRae and Matsuki 2013), posits that various sources of information, referred to as “constraints”, are simultaneously considered by listeners during incremental speech comprehension. These constraints encompass syntax, semantics, knowledge of common events, contextual pragmatic biases, and other forms of information gathered from both intra-sentential and extra-sentential context. Notably, there is no delay in the utilization of these multifaceted constraints once they become available, neither is a fixed priority assigned to one type of constraint over another. Instead, a diverse set of constraints is immediately brought into play for comprehension as soon as they become available as the relevant spoken word is recognized.

      An alternative hypothesis, proposed earlier, is the two-stage garden path model (Frazier and Rayner 1982; Frazier 1987). According to this model, there is an initial parsing stage that relies solely on syntax. This is followed by a second stage where all available information, including semantics and other knowledge, is used to assess the plausibility of the results obtained in the first-stage analysis and to conduct re-analysis if necessary (McRae and Matsuki 2013). In the Introduction of our revised manuscript, we have elaborated on the “constraint-based hypothesis” and mentioned this two-stage garden path model as its alternative.

      Fig1 B&C: In order to make the data more interpretable, could you estimate how many possible grammatical structural configurations there are / how many different grammatical structures were offered in the pretest, and based on this what would be the "chance probability" of choosing a random structure or for example show how many responded with a punctuation vs alternative continuations?

      In our analysis of the behavioural results, we categorized the continuations provided by participants in the pre-test at the offset of Verb1 (e.g., “The dog found/walked …”) into 6 categories, including DO (direct object), INTRANS (intransitive), PP (prepositional phrase), INF (infinitival complement), SC (sentential complement) and OTHER (gerund, phrasal verb, etc.).

      Author response table 1.

      Similarly, we categorized the continuations that followed the offset of the prepositional phrase (e.g., “The dog found/walked in the park …”) into 7 categories, including MV (main verb), END (i.e., full stop), PP (prepositional phrase), INF (infinitival complement), CONJ (conjunction), ADV (adverb) and OTHER (gerund, sentential complement, etc.).

      Author response table 2.

      It is important to note that the results of these two pre-tests, including the types of continuations and their probabilities, exhibited considerable variability between and within each sentence type (see also Figures 2B and 2C).

      Typo: "In addition, we found that BERT structural interpretations were also a correlation with the main verb probability" >> correlated instead of correlation.

      We apologize for this typo. We have conducted a thorough proofreading to identify and correct any other typos present in the revised manuscript.

      "In this regard, DLMs excel in a flexible combination of different types of features embedded in their rich internal representations". What are the "different types", spell out at least some examples for illustration.

      We have rephrased this sentence to give a more detailed description.

      Fig 2 caption: "Same color scheme as in (A)" >> should be 'as in (B)'?, and later A instead of B.

      We are sorry for this typo. We have corrected it in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      My biggest recommendation is to make the paper clearer in two ways: (i) writing style, by hand-holding the reader through each section, and the motivation for each step, in both simple and technical language; (ii) schematic visuals, of the experimental design and the analysis. A schematic of the main experimental manipulation would be helpful, rather than just listing two example sentences. It would also be helpful to provide a schematic of the experimental setup and the analysis approach, so that people can refer to a visual aid in addition to the written explanation. For example, it is not immediately clear what is being correlated with what - I needed to go to the methods to understand that you are doing RSA across all of the trials. Make sure that all of the relevant details are explained, and that you motivate each decision.

      We thank R2 for these suggestions. In the revised manuscript, we have enhanced the clarity of the main text by providing a more detailed explanation of the motivation behind each analysis and the interpretation of the corresponding results. Additionally, in response to R2’s recommendation, we have added a few figures, including the illustration of the experimental design (Figure 1) and methods (see Figure 3C and Figure 5).

      Different visualisation of neural results - The main data result figures comparing BERT and the EMEG brain data are hard to evaluate because only t-values are provided, and those are only for significant clusters. It would be helpful to see the full 600 ms time course of rho values, with error bars across subjects, to really be able to evaluate it visually.

      In the original manuscript, we opted to present t-value time courses for the sake of simplicity in illustrating the fits of the 12 model RDMs tested in 3 epochs. Following R2’s suggestion, we have included the ROI model fit time courses of each model RDM for all individual participants, as well as the mean model fit time course with standard error in Appendix 1-figures 8-10 & 14-15.

      How are the authors dealing with prosody differences that disambiguate syntactic structures, that BERT does not have access to?

      All spoken sentence stimuli were recorded by a female native British English speaker, ensuring a neutral intonation throughout. Therefore, prosody is unlikely to vary systematically between different sentence types or be utilized to disambiguate syntactic structures. Sample speech stimuli have been made available in the following repository: https://osf.io/7u8jp/.

      A few writing errors: "was kept updated every time"

      We are sorry for the typos. We have conducted proof-reading carefully to identify and correct typos throughout the revised manuscript.

      Explain why the syntactic trees have "in park the" rather than "in the park"?

      The dependency parse trees (e.g., Figure 3A) were generated according to the conventions of dependency parsing (de Marneffe et al. 2006).

      Why are there mentions of the multiple demand network in the results? I'm not sure where this comes from.

      The mention of the multiple demand network was made due to the significant RSA fits observed in the dorsal lateral prefrontal regions and the superior parietal regions, which are parts of the multiple demand network. This observation was particularly notable for the BERT parse depth vector in the main verb epoch when the potential syntactic ambiguity was being resolved. It is plausible that these effects observed are partly attributed to the varying working memory demands required to maintain the “opening nodes” in the different syntactic structures being considered by listeners at this point in the sentence.

      Reviewer #3 (Recommendations For The Authors):

      The study first asked human listeners to complete partial sentences, and incremental parsing of the partial sentences can be captured based on the completed sentences. This analysis is helpful and I wonder if the behavioral data here are enough to model the E/MEG responses. For example, if I understood it correctly, the parse depth up to V1 can be extracted based on the completed sentences and used for the E/MEG analysis.

      The behavioural data alone do not suffice to model the E/MEG data. As we elucidated in our responses to R1, we employed three behavioural metrics derived from the continuation pretests. These metrics include the V1 transitivity and the PP probability, given the continuations after V1 (e.g., after “The dog found…”), as well as the MV probability, given the continuations after the prepositional phrase (e.g., after “The dog found in the park…”). These metrics aimed to capture participants’ prediction based on their structured interpretations at various positions in the sentence. However, none of these behavioural metrics yielded significant model fits to the listeners’ neural activity, which sharply contrasts with the substantial model fits of the BERT metrics in the same epochs. Besides, we also tried to model V1 parse depth as a weighted average based on participants’ continuations. As shown in Figure 3A, V1 parse depth is 0 in the active interpretation, 2 in the passive interpretation, while the parse depth of the determiner and the subject noun does not differ. However, this continuation-based V1 parse depth [i.e., 0 × Probability(active interpretation) + 2 × Probability(passive interpretation)] did not show significant model fits.
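
      The continuation-based V1 parse depth described above is a probability-weighted average of the two template depths. A minimal sketch follows; the function name and the example probabilities are hypothetical, while the depths of 0 (Active) and 2 (Passive) are those stated in the response.

```python
def continuation_based_v1_depth(p_active, p_passive,
                                depth_active=0.0, depth_passive=2.0):
    """Expected V1 parse depth given the proportion of pre-test
    continuations consistent with each interpretation."""
    return depth_active * p_active + depth_passive * p_passive


# Hypothetical sentence: 75% Active and 25% Passive continuations.
example_depth = continuation_based_v1_depth(0.75, 0.25)
```

      Because the Active depth is 0, the measure reduces to twice the Passive probability, so it carries no information beyond that single behavioural proportion.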

      Related to this point, I wonder if the incremental parse extracted using BERT is consistent with the human results (i.e., parsing extracted based on the completed sentences) on a sentence-by-sentence basis.

      In fact, we did provide evidence showing the alignment between the incremental parse extracted using BERT and the human interpretation for the same partial sentence input (see Figure 4 in the main text and Appendix 1-figures 4-6).

      Furthermore, in Fig 1d, is it possible to calculate how much variance of the 3 probabilities is explained by the 4 factors, e.g., using a linear model? If these factors can already explain most of the variance of human parsing, is it possible to just use these 4 factors to explain neural activity?

      Following R3’s suggestion, we have conducted additional linear modelling analyses to compare the extent to which human behavioural data can be explained by corpus metrics and BERT metrics separately. Specifically, for each of the three probabilities obtained in the pretests (i.e., DO, PP, and MV), we constructed two linear models. One model utilized the four corpus-based metrics as regressors (i.e., SN agenthood, V1 transitivity, Passive index, and Active index), while the other model used BERT metrics as regressors (i.e., BERT parse depth of each word up to V1 from layer 13 for DO/PP probability and BERT parse depth of each word up to the end of PP from layer 14 for MV probability, consistent with the BERT layers reported in Figure 6).
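
      The model comparison described above can be sketched as two ordinary least squares fits to the same behavioural probability, one per regressor set. Everything below is illustrative: the data are random placeholders rather than the study's values, the number of sentences and of BERT regressors is hypothetical, and the response does not state which goodness-of-fit statistic was reported, so plain R² is an assumption.

```python
import numpy as np


def r_squared(X, y):
    """Fit y on X by ordinary least squares (with an intercept) and
    return the proportion of variance explained."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
n_sentences = 60  # hypothetical number of sentences

# Placeholder regressors (random numbers, NOT real data).
corpus_metrics = rng.normal(size=(n_sentences, 4))  # SN agenthood, V1 transitivity, Passive index, Active index
bert_metrics = rng.normal(size=(n_sentences, 6))    # one parse depth per word up to the end of the PP
mv_probability = rng.uniform(size=n_sentences)      # placeholder behavioural probability

r2_corpus = r_squared(corpus_metrics, mv_probability)
r2_bert = r_squared(bert_metrics, mv_probability)
```

      Since the two models differ in their number of regressors, a comparison on real data would likely call for an adjusted statistic or cross-validation rather than raw R².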

      As shown in the table below, corpus metrics demonstrate a more effective fit than BERT metrics for predicting the DO/PP probability. The likelihood of a DO/PP continuation is chiefly influenced by the lexical syntactic property of V1 (i.e., transitivity), and appears to rely less on contextual factors. Since V1 transitivity is explicitly included as one of the corpus metrics, it is thus expected to align more closely with the DO/PP probability compared to BERT metrics, primarily reflecting transitive versus intransitive verb usage.

      Author response table 3.

      Actually, BERT V1 parse depth was not correlated with V1 transitivity when the sentence only unfolds to V1 (see Appendix 1-figure 6). This lack of correlation may stem from the fact that the BERT probing model was designed to represent the structure of a (partially) unfolded sentence, rather than to generate a continuation or prediction. Moreover, V1 transitivity alone does not conclusively determine the Active or Passive interpretation by the end of V1. For instance, both transitive and intransitive continuations after V1 are compatible with an Active interpretation. Consequently, the initial preference for an Active interpretation (as depicted by the early effects before V1 was recognized in Figure 6D), might be predominantly driven by the animate subject noun (SN) at the beginning of the sentence, a word order cue in languages like English (Mahowald et al. 2023).

      In contrast, when assessing the probability of a MV following the PP (e.g., after “The dog found in the park ...”), BERT metrics significantly outperformed corpus metrics in terms of fitting the MV probability. Although SN thematic role preference and V1 transitivity were designed to be the primary factors constraining the structured interpretation in this experiment, we could only obtain their context-independent estimates from corpora (i.e., considering all contexts). Additionally, although the Active/Passive index (a product of these two factors) is correlated with the MV probability, it may oversimplify the specific context of a given sentence. Furthermore, the PP following V1 is also expected to influence the structured interpretation, for instance, depending on whether “in the park” is a more plausible scenario for people to find a dog or for a dog to find something. Thus, this finding suggests that the corpus-based metrics are not as effective as BERT in representing contextualized structured interpretations (for a longer sentence input), which might require the integration of constraints from every word in the input.

      In summary, corpus-based metrics excel in explaining human language behaviour when it primarily relies on specific lexical properties. However, they significantly lag behind BERT metrics when more complex contextual factors come into play at the same time. Regarding their performance in fitting neural data, among the four corpus-based metrics, we only observed significant model fits for the Passive index in the MV epoch when the intended structure for a Passive interpretation was finally resolved, while the other three metrics did not exhibit significant model fits in any epoch. Note that subject noun thematic role preference did fit neural data in the PP and MV epochs (Figure 8A and 8B). In contrast, the incremental BERT parse depth vector exhibited significant model fits in all three epochs we tested (i.e., V1, PP1, and MV).

      To summarize, I feel that I'm not sure if the structural information BERT extracts reflect the human parsing of the sentences, especially when the known influencing factors are removed.

      Based on the results presented above and in the manuscript, BERT metrics align closely with human structured interpretations in terms of both behavioural and neural data. Furthermore, they outperform corpus-based metrics when it comes to integrating multiple constraints within the context of a specific sentence as it unfolds.

      Minor issues:

      Six types of sentences were presented. Three types were not analyzed, but the results for the UNA sentences are not reported either.

      In this study, we only analysed two out of the six types of sentences, i.e., HiTrans and LoTrans sentences. The remaining four types of sentences were included to ensure a diverse range of sentence structures and to avoid potential adaptation to the same syntactic structure.

      Fig 1b, If I understood it correctly, each count is a sentence. Providing examples of the sentences may help. Listing the sentences with the corresponding probabilities in the supplementary materials can also help.

      Yes, each count in Figure 2B (Figure 1B in the original manuscript) is a sentence. All sentence stimuli and results of pre-tests are available in the following repository https://osf.io/7u8jp/.

      "trajectories of individual HiTrans and LoTrans sentences are considerably distributed and intertwined (Fig. 2C, upper), suggesting that BERT structural interpretations are sensitive to the idiosyncratic contents in each sentence." It may also mean the trajectories are noisy.

      We agree with R3 that there might be unwanted noise underlying the distributed and intertwined BERT parse depth trajectories of individual sentences. Meanwhile, it is also important to note that the correlation between BERT parse depths and lexical constraints of different words at the same position across sentences is statistically supported.

      References

      Baayen RH, Piepenbrock R, van Rijn H. 1993. The CELEX lexical data base on CD-ROM.

      Baroni M, Dinu G, Kruszewski G. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Vol 1. 238-247.

      Caucheteux C, King JR. 2022. Brains and algorithms partially converge in natural language processing. Communications Biology. 5:134.

      Choi HS, Marslen-Wilson WD, Lyu B, Randall B, Tyler LK. 2021. Decoding the Real-Time Neurobiological Properties of Incremental Semantic Interpretation. Cereb Cortex. 31:233-247.

      de Marneffe M-C, MacCartney B, Manning CD. 2006. Generating typed dependency parses from phrase structure parses. Proceedings of the 5th International Conference on Language Resources and Evaluation; May 22-28, 2006; Genoa, Italy: European Language Resources Association. 449-454.

      Devlin J, Chang M-W, Lee K, Toutanova K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; June 2-7, 2019; Minneapolis, MN, USA: Association for Computational Linguistics. 4171-4186.

      Frazier L. 1987. Syntactic processing: evidence from Dutch. Natural Language & Linguistic Theory. 5:519-559.

      Frazier L, Rayner K. 1982. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology. 14:178-210.

      Klimovich-Gray A, Tyler LK, Randall B, Kocagoncu E, Devereux B, Marslen-Wilson WD. 2019. Balancing Prediction and Sensory Input in Speech Comprehension: The Spatiotemporal Dynamics of Word Recognition in Context. Journal of Neuroscience. 39:519-527.

      Kocagoncu E, Clarke A, Devereux BJ, Tyler LK. 2017. Decoding the cortical dynamics of sound-meaning mapping. Journal of Neuroscience. 37:1312-1319.

      Lyu B, Choi HS, Marslen-Wilson WD, Clarke A, Randall B, Tyler LK. 2019. Neural dynamics of semantic composition. Proceedings of the National Academy of Sciences of the United States of America. 116:21318-21327.

      Mahowald K, Diachek E, Gibson E, Fedorenko E, Futrell R. 2023. Grammatical cues to subjecthood are redundant in a majority of simple clauses across languages. Cognition. 241:105543.

      McRae K, Matsuki K. 2013. Constraint-based models of sentence processing. Sentence processing. 519:51-77.

      Schrimpf M, Blank IA, Tuckute G, Kauf C, Hosseini EA, Kanwisher N, Tenenbaum JB, Fedorenko E. 2021. The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences of the United States of America. 118:e2105646118.

    2. eLife assessment

      This valuable study provides insights into how the brain parses the syntactic structure of a spoken sentence. Convincing evidence is provided that distributive cortical networks are engaged for incremental parsing of a sentence, and neural activity recorded by MEG correlates with sentence structure measures extracted by a deep neural network language model, i.e., BERT. A contribution of the work is to use a deep neural network model to quantify how the mental representation of syntactic structure updates as a sentence unfolds in time.

    3. Reviewer #2 (Public Review):

      This article is focused on investigating incremental speech processing, as it pertains to building higher order syntactic structure. This is an important question because speech processing in general is lesser studied as compared to reading, and syntactic processes are lesser studied than lower-level sensory processes. The authors claim to shed light on the neural processes that build structured linguistic interpretations. The authors apply modern analysis techniques, and use state-of-the-art large language models in order to facilitate this investigation. They apply this to a cleverly designed experimental paradigm of EMEG data, and compare neural responses of human participants to the activation profiles in different layers of the BERT language model.

      Comments on revised version:

      Similar to my original review, I find the paper hard to follow, and it is not clear to me that the use of the LLM is adding much to the findings. Using complex language models without substantial motivation dampens my enthusiasm significantly. This concern has not been alleviated since my original review.

    4. Reviewer #3 (Public Review):

      Syntactic parsing is a highly dynamic process: When an incoming word is inconsistent with the presumed syntactic structure, the brain has to reanalyze the sentence and construct an alternative syntactic structure. Since syntactic parsing is a hidden process, it is challenging to describe the syntactic structure a listener internally constructs at each time moment. Here, the authors overcome this problem by (1) asking listeners to complete a sentence at some break point to probe the syntactic structure mentally constructed at the break point, and (2) using a DNN model to extract the most likely structure a listener may extract at a time moment.

      After obtaining incremental syntactic features using a DNN model, i.e., BERT, the authors analyze how these syntactic features are represented in the brain using MEG. The advantage of the approach is that BERT can potentially integrate syntactic and semantic knowledge and is a computational model, instead of a static theoretical construct, that may more precisely reflect incremental sentence processing in the human brain. The results indeed confirm the similarity between MEG activity and measures from the BERT model.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      The brain-machine interface used in this study differs from typical BMIs in that it's not intended to give subjects voluntary control over their environment. However, it is possible that rats may become aware of their ability to manipulate trial start times using their neural activity. Is there any evidence that the time required to initiate trials on high-coherence or low-coherence trials decreases with experience?

      This is a great question. First, we designed the experiment to avoid this possibility. Rats were experienced on the sequence of the automatic maze both pre and post implantation (totaling weeks of pre-training and habituation). As such, the majority of the trials ever experienced by the rat were not controlled by their neural activity. During BMI experimentation, only 10% of trials were triggered during high coherence states and 10% during low coherence states, leaving ~80% of trials not controlled by their neural activity. We also implemented a pseudo-randomized trial sequence. Taken together, these design choices were intended to prevent rats from actively using their neural activity to control the maze.

      Second, we had a similar question when collecting data for this manuscript and so we conducted a pilot experiment. We took 3 rats from experiment #1 (after its completion) and we required them to perform “forced-runs” over the course of 3-4 days, a task where rats navigate to a reward zone and are rewarded with a chocolate pellet. The trajectory on “forced-runs” is predetermined and rats were always rewarded for navigating along the predetermined route. Every trial was initiated by strong mPFC-hippocampal theta coherence. We were curious as to whether time-to-trial-onset would decrease if we repeatedly paired trial onset to strong mPFC-hippocampal theta coherence. 1 out of 3 rats (rat 21-35) showed a significant correlation between time-to-trial onset and trial number, indicating that our threshold for strong mPFC-hippocampal theta coherence was being met more quickly with experience (Figure R1A). When looking over sessions and rats, there was considerable variability in the magnitude of this correlation and sometimes even the direction (Figure R1B). As such, the degree to which rat 21-35 was aware of controlling the environment by reaching strong mPFC-hippocampal theta coherence is unclear, but this question requires future experimentation.

      Author response image 1.

      Strong mPFC-hippocampal theta coherence was used to control trial onset for the entirety of forced-navigation sessions. Time-to-trial onset is a measurement of how long it took for strong coherence to be met. A) Time-to-trial onset was averaged across sessions for each rat, then plotted as a function of trial number (within-session experience on the forced-runs task). Rat 21-35 showed a significant negative correlation between time-to-trial onset and trial number, indicating that time-to-coherence reduced with experience. The rest of the rats did not display this effect. B) Correlation between trial-onset and trial number (y-axis; see A) across sessions (x-axis). A majority of sessions showed a negative correlation between time-to-trial onset and trial number, like what was seen in (A), but the magnitude and sometimes direction of this effect varied considerably even within an animal.
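
      The per-session analysis summarized in this caption amounts to correlating time-to-trial onset with trial number. A minimal sketch is given below; the response does not state which correlation coefficient or significance test was used, so Pearson's r is an assumption, and the onset times are invented placeholders rather than recorded data.

```python
import numpy as np


def onset_vs_trial_correlation(times_to_onset):
    """Pearson correlation between trial number (1..n) and the time
    taken to reach the coherence threshold on each trial."""
    times = np.asarray(times_to_onset, dtype=float)
    trial_numbers = np.arange(1, len(times) + 1)
    return float(np.corrcoef(trial_numbers, times)[0, 1])


# Hypothetical session: onset times (in seconds) shrinking with experience.
example_times = [9.0, 8.5, 8.8, 7.0, 6.2, 6.5, 5.1, 4.0]
example_r = onset_vs_trial_correlation(example_times)
```

      A negative coefficient corresponds to the pattern described for rat 21-35, where the coherence threshold was met more quickly as trials accumulated; a significance test on real data would need an additional step not shown here.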

      Is there any evidence that rats display better performance on trials with random delays in which HPC-PFC coherence was naturally elevated?

      This question is now addressed in Extended Figure 5 and discussed in the section titled “strong prefrontal-hippocampal theta coherence leads to correct choices on a spatial working memory task”.

      The introduction frames this study as a test of the "communication through coherence" hypothesis. In its strongest form, this hypothesis states that oscillatory synchronization is a pre-requisite for inter-areal communication, i.e. if two areas are not synchronized, they cannot transfer information. Recent experimental evidence shows this relationship is more likely inverted: coherence is a consequence of inter-areal interactions, rather than a cause. See Schneider et al. (DOI: 10.1016/j.neuron.2021.09.037) and Vinck et al. (DOI: 10.1016/j.neuron.2023.03.015) for a more in-depth explanation of this distinction. The authors should expand their treatment of this hypothesis in light of these findings.

      Our introduction and discussions have sections dedicated to these studies now.

      Figure 6 - It would be much more intuitive to use the labels "Rat 1", "Rat 2", and "Rat 3"; the "21-4X" identifiers are confusing.

      This was corrected in the paper.

      Figure 6C - The sub-plots within this figure are rather small and difficult to interpret. The figure would be easier to parse if the data were presented as a heatmap of the ratio of theta power during blue vs. red stim, with each pixel corresponding to one channel.

      This suggestion was implemented in the paper. See Fig 6C. Extended Fig. 8 now shows the power spectra as a function of recording shank and channel.

      Ext. Figure 2B - What happens during an acquisition failure? Instead of "Amount of LFP data," consider using "Buffer size".

      Corrected.

      Ext. Figure 2D-E - Instead of "Amount of data," consider using "Window size"

      Referred to as buffer size.

      Ext. Figure 2E - y-axis should extend down to 4 Hz. Are all of the last four values exactly at 8 Hz?

      Yes. Values plateau at 8 Hz. These data represent an average over ~50 samples.

      Ext. Figure 2F - consider moving this before D/E, since those panels are summaries of panel F

      Corrected.

      Ext. Figure 4A - ANOVA tells you that accuracy is impacted by delay duration, but not what that impact is. A post-hoc test is required to show that long delays lead to lower accuracy than short ones. Alternatively, one could compute the correlation between delay duration and proportion correct for each mouse, and look for significant negative values.

      We included supplemental analyses in Extended Fig. 4.

      Reviewer #2 (Recommendations For The Authors):

      The authors should replace terms that suggest a causal relationship between PFC-HPC synchrony and behavior, such as 'leads to', 'biases', and 'enhances' with more neutral terms.

      Causal implications were toned down and wherever “leads” or “led” remains, we specifically mean in the context of coherence being detected prior to a choice being made.

      The rationale for the analysis described in the paragraph starting on line 324, and how it fits with the preceding results, was not clear to me. The authors also write at the start of this paragraph "Given that mPFC-hippocampal theta coherence fluctuated in a periodical manner (Extended Fig. 5B)", but this figure only shows example data from 2 trials.

      The reviewer is correct. While we point towards 3 examples in the manuscript now, we focused this section on the autocorrelation analysis, which did not support our observation as we noticed a rather linear decay in correlation over time. As such, the periodicity observed was almost certainly a consequence of overlapping data in the epochs used to calculate coherence rather than intrinsic periodicity.
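
      The distinction drawn here can be sketched numerically. The example below is an illustration under stated assumptions, not the authors' analysis: it uses synthetic signals, a hypothetical window length of 20 samples, and helper names (`autocorr`, `sliding_mean`) introduced only for this sketch. A truly periodic signal shows a second autocorrelation peak at a lag equal to its period, whereas a measure computed over overlapping windows of unstructured data decays roughly linearly to zero with no later peak.

```python
import math
import random


def autocorr(signal, max_lag):
    """Normalized autocorrelation for lags 0..max_lag (value at lag 0 is 1)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


def sliding_mean(values, window):
    """Mean over overlapping windows that advance one sample at a time."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


random.seed(0)
window = 20  # hypothetical window length, in samples

# Case 1: no rhythm at all, but each estimate shares data with its neighbours.
noise = [random.gauss(0.0, 1.0) for _ in range(5000)]
ac_overlap = autocorr(sliding_mean(noise, window), 2 * window)

# Case 2: a genuinely periodic signal with a period of 20 samples.
periodic = [math.sin(2 * math.pi * t / window) for t in range(2000)]
ac_periodic = autocorr(periodic, 2 * window)

for lag in (0, 10, 20, 40):
    print(lag, round(ac_overlap[lag], 2), round(ac_periodic[lag], 2))
```

      In the overlapping-window case the correlation falls away steadily and stays near zero once the lag exceeds the window length; in the periodic case it returns close to 1 at every multiple of the period.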

      Shortly after the start of the results section (line 112), the authors go into a very detailed description of how they validated their BMI without first describing what the BMI actually does. This made this and the subsequent paragraphs difficult to follow. I suggest the authors start with a general description of the BMI (and the general experiment) before going into the details.

      Corrected. See first paragraph of “Development of a closed-loop…”.

      In Figure 2C, as expected, around the onset of 'high' coherence trials, there is an increase in theta coherence but this appears to be very transient. However, it is unclear what the heatmap represents: is it a single trial, single session, an average across animals, or something else? In Figure 3F, however, the increase appears to be much more sustained.

      The sample size (n) refers to rats for every panel in this figure. This was clarified at the end of Fig. 3.

      In Figure 2D, it was not clear to me what units of measurement are used when the averages and error bars are calculated. What is the 'n' here? Animals or sessions? This should be made clear in this figure as well as in other figures.

      The sample size is rats. This is now clarified at the end of Fig 2.

      Describing the study of Jones and Wilson (2005), the authors write: "While foundational, this study treated the dependent variable (choice accuracy) as independent to test the effect of choice outcome on task performance." (line 83) It was not clear to me what is meant by "dependent" and "independent" here. Explaining this more clearly might clarify how the authors' study goes beyond this and other previous studies.

      The reviewer is correct. A discussion on independent/dependent variables in the context of rationale for our experiment was removed.

      Reviewer #3 (Recommendations For The Authors):

      As explained in the public review, my comments mainly concern the interpretation of the experimental paradigm and its link with previous findings. I think modifying these in order to target the specific advance allowed by the paradigm would really improve the match between the experimental and analytical data that is very solid and the authors' conclusions.

      Concerning the paradigm, I recommend that the authors focus more on their novel ability to clearly dissociate the functional role of theta coherence prior to the choice as opposed to induced by the choice. Currently, they explain by contrasting previous studies based on dependent variables whereas their approach uses an independent variable. I was a bit confused by this, particularly because the task variable is not really independent given that it's based on a brain-driven loop. Since theta coherence remains correlated with many other neurophysiological variables, the results cannot go beyond showing that leading up to the decision it correlates with good choice accuracy, without providing evidence that it is theta coherence itself that enhances this accuracy as they suggest in lines 93-94.

      The reviewer is correct. A discussion on independent/dependent variables in the context of rationale for our experiment was removed.

      Regarding previous results with muscimol inactivation, I recommend that the authors expand their discussion on this point. I think that their correlative data is not sufficient to conclude as they do that despite "these structures being deemed unnecessary" (based on causal muscimol experiments), they "can still contribute rather significantly" since their findings do not show a contribution, merely a correlation. This extra discussion could include possible explanations of the apparent, and thought-provoking discrepancies that they uncover such as: theta coherence may be a correlate of good accuracy without an underlying causal relation, theta coherence may always correlate with good accuracy but only be causally important in some tasks related to spatial working memory or, since muscimol experiments leave the brain time to adapt to the inactivation, redundancy between brain areas may mask their implication in the physiological context in certain tasks (see Goshen et al 2011).

      The second paragraph of the discussion is now dedicated to this.

      Possible further analysis:

      • In Extended 4A the authors show that performance drops with delay duration. It would be very interesting to see this graph with the high coherence / low coherence / yoked trials to see if the theta coherence is most important for longer trials for example.

      This is a great suggestion. Due to 10% of trials being triggered by high coherence states, our sample size precludes a robust analysis as suggested. Given that we found an enhancement effect on a task with minimal spatial working memory requirements (Fig. 4), it seems that coherence may be a general benefit or consequence of choice processes. Nonetheless, this remains an important question to address in a future study.

      • Figure 6: The authors explain in the text that although the effect of stimulation of VMT is variable, overall VMT activation increased PFC-HPC coherence. I think in the figure the results are only shown for one rat and session per panel. It would be interesting to add a figure including their whole data set to show the overall effect as well as the variability.

      The reviewer is correct and this comment prompted a significant addition of detail to the manuscript. We have added an extended figure (Ext. Fig. 9) showing our VMT stimulation recording sessions. We originally did not include these because we were performing a parameter search to understand whether VMT stimulation could increase mPFC-hippocampal theta coherence. The results section was expanded accordingly.

      Changes to writing / figures:

      • The paper by Eliav et al, 2018 is cited to illustrate the universality of coupling between hippocampal rhythms and spikes whereas the main finding of this paper is that spikes lock to non-rhythmic LFP in the bat hippocampus. It seems inappropriate to cite this paper in the sentence on line 65.

      We agree with the reviewer and this citation was removed.

      • Line 180 when explaining the protocol, it would help comprehension if the authors clearly stated that "trial initiation" means opening the door to allow the rat to make its choice. I was initially unfamiliar with the paradigm and didn't figure this out immediately.

      We added a description to the second paragraph of our first results section.

      • Lines 324 and following: the analysis shows that there is a slow decay over around 2s of the theta coherence but not that it is periodical (as in regularly occurring in time), this would require the auto-correlation to show another bump at the timescale corresponding to the period of the signal. I recommend the authors use a different terminology.

      This comment is now addressed above in our response to Reviewer #2.

      • Lines 344: I am not sure why the stable theta coherence levels during the fixed delay phase show that the link with task performance is "through mechanisms specific to choice". Could the authors elaborate on this?

      We elaborated on this point further at the end of “Trials initiated by strong prefrontal-hippocampal theta coherence are characterized by prominent prefrontal theta rhythms and heightened pre-choice prefrontal-hippocampal synchrony”.

      • Line 85: "independent to test the effect of choice outcome on task performance." I think there is a typo here and "choice outcome" should be "theta coherence".

      The sentence was removed in the updated draft.

    2. eLife assessment

      This study enhances our understanding of the relationship between cortico-hippocampal interactions and behavioral performance. Using an inter-areal coherence metric to gate trial initiation in real time, the authors provide solid evidence that links high hippocampal-prefrontal theta coherence to correct performance on spatial working memory and cue-guided decision-making tasks. Although reviewers agreed that the results do not demonstrate causality between hippocampal-prefrontal synchrony and behavioral performance, the findings are viewed as important given their potential implications for brain-machine interface applications in humans.

    3. Reviewer #1 (Public Review):

      Summary:

      Information transfer between the hippocampus and prefrontal cortex is thought to be critical for spatial working memory, but most of the prior evidence for this hypothesis is correlational. This study attempts to test this causally by linking trial start times to theta-band coherence between these two structures. The authors find that trials initiated during periods of high coherence led to a dramatic improvement in performance. This applied not only to a spatial working memory task, but also to a cue-guided navigation task, suggesting that coherence in these regions may be a signature of a heightened attentional or preparatory state. The authors supplement this behavioral result with electrophysiological recordings and optogenetic manipulations to test whether ventral midline thalamus is likely to mediate hippocampal-prefrontal coherence.

      Strengths:

      This study demonstrates a striking behavioral effect; by changing the moment at which a trial is initiated, performance on a spatial working memory task improves dramatically, from around 80% correct to over 90% correct. A smaller but nonetheless robust increase in accuracy was also seen in a texture discrimination task. Therefore, prefrontal-hippocampal synchronization in the theta band may not only be important for spatial navigation, but may also be associated with improved performance in a range of tasks. If these results can be replicated using noninvasive EEG, it would open up a powerful avenue for modulating human behavior.

      Weaknesses:

      Ventral midline thalamic nuclei, such as reuniens, have reciprocal projections to both prefrontal cortex and hippocampus and are therefore well-situated to mediate theta-band interactions between these structures. However, alternative mechanisms cannot be ruled out by the results of this study. For example, theta rhythms are globally coherent across the rodent hippocampus, and ventral hippocampus projects directly to prefrontal cortex. Theta propagation may depend on this pathway, and may only be passively inherited by VMT.

      The optogenetic manipulations are intended to show that theta in VMT propagates to PFC and also affects HPC-PFC coherence. However, the "theta" induced by driving thalamic neurons at 7 Hz is extremely artificial. To demonstrate that VMT is causally involved in coordinating activity across HPC and PFC, it would have been better to optogenetically inhibit, rather than excite, these nuclei. If the authors were able to show that the natural occurrence of theta in PFC depends on activity in VMT, that would be a much more convincing test of their hypothesis.

    4. Reviewer #2 (Public Review):

      A number of previous reports have demonstrated that theta synchrony between the hippocampus (HPC) and prefrontal cortex (PFC) is associated with correct performance on spatial working memory tasks. The main goal of the current study is to examine this relationship by initiating trials either randomly (as has typically been done in previous studies) or during periods of high or low PFC-HPC coherence. To this end, they develop a 'brain-machine interface' (BMI) that provides real-time estimates of PFC-HPC theta coherence, which are then used to control trial onset using an automated figure-eight maze. Their main finding is that choice accuracy is significantly higher on trials initiated when theta coherence is high whereas performance on low coherence trials does not differ from randomly initiated control trials. They also observe a similar result using a non-working memory task in the same maze.

      Overall the main experiments (Figures 1-4) are well designed and the BMI approach is convincingly validated. There are also appropriate controls and analyses to rule out behavioral confounds and the results are clearly presented. Although the BMI cannot establish a causal relationship between PFC-HPC coherence and behavior, it is helpful for examining how extremes in the distribution of brain states are associated with behavioral performance, something that might be more difficult to examine if trials are initiated randomly. As such, the BMI is an interesting approach for studying the neuronal basis of behavior that could be applied in other fields of neuroscience.

      In addition to the behavioral results, the authors also examine what neuronal mechanisms might support enhanced PFC-HPC synchrony (Figures 5-6). Here, they examine the potential contribution of the ventromedial thalamus (VMT) but the results are inconclusive. In particular, the results of optogenetic stimulation of the VMT (Figure 6) show that it both increases and decreases PFC-HPC theta synchrony, depending on the exact frequency range examined. These results are also somewhat preliminary as they come from only 2 animals.

    5. Reviewer #3 (Public Review):

      Stout et al. investigate the link between prefrontal-hippocampal (PFC-HPC) theta-band coherence and accurate decisions in spatial decision making tasks. Previous studies show that PFC-HPC theta coherence positively correlates with task learning and correct decisions but the nature of this relation relies on correlations that cannot show whether coherence leads, coincides with, or is a consequence of decision making. To investigate this link more precisely, the authors devise a novel paradigm. In this paradigm the rat is blocked during a delay period preceding its choice and they control on a trial-by-trial basis the level of PFC-HPC theta coherence prior to the decision by allowing the rat to access the choice point only at a time when coherence reaches above or below a threshold. The design of the paradigm is very well controlled in many ways. First, using the PFC-HPC theta coherence during the delay period to gate when the rat accesses the choice zone clearly separates this coherence from the behavioural decision itself. Moreover, the behaviour of the animal is similar during high and low coherence periods. Finally, control trials are matched trial-by-trial to the time spent waiting by the rat when gated on theta coherence, which is crucial given that working memory performance depends on delay duration. All these features bolster the specificity of the authors' main finding, which is that PFC-HPC theta coherence prior to choice making is strongly predictive of accuracy in two tasks: one that requires working memory and another in which behaviour is cue-guided. Although this paradigm does not provide direct causal evidence, it convincingly supports the idea that PFC-HPC theta coherence prior to the behavioural decision is related to correct decision making and is not simply temporally coincidental or a consequence of the decision output.
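
      The gating logic summarised in this paragraph can be sketched as follows. This is a simplified illustration, not the authors' implementation: the coherence values, the thresholds, the 0.25 s update interval, and the function name `gate_trial` are all hypothetical, and the real system estimates coherence from streaming LFP rather than reading it from a list.

```python
def gate_trial(coherence_stream, threshold, mode, step_s):
    """Return the delay (s) at which the trial would be initiated, or None.

    mode "high": initiate at the first estimate at or above the threshold.
    mode "low":  initiate at the first estimate at or below the threshold.
    """
    if mode not in ("high", "low"):
        raise ValueError("mode must be 'high' or 'low'")
    for i, value in enumerate(coherence_stream):
        met = value >= threshold if mode == "high" else value <= threshold
        if met:
            return i * step_s
    return None  # criterion never met within this stream


# Hypothetical coherence estimates produced every 0.25 s during the delay.
stream = [0.35, 0.42, 0.51, 0.74, 0.60]

high_delay = gate_trial(stream, threshold=0.7, mode="high", step_s=0.25)
low_delay = gate_trial(stream, threshold=0.4, mode="low", step_s=0.25)

# A yoked control trial reuses the delay of the gated trial it is matched to,
# so that delay duration cannot explain a difference in accuracy.
yoked_control_delay = high_delay

print(high_delay, low_delay, yoked_control_delay)
```

      Matching each control delay to a gated delay is what lets the comparison isolate the coherence state at trial onset from the time spent waiting.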

      The authors also investigate the mechanisms behind the increase in PFC-HPC coherence during the task and show that it likely involves the recruitment of a small population of PFC neurons, via interactions with the Ventral Midline Thalamus that could mediate prefrontal/hippocampal dialogue.

      A key point of interest is the unexpected result showing a link between theta coherence even in the cue-driven version of the task. As the authors point out, muscimol inhibition of neither PFC nor HPC, nor the ventral midline thalamus impacts performance in this task. This raises the question of why coherence between two areas is predictive of choice accuracy when neither area appears to be causally involved. The authors discuss various options and explanations for this discrepancy which clearly adds to the current debate. Moreover their novel paradigm provides new tools to interrogate when inter-area synchrony is associated with information transfer and when this information is then used to drive behavioural decisions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the Authors):

      (1) Within the section on "optimized antigen retrieval", the authors mentioned that weak immunolabelling and strong non-specific labelling may be due to inadequate antigen retrieval. I wonder whether this interpretation is accurate. Could it also be due to inadequate antibody penetration?

      We appreciate the reviewer's comment and have revised our text to improve clarity. Regarding the SDS-electrophoresed sample (Figure S1a right), we acknowledge that the brain-surrounding background noise indicates insufficient antibody penetration. However, in the FLASH-processed sample (Figure S1a left), the background signal is uniformly distributed throughout the entire brain. Therefore, we conclude that incomplete antibody penetration is unlikely under this condition. Below is the revised paragraph:

      Revised manuscript, line 62-66: “We observed that both FLASH-processed and SDS-electrophoresed samples showed weak tyrosine hydroxylase (TH, a marker of dopaminergic neurons) signal (Figure S1a, Supporting Information). Additionally, we noticed that the FLASH-processed samples had almost no signal of NeuN, a marker of neuronal nuclei (Figure S1b left, Supporting Information), and exhibited strong non-specific background noise (Figure S1a left, Supporting Information). The presence of this background noise is considered an indicator of inadequate antigen retrieval.[48]”

      • Also, the authors mentioned the use of FLASH protocol and SDS-based electrophoresis for delipidation which were not described in the methods section.

      We have included the information in the revised Materials and Methods.

      Revised manuscript, line 418-426: “SHIELD processing, SDS-electrophoretic delipidation and FLASH delipidation. PFA-fixed specimens were incubated in SHIELD-OFF solution at 4 °C for 96 hours, followed by incubation for 24 hours in SHIELD-ON solution at 37 °C. All reagents were prepared using SHIELD kits (LifeCanvas Technologies, Seoul, South Korea) according to the manufacturer's instructions. For SDS-electrophoretic delipidation, SHIELD-processed specimens were placed in a stochastic electro-transport machine (SmartClear Pro II, LifeCanvas Technologies, Seoul, South Korea) running at a constant current of 1.2 A for 5-7 days. For FLASH delipidation, the SHIELD-processed specimens were placed in FLASH reagent (4% w/v SDS, 200 mM borate) and then incubated at 54 ℃ for 18 hours.[47] The delipidated specimens were washed with PBST at room temperature for at least 1 day.”

      • In addition, tyrosine hydroxylase (TH) should be a marker of "monoaminergic" neurons rather than specifically "dopaminergic" neurons.

      We appreciate the reviewer's correction. It is true that tyrosine hydroxylase (TH) is a marker for neurons that contain dopamine, norepinephrine, and epinephrine (catecholamines). However, the adrenergic and noradrenergic neurons are relatively few and are mostly located in the medulla and brain stem. Since we are only monitoring the brain in this study, we wish to keep TH as an indicator of dopaminergic neurons.

      (2) It was mentioned that tissue integrity was retained following heating treatment during the MOCAT protocol. It would be useful to demonstrate any differences in structural distortion, if any, with before and after images with different delipidation agents.

      We have provided an additional supplementary figure (Figure S5 in the revised manuscript) to display the mouse brain at different stages of the MOCAT protocol, including pre-delipidation, post-delipidation, and post-RI-matching, to demonstrate the tissue integrity.

      Revised manuscript, line 135-137: “Figure S5 shows the gross views of the same mouse brain after undergoing 4% PFA fixation, paraffin processing, optimized antigen retrieval, and RI-matching, demonstrating intactness of the brain shape and preservation of tissue integrity.”

      (3) In this study, the authors have demonstrated the protocol could be successfully applied to FFPE specimens up to 15 years old. However, archival brain bank materials often have brain tissues with extended formalin fixation time. It may be useful to demonstrate that this technique can be utilised on FFPE tissues with long formalin fixation times.

      We appreciate the reviewer's suggestions. We have included an additional supplementary figure (Figure S6) to demonstrate the application of MOCAT to 3-month fixed mouse brain hemispheres. Although the long-term fixed specimens exhibited reduced TH intensity and S/N ratio, the major dopaminergic regions were labeled, and magnified images revealed details of cell bodies and neuronal fibers. These results suggest that MOCAT has the potential to be applied to long-term fixed specimens.

      The fluorescence intensity was more affected by fixation with formalin, which is methanol-stabilized and stronger, than with PFA. This indicates that a stronger antigen retrieval method may be a possible solution. However, achieving the right balance between antigen retrieval efficiency and tissue integrity will require additional testing and investigation.

      Revised manuscript, line 163-167: “We also applied MOCAT to 3-month fixed mouse brain hemispheres (Figure S6). Although the long-term fixed specimens exhibited reduced TH intensity and S/N ratio, the major dopaminergic regions were labeled, and magnified images revealed clear details of cell bodies and neuronal fibers. These results suggest that MOCAT has the potential to be applied to long-term fixed specimens.”

      Revised manuscript, line 346-351: “In the demonstration of MOCAT to 3-month fixed specimens, we observed that the pontine reticular nucleus (Figure S6A, yellow arrowheads) loses TH-positive signals after long-term fixation. The fluorescence intensity was more affected by fixation with formalin, which is methanol-stabilized and stronger, than with PFA. The results indicate that a stronger antigen retrieval method may be a possible solution. However, achieving the right balance between antigen retrieval efficiency and tissue integrity will require additional testing and investigation.”

      (4) Whilst it is encouraging to see this protocol enables multi-round immunolabelling, further work is required to demonstrate there is no cross-reactivity in subsequent rounds of immunostaining following bleaching (e.g. Non-specific secondary antibody binding).

      We appreciate the reviewer for noting their concern and providing suggestions. To address this issue, we have examined the results of the second to fourth rounds of multi-round staining, as shown in Figure 3. In all three sequential rounds, we utilized rabbit primary antibodies and the same secondary antibodies. Our observations under a 3.6x objective (NA = 0.2) did not reveal any colocalization with the staining from the previous round. Hence, we conclude that cross-reactivity is not significant. However, we acknowledge the need for more comprehensive testing to completely rule out the possibility of cross-reactivity, such as employing antibodies from different hosts or utilizing different types of secondary antibodies (e.g., IgG, Fab2).

      Revised manuscript line 189-191: “The brain shape and structural integrity remained after 4 rounds of immunolabeling, and there is no cross-reactivity in subsequent rounds of immunostaining following bleaching (Figure S11).”

      • Also, how was the structural integrity maintained for tissues after multiple rounds of heat-induced epitope retrieval?

      We have provided an additional supplementary figure (Figure S11 in the revised manuscript) to demonstrate the structural integrity after 4 rounds of immunolabeling.

      Revised manuscript line 189-191: “The brain shape and structural integrity remained after 4 rounds of immunolabeling, and there is no cross-reactivity in subsequent rounds of immunostaining following bleaching (Figure S11).”

      (5) It may be useful to have a side-by-side comparison in staining quality with equivalent sizes of rodent and human brain tissues as there appeared to be a reduction in clarity and staining quality at greater imaging depth for human tissues.

      We have provided an additional supplementary figure (Figure S12) to show the fluorescent images of TH- and Lectin-labeling in 1 mm-thick human and mouse brain tissues at depths of 100 μm, 500 μm, and 900 μm. For millimeter-sized samples, both human and mouse brains showed comparable levels of transparency, with no noticeable reduction in fluorescence signal at varying depths. In our forthcoming studies, we plan to conduct a more comprehensive comparison of centimeter-sized human and mouse brain tissues.

      (6) Lectin staining is used throughout this study to label vasculature of the brain. How specific is this as compared with other vasculature markers such as CD31?

      We appreciate the reviewer for addressing their concern. Lectins are nonimmune-origin carbohydrate-binding proteins that have been utilized to label the surface of the blood vessel lumen. On the other hand, CD31, CD34, etc. are immunomarkers of vascular endothelial cells. Numerous references have confirmed that lectin staining consistently co-localizes with CD31 immunoreactivity (Battistella et al. 2021; Miyawaki et al. 2020). However, in tumors, blood vessels lacking a lumen may display CD31 positive/Lectin negative conditions (Morikawa et al. 2002).

      (7) When discussing the applicability of MOCAT on the astrocytoma mouse model, there is a bit of confusion with regard to the terminology. As astrocytoma by default will be comprised of astrocytes, it may be useful to describe the tumour astrocytes as ASTS1CI-GFP positive astrocytes and immunolabelled astrocytes as GFAP-positive astrocytes.

      We thank the reviewer for their suggestions. To avoid confusion for readers, we have made modifications to the content and labeling of Figure 6A.

      Revised manuscript, line 213-219: “…we subjected an intact FFPE brain from an astrocytoma mouse model (see Materials and Methods) to the MOCAT pipeline to label tumor cells (ASTS1CI-GFP positive astrocytes) and GFAP-positive astrocytes (Figure 6A, C). Accordingly, we could segment GFAP-positive astrocytes surrounding the tumor (Figure 6B, D, and E) and classify them according to their distances from the tumor cells. Statistical analysis (Figure 6F) revealed that nearly half of the GFAP-positive astrocytes were within the tumor, with 63.9% being located near the tumor surface (±200 μm).”
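
      The distance-based classification quoted above can be sketched as a simple tally. This is an illustration under stated assumptions: it presumes each segmented astrocyte already has a signed distance to the tumor surface (negative inside the tumor, positive outside), and the distance values and the function name `summarize_by_distance` are hypothetical rather than taken from the manuscript.

```python
def summarize_by_distance(signed_distances_um, band_um=200.0):
    """Fractions of cells inside the tumor and within a band of its surface.

    signed_distances_um: distance of each cell to the tumor surface in
    micrometres, negative for cells inside the tumor.
    """
    n = len(signed_distances_um)
    if n == 0:
        raise ValueError("no cells to summarize")
    inside = sum(1 for d in signed_distances_um if d < 0)
    near_surface = sum(1 for d in signed_distances_um if abs(d) <= band_um)
    return {
        "fraction_inside": inside / n,
        "fraction_near_surface": near_surface / n,
    }


# Hypothetical signed distances (micrometres) for eight segmented astrocytes.
distances_um = [-350.0, -120.0, -40.0, 15.0, 180.0, 260.0, 900.0, -10.0]

summary = summarize_by_distance(distances_um)
print(summary)
```

      The two fractions are computed independently, so a cell just inside the surface counts toward both.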

      (8) Within the methods section, further details of the antibodies such as the clonality and immunogen should be included in the supplementary table.

      We appreciate the reviewer for their suggestions. In the revised version, we have included these details in Supplementary Table 1.

      • Furthermore, there is inadequate detail regarding multi-round immunolabelling and the precise timing of immunolabelling including lectin staining, various imaging parameters including the working distance of the lens and excitation laser used.

      We have added the experimental details of multi-round staining for Figure 3 in Supplementary Table 3. This table now includes information about the amounts and types of chemicals and antibodies used, as well as the laser wavelengths used for each round. The staining conditions (including labeling time, temperature, and buffer used) have been disclosed in Materials and Methods (see MOCAT pipeline/Electrophoretic immunolabeling). Furthermore, we have included the working distance and NA value of the objective lens used in MOCAT pipeline/Volumetric imaging and 3D visualization subsection.

      Revised manuscript, line 464-479: “Electrophoretic immunolabeling (active staining). The procedure was modified from the previously published eFLASH protocol[15] and was conducted in a SmartLabel System (LifeCanvas Technologies, Seoul, South Korea). The specimens were preincubated overnight at room temperature in sample buffer (240 mM Tris, 160 mM CAPS, 20% w/v D-sorbitol, 0.9% w/v sodium deoxycholate). Each preincubated specimen was placed in a sample cup (provided by the manufacturer with the SmartLabel System) containing primary, corresponding secondary antibodies and lectin diluted in 8 mL of sample buffer. Information on antibodies, lectin and their optimized quantities is detailed in Supplementary Table 1. The specimens in the sample cup and 500 mL of labeling buffer (240 mM Tris, 160 mM CAPS, 20% w/v D-sorbitol, 0.2% w/v sodium deoxycholate) were loaded into the SmartLabel System. The device was operated at a constant voltage of 90 V with a current limit of 400 mA. After 18 hours of electrophoresis, 300 mL of booster solution (20% w/v D-sorbitol, 60 mM boric acid) was added, and electrophoresis continued for 4 hours. During the labeling, the temperature inside the device was kept at 25 ℃. Labeled specimens were washed twice (3 hours per wash) with PTwH (1× PBS with 0.2% w/v Tween-20 and 10 μg/mL heparin),[23] and then post-fixed with 4% PFA at room temperature for 1 day. Post-fixed specimens were washed twice (3 hours per wash) with PBST to remove any residual PFA.”

      Revised manuscript, line 483-490: “Volumetric imaging and 3D visualization. For centimeter-scale specimens, images were acquired using a light-sheet microscope (SmartSPIM, LifeCanvas Technologies, Seoul, South Korea) with a 3.6x customized immersion objective (NA = 0.2, working distance = 1.2 cm). For samples <3 mm thick, imaging was performed using a multipoint confocal microscope (Andor Dragonfly 200, Oxford Instruments, UK) with objectives that were UMPLFLN10XW (10x, NA = 0.3, working distance = 3.5 mm), UMPLFLN20XW (20x, NA = 0.5, working distance = 3.5 mm), UMPLFLN40XW (40x, NA = 0.8, working distance = 3.3 mm). 3D visualization was performed using Imaris software (Imaris 9.5.0, Bitplane, Belfast, UK).”

      • Also, since refractive index homogenisation is an important step in tissue-clearing experiments, it may be useful to describe the components of NFC1 and NFC2 solutions used and provide images of the "cleared" tissues.

      We have included the image of a cleared mouse brain in Figure S5. Additionally, we have provided the refractive index of NFC1 and NFC2 in Materials and Methods (see MOCAT pipeline/Refractive index matching). However, the composition of NFC1 and NFC2, being commercialized products from Nebulem (Taiwan), is non-disclosable.

      Reviewer #2 (Public Review):

      Major Weaknesses:

      • There is no evidence of actual transparency of the entire mouse brain across different treatments. The suggested protocol is very good at removing lipids (as assessed by DiD staining) and by results of fluorescence registration deep within the brain. BUT, since in many places of the manuscript authors speak of "transparency" the reader will expect the typical picture in which control and processed brains are on top of a white graphical pattern that would evidence transparency (see as an example Figure 1 and 2 of Wan et al. 2018 (Neurophotonics. 2018 Jul;5(3):035007. doi: 10.1117/1.NPh.5.3.035007)).

      We thank the reviewer for their suggestions. We have provided an additional supplementary figure (Figure S5 in the revised manuscript) to demonstrate the transparency.

      • The manuscript lacks clarity on the applicability of MOCAT to regular formalin-fixed tissue and tissues other than the brain.

      We appreciate the reviewer's suggestions. We have included an additional supplementary figure (Figure S6) to demonstrate the application of MOCAT to a 3-month regular formalin-fixed mouse brain hemisphere. We observed that the major dopaminergic regions were still labeled, although with reduced intensity and S/N ratio. We also observed that the fluorescence intensity was more affected in formalin, which is methanol-stabilized and stronger, than in PFA, implying that a stronger antigen retrieval method may be possible to rescue the intensity. However, achieving the right balance between antigen retrieval efficiency and tissue integrity will require additional testing and investigation.

      Revised manuscript, line 163 to 167: “We also applied MOCAT to 3-month fixed mouse brain hemispheres (Figure S6). Although the long-term fixed specimens exhibited reduced TH intensity and S/N ratio, the major dopaminergic regions were labeled, and magnified images revealed clear details of cell bodies and neuronal fibers. These results suggest that MOCAT has the potential to be applied to long-term fixed specimens.”

      Revised manuscript, line 346-351: “In the demonstration of MOCAT to 3-month fixed specimens, we observed that pontine reticular nucleus (Figure S6A, yellow arrowheads) lose TH-positive signals after long-term fixation. The fluorescence intensity was more affected by fixation with formalin, which is methanol-stabilized and stronger, than with PFA. The results indicate that a stronger antigen retrieval method may be a possible solution. However, achieving the right balance between antigen retrieval efficiency and tissue integrity will require additional testing and investigation.”

      We agree with the reviewer and plan to investigate the potential use of MOCAT in tissues other than the brain in our subsequent studies.

      • Insufficient information is provided on the "epoxy treatment" or "hydrogel," and a more detailed explanation is warranted.

      We appreciate the reviewer's question. In response, we have included a paragraph in the Discussion section to clarify the appropriate timing for using epoxy or hydrogel in the MOCAT pipeline. In brief, the external forces used in active staining create harsh conditions, such as pressure and heat, that might damage specimens. To protect specimens from these conditions, they could be strengthened by treatment with epoxy or acrylamide monomer to form a tissue-epoxy or tissue-hydrogel hybrid.[29,31] Laboratories that do not have adequate devices or handle small specimens could use passive immunolabeling instead and skip the step of epoxy or hydrogel pretreatment.

      Epoxy and acrylamide hydrogel can both strengthen tissue structures. However, in this study, we only used epoxy for treatment in combination with active electrophoretic staining. To avoid confusion and improve clarity, we have made modifications to Figure 1B and included epoxy processing in the MOCAT pipeline subsection within Materials and Methods.

      Revised manuscript, line 329-340: “In Figure 1B, we propose two staining strategies for samples with thicknesses less than 500 um and greater than 1 mm: passive immunolabeling and active immunolabeling. In passive immunolabeling, antibodies penetrate and reach their targets solely through diffusion, without any additional force. It takes approximately two months to passively stain a whole mouse brain.[26,28] Compared to passive immunolabeling, active immunolabeling uses an external force, such as pressure, electrophoresis, etc., to facilitate antibody penetration and therefore significantly speed up the staining process, reducing the required staining time for a whole mouse brain to one day. However, the harsh conditions, such as pressure and heat, caused by external forces might damage specimens. To protect specimens from the harsh conditions caused by active staining, specimens could be strengthened by treatment with epoxy or acrylamide monomer to form a tissue-epoxy or tissue-hydrogel hybrid.[29,31] Laboratories that do not have adequate devices or handle small specimens could use passive immunolabeling instead and skip the step of epoxy or hydrogel pretreatment.”

      • The differences between passive and active immunolabeling, as well as photobleaching data, should be addressed for a comprehensive understanding.

      We appreciate the reviewer's question. We have included a paragraph in the Discussion section to explain the differences between passive and active immunolabeling:

      Revised manuscript, line 329-340: “In Figure 1B, we propose two staining strategies for samples with thicknesses less than 500 um and greater than 1 mm: passive immunolabeling and active immunolabeling. In passive immunolabeling, antibodies penetrate and reach their targets solely through diffusion, without any additional force. It takes approximately two months to passively stain a whole mouse brain.[26,28] Compared to passive immunolabeling, active immunolabeling uses an external force, such as pressure, electrophoresis, etc., to facilitate antibody penetration and therefore significantly speed up the staining process, reducing the required staining time for a whole mouse brain to one day.”

      Regarding the effects of photobleaching, we have added Figure S10 to demonstrate the efficiency of using our approach.

      Revised manuscript, line 184-185: “After imaging, we photobleached transparent RI-matched samples using a 100W LED white light to quench the previously labeled fluorophores (Figure S10).”

      • The assertion that MOCAT can be rapidly applied in hospital pathology departments seems overstated due to the limited availability of light-sheet microscopes outside research labs.

      We thank the reviewer for the question. Since the imaging depth primarily relies on the working distance of the objective lens, if a long working distance objective lens (such as UMPLFLN10XW from Olympus) is available, it is also possible to scan samples up to a thickness of approximately 3.5 mm. However, confocal systems require longer scanning times, and in non-optical sectioning wide-field fluorescence microscopes like the Olympus BX series or ZEISS Axio imager series, deconvolution algorithms must be utilized to eliminate out-of-focus signals.

      Additionally, the epifluorescence system may also result in reduced fluorescent intensity in the deeper regions of the sample. If the fluorescent signal of the target is weak or exceeds the working distance of the objective lens, an alternative option is to send the sample to a microscopy or imaging facility core for scanning and further analysis.
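
      The working-distance argument above can be sketched as a simple check. This is a minimal illustration, not part of the MOCAT pipeline: the objective values are those quoted in the Volumetric imaging subsection, while the `Objective` class, the `usable_objectives` helper, and the rule that the whole sample thickness must fit within the working distance are simplifying assumptions introduced here. Signal loss in deep regions, which the response also mentions, is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    """Hypothetical record for an objective lens (names chosen for this sketch)."""
    name: str
    magnification: float
    numerical_aperture: float
    working_distance_mm: float


# Values as quoted in the revised manuscript (1.2 cm is written as 12.0 mm).
OBJECTIVES = [
    Objective("SmartSPIM 3.6x immersion", 3.6, 0.2, 12.0),
    Objective("UMPLFLN10XW", 10, 0.3, 3.5),
    Objective("UMPLFLN20XW", 20, 0.5, 3.5),
    Objective("UMPLFLN40XW", 40, 0.8, 3.3),
]


def usable_objectives(sample_thickness_mm, objectives=OBJECTIVES):
    """Return the objectives whose working distance covers the sample thickness.

    Assumption: imaging depth is limited only by working distance.
    """
    return [o for o in objectives if o.working_distance_mm >= sample_thickness_mm]


if __name__ == "__main__":
    for thickness in (2.0, 3.4, 10.0):
        names = [o.name for o in usable_objectives(thickness)]
        print(f"{thickness} mm sample: {names}")
```

      Under this assumption, a 3.4 mm sample excludes the 40x objective, and a centimeter-scale specimen leaves only the light-sheet objective, which matches the reviewer's concern about equipment availability.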

      • The compatibility of MOCAT with genetically encoded fluorescent proteins remains unclear and warrants further investigation.

      We appreciate the reviewer's question. We have included a paragraph in the Discussion section to address this limitation of MOCAT:

      Revised manuscript, line 354-361: “Fourth, MOCAT is not compatible with endogenous fluorescence due to a reduction in fluorescence intensity caused by xylene and alcohol used in paraffin processing. Researchers who need to directly observe genetically encoded fluorescent proteins can utilize tissue-clearing methods such as 3DISCO, X-CLARITY, CUBIC, etc., which have been shown to minimize the decrease in fluorescence intensity. On the other hand, if researchers need to visualize transgenic fluorescent proteins along with other biomarkers, they can use MOCAT for delipidation and boost-immunolabeling to visualize the transgenic fluorescent proteins.”

      • The control of equivalent depths in cryosections for evaluating the intensity of DiD staining should be elaborated upon.

      We have included this information in the Materials and Methods section:

      Revised manuscript, line 428-430: “Serial 20-µm-thick cryosections were cut from mouse brain slices (2-mm thick) of various treatment conditions for subsequent DiD or Oil red O staining. For DiD staining, cryosections (that were of approximately 0-40 µm depth) were post-fixed with 4% PFA at room temperature for 5 minutes.”

      • The composition of NFC1 and NFC2 solutions for refractive index matching should be provided.

      We have provided the refractive index of NFC1 and NFC2 in Materials and Methods (see MOCAT pipeline/Refractive index matching). However, the composition of NFC1 and NFC2, being commercialized products from Nebulem (Taiwan), is non-disclosable.

      Reviewer #2 (Recommendations for the Authors):

      • A larger readership would benefit from validating imaging depths using fluorescence microscopies commonly found in pathological departments (i.e. Confocal, 2-photon, epifluorescence+deconvolution, etc).

      We thank the reviewer for the recommendation. Since the imaging depth primarily relies on the working distance of the objective lens, if a long working distance objective lens (such as UMPLFLN10XW from Olympus) is available, it is also possible to scan samples up to a thickness of approximately 3.5 mm. However, confocal systems require longer scanning times, and in non-optical sectioning wide-field fluorescence microscopes like the Olympus BX series or ZEISS Axio imager series, deconvolution algorithms must be utilized to eliminate out-of-focus signals.

      Additionally, the epifluorescence system may also result in reduced fluorescent intensity in the deeper regions of the sample. If the fluorescent signal of the target is weak or exceeds the working distance of the objective lens, an alternative option is to send the sample to a microscopy or imaging facility core for scanning and further analysis.

      • Investigate the compatibility of MOCAT with genetically encoded fluorescent proteins, a common target in research specimens.

      We appreciate the reviewer's question. We have included a paragraph in the Discussion section to address this limitation of MOCAT:

      Revised manuscript, line 354-361: “Fourth, MOCAT is not compatible with endogenous fluorescence due to a reduction in fluorescence intensity caused by xylene and alcohol used in paraffin processing. Researchers who need to directly observe genetically encoded fluorescent proteins can utilize tissue-clearing methods such as 3DISCO, X-CLARITY, CUBIC, etc., which have been shown to minimize the decrease in fluorescence intensity. On the other hand, if researchers need to visualize transgenic fluorescent proteins along with other biomarkers, they can use MOCAT for delipidation and boost-immunolabeling to visualize the transgenic fluorescent proteins.”

      References:

      Battistella, Roberta et al. 2021. “Not All Lectins Are Equally Suitable for Labeling Rodent Vasculature.” International Journal of Molecular Sciences 22(21): 22. /pmc/articles/PMC8584019/ (January 23, 2024).

      Miyawaki, Takeyuki et al. 2020. “Visualization and Molecular Characterization of Whole-Brain Vascular Networks with Capillary Resolution.” Nature Communications 11(1): 1–11. https://www.nature.com/articles/s41467-020-14786-z (January 23, 2024).

      Morikawa, Shunichi et al. 2002. “Abnormalities in Pericytes on Blood Vessels and Endothelial Sprouts in Tumors.” The American Journal of Pathology 160(3): 985–1000.

    2. eLife assessment

      The reprocessing and reanalysis of archived samples can yield further insights from past experiments. Here, a useful procedure to perform tissue clearing and immunolabeling on large-scale formalin-fixed paraffin-embedded brain specimens is convincingly evaluated on a set of archival pathology specimens, and its applicability to further such samples is analyzed. This method will be of interest to both neuroscientists and pathologists.

    3. Reviewer #1 (Public Review):

      In this study, Lin et al developed a protocol termed MOCAT, to perform tissue clearing and labelling on large-scale FFPE mouse brain specimens. They have optimised protocols for dewaxing and adequate delipidation of FFPE tissues to enable deep immunolabelling, even for whole mouse brains. This was useful for the study of disease models such as in an astrocytoma model to evaluate spatial architecture of the tumour and its surrounding microenvironment. It was also used in a traumatic brain injury model to quantify changes in vasculature density and differences in monoaminergic innervation. They have also demonstrated the potential of multi-round immunolabelling using photobleaching, as well as expansion microscopy with FFPE samples using MOCAT.

      Comments on revised version:

      The revised manuscript by Lin et al is much improved with a more detailed methods description. There are only a few minor comments for the authors:

      - The new figures provided in Supplementary figure 5 did demonstrate a good level of transparency for the mouse brain tissue. However, quite marked tissue expansion can be seen following antigen retrieval and RI matching and this should be mentioned in the manuscript.

      - The authors have provided comparison between mouse and human brain samples in Figure S12. However, it is misleading to mention that the "fluorescent signals are comparable at varying depth" as the figure clearly showed a lack of continuous staining especially for SMI312 at 900 µm depth, and human brain tissue showed considerably increased background signal (likely due to endogenous lipofuscin which has autofluorescent properties).

      - It is understandable the authors cannot provide the full detail of the RI matching reagent as it is a commercialised product. However, it may still be useful if they can quote the refractive index +/- pH of the solution.

    4. Reviewer #2 (Public Review):

      The manuscript details an investigation aimed at developing a protocol to render centimeter-scale formalin-fixed paraffin-embedded specimens optically transparent and suitable for deep immunolabeling. The authors evaluate various detergents and conditions for epitope retrieval such as acidic or basic buffers combined with high temperatures in entire mouse brains that had been paraffin-embedded for months. They use various protein targets to test active immunolabeling and light-sheet microscopy registration of such preparations to validate their protocol. The final procedure, called MOCAT pipeline, briefly involves 1% Tween 20 in citrate buffer, heated in a pressure cooker at 121 °C for 10 minutes. The authors also note that part of the delipidation is achieved by the regular procedure.

      Major Strengths

      - The simplicity and ease of implementation of the proposed procedure using common laboratory reagents distinguish it favorably from more complex methods.

      - Direct comparisons with existing protocols and exploration of alternative conditions enhance the robustness and practicality of the methodology.

      Major Weaknesses

      - The assertion that MOCAT can be rapidly applied in hospital pathology departments seems overstated due to the limited availability of light-sheet microscopes outside research labs. In the first rebuttal letter, authors explain the limitations of other microscopes more readily available in hospitals. This explanation relies on your own investigations and practical experience on the matter, so including them in some part of the manuscript would be beneficial.

      - Refractive index matching is a critical point in the protocol, the one providing final transparency. Authors utilized the commercial solutions NFC1 and NFC2 (Nebulem, Taiwan) with a known refractive index, but for which its composition is non-disclosable. My knowledge on the organic chemistry around refractive index matching is limited, but if users don't really know what is going on in this final step, the whole protocol would rely on a single world-wide provider and troubleshooting would be fishing. I suggest that you try to validate the approach with solutions of known composition, or at least provide the solutions sold by other providers.

      Final considerations

      The evidence presented supports the effectiveness of the proposed method in rendering thick FFPE samples transparent and facilitating repeated rounds of immunolabeling.

      The developed procedure holds promise for advancing tissue and 3D-specific determination of proteins of interest in various settings, including hospitals, basic research, and clinical labs, particularly benefiting neuroscience research.

      The methodological findings suggest that MOCAT could have broader applications beyond FFPE samples, differentiating it from other tissue-clearing approaches in that the equipment and chemicals needed are broadly accessible.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to Reviewer 1 Comments (Public Review)

      Point 1: First, the authors should provide more convincing data showing that tor and tapA genes are indeed duplicated genes in A. flavus. The authors appeared to use the A. flavus PTS strain as a parental strain for constructing the tor and tapA mutants. If so, the A. flavus CA14 strain (Hua et al., 2007) should be the parental wild-type strain for the A. flavus PTS strain. I did a BLAST search in NCBI for the torA (AFLA_044350) and tapA (AFLA_092770) genes using the most recent CA14 genome assembly sequence (GCA_014784225.2) and only found one allele for each gene: torA on chromosome 7 and tapA on chromosome 3. I could not find any other parts with similar sequences. Even in another popular A. flavus wild-type strain, NRRL3357, both torA and tapA exist as a single allele. Based on the published genome assembly data for A. flavus, there is no evidence to support the idea that tor and tapA exist as copies of each other. Therefore, the authors could perform a Southern blot analysis to further verify their claim. If torA and tapA indeed exist as duplicate copies in different chromosomal locations, Southern blot data could provide supporting results.

      Response 1: We thank the reviewer for their insightful observation. Based on the Southern blot analysis results presented in Figure 1, we have determined that torA and tapA are single-copy genes. Additionally, we repeated the protoplast transformation experiments several times, which revealed that both torA and tapA transformants exhibited ectopic mutations. It is plausible that the deletion of torA and tapA genes may lead to the demise of A. flavus; this phenomenon is consistent with previous studies conducted on the fungus Fusarium graminearum[1]. To ensure the rigor of the study, we have retracted the previously incorrect conclusion. We once again express our heartfelt appreciation to the experts for their valuable suggestions.

      Author response image 1.

      Fig. 1 Southern blot hybridization analyses of WT, torA, and tapA transformants. (A) The structure diagram of the torA gene. (B) The structure diagram of the tapA gene. (C) Southern blot hybridization analyses of torA gene. (D) Southern blot hybridization analyses of tapA gene.

      Point 2: Second, the authors should consider the possibility of aneuploidy for their constructed mutants. When an essential gene is targeted for deletion, aneuploidy often occurs even in a fungal strain without the "ku" mutation, which results in seemingly dual copies of the gene. As the authors appear to use the A. flavus PTS strain having the "ku" mutation, the parental strain has increased genome instability, which may result in enhanced chromosomal rearrangements. So, it will be necessary to Illumina-sequence their tor and tapA mutants to make sure that they are not aneuploid.

      Response 2: Thank you for your comment. Based on the sequencing results of the torA and tapA mutants, it was determined that the torA and tapA genes were still present in both mutants. In this case, it suggests that the torA and tapA genes may have undergone a genetic rearrangement or insertion at a different site in the mutant strains.

      Point 3: Furthermore, the genetic nomenclature +/- and -/- should be reserved for heterozygous and homozygous mutants in a diploid strain. As A. flavus is not a diploid strain, this type of description could cause confusion for the readers.

      Response 3: Thank you for your suggestion. We acknowledge your concerns about potential confusion caused by using this type of description, and we agree that it is best to avoid any misunderstandings for readers. Therefore, we have decided to remove this part of the content from the manuscript.

      Response to Reviewer 2 Comments (Public Review)

      Point 1: However, findings have not been deeply explored and conclusions are mostly based on parallel phenotypic observations. In addition, there are some concerns for the conclusions.

      Response 1: We are grateful for the suggestion. We have conducted additional experiments and analyses to provide a more comprehensive understanding and to address the concerns raised.

      Response to Reviewer 3 Comments (Public Review)

      Point 1: The paper by Li et al. describes the role of the TOR pathway in Aspergillus flavus. The authors tested the effect of rapamycin in WT and different deletion strains. This paper is based on a lot of experiments and work but remains rather descriptive and confirms the results obtained in other fungi. It shows that the TOR pathway is involved in conidiation, aflatoxin production, pathogenicity, and hyphal growth. This is inferred from rapamycin treatment and TOR1/2 deletions. Rapamycin treatment also causes lipid accumulation in hyphae. The phenotypes are not surprising as they have been shown already for several fungi. In addition, one caveat is in my opinion that the strains grow very slowly and this could cause many downstream effects. Several kinases and phosphatases are involved in the TOR pathway. They were known from S. cerevisiae or filamentous fungi. The authors characterized them as well with knock-out approaches.

      Response 1: Thank you for your comment. The role of the target of rapamycin (TOR) signaling pathway is of fundamental importance in the physiological processes of diverse eukaryotic organisms. Nevertheless, its precise involvement in regulating the developmental and virulent characteristics of opportunistic pathogenic fungi, such as A. flavus, has yet to be fully elucidated. Furthermore, the mechanistic underpinnings of TOR pathway activity specifically in A. flavus remain largely unresolved. Consequently, our study represents a significant contribution as the first comprehensive exploration of the conserved TOR signaling pathway encompassing a majority of its constituent genes in A. flavus.

      Response to Reviewer 1 Comments (Recommendations For The Authors)

      Point 1: In Table S3, the authors indicated that the Δku70 ΔniaD ΔpyrG::pyrG strain is A. flavus wild-type strain. However, this strain is not a wild-type strain because it seems like a control strain after introducing the pyrG gene into the A. flavus PTS strain (Δku70 ΔniaD ΔpyrG). So please indicate the real wild-type A. flavus strain name to help readers find out its original genome sequence data. Also, the reference for this Δku70 ΔniaD ΔpyrG::pyrG strain is "saved in our lab". This is not an eligible reference. If you use this control strain for the first time in this study, it should be described as "In this study". Otherwise, please indicate the proper reference for which the strain was first used.

      Response 1: Thank you for your valuable feedback on our manuscript. We appreciate your attention to detail and the opportunity to clarify the information regarding the strain in Table S3. The A. flavus CA14 strain, which produces aflatoxins and large sclerotia, was isolated from a pistachio bud in the Wolfskill Grant Experimental Farm (University of California, Davis, Winters, California, USA)[2]. The A. flavus CA14 strain is the parental wild-type strain for the A. flavus CA14 PTs (Δku70, ΔniaD, ΔpyrG) strain. The recipient strain CA14 PTs has been used satisfactorily in gene knockout and subsequent genetic complementation experiments[3]. In this study, the A. flavus CA14 PTs strain was used as the transformation recipient strain, and the control strain (Δku70, ΔniaD, ΔpyrG::pyrG) was created by introducing the pyrG gene into the A. flavus CA14 PTs strain. In previously published literature[4], this control strain (Δku70, ΔniaD, ΔpyrG::pyrG) was named the wild-type strain. Therefore, this control strain was also named the wild-type strain in this study. As this control strain is indeed used in this study, we will revise the reference to "In this study". Once again, we appreciate your keen attention to detail and thank you for bringing these issues to our attention.

      Response to Reviewer 2 Comments (Recommendations For The Authors)

      Point 1: As in response: However, the tor gene in A. flavus exhibited varying copy numbers, as was confirmed by absolute quantification PCR at the genome level (Table S1). However, it is hard to understand Table S1: Estimation of copy number of tor gene in A. flavus. tor0 and sumo0 stand for the initial copy number, and the data are figured as the mean ± 95% confidence limit. CN is copy number. As indicated in the section of Method, using sumo gene as reference, the tor and tapA gene copy number was calculated by standard curve. In Table S1 of WT, for tor gene, CN value is 1412537 compared to 1698243 in tor+/-, for the reference gene sumo, 794328 compared to 1584893, how these data could support copy gene numbers of tor?

      Response 1: Thank you for your suggestion. We understand the confusion with the data presented in Table S1 regarding the copy number estimation of the tor gene in A. flavus. We apologize for not providing a clear explanation for the data in the table. Quantitative real-time PCR (qPCR) is widely used to determine the copy number of a specific gene. It involves amplifying the gene of interest and a reference gene simultaneously using specific primers and probes. By comparing the amplification curves of the gene of interest and the reference gene, you can estimate the relative copy number of the gene.

      To address your concern and provide more accurate information, we have re-performed the copy number analysis using Southern blot. Southern blot analysis allows for the direct estimation of gene copy number by hybridizing genomic DNA with a specific probe for the gene. This method provides more reliable and accurate results in determining gene copy numbers. The Southern blot analysis results are presented in Figure 1.

      We appreciate your input and apologize for any confusion caused by the earlier presentation of the data.
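
      The reviewer's concern about Table S1 can be made concrete with a short calculation. This is only a sketch: it takes the four CN values quoted in Point 1 and normalizes the tor estimate to the sumo reference in each strain. The helper name `relative_copy_number`, and the premises that sumo is a single-copy reference and that both amplicons amplify with comparable efficiency, are assumptions made here and are not stated in the source.

```python
def relative_copy_number(target_cn, reference_cn):
    """Ratio of target to reference initial copy numbers for one sample."""
    if reference_cn <= 0:
        raise ValueError("reference copy number must be positive")
    return target_cn / reference_cn


# CN values as quoted by the reviewer from Table S1.
table_s1 = {
    "WT": {"tor": 1_412_537, "sumo": 794_328},
    "tor+/-": {"tor": 1_698_243, "sumo": 1_584_893},
}

ratios = {
    strain: relative_copy_number(cn["tor"], cn["sumo"])
    for strain, cn in table_s1.items()
}

if __name__ == "__main__":
    for strain, ratio in ratios.items():
        print(f"{strain}: tor/sumo = {ratio:.2f}")
    # WT: tor/sumo = 1.78
    # tor+/-: tor/sumo = 1.07
```

      Under these assumptions, neither ratio is close to the 2:1 versus 1:1 pattern that a duplicated gene with one copy deleted would suggest, which is consistent with the decision to re-examine copy number by Southern blot.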

      Point 2: In response: For the knockout of the FRB domain, we used the homologous recombination method, but because tor genes are double-copy genes, there are also double copies in the FRB domain. Despite our efforts, we encountered challenges in precisely determining the location of the other copy of the tor gene. I could not understand these consistent data, why not for using sequencing?

      Response 2: Thank you for your comment. We observed that the torA gene is a single copy. We removed this part of the results to avoid any ambiguity or potential misinterpretation.

      Point 3: In response: Due to the large number of genes involved, we did not perform a complementation experiment. If there were no complementation data, how to demonstrate data are solid?

      Response 3: Thank you for your important suggestion. We understand that complementation experiments are commonly used to validate gene deletions. Therefore, to ensure the reliability of our data, we have conducted supplementary experiments on specific gene deletions, such as ΔsitA-C and Δppg1-C. Thank you again for your positive comments and valuable suggestions to improve the quality of our manuscript.

      References:

      (1) Yu F, Gu Q, Yun Y, et al. The TOR signaling pathway regulates vegetative development and virulence in Fusarium graminearum. New Phytol. 2014; 203(1): 219-32.

      (2) Hua SS, Tarun AS, Pandey SN, Chang L, Chang PK. Characterization of AFLAV, a Tf1/Sushi retrotransposon from Aspergillus flavus. Mycopathologia. 2007 Feb;163(2):97-104.

      (3) Chang PK, Scharfenstein LL, Mack B, Hua SST. Genome sequence of an Aspergillus flavus CA14 strain that is widely used in gene function studies. Microbiol Resour Announc. 2019 Aug 15;8(33):e00837-19.

      (4) Zhu Z, Yang M, Yang G, Zhang B, Cao X, Yuan J, Ge F, Wang S. PP2C phosphatases Ptc1 and Ptc2 dephosphorylate PGK1 to regulate autophagy and aflatoxin synthesis in the pathogenic fungus Aspergillus flavus. mBio. 2023 Oct 31;14(5):e0097723.

    2. eLife assessment

      This important study presents relevant information about the involvement of the TOR pathway in aflatoxin production by Aspergillus flavus. However, some of the presentation is confusing, leaving the study in its current form incomplete. The strength of the evidence could be augmented with additional experiments and reorganization of the manuscript aiming to fully understand and characterize the involvement of the TOR pathway in A. flavus aflatoxin production.

    3. Reviewer #1 (Public Review):

      While I acknowledge the authors' effort in conducting Southern blot analysis to address my prior concern regarding the presence of dual copies of torA and tapA, I find their current resolution inadequate. Specifically, the simple deletion of the respective result sections for torA and tapA significantly impacts the overall significance of this study. The repeated unsuccessful attempts to generate correct mutants only offer circumstantial evidence, as technical issues may have been a contributing factor. Therefore, instead of merely removing these sections, it is essential for the authors to present more compelling experimental data demonstrating that torA and tapA are indeed vital for the viability of A. flavus. Such data would enhance the overall significance of this study.

    4. Reviewer #2 (Public Review):

      In this study, the authors identified TOR, HOG and CWI signaling network genes as modulators of the development, aflatoxin biosynthesis and pathogenicity of A. flavus by gene deletions combined with phenotypic observation. They also analyzed the specific regulatory process and proposed that the TOR signaling pathway interacts with other signaling pathways (MAPK, CWI, calcineurin-CrzA pathway) to regulate the responses to various environmental stresses. Notably, they found that FKBP3 is involved in sclerotia and aflatoxin biosynthesis and rapamycin resistance in A. flavus, and in particular that the conserved site K19 of FKBP3 plays a key role in regulating aflatoxin biosynthesis. In general, the study involved a heavy workload and the findings are potentially interesting and important for understanding or controlling aflatoxin biosynthesis. However, the findings have not been deeply explored and the conclusions are mostly based on parallel phenotypic observations.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) It is not entirely clear why a tumor-free model is chosen to study immune responses, as immune responses can differ significantly with or without tumor-bearing. A more detailed explanation is needed.

      We appreciate the question. As stated in the original submission, tumor-free mouse models are commonly used to assess off-target outcomes of anti-neoplastic therapies. We have expanded on this point and acknowledged this shortcoming in the revised manuscript (lines 264-265).

      (2) Immune responses in isolated macrophages, neutrophils, and bone marrow cells require priming with LPS, while such responses are not observed in vivo. There is no explanation for these differences.

      The reviewer raises an excellent point. The assembly of inflammasomes such as those nucleated by NLRP3 requires priming signals, which increase the levels of this sensor, which are kept low in homeostatic conditions to prevent spontaneous unwanted inflammation. While LPS is commonly used in vitro as an inducer of priming signals, these cues are triggered in vivo by various molecules, including pro-inflammatory cytokines. We have provided a rationale for the use of LPS in vitro in the revised manuscript (lines 144-145).

      (3) The band intensities on Western blots in Fig. 4 and Fig. 5 are not quantified, and the numbers of repeats are also not provided. This additional information is recommended.

      Caspase-1, caspase-3, GSDMD, and GSDME, but not AIM2 and NLRP3, are activated upon proteolytic cleavage. It is therefore not straightforward to quantify and describe the band intensities of these numerous proteins, which have different fate outcomes. We regret not mentioning the number of repeats in the original submission. This information has now been provided in the figure legends where necessary.

      (4) Many abbreviations are used throughout the text, and some of the full names are not provided.

      Full names are required at the first introduction.

      We agree. We have provided full names at the first introduction (lines 21, 23, 86).

      (5) Fig. 5B needs a label on the X axis.

      We regret the confusion: the X axis was for both Fig. 5B and 5C. We have made the change in the new Fig. 5.

      Reviewer #2:

      The following specific points could be addressed to further improve the quality of the manuscript:

      (1) Concerning data presented in Figure 1, 3D micro-CT reconstructions of the entire femurs could be shown instead of just the trabecular bone. Data on cortical bone loss are important. It would be important to show histological (sagittal) sections of the bones at baseline, treated with Doxorubicin or vehicle, and quantify osteoblasts in addition to osteoclasts. Is there increased bone marrow adiposity in Doxorubicin-treated mice? The data with vehicle should be shown in the main figures not just in the supplemental data.

      We thank the reviewer for the suggestion. We have now provided 3D micro-CT reconstructions of a representative femur containing both trabecular and cortical bones (S1B Fig). Only the metaphyseal area is shown because we did not originally scan the entire femur.

      Quantification of osteoblast number is not a reliable measurement, which is why we carried out dynamic histomorphometry to assess the effect of doxorubicin on bone formation (original S1D Fig/new S1E Fig).

      Unfortunately, we did not determine the effects of doxorubicin on bone marrow adiposity. However, to address the reviewer’s comment, we have mentioned the adipogenic effects of doxorubicin in the revised manuscript, based on the literature (lines 264-265).

      (2) Concerning data presented in Figure 2, how long after Doxorubicin injection is leukopenia observed (beyond the 72-hour timepoint)? Does cell-count return to baseline 4 weeks after treatment (when the bone phenotype is characterized)? Why use 12-week-old mice here and 10-week-old animals for the rest of the study?

      We appreciate the question. We did not measure leukopenic effects of doxorubicin beyond the 72-hour timepoint based on the following: i) bones are analyzed in mice injected only once with a single dose of doxorubicin; ii) leukopenia is a side effect of doxorubicin whose blood levels should be undetectable 4 weeks after its administration although we did not measure them experimentally. Our premise is that osteopenia observed in doxorubicin-exposed mice is the result of early events that occur after the administration of the drug.

      We apologize for the confusion. We assessed baseline bone mass by VivaCT using 10-week-old mice; doxorubicin was injected 2 weeks later, when the mice were 12 weeks old. We have clarified this point in the revised manuscript (line 301).

      (3) It would be important to evaluate local inflammation in bones collected from wild-type and mutant mice. Are ASC specks, Cit-H3, and MPO present in the bone marrow? The expression of some components of the inflammasomes or relevant pathways could be assessed in bone samples deprived of bone marrow and in the bone marrow.

      This is a good point. Although we were not able to reliably measure Cit-H3 and MPO in bone marrow fluid, our data shown in Figs. 3-6, 7A-D are from bone marrow cells.

      (4) Data presented in western blots should be quantified. The ratio of signal intensity obtained for beta-actin over the signal obtained for a given protein should be calculated for each experimental condition (especially in Figure 5, where beta-actin levels fluctuate a lot).

      Please see the response to question #1. Fluctuations in β-actin levels are likely related to doxorubicin cytotoxic effects as mentioned in the original submission (lines 150, 194, 253). Despite this caveat, IL-1β levels are stimulated by this drug.

      (5) In Figure 7, BV/TV of WT and mutant mice at baseline should be quantified and shown. Sagittal histological sections of the femur should be shown. 3D micro-CT reconstructions of the entire femur could be shown instead of just the trabecular bone. Osteoblasts and bone resorption should be quantified. Data obtained with vehicle should be quantified and shown in the main figure. The control and LPS conditions should be better defined. Does it include vehicle?

      Please see the response to reviewer 1’s question #1.

      We have now provided 3D micro-CT reconstructions of a representative femur containing both trabecular and cortical bone (S3A, B Fig).

      LPS was dissolved in PBS (vehicle), which was used as control. We have now replaced vehicle with PBS in Fig. 7.

      (6) For all figures, the number of biological replicates should be mentioned in the legends, as well as the statistical tests used for the analyses.

      We have now included this information in the legends where necessary.

      (7) Some of the scientific rationales are not totally clear and could be better explained in the text. For example, it is written on page 6 "studies mainly on male mice and revolved around innate immune responses" and "we focused on neutrophils because of their high turnover rate and short lifespan", but it is not clear why. The rationale (page 10) for assessing bone mass in "mice globally lacking AIM2 and/or NLRP3" is not totally clear either. The argument is that systemic inflammation leads to bone loss but the effects obtained with the total ablation of AIM2 and NLRP3 do not prove strictly speaking that systemic inflammation really matters (in this current study, although we know from many other studies that it clearly does matter). We could imagine, for example, that bone mass would be preserved in AIM2 KO mice only because the inflammasome is impaired in osteoblasts and/or osteoclasts, but not in any other cell types. Conversely one could imagine that bone would be preserved only because inflammation is preserved in the gut, for example. The use of global knockouts unfortunately does not tell us much about the importance of systemic versus local effects of the inflammasomes. It shows that reducing inflammation, either in specific organs or globally, limits bone loss in doxorubicin-treated mice. This result is important but it was fully expected since doxorubicin has been reported to induce systemic inflammation, and since many studies have shown that systemic inflammation leads to bone loss.

      We appreciate the comments. We have clarified the rationale for focusing on neutrophils (lines 129-130) and on the AIM2 and NLRP3 inflammasomes (lines 209-211). We have also now downplayed the concept of inflammasome-mediated systemic inflammation in doxorubicin-induced bone loss.

    2. eLife assessment

      This useful study, which systematically addresses off-target effects of a commonly used chemotherapy drug on bone and bone marrow cells and which therefore is of potential interest to a broad readership, presents evidence that reducing systemic inflammation induced by doxorubicin limits bone loss to some extent. Although the work does not inform in detail on the underlying mechanisms of doxorubicin action, the demonstration of the effect of systemic inflammation on bone loss is convincing. While not a new finding, the work sets the scene for additional genetic and pharmacologic experiments and a deeper analysis of the bone phenotype presented here, which should speak to the mechanisms involved in doxorubicin-induced bone loss and which may substantiate the clinical relevance of targeting inflammation in order to limit the negative impact of chemotherapies on bone quality.

    3. Reviewer #1 (Public Review):

      Summary:

      Doxorubicin has long been known to cause bone loss by increasing osteoclast and suppressing osteoblast activities. The study by Wang et al. reports a comprehensive investigation into the off-target effects of doxorubicin on bone tissues and potential mechanisms. They used a tumor-free model with wild-type mice and found that even a single dose of doxorubicin has a major influence by increasing leukopenia, DAMPs and inflammasomes in macrophages and neutrophils, and inflammation-related cell death (pyroptosis and NETosis). The gene knockout study shows that AIM2 and NLRP3 are the major contributors to bone loss. Overall, the study confirms previous findings regarding the impact of doxorubicin on tissue inflammation and expands the research further into bone tissue. The data presented are consistent; however, a major question remains regarding whether doxorubicin drives inflammation and its related events. Most in vitro studies showed that the effect of doxorubicin cannot be demonstrated without LPS priming. This observation raises the question of whether doxorubicin itself could activate the inflammasome and the related events. The in vivo study, on the other hand, suggested that it doesn't require LPS. The inconsistency here was not explained further. Moreover, a tumor-free mouse model was used for the study; however, immune responses in tumor-bearing models would likely be distinct from those in tumor-free ones. The justification for using tumor-free models is not well established.

      Strengths:

      The paper includes a comprehensive study that shows the effects of doxorubicin on cytokine levels in serum, release of DAMPs and NETosis, and leukopenia using both in vivo and in vitro models. Bone marrow cells, macrophages and neutrophils were isolated from the bone marrow, and the levels of cytokines in serum were also determined.

      They employed multiple knockout models with deficiency in Aim2, Nlrp3, and double deficiencies to dissect the functional involvement of these two inflammasomes.

      The experiments in general are well designed. The paper is also logically written, and figures were clearly labeled.

      Weaknesses:

      Most of the data presented are correlative, and there is not much effort to dissect the underlying molecular mechanism.

      It is not entirely clear why a tumor-free model is chosen to study immune responses, as immune responses can differ significantly with or without tumor-bearing.

      Immune responses in isolated macrophages, neutrophils and bone marrow cells require priming with LPS, while such responses are not observed in vivo. There is no explanation for these differences.

      The band intensities on Western blots in Fig. 4 and Fig. 5 are not quantified, and the numbers of repeats are also not provided.

      Many abbreviations are used throughout the text, and some of the full names are not provided.

      Fig. 5B needs a label on the X axis.

    4. Reviewer #2 (Public Review):

      Summary:

      Wang and collaborators have evaluated the impact of inflammation on bone loss induced by Doxorubicin, which is commonly used in chemotherapy to treat various cancers. In mice, they show that a single injection of Doxorubicin induces systemic inflammation, leukopenia, and a significant bone loss associated with increased bone-resorbing osteoclast numbers. In vitro, the authors show that Doxorubicin activates the AIM2 and NLRP3 inflammasomes in macrophages and neutrophils. Importantly, they show that the full knockouts (germline deletions) of AIM2 (Aim2-/-) and NLRP3 (Nlrp3-/-) and Caspase 1 (Casp1-/-) limit (but do not completely abolish) bone loss induced 4 weeks after a single injection of Doxorubicin in mice. From these results, they conclude that Doxorubicin activates inflammasomes to cause inflammation-associated bone loss.

      Strength:

      This manuscript provides functional experiments demonstrating that NLRP3 and/or AIM2 loss-of-functions (and thus the systemic impairment of the inflammatory response) prevent bone loss induced by Doxorubicin in mice.

      Weaknesses:

      Numerous studies have reported that Doxorubicin induces systemic inflammation and activates the inflammasome in myeloid cells and various other cell types. It is also known that systemic inflammation and Doxorubicin treatment lead to bone loss. Hence, the key conclusions drawn from this work have been known already or were very much expected. Therefore, the novelty appears somewhat limited. One important limitation is the lack of experiments that could determine which cell lineages are involved in bone loss induced by Doxorubicin in vivo, while the tools to do so exist. The characterization of the bone phenotype is incomplete, and unfortunately does not tell us whether the inflammasome is activated in some of the cell lineages present in bones in vivo. Another limitation is that the relative importance of the inflammasomes compared to cell senescence and autophagy, which are also induced by Doxorubicin, has not been evaluated. Hence the main molecular mechanisms responsible for bone loss induced by Doxorubicin in vivo remain unknown. Lastly, it would have been interesting, from a more clinical point of view, to compare the few relevant treatments that could limit the deleterious effect of Doxorubicin on bone loss while preserving the toxicity on tumor cells.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Satoshi Yamashita et al. investigate the physical mechanisms driving tissue bending using the cellular Potts Model, starting from a planar cellular monolayer. They argue that apical length-independent tension control alone cannot explain bending phenomena in the cellular Potts Model, contrasting with the vertex model. However, the evidence supporting this claim is incomplete. They conclude that an apical elastic term, with zero rest value (due to endocytosis/exocytosis), is necessary in constricting cells and that tissue bending can be enhanced by adding a supracellular myosin cable. Notably, a very high apical elastic constant promotes planar tissue configurations, opposing bending.

      Strengths:

      • The finding of the required mechanisms for tissue bending in the cellular Potts Model provides a more natural alternative for studying bending processes in situations with highly curved cells.

      • Despite viewing cellular delamination as an undesired outcome in this particular manuscript, the model's capability to naturally allow T1 events might prove useful for studying cell mechanics during out-of-plane extrusion.

      We thank the reviewer for the careful comments and insightful suggestions.

      Weaknesses:

      • The authors claim that the cellular Potts Model is unable to obtain the vertex model simulation results, but the lack of a substantial comparison undermines this assertion. No references are provided with vertex model simulations, employing similar setups and rules, and explaining tissue bending solely through an increase in a length-independent apical tension.

      We did not copy the parameters of the vertex models in the preceding studies because we also found that the apical, lateral, and basal surface tensions must be balanced; otherwise the epithelial cell could not maintain its integrity (Supplementary Figure 1), and the ratio used in the preceding studies was outside of the suitable range.

      • The apparent disparity between the two models is attributed to straight versus curved cellular junctions, with cells with a curved lateral junction achieving lower minimum energies at steady-state. However, a critical discussion on the impact of T1 events, allowing cellular delamination, is absent. Note that some of the cited vertex model works do not allow T1 events while allowing curvature.

      We appreciate the comment, and will add it to the discussion.

      • The suggested mechanism for inducing tissue bending in the cellular Potts Model, involving an apical elastic term, has been utilized in earlier studies, including a cited vertex model paper (Polyakov 2014). Consequently, the physical concept behind this implementation is not novel and warrants discussion.

      The reviewer is correct, but Polyakov et al. assumed “that the cytoskeletal components lining the inside membrane surfaces of the cells provide these surfaces with springlike elastic properties” without justification. Based on Labouesse et al. (2015), we assumed that the myosin activity generated contractility rather than elasticity, and expected that the surface elasticity corresponded to the membrane elasticity. Also, regarding the physical concept, we clarified how contractility and elasticity deformed the cells and tissue differently, and demonstrated why elasticity was important for apical constriction. We will add this to the discussion.

      • The absence of information on parameter values, initial condition creation, and boundary conditions in the manuscript hinders reproducibility. Additionally, the explanation for the chosen values and their unit conversion is lacking.

      We agree with the comment, and will add them to the methods.

      Reviewer #2 (Public Review):

      Summary:

      In their work, the authors study local mechanics in an invaginating epithelial tissue. The mostly computational work relies on the Cellular Potts model. The main result shows that an increased apical "contractility" is not sufficient to properly drive apical constriction and subsequent tissue invagination. The authors propose an alternative model, where they consider an alternative driver, namely the "apical surface elasticity".

      Strengths:

      It is surprising that despite the fact that apical constriction and tissue invagination are probably the most studied processes in tissue morphogenesis, the underlying physical mechanisms are still not entirely understood. This work supports this notion by showing that simply increasing apical tension is perhaps not sufficient to locally constrict and invaginate a tissue.

      We thank the reviewer for recognizing the importance and novelty of our work.

      Weaknesses:

      The findings and claims in the manuscript are only partially supported. With the computational methodology for studying tissue mechanics being so well developed in the field, the authors could probably have done a more thorough job of supporting the main findings of their work.

      We thank the reviewer for the careful assessment and suggestions. However, our simulation was computationally expensive, and modeling the epithelium in an analytically calculable expression would require substantial work that is beyond the scope of the present study.

    2. eLife assessment

      The results from this study, which investigates the mechanisms necessary for initiating tissue invagination using a cellular Potts modelling approach, suggest that apical constriction is not sufficient to drive the process by itself. The study highlights how choices inherent to modelling - such as permitting straight or curved cell edges - may affect the outcome of simulations and, consequently, their biophysical interpretation. Despite incomplete evidence supporting their major claims due to a rather coarse-grained exploration of the model, this work is useful for biophysicists investigating complex tissue deformation through computational frameworks.

    3. Reviewer #1 (Public Review):

      Summary:

      Satoshi Yamashita et al. investigate the physical mechanisms driving tissue bending using the cellular Potts Model, starting from a planar cellular monolayer. They argue that apical length-independent tension control alone cannot explain bending phenomena in the cellular Potts Model, contrasting with the vertex model. However, the evidence supporting this claim is incomplete. They conclude that an apical elastic term, with zero rest value (due to endocytosis/exocytosis), is necessary in constricting cells and that tissue bending can be enhanced by adding a supracellular myosin cable. Notably, a very high apical elastic constant promotes planar tissue configurations, opposing bending.

      Strengths:

      - The finding of the required mechanisms for tissue bending in the cellular Potts Model provides a more natural alternative for studying bending processes in situations with highly curved cells.

      - Despite viewing cellular delamination as an undesired outcome in this particular manuscript, the model's capability to naturally allow T1 events might prove useful for studying cell mechanics during out-of-plane extrusion.

      Weaknesses:

      - The authors claim that the cellular Potts Model is unable to obtain the vertex model simulation results, but the lack of a substantial comparison undermines this assertion. No references are provided with vertex model simulations, employing similar setups and rules, and explaining tissue bending solely through an increase in a length-independent apical tension.

      - The apparent disparity between the two models is attributed to straight versus curved cellular junctions, with cells with a curved lateral junction achieving lower minimum energies at steady-state. However, a critical discussion on the impact of T1 events, allowing cellular delamination, is absent. Note that some of the cited vertex model works do not allow T1 events while allowing curvature.

      - The suggested mechanism for inducing tissue bending in the cellular Potts Model, involving an apical elastic term, has been utilized in earlier studies, including a cited vertex model paper (Polyakov 2014). Consequently, the physical concept behind this implementation is not novel and warrants discussion.

      - The absence of information on parameter values, initial condition creation, and boundary conditions in the manuscript hinders reproducibility. Additionally, the explanation for the chosen values and their unit conversion is lacking.

    4. Reviewer #2 (Public Review):

      Summary:

      In their work, the authors study local mechanics in an invaginating epithelial tissue. The mostly computational work relies on the Cellular Potts model. The main result shows that an increased apical "contractility" is not sufficient to properly drive apical constriction and subsequent tissue invagination. The authors propose an alternative model, where they consider an alternative driver, namely the "apical surface elasticity".

      Strengths:

      It is surprising that despite the fact that apical constriction and tissue invagination are probably the most studied processes in tissue morphogenesis, the underlying physical mechanisms are still not entirely understood. This work supports this notion by showing that simply increasing apical tension is perhaps not sufficient to locally constrict and invaginate a tissue.

      Weaknesses:

      The findings and claims in the manuscript are only partially supported. With the computational methodology for studying tissue mechanics being so well developed in the field, the authors could probably have done a more thorough job of supporting the main findings of their work.

    1. Reviewer #2 (Public Review):

      Summary:

      Dr Lenz and colleagues report on their in vitro studies comparing gene transcription and epigenetic modifications in Plasmodium falciparum NF54 parasites selected or not selected for adhesion of the infected erythrocytes (IEs) to the placental IE adhesion receptor chondroitin sulfate A (CSA).

      The authors report that selection led to preferential transcription of var2csa, the gene that encodes the VAR2CSA-type PfEMP1 well-established as the PfEMP1 mediating IE adhesion to CSA. They confirm that transcriptional activation of var2csa is associated with distinct depletion of H3K9me3 marks and that transcriptional activation is linked to repositioning of var2csa.

      Strengths:

      The study confirms previously reported features of gene transcription and epigenetic modifications in Plasmodium falciparum.

      Weaknesses:

      No major new finding is reported.

    1. eLife assessment

      The findings in this manuscript are important and novel, and the genomic analyses are convincing. This study expands upon our understanding of the role of hnRNP proteins in lncRNA function and the evidence is compelling, suggesting shared mechanism(s) in the regulation of ASARs and Xist RNAs by RBPs that bind Cot1 sequences in these lncRNAs. This manuscript should be of interest to the noncoding RNA and chromatin biology communities.

    2. Reviewer #1 (Public Review):

      Summary:

      Thayer et al build upon their prior findings that ASAR long noncoding RNAs (lncRNAs) are chromatin-associated and are implicated in control of replication timing. To explore the mechanism of function of ASAR transcripts, they leveraged the ENCODE RNA binding protein eCLIP datasets to show that a 7kb region of ASAR6-141 is bound by multiple hnRNP proteins. Deletion of this 7kb region resulted in delayed chromosome 6 replication. Furthermore, ectopic integration of the ASAR6-141 7kb region into autosomes or the inactive X-chromosome also resulted in delayed chromosome replication. They then use RNA FISH experiments to show that the knockdown of these hnRNP proteins disrupts ASAR6-141 localization to chromatin and in turn replication timing.

      Strengths:

      Given prior publications showing HNRNPU to be important for chromatin retention of XIST and Firre, this work expands upon our understanding of the role of hnRNP proteins in lncRNA function.

      Weaknesses:

      The work presented is mechanistically interesting; however, one must be careful not to over-interpret the data as showing that hnRNP proteins can regulate chromosome replication directly. Furthermore, the work could be strengthened by including a few controls and clarifications.

    3. Reviewer #2 (Public Review):

      Summary:

      This paper reports a role for a substantial number of RNA binding proteins (RBPs), in particular hnRNPs, in the function of ASAR "genes". ASARs are (very) long, non-coding RNAs (lncRNAs) that control allelic expression imbalance (e.g.: mono-allelic expression) and replication timing of their resident chromosomes. These relatively novel "genes" have recently been identified on all human autosomes and are of broad significance given their critical importance for basic chromosomal functions and stability. However, the mechanism(s) of ASAR function remain unclear. ASARs exhibit some functional relatedness to Xist RNA, including persistent association of the expressed RNA with its resident chromosome, and similarities in the composition of RNA sequences associated with ASARs, in particular Line1 RNAs. Recent findings that certain hnRNPs control the chromosome territory retention of Cot1-bearing RNAs (which includes Line1) led the authors to test the hypothesis that hnRNPs might regulate ASARs.

      Specific new findings in this paper:

      -Analysis of eCLIP (RNA-protein interaction) ENCODE data shows numerous interactions of the ASAR6-141 RNA with RBPs, including hnRNPs (e.g.: HNRNPU) that have been implicated in the retention of RNAs within local chromosome territories.

      -Most of these interactions can be mapped to a 7kb region of the 185kb ASAR6-141 RNA.

      -Deletion of this 7kb region is sufficient to induce the DMC/DRT phenotype associated with deletion of the entire ASAR region.

      -Ectopic integration into mouse autosomes of the 7kb region is sufficient to cause DMC/DRT of the targeted autosome, and a similar effect upon ectopic integration into inactive X. This raises the question about integration into the active X, which was not mentioned. Is integration into the active X observed? Is it possible that integration might alter Xist expression confounding this interpretation?

      -Knockdown of RBPs that bind the 7kb region causes dissociation of ASAR6-141 RNA from its chromosome territory, and, remarkably, dissociation of Xist RNA from inactive X, and mis-colocalization of the ASAR6-141 and Xist RNAs. Depletion of these RBPs causes DMC/DRT on all autosomes.

      Strengths:

      These are compelling results suggesting shared mechanism(s) in the regulation of ASARs and Xist RNAs by RBPs that bind Cot1 sequences in these lncRNAs. The identification of these RBPs as shared effectors of ASARs and Xist that are required for RNA territory localization mechanistically links previously independent phenomena.

      The data are convincing and support the conclusions. The replication timing method is low resolution and is only a relative measure but seems adequate for the task at hand. The FISH experiments are convincing. The quality of the images is impressive.

      Links to other subfields like X-inactivation and RNA association with chromosome territories provide novel context and protein players, new phenotypes to examine.

      Weaknesses:

      The exact effects of knockdown experiments are unclear and may be indirect, which is acknowledged.

      The mechanism is not much clearer than before.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      "Expanding the Drosophila toolkit for dual control of gene expression" by Zirin et al. aims to develop resources for simultaneous independent manipulation of multiple genes in Drosophila. The authors use CRISPR knock-ins to establish a collection of T2A-LexA and T2A-QF2 transgenes with expression patterns in a number of commonly studied organs and tissues. In addition to the transgenic lines that are established, the authors describe a number of plasmids that can be used to generate additional transgenes, including a plasmid to generate a dual insert of LexA and QF that can be resolved into a single insert using FLP/FRT-mediated recombination, and plasmids to generate RNAi reagents for the LexA and QF systems. Finally, the authors demonstrate that a subset of the LexA and QF lines that they generated can induce RNAi phenotypes when paired with LexAop or QUAS shRNA lines. In general, the claims of the paper are well supported by the evidence and the authors do a thorough job of validating the transgenic lines and characterizing their expression patterns.

      Strengths:

      • Numerous Gal4 lines allow for highly specific genetic manipulation in a wide range of organs and tissues, however, similar tissue-specific drivers using alternative binary expression systems are not currently well developed. This study provides a large number of tissue and organ-specific LexA and QF2 driver lines that should be broadly useful for the Drosophila community.

      • While a minority of the driver lines do not express the expected pattern (likely due to cryptic regulatory elements in the LexA or QF2 sequences), the ability to generate drivers using two different Gal4 alternatives mitigates this issue (as in nearly all cases at least one of the two systems produces a clean driver line with the expected expression pattern).

      • The use of LexA-GAD provides an additional degree of control as it is subject to Gal80 repression. This could prove to be particularly useful in cases where a researcher wishes to manipulate multiple genes using Gal4 and LexA-GAD drivers as the Gal80(ts) system could be used for simultaneous temporal control of both constructs.

      • The use of Fly Cell Atlas information to generate novel oenocyte-specific driver lines provides a useful proof-of-concept for constructing additional highly tissue-specific drivers.

      Weaknesses:

      • Since these reagents will most commonly be paired with existing Gal4 lines, adding information about corresponding Gal4 lines targeting these tissues and how faithfully the LexA and QF2 lines recapitulate these Gal4 patterns would be highly beneficial.

      It is outside the scope of this paper to analyze the expression patterns of the corresponding publicly available Gal4 lines. It is clear from the tissue specificity of the LexA-GAD and QF2 lines that they are expressed in the expected larval tissues based on the target genes. We have added a sentence in the discussion section noting “Further, we expect that there will also be differences between the expression pattern of corresponding Gal4 and the LexA-GAD/QF lines, as the latter were made by knock-in, while the former are often enhancer traps. However, based on our larval mounts and dissections, the stocks generated in this paper are highly specific to the expression pattern of the targeted genes.”

      • It is not stated in the manuscript if these transgenic lines and plasmids are currently publicly available. Information about how to obtain these reagents through Bloomington, Addgene, or TRiP should be added to the manuscript.

      We have added to the materials section that “All vectors described here that are required to produce new driver lines will be made available at Addgene.” And “All transgenic fly stocks described here will be made available at the Bloomington Drosophila Stock Center.”

      Reviewer #2 (Public Review):

      Zirin, Jusiak, and Lopes et al presented an efficient pipeline for making LexA-GAD and QF2 drivers. The tools can be combined with a large collection of existing GAL4 drivers for a dual genetic control of two cell populations. This is essential when studying inter-organ communications since most of the current genetic drivers are biased toward the expression of the central nervous system. In this manuscript, the authors described the methodology for efficiently generating T2A-LexA-GAD and T2A-QF2 knock-ins by CRISPR, targeting a number of genes with known tissue-specific expression patterns. The authors then validated and compared the expression of double as well as single drivers and found the tissue-specific expression results were largely consistent as expected. Finally, a collection of plasmids for LexA-GAD and QF2, as well as the corresponding LexAop and QUAS plasmids, were generated to facilitate the expansion of these tool kits. In general, this study will be of considerable interest to the fly community and the resources can be readily generalized to make drivers for other genes. I believe this toolkit will have a significant, immediate impact on the fly community.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Lines 56-57: Janelia Flylight lines are not necessarily brain-specific - this collection has or could be screened in other tissues.

      Correct. We have altered this sentence to read: However, these lines were developed primarily for brain expression. Although they are often expressed in other tissues, they are not well suited for experiments targeting non-neuronal cell types.

      • Line 197 - I don't see the referenced Figure S1 in the reviewer materials. It appears this is actually referencing panels LL and MM in Figure 2.

      Correct. We have fixed this error.

      • No information on the injection efficiency to create the CRISPR knock-in lines is presented. I am guessing the efficiency will be similar to that of other reported HDR-based CRISPR knock-ins, but if this information is available it would be useful to include it so that others know what to expect when injecting these vectors.

      We did not systematically assay the injection efficiency. However, we can say that it was in line with previous descriptions of CRISPR-based plasmid and ‘drop-in’ HDR methods. We have added a note in the methods that “Knock-in efficiencies were comparable to previous reports (Kanca et al. 2019; Kanca et al. 2022).”

      • Demonstration of successful multi-manipulation would strengthen the paper.

      We do not feel that this is necessary as there have been many papers showing combinatorial Gal4+LexA/QF experiments. An example from our lab can be seen in PMID: 37582831.

      • Also, are there approaches for efficiently constructing pairs of UAS/LexAOp or UAS/QUAS shRNA lines that would potentially streamline the genetics for multi-manipulation? Otherwise, this could be rather cumbersome to implement as one needs to combine a Gal4 line, a LexA/QF2 line (which will be constrained as to its chromosomal location by the target gene), and separate UAS-shRNA and LexAop/QUAS-shRNA constructs into the same fly.

      There are some recent innovations that are useful in this respect. We have added a sentence to the discussion that says: “There remains an unmet need for a single vector that would allow for UAS/LexAop/QUAS control of different shRNAs. However, recent innovations in multi module vectors and multiplexed drug-based genetics allow researchers to more efficiently generate UAS/QUAS/lexAop transgenic fly strains (Matinyan et al. 2021; Wendler et al. 2022).”

      • In Figure 5 - is the difference for the hh inserts attributable to the driver line or the GFP/mCherry construct (or differential ability to detect GFP/mCherry)? One could try visualizing hhL(-Q) with the LexAop-GFP line. I guess that the correspondence between the nubbin and hh result suggests that maybe QF2 is suppressed in the wing pouch, but this could also be the difference in the reporter constructs and it would be interesting to know if this difference is truly attributable to the driver constructs from the standpoint of knowing how consistent the QF/LexA patterns are expected to be.

      The difference is not attributable to GFP versus mCherry or the specific LexAop and QUAS lines that we used in figure 5. We tested the double knock-in and derivative single knock-ins with various QUAS and lexAop reporters and always observed the same pattern.

      Reviewer #2 (Recommendations For The Authors):

      There are a few points that should be clarified. A list of these specific points is provided below with the view that this could help the preparations of a stronger, improved paper.

      Line 50-51: "There have been no systematic studies comparing the two systems, with only anecdotal evidence to support one system over the other." It is unclear to me what anecdotal evidence the authors are referring to. Could the authors elaborate more on this part?

      Based on an examination of QUAS brains, Potter et al, 2010 (PMID 20434990) makes the claim that “The low basal expression of QUAS and UAS reporters provides significant advantage compared to the lexA binary expression system.”

      Shearin et al., 2014 (PMID: 24451596) compared Gal4/UAS, LexA/LexAop, and QF/QUAS reporter strength with the nompC driver and found that the QF system produced the strongest expression.

      While these observations might be true in the nervous system, it isn’t clear that this extends to other tissues, nor what effect this would have on gene knockdown experiments.

      There have been some reports that have explored swapping out a Gal4 insertion for a LexA or QF at the same locus. For example, Gohl et al. 2011 (PMID 21473015) mentions that “the majority of the swaps captured most features of the original GAL4 expression patterns. In some cases, however, either prominent features of the GAL4 pattern were lost or we observed new expression patterns. These changes may have resulted from differences in the strength or responsiveness of reporter lines. Alternately, the swap may have modified some combination of enhancer spacing and sequence composition flanking the promoter.”

      Line 61-62: "On average, each StanEx line expresses LexA activity in five distinct cell types, with only one line showing expression in just one tissue..." What's the evidence to support this claim?

      This observation comes from Figure S3 of Kockel et al. 2016 (PMID: 27527793), where the authors “analyzed a subset of 76 StanEx lines that are unambiguously inserted within, or adjacent to, a single known gene.” We cited this reference in the preceding sentence. To clarify, we have added the citation again for line 61-62.

      Line 63-65: "These findings are consistent with prior studies indicating that enhancers very rarely produce expression patterns that are limited to a single cell type in a complex organism (Jenett et al. 2012)." It might be worth expanding on the use of the split system to achieve high cell-type-specificity. Especially, there are growing resources using split-intein and T2A-split-GAL4 with the prediction of genes from single-cell RNA sequencing datasets.

      We agree that the split system is currently the premier method to produce the most specific driver lines. Indeed, our group has recently published a paper on the split-intein Gal4 system (see PMID 37276389). However, the tradeoff is that split systems usually require generation of transgenic lines, which becomes impractical for research involving two independent binary transcriptional systems, as the user would need to combine at least three driver components into single stocks, plus the UAS/QUAS/LexAop insertions. The ideal would be to generate complementary split insertions on the same chromosome, but we think a discussion of this is tangential to the thrust of our work here.

      The authors did not fully discuss the rationale of using LexA-GAD vs LexA-p65 or VP16AD throughout the manuscript. I assumed the main reason for choosing LexA-GAD was to be compatible with GAL80 suppression. It might be worth explicitly stating this in the results (e.g., line 123) or in the introduction. Also, did the authors observe weak transcriptional activation using LexA-GAD? It has been shown that the strength of transcriptional activation is much weaker for GAL4AD than for the p65 or VP16AD. This might be worth noting in the manuscript as well.

      We did briefly mention in the introduction that one disadvantage of the Flylight lines is that they “use a p65 transcriptional activation domain and therefore are not compatible with the Gal80 temperature sensitive Gal4 repression system.” We have expanded on this issue in the introduction which now says: “We chose to use LexA with the Gal4 activation domain, rather than the p65 or VP16 activation domains to allow for temporal control by Gal80 (Lai and Lee 2006; Pfeiffer et al. 2010). We chose to use QF2 variant over the original QF, to avoid the toxicity reported for the latter (Riabinina et al. 2015).”

      We did not have any problems visualizing gene expression with fluorescent reporters. Nor did we have any difficulty obtaining knock-down phenotypes with ubiquitous drivers.

      Line 125-127. Is there a specific reason why the authors chose the SV40 terminator for the double driver construct but the Hsp70 terminator for the single driver construct?

      We found that the Hsp70 terminator gave slightly lower expression and decided to use this for the singles to avoid toxicity. For the doubles we chose the SV40 terminator, to compensate for reduced protein expression from the second gene position.

      Line 144-146: "To verify the knock-ins, we PCR-amplified the genomic regions flanking the insertion sites and confirmed that the insertions were seamless and in-frame." Did the authors recover lines with indel introduced, resulting in out-of-frame insertion?

      Yes, we did see indels, which sometimes resulted in out of frame insertions, which were discarded. This result is in line with what we have observed with other CRISPR HDR knock-in experiments.

      The underlying reason might be out of the scope of this manuscript. However, it would still be helpful for the authors to speculate the potential reasons why the T2A-LexA-GAD and T2A-QF2 targeting the same insertion site showed very distinct expressions.

      It is outside the scope of this report to test this issue experimentally. We have a section in the discussion which does speculate as to the reason: “While we had no difficulty obtaining knock-ins for both types of activators, we did observe that for some target genes, the T2A-QF2 was only active in a subset of the expected gene expression pattern. In particular, we found that T2A-QF2 was difficult to express in the wing pouch. It may be that toxicity is an issue, and the weaker QF2w may be a better option for generating drivers in some organs (Riabinina and Potter 2016). Alternatively, differences in the LexA-GAD and QF2 sequences, and sequence length, could impact the function of nearby gene regulatory regions.”

      Regarding the observation that the existence of 3XP3-RFP marker can interfere with the expression of T2A-LexA-GAD and T2A-QF2 expression in a case-by-case manner, it might be worth emphasizing in the discussion that the proper removal of 3XP3-RFP marker by Cre/LoxP recombination is important.

      We have added the following to the discussion: “Importantly, our knock-in constructs contain the 3XP3-RFP cassette for screening transformants. Perhaps due to interaction between the 3XP3 promoter and the regulatory regions of the target gene, we occasionally saw misexpression of the LexA-GAD/QF2 in the 3XP3 domain. We have therefore prioritized Cre-Lox removal of the 3XP3-RFP cassette from our knock-in stocks, and advise that users of the plasmids described here likewise remove the marker, following successful knock-in.”

      For Fig. 5B, 5F-G, the authors should elaborate more in the result section. For example, lines 215-217: "We tested this with the hh and dpp lines and observed robust generation of both T2A-QF2 and T2A-LexA-GAD from hs-Flp; T2A-QF2-T2A-LexA-GAD parents (Figure 5B)." It is unclear what the authors mean by "robust generation". Also, there is no description of the results in Fig. 5F-G.

      We have expanded this section for figure 5B, which now reads: “We tested this with the hh and dpp lines and observed robust generation of both T2A-QF2 and T2A-LexA-GAD from hs-Flp; T2A-QF2-T2A-LexA-GAD parents (Figure 5B). In the case of the hh line, 15 out of 36 heat-shocked parents gave rise to at least one T2A-LexA-GAD progeny, with a mean of 14% recombinant offspring per parent. 20 out of 36 gave rise to at least one T2A-QF2 progeny, with a mean of 9% recombinant offspring per parent. In the case of the dpp line, 31 out of 32 heat-shocked parents gave rise to at least one T2A-LexA-GAD progeny, with a mean of 30% recombinant offspring per parent. 17 out of 32 gave rise to at least one T2A-QF2 progeny, with a mean of 9% recombinant offspring per parent.”

      We have also added a description for Figure 5F-G, which reads: “Recombinants were also independently verified by PCR of the insertions (Figure 5F-G), where we observed the expected smaller band sizes in the derivative T2A-QF2 and T2A-LexA-GAD relative to the parental double driver.”
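      As a quick check on the Figure 5B counts quoted above, the fraction of heat-shocked parents that yielded at least one recombinant can be computed directly. The sketch below is only illustrative: the dictionary layout and the function name are assumptions, and the only inputs are the counts stated in the response. It computes the share of productive parents, which is a different quantity from the mean percentage of recombinant offspring per parent that the authors also report.

```python
# Counts quoted in the authors' response for Figure 5B:
# (parents yielding at least one recombinant, parents screened).
# The data structure and names here are illustrative assumptions.
counts = {
    ("hh", "T2A-LexA-GAD"): (15, 36),
    ("hh", "T2A-QF2"): (20, 36),
    ("dpp", "T2A-LexA-GAD"): (31, 32),
    ("dpp", "T2A-QF2"): (17, 32),
}


def fraction_percent(hits, total):
    """Percentage of screened parents that produced >= 1 recombinant."""
    return 100.0 * hits / total


fractions = {key: fraction_percent(*value) for key, value in counts.items()}

for (line, driver), pct in fractions.items():
    print(f"{line} {driver}: {pct:.1f}% of heat-shocked parents")
```

      On these counts, roughly half of the parents were productive in three of the four cases, while nearly all dpp parents produced a T2A-LexA-GAD derivative.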

      Line 229, minor error: "Into these vectors, ..."

      We have edited this to read: “We cloned shRNAs targeting forked (f) and ebony (e) genes into these vectors and assayed their phenotypes when crossed to ubiquitous LexA-GAD and QF2 drivers.”

      Line 238-240: "Both Tub-LexA-GAD and Tub-QF2 drivers generated knockdown phenotypes in the thorax when crossed to f and e shRNA lines. However, the Tub-LexA-GAD phenotypes were stronger than those of Tub-QF2 (Figure 6C-D, F-G, I-J)." The stated "stronger phenotypes" are not clear to me. It might be worth elaborating more.

      We have further clarified this by changing it to: “However, the Tub-LexA-GAD phenotypes were stronger than those of Tub-QF2 (Figure 6C-D, F-G, I-J). For example, Tub-LexA-GAD produced a fully penetrant f bristle phenotype (Figure 6F) while some wild-type bristles remained on the thoraces of Tub-QF2 f knockdown (Figure 6G). Neither Tub-LexA-GAD nor Tub-QF2 was able to achieve the strength of phenotype generated by the T2A-LexA-GAD da knock-in line (compare the darkness of the cuticle caused by e knockdown in Figure 6H-J).”

      Line 257-250: "Our collection of T2A-LexA-GAD and T2A-QF2 and double driver vectors can be easily adapted to target any gene for CRISPR knock-in, with a high probability that the resulting line will accurately reflect the expression of the endogenous locus" The authors could refer to the recent gene-specific Trojan GAL4/split-GAL4 work to support the idea that these gene-specific T2A-GAL4/split-GAL4 drivers reflect better than the enhancer-based drivers.

      We have added the following sentence to the discussion: “The specificity achieved with this approach can also be seen in recent efforts to build collections of gene specific T2A-Split-Gal4 and T2A-Gal4 insertions (Kanca et al. 2019; Chen et al. 2023; Ewen-Campen et al. 2023).”

      Line 630: "Removal of 3XP3-RFP eliminated gut and anal pad misexpression and did not affect glial cell expression." It would be helpful to add the annotation on Fig. 3B to show the location of glial cell expression.

      We have added arrowheads on Figure 3 and the legend now reads: “Removal of 3XP3-RFP eliminated gut and anal pad misexpression and did not affect glial cell expression (white arrowheads).”

      Line 650-651: "The fat body mCherry expression is also present in the reporter stock and does not indicate LexA-GAD activity." I did not get what the authors were trying to convey. Where did the fat body mCherry expression come from? Please elaborate more.

      We have changed this section to explain that “The fat body mCherry expression (yellow arrowhead) is from leakiness of the reporter stock and does not indicate LexA-GAD activity.”

      Line 679-680: "forked shRNA produced a forked bristles phenotype." Please add the annotation on the figures to show where the phenotypes were.

      We have added arrowheads and asterisks to the figure. The legend now reads: “(E-G) forked shRNA produced a forked bristles phenotype (white arrowheads). Note that some bristles retain a more elongated wild-type morphology with the Tub-QF2 driven forked knockdown (G, yellow asterisk).”

      Fig 1D-E and 4A-B. There is no description anywhere in the manuscript of QA and QS regulation, and little description of GAL80ts regulation. This will confuse readers with little background in fly genetics. Please include an introduction to how these different binary expression systems are regulated.

      We have added a section in the introduction, which states: “We chose to use LexA with the Gal4 activation domain, rather than the p65 or VP16 activation domains to allow for temporal control by the temperature sensitive Gal4 repressor, Gal80 (Lai and Lee 2006; Pfeiffer et al. 2010). We chose to use QF2 variant over the original QF, to avoid the toxicity reported for the latter (Riabinina et al. 2015). Like Gal80-based modulation of LexA-GAD, QF2 activity can also be regulated temporally by expressing QS, a QF repressor. QS repression of QF can be released by feeding flies quinic acid (Riabinina and Potter 2016).”

      Fig. 2, there are several ND in the figure without any explanation in the manuscript (e.g. Mef2 and He). In addition, the expression patterns look quite different between T2A-LexA-GAD and T2A-QF2 for some genes (e.g., mex1, Myo31DF), but the authors did not mention any of them in the manuscript. Please elaborate more.

      We have altered the Figure 2 legend as follows: “(A-KK) T2A-LexA-GAD knock-in lines crossed to a LexAop-GFP reporter and T2A-QF2 knock-in lines crossed to a QUAS-GFP reporter. Panels show 3rd instar larva. GFP shows the driver line expression pattern. RFP shows the 3XP3 transformation marker, which labels the posterior gut and anal pads of the larva. Gene names and tissues are on the left. We failed to obtain LexA-GAD knock-ins for Mef2 (E) and He (DD). (LL-MM) 3rd instar imaginal disc from the insertions in the nubbin (nub) gene. Note that most of the lines are highly tissue-specific and are comparable between the LexA-GAD and QF2 knock-ins. Insertions in the daughterless gene (da) and nub are an exception, as the T2A-LexA-GAD, but not the T2A-QF2, gives the expected expression pattern. Insertions in the gut-specific genes mex1 (X-Y) and Myo31Df (Z-AA) also differed between the LexA-GAD and QF2 drivers.”

      We have also added a note on the inconsistency of mex1 and Myo31Df in the discussion: “While we had no difficulty obtaining knock-ins for both types of activators, we did observe that for some target genes, the T2A-QF2 was only active in a subset of the expected gene expression pattern. In particular, we found that T2A-QF2 was difficult to express in the wing pouch. Additionally, we found that the driver expression in the gut-specific genes, mex1 and Myo31Df differed between the LexA-GAD and QF2 transformants. In both cases the LexA-GAD was more broadly expressed along the length of the gut than the QF2. It may be that toxicity is an issue, and the weaker QF2w may be a better option for generating drivers in some organs (Riabinina and Potter 2016).”

      Fig. 4B, it is unclear why the hsp70 is present downstream of the enhancer of interest (upstream of T2A). Is it the molecular mark resulting from the cloning steps? Does it serve any specific purpose?

      This is the Drosophila hsp70 gene minimal promoter and is standard for many expression constructs in Drosophila. In the methods section we described how we made versions of the pMCS-T2A-QF2-T2A-LexA-GAD-WALIUM20 with and without this minimal promoter: “We used pMCS-T2A-QF2-T2A-lexA0GAD-WALIUM20 for dpp-blk and pMCS-T2A-QF2-T2A-lexGAD-WALIUM20-alt (which lacks the hsp70 promoter) for Ilp2, since dpp-blk does not have a basal promoter, but the Ilp2 enhancer does.”

      Fig 5A. The resulting single T2A-QF2 and T2A LexA-GAD from the double driver parental lines retain the sequence of FRT3 upstream of the QF2 and LexA-GAD. I assume the FRT3 part will be translated and remain attached to QF2 and LexA-GAD. Is that correct? If so, would this cause any adverse effect?

      Correct. The FRT3 sequence is present in both the parental double and single derivatives. We can say that the additional amino acids do not prevent LexA-GAD or QF2 transcriptional activation. We do not know whether there may be other adverse effects, though we did not observe any.

      Fig. 5C-C'. It seems like the images of Fig. 5C-C' were the same as Fig. 4D-D'. If so, the authors should indicate that in the figure legend.

      We have made a note of this in the figure legend.

    2. eLife assessment

      This important study reports the generation of genetic tools for manipulating several tissues at the same time in Drosophila. The authors provide convincing evidence that this allows for the generation of LexA and QF2 driver lines, which will be of great utility for understanding inter-organ communication. Making the tools available through the Drosophila stock center and plasmid depository will ensure that they are easily accessed by many researchers.

    3. Reviewer #1 (Public Review):

      Summary:

      "Expanding the Drosophila toolkit for dual control of gene expression" by Zirin et al. aims to develop resources for simultaneous independent manipulation of multiple genes in Drosophila. The authors use CRISPR knock-ins to establish a collection of T2A-LexA and T2A-QF2 transgenes with expression patterns in a number of commonly studied organs and tissues. In addition to the transgenic lines that are established, the authors describe a number of plasmids that can be used to generate additional transgenes, including a plasmid to generate a dual insert of LexA and QF that can be resolved into a single insert using FLP/FRT-mediated recombination, and plasmids to generate RNAi reagents for the LexA and QF systems. Finally, the authors demonstrate that a subset of the LexA and QF lines that they generated can induce RNAi phenotypes when paired with LexAop or QUAS shRNA lines. In general, the claims of the paper are well supported by the evidence and the authors do a thorough job of validating the transgenic lines and characterizing their expression patterns.

    4. Reviewer #2 (Public Review):

      Zirin, Jusiak, and Lopes et al presented an efficient pipeline for making LexA-GAD and QF2 drivers. The tools can be combined with a large collection of existing GAL4 drivers for a dual genetic control of two cell populations. This is essential when studying inter-organ communications since most of the current genetic drivers are biased toward the expression of the central nervous system. In this manuscript, the authors described the methodology for efficiently generating T2A-LexA-GAD and T2A-QF2 knock-ins by CRISPR, targeting a number of genes with known tissue-specific expression patterns. The authors then validated and compared the expression of double as well as single drivers and found the tissue-specific expression results were largely consistent as expected. Finally, a collection of plasmids for LexA-GAD and QF2, as well as the corresponding LexAop and QUAS plasmids, were generated to facilitate the expansion of these tool kits. In general, this study will be of considerable interest to the fly community and the resources can be readily generalized to make drivers for other genes. I believe this toolkit will have a significant, immediate impact on the fly community.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors present a potentially useful model involving Ca2+ signaling in inflammasome activation. As it stands, it was felt that the data were not sufficient to support the model and the claims of the study are inadequately presented.

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript proposes a complex unclear model involving Ca2+ signaling in inflammasome activation. The experimental approaches used to study the calcium dynamics are problematic and the results shown are of inadequate quality. The major claims of this manuscript are not adequately substantiated.

      Major concerns:

      (1) The analysis of lysosomal Ca2+ release is being carried out after many hours of treatment. Such evidence is not sufficient to claim that PA activates Ca2+ efflux from lysosomes, and even if this phenomenon were robust, it is doubtful that such kinetics are meaningful for the regulation of inflammasome activation. Furthermore, the evidence for lysosomal Ca2+ release is indirect and relies on a convoluted process that doesn't make any conceptual sense to me. In addition to these major shortcomings, the indirect evidence of perilysosomal Ca2+ elevation is also of very poor quality and, from the standpoint of my expertise in calcium signaling, the data are not credible. The use of GCaMP3-ML1, transiently transfected into BMDMs, is highly problematic. The efficiency of transfection in BMDMs is always extremely low and overexpression of the sensor in a few rare cells can lead to erroneous observations. The overexpression also results in gross mislocalization of such membrane-bound sensors. The accumulation of GCaMP3-ML1 in the ER of these cells would prevent any credible measurements of perilysosomal Ca2+ signals. A meaningful investigation of this process in primary macrophages requires the generation of a mouse line wherein the sensor is expressed at low levels in myeloid cells, and shown to be localized almost exclusively in the lysosomal membrane. The mechanistic framework built around these major conceptual and technical flaws is not especially meaningful and since these are foundational results, I cannot take the main claims of this study seriously.

      Ans) We agree with the reviewer’s concern that transfection efficiency could be low in BMDMs, together with possible mislocalization of GCaMP3-ML1. However, in our experiment, transfection of BMDMs with test plasmids resulted in good expression of test proteins. Below, we present our data showing good transfection efficiency of BMDM cells, although a different plasmid was employed.

      Author response image 1.

      (2) The cytosolic Ca2+ imaging shown in Figure 1C doesn't make any sense. It looks like a snapshot of basal Ca2+ many hours after PA treatment - calcium elevations are highly dynamic. Snapshot measurements are not helpful and analysis of calcium dynamics requires a recording over a certain timespan. Unfortunately, this technical approach has been used throughout the manuscript. Also, BAPTA-AM abrogates IL-1b secretion because IL-1b transcription is Ca2+ dependent - the result shown in figure 1D does not shed light on anything to do with inflammasome activation and it is misleading to suggest that.

      Ans) We agree with the reviewer’s concern that a snapshot could lead to a false conclusion. We have not traced cytosolic Ca2+ content after treatment with LPS + PA. However, we have traced lysosomal Ca2+ and ER Ca2+ for more than 15 min, which was presented in Figure 4B. We also agree with the comment that BAPTA-AM might affect transcription of pro-IL-1β. We have conducted immunoblot analysis after treatment with LPS+PA in the presence of BAPTA-AM. The pro-IL-1β protein band was not affected by BAPTA-AM treatment, suggesting no effect of BAPTA-AM on transcription or translation of pro-IL-1β; this result was added to Figure 1D, as suggested.
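      The reviewer's point about snapshots versus time-lapse recordings can be illustrated with a toy calculation. The sketch below is purely hypothetical: the transient shape (a difference of two exponentials), the baseline, the amplitude, and the time constants are all invented for illustration and are not taken from the manuscript or from any measurement. It only shows that a single late time point can sit at baseline even when a large transient occurred earlier.

```python
import math


def calcium_transient(t_min, baseline=100.0, amplitude=400.0,
                      tau_rise=0.5, tau_decay=3.0):
    """Hypothetical cytosolic Ca2+ (nM) t_min minutes after a stimulus.

    All parameter values are made-up illustration values, not data.
    """
    if t_min < 0:
        return baseline
    # Fast rise and slower decay, modelled as a difference of exponentials.
    shape = math.exp(-t_min / tau_decay) - math.exp(-t_min / tau_rise)
    return baseline + amplitude * shape


# "Recording": sample every 0.1 min for 60 min.
times = [0.1 * i for i in range(601)]
trace = [calcium_transient(t) for t in times]

peak = max(trace)
peak_time = times[trace.index(peak)]

# "Snapshot": one measurement long after the stimulus.
snapshot = calcium_transient(60.0)

print(f"peak of recording: {peak:.0f} nM at about {peak_time:.1f} min")
print(f"single snapshot at 60 min: {snapshot:.0f} nM")
```

      In this toy model the recording captures a clear rise above baseline within the first few minutes, whereas the late snapshot is indistinguishable from the resting level.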

      (3) Trpm2-/- macrophages are known to be hyporesponsive to inflammatory stimuli - the reduced secretion of IL-1b by these macrophages is not novel. From a mechanistic perspective, this study does not add much to that observation and the proposed role of TRPM2 as a lysosomal Ca2+ release channel is not substantiated by good quality Ca2+ imaging data (see point 3 above). Furthermore, the study assumes that TRPM2 is a lysosomal ion channel. One paper reported TRPM2 in the lysosomes but this is a controversial claim, with no replication or further development in the last 14 years. This core assumption can be highly misleading to readers unfamiliar with TRPM2 biology and it is necessary to present credible evidence that TRPM2 is functional in the lysosomal membrane of macrophages. Ideally, this line of investigation should rest on robust demonstration of TRPM2 currents in patch-clamp electrophysiology of lysosomes. If this is not technically feasible for the authors, they should at least investigate TRPM2 localization on lysosomal membranes of macrophages.

      Ans) We agree with the reviewer’s comment that TRPM2. However, we have shown that TRPM2 current was not activated in the plasma membrane of BMDMs after treatment with LPS+PA. We also agree with the reviewer’s comment that inflammatory cytokine release from TRPM2 KO cells or inflammasome response of TRPM2 KO macrophages to ROS or nanoparticles has been reported to be reduced; however, the role of TRPM2 in metabolic inflammation or inflammasome activation in response to lipid stimulators has not been shown, as discussed in the new lines 9-10 from the bottom of page 18. Regarding the role of lysosomal TRPM2 in inflammation, we have shown that bafilomycin A1 treatment abrogated increase of cytosolic Ca2+ by LPS+PA (Figure 3-figure supplement 1D), supporting the role of lysosome and lysosomal Ca2+ in inflammasome activation by LPS+PA.

      We agree with the reviewer’s comment that TRPM2 expression on lysosomes needs to be tested. We conducted confocal microscopy after immunofluorescence staining using anti-TRPM2 and anti-LAMP2 antibodies, which showed that a portion of TRPM2 was colocalized with LAMP2. This result, substantiating TRPM2 expression on the lysosomes of macrophages, was incorporated as Figure 2-figure supplement 1A.

      (4) Apigenin and Quercetin are highly non-specific and their effects cannot be attributed to CD38 inhibition alone. Such conclusions need strong loss of function studies using genetic knockouts of CD38 - or at least siRNA knockdown. Importantly, if indeed TRPM2 is being activated downstream of CD38, this should be easily evident in whole cell patch clamp electrophysiology. TRPM2 currents can be resolved using this technique and authors have Trpm2-/- cells for proper controls. Authors attempted these experiments but the results are of very poor quality. If the TRPM2 current is being activated through ADPR generated by CD38 (in response to PA stimulation), then it is very odd that authors need to include 200 uM cADPR to see TRPM2 current (Fig. 3A). Oddly, even these data cast great doubt on the technical quality of the electrophysiology experiments. Even with such high concentrations of cADPr, the TRPM2 current is tiny and Trpm2-/- controls are missing. The current-voltage relationship is not shown, and I feel that the results are merely reporting leak currents seen in measurements with substandard seals. Also 20 uM ACA is not a selective inhibitor of TRPM2 - relying on ACA as the conclusive diagnostic is problematic.

      Ans) We agree with the reviewer’s comment that the effects of apigenin and quercetin could be due to mechanisms other than inhibition of CD38-mediated inflammasome activation. Indeed, that is the reason we have used TRPM2 KO mice and cells. The small TRPM2 current after treatment with high concentrations of cADPR might suggest a minor role of plasma membrane TRPM2 in macrophages. Regarding the concern about ACA, we added data showing inhibition by ACA of IL-1β release in response to LPS+PA as a new Figure 3-figure supplement 1A.

      (5) TRPM2 is expressed in many different cell lines. The broad metabolic differences observed by the authors in the Trpm2-/- mice cannot be attributed to macrophage-mediated inflammation. Such a conclusion requires the study of mice wherein Trpm2 is deleted selectively in macrophages or at least in the cells of the myeloid lineage.

      Ans) We agree with the reviewer’s comment that TRPM2 in cells other than macrophages might have affected the results. Thus, we have conducted in vitro stimulation of TRPM2-KO primary peritoneal macrophages with LPS+PA. We observed that IL-1β release from TRPM2-KO macrophages in response to in vitro treatment with LPS+PA was significantly lower than that from wild-type macrophages (Figure 2C & D), showing a role of macrophage TRPM2 in inflammasome activation by LPS+PA, which could be independent of TRPM2 in tissues or cells other than macrophages.

      (6) The ER-Lysosome Ca2+ refilling experiments rely on transient transfection of organelle-targeted sensors into BMDMs. See point #1 to understand why I find this approach to be highly problematic. Furthermore, the data procured are also not convincing and lack critical controls (localization of sensors has not been demonstrated and their response to acute mobilization of Ca2+ has not been shown to inspire any confidence in these results).

      Ans) We agree with the reviewer’s comment that transfection of ER-targeted Ca2+ sensors could have artifactual effects. However, we have studied ER-lysosome Ca2+ refilling using not only GEM-CEPIAer but also D1ER, a FRET-based ER Ca2+ sensor which has the advantage of a short distance of molecular interaction. Thus, we believe that the changes of ER Ca2+ after treatment with LPS+PA are not due to an artifactual effect. Multiple contacts between VAPA and ORP1L (Figure 4E) also support ER-lysosome contact, likely facilitating ER-lysosome Ca2+ flux.

      (7) Authors claim that SOCE is coupled to K+ efflux. But there is no credible evidence that SOCE is activated in PA stimulated macrophages. The data shown in Fig 4 supp 1 do not investigate SOCE in a reliable manner - the conclusion is again based on snapshot measurements and crude non-selective inhibitors. The correct way to evaluate SOCE is to record cytosolic Ca2+ elevations over a period of time in absence and presence of extracellular Ca2+. However, even such recordings can be unreliable since the phenomenon is being investigated hours after PA stimulation. So, the only definitive way to demonstrate that Orai channels are indeed active during this process is through patch clamp electrophysiology of PA stimulated cells.

      Ans) We agree with the reviewer’s comment that the final proof of SOCE activation is activation of the Orai channel evidenced by electrophysiology. However, we have shown STIM1 aggregation colocalized with Orai1, which is further strong evidence of SOCE channel activation (Vaca L. Cell Calcium 47:199, 2010). This paper, showing the role of STIM1 aggregation in SOCE activation, was incorporated in the text (line 4 from the bottom of page 10) and References.

      Reviewer #2 (Public Review):

      In this manuscript by Kang et al., the authors investigated the mechanisms of K+-efflux-coupled SOCE in NLRP3 inflammasome activation by LP (LPS+PA), and identified an essential role of TRPM2-mediated lysosomal Ca2+ release and subsequent IP3R-mediated ER Ca2+ release and store depletion in the process. K+ efflux is shown to be mediated by a Ca2+-activated K+ channel (KCa3.1). LP-induced cytosolic Ca2+ elevation also induced a delayed activation of ASK1 and JNK, leading to ASC oligomerization and NLRP3 inflammasome activation. Overall, this is an interesting and comprehensive study that has identified several novel molecular players in metabolic inflammation. The manuscript can benefit if the following concerns could be addressed:

      (1) The expression of TRPM2 in the lysosomes of macrophages needs to be more definitively established. For instance, the cADPR-induced TRPM2 currents should be abolished in the TRPM2 KO macrophages. Can you show the lysosomal expression of TRPM2, either with an antibody if available or with a fluorescently-tagged TRPM2 overexpression construct?

      Ans) We agree with the reviewer’s comment that TRPM2 expression on lysosomes needs to be tested. We conducted confocal microscopy after immunofluorescence staining using anti-TRPM2 and anti-LAMP2 antibodies, which showed that a portion of TRPM2 was colocalized with LAMP2. This result was incorporated as Figure 2-figure supplement 1A.

      (2) Can you use your TRPM2 inhibitor ACA to pharmacologically phenocopy some results, e.g., about [Ca2+]ER, [Ca2+]LY, and [Ca2+]i from the TRPM2 knockout?

      Ans) We agree with the reviewer’s comment that the effect of ACA on other experimental results needs to be shown. We did not study the effect of ACA on Ca2+ flux; however, we have observed that ACA inhibited IL-1β release in response to LPS+PA. These data were incorporated as Figure 3-figure supplement 1A.

      Author response image 2.

      (3) In Fig. S4A, bathing the cells in zero Ca2+ for three hours might not be ideal. Can you use a SOCE inhibitor, e.g, YM-58483, to make the point?

      Ans) We agree with the reviewer’s comment that a SOCE inhibitor experiment would be necessary in addition to the experiment employing zero Ca2+. In fact, we have already used two SOCE inhibitors (2-APB and BTP2) (Figure 4-figure supplement 1B-D). In particular, the BTP2 experiment could eliminate the possible role of ER Ca2+ inhibition that might occur when 2-APB was employed.

      (4) In Fig. 1A, you need a positive control, e.g., ionomycin, to show that the GPN response was selectively reduced upon LP treatment.

      Ans) We did not employ ionomycin as a control in this study. In our previous study using other agents inducing lysosomal Ca2+ efflux, we have observed lysosomal Ca2+ efflux with intact subsequent ionomycin response. While we did not include ionomycin in the current paper, we are positive that ionomycin response would be preserved.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      See Public Review.

      Reviewer #2 (Recommendations For The Authors):

      (5) In Fig. 4B, the red label should read "BAPTA-1 Dextran", but not "GAPTA-1 Dextran".

      (6) Writing should be improved in many sections.

    1. eLife assessment

      In this important manuscript, Abd El Hay and colleagues reveal a clear role of TRPV1 and TRPM2 receptors in warm temperature perception and present a technically unique experimental strategy to measure and analyze temperature preference behavior, which will have a lasting impact on the field. In addition to the behavioral data, which is strong, the study provides an analysis of cultured sensory neurons to controlled warmth stimuli - in this case, the evidence relating the activity of TRPM2 channels to the behavioral responses of animals is incomplete. Overall, the findings are of importance for neuroscientists, physiologists, and biophysicists, as there is still substantial discussion in the field regarding the contribution of TRP channels to different aspects of thermosensation.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors use an innovative behavior assay (chamber preference test) and standard calcium imaging experiments on cultured dorsal root ganglion (DRG) neurons to evaluate the consequences of global knockout of TRPV1 and TRPM2, and overexpression of TRPV1, on warmth detection. They find a profound effect of TRPM2 elimination in the behavioral assay, whereas elimination of TRPV1 has the largest effect on neuronal responses. These findings are of importance, as there is still substantial discussion in the field regarding the contribution of TRP channels to different aspects of thermosensation.

      Strengths:

      The chamber preference test is an important innovation compared to the standard two-plate test, as it depends on thermal information sampled from the entire skin, as opposed to only the plantar side of the paws. With this assay, and the detailed analysis, the authors provide strong supporting evidence for the role of TRPM2 in warmth avoidance. The conceptual framework using the Drift Diffusion Model provides a first glimpse of how this decision of a mouse to change between temperatures can be interpreted and may form the basis for further analysis of thermosensory behavior.

      Weaknesses:

      The authors juxtapose these behavioral data with calcium imaging data using isolated DRG neurons. Here, there are a few aspects that are less convincing.

      (1) The authors study warmth responses using DRG neurons after three days of culturing. They propose that these "more accurately reflect the functional properties and abundance of warm-responsive sensory neurons that are found in behaving animals." However, the only argument to support this notion is that the fraction of neurons responding to warmth is lower after three days of culture. This could have many reasons, including loss of specific subpopulations of neurons, or any other (artificial?) alterations to the neurons' transcriptome due to the culturing. The isolated DRGs are not selected in any way, so also include neurons innervating viscera not involved in thermosensation. If the authors wish to address actual changes in sensory nerves involved in warmth sensing in TRPM2 or TRPV1 KO mice without disturbing the response profile as a result of the isolation procedure, other approaches would be needed (e.g. skin-nerve recordings or in vivo DRG imaging).

      (2) The authors state that there is a reduction in warmth-sensitive DRG neurons in the TRPM2 knockout mice based on the data presented in Figure 2D. This is not convincing for the following reasons. First, the authors used t-tests (with FDR correction - yielding borderline significance) whereas three groups are compared here in three repetitive stimuli. This would require different statistics (e.g. ANOVA), and I am not convinced (based on a rapid assessment of the data) that such an analysis would yield any significant difference between WT and TRPM2 KO. Second, there seems to be a discrepancy between the plot and legend regarding the number of FOV analysed (21, 17, and 18 FOV according to the legend, compared to 18, 10, and 12 dots in the plot). Therefore, I would urge the authors to critically assess this part of the study and to reconsider whether the statement (and discussion) that "Trpm2 deletion reduces the proportion of warmth responders" should be maintained or abandoned.

      (3) It remains unclear whether the clear behavioral effect seen in the TRPM2 knockout animals is at all related to TRPM2 functioning as a warmth sensor in sensory neurons. As discussed above, the effects of the TRPM2 KO on the proportion of warmth-sensing neurons are at most very subtle, and the authors did not use any pharmacological tool (in contrast to the use of capsaicin to probe for TRPV1 in Figures S3 and S4) to support a direct involvement of TRPM2 in the neuronal warmth responses. Behavioral experiments on sensory-neuron-specific TRPM2 knockout animals will be required to clarify this important point.

      (4) The authors only use male mice, which is a significant limitation, especially considering known differences in warmth sensing between male and female animals and humans. The authors state "For this study, only male animals were used, as we aimed to compare our results with previous studies which exclusively used male animals (7, 8, 17, 43)." This statement is not correct: all four mentioned papers include behavioral data from both male and female mice! I recommend the authors to either include data from female mice or to clearly state that their study (in comparison with these other studies) only uses male mice.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors of the study use a technically well-thought-out approach to dissect the question of how far TRPV1 and TRPM2 are involved in the perception of warm temperatures in mice. They supplement the experimental data with a drift-diffusion model. They find that TRPM2 is required to trigger the preference for 31°C over warmer temperatures while TRPV1 increases the fidelity of afferent temperature information. A lack of either channel leads to a depletion of warm-sensing neurons and in the case of TRPV1 to a deficit in rapid responses to temperature changes. The study demonstrates that mouse phenotyping can only produce trustworthy results if the tools used to test them measure what we believe they are measuring.

      Strengths:

      The authors tackle a central question in physiology to which we have not yet found sufficient answers. They take a pragmatic approach by putting existing experimental methods to the test and refining them significantly.

      Weaknesses:

      It is difficult to find weaknesses. Not only the experimental methods but also the data analysis have been refined meticulously. There is no doubt that the authors achieved their aims and that the results support their conclusions.

      There will certainly be some lasting impact on the future use of DRG cultures with respect to (I) the incubation periods, (II) how these data need to be analyzed, and (III) the numbers of neurons to be looked at.

      As for the CPT assay, the future will have to show if mouse phenotyping results are more accurate with this technique. I'm more fond of full thermal gradient environments. However, behavioural phenotyping is still one of the most difficult fields in somatosensory research.

    4. Reviewer #3 (Public Review):

      Summary and strengths:

      In the manuscript, Abd El Hay et al investigate the role of thermally sensitive ion channels TRPM2 and TRPV1 in warm preference and their dynamic response features to thermal stimulation. They develop a novel thermal preference task, where both the floor and air temperature are controlled, and conclude that mice likely integrate floor with air temperature to form a thermal preference. They go on to use knockout mice and show that TRPM2-/- mice play a role in the avoidance of warmer temperatures. Using a new approach for culturing DRG neurons they show the involvement of both channels in warm responsiveness and dynamics. This is an interesting study with novel methods that generate important new information on the different roles of TRPV1 and TRPM2 on thermal behavior.

      Open questions and weaknesses:

      (1) Differences in the response features of cells expressing TRPM2 and TRPV1 are central and interesting findings but need further validation (Figures 3 and 4). To show differences in the dynamics and the amplitude of responses across different lines and stimulus amplitudes more clearly, the authors should show the grand average population calcium response from all responsive neurons with error bars for all 3 groups for the different amplitudes of stimuli (as has been presented for the thermal stimuli traces). The authors should also provide a population analysis of the amplitude of the responses in all groups to all stimulus amplitudes. Prior work suggests that thermal detection is supported by an enhancement or suppression of the ongoing activity of sensory fibers innervating the skin. The authors should present any data on cells with ongoing activity.

      (2) The authors should better place their findings in context with the literature and highlight the novelty of their findings. The introduction builds a story of a 'disconnect' or 'contradictory' findings about the role of TRPV1 and TRPM2 in warm detection. While there are some disparate findings in the literature, Tan and McNaughton (2016) show a role for TRPM2 in the avoidance of warmth in a similar task, Paricio et al. (2020) show a significant reduction in warm perception in TRPM2 and TRPV1 knockout lines and Yarmolinsky et al. (2016) show a reduction in warm perception with TRPV1 inactivation. All these papers are therefore in agreement with the authors' finding of a role for these channels in warm behavior. The authors should change their introduction and discussion to more correctly discuss the findings of these studies and to better pinpoint the novelty of their own work.

      (3) The responses of 60 randomly selected cells are shown in Figure 2B. But, looking at the TRPM2-/- data, warm responses appear more obvious than in WTs and the weaker responders of the WT group appear weaker than the equivalent group in the TRPV1-/- and TRPM2-/- data. This does not necessarily invalidate the results, but it may suggest a problem in the data selection. Because the correct classification of warm-sensitive neurons is central to this part of the study more validation of the classifier should be presented. For example, the authors could state if they trained the classifier using equal amounts of cells, show some randomly selected cells that are warm-insensitive for all genotypes, and show the population average responses of warm-insensitive neurons.

      (4) The interpretation of the main behavioral results and justification of the last figure is presented as the result of changes in sensing but differences in this behavior could be due to many factors and this needs clarification and discussion. (i) The authors mention that 'crucially temperature perception is not static' and suggest that there are fluctuating changes in perception over time and conclude that their modelling approach helps show changes in temperature detection. They imply that temperature perceptual threshold changes over time, but the mouse could just as easily have had exactly the same threshold throughout the task but their motivation (or some other cognitive variable) might vary causing them to change chamber. The authors should correct this. (ii) Likewise, from their fascinating and high-profile prior work the authors suggest a model of internal temperature sensing whereby TRPM2 expression in the hypothalamus acts as an internal sensory of body temperature. Given this, and the slow time course of the behavior in chambers with different ambient temperatures, couldn't the reason for the behavioral differences be due to central changes in hypothalamic processing rather than detection by skin temperature? If TRPM2-/- were selectively ablated from the skin or the hypothalamus (these experiments are not necessary for this paper) it might be possible to conclude whether sensation or body temperature is more likely the root cause of these effects but, without further experiments it is tough to conclude either way. (iii) Because the ambient temperature is controlled in this behavior, another hypothesis is that warm avoidance could be due to negative valence associated with breathing warm air, i.e. a result of sensation within the body in internal pathways, rather than sensing from the external skin. Overall, the authors should tone down conclusions about sensation and present a more detailed discussion of these points.

      (5) It is an excellent idea to present a more in-depth analysis of the behavioral data collected during the preference task, beyond 'the mouse is on one side or the other'. However, the drift-diffusion approach is complex to interpret from the text in the results and the figures. The results text is not completely clear on which behavioral parameters are analyzed and terms like drift, noise, estimate, and evidence are not clearly defined. Currently, this section of the paper slightly confuses and takes the paper away from the central findings about dynamics and behavioral differences. It seems like they could come to similar conclusions with simpler analysis and simpler figures.

      (6) In Figure 2D the % of warm-sensitive neurons are shown for each genotype. Each data point is a field of view, however, reading the figure legend there appear to be more FOVs than data points (eg 10 data points for the TRPV1-/- but 17 FOVs). The authors should check this.

      (7) Can the authors comment on why animals with over-expression of TRPV1 spend more time in the warmest chamber to start with at 38C and not at 34C?
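
      Regarding the terms queried in point (5), a generic drift-diffusion process can be sketched in a few lines. The sketch below is only an illustration under assumed settings, not the authors' fitted model: "evidence" is the accumulated variable, "drift" is its mean rate of change, "noise" scales the random fluctuations, and a decision is registered when the evidence reaches a bound. All parameter values and the function names `simulate_ddm` and `fraction_upper` are hypothetical.

```python
import random


def simulate_ddm(drift, noise, bound, dt=0.01, max_time=60.0, rng=None):
    """Simulate one trial of a generic drift-diffusion process (illustrative only).

    Evidence starts at 0 and is updated in steps of dt:
        evidence += drift * dt + noise * sqrt(dt) * N(0, 1)
    The trial ends when |evidence| reaches `bound` or when max_time elapses.
    Returns (choice, decision_time); choice is +1, -1, or 0 if no bound was hit.
    """
    if rng is None:
        rng = random.Random()
    evidence, t = 0.0, 0.0
    sqrt_dt = dt ** 0.5
    while t < max_time:
        evidence += drift * dt + noise * sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
        if evidence >= bound:
            return 1, t
        if evidence <= -bound:
            return -1, t
    return 0, t


def fraction_upper(drift, n_trials, seed, noise=1.0, bound=1.0):
    """Fraction of simulated trials that end at the upper bound."""
    rng = random.Random(seed)  # seeded so that the simulation is reproducible
    hits = sum(simulate_ddm(drift, noise, bound, rng=rng)[0] == 1
               for _ in range(n_trials))
    return hits / n_trials


# Hypothetical settings: no drift versus a positive drift towards the upper bound.
print(fraction_upper(0.0, 500, seed=1), fraction_upper(1.0, 500, seed=1))
```

      In such a sketch, a larger drift makes one outcome more likely and faster, whereas larger noise makes outcomes more variable; how these quantities map onto chamber crossings in the preference assay is for the authors to define.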

    5. Author Response:

      Reviewer #1:

      Summary:

      The authors use an innovative behavior assay (chamber preference test) and standard calcium imaging experiments on cultured dorsal root ganglion (DRG) neurons to evaluate the consequences of global knockout of TRPV1 and TRPM2, and overexpression of TRPV1, on warmth detection. They find a profound effect of TRPM2 elimination in the behavioral assay, whereas elimination of TRPV1 has the largest effect on neuronal responses. These findings are of importance, as there is still substantial discussion in the field regarding the contribution of TRP channels to different aspects of thermosensation.

      Strengths:

      The chamber preference test is an important innovation compared to the standard two-plate test, as it depends on thermal information sampled from the entire skin, as opposed to only the plantar side of the paws. With this assay, and the detailed analysis, the authors provide strong supporting evidence for the role of TRPM2 in warmth avoidance. The conceptual framework using the Drift Diffusion Model provides a first glimpse of how this decision of a mouse to change between temperatures can be interpreted and may form the basis for further analysis of thermosensory behavior.

      Weaknesses:

      The authors juxtapose these behavioral data with calcium imaging data using isolated DRG neurons. Here, there are a few aspects that are less convincing.

      (1) The authors study warmth responses using DRG neurons after three days of culturing. They propose that these "more accurately reflect the functional properties and abundance of warm-responsive sensory neurons that are found in behaving animals." However, the only argument to support this notion is that the fraction of neurons responding to warmth is lower after three days of culture. This could have many reasons, including loss of specific subpopulations of neurons, or any other (artificial?) alterations to the neurons' transcriptome due to the culturing. The isolated DRGs are not selected in any way, so also include neurons innervating viscera not involved in thermosensation. If the authors wish to address actual changes in sensory nerves involved in warmth sensing in TRPM2 or TRPV1 KO mice without disturbing the response profile as a result of the isolation procedure, other approaches would be needed (e.g. skin-nerve recordings or in vivo DRG imaging).

      We agree that there could be several reasons as to why the responses of cultured DRGs are reduced compared to the acute/short-term cultures. It is possible, and likely, that transcriptional changes happen over the course of the culturing period. It is also possible that it is a mere coincidence that the 3-day cultures have a response profile more similar to the in vivo situation than the acute cultures. In the revised manuscript, we will therefore tone down the claim that the 3-day cultures mirror the native conditions more appropriately.

      Nevertheless, our results clearly show that acute cultures have a response profile that is much more similar to damaged/“inflamed” neurons, irrespective of any comparison to the 3-day cultures. Therefore, we believe it is helpful to include these data to make scientists aware that acute cultures are very different from non-inflamed native/in vivo DRG neurons that many researchers use in their experiments.

      In some experiments not shown in the first version of our manuscript, we applied the TRP channel agonists menthol, capsaicin and AITC (mustard oil) consecutively in a few 3-day cultures. We also have capsaicin responses from overnight cultures. We will attempt to correlate the percentage of neurons responsive to these TRPM8, TRPV1 and TRPA1 ion channel agonists in our cultures with the percentage of neurons found to express the respective TRP ion channels (TRPM8, TRPV1 and TRPA1) in vivo. While this type of analysis won’t prove that 3-day cultures are similar to the in vivo situation (even if there is good correlation between the in vitro and in vivo results), it might support the usage of 3-day cultures as a model.

      (2) The authors state that there is a reduction in warmth-sensitive DRG neurons in the TRPM2 knockout mice based on the data presented in Figure 2D. This is not convincing for the following reasons. First, the authors used t-tests (with FDR correction - yielding borderline significance) whereas three groups are compared here in three repetitive stimuli. This would require different statistics (e.g. ANOVA), and I am not convinced (based on a rapid assessment of the data) that such an analysis would yield any significant difference between WT and TRPM2 KO. Second, there seems to be a discrepancy between the plot and legend regarding the number of FOV analysed (21, 17, and 18 FOV according to the legend, compared to 18, 10, and 12 dots in the plot). Therefore, I would urge the authors to critically assess this part of the study and to reconsider whether the statement (and discussion) that "Trpm2 deletion reduces the proportion of warmth responders" should be maintained or abandoned.

      Yes, we agree that the statistical tests indicated by the referee are more appropriate/robust for the data shown in Figures 1F, 2D, and 4G.

      When we perform a two-way repeated measures ANOVA and a subsequent multiple comparison test (with Dunnett's correction) against wildtype for the data shown in Fig. 2D, both the main effect (Genotype) and the interaction term (Stimulus x Genotype) are significant. The multiple comparison yields very similar results to those in the current manuscript, with the difference that the TRPM2-KO data for the 2nd stimulus (~36°C) are borderline significant (p=0.050).

      Due to the possible dependence of the repeated temperature stimuli and the variability of each stimulus between FOVs (Fig. 2C), it is possible that a mixed-effect model that accounts for these effects is more appropriate.

      Similarly, for plots 1F and 4G, Genotype (either as main effect or as interaction with Time) is significant after a repeated measures two-way ANOVA. The multiple comparisons (with Bonferroni correction) only changed the results marginally at individual timepoints, without affecting the overall conclusions. The exception is Fig. 4G at 38°C, where the interaction of Time and Genotype is significant, but no individual timepoint-comparison is significant after Bonferroni correction.

      The main difference between the results presented above and the ones presented in the manuscript is the choice of the multiple comparison correction. We originally opted for the false discovery rate (FDR) approach as it is less prone to Type II errors (false negatives) than other methods such as Šidák's or Bonferroni, particularly when correcting for a large number of tests. However, we are mainly interested in whether the genotypes differ in their behavior in each temperature combination, and the significant ANOVA tests for Fig. 1F and 4G support that point. The statistical tests and comparisons used in the current version of the manuscript, comparing behavior at individual timepoints, are interesting but less relevant (and potentially distracting), as we do not go into detail about the behavior at any given timepoint in the assay.

      Therefore, and per suggestion of the reviewer, we will update the statistics in the revised version of the manuscript. Also, we will report the correct number of FOVs in the legend.
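
      The trade-off between FDR and Bonferroni corrections mentioned in this response can be illustrated with a minimal sketch. The p-values below are made up for illustration and are not data from the study; the function names are hypothetical, and the Benjamini-Hochberg procedure is used here as one common FDR method, which may differ from the exact procedure applied in the manuscript.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply each p by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for six timepoint comparisons (illustrative only).
example_pvals = [0.004, 0.011, 0.02, 0.03, 0.2, 0.6]
alpha = 0.05
n_bonf = sum(p <= alpha for p in bonferroni(example_pvals))
n_bh = sum(p <= alpha for p in benjamini_hochberg(example_pvals))
print(n_bonf, n_bh)
```

      With these made-up values, fewer comparisons remain significant under Bonferroni than under Benjamini-Hochberg, which is the sense in which the FDR approach is less prone to false negatives when many timepoints are tested.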

      (3) It remains unclear whether the clear behavioral effect seen in the TRPM2 knockout animals is at all related to TRPM2 functioning as a warmth sensor in sensory neurons. As discussed above, the effects of the TRPM2 KO on the proportion of warmth-sensing neurons are at most very subtle, and the authors did not use any pharmacological tool (in contrast to the use of capsaicin to probe for TRPV1 in Figures S3 and S4) to support a direct involvement of TRPM2 in the neuronal warmth responses. Behavioral experiments on sensory-neuron-specific TRPM2 knockout animals will be required to clarify this important point.

      As mentioned above, we will tone down the correlation between the cellular and behavioral data and further stress the possibility that the Trpm2-KO phenotype is related to the function of the ion channel outside of DRGs.

      (4) The authors only use male mice, which is a significant limitation, especially considering known differences in warmth sensing between male and female animals and humans. The authors state "For this study, only male animals were used, as we aimed to compare our results with previous studies which exclusively used male animals (7, 8, 17, 43)." This statement is not correct: all four mentioned papers include behavioral data from both male and female mice! I recommend the authors to either include data from female mice or to clearly state that their study (in comparison with these other studies) only uses male mice.

      In the studies by Tan et al. and Vandewauw et al., only male animals were used for the behavioral experiments. Yarmolinsky et al. and Paricio-Montesinos et al. used both males and females while, as far as we can tell, only Paricio-Montesinos et al. reported that no difference was observed between the sexes. This is a valid point though: when our study started 6-7 years ago, we only used male mice (as did many other researchers), and we would now do this differently. Nevertheless, we included some female mice in these experiments and will reevaluate whether the numbers are sufficient to generalize the phenotypes to both sexes or to report differences in the revised manuscript.

      Wildtypes are all C57bl/6N from the provider Janvier. Generally, all lines are backcrossed to C57bl/6 mice, and additionally inbreeding was altered every 4-6 generations by crossing to C57bl/6. We cannot state exactly how many times the Trp channel KOs have been backcrossed to C57bl/6 mice.

      Reviewer #3:

      Summary and strengths:

      In the manuscript, Abd El Hay et al investigate the role of thermally sensitive ion channels TRPM2 and TRPV1 in warm preference and their dynamic response features to thermal stimulation. They develop a novel thermal preference task, where both the floor and air temperature are controlled, and conclude that mice likely integrate floor with air temperature to form a thermal preference. They go on to use knockout mice and show that TRPM2-/- mice play a role in the avoidance of warmer temperatures. Using a new approach for culturing DRG neurons they show the involvement of both channels in warm responsiveness and dynamics. This is an interesting study with novel methods that generate important new information on the different roles of TRPV1 and TRPM2 on thermal behavior.

      Open questions and weaknesses:

      (1) Differences in the response features of cells expressing TRPM2 and TRPV1 are central and interesting findings but need further validation (Figures 3 and 4). To show differences in the dynamics and the amplitude of responses across different lines and stimulus amplitudes more clearly, the authors should show the grand average population calcium response from all responsive neurons with error bars for all 3 groups for the different amplitudes of stimuli (as has been presented for the thermal stimuli traces). The authors should also provide a population analysis of the amplitude of the responses in all groups to all stimulus amplitudes. Prior work suggests that thermal detection is supported by an enhancement or suppression of the ongoing activity of sensory fibers innervating the skin. The authors should present any data on cells with ongoing activity.

      We will include grand average population analysis of the different groups in the revised version.

      Concerning the point about ongoing activity: We are not sure if it is possible in neuronal cultures to faithfully recapitulate ongoing activity. Ongoing activity has mostly been recorded in skin-nerve preparations (or, in older studies, in other types of nerve recordings). Only very few studies show ongoing activity in cultured neurons, and in those studies it only starts when the sensory neurons are cultured for even longer time periods than 3 days (Ref.: doi: 10.1152/jn.00158.2018). We have very few cells that show some spontaneous activity, but these are too few to draw any conclusions. In any case, nerve fibers, which are absent from our cultures, might be necessary to drive ongoing activity.

      (2) The authors should better place their findings in context with the literature and highlight the novelty of their findings. The introduction builds a story of a 'disconnect' or 'contradictory' findings about the role of TRPV1 and TRPM2 in warm detection. While there are some disparate findings in the literature, Tan and McNaughton (2016) show a role for TRPM2 in the avoidance of warmth in a similar task, Paricio et al. (2020) show a significant reduction in warm perception in TRPM2 and TRPV1 knock out lines and Yarmolinsky et al. (2016) show a reduction in warm perception with TRPV1 inactivation. All these papers are therefore in agreement with the authors' finding of a role for these channels in warm behavior. The authors should change their introduction and discussion to more correctly discuss the findings of these studies and to better pinpoint the novelty of their own work.

      Paricio-Montesinos et al. argue that TRPM8 is crucial for the detection of warmth, as TRPM8-KO animals are incapable of learning the operant task. TRPM2-KO animals and, to a smaller extent TRPV1-KO animals, have reduced sensitivity in the task, but are still capable of learning/performing the task. However, in our chamber preference assay this is reversed: TRPM2-KO animals lose the ability to differentiate warm temperatures while TRPM8 appears to play no major role. A commonality between the two studies is that while TRPV1 affects the detection of warm temperatures in the different assays, this ion channel appears not to be crucial.

      Similarly, Yarmolinsky et al. show that Trpv1-inactivation only increases the error rate in their operant assay (from ~10% to ~30%), without testing TRPM2. And Tan et al. show the importance of TRPM2 in the preference task, without testing for TRPV1.

      More generally, the choice of the assay, being either an operant task (Paricio-Montesinos et al. and Yarmolinsky et al.) or a preference assay without training of the mice (Tan et al. and our data here), might be important, and different TRP receptors may be relevant for different types of temperature assays. We will expand on this in the discussion of the revised manuscript. While our results generally agree with the previous studies, they add a different perspective on the analysis of the behavior (with correlation to cellular data). We will adjust the manuscript to highlight the advances more clearly.

      (3) The responses of 60 randomly selected cells are shown in Figure 2B. But, looking at the TRPM2-/- data, warm responses appear more obvious than in WTs and the weaker responders of the WT group appear weaker than the equivalent group in the TRPV1-/- and TRPM2-/- data. This does not necessarily invalidate the results, but it may suggest a problem in the data selection. Because the correct classification of warm-sensitive neurons is central to this part of the study more validation of the classifier should be presented. For example, the authors could state if they trained the classifier using equal amounts of cells, show some randomly selected cells that are warm-insensitive for all genotypes, and show the population average responses of warm-insensitive neurons.

      The classifier was trained on a balanced dataset of 1000 manually labelled traces (500 responders and 500 non-responders) across all 5 temperature stimuli. The prediction accuracy was 98%. We will describe more clearly how the classifier was trained, include examples, and show the population average responses in the revised manuscript.
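
      As a rough illustration of what a balanced training set and a prediction accuracy mean in this context, the sketch below draws an equal number of indices per class and scores predictions against labels. The helper names, the label encoding (1 for responder, 0 for non-responder) and the class counts in the usage lines are assumptions made for this example; the classifier actually used in the study is not specified here and is not reproduced.

```python
import random


def balanced_sample(labels, n_per_class, seed=0):
    """Return shuffled indices containing n_per_class items of each label."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    chosen = []
    for label, indices in sorted(by_class.items()):
        if len(indices) < n_per_class:
            raise ValueError(f"class {label!r} has only {len(indices)} items")
        # Sample without replacement so no trace is used twice.
        chosen.extend(rng.sample(indices, n_per_class))
    rng.shuffle(chosen)
    return chosen


def accuracy(predicted, actual):
    """Fraction of predictions that match the manual labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical label vector: 1 = responder, 0 = non-responder (illustrative counts).
labels = [1] * 700 + [0] * 2000
train_idx = balanced_sample(labels, n_per_class=500)
```

      Sampling the same number of traces per class keeps a classifier from reaching a high accuracy simply by predicting the more frequent class.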

      (4) The interpretation of the main behavioral results and justification of the last figure is presented as the result of changes in sensing but differences in this behavior could be due to many factors and this needs clarification and discussion. (i) The authors mention that 'crucially temperature perception is not static' and suggest that there are fluctuating changes in perception over time and conclude that their modelling approach helps show changes in temperature detection. They imply that temperature perceptual threshold changes over time, but the mouse could just as easily have had exactly the same threshold throughout the task but their motivation (or some other cognitive variable) might vary causing them to change chamber. The authors should correct this. (ii) Likewise, from their fascinating and high-profile prior work the authors suggest a model of internal temperature sensing whereby TRPM2 expression in the hypothalamus acts as an internal sensory of body temperature. Given this, and the slow time course of the behavior in chambers with different ambient temperatures, couldn't the reason for the behavioral differences be due to central changes in hypothalamic processing rather than detection by skin temperature? If TRPM2-/- were selectively ablated from the skin or the hypothalamus (these experiments are not necessary for this paper) it might be possible to conclude whether sensation or body temperature is more likely the root cause of these effects but, without further experiments it is tough to conclude either way. (iii) Because the ambient temperature is controlled in this behavior, another hypothesis is that warm avoidance could be due to negative valence associated with breathing warm air, i.e. a result of sensation within the body in internal pathways, rather than sensing from the external skin. Overall, the authors should tone down conclusions about sensation and present a more detailed discussion of these points.

      We are sorry that the statement including the phrase “crucially temperature perception is not static” is ambiguous; what we meant to say is that with the mouse moving across the two chambers, the animal experiences different temperatures over time (not that the perceptual threshold of the mouse changes). We will clarify this statement in the revised version of the manuscript.

      But even so, it could be that some other variable (motivation etc) makes the mouse change the chamber; we hypothesize that this variable (whatever it might be) is still modulated by temperature (at least this would be the likeliest explanation that we see).

      As for the aspect of internal/hypothalamic temperature sensing: we have included this possibility already in the discussion but will further emphasize this possibility in the revised manuscript.

      As for the point of negative valence mediated by breathing in warm air: yes, presumably this could also be possible. The aspect of valence is an interesting one by itself: would the mice be repelled by the (uncomfortable) hot plate, attracted to the (more comfortable) thermoneutral plate, or both? This is something to elucidate in a different study.

      (5) It is an excellent idea to present a more in-depth analysis of the behavioral data collected during the preference task, beyond 'the mouse is on one side or the other'. However, the drift-diffusion approach is complex to interpret from the text in the results and the figures. The results text is not completely clear on which behavioral parameters are analyzed and terms like drift, noise, estimate, and evidence are not clearly defined. Currently, this section of the paper slightly confuses and takes the paper away from the central findings about dynamics and behavioral differences. It seems like they could come to similar conclusions with simpler analysis and simpler figures.

      We will reassess the description of the drift diffusion model and explain it more clearly. Additionally, we will assess whether we can introduce the drift diffusion model and analysis better at the beginning of the study, subsequent to Figure 1 to have the model and this type of analysis coherent with the first behavior results (instead of introducing the model only at the very end).
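
      For readers unfamiliar with the terms raised in this comment, a generic drift-diffusion process can be sketched as follows. This is a textbook formulation under assumed parameter names (drift, noise, bound, dt), not the model fitted in the manuscript: "evidence" is an accumulated quantity, "drift" is its average rate of change, "noise" scales the random fluctuations, and a decision (here, by analogy, a chamber crossing) occurs when the evidence reaches a bound.

```python
import random


def simulate_ddm(drift, noise, bound, dt=0.01, max_time=60.0, rng=None):
    """Simulate one drift-diffusion trial.

    Returns (time, evidence, crossed), where crossed is True if the
    accumulated evidence reached +bound or -bound before max_time.
    """
    rng = rng or random.Random()
    evidence, t = 0.0, 0.0
    while abs(evidence) < bound and t < max_time:
        # Deterministic drift plus Gaussian noise scaled by sqrt(dt).
        evidence += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    return t, evidence, abs(evidence) >= bound


# Without noise, the crossing time is simply bound / drift (about 1.0 here).
t_clean, ev_clean, crossed_clean = simulate_ddm(drift=1.0, noise=0.0, bound=1.0)

# With noise, a seeded generator makes a single trial reproducible.
trial_a = simulate_ddm(0.5, 1.0, 1.0, rng=random.Random(42))
trial_b = simulate_ddm(0.5, 1.0, 1.0, rng=random.Random(42))
```

      In such a formulation, a larger drift shortens the average time to a crossing, whereas larger noise makes crossing times and directions more variable.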

      (6) In Figure 2D the % of warm-sensitive neurons are shown for each genotype. Each data point is a field of view, however, reading the figure legend there appear to be more FOVs than data points (eg 10 data points for the TRPV1-/- but 17 FOVs). The authors should check this.

      We will check and make sure that, in the revised manuscript, the number of FOVs mentioned in the legend and the number shown in Figure 2D are in agreement.

      (7) Can the authors comment on why animals with over-expression of TRPV1 spend more time in the warmest chamber to start with at 38C and not at 34C?

      This is an interesting observation that we did not consider before. A closer look at Figure 4H reveals that the majority of the TRPV1-OX animals have a proportionally long first visit to the 38°C room. We can only speculate why this is the case. We cannot rule out that this is a technical shortcoming of the assay and how we conducted it, but we do not observe this for the wildtype mice, so a technical problem is rather unlikely. It is possible that this is a type of “freezing” (or “startle”) behavior when the animals first encounter the 38°C temperature. Freezing behaviors in mice can be observed when sudden/threatening stimuli are applied. It is possible that, in the TRPV1-overexpressing animals, the initial encounter with 38°C leads to activation of a larger proportion of cells (compared to WT ctrls), possibly signaling a “painful” stimulus, and thus leading to this startle effect. It is noteworthy, however, that with the more stringent repeated measures statistics applied as suggested by the referees, the difference at the first measured time point in Fig. 4G is no longer significant (see comment #2 above). This does not rule out that this might be a true effect, but such a claim would benefit from additional experiments that test such a hypothesis more rigorously.

    1. eLife assessment

      This work presents an important online platform designed to facilitate the exploration of genes and genetic pathways implicated in human aging. Leveraging a new inference methodology, the tool enables the identification and visualization of key genes and tissues impacted by aging, facilitating scientific discovery. The methods and analyses are convincing and will be broadly used by scientists aiming to mine human aging RNA-seq data.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1.1: The distinction of PIGS from nearby OPA, which has also been implied in navigation and ego-motion, is not as clear as it could be.

      Response 1.1: The main “functional” distinction between TOS/OPA and PIGS is that TOS/OPA responds preferentially to moving vs. stationary stimuli (even concentric rings), likely due to its overlap with the retinotopic motion-selective visual area V3A, for which this is a defining functional property (e.g. Tootell et al., 1997, J Neurosci). In comparison, PIGS does not show such motion selectivity. Instead, PIGS responds preferentially to more complex forms of motion within scenes.

      Moreover, PIGS and TOS/OPA are located differently relative to the retinotopic visual areas. Briefly, PIGS is located adjacent to areas IPS3-4 while TOS/OPA overlaps with areas V3A/B and IPS0 (V7). This point is now highlighted in the new Experiment 3b and the new Figure 6. In this revision, we also tried to better highlight these points in sections 4.3, 4.4 and 4.5 (see also the response to the first comment from Reviewer #2).

      Reviewer 2:

      Comment 2.1: First, the scene-selective region identified appears to overlap with regions that have previously been identified in terms of their retinotopic properties. In particular, it is unclear whether this region overlaps with V7/IPS0 and/or IPS1. This is particularly important since prior work has shown that OPA often overlaps with V7/IPS0 (Silson et al, 2016, Journal of Vision). The findings would be much stronger if the authors could show how the location of PIGS relates to retinotopic areas (other than V6, which they do currently consider). I wonder if the authors have retinotopic mapping data for any of the participants included in this study. If not, the authors could always show atlas-based definitions of these areas (e.g. Wang et al, 2015, Cerebral Cortex).

      Response 2.1: We thank the reviewers for reminding us to more clearly delineate this issue of possible overlap, including the information provided by Silson et al, 2016. The issue of possible overlap between area TOS/OPA and the retinotopic visual areas, both in humans and non-human primates, was also clarified by our team in 2011 (Nasr et al., 2011). As you can see in Figure 6 (newly generated), and consistent with those previous studies, TOS/OPA overlaps with visual areas V3A/B and V7, whereas PIGS is located more dorsally, close to IPS3-4. As shown here, there is no overlap between PIGS and TOS/OPA and there is no overlap between PIGS and areas V3A/B and V7.

      To more directly address the reviewer’s concern, in this revision, we have added a new experiment (Experiment 3b) in which we have shown the relative position of PIGS and the retinotopic areas in two individual subjects (Figure 6). All the relevant points are also discussed in section 4.3.

      Comment 2.2: Second, recent studies have reported a region anterior to OPA that seems to be involved in scene memory (Steel et al, 2021, Nature Communications; Steel et al, 2023, The Journal of Neuroscience; Steel et al, 2023, bioRxiv). Is this region distinct from PIGS? Based on the figures in those papers, the scene memory-related region is inferior to V7/IPS0, so characterizing the location of PIGS to V7/IPS0 as suggested above would be very helpful here as well. If PIGS overlaps with either of V7/IPS0 or the scene memory-related area described by Steel and colleagues, then arguably it is not a newly defined region (although the characterization provided here still provides new information).

      Response 2.2: The lateral-place memory area (LPMA) is located on the lateral brain surface, anterior relative to the IPS (see Figure 1 from Steel et al., 2021 and Figure 3 from Steel et al., 2023). In contrast, PIGS is located on the posterior brain surface, also posterior relative to the IPS. In other words, they are located on two different sides of a major brain sulcus. In this revision we have clarified this point, including the citations by Steel and colleagues in section 4.3.

      Comment 2.3: Another reason that it would be helpful to relate PIGS to this scene memory area is that this scene memory area has been shown to have activity related to the amount of visuospatial context (Steel et al, 2023, The Journal of Neuroscience). The conditions used to show the sensitivity of PIGS to ego-motion also differ in the visuospatial context that can be accessed from the stimuli. Even if PIGS appears distinct from the scene memory area, the degree of visuospatial context is an alternative account of what might be represented in PIGS.

      Response 2.3: The reviewer raises an interesting point. One minor confusion is that we may be inadvertently referring to two slightly different types of “visuospatial context”. Specifically, the stimuli used in the ego-motion experiment here (i.e. coherently vs. incoherently changing scenes) represent the same scenes, and the only difference between the two conditions is the sequence of images across the experimental blocks. In that sense, the two experimental conditions may be considered to have the same visuospatial “context”. However, it could also be argued that the coherently changing scenes provide more information about the environmental layout. In that case, considering the previous reports that PPA/TPA and RSC/MPA may also be involved in layout encoding (Epstein and Kanwisher 1998; Wolbers et al. 2011), we expected to see more activity within those regions in response to coherently compared to incoherently changing scenes. These issues are now more explicitly discussed in the revised article (section 4.6).

      Reviewer 3:

      Comment 3.1: There are few weaknesses in this work. If pressed, I might say that the stimuli depicting ego-motion do not, strictly speaking, depict motion, but only apparent motion between 2s apart photographs. However, this choice was made to equate frame rates and motion contrast between the 'ego-motion' and a control condition, which is a useful and valid approach to the problem. Some choices for visualization of the results might be made differently; for example, outlines of the regions might be shown in more plots for easier comparison of activation locations, but this is a minor issue.

      Response 3.1: We thank the reviewer for these constructive suggestions, and we agree with their comment that the ego-motion stimuli are not smooth, even though they were refreshed every 100 ms. However, the stimuli were nevertheless coherent enough to activate areas V6 and MT, two major areas known to respond preferentially to coherent compared to incoherent motion.

      Reviewer #1 (Recommendations For The Authors):

      I enjoyed reading this article. I have a few suggestions for improvement:

      (1) Delineation from OPA: The OPA has been described in quite similar terms as PIGS, with its involvement in ego-motion (e.g., crawling, walking) and navigation in general (e.g., Dilks' recent work; Bonner and Epstein). The authors address the distinction in section 4.4. Unlike Kamps et al. (2016) and Jones et al. (2023), the authors found weak or no evidence for ego-motion in OPA. They explain this discrepancy with differences in refresh rates and different levels of spatial smoothing of the fMRI data. It is not clear why these fairly small methodological differences would lead to different findings of ego-motion in the OPA. Arguably, the OPA is the closest of the "established" scene areas to PIGS, both in anatomical location and in function. I would therefore appreciate a more detailed discussion of the differences between these two areas.

      Response: Jones et al. showed ego-motion-related activity in TOS/OPA when compared to scrambled scenes. This is fundamentally different from the contrast we used here, which compares coherently vs. incoherently changing scenes (i.e. not a small difference). Also, Kamps et al. used static scenes as a control, which, considering the motion selectivity of TOS/OPA, has a large impact on the TOS/OPA response.

      (2) Random effects analysis: The authors mention using a "random effects analysis" for several of their experiments. I would ask them to provide more details on what statistical models were used here. Were they purely random-effects models or actually mixed-effects models? What were the factors that entered into the analysis? Providing more detail would make the analysis techniques more transparent.

      Response: This point is now clarified in the Methods section.

      (3) Data and code availability: The authors write that data and code "are ready to be shared upon request." (section 2.5) In the spirit of transparency and openness, I strongly encourage the authors to make the data publicly available, e.g., on OSF or OpenNeuro. In particular, having probabilistic maps of PIGS available will allow other researchers to include PIGS in their analysis pipelines, making the current work more impactful.

      Response: We have made the probabilistic labels available to the public. This point is now highlighted in section 2.5.

      (4) Minor comments on the writing that caught my eye while reading the article:

      • Line 27: "in the human brain".

      Response: Done.

      • Line 30: I don't agree with the characterization of the previous model of scene perception as "simplistic." Adding one additional ROI makes it no less simplistic. Perhaps the authors can rephrase to make this slightly less antagonistic?

      Response: Done.

      • Line 71: it is not clear why NHPs are relevant here.

      Response: We decided to keep the text intact.

      • Line 138: "were randomized".

      Response: Done.

      • Line 152: "consisting".

      Response: Done.

      • Line 155: "sets" (plural).

      Response: Done.

      • Lines 253-255: Why were the 3T spatially smoothed but not the 7T data? This seems odd.

      Response: We kept the text intact.

      • Line 481: "we found strong motion selectivity" (remove "a").

      Response: Done.

      • Line 564: a word is missing, probably: "a stronger effect of ego-motion".

      Response: Done.

      • Line 591: "controlling spatial attention" (remove "the").

      Response: Done.

      • Line 591 and 594: Both sentences start with "However". I think the first of these should not because it is setting up the contrast for the second sentence.

      Response: Done.

      • Line 607: "higher-level" (hyphen).

      Response: Done.

      • Throughout the manuscript: adverbial phrases such as "(in)coherently changing" or "probabilistically localized" do not get a hyphen.

      Response: Done.

      Reviewer #2 (Recommendations For The Authors):

      The authors state that "All data, codes and stimuli are ready to be shared upon request". Ideally, these materials should be deposited in appropriate repositories (e.g. OpenMRI, GitHub) and not require readers to contact the authors to obtain such materials.

      Other Comments:

      (a) The title ("A previously undescribed scene-selective site is the key to encoding ego-motion in natural environments") is potentially misleading - the work was not conducted in a natural environment. At best, you could say they are 'naturalistic stimuli'. Also, in what sense is PIGS "key" to encoding ego-motion - the study just shows sensitivity to this factor.

      Response: We changed the title to “naturalistic environments”.

      (b) Figure 1 - I'm not sure what point the authors are trying to make with Figure 1. The comparison is between a highly smoothed, group fixed-effects analysis and a less-smoothed individual subject analysis. The differences between the two could reflect group vs. individual, highly-smoothed (5 mm) versus less-smoothed (2 mm), or differences in thresholding. If the thresholding were lower for the group analysis, it would probably start to look more similar to the individual subject. As it stands, this figure isn't particularly informative, it seems redundant with Figure 2, and Figure 1A is not even referenced in the main text. Further, fixed effects analyses are relatively uncommon in the recent literature, so their inclusion is unusual.

      Response: Figure 1A is a replication of the data/method used in Nasr et al., 2011, and it will help readers see the difference between the “traditional” scene-selectivity maps generated based on group-averaging vs. data from individual subjects. In this case, we decided not to change the Figure.

      (c) Figure 3 - why are the two sets of maps shown at different thresholds? For 3B given the larger sample size, it is expected that the extent of the significant activations will increase. Currently the higher threshold for 3B and the smaller range for 3A is making the sets of maps look more comparable.

      Response: As the reviewer noticed, the number of subjects is larger in Figure 3B compared to 3A. The main point of this figure is to show that the PIGS activity center does not vary across populations. Considering this point, we decided not to change this figure.

      (d) Figure 10 - why is the threshold lower than used for other figures? It would be helpful if there was consistent thresholding across figures.

      Response: Experiment 6 and Experiment 1 are based on different stimuli (see Methods). Also, among those subjects who participated in Experiment 1, two subjects did not participate in Experiment 6. These points are already highlighted in the text.

      (e) Figures - how about the AFNI approach of thresholding and showing sub-threshold data at the same time? (Taylor et al, 2023, Neuroimage).

      Response: We highly appreciate the methodology suggested by Taylor and colleagues. However, our main point here is to show the center of PIGS activity. In this condition, showing an unthresholded activity map doesn’t have any advantage over the current maps. Considering these points, we decided not to change the figures.

      (f) Coherent versus incoherent scenes - there are many differences between the coherent and incoherent scenes. Arguing that it must be ego-motion seems a little premature without further investigation. Activity anterior to OPA has been associated with the construction of an internal representation of a spatial environment (Steel et al., 2023, The Journal of Neuroscience). Could it be that this is the key effect, not really the ego-motion?

      Response: In this revision, we discussed the study by Steel et al., 2021 and 2023 in section 4.3.

      Reviewer #3 (Recommendations For The Authors):

      Overall, I think this is already an excellent contribution. The suggestions I have are minor and may help with the clarity of the results.

      (1) My main request of the authors would be to provide more points of reference in some of the figures with cortical maps. In many cases, the authors use arrows to point to the locations of activations of interest. However, the arrows in adjacent figures are often not placed in exactly the same places on maps that are meant to be compared. It would very much help the viewer to compare activations if the arrows pointing to activations or regions of interest were placed in identical locations for the same brains appearing in different sub-panels (e.g. in panels A and B of Figure 1). The underlying folds of the cortical surface provide some points of reference, but these are often occluded to different extents by data in figures that are meant to be compared.

      Response: To address the reviewer’s concern, we regenerated Figure 8 (Figure 7 in the previous submission) and we tried to put arrowheads in identical locations, as much as possible. Especially for PIGS, this point was also considered in Figures 2 and 3.

      (2) Outlines (such as those in Figure 5) are also very useful, and I would encourage broader use of them in other figures (e.g. Figures 7, 10, and 12). Figures 10 and 12 are on the fsaverage surface, so the same outlines could be used for them as for Figure 5.

      To be clear, it's possible to apprehend the results with the figures as they are, but I think a few small changes could help a lot.

      Response: In this revision, we added outlines to Figures 11 and 13 (Figures 10 and 12 in the previous submission). We did not add the outline to Figure 8 because it made it hard to see PIGS. Instead, we used arrows (see the previous comment).

      Other minor points:

      In the method for Experiment 4, the authors write: "Other details of the experiment were similar to those in Experiment 1.". Similar or the same? The authors should clarify this statement, e.g. "the number of images per block, the number of blocks, the number of runs were the same as Experiment 1" - with any differences noted.

      Response: This point is now addressed in the Methods section.

      In Figure 8, it would be better to have the panel labels (A, B, C, D) in the upper left of each panel rather than the lower left.

      Response: We tried to keep the panel arrangement consistent across the figures. That is why the letters are positioned like this.

      A final gentle suggestion: pycortex (http://github.com/gallantlab/pycortex) provides a means to visualize the flattened fsaverage surface with outlines for localized regions of interest and overlaid lines for major sulci. Though it is by no means necessary for publication, it would be lovely to see these results on that surface, which is freely available and downloadable via a pycortex command (surface here: https://figshare.com/articles/dataset/fsaverage_subject_for_pycortex/9916166)

      Response: We thank the reviewer for bringing pycortex to our attention. We will consider using it in our future studies.

    2. Reviewer #3 (Public Review):

      Summary:

      The authors report a scene-selective area in the posterior intraparietal gyrus (PIGS). This area lies outside the classical three scene-selective regions (PPA/TPA, RSC/MPA, TOS/OPA), and is selective for ego motion.

      Strengths:

      The authors firmly establish the location and selectivity of the new area through a series of well-crafted controlled experiments. They show that the area can be missed with too much smoothing, thus providing a case for why it has not been previously described. They show that it appears in much the same location in different subjects, with different magnetic field strengths, and with different stimulus sets. Finally, they show that it is selective for ego motion - defined as a series of sequential photographs of an egocentric trajectory along a path. They further clarify that the area is not generically motion selective by showing that it does not respond to biological motion without an egomotion component to it. All statistics are standard and sound; the evidence presented is strong.

      Weaknesses:

      There are a few weaknesses in this work. If pressed, I might say that the stimuli depicting ego motion do not, strictly speaking, depict motion, but only apparent motion between photographs spaced 2 s apart. However, this choice was made to equate frame rates and motion contrast between the 'ego motion' and a control condition, which is a useful and valid approach to the problem.

      This is a very strong paper.

    3. eLife assessment

      In this manuscript, the authors present a wealth of fMRI data at both 3T and 7T to identify a scene-selective region of the intraparietal gyrus ("PIGS") that appears to have some responsivity to characteristics of ego-motion. In a series of experiments, they delineate the anatomical location of PIGS and functionally differentiate it from nearby V6 and OPA. Evidence for these important findings is solid, but further investigations as to the role of this region in processing ego-motion will be needed to confirm this conclusion.

    4. Reviewer #2 (Public Review):

      Summary

      The authors report an extensive series of neuroimaging experiments (at both 3T and 7T) to provide evidence for a scene-selective visual area in human posterior parietal cortex (PIGS) that is distinct from the main three (parahippocampal place area, PPA; occipital place area, OPA; medial place area, MPA) typically reported in the literature. Further, they argue that in comparison with the other three, this region may specifically be involved in representing ego-motion in natural contexts. The characterization of this scene-selective region provides a useful reference point for studies of scene processing in humans.

      Strengths

      One of the major strengths of the work is the extensive series of experiments reported, showing clear reproducibility of the main finding and providing functional insight into the region studied. The results are clearly presented and convincing with careful comparison to retinotopic and scene-selective regions described in prior studies.

      Weaknesses

      While the results are strong and clear, the claim in the title ("A previously undescribed scene-selective site is the key to encoding ego-motion in naturalistic environments") is not fully supported. The results show that this scene-selective region is sensitive to visual cues that reflect ego-motion but not that it is "key" to encoding ego-motion. Further, there are many differences between the two types of stimuli used to test ego-motion and greater characterization of this scene-selective region will be needed to confirm this conclusion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings characterising the genomic features of E. coli isolated from neonatal meningitis from seven countries, and documents bacterial persistence and reinfection in two case studies. The genomic analyses are solid, although the inclusion of a larger number of isolates from more diverse geographies would have strengthened the generalisability of findings. The work will be of interest to people involved in the management of neonatal meningitis patients, and those studying E. coli epidemiology, diversity, and pathogenesis.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study uses whole genome sequencing to characterise the population structure and genetic diversity of a collection of 58 isolates of E. coli associated with neonatal meningitis (NMEC) from seven countries, including 52 isolates that the authors sequenced themselves and a further 6 publicly available genome sequences. Additionally, the study used sequencing to investigate three case studies of apparent relapse. The data show that in all three cases, the relapse was caused by the same NMEC strain as the initial infection. In two cases they also found evidence for gut persistence of the NMEC strain, which may act as a reservoir for persistence and reinfection in neonates. This finding is of clinical importance as it suggests that decolonisation of the gut could be helpful in preventing relapse of meningitis in NMEC patients.

      Strengths:

      The study presents complete genome sequences for n=18 diverse isolates, which will serve as useful references for future studies of NMEC. The genomic analyses are high quality, the population genomic analyses are comprehensive and the case study investigations are convincing.

      We agree

      Weaknesses:

      The NMEC collection described in the study includes isolates from just seven countries. The majority (n=51/58, 88%) are from high-income countries in Europe, Australia, or North America; the rest are from Cambodia (n=7, 12%). Therefore it is not clear how well the results reflect the global diversity of NMEC, nor the populations of NMEC affecting the most populous regions.

      The virulence factors section highlights several potentially interesting genes that are present at apparently high frequency in the NMEC genomes; however, without knowing their frequency in the broader E. coli population it is hard to know the significance of this.

      We acknowledged the limitations of our NMEC collection in the Discussion. We agree the prevalence of virulence factors in our collection is interesting. The limited size of our collection prevented further evaluation of the prevalence of these virulence factors in a broader E. coli population.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors present a robust genomic dataset profiling 58 isolates of neonatal meningitis-causing E. coli (NMEC), the largest such cohort to be profiled to date. The authors provide genomic information on virulence and antibiotic resistance genomic markers, as well as serotype and capsule information. They go on to probe three cases in which infants presented with recurrent febrile infection and meningitis and provide evidence indicating that the original isolate is likely causing the second infection and that an asymptomatic reservoir exists in the gut. Accompanying these results, the authors demonstrate that gut dysbiosis coincides with the meningitis.

      Strengths:

      The genomics work is meticulously done, utilizing long-read sequencing.

      The cohort of isolates is the largest to be sampled to date.

      The findings are significant, illuminating the presence of a gut reservoir in infants with repeating infection.

      We agree

      Weaknesses:

      Although the cohort of isolates is large, there is no global representation, entirely omitting Africa and the Americas. This is acknowledged by the group in the discussion, however, it would make the study much more compelling if there was global representation.

      We agree. In the Discussion we state this is likely a reflection of the difficulty in acquiring isolates causing neonatal meningitis, in particular from countries with limited microbiology and pathology resources.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Schembri et al performed a molecular analysis by WGS of 52 E. coli strains identified as "causing neonatal meningitis" from several countries and isolated from 1974 to 2020. Sequence types, virulence gene content, and antibiotic resistance genes are described. In the second part, they also described three cases of relapse and analysed their respective strains as well as the microbiome of three neonates during their relapse. For one patient the same E. coli strain was found in blood and stool (this patient had no meningitis). For two patients microbiome analysis revealed a severe dysbiosis.

      Major comments:

      Although the authors announce in their title that they study E. coli that cause neonatal meningitis and in methods stipulate that they had a collection of 52 NMEC, we found in Supplementary Table 1, 29 strains (therefore most of the strains) isolated from blood and not CSF. This is a major limitation since only strains isolated from CSF can be designated with certainty as NMEC even if a pleocytosis is observed in the CSF. A very troubling finding is the description of patient two with a relapse infection. As stated in the text line 225, CSF microscopy was normal and culture was negative for this patient! Therefore it is clear that a patient without meningitis has been included in this study.

      We have reviewed the clinical data for our 52 NMEC isolates, noting that for some of the older Finnish isolates we relied on previous publications. This data is shown in Table S1. To address the Reviewer’s comment, we have added the following text to the methods section (new text underlined).

      ‘The collection comprised 42 isolates from confirmed meningitis cases (29 cultured from CSF and 13 cultured from blood) and 10 isolates from clinically diagnosed meningitis cases (all cultured from blood).’

      Patient 2 was initially diagnosed with meningitis based on a positive blood culture in the presence of CSF pleocytosis (>300 WBCs, >95% polymorphs). We understand there may be some confusion with reference to a relapsed infection, which we now more accurately describe as recrudescent invasive infection in the revised manuscript.

      Another major limitation (not stated in the discussion) is the absence of clinical information on neonates especially the weeks of gestation. It is well known that the risk of infection is dramatically increased in preterm neonates due to their immature immunity. Therefore E. coli causing infection in preterm neonates are not comparable to those causing infection in term neonates notably in their virulence gene content. Indeed, it is mentioned that at least eight strains did not possess a capsule, we can speculate that neonates were preterm, but this information is lacking. The ages of neonates are also lacking. The possible source of infection is not mentioned, notably urinary tract infection. This may also have an impact on the content of VF.

      We agree. In the Discussion we now note the following (new text underlined):

      ‘… we did not have clinical data on the weeks of gestation for all patients, and thus could not compare virulence factors from NMEC isolated from preterm versus term infants.’

      Submission to medRxiv, a requirement for review of our manuscript at eLife, necessitated the removal of some patient identifying information, including precise age and detailed medical history.

      Sequence analysis reveals the predominance of ST95 and ST1193 in this collection. The high incidence of ST95 is not surprising and well previously described, therefore, the concluding sentence line 132 indicating that ST95 E. coli should exhibit specific virulence features associated with their capacity to cause NM does not add anything. On the contrary, the high incidence of ST1193 is of interest and should have been discussed more in detail. Which specific virulence factors do they harbor? Any hypothesis explaining their emergence in neonates?

      We compared the virulence factors of ST95 and ST1193 and summarized this information in Figure 4. We also discussed how the K1 polysialic acid capsule in ST95 and ST1193 could contribute to the emergence of these STs in NM. Specifically, we stated the following: ‘We speculate this is due to the prevailing K1 polysialic acid capsule serotype found in ST95 and the newly emerged ST1193 clone [22, 37] in combination with other virulence factors [15, 28, 29] (Figure 4) and the immature immune system of preterm infants.’

      In the paragraph describing the VF, it is only stated that ST95 contained significantly more VF than the ST1193 strains. And so what? By the way "significantly" is not documented: n=?, p=?

      We compared the prevalence of known virulence factors between ST95 and ST1193, and showed that ST95 strains in our collection contained significantly more virulence factors than the ST1193 strains. The P-value and the statistical test used were included in Supplementary Figure 3. To address the reviewer's concern, we have now also added this to the main manuscript text as follows (new text underlined):

      ‘Direct comparison of virulence factors between ST95 and ST1193, the two most dominant NMEC STs, revealed that the ST95 isolates (n = 20) contained significantly more virulence factors than the ST1193 isolates (n=9), p-value < 0.001, Mann-Whitney two-tailed unpaired test (Supplementary Table 1, Supplementary Figure 3).’
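
      For readers unfamiliar with the test named above, the following is a minimal sketch of a two-sided Mann-Whitney U test for two unpaired groups. It is not the authors' analysis code; the source does not say which software was used. The function name `mann_whitney_u`, the normal approximation with tie and continuity corrections, and the per-isolate counts in the example are all assumptions made for illustration. Only the group sizes (n = 20 and n = 9) come from the text.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test for two unpaired samples.

    Uses the normal approximation with tie and continuity corrections
    (an assumption of this sketch; exact methods exist for small samples).
    Returns (U for x, U for y, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2

    # Pool the observations, remembering which group each came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    u_y = n1 * n2 - u_x

    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, u_y, 1.0  # every value tied: no evidence of a difference

    z = max(abs(u_x - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    p = math.erfc(z / math.sqrt(2.0))  # two-sided tail of the standard normal
    return u_x, u_y, p


# Hypothetical virulence-factor counts per isolate (made up for illustration;
# these are NOT the values reported in the study).
st95_counts = [52, 49, 55, 51, 50, 53, 48, 54, 52, 51,
               50, 56, 49, 53, 52, 47, 55, 51, 50, 54]   # n = 20
st1193_counts = [41, 43, 40, 44, 42, 41, 45, 43, 42]     # n = 9

u_st95, u_st1193, p_value = mann_whitney_u(st95_counts, st1193_counts)
print(f"U(ST95) = {u_st95}, U(ST1193) = {u_st1193}, p = {p_value:.2g}")
```

      The test compares ranks rather than raw values, which suits count data that may not be normally distributed. With groups this small, an exact p-value from a standard statistics package would usually be preferable to the approximation used here.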

      The complete sequence of 18 strains is not clear. Results of Supplementary Table 2 are presented in the text and are not discussed.

      NMEC isolates that were completely sequenced in this study are indicated in bold and marked with an asterisk in Figure 1. This information is indicated in the figure legend and was provided in the original submission. All information regarding genomic island composition and location, virulence genes and plasmid and prophage diversity is included in Supplementary Table 2. This information is highly descriptive and thus we elected not to include it as text in the main manuscript.

      46 years is a very long time for such a small number of strains, making it difficult to put forward epidemiological or evolutionary theories. In the analysis of antibiotic resistance, there are no ESBLs. However, Ding's article (reference 34) and other authors showed that ESBLs are emerging in E. coli neonatal infection. These strains are a major threat that should be studied, unfortunately, the authors haven't had the opportunity to characterize such strains in their manuscript.

      We agree 46 years is a long time-span. The study by Ding et al examined 56 isolates comprised of 25 different STs isolated in China from 2009-2015, with ST1193 (n=12) and ST95 (n=10) the most common. Our study examined 58 isolates comprised of 22 different STs isolated in seven different geographic regions from 1974-2020, with ST1193 (n=9) and ST95 (n=20) the most common. Thus, despite differences in the geographic regions from which isolates in the two studies were sourced, there are similarities in the most common STs identified. The fact that we observed less antibiotic resistance, including a lack of ESBL genes, in ST1193 is likely due to the different regions from which the isolates were sourced. We acknowledged and discussed the potential of ST1193 harbouring multidrug resistance including ESBLs in our manuscript as follows:

      ‘Concerningly, the ST1193 strains examined here carry genes encoding several aminoglycoside-modifying enzymes, generating a resistance profile that may lead to the clinical failure of empiric regimens such as ampicillin and gentamicin, a therapeutic combination used in many settings to treat NM and early-onset sepsis [35, 36]. This, in combination with reports of co-resistance to third-generation cephalosporins for some ST1193 strains [22, 34], would limit the choice of antibiotic treatment.’

      Second part of the manuscript:

      The three patients who relapsed had a late neonatal infection (> 3 days) with respective ages of 6 days, 7 weeks, and 3 weeks. We do not know whether they are former preterm newborns (no term specified) or whether they have received antibiotics in the meantime.

      As noted above, patient ages were not disclosed to comply with submission to medRxiv, a requirement for review of our manuscript at eLife.

      Patient 1: Although this patient had a pleocytosis in CSF, the culture was negative which is surprising and no explanation is provided. Therefore, the diagnosis of meningitis is not certain. Pleocytosis without meningitis has been previously described in neonates with severe sepsis. Line 215: no immunological abnormalities were identified (no details are given).

      We respectfully disagree with the reviewer. The diagnosis of meningitis is made unequivocally by the presence of a clearly abnormal CSF microscopy (2430 WBCs) and an invasive E. coli from blood culture. This does not seem controversial to the authors. We had believed it unnecessary to include this corroborative evidence, but have added the following to support our assertion:

      ‘The child was diagnosed with meningitis based on a cerebrospinal fluid (CSF) pleocytosis (>2000 white blood cells; WBCs, low glucose, elevated protein), positive CSF E. coli PCR and a positive blood culture for E. coli (MS21522).’

      On the contrary, the authors are surprised by the statement that CSF pleocytosis occurs in neonatal sepsis ‘without meningitis’ and do not know of any definitions of neonatal meningitis that are not tied to the presence of a CSF pleocytosis. Furthermore, the later isolation of E. coli from the CSF during the relapsed infection reinforces the initial diagnosis.

      Patient 2: This patient had a recurrence of bacteremia without meningitis (line 225: CSF microscopy was normal and culture negative!). This case should be deleted.

      In a similar vein to the previous comment, we respectfully assert that this patient has clear evidence of meningitis (330 WBCs in the CSF, taken 24h after initiation of antibiotic treatment). In this case, molecular testing was not performed as, under the principle of diagnostic stewardship, it was not considered necessary by the clinical microbiologists and treating clinicians following the culture of E. coli in the bloodstream. We agree that this is not a case of recurrent meningitis, but our intention was to highlight the recrudescence of an invasive infection (urinary sepsis requiring admission to hospital and intravenous antibiotics) which we hypothesise has arisen from the intestinal reservoir. We did not state that all patients suffered from relapsed meningitis.

      Despite this, to address this reviewer's concern, we have changed all references to ‘relapsed infection’ to now read ‘recrudescent invasive infection’ in the revised manuscript.

      Patient 3: This patient had two relapses which is exceptional and may suggest the existence of a congenital malformation or a neurological complication such as abscess or empyema therefore, "imaging studies" should be detailed.

      This patient underwent extensive imaging investigation to rule out a hidden source. This included repeated MRI imaging of head and spine, CT imaging of head and chest, USS imaging of abdomen and pelvis and nuclear medicine imaging to detect a subtle meningeal defect and CSF leak. All tests were normal, and no abscess or empyema was found.

      We have modified the text to include this information:

      Text in original submission: ‘Imaging studies and immunological work-up were normal.’

      New text in revised manuscript (underlined): ‘Extensive imaging studies including repeated MRI imaging of the head and spine, CT imaging of the head and chest, ultrasound imaging of abdomen and pelvis, and nuclear medicine imaging did not show a congenital malformation or abscess. Immunological work-up did not show a known primary immunodeficiency. At two years of age, speech delay is reported but no other developmental abnormality.’

      The authors suggest a link between intestinal dysbiosis and relapse in three patients. However, the fecal microbiomes of patients without relapse were not analysed, so no comparison is possible. Moreover, dysbiosis after several weeks of antibiotic treatment in a patient hospitalized for a long time is not unexpected. Therefore, it's impossible to make any assumption or draw any conclusion. This part of the manuscript is purely descriptive. Finally, the authors should be more prudent when they state in line 289 "we also provide direct evidence to implicate the gut as a reservoir [...] antibiotic treatment". Indeed the gut colonization of the mothers with the same strain may be also a reservoir (as stated in the discussion line 336). Finally, the authors do not discuss the potential role of ceftriaxone vs cefotaxime in the dysbiosis observed. Ceftriaxone may have a major impact on the microbiota due to its digestive elimination.

      We addressed the limitations of our study in the Discussion, including that we did not have access to urine or stool samples from the mother of the infants that suffered recrudescence, and thus cannot rule out mother-to-child transmission as a mechanism of reinfection. We have now added that we did not have clinical data on the weeks of gestation for all patients, and thus could not compare virulence factors from NMEC isolated from preterm versus term infants. The limitations of our study are summarised as follows in the Discussion (new text underlined):

      ‘This study had several limitations. First, our NMEC strain collection was restricted to seven geographic regions, a reflection of the difficulty in acquiring strains causing this disease. Second, we did not have access to a complete set of stool samples spanning pre- and post-treatment in the patients that suffered NM and recrudescent invasive infection. This impacted our capacity to monitor E. coli persistence and evaluate the effect of antibiotic treatment on changes in the microbiome over time. Third, we did not have access to urine or stool samples from the mother of the infants that suffered recrudescence, and thus cannot rule out mother-to-child transmission as a mechanism of reinfection. Finally, we did not have clinical data on the weeks of gestation for all patients, and thus could not compare virulence factors from NMEC isolated from preterm versus term infants.’

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It would be useful to mention the sample size (number of genomes analysed, n=58) in the abstract to give readers a sense of the scale of the analysis.

      We have added the number of genomes in the abstract as suggested (new text underlined).

      ‘Here we investigated the genomic relatedness of a collection of 58 NMEC strains spanning 1974-2020 and isolated from seven different geographic regions.’

      The term 'strain' is used throughout, it would be clearer to use 'isolates' to describe the biological material and 'genomes' when the unit being referred to is genome sequences. For example, lines 108-111 use 'strain' to mean the collection of 52 isolates but also uses 'strain' to mean the collection of 58 genomes including those of the 52 isolates that the authors sequenced plus a further 6 genomes of isolates that they do not have in their isolate collection.

      We have changed the term ‘strain’ to ‘isolate’ or ‘genome’ as suggested.

      Figure 1 (annotated phylogeny) is hard to read and interpret, as so much data is presented. It would assist readers if the authors could provide an interactive form of the phylogeny and metadata/genomic feature data discussed in the text, e.g. using microreact.org, so that details can be explored more easily.

      This is an excellent suggestion, and we created a project on microreact.org. This information has been added to the Figure 1 legend.

      https://microreact.org/project/oNfA4v16h3tQbqREoYtCXj-high-risk-escherichia-coli-clones-that-cause-neonatal-meningitis-and-association-with-recrudescent-infection.

      It would be useful to provide information on the frequency and/or distribution of the virulence factors in the broader E. coli population, to provide context for readers and to better understand the importance/significance of the high frequency of the reported virulence factors within NMEC.

      As noted above, we agree the prevalence of virulence factors in our collection is interesting. We discussed the prevalence of these virulence factors in our collection, and the detailed data is presented in Table S1. However, we also note a limitation in our study is the number of isolates, and thus we would prefer to avoid evaluation of the prevalence of these virulence factors in the context of a broader E. coli population. There are other studies that have examined NMEC virulence factors in the past; some examples are noted below, and we have now referenced these in our manuscript (note Ref 15 was suggested by Reviewer 3 in a comment below; PMID: 11920295).

      Ref 15: Johnson JR, Oswald E, O'Bryan TT, Kuskowski MA, Spanjaard L. Phylogenetic distribution of virulence-associated genes among Escherichia coli isolates associated with neonatal bacterial meningitis in the Netherlands. J Infect Dis 2002; 185(6): 774-84.

      Ref 28: Wijetunge DS, Gongati S, DebRoy C, et al. Characterizing the pathotype of neonatal meningitis causing Escherichia coli (NMEC). BMC Microbiol 2015; 15: 211.

      Ref 29: Bidet P, Mahjoub-Messai F, Blanco J, et al. Combined Multilocus Sequence Typing and O Serogrouping Distinguishes Escherichia coli Subtypes Associated with Infant Urosepsis and/or Meningitis. J Infect Dis. 2007; 196(2):297-303.

      I suggest avoiding the term 'global' to describe the collection, given that there are only seven countries included in the collection and two of the most populous continents (Africa and South America) are not represented at all.

      We agree, and now refer to our collection as ‘an NMEC strain collection from geographically diverse locations.’

      Reviewer #2 (Recommendations For The Authors):

      This is a suggestion regarding discussion/food for thought: This study sheds information on genomic features and indicates the presence of a reservoir in the infected infant. Previous studies have demonstrated the presence of a reservoir in the vaginas of women with recurrent UTIs. Is there any information as to whether the mothers of these infants, especially the three with recrudescent infection, had a UTI or recurrent UTI in their life? It may be worthwhile discussing the potential of testing for E. coli in expecting mothers, if they have a history of UTI.

      We do not have such data, and as indicated above we note this as a limitation of our study.

      It is unclear as written in the main text, as to whether all three cases of recrudescent infection come from the same geographical location. It would be easier to have this information in the corresponding main text, in addition to the supplement.

      The three cases of recrudescent invasive infection were from three different locations. We have added this information as follows (new text underlined):

      ‘These patients were from different regions in Australia.’

      Reviewer #3 (Recommendations For The Authors):

      Line 48 and 67 change the word "devasting".

      Changed as suggested.

      Line 49 second most in full-term infants.

      Changed as suggested.

      Line 56 delete the sentence "antibiotic resistance genes occurred infrequently".

      We changed the sentence, which now reads (new text underlined):

      ‘Antibiotic resistance genes occurred infrequently in our collection’.

      Line 76 reference 10 is inappropriate.

      Reference 10 reported that 5/24 infants treated for neonatal Gram-negative bacillary meningitis over a 10-year period had a relapse of meningitis after the initial course of treatment. Four of the isolates that caused these relapsed infections were E. coli.

      To address the reviewer's concern, we have altered the text as follows (new text underlined):

      ‘Moreover, NMEC is an important cause of relapsed infections in neonates [10]’.

      Line 83 several references related to serotypes are missing, notably doi.org/10.1086/339343.

      We have added this reference.

      Line 171 significantly? n=?, p=?

      The numbers and P-value were provided in the Supplementary Figure 3 legend. We have now added these to the text as follows:

      ‘Direct comparison of virulence factors between ST95 and ST1193, the two most dominant NMEC STs, revealed that the ST95 isolates (n = 20) contained significantly more virulence factors than the ST1193 isolates (n = 9); P-value < 0.001, Mann-Whitney two-tailed unpaired test (Supplementary Table 1, Supplementary Figure 3).’

      Figure 4 is not necessary.

      We respectfully disagree. Figure 4 provides an illustrative comparison of virulence factors between the two most dominant NMEC sequence types, ST95 and ST1193. We believe this will be informative for many readers.

      Line 311 "We speculate....of preterm infants" This sentence does not add anything to the discussion.

      We respectfully disagree and have kept the sentence. This reflects our opinion.

      Line 320 "clear clinical risk factors to explain...". Term of neonates is missing.

      Updated as follows (new text underlined):

      ‘Although reported rarely, recrudescent invasive E. coli infection in NM patients, including several infants born pre-term, has been documented in single study reports [39, 40]. In these reports, infants received appropriate antibiotic treatment based on antibiogram profiling and no clear clinical risk factors to explain recrudescence were identified, highlighting our limited understanding of NM aetiology.’

    2. eLife assessment

      This valuable study presents findings characterising the genomic features of E. coli isolated from neonatal meningitis from seven countries, and documents bacterial persistence and reinfection in two case studies. The genomic analyses are solid, although the inclusion of a larger number of isolates from more diverse geographies would have strengthened the generalisability of findings. The work will be of interest to people involved in the management of neonatal meningitis patients, and those studying E. coli epidemiology, diversity, and pathogenesis.

    3. Reviewer #1 (Public Review):

      Summary:

      This study uses whole genome sequencing to characterise the population structure and genetic diversity of a collection of 58 isolates of E. coli associated with neonatal meningitis (NMEC) from seven countries, including 52 isolates that the authors sequenced themselves and a further 6 publicly available genome sequences. Additionally, the study used sequencing to investigate three case studies of apparent relapse. The data show that in all three cases, the relapse was caused by the same NMEC strain as the initial infection. In two cases they also found evidence for gut persistence of the NMEC strain, which may act as a reservoir for persistence and reinfection in neonates. This finding is of clinical importance as it suggests that decolonisation of the gut could be helpful in preventing relapse of meningitis in NMEC patients.

      Strengths:

      The study presents complete genome sequences for n=18 diverse isolates, which will serve as useful references for future studies of NMEC. The genomic analyses are high quality, the population genomic analyses are comprehensive and the case study investigations are convincing. The full data set (including phylogenetic tree, annotated with source, lineage and virulence factor information) is publicly available in interactive form via the MicroReact platform.

      Weaknesses:

      The NMEC collection described in the study includes isolates from just seven countries. The majority (n=51/58, 88%) are from high-income countries in Europe, Australia or North America; the rest are from Cambodia (n=7, 12%). Therefore it is not clear how well the results reflect the global diversity of NMEC, nor the populations of NMEC affecting the most populous regions.

      The virulence factors section highlights several potentially interesting genes that are present at apparently high frequency in the NMEC genomes; however without knowing their frequency in the broader E. coli population it is hard to know the significance of this.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The emergence of catalytic self-replication of polymers is an important question in the context of the origin of life. Tkachenko and Maslov present a model in which such a catalytic polymer sequence emerges from a random pool of replicating polymers.

      Strengths:

      The model is part of a theme from many previous papers from the same authors and their colleagues. The model is interesting, technically correct, and demonstrates qualitatively new phenomena. It is good that the paper also makes a connection with possible experimental scenarios -- specifically, concrete proposals are made for testing the core ideas of the model. It would indeed be an exciting demonstration when such an experiment does indeed materialize.

      Weaknesses:

      Unlike the rest of the paper which is very tight in its arguments, I find that the discussion section is not so. Specifically, sentences such as "In fact, this can be seen as a special case of the classical error catastrophe" are a bit loose and not well substantiated -- although these are in the discussion section, I find this to be a weakness of an otherwise good paper. Tightening some of the arguments here will make it an excellent paper in my opinion.

      We followed the reviewer's recommendations by streamlining the discussion and removing the potentially confusing comparison to the classic error catastrophe.

      Reviewer #2 (Public Review):

      Summary:

      The replication of information-coding polymers and the emergence of catalytic ribozymes pose significant challenges, both experimentally and theoretically, in the study of the RNA world hypothesis. In this context, Tkachenko et al. put forth a novel hypothesis regarding a replication oligomer system based on a cleavage ribozyme. They initially highlighted that the breakage of oligomers could contribute to self-replication, provided that these fragments function as primers for subsequent replications. Next, they proposed a self-replicating system of oligomers founded on a hammerhead structure that catalyzes cleavage. By a simple dynamical model, they demonstrated that such a system is self-sustainable in certain parameter regimes. Furthermore, they delved into discussions regarding the potential emergence of such a system and the evolution toward further optimized ribozymes.

      Strengths:

      Although the cleavage (hammerhead) ribozyme has been discussed in the context of the origins of life, the authors are the first to discuss how they could be selected using a mathematical model as far as I know. The idea is simple: ribozyme activity creates fragments by breakage of an oligomer, which works as a primer for the ribozyme itself, resulting in a positive feedback system (i.e., autocatalytic sets in a broader sense). This potentially enables us to resolve at the same time problems on the (i) supply of new primers (but note that there is a major concern on this as described in the 'weakness'), and (ii) the sustaining of the cleavage ribozyme.

      Weaknesses:

      The major weakness of their theory is that the ends of the new primers, formed through the breakage/cleavage of polymers, must be chemically active (as the authors have already emphasized in the last paragraph of their discussion) to enable further elongation. Reactivating the ends of preexisting oligomers without enzymes, to the best of our current knowledge, could be a challenging task. Although their model heavily relies on this aspect, the authors do not elaborate on it.

      We have added a discussion of the need for chemical activation: "It is important to note that in the context of RNA, such bidirectional elongation requires chemical activation of the phosphate group at the 5' end of the primer to provide free energy for the newly formed covalent bond. Like the polymerization process itself, achieving this without enzymes is biochemically challenging. One might speculate that prebiotic evolution relied on inorganic catalysis, such as on mineral surfaces, or involved polymers other than today's RNA."

      We also included in the discussion a comment on a possible combination of our mechanism and the Virtual Circle Genome model that would avoid the need for bidirectional growth: "It may be possible to incorporate the selection mechanism proposed in this paper into the Virtual Circle Genome model. Such a hybrid approach would avoid the need for the biochemically problematic bidirectional growth while explaining the emergence of early catalytic activity unaffected by sequence scrambling"

      Another weakness is in the setup of their discussion on evolutionary dynamics. While they claim that their model is robust against replication errors, their approach to evolutionary dynamics appears unconventional, and it remains unclear under what conditions their assumptions are founded. They treat a whole set of oligos as a subject of evolution, rather than each individual oligo. This may necessitate more complex assumptions, such as the encapsulation of sets of oligos inside a protocell, to be adequately rationalized. Thus, it remains uncertain whether the system is indeed robust against replication errors in a more natural context. For example, if a mutant oligo, denoted as b', arises due to an error in the replication of oligo b, and if b' has lower catalytic activity but replicates more rapidly than b, it may ultimately come to dominate the system.

      We agree with the reviewer that the evolutionary dynamics in multi-species ecosystems are somewhat complicated and potentially confusing. To this end, we have added the following text and citations to our discussion: "Note that this fitness is defined at the level of the ecosystem, comprising all sequences in the chemostat, and is not necessarily attributable to individual members of that population. Over time, similar to microbial ecosystems, this population changes according to the laws of competitive exclusion [34, 35]". However, we would like to point out that we assume that our model operates in a chemostat-like environment, which can be realized, for example, in a prebiotic pool supplied with a constant flux of monomers. Thus, the evolutionary dynamics described by our equations do not require encapsulation of sets of oligos in a protocell followed by selection of these protocells.

      Reviewer #3 (Public Review):

      Summary:

      Non-enzymatic replication of RNA or a similar polymer is likely to be important for the origin of life. The authors present a model of how a functional catalytic sequence could emerge from a mixture of sequences undergoing non-enzymatic replication.

      Strengths:

      Interesting model describing details of the proposed replication mechanism.

      Weaknesses:

      A discussion of the virtual circular genome idea proposed in [33] is included in the discussion section together with the problem of sequence scrambling faced by this mechanism that was raised in [34]. However, the authors state that sequence scrambling is a special case of the classical error catastrophe. This should be reworded, because these phenomena are completely different. The error catastrophe occurs due to single-point mutational errors in a model that assumes that a complete template is being copied in one cycle. Sequence scrambling arises in models that assume cycles of melting and reannealing, in which case only part of a template is copied in one cycle. Scrambling is due to the many alternative ways in which pairs of sequences can reanneal. Many of these alternatives are incorrect and this leads to the disappearance of the original sequence. This problem exists even in the limit where there is zero mutational error rate. Therefore, it cannot be called a special case of the error catastrophe problem.

      We followed the reviewer's recommendations and removed the potentially confusing comparison to the classic error catastrophe.

      The authors seem to believe that their model avoids the scrambling problem. If this is the case, a clear explanation should be added about why this problem is avoided. Two possible points are mentioned.

      (i) Replication is bidirectional in this model. This seems like a small detail to me. I don't think it makes any difference to whether scrambling occurs.

      (ii) The functional activity is located in a short sequence region. I can imagine that if the length of a strand that is synthesized in a single cycle is long enough to cover the complete functional region, then sometimes the complete functional sequence can be copied in one cycle. Is this what is being argued? If so, it depends a lot on rates of primer extension and lengths of melting cycles etc, and some comment on this should be made.

      As we now explain in the text, while the scrambling problem itself is not completely avoided in our model, it does not affect the replication of the functionally relevant regions of the oligomers. Our key observation is that, due to the simplicity of the cleaving enzymes, the length of the functionally relevant region is much smaller than the scrambling-free length. This can be seen from a back-of-the-envelope estimate of the scrambling-free length added to the text: "...assuming the minimal hybridization length l_0 = 6 and random statistics of the master sequence, one gets the scrambling free length \sqrt{2 \times 4^{l_0}} + l_0 \approx 100. This is an order of magnitude larger than both l_0 and the length of the core region of the hammerhead ribozyme."
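
      The arithmetic behind this estimate can be checked with a short sketch. It only evaluates the expression quoted in the response; the function name and the alphabet_size parameter (4 nucleotides) are illustrative choices, not part of the authors' model code.

```python
import math


def scrambling_free_length(l0: int, alphabet_size: int = 4) -> float:
    """Evaluate sqrt(2 * alphabet_size**l0) + l0, the expression quoted in the response.

    l0 is the minimal hybridization length; alphabet_size = 4 assumes the
    four-letter nucleotide alphabet (an illustrative parameter name).
    """
    return math.sqrt(2 * alphabet_size ** l0) + l0


if __name__ == "__main__":
    # Show how strongly the estimate depends on the hybridization length.
    for l0 in (4, 6, 8):
        print(l0, round(scrambling_free_length(l0), 1))
```

      For l_0 = 6 the expression evaluates to a value slightly below 100, consistent with the order-of-magnitude figure given by the authors.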

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I have evaluated that the authors have proposed a novel mechanism potentially relevant to the origins of life, and they have explained it with a sufficiently simple model. However, I recommend that they address the following issues, including those I raised in the public review:

      • Title: I believe that the title "Emergence of catalytic activity in ..." is rather broad. Could it be more specific to accurately represent the system described in the paper? For instance, "Selective advantage (or selection) of the hammerhead cleavage ribozyme in..." may better encapsulate the paper's focus.

      We thank the reviewer for this suggestion. However, our mechanism is not unique to hammerhead ribozymes. So we decided to keep the old title.

      • One theoretically non-trivial aspect is the stability of the cooperative structure. Could the authors provide a more detailed explanation of what drives the instability of the system and what mechanisms restore its stability? For example, in a similar self-reproducing oligomer system with ribozymes and their fragments (Kamimura et al. PLoS Comp. 2019), the symmetry of fragments breaks because they effectively suppress each other's replication. Also, it would be beneficial to clarify the necessary assumptions for stability. (For instance, the authors assumed that a_L can serve as a primer for only a, while a_R can serve for both a and b.).

      We thank the reviewer for bringing this interesting paper to our attention. The cooperative fixed point in our model is intrinsically dynamically stable. It is an interesting point why the replicase in Kamimura et al can be dynamically unstable, while the ligase in our model is always stable. However, it goes beyond the scope of our study. We added the following discussion to the manuscript: "Note that the stability of our cooperative fixed point is a non-trivial result. For example, in a related model by Kamimura et al. [34], the fixed point corresponding to a viable composite replicase is dynamically unstable and requires additional stabilization, e.g., by cell-like compartments."

      • As mentioned in the public review, a critical aspect of the practical applicability of the theory is whether cleaved oligos can be reactivated and further elongated, especially through non-enzymatic pathways. Alternatively, is it possible with the presence of enzymes? While I appreciate the conceptual beauty of their model, I recommend that they at least address the difficulty or feasibility of achieving this.

      We addressed this point in response to the public review

      • As also mentioned, in the section on evolutionary dynamics, it's essential to clarify the unit of evolution and the assumptions made. For a system-level evolution (i.e., all the sets of oligos, a and b can be the unit of evolution), more detailed assumptions are required, such as the presence of compartments whose growth is coupled with the replication of oligos inside, and the competition between these compartments. I recommend the authors clarify these points.

      We addressed this point in response to the public review

      Reviewer #3 (Recommendations For The Authors):

      Assuming that the above points can be addressed, this reviewer would support publication with minor modifications.

      We addressed all points in response to the public review

    2. Reviewer #3 (Public Review):

      Summary:

      Non-enzymatic replication of RNA or a similar polymer is likely to be important for the origin of life. The authors present a model of how a functional catalytic sequence could emerge from a mixture of sequences undergoing non-enzymatic replication.

      Strengths:

      Interesting model describing details of the proposed replication mechanism.

      Weaknesses:

      The idea of the virtual circular genome proposed in [37] is included in the discussion section together with the problem of sequence scrambling faced by this mechanism that was raised in [38]. Sequence scrambling arises in models that assume cycles of melting and reannealing, in which case only part of a template is copied in one cycle. Scrambling is due to the many alternative ways in which pairs of sequences can reanneal. Many of these alternatives are incorrect and this leads to the disappearance of the original sequence. This problem exists even in the limit where there is zero mutational error rate. Thus, it is a separate problem from the usual error threshold problem. Scrambling would not occur if there was complete copying of a template from one end to the other.

      The authors seem to believe that their model avoids the scrambling problem to some extent. If I understand correctly, this is because the functional activity is located in a short sequence region. I can imagine that if the length of a strand that is synthesized in a single melting/annealing cycle is long enough to cover the complete functional region, then sometimes the complete functional sequence can be copied in one cycle. The authors give an estimate of a scrambling-free length. I am not sure how this is determined. I think that the problem of how to encode functional sequences in RNA strands undergoing non-enzymatic replication is still not fully resolved.

    3. eLife assessment

      This valuable study uses a model to determine when catalytic self-replication of polymers can emerge from a random pool of replicating polymers. The model accounts for the folding and function of polymers in addition to abstract evolutionary dynamics, providing solid evidence for the claims of the authors. The work will be of relevance to those interested in the origin of life, artificial cells, and evolutionary dynamics.

    4. Reviewer #1 (Public Review):

      Summary:

      The emergence of catalytic self-replication of polymers is an important question in the context of the origin of life. Tkachenko and Maslov present a model in which such a catalytic polymer sequence emerges from a random pool of replicating polymers.

      Strengths:

      The model is part of a theme from many previous papers from the same authors and their colleagues. The model is interesting, technically correct and demonstrates qualitatively new phenomena. It is good that the paper also makes a connection with possible experimental scenarios - specifically, concrete proposals are made for testing the core ideas of the model. It would indeed be an exciting demonstration when such an experiment does indeed materialize.

      Weaknesses:

      Unlike the rest of the paper which is very tight in its arguments, I find that the discussion section is not so. Specifically, sentences such as "In fact, this can be seen as a special case of the classical error catastrophe" are a bit loose and not well substantiated -- although these are in the discussion section, I find this to be a weakness of an otherwise good paper. Tightening some of the arguments here will make it an excellent paper in my opinion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      REVIEWER #1

      Leanza et al. investigated the regulation of Wnt signaling factors in the bone tissue obtained from individuals with or without type 2 diabetes. They showed that typical canonical Wnt ligands and downstream factors (Wnt10b, LEF1) are down-regulated, while Wnt5a and sclerostin mRNA are upregulated in diabetic bone tissue. Further, Wnt5a and sclerostin were associated with the content of AGEs, and SOST mRNA levels also correlated with glycemic control and disease duration.

      Strengths:

      • A strength of the study is the investigation of Wnt signaling in bone tissue from humans with type 2 diabetes. Most studies measure only serum levels of Wnt inhibitors, but this study takes it further and looks into bone specifically.

      • The measurement of AGEs and its correlation to the Wnt signaling molecules is interesting and important. The correlation of sclerostin and Wnt5a with AGEs and disease duration suggests that inhibited Wnt signaling is paralleled by higher AGE levels and potentially weaker bone.

      • The methodology in terms of obtaining the bone samples and the rigorous evaluation of RNA integrity is great and provides a solid basis for further analyses.

      Weaknesses:

      • A weakness may include the rather limited number of samples. Especially for some sub-analyses (e.g. RNA analyses), only a subset of samples was used.

      • How was the sample size determined? It seems like more samples might have been necessary to obtain significant results for methods with a higher standard deviation (e.g. histomorphometry).

      We apologize for the oversight in the description of the statistical analysis and we thank the reviewer for the careful reading. For the sample size calculation for bone histomorphometry, we used the cohort of the only paper analyzing trabecular bone in T2D postmenopausal women by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test for the difference between two independent groups. The analysis demonstrated that, given an effect size of 2.2776769, we needed a total of 12 patients (6/group) to reach a power of 0.978. Gene expression analyses were performed not in a subset of patients but in all subjects recruited for this study. Based on the results of gene expression analysis for our main outcome (Wnt signaling), we demonstrated that for the SOST gene the effect size was 1.2733824, with a power of 0.9490065, confirming that the sample size was sufficient to achieve adequate statistical power.
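
      As an illustration of what such an a priori calculation involves, the following sketch estimates the power of a two-sample t-test by simulation instead of reproducing G*Power. It rests on several assumptions that the response does not state: a one-sided test at alpha = 0.05, equal group sizes, normally distributed data with unit variance, and a pooled-variance t statistic. The critical value 1.812 is the standard upper 5% point of Student's t with 10 degrees of freedom; all function names are hypothetical.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def simulated_power(effect_size, n_per_group, t_crit, trials=10000, seed=0):
    """Fraction of simulated experiments whose t statistic exceeds t_crit.

    Group A is drawn from N(effect_size, 1) and group B from N(0, 1), so
    effect_size plays the role of Cohen's d. One-sided rejection region
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if pooled_t(a, b) > t_crit:
            hits += 1
    return hits / trials


# Upper 5% point of Student's t with 6 + 6 - 2 = 10 degrees of freedom.
T_CRIT_ONE_SIDED_DF10 = 1.812

if __name__ == "__main__":
    print(simulated_power(2.2776769, 6, T_CRIT_ONE_SIDED_DF10))
```

      The simulated value can be compared against the reported power of 0.978; a mismatch would suggest that the original calculation used different settings (for example, a two-sided test) from those assumed here.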

      • Why is the number of samples different for the mRNA measurements? In most cases, there were 9, but in some 8 and in some 10?

      We sincerely thank the reviewer for the opportunity to clarify such important aspects. The number of samples used for mRNA quantification may differ between the analyzed genes for two reasons. First, for real-time PCR we used only samples with a high-quality 260/280 ratio between 1.8 and 2.0, as stated in the method section of the manuscript (Page 8, lines 163-164). Second, we decided not to use undetermined values, i.e., those still undetectable after the amplification cycles (40 cycles in total), as specified in the method section (Page 8, line 167).

      Overall, this study validates findings from the group that reported similar findings in 2020. This validates their methodology and shows that alterations in Wnt signaling are reproducible in human bone tissue.

      We thank the reviewer for the positive comment, we really value her/his opinion.

      COMMENTS:

      (1) The authors could provide more details on how much of the bone was analyzed for bone histomorphometry (what area?).

      We truly thank the reviewer for allowing us to explain our methodology in more depth. First, a biopsy containing trabecular bone from the femoral head was fixed in 10% neutral buffered formalin for 24 h prior to storage in 70% ethanol. Tissues were embedded in methylmethacrylate and sectioned sagittally by the Washington University Musculoskeletal Histology and Morphometry Core. Sections were stained with Goldner’s trichrome. Then, a rectangular region of interest (ROI) containing trabecular bone was chosen below the cartilage-lined joint surface and primary spongiosa. This region had an average area of 45 mm². Tissue processing artifacts, such as folding and edges, were excluded from the ROI. A threshold was chosen using the BIOQUANT software to automatically select trabeculae and measure bone volume. Finally, osteoid was highlighted in the software and quantified semi-automatically using a threshold and correcting with the brush tool (as shown in the image below).

      We specify that in the methods section (Page 7, lines 146-152).

      Author response image 1.

      (2) Could the number of samples used for histomorphometry be increased? That may also lead to more significant results.

      We sincerely appreciate this suggestion from the reviewer, but unfortunately all available samples for histomorphometry have been analyzed and we are not able to increase the number of recruited participants at this time. Recruitment of people with T2D undergoing hip replacement is extremely difficult given the limited number of those approved for elective surgery and compliant with our inclusion criteria. Moreover, given the long time needed to process bone samples for gene expression and histology analysis, a meaningful increase in the number of recruited subjects would require several months. However, we have previously calculated the sample size for bone histomorphometry analysis using the only available data on trabecular bone in T2D postmenopausal women measured by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test for two independent groups. The analysis demonstrated that, given an effect size of 2.2776769, we needed a total of 12 patients (6/group) to reach a power of 0.978.

      (3) It would have been interesting to assess the biomechanical behavior of the bone specimens. While it is known that BMD is often higher in patients with T2D, the resistance to fractures is lower. Ideally, bone strength measures could be correlated with Wnt molecule expression and AGEs.

      We agree with the reviewer that the assessment of biomechanical parameters in our cohort would increase the importance of this study, giving more insight into the effect of downregulation of Wnt signaling on bone strength. Thus, we followed the reviewer's suggestion and performed bone compression tests on trabecular bone cores. We found a significant decrease in bone plasticity in T2D compared to controls [Young’s Modulus 21.6 (13.46-30.10 MPa) vs. 76.24 (26.81-132.9 MPa); p=0.0025]. We added the results of the bone compression test in a new paragraph (Page 8, lines 191-194). In order to assess the validity of our results, we performed a post-hoc power calculation using G*Power 3.1.9.7. We demonstrated that the effect size was 1.4716626, with a power of 0.9730784, confirming that the sample size was sufficient to achieve adequate statistical power. We added the methods in the related section and the biomechanical data in table 3; we modified the manuscript accordingly (modifications are shown in track changes). Moreover, we also performed correlation analysis between Wnt target genes, AGEs and biomechanical parameters, showing significant correlations as reported in the added paragraph in the results section (Page 11, Lines 225-233).

      REVIEWER #2

      This study reports the levels of expression of selected genes implicated in Wnt signaling in trabecular bone from femur heads obtained after surgery from post-menopausal women with (15 women) or without (21 women) type 2 diabetes. They found higher expression levels of SOST and WNT5A, and lower expression levels of LEF-1 and WNT10B in tissues from subjects with T2D, correlating with glycemia and advanced glycation products. No significant differences in bone density were observed. Overall, this is a cross-sectional, observational study measuring a limited set of genes found to vary with glycemia in postmenopausal women undergoing hip surgery.

      Strengths:

      The study demonstrates the feasibility of measuring gene expression in post-surgical trabecular bone samples, and finds differences associated with glycemia despite a relatively small number of subjects. It can form the basis for further research on the causes and consequences of changes in elements of the WNT signaling pathway in bone biology and disease.

      Weaknesses:

      The small number of targeted genes does not provide a comprehensive view of the transcriptional landscape within which the effects are observed. The gene expression changes are not associated with cellular or physiological properties of the tissue, raising questions about the biological significance of the observations.

      We thank the reviewer for the comment. In response to his/her concerns, we have increased the number of Wnt target genes, including more interactors of the Wnt/β-catenin pathway. We measured GSK3B, AXIN2, BETA-CATENIN and SFRP5 gene expression levels, showing a significant increase in GSK3B, in line with a downregulation of Wnt signaling in T2D. We modified the manuscript according to this new analysis and updated the figure 1 panel (Page 10, lines 210-213). Unfortunately, in this paper we were not able to perform experiments on cellular or physiological properties. However, in order to analyze the biological effect of the analyzed genes on the phenotype, we measured bone strength by performing compression tests on trabecular bone cores (Page 10, lines 201-203 and table 3) and used the biomechanical parameters for correlation analysis with the targeted genes, showing significant correlations between bone strength and Wnt genes. We added a new paragraph in the result section and a new figure panel to the main manuscript (Page 11, lines 225-233 and figure 4).

      COMMENTS:

      (1) The small number of targeted genes does not provide a comprehensive view of the transcriptional landscape within which the effects are observed. Given the author's success in obtaining good-quality RNA from trabecular bone, a more comprehensive exploration would greatly improve the quality of the study.

      We agree with the reviewer that broadening the transcriptional landscape related to Wnt signaling would be of interest for this work, and we are grateful for this opportunity. We were able to increase the number of Wnt target genes, including more interactors of the Wnt/β-catenin pathway, using the same cohort of patients in which we performed the other analyses. We also measured GSK3B, AXIN2, BETA-CATENIN and SFRP5 gene expression levels, showing a significant increase in GSK3B, in line with a downregulation of Wnt signaling in T2D. We modified the manuscript according to this new analysis and updated the figure panel (Page 10, lines 210-213 and Figure 1).

      (2) The gene expression changes are not associated with cellular or physiological properties of the tissue, raising questions about the biological significance of the observations. Can the authors perform immunohistochemistry to associate the changes in gene expression with protein expression?

      We sincerely thank the reviewer for drawing attention to such an important aspect. We have partially replied to this comment in the previous paragraph. Regarding immunohistochemistry analysis, it is not possible to further use the available samples. This is mainly because the non-decalcified bones were embedded in plastic to allow for separate analysis of newly formed osteoid and mineralized bone. This process leads to poor antigen preservation and unsuitable detection of most targets. Moreover, antibodies for Wnt are also unreliable due to the secreted nature of the protein. Overall, this approach is unlikely to work efficiently. Similarly, RNAscope is not possible due to the resin. Optimization and validation of these analyses will need to be saved for a future study with fresh specimens.

      REVIEWER #3

      The manuscript by Leanza and colleagues explores the regulation of Wnt signaling and its association with advanced glycation end products (AGEs) accumulation in postmenopausal women with type 2 diabetes (T2D). The paper provides valuable insights into the potential mechanisms underlying bone fragility in individuals with T2D. Overall, the manuscript is well-structured, and the methodology is sound. I would suggest some minor revisions to improve clarity.

      Strengths:

      The study addresses an important and clinically relevant question concerning the mechanisms underlying bone fragility in postmenopausal women with T2D.

      The study's methodology appears sound, and the inclusion of postmenopausal women with and without T2D undergoing hip arthroplasty adds to the clinical relevance of the findings. Additionally, measuring gene expression and AGEs in bone samples provides direct insights into the study's objectives.

      The manuscript presents data clearly, and the results are well-organized.

      Weaknesses:

      Title. The title could be more specific to better reflect the content of the study. Also, the abstract should concisely summarize the study's main findings, providing some figures.

      We thank the reviewer for this suggestion, and we modified the title giving specific information on the main findings of this study. The new title is “Bone canonical Wnt signaling is downregulated in type 2 diabetes and associates with higher Advanced Glycation End-products (AGEs) content and reduced bone strength”. Moreover, we added as suggested a graphical abstract summarizing our study results.

      Introduction: the introduction would benefit from the addition of a clearer, more focused statement of the research questions or hypotheses guiding this study.

      We thank the reviewer for this opportunity, and we reformulated the hypothesis of this study based on our data and new findings as follows: “we hypothesized that T2D and AGEs accumulation downregulate Wnt canonical signaling and negatively affect bone strength” (page 6, lines 116-117).

      Methods: more information is needed on the histomorphometry analysis. Surgical samples from 8 T2D and 9 non-diabetic subjects were used for histomorphometry analysis. How did these subjects compare with the other subjects in the T2D and control groups? Were they representative? How were they selected?

      We thank the reviewer for the opportunity to clarify this important point. The number of subjects included in the different analyses of the paper differs for multiple reasons. In particular, we used only bone specimens with enough trabecular bone material to perform histomorphometry analysis. Therefore, the samples used in the histomorphometry analysis belong to the same subjects enrolled in the study and analyzed for the other experiments of this paper. However, we had previously calculated the sample size for bone histomorphometry analysis using the only available data on trabecular bone in T2D postmenopausal women measured by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test for two independent groups. The analysis demonstrated that, given an effect size of 2.2776769, we needed a total of 12 patients (6/group) to reach a power of 0.978.
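
As an illustrative aside, the sample-size reasoning in this response can be sketched in a few lines. This is not G*Power's computation: it uses a normal approximation rather than the noncentral t distribution, and the two-sided alpha of 0.05 is an assumption (the response does not state it). The function name is hypothetical, and the value it returns should be expected to be close to, not identical with, the reported power.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-sample comparison.

    d            -- standardized effect size (Cohen's d)
    n_per_group  -- subjects per group (equal group sizes assumed)
    alpha        -- two-sided significance level (0.05 is an assumption)
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = d * sqrt(n_per_group / 2)   # expected standardized group difference
    # Probability of landing in either rejection tail under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Effect size and group size as quoted in the response above.
    power = approx_power_two_sample(d=2.2776769, n_per_group=6)
    print(f"approximate power: {power:.3f}")
```

An exact calculation would replace the normal distribution with a noncentral t distribution with 10 degrees of freedom, which is what dedicated power software typically does.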

      COMMENTS:

      (1) In the Abstract, values and p-values for comparisons, and Spearman's rho and p-values for correlations should be provided. Most adverbs (thus, accordingly, importantly) could be omitted to improve conciseness and clarity.

      We kindly thank the reviewers for this precise and careful comment. We changed the Abstract accordingly. In keeping with the abstract style of the journal, we initially reported only the main findings. We have now modified it to provide values and p-values as requested. We defer to the wishes of the editor as to the format in which the abstract should be reported.

      (2) Result presentation: 25th and 75th percentile should be provided rather than the interquartile range, to better reflect data distribution.

      We thank the reviewer for the opportunity to better clarify this part of the results section. We changed the manuscript accordingly.

      (3) Estimated glomerular filtration rate should be calculated and provided as a marker of renal function, rather than serum creatinine values.

      We thank the reviewer for the comment, and we modified the manuscript accordingly, adding the eGFR values in Table 1 and in the Results section.

      (4) The manuscript should include a statement confirming compliance with the Declaration of Helsinki, considering that human subjects were involved in the study.

      We thank the reviewer for the comment. The study was conducted in accordance with the Declaration of Helsinki. The Ethics Committee of Campus Bio-Medico University approved the present study. Informed consent was obtained from all subjects involved in the study. (Page 6, lines 134-137).

    2. eLife assessment

      This study provides valuable insights into understanding bone fragility in T2D patients through the use of human skeletal tissue, reinforcing previous pre-clinical studies or observational studies using serum samples that the Wnt signaling pathway may play a critical role in T2D-related bone impairment. The methods are solid, but a limited number of subjects and a small set of genes with lack of data in terms of cellular properties of skeletal tissue are viewed as weaknesses.

    3. Reviewer #1 (Public Review):

      Summary: Leanza et al. investigated the regulation of Wnt signaling factors in the bone tissue obtained from individuals with or without type 2 diabetes. They showed that typical canonical Wnt ligands and downstream factors (Wnt10b, LEF1) are down-regulated, while Wnt5a and sclerostin mRNA is up-regulated in diabetic bone tissue. Further, Wnt5a and sclerostin were associated with the content of AGEs, and SOST mRNA levels also correlated with glycemic control and disease duration.

      Strengths:

      - A strength of the study is the investigation of Wnt signaling in bone tissue from humans with type 2 diabetes. Most studies measure only serum levels of Wnt inhibitors, but this study takes it further and looks into bone specifically.
      - The measurement of AGEs and its correlation to the Wnt signaling molecules is interesting and important. The correlation of sclerostin and Wnt5a with AGEs and disease duration suggests that inhibited Wnt signaling is paralleled by higher AGE levels and potentially weaker bone.
      - The methodology in terms of obtaining the bone samples and the rigorous evaluation of RNA integrity is great and provides a solid basis for further analyses.

      Weaknesses:

      - A weakness may include the rather limited number of samples.

      Overall, this study validates findings from the same group, which reported similar results in 2020. This supports their methodology and shows that alterations in Wnt signaling are reproducible in human bone tissue.

    4. Reviewer #2 (Public Review):

      Summary:

      This study reports the levels of expression of selected genes implicated in Wnt signaling in trabecular bone from femur heads obtained after surgery from post-menopausal women with (15 women) or without (21 women) type 2 diabetes. They find higher expression levels of SOST and WNT5A, and lower expression levels of LEF-1 and WNT10B in tissues from subjects with T2D, correlating with glycemia and advanced glycation products. No significant differences in bone density were observed. Overall, this is a cross-sectional, observational study measuring a limited set of genes found to vary with glycemia in postmenopausal women undergoing hip surgery.

      Strengths:

      The study demonstrates the feasibility of measuring gene expression in post-surgical trabecular bone samples and finds differences associated with glycemia despite a relatively small number of subjects. It can form the basis for further research on the causes and consequences of changes in elements of the WNT signaling pathway in bone biology and disease.

      Weaknesses:

      The small number of targeted genes does not provide a comprehensive view of the transcriptional landscape within which the effects are observed. The gene expression changes are not associated with cellular or physiological properties of the tissue, raising questions about the biological significance of the observations.

    1. eLife assessment

      The authors address the function of keratin 17 (K17), a marker of the most aggressive pancreatic ductal adenocarcinomas (PDACs). While this potentially useful study addresses a significant area of pancreatic cancer research, the lack of evidence demonstrating nuclear localization of K17 in human PDAC and the excessive reliance on a single cell line reduce the significance of the work. Moreover, the weak phenotypes of K17 phosphosite mutants provide incomplete support for the authors' mechanistic model.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors suggest that Keratin 17 (K17), a component of intermediate filaments that is highly expressed in the more aggressive basal subtype of pancreatic cancer, is functionally involved in tumor promotion. They use mouse and human cell lines that overexpress wild-type or mutant K17 (the latter a form that accumulates in the nuclei) and show a modest reduction in survival and an increase in tumor size and metastasis. The authors use in vitro work to show that a PKC/MEK/RSK kinase cascade leads to K17 phosphorylation and K17 disassembly.

      Strengths:

      K17 is an intriguing protein, as it becomes part of intermediate filaments but it has also been described to have a role in the nucleus. Whether K17 functionally drives the malignant phenotype of pancreatic cancer is unclear. Thus, the article addresses an important area of research.

      Weaknesses:

      Some shortcomings with the interpretation of results and the strength of the evidence provided are noted. Among those, evidence that nuclear K17 is a feature of basal pancreatic cancer in human tumors is missing. Further, the survival effects observed in the mouse experiments are modest, especially with the L3.6 cell line. Lastly, while the authors point at some potentially intriguing gene expression changes in pancreatic cancer cells expressing K17, such as the expression of genes related to epithelial mesenchymal transition (EMT), they do not provide evidence that these genes are K17 targets, nor that they mediate the nuclear function of K17 in experimental models, nor that they are associated with K17-high human tumors.

    3. Reviewer #2 (Public Review):

      Summary:

      Keratin 17 is a highly stress-inducible keratin that has been implicated in various human disorders. For example, higher K17 expression was shown to be associated with poor survival in several cancers including pancreatic carcinoma. To follow up on these observations, Kawalerski et al. assessed the relevance of K17 and its phosphorylation on this deadly tumor. In particular, they identified novel K17 phosphorylation sites and demonstrated that they affect K17 solubility as well as its nuclear localization. They also studied their significance in vivo.

      Strengths:

      The overall structure is very logical, the manuscript is well-written.

      Weaknesses:

      Unfortunately, the key experiment, i.e. the assessment of growth of cancer cell lines with different phospho-variants of K17, turned out largely negative.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The paper addresses the important question of how numerical information is represented in the human brain. Experimental findings are interpreted as providing evidence for a sensorimotor mechanism that involves channels, each tuned to a particular numerical range. However, the logic of the channel concept as employed here, as well as the claims regarding a sensorimotor basis for these channels, is incomplete and thus requires clarification and/or modification.

      Reviewer #1 Public Review

      Anobile and colleagues present a manuscript detailing an account of numerosity processing with an appeal to a two-channel model. Specifically, the authors propose that the perception of numerosity relies on (at least) two distinct channels for small and large numerosities, which should be evident in subject reports of perceived numerosity. To do this, the authors had subjects reproduce visual dot arrays of numerosities ranging from 8 to 32 dots, by having subjects repetitively press a response key at a pre-instructed rate (fast or slow) until the number of presses equaled the number of perceived dots. The subjects performed the task remarkably well, yet with a general bias to overestimate the number of presented dots. Further, no difference was observed in the precision of responses across numerosities, providing evidence for a scalar system. No differences between fast and slow tapping were observed. For behavioral analysis, the authors examined correlations between the Weber fractions for all presented numerosities. Here, it was found that the precision at each numerosity was similar to that at neighboring numerosities, but less similar to more distant ones. The authors then went on to conduct PCA and clustering analyses on the Weber fractions, finding that the first two components exhibited an interaction with the presented numerosity, such that each was dominant at distinct lower and upper ranges and further well-fit by a log-Gaussian model consistent with the channel explanation proposed at the beginning.

      Overall, the authors provide compelling evidence for a two-channel system supporting numerosity processing that is instantiated in sensorimotor processes. A strength of the presented work is the principled approach the authors took to identify mechanisms, as well as the controls put in place to ensure adequate data for analysis. Some questions do remain in the data, and there are aspects of the presentation that could be adjusted.

      • The use of a binary colormap for the correlation matrix seems unnecessary. Binary colormaps between two opposing colors (with white in the middle) are best for results spanning positive and negative values (say, correlation values between -1 and +1), but the correlations here are all positive, so a uniform colormap should be applied. I can appreciate that the authors were trying to emphasize that a 2+ channel system would lead to lower correlations at larger ratios, but that's emphasized better in the numerical ratio line plots.

      We agree and now changed the colour maps accordingly (Fig 1 and 3, p. 4 and 11). Thank you.

      • The correlation matrices in Figure 1 appear blurred out. I am not sure if this was intentional but suspect it was not, and so they should appear like those presented in Figure 3.

      Sorry about that, it was a rendering problem. Now fixed.

      • It's notable that the authors also collected data on a timing task to rule out a duration-based strategy in the numerosity task. If possible, it would be great to have the authors also conduct the rest of the analyses on the duration task as well; that is, to look at WF correlation matrices/ratios as well as PCA. There is evidence that duration processing is also distinctly sensorimotor, and may also rely on similar channels. Evidence either for or against this would likely be of great interest.

      We agree that investigating the existence of temporal channels would be of great interest, but it goes beyond the scope of the current study. Out of curiosity, however, we analysed the duration data. Interestingly, signatures of sensorimotor channels (a correlation gradient as a function of duration distance) emerge. This does not hold, however, when correlating number against duration data. These results (if confirmed) would indicate the existence of independent mechanisms for time and numerosity perception. Our research agenda is now proceeding in this direction.

      • For the duration task, there was no fast-tapping condition. Why not? Was this to keep the overall task length short?

      Yes, this was the main reason.

      • The number of subjects/trials seems a bit odd. Why did some subjects perform both and not others? The targets say they were presented "between 25 and 30 times", but why was this variable at all?

      The two experimental conditions were demanding, lasting around 2 hours each. Some participants, unfortunately, were available for just one slot. To make the two conditions similarly powered, we added some extra non-shared participants. Trials were divided into blocks of 55 trials (5 repetitions for each target). Most of the participants performed 6 blocks in both conditions; a few of them (again because of limited availability) performed 5 blocks.

      • For the PCA analysis, my read of the methods and results is that this was done on all the data, across subjects. If the data were run on individual subjects and the resulting PCA components averaged, would the same results be found?

      We thank the reviewer for giving us the opportunity to clarify the technique.

      In brief: we measured precision (Weber Fraction) in translating digits (target numbers) into corresponding action sequences. This creates an m by n matrix, each column (n) representing a participant, each row (m) a target number. This matrix was then submitted to PCA. The analysis provided two components. Each target number was assigned two loading scores: one representing the loading on the 1st and one on the 2nd component. These loadings were then displayed as a function of targets, to describe the tunings. This analysis, by its nature, is across-participants and cannot be performed on individual data.
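
For readers unfamiliar with the procedure, the matrix-to-loadings step described above can be sketched as follows. Everything in this block is an assumption made for illustration: the Weber fractions are simulated from two made-up latent factors, the PCA is run on the covariance between targets (the response does not say whether covariance or correlation was used), and the components are found by plain power iteration so that only the standard library is needed. The function names are hypothetical and this is not the authors' code.

```python
import random
from math import sqrt

TARGETS = [8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32]  # target numbers from the text


def toy_wf_matrix(n_participants=50, seed=1):
    """Made-up Weber fractions: rows are targets, columns are participants."""
    rng = random.Random(seed)
    # Two hypothetical latent factors per participant (low-range, high-range).
    latent = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_participants)]
    rows = []
    for target in TARGETS:
        w = (target - TARGETS[0]) / (TARGETS[-1] - TARGETS[0])  # 0 at 8, 1 at 32
        rows.append([0.15 + 0.03 * ((1 - w) * low + w * high) + rng.gauss(0, 0.005)
                     for low, high in latent])
    return rows


def covariance(wf):
    """Covariance between target rows, computed across participants."""
    n = len(wf[0])
    centered = []
    for row in wf:
        mu = sum(row) / n
        centered.append([v - mu for v in row])
    return [[sum(a * b for a, b in zip(ra, rb)) / (n - 1) for rb in centered]
            for ra in centered]


def leading_component(matrix, iterations=500, seed=0):
    """Power iteration: returns (eigenvalue, unit-length loading vector)."""
    rng = random.Random(seed)
    m = len(matrix)
    vec = [rng.gauss(0, 1) for _ in range(m)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    product = [sum(matrix[i][j] * vec[j] for j in range(m)) for i in range(m)]
    eigenvalue = sum(v * p for v, p in zip(vec, product))
    return eigenvalue, vec


def first_two_components(wf):
    """One loading per target on each of the first two components."""
    cov = covariance(wf)
    m = len(cov)
    val1, load1 = leading_component(cov)
    # Deflate: subtract the first component, then extract the next one.
    deflated = [[cov[i][j] - val1 * load1[i] * load1[j] for j in range(m)]
                for i in range(m)]
    val2, load2 = leading_component(deflated)
    return (val1, load1), (val2, load2)


if __name__ == "__main__":
    (val1, load1), (val2, load2) = first_two_components(toy_wf_matrix())
    for target, a, b in zip(TARGETS, load1, load2):
        print(f"target {target:2d}: PC1 loading {a:+.2f}, PC2 loading {b:+.2f}")
```

Because each loading is defined only through the covariance across participants, the sketch also makes concrete why the analysis cannot be run on a single participant's data. The sign of each loading vector is arbitrary, as in any PCA.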

      • For the data presented in Figure 2, it would be helpful to also see individual subject data underlaid on the plots to get a sense of individual differences. For the reproduced number, these will likely be clustered together given how small the error bars are, but for the WF data it may show how consistently "flat" the data are. Indeed, in other magnitude reproduction tasks, it is not uncommon to see the WF decrease as a function of target magnitude (or even increase). It may be possible that the reason for the observed findings is that some subjects get more variable (higher WFs) with larger target numbers and others get less variable (lower WFs).

      We agree and now added individual data, confirming flat WF distributions (Fig 2 B&D).

      • Regarding the two-channel model, I wonder how much the results would translate to different ranges of numerosities? For example, are the two channels supported here specific to these ranges of low and high numbers, or would there be a re-mapping to a higher range (say, 32 to 64 dots) or to a narrower range (say 16 to 32 dots). It would be helpful to know if there is any evidence for this kind of remapping.

      This is the first study measuring sensorimotor channels for the transformation of numbers into action sequences. Whether these channels are modulated by the numerical context is an interesting open question that we are exploring through specific experimental conditions (now discussed at p. 17, lines 451-460).

      Reviewer #2 Public Review

      The authors wish to apply established psychophysical methods to the study of number. Specifically, they wish to test the hypothesis - supported by their previous work - that human sensorimotor processes are tuned to specific number ranges. In a novel set of tasks, they ask participants to tap a button N times (either fast or slow), where N varies between 8 and 32 across trials. As I understood it, they then computed the Weber fraction (WF) for each participant for each number and correlated those values across participants and numbers. They find stronger correlations for nearby numbers than for distant numbers and interpret this as evidence of sensorimotor tuning functions. Two other analyses - cluster analyses and principal component analyses (PCA) - suggest that participants' performance relied on at least 2 mechanisms, one for encoding low numbers of taps (around 10) and another for encoding larger numbers (around 27).

      Strengths

      Individual differences can be a rich source of scientific insight and I applaud the authors for taking them seriously, and for exploring new avenues in the study of numerical cognition.

      Weaknesses

      Inter-subject-correlation

      The experiment "is based on the idea that interindividual variability conveys information that can reveal common sensory processes (Peterzell & Kennedy, 2016)" but I struggled to understand the logic of this technique. The authors explain it most clearly when they write "Regions of high intercorrelation between neighbouring stimuli intensity can be interpreted to imply that sets of stimuli are processed by the same (shared) underlying channel. This channel, while responding relatively more to its preferred stimulus, will also be activated by neighbouring stimuli that although slightly different from the preferred intensity, are nevertheless included in the same response distribution." As I understood it, the correlations are performed "between participants, for all targets values" - meaning that they are measuring the extent to which different participants' WFs vary together. But why is this a good measure of channels? This analysis seems to assume that if people have channels for numerical estimation, they will have the same channels, tuned to the same numerical ranges. But this is an empirical question - individual participants could have wildly different channels, and perhaps different numbers of channels (even in the tested range). If they do, then this between-subject analysis would mask these individual differences (despite the subtitle).

      Yes, the technique assumes that different individuals have similar channels, and the results confirm this. If everyone had different channels, or different numbers of channels, we would not have found this pattern of results: an ordered scaling of correlations as a function of numerical distance. As specified in the ms, however, this technique (at least as we used it) is not sensitive enough to identify the exact number of channels, so it may have smoothed the results, 'masking' the existence of more than two channels. To avoid possible confounds related to accuracy (reproduction biases), we used Weber Fraction, a standard index of normalized sensory precision (p. 7, lines 182-183).

      Different channels

      I had trouble understanding much of the analyses, and this may account for at least some of my confusion. That said, as I understand it, the results are meant to provide "evidence that tuned mechanisms exist in the human brain, with at least two different tunings" because of the results of the clustering analysis and PCA. However, as the authors acknowledge, "PCA aims to summarize the dataset with the minimal number of components (channels). We can therefore not exclude the possible existence of more than two (perhaps not fully independent) channels." So I believe this technique does not provide more evidence for the existence of 2 channels as for the existence of 4 or 8 or 11 channels, the upper bound for a task testing 11 different numbers. If we can conclude that people may have one channel per number, what does "channel" mean?

      We recognise that the technique is not particularly intuitive, and we apologize for the lack of clarity.

      To clarify: we measured the precision in translating digit numbers into action sequences. This was done for different target numbers (8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32) and with N participants. For each target number, and independently for each participant, we calculated the reproduction precision (Weber Fraction). The dataset comprised a matrix where each column represents a participant, and each row a target number. Each cell contains the corresponding Weber Fraction value. This dataset was then analysed with a simple correlation, across participants. For example, the WFs provided by the N participants when tested at the target number "8" were correlated with those obtained with the target numbers 10, 11, 13...32. The results show that the correlation between "8 and 10" (low numerical distance) was higher than that obtained correlating "8 with 32" (higher numerical distance). This pattern implies that the shared variance, between numbers, scales with numerical distance, across participants, implying the existence of channels aggregating similar numbers (i.e. tuning selectivity). On the same dataset we then ran a PCA. This analysis provides two main components. Within each component, each target number is assigned a loading score: one for the 1st and one for the 2nd component. These loadings were plotted as a function of targets, to describe the shape of the tunings (i.e. channels).
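
The correlation step described in this paragraph can be illustrated with a minimal sketch. The data here are simulated, not the study's: each hypothetical participant is given two made-up noise levels that blend linearly across the target range, and the Weber fraction is taken as the standard deviation of the reproductions divided by their mean (an assumed definition; the response only calls it a normalized index of precision). All function names are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev

TARGETS = [8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32]  # target numbers from the text


def weber_fraction(responses):
    """Normalized precision: SD of the reproductions over their mean (assumed)."""
    return stdev(responses) / mean(responses)


def simulate_wf_matrix(n_participants=40, n_trials=30, seed=0):
    """Hypothetical data: rows are targets, columns are participants."""
    rng = random.Random(seed)
    matrix = [[0.0] * n_participants for _ in TARGETS]
    for p in range(n_participants):
        # Two made-up noise levels per participant, one per putative channel.
        low_noise = rng.uniform(0.05, 0.25)
        high_noise = rng.uniform(0.05, 0.25)
        for t, target in enumerate(TARGETS):
            w = (target - TARGETS[0]) / (TARGETS[-1] - TARGETS[0])  # 0 at 8, 1 at 32
            noise = (1 - w) * low_noise + w * high_noise
            trials = [rng.gauss(target, noise * target) for _ in range(n_trials)]
            matrix[t][p] = weber_fraction(trials)
    return matrix


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(wf):
    """Correlation across participants for every pair of target numbers."""
    return [[pearson(row_a, row_b) for row_b in wf] for row_a in wf]


if __name__ == "__main__":
    r = correlation_matrix(simulate_wf_matrix())
    print(f"r(8, 10) = {r[0][1]:.2f}")   # neighbouring targets
    print(f"r(8, 32) = {r[0][-1]:.2f}")  # distant targets
```

In this toy setup, neighbouring targets share most of their participant-level noise, so their Weber fractions correlate more strongly than those of distant targets, which is the qualitative signature the paragraph describes.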

      As stated above, we cannot really say exactly how many channels exist. These results should be interpreted as evidence for the existence of at least two channels for the transformation of numerical symbols into action sequences. This is not an obvious result at all. There is no evidence in the literature for the existence of such a mechanism in humans. In an animal model (the crow), as many channels were found as there were numbers tested. This does not contrast with our 2-channel results, but (very likely) arises from the different resolution of the techniques. Single-cell recording surely has higher resolution than our interindividual covariance approach. In short, we believe that the channels revealed here are likely a coarse summary representation of several underlying channels.

      We have now tried to make these points clearer (p. 7 lines 186-196; p. 15 lines 382-384; p. 16 lines 401-402).

      Several other questions arose for me when thinking through this technique. If people did have two channels (at least in this range), why would they be so broad? Why would they be centered so near the ends of the tested range? Can such effects be explained by binning on the part of the participants, who might have categorized each number (knowingly or not) as either "small" or "large"? Whereas the experiment tested numbers 8-32, numbers are infinite - How could a small number of channels cover an infinite set? Or even the set 8-10,000? More broadly, I was unsure what advantages channels would have - that is - how in principle would having distinct channels for processing similar stimuli improve (rather than impede) discrimination abilities?

      This field of study is completely new, with many questions still open, including whether these channels are modulated by the numerical context such as the tested range and their extremes. The channels appear broad because, as stated above, they likely represent a coarse summary representation of several (probably sharper) underlying channels. We are now exploring the effect of numerical range and trying to modulate the tuning widths through ad-hoc experimental conditions. (p. 16 lines 401-402; p. 17 lines 450-459)

      No number perception

      I was uncertain about the analogy to studies of other continuous dimensions like spatial frequency, motion, and color. In those studies, participants view images with different spatial frequency, motion, or color - the analogy would be to see dot arrays containing different numbers of dots. Instead, here participants read written numerals (like "19"), symbols which themselves do not have any numerical properties to perceive. How does that difference change the interpretation of the effects? One disadvantage of using numerals is that they introduce a clear discontinuity: Our base-10 numerical system artificially chunks integers into decades, potentially causing category-boundary effects in people's reproductions.

      We used these sensory analogies to provide a flavour of the technique. The focus of the current study was on the individual differences in the numbers-to-actions transformation process. To this aim we decided to reduce the noise associated with the encoding of the sensory stimulus per se. Digit encoding, at least in educated adults, is indeed noiseless, eliminating this source of variability. However, we agree that looking at non-symbolic formats would be interesting. We are now collecting data with dots and flash estimations. The results (so far) are largely in line with those found here, ruling out chunking strategies, and confirming previous literature showing sensory numerosity selective channels in humans and animals. (p. 14 lines 351-355)

      Sensorimotor

      The authors wished to test for "sensorimotor mechanisms selective to numerosity" but it's not clear what makes their effects sensorimotor (or selective to numerosity, see below). It's true they found effects using a tapping task (which like all behaviour is sensorimotor), but it's not clear that this effect is specific to sensorimotor number reproduction. They might find similar effects for numerical comparison or estimation tasks. Such findings would suggest the effect may be a general feature of numerical cognition across modalities.

      Related to the above comment, the task here was to transform noiseless symbols (digits) into (noisy) numerical action sequences. Given that the source of variability is thus mainly driven by the sensory-to-action process, we believe that the task can safely be considered sensorimotor in nature. (p. 14 lines 351-355)

      Yes, the same pattern of results might be found for numerical comparison or estimation tasks, but using non-symbolic formats (dots/flashes). Educated adults make no errors in naming or comparing such simple digits, making this covariance analysis impossible to perform with digit verbal estimation or comparison tasks. However, to anticipate our future results, we have preliminary data for dots and flashes verbal estimation tasks (“how many?”). The data suggest similar results, consolidating the technique, and confirming the large literature showing sensory channels for purely visual numerosity. (p. 17 lines 453-455)

      Specific to numbers

      The authors argue that their effects are "number selective" but they do not provide compelling evidence for this selectivity. In principle, their main findings could be explained by the duration of tapping rather than the number of taps. They argue this is unlikely for two reasons. The first reason is that the overall pattern of results was unchanged across the fast and slow tapping conditions, but differences in duration were confounded with numerosity in both conditions, so the comparison is uninformative. (Given this, I am not sure what we stand to learn by comparing the two tapping speeds.) The second reason is that temporal reproduction was less precise in their control condition than numerical reproduction, but this logic is unclear: Participants could still use duration (or some combination of speed and duration) as a helpful cue to numerosity, even if their duration reproductions were imperfect. If the authors wish to test the role of duration, they might consider applying the same analytical techniques they use for numbers to their duration data. Perhaps participants show similar evidence for duration-selective channels, in the absence of number, as they do for other non-numerical domains (like spatial frequency).

      The fast and slow conditions were not meant to control for duration strategies but to test for the generalizability of the results over different tapping temporal dynamics (temporal frequency in this case). The results confirmed this.

      The control for duration strategies is the comparison between precision in reproducing durations or numbers. In the number-to-action task, participants were free to use any cues, including response duration. However, it is safe to assume that the performance is dominated by the most precise feature, number in this case. In other words, in the number task if participants were reproducing the time required to give a certain number of presses, then in the timing task, where they are explicitly reproducing the same durations, they should show no lower precision. The results are opposite to that prediction. (p. 16 lines 418-420)

      Theories of numerical cognition.

      An expansive literature on numerical cognition suggests that many animals, human children, and adults across cultures have two systems for representing numerosity without counting - one that can represent the exact cardinality of sets smaller than about 4 and another that represents the approximate number of larger sets (but see Cheyette & Piantadosi, 2020). The current paper would benefit from better relating its findings to this long lineage of theories and findings in numerical approximation across cultures, ages, and species.

      The numbers used in this work were well above the subitizing limit (N > 7). Indeed, the WFs found showed no signs of subitizing discontinuities. We believe that discussing the literature on subitizing here is too far from the scope of the current work.

      Additional public comments from the Reviewing Editor:

      (1) What, in the present work, makes the case that the operative mechanism is sensorimotor? The authors frame the discussion around a sensorimotor number system but the evidence here could be seen as using a sensorimotor task as one way to get at an amodal number channel. For example, the authors could do the same experiment but have people watch a circle that flashes on and off for n times, with participants reporting the number of flashes (or shown a number and asked to say more or less). They could then apply the same analyses as used here. If they got the same results, it would seem that this would be an argument against the channels being sensorimotor. I suppose if they did NOT get results in the perceptual task, then they would have (much) stronger evidence that the channels are somehow sensorimotor in nature. Either way, an experiment along these lines would be essential for addressing the nature of the channels (tied to sensorimotor or not).

      We chose to use this task because the perception of simple digits (like those used here), at least in educated adults, is noiseless. This ensures that the inter-individual variability remaining on the table is that related to the motor transformation process. For this reason, we believe that the task can be safely considered sensorimotor (see also Kirschhock & Nieder, Number selective sensorimotor neurons in the crow translate perceived numerosity into number of actions, Nature comm, 2022). (p. 14 lines 351-355)

      This is not true for verbal numerosity estimation of non-symbolic stimuli (such as dots and/or series of events). It is well known that the estimation of the latter stimuli is noisy, and there would be no sensorimotor transformation processing in the task. The inter-individual variability in estimation precision and thus the measurable channels would then reflect sensory numerosity tunings. These have been revealed with various techniques in both humans and animals. However, we are now following this idea and we have preliminary data showing that sensory channels are also detectable by the technique used in the current study. This is not in contrast with the sensorimotor nature of the channels found here, but instead indicates the existence of both sensory and sensorimotor number channels.

      The authors may argue that results from other studies such as the 2016 target article make the case about a sensorimotor basis of these channels. While I don't have a great grasp of this literature, my take on the 2016 target article is that the point was not about sensorimotor channels but about interactions between action and vision. This seems more in line with the idea of amodal number channels and indeed, they speak about a "generalized number sense" in that paper.

      The 2016 paper showed that a short period of hand tapping (adaptation) can distort visual numerosity perception. The results implied the existence of sensorimotor number channels, integrating non-symbolic numerosity (dots/flashes) and actions. The current study goes beyond this, describing (for the first time) sensorimotor channels transforming symbolic numbers into action sequences. Whether these channels are also in charge of encoding non-symbolic numerosity is an interesting open question that we are currently investigating with cross-task analyses. If the same channels are in charge of responding to non-symbolic numerosity (across space and time: dots and sequences of visual/auditory events) as well as translating digits into actions, we could then speak about a generalized sensorimotor number sense. At present, this remains a possibility, to be tested. (p. 17 lines 450-459)

      (2) There is a need for clarification on the method for creating the correlation matrices. The authors write that they look at correlations between Weber fractions between participants. By "between" do they mean "across"? That is, they calculate the Weber fraction for each individual for each cell. Then for a given cell, you correlate its Weber fraction with every other cell, using the pairs for each individual. I would call this "across" not "between." Is this just a semantic thing or have I misunderstood the process?

      To make this concrete, consider the correlation for cell 10/11. I assume it is something like

      10 11

      Subj1 .25 .31

      Subj2 .13 .09

      Subj3 .22 .16

      Etc

      And correlation across participants will be the data point for the 10/11 cell in the matrix.

      It is a semantic error; this is exactly what we did: across participants.

      To clarify better: we measured the precision in transforming numbers into sequences of actions. This was done for different target numbers (8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32) and with N participants. For each target number, and independently for each participant, we then calculated the reproduction precision (Weber Fraction). The dataset then consists of a matrix where each column represents a participant, and each row a target number. Each cell contains the corresponding Weber Fraction. This dataset was then analysed with a simple correlation, across participants. For example, the WFs of the N participants obtained when testing the target number "8" were correlated with those obtained with the target numbers "10, 11, 13...32". The results show that the correlation between "8 and 10" (low numerical distance) was higher than that obtained correlating "8 with 32" (higher numerical distance). This pattern implies that the shared variance between numbers (across participants) scales with numerical distance, in line with the existence of channels that aggregate similar numbers (tunings).

      (p. 7 lines 186-196)
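
      The across-participant correlation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the simulated Weber fractions, and the two log-Gaussian "channels" (centred at 10 and 27 with an arbitrary width) are assumptions made only to produce example data with the layout stated in the response (rows are target numbers, columns are participants).

```python
import numpy as np

# Target numbers listed in the authors' response.
TARGETS = np.array([8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32])


def wf_correlation_matrix(wf):
    """Pearson correlations between target numbers, computed across participants.

    wf has shape (n_targets, n_participants): one row per target number,
    one column per participant, each cell a Weber fraction.
    """
    wf = np.asarray(wf, dtype=float)
    if wf.ndim != 2:
        raise ValueError("wf must be a 2-D array (targets x participants)")
    # np.corrcoef treats each row as a variable and each column as an observation.
    return np.corrcoef(wf)


def simulate_wf(n_participants=200, seed=0):
    """Hypothetical data: each participant has two latent precision values,
    mixed according to assumed log-Gaussian tuning around 10 and 27."""
    rng = np.random.default_rng(seed)
    log_t = np.log(TARGETS)
    sigma = 0.35  # assumed tuning width in log units
    low = np.exp(-((log_t - np.log(10.0)) ** 2) / (2 * sigma**2))
    high = np.exp(-((log_t - np.log(27.0)) ** 2) / (2 * sigma**2))
    w_low = low / (low + high)  # relative weight of the low channel per target
    a = rng.normal(0.15, 0.03, n_participants)  # low-channel precision per participant
    b = rng.normal(0.15, 0.03, n_participants)  # high-channel precision per participant
    noise = rng.normal(0.0, 0.01, (len(TARGETS), n_participants))
    return np.outer(w_low, a) + np.outer(1.0 - w_low, b) + noise


wf = simulate_wf()
r = wf_correlation_matrix(wf)
print("r(8, 10) =", round(float(r[0, 1]), 2))
print("r(8, 32) =", round(float(r[0, -1]), 2))
```

      In this simulated example the correlation between neighbouring targets is higher than between distant ones because neighbouring targets share a latent source of variability; whether the real data arise this way is exactly the point under discussion.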

      (3) The duration data should be analysed. While n is small, can't the authors correlate WFs across tasks? Suppose a similar pattern is observed, suggestive of >1 channel in this between-task correlation.

      One of the strengths of this technique is that it is very general: it can be applied to virtually every stimulus feature. We are currently collecting data to test the existence of generalised sensorimotor channels for continuous magnitudes: space, time, and numerosity. The logic is exactly as suggested. These correlational analyses however require (relatively) large samples and ad-hoc experimental conditions. We do not feel confident in providing messages on this with 9 participants. Out of curiosity, however, we analysed the data as requested and the results are interesting: signatures of sensorimotor channels emerge in both the number and duration tasks but NOT when analysed in conjunction (cross-task). If these results are confirmed, they would indicate the existence of separate mechanisms for the encoding of time and numerosity (and perhaps also space?).

      (4) The finding of similar results for fast and slow is quite interesting. And provides good motivation to do the duration control experiment. But two issues related to the control experiment:

      (4a) Why not look at the correlation matrix for the duration task? Was this not done because there were only 9 participants? If so, why the small n here?

      Yes, that is the reason. The aim of this work is not to investigate the existence of duration channels. This experimental condition was designed as a control for the use of non-numerical strategies in the number task. It worked well. The results were already obvious with 9 individuals (confirming Kirschhock & Nieder, Nature comm, 2022); we therefore did not consider it necessary to continue in this direction. However, related to the previous point, we ran a preliminary analysis on this small data set and (as mentioned above) signatures of sensorimotor channels (correlation gradients) emerge in both number and duration tasks but NOT when analysed in conjunction (cross-task), indicating different mechanisms. We are now pursuing this issue using different number and duration tasks.

      (4b) I don't follow why greater precision on the tapping task compared to the duration task makes a strong case against the duration hypothesis. Is the argument that, if based on duration, there should be greater precision on the duration task since the tapping task would exhibit the variability from duration PLUS added noise from tapping? If this is the argument, this should be spelled out.

      Yes. The more precise feature dominates behaviour. In other words, in the number task if participants were reproducing the time required to give a certain number of presses, then in the timing task, where they are explicitly reproducing the same durations, they should show no lower precision. The results are opposite to that prediction. (p. 18 lines 418-420)
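
      The logic of this argument can be illustrated with a small simulation. It is a sketch under stated assumptions, not the authors' analysis: the coefficient of variation stands in for the Weber fraction, and the target duration, tapping rate, and noise levels are hypothetical values. If the number of taps were produced by timing a duration and tapping at a noisy rate, the precision of the tap count could not be better than the precision of the timed duration.

```python
import numpy as np


def coefficient_of_variation(x):
    """Standard deviation divided by the mean (used here as a Weber-fraction proxy)."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / x.mean()


def simulate_duration_strategy(
    n_trials=100_000,
    target_duration=4.0,  # seconds (hypothetical)
    cv_duration=0.15,     # timing noise (hypothetical)
    tap_rate=4.0,         # taps per second (hypothetical)
    cv_rate=0.05,         # tapping-rate noise (hypothetical)
    seed=0,
):
    """Simulate a participant who reproduces a duration and taps at a noisy rate."""
    rng = np.random.default_rng(seed)
    durations = target_duration * (1 + cv_duration * rng.standard_normal(n_trials))
    rates = tap_rate * (1 + cv_rate * rng.standard_normal(n_trials))
    taps = durations * rates  # tap count inherits timing noise plus rate noise
    return coefficient_of_variation(durations), coefficient_of_variation(taps)


cv_dur, cv_taps = simulate_duration_strategy()
print("CV of reproduced duration:", round(float(cv_dur), 3))
print("CV of resulting tap count:", round(float(cv_taps), 3))
```

      Under this duration-based strategy the tap count is always at least as variable as the duration, so observing higher precision for number than for duration is the opposite of what the strategy predicts.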

      (4c) Related to point 3 above, one would expect based on things like Rammsayer's study that duration judgments would also engage channels. Is the idea that these are different channels in the tapping task? There seems a good case to have participants do both tapping and duration tasks and then do the correlation matrices, comparing within and between tasks.

      Please see response to 3 and 4a.

      Recommendations for the authors:

      (1) On the logic of the channel concept as applied in the current context:

      While the authors present the numerical channel idea by analogy to how this concept is used for other features such as spatial frequency or orientation, there is no input to activate the channels-just a written numeral. The channel concept would mean that to respond to say, "16", you get output from multiple channels, with each weighted by its "tuning" to 16 such that the aggregate results in approximately 16 taps. This seems a bit odd: it would be like saying to draw, I use the output from my spatial frequency channels to create an image with a particular power spectrum. The logic of the channel concept in the current experimental context needs to be reviewed and clarified.

      The channel here reflects (probably) the activity of noisy neurons in charge of translating sensory information into a numerical motor output, such as those shown by Kirschhock & Nieder (Nature comm, 2022) in crows. We used digits because their encoding (at least for such simple digits and educated adults) has no associated noise. The interindividual variability left, and analysed, is thus mainly associated with the motor transformation process, revealing sensorimotor channels.

      (2) A more thorough analysis of the duration task would strengthen the paper. The n is small for this interesting control condition and the analyses presented in the current version of the paper are limited. It is recommended to make this a fully powered test with complete analyses. Consider making this a new experiment in which participants do both the tapping and duration tasks to allow cross-modal analyses.

      We ran some exploratory analyses on this, described in comments 3 and 4a. We prefer to leave this issue to dedicated future experiments (which have just started).

      (3) Expanded discussion of the limitations of the current study. The authors are clear that the methods don't provide a strong test of whether there are two or more than two channels. It would be useful to also comment on whether the estimated locations of the peaks are robust or if there is some sort of statistical bias for them to be at more extreme values. More generally, use the comments on the reviews to elaborate on various issues related to the channel concept.

      We addressed these issues in the ms (p. 17 lines 450-459).

      (4) Clarify the methods used to calculate the correlation matrix (see reviews).

      We have now better specified the correlation analyses (p. 7 lines 186-196).

      (5) What is the basis for arguing that the mechanism under consideration is a "sensorimotor number system?" The data in this paper do not appear to provide evidence that the effects are linked to sensorimotor processes rather than reflect an amodal number system that is being accessed in their task through the motor system. At a minimum, present arguments for what motivates/justifies the sensorimotor claim or modify the paper to be neutral on this point.

      We have now better specified the sensorimotor nature of the task used here (p. 14 lines 351-355; see also comment 1).

    2. eLife assessment

      This potentially important paper addresses the question of how numerical information is represented in the human brain. Experimental findings are interpreted as providing evidence for a sensorimotor mechanism that involves channels, each tuned to a particular numerical range. While this is an interesting application of methodologies used to identify the presence of channels, the evidence supporting the claim that these have a sensorimotor basis is incomplete.

    3. Reviewer #1 (Public Review):

      Anobile and colleagues present a manuscript detailing an account of numerosity processing with an appeal to a two-channel model. Specifically, the authors propose that the perception of numerosity relies on (at least) two distinct channels for small and large numerosities, which should be evident in subject reports of perceived numerosity. To do this, the authors had subjects reproduce visual dot arrays of numerosities ranging from 8 to 32 dots, by having subjects repetitively press a response key at a pre-instructed rate (fast or slow) until the number of presses equaled the number of perceived dots. The subjects performed the task remarkably well, yet with a general bias to overestimate the number of presented dots. Further, no difference was observed in the precision of responses across numerosities, providing evidence for a scalar system. No differences between fast and slow tapping were observed. For behavioral analysis, the authors examined correlations between the Weber fractions for all presented numerosities. Here, it was found that the precision at each numerosity was similar to that at neighboring numerosities, but less similar to more distant ones. The authors then went on to conduct PCA and clustering analyses on the Weber fractions, finding that the first two components exhibited an interaction with the presented numerosity, such that each was dominant at distinct lower and upper ranges and further well-fit by a log-Gaussian model consistent with the channel explanation proposed at the beginning.
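
      As a rough illustration of the PCA step summarised above, the sketch below extracts two principal components from a targets-by-participants matrix of Weber fractions. It is not the authors' pipeline: the helper names, the simulated data, and the two assumed log-Gaussian tunings (centred at 10 and 27) are hypothetical, and the clustering and model-fitting steps are not reproduced.

```python
import numpy as np

TARGETS = np.array([8, 10, 11, 13, 14, 16, 19, 21, 24, 28, 32])


def pca_loadings(wf, n_components=2):
    """PCA via SVD, with participants as observations and target numbers as variables.

    wf has shape (n_targets, n_participants). Returns the component loadings
    (n_components x n_targets) and the explained-variance ratio of each component.
    """
    x = np.asarray(wf, dtype=float).T  # (n_participants, n_targets)
    x = x - x.mean(axis=0)             # centre each target number
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    variance = s**2 / (x.shape[0] - 1)
    ratio = variance / variance.sum()
    return vt[:n_components], ratio[:n_components]


def simulate_wf(n_participants=200, seed=1):
    """Hypothetical Weber fractions driven by two latent sources per participant."""
    rng = np.random.default_rng(seed)
    log_t = np.log(TARGETS)
    sigma = 0.35  # assumed tuning width in log units
    low = np.exp(-((log_t - np.log(10.0)) ** 2) / (2 * sigma**2))
    high = np.exp(-((log_t - np.log(27.0)) ** 2) / (2 * sigma**2))
    w_low = low / (low + high)
    a = rng.normal(0.15, 0.03, n_participants)
    b = rng.normal(0.15, 0.03, n_participants)
    noise = rng.normal(0.0, 0.01, (len(TARGETS), n_participants))
    return np.outer(w_low, a) + np.outer(1.0 - w_low, b) + noise


loadings, ratio = pca_loadings(simulate_wf())
for k in range(2):
    print(f"component {k + 1}: explained variance ratio = {ratio[k]:.2f}")
```

      Raw principal components are orthogonal by construction and need not coincide with the underlying tunings, so a loading profile alone does not fix the number or shape of channels.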

      Overall, the authors provide compelling evidence for a two-channel system supporting numerosity processing that is instantiated in sensorimotor processes. A strength of the presented work is the principled approach the authors took to identify mechanisms, as well as the controls put in place to ensure adequate data for analysis.

      One remaining question regards the secondary timing task that was used as a control. There may be interesting findings here to pursue, and so I encourage the authors or other researchers to examine those findings and explore further studies there as well.

    4. Reviewer #2 (Public Review):

      Summary:

      The authors wish to apply established psychophysical methods to the study of numbers. Specifically, they wish to test the hypothesis - supported by their previous work - that human sensorimotor processes are tuned to specific number ranges. In a novel set of tasks, they ask participants to tap a button N times (either fast or slow), where N varies between 8 and 32 across trials. As I understood it, they then computed the Weber fraction (WF) for each participant for each number and correlated those values across participants and numbers. They find stronger correlations for nearby numbers than for distant numbers and interpret this as evidence of sensorimotor tuning functions. Two other analyses - cluster analyses and principal component analyses (PCA) - suggest that participants' performance relied on at least 2 mechanisms, one for encoding low numbers of taps (around 10) and another encoding larger numbers (around 27).

      Strengths:

      Individual differences can be a rich source of scientific insight and I applaud the authors for taking them seriously.

      Weaknesses:

      Implications of intercorrelation. The experiment "is based on the idea that interindividual variability conveys information that can reveal common sensory processes (Peterzell & Kennedy, 2016)" but I struggle to understand the logic of this technique. The authors explain it most clearly when they write "Regions of high intercorrelation between neighbouring stimuli intensity can be interpreted to imply that sets of stimuli are processed by the same (shared) underlying channel. This channel, while responding relatively more to its preferred stimulus, will also be activated by neighbouring stimuli that although slightly different from the preferred intensity, are nevertheless included in the same response distribution." Why does high intercorrelation imply a shared channel and why should it be calculated across participants? Shouldn't performance on any set of tasks (that vary in difficulty) correlate across participants? Why in principle should people have distinct channels for processing similar stimuli and how could such a system improve (rather than impede) discrimination abilities? What pattern of intercorrelation would disconfirm the existence of tuning mechanisms? And perhaps most fundamentally: What is a channel and why do they matter?

      Different channels? I had trouble understanding much of the analyses, and this may account for at least some of my confusion. That said, as I understand it, the results are meant to provide "evidence that tuned mechanisms exist in the human brain, with at least two different tunings" because of the results of the clustering analysis and PCA. But as the authors acknowledge, "PCA aims to summarize the dataset with the minimal number of components (channels). We can therefore not exclude the possible existence of more than two (perhaps not fully independent) channels." I would go a step further and say this technique does not provide more evidence for the existence of 2 channels than for the existence of 4, 8 or 24 channels, the upper bound for a task testing 24 different numbers. If we can conclude that people may have one channel per number, what does "channel" mean?

      Several other questions arise when thinking through this technique, which left me skeptical of its utility. If people did have two channels (at least in this range), why would they be so broad? Why would they be centered so near the ends of the tested range? Can such effects be explained by binning on the part of the participants, who might have categorized each number (knowingly or not) as either "small" or "large"? Or by the kind of data-binning or distributions (i.e. Gaussian) used in the analyses? Or by the physical limits and affordances of the effector participants used (i.e. their finger)? Moreover, if people had sensorimotor channels tuned to different numbers, wouldn't this cause discontinuities in their own WF? Why look at correlations across individuals rather than correlations or discontinuities within individuals? Whereas the experiment tested numbers 8-32, numbers are infinite - How could a small number of channels cover an infinite set? Or even the set 8-10,000? What would the existence of multiple such channels mean for our understanding of numerical cognition? There may be good answers to these questions, but they are not clear to this reader.

      Theories of numerical cognition. An expansive literature on numerical cognition suggests that many animals, human children, and adults across cultures have two systems for representing numerosity without counting - one that can represent the exact cardinality of sets smaller than about 4 and another that represents the approximate number of larger sets. Recent accounts suggest that what appears to be two systems can be explained by a single system of numerical approximation with limited information capacity (see Cheyette & Piantadosi, 2020). The current paper would benefit from better relating its findings to this long lineage of theories and findings in numerical approximation across cultures, ages, and species.

      Specific to numbers? The authors argue that their effects are "number selective" but they do not provide compelling evidence for this selectivity. In principle, their main findings could be explained by the duration of tapping rather than the number of taps. They argue this is unlikely for two reasons. The first reason is that the overall pattern of results was unchanged across the fast and slow tapping conditions, but differences in duration were confounded with numerosity in both conditions, so the comparison is uninformative. The second reason is that temporal reproduction was less precise in their control condition than numerical reproduction, but this logic is unclear: Participants could still use duration (or some combination of speed and duration) as a helpful cue to numerosity, even if their duration reproductions were imperfect.

      If the authors wish to test the role of duration, they might consider applying the same analytical techniques they use for number to their duration data. Perhaps participants show similar evidence for duration-selective channels, in the absence of number, as they do for other non-numerical domains (like spatial frequency).

    5. Reviewer #3 (Public Review):

      Reviewing Editor's Summary:

      The revised manuscript has clarified concerns raised by the reviewers concerning the analysis method in constructing the correlation matrix. These key results are now readily comprehensible. They have also added a final section to the Discussion, sketching some important questions for future research (e.g., number/resolution of channels and extension of the logic used here to look at number channels in other tasks).

      Reviewer 1 was satisfied with these changes and has updated their review. Reviewer 2 did not think the revision tackled the theoretical issues raised in their initial review; as such, this reviewer has opted to leave their initial public review unchanged.

      I also believe that the revision does not adequately address a major theoretical issue, namely whether the current data provide evidence of sensorimotor number channels, the central claim of the paper. The authors argue that since perception is noise free (stimuli were given symbolically), then the task variance comes from processes associated with sensorimotor transformation. Let's consider the task: A number is presented, the participant then attempts to produce that number of taps. To preclude counting, they are required to say the syllable "ba" as fast as possible while tapping. The sensorimotor channel idea would suppose that the symbolic stimulus activates a set of channels, each of which specifies the number of taps that should be produced. For example, a "6" channel likes to produce 6 outputs (with variability), a "10" channel 10 outputs (with variability), etc., with the actual production being the (weighted) integration of these outputs.

      An alternative is that, since explicit counting is prevented by the secondary task, the participant makes an internal estimation of the number of produced taps. These judgments could be based on the output of amodal number channels. For example, the same process would be at play if the task were changed such that the participants watched a dot flash and had to estimate the number of flashes (while concurrently saying "ba"). The authors indicate in their response letter that they are conducting experiments along these lines and that the results are similar. They suggest that this provides support for the existence of both sensory and sensorimotor number channels. Extending this, if the experiment were tones instead of flashes, the argument would be that there are auditory, visual, and sensorimotor number channels. It seems more parsimonious to interpret such a pattern as reflective of amodal number channels.

      I recognize there are other intriguing reasons to think there may be intimate links between our sense of number and movement, but I remain unconvinced that the current results provide evidence for sensorimotor number channels.

    1. eLife assessment

      This valuable study showed convincing evidence that archerfishes can adapt their shooting behaviors to airflow perturbations. The fish also exhibits adaptive behaviors indicative of an egocentric representation of the perturbation, though direct evidence is missing. Hence, this work will be of interest to those interested in cross-species comparisons for motor learning.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors examined whether archerfish have the capacity for motor adaptation in response to airflow perturbations. Through two experiments, they demonstrated that archerfish could adapt. Moreover, when the fish flipped its body position with the perturbation remaining constant, it did not instantaneously counteract the error. Instead, the archerfish initially persisted in correcting for the original perturbation before eventually adapting, consistent with the notion that the archerfish's internal model has been adapted in egocentric coordinates.

      Evaluation:

      This important study demonstrates the remarkable capacity for motor adaptation in archer fish. I found the results of both experiments to be convincing, given the observable learning curve and the clear aftereffect. Nonetheless, within the current set of experiments, no quantitative evidence is provided to demonstrate that the archer fish is sensitive to the relative change in body position, making it unclear whether motor adaptation in archer fish indeed generalizes in egocentric coordinates.

      The authors have cited a previous study to claim that archer fish are sensitive to their relative position in the water tank. However, given the absence of clear visual referents on the screen (e.g., squares with different colors in the corners) and/or some behavioral indication that the fish are sensitive to their relative change in body position, I remain sceptical of the claim that archer fish indeed generalize in egocentric rather than allocentric coordinates. The current results do not rule out the idea that archerfish are ostensibly unaware of changes in body position and simply continue with previously successful actions, masquerading as egocentric generalization.

    3. Reviewer #2 (Public Review):

      Summary:

      The work of Volotsky et al presented here shows that adult archerfish are able to adjust their shooting in response to their own visual feedback, taking consistent alterations of their shot, here by an air flow, into account. The evidence provided points to an internal mechanism of shooting adaptation that is independent of external cues, such as wind. The authors provide evidence for this by forcing the fish to shoot from 2 different orientations to the external alteration of their shots (the airflow). This paper thus provides behavioral evidence of an internal correction mechanism that underlies adaptive motor control of this behavior. It does not provide direct evidence of refractive index-associated shot adjustment.

      Strengths:

      The authors have used a high number of trials and strong statistical analysis to analyze their behavioral data. They used an elegant experimental design in which they force the fish to shoot from directions chosen by the authors, which elegantly reduced shooting variability.

      Weaknesses:

      A large portion of fish did not make it to the final test (as is often the case in behavioral studies) which raises the question whether all individuals are able to solve the task.

    1. eLife assessment

      This useful study addresses the brain correlates underlying technical reasoning by a set of fMRI experiments and locates it to PF. If confirmed, this study provides an intriguing framework for our understanding of different types of problem-solving processes. However, the current evidence supporting the claims is incomplete, due to the existence of alternative explanations for the main overlapping results and potential confounding variables across conditions.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, Osiurak and colleagues investigate the neurocognitive basis of technical reasoning. They use multiple tasks from two neuroimaging studies and overlap analysis to show that the area PF is central for reasoning, and plays an essential role in tool-use and non-tool-use physical problem-solving, as well as both conditions of mentalizing task. They also demonstrate the specificity of the technical reasoning and find that the area PF is not involved in the fluid-cognition task or the mentalizing network (INT+PHYS vs. PHYS-only). This work suggests an understanding of the neurocognitive basis of technical reasoning that supports advanced technologies.

      Strengths:

      -The topic this study focuses on is intriguing and can help us understand the neurocognitive processes involved in technical reasoning and advanced technologies.

      -The researchers obtained fMRI data from multiple tasks. The data is rich and encompasses the mechanical problem-solving task, psychotechnical task, fluid-cognition task, and mentalizing task.

      -The article is well written.

      Weaknesses:

      -Limitations of the overlap analysis method: there are multiple reasons why two tasks might activate the same brain regions. For instance, the two tasks might share cognitive mechanisms, the activated regions of the two tasks might be adjacent but not overlapping at finer resolutions, or the tasks might recruit the same regions for different cognitive functions. Thus, although overlap analysis can provide valuable information, it also has limitations. Further analyses that capture the common cognitive components of activation across different tasks are warranted, such as correlating the activation across different tasks within subjects for a region of interest (i.e. the PF).

      -Control tasks may be inadequate: the tasks may involve other factors, such as motor/action-related information. For the psychotechnical task, fluid-cognition task, and mentalizing task, the experiment tasks need not only care about technical-cognition information but also motor-related information, whereas the control tasks do not need to consider motor-related information (mainly visual shape information). Additionally, there may be no difference in motor-related information between the conditions of the fluid-cognition task. Therefore, the regions of interest may be sensitive to motor-related information, affecting the research conclusion.

      -Negative results require further validation: the cognitive results for the fluid-cognition task in the study may need more refinement. For instance, when performing ROI analysis, are there any differences between the conditions? Bayesian statistics might also be helpful to account for the negative results.

    3. Reviewer #2 (Public Review):

      Summary:

      The goal of this project was to test the hypothesis that a common neuroanatomic substrate in the left inferior parietal lobule (area PF) underlies reasoning about the physical properties of actions and objects. Four functional MRI (fMRI) experiments were created to test this hypothesis. Group contrast maps were then obtained for each task, and overlap among the tasks was computed at the voxel level. The principal finding is that the left PF exhibited differentially greater BOLD response in tasks requiring participants to reason about the physical properties of actions and objects (referred to as technical reasoning). In contrast, there was no differential BOLD response in the left PF when participants engaged in an fMRI variant of the Raven's progressive matrices to assess fluid cognition.

      Strengths:

      This is a well-written manuscript that builds from extensive prior work from this group mapping the brain areas and cognitive mechanisms underlying object manipulation, technical reasoning, and problem-solving. Major strengths of this manuscript include the use of control conditions to demonstrate there are differentially greater BOLD responses in area PF over and above the baseline condition of each task. Another strength is the demonstration that area PF is not responsive in tasks assessing fluid cognition - e.g., it may just be that PF responds to a greater extent in a harder condition relative to an easy condition of a task. The analysis of data from Task 3 rules out this alternative interpretation. The methods and analysis are sufficiently written for others to replicate the study, and the materials and code for data analysis are publicly available.

      Weaknesses:

      The first weakness is that the conclusions of the manuscript rely on there being overlap among group-level contrast maps presented in Figure 2. The problem with this conclusion is that different participants engaged in different tasks. Never is an analysis performed to demonstrate that the PF region identified in e.g., participant 1 in Task 2 is the same PF region identified in Participant 1 in Task 4.

      A second weakness is that there is variance in accuracy between tasks that is not addressed. It is clear from the plots in the supplemental materials that some participants score below chance (~ 50%). This means that half (or more) of the fMRI trials of some participants are incorrect. The methods section does not mention how inaccurate trials were handled. Moreover, if 50% is chance, it suggests that some participants did not understand task instructions and were systematically selecting the incorrect item.

      A third weakness is related to the fluid cognition task. In the fMRI task developed here, the participant must press a left or right button to select between 2 rows of 3 stimuli while only one of the 3 stimuli is the correct target. This means that within a 10-second window, the participant must identify the pattern in the 3x3 grid and then separately discriminate among 6 possible shapes to find the matching stimulus. This is a hard task that is qualitatively different from the other tasks in terms of the content being manipulated and the time constraints.

      In sum, this is an interesting study that tests a neuro-cognitive model whereby the left PF forms a key node in a network of brain regions supporting technical reasoning for tool and non-tool-based tasks. Localizing area PF at the level of single participants and managing variance in accuracy is critically important before testing the proposed hypotheses.

    4. Reviewer #3 (Public Review):

      Summary:

      This manuscript reports two neuroimaging experiments assessing commonalities and differences in activation loci across mechanical problem-solving, technical reasoning, fluid cognition, and "mentalizing" tasks. Each task includes a control task. Conjunction analyses are performed to identify regions in common across tasks. As Area PF (a part of the supramarginal gyrus of the inferior parietal lobe) is involved across 3 of the 4 tasks, the investigators claim that it is the hub of technical cognition.

      Strengths:

      The aim of finding commonalities and differences across related problem-solving tasks is a useful and interesting one.

      The experimental tasks themselves appear relatively well-thought-out, aside from the concern that they are differentially difficult.

      The imaging pipeline appears appropriate.

      Weaknesses:

      (1) Methodological<br /> As indicated in the supplementary tables and figures, the experimental tasks employed differ markedly in 1) difficulty and 2) experimental trial time. Response latencies are not reported (but are of additional concern given the variance in difficulty). There is concern that at least some of the differences in activation patterns across tasks are the result of these fundamental differences in how hard various brain regions have to work to solve the tasks and/or how much of the trial epoch is actually consumed by "on-task" behavior. These difficulty issues should be controlled for by 1) separating correct and incorrect trials, 2) for correct trials, entering response latency as a regressor in the General Linear Models (GLMs), and 3) entering trial duration in the GLMs.
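
      The control suggested above can be sketched at the trial level. The following is only an illustration under strong simplifying assumptions, not the authors' pipeline: it assumes one response amplitude per correct trial, uses hypothetical names (`fit_trial_glm`, `solve_linear`), and omits haemodynamic convolution and noise modelling, which a real fMRI GLM would include.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_trial_glm(amplitude, is_experimental, latency, duration):
    """Ordinary least squares over correct trials only.

    Returns [intercept, condition, latency, duration] betas. Latency and
    duration are mean-centred, so the condition beta is the experimental
    minus control difference at average latency and duration.
    """
    n = len(amplitude)
    mean_lat = sum(latency) / n
    mean_dur = sum(duration) / n
    design = [
        [1.0, float(c), lat - mean_lat, dur - mean_dur]
        for c, lat, dur in zip(is_experimental, latency, duration)
    ]
    k = 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, amplitude))
           for i in range(k)]
    return solve_linear(xtx, xty)
```

      In practice a library least-squares routine would replace the hand-written solver; the point of the sketch is only that the condition effect is estimated with latency and duration held at their means.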

      A related concern is that the control tasks also differ markedly in the degree to which they were easier and faster than their corresponding experimental task. Thus, some of the control tasks seem to control much better for difficulty and time on task than others. For example, the control task for the psychotechnical task simply requires the indication of which array contains a simple square shape (i.e., it is much easier than the psychotechnical task), whereas the control task for mechanical problem-solving requires mentally fitting a shape into a design, much like solving a jigsaw puzzle (i.e., it is only slightly easier than the experimental task).

      (2) Theoretical<br /> The investigators seem to overlook prior research that does not support their perspective and their writing seems to lack scientific objectivity in places. At times they over-reach in the claims that can be made based on the present data. Some claims need to be revised/softened.

    1. eLife assessment

      This important study advances our understanding of the mechanisms of neuronal large dense-core vesicle (LDCV) secretion, which mediates neuropeptide and neurotrophin release. It describes a negative regulatory process involving the interaction of the Rab3-effector Rabphilin-3A with the SNARE fusion protein SNAP25, which limits LDCV secretion and neurite growth. The evidence in support of the authors' claims is solid overall, but some conclusions, e.g. regarding the exact synaptic localization of Rabphilin-3A, its association with large dense-core vesicles, or the role of Rabphilin-3A-controlled neurotrophin signaling in neurite growth, are incompletely supported. This study will be of interest to the fields of cell biology, cellular neuroscience, and neuroendocrinology.

    2. Joint Public Review:

      The molecular mechanisms that mediate the regulated exocytosis of neuropeptides and neurotrophins from neurons via large dense-core vesicles (LDCVs) are still incompletely understood. Motivated by their earlier discovery that the Rab3-RIM1 pathway is essential for neuronal LDCV exocytosis, the authors now examined the role of the Rab3 effector Rabphilin-3A in neuronal LDCV secretion. Based on multiple live and confocal imaging approaches, the authors provide evidence for a synaptic enrichment of Rabphilin-3A and for independent trafficking of Rabphilin-3A and LDCVs. Using an elegant NPY-pHluorin imaging approach, they show that genetic deletion of Rabphilin-3A causes an increase in electrically triggered LDCV fusion events and increased neurite length. Finally, knock-out-replacement studies, involving Rabphilin-3A mutants deficient in either Rab3- or SNAP25-binding, indicate that the synaptic enrichment of Rabphilin-3A depends on its Rab3 binding ability, while its ability to bind to SNAP25 is required for its effects on LDCV secretion and neurite development. The authors conclude that Rabphilin-3A negatively regulates LDCV exocytosis and propose that this mechanism also affects neurite growth, e.g. by limiting neurotrophin secretion. These are important findings that advance our mechanistic understanding of neuronal large dense-core vesicle (LDCV) secretion.

      The major strengths of the present paper are:

      (i) The use of a powerful Rabphilin-3A KO mouse model.<br /> (ii) Stringent lentiviral expression and rescue approaches as a strong genetic foundation of the study.<br /> (iii) An elegant FRAP imaging approach.<br /> (iv) A cutting-edge NPY-pHluorin-based imaging approach to detect LDCV fusion events.

      Weaknesses that somewhat limit the convincingness of the evidence provided and the corresponding conclusions include the following:

      (i) The limited resolution of the various imaging approaches introduces ambiguity to several parameters (e.g. LDCV counts, definition of synaptic localization, Rabphilin-3A-LDCV colocalization, subcellular and subsynaptic localization of expressed proteins, AZ proximity of Rabphilin-3A and LDCVs) and thereby limits the reliability of corresponding conclusions. Super-resolution approaches may be required here.<br /> (ii) The description of the experimental approaches lacks detail in several places, thus complicating a stringent assessment.<br /> (iii) Further analyses of the LDCV secretion data (e.g. latency, release time course) would be important in order to help pinpoint the secretory step affected by Rabphilin-3A.<br /> (iv) It remains unclear why a process that affects a general synaptic SNARE fusion protein - SNAP25 - would specifically affect LDCV but not synaptic vesicle fusion.<br /> (v) The mechanistic links between Rabphilin-3A function, LDCV density in neurites, neurite outgrowth, and the proposed underlying mechanisms involving trophic factor release remain unclear.

    3. Reviewer #1 (Public Review):

      Summary:

      The manuscript by Hoogstraaten et al. investigates the effect of constitutive Rabphilin 3A (RPH3A) ko on the exocytosis of dense core vesicles (DCV) in cultured mouse hippocampal neurons. Using mCherry- or pHluorin-tagged NPY expression and EGFP- or mCherry-tagged RPH3A, the authors first analyse the colocalization of DCVs and RPH3A. Using FRAP, the authors next analyse the mobility of DCVs and RAB3A in neurites. The authors go on to determine the number of exocytotic events of DCVs in response to high-frequency electrical stimulation and find that RPH3A ko increases the number of exocytotic events by a factor of 2-3, but not the fraction of released DCVs in a given cell (8x 50Hz stim). In contrast, the release fraction is also increased in RPH3A KOs when doubling the stimulation number (16x 50Hz). They further observe that RPH3A ko increases dendrite and axon length and the overall number of ChgrB-positive DCVs. However, the overall number of DCVs and dendritic length in ko cells directly correlate, indicating that the number of vesicles per dendritic length remains unaffected in the RPH3A KOs. Lentiviral co-expression of tetanus toxin (TeNT) showed a non-significant trend to reduce axon and dendrite length in RPH3A KOs. Finally, the authors use co-expression of RAB3A and SNAP25 constructs to show that RAB3A but not SNAP25 interaction is required to allow the exocytosis-enhancing effect in RPH3A KOs.

      While the authors' methodology is sound and the microscopy experiments are performed well and analyzed appropriately, their results in large part do not sufficiently support their conclusions. Moreover, the experiments are not always described in sufficient detail (e.g. FRAP; DCV counts vs. neurite length) to fully understand their claims.

      Overall, I thus feel that the manuscript does not provide a sufficient advance in knowledge.

      Strengths:

      - The authors' methodology is sound, and the microscopy results are performed well and analyzed appropriately.<br /> - Figure 2: The exocytosis imaging is elegant and potentially very insightful. The effect in the RPH3A KOs is convincing.<br /> - Figure 4: the logic of this experiment is elegant. It shows that the increased number of DCV fusion events in RPH3A KOs is related to the interaction of RPH3A with RAB3A but not with SNAP25.

      Weaknesses:<br /> - The results in large part do not sufficiently support the conclusions.<br /> - The experiments are not always described in sufficient detail (e.g. FRAP; DCV counts vs. neurite length) to fully understand their claims.<br /> - Not of sufficient advance in knowledge for this journal.<br /> - The significance of differences in control experiments (WT vs. KO) varies between experiments shown in different figures.<br /> - Axons and dendrites were not analyzed separately in Figures 1 and 2.<br /> - The colocalization study in Figure 1 would require super-resolution microscopy.

    4. Reviewer #2 (Public Review):

      Summary:

      Hoogstraaten et al. investigated the involvement of rabphilin-3A (RPH3A) in DCV fusion in neurons during calcium-triggered exocytosis at the synapse and during neurite elongation. They suggest that RPH3A acts as an inhibitory factor for LDCV fusion and that this is mediated partially via its interaction with SNAP25 and not Rab3A/Rab27. It is a very elegant study, although several questions remain to be clarified.

      Strengths:

      The authors use state-of-the-art techniques, such as tracking NPY-pHluorin exocytosis and FRAP experiments, to quantify these processes, providing novel insight into LDCV exocytosis and the involvement of RPH3A.

      Weaknesses:

      At the current state of the manuscript, further supportive experiments are necessary to fully support the authors' conclusions.

    5. Reviewer #3 (Public Review):

      Summary:

      The molecular mechanism of regulated exocytosis has been extensively studied in the context of synaptic transmission. However, in addition to neurotransmitters, neurons also secrete neuropeptides and neurotrophins, which are stored in dense core vesicles (DCVs). These factors play a crucial role in cell survival, growth, and shaping the excitability of neurons. The mechanism of release for DCVs is similar, but not identical, to that used for SV exocytosis. This results in slow kinetics and low release probabilities for DCV compared to SV exocytosis. There is a limited understanding of the molecular mechanisms that underlie these differences. By investigating the role of rabphilin-3A (RPH3A), Hoogstraaten et al. uncovered for the first time a protein that inhibits DCV exocytosis in neurons.

      Strengths:

      In the current work, Hoogstraaten et al. investigate the function of rabphilin-3A (RPH3A) in DCV exocytosis. This RAB3 effector protein has been shown to possess a Ca2+ binding site and an independent SNAP25 binding site. Using colocalization analysis of confocal imaging, the authors show that in hippocampal neurons RPH3A is enriched at pre- and post-synaptic sites and associates specifically with immobile DCVs. Using site-specific RPH3A mutants, they found that the synaptic location was due to its RAB3 interaction site. They could further show that RPH3A inhibits DCV exocytosis due to its interaction with SNAP25. They came to that conclusion by comparing NPY-pHluorin release in WT and RPH3A KO cells and by performing rescue experiments with RPH3A mutants. Finally, the authors showed that by inhibiting stimulated DCV release, RPH3A controlled the axon and dendrite length, possibly through the reduced release of neurotrophins. Thereby, they pinpoint how the proper regulation of DCV exocytosis affects neuron physiology.

      Weaknesses:

      Data context<br /> One of the findings is that RPH3A accumulates at synapses and is mainly associated with immobile DCVs. However, Farina et al. (2015) showed that 66% of all DCVs are secreted at synapses and that these DCVs are immobile prior to secretion. To provide additional context to the data, it would be valuable to determine if RPH3A KO specifically enhances secretion at synapses. Additionally, the authors propose that RPH3A decreases DCV exocytosis by sequestering SNAP25 availability. At first glance, this hypothesis appears suitable. However, due to RPH3A synaptic localization, it should also limit SV exocytosis, which it does not. In this context, the only explanation for RPH3A's specific inhibition of DCV exocytosis is that RPH3A is located at a synapse site remote from the active zone, thus protecting the pool of SNAP25 involved in SV exocytosis from binding to RPH3A. This hypothesis could be tested using super-resolution microscopy.

      Technical weakness<br /> One technical weakness of this work concerns the counting of labeled DCVs. This is significant since most findings in this manuscript rely on this analysis. Since the data was acquired with epi-fluorescence or confocal microscopy, it doesn't provide the resolution to visualize individual DCVs when they are clumped. The authors use a proxy to count the number of DCVs by measuring the total fluorescence of individual large spots and dividing it by the fluorescence intensity of discrete spots, assuming that these correspond to individual DCVs. This is an appropriate method but it heavily depends on the assumption that all DCVs are loaded with the same amount of NPY-pHluorin or chromogranin B (ChgB). Due to the importance of this analysis for this manuscript, I suggest that the authors show that the number of DCVs per µm² is indeed affected by RPH3A KO using super-resolution techniques such as dSTORM, STED, SIM, or SRRF.
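
      The intensity-ratio proxy described above can be written down in a few lines. This is a hedged sketch rather than the authors' analysis code: the function name and the use of the median of discrete spots as the "unitary" intensity are assumptions made for illustration.

```python
from statistics import median


def estimate_dcv_count(spot_intensities, unitary_intensities):
    """Estimate vesicles per spot as total intensity / unitary intensity.

    spot_intensities: background-subtracted total fluorescence per spot.
    unitary_intensities: intensities of discrete spots assumed to be
    single vesicles (hypothetical selection; the median is used here).
    """
    unit = median(unitary_intensities)
    if unit <= 0:
        raise ValueError("unitary intensity must be positive")
    return [total / unit for total in spot_intensities]


# A clump three times as bright as a typical single spot counts as ~3.
counts = estimate_dcv_count([300.0, 100.0], [90.0, 100.0, 110.0])
```

      The estimate scales linearly with cargo load: a single vesicle carrying twice the typical amount of reporter would be counted as two, which is why the equal-loading assumption matters.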

    1. eLife assessment

      This valuable study combines the use of Fisher Kernels with Hidden Markov models aiming to improve brain-behaviour prediction. The evidence supporting the authors' conclusions is solid, comparing brain-behaviour prediction accuracies across a range of different traits. This work is timely and will be of interest to neuroscientists working on functional connectivity for brain-behaviour association.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors attempt to validate Fisher Kernels on top of HMMs as a way to better describe human brain dynamics at resting state. The objective criterion was better prediction of individual traits by the proposed pipeline.

      Strengths:<br /> The authors analyzed an rs-fMRI dataset from the HCP and also provided results from other kernels.<br /> The authors also provided findings from simulated data.

      Weaknesses:

      (1) The authors should explain in detail how they applied cross-validation across the dataset for both optimization of parameters, and also for cross-validation of the models to predict individual traits.

      (2) They discussed throughout the paper that their proposed (HMM+Fisher) kernel approach outperformed dynamic functional connectivity (dFC). However, they compared the proposed methodology with just static FC.

      (3) If the authors wanted to claim that their methodology is better than dFC, then they have to demonstrate results based on dFC with the trivial sliding window approach.
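
      For readers unfamiliar with the baseline named in point (3), a sliding-window estimate of dynamic functional connectivity between two regional time series can be sketched as follows. The window width and step are free parameters chosen here for illustration, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sliding_window_fc(ts_a, ts_b, width, step=1):
    """Correlation between two time series in successive windows."""
    if len(ts_a) != len(ts_b):
        raise ValueError("time series must have equal length")
    starts = range(0, len(ts_a) - width + 1, step)
    return [pearson(ts_a[s:s + width], ts_b[s:s + width]) for s in starts]
```

      Applied to every pair of regions, this yields one connectivity matrix per window, which is the time-resolved counterpart of the single static FC matrix.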

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript presents a valuable investigation into the use of Fisher Kernels for extracting representations from temporal models of brain activity, with the aim of improving regression and classification applications. The authors provide solid evidence through extensive benchmarks and simulations that demonstrate the potential of Fisher Kernels to enhance the accuracy and robustness of regression and classification performance in the context of functional magnetic resonance imaging (fMRI) data. This is an important achievement for the neuroimaging community interested in predictive modeling from brain dynamics and, in particular, state-space models.

      Strengths:

      (1) The study's main contribution is the innovative application of Fisher Kernels to temporal brain activity models, which represents a valuable advancement in the field of human cognitive neuroimaging.

      (2) The evidence presented is solid, supported by extensive benchmarks that showcase the method's effectiveness in various scenarios.

      (3) Model inspection and simulations provide important insights into the nature of the signal picked up by the method, highlighting the importance of state rather than transition probabilities.

      (4) The documentation and description of the methods are solid including sufficient mathematical details and availability of source code, ensuring that the study can be replicated and extended by other researchers.

      Weaknesses:

      (1) The generalizability of the findings is currently limited to the young and healthy population represented in the Human Connectome Project (HCP) dataset. The potential of the method for other populations and modalities remains to be investigated.

      (2) The possibility of positivity bias in the HMM, due to the use of a population model before cross-validation, needs to be addressed to confirm the robustness of the results.

      (3) The statistical significance testing might be compromised by incorrect assumptions about the independence between cross-validation distributions, which warrants further examination or clearer documentation.

      (4) The inclusion of the R^2 score, sensitive to scale, would provide a more comprehensive understanding of the method's performance, as the Pearson correlation coefficient alone is not standard in machine learning and may not be sufficient (even if it is common practice in applied machine learning studies in human neuroimaging).

      (5) The process for hyperparameter tuning is not clearly documented in the methods section, both for kernel methods and the elastic net.

      (6) For the time-averaged benchmarks, a comparison with kernel methods using metrics defined on the Riemannian SPD manifold, such as employing the Frobenius norm of the logarithm map within a Gaussian kernel, would strengthen the analysis, cf. Jayasumana (https://arxiv.org/abs/1412.4172) Table 1, log-euclidean metric.

      (7) A more nuanced and explicit discussion of the limitations, including the reliance on HCP data, lack of clinical focus, and the context of tasks for which performance is expected to be on the low end (e.g. cognitive scores), is crucial for framing the findings within the appropriate context.

      (8) While further benchmarks could enhance the study, the authors should provide a critical appraisal of the current findings and outline directions for future research, considering the scope and budget constraints of the work.
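
      Point (4) can be made concrete with a small example. The sketch below, using hypothetical helper names, shows that the Pearson correlation is unchanged when predictions are rescaled, whereas the coefficient of determination penalises the scale error.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [10.0, 20.0, 30.0, 40.0]  # right ordering, wrong scale

r = pearson_r(observed, predicted)   # perfect correlation
r2 = r2_score(observed, predicted)   # strongly negative
```

      A model can therefore rank subjects perfectly and still make predictions that are useless in absolute terms; reporting both metrics separates the two cases.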

    4. Reviewer #3 (Public Review):

      Summary:

      In this work, the authors use a Hidden Markov Model (HMM) to describe dynamic connectivity and amplitude patterns in fMRI data, and propose to integrate these features with the Fisher Kernel to improve the prediction of individual traits. The approach is tested using a large sample of healthy young adults from the Human Connectome Project. The HMM-Fisher Kernel approach was shown to achieve higher prediction accuracy with lower variance on many individual traits compared to alternate kernels and measures of static connectivity. As an additional finding, the authors demonstrate that parameters of the HMM state matrix may be more informative in predicting behavioral/cognitive variables in this data compared to state-transition probabilities.

      Strengths:

      - Overall, this work helps to address the timely challenge of how to leverage high-dimensional dynamic features to describe brain activity in individuals.<br /> - The idea to use a Fisher Kernel seems novel and suitable in this context.<br /> - Detailed comparisons are carried out across the set of individual traits, as well as across models with alternate kernels and features.<br /> - The paper is well-written and clear, and the analysis is thorough.

      Potential weaknesses:

      - One conclusion of the paper is that the Fisher Kernel "predicts more accurately than other methods" (Section 2.1 heading). I was not certain this conclusion is fully justified by the data presented, as it appears that certain individual traits may be better predicted by other approaches (e.g., as shown in Figure 3) and I found it hard to tell if certain pairwise comparisons were performed -- was the linear Fisher Kernel significantly better than the linear Naive normalized kernel, for example?

      - While 10-fold cross-validation is used for behavioral prediction, it appears that data from the entire set of subjects is concatenated to produce the initial group-level HMM estimates (which are then customized to individuals). I wonder if this procedure could introduce some shared information between CV training and test sets. This may be a minor issue when comparing the HMM-based models to one another, but it may be more important when comparing with other models such as those based on time-averaged connectivity, which are calculated separately for train/test partitions (if I understood correctly).
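
      The leakage concern above can be illustrated with a fold-wise refit. In this sketch the group-level HMM is replaced by a trivial placeholder (a grand mean) purely to show the control flow; all names are hypothetical and nothing here reproduces the authors' code.

```python
def kfold_splits(n_subjects, n_folds):
    """Yield (train_idx, test_idx) with contiguous, near-equal test folds."""
    base, extra = divmod(n_subjects, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_subjects) if i < start or i >= start + size]
        yield train, test
        start += size


def fit_group_model(subject_data):
    """Placeholder for the group-level fit: the grand mean of all samples."""
    values = [v for subject in subject_data for v in subject]
    return sum(values) / len(values)


def foldwise_features(data, n_folds):
    """Refit the group model on training subjects only in every fold."""
    results = []
    for train, test in kfold_splits(len(data), n_folds):
        group = fit_group_model([data[i] for i in train])
        # Subject features are expressed relative to the training-only model,
        # so held-out subjects never influence the group estimate.
        features = {i: [v - group for v in data[i]] for i in train + test}
        results.append((train, test, group, features))
    return results
```

      Fitting the group model once on all subjects is cheaper, but only the fold-wise variant guarantees that test subjects contribute nothing to the representation used for prediction.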

    1. eLife assessment

      In this fundamental study, the authors analyzed associations between circulating immune cells and periodontitis. Convincing evidence identifies three immune cell types related to periodontitis, which substantially advances our understanding of periodontitis.

    2. Reviewer #1 (Public Review):

      Ye et al. used the Mendelian randomization method to evaluate the causal association between circulating immune cells and periodontitis, and identified three immune cell types that confer risk for periodontitis. Overall, this is an important and novel piece of work that has the potential to contribute to our understanding of the causal relationship between circulating immune cells and periodontitis.

    3. Reviewer #2 (Public Review):

      Summary:<br /> This is a carefully done study containing interesting results.

      Strengths:<br /> These findings have significant implications for periodontal care and highlight the potential for systemic immunomodulation management on periodontitis, which is of interest to readers in the fields of periodontology, immunology, and epidemiology.

    1. eLife assessment

      This valuable study combines in vitro and in vivo experiments designed to test if a deoxycytidine kinase inhibitor provides therapeutic benefit during infection with Staphylococcus aureus. Several in vitro methods used to measure therapeutic efficacy are thorough and compelling with appropriate conclusions drawn, however, the overall analysis is incomplete and would benefit from a more rigorous study design. With a strengthened study design, and more nuanced considerations of the strengths and limitations of the study, this paper would be of interest to bacteriologists, immunologists, and those studying host-microbe interactions.

    1. eLife assessment

      This valuable study combines in vitro and in vivo experiments designed to test if a deoxycytidine kinase inhibitor provides therapeutic benefit during infection with Staphylococcus aureus. The authors provide compelling evidence that this putative host-directed therapy has good potential to promote natural clearance of infection without targeting the bacterium. This paper would be of interest to bacteriologists, immunologists, and those studying host-microbe interactions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Review:

      Lujan et al make a significant contribution to the field by elucidating the essential role of TGN46 in cargo sorting and soluble protein secretion. TGN46 is a prominent TGN protein that cycles to the plasma membrane and it has been used as a TGN marker for many years, but its function has been a fundamental mystery.

      In parallel, it remains unclear how most secreted proteins are targeted from the Golgi to the cell surface. These molecules do not contain conserved sequence motifs or post-translational modifications such as those found on lysosomal hydrolases. Cargo receptors for these secreted proteins have remained elusive.

      Therefore, these investigations are likely to have a significant influence on the field.

      To gain insight into the molecular role of TGN46 in sorting, they systematically test the impact of the luminal, transmembrane, and cytosolic domains. Importantly, and against current thinking, they demonstrate that the luminal domain of TGN46 facilitates sorting. Interestingly, neither the cytosolic domain nor the length of the transmembrane domain of TGN46 plays a role in cargo export. The effects of TGN46 depletion are specific, as membrane-associated VSVG remains unaffected.

      Interestingly, TGN46 luminal domain also plays an important role in the intracellular and intra-Golgi localization of TGN46, and it contains a positive signal for Golgi export in CARTS. Rigorous, well-performed data support the experimental evidence.

      A speculative part of the manuscript, with some accompanying experimental data, proposes that the luminal domain of TGN46 forms biomolecular condensates that help to capture cargo proteins for export.

      One important point to discuss is that the effects of TGN46 KO are partial, suggesting that TGN46 stimulates the Golgi export of PAUF but is not essential for this process. The incomplete block is apparent in Fig 1 and in Fig 5D.

      We thank the reviewers and the editorial team for their assessment and valuable feedback on our manuscript. Their supporting comments reinforce the significance of our findings.

      Regarding the specific point raised about the partial effects observed in the TGN46 KO cell line, we acknowledge the importance of this issue, and we have addressed it in more detail in the revised version of our manuscript. The partial effects observed when using the TGN46 KO cell line are likely caused by several factors:

      (1) It is important to consider the phenomenon of cell adaptation/compensation, which is documented to occur in gene knockout cell lines. Cells often respond to genetic perturbations by adapting to compensate for the loss of a specific gene. These compensatory effects could potentially mitigate the full impact of TGN46 depletion and might explain the partial effects observed.

      (2) Our data indicate that the absence of TGN46 reduces PAUF secretion, but does not completely block its export. These results align with our proposed role of TGN46 in cargo sorting. In its absence, the secretory proteins likely exit the TGN via alternative routes/mechanisms, such as "bulk flow" or by entering other transport carriers in an uncontrolled manner. The partial redistribution of the TGN46-∆lum mutant into VSVG carriers (Figure 4D) supports this likelihood. Importantly, similar situations are observed when unrelated sorting factors are depleted from the Golgi membranes. For example, when the cofilin/SPCA1/Cab45 sorting pathway is genetically disrupted, the secretion of this pathway's clients is inhibited but not completely halted (e.g., von Blume et al. Dev. Cell 2011; J. Cell Biol. 2012).

      (3) As suggested by the reviewers, it remains possible that TGN46 is not the sole player for cargo sorting. The existence of redundant or alternative mechanisms cannot be ruled out.

      In our revised manuscript, we have now provided a more in-depth discussion of these factors and their potential contributions to the observed partial effects in TGN46 KO cells (lines 447-463). We believe that a comprehensive exploration of these possibilities will improve our understanding of the role(s) of TGN46 in cargo sorting and TGN export.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      The reviewers were unanimously enthusiastic about your work. They felt that the manuscript could be significantly improved mostly through careful re-wording, additional explanations and some figure modifications.

      We thank the reviewers and the editorial team for their enthusiastic assessment of our findings. Their positive feedback is reassuring.

      We have now addressed the reviewers' suggestions to improve the clarity of our manuscript. Specifically, we have improved various aspects of the text that may have lacked clarity in the initial submission. This includes a thorough re-writing of respective sections to ensure that the content is more accessible and reader-friendly (see detailed answers to the additional points below). Furthermore, we have carefully followed the recommendations related to figure modifications.

      Please mention the species (human) in the title.

      We have changed the title according to the suggestion. The revised title now is: "Sorting of secretory proteins at the trans-Golgi network by human TGN46". In addition, we have also added the word "human" in the abstract ("... we identified the human transmembrane protein TGN46 as a receptor for the export of secretory cargo protein PAUF in CARTS ...").

      Additional points:

      The main Figures only show quantifications that are challenging to understand without fluorescence micrographs. We suggest putting the micrographs of the fluorescence images (Figures S2A and B) into the main Figure 2 (before 2B and 2C)-the same in Figures 3 and 4.

      Following the reviewers' suggestion, we have incorporated the fluorescence micrographs (included as figure supplements in the initial submission) into the main figures 2–5. Given that these additions have introduced a significant number of extra figure panels, we have carefully re-designed the figure layout to accommodate all the necessary information. As a result, the FLIP data from old Figs. 2–4 are now included as a new Fig. 3, and old Fig. 4 has been split into the new Figs. 5 and 6. The supporting figures have also been rearranged accordingly. In addition, we have changed the color palette of the micrographs: the dual-color images are now presented in color-blind-friendly green and magenta, instead of green and red as previously. We believe that in this revised manuscript, all data and micrographs are clearly presented.

      For figures such as Fig. 1B, the mean and SD positions are hard to see for the data plotted as solid black dots. Maybe hollow circles would be better.

      The reviewers are right and we apologize for any difficulty in discerning the mean and SD positions from the figure. In our revised version, we have made the necessary modifications to all the figures where data points were plotted as solid black circles by converting them into empty black circles, as suggested by the reviewers.

      In the right side of Fig. 1A, is the difference in PAUF secretion between WT and KO cells truly significant? The meaning of the number of asterisks should be given in the legend. Only one asterisk is shown, suggesting that the significance is low.

      In our revised manuscript, we have included comprehensive information about the statistical significance, such as the statistical test used, p-values/asterisk meaning, and any other relevant details. In addition, we have included the lines connecting the individual data points corresponding to the different replicates of the secretion assays (WT vs KO).

      Experiments such as the one in Fig. 1C may be better described as iFRAP rather than FLIP.

      We appreciate the reviewers' attention to the experimental methods used, e.g., in Figure 1C. We actually performed FLIP experiments rather than iFRAP, and we acknowledge that this might not have been stated clearly in our initial submission. The distinction between iFRAP and FLIP lies in the frequency of photobleaching. In iFRAP, photobleaching occurs only once at the beginning of the experiment, whereas FLIP involves repeated photobleaching (FLIP is sometimes also referred to as "repeated iFRAP"), which was conducted in our experiments. Specifically, in our experiments we performed repeated photobleaching at a relatively slow rate (approximately once per minute; every two imaging frames).

      We understand the potential source of confusion, which may have arisen from the references we provided to introduce our FLIP experiments (Hirschberg et al. 1998; Patterson et al. 2008). In those papers, almost all results were obtained using iFRAP and not FLIP. In light of this feedback, we have made significant efforts in our revised manuscript to clarify the terminology and procedure used in our experiments (lines 148-154). These revisions have improved the understanding of our findings and we appreciate the reviewers' suggestions.

      When using iFRAP to measure the Golgi residence time of a TGN46 construct that has a cytosolic tail, shouldn't recycling from the plasma membrane be taken into account? Unlike a secreted protein, TGN46 will never show complete loss of signal from the Golgi.

      The reviewers are right: for a TGN46 construct that can efficiently recycle back to the TGN from the cell surface, an iFRAP experiment would not report solely the protein residence time at the Golgi. We concur with the reviewers, and we'd like to clarify that the reason we performed FLIP experiments, as opposed to iFRAP, was precisely to address this concern. In an iFRAP experiment, where photobleaching occurs only once at the beginning, the fluorescence decay within the Golgi area would indeed consist of two components: a decay due to the export of the protein and an increase in fluorescence due to the protein that had been exported (after the initial photobleaching) and then recycled back to the Golgi area. In contrast, our choice of conducting FLIP experiments, with repeated photobleaching of the pool of fluorescent protein outside the Golgi area (approximately once per minute), minimizes the influence of recycling. Consequently, the loss of fluorescence in the Golgi area in our FLIP experiments predominantly reflects the protein's export. We acknowledge that this distinction was not adequately communicated in our initial submission and we have emphasized these points in the revised version of the manuscript (lines 230-234).
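
      As a rough illustration of this argument (not part of the study), the difference between the two protocols can be sketched with a toy two-pool model in which fluorescent protein leaves the Golgi at rate `k_export` and returns from the post-Golgi pool at rate `k_recycle`. The function name and all rate constants below are hypothetical placeholders chosen only to make the qualitative point; they are not measured values.

```python
import math


def golgi_signal(k_export, k_recycle, bleach_interval, t_end, dt=0.01):
    """Fluorescent Golgi fraction at t_end in a toy two-pool model.

    bleach_interval=None  -> the outside pool is bleached once at t = 0 (iFRAP-like).
    bleach_interval=x     -> the outside pool is re-bleached every x minutes (FLIP-like).
    All parameters are illustrative assumptions (rates in 1/min, times in min).
    """
    golgi, outside = 1.0, 0.0  # outside pool starts fully bleached
    steps = int(round(t_end / dt))
    bleach_every = None
    if bleach_interval is not None:
        bleach_every = max(1, int(round(bleach_interval / dt)))
    for step in range(1, steps + 1):
        exported = k_export * golgi * dt      # fluorescent protein leaving the Golgi
        recycled = k_recycle * outside * dt   # fluorescent protein returning to the Golgi
        golgi += recycled - exported
        outside += exported - recycled
        if bleach_every is not None and step % bleach_every == 0:
            outside = 0.0                     # repeated bleaching outside the Golgi
    return golgi


# Hypothetical rate constants, for illustration only.
K_EXPORT, K_RECYCLE, T_END = 0.05, 0.2, 30.0
ifrap = golgi_signal(K_EXPORT, K_RECYCLE, None, T_END)
flip = golgi_signal(K_EXPORT, K_RECYCLE, 1.0, T_END)
pure_export = math.exp(-K_EXPORT * T_END)  # decay expected with no recycling at all
```

      In this sketch the FLIP-like trace ends much closer to the pure export decay than the iFRAP-like trace, which plateaus because recycled fluorescent protein refills the Golgi pool.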

      Lines 274 to 285 are confusing and controversial. The author argues that the transmembrane domain does not impact TGN localisation and cargo packaging. Later, they state, "These data further support the idea that the slower Golgi export rate of TGN46 mutants with short TMDs is a consequence of their compromised selective sorting into CARTS".

      We appreciate the reviewers' attention to the potential confusion regarding the impact of the TMD on TGN localization and cargo packaging. Actually, our results indicate that the length of the TMD does not seem to have an impact on intra-Golgi protein localization (Fig. 4B,C) but does play a role in incorporation into CARTS (Fig. 4D,E). We have now clarified this in the text (lines 283-284; 296-297).

      That being said, these results were also surprising to us initially. However, upon closer examination of the amino acid sequence of the cytosolic domain of TGN46, we noticed a possible side effect of shortening its TMD. Shortening the TMD of TGN46 could lead to the partial burial of highly charged residues from the TGN46 cytosolic tail (HHNKRK...) into the membrane, potentially affecting its behavior. For that reason, we constructed the TGN46 ∆cyt ST-TMD mutant, which features a short TMD (ST TMD) and lacks the potential interference from the cytosolic tail (see also lines 307-320). Notably, this mutant showed a phenotype similar to that of TGN46-Δcyt, and to that of full-length TGN46, particularly in terms of intra-Golgi localization and CARTS specificity. We acknowledge that the interpretation of these results can be debated, and we have ensured that the revised manuscript captures these nuances. Additionally, we have realized that the organization and presentation of these results may have caused confusion, particularly concerning the placement of the results from the GFP-TGN46 ∆cyt ST-TMD mutant. To address this, we have reorganized old Figures 2 and 3 to ensure that the results of the GFP-TGN46 ∆cyt ST-TMD mutant are presented with the short TMD mutants. These adjustments have greatly improved the overall flow of our manuscript. We thank the reviewers for their valuable feedback.

      In lines 444-446 in the Discussion the argumentation is confusing. The experiment shows that the cytosolic domain of TGN46 has no impact on TGN46 localisation or cargo packaging into a nascent vesicle. At the same time, the authors mention that a cytosolic complex composed of Rab6 and p62 is required to generate CARTS.

      We are grateful for the reviewers' feedback regarding our argumentation in lines 444-446. Indeed, our results indicate that the cytosolic tail of TGN46 does not play a major role in packaging of TGN46 in CARTS and in PAUF secretion. However, it is important to acknowledge that our findings do not rule out the possibility that TGN46 might have a dual function at the TGN. It could potentially play a role in mediating or controlling the export of other cargo proteins by alternative mechanisms/routes, which could, in part, depend on its cytosolic domain.

      This complexity is consistent with the open question regarding the role of the cytosolic Rab6-p62 complex in CARTS biogenesis. Interestingly, in experiments reported in Jones et al. (1993), a Golgi budding assay was used to test the involvement of the cytosolic domains of TGN38 and TGN41 in budding of Golgi-derived carriers that contain the transmembrane cargo protein pIgA-R (polymeric IgA-receptor). The authors showed that the budding of these carriers was blocked upon incubation of the Golgi membranes with peptides against the cytosolic tail of TGN38/41 but not peptides against their lumenal domain. However, in the latter experiment, they used a peptide formed by the 15 N-terminal residues of TGN46, which might not functionally block the entire lumenal domain (>400 residues). Our results with reference to earlier results in the field will serve as a basis for further exploring the role(s) of TGN46 in cargo export beyond the scope of the present study.

      In summary, these are all very important points (we thank again the reviewers for highlighting them), which we have now carefully addressed in the revised version of our manuscript (lines 476-485).

      The phase separation experiments are exciting. However, they are not necessary. They may be more confusing than helpful for the following reasons:

      • The authors use very high protein concentrations and crowding reagents. Any protein would condense under these conditions.

      • The protein was produced in bacteria so that it won't have post-translational modifications, especially glycosylation, possibly the most critical drivers of phase separation.

      • There was no test of direct binding of PAUF with TGN46.

      We appreciate that the reviewers share our excitement about our preliminary phase separation experiments. Likewise, while we initially included these experiments in the "Ideas and speculation" section due to their exciting nature, we concur with the reviewers that their preliminary nature and the experimental conditions used to obtain them raise valid concerns.

      In light of these considerations and to prevent any potential confusion for the readers, we have decided to follow the advice of the reviewers. We have removed the phase separation experiments and data from the revised manuscript. Instead, we have retained a simplified and concise "Ideas and speculation" section, in which we propose condensate formation as a potential mechanism by which TGN46 functions as a cargo sorter at the TGN (lines 580-620).

      The authors reference S5A as the localisation between TGN46deltaLUM images, however, we believe they are referring to fig. S7.

      We apologize for the oversight in referencing the figure and thank the reviewers for bringing this to our attention. We have amended this in the revised version.

      The authors write "remarkably, the amino acid sequence of rat TGN38 is largely conserved amongst other species, including humans (>80% amino acid identity between rat TGN38 and human TGN46)". To understand if this is remarkable, the authors should use the average identity between rat and human proteins.

      We are grateful for the reviewer's insightful comment. Indeed, as the reviewer hints, the average identity between the rat and human proteomes is of the same order of magnitude as the identity reported between rat TGN38 and human TGN46. We therefore acknowledge that the term "remarkable" may not be suitable in this context and could lead to potential misinterpretation. In the revised version, we have removed the term "remarkably".

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      This paper describes the role of the WRNIP1 AAA+ ATPase, particularly its ubiquitin-binding UBZ domain but not its ATPase activity, in preventing R-loop formation when DNA replication is mildly perturbed. By combining cytological analysis of DNA damage, R-loops, and chromosome aberrations with the proximity ligation assay for colocalization of various proteins involved in DNA replication and transcription, the authors provide solid evidence to support the claim. The authors also revealed a role of WRNIP1 in the prevention of R-loop-induced DNA damage that is distinct from that of FANCD2, which is inconsistent with the known relationship between WRNIP1 and FANCD2 in the repair of crosslinks.

      One concern is the relationship between WRNIP1 and FANCD2 (Figure 6) in the suppression of R-loop-induced DNA damage. This is different from the relationship in interstrand crosslink (ICL) repair (Socha et al. 2020), which shows an epistatic relationship between WRNIP1, as well as its UBZ domain, and FANCD2 in ICL repair. The authors need to re-evaluate the role of FANCD2 in R-loop suppression under mild replication stress (MRS) by analyzing R-loop formation in FANCD2 knockdown (KD) cells, the colocalization of FANCD2 with PCNA and RNA polymerase II by the PLA method, and fork restart by DNA combing.

      Along these lines, it is important to show whether the PLA signal between FANCD2 and R-loops depends on WRNIP1, since WRNIP1 recruits FANCD2 in ICL repair (Socha et al. 2020).

      In the study referenced by the reviewer, the authors implicated WRNIP1 in repairing interstrand crosslinks (ICLs) induced by agents such as TMP/UVA, MMC, and cisplatin (Socha et al., 2020). For the repair of ICLs, the FANCD2/FANCI complex, the central component of the FA pathway, must be recruited to DNA. The study suggests a potential role for WRNIP1 in loading the FANCD2/FANCI complex onto DNA immediately after ICL formation. However, even in the absence of WRNIP1, a residual recruitment of the FANCD2/FANCI complex to DNA was observed, possibly due to alternative mechanisms, as proposed by the authors. Interestingly, the study did not establish a similar relationship between WRNIP1 and FANCD2 after treatments that do not induce ICLs, demonstrating that WRNIP1 and FANCD2 may also play independent roles. Hence, our data demonstrating a distinct role of WRNIP1 from the FA pathway in response to R-loop-associated replication stress are not inconsistent with prior findings. Additionally, considering the ability of the UBZ domain to interact with ubiquitin in both its free form and when conjugated to other proteins, thereby regulating protein functions, it is not surprising that the UBZ domain of WRNIP1 may also play a role in the response to R-loop accumulation.

      Therefore, to address the reviewer's request for a more in-depth exploration of the role of FANCD2 in the regulation of R-loops, we chose to examine the impact of FANCD2 loss on the accumulation of R-loops in WRNIP1-deficient and WRNIP1 UBZ mutant cells, as well as on the dynamics of stalled forks following aphidicolin-induced MRS. Additionally, we investigated the colocalization between FANCD2 and R-loops in shWRNIP1WT, shWRNIP1 and shWRNIP1D37A cells. Details are provided below.

      In agreement with our observations, the analysis of R-loop formation upon MRS in WRNIP1-deficient cells depleted of FANCD2 revealed a significantly higher accumulation of R-loops in cells with a concomitant loss of both WRNIP1 and FANCD2 compared to those with a single deficiency (see Fig. 6D of the revised manuscript). Similar results were observed in the WRNIP1 UBZ mutant cells in which FANCD2 was abrogated (see Fig. 6D of the revised manuscript). It is important to note that, to eliminate contaminating free RNA, particularly dsRNA, which could interfere with the binding of RNA-DNA hybrids by the S9.6 antibody (Hartono et al., 2018), and to determine the proximity between FANCD2 and R-loops more accurately, cells were treated with RNase III, following established protocols (Crossley et al., 2020).

      Furthermore, we examined the interaction of FANCD2 with R-loops using a proximity ligation assay (PLA). Our findings revealed significant colocalization between FANCD2 and R-loops in the absence of WRNIP1 and in WRNIP1 UBZ mutant cells following low-dose aphidicolin treatment and RNase III exposure, showing a significant increase compared to the control counterpart (shWRNIP1WT cells; see Fig. 6B of the revised manuscript). Consequently, we conclude that neither WRNIP1 nor its UBZ domain is necessary for FANCD2 recruitment under conditions of MRS.

      We also performed a DNA fiber assay to evaluate restarting replication forks in shWRNIP1WT, shWRNIP1 and shWRNIP1D37A cells in which FANCD2 was abrogated. Our results show that FANCD2 depletion slightly decreased the ability of the cells to restart forks from MRS (see Fig. 6E of the revised manuscript).

      Given a low number (2-4) of PLA foci for WRNIP1-RNA polymerase II or WRNIP1 and R-loop (Figure 4B and 4D), how does this colocalization reflect the functional significance?

      The data from the PLA of Figures 4B and 4D are reported as the mean of three independent experiments. It is important to note that we have introduced a new Figure 4D. To selectively assess R-loop structures, cells were treated with RNase III, a double-stranded RNA-specific endoribonuclease, following established protocols (Crossley et al., 2020). Our PLA analysis confirms the localization of WRNIP1 at/near R-loops in shWRNIP1 and shWRNIP1D37A cells, and this phenomenon is more evident in WRNIP1 UBZ mutant cells (see Fig. 4D of the revised manuscript). Specifically, the new protocol allows us to visualize a higher number of PLA foci, and we observed that Aph increased the spots per nucleus in shWRNIP1D37A cells compared to the previous experiment.

      Regarding Fig. 4B, it is not uncommon for a low number of PLA spots per nucleus to correspond to a phenotypic effect. For instance, a similarly low average in the colocalization of PCNA or RNA pol II with FANCD2 has been observed in a prior paper as well, suggesting that transcription-replication collisions occur upon Aph-induced MRS (Okamoto et al., 2019). Also, not all R-loops may be "targeted" by WRNIP1.

      It would be helpful to readers if the authors were to provide a summary figure of this paper.

      As suggested by the reviewer, we have developed a model to summarize the findings obtained in our study (see Fig. 6F of the revised manuscript).

      Minor points:

      (1) Most of the cytological images in the paper show only colocalized ones, which makes it hard to see a signal. Please show a single-color image.

      For a better visualization of nuclei signals in the figures, single-color images have been provided for Figs. 2A; 3B; 4A, B, C, D and E; 6B and D; Suppl. Fig. 2A and B of the revised manuscript.

      (2) In Figure 2A, only one or two S9.6 focus(foci) can be seen. Why 1 or 2? This focus marks a specific chromosomal locus such as the centromere or telomere.

      We agree with the reviewer that the observed foci in nuclei may indicate a specific chromosomal locus, such as telomeres or centromeres.

      (3) Figure 3A, graph: Why this graph does not use a dot plot like Figure 1B and Figure 3C?

      The graph in Figure 3A has been represented as a dot plot, as requested.

      (4) Figure 1C: P values between unperturbed conditions should be provided.

      In Figure 1C, P values comparing unperturbed conditions were already included. The results showed no significance between shWRNIP1 and shWRNIP1D37A cells when compared to MRC5SV cells and, similarly, to shWRNIP1T294A cells, as indicated in the corresponding legend.

      (5) Figure 2B: Please provide the quantification or show the reproducibility of the data.

      The quantification of R-loops using the S9.6 monoclonal antibody is not accurate, as the specificity for RNA-DNA hybrids is questionable (Hartono et al., 2018). Therefore, to demonstrate the reproducibility of the findings in Fig. 2B, we conducted a repeat of the dot-blot experiment. We treated the samples with RNase H to degrade RNA-DNA hybrids and hybridized the membrane with an anti-dsDNA antibody to quantify R-loop levels more accurately. Our analysis confirms that the S9.6 signal strongly accumulates in shWRNIP1 cells compared to shWRNIP1WT cells (see Fig. 2B of the revised manuscript). Additionally, a graph illustrating the fold-change values of the S9.6/dsDNA signal relative to wild-type untreated cells is provided.

      (6) Figure 4A: the expression of RNaseH under aphidicolin addition increased colocalization of PCNA and RNA pol II. It is important to mention the result and provide an explanation of why it is increasing in the main text.

      Although the result may appear unexpected, and we lack experiments that explain the nature of this phenotype, a previous study reported that overexpression of RNase H1 in mammalian cells may lead to a dose-dependent reduction of certain proteins of the repair pathway, resulting in a significant accumulation of DNA damage (Shen et al., 2017). Consequently, the observed increase in TRCs upon RNase H1 overexpression in wild-type cells may be attributed to the disruption of proteins that, by impairing the repair process, can potentially cause more fork stalling and, consequently, more conflicts. We have introduced a comment in the text.

      Reviewer #2:

      This paper aims at establishing the role of WRN-interacting protein 1 (WRNIP1) and its UBZ domain (an N-terminal ubiquitin-binding zinc finger domain) on genome instability caused by mild inhibition of DNA synthesis by aphidicolin. The authors used human MRC5 fibroblasts investigated with standard methods in the field. The results clearly showed that WRNIP1 silencing and UBZ-mutation (D37A) increased DNA damage, chromosome aberrations, and transcription-replication conflicts caused by aphidicolin. The conclusions of the paper are overall well supported by results, however, aspects of some data analyses would need to be clarified and/or extended.

      (1) The methods (immunofluorescence microscopy and dot-blots) to determine R-loop levels can lack sensitivity and specificity. In particular, since the S9.6 antibody can bind to other structures besides heteroduplex, dot-blot analyses only grossly assess R-loop levels in cellular samples of purified nucleic acids, which are constituted by many different types of DNA/RNA structures.

      To eliminate contaminating free RNA, particularly dsRNA, which could interfere with the capture of RNA-DNA hybrids by the S9.6 antibody (Hartono et al., 2018), and to determine R-loop levels more accurately, we treated cells with RNase III, following established protocols (Crossley et al., 2020). Under our experimental conditions, RNase III treatment significantly reduced the amount of dsRNA, nearly eliminating it, as evaluated using a specific antibody against dsRNA (see Suppl Fig 2 of the revised manuscript). To better appreciate the effect of the loss of WRNIP1 or its UBZ domain on R-loop accumulation and the amount of DNA damage, we have reproduced key data (see Figs 2B; 3B; 4D and E; 6B of the revised manuscript). Our analysis from immunofluorescence experiments, performed using a dsRNA ribonuclease (RNase III), confirms higher R-loop accumulation in WRNIP1-deficient or WRNIP1 UBZ mutant cells compared to control cells (Fig 3B). Additionally, proximity ligation assay (PLA) data are consistent with those previously presented and, in some cases, are more readily interpretable (see Figs 4D and E; 6B of the revised manuscript). Finally, we performed a new dot-blot experiment (see Fig. 2B of the revised manuscript). We treated the samples with RNase H to degrade RNA-DNA hybrids and hybridized the membrane with an anti-dsDNA antibody to quantify R-loop levels more accurately. Our analysis confirms a significant accumulation of the S9.6 signal in shWRNIP1 cells compared to shWRNIP1WT cells. Additionally, a graph illustrating the fold-change values of the S9.6/dsDNA signal relative to wild-type untreated cells is provided.

      (2) Experimental plan has analyzed the impact of WRNIP1 lack or mutations at steady-state conditions. Thus, the possible role of WRNIP1 at an early step of the mechanism would require some sort of kinetics analysis of the molecular process, therefore not at steady-state conditions. The findings of a co-localization of R-loops and WRNIP1 have been obtained with the S9.6 antibody, which recognizes DNA-RNA heteroduplexes. Since WRNIP1 is known to be recruited at stalled forks and DNA cleavage sites, it is not surprising that WRNIP1 is very close to heteroduplexes, abundant structures at replication forks and cleavage sites. Similar interpretations may also be valid for Rad51/S9.6 co-localization findings.

      Investigating the potential role of WRNIP1 at an early step in the mechanism is undoubtedly very interesting and requires separate investigation. Our decision to explore the relevance of the loss of WRNIP1 or WRNIP1 mutations under steady-state conditions is based on a preliminary alkaline comet assay (provided below). The comet assay, performed at various exposure times of aphidicolin at a concentration of 0.4 micromolar, clearly indicates that the most significant effect on DNA damage accumulation in WRNIP1-deficient cells occurs after 24 hours of treatment. Therefore, we have chosen to study transcription-associated genomic instability in our cells by treating them with a low dose of aphidicolin for 24 hours to maximize the effect.

      Author response image 1.

      We agree that the presence of WRNIP1 or RAD51 in proximity to R-loops is consistent with their roles and may not be surprising. However, these experiments formally demonstrate their proximity to R-loops under our conditions. Notably, the new graphs, obtained from experiments repeated with RNase III treatment to reduce the amount of dsRNA and improve the specificity of the S9.6 antibody, show increased interaction of the WRNIP1 UBZ mutant with R-loops when compared to the wild-type form. Additionally, it is more evident that the presence of RAD51 at/near R-loops is reduced in WRNIP1 UBZ mutant cells both in untreated conditions and after MRS (see Figs 4D and E of the revised manuscript).

      (3) Determination of DNA damage, chromosome aberration, and co-localization data are reported as means of measurements with appropriate statistics. However, the fold-change values relative to corresponding untreated samples are not reported. In some instances, it seems that WRNIP1 silencing or mutations actually reduce or do not affect aphidicolin effects. That leaves open the interpretation of specific results.

      To better evaluate the significance of the data presented in the study, we have introduced the fold-change values calculated with respect to the untreated samples, as requested by the reviewer. This allowed us to conclude that the loss of WRNIP1 or the expression of the UBZ mutant form of WRNIP1 does not in any case reduce the effects of aphidicolin-induced mild replication stress.

      I would suggest some additional experiments or analyses to get more convincing results:

      (1) DNA damage should be verified also with other methods, such as DNA damage markers pH2AX and 53BP1.

      The quantification of DNA damage was also corroborated by determining the percentage of gammaH2AX-positive cells, as reported in Supplementary Figure 1B. This result is consistent with the findings from the comet assay, confirming transcription-dependent DNA damage accumulation in shWRNIP1 and shWRNIP1D37A cells. Regarding the 53BP1 marker, we believe that the existing data sufficiently demonstrate DNA damage accumulation in the absence of WRNIP1 or when its UBZ domain is mutated, providing comprehensive support to the study without necessitating additional results.

      (2) Repair foci may also be detected with Rad51 foci. That will also provide evidence for increased DNA damage levels under the tested conditions.

      Our prior study identified WRNIP1 as a crucial factor for RAD51 function (Leuzzi et al., 2016). Loss of WRNIP1 indeed results in a defective relocalization of RAD51 to chromatin. Consequently, the analysis of RAD51 foci may not be a useful readout to evaluate DNA damage levels under our conditions.

      (3) WRNIP1 effects should be presented as FC (fold-changes) of DNA damage, PLA results, chromosomal errors, etc, to provide evidence of the level of effects on the tested phenotypes.

      We have introduced the fold-change values calculated with respect to the untreated samples, as requested by the reviewer, for a more comprehensive analysis in the graph of Figs. 1B, C and D; 2A and B; 3A, B and C; 4A, B, C, D and E; 6B, C and D.

      (4) R-loop detection ideally should be performed by one of the several types of immunoprecipitation techniques. Alternatively, dot-blot assays should be performed with a 1:2 dilution series of each sample. Then, heteroduplexes should be detected with S9.6 along with a general aspecific dye for DNA quantity in each spot. Next, densitometric analyses of S9.6 signal should be normalized over DNA quantity.

      We acknowledge that the quantification of R-loops using the S9.6 monoclonal antibody is not accurate, as the specificity for RNA-DNA hybrids is questionable (Hartono et al., 2018). Therefore, to overcome this issue, we repeated the experiment shown in Fig. 2B. We treated the samples with RNase H to degrade RNA-DNA hybrids and hybridized the membrane with an anti-dsDNA antibody to quantify R-loop levels more accurately. Our analysis confirms that the S9.6 signal strongly accumulates in shWRNIP1 cells compared to shWRNIP1WT cells (see Fig. 2B of the revised manuscript). Additionally, a graph illustrating the fold-change values of the S9.6/dsDNA signal relative to wild-type untreated cells is provided.
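
      The normalization described here can be summarized in a minimal sketch: each S9.6 densitometry value is divided by the dsDNA signal of the same spot, and the resulting ratio is expressed relative to a reference sample. The function name, sample labels, and intensity values below are hypothetical placeholders, not data from the study.

```python
def fold_changes(s96, dsdna, reference):
    """Return S9.6/dsDNA ratios expressed as fold change over a reference sample.

    s96, dsdna: dicts mapping sample label -> densitometric intensity (arbitrary units).
    reference:  label of the sample whose ratio is set to 1 (e.g. untreated control).
    """
    # Normalize each S9.6 signal to the DNA loaded in the same spot.
    ratios = {name: s96[name] / dsdna[name] for name in s96}
    ref_ratio = ratios[reference]
    # Express every normalized signal relative to the reference sample.
    return {name: ratio / ref_ratio for name, ratio in ratios.items()}


# Placeholder intensities for illustration only (not measured values).
s96_signal = {"control_untreated": 100.0, "sample_a": 450.0}
dsdna_signal = {"control_untreated": 200.0, "sample_a": 300.0}
result = fold_changes(s96_signal, dsdna_signal, "control_untreated")
```

      With a dilution series, the same calculation would be applied to spots within the linear range of both signals before averaging across replicates.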

      (5) A major focus on WRNIP1 D37A and T294A mutations may also make the paper overall more convincing. For instance: do the mutations affect protein recruitment at damaged chromatin? Do they increase repair foci? Do they affect the recruitment of WRN or BLM helicases or specific nucleases at chromatin under the tested conditions of MRS?

      To address this point raised by the reviewer, we performed a chromatin fractionation experiment to assess the ability of WRNIP1 and its mutated forms to translocate to chromatin upon MRS. Our analysis shows that the mutated forms of WRNIP1 do not exhibit any defects in recruitment to chromatin, although the levels of the WRNIP1 ATPase mutant appear lower than the others (see Western blot provided below for the reviewer's use only, Author response image 2A). Additionally, we tested the presence of WRN helicase, which does not show any difference between cell lines (see Western blot provided below, Author response image 2B).

      Author response image 2.

      (6) I suggest revising the text for spelling errors.

      The manuscript has been carefully revised to identify and correct any spelling errors that may have occurred.

      Reviewer #3:

      In the manuscript by Valenzisi et al., the authors report on the role of WRNIP1 to prevent R-loop and TRC-associated DNA damage. The authors claim WRNIP1 localizes to TRCs in response to replication stress and prevents R-loop accumulation, TRC formation, replication fork stalling, and subsequent DNA damage. While the findings are of potential significance to the field, the strength of evidence in support of the conclusions is lacking.

      Weaknesses:

      (1) The authors fail to utilize the proper controls throughout the manuscript in regard to the shWRNIP1, WT, and mutant cell lines. It is unclear why the authors failed to use the shWRNIP1WT line in the comet assay, DNA fiber assay, and the FANCD2 assays. This is a key control for i) the use of only a single shRNA (most studies will use at least 2 different shRNAs) and ii) the use of the mutant WRNIP1 lines. In several figures, the authors only show the effect of the UBZ mutant, but don't include the ATPase mutant or WT for comparison. Including these is essential.

      We agree with the reviewer's criticism that the use of shWRNIP1WT cells as a control is more appropriate. Therefore, all the new experiments presented in the revised version of the manuscript have been performed using the shWRNIP1WT cells. Notably, the new results are in line with those obtained using the MRC5SV cells, making us confident that our findings are reliable overall. By contrast, we do not feel that including the WRNIP1 ATPase mutant cells is always essential, since our data clearly demonstrate that the loss of ATPase activity of WRNIP1 does not affect transcription-associated genome instability.

      (2) The authors use the S9.6 antibody to conclude the loss of WRNIP1 causes more R-loops; however, it has been shown that this antibody detects dsRNA in addition to RNA-DNA hybrids. Accordingly, it cannot be ruled out that the increased S9.6 signal is due to increased dsRNA.

      To eliminate contaminating free RNA, particularly dsRNA, which could interfere with the capture of RNA-DNA hybrids by the S9.6 antibody (Hartono et al., 2018), and to determine R-loop levels more accurately, we treated cells with RNase III, following established protocols (Crossley et al., 2020). Under our experimental conditions, RNase III treatment significantly reduced the amount of dsRNA, nearly eliminating it, as evaluated using a specific antibody against dsRNA (see Suppl Fig 2 of the revised manuscript). To better appreciate the effect of the loss of WRNIP1 or its UBZ domain on R-loop accumulation and the amount of DNA damage, we have reproduced key data (see Figs 3B; 4D and E; 6B, D and E of the revised manuscript). Our analysis from immunofluorescence experiments, performed using a dsRNA ribonuclease, confirms higher R-loop accumulation in WRNIP1-deficient or WRNIP1 UBZ mutant cells compared to control cells (Fig. 3B). Additionally, proximity ligation assay (PLA) data are consistent with those previously presented and, in some cases, are more readily interpretable (see Figs 4D and E; 6B of the revised manuscript).

      (3) Multiple pieces of data do not support the conclusions. For example, Figure 1D shows shWRNIP1 to reduce damage in Aph+DRB cells compared to MRC5SV cells with Aph+DRB. This result suggests that WRNIP1 actually increases DNA damage in stressed cells with transcription blocked. Another result is seen in Figure 4a, where the number of PLA spots (presumably TRCs) increases in the shWRNIP1WT cells with Aph+RNH1 compared to Aph alone. If R-loops are required for TRC accumulation, then the RNH1 should decrease the PLA foci. This result instead suggests that WRNIP leads to increased TRCs in stressed cells with R-loops cleared by RNH1.

      Regarding Figure 1D, in MRC5SV cells, DRB does not significantly increase DNA damage upon Aph treatment. Therefore, it is not correct to conclude that WRNIP1 exacerbates DNA damage in stressed cells with transcription blocked.

      Regarding Figure 4A, while the outcome may appear unexpected, and we do not provide data that explain the nature of this phenotype, a previous study demonstrated that overexpression of RNase H1 in mammalian cells may lead to a dose-dependent reduction of certain proteins of the repair pathway, leading to a significant accumulation of DNA damage (Shen et al., 2017). Accordingly, the observed increase in TRCs upon RNase H1 overexpression in wild-type cells may be attributed to the disruption of proteins that, by impairing the repair process, can potentially cause more fork stalling and, consequently, more conflicts. We have introduced a comment in the text.

      (4) The data are mostly phenomenological and fail to yield mechanistic insight. For example, the authors state that "it remains unclear whether WRNIP1 is directly involved in the mechanisms of R-loop removal/resolution". Unfortunately, the data presented in this manuscript do not provide new insights into this unresolved question.

      We agree with the reviewer that elucidating the mechanism by which WRNIP1 contributes to R-loop suppression would be of interest. Nevertheless, the findings presented here provide compelling evidence of a novel role for WRNIP1 in preventing R-loop accumulation. Investigating how WRNIP1 accomplishes this function will require significant effort, which we are committed to undertaking.

      (5) The authors only show merged images making it impossible to visualize differences in PLA foci.

      For better visualization of nuclear signals in the PLA panels of Figs 4A, B, C, D and E; 6B, single-color images have been provided.

      In addition to including the controls I mentioned in the public review, I recommend investigating the mechanism of how WRNIP1 prevents R-loop accumulation. If it is indeed related to its UBZ domain, then does that mean ubiquitination is an important step in R-loop removal? I believe elucidating this would be a novel and significant contribution. If it's not related to ubiquitination, then how does the UBZ domain regulate R-loops?

      We agree with the reviewer that investigating the precise role of the UBZ domain of WRNIP1 in R-loop prevention would be of interest, and several experiments are required to adequately address this issue. However, as discussed, we hypothesize that the UBZ domain might contribute to directing WRNIP1 to DNA at TRC sites through RAD18.

      I recommend using purified RNH1-dead-GFP to detect R-loops as opposed to the S9.6 antibody. The Cimprich lab has published this recently as a tool for detecting R-loops in fixed cells.

      As explained in point 2), to eliminate contaminating free RNA, particularly dsRNA, which could interfere with the capture of RNA-DNA hybrids by the S9.6 antibody (Hartono et al., 2018), and to determine R-loop levels more accurately, we treated cells with RNase III, following established protocols (Crossley et al., 2020). New experiments are reported in the revised version of the manuscript for R-loops in all cell lines (see Fig. 3B of the revised manuscript).

      Additionally, colocalization by PLA of WRNIP1/R-loops, RAD51/R-loops, FANCD2/R-loops, and R-loop accumulation by anti-S9.6 antibody in cells depleted of FANCD2 are presented (see Figs. 4D and E; 6B and D of the revised manuscript).

      Furthermore, we repeated the dot-blot experiment (see Fig. 2B of the revised manuscript). We treated the samples with RNase H to degrade RNA-DNA hybrids and hybridized the membrane with an anti-dsDNA antibody to quantify R-loop levels more accurately. Our analysis confirms that the S9.6 signal strongly accumulates in shWRNIP1 cells compared to shWRNIP1WT cells. Additionally, a graph illustrating the fold-change values of the S9.6/dsDNA signal relative to wild-type untreated cells is provided.

      Importantly, overall, our findings suggest that treatment with RNase III does not substantially change the results obtained without it but, in some cases, such as in Fig. 4D, makes them more readily interpretable. Specifically, the new protocol allows us to visualize a higher number of PLA foci, and Aph increased the spots per nucleus in shWRNIP1D37A cells compared to the previous experiment (see Fig. 4D of the revised manuscript).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript of Zhao et al. aimed at investigating the relationships between type 2 diabetes, bone mineral density (BMD) and fracture risk using Mendelian Randomization (MR) approach.

      The authors found that genetically predicted T2D was associated with higher BMD and lower risk of fracture, and suggested a mediated effect of RSPO3 level. Moreover, when stratified by the risk factors secondary to T2D, they observed that the effect of T2D on the risk of fracture decreased when the number of risk factors secondary to T2D decreased.

      Strengths:

      • Important question

      • Manuscript is overall clear and well-written

      • MR analyses have been conducted properly, which include the usage of various MR methods and sensitivity analyses, and likely meet the criteria of the MR-strobe checklist to report MR results.

      Response: Thanks.

      Weaknesses:

      • Previous MR studies on that topic have not been discussed

      Response: In the manuscript, we discussed the previous MR findings from Trajanoska et al., BMJ, 2018. This study assessed the effect of 15 clinical risk factors (including type 2 diabetes) on fracture risk. Now we have included the other two studies (Mitchell et al, Diabetologia, 2021; Ahmad et al JBMR, 2016) which took BMD as the exposure in the paragraph when we discussed the effects on BMD.

      • Multivariable MR could have been used to better assess the mediating effect of BMI or RSPO3 on the relationships between T2D and fracture risk.

      Response: In the revision, an inverse-variance weighted multivariable MR model was used to estimate the direct effect of T2D on fracture and BMD, adjusted for BMI, with the ‘MVMR’ R package (https://github.com/WSpiller/MVMR). Specifically, we first extracted the overlapping SNPs from the summary data for T2D, BMI and fracture. Then the independent significant SNPs (P<5×10−8 and R2<0.1) for either T2D or BMI were pooled as instruments. Additionally, we performed SNP harmonization to correct the orientation of alleles. The results showed that increased risk of T2D has a direct effect that decreased fracture risk (OR=0.974, 95%CI=0.953-0.995, P=0.017, adjusted for BMI), and BMI mediated 9.03% of the protective effect. The multivariable MR analysis suggested that T2D also had a direct effect on increased BMD after adjusting for BMI (β=0.042, 95%CI=0.026-0.057, P=1.92×10−7). We did not observe a direct effect of MRI-derived visceral fat (β=0.02, P=0.831) or abdominal subcutaneous fat (β=0.03, P=0.57) on fracture risk after adjusting for RSPO3 expression. We have updated the Methods and Results accordingly.

      Reviewer #2 (Public Review):

      The authors employed the Mendelian Randomization method to analyze the association between type 2 diabetes (T2D) and fracture using the UK Biobank data. They found that "genetically predicted T2D was associated with higher BMD and lower risk of fracture". Additionally, they identified 10 loci that were associated with both T2D and fracture risk, with the SNP rs4580892 showing the highest signal. While the negative relationship between T2D and fracture has been previously observed, the discovery of these 10 loci adds an intriguing dimension to the findings, although the clinical implications remain uncertain.

      Response: We appreciate the reviewer's thoughtful evaluation of our study. The central idea of this study is that genetically determined type 2 diabetes might not be associated with a higher risk of fracture, even though such an association can be seen in observational data. When we stratified by the risk factors secondary to the disease, we observed that the effect of T2D on the risk of fracture decreased as the number of risk factors secondary to T2D decreased, and the association became non-significant if the T2D patients carried none of the risk factors. These results suggest that the risk factors secondary to type 2 diabetes might contribute more to the risk of fracture. Therefore, the clinical implications of our study might lie in the health management of type 2 diabetes patients. We suggest that it is important to manage the complications of type 2 diabetes to reduce the risk of fracture.

      Reviewer #1 (Recommendations For The Authors):

      • Introduction/discussion: findings from MR previously published on that topic have not been discussed in this manuscript (eg, Mitchell et al, Diabetologia, 2021; Ahmad et al JBMR, 2016);

      Response: In the manuscript, we discussed the previous MR findings from Trajanoska et al., BMJ, 2018. That study assessed the effect of 15 clinical risk factors (including type 2 diabetes) on fracture risk. We apologize for missing the studies you mentioned. These two studies took BMD as the exposure, and we have now included them in the paragraph where we discuss the effect of T2D on BMD (Page 14, Line 320-322).

      • In the one-sample MR analysis: I would suggest looking at whether the association between T2D GRS and fracture risk differ across fracture sites; in the hypothesis that BMI might be protective, performing the analysis separately for weight-bearing bones vs not weight-bearing bones would be interesting.

      Response: Following your suggestion, we further categorized fractures into weight-bearing bones (neck, vertebrae, pelvic, femur, tibia) and other bones (detailed codes have been added to Supplementary Table 16). When we regressed the observed fracture on the wGRS, there was a trend towards a protective association between T2D wGRS and both weight-bearing bone fracture (OR=0.9772, 95%CI=0.9552-0.9997, P=0.04737, N of fracture=8,992) and other bone fracture (OR=0.9838, 95%CI=0.9688-0.9991, P=0.0386, N of fracture=20,317) (Figure 1). We have updated the Methods and Results accordingly (Page 6, line 129-134 and Page 18, line 408-412).

      In this analysis, I would also suggest verifying the absence of sex interaction with T2D PRS on BMD and fracture risk

      Response: Thanks for your suggestion. We further estimated the effect of sex interaction on BMD and fracture risk with a T2D wGRS × sex interaction term in the regression model. As you anticipated, we found no interaction between sex and T2D wGRS on fracture risk (P=0.5576) or BMD (P=0.66). Moreover, we conducted a stratified analysis by sex. When we regressed the observed fracture on the wGRS in males, we found that genetically determined type 2 diabetes was also associated with lower risk of fracture (OR=0.977, P=0.015) (adjusting for reference age, sex, BMI, physical activity, fall history, HbA1c and medication treatments). In females, the direction of the association was the same but not significant (OR=0.986, P=0.139). We tested the heterogeneity between males and females and found no significant difference (Pheterogeneity=0.457). Similarly, genetically determined type 2 diabetes was associated with higher BMD in males (β=0.023, P=8.23×10−14) and females (β=0.022, P<2.0×10−16), with Pheterogeneity=0.6306 (Supplementary Figure 2). We have updated the Methods and Results accordingly (Page 6, line 134-139 and Page 19, line 425-429).

      • In the two-sample MR analysis: I would suggest performing a multivariable MR to look at the effect of T2D adjusted for BMI on BMD and fracture risk (see Burgess et al, AJE, 2016)

      Response: Thanks for your suggestion. In the revision, an inverse-variance weighted multivariable MR model was used to estimate the direct effect of T2D on fracture and BMD, adjusted for BMI, with the ‘MVMR’ R package (https://github.com/WSpiller/MVMR). Specifically, we first extracted the overlapping SNPs from the summary data for T2D, BMI and fracture. Then the independent significant SNPs (P<5×10−8 and R2<0.1) for either T2D or BMI were pooled as instruments. Additionally, we performed SNP harmonization to correct the orientation of alleles. The final IVs used in MVMR are presented in Supplementary Table 17. The results showed that increased risk of T2D has a direct effect that decreased fracture risk (OR=0.974, 95%CI=0.953-0.995, P=0.017, adjusted for BMI) and increased BMD (β=0.042, 95%CI=0.026-0.057, P=1.92×10−7, adjusted for BMI). We have updated the Methods and Results accordingly (Page 7, line 155-158, 162-164, and Page 20, line 456-465).
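
      For readers unfamiliar with the approach, the core of an inverse-variance weighted multivariable MR estimate can be sketched as a weighted regression, without intercept, of the SNP-outcome effects on the SNP-exposure effects. The snippet below is only an illustrative sketch under that assumption: the function name `mvmr_ivw` and the input values are hypothetical, and it is not the code of the MVMR package, which also reports standard errors and instrument-strength diagnostics.

```python
def mvmr_ivw(bx1, bx2, by, se_y):
    """Direct effects of two exposures on one outcome.

    bx1, bx2 : per-SNP effects on exposure 1 (e.g. T2D) and exposure 2 (e.g. BMI)
    by, se_y : per-SNP effects on the outcome and their standard errors
    Solves the weighted least-squares normal equations (weights 1/se_y^2,
    no intercept) for the two coefficients.
    """
    if not (len(bx1) == len(bx2) == len(by) == len(se_y)):
        raise ValueError("all inputs must have one entry per SNP")
    w = [1.0 / s ** 2 for s in se_y]
    s11 = sum(wi * a * a for wi, a in zip(w, bx1))
    s22 = sum(wi * b * b for wi, b in zip(w, bx2))
    s12 = sum(wi * a * b for wi, a, b in zip(w, bx1, bx2))
    s1y = sum(wi * a * y for wi, a, y in zip(w, bx1, by))
    s2y = sum(wi * b * y for wi, b, y in zip(w, bx2, by))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("exposure effects are collinear")
    beta1 = (s22 * s1y - s12 * s2y) / det
    beta2 = (s11 * s2y - s12 * s1y) / det
    return beta1, beta2


# Hypothetical, noise-free summary statistics for four harmonized SNPs.
bx1 = [0.10, 0.20, 0.05, 0.30]
bx2 = [0.02, -0.10, 0.15, 0.07]
by = [0.5 * a - 0.2 * b for a, b in zip(bx1, bx2)]
se_y = [0.010, 0.020, 0.015, 0.010]
direct_t2d, direct_bmi = mvmr_ivw(bx1, bx2, by, se_y)
```

      With real summary data the SNP effects must first be harmonized to the same effect allele, as described in the response above; otherwise the signs of the products in the normal equations are wrong.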

      • In the section "infer the shared genetics". In addition of using waist circumference and waist-hip ratio, it would have been interesting to use GWAS summary statistics for subcutaneous and visceral adiposity (Agrawal, Nat Comm, 2022), and look at through multivariable MR whether RSPO3 mediate the effect of subcutaneous fat on fracture risk.

      Response: Thanks for your suggestion. We downloaded the genetic summary data from Agrawal, Nat Comm, 2022, and performed the same SMR analysis as before. We found that higher expression of RSPO3 was associated with higher MRI-derived visceral fat (β=0.199, P=4.36×10−5). We have updated the Methods and Results accordingly (Page 9, line 206-208 and Page 22, line 494-495).

      We did not observe a direct effect of MRI-derived visceral fat (β=0.02, P=0.831) or abdominal subcutaneous fat (β=0.03, P=0.57) on fracture risk after adjusting for RSPO3 expression.

      Reviewer #2 (Recommendations For The Authors):

      Specific comments

      Several concerns regarding the study's concept and methodology should be addressed before accepting the findings as credible. I would like to invite the authors to comment on the following points.

      (1) I find the authors' assertion that individuals with type 2 diabetes (T2D) exhibit both higher BMD and an increased risk of fracture to be unconvincing. The BMD measurement they refer to is based on areal BMD, which fails to account for the three-dimensional aspect of bone density. Existing evidence suggests that patients with T2D actually have lower trabecular bone scores (a predictor of fracture risk) compared to those without the condition. Furthermore, there is a lack of a clearly stated hypothesis underlying the study.

      Response: Yes, in this study, the bone mineral density measurement is based on areal BMD. We have made this clear in the Abstract. We agree that other measurements, such as trabecular bone score and chest CT texture analysis, could provide additional valuable information in the evaluation of fracture risk, especially in type 2 diabetes patients. We have discussed this in the manuscript (Page 13, line 295-300). Epidemiologic studies from the past decade provided evidence that increased bone fracture risk is one of the complications of type 2 diabetes, but the areal BMD in type 2 diabetes patients could be normal or even higher (Botella Martinez et al., 2016; Romero-Diaz et al., 2021).

      In this study, we employed the Mendelian randomization approach to investigate the relationship between type 2 diabetes and fracture/BMD. This method uses genetic data as instrumental variables to alleviate bias from unknown confounding factors. We found that genetically predicted type 2 diabetes was associated with higher BMD and lower risk of fracture. That is to say, once the bias from unknown confounding factors was alleviated through MR analysis, genetically predicted type 2 diabetes did not show the bone paradox.

      We then performed observational analysis in UK Biobank, and found that type 2 diabetes was associated with higher risk of fracture and increased BMD. Further, we stratified the T2D patients with five secondary risk factors (BMI≤25kg/m2, no physical activity, falls in the last year, HbA1c≥47.5mmol/mol and antidiabetic medication treatment), and found that the effect of type 2 diabetes on the risk of fracture decreased when the risk factors secondary to type 2 diabetes decreased, and the association became not significant if the type 2 diabetes patients carried none of the risk factors. That is to say, the diabetic bone paradox might not exist if the secondary risk factors of type 2 diabetes were eliminated.

      The hypothesis and idea we want to deliver is that the genetically determined type 2 diabetes might not be associated with higher risk of fracture, but the association could be observed. However, when stratified by the risk factors secondary to the disease, we observed that the effect of T2D on the risk of fracture decreased when the number of risk factors secondary to T2D decreased, and the association became non-significant if the T2D patients carried none of the risk factors. These results suggested that the risk factors secondary to type 2 diabetes might contribute more to the risk of fracture. Therefore, it is important to manage the complications of type 2 diabetes to prevent the risk of fracture.

      In addition, although type 2 diabetes was observed to be associated with higher risk of fracture, BMI mediated 30.2% of the protective effect. The shared genetic architecture between type 2 diabetes and fracture suggested a top signal near the RSPO3 gene. Higher expression of RSPO3 was associated with higher waist circumference and higher waist-hip ratio. These results suggested that relatively higher BMI in type 2 diabetes patients might contribute to the higher BMD, as our previous study suggested that keeping a moderate-high BMI (overweight) might benefit older people in terms of fracture risk (Zhu et al., 2022).

      (2) It is not a good idea to solely concentrate on overall fracture risk as it may obscure the potential relationship between T2D and specific fracture sites, such as hip and vertebral fractures. By solely considering total fracture incidence, important associations at individual fracture sites could be overlooked. I would like to propose that the authors expand their analysis to include the examination of hip and vertebral fractures. By incorporating these specific fracture types into their study, a more comprehensive understanding of the association between T2D and fractures can be achieved.

      Response: This is a good suggestion. Taking into account the comments from the other reviewer and the sample size, we classified fractures into weight-bearing bone fractures (neck, vertebrae, pelvic, femur, tibia) and other bone fractures (skull and facial, ribs, sternum, forearm, wrist and hand, foot and other unspecified body regions). We identified 6,582 (1.87%) participants with weight-bearing bone fracture and 9,586 (2.72%) participants with other bone fracture within the 352,879 UK Biobank participants. We observed a higher risk of fracture in the type 2 diabetes patients in the Cox proportional hazards regression after adjusting for reference age, sex, BMI, physical activity, fall history, HbA1c and medication treatments (weight-bearing bone fracture: HR=1.792, 95%CI 1.555-2.065, P=8.25×10−16; other bone fracture: HR=1.337, 95%CI 1.167-1.531, P=2.85×10−5), and after additionally controlling for BMD (weight-bearing bone fracture: HR=1.850, 95%CI 1.602-2.136, P<2×10−16; other bone fracture: HR=1.377, 95%CI 1.199-1.580, P=5.54×10−6). We have updated the manuscript accordingly in Results, Methods and Figures (Page 11, line 245-250; Page 24, line 540-547; Figure 4A).

      (3) I consider that there is an issue with combining data from both males and females in the analysis. It is widely recognized that women generally have a higher risk of fracture compared to men. Moreover, the association between BMD and fracture may vary between genders, and the risk of T2D is typically lower in women than in men. Therefore, I strongly recommend that the analysis be stratified by gender to account for these differences and provide a more accurate understanding of the relationships involved.

      Response: Thanks for your suggestion. We have now added results stratified by sex to each analysis. Briefly, in the wGRS analysis, we found that genetically determined type 2 diabetes was associated with lower risk of fracture in males (OR=0.977, 95%CI=0.958-0.995, P=0.015) (adjusting for reference age, sex, BMI, physical activity, fall history, HbA1c and medication treatments). The association in females was not significant, but the direction was the same as in males (OR=0.986, 95%CI=0.969-1.004, P=0.139). Meanwhile, genetically determined type 2 diabetes was associated with higher BMD in both males (β=0.023, 95%CI=0.017-0.030, P=8.23×10−14) and females (β=0.022, 95%CI=0.017-0.026, P<2×10−16). In the observational analysis, we observed a higher risk of fracture in the type 2 diabetes patients in the Cox proportional hazards regression after adjusting for reference age, sex, BMI, physical activity, fall history, HbA1c and medication treatments in males (HR=1.587, 95%CI 1.379-1.828, P=1.26×10−10) and females (HR=1.530, 95%CI 1.334-1.756, P=1.27×10−9), respectively. When we additionally controlled for BMD (HR=1.607, 95%CI 1.393-1.853, P=7.21×10−11 in males; HR=1.601, 95%CI 1.393-1.841, P=3.59×10−11 in females), we still observed increased risk of fracture in type 2 diabetes (Page 6, line 136-139; Page 11, line 241-243).

      (4) My understanding is that "BMD" in UK Biobank refers to estimated BMD derived from ultrasound measurements, rather than being directly measured using dual-energy X-ray absorptiometry (DXA). It would be helpful to clarify whether the BMD mentioned in the manuscript refers to estimated BMD or DXA-based BMD to ensure accurate interpretation of the results.

      Response: Yes, we used the BMD estimated from quantitative ultrasound measurement at the heel as the outcome. The device generates two variables: speed of sound (SOS) and broadband ultrasound attenuation (BUA, the slope between the attenuation of the sound signal and its frequency as it travels through the bone and soft tissue). Heel BMD was calculated by the following formula: BMD = 0.002592 × (BUA + SOS) − 3.687. We have made this clear in Methods (Page 23, line 526-530).
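
      As a worked illustration of the formula quoted in the response, the short function below evaluates it for one set of readings. The function name and the input values are hypothetical, and the units noted in the comments (BUA in dB/MHz, SOS in m/s) are an assumption, since the response does not state them.

```python
def heel_bmd(bua, sos):
    """Estimated heel BMD from quantitative ultrasound readings.

    Implements BMD = 0.002592 * (BUA + SOS) - 3.687 as quoted in the response.
    Assumed units (not stated in the response): BUA in dB/MHz, SOS in m/s.
    """
    return 0.002592 * (bua + sos) - 3.687


# Hypothetical readings: 0.002592 * (70 + 1550) - 3.687 = 4.19904 - 3.687
example_bmd = heel_bmd(bua=70.0, sos=1550.0)
```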

      (5) The clarification regarding the nature of the 13,817 individuals with T2D mentioned in Supplementary Table 9 is needed. It is unclear whether this figure represents incidence or prevalence. If it refers to incidence, it would be informative to specify the duration of the follow-up period for these individuals.

      Response: The UK Biobank data (application #41376) were used in our study under a prospective design. We excluded participants if they were: 1) ethnically identified as non-European (n=30,481); 2) diagnosed with type 1 diabetes (n=4,455); 3) diagnosed with diseases associated with bone loss (n=21,560); 4) diagnosed with fracture with known primary diseases (n=7,222) (Supplementary Table 15). Of the remaining 439,982 UK Biobank samples, we focused on the participants diagnosed with T2D within the 10-year period from 1 January 2006 to 31 December 2015, leaving 425,772 participants (with 14,860 type 2 diabetes patients). Each type 2 diabetes patient had a diagnosis date (i.e., the reference date). We first calculated the onset age; then, among the participants who were free of T2D, we selected as referents up to 27 participants (whenever possible) whose age at the reference date matched the onset age (± 3 years). In total, 363,884 non-T2D referents were individually matched within a 6-year age band at the reference date. We prospectively followed these type 2 diabetes patients and referents from the reference date until diagnosis of fracture, death, emigration, or 19 April 2021 (the date of the last fracture diagnosis in the cohort), whichever came first (the mean duration of type 2 diabetes was 8.34 years). Survival time was calculated based on whether the participant had a fracture. If an individual had a fracture, the survival time was calculated as the time of the first diagnosis of fracture minus the reference date. If an individual did not have a fracture, it was defined as the time from the reference date to the earliest of the date of the last fracture diagnosis in the cohort (19 April 2021), death, or emigration. We excluded 25,865 participants whose fracture diagnosis, death or emigration occurred before the reference date, leaving 352,879 participants in the final analysis (13,817 type 2 diabetes patients and 339,062 referents). We identified 16,147 (4.6%) participants with fracture among the 352,879 UK Biobank participants. We have made this clear in the Methods and Results (Page 18, line 400-406; Page 22-23, line 506-523; Page 10, line 231-233).
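
      The survival-time rule described in this response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function and argument names are hypothetical, time is counted in days, and the handling of a participant with an event before the reference date (raising an error) simply mirrors the exclusion described above.

```python
from datetime import date

# End of follow-up as given in the response.
STUDY_END = date(2021, 4, 19)


def follow_up(reference, fracture=None, death=None, emigration=None,
              study_end=STUDY_END):
    """Return (days of follow-up, fracture event flag) for one participant."""
    if fracture is not None:
        # Event observed: time from the reference date to the first fracture.
        end, event = fracture, True
    else:
        # Censored: earliest of study end, death, or emigration.
        candidates = [study_end]
        candidates += [d for d in (death, emigration) if d is not None]
        end, event = min(candidates), False
    if end < reference:
        # Such participants were excluded from the cohort.
        raise ValueError("event or censoring date precedes the reference date")
    return (end - reference).days, event


# Hypothetical participants.
with_fracture = follow_up(date(2010, 1, 1), fracture=date(2010, 1, 31))
died_first = follow_up(date(2010, 1, 1), death=date(2010, 1, 11))
censored_at_end = follow_up(date(2020, 4, 19))
```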

      (6) I find the selection of participants for the analysis to be highly problematic. Supplementary Figure 1 suggests that individuals with a history of fracture were excluded from the study. However, it is well established that prior fracture history is a significant predictor of future fractures. Therefore, the exclusion of participants with prior fractures likely introduced selection bias into the analysis, potentially compromising the study's findings.

      Response: We apologize for using the misleading term “secondary fracture” in the manuscript and figure. What we meant is “the participants diagnosed with fracture with known primary diseases” (n=7,222). Because we want to investigate the effect of diabetes on fracture, we excluded fractures with other known causes. We have changed the term in the manuscript and figure accordingly (Page 18, line 405-406; Supplementary Figure 1).

      Since this study has a prospective design, none of the participants had a fracture at the reference date. We prospectively followed these type 2 diabetes patients and referents from the reference date until diagnosis of fracture, death, emigration, or 19 April 2021 (the date of the last fracture diagnosis in the cohort), whichever came first. Therefore, each study subject had either one fracture or no fracture.

      (7) It is unclear what exactly is meant by "genetically predicted T2D." Could it possibly refer to the polygenic risk score derived from the variants associated with T2D? Clarification is needed regarding the methodology used to determine this "genetically predicted T2D" and its relation to the construction of a polygenic risk score based on T2D-associated variants.

      Response: In this study, we used the weighted genetic risk score (wGRS) method and the two-sample Mendelian Randomization (MR) method to estimate the effect of genetically predicted T2D on fracture. We constructed the wGRS for the individuals in the UK Biobank (294,571 samples with genotypes) as a linear combination of the selected SNPs weighted by their β coefficients on type 2 diabetes: wGRS = β1 SNP1 + β2 SNP2 + … + βn SNPn, where n is the number of instrumental variables. To validate the wGRS results, we also performed two-sample MR analyses that are independent of the UK Biobank samples. We used three two-sample MR approaches: inverse variance weighting (IVW), simple median and MR-PRESSO. Both the wGRS and the two-sample MR methods took genetically predicted type 2 diabetes as the exposure (See Methods Page 18, line 419-422; Page 19, line 439-440).
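
      The wGRS definition above is a weighted sum and can be illustrated in a few lines. In this sketch the function name, the coding of each SNP as a risk-allele dosage (0, 1 or 2), and the example values are assumptions made for illustration; the response does not specify how genotypes were coded.

```python
def weighted_grs(dosages, betas):
    """wGRS = beta_1 * SNP_1 + ... + beta_n * SNP_n for one individual.

    dosages : risk-allele counts per SNP (assumed coding: 0, 1 or 2)
    betas   : per-SNP effect sizes on type 2 diabetes, in the same SNP order
    """
    if len(dosages) != len(betas):
        raise ValueError("need exactly one beta per SNP")
    return sum(b * g for b, g in zip(betas, dosages))


# Hypothetical individual with three instruments: 0*0.10 + 1*0.20 + 2*0.05
score = weighted_grs(dosages=[0, 1, 2], betas=[0.10, 0.20, 0.05])
```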

      (8) My understanding is that the Mendelian Randomization analysis relies on, among others, 2 assumptions: (1) the genetic marker is linked to the exposure (e.g., T2D), and (2) the genetic marker remains independent of the outcome (e.g., fracture) when considering the exposure and all confounding factors. In the authors' study, they identified 10 loci that exhibited associations with both T2D and fracture risk. This finding raises questions about whether the assumptions underlying Mendelian Randomization have been violated?

      Response: You're absolutely right. Because the presence of horizontal pleiotropy could bias the MR estimates, we additionally used the MR pleiotropy residual sum and outlier (MR-PRESSO) method. When we excluded pleiotropic variants using restrictive MR-PRESSO method, the causal association was still detected between type 2 diabetes and fracture (OR=0.967, 95%CI=0.945-0.989, P=0.004) (Page 6, line 146-149).

      (9) The analysis provided in Supplementary Table 10 appears to have certain limitations. From my understanding, the analysis treated fracture and BMD as outcome variables, with T2D regarded as the predictor variable. However, what is of interest is whether the association between T2D and fracture remains significant even after accounting for well-established risk factors for fractures, including BMD. It is crucial to determine whether the association between T2D and fracture is independent of these established risk factors. Therefore, I suggest the authors consider the following 3 models:

      Model 1: fracture ~ age + T2D

      Model 2: fracture ~ age + T2D + BMD

      Model 3: fracture ~ age + T2D + BMD + fracture history + falls

      Response: In our previous analysis, we have adjusted for 7 covariates (including fall history) in the basic model for fracture, i.e.

      fracture ~ T2D + age + sex + BMI + physical activity + HbA1c + medication treatments + fall history (Model 0)

      We have already included “fall history” in the basic model, according to your suggestion, we further considered an additional model for fracture by including BMD as the covariate:

      fracture ~ T2D + age + sex + BMI + physical activity + HbA1c + medication treatments + fall history + BMD (Model 1)

      We cannot include fracture history as the covariate because each study subject either had one fracture or no fracture, as we also answered in Question 6.

      In model 0, we observed a higher risk of fracture in the type 2 diabetes patients in the Cox proportional hazards regression after adjusting for the clinical risk factors including reference age, sex, BMI, physical activity, HbA1c, medication treatments and fall history (HR=1.527, 95%CI=1.385-1.685, P<2.0×10−16). When we additionally controlled for BMD (model 1), we still observed increased risk of fracture in type 2 diabetes (model 1: HR=1.574, 95%CI=1.425-1.739, P<2.0×10−16) (Supplementary Table 11).

      Thank you for your suggestion. We have updated the Methods, Results, and Figures accordingly (Page 11, line 243-245; Page 24, line 539-540; Figure 4A).

      (11) The dichotomization of data presented in Figure 4 is not considered ideal, as this approach often leads to a loss of valuable information. It is strongly recommended that the authors reconsider their data analysis strategy and reanalyze the data using continuous variables, such as BMI and HbA1c, to capture a more nuanced understanding of the relationships involved.

      Response: We agree that dichotomization of data would lead to a loss of valuable information. In model 0 and model 1, we used continuous variables: we adjusted for reference age, sex, BMI (as a continuous variable), physical activity, fall history, HbA1c (as a continuous variable) and medication treatments to analyze the relationship between type 2 diabetes and fracture in the Cox proportional hazards regression. We have updated Figure 4 accordingly.

      In stratified analyses, we used 5 clinical factors secondary to the disease to classify the individuals at risk. For example, an individual with BMI ≤ 25 kg/m², no physical activity, falls in the last year, HbA1c ≥ 47.5 mmol/mol and antidiabetic medication treatment was identified as having 5 risk factors, and so forth. In total, 2,303 patients carried none of the risk factors, 4,128 patients carried one of the risk factors, and 4,252 patients carried at least two risk factors. We found that the effect of type 2 diabetes on the risk of fracture decreased as the number of risk factors secondary to type 2 diabetes decreased. We have made this clearer in the Methods and Results (Page 11, line 255-257; Page 24, line 548-552).
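
      The risk-factor count described above can be sketched as follows. This is only an illustration of the stated thresholds, not the authors' code: the `Patient` fields, the function names and the stratum labels are hypothetical, and the grouping into 0, 1 and at least 2 risk factors follows the three groups reported in the response.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    bmi: float                        # kg/m^2
    physically_active: bool
    fell_last_year: bool
    hba1c: float                      # mmol/mol
    on_antidiabetic_medication: bool


def count_risk_factors(p: Patient) -> int:
    """Count how many of the 5 secondary risk factors a patient carries."""
    return sum([
        p.bmi <= 25,                  # BMI at or below 25 kg/m^2
        not p.physically_active,      # no physical activity
        p.fell_last_year,             # falls in the last year
        p.hba1c >= 47.5,              # HbA1c at or above 47.5 mmol/mol
        p.on_antidiabetic_medication, # antidiabetic medication treatment
    ])


def stratum(n_risk_factors: int) -> str:
    """Map a risk-factor count to one of the three reported groups."""
    if n_risk_factors == 0:
        return "none"
    if n_risk_factors == 1:
        return "one"
    return "two or more"


# The example from the text: all five risk factors present.
example = Patient(bmi=24.0, physically_active=False, fell_last_year=True,
                  hba1c=50.0, on_antidiabetic_medication=True)
print(count_risk_factors(example), stratum(count_risk_factors(example)))
```

      Keeping the count separate from the stratum mapping makes it easy to try other groupings without touching the threshold definitions.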

      (12) The conclusion of the study appears to be somewhat confusing. In the Abstract, the authors initially state that "genetically predicted T2D was associated with higher BMD and lower risk of fracture." However, later on, they write that "the genetically determined T2D might not be associated with a higher risk of fracture." This discrepancy raises uncertainty about the clear take-home message of the study.

      Response: Here we intended to deliver the same message in different words, to avoid repetition. The take-home message is that genetically determined type 2 diabetes might not be associated with a higher risk of fracture, yet the association could still be observed, suggesting that the risk factors secondary to type 2 diabetes might contribute more to the risk of fracture. Therefore, it is important to manage the complications of type 2 diabetes to reduce the risk of fracture, especially the 5 factors we investigated in this study.

      (13) (Apologies if I offend.) It seems that the authors lack comprehensive knowledge of the osteoporosis literature. In the Introduction, their definition of osteoporosis as "an age-related common disease characterized by low bone mass" is inadequate. It would be advisable for the authors to provide a more widely accepted and standard definition of osteoporosis to ensure accuracy and alignment with established definitions in the field.

      Response: Thanks for your suggestion. We have now changed the statement as follows: “Osteoporosis is a common chronic disease characterized by low bone mass and disruption of bone microarchitecture. Fragility fracture is the ultimate outcome of poor bone health”.

      (14) There are several instances in which the authors use non-standard terminologies. For example, the use of the word 'effects' (in "the observed effect of T2D on fracture risk") is inappropriate since this study is observational in nature.

      Response: In statistics, an effect size is a value measuring the strength of the relationship between two variables in a population. We have changed some occurrences of the word “effect” to “effect size” (wherever appropriate) to refer to the hazard ratio for fracture associated with T2D.

      (15) Please provide a reference for "diabetic bone paradox".

      Response: We have cited Botella Martínez et al, Endocrinol Nutr. 2016 and Romero-Díaz et al, Diabetes Ther. 2021 in both Introduction and Discussion (Page 3, line 76-77; Page 13, line 295-297).

      References

      Botella Martinez S, Varo Cenarruzabeitia N, Escalada San Martin J, Calleja Canelas A. The diabetic paradox: Bone mineral density and fracture in type 2 diabetes. Endocrinol Nutr. 2016, 63: 495-501.

      Romero-Diaz C, Duarte-Montero D, Gutierrez-Romero SA, Mendivil CO. Diabetes and bone fragility. Diabetes Ther. 2021, 12: 71-86.

      Zhu XW, Liu KQ, Yuan CD et al. General and abdominal obesity operate differently as influencing factors of fracture risk in old adults. iScience. 2022, 25: 104466.

    2. Reviewer #1 (Public Review):

      Summary:

      The manuscript of Zhao et al. aimed at investigating the relationships between type 2 diabetes, bone mineral density (BMD) and fracture risk using a Mendelian randomization (MR) approach. The authors found that genetically predicted T2D was associated with higher BMD and lower risk of fracture, and suggested a mediated effect of RSPO3 level. Moreover, when stratified by the risk factors secondary to T2D, they observed that the effect of T2D on the risk of fracture decreased when the number of risk factors secondary to T2D decreased.

      Strengths:

      - Important question
      - Manuscript is overall clear and well-written
      - MR analyses have been conducted properly, which include the usage of various MR methods and sensitivity analyses, and likely meet the criteria of the MR-strobe checklist to report MR results.

      Weaknesses:

      - Interpretation of MR findings should be more nuanced given the modest (almost neutral) relationship between T2D and fracture risk in MR

    3. Reviewer #2 (Public Review):

      The authors employed the Mendelian Randomization method to analyze the association between type 2 diabetes (T2D) and fracture using the UK Biobank data. They found that "genetically predicted T2D was associated with higher BMD and lower risk of fracture". Additionally, they identified 10 loci that were associated with both T2D and fracture risk, with the SNP rs4580892 showing the highest signal. While the negative relationship between T2D and fracture has been previously observed, the discovery of these 10 loci adds an intriguing dimension to the findings, although the clinical implications remain uncertain.

      Many thanks for your response, which has clarified my understanding of your paper, and thank you for the additional analyses. I still find the paper challenging to understand due to two different analyses that yielded conflicting results: (a) in the observational analysis, the authors found that type 2 diabetes was associated with both higher BMD and a higher risk of fracture (i.e., a paradox); but (b) in the Mendelian randomization analysis, 'genetically predicted type 2 diabetes' was associated with greater BMD and a lower risk of fracture. I consider that your conclusion is not consistent with the data you presented.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript presented convincing single-cell transcriptomic data of hematopoietic cells and immunocytes in zebrafish kidney marrow and showed that these cells have distinctive responses to viral infection. The findings in this study suggest that zebrafish kidney is a secondary lymphatic organ and hematopoietic stem cells in zebrafish may exhibit trained immunity. This represents a valuable discovery of the unique features of the fish immune system.

      Public Reviews:

      Reviewer #1 (Public Review):

      Hu et al. performed sc-RNA-seq analyses of kidney cells with or without virus infection, vaccines, and vaccines+virus infections from pooled adult zebrafish. They compared within these experimental groups as well as kidney vs spleen. Their analyses identified expected populations but also revealed new hematopoietic stem/progenitor cells (HSPCs), even in the spleen. Their analyses show that HSPCs in the kidney can respond to virus infection differentially and can be trained to recognize the same infection, and the authors argue that the zebrafish kidney can serve as a secondary immune organ. The findings are important and interesting. The manuscript is well written and a pleasure to read. However, there are several issues with figure presentation and figure quality, as well as a lack of clarity in some of the figure legends. Some of the data presentation can be improved for better clarity. It is also important to outline what is conserved and what is unique for fish.

      Major concerns:

      (1) The visualization for several figure panels is very poor. Please provide high-resolution images and larger font sizes for gene lists and X and Y axis labels. This includes Figure 1B, Figure 1-figure supplement 2, Figure 2B-2C, 3A-3D, 4F, 5B, 6G, Figure 6-figure supplement 1B, Figure 6-figure supplement 2, Figure 7B, 8C-8E, Figure 8-figure supplement 1, 10F, 10G-10J, Figure 10-figure supplement 1.

      Response: We apologize for the issue you have pointed out concerning the inadequate visualization of the graphic panels. It is likely that the formatting of the inserted images was altered during the manuscript upload process, leading to a reduction in resolution. However, the graphics uploaded as separate image files, specifically formatted as vector files in PDF format, preserve their high resolution even when zoomed in. Therefore, we kindly request the reviewer to consult the figures in the submission folder for a more detailed examination. We sincerely apologize for any inconvenience caused.

      (2) What are the figures at the end of the manuscript without any figure legends?

      Response: Thank you for bringing this issue to our attention. The last few figures that lack figure legends are actually supplementary figures included in the text. It is possible that they were automatically and repeatedly generated by the submission system. In the revised manuscript, we will take measures to ensure that this issue is avoided.

      (3) It would be better to use a Table to organize the gene signatures that define each unique population of immune cells such as T, B, NK, etc.

      Response: We greatly appreciate the valuable advice provided by the reviewer. As per the reviewer's recommendation, we have included a comprehensive display of all cell types and corresponding gene signatures in Supplementary File 1 of the revised manuscript.

      (4) What are the similarities for HSPC and immune cell populations between fish and man based on this research? It is better to form a table to compare and discuss.

      Response: Following the valuable suggestion of the reviewer, we have included an additional comparative analysis of HSPC and immune cell populations between zebrafish and humans. This information can be found in Supplementary file 8 and in the "Discussion" section (lines 684-685).

      (5) It is highly likely that sex and age are sources of biological variation in how HSPCs respond to virus infections and vaccination. The authors should clearly state the sex and age of the fish in their samples and discuss their results taking these variations into consideration.

      Response: We are grateful for the reviewer's insightful comments. To reduce inter-individual variations, zebrafish samples were selected randomly, with an equal distribution of males and females, during their prime youth period spanning from 3 to 12 months of age. We have included supplementary instructions regarding this selection process in the "Materials and Methods" section (lines 798-799).

      (6) The authors claim that the spleen and kidney share HSPCs. However, their data did not demonstrate this result clearly in Figure 4A. Perhaps they should use different colors to make the overlay more obvious? Or include a table to show which HSPCs are shared between the kidney and spleen? Are they sure these are not just HSPCs seeding the spleen to differentiate into B cells or other immune cells?

      Response: We express our gratitude to the reviewer for raising this issue. In this section, we would like to provide detailed explanations regarding this matter. It is important to note that the figures positioned on both the left and right sides of Figure 4A should be interpreted in a corresponding manner. The left-side figure represents the cellular composition from the spleen (depicted in light red) and the kidney (depicted in blue) across various cell types. Each data point in the left-side figure signifies an individual cell, with the two distinct colors indicating the origin of the cell. On the other hand, the right-side figure displays the varied colors representing different cell types. We want to emphasize that the spatial distribution and proportions of diverse cells in the tSNE plot on the right align consistently with the information presented in the left-side figure. This indicates the correspondence between the two plots and reinforces the validity of our findings. When interpreting the figures on the left and right sides of Figure 4A in a corresponding manner, it becomes evident that the overlapping HSPCs shared by both spleen and kidney predominantly reside in the HSPCs1 group (indicated as cluster 5 in the right-side figure). Additionally, there is also a small distribution of the overlapping HSPCs in the HSPCs2 group (cluster 8 in the right-side figure). These observations underline the presence of overlapping HSPCs in both the kidney and spleen. However, further clarification is required to fully comprehend the intricate correlation between the HSPCs in the kidney and spleen.

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns:

      (1) Figure 3C: why is 10 listed in between 1 and 2?

      Response: We appreciate the reviewer's comment. It is pertinent to mention that the graphs in Figure 3C underwent an automatic sorting process facilitated by the software during the analysis. It should be emphasized that the assigned positions resulting from this sorting process have no bearing on the outcomes of the analysis.

      (2) Figure 4A: difficult to assess the overlay between the kidney and spleen.

      Response: As mentioned above, the overlapping HSPCs shared by both the spleen and kidney are mainly distributed in the HSPCs1 group (cluster 5 in the right-side figure), with a small amount also found in the HSPCs2 group (cluster 8 in the right-side figure).

      (3) Figure 4C: What is this sample, kidney or spleen? Please specify.

      Response: Figure 4C represents an overlay of the spleen and kidney cells depicted in Figure 4B, which includes all cells of the spleen and kidney to show the differentiation trajectory of the cells. As per the reviewer’s suggestion, we have modified the revised figure accordingly.

      (4) The manuscript is very long. Consider focusing on the major findings in the main figures and moving the rest to the supplementary figures.

      Response: This article aimed to comprehensively understand the hematopoietic and immunological traits of zebrafish kidneys through a systematic study. As a result, a comprehensive presentation of the findings has been provided. Given that the figures currently integrated into the main text play a significant role in illustrating the principal outcomes of each section, we kindly request that these figures remain in the main body of the article. This will contribute to sustaining the structural coherence and readability of the manuscript. Thank you for taking our request into consideration.

      Reviewer #2 (Public Review):

      In this manuscript, the authors have meticulously constructed a comprehensive atlas delineating hematopoietic stem/progenitor cell (HSPC) and immune-cell types within the zebrafish kidney, employing single-cell transcriptome profiling analysis. Notably, these cell populations exhibited distinctive responses to viral infection. Intriguingly, the investigation revealed that HSPCs manifest positive reactivities to viral infection, indicating the effective induction of trained immunity in select HSPCs. Furthermore, the study unveiled the capacity for the generation of antigen-stimulated adaptive immunity within the kidney, suggesting a role for the zebrafish kidney as a secondary lymphoid organ. This research elucidates the distinctive features of the fish immune system and underscores the multifaceted biology of the kidney in ancient vertebrates.

      Response: We would like to express our gratitude to the reviewers for their overall positive feedback on our article.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors propose that zebrafish kidney is a dual-functional entity with functionalities of both primary and secondary lymphoid organs. Do the authors have any insights into the coordination of these two functions in the kidneys?

      Response: We are grateful for the valuable comments provided. We believe that the question raised by the reviewer poses an intriguing research topic, as it explores the intricate interaction between the hematopoietic and adaptive immune systems in the renal organ. This exploration holds significant value in understanding the underlying mechanisms. To accomplish this, advanced techniques such as spatiotemporal single-cell transcriptomics and dynamic cell tracking will be utilized to validate the interplay between hematopoietic and immune cell lineages.

      (2) Previous studies have found that fish IgZ/IgT specificity exists in mucosal immune organs. Is the expression of the zebrafish IgZ gene observed in the kidney? If so, is there any correlation with IgZ in mucosal immune organs?

      Response: Thank you for drawing attention to this matter. In our study, we observed the expression of the IgZ gene (ighz) in the zebrafish kidney, as shown in Figure 6. This discovery aligns with previous research and confirms its presence in B cells. While IgZ is known to function as an antibody in mucosal immunity, it remains unclear whether the development of its secretory cells (IgZ+ B cells) originates from the central immune system, such as the kidney. Our results suggest that IgZ+ B cells may have their origin in the kidney and then migrate through the peripheral circulation to carry out their functions in the local mucosal system. This finding is consistent with our earlier research, which demonstrated that zebrafish IgZ is not limited to mucosal immune organs but is also abundantly present in systemic immunity, including peripheral blood (Immunology. 2021; 162(1): 105-120).

      Reference:

      Ji, J. F. et al. Differential immune responses of immunoglobulin Z subclass members in antibacterial immunity in a zebrafish model. Immunology, 2021;162(1), 105-120.

      (3) Did the authors use the zebrafish genome or transcriptome for gene annotation? If the former, which version is used? Please supplement in the "Materials and methods".

      Response: We appreciate the comments provided by the reviewer. In this study, we utilized the zebrafish genome, specifically the GRCz11 version, to annotate genes. The detailed genome data can be found at http://asia.ensembl.org/Danio_rerio/Info/Index. We have incorporated this information into the "Materials and Methods" section of the revised manuscript (line 873).

      (4) Since the authors performed single-cell sequencing on leukocytes, why were several kidney cell types, such as kidney multicellular cells and kidney mucin cells, present in the samples?

      Response: Thanks for the reviewer’s comments. It is important to acknowledge that inadvertent mixing of kidney cells might have occurred during the preparation of single-cell suspensions in our analyzed sample. However, it is pertinent to emphasize that our primary focus was the analysis of immune cells. Therefore, any minor contamination from kidney cells in the analyzed sample is considered negligible and does not significantly affect the main results of our analysis.

      (5) The application of "trained immunity," although currently popular, appears unsuitable in this context, as the current scenario involves a recall with the cognate antigen.

      Response: To our knowledge, trained immunity is generally recognized as the long-term memory of innate immunity based on transcriptional, epigenetic and metabolic modifications of myeloid cells, which are characterized by elevated pro-inflammatory responses to secondary stimuli, whether they are identical or different (Cell Host Microbe. 2012; 12(2): 223-32; Nat Immunol. 2021; 22(1): 2-6; J Clin Invest. 2022;132(7): e158468). Therefore, stimulation with cognate antigens can be considered a form of trained immunity, and we hope that the term will be accepted in this context.

      References:

      (1) Quintin, J. et al. Candida albicans infection affords protection against reinfection via functional reprogramming of monocytes. Cell host & microbe, 2012;12(2), 223-232.

      (2) Divangahi, M. et al. Trained immunity, tolerance, priming and differentiation: distinct immunological processes. Nature immunology, 2021;22(1), 2-6.

      (3) Pernet, E. et al. Training can’t always lead to Olympic macrophages. Journal of Clinical Investigation, 2022;132(7), e158468.

      (6) The discovery that HSPC exhibits trained immune characteristics is novel. Do the authors have any insights into the biological significance of trained immunity in HSPCs concerning immune defense?

      Response: We propose that the generation of trained immunity in HSPCs holds significant physiological implications. This process may expedite the differentiation and activation of specific immune cells upon re-infection, thereby bolstering the body's immune defenses and pathogen clearance. Consequently, it may serve as an intelligent strategy for host defense against pathogens. However, additional research is required to confirm this hypothesis.

      (7) In Figure 13I, the authors used CpG and CpG+TNP-KLH to stimulate zebrafish, but no corresponding experimental method was provided in the "Materials and methods". Please supplement.

      Response: Thanks for the reviewer’s careful reading. We have included corresponding supplementary instructions in the “Materials and methods” section (lines 1011-1018).

      (8) At lines 187-190 in "Results", the authors state that "It's noteworthy that cluster 11 exhibited high expression of genes ......, resembling a unique serpin-secreting cell population". Serpins play a role in diverse immunological processes, including coagulation, inflammation, as well as myeloid and lymphoid cell development. Could this renal cell cluster (kidney mucin cells) potentially harbor immunological functions?

      Response: Given the crucial role of serpins in various immunological processes, secreted serpins from this particular cell cluster likely possess significant immunological functions, suggesting the notable immunological capabilities of this cell group. Consequently, our forthcoming research aims to conduct a more comprehensive investigation of this specific cell population.

      (9) At line 171 in "Results", the number "6" in the "cluster 6" should not be italicized, please correct.

      Response: We have addressed this issue in the revised manuscript (line 170).

      (10) At line 937 in "Materials and methods", the authors isolated T/B lymphocytes through magnetic bead sorting. Please provide information on the source of the antibodies (rabbit anti-TCRα/β or mouse anti-IgM Ab).

      Response: We have included corresponding instructions in the “Materials and methods” section (lines 938-939).

    2. Reviewer #1 (Public Review):

      Hu et al. performed sc-RNA-seq analyses of kidney cells with or without virus infection, vaccines, and vaccines+virus infections from pooled adult zebrafish. They compared within these experimental groups as well as kidney vs spleen. Their analyses identified expected populations but also revealed new hematopoietic stem/progenitor cells (HSPCs), even in the spleen. Their analyses show that HSPCs in the kidney can respond to virus infection differentially and can be trained to recognize the same infection, and the authors argue that the zebrafish kidney can serve as a secondary immune organ. The findings are important and interesting. The manuscript is well written and a pleasure to read.

    3. eLife assessment

      This study characterizes the composition and immune diversity of the zebrafish kidney, the immune organ equivalent to human bone marrow, with convincing single-cell transcriptomic data of hematopoietic cells and immunocytes. The key findings suggest that zebrafish kidney is a secondary lymphatic organ, and that hematopoietic stem cells in zebrafish may exhibit trained immunity, which are the unique features of the fish immune system. This study provides new and valuable insights into the antiviral response in teleost fish, which will be of interest to biologists in general, and to immunologists and cancer researchers in particular.

    4. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors have meticulously constructed a comprehensive atlas delineating hematopoietic stem/progenitor cell (HSPC) and immune-cell types within the zebrafish kidney, employing single-cell transcriptome profiling analysis. Notably, these cell populations exhibited distinctive responses to viral infection. Intriguingly, the investigation revealed that HSPCs manifest positive reactivities to viral infection, indicating the effective induction of trained immunity in select HSPCs. Furthermore, the study unveiled the capacity for the generation of antigen-stimulated adaptive immunity within the kidney, suggesting a role for the zebrafish kidney as a secondary lymphoid organ. This research elucidates the distinctive features of the fish immune system and underscores the multifaceted biology of the kidney in ancient vertebrates.

      Strengths:

      This study, encompassing 13 figures along with supplementary material, distinguishes itself as one of the most comprehensive investigations on this subject to date.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors' goal was to determine the role of O-GlcNAc modification in associative learning in Drosophila using an odor discriminatory task. In particular, they sought to determine the population of O-GlcNAc modified proteins in a region of the brain critical for memory, the mushroom body. They provide compelling evidence that there are brain-region-specific populations of O-GlcNAc modified proteins and that in the mushroom body, proteins involved in translation represent a sizable fraction, larger than elsewhere in the central nervous system. Using expression of a bacterial protein that cleaves O-GlcNAc in the mushroom body, they show both reductions in the levels of this modification and effects on associative learning. Further exploration of new protein synthesis in situ supports the hypothesis that O-GlcNAc modification affects the activity of the translational machinery and could provide the basis for learning deficits when O-GlcNAc levels are compromised. Rescue of deficits resulting from reductions in O-GlcNAc was achieved by over-expression of dMyc, a known regulator of ribosome biogenesis and translation. While the critical role of protein synthesis in learning is long established, as is the regulation of protein synthesis by O-GlcNAc modification, this work connects O-GlcNAc modification in a specialized region of the brain to translation regulation and associative learning. The authors also provide a method for identification of O-GlcNAc modified proteins using a tissue-specific and inducible proximity-labelling method. This will provide a useful tool for further functional studies of O-GlcNAc modification.

      Thank you for summarizing our main findings and recognizing the usefulness of the tool reported here.

      Reviewer #2 (Public Review):

      In this report Yu et al. try to demonstrate how O-GlcNAcylation of ribosomal proteins in the mushroom body (MB) is required for protein synthesis and olfactory learning. The authors develop a new method combining the O-GlcNAc binding activity of an OGlcNAcase (OGN) and TurboID for efficient isolation. This novel method is a useful tool for the identification of O-GlcNAc modified proteins and closely interacting partners. Transgenic expression of this binder allows the authors to perform a profiling that can be time and tissue/region/cell specific. This novel tool is thoroughly tested to show it works in cultured cells, whole Drosophila and in a tissue specific manner expressing it pan-neuronally or specific regions of the brain.

      The authors had previously shown that reduced O-GlcNAcylation through transgenic expression of a highly active OGN affected olfactory learning. In this work the same approach is used to reduce O-GlcNAcylation in different brain regions to show that specific reduction in the adult MB reduced olfactory learning performance. As a control, OGN expression in the ellipsoid body has no effect on olfactory learning. Optic and antennal lobes could not be tested as OGN expression affected olfactory acuity. The most critical part of this finding is the time-specific expression of OGN in the adult in a tissue-specific manner, given the developmental defects it induces with earlier expression. The MB has a widely reported role in associative learning; therefore this finding, while not unexpected, is satisfying.

      Thank you for recognizing the significance of our work.

      Yu et al. use their TurboID-OGA to identify O-GlcNAcylated proteomes in different brain regions. The authors focus on the MB given its role in associative learning and the effect of reduced O-GlcNAcylation in this region. Among other substrates several ribosomal proteins are found to be specifically O-GlcNAcylated to a greater extent in the MB compared to other brain regions.

      To demonstrate the role of MB O-GlcNAcylated ribosomes in protein synthesis, an ex vivo OPP fluorescent assay is used in brains of flies expressing OGN or a mutant form lacking its catalytic and binding activities. The experiment shows reduced protein synthesis in the MB. In addition, the authors can increase protein synthesis by inducing ribosomal biogenesis through the expression of dMyc. Flies expressing dMyc and OGN together do not present the learning deficits of flies carrying only OGN. Protein synthesis in the MB has been previously reported to be required for associative learning (for example Wu et al. 2017 or Lin et al. 2022) and the present results bring further support. A link between ribosomal O-GlcNAcylation and protein synthesis could be a really interesting finding but, unfortunately, the experiments presented in this work are still too preliminary.

      The experiments presented just focus on ribosomal proteins while these are just some of the O-GlcNAcylation substrates in the MB. While a correlation between ribosomal modification and protein synthesis is shown, a demonstration is not provided. Many other mechanisms and O-GlcNAcylation of other substrates could account for the same observations. For example, O-GlcNAcylation has been reported to have a role in protein synthesis affecting different translation initiation factors (Li et al 2018, Shu et al 2022). In vitro experiments where specific O-GlcNAcylation ribosomal components could be targeted are required. In addition, O-GlcNAcylation is also known to modify ribosomal-associated mRNAs. Experiments where specific mutations preventing O-GlcNAcylation in ribosomes could demonstrate a direct link of such ribosomal modifications in olfactory learning.

      We appreciate you raising the crucial point that our data fall short of establishing a causal connection between O-GlcNAcylation of ribosomes and translational activity. We have made significant changes to the text throughout the manuscript to make our description more accurate.

    2. eLife assessment

      This work describes a valuable new technique involving proximity labelling to identify Drosophila proteins modified by GlcNAcylation in subsets of cells in vivo. A solid set of experiments shows that several ribosomal proteins are modified in the fly mushroom body. Consistent with a role for GlcNAcylation of ribosomal proteins in control of memory related translational control, the authors show that perturbation of GlcNAc modification in KCs prevents efficient consolidation of long-term memory.

    3. Reviewer #1 (Public Review):

      The authors' goal was to determine the role of O-GlcNAc modification in associative learning in Drosophila using an odor discriminatory task. In particular, they sought to determine the population of O-GlcNAc modified proteins in a region of the brain critical for memory, the mushroom body. They provide compelling evidence that there are brain-region-specific populations of O-GlcNAc modified proteins and that in the mushroom body, proteins involved in translation represent a sizable fraction, larger than elsewhere in the central nervous system. Using expression of a bacterial protein that cleaves O-GlcNAc in the mushroom body, they show both reductions in the levels of this modification and effects on associative learning. Further exploration of new protein synthesis in situ supports the hypothesis that O-GlcNAc modification affects the activity of the translational machinery and could provide the basis for learning deficits when O-GlcNAc levels are compromised. Rescue of deficits resulting from reductions in O-GlcNAc was achieved by over-expression of dMyc, a known regulator of ribosome biogenesis and translation. While the critical role of protein synthesis in learning is long established, as is the regulation of protein synthesis by O-GlcNAc modification, this work connects O-GlcNAc modification in a specialized region of the brain to translation regulation and associative learning. The authors also provide a method for identification of O-GlcNAc modified proteins using a tissue-specific and inducible proximity-labelling method. This will provide a useful tool for further functional studies of O-GlcNAc modification.

    4. Reviewer #2 (Public Review):

      In this report, Yu et al. try to demonstrate how O-GlcNAcylation of ribosomal proteins in the mushroom body (MB) is required for protein synthesis and olfactory learning. The authors develop a new method combining the O-GlcNAc binding activity of an O-GlcNAcase (OGN) and TurboID for efficient isolation. This novel method is a useful tool for the identification of O-GlcNAc modified proteins and closely interacting partners. Transgenic expression of this binder allows the authors to perform profiling that can be time- and tissue/region/cell-specific. This novel tool is thoroughly tested to show that it works in cultured cells, in whole Drosophila, and in a tissue-specific manner when expressed pan-neuronally or in specific regions of the brain.

      The authors had previously shown that reduced O-GlcNAcylation through transgenic expression of a highly active OGN affected olfactory learning. In this work the same approach is used to reduce O-GlcNAcylation in different brain regions to show that specific reduction in the adult MB reduced olfactory learning performance. As a control, OGN expression in the ellipsoid body has no effect on olfactory learning. Optic and antennal lobes could not be tested, as OGN expression affected olfactory acuity. The most critical part of this finding is the time-specific expression of OGN in the adult in a tissue-specific manner, given the developmental defects it induces with earlier expression. The MB has a widely reported role in associative learning; therefore this finding, while not unexpected, is satisfying.

      Yu et al. use their TurboID-OGA to identify O-GlcNAcylated proteomes in different brain regions. The authors focus on the MB given its role in associative learning and the effect of reduced O-GlcNAcylation in this region. Among other substrates several ribosomal proteins are found to be specifically O-GlcNAcylated to a greater extent in the MB compared to other brain regions.

      To demonstrate the role of MB O-GlcNAcylated ribosomes in protein synthesis, an ex vivo OPP fluorescent assay is used in brains of flies expressing OGN or a mutant form lacking its catalytic and binding activities. The experiment shows reduced protein synthesis in the MB. In addition, the authors can increase protein synthesis by inducing ribosomal biogenesis through the expression of dMyc. Flies expressing dMyc and OGN together do not present the learning deficits of flies carrying only OGN. Protein synthesis in the MB has been previously reported to be required for associative learning (for example Wu et al. 2017 or Lin et al. 2022) and the present results bring further support. A link between ribosomal O-GlcNAcylation and protein synthesis could be a really interesting finding but, unfortunately, the experiments presented in this work are still too preliminary.

      The experiments presented focus only on ribosomal proteins, while these are just some of the O-GlcNAcylation substrates in the MB. While a correlation between ribosomal modification and protein synthesis is shown, a demonstration is not provided. Many other mechanisms and O-GlcNAcylation of other substrates could account for the same observations. For example, O-GlcNAcylation has been reported to have a role in protein synthesis affecting different translation initiation factors (Li et al 2018, Shu et al 2022). In vitro experiments in which O-GlcNAcylation of specific ribosomal components could be targeted are required. In addition, O-GlcNAcylation is also known to modify ribosomal-associated mRNAs. Experiments using specific mutations that prevent O-GlcNAcylation in ribosomes could demonstrate a direct link between such ribosomal modifications and olfactory learning.

    1. eLife assessment

      The authors provide solid evidence that any contribution of oligodendrocyte precursors to the developing cortex from the lateral ganglionic eminence is minimal in scope. The methods used support the conclusions, with some technical concerns that the authors can address with further experimentation. These are considered valuable additions to our understanding of the origins of oligodendrocytes in the forebrain during development.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript the authors re-examine the developmental origin of cortical oligodendrocyte (OL) lineage cells using a combination of strategies, focussing on the question of whether the LGE generates cortical OL cells. The paper is interesting to myelin biologists, the methods used are appropriate and, in general, the study is well-executed, thorough, and persuasive, but not 100% convincing.

      Strengths, weaknesses, and recommendations:

      The first evidence presented that the LGE does not generate OLs for the cortex is that there are no OL precursors 'streaming' from the LGE during embryogenesis, unlike the MGE (Figure 1A). This in itself is not strong evidence, as they might be more dispersed. In fact, in the images shown, there is no obvious 'streaming' from the MGE either. Note that in Figure 1 there is no reference to the star that is shown in the figure.

      The authors then electroporate a reporter into the LGE at E13.5 and examine the fate of the electroporated cells (Figures 1C-E). They find that electroporated cells became neurons in the striatum and in the cortex but no OLs for the cortex. There are two issues with this: first, there is no quantification, which means there might indeed be a small contribution from the LGE that is not immediately obvious from snapshot images. Second, it is unexpected to find labelled neurons in the cortex at all since the LGE does not normally generate neurons for the cortex! Electroporations are quite crude experiments as targeting is imprecise and variable and not always discernible at later stages. For example, in Figure 1D, one can see tdTOM+ cells near the AEP, as well as the striatum. Hence, IUE cannot on its own be taken as proof that there is no contribution of the LGE to the cortical OL population.

      The authors then use an alternative fate-mapping approach, again with E13.5 electroporations (Figure 2). They find only a few GFP+ cells in the cortex at E18 (Figures 2C-D) and P10 (Figure 2E) and these are mainly neurons, not OL lineage cells. Again, there is no quantification.

      Figure 3 is more convincing, but the experiments are incomplete. Here the authors generate triple-transgenic mice expressing Cre in the cortex (Emx1-Cre) and the MGE (Nkx2.1-Cre) as well as a strong nuclear reporter (H2B-GFP). They find that at P0 and P10, 97-98% of OL-lineage cells (SOX10+ or PDGFRA+) in the cortex are labelled with GFP (Figure 3). This is a more convincing argument that the LGE/CGE might not contribute significant numbers of OL lineage cells to the cortex, in contrast to the Kessaris et al. (2006) paper, which showed that Gsh2-Cre mice label ~50% of SOX10+ve cells in the motor cortex at P10. The authors of the present paper suggest that the discrepancy between their study and that of Kessaris et al. (2006) is based on the authors' previous observation (Zhang et al 2020) (https://doi.org/10.1016/j.celrep.2020.03.027) that GSH2 is expressed in intermediate precursors of the cortex from E18 onwards. If correct, then Kessaris et al. might have mistakenly attributed Gsh2-Cre+ lineages to the LGE/CGE when they were in fact intrinsic to the cortex. However, the evidence from Zhang et al 2020 that GSH2 is expressed by cortical intermediate precursors seems to rest solely on their location within the developing cortex; a more convincing demonstration would be to show that the GSH2+ putative cortical precursors co-label for EMX1 (by immunohistochemistry or in situ hybridization), or that they co-label with a reporter in Emx1-driven reporter mice. This demonstration should be simple for the authors as they have all the necessary reagents to hand. Without these additional data, the assertion that GSX2+ve cells in the cortex are derived from the cortical VZ relies partly on an act of faith on the part of the reader.

      Note that Tripathi et al. (2011, "Dorsally- and ventrally-derived oligodendrocytes have similar electrical properties but myelinate preferred tracts." J. Neurosci. 31, 6809-6819) found that the Gsh2-Cre+ OL lineage contributed only ~20% of OLs to the mature cortex, not ~50% as reported by Kessaris et al. (2006). If it is correct that these Gsh2-derived OLs are from the cortical anlagen as the current paper claims, then it would raise the possibility that the ventricular precursors of GSH2+ intermediate progenitors are not uniformly distributed through the cortical VZ but are perhaps localized to some part of it. Then the contribution of Gsh2-derived OLs to the cortical population could depend on precisely where one looks relative to that localized source. It would be a nice addition to the current manuscript if the authors could explore the distribution of their GSH2+ intermediate precursors throughout the developing cortex. In any case, Tripathi et al. (2011) should be cited.

      Finally, the authors deleted Olig2 in the MGE and found a dramatic reduction of PDGFRA+ and SOX10+ cells in the cortex at E14 and E16 (Figure 4A-F). This further supports their conclusion that, at least at E16, there is no significant contribution of OLs from ventral sources other than the MGE/AEP. This does not exclude the possibility that the LGE/CGE generates OLs for the cortex at later stages. Hence, on its own, this is not completely convincing evidence that the LGE generates no OL lineage cells for the cortex.

    3. Reviewer #2 (Public Review):

      Traditional thinking has been that cortical oligodendrocyte progenitor cells (OPCs) arise in the development of the brain from the medial ganglionic eminence (MGE), lateral/caudal ganglionic eminence (LGE/CGE), and cortical radial glial cells (RGCs). Indeed, a landmark study demonstrated some time ago that cortical OPCs are generated in three waves, starting with a ventral wave derived from the medial ganglionic eminence (MGE) or the anterior entopeduncular area (AEP) at embryonic day E12.5 (Nkx2.1+ lineage), followed by a second wave of cortical OLs derived from the lateral/caudal ganglionic eminences (LGE/CGE) at E15.5 (Gsx2+/Nkx2.1- lineage), and then a final wave occurring at P0, when OPCs originate from cortical glial progenitor cells (Emx1+ lineage). However, in this paper the authors challenge the idea that cortical progenitors are produced from the LGE. They previously found that cortical glial progenitor cells also express Gsx2, suggesting that this may not have been the best marker for LGE-derived OPCs. They have used fate mapping experiments and lineage analyses to suggest that cortical OPCs do not derive from the LGE.

      Strengths:

      (1) The data is high quality and very well presented, and experiments are thoughtful and elegant to address the questions being raised.

      (2) The authors use two elegant approaches to lineage trace LGE-derived cells, namely fate mapping of LGE-derived OPCs by combining IUE (intrauterine electroporation) with a Cre recombinase-dependent IS reporter, and lineage tracing of LGE-derived OPCs by combining IUE with the PiggyBac transposon system. Both approaches show convincingly that labelled LGE-derived cells that enter the cortex do not express OPC markers, but that those co-labelling with oligodendrocyte markers remain in the striatum.

      (3) The authors then use further approaches to confirm their findings. Firstly they lineage trace Emx1-Cre; Nkx2.1-Cre; H2B-GFP mice. Emx1-Cre is expressed in cortical RGCs and Nkx2.1-Cre is specifically expressed in MGE/AEP RGCs. They find that close to 98% of OPCs in the cortex co-label with GFP at later times, suggesting the contribution of OPCs from LGE is minimal.

      (4) They use one further approach to strengthen the findings yet further. They cross Nkx2.1-Cre mice with Olig2 F/+ mice to eliminate Olig2 expression in the SVZ/VZ of the MGE/AEP (Figures 4A-B). The generation of MGE/AEP-derived OPCs is inhibited in these Olig2-NCKO conditional mice. They find that the number of cortical progenitors at E16.5 is reduced 10-fold in these mice, suggesting that LGE contribution to cortical OPCs is minimal.

      Weaknesses:

      (1) The authors use IUE in experiments mentioned in point 2 of 'Strengths' above (Figures 1 and 2) and claim that the reporter was delivered specifically into LGE VZ at E13.5 using this IUE. It would be nice to see some sort of time course of delivery after IUE to show the reporter is limited to LGE VZ at early times post-IUE.

      (2) In the experiments mentioned in point 3 of 'Strengths' (Figure 3), statistical analysis showed that only approximately 2% of OPCs were GFP-negative cells. This 2% could possibly be derived from the LGE/CGE so does not totally rule out that LGE contributes some cortical OPCs.

      (3) In the experiments mentioned in point 4 of 'Strengths' (Figure 4), they do still find cortical OPCs at E16.5 in the Olig2-NCKO conditional mice. It is unclear whether this is due to the recombination efficiency of the CRE enzyme not being 100%, or whether there is some LGE contribution to the cortical OPCs.

      Impact of Study:

      The authors show elegantly and convincingly that the contribution of the LGE to the pool of cortical OPCs is minimal. The title should perhaps be that the LGE contribution is minimal rather than no contribution at all, as they are not able to rule out some small contribution from the LGE. These findings challenge the traditional belief that the LGE contributes to the pool of cortical OPCs. The authors do show that the LGE does produce OPCs, but that they tend to remain in the striatum rather than migrate into the cortex. It is interesting to wonder why their migration patterns may be different from the MGE-derived OPCs which migrate to the cortex. The functional significance of these different sources of OPCs for adult cortex in homeostatic or disease states remains unclear though.

    1. eLife assessment

      This useful study presents data regarding the presence of synaptic proteins in the extracellular vesicle pool present in the blood of Parkinson's patients and non-Parkinson's neurological outpatients, trying to correlate changes in such levels with the progression of Parkinson's symptoms. The results are semi-quantitative and preliminary, suggesting that these biomarkers could be used in the follow-up of a specific group of Parkinson's patients. The evidence is incomplete at this point, and more quantitative approaches are required to propose this correlation. The isolation of extracellular vesicles was appropriate as revealed by their sizes, but they are not exclusively of neuronal origin. The presented approach is not ready to be used in the clinical setting.

    1. eLife assessment

      This study unveils important mechanistic insights into postnatal lung development and bronchopulmonary dysplasia (BPD) pathology. Using two BPD models enhances our comprehension of the disease, utilizing compelling evidence from single-cell sequencing and flow cytometry, revealing a myofibroblast loss. Pharmacological and genetic approaches convincingly argue against the presumed increase in TGFb signaling causing alveolar simplification; instead, it appears to be a compensatory response. The identified weakness is the absence of validation in tissue, leaving the question unanswered regarding whether myofibroblast loss is due to a lack of myofibroblast proliferation or myofibroblast differentiation/specification.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, the authors used both the commonly used neonatal hyperoxia model as well as cell-type-specific genetic inactivation of Tgfbr2 models to study the basis of BPD. The bulk of the analyses focus on the mesenchymal cells. Results indicate impaired myofibroblast proliferation, resulting in decreased cell number. Inactivation of Ect2 in Pdgfra-lineaged cells, preventing cytokinesis of myofibroblasts, led to alveolar simplification. Together, the findings demonstrate that disrupted myofibroblast proliferation is a key contributor to BPD pathogenesis.

      Strengths:

      Overall, this comprehensive study of BPD models advances our understanding of the disease. The data are of high quality.

      Weaknesses:

      The critiques are mostly minor and can be addressed without extensive experimentation.

    3. Reviewer #2 (Public Review):

      Summary:

      In this study, the authors systematically explore the mechanism(s) of impaired postnatal lung development with relevance to BPD (bronchopulmonary dysplasia) in two murine models of 'alveolar simplification', namely hyperoxia and epithelial loss of TGFb signaling. The work presented here is of great importance, given the limited treatment options for a clinical entity frequently encountered in newborns with high morbidity and mortality that is still poorly understood, and the unclear role of TGFb signaling, its signaling levels, and its cellular effects during secondary alveolar septum formation, a lung structure generating event heavily impacted by BPD. The authors show that hyperoxia and epithelial TGFb signaling loss have similar detrimental effects on lung structure and mechanical properties (emphysema-like phenotype) and are associated with significantly decreased numbers of PDGFRa-expressing cells, the major cell pool responsible for generation of postnatal myofibroblasts. They then use a single-cell transcriptomic approach combined with pathway enrichment analysis for both models to elucidate common factors that affect alveologenesis. Using cell communication analysis (NicheNet) between epithelial cells and myofibroblasts, they confirm increased projected TGFb-TGFbR interactions and decreased projected interactions for PDGFA-PDGFRA and other key pathways, such as SHH and WNT. Based on these results, they go on to uncover in a sequence of experiments that, surprisingly, increased TGFb appears reactive to postnatal lung injury and rather protective/homeostatic in nature, and the authors establish the requirement for alpha V integrins, but not the subtype alphaVbeta6, a known activator of TGFb signaling that has been implicated in adult lung fibrosis.

      The authors then go beyond the TGFb axis evaluation to show that mere inhibition of proliferation by conditional KO of Ect2 in the Pdgfra lineage results in alveolar simplification, pointing out the pivotal role of PDGFRa-expressing myofibroblasts for normal postnatal lung development.

      Strengths:

      (1) The approach, which includes both pharmacologic and mechanistically relevant transgenic interventions that produced consistent results, lends robustness to the results presented here.

      (2) Further adding to this robustness is the use of moderate levels of hyperoxia at 75% FiO2, which is less extreme than 100% FiO2 frequently used by others in the field, and therefore favors the null hypothesis.

      (3) The prudent use of advanced single-cell analysis tools, such as NicheNet to establish cell interactions through the pathways they tested and the validation of their scRNA-seq results by analysis of two external datasets. Delineation of the complexity of signals between different cell types during normal and perturbed lung development, such as attempted successfully in this study, will yield further insights into the underlying mechanism(s).

      (4) The combined readout of lung morphometric (MLI) and lung physiologic parameters generates a clinically meaningful readout of lung structure and function.

      (5) The systematic evaluation of TGFb signaling better determines the role in normal and postnatally-injured lungs.

      Weaknesses:

      (1) While the study convincingly establishes the effect of lung injury on the proliferation of PDGFRa-expressing cells, differentiation is equally important. Characterization of PDGFRa expressing cells and tracking the changes in the injury models in the scRNA analysis, a key feature of this study, would benefit from expansion in this regard. PDGFRa lineage gives rise to several key fibroblast populations, including myofibroblasts, lipofibroblasts, and matrix-type fibroblasts (Collagen13a1, Collagen14a1). Lipofibroblasts constitute a significant fraction of PDGFRa+ cells, and expand in response to hyperoxic injury, as shown by others. Collagen13a1-expressing fibroblasts expand significantly under both conditions (Figure 3), and appear to contain a significant number of PDGFRa-expressing cells (Suppl Fig.1). Effects of the applied injuries on known differentiation markers for these populations should be documented. Another important aspect would be to evaluate whether the protective/homeostatic effect of TGFb signaling is supporting the differentiation of myofibroblasts. Postnatal Gli1 lineage gains expression of PDGFRa and differentiation markers, such as Acta2 (SMA) and Eln (Tropoelastin). Loss of PDGFRa expression was shown to alter Elastin and TGFb pathway-related genes. TGFb signaling is tightly linked to the ECM via LTBPs, Fibrillins, and Fibulins. An additional analysis in the aforementioned regard has great potential to more specifically identify the cell type(s) affected by the loss of TGFb signaling and allow analysis of their specific transcriptomic changes in response and underlying mechanism(s) to postnatal injury.

      (2) Of the three major lung abnormalities encountered in BPD, the authors focus on alveolarization impairment in great detail, to a very limited extent on inflammation, and not on vascularization impairment. However, this would be important not only to better capture the established pathohistologic abnormalities of BPD, but also it is needed since the authors alter TGFb signaling, and inflammatory and vascular phenotypes with developmental loss of TGFb signaling and its activators have been described. Since the authors make the point about the absence of inflammation in their BPD model, it will be important to show the evidence.

      (3) Conceptually it would be important that in the discussion the authors reconcile their findings in the experimental BPD models in light of human BPD and the potential implications it might have on new ways to target key pathways and cell types for treatment. This allows the scientific community to formulate the next set of questions in a disease-relevant manner.

    4. Reviewer #3 (Public Review):

      Summary:

      This paper seeks to understand the role of alveolar myofibroblasts in abnormal lung development after saccular stage injury.

      Strengths:

      Multiple models of neonatal injury are used, including hyperoxia and transgenic models that target alveolar myofibroblasts.

      Weaknesses:

      There are several weaknesses that leave the conclusions significantly undersupported by the data as presented:

      (1) There is no validation of the decreased number of myofibroblasts suggested by flow cytometry/scRNAseq at the level of the tissue. Given that multiple groups have reported increased myofibroblasts (aSMA+ fibroblasts) in humans with BPD and in mouse models, demonstrating a departure from prior findings with tissue validation in the mouse models is essential. There are many reasons for decreased numbers of a subpopulation by flow cytometry, most notably that injured cells may be less likely to survive the cell sorting process.

      (2) The hallmark genes used to define the subpopulations are not given in the single-cell data. As the definition of fibroblast subtypes remains an area of unsettled discussion in the field, it is possible that the decreased number reflects classification and not a true difference. Tissue validation and more transparency in the methods used for single-cell sequencing would be critical here.

      (3) There is an oversimplification of neonatal hyperoxia as a "BPD model" used here without a reference to detailed prior work demonstrating that the degree and duration of hyperoxia dramatically change the phenotype. For example, Morty et al have shown that hyperoxia of 85% or more for 14 days is required to demonstrate the septal thickening observed in severe human BPD. Other than one metric of lung morphometry (MLI, which is missing units on the y-axis) and flexivent data, the authors have not fully characterized this model. Prior work comparing 75% O2 exposure for 5, 8, or 14 days shows that in the 8-day exposed group (similar to the model used here), much of the injury was reversible. What evidence do the authors have that hyperoxia alone is an accurate model of the permanent structural injury seen in human BPD?

      (4) Thibeault et al published a single-cell analysis of neonatal hyperoxia in 2021, with seemingly contrasting findings. How does this dataset compare in context?

    1. eLife assessment

      Herein, Xie and colleagues use a hamster model to show that Leptospira infection leads to gut pathology, an altered gut microbiota, and increased translocation. A combined use of antibiotics and LPS neutralization prolonged survival, providing a potential new therapeutic approach. This fundamental study uses compelling methods to provide new insights into this emerging disease, which could be dissected further in future studies aimed at gaining mechanistic insight and assessing the translational relevance of these discoveries.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, Xie and colleagues aimed to explore the function and potential mechanisms of the gut microbiota in a hamster model of severe leptospirosis. The results demonstrated that Leptospira infection was able to cause intestinal damage and inflammation. Leptospira infection promoted an expansion of Proteobacteria, increased gut barrier permeability, and elevated LPS levels in the serum. Thus, they proposed an LPS-neutralization therapy which, combined with antibody therapy or antibiotic therapy, improved the survival rate of moribund hamsters.

      Strengths:

      The work is well-designed and the story is interesting to me. The gut microbiota is essential for immunity and systemic health. Many life-threatening pathogens, such as SARS-CoV-2 and the agents of other gut-damaging infections, have the potential to disrupt the gut microbiota in the later stages of infection, causing some harmful gut microbiota-derived substances to enter the bloodstream. It is emphasized that, in addition to exogenous pathogens, harmful substances of intestinal origin should also be considered in critically ill patients.

      Weaknesses:

      (1) There are many serotypes of Leptospira; it is suggested to test another pathogenic serotype of Leptospira to validate the proposed therapy.

      (2) The authors should explain why the infective doses of leptospires were not consistent across different studies.

      (3) In the discussion section, it would be better to add a discussion of the potential link between the natural route of infection and leptospirosis.

      (4) Line 231, what is the solvent of thioglycolate?

      (5) Lines 962-964, there are some mistakes; the text does not match Figure 7.

    3. Reviewer #2 (Public Review):

      Severe leptospirosis in humans and some other mammals is often fatal. In this article, the authors explored the role of the gut microbiota in severe leptospirosis. They found that Leptospira infection promoted a dysbiotic gut microbiota with an expansion of Proteobacteria, and that LPS neutralization therapy synergized with antileptospiral therapy to significantly improve the survival rates in severe leptospirosis. This study is well-organized and has potentially important clinical implications not only for severe leptospirosis but also for other gut-damaging infections.

    4. Reviewer #3 (Public Review):

      Summary:

      This is a well-prepared manuscript that presents interesting research results. The only defect is that the authors should further revise the English language.

      Strengths:

      The omics method produced unbiased results.

      Weaknesses:

      LPS neutralization is not a new method for treating leptospiral infection.

    1. eLife assessment

      This manuscript provides potentially valuable information suggesting that the earliest appearing T-cells during ontogeny may have properties that are fundamentally distinct from those appearing later in life. At this stage, weaknesses in the experimental design and data interpretation provide inadequate support for the conclusions. With modifications, the paper should be of interest to those interested in T-cell development.

    2. Reviewer #1 (Public Review):

      Summary:

      The manuscript by Rowell et al aims to identify differences in TCR recombination and selection between foetal and adult thymus in mice. The authors sequenced the unpaired bulk TCR repertoire in foetal and adult mice thymi and studied both TCRB and TCRa characteristics in the double positive (DP, CD4+CD8+) and single positive (SP4 CD4+CD8-CD3+ and SP8 CD4-CD8+CD3+) populations. They identified age-related differences in TCRa and TCRB segment usage, including a preferential bias toward 3' TRAV and 5' TRAJ rearrangements in foetal cells compared to adults, which showed a greater preference for 5' TRAV segments. By depleting the thymocyte population in adult thymi using hydrocortisone, the authors demonstrated that the repertoire became more foetal-like; they therefore argue that the preferential 5' TRAV rearrangements in adults may result from prolonged/progressive TCRa rearrangements in the adult thymocytes. In line with previous studies, the authors demonstrate that the foetal TCR repertoire was less diverse, less evenly distributed and had fewer non-template insertions while containing more clonal expansions. In addition, the authors claim that changes in V-J usage and CDR1 and CDR2 in the DP vs SP repertoires indicated that positive selection of foetal thymocytes is less dependent on interactions with the MHC.

      Strengths:

      Overall, the manuscript provides an extensive analysis of the foetal and adult TCR repertoire in the thymus, resulting in new insights in T cell development in foetal and adult thymi.

      Weaknesses:

      Three major concerns arise: 1) the authors have analysed TCR repertoires of only 4 foetal and 4 adult mice; considering the high spread, the study may have been underpowered. 2) Gating strategies are missing and 3) the manuscript is very technical and clearly aimed at a highly specialised audience with expertise in both thymocyte development and TCR analysis. The authors are recommended to provide schematics of the TCR rearrangements/their findings and to include a summary of the conclusions/implications of their findings at the end of each results section rather than waiting until the discussion. This will help the reader to interpret their findings while reading the results.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors comprehensively assess differences in the TCRB and TCRA repertoires in the fetal and adult mouse thymus by deep sequencing of sorted cell populations. For TCRB and TCRA they observed biased gene segment usage and less diversity in fetal thymocytes. The TCRB repertoire was less evenly distributed and displayed more evidence of clonal expansions and repertoire sharing among individuals in fetal thymocytes. In both fetal and adult thymocytes they show skewing of V segment (CDR1-2) repertoires in CD4 and CD8 as compared to DP thymocytes, which they attribute to MHC-I vs MHC-II restriction during positive selection. However, the authors assess these effects to be weaker in fetal thymocytes, suggesting weaker MHC-restriction. They conclude that in multiple respects fetal repertoires are distinct from and more innate-like than adult.

      Strengths:

      The analyses of the F18.5 and adult thymic repertoires are comprehensive with respect to the cell populations analyzed and the diversity of approaches used to characterize the repertoires. Because repertoires were analyzed in pre- and post-selection thymocyte subsets, the data offer the potential to assess repertoire selection at different developmental stages. The analysis of repertoire selection in fetal thymocytes may be unique.

      Weaknesses:<br /> (1) Problematic experimental design and some lack of familiarity with prior work have resulted in highly problematic interpretations of the data, particularly for TCRA repertoire development.<br /> The authors note fetal but not adult thymocytes to be biased towards usage of 3' V segments and 5' J segments. It should be noted that these basic observations were made 20 years ago using PCR approaches (Pasqual et al., J. Exp. Med. 196:1163 (2002)), and even earlier by others. The authors also note that in fetal thymus this bias persists after positive selection, and it can be reproduced in adults during recovery from hydrocortisone treatment. The authors conclude that there are fewer rounds of sequential TCRA rearrangements in the fetal thymus, perhaps due to less time spent in the DP compartment in fetus versus adult. However, the repertoire difference noted by the authors does not require such an explanation. What the authors are analyzing in the fetus is the leading edge of a synchronous wave of TCRA rearrangements, whereas what they are analyzing in adults is the unsynchronized steady state distribution. It is certainly true, as has been shown previously, that the earliest TCRA rearrangements use 3' TRAV and 5' TRAJ segments. But analysis of adult thymocytes has shown that the progression from use of 3' TRAV and 5' TRAJ to use of 5' TRAV and 3' TRAJ takes several days (Carico et al., Cell Rep. 19:2157 (2017)). The same kinetics, imposed on fetal development, would put development of a more complete TCRA repertoire at or shortly after birth. In fact, Pasqual showed exactly this type of progression from F18 through D1 after birth, and could reproduce the progression by placing F16 thymic lobes in FTOC. It is not appropriate to compare a single snapshot of a synchronized process in early fetal thymocytes to the unsynchronized steady state situation in adults. 
In fact, the authors' own data support this contention, because when they synchronize adult thymocytes by using hydrocortisone, they can replicate the fetal distribution. Along these lines, the fact that positive selection of fetal thymocytes using 3' TRAV and 5' TRAJ segments occurs within 2 days of thymocyte entry into the DP compartment does not mean that DP development in the fetus is intrinsically rapid and restricted to 2 days. It simply means that thymocytes bearing an early rearranging TCR can be positively selected shortly after TCR expression. The expectation would be that those DP thymocytes that had not undergone early positive selection using a 3' TRAV and a 5' TRAJ would remain longer in the DP compartment and continue the progression of TCRA rearrangements, with the potential for selection several days later using more 5' TRAV and 3' TRAJ.<br /> (2) The authors note 3' V and 5' J biases for TCRB in fetal thymocytes. The previously outlined concerns about interpreting TCRA repertoire development do not directly apply here. But it would be appropriate to note that by deep sequencing, Sethna (PNAS 114:2253 (2017)) identified skewed usage of some of the same TRBV gene segments in fetal versus adult. It should also be noted that Sethna did not detect significantly skewed usage of TRBJ segments. Regardless, one might question whether the skewed usage of TRBJ segments detected here should be characterized as relating to chromosomal location. There are two logical ways one can think about chromosomal location of TRBJ segments - one being TRBJ1 cluster vs TRBJ2 cluster, the other being 5' to 3' within each cluster. The variation reported here does not obviously fit either pattern. Is there a statistically significant difference in aggregate use of the two clusters? 
There is certainly no clear pattern of use 5' to 3' across each cluster.<br /> (3) The authors show that biases in TCRA and TCRB V and J gene usage between fetal and adult thymocytes are mostly conserved between pre- and post-selection thymocytes (Fig 2). In striking contrast, TCRA and TCRB combinatorial repertoires show strong biases pre-selection that are largely erased in post-selection thymocytes (Fig 3). This apparent discrepancy is not addressed, and interpretation is challenging.<br /> (4) The observation that there is a higher proportion of nonproductive TCRB rearrangements in fetal thymus compared to adult is challenging to interpret, given that the results are based upon RNA sequencing and so are unlikely to reflect the ratio in genomic DNA due to processes like nonsense-mediated decay (NMD).<br /> (5) An intriguing and paradoxical finding is that fetal DP, CD4 and CD8 thymocytes all display greater sharing of TCRB CDR3 sequences among individuals than do adults (Fig 5DE), whereas DP and CD8 thymocytes are shown to display greater CDR3 amino acid triplet motif sharing in adults (with a similar trend in CD4). The authors attribute high amino acid triplet sharing to selection of recurrent motifs by contact with pMHC during positive selection. But this interpretation seems highly problematic because the difference between fetal and adult thymocytes is dramatic even in unfractionated DP thymocytes, the vast majority of which have not yet undergone positive selection. How then to explain the differences in CDR3 sharing visualized by the different approaches?<br /> (6) The authors conclude that there is less MHC restriction in fetal thymocytes, based on measures of repertoire divergence from DP to CD4 and CD8 populations (Fig. 6). But the authors point to no evidence of this in analysis of TRBV usage, either by PC or heatmap analyses (A,B,D). 
The argument seems to rest on PC analysis of TRAV usage (Fig S6), despite the fact that dramatic differences in the SP4 and SP8 repertoires are readily apparent in the fetal thymocyte heatmaps. The data do not appear to be robust enough to provide strong support for the authors' conclusion.

    4. Reviewer #3 (Public Review):

      Summary:<br /> This study provides a comparison of TCR gene segment usage between foetal and adult thymus.

      Strengths:<br /> Interesting computational analyses were performed to identify differences in TCR gene usage within unpaired TCRa and TCRb chains between foetal and adult thymus.

      Weaknesses:<br /> This study was significantly lacking in insight into, and interpretation of, what the analysed data actually mean for the biology. The dataset discussed in the paper is from only two experiments: one comparing foetal and adult thymi from 4 mice per group and another involving hydrocortisone treatment. The paper uses a TCR sequencing methodology that sequences the TCR alpha and beta chains separately (unpaired), meaning that the true identity of the TCR heterodimer is lost. This also has the added problem of overestimating clonality and underestimating diversity.<br /> The limited detail in the methods section also restricts the ability of readers to properly interpret the dataset. What sex were the mice used? Are there any sex differences? What were the animal ethics approvals for the study?

    1. eLife assessment

      This important work substantially advances our understanding of how resistant leukemia can arise without changes in mutational patterns by displaying epigenetic changes. The evidence supporting the conclusions is compelling, with rigorous genomic assays done on primary samples and state-of-the-art microscopy. The work will be of broad interest to hematologists and cancer biologists.

    2. Reviewer #1 (Public Review):

      Analysis of a sizable number of matched primary AML samples from diagnosis and relapse was done with ATAC-seq and showed that epigenetic changes are seen at relapse. Meta-analysis of multiple studies showed that relapse is not associated generally with changes in mutational burden. The authors also performed clonal tracking with mitochondrial clones and show that heterogeneity in clonal expansions is seen in various cases. Overall, these are novel findings with translational relevance.

    3. Reviewer #2 (Public Review):

      In the manuscript entitled "Convergent Epigenetic Evolution Drives Relapse in Acute Myeloid Leukemia", Majeti and colleagues describe patterns of chromatin accessibility alterations at relapse in AML. Through an analysis of publicly available datasets as well as their own samples, they show that a subset of AML cases show significant changes in chromatin accessibility despite showing little to no change in clonal composition. Evaluation of predicted changes in gene expression based on chromatin accessibility identifies common differentially expressed pathways at relapse and indicates that blasts are more immature at relapse. Using mitochondrial single-cell ATAC-seq, the authors identify "mitoclones" and observe that mitochondrially-defined clones exhibit more similar chromatin accessibility at relapse relative to diagnosis. Based on these data, the authors conclude that epigenetic evolution is a feature of relapsed AML and that convergent epigenetic evolution can occur following induction chemotherapy.

      The strengths of this study are its novelty in AML and its rigorous use of single-cell ATAC-seq and mitochondrial single-cell ATAC-seq to identify chromatin accessibility patterns in AML blasts at diagnosis and relapse, including in clonally related blasts determined by mitochondrial DNA sequencing. That epigenetic changes contribute to relapse and therapy resistance, and that blasts at relapse are less differentiated, are not new ideas, but these studies rigorously demonstrate these concepts in AML patient samples. These insights are important since they have the potential to identify novel targets that could be exploited in combination with induction chemotherapy.

      While these findings advance our understanding of potential mechanisms of disease relapse/therapy resistance in AML, some of the conclusions are less well supported due to the lack of information on clonally unstable cases. Given that 60-70% of AML cases are not clonally stable following chemotherapy, this raises questions regarding the broad applicability of the authors' proposed model. Indeed, it remains unclear why only a subset of AML cases shows stable clonal patterns.

    4. Reviewer #3 (Public Review):

      This manuscript reports a detailed genetic and epigenomic analysis of diagnostic and relapsed AML. The study is mostly correlative, and some of the initial findings, such as the stability of mutations in epigenetic regulators between diagnosis and relapse and the loss of mutations in signaling pathway modulators such as FLT3 and NRAS, are not novel.

      The authors show that in a large fraction (approximately half) of the relapsed AMLs they study, there are no alterations in the AML driver mutations. The authors conclude that this indicates that these patients show non-genetic mechanisms of relapse, for which the authors embark on a series of epigenomic experiments to try and pin down the correlative or causative epigenetic mechanisms. In 9 (out of 25) patient samples with stable driver mutations (i.e. no change in clonality or novel AML driver mutation accumulation) the study shows that there is high epigenetic variability as measured by chromatin accessibility changes and that these changes resemble a less differentiated state in the relapsed compared to the diagnostic sample. The manuscript makes some key observations: 1) non-genetic mechanisms are likely to account for relapse in a substantial proportion of patients; 2) some of the clones that emerge following relapse are likely present in prior diagnosis samples, indicating that chemotherapy selects for them; 3) of note, the authors also look at the LSC and non-LSC compartments and show that the LSC compartment is more rigid in terms of epigenetic evolution towards relapse than the non-LSC cells; 4) using a small number of patients (but justifiable since the assays used are rigorous and demanding), the authors present the most interesting finding of the study: that epigenetic evolution of relapse in several different AML patients seems to be convergent.<br /> This is based on the epigenetic similarities in clones (as defined by mitochondrial ATAC-seq) between different epigenetic relapsed clones, even though they were distinct at diagnosis. Thus, this study has several important observations. Some of these observations are incremental (it has already been shown that epigenetic mechanisms drive relapse in AML), but several are not. 
I think this study - although descriptive and not showing causal relationships - is an important study for advancing our understanding of AML relapse.