  1. Jan 2023
    1. Author Response

      Reviewer #3 (Public Review):

      In this manuscript, Kim et al. use a deep generative model (a variational autoencoder, VAE, previously applied to adult data) to characterize neonatal-fetal functional brain development. The authors suggest that this approach is suitable given the rapid, non-linear development taking place in the human brain across this period. Using two large neonatal datasets and one fetal dataset, they report that the resulting latent variables can improve the characterization of prenatal-neonatal development patterns and provide stable age prediction, and that the decoder can reveal resting-state networks. The study uses already accessible public datasets, and the methods have also been made available.

      The manuscript is clearly written, the figures are excellent, and the application in this group is novel. The methods are generally appropriate, although there are some methodological concerns that I think would be important to address. Although the authors demonstrate that the methods are broadly generalisable across study populations, I am unsure about the general interest of the work beyond the application of their previously described VAE approach to a new population, and about what new insight this offers into how the human brain develops. This is a particular consideration given that the major results are age prediction (which is easily done with various imaging measures, including something as simple as whole-brain volume) and the recapitulation of known patterns of functional activity in neonates. As such, the work will be of interest to researchers working on fMRI analysis methods and deep learning, but perhaps less so to a wider neuroscience/clinical readership.

      Specific comments:

      1) (M1) If I understand correctly, the method takes the functional data after volume registration into template space and then projects these data onto the surface. Given the complexities of the changing morphology of the developing brain, would it not be preferable to perform the standard-space alignment in surface space (rather than projecting onto the surface afterwards)? This would certainly help with one of the concerns expressed by the authors: the "smoothing" in the youngest fetuses that leads to a negative relationship between age and performance.

      While projecting onto the cortical surface has its advantages, as suggested in ref. 18, several studies have also shown that, with careful registration such as that used in the current study, volumetric registration can yield comparable performance (ref. 19). Regardless, we did attempt to generate cortical surfaces directly for our fetuses. We refer the reviewer to our response to RE-M2 [page 9].

      Regarding the “smoothing” effect in the youngest fetuses, we want to clarify that it is not unique to the choice of registration method; in other words, the same smoothing effect would be expected with cortical registration as well. On this point, we kindly refer the reviewer to our response to RE-M1 [page 7]. For the specific changes made in the revised manuscript, we kindly refer to our response to R1-m5 [page 21] or to [page 9 line 191-213] in the main manuscript.

      2) (M2) A key limitation, which I feel is important to consider if the method is to be used for fetuses, is that the analysis is limited to the cortical surface; therefore, the role of subcortical tissue (such as the developmental layers in the immature white matter and key structures like the thalami) cannot be included. This is important because in the fetal (and preterm neonatal) brain the cortex is still developing, so not only might the activity lack the same kind of organisation, but there is also likely an evolving relationship with activity in the transient developmental layers (like the subplate) and with inputs from the thalamus.

      The reviewer raises an important point. We agree with the reviewer that the subcortical region plays a critical role in fetal and newborn neurodevelopment. Unfortunately, our current VAE model cannot utilize such information without a major change in the model structure. We added this as a limitation of our study and discussed why our VAE model, in its current form, did not include subcortical areas. Please see our detailed response to RE-M1 [page 4] or [page 25 line 558-570] in the main manuscript.

      3) (M3) As the authors correctly describe, brain development and specifically functional relationships are likely evolving across the study time window. Beyond predicting age and a different way of estimating resting state networks using the decoding step, it is not clear to me what new insight the work is adding to the existing literature - or how the method has been specifically adapted for working with this kind of data. Whilst I agree that these developmental processes are indeed likely non-linear, to put the work in context, I think the manuscript would benefit from explaining how (or if) the method has been adapted and explicitly mentioning what additional neuroscientific/biological gains there are from this method.

      We appreciate the reviewer’s critical insights. In the revised paper, we included additional results that, we hope, address the reviewer’s concerns. We believe that the strength of the VAE model is that, relative to linear models, it is more generalizable across different datasets and ages (adults vs. full-term babies vs. preterm babies vs. fetuses). In the original manuscript, this was supported by the superior age prediction performance of the VAE over linear models when applied to different datasets covering the fetal to neonatal periods. As the reviewer pointed out, age prediction could also be done using other imaging modalities. However, we do not think this undermines the potential impact of being able to estimate age accurately from functional connectivity patterns. Brain function-structure relationships may not be exactly one-to-one (ref. 20). It is entirely possible that, for a given disease, functional connectivity alterations precede structural changes, such that delayed growth trajectories first manifest in the functional space. There are also aspects of brain function that cannot be mapped directly to structural characteristics (i.e., structural connectivity patterns). For example, the brain changes its functional connectivity patterns dynamically across brain states (resting vs. task-engaged; ref. 21), mental disorders (depression, ref. 22; anxiety, ref. 23; schizophrenia, ref. 24), cognitive traits (refs. 25, 26), and individual uniqueness (ref. 25). Therefore, we believe that estimating the functional age of fetuses and neonates from their functional connectivity profiles may provide a biomarker for tracking neurodevelopmental trajectories, allowing clinicians to identify deviations early and intervene in a timely manner if necessary. For these reasons, we believe that the superior age prediction performance of the VAE model compared with linear models is scientifically significant.
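      To make the kind of representation comparison described above more concrete, here is a minimal sketch, not the authors' pipeline, of comparing feature sets (for example, VAE latent variables versus ICA component weights) by cross-validated age prediction. It assumes a subjects-by-features matrix and uses closed-form ridge regression with k-fold cross-validation; the function names, the ridge penalty `alpha` and the number of folds are illustrative assumptions, not details taken from the manuscript.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha I) w = Xc^T yc for the weights.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def cv_age_prediction(features, age, n_folds=5, alpha=1.0, seed=0):
    """K-fold cross-validated age prediction; returns (MAE, Pearson r).

    Hypothetical helper: `features` is subjects x features, `age` is a
    1D float array with one age per subject.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(age))
    folds = np.array_split(idx, n_folds)
    pred = np.empty(len(age), dtype=float)
    for test_idx in folds:
        # Train on all subjects outside the held-out fold.
        train_idx = np.setdiff1d(idx, test_idx)
        w, b = ridge_fit(features[train_idx], age[train_idx], alpha)
        pred[test_idx] = features[test_idx] @ w + b
    mae = np.mean(np.abs(pred - age))
    r = np.corrcoef(pred, age)[0, 1]
    return mae, r
```

      In practice one would call `cv_age_prediction` once per representation of the same subjects (e.g., a hypothetical `vae_latents` array and an `ica_weights` array) and compare the returned errors; any real comparison would also need subject-level splits that respect the datasets' structure.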

      The value of the VAE lies in its ability to capture FC features that are not modeled by linear strategies. For example, we showed that only the VAE model extracted latent variables representing brain networks that are similar across different datasets. In contrast, linear models showed higher network pattern similarity between full-term and preterm infants within the dHCP dataset. This suggests that the VAE model can be a very useful tool for capturing common brain networks in datasets acquired with different recording parameters and preprocessing steps. Moreover, the VAE representations predicted age with higher accuracy than linear representations. Together, these findings show that the methodology is effective in extracting functionally relevant features of the brain. Please see RE-M1 [page 3] and R1-m13 for the specific changes made in the revised manuscript.
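      One simple way to quantify the cross-dataset network similarity discussed above is to correlate spatial maps from two datasets and keep the best match for each component. The sketch below assumes maps are stored as components-by-vertices arrays on a common surface and uses absolute Pearson correlation because ICA component signs are arbitrary; it is an illustrative measure, not necessarily the similarity metric used in the manuscript.

```python
import numpy as np


def spatial_similarity(maps_a, maps_b):
    """For each map in maps_a (n_a x n_vertices), return the highest absolute
    Pearson correlation with any map in maps_b (n_b x n_vertices)."""
    # z-score each map across vertices (population std, ddof=0).
    za = (maps_a - maps_a.mean(axis=1, keepdims=True)) / maps_a.std(axis=1, keepdims=True)
    zb = (maps_b - maps_b.mean(axis=1, keepdims=True)) / maps_b.std(axis=1, keepdims=True)
    # Mean product of z-scores equals the Pearson correlation.
    corr = za @ zb.T / maps_a.shape[1]
    # Absolute value because component signs are arbitrary.
    return np.abs(corr).max(axis=1)
```

      This greedy best match can pair two components with the same partner; a one-to-one assignment (for example, the Hungarian algorithm) is a stricter alternative.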

      4) (M4) The unavoidable smoothing effect of the VAE is very noticeable in the figures. Does this suggest that the method will be relatively insensitive to the fine granularity that is important for understanding brain development and the establishment of networks (such as the evolving boundaries between functional regions with age), reducing inference to only the large primary sensory and associative networks? This will also be important to consider for the individual "reconstruction degree" (which it would then likely overstate, and which would also need careful intersubject comparison) if it were to be used as a biomarker or predictor of cognition, as suggested by the authors.

      Regarding the first concern, yes. Greater smoothing tends to yield less granular network patterns; this is true for all representational models (not only the VAE, but also models such as ICA or PCA). The effect becomes more pronounced when representations consist of fewer components (e.g., IC50), leading to coarser brain patterns (see Fig. 3 in the revised manuscript). In this regard, a higher number of components is desirable, but, on the flip side, IC maps with more components are generally less interpretable. In short, there will always be trade-offs between interpretability and spatial resolution. In addition, more components tend to cause over-fitting, as shown by our age prediction performance across different datasets (worse performance for IC300 than for IC50). In this sense, what matters for a representation is how informative each latent variable (or component) is. In the revised Fig. 2, we showed that latent variables from the VAE model were more informative in representing rsfMRI than linear representations. It is also noteworthy that the smoothing effect of the VAE is comparable to that of IC300 (similar to manual smoothing with FWHM = 5 mm; revised Fig. 3). Given the above results, we believe the VAE model may be more suitable than linear models for investigating brain networks at a finer scale. The above perspective was updated in the revised manuscript as follows [page 23 line 506-511]:

      "Another interesting observation was that the smoothing effect of the VAE is comparable to that of IC300 (similar to manual smoothing with FWHM = 5 mm; Fig. 3). Given the above, we believe the VAE model may be more suitable than linear models for investigating brain networks at a finer scale. Perhaps a VAE model with a greater number of latent variables (e.g., 512 or 1024 instead of the 256 in the current VAE) could be used to find brain networks at a finer scale."

      On top of the points raised above, network mapping with linear models is limited, due to their linear nature, when it comes to mapping the spatial evolution of brain networks with age. This limitation can be observed in the ICA study of the dHCP dataset (Fig. 4 in ref. 7). In contrast, thanks to its nonlinear nature, the VAE model may have the potential to capture the spatial gradient of brain networks with age, although this expectation needs confirmation. To that end, we revised our discussion to reflect this perspective. For the full changes made in the revised manuscript, we refer the reviewer to our response to R1-m13.
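      For readers less familiar with the FWHM convention used above, the sketch below shows how a Gaussian smoothing kernel specified by its full width at half maximum relates to its standard deviation, FWHM = 2√(2 ln 2)·σ ≈ 2.355σ, and applies it to a 1D signal. The function names and the 1 mm voxel size used in the usage example are assumptions for illustration; surface-based smoothing on a cortical mesh works differently and is not shown.

```python
import numpy as np

# sigma = FWHM / (2 * sqrt(2 * ln 2)), roughly FWHM * 0.4247
FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def fwhm_to_sigma_vox(fwhm_mm, voxel_mm):
    """Convert a kernel FWHM in mm to a standard deviation in voxels."""
    return fwhm_mm * FWHM_TO_SIGMA / voxel_mm


def gaussian_kernel_1d(sigma_vox, truncate=4.0):
    """Normalized 1D Gaussian kernel truncated at `truncate` standard deviations."""
    radius = int(np.ceil(truncate * sigma_vox))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma_vox) ** 2)
    return k / k.sum()


def smooth_1d(signal, fwhm_mm, voxel_mm):
    """Smooth a 1D signal; edges are zero-padded by np.convolve."""
    k = gaussian_kernel_1d(fwhm_to_sigma_vox(fwhm_mm, voxel_mm))
    return np.convolve(signal, k, mode="same")
```

      With hypothetical 1 mm voxels, FWHM = 5 mm corresponds to σ ≈ 2.12 voxels.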

    1. Author Response

      We thank the reviewers for their positive feedback and thoughtful suggestions that will improve our manuscript. Here we summarise our plan for immediate action. We will resubmit our manuscript once additional experiments have been performed to clarify all the major and minor concerns of the reviewers and the manuscript has been revised. At that point, we will respond to all of the reviewers’ points and highlight the changes made in the text.

      Reviewer #1 (Public Review):

      The authors have tried to correlate changes in the cellular environment by means of altering temperature, the expression of key cellular factors involved in the viral replication cycle, and small molecules known to affect key viral protein-protein interactions with some physical properties of the liquid condensates of viral origin. The ideas and experiments are extremely interesting as they provide a framework to study viral replication and assembly from a thermodynamic point of view in live cells.

      The major strengths of this article are the extremely thoughtful and detailed experimental approach; although the data collection and analysis were most likely extremely time-consuming, the techniques used here are so simple that the main goal and idea of the article become elegant. A second major strength is that, in order to understand some of the physicochemical properties of the viral liquid inclusions, the authors used stimuli that have been very well studied, so one can really focus on a relatively easy interpretation of most of the data presented here.

      There are three major weaknesses in this article. First, the way it is written, especially at the beginning, is extremely confusing. I would suggest that the authors review the manuscript extensively to improve the use of English; in particular, the abstract and introduction are extremely hard to understand. Moreover, in the abstract and introduction, the authors use terms such as "hardening", "perturbing the type/strength of interactions", "stabilization", and "material properties", to cite just a few. It is clear that the authors know exactly what they are referring to, but the definitions come so late in the text that it all becomes confusing. The second major weakness is the lack of an in-depth discussion of the physical meaning of some of the measured parameters, such as "C dense vs inclusion" and "nucleation density and supersaturation". The physical consequences of all the graphs need further explanation; most of them are discussed in a very superficial manner. The third major weakness is the lack of analysis of phase separation. Some of the data suggest phase transition and/or phase separation; thus, a more in-depth analysis is required. For example, could the authors calculate the changes in entropy and enthalpy of some of these processes? Could they find boundaries for the transitions between the "hard" (whatever that means) and the liquid states?

      The authors have achieved almost all their goals, with the caveat of the third weakness I mentioned before. Their work presented in this article is of significant interest and can become extremely important if a more detailed analysis of the thermodynamics parameters is assessed and a better description of the physical phenomenon is provided.

      We thank reviewer 1 for the comments and, in particular, for being so positive regarding the strengths of our manuscript and for raising concerns that will surely improve the manuscript. At this point, we propose the following actions to address the concerns of Reviewer 1:

      1) We will extensively revise the use of English, particularly, in the abstract and introduction, defining key terms as they come along in the text to make the argument clearer.

      2) We acknowledge the importance of discussing our data in more detail and we propose the following. We will discuss the graphs and what they mean as exemplified in the paragraph below.

      Regarding Figure 3: as the concentration of vRNPs increases, we observe an increase in supersaturation until 12 hpi. This means that, contrary to what is observed in a binary mixture, in which Cdilute is constant (Klosin et al., 2020), Cdilute in our system increases with concentration. It has been reported that Cdilute increases with bulk concentration in multi-component systems (Riback et al., 2020). Our findings have important implications for how we think about the condensates formed during influenza infection. As the 8 different genomic vRNPs have a similar overall structure, they could, in theory, behave as a binary system between units of vRNPs and Rab11a. However, the change in Cdilute with concentration shows that our system behaves as a multi-component system. This means that the differences in length, RNA sequence and valency that each vRNP has are key for the integrity of the condensates.
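      In a two-phase binary mixture at fixed temperature, the lever rule, C_total = φ·Cdense + (1 − φ)·Cdilute with φ the dense-phase volume fraction, keeps Cdilute and Cdense fixed, so adding material only increases φ; a Cdilute that rises with C_total is therefore a signature of multi-component behaviour. A minimal sketch of how such a trend could be checked is shown below: it fits the slope of Cdilute against total concentration and reports a bootstrap 95% interval. The function name, the straight-line model and the resampling count are assumptions for illustration, not the analysis used in the manuscript.

```python
import numpy as np


def cdilute_trend(c_total, c_dilute, n_boot=2000, seed=0):
    """Least-squares slope of C_dilute vs total concentration, with a
    bootstrap 95% interval. A simple binary mixture predicts a slope near 0."""
    c_total = np.asarray(c_total, dtype=float)
    c_dilute = np.asarray(c_dilute, dtype=float)
    slope = np.polyfit(c_total, c_dilute, 1)[0]
    rng = np.random.default_rng(seed)
    n = len(c_total)
    boot = np.empty(n_boot)
    for i in range(n_boot):
        # Resample cells (pairs of measurements) with replacement.
        s = rng.integers(0, n, n)
        boot[i] = np.polyfit(c_total[s], c_dilute[s], 1)[0]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return slope, (lo, hi)
```

      An interval that excludes zero would be consistent with the concentration-dependent Cdilute described above, although antibody-based concentration estimates and cell-to-cell variability would need to be considered before drawing conclusions.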

      3) The reviewer calls our attention to the lack of analysis of phase separation. We think that phase separation (or percolation coupled to phase separation) governs the formation of influenza A virus condensates. However, we think we ought to exert caution at this point, as the condensates we are working with are very complex, and the physics of our system in cells may not be sufficient to claim phase separation without an in vitro reconstitution system. In fact, IAV inclusions contain cellular membranes, different vRNPs and Rab11a. So far, we can only speculate that the liquid character of IAV inclusions may arise from a network of interacting vRNPs that bridge several cognate vRNP-Rab11 units on flexible membranes, similarly to the phase-separated vesicle clusters at neuronal synapses. However, the speculative model for our system, although supported by correlative light and electron microscopy, currently lacks formal experimental validation.

      For this reason, we developed the current work as an alternative approach to explore the importance of the liquid material properties of IAV inclusions. By finding an efficient method to alter the material properties of IAV inclusions, we provide proof of principle that it is possible to impose controlled phase transitions that reduce the dynamics of vRNPs in cells and negatively impact progeny virion production. Although we have discussed these issues in the limitations of the study, we will make our point clearer.

      We are currently establishing an in vitro reconstitution system to demonstrate formally, in an independent publication, that IAV inclusions are formed by phase separation. For this future work, we have teamed up with Pablo Sartori, a theoretical physicist, to perform an in-depth analysis of the thermodynamics of the viral liquid condensates. Collectively, we think that cells have too many variables for deriving meaningful physical parameters (such as entropy and enthalpy) and models, and that cell-based data need to be complemented by in vitro systems. For example, increasing the concentration inside a cell is not a simple endeavour, as it relies on cellular pathways to deliver material to a specific place. At the same time, the 8 vRNPs, as mentioned above, differ in size, valency and RNA sequence and can behave very differently in the formation of condensates and the maintenance of their material properties. Ideally, they should be analysed individually or in selected combinations. In future work, we will combine data from in vitro reconstitution systems and cells to address this very important point raised by the reviewer.
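      To illustrate what an estimate of enthalpy and entropy would involve, and why it depends on assumptions that are questionable in cells, the sketch below performs a van 't Hoff fit. It assumes equilibrium, a single effective component, a partition coefficient K = Cdense/Cdilute measured at several temperatures, and ΔH and ΔS of transfer into the dense phase that do not vary with temperature; under those assumptions ln K = −ΔH/(RT) + ΔS/R. None of these assumptions is established for IAV inclusions in cells, so this is a hypothetical analysis better suited to a reconstituted in vitro system.

```python
import numpy as np

R = 8.314  # gas constant, J mol^-1 K^-1


def vant_hoff(temps_c, c_dense, c_dilute):
    """Fit ln K = -dH/(R T) + dS/R, with K = C_dense / C_dilute.

    Returns (dH, dS) for transfer from the dilute to the dense phase,
    in J/mol and J/(mol K). Assumes equilibrium and temperature-independent
    dH and dS (hypothetical helper, for illustration only).
    """
    T = np.asarray(temps_c, dtype=float) + 273.15  # Celsius to Kelvin
    lnK = np.log(np.asarray(c_dense, dtype=float) / np.asarray(c_dilute, dtype=float))
    slope, intercept = np.polyfit(1.0 / T, lnK, 1)
    return -slope * R, intercept * R
```

      With only a few temperatures, as in a 4, 37 and 42 °C design, the fitted values would carry large uncertainties, which is one more reason to prefer a reconstituted system for this kind of measurement.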

      From the Limitations of the study section of the paper: “Understanding condensate biology in living cells is physiologically relevant but complex because the systems are heterotypic and away from equilibrium. This is especially challenging for influenza A liquid inclusions that are formed by 8 different vRNP complexes, which, although sharing the same structure, vary in length, valency, and RNA sequence. In addition, liquid inclusions result from an incompletely understood interactome where vRNPs engage in multiple and distinct intersegment interactions bridging cognate vRNP-Rab11 units on flexible membranes (Chou et al., 2013; Gavazzi et al., 2013; Haralampiev et al., 2020; Le Sage et al., 2020; Shafiuddin & Boon, 2019; Sugita, Sagara, Noda, & Kawaoka, 2013). At present, we lack an in vitro reconstitution system to understand the underlying mechanism governing demixing of vRNP-Rab11a-host membranes from the cytosol. This in vitro system would be useful to explore how the different segments independently modulate the material properties of inclusions, explore if condensates are sites of IAV genome assembly, accurately determine thermodynamic values and thresholds, perform rheological measurements for viscosity and elasticity, and validate our findings”.

      Reviewer #2 (Public Review):

      During influenza virus infection, newly synthesized viral ribonucleoproteins (vRNPs) form cytosolic condensates, postulated to be viral genome assembly sites and having liquid properties. vRNP accumulation in liquid viral inclusions requires their direct association with the cellular protein Rab11a via the viral polymerase subunit PB2. Etibor et al. investigate and compare the contributions of entropy, concentration, and valency/strength/type of interactions to the properties of the vRNP condensates. For this, they subjected infected cells to the following perturbations: temperature variation (4, 37, and 42 °C), the concentration of viral inclusion drivers (vRNPs and Rab11a), and the number or strength of interactions between vRNPs using nucleozin, a well-characterized vRNP sticker. Lowering the temperature (i.e., decreasing the entropic contribution) leads to a mild growth of condensates that does not significantly impact their stability. Altering the concentration of drivers of IAV inclusions impacts their size but not their material properties. The most spectacular effect on condensates was observed using nucleozin. The drug dramatically stabilizes vRNP inclusions, acting as a condensate hardener. Using a mouse model of influenza infection, the authors provide evidence that the activity of nucleozin is retained in vivo. Finally, using a mass spectrometry approach, they show that the drug affects vRNP solubility in a Rab11a-dependent manner without altering the host proteome profile.

      The data are compelling and support the idea that drugs that affect the material properties of viral condensates could constitute a new family of antiviral molecules as already described for the respiratory syncytial virus (Risso Ballester et al. Nature. 2021).

      Nevertheless, there are some limitations in the study. Several of them are mentioned in a dedicated paragraph at the end of the discussion. These include the heterogeneity of the system (vRNPs of different sizes, and interactions between viral and cellular partners that are far from being understood), the fact that the system is far from equilibrium, and the absence of minimal in vitro systems that would be useful to further characterize the thermodynamic and material properties of the condensates.

      We thank reviewer 2 for highlighting specific details that need improving and for raising such interesting questions to validate our findings. We will address all the minor comments of Reviewer 2. To address the comments of Reviewer 2, we propose the actions described below each point raised.

      1) The concentrations are mostly evaluated using antibodies. This may be correct for Cdilute. However, measurements of Cdense should be viewed with caution, as the antibodies may have some difficulty accessing the interior of the condensates (as already shown in other systems), and this access may depend on some condensate properties (which may evolve during infection). This might induce artifactual trends in some graphs (as seen in panel 2c), which could, in turn, affect the calculation of some thermodynamic parameters.

      The concern about using antibodies to calculate Cdense is valid. We will address this concern by validating our results using a fluorescently tagged virus that has mNeonGreen fused to the viral polymerase PA (PA-mNeonGreen PR8 virus). Like NP, PA is a component of vRNPs and labels viral inclusions, colocalising with Rab11 when vRNPs are in the cytosol, without the need for antibodies.

      This virus would be the best for evaluating inclusion thermodynamics were it not attenuated (Figure 1A below), with delayed infection as demonstrated by the reduced levels of viral proteins (Figure 1B below). Consistently, it shows differences in the accumulation of vRNPs in the cytosol, and viral inclusions form later in infection. After their emergence, inclusions behave as in the wild-type virus (PR8-WT), fusing and dividing (Figure 1C below) and displaying liquid properties. The differences in concentration may shift or alter thermodynamic parameters such as time of nucleation, nucleation density, inclusion maturation rate, Cdense and Cdilute. This is why we performed the thermodynamic profiling using antibodies upon PR8-WT infection. Nevertheless, taking into account possibly delayed kinetics and differences that may arise from reduced vRNP accumulation in the cytosol, this virus will be useful for validating our results, and we will therefore repeat the thermodynamic profiling using it.

      As a side note, vRNPs are composed of viral RNA coated with several molecules of NP, and each vRNP also contains one copy of the trimeric RNA-dependent RNA polymerase formed by PA, PB1 and PB2. It is well documented that in the cytosol the vast majority of PA (and of the other components of the polymerase) is in the form of vRNPs (Avilov, Moisy, Munier, et al., 2012; Avilov, Moisy, Naffakh, & Cusack, 2012; Bhagwat et al., 2020; Lakdawala et al., 2014), and thus we can use this virus to label vRNPs in condensates to corroborate our antibody-based studies.

      Figure 1 – The PA-mNeonGreen virus is attenuated in comparison to the WT virus. A. Cells (A549) were infected or mock-infected with PR8 WT or PA-mNeonGreen (PA-mNG) viruses, at a multiplicity of infection (MOI) of 3, for the indicated times. Viral production was determined by plaque assay and plotted as plaque forming units (PFU) per milliliter (mL) ± standard error of the mean (SEM). Data are a pool from 2 independent experiments. B. The levels of viral PA, NP and M2 proteins and actin in cell lysates at the indicated time points were determined by western blotting. C. Cells (A549) were transfected with a plasmid encoding mCherry-NP and co-infected with PA-mNeonGreen virus for 16 h, at an MOI of 10. Cells were imaged under time-lapse conditions starting at 16 hpi. White boxes highlight vRNPs/viral inclusions in the cytoplasm in the individual frames. The dashed white and yellow lines mark the cell nucleus and the cell periphery, respectively. The yellow arrows indicate the fission/fusion events and movement of vRNPs/viral inclusions. Bar = 10 µm. Bar in insets = 2 µm.

      2) Although the authors have demonstrated that vRNP condensates exhibit several key characteristics of liquid condensates (they fuse and divide, they dissolve upon hypotonic shock or upon incubation with 1,6-hexanediol, FRAP experiments are consistent with a liquid nature), their aspect ratio (with a median above 1.4) is much higher than the aspect ratio observed for other cellular or viral liquid compartments. This is intriguing and might be discussed.

      IAV inclusions have been shown to interact with microtubules and the endoplasmic reticulum, which confers movement, and they also undergo fusion and fission events. We propose that these interactions and movements exert forces that deform inclusions, making them less spherical. To test this assumption, we compared the aspect ratio of viral inclusions in the absence and presence of nocodazole (which abrogates microtubule-based movement). The data in Figure 2 show that in the presence of nocodazole, the aspect ratio decreases from 1.42 ± 0.36 to 1.26 ± 0.17, supporting our assumption.

      Figure 2 – Treatment with nocodazole reduces the aspect ratio of influenza A virus inclusions. Cells (A549) were infected with PR8 WT and treated with nocodazole (10 µg/mL) for 2 h, after which the movement of influenza A virus inclusions was captured by live-cell imaging. Viral inclusions were segmented, and their aspect ratio was measured in ImageJ, analysed and plotted in R.
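      The aspect ratio quoted above is the ratio of the major to the minor axis of an ellipse fitted to each segmented inclusion. As a rough sketch, assuming a binary mask per inclusion, the equivalent-moment ellipse can be obtained from the covariance of the pixel coordinates; this follows the same moment-based idea as ImageJ's ellipse fitting but is not ImageJ's code, and values may differ slightly for very small objects.

```python
import numpy as np


def aspect_ratio(mask):
    """Major/minor axis ratio of the ellipse with the same second moments
    as a binary object mask (2D boolean array)."""
    rows, cols = np.nonzero(mask)
    coords = np.stack([cols, rows]).astype(float)  # x, y pixel coordinates
    cov = np.cov(coords)
    evals = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    # Axis lengths scale with sqrt(eigenvalue), so AR = sqrt(max / min).
    return float(np.sqrt(evals[1] / evals[0]))
```

      For a labelled segmentation (e.g., one produced by `scipy.ndimage.label`), one would call `aspect_ratio(labels == k)` for each label `k`; objects only one pixel wide give a zero minor axis and should be filtered out first.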

      3) Similarly, the fusion event presented at the bottom of figure 3I is dubious. It might as well be an aggregation of condensates without fusion.

      We will change this, thank you for the suggestion.

      4) The authors could have more systematically performed FRAP/FLAPh experiments on cells expressing fluorescent versions of both NP and Rab11a to investigate the influence of condensate size, time after infection, or global concentrations of Rab11a in the cell (using the total fluorescence of overexpressed GFP-Rab11a as a proxy) on condensate properties.

      We will try our best to be able to comply with this suggestion as we think it is important.

      Reviewer #3 (Public Review):

      This study aims to define the factors that regulate the material properties of the viral inclusion bodies of influenza A virus (IAV). In a cellular model, it shows that the material properties were not affected by lowering the temperature nor by altering the concentration of the factors that drive their formation. Impressively, the study shows that IAV inclusions may be hardened by targeting vRNP interactions via the known pharmacological modulator (also an IAV antiviral), nucleozin, both in vitro and in vivo. The study employs current state-of-the-art methodology in both influenza virology and condensate biology, and the conclusions are well-supported by data and proper data analysis. This study is an important starting point for understanding how to pharmacologically modulate the material properties of IAV viral inclusion bodies.

      We thank this reviewer for all the positive comments. We will fully address the minor issues brought to our attention, including changing the title of the manuscript, and we will investigate the formation and material properties of IAV inclusions in the presence and absence of nucleozin for the nucleozin escape mutant NP-Y289H.

      References

      Avilov, S. V., Moisy, D., Munier, S., Schraidt, O., Naffakh, N., & Cusack, S. (2012). Replication-competent influenza A virus that encodes a split-green fluorescent protein-tagged PB2 polymerase subunit allows live-cell imaging of the virus life cycle. J Virol, 86(3), 1433-1448. doi:10.1128/JVI.05820-11

      Avilov, S. V., Moisy, D., Naffakh, N., & Cusack, S. (2012). Influenza A virus progeny vRNP trafficking in live infected cells studied with the virus-encoded fluorescently tagged PB2 protein. Vaccine, 30(51), 7411-7417. doi:10.1016/j.vaccine.2012.09.077

      Bhagwat, A. R., Le Sage, V., Nturibi, E., Kulej, K., Jones, J., Guo, M., . . . Lakdawala, S. S. (2020). Quantitative live cell imaging reveals influenza virus manipulation of Rab11A transport through reduced dynein association. Nat Commun, 11(1), 23. doi:10.1038/s41467-019-13838-3

      Chou, Y. Y., Heaton, N. S., Gao, Q., Palese, P., Singer, R. H., & Lionnet, T. (2013). Colocalization of different influenza viral RNA segments in the cytoplasm before viral budding as shown by single-molecule sensitivity FISH analysis. PLoS Pathog, 9(5), e1003358. doi:10.1371/journal.ppat.1003358

      Gavazzi, C., Yver, M., Isel, C., Smyth, R. P., Rosa-Calatrava, M., Lina, B., . . . Marquet, R. (2013). A functional sequence-specific interaction between influenza A virus genomic RNA segments. Proc Natl Acad Sci U S A, 110(41), 16604-16609. doi:10.1073/pnas.1314419110

      Haralampiev, I., Prisner, S., Nitzan, M., Schade, M., Jolmes, F., Schreiber, M., . . . Herrmann, A. (2020). Selective flexible packaging pathways of the segmented genome of influenza A virus. Nat Commun, 11(1), 4355. doi:10.1038/s41467-020-18108-1

      Klosin, A., Oltsch, F., Harmon, T., Honigmann, A., Julicher, F., Hyman, A. A., & Zechner, C. (2020). Phase separation provides a mechanism to reduce noise in cells. Science, 367(6476), 464-468. doi:10.1126/science.aav6691

      Lakdawala, S. S., Wu, Y., Wawrzusin, P., Kabat, J., Broadbent, A. J., Lamirande, E. W., . . . Subbarao, K. (2014). Influenza A virus assembly intermediates fuse in the cytoplasm. PLoS Pathog, 10(3), e1003971. doi:10.1371/journal.ppat.1003971

      Le Sage, V., Kanarek, J. P., Snyder, D. J., Cooper, V. S., Lakdawala, S. S., & Lee, N. (2020). Mapping of Influenza Virus RNA-RNA Interactions Reveals a Flexible Network. Cell Rep, 31(13), 107823. doi:10.1016/j.celrep.2020.107823

      Riback, J. A., Zhu, L., Ferrolino, M. C., Tolbert, M., Mitrea, D. M., Sanders, D. W., . . . Brangwynne, C. P. (2020). Composition-dependent thermodynamics of intracellular phase separation. Nature, 581(7807), 209-214. doi:10.1038/s41586-020-2256-2

      Shafiuddin, M., & Boon, A. C. M. (2019). RNA Sequence Features Are at the Core of Influenza A Virus Genome Packaging. J Mol Biol. doi:10.1016/j.jmb.2019.03.018

      Sugita, Y., Sagara, H., Noda, T., & Kawaoka, Y. (2013). Configuration of viral ribonucleoprotein complexes within the influenza A virion. J Virol, 87(23), 12879-12884. doi:10.1128/JVI.02096-13

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Shaikh and Sunagar addresses the question of the origin of spider venom proteins. It has been known for many years that an important component of spider venoms is a diverse group of small proteins known as disulfide-rich peptides (DRPs). However, it has not been clear whether this group of proteins has a common origin or evolved convergently in different lineages. The authors collected sequences of the genes encoding these proteins from publicly available genomes of spiders from a range of families. They aligned the sequences using the structural cysteines as guides and carried out a phylogenetic analysis of the different sequences, ultimately classifying the different proteins into over 50 super-families. One thing that is not clear from the text or from the references cited (I am not an expert on spider venom) is how many of these superfamilies were known before and how many are novel. There is also no clear indication of what criteria were used to define a subset of sequences as a superfamily. Nonetheless, the authors show that all these superfamilies have a single common ancestor, predating the divergence of araneomorphs and mygalomorphs and that the DRPs underwent independent diversification in each of these two lineages.

      We identified 78 novel superfamilies in this study, and 33 had been identified previously (Pineda et al. 2020 PNAS). Lines 90, 101 and 106 of the manuscript already distinguish the superfamilies described in previous studies from those described in this study:

      Line 90 “Recently, using a similar approach, 33 novel spider toxin superfamilies have been identified from the venom of the Australian funnel-web spider, Hadronyche infensa (9).”

      Line 101 “This approach enabled the identification of 33 novel toxin superfamilies along the breadth of Mygalomorphae (Figures S1 and S2).”

      Line 106 “Moreover, analyses of Araneomorphae toxin sequences using the strategy above resulted in the identification of 45 novel toxin superfamilies from Araneomorphae, all of which but one (SF109) belonged to the DRP class of toxins (Figures S3 and S4).”

      Spider toxin superfamilies have been named after gods/deities of death, destruction and the underworld, following the nomenclature introduced by Pineda et al. (2014 BMC Genomics). We have now included this explanation in the Methods and Results sections of the manuscript, and we provide additional details on this nomenclature in Table S1.

      The authors also looked at selective forces acting on the sequences using dN/dS analyses. They reach the conclusion that there are different modes of selection acting on different sequences based on their role - defensive or predatory venoms - building on previous work by the lead author on venom sequence evolution in diverse animals.

      All in all, this is an admirable piece of molecular evolution work, providing new data on the evolution of spider venom proteins. There are some confusions in terminology that need to be cleared up, and somewhat more context needs to be given for non-specialists as detailed in the points below:

      We thank the reviewer for their constructive and critical suggestions, as well as the kind words of encouragement. Their suggestions have helped us in significantly improving the quality of our work.

      Suggestion 1) Common names of the main spider infraorders should be given.

      We thank the reviewer for their helpful input. We have now introduced the spider infraorders in the Introduction, together with well-known representative spiders and their common names. We have also included a schematic representation of the spider phylogeny, highlighting the lineages under investigation, as Figure 1.

      Suggestion 2) Opisthothelae is not the common ancestor of Mygalomorphae and Araneomorphae, but the clade that encompasses those two clades. This incorrect statement appears in several places. Further on, it is stated that Opisthothelae is the common ancestor of all extant spiders. This is wrong both from a terminological point of view (a clade cannot be ancestral to another clade) and from a factual point of view, since there are extant spiders not included in Opisthothelae.

      We thank the reviewer for pointing out this oversight. We have now corrected the text to describe the suborder Opisthothelae as the clade encompassing Mygalomorphae and Araneomorphae.

      Suggestion 3) Several proteins and protein families are mentioned without being introduced, e.g. knottin. Please provide short descriptions.

      We have now provided short introductions to terms such as knottin.

      Reviewer #2 (Public Review):

      This interesting study looks into the evolution of putative spider venom toxins, specifically disulfide-rich peptides (DRPs). The authors use published sequence data to gain new insights into the evolution of DRPs, which are the major component of most spider venoms. Through a series of sequence comparisons and phylogenetic analyses they identify a substantial number of new spider toxin superfamilies with distinct cysteine scaffolds, and they trace these back to a primitive scaffold that must have been present in the last common ancestor of mygalomorph and araneomorph spiders. Looking at the taxonomic distribution of these putative venom DRPs, they conclude that mygalomorph and araneomorph DRPs have evolved in different ways, with the former being recruited into venom at the level of genera, and the latter at the level of families. In addition, they perform selection analyses on the DRP superfamilies to uncover the surprising result that mygalomorph and araneomorph DRPs have evolved under different selective regimes, with the evolution of the former being characterised by positive selection, and the latter by purifying (negative) selection.

      However, I don't think that in the current state of the manuscript these conclusions are robustly supported for several reasons. First, it seems that not all previously published data were included in the phylogenetic analyses that were used to identify new superfamilies of DRPs.

      We have, indeed, analysed all spider toxin sequences available to date. We have relied on the signal and propeptide regions for identifying novel superfamilies, which is an accepted convention: Pineda et al. (2014 BMC Genomics); Pineda et al. (2020 PNAS).

      Although many additional superfamilies can be identified, we have only retained those sequences for which there were at least 5 representatives for the identification of toxin superfamilies, and 15 representatives for selection analyses to ensure robustness. This filtering step ensured that the generated alignments, phylogenetic trees, and evolutionary assessments were robust and devoid of noise that stems from single-representative groups. Adding in those sequences would have enabled us to identify many more superfamilies, solely based on the signal and propeptide examination, but it wouldn’t have been possible to support them with other lines of evidence that were provided for all other superfamilies in this study, jeopardising the overall quality of the manuscript. Nonetheless, there is strong evidence that the left-out sequences are also related to the ones analysed in this study (Figure S10). In future, when more transcriptomes are sequenced, it would be possible to designate these newer toxin superfamilies with much stronger support.
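      To make the two thresholds concrete, here is a minimal Python sketch of the filtering rule described above. It assumes a hypothetical input that maps each sequence ID to a candidate superfamily label (for example, from grouping by signal and propeptide regions). The function name and labels are illustrative and are not the authors' pipeline; only the cut-offs of 5 and 15 come from the response.

```python
from collections import Counter

# Cut-offs quoted in the response; everything else here is illustrative.
MIN_FOR_SUPERFAMILY = 5   # representatives needed to designate a superfamily
MIN_FOR_SELECTION = 15    # representatives needed before running selection analyses


def tier_superfamilies(assignments):
    """Split candidate superfamilies by size.

    assignments: dict mapping a sequence ID to a candidate superfamily label
    (hypothetical input format). Returns (designated, selection_ready) sets.
    """
    counts = Counter(assignments.values())
    designated = {sf for sf, n in counts.items() if n >= MIN_FOR_SUPERFAMILY}
    # Selection analyses use a strict subset: only the larger superfamilies qualify.
    selection_ready = {sf for sf in designated if counts[sf] >= MIN_FOR_SELECTION}
    return designated, selection_ready


# Toy example: 20, 6 and 2 sequences in three hypothetical candidate groups.
example = {}
example.update({f"a{i}": "SF_A" for i in range(20)})
example.update({f"b{i}": "SF_B" for i in range(6)})
example.update({f"c{i}": "SF_C" for i in range(2)})

designated, selection_ready = tier_superfamilies(example)
print(sorted(designated))       # ['SF_A', 'SF_B']
print(sorted(selection_ready))  # ['SF_A']
```

      In this sketch, groups below the lower cut-off (such as SF_C) are simply left undesignated rather than deleted from the data.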

      Second, much of the data were obtained from whole-body transcriptome data, which leaves a degree of uncertainty that these data indeed derive from the venom glands that produce the toxins.

      We respectfully disagree with the reviewer that ‘much of the data’ are from whole-body transcriptomes. Nearly all sequences in our study are sourced from Pineda et al. (2014 BMC Genomics and 2020 PNAS), Sunagar et al. (2013 Toxins), Cole and Brewer (2020 bioRxiv) and transcriptome assembly data from established online repositories: NCBI (NR and TSA) and ENA. The methods sections of all the above-mentioned studies (KS is an author on many of these) clearly state that the transcriptomes were generated from mRNA isolated from venom gland tissue (BioProject accessions PRJEB14734, PRJEB6062, PRJNA189679 and PRJNA587301, where the source tissue type is designated as venom gland).

      We would like to direct the reviewer’s attention to the following excerpts from reference papers from which data for this study has been sourced:

      1. Pineda S et al. (2020 PNAS): “Three days later, they were anesthetized, and their venom glands were dissected and placed in TRIzol reagent (Life Technologies). Total RNA from pooled venom glands was extracted following the standard TRIzol protocol.”
      2. Sunagar et al (2013 Toxins): “Paired venom glands were dissected out and pooled from nine mature females on the fourth day after venom depletion by electrostimulation. Total RNA was extracted using the standard TRIzol Plus method ...”
      3. Cole and Brewer (2020 bioRxiv): “... the venom glands of each ctenid were dissected out, whole RNA was isolated from the venom glands …”

      We would also like to point out that hexatoxins are widely studied and are among the best-understood spider venom toxins. Many representatives have been functionally characterised and shown to be potent against prey and predatory species [Sunagar et al. (2013 Toxins); Pineda et al. (2014 BMC Genomics and 2020 PNAS); Volker et al. (2020 PNAS); KS is an author on most of these studies as well]. However, current technologies do not permit high-throughput screening of the enormous diversity of spider toxins, which is why not every toxin sequence identified from the venom gland has been functionally characterised. Nonetheless, venom researchers would not dispute the role of these highly expressed venom gland proteins in envenoming, especially given that they share significant sequence identity with functionally well-characterised toxins.

      The only exception to the above is the non-ctenid araneomorph toxin superfamily sequences, which were retrieved from whole-body transcriptomes (Cole and Brewer; 2020 bioRxiv). The authors of that study designated these as putative toxins. As explained above, homologs of these peptides are well characterised as venom toxins. Additionally, in our phylogenetic trees (Figures 3, 4, S6 and S9), they are nested within the toxin clades, reaffirming their identity.

      Third, the taxonomic representation of mygalomorph and araneomorph diversity in this study is so sparse that it becomes impossible to distinguish whether toxin recruitments have happened at the level of genera, families, or even higher-level taxa.

      We respectfully disagree with this assessment. The taxonomic breadth investigated in this study is not sparse: the analysed sequences belong to groups spanning the breadth of the spider phylogeny. To address this criticism, we now include a schematic representation of the spider phylogeny in which the lineages under investigation are highlighted (Figure 1A). Given this taxonomic breadth, all of our interpretations are parsimoniously extendable to the common ancestors of these lineages. For instance, we establish the common origin of all DRPs in the members of these widespread spider families. Therefore, the absence of sequences from other sister groups does not invalidate this hypothesis, and the most parsimonious explanation is that the missing members are also likely to have DRPs in their venom (which is also the common understanding in spider venom research). Whether DRPs dominate the venoms of these missing groups will only come to light upon investigation, but their presence in the venom is highly likely. Moreover, please note that we have analysed nearly all sequences available in the literature to date.

      As for the recruitment of toxin superfamilies at different taxonomic levels, we would like to point to the phylogenies in Figures 2 and 3, which clearly show the differential recruitment events. We would also note that lines 120 and 136 state that this pattern may not only be a result of recruitment and could also arise from differential rates of diversification (also evident in the other analyses presented in Figure 5 and Tables S2 and S3).

      Line 120 “Interestingly, the plesiotypic DRP scaffold seems to have undergone lineage-specific diversification in Mygalomorphae, where the selective diversification of the scaffold has led to the origination of novel toxin superfamilies corresponding to each genus (Figure 2).”

      Line 136 “However, we also documented a large number of DRP toxins (n=32) that were found to have diversified in a family-specific manner, wherein, a toxin scaffold seems to be recruited at the level of the spider family, rather than the genus. As a result, and in contrast to mygalomorph DRPs, araneomorph toxin superfamilies were found to be scattered across spider lineages (Figure 3; Figure S6; node support: ML: >90/100; BI: >0.95).”

      Adding any number of missing lineages will change neither the fact that mygalomorphs ‘appear’ to have recruited these superfamilies at the genus level, nor the family-level recruitment of toxin superfamilies in a large number of the examined araneomorphs.

      We have now introduced a new figure (Figure 7) that highlights the different scenarios that explain the observed differences in the evolution of mygalomorph and araneomorph spider toxins. We have also included additional text in the manuscript to explain this better.

      Fourth, only a selection of DRP superfamilies was used for natural selection analyses, without the authors explaining how this selection was made. Yet, they attempted to draw general conclusions about toxin evolution in mygalomorphs and araneomorphs, even though most of the striking differences they found were restricted to just two mygalomorph genera, and one family of araneomorphs.

      From our experience and previous reports [Sunagar and Moran (2015, PLoS Genetics); Sunagar et al. (2012, MBE); Yang, Z. (2007, MBE)], having too few sequences in a dataset results in inaccurate estimates of omega (dN/dS). For instance, if a superfamily contains only a couple of sequences that differ slightly from one another, even these minor differences would be exaggerated. Hence, we performed selection analyses only on datasets containing at least 15 sequences. Admittedly, this conservative approach reduces the number of datasets analysed, but it also ensures that our findings are well supported. We have now clarified this in the Methods section of our manuscript.

      However, we did previously include sequences from all toxin superfamilies described to date in our alignment figure (Fig S10) and analysed their signal and propeptide regions. They were only excluded from selection analyses. It can be seen that they too are DRPs, but they belong to distinct superfamilies from the ones being described here.

      If these concerns are addressed this study can shed important new light on venom toxin evolution in one of the most diverse venomous taxa on Earth.

      We thank the reviewer for their constructive inputs and suggestions which have enabled us to make this manuscript more accessible to a wider audience.

      Reviewer #3 (Public Review):

      This work aims to elucidate the evolutionary origins of disulfide-rich spider toxin superfamilies and to determine the modes of natural selection and associated ecological pressures acting upon them. The authors provide a compelling line of evidence for a single evolutionary origin and differing factors (e.g., prey capture strategies and methods of anti-predator defense) that have shaped the evolution of these toxins. Additionally, the two major spider infraorders are claimed to have experienced differing selective pressures regarding these toxins.

      The results presented here are novel and generally well-presented. The evidence for a single origin of DRP toxins in spiders is exciting and changes the paradigm of spider venom evolution.

      The data are well analyzed, but the methods lack enough detail to reproduce the results. More information regarding the parameters passed to each software package, version numbers of all software employed, and models of molecular evolution employed in phylogenetic analyses are among the necessary missing information.

      We thank the reviewer for their kind words and constructive and critical suggestions. Their suggestions have contributed towards improving the quality of our work. Upon their suggestion, we have now expanded the methods section to include more details.

      The differences in the evolutionary pressures between mygalomorphs and RTA-clade spider DRP toxins are clear, but expanding RTA results to all araneomorphs may be overreaching. Additional araneomorph sequence data is available, despite the claims within this manuscript (e.g., see Jiang et al. 2013 Toxins; He et al. 2013 PLoS ONE; and Zobel-Thropp et al. 2017 PeerJ). These papers include cDNA sequences of spider venom glands and contain representatives of inhibitory cysteine knot toxins, which are DRP toxins. These data would greatly enhance the strengths of the results presented herein.

      Regarding the extension of RTA-clade results to all araneomorphs, we would like to point out that the RTA clade comprises about 50% of the recorded diversity of Araneomorphae. The araneomorph data analysed in our study span families with a wide range of divergence times: Agelenidae (<70 MYA), Pisauridae (<50 MYA) and Theridiidae (~200 MYA; Magalhaes 2020, Biological Reviews 95.1). We report a strong signature of purifying selection influencing the evolution of araneomorph toxin SFs, despite the long evolutionary time separating these families (50-200 MYA). We firmly believe that adding toxin sequence data from other groups will not alter the general trend of molecular evolution observed in both lineages across such a long period, barring certain exceptions (such as SF13, a defensive toxin identified from Hadronyche that experiences purifying selection; Volker et al. 2020 PNAS).

      We had initially excluded non-ctenid datasets from our analyses on account of poor sequence annotation and a lack of representative sequence data. However, following the reviewer’s suggestion, we have now incorporated the Dolomedes mizhoanus (DRP) (Jiang et al. 2013 Toxins) and Latrodectus tredecimguttatus (non-DRP) (He et al. 2013 PLoS ONE) toxin datasets into our analyses. This has led to the identification of 5 novel superfamilies, providing additional support for our spider venom evolution hypothesis.

    1. Author Response

      Reviewer #1 (Public Review):

      Lin et al. characterise cellular pathologies in PLA2G6 mutant patient-derived neuronal cells (neuronal progenitor cells, NPCs, and iPSC-derived dopaminergic neurones) and a novel compound heterozygous PLA2G6 mutant mouse model. They build on their previous findings in an INAD fly model (lacking PLA2G6) to show that lysosomal and mitochondrial defects are evolutionarily conserved in PLA2G6 deficiency. The authors proceed to use their INAD fly model to screen a number of compounds that are predicted to modulate endo-lysosomal function using a bang sensitivity assay. They then show that the drugs that rescue this fly behavioural phenotype also reduce LAMP2 expression in patient-derived NPCs on Western blot analysis. Lastly, the manuscript reports the creation of new genetic constructs that express human PLA2G6 and examines their expression levels in a human kidney cell line as well as in patient-derived NPCs. In the latter neuronal model, they show that expression of human PLA2G6 can rescue the mitochondrial fragmentation associated with PLA2G6 loss-of-function. Lin et al. then show that ICV (intracerebroventricular) and IV (intravenous) injection of a human PLA2G6-containing construct partially rescues the rotarod phenotype in transheterozygous PLA2G6 mutant mice between ~110 and 150 days. There is also an associated improvement in lifespan and body weight.

      The strengths of this work are that the authors use a number of different model systems, including patient-derived neuronal cells, Drosophila models (INAD flies) and mouse models, to study PLA2G6-associated neurodegeneration (PLAN) at the cellular level. They also screen drug compounds that are predicted to target endo-lysosomal trafficking and sphingolipid metabolic pathways to ameliorate PLAN, thus identifying potential new therapeutic strategies. The work in mice, showing that gene therapy with human PLA2G6 can rescue a behavioural phenotype and lifespan, is the first proof-of-concept of such an advancement. This work will hopefully lead to further studies for optimisation toward clinical advancement.

      We thank the reviewer and editor for the positive comments about our manuscript.

      The major weaknesses are that the pathogenic mechanisms shown in the patient-derived neuronal cells and mice do not extend as far as those previously shown in the fly model published by the authors. Of note, ceramide levels and retromer function are not studied, both key pathologies described in the previous fly models. In addition, the drug screening is limited by its testing in one fly behavioural assay and LAMP2 Western blot analysis on patient derived NPCs.

      The results, in general, support the conclusions of the authors and represent well-performed work. However, the significance of elevated glucosylceramide levels is not clear in the present study. Although this was previously found to be elevated in INAD flies, it was ceramide levels that were thought to be the main toxic insult, with drugs aimed at reducing ceramide levels being shown to rescue INAD flies.

      We have addressed these concerns; please refer to our responses to each of the specific points listed below.

      This work will no doubt be of significant interest to the field, confirming several previous findings in the Drosophila model of PLA2G6 (iPLA2-VIA) knockout. It also extends upon the fly work by identifying compounds that can be further studied for potential drug repurposing for the treatment of PLA2G6-associated disease. The gene therapy studies are also very interesting and a first proof-of-principle in PLAN using ICV and IV delivery in a mouse model.

      We thank the reviewers and editor; addressing all these concerns has substantially improved the manuscript.

      Reviewer #2 (Public Review):

      This article aims to extend human disease-related studies of PLA2G6 from fly models to iPS-neurons and mouse models, to look for drugs that suppress phenotypes and test them, and to attempt AAV whole-body rescue. Generally, each of these questions/aims/experiments is excellent, but as presented, it's a bit of a hodgepodge of results, with each experiment somewhat underdeveloped or under-analyzed for its respective phenotype, in my opinion. I think the general thrust of the experiments is excellent, but the data are relatively cursory in many instances. Further development and characterization of the phenotypes would require quite a bit of work but would vastly improve the paper.

      We thank the reviewer for the positive comments about our manuscript. We have addressed most of the concerns.

    1. Author Response

      Reviewer #1 (Public Review):

      Like other sensory organs, the inner ear has a rich population of pericytes, essential for sensory hair cell health and normal hearing. In this study, using an inducible and conditional pericyte depletion mouse (PdgfrbCreERT2/iDTR) model, the authors demonstrate that pericytes play critical roles in maintaining vascular volume and the integrity of spiral ganglion neurons (SGNs) in the cochlea. Moreover, using coculture models, they show vigorous vascular and neuronal growth in neonatal SGN explants in the presence of exogenous pericytes. Mechanistically, this study demonstrates that these roles are achieved mainly through interactions between pericyte-released exosomes containing VEGF-A and VEGFR2-expressing vessels and SGNs.

      Overall, the data are analyzed thoroughly, and the conclusions are novel and convincing. It is mechanistically solid. The study is somewhat translationally limited. Nevertheless, understanding the roles of organ-specific pericytes is paramount, making this study timely and significant.

      We thank Reviewer #1 for the positive comment. We agree the pericyte depletion model is not a translational disease model. However, pericyte pathologies, including the decline in pericyte number, pericyte migration, and pericyte trans-differentiation, are frequently seen in aging and noise-induced hearing loss animal models. Moreover, hearing dysfunction due to pericyte pathology has been demonstrated in recent studies (Hou et al., 2020; Hou et al., 2018; Neng et al., 2015).

      Reviewer #2 (Public Review):

      The present study from Xiaorui Shi's lab investigated the effect of pericyte depletion on spiral ganglion neurons and auditory function. Results from an in vitro culture system suggest that pericyte-derived exosomes contain VEGF and promote not just vascular stability but also neuronal survival through Flk1. This study is an extension of their previous study showing that pericyte depletion causes auditory dysfunction, which is ameliorated by VEGF gene therapy (Zhang et al., JCI Insight 2021). Overall, the data are clear and sophisticated and promote our understanding of the biological roles of pericytes in neuronal function. Several points should be thoroughly discussed or supported by definitive experiments, such as analysis of neuron-specific Flk1 KO mice.

      We thank Reviewer #2 for the encouraging comments on our study. We especially appreciate the reviewer’s view that neuron-specific Flk1 KO mice would help consolidate the results. However, since our in vitro adult SGN culture model clearly demonstrates the direct role of exosome VEGF-A signaling in adult SGN health, as shown in Figs. 5D & E and Figs. 9C & E, we are confident our conclusion is valid. A recent study used neuron-specific Flk1 conditional KO mice to demonstrate neuronal atrophy and dysfunction in memory impairment (Deyama et al., 2020). We presume that disrupting neuronal VEGF/FLK1 signaling in an animal model with neuron-specific Flk1 deletion would cause similar spiral ganglion neuron death and subsequent hearing loss. To test this possibility, we are seeking an SGN-specific Cre driver line from the auditory community and Flk1 floxed mice from the larger research community. Obtaining these models and setting up a future study will, of course, take some time. Nevertheless, as Reviewer #2’s suggestion is excellent, we have added a discussion of it to the Discussion section.

      Reviewer #3 (Public Review):

      Zhang et al focus on investigating the role of pericytes in the vasculature of the inner ear. They propose that pericyte-derived VEGF is required for vessels and SGN survival. Functionally, they show that pericyte ablation leads to hearing loss.

      This work is interesting to the scientific community. It describes a very specific organ vasculature and its potential crosstalk with the neuronal compartment in the peripheral nervous system.

      Major strengths and weaknesses:

      • The study is well explained, written, and discussed;

      • The design of the experiments is adequate;

      • The study is performed in vivo, in vitro, and with functional readouts;

      • Results are convincing.

      We thank the reviewer for the positive comments on our study. We especially appreciate the reviewer’s suggestions for improving the soundness and quality of the study. We address Reviewer #3’s specific concerns below.

      The main conclusion of the study is that pericyte-derived VEGF acts on inner ear vessels and SGNs to maintain their functionality and survival. While all presented data supports this model, there could be other potential interpretations that should be tested and validated with further evidence:

      The in vitro experiments are performed with SGN explants. Using this system, the authors see that pericyte-derived conditioned medium or exosomes lead to increased vessel branching and SGN neurite outgrowth. As explants contain both vessels and neurons, it is possible that VEGF acts primarily on endothelial cells, which then in turn signal to neurons (independently of VEGF, even if neurons express VEGFR2). This should be tested, perhaps by targeting VEGFR2 specifically in neurons, or by culturing isolated SGNs and testing the effect of pericyte-derived exosomes.

      This is a great point. To confirm the effect of exosomal VEGF-A on SGN neurite outgrowth, we treated isolated adult SGNs with exosomes. As shown in Figs. 9C & E, we found much greater SGN dendrite and branch growth in the treated than in the untreated groups.

      • Pericyte ablation via DTA might result in the activation of the immune system, which could also influence vessel and neuronal survival. It should be checked whether there is immune activation upon pericyte ablation.

      Excellent point. We examined macrophage activation two weeks after pericyte depletion. We did not see any obvious signs of macrophage activation, but we did notice a decrease in macrophage number. We presume the reduction in macrophage number results from insufficient blood flow and nutrient availability.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors seek to determine how various species combine their effects on the growth of a species of interest when part of the same community.

      To this end, the authors carry out an impressive experiment containing what I believe must be one of the largest pairwise + third-order co-culture experiments done to date, using a high-throughput co-culture system they had co-developed in previous work. The unprecedented nature of this data is a major strength of the paper. The authors also discover that species combine their effect through "dominance", i.e. the strongest effect masks the others. This is important as it calls into question the common assumption of additivity that is implicit in the choice of using Lotka-Volterra models.

      A stronger claim (i.e. in the abstract) is that the joint effect of multiple species on the growth of another can be derived from the effect of individual species. Unless I am misunderstanding something, this statement may have to be qualified a little, as the authors show that a model based on pairwise dominance (i.e. the strongest pair) does a somewhat better job (lower RMSD, though granted, not by much: 0.57 vs 0.63) than a model based on single-species dominance. That is, the effect of the strongest pair predicts the effect of a trio better than the effect of the strongest single species does.

      This issue makes one wonder whether, had the authors included higher-order combinations of species (e.g. five-member consortia or larger), the strongest-effect trio would have predicted better than the strongest-effect pair, which in turn is a better predictor than the strongest-effect species. This is important, as it would help one determine to what extent the strongest-effect model would work in more diverse communities, such as those typically found in nature. Indeed, the authors find that the predictive ability of the strongest-effect species is much stronger for pairs than for trios (RMSD of 0.28 vs 0.63). Does the predictive ability of the single-species model decline faster and faster as diversity grows beyond 4-member consortia?

      Thank you for raising this important point. It is true that in our study we see that single species predict pairs better than trios, and that pairs predict trios better than single species. As we did not perform experiments on more diverse communities (n>4), we are not sure if or how these rules will scale up. We explicitly address these caveats in our revised discussion.

      Reviewer #3 (Public Review):

      A problem in synthetic ecology is that one can't brute-force complex community design because combinatorics make it basically impossible to screen all possible communities from a bank of possible species. Therefore, we need a way to predict phenomena in complex communities from phenomena in simple communities. This paper aims to improve this predictive ability by comparing a few different simple models applied to a large dataset obtained with the authors' "kchip" microfluidics device. The main question they ask is whether the effect of two species on a focal species is predicted by the mean, the sum, or the max of the effect of each single "affecting" species on the focal species. They find that the max effect is often the best predictor, in the sense of minimizing the difference between predicted effect and measured effect. They also measure single-species trait data for their library of strains, including resource niche and antibiotic resistance, and then find that Pearson correlations between distance calculations generated from these metrics and the effect of added species are weak and unpredictive. This work is largely well-done, timely and likely to be of high interest to the field, as predicting ecosystem traits from species traits is a major research aim.
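      For readers less familiar with these null models, the three candidates (mean, sum, and max, i.e. "strongest effect") can be sketched as below. This is a minimal illustration, not the authors' analysis code: it assumes each effect is a signed number (the exact effect metric used in the paper is not restated here), scores each model by root-mean-square deviation (RMSD) from the measured joint effects, and uses hypothetical function names and toy numbers.

```python
import math


def mean_model(effects):
    """Predict the joint effect as the average of the individual effects."""
    return sum(effects) / len(effects)


def additive_model(effects):
    """Predict the joint effect as the sum of the individual effects."""
    return sum(effects)


def strongest_model(effects):
    """Predict the joint effect as the individual effect with the largest magnitude."""
    return max(effects, key=abs)


def rmsd(predicted, observed):
    """Root-mean-square deviation between predicted and measured joint effects."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def score_models(combos, observed):
    """combos: tuples of individual effects; observed: measured joint effects."""
    models = {"mean": mean_model, "additive": additive_model, "strongest": strongest_model}
    return {name: rmsd([model(c) for c in combos], observed) for name, model in models.items()}


# Hypothetical toy data: two negative single-species effects per pair, with
# measured joint effects that track the stronger of the two.
combos = [(-1.0, -0.2), (-0.5, -0.4), (-2.0, -1.0)]
observed = [-1.1, -0.6, -2.0]
scores = score_models(combos, observed)
```

      Because each function accepts a tuple of any length, the same sketch can compare a strongest-single-species prediction for a trio (three single effects) with a strongest-pair prediction (three measured pair effects), which is the comparison behind the 0.57 vs 0.63 RMSD values discussed above.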

      My main criticism is that the main take-home of the paper (Fig. 3B), that the strongest effect is the best predictor, is oversold. While it is true that, averaged over their six focal species, the "strongest effect" was the best overall predictor, when one looks at the species-specific data (S9), we see that it is not the best predictor for 1/3 of their focal species, and this fraction grows to 1/2 if one considers a difference in nRMSE of 0.01 to be negligible.

      As suggested, we have softened our language regarding the take-home message. This matter is addressed in detail above in response to 'Essential Revisions'. Briefly, we see that the strongest model works best when both single species have qualitatively similar effects, but is slightly less accurate when effects are mixed. We also see overall less accurate predictions for positive effects. In light of these findings, we propose that the focal species for which the strongest model is not the most accurate reflect differences in interaction types, rather than properties specific to those focal species.

      We made substantial changes to the manuscript, including the first paragraph of the discussion which more accurately describes these findings and emphasizes the relevant caveats:

      "By measuring thousands of simplified microbial communities, we quantified the effects of single species, pairs, and trios on multiple focal species. The most accurate model, overall and specifically when both single species effects were negative, was the strongest effect model. This is in stark contrast to models often used in antibiotic compound combinations, despite most effects being negative, where additivity is often the default model (Bollenbach 2015). The additive model performed well for mixed effects (i.e. one negative and one positive), but only slightly better than the strongest model, and poorly when both species had effects of the same sign. When both single species’ effects were positive, the strongest model was also the best, though the difference was less pronounced and all models performed worse for these interactions. This may be due to the small effect size seen with positive effects, as when we limited negative and mixed effects to a similar range of effects strength, their accuracy dropped to similar values (Figure 3–Figure supplement 5). We posit that the difference in accuracy across species is affected mainly by the effect type dominating different focal species' interactions, rather than by inherent species traits (Figure 3–Figure supplement 6)." (Lines 288-304)

      The same criticism applies to the result from Figure 2, that pairs of affecting species have more negative effects than single species. Considered across all focal species this is true (though minor in effect size, Fig 2A), but the effect is significant within only two individual focal species. Again, this points to the effects being focal-species-specific, and perhaps not as generalizable as is currently being claimed.

      Upon more rigorous analysis, and with regard to changes in the dataset after filtering, we see that the more accurate statement is that effects become stronger, not necessarily more negative (in line with the accuracy of the strongest model). The overall trend is towards more negative interactions, because the majority of interactions are negative, but as stated this is not true for each individual focal species. Accordingly, the following sentence in the manuscript has been changed:

      "The median effect on each focal was more negative by 0.28 on average, though the difference was not significant in all cases; additionally, focals with mostly positive single species interactions showed a small increase in median effect (Fig. 2D)" (Lines 151-154)

      As well as the title of this section: "Joint effects of species pairs tend to be stronger than those of individual affecting species" (Lines 127-128)

      Another thing that points to a focal-species-specific response is Fig 2D, which shows the distributions of responses of each focal species to pairs. Two of these distributions are unimodal, one appears bimodal, and three appear tri-modal. This suggests to me that the focal species respond in categorically different ways to species addition.

      We believe this distribution of pair effects is related to the distribution of single species effects, and not to the way in which different focal species respond to the addition of second species. Though this may be difficult to see from the swarm plots shown in the paper, below is a split violin plot that emphasizes this point.

      Fig R1: Distribution of single species and pair effects. Distribution of the effect of single and pairs of affecting species for each focal species individually. Dashed lines represent the median, and dotted lines represent the interquartile range.

      These differences occur even though the focal bacteria are all from the same family. This suggests to me that the generalizability may be even less when a more phylogenetically dispersed set of focal species are used.

      We have added the following sentence to the discussion explicitly emphasizing the phylogenetic limitations of our study:

      "Lastly, it is important to note that our focal species are all from the same order (Enterobacterales), which may also limit the purview of our findings." (Lines 364-366)

      Considering these points together, I argue that the conclusion should be shifted from "strongest effect is the best" to "in 3 of our focal species, strongest effect was the best, but this was not universal, and with only 6 focal species, we can't know if it will always be the best across a set of focal species".

      As mentioned above, we have softened our language regarding the take-home message in response to these evaluations.

      My second main criticism is that it is hard to understand exactly how the trait data were used to predict effects. It seems like it was just Pearson correlation coefficients between interspecies niche distances (or antibiotic distances) and the effect. I'm not very surprised these correlations were unpredictive, because the underlying measurements don't seem to be relevant to the environment tested. What if, rather than using niche data across 20 nutrients, only the growth data on glucose (the carbon source in the experiments) was used? I understand that in a field experiment, for example, one might not know what resources are available, and so measuring niche across 20 resources may be the best thing to do. Here though it seems imperative to test using the most relevant data.

      It is true that much of the profiling data is not directly related to the experimental conditions (different carbon sources and antibiotics), but in addition to these we do use measurements from experiments carried out in the same environment as the interactions assays (i.e. growth rate and carrying capacity when growing on glucose), which also showed poor correlation with the effects on focals. Additionally, we believe that these profiles contain relevant information regarding metabolic similarity between species (similar to metabolic models often constructed computationally). To improve clarity, we added the following sentence to the figure legend of Figure 3–Figure supplement 1:

      "The growth rate, and maximum OD shown in panel A were measured only in M9 glucose, similar to conditions used in the interaction assays." (Lines 591-592)

      Additionally and relatedly, it would be valuable to show the scatterplots leading to the conclusion that trait data were uninformative. Pearson's r only works on an assumption of linearity. But there could be strong relationships between the trait data and effect that are monotonic but not linear, or even that are non-monotonic yet still strong (e.g. U-shaped). For the first case, I recommend switching to Spearman's rho over Pearson's r, because it only assumes monotonicity, not linearity. If there are observable relationships that are not monotonic, a different test should be used.
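      The difference between the two coefficients can be illustrated with a short sketch. It assumes a hypothetical trait distance and a hypothetical effect that are related monotonically but not linearly; Spearman's rho is computed here as Pearson's r on ranks, a simplification that is only valid when there are no ties (a library routine such as SciPy's `spearmanr` handles ties properly).

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: sensitive only to linear association."""
    return float(np.corrcoef(x, y)[0, 1])


def spearman_rho(x, y):
    """Spearman correlation as Pearson r on ranks (assumes no tied values)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson_r(rank_x, rank_y)


# Hypothetical trait distance vs. effect: perfectly monotonic, strongly non-linear.
distance = np.linspace(0.0, 5.0, 50)
effect = np.exp(distance)

r = pearson_r(distance, effect)         # noticeably below 1
rho = spearman_rho(distance, effect)    # exactly monotonic, so rho is 1
```

      A non-monotonic (e.g. U-shaped) relationship can drive both coefficients toward zero even when the dependency is strong, which is why inspecting the scatterplots, as suggested, remains useful alongside either coefficient.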

      Per your suggestion, we have changed the measurement of correlation in this analysis from Pearson's r, to Spearman's rho. As we observed similar, and still mostly weak correlations, we did not investigate these relationships further. See Figure 3–Figure supplement 1.

      Additionally, we generated heat maps including scatterplots mapping the data leading to these correlations. We found no notable dependency in these plots, and visually they were quite crowded and difficult to interpret. As this is not the central point of our study, we ultimately decided against adding this information to the plots.

      In general, I think the analyses using the trait data were too simplistic to conclude that the trait data are not predictive.

      We agree that more sophisticated analyses may help connect species traits to their effects on focal species. In fact, other members of our research group have recently used machine learning to accomplish similar predictions (https://doi.org/10.1101/2022.08.02.502471). As such, we have changed the wording to reflect that this correlation is difficult to find using simple analyses:

      "These results indicate that it may be challenging to connect the effects of single and pairs of species on a focal strain to a specific trait of the involved strains, using simple analysis." (Lines 157-159)

    1. Author Response

      Reviewer #1 (Public Review):

      The authors examined the impact of pre-gravid obesity in human mothers on the monocytes of newborns by collecting umbilical cord blood. Additionally, the authors used a non-human primate (NHP) model of diet-induced obesity to isolate fetal macrophages and assess the impact of maternal obesity on fetal macrophage function. The comprehensive analysis of the human umbilical cord blood monocytes by studying cytokine release, bulk RNA-seq and bulk ATAC-seq, single cell RNA-seq and single cell ATAC-seq, responses to pathogen stimulation, as well as metabolic studies such as glucose uptake, are major strengths of the work. They present convincing evidence that the monocytes of offspring with obese mothers have epigenetic and transcriptomic profiles consistent with impaired immune responses, both during baseline conditions and upon stimulation.

      We thank the reviewer for these positive remarks.

      However, it is not clear from the data how the epigenetic data and the transcriptomic data are related to each other. The implication that the epigenetic changes drive the downstream transcriptional differences is not clearly demonstrated. Furthermore, it is not clear which of the observed attenuations of monocyte transcriptional responses overlap with chromatin accessibility differences. Such an overlap would make a stronger case for the mechanistic link.

      We thank the reviewer for this suggestion. We have included an integration section (Figure 4E) that overlaps baseline ATAC-Seq (data from this study) with gene expression responses following LPS stimulation in lean and obese groups (from a previous study; https://doi.org/10.4049/jimmunol.1700434). Additionally, we report overlap of LPS-induced chromatin changes with gene expression changes following LPS, E. coli and RSV stimulation in Figure 5I. Collectively, these changes provide the reader with a better link between chromatin accessibility and gene expression differences and their discordance with maternal obesity.

      The increased phagocytosis of E. coli in umbilical cord monocytes of newborns with obese mothers appears counter-intuitive because it implies greater host defense capacity.

      The E. coli uptake assay is a standard way of measuring cellular phagocytosis by flow cytometry. We would like to clarify that despite impaired ex vivo cytokine responses and poor migration, UCB monocytes in the obese group demonstrate a higher ability to phagocytize pathogens. This is counterintuitive but not surprising, given that enhanced phagocytosis is a hallmark of regulatory monocytes/macrophages.

      One of the most remarkable aspects of the manuscript is the analysis of the fetal macrophages in a non-human primate (NHP) model of diet induced obesity because of the challenge of studying fetal macrophages in humans. The cytokine assays nicely show that the fetal macrophages in the obesity model show impaired cytokine production, consistent with what was seen in the umbilical cord blood monocytes of human newborns. This is especially important because circulating monocytes or monocyte progenitors seed the fetal tissues and give rise to fetal macrophages, thus elegantly linking the human work on circulating umbilical cord blood monocytes to the tissue macrophages in the NHP model. However, the NHP studies do not show any additional macrophage characterization beyond the cytokine assays. Flow cytometry analysis of the macrophage phenotype and functional assays would strengthen the conclusions regarding macrophage dysregulation.

      We have now included phenotyping data for ileal and splenic macrophages in Figure 6C-6E, which were collected during cell sorting. We unfortunately are not able to carry out additional functional assays since we don’t have any additional cells from these animals.

      Reviewer #2 (Public Review):

      This paper will be of interest to scientists studying the molecular effects of maternal obesity on offspring health. The paper represents an extension to earlier findings that have linked epigenomic alterations of monocyte populations to aberrant immune responses in offspring of obese mothers. Bulk and single cell technologies have been implemented to characterize monocytic responses to bacterial and viral pathogens at the transcriptional and epigenetic level. A macaque model of western-style diet induced obesity is also described to provide in vivo evidence in support of monocyte/immune cell reprogramming by western diet/obesity. However, enthusiasm for the paper is significantly dampened by a lack of clarity in data presentation and robustness of the analysis.

      We thank the reviewer for this comprehensive summary and thoughtful assessment.

      Reviewer #3 (Public Review):

      The manuscript by Sureshchandra et al is a very extensive analysis of monocyte function and their molecular landscape in cord bloods from lean and obese mothers. They aimed to analyze the effects of pre-pregnancy BMI on the functioning of the innate immune system in newborns in a very extensive way. The combination of functional and molecular analyses strengthens their observations and shows many different sides of monocyte activation. I think this approach needs to be praised and should be an inspiration to many others who study monocyte function. This allows for a broad view on the matter and also shows where potential targeting will be necessary in the future. Overall, the manuscript and particularly the methods section is very well written and extensive, making it easy to study how robust the data are.

      We thank the reviewer for their comprehensive and positive assessment of our work.

    1. Author Response

      Reviewer #1 (Public Review):

      This study provides further detailed analysis of recently published Fly Atlas datasets supplemented with newly generated single cell RNA-seq data obtained from 6,000 testis cells. Using these data, the authors define 43 germline cell clusters and 22 somatic cell clusters. This work confirms and extends previous observations regarding changing gene expression programs through the course of germ cell and somatic cell differentiation.

      This study makes several interesting observations that will be of interest to the field. For example, the authors find that spermatocytes exhibit sex chromosome specific changes in gene expression. In addition, comparisons between the single nucleus and single cell data reveal differences in active transcription versus global mRNA levels. For example, previous results (1) showed that several mRNAs remain high in spermatids long after they are actively transcribed in spermatocytes and (2) defined a set of post-meiotic transcripts. The analysis presented here shows that these patterns of mRNA expression are shared by hundreds of genes in the developing germline. Moreover, variable patterns between the sn- and sc-RNAseq datasets reveal considerable complexity in the post-transcriptional regulation of gene expression.

      Overall, this paper represents a significant contribution to the field. These findings will be of broad interest to developmental biologists and will establish an important foundation for future studies. However, several points should be addressed.

      In figure 1, I am struck by the widespread expression of vasa outside of the germ cell lineage. Do the authors have a technical or biological explanation for this observation? This point should be addressed in the paper with new experiments or further explanation in the text.

      Thank you for pointing this out. We found that our single cell dataset shows a similar (low) level of vasa expression outside the germline, suggesting that this is not due to single nucleus versus single cell RNA-seq (cluster 1, red in the left-hand UMAP).

      Analyzing the single nucleus RNA-seq in more detail revealed that, compared to the germline, both the fraction of cells in a cluster expressing vasa and the level at which they express it are very low. This analysis is included in a new Figure 1 – figure supplement 1. It is likely that much of this is due to a technical artifact, such as ambient RNA. Finally, we note in the resubmission that vasa is in fact expressed in embryonic somatic cells, and thus some of the vasa expression we observe may be real (Renault. Biol Open 2012; https://doi.org/10.1242/bio.20121909).
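      The two quantities used here (the fraction of cells in a cluster that express a gene, and the expression level in those cells) can be computed per cluster as in the minimal sketch below. It is not the pipeline used for Figure 1 – figure supplement 1: it assumes one gene's raw counts per cell and one cluster label per cell, treats any nonzero count as "expressing", and uses hypothetical toy values.

```python
import numpy as np


def expression_summary(counts, clusters):
    """Per-cluster fraction of expressing cells and mean count among them.

    counts: 1-D array with one gene's counts per cell (hypothetical input).
    clusters: 1-D array of cluster labels, aligned with counts.
    """
    summary = {}
    for label in np.unique(clusters):
        in_cluster = counts[clusters == label]
        expressing = in_cluster[in_cluster > 0]  # any detected transcript counts as expressing
        fraction = expressing.size / in_cluster.size
        mean_level = float(expressing.mean()) if expressing.size else 0.0
        summary[label] = (fraction, mean_level)
    return summary


# Hypothetical toy data: a germline cluster where every cell expresses the gene,
# and a somatic cluster where a single cell carries one count.
counts = np.array([5, 8, 6, 7, 0, 1, 0, 0, 0, 0])
clusters = np.array(["germline"] * 4 + ["somatic"] * 6)
summary = expression_summary(counts, clusters)
```

      Reporting both numbers separates a gene that is weakly expressed in many cells from one detected in a handful of cells, a pattern that is more consistent with ambient RNA.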

      Plots in the original submission drew undue attention to the few somatic cells that exhibited vasa signal, due to the fact that expressing cell points were forced to the front of the plot. Given our new analysis reporting the low levels and fraction of cells exhibiting vasa expression (Figure 1 – figure supplement 1), we have modified the panels of Figure 1, changing point size to more faithfully reflect the small proportion of somatic cells with some vasa expression.

      The proposed bifurcation of the cyst cells into head and tail populations is interesting and worth further exploration/validation. While the presented in situ hybridization for Nep4, geko, and shg hint at differences between these populations, double fluorescent in situs or the use of additional markers would help make this point clearer. Higher magnification images would also help in this regard.

      We thank the reviewer for their suggestions on clarifying the differences between HCC and TCC populations. As suggested, we have repeated the FISH experiments of Nep4 and geko with higher resolution, and included the additional marker Coracle that demarcates the junction between HCC and TCC (Figure 6O,Q,S,T). These panels replaced previous Nep4 and geko FISH images (see previous Figure 6Q,U,U’). FISH for Nep4 validated the split, and the enrichment of geko strongly suggests that this arm represents one cell type (HCCs). We have not yet identified a gene reciprocally enriched to the other arm. Therefore, in the revised submission, we call the assignment of TCC identity, and to a lesser extent, HCC identity ‘tentative’, but point out that genes predicted to be enriched to one or the other arm represent fertile candidates for the field to test.

      Reviewer #2 (Public Review):

      In this manuscript the authors explain in greater detail a recent testis snRNA-seq dataset that many of these authors published earlier this year as part of the Fly Cell Atlas (FCA; Li et al. Science 2022). As part of the current effort, additional collaborators were recruited and about 6,000 whole-cell scRNA-seq cells were added to the previous 42,000-nucleus dataset. The authors now describe 65 snRNA-seq clusters, each representing potential cell types or cell states, including 43 germline clusters and 22 somatic clusters. The authors state that this analysis confirms and extends previous knowledge of the testis in several important areas.

      “However, in areas where testis biology is well studied, such as the development of germ cells from GSC to the onset of spermatocyte differentiation, the resolution seems less than current knowledge by considerable margins. No clusters correspond to GSCs, or specific mitotic spermatogonia, and even the major stages of meiotic prophase are not resolved. Instead, the transitions between one state and the next are broad and almost continuous, which could be an intrinsic characteristic of the testis compared to other tissues, of snRNAseq compared to scRNAseq, or of the particular experimental and software analysis choices that were used in this study.”

      Note that the referee raises the same issue later in their review. To respond succinctly, we placed the relevant sentence from a later portion of this referee’s comment here.

      “Support for the view that the problems are mostly technical, rather than a reflection of testis biology, comes from studies of scRNAseq in the mouse, where it has been possible to resolve a stem cell cluster, and germ cell pathways that follow known germ cell differentiation trajectories with much more discrete steps than were reported here (for example, Cao et al. 2021 cited by the authors).”

      Respectfully, we have a different interpretation of other work as cited by this referee. Our data, as well as that from others, supports the notion that transitions are generally broad and continuous and are indeed a feature of testis biology. As we report here, data from both single cell and single nucleus RNAseq exhibit transitions from one cluster to the next. Thus, this feature cannot be due to the choice of method (single cell versus single nucleus).

      In fact, prior scRNA-seq results on systems containing a continuously renewing cell population, such as is the case in the testis, do indeed exhibit a contiguous trajectory rather than discrete, well-separated cell states in gene expression space (that is, in a UMAP presentation). For example, this is the case from single-cell or single-nucleus sequencing from spermatogenesis in mouse (Cao et al 2021), human (Sohni et al 2019), and zebrafish (Qian et al 2022).

      Along differentiation trajectories in these tissues, successive clusters are defined by their aggregate transcript repertoire. Indeed, differentially expressed genes can be identified for clusters, with expression enriched in a given cluster. However, expression is rarely restricted to a cluster. For instance, Cao et al. subcluster spermatogonia into four subgroups, termed SPG1-4. They state clearly that these SPG1-4 “follow a continuous differentiation trajectory,” as can be inferred by marker expression across cells in this lineage. Similar to our findings, while the spermatogonia can fall into discrete clusters, gene expression patterns are contiguous. For example, the “undifferentiated” marker used in Cao et al., Crabp1, clearly shows expression in SPG1-3, annotated as spermatogonial stem cells, undifferentiated spermatogonia, and early differentiated spermatogonia, respectively. Likewise, markers for the “SPG3” state spermatogonia have detectable expression in SPG2 and SPG4, and likewise for markers of the “SPG4” state (with expression found also in SPG3).

      An analogous study of human spermatogenesis arrives at a similar conclusion. In that work, although clusters are named as “spermatogonial stem cell (SSC)”, the authors are careful to specifically point out that, “…while we refer to the SSC-1 and SSC-2 cell clusters as ‘‘SSCs,’’ scRNA-seq is not a functional assay and thus we do not know the percentage of cells in these clusters with SSC activity. These subsets almost certainly contain other A-SPG cells [A type spermatogonia], including SPG progenitors that have committed to differentiate.” (Sohni et al 2019)

      Thus, the work in several disparate systems, all involving renewing lineages, finds that discrete clusters, such as a “stem cell cluster” are not identified. In the Drosophila testis, germline differentiation flows in a continuous-like manner similar to spermatogenesis in several other organisms studied by scRNA-seq, and our finding is not a function of the methodology, but rather a facet of the biology of the organ.

      Operating in parallel with continuous differentiation, we did find evidence of, and extensively discussed in concert with Figure 4, huge and dramatic shifts in transcriptional state in spermatocytes compared to spermatogonia, in early spermatids compared to spermatocytes, and in late spermatid elongation. Lastly, as we describe further below, new data in this resubmission identify four distinct genes with stage-selective expression as predicted by our analysis (new Figure 2 - figure supplement 1), illustrating the utility of our study for the field to find new markers and new genes to test for function.

      A goal of the study was to identify new rare cell types, and the hub, a small apical somatic cell region, was mentioned as a target region, since it regulates both stem cell populations, GSCs and CySCs, is capable of regeneration, and has other fascinating properties. However, the analysis of the hub cluster revealed more problems of specificity. 41 of 120 cells in the cluster were discordant with the remaining 79, which did express markers consistent with previous studies. Why these cells co-clustered was not explained, and one can only presume that similar problems may be found in other clusters.

      Our writing seems not to have been clear enough on this point and we thank the reviewer. We have revised the section. In addition, we have added new data (Figure 7 - figure supplement 2). We had already stated that only 79 of these 120 nuclei were near to each other in 2D UMAP space, while other members of original cluster 90 were dispersed. Thus the 79 hub nuclei in fact clustered together on the UMAP. Other nuclei that mapped at dispersed positions were initially ‘called’ as part of this cluster in the original Fly Cell Atlas (FCA) paper (Li et al., 2022), making it obvious that a correction to that assignment was necessary, which we carried out. To our eye, no other called cluster was represented by such dispersed groupings. For the hub, we definitively established the 79 nuclei to represent hub cells by marker gene analysis, including the identification of a new marker, tup, that was included in the 79 annotated hub nuclei but excluded from the 41 other nuclei (Figure 7). In this resubmission, to independently verify the relationship of the 79 nuclei to each other, we subjected the 120 nuclei from the original cluster 90 defined by the FCA study to hierarchical clustering using only genes that are highly expressed and variable in these nuclei (Figure 7 - figure supplement 2). This computationally distinct approach strongly supported our identification of the 79 definitive hub nuclei.
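      A generic version of this kind of check, hierarchical clustering restricted to highly expressed, highly variable genes, might look like the sketch below. It is not the analysis behind Figure 7 - figure supplement 2: the log1p transform, the mean-expression threshold, the number of genes kept, average linkage and correlation distance are all assumptions chosen for illustration, and the input matrix is synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_on_variable_genes(expr, n_genes=10, min_mean=0.5, n_clusters=2):
    """Hierarchically cluster cells using only highly expressed, variable genes.

    expr: cells x genes count matrix (hypothetical input).
    """
    log_expr = np.log1p(expr)
    means = log_expr.mean(axis=0)
    variances = log_expr.var(axis=0)
    # Keep genes above an (assumed) expression threshold, then the most variable of those.
    candidates = np.flatnonzero(means > min_mean)
    top = candidates[np.argsort(variances[candidates])[::-1][:n_genes]]
    # Average-linkage clustering on correlation distance between cells.
    tree = linkage(log_expr[:, top], method="average", metric="correlation")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic data: 6 cells of one type and 4 of another, with 10 genes that
# differ between the types and 4 uninformative genes shared by both.
rng = np.random.default_rng(0)
profile_a = np.array([20] * 5 + [1] * 5 + [5] * 4)
profile_b = np.array([1] * 5 + [20] * 5 + [5] * 4)
expr = np.vstack([rng.poisson(profile_a, size=(6, 14)), rng.poisson(profile_b, size=(4, 14))])
labels = cluster_on_variable_genes(expr)
```

      Restricting the distance calculation to variable genes keeps uniformly expressed genes from diluting the differences between cell groups.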

      Indeed, many other indications of specificity issues were described, including contamination of fat body with spermatocytes, the expression of germline genes such as Vasa in many somatic cell clusters like muscle, hemocytes, and male gonad epithelium, and the promiscuous expression of many genes, including 25% of somatic-specific transcription factors, in mid to late spermatocytes. The expression of only one such gene, Hml, was documented in tissue, and the authors, for reasons not explained, did not attempt to decisively address whether this phenomenon is biologically meaningful.

      We discussed the question of vasa expression in somatic clusters in some detail above, in response to referee #1, and included new analysis in the resubmission.

      With respect to the observation of ‘somatic gene’ expression in spermatocytes, we are also intrigued. We do not believe this is due to “contamination,” but rather a spermatocyte expression program that includes expression of somatic genes. First, these somatic markers were not observed in other germline clusters, which would be expected if this was due to general transcript contamination. Second, we observed expression of somatic markers in spermatocytes independently in the single-cell and single-nucleus data, making it unlikely to be an artifact of preparation of isolated nuclei. Finally, in the resubmission, in addition to Hml, we validated ‘somatic’ marker expression in spermatocytes by FISH of a somatic, tail cyst cell marker, Vsx1. Vsx1 is predicted to be expressed at low levels in spermatocytes in our dataset and is clearly visible in germline cells by FISH (Figure 3 – figure supplement 2G,H). We also refer the referee to Figure 6K, where the mRNA for the somatic cyst cell marker eya was observed by FISH at low levels in spermatocytes.

      A truly interesting question mentioned by the authors is why the testis consistently ranks near the top of all tissues in the complexity of its gene expression. In the Li et al. (2022) paper it was suggested that this is due to an inherently greater biological complexity of spermiogenesis than other tissues. It seems difficult to independently and rationally determine "biological complexity," but if a conserved characteristic of testis was to promiscuously express a wide range of (random?) genes, something not out of the question, this would be highly relevant and important.

      We agree that the massive transcriptional program found in spermatocytes is, indeed, truly interesting. There are many speculations as to why spermatocytes are so highly transcriptional, including the possibility of “transcriptional scanning” (e.g., Xia et al. 2020) regulating the evolution of new genes. Testing such models is beyond the scope of this paper. However, one must also keep in mind that spermatogenesis involves one of the most dramatic cellular transformations in biology, where cellular components spanning from nuclei to chromatin to Golgi, cell cycle, extensive membrane addition, changes in cell shape, and building of a complex swimming organelle all must occur and be temporally coordinated. Small wonder that many genes must be expressed to accomplish these tasks.

      Unfortunately, the most likely problems are simply technical. Drosophila cells are small and difficult to separate as intact cells. The use of nuclei was meant to overcome this inherent problem, but the effectiveness of this new approach is not yet well-documented. Support for the view that the problems are mostly technical, rather than a reflection of testis biology, comes from studies of scRNAseq in the mouse, where it has been possible to resolve a stem cell cluster, and germ cell pathways that follow known germ cell differentiation trajectories with much more discrete steps than were reported here (for example, Cao et al. 2021 cited by the authors).

      We respectfully disagree with the referee about this collection of statements. First, the use of snRNA-seq has been extensively characterized and compared to scRNA-seq in brain tissue by McLaughlin et al., 2021 (cited in the original submission) and was shown to be effective (McLaughlin, et al. eLife 2021;10:e63856. DOI: https://doi.org/10.7554/eLife.63856). snRNA-seq has a distinct advantage when dealing with long, thin cells, such as neurons or cyst cells (as featured in this work), where cytoplasm can easily be sheared off during cell isolation. Second, in a previous portion of our response to this referee, we discussed how our interpretation of Cao et al., 2021 differs from that expressed by this referee. Lastly, as requested in ‘Essential revision’ 2, we adjusted clustering methods and selected four genes: two predicted to be markers for early-stage germline cells and two for mid-spermatocyte-stage development. FISH analysis demonstrates that expression of each of these maps to the appropriate stages (new Figure 2 - figure supplement 1). This confirms that the datasets we present in this manuscript can be mined to identify unique, diagnostic markers for various stages.

      The conclusions that were made by the authors seem to either be facts that are already well known, such as the problem that transcriptional changes in spermatocytes will be obscured by the large stored mRNA pool, or promises of future utility. For example, "mining the snRNA-seq data for changes in gene expression as one cluster advances to the next should identify new sub-stage-specific markers." If worthwhile new markers could be identified from these data, surely this could have been accomplished and presented in a supplemental Table. As it currently stands, the manuscript presents the dataset including a fair description of its current limitations, but very little else of novel biological interest is to be found.

      “In sum, this project represents an extremely worthwhile undertaking that will eventually pay off. However, some currently unappreciated technical issues, in cell/nuclear isolation and certainly in the bioinformatic programs and procedures used, which mis-clustered many different cells, have created the current difficulties.

      Most scRNAseq software is written to meet the needs of mammalian researchers working with cultured cells, which are cellular giants compared to Drosophila cells and of generally similar size. Such software may not be ideal for much smaller cells, nor for the much wider variation in cell size, properties and biological mechanisms that exists in the world of tissues.”

      We appreciate the referee’s acknowledgement that this ‘undertaking will eventually pay off’. It was not our intention to address ‘function’ in this study, but rather to make the system accessible to the broadest community possible. We are uncertain whether this referee holds any remaining reservation. A brief summary of what we covered in the manuscript may help allay any residual concern. Obviously, study of the Drosophila testis and spermatogenesis benefits from the knowledge of a large number of established cell-type and stage-selective markers. Thus, we extensively used the community’s accepted markers to assign identity to clusters in both the sn- and sc-RNA-seq UMAPs. We believe that effort firmly establishes the validity and reliability of the dataset. Furthermore, we identified upwards of a dozen new markers from the cluster analysis and verified their expression by FISH or reporter line in various figures throughout (tup, amph, piwi, geko, Nep4, CG3902, Akr1B, loqs, Vsx1, Drep2, Pxt, CG43317, Vha16-5, l(2)41Ab). To our mind, these contributions, coupled with annotation of the datasets, suggest strongly that they will serve the community well. This is especially true as we provide users with objects that they can feed into commonly used software such as Seurat and Monocle to explore the datasets for their own purposes. Rather than simply relying on default settings within some of the applications, we also adjusted parameters for various clusterings as called for, some in response to astute comments from referees, and included these in the resubmission. Of course, it is possible that rare issues may arise in the datasets as they are further studied, but that is the case with all scRNA-seq data and is not specific to work on this model organism.

      Reviewer #3 (Public Review):

      In this study, the authors use recently published single-nucleus RNA sequencing data and a newly generated single-cell RNA sequencing dataset to determine the transcriptional profiles of the different cell types in the Drosophila testis. Their analysis of the data and experimental validation of key findings provide new insight into testis biology and create a resource for the community. The manuscript is clearly written, the data provide strong support for the conclusions, and the analysis is rigorous. Indeed, this manuscript serves as a case study demonstrating best practices in the analysis of this type of genomics data and the many types of predictions that can be made from a deep dive into the data. Researchers who are studying the testis will find many starting points for new projects suggested by this work, and the insightful comparisons of methods, such as between slingshot and Monocle3 and between single-cell and single-nucleus sequencing, will be of interest beyond the study of the Drosophila testis.

      We greatly appreciate the reviewer’s comments.

      Reviewer #4 (Public Review):

      This is an extraordinary study that will serve as a key resource for all researchers in the field of Drosophila testis development. The lineages that derive from the germline stem cells and somatic stem cells are described in a detail that has not been previously achieved. The RNAseq approaches have permitted the description of cell states that have not been inferred from morphological analyses, although it is the combination of RNAseq and morphological studies that makes this study exceptional. The field will now have a good understanding of interactions between specific cell states in the somatic lineage and specific states in the germ cell lineage. This resource will permit future studies on precise mechanisms of communication between these lineages during the differentiation process, and will serve as a model for studies of co-differentiation in other stem cell systems. The combination of snRNAseq and scRNAseq has conclusively shown differences in transcriptional activation and RNA storage at specific stages of germ cell differentiation and is a unique study that will inform other studies of cell differentiation.

      Could the authors please describe whether genes on the Y chromosome are expressed outside of the male germline. For example, what is represented by the spots of expression within the seminal vesicle observed in Figure 3D?

      Prior work demonstrated that proteins encoded by Y-linked genes are not expressed outside of the germline (Zhang et al. Genetics 2020. https://doi.org/10.1534/genetics.120.303324). In our snRNAseq dataset, we find that genes on the Y chromosome are not highly expressed outside of the male germline (on the order of 100-fold lower in other tissues). In fact, we observe Y chromosome transcripts at this level in many nuclei across tissues collected for the Fly Cell Atlas project, including the ovary. Since we have not followed up on the Fly Cell Atlas observations directly by using FISH to examine Y chromosome transcript expression outside the germline, we cannot rule out the possibility that such low-level expression is real. However, detection across several tissues argues that this is likely a technical artifact. With regard to the ‘spots of expression within the seminal vesicle’ (Figure 3D), a spot is colored red if the average expression level of genes on the Y chromosome is greater in that cell than in an average cell on our plot. These red spots are likely due to ambient RNA being carried over.
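
      The coloring rule just described is a relative threshold rather than an absolute expression cutoff. A minimal sketch of that rule, assuming a cells × genes expression matrix and a boolean mask marking Y-linked genes (the helper name and toy values are hypothetical, not the plotting code used for Figure 3D):

```python
import numpy as np


def flag_y_high_cells(expr, is_y_gene):
    """Return True for cells whose mean Y-linked expression exceeds the mean
    of that per-cell score over all plotted cells (hypothetical helper)."""
    expr = np.asarray(expr, dtype=float)
    is_y_gene = np.asarray(is_y_gene, dtype=bool)
    # Per-cell average expression over the Y-linked genes only
    y_score = expr[:, is_y_gene].mean(axis=1)
    # Compare each cell with the "average cell" on the plot
    return y_score > y_score.mean()


# Toy matrix: 4 cells x 3 genes; genes 0 and 1 are assumed to be Y-linked
expr = [[5.0, 3.0, 1.0],
        [0.0, 0.0, 2.0],
        [0.1, 0.0, 4.0],
        [0.0, 0.2, 0.5]]
is_y = [True, True, False]
print(flag_y_high_cells(expr, is_y))
```

      Because the cutoff is the mean over the plotted population, a red spot says only that a cell is above that population average, not that its Y-linked expression is biologically meaningful.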

      I would appreciate some discussion of the "somatic factors" that are observed to be upregulated in spermatocytes (e.g. Mhc, Hml, grh, Syt1). Is there any indication of functional significance of any of these factors in spermatocytes?

      This is an excellent question. Although we validated expression for several (Hml, Vsx1 and eya), we did not test for their function here and this issue remains to be studied. This is now directly stated in the main text.

      In the discussion of cyst cell lineage differentiation following cluster 74 the authors state that neither the HCC or TCC lineages were enriched for eya (Figure 6V). It seems in this panel that cluster 57 shows some enrichment for eya - is this regarded as too low expression to be considered enriched?

      We thank the reviewer for their insightful comment and we agree with their conclusions. We have modified the text to reflect the low, but present, expression of eya in the HCC and TCC lineages. The text now reads as follows at line (insert line # here): “Enrichment of eya was dramatically reduced in the clusters along either late cyst cell branch compared to those of earlier lineage nuclei (Figure 6J,U).”

    1. Author Response

      Reviewer #2 (Public Review):

      This is an interesting study investigating the effects of sensory conflict on rhythmic behaviour and gene expression in the sea anemone Nematostella vectensis. Sensory conflict can arise when two environmental inputs (Zeitgebers) that usually act cooperatively to synchronize circadian clocks and behaviour are presented out of phase. The clock system then needs to cope with this challenge somehow, for example by prioritising one cue and ignoring the other. While the daily light-dark cycle is usually considered the more reliable and potent Zeitgeber, under some conditions daily temperature cycles appear to be more prominent, and a certain offset between light and temperature cycles can even lead to a breakdown of the circadian clock and of normal daily behavioural rhythms. Understanding the weighting and integration of different environmental cues is important for proper synchronization to daily environmental cycles, because organisms need to distinguish between 'environmental noise' (e.g., cloudy weather and/or sudden temperature changes within the day or night) and regular daily changes of light and temperature. In this study, a systematic analysis of the effects of different offsets between light and temperature cycles on behavioural activity was conducted. The results indicated that several chronic offsets result in the disruption of rhythmic behaviour. In the second part of the study, the authors determine the effect of sensory conflict (a 12 hr offset that leads to robust disruption of rhythmic behaviour) on overall gene expression rhythms. They observe substantial differences between aligned and offset conditions and conclude that temperature cycles play a major role in setting transcriptional phase. While the study is thoroughly conducted and represents an impressive amount of experimental and analytical work, there are several issues which I think call the main conclusions into question. The main issue is that temperature cycles by themselves do not seem to fulfil the criteria for being considered a true Zeitgeber for the circadian clock of Nematostella.

      Major points:

      Line 53: 'However, many of these studies did not compare more than two possible phase relationships.....'. Harper et al. (2016) did perform a comprehensive comparison of different phase relationships between light and temperature Zeitgebers (1 hr steps between 2 and 10 hr offsets), similar to the one conducted here. I think this previous study is highly relevant for the current manuscript and, although cited, should be discussed in more detail. For example, Harper et al. show that during smaller offsets temperature is the dominant Zeitgeber, and during larger sensory conflict light becomes the dominant Zeitgeber for behavioural synchronization. Only within a small offset window (5-7 hr) does behavioural synchronization become highly aberrant, presumably because of a near breakdown of the molecular clock caused by sensory conflict. Do the authors see something similar in Nematostella? Figure 3 suggests otherwise, at least under entrainment conditions, where behaviour becomes desynchronized only at 10 and 12 hr offsets. In free-run conditions, however, behaviour already appears largely arrhythmic (AR) at a 6 hr offset, but not so much at 4 and 8 hr offsets (Table 2). So there seems to be at least some similarity to the situation in Drosophila during sensory conflict, which I think is worth mentioning and discussing.

      We have added a more detailed discussion of our results in the context of Harper et al. 2016 (L468-476).

      Line 111: The authors state that the 14-26C temperature cycle is 'well within the daily temperature range experienced by the source population'. To me this is surprising, as I was not expecting water temperature to change that much on a daily basis. Is this because Nematostella live near the water surface, and/or do they show daily vertical migration? Also, I do not understand what is meant by '...range of in situ diel variation (of temperature)'. I think a few explanatory words would be helpful here for readers not familiar with this organism.

      In fact, one of our motivations for studying temperature is that Nematostella naturally experience extreme temperature variation. The data we cite (Tarrant et al. 2019) are from in situ water measurements. Nematostella live in extremely shallow water (in salt marshes), and the local population in Massachusetts experiences wide swings in temperature due to the temperate latitude.

      We have added this information to the Introduction (L88-90), and we also added a discussion of Nematostella’s ecology in the Discussion section (L591-654).

      Lines 114-117: I was surprised that clock genes can basically not be synchronized by temperature cycles alone. Only cry2 cycled during temperature cycles but not in free-run, so the cry2 cycling during temperature cycles could just be masking (a direct response to temperature). Later the authors show robust molecular cycling during combined LD and temperature cycles (both aligned and out of phase), indicating that LD cycles are required to synchronize the molecular clock. Moreover, a previous study has demonstrated that LD cycles alone (i.e., at constant temperature) are able to induce rhythmic molecular clock gene expression (Oren et al. 2015). Similarly, the free-running behaviour after temperature cycles does not look rhythmic to me. In Figure 2A, 14-26C, there is at best one peak visible on the first day of DD, and even that shows a ~6 h phase delay compared to the entrained condition. After the larger-amplitude temperature cycle (8:32C), behaviour looks completely AR, and peak activity phases in free-run appear desynchronized as well (Fig. 2B). Overall, I think the authors present data demonstrating that temperature cycles alone are not sufficient to synchronize the circadian clock of Nematostella. One way to test whether the clock can be entrained is to perform T-cycle experiments, i.e., changing the thermoperiod away from 24 hr (e.g., 10 h warm : 10 h cold). If, in a series of different T-cycles, the peak activity always matches the transition from warm to cold (as in the 12:12 T-cycles shown in Fig. 1A), this would speak against entrainment, and vice versa.

      Thank you for these thoughtful comments and constructive suggestions. We have conducted an additional experiment, which provides further evidence that temperature cycles can, in fact, synchronize the circadian clock. To do this, we measured the behavior of animals entrained in cycles with a short (12h) period, half the length of a circadian period. This takes advantage of a phenomenon called “frequency demultiplication”, in which organisms in 12h environmental cycles display both 12h and 24h components--essentially, the clock perceives every other cycle as a “day” (Bruce, 1960; Merrow et al., 1999). The important thing is that the 24h behavioral component can only occur if the signal is entraining a circadian clock—otherwise, we would only observe a directly-driven 12h behavior pattern.

      We first show that this phenomenon occurs with 6:6 LD cycles—which we expected, because we know light is a zeitgeber. We then show that animals entrained to a temperature cycle with a 12h period also display 24h behavioral rhythms—and in fact the 24h component is stronger than the 12h component. We believe this is strong evidence that temperature is a bona fide zeitgeber in this system. This experiment is now explained in the Results (L127-154) and in Figure 2–Figure supplement 1.
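
      To illustrate what "24h versus 12h components" means in practice, one generic approach is a two-harmonic cosinor fit, in which each component's amplitude is estimated by least squares. This is a minimal sketch on synthetic data, assuming an evenly sampled activity time series; it is not necessarily the rhythm-detection method used in the manuscript, and the function name and all values are hypothetical:

```python
import numpy as np


def harmonic_amplitudes(t_hours, y, periods=(24.0, 12.0)):
    """Fit y = mesor + sum_k [a_k*cos(2*pi*t/P_k) + b_k*sin(2*pi*t/P_k)]
    by ordinary least squares and return {P_k: sqrt(a_k**2 + b_k**2)}."""
    t_hours = np.asarray(t_hours, dtype=float)
    cols = [np.ones_like(t_hours)]  # intercept (mesor)
    for p in periods:
        w = 2 * np.pi * t_hours / p
        cols += [np.cos(w), np.sin(w)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Amplitude of each component from its cosine and sine coefficients
    return {p: float(np.hypot(coef[1 + 2 * i], coef[2 + 2 * i]))
            for i, p in enumerate(periods)}


# Synthetic activity under a hypothetical 12 h environmental cycle:
# a 24 h component (amplitude 2) plus a weaker 12 h component (amplitude 1).
t = np.arange(0.0, 96.0, 0.5)  # 4 days in 30 min bins
rng = np.random.default_rng(0)
y = (10 + 2 * np.cos(2 * np.pi * t / 24)
     + 1 * np.sin(2 * np.pi * t / 12)
     + rng.normal(0, 0.1, t.size))
amps = harmonic_amplitudes(t, y)
print({p: round(a, 2) for p, a in amps.items()})
```

      Under a 12h environmental cycle, a purely driven response would load mainly on the 12h term; a sizeable 24h amplitude is the frequency-demultiplication signature that indicates an entrained circadian clock.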

      In terms of our original data, the reviewer is correct that the statistically detectable free-running rhythms were weak and not visually obvious. Our confidence in thermal entrainment came from the fact that some individual animals had 24h rhythmicity in free-run, even if the signal was weak in the mean time series; this suggested that temperature must at least be capable of synchronizing internal clocks. It is also important to note that even light-entrained rhythms are “noisy” in cnidarians, which is why we were not surprised that the signal was weak. We have added a discussion of this observation in L601-612.

      Lines 210-226: As mentioned above, I think it is not clear that temperature alone can synchronize the Nematostella clock, and it is therefore problematic to call it a Zeitgeber. Nevertheless, Figure 3A, B, D show that certain offsets of the temperature cycle relative to the LD cycle do influence rhythmicity and phase in constant conditions. This is most likely due to a direct effect of temperature cycles on the endogenous circadian clock, which only becomes visible (measurable) when the animals are also exposed to certain offset LD cycles. My interpretation of the combined results would be that temperature cycles play only a very minor role in synchronizing the Nematostella clock (after all, LD and temperature cycles are not offset in nature), perhaps mainly supporting entrainment by the prominent LD cycles.

      With our new data (see previous point), we believe we can safely say that temperature is a zeitgeber. We are not totally clear on what is meant by “a direct effect of temperature cycles on the endogenous circadian clock.” We argue that, because we see changes in free-running behavior during certain offsets, the timing of temperature cycles must affect the internal clock in a way that persists during constant conditions—it can’t just be a direct (clock-independent) effect of temperature.

      Gene expression part: The authors performed an extensive temporal transcriptomic analysis and comparison of gene expression between animals kept in aligned LD and temperature cycles and those maintained at a 12 hr offset. While this was a tremendous amount of experimental work followed by sophisticated mathematical analysis, I think that the conclusions that can be drawn from the data are rather limited. First of all, it is known from other organisms that temperature cycles alone have drastic effects on overall gene expression, importantly in a clock-independent manner (e.g., Boothroyd et al. 2007). Temperature therefore seems to have a substantially larger effect on gene expression levels than light (Boothroyd et al. 2007). In the current study, except for a few clock gene candidates (Figure 2C), the effects of temperature cycles alone on overall gene expression have not been determined. Instead, the authors analysed gene expression during aligned and 12 h offset conditions, making it difficult to judge which of the observed differences are due to clock-independent and which to clock-dependent temperature effects on gene expression. This is further complicated by the lack of expression data in constant conditions. I think the authors need to address these limitations of their study and tone down their interpretation of 'temperature being the most important driver of rhythmic gene expression' (e.g., line 401). At least they need to acknowledge that they cannot distinguish between clock-independent, driven gene expression and potential influences of temperature on clock-dependent gene expression rhythms. Moreover, in their comparison between their own data and LD data obtained at constant temperature (taken from Oren et al. 2015), they show that temperature has only a very limited effect (if any) on core clock gene expression, further questioning the role of temperature cycles in synchronising the Nematostella clock.

      Nevertheless, I noted in Table 3 that there is a 1.5 to 3 hr delay when comparing the phase of eight potential key clock genes between the current study (temperature and LD cycles aligned) and LD at constant temperature (determined by Oren et al.). To me, this is the strongest argument that temperature cycles at least affect the phase of clock gene expression, but the authors do not comment on this phase difference.

      We agree with these points about the limitations of our study, and have revised the manuscript to phrase our conclusions more carefully. We still think it is reasonable to observe that temperature was a stronger driver of gene expression than light in our study, but this may not be true in other contexts.

      In terms of the comparison with Oren et al. 2015, we didn’t want to over-interpret these results because there are other differences between the studies (L1181-1185), including the use of a different source population. In addition, we would prefer denser sampling (2h time points rather than 4h) and larger sample sizes to make claims about phase differences.

      Network analysis: This last section of the results was very difficult to read and follow (at least for me). For example, do the colours in Figure 6A correspond to those in Figure 6B, C? A legend for each colour, i.e., which GO terms are included in each colour, would perhaps be helpful. As mentioned above, I also do not think we can learn a lot from this analysis, since we do not know the effects of temperature cycles alone and we have no free-run data to judge potential influences on clock-controlled gene expression. Under aligned conditions genes are expressed at a certain phase during the daily cycle (either morning to midday, or evening to midnight), which, interestingly, is very similar to temperature-cycle-only driven genes in Drosophila (Boothroyd et al. 2007). Inverting the temperature cycle has drastic effects on the peak phases of gene expression, but not so much on overall rhythmicity. But since no free-run data are available, we do not know to what extent these (expected) phase changes reflect temperature-driven responses or result from alterations in the endogenous circadian clock.

      We have revised and streamlined this section and Fig. 6, including removing panel 6C. The colors do correspond across panels in the figure. For space, GO terms of select modules are included in Fig. 6, and GO results for all modules are included in the Supplemental Data and discussed in the Results.

      It is true that we can’t distinguish temperature-driven versus clock effects here, and it does seem like many modules simply follow the temperature cycle (which we say in this section). The most interesting finding from this section is probably that the co-expression structure (correlations between rhythmic genes) is substantially weakened during SC, and we do discuss certain modules of genes that lose or gain rhythmicity. We have revised this section to focus on the main points and have cut several of the less pertinent results.

      Reviewer #3 (Public Review):

      This article reflects a significant effort by the authors and the results are interesting.

      For the third set of experiments, are temperature and light really out of synch? While peak in temperature no longer occurs along with lights on, we do still have two 24 hour cycles where changes in the environmental cues still occur simultaneously (lights on with peak in temperature, lights off with min in temperature). I wonder what would happen if light remained at a 24 hour cycle and temperature became either sporadic (randomly changing cycles) or was placed on a longer cycle altogether (temperature taking 20 hours to increase from min to max, and then another 20 hours to go from max to min).

      Thank you for your interesting suggestions for future experiments. This point is addressed in our revisions responding to Reviewer #1, who requested a discussion of the phrase “sensory conflict.” We agree that the binary “in-sync vs. out-of-sync” may be too simplistic. Our original conception of sensory conflict was a situation in which light and temperature provide different phase information, as informed by experiments with only light (prior literature) or only temperature (this work).

      In our revised manuscript, we discuss the idea that “sensory conflict” is not always a useful framework because there are many possible relationships between light and temperature. Although our 12h offset is certainly less “natural” than our aligned time series, it may be useful to think of them simply as 2 different possible light and temperature regimes in which the two signals interact, rather than abstract ideals of “aligned” or “misaligned.”

      An area that could significantly benefit a broader readership would be to improve the overall clarity of the figures and to reconsider whether all the results are necessary to convey the key findings of the paper. As written, the Results section is somewhat confusing.

      We have revised Figs. 1 and 6 for clarity, and we have also shortened the network analysis portion of the Results.

    1. Author Response

      Reviewer #1 (Public Review):

      Here the authors sought to understand how BPGM/2,3-BPG levels are involved in adaptive responses to hypoxia and whether they are involved in fetal growth restriction. In the current state, I find the data to be confusing and lacking in mechanistic data to justify that increased BPGM is an adaptive response to hypoxia. While the authors find increased staining for the enzyme BPGM in SpA-TGCs after hypoxia, they did not assess 2,3-BPG in cord blood. This would show that increased enzymatic levels have a downstream impact. MRI experiments assessing placental and fetal haemoglobin-oxygenation, showed no differences. Human FGR samples, however, showed reduced 2,3-BPG in cord blood. Further evidence is required to show hypoxia increases BPGM as a compensatory mechanism to permit adequate 2,3-BPG and placental-fetal oxygenation levels as the authors claim.

      Additional experiments that demonstrate that BPGM is advantageous in the context of hypoxia would strengthen the authors arguments, and would provide a novel mechanism for adaptive responses to hypoxia in the placenta which is highly interesting.

      Obtaining cord blood from mouse embryos and analyzing its 2,3-BPG content is technically not feasible; thus, we concentrated on the human data only. However, note that the dominant physiological effect would be on maternal blood in the placenta, where local elevation of 2,3-BPG can aid in oxygen release.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript will be of interest for investigators in the field of development and the biology of pregnancy. The major strengths of the data are the detailed description of a hypoxia-induced mouse model of fetal growth restriction, where phenotypes, tissue histology, MRI images and metabolic analysis combine to characterize the experimental system. The data seem descriptive and preliminary, and the comparison to human pregnancy is neither supportive nor rigorous.

      Strengths

      • The mouse pregnancy has been used by the authors and by others as a model for placental insufficiency. The manuscript provides incremental data to characterize hypoxia-induced fetal growth restriction.

      • The 15.2T MR imaging technology is high quality and informative, even if the results did not reveal marked changes.

      • The detailed characterization of BPGM expression in the apical mouse placental surfaces is valuable.

      • The provided model may be useful for future studies by the authors.

      Weaknesses

      • The metabolic analysis was restricted to one enzyme and one metabolite. Placental analyses of 2,3-BPG and BPGM were already published (refs 29-30). At best, if 2,3-BPG is related to the phenotype, it might be interpreted as part of the injury in human cases and as an adaptive response in the mouse models (as the authors suggested, lines 286-288 and 332-336). However, these assumptions are not tested.

      In the paper of Pritlove et al. (ref. 29), the authors demonstrated the expression of BPGM in a normal human cohort. However, they did not test BPGM expression or 2,3-BPG levels in FGR placentae. In the paper of Gu et al. (ref. 30), the authors analyzed murine placental BPGM expression secondary to Igf2 deletion. Our study is the first to demonstrate the impact of maternal hypoxia on placental BPGM levels in murine gestational hypoxia models.

      • The human cases are not very informative. The causes of FGR were not known, but they were clearly (Table 1) not analogous to that of the mouse model. Cases of systemic hypoxia in humans might have been more informative. In their absence, the value of the cross-species comparison is low.

      • While the provided experiments are of good quality, the approach is very descriptive and not advancing mechanistic understanding of FGR-related placental insufficiency.

      The human placentas were specifically selected to exclude known causes of FGR, such as heavy smoking or iron deficiency. We will work to expand the diversity of cases to test the potential role of BPGM in those cases as well.

    1. Author Response

      Reviewer #1 (Public Review)

      This manuscript describes a new method to perform online movement correction and extraction of calcium signals from a miniscope. The efficiency of the algorithm is tested by quantifying the accuracy of animal location decoding from hippocampal place cells. The online decoding happens with virtually no delay which is promising for closed-loop methods. It seems to be superior to online decoding without motion correction, which was the state of the art.

      The strength of this technique is therefore that it achieves real-time processing.

      The weakness of the study is the lack of comparison of the decoding accuracy with what can be obtained with state-of-the-art electrophysiology, which makes it difficult to estimate how precise the technique really is.

      In revision, we present data showing that when our system is used to decode contour-based calcium traces from N≈50 neurons, the decoder achieves a mean distance error of ~30 cm, which is worse than the mean error of ~20 cm achieved using maximum likelihood decoding of single-unit spike trains from electrophysiological recordings (Fig. 7E). However, when decoding N=900 contour-free calcium traces from the same image frames in the same rats, the mean decoding error goes down to ~15 cm, which is better than the mean for electrophysiological recordings. From this we conclude that real-time decoding of position from calcium traces achieves accuracies similar to those achievable with electrophysiology.
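
      For readers comparing the ~15, ~20, and ~30 cm figures, a mean distance error is most naturally read as the per-frame distance between decoded and tracked position, averaged over frames. That definition is our assumption here; the manuscript's exact metric (for example, mean versus median, or any smoothing of decoder output) may differ. A minimal sketch with a hypothetical helper name:

```python
import numpy as np


def mean_distance_error(decoded, true):
    """Mean Euclidean distance between decoded and tracked positions.

    Accepts 1D positions (shape (n,), e.g. a linear track) or
    multi-dimensional positions (shape (n, d), e.g. a 2D arena).
    """
    decoded = np.asarray(decoded, dtype=float)
    true = np.asarray(true, dtype=float)
    if decoded.shape != true.shape:
        raise ValueError("decoded and true positions must have the same shape")
    if decoded.ndim == 1:
        # Treat 1D positions as (n, 1) so the same norm applies
        decoded, true = decoded[:, None], true[:, None]
    return float(np.linalg.norm(decoded - true, axis=1).mean())


# Toy frames on a linear track (cm); values are illustrative only
print(mean_distance_error([10, 50, 100], [20, 40, 130]))
```

      The same function applies unchanged to 2D arena positions passed as (n, 2) arrays, so the metric itself does not need to change when moving from a linear track to an open field.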

      Although less critical, there is no demonstration of a closed-loop application.

      It is true that we have not yet demonstrated a real-time closed loop application, but by demonstrating short latency generation of TTL outputs triggered by the decoder, we demonstrate the capability for closed-loop applications.

      Real-time position decoding is technically nice, but the position can be obtained from tracking the animal so it is practically useless.

      We offer two points in reply to this comment. First, decoding position from neural activity could offer useful (though not yet demonstrated) capabilities that would not be achievable with simple position tracking; for example, the position decoder could be trained on CA1 signals obtained during waking and then used to read out position trajectories generated during REM sleep.

      Second, and more importantly, position decoding was selected as a benchmark for performance testing mainly because it allows highly precise comparisons between decoder predictions and ground truth, which is important for establishing that the fidelity of calcium signals imaged in real time is adequate for accurate decoding of behavior at short latencies.

      It is also clear that decoding position on a linear track is easier than on a 2D arena, therefore it is difficult to estimate how much the efficiency of the method can be challenged in harder settings.

      It is true that decoding in a 2D arena would be a greater challenge than on a 1D linear track, but in pursuit of our goal to rapidly disseminate a system capable of short-latency decoding of behavior from calcium signals, optimizing system performance for one specific application (e.g., position decoding) is not our main priority. A higher priority is to offer versatility for a wide range of experimental applications. To better demonstrate such versatility, the revised manuscript includes a new section in the Results that demonstrates categorical classification of behaviors during an instrumental touchscreen task.

      Reviewer #2 (Public Review):

      In this paper, the authors developed a new device for online decoding of position based on calcium imaging in freely moving rodents. This device could be used in brain-computer interfaces to investigate neurofeedback-based therapies for neurological disorders. The technical part is properly done and gives convincing results that can be truly helpful for the scientific community using the miniscope. Nevertheless, as a methodological article, it should provide more details regarding the accuracy of the decoding and the different steps to follow if someone wants to use the methodology. Moreover, a true online real-time experiment should be performed to validate the device.

      Please find below my comments:

      • From what I read, the authors did not perform a true real-time experiment. I think this step is crucial to ensure the quality of their device.

      It is unclear from this comment where to draw the bar for a “true real-time experiment.” Some previous publications of real-time approaches (such as refs #6, #11, #26) have proposed causal algorithms without any performance tests in hardware, whereas others (such as ref #14) have tested their systems in hardware by carrying out full experiments using closed-loop feedback (albeit with much smaller numbers of calcium trace predictors than we demonstrate here) without comparing different algorithmic approaches. Here we use an intermediate strategy of feeding raw offline video from a virtual sensor through the hardware processing pipeline (verifying that calcium trace outputs were identical for the real and virtual sensors). We adopted this intermediate approach to achieve the dual objectives of testing a true hardware implementation on real-time performance measures (e.g., microsecond processing latencies) while also benchmarking different algorithms (such as CB versus CF trace extraction as in Fig. 3, or raw calcium traces versus deconvolved spikes as in panel A of the Supplement to Fig. 3) against one another on the same datasets.

      • There should be a validation against a classical offline Bayesian decoding.

      We have presented an accuracy comparison for decoding linear track position from calcium traces with DeCalciOn versus decoding from single-unit spikes with electrophysiological recording data (Fig. 7E); decoding from single-unit spikes utilized a classical Bayesian maximum likelihood approach (see Methods), so Fig. 7E not only offers a comparison between calcium imaging and electrophysiology, but also between the online linear classifier and classical offline Bayesian approaches. In addition, we compared the performance of the linear classifier to a naïve Bayes decoder in panel B of the Supplement to Fig. 3, showing that performance is better for the linear classifier than for naïve Bayes.
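      For readers less familiar with the classical approach referred to here, a minimal sketch of maximum-likelihood position decoding from spike counts follows. It assumes independent Poisson firing, a precomputed rate map for each unit, and a flat prior over position bins; the function and parameter names are illustrative, and this is not the implementation described in the Methods.

```python
import numpy as np

def poisson_ml_decode(counts, rate_maps, dt):
    """Return the position bin that maximizes the Poisson likelihood of one window.

    counts:    (n_units,) spike counts observed in a window of dt seconds
    rate_maps: (n_units, n_bins) mean firing rate of each unit in each bin (Hz)
    """
    counts = np.asarray(counts, dtype=float)
    # Expected count per unit and bin; the floor avoids log(0) for silent bins.
    expected = np.maximum(np.asarray(rate_maps, dtype=float) * dt, 1e-12)
    # Sum of independent Poisson log-likelihoods over units. The log(n!) term
    # is omitted because it is identical for every position bin.
    loglik = counts @ np.log(expected) - expected.sum(axis=0)
    return int(np.argmax(loglik))

# Three hypothetical units, each tuned to one of three bins.
rate_maps = np.array([[10.0, 1.0, 1.0],
                      [1.0, 10.0, 1.0],
                      [1.0, 1.0, 10.0]])
print(poisson_ml_decode([0, 5, 0], rate_maps, dt=0.5))  # 1
```

      With a flat prior the maximum-likelihood bin is also the maximum a posteriori bin; an occupancy-based prior would simply add log p(x) to `loglik` before taking the argmax.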

      • "To mimic these steps using the virtual sensor in our performance tests, one session of image data was collected and stored from each of the 13 rats, yielding ~7 min (8K-9K frames) of sensor and position tracking data per rat. The linear classifier was then trained on data from the first half of each session and tested on data from the second half." This sentence is not clear enough. The authors should clearly describe the exact time needed for each experimental step. What is the time needed, for instance, for experimental step 2, during which the linear classifier is trained to decode behavior from the initial dataset? This is crucial information if someone wants to use this device.

      In response to this comment, the Results section of the revised manuscript includes an extensive subsection (‘Steps of a real-time imaging session’) that describes each experimental step in detail (pages 4-6), including the time required for each step. In addition, this information is now more thoroughly summarized in the diagram of Fig. 1B.
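      The split-half protocol quoted above (train on the first half of a session, test on the second) can be sketched as follows. The decoder here is a hypothetical stand-in: a one-vs-rest ridge regression onto one-hot position bins, which is one kind of linear classifier but not necessarily the one DeCalciOn uses. The `bin_cm` spacing and the error metric (mean absolute bin distance converted to cm) are likewise assumptions for illustration.

```python
import numpy as np

def fit_linear_classifier(X, y, n_classes, ridge=1e-3):
    """One-vs-rest ridge regression onto one-hot labels (a simple linear classifier)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])      # append a bias column
    Y = np.eye(n_classes)[y]                           # one-hot targets
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])        # regularized normal equations
    return np.linalg.solve(A, Xb.T @ Y)                # (n_features + 1, n_classes)

def predict(W, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W, axis=1)

def split_half_error(X, y, n_classes, bin_cm):
    """Train on the first half of a session, test on the second; return mean error in cm."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    half = len(y) // 2
    W = fit_linear_classifier(X[:half], y[:half], n_classes)
    pred = predict(W, X[half:])
    # Bins are treated as evenly spaced along the track, bin_cm centimetres apart.
    return float(np.mean(np.abs(pred - y[half:])) * bin_cm)

# Toy session: 4 position bins, each with its own noisy activity pattern.
rng = np.random.default_rng(0)
y = np.tile(np.arange(4), 50)
X = np.eye(4)[y] + 0.1 * rng.standard_normal((len(y), 4))
print(split_half_error(X, y, n_classes=4, bin_cm=10.0))
```

      A chronological split, rather than a shuffled one, keeps temporally autocorrelated calcium frames from leaking between the training and test sets, which would otherwise inflate accuracy.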

      How does the accuracy vary with the duration (or the quality) of the initial dataset? It is important that the authors provide an investigation of this to validate their device.

      This issue is now discussed in the Results near the bottom of page 5. In addition, Fig. 3G now plots how position decoding improves as a function of the size of the training dataset.

      • For instance, what is the decrease in decoding accuracy 1) with fewer place cells?

      The scatterplots in the right panels of Fig. 3D show that decoding accuracy improves as a function of the number of neurons imaged in a given rat.

      What is the approximate number of place cells needed to obtain reliable decoding?

      This question is addressed by showing how decoding accuracy improves with the number of imaged neurons (Fig. 3D scatterplots). We also address this issue in our performance comparison of CB versus CF and CF+ traces, since differing numbers of calcium trace predictors appear to be an important factor in accounting for the observed performance differences, as discussed in the main text (page 16, last paragraph).

      2) With the duration of the initial recording session. Here it seems to be on the order of 3-4 min. What if the recording session is shorter? Is there any constraint on this recording session (in terms of speed, stops, etc.) to obtain good decoding?

      The revised Fig. 3G plots how position decoding improves as a function of the size of the training dataset.

      3) Is there a link between the decoding accuracy and the number of place cells nearby?

      We did not select calcium traces that met a spatial criterion (i.e., “place cells”) to be included in the decoding analysis. Instead, all detected CA1 calcium traces provided input to the decoder, regardless of their spatial tuning properties (Fig. 3D and panels D,E of the Supplement to Fig. 3 show that many cells were indeed spatially tuned). Also note that when contour-free (CF) trace extraction methods were used, each calcium trace could detect fluorescence from multiple neurons. Under this methodology it is not straightforward to analyze how decoding accuracy at a given position varies with the “number of place cells nearby”, and we are not convinced that presenting such an analysis would advance our main goal of demonstrating DeCalciOn’s capabilities to researchers.

      • The authors specified the time delay of 2.5 ms for their device. Yet, it is pointless regarding the purpose of the decoding. The important information is the precise position of the animal when the device is used to trigger a stimulation at a given location. Again, a true online experiment should be done to validate that a TTL can be triggered by the device at a precise location (with a quantification of the error made).

      We agree that this is an important issue, and it has been thoroughly addressed in the revised manuscript.

      • There is no information on the accuracy of the decoding with respect to the location in the linear track. It is likely that the extremities of the linear track will be better identified. Figure 4C does not provide a clear description of the error made. The choice of D=2 (which seems to represent the spatial bin) is not justified. Two spatial bins seem to represent +/-40 cm, which is quite large.

      Polar plots in Fig. 3F of the revised manuscript show mean accuracy in each position bin for decoders trained on offline, CB, CF, and CB+ calcium traces.
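      Per-bin accuracy of this kind, together with the tolerance D the reviewer asks about, can be made concrete with a short sketch. It assumes decoded and true positions are integer bin indices and counts a frame as correct when the prediction falls within D bins of the truth (D = 0 means an exact match). Distance is taken as the plain difference in bin index, which is an assumption; a wrapped distance would be needed if bins were arranged on a circle.

```python
import numpy as np

def per_bin_accuracy(true_bin, pred_bin, n_bins, D=0):
    """Fraction of frames in each true position bin decoded within D bins of the truth.

    D = 0 requires an exact bin match; bins never visited are returned as NaN.
    Distance is |pred - true| in bin units, i.e. the track is treated as linear.
    """
    true_bin = np.asarray(true_bin, dtype=int)
    pred_bin = np.asarray(pred_bin, dtype=int)
    hit = np.abs(pred_bin - true_bin) <= D
    acc = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = true_bin == b
        if in_bin.any():
            acc[b] = hit[in_bin].mean()
    return acc

print(per_bin_accuracy([0, 0, 1, 1, 2], [0, 1, 1, 3, 2], n_bins=4, D=1))
```

      Reporting the curve for D = 0, 1, 2 side by side shows directly how much of the apparent accuracy comes from the tolerance rather than from exact-bin hits, including at the track extremities.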

      • The movement artefacts are not equally observed in the maze. The way they are corrected might be captured by the linear decoder. These artefacts might have a strong influence on the decoding. Please provide a quantification of the correction made during steps 1 and 2 in relation to the position of the animal on the linear track. The authors should provide a correlation between the presence of these corrections and the decoding accuracy.

      Regardless of whether analysis is done offline or online, any calcium imaging and decoding experiment is vulnerable to two potential problems arising from motion artifact:

      PROBLEM #1. Image motion can generate noise in calcium signals that disrupts the accuracy of decoding.

      PROBLEM #2. Image motion that is correlated with behavior can convey uncontrolled information that allows the decoder to learn predictions from image motion rather than calcium signals. Very few published in-vivo calcium imaging experiments provide adequate controls for these two possible sources of artifact (again, such controls are just as necessary for offline as for online experiments). In response to the referee comments, we have provided controls for these confounds in our performance tests of DeCalciOn’s online decoding capabilities.

      Fig. 4B of the revised paper shows that without online motion correction, several rats in the linear track experiment show a significant correlation between position error and motion artifact (indicated by positive values on the y-axis); hence, motion artifact impairs decoding of position on the linear track in these rats (problem #1 above). This correlation between motion artifact and decoding error is reduced or eliminated by online motion correction (as indicated by values near zero on the x-axis), demonstrating that online motion correction helps to prevent motion artifact from impairing the accuracy of decoding.

      Fig. 6 of the revised paper shows that during an operant touchscreen experiment, motion artifact occurs preferentially during specific behaviors such as visiting the food magazine (reward retrieval, Fig. 6A) or touching the screen to make a response (correct choice, Fig. 6B). When motion correction is not used (top graphs in Figs. 6C-F), the average motion artifact is higher during frames when the decoder accurately predicts behavior than during frames when the decoder fails to predict behavior; hence, motion artifact appears to improve the accuracy of predicting these behaviors (problem #2 above). When motion correction is used, the average motion artifact no longer differs for correctly versus incorrectly decoded frames (except in one case, bottom right graph of Fig. 6E), indicating that motion correction helps to prevent the decoder from learning to predict behavior from motion artifact.
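      The two controls described above (a per-rat correlation between motion artifact and decoding error, and a comparison of mean motion on correctly versus incorrectly decoded frames) reduce to simple computations. This sketch assumes, hypothetically, that motion artifact is summarized per frame as the magnitude of the rigid x/y shift estimated by motion correction; the manuscript's actual artifact measure may differ, and significance testing is omitted.

```python
import numpy as np

def motion_error_correlation(shift_xy, abs_error):
    """Pearson r between per-frame motion magnitude and absolute decoding error.

    shift_xy:  (n_frames, 2) rigid x/y shift per frame (hypothetical artifact summary)
    abs_error: (n_frames,) absolute position error of the decoder on each frame
    """
    shift_xy = np.asarray(shift_xy, dtype=float)
    motion = np.hypot(shift_xy[:, 0], shift_xy[:, 1])
    return float(np.corrcoef(motion, np.abs(np.asarray(abs_error, dtype=float)))[0, 1])

def motion_by_outcome(motion, correct):
    """Mean motion on correctly vs incorrectly decoded frames."""
    motion = np.asarray(motion, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    return float(motion[correct].mean()), float(motion[~correct].mean())
```

      Running both functions once on uncorrected and once on motion-corrected data for each animal gives the comparison described in the reply: a correlation, or a correct-versus-incorrect difference, that shrinks toward zero after correction.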

      • Besides the methodological part, I have some physiological questions. It is quite common in linear tracks to have bidirectional and unidirectional place cells. Is that the case here? How many? It is difficult to see this in figure C. Is there an error due to the online decoding of the position in the two directions of the linear track?

      Again, since we did not select calcium traces that met a spatial criterion (i.e., “place cells”) to be included in the decoding analysis, and since CF traces could detect fluorescence from multiple neurons, we are not convinced that presenting a detailed analysis of this issue would advance our primary goal of demonstrating DeCalciOn’s capabilities to researchers.

      Reviewer #3 (Public Review):

      DeCalciOn is an innovative contribution to the toolbox of real-time processing of calcium imaging data. It provides calcium traces from hippocampal CA1 neurons with a roughly two-millisecond latency and uses them to decode the position of rats running along a linear track - setting the stage for closed-loop experiments requiring fast interpretation of neural activity. The manuscript would be strengthened by a more systematic, empirical comparison to other, currently available alternative approaches. In addition, the decoding analysis does not fully account for the possibility of artifactual motion in the imaging video being informative of position.

      We suggest strengthening this manuscript by addressing the following four points:

      1) In the discussion of other platforms, the authors state that "Any system that lacks motion stabilization would also be vulnerable to artifactually decoding behavior from brain motion (which can be correlated with behavior) rather than neural activity." It follows that the same problem might also occur with incomplete motion correction. While the motion-corrected video shown in Supplementary Video 1 has reduced motion compared to the raw video, motion is still visible, including outside of the marked jitter. It remains possible that the linear decoders for the position in the linear track are utilizing brain motion-induced, as opposed to calcium fluorescence-induced, signal changes. A critical first step to assess this issue is to ask whether the motion in the video is related to the rat's behavior. One could test whether the 2D motion displacement traces can be used to predict rat position using linear classifiers.

      Briefly, we show that motion correction helps to prevent the decoder from learning to predict behavior from motion artifact.

      2) The manuscript would benefit from repeating the experiment in a more complex environment, such as a 2D arena. This would increase the generalizability of the findings. In addition, increasing the complexity of the environment would reduce the possibility that particular types of brain motion are closely linked with positions in the environment.

      We have diversified our performance testing by presenting results for decoding calcium activity from a different brain region (OFC rather than CA1) during a different kind of behavior (an instrumental touchscreen task rather than a linear track).

      3) The authors present an interesting comparison between "contour-free" and traditional contour-based source extraction. A more comprehensive discussion on the history or novelty of "contour-free" calcium imaging processing would contextualize this result.

      The revised Discussion section contains a new subsection titled “Source identification” to contextualize this issue.

      4) In the discussion, the authors compare DeCalciOn to two previous online calcium imaging algorithms. The technical innovations of this work would be better highlighted by directly testing all three of these algorithms, ideally on similar datasets.

      Briefly, one of the two cited systems is designed for compatibility with benchtop 2P microscopes and does not interface with miniscopes; public resources are not available for the other cited online algorithm.

    1. Author Response

      Reviewer #3 (Public Review):

      This is an interesting study to examine how alveolar bone responds to oral infection using unbiased scRNA-seq. The manuscript is well-written and the results are convincing.

      1) The authors should revise the abstract. The study does not address healing: all experiments were performed under infection and inflammation, which induce bone loss rather than healing.

      Thank you for raising this point. We have revised the manuscript accordingly.

      2) Since periapical inflammation causes progressive bone loss, how do MSCs with increased osteogenic potential contribute to bone loss? The authors should discuss this.

      We would like to thank the reviewer for this important comment. Although AP is an inflammatory disease with periapical bone loss, its progression is usually self-limiting, as a new equilibrium is established between root canal pathogens and anti-infective defense mechanisms (Wang, Zhang, Xiong, & Peng, 2011). Animal experiments revealed that the bone lesion size stabilized 21 days after AP was established, as a result of balanced bone remodeling (Márton & Kiss, 2014; Wang et al., 2011). Previous studies have shown that human apical granulation tissues contain osteogenic cells (Maeda, Wada, Nakamuta, & Akamine, 2004). A population of MSCs was isolated from human periapical cysts, which tended to differentiate toward the osteogenic lineage (Marrelli, Paduano, & Tatullo, 2013, 2015; Tatullo et al., 2015). Activated by inflammatory bone destruction, these MSCs with increased osteogenic potential may counteract bone resorption, establishing an equilibrium between bone formation and resorption and driving AP into a stable state (Márton & Kiss, 2014). Because the pathologic stimuli persist, these protective actions can alleviate bone loss only to some extent. In clinical practice, root canal therapy (RCT) aims to disinfect the canal and remove the pathogenic factors, which allows the protective activities to outweigh the destructive ones (L. M. Lin, Ricucci, Lin, & Rosenberg, 2009). The bone lesions of AP patients receiving RCT usually recover fully, with resolution of radiolucency, once the inflammation in the apical area is controlled (Soares, Santos, Silveira, & Nunes, 2006). The healing of AP lesions is highly correlated with the osteogenic potential of inflamed MSCs (L. M. Lin et al., 2009).

      We added the related contents in the discussion section.

      3) Did the authors detect osteoclasts by scRNA-seq? If not, are there any precursors of osteoclasts identified in inflammatory alveolar bones? 1) I suggest that the authors provide a more detailed analysis of inflammation since this is a unique model to study oral bone inflammation.

      Thank you for this valuable point. Bone destruction is a major pathological factor in chronic inflammatory diseases such as AP. Various cytokines, including TNF-α, IL-1α and IL-6, are released by immunocytes to recruit osteoclast precursors and induce osteoclast maturation. We detected osteoclast markers including Ctsk, Acp5, Mmp9 and Nfatc1 by scRNA-seq. Moreover, Csf1r, Cx3cr1, Itgam, and Tnfrsf11a were used to identify osteoclast precursors. The expression pattern of these osteoclast-related markers across all clusters is presented in Figure 3A. Markers of osteoclasts and osteoclast precursors were highly expressed in the monocyte and macrophage clusters. The expression levels of these markers were analyzed in all clusters (Figure 3B). GO analysis showed that inflammation-related immune reactions and bone resorption activity were significantly enriched in the macrophage cluster (Figure 3C). Moreover, pseudotime analysis was performed for the macrophage and monocyte clusters. Two independent branch points were identified, and five monocyte/macrophage subclusters were distributed across different branches of the developmental tree (Figure 3D, G). The results showed that the monocyte cluster differentiated into the macrophage cluster (Figure 3E). Along this trajectory, the gene expression pattern across pseudotime showed that osteoclastic genes, such as Ctsk, Acp5, Mmp9, Atp6v0d2, and Dcstamp, were progressively elevated (Figure 3F). Of note, we observed a branch that was highly positive for Ctsk and Acp5 (Figure 3H), indicating that mature osteoclasts differentiated from the monocyte/macrophage lineage and contributed to inflammatory bone resorption during AP. We also analyzed the expression of osteoclast-related genes using a bulk RNA-seq library built from mandibular samples extracted from mice with AP. Markers of osteoclasts and osteoclast precursors were significantly upregulated, confirming osteoclast activity in the inflammation-related bone lesion (Figure 3I). Please see page 9 and Figure 3.

      4) It is known that macrophages can be classified into M1 and M2. Based on scRNA-seq, did the authors observe these two types?

      We appreciate this point raised by the reviewer. We used CD86, CD80, IL1β, and TNF as markers of M1-like macrophages, and CD163, CD206, MSR1 and IL-10 as markers to detect the M2 subset in the macrophage cluster. The analysis of the macrophage cluster showed that M1-like macrophages accounted for the vast majority of macrophages in AP lesions. The expression patterns of M2 markers in the macrophage cluster are also presented (Figure 3-figure supplement 1A, B).

    1. Author Response

      Reviewer #1 (Public Review):

      This study intended to identify the metabolic at-risk profile within PLWH on ART by integrating and analyzing multi-omics data, including untargeted plasma metabolomics, lipidomics, and fecal 16S microbiome profiling. The overall strength of the study is the long-term treatment (~15 years) of the study subjects, with well-recovered CD4 cell counts and viral suppression. The integration and analysis of multi-omics data using similarity network fusion, factor analysis, etc., to group or differentiate HIV patients are informative and useful. The weaknesses of the study are the lack of data showing comparability between patients and healthy controls and the lack of multiple regression analysis to control for potential confounders.

      We are thankful to the reviewer for the critical reading of our manuscript. The primary aim of our study was to identify the molecular data-driven phenotypic patient stratification in a cohort of PLWHART with prolonged suppressive therapy to identify the at-risk metabolic profile following long-term successful therapy. We and others have reported in several studies (e.g., Refs #9 and 10) that there were distinct systemic patterns in multi-omics data. However, as suggested, we have now provided Table 1-source data 1. We have kept the healthy controls (HC) in the analysis to define which group presents an HC-like profile among PLWH, but we are not using them to perform statistics and draw conclusions.

      Reviewer #2 (Public Review):

      This study systematically integrates multi-omics (plasma lipidomic and metabolomic, and fecal 16S microbiome) data to identify the metabolic at-risk profiles within people living with HIV on antiretroviral therapy (PLWHART). As a result, three groups of PLWHART (SNF-1 to 3) were identified, which showed distinct phenotypes. Such insights cannot be obtained by a single type of omics data or clinical data, and have implications in personalized medicine and lifestyle intervention. Connecting the findings in this study with specific medical/clinical insights is the next challenge.

      We are thankful to the reviewer for the suggestion. The application of systems biology to identifying the biological mechanisms of disease states in HIV-infected individuals is a relatively new field. We agree with the reviewer that connecting the findings in this study with specific medical/clinical insights is the next challenge. However, the first proof-of-concept study, on 108 patients, showed that multi-omics studies could generate a correlation network of communities of related analytes associated with physiology and disease. More importantly, behavioral coaching informed by personal data helped participants improve clinical biomarkers [PMID: 28714965]. Multi-omics data are increasingly valuable in non-communicable diseases [PMID: 35528975, PMID: 36503356 etc.]. As suggested by the reviewer, we have now elaborated on the medical/clinical value of identifying metabolic at-risk profiles, in particular the potential to improve individual risk stratification and to personalize lifestyle interventions. Still, as our study is an association study, the data should be regarded as exploratory and are not sufficient to suggest any changes in clinical practice.

      We have concluded the manuscript as follows:

      “However, alterations in the metabolomics profile and higher CD4 T-cell count at the time of sample collection indicate a complex systemic interplay between host immunity and metabolic health. It can lead to an aggravated higher inflammation profile leading to a cardiometabolic risk profile among the MSM that might affect healthy aging in this population. Integrative analytical approaches that reflect the overall systemic health profile of PLWH may improve patient stratification and individual therapeutic and preventive strategies. Given the complex interplay between the clinical and molecular metabolic profile, the application of the multi-omics data for much larger cohorts of PLWH might facilitate a better identification of network perturbations and molecular network connections to detect early disease transition toward metabolic complications at an earlier stage. Developing a more personalized model or targeting the interaction networks rather than individual clinical or omics features may provide novel treatment strategies in countering dysregulated metabolic traits, aiming to achieve healthier aging.”

    1. Author Response

      eLife assessment:

      This study addresses whether the composition of the microbiota influences the intestinal colonization of encapsulated vs unencapsulated Bacteroides thetaiotaomicron, a resident micro-organism of the colon. This is an important question because factors determining the colonization of gut bacteria remain a critical barrier in translating microbiome research into new bacterial cell-based therapies. To answer the question, the authors develop an innovative method to quantify B. theta population bottlenecks during intestinal colonization in the setting of different microbiota. Their main finding, that the colonization defect of an acapsular mutant is dependent on the composition of the microbiota, is valuable, and this observation suggests that interactions between gut bacteria explain why the mutant has a colonization defect. The evidence supporting this claim is currently insufficient. Additionally, some of the analyses and claims are compromised because the authors do not fully explain their data and the number of animals is sometimes very small.

      Thank you for this frank evaluation. Based on the Reviewers’ comments, the points raised have been addressed by improving the writing (apologies for the insufficient clarity) and by adding data that to a large extent already existed or could be rapidly generated. In particular, the following data have been added:

      1. Increase to n>=7 for all fecal time-course experiments

      2. Microbiota composition analysis for all mouse lines used

      3. Data elucidating the mechanisms by which the SPF microbiome and/or host immunity restrict acapsular B. theta

      4. Short- versus long-term recolonization of germ-free mice with a complete SPF microbiota and assessment of the effect on B. theta colonization probability.

      5. Challenge of B. theta monocolonized mice with avirulent Salmonella to disentangle effects of the host inflammatory response from other potential explanations of the observations.

      6. Details of all inocula used

      7. Resequencing of all barcoded strains

      Additionally, we have improved the clarity of the text, particularly the methods section describing the mathematical modeling in the main text. Major changes in the text, particularly those responding to the reviewers' comments, have been highlighted here and in the manuscript.

      Reviewer #1 (Public Review):

      The study addresses an important question - how the composition of the microbiota influences the intestinal colonization of encapsulated vs unencapsulated B. theta, an important commensal organism. To answer the question, the authors develop a refurbished WITS with extended mathematical modeling to quantify B. theta population bottlenecks during intestinal colonization in the setting of different microbiota. Interestingly, they show that the colonization defect of an acapsular mutant is dependent on the composition of the microbiota, suggesting (but not proving) that interactions between gut bacteria, rather than with host immune mechanisms, explains why the mutant has a colonization defect. However, it is fairly difficult to evaluate some of the claims because experimental details are not easy to find and the number of animals is very small. Furthermore, some of the analyses and claims are compromised because the authors do not fully explain their data; for example, leaving out the zero values in Fig. 3 and not integrating the effect of bottlenecks into the resulting model, undermines the claim that the acapsular mutant has a longer in vivo lag phase.

      We thank the reviewer for taking the time to give this detailed critique of our work, and apologize that the experimental details were insufficiently explained. This criticism is well taken. Exact inoculum details for each experiment are now given in each figure (or as a supplement when multiple inocula are included). Exact microbiome composition analysis for the OligoMM12, LCM and SPF microbiota is now included in Figure 2 – Figure supplement 1.

      Of course, the models could be expanded to include more factors, but we think this comment stems from the data being insufficiently clearly explained by us. There are no “zero values missing” from Fig. 3; this is visible in the submitted raw data table (Excel file Source Data 1), but the points fully overlap in the graph shown and are therefore not easily distinguishable from one another. Time points where no CFU were recovered were plotted at the detection limit (50 CFU/g) and are included in the curve fitting. However, on re-examination we noticed that the curve fit was carried out on the raw data and not on the log-normalized data, which resulted in over-weighting of the higher values. Re-fitting the data does not change the conclusions but provides a better fit. These experiments have now been repeated such that we now have >=7 animals in each group. The new data are presented in Fig. 3C and D and Fig. 3 Supplement 2.
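      The two data-handling points in this reply (samples with no recovered CFU placed at the 50 CFU/g detection limit rather than dropped, and fitting on log-transformed rather than raw counts) can be illustrated with a sketch. The lag-then-growth model and the grid search over candidate lags are hypothetical choices made for illustration; the reply does not specify the model that was actually fitted.

```python
import numpy as np

DETECTION_LIMIT = 50.0  # CFU/g, the floor stated in the reply

def to_log10_cfu(cfu):
    """Place zero counts at the detection limit, then log10-transform."""
    cfu = np.asarray(cfu, dtype=float)
    return np.log10(np.maximum(cfu, DETECTION_LIMIT))

def fit_lag_growth(t, log_cfu, lags):
    """Least-squares fit of a hypothetical lag-then-growth model in log10 space.

    log_cfu = a                  for t <  lag
    log_cfu = a + r * (t - lag)  for t >= lag
    For each candidate lag the model is linear in (a, r); the lag with the
    smallest residual sum of squares is kept.
    """
    t = np.asarray(t, dtype=float)
    log_cfu = np.asarray(log_cfu, dtype=float)
    best = None
    for lag in lags:
        X = np.column_stack([np.ones_like(t), np.maximum(t - lag, 0.0)])
        coef, *_ = np.linalg.lstsq(X, log_cfu, rcond=None)
        sse = float(np.sum((X @ coef - log_cfu) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(lag), float(coef[0]), float(coef[1]))
    return best[1:]  # (lag, a, r)
```

      Fitting the same model to raw CFU would let the largest counts dominate the squared error, which is the over-weighting described above; in log10 space each time point contributes on a comparable scale.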

      Limitations:

      1) The experiments do not allow clear separation of effects derived from the microbiota composition and those that occur secondary to host development without a microbiota or with a different microbiota. Furthermore, the measured bottlenecks are very similar in LCM and Oligo mice, even though these microbiotas differ in complexity. Oligo-MM12 was originally developed and described to confer resistance to Salmonella colonization, suggesting that it should tighten the bottleneck. Overall, an add-back experiment demonstrating that conventionalizing germ-free mice imparts a similar bottleneck to SPF would strengthen the conclusions.

      These are excellent suggestions and have been followed. Additional data are now presented in Figure 2 – figure supplement 8. These show short- versus long-term recolonization of germ-free mice with an SPF microbiota and recover values of beta very similar to those of our standard SPF mouse colony. The data demonstrate a larger total niche size for B. theta at 2 days post-colonization, which normalizes by 2 weeks post-colonization. Independently of this, the colonization probability is already equivalent to that observed in our SPF colony at day 2 post-colonization. Therefore, the mechanisms causing early clonal loss are established very rapidly on colonization of a germ-free mouse with an SPF microbiota. We have additionally demonstrated that SPF mice do not have detectable intestinal antibody titers specific for acapsular B. theta (Figure 2 – figure supplement 7), so antibodies are unlikely to be part of the reason why acapsular B. theta struggles to colonize at all in the context of an SPF microbiota. We also carried out experiments to detect bacteriophage in SPF mouse cecal content capable of lysing wildtype and acapsular B. theta (Figure 2 – figure supplement 7). No lytic phage plaques were observed. However, plaque assays are not sensitive for the detection of weakly lytic phage, or of phage that may require surface structures that are not induced in vitro. We can therefore conclude that the restrictive activity of the SPF microbiota a) is reconstituted very quickly in germ-free mice, b) is very likely not related to the activity of intestinal IgA and c) cannot be attributed to a high abundance of strongly lytic bacteriophage. The simplest explanation is that a large fraction of the restriction is due to metabolic competition with a complex microbiota, but we cannot formally exclude other factors such as antimicrobial peptides or changes in intestinal physiology.

      2) It is often difficult to evaluate results because important parameters are not always given. Dose is a critical variable in bottleneck experiments, but it is not clear if total dose changes in Figure 2 or just the WITS dose? Total dose as well as n0 should be depicted in all figures.

      We apologize for the lack of clarity in the figures. We have added panels depicting the exact inoculum to each figure (or a supplementary figure where many inocula were used). Additionally, the methods section describing how barcoded CFU were calculated has been rewritten and is hopefully now clearer.

      3) This is in part a methods paper but the method is not described clearly in the results, with important bits only found in a very difficult supplement. Is there a difference between colonization probability (beta) and inoculum size at which tags start to disappear? Can there be some culture-based validation of "colonization probability" as explained in the mathematics? Can the authors contrast the advantages/disadvantages of this system with other methods (e.g. sequencing-based approaches)? It seems like the numerator in the colonization probability equation has a very limited range (from 0.18-1.8), potentially limiting the sensitivity of this approach.

      We apologize for the lack of clarity in the methods. This criticism is well taken, and we have rewritten large sections of the methods in the main text to include all relevant details that were previously buried in the extensive supplement.

      On the question of the colonization probability and the inoculum size: we kept the inoculum size at 10^7 CFU/mouse in all experiments (except those in Fig. 4, where this is explicitly stated), changing only the fraction of spiked barcoded strains. We verified the accuracy of our barcode recovery rate by serial dilution over 5 logs (new figure added: Figure 1 – figure supplement 1). “The CFU of barcoded strains in the inoculum at which tags start to disappear” is by definition closely related to the colonization probability, as this value (n0) appears in the calculation. Note that this is not the total inoculum size, which (unless otherwise stated in Fig. 4) is kept constant at 10^7 CFU by diluting the barcoded B. theta with untagged B. theta. Again, this is now better explained in all figure legends and in the main text.

      We have added an experiment using peak-to-trough ratios (PTR) in metagenomic sequencing to estimate the B. theta growth rate. This approach worked well for wildtype B. theta at a relatively early timepoint post-colonization, when growth was rapid. However, this metagenomics-based technique requires the examined strain to be present at an abundance of over 0.1-1% for accurate quantification, so we could not analyze the acapsular B. theta strain in cecum content at the same timepoint. These data have been added (Figure 3 – figure supplement 3). Note that the two techniques yield different information. PTR reveals relative growth rates at a specific time (if the strain is abundant enough), whereas neutral tagging reveals average population values over quite large time-windows. We believe that both approaches are valuable, and a few sentences comparing them have been added to the Discussion.
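      For readers unfamiliar with peak-to-trough ratios, the idea is as follows. In an actively replicating population, genome regions near the replication origin are present in more copies than regions near the terminus, so read coverage falls from origin to terminus. The sketch below is a deliberately simplified illustration and not necessarily the method used for Figure 3 – figure supplement 3. It assumes binned coverage for a single circular chromosome and a known origin bin, and compares median coverage near the origin with median coverage near the terminus. The function name and its `window` parameter are hypothetical.

```python
import numpy as np


def simple_ptr(bin_coverage, origin_bin, window=0.1):
    """Crude peak-to-trough ratio for one circular chromosome.

    bin_coverage: read depth per equal-sized genome bin.
    origin_bin:   index of the bin containing the replication origin.
    window:       fraction of the origin-to-terminus distance used at each end.
    """
    cov = np.asarray(bin_coverage, dtype=float)
    n = cov.size
    idx = np.arange(n)
    # Circular distance of each bin from the origin, scaled so that
    # 0 = origin and 1 = terminus (half a genome away).
    d = np.abs(idx - origin_bin)
    scaled = np.minimum(d, n - d) / (n / 2)
    peak = np.median(cov[scaled <= window])
    trough = np.median(cov[scaled >= 1 - window])
    return float(peak / trough)
```

      A ratio near 1 indicates little ongoing replication, and values above 1 indicate active growth. The medians become unstable when coverage is sparse, which is consistent with the abundance requirement for accurate PTR quantification mentioned above.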

      The actual numerator is the fraction of lost tags. It is obtained as the total number of tags lost across all mice in a group divided by the total number of tags inoculated (number of mice times the number of tags used per mouse). Very low tag recovery (fewer than one tag per mouse) yields very noisy data, while close-to-zero tag loss also carries little information relative to noise. The size of this numerator is therefore necessarily constrained, because we set up the experiments to recover close to optimal information from the WITS abundance. The robustness of these analyses is supported by the high “n” of between 10 and 17 mice per group.
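      To make the role of this numerator concrete, here is a minimal sketch of a pooled tag-loss estimate. It assumes a simple Poisson founding model, which is an assumption for illustration; the exact equation is given in the supplement. Under this model, each of the n0 inoculated CFU of a given tag independently founds a lineage with probability β. A tag is then lost with probability exp(-β·n0), so β = -ln(fraction lost)/n0. The function names and example numbers are hypothetical.

```python
import math


def pooled_fraction_lost(tags_lost_per_mouse, tags_per_mouse):
    """Tags lost summed over all mice, divided by tags inoculated."""
    total_lost = sum(tags_lost_per_mouse)
    total_inoculated = len(tags_lost_per_mouse) * tags_per_mouse
    return total_lost / total_inoculated


def colonization_probability(fraction_lost, n0):
    """Per-CFU founding probability beta, ASSUMING P(tag lost) = exp(-beta * n0).

    fraction_lost: pooled fraction of tags not recovered.
    n0:            CFU of each tag in the inoculum.
    """
    if not 0.0 < fraction_lost < 1.0:
        # No loss makes beta unbounded; total loss only bounds it from above.
        raise ValueError("fraction_lost must lie strictly between 0 and 1")
    return -math.log(fraction_lost) / n0


# Hypothetical group: 4 mice, 7 tags each, ~30 CFU per tag in the inoculum.
f = pooled_fraction_lost([3, 4, 3, 4], tags_per_mouse=7)  # 14 / 28 = 0.5
beta = colonization_probability(f, n0=30)                 # ln(2) / 30
```

      Under this assumed model, a value of -ln(fraction lost) between 0.18 and 1.8 corresponds to roughly 84% down to 17% of tags lost. Toward either extreme (almost all tags lost, or almost none), a single extra tag lost or retained changes the estimate substantially. This is the intermediate regime the response describes as most informative.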

      4) Figure 3 and the associated model is confusing and does not support the idea that a longer lag-phase contributes to the fitness defect of acapsular B.theta in competitive colonization. Figure 3B clearly indicates that in competition acapsular B. theta experiences a restrictive bottleneck, i.e., in competition, less of the initial B. theta population is contributed by the acapsular inoculum. There is no need to appeal to lag-phase defects to explain the role of the capsule in vivo. The model in Figure 3D should depict the acapsular population with less cells after the bottleneck. In fact, the data in Figure 3E-F can be explained by the tighter bottleneck experienced by the acapsular mutant resulting in a smaller acapsular founding population. This idea can be seen in the data: the acapsular mutant shedding actually dips in the first 12-hours. This cannot be discerned in Figure 3E because mice with zero shedding were excluded from the analysis, leaving the data (and conclusion) of this experiment to be extrapolated from a single mouse.

      We of course completely agree that this would be a correct conclusion if only the competitive colonization data is taken into account. However, we are also trying to understand the mechanisms at play generating this bottleneck and have investigated a range of hypotheses to explain the results, taking into account all of our data.

      Hypothesis 1) Competition is due to increased killing before the bacteria reach the cecum and commence growth. Note that the probability of colonization for single B. theta clones is very similar in single colonizations of OligoMM12 mice by the wildtype and acapsular strains. For this hypothesis to explain the outcompetition of the acapsular strain, the presence of the wildtype would have to increase the killing of acapsular B. theta in the stomach or small intestine. The bacteria are at low density at this stage, and stomach acid and small-intestinal secretions should be similar in all animals. Therefore, this explanation seems highly unlikely.

      Hypothesis 2) Competition between wildtype and acapsular B. theta occurs at the point of niche competition before growth commences in the cecum (similar to the reviewer's proposal). It is possible that the wildtype strain has a competitive advantage in colonizing physical niches (for example, proximity to bacteria producing colicins). On the basis of the data, we cannot exclude this hypothesis completely, and it is challenging to measure directly. However, our in vivo growth-curve data show a similar delay in the arrival of acapsular B. theta CFU in the feces on single colonization as in competition. This suggests that the presence of the wildtype (i.e., initial niche competition) is not the cause of this delay. Rather, the delay is an intrinsic property of the acapsular strain in vivo.

      Hypothesis 3) Competition between wildtype and acapsular B. theta is mainly attributable to differences in growth kinetics in the gut lumen. To investigate growth kinetics, we carried out time-courses of fecal collection from OligoMM12 mice single-colonized with wildtype or acapsular B. theta, i.e., in a situation where we observe identical colonization probabilities for the two strains. These data, now shown in Figure 3C and D and Figure 3 – figure supplement 2, show that even without competition, acapsular B. theta CFU appear later and with a lower net growth rate than the wildtype. These single colonizations show no measurable difference in colonization probability between the two strains. The delayed appearance of acapsular B. theta in feces is therefore unlikely to be due to increased killing, which would be clearly visible as barcode loss in the single colonizations. Rather, the simplest explanation for this observation is a bona fide lag phase before growth commences in the cecum. Interestingly, using only the lower net growth rate (assumed to reflect a similar growth rate but an increased clearance rate) produces a good fit to our data on both competitive index and colonization probability in competition (Figure 3 – figure supplement 5). This fit is slightly improved by adding the observed lag phase (Figure 3). It is very difficult to manipulate the lag phase experimentally in order to test directly how much it contributes to the observed effects, so its contribution is carefully described in the new text.
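      The distinction between a lag phase and a lower net growth rate can be illustrated with a deliberately simplified sketch. It assumes unrestricted exponential growth after a fixed lag, with no carrying capacity and no stochastic founding, so it is not the authors' full model. The net rate stands for growth minus clearance, and all names and parameter values are hypothetical. The competitive index is taken as the acapsular:wildtype ratio at time t divided by the same ratio in the inoculum.

```python
import math


def log_population(t, log_n0, net_rate, lag):
    """Natural-log population size: constant during the lag, then exponential.

    net_rate is per hour and represents growth minus clearance.
    """
    return log_n0 + net_rate * max(0.0, t - lag)


def log10_competitive_index(t, wt, acap):
    """log10 of (acap/wt at time t) / (acap/wt in the inoculum).

    wt and acap are (log_n0, net_rate, lag) tuples.
    """
    ratio_t = log_population(t, *acap) - log_population(t, *wt)
    ratio_0 = log_population(0.0, *acap) - log_population(0.0, *wt)
    return (ratio_t - ratio_0) / math.log(10)


wt = (math.log(1e3), 1.0, 0.0)  # hypothetical: no lag, net rate 1.0 per hour
for label, acap in [
    ("rate only", (math.log(1e3), 0.8, 0.0)),
    ("lag only", (math.log(1e3), 1.0, 2.0)),
    ("rate + lag", (math.log(1e3), 0.8, 2.0)),
]:
    print(label, round(log10_competitive_index(10.0, wt, acap), 2))
```

      At t = 10 h, the "rate only" and "lag only" cases give the same log10 competitive index (-0.87), and combining them gives -1.56. A single time-point therefore cannot separate the two mechanisms. Over time they diverge: a pure lag shifts the acapsular curve once, whereas a lower net rate keeps widening the gap. This is why time-courses from single colonizations help address the question.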

      Please note that all data were plotted and used in the fitting in Fig. 3E, but “zero-shedding” points are plotted at the detection limit and overlaid. This makes it look as if only one point was present when in fact several were used, as was clear in the submitted raw data tables. To shore up these observations, we have repeated all time-courses and now have n ≥ 7 mice per group.

      5) The conclusions from Figure 4 rely on assumptions not well-supported by the data. In the high fat diet experiment, a lower dose of WITS is required to conclude that the diet has no effect. Furthermore, the authors conclude that Salmonella restricts the B. theta population by causing inflammation, but do not demonstrate inflammation at their timepoint or disprove that the Salmonella population could cause the same effect in the absence of inflammation (through non-inflammatory direct or indirect interactions).

      We of course agree that we would expect to see some loss of B. theta on HFD. However, for these experiments the inoculum was ~10^9 CFU/100 μL of untagged strain spiked with approximately 30 CFU of each tagged strain. Decreasing the number of each WITS below 30 CFU leads to very high mouse-to-mouse variation in the starting inocula, which massively complicates the analysis. To clarify this point, we have added a detection-limit calculation showing that the neutral tagging technique is not very sensitive to population contractions of less than 10-fold. A contraction of this size is likely in line with what would be expected for short-term high-fat diet feeding in monocolonized mice.

      This is a very good observation regarding our Salmonella infection data. We have now added the fecal lipocalin 2 values, as well as a group infected with a ssaV/invG double mutant of S. Typhimurium that does not cause clinical grade inflammation (“avirulent”). This shows 1) that the attenuated S. Typhimurium is causing intestinal inflammation in B. theta colonized mice and 2) that a major fraction of the population bottleneck can be attributed to inflammation. Interestingly, we do observe a slight bottleneck in the group infected with avirulent Salmonella which could be attributable either to direct toxicity/competition of Salmonella with B. theta or to mildly increased intestinal inflammation caused by this strain. As we cannot distinguish these effects, this is carefully discussed in the manuscript.

      6) Several of the experiments rely on very few mice/groups.

      We have increased the n to over 5 per group in all experiments (most critically those shown in Fig 3, Supplement 5). See figure legends for specific number of mice per experiment.

      Reviewer #2 (Public Review):

      The goal of this study was to understand population bottlenecks during colonization in the context of different microbial communities. Capsular polysaccharide mutants, diet, and enteric infection were also used, paired with short-term monitoring of overall colonization and the levels of specific strains. The major strength of this study is the innovative approach and the significance of the overall research area.

      The first major limitation is the lack of clear and novel insight into the biology of B. theta or other gut bacterial species. The title is provocative, but the experiments as is do not definitively show that the microbiota controls the relative fitness of acapsular and wild-type strains or provide any mechanistic insights into why that would be the case. The data on diet and infection seem preliminary. Furthermore, many of the experiments conflict with prior literature (i.e., lack of fitness difference between acapsular and wild-type strain and lack of impact of diet) but satisfying explanations are not provided for the lack of reproducibility.

      In line with suggestions from Reviewer 1, the paper has been quite extensively rewritten to better explain the data presented and their consequences. We now also explicitly comment on apparent discrepancies between our data and the literature. For example, the colonization defect of acapsular B. theta has only been published for competitive colonizations, where we also observe a fitness defect, so there is no actual conflict. In addition, we have calculated detection limits for the effect of high-fat diet and show that a 10-fold reduction in the effective population size would not be robustly detected with the neutral tagging technique. We are therefore probably underpowered to detect small effects, and we believe it is important to point out the numerical limits of the technique we present here. Finally, for the Figure 4 experiments, we have added data on colonization/competition with an avirulent Salmonella challenge, which provides some mechanistic data on the role of inflammation in the B. theta bottleneck.

      Another major limitation is the lack of data on the various background gut microbiotas used. eLife is a journal for a broad readership. As such, describing what microbes are in LCM, OligoMM, or SPF groups is important. The authors seem to assume that the gut microbiota will reflect prior studies without measuring it themselves.

      All gnotobiotic lines are bred as gnotobiotic colonies in our isolator facility. This is now better explained in the methods section. Additionally, 16S sequencing of all microbiotas used in the paper has been added as Figure 2 – figure supplement 1.

      I also did not follow the logic of concluding that any differences between SPF and the two other groups are due to microbial diversity, which is presumably just one of many differences. For example, the authors acknowledge that host immunity may be distinct. It is essential to profile the gut microbiota by 16S rRNA amplicon sequencing in all these experiments and to design experiments that more explicitly test the diversity hypotheses vs. alternatives like differences in the membership of each community or other host phenotypes.

      This is an important point. We have carried out a number of experiments to potentially address some issues here.

      1) We carried out B. theta colonization experiments in germ-free mice that had been colonized by gavage of SPF feces either 1 day or 2 weeks prior to B. theta colonization. The shorter pre-colonization allowed B. theta to reach a higher population density in the cecum. Even so, the colonization probability after the short pre-colonization was already reduced to the levels observed in our SPF colony. Therefore, the factors limiting B. theta establishment in the cecum are already in place 1-2 days after colonization with an SPF microbiota (Figure 2 – figure supplement 8).

      2) We checked for the presence of secretory IgA capable of binding to the surface of live B. theta, compared to a positive control of a mouse orally vaccinated against B. theta (Fig. 2, Supplement 7). We found no evidence of specific IgA targeting B. theta in the intestinal lavages of our SPF mouse colony.

      3) We isolated bacteriophage from the intestine of SPF mice and used them to infect lawns of wildtype and acapsular B. theta in vitro. We could not detect any plaque-forming phage from the intestine of SPF mice (Figure 2 – figure supplement 7).

      We can therefore exclude strongly lytic phage and host IgA as dominant driving mechanisms restricting B. theta colonization. It remains possible that rapidly upregulated host factors such as antimicrobial peptide secretion could play a role, but metabolic competition from the microbiota is also a very strong candidate hypothesis. The text regarding these experiments has been slightly rewritten to point out that colonization probability inversely correlates with microbiota complexity, and the mechanisms involved may involve both direct microbe-microbe interactions as well as host factors.

      Given the prior work on the importance of capsule for phage, I was surprised that no efforts are taken to monitor phage levels in these experiments. Could B. theta phage be present in SPF mice, explaining the results? Alternatively, is the mucus layer distinct? Both could be readily monitored using established molecular/imaging methods.

      See above: no plaque-forming phage could be recovered from the SPF mouse cecum content. The main replicative site that we have studied here, in mice, is the cecum. The cecum does not have true mucus layers in the same way as the distal colon, and it lies upstream of the colon, so it is unlikely to be affected by colon geography. Rather, mucus is well mixed with the cecum content and may behave as a dispersed nutrient source. There is certainly a higher availability of mucus in the gnotobiotic mice, because there is less competition from other strains for mucus degradation. However, this would be challenging to link directly to the B. theta colonization phenotype, as Muc2-deficient mice develop intestinal inflammation.

      The conclusion that the acapsular strain loses out due to a difference of lag phase seems highly speculative. More work would be needed to ensure that there is no difference in the initial bottleneck; for example, by monitoring the level of this strain in the proximal gut immediately after oral gavage.

      This is an excellent suggestion and has been carried out. At 8 h post-colonization with a high inoculum (allowing easy detection), there were identical low levels of B. theta in the upper and lower small intestine. However, there was more wildtype than acapsular B. theta in the cecum and colon, consistent with growth having commenced for the wildtype but not for the acapsular strain at this timepoint. We have additionally repeated the single-colonization time-courses using our standard inoculum. These clearly show delayed detection of acapsular B. theta in feces even in single colonizations, where no increased bottleneck is observed. This can only reasonably be explained by a bona fide lag-phase extension for acapsular B. theta in vivo. These data also reveal a decreased net growth rate of acapsular B. theta. Interestingly, our model can be fitted quite well to both the competitive index and the colonization probability data using only the difference in net growth rate. Adding the (clearly observed) extended lag phase generates a model that is still consistent with our observations.

      Another major limitation of this paper is the reliance on short timepoints (2-3 days post colonization). Data for B. theta levels over 2 weeks or longer is essential to put these values in context. For example, I was surprised that B. theta could invade the gut microbiota of SPF mice at all and wonder if the early time points reflect transient colonization.

      It should be noted that “SPF” defines a microbiota only by the absence of specific pathogens and not by its absolute composition. The rather efficient B. theta colonization in our SPF colony is therefore likely due to a permissive composition, which is probably not reproducible between different SPF colonies. Such differences are a major confounder for the reproducibility of mouse experiments between institutions; in contrast, gnotobiotic colonies are highly reproducible. We consistently see colonization of our SPF colony by wildtype B. theta out to at least 10 days post-inoculation (the latest time-point tested), at loads similar to those observed in this work. This indicates that the colonization is not just transient “flow-through”. Data included below:

      For this paper we were very specifically quantifying the early stages of colonization, also because the longer we run the experiments for, the more confounding features of our “neutrality” assumptions appear (e.g., host immunity selecting for evolved/phase-varied clones, within-host evolution of individual clones etc.). For this reason, we have used timepoints of a maximum of 2-3 days.

      Finally, the number of mice/group is very low, especially given the novelty of these types of studies and uncertainty about reproducibility. Key experiments should be replicated at least once, ideally with more than n=3/group.

      For all barcode quantification experiments we have between 10 and 17 mice per group. Experiments for the in vivo time-courses of colonization have been expanded to an “n” of at least 7 per group.

    1. Author Response

      Reviewer #2 (Public Review):

      This is a highly interesting paper that provides important insights into the understanding of how HC-derived osteoblasts contribute to trabecular bone formation. Using single-cell transcriptomics, the authors found that HC descendent cells activate MMP14 and the PTH pathway as they transition to osteoblasts in neonatal and adult mice. They further demonstrate that HC lineage-specific Mmp14 null mutants (Mmp14ΔHC) produce more bone. By performing a panel of elegant in vitro studies, the authors show that MMP14 cleaves the extracellular domain of PTH1R, dampening PTH signaling. The authors provide more in vivo evidence showing that HC-derived osteogenic cells respond to PTH which is enhanced in Mmp14ΔHC. Generally, this is a very well-performed study that may contribute important novel aspects to the field.

      I have the following issues for the authors to address:

      1) The novel mechanism identified in this study (i.e. MMP14-induced PTH1R cleavage) is intriguing. It is unclear how specific this pathway is in the transition of HCs to osteoblasts. Are other MMPs besides MMP14 involved in the PTH1R cleavage? Is PTH1R the only substrate of MMP14?

      Thank you for your interest in our findings. A Disintegrin And Metalloproteinases (ADAMs) are known to cleave various transmembrane proteins such as RANKL. We therefore tested ADAMs for their potential ability to cleave PTH1R (Supplementary Figure 4A) and did not find that ADAM10, 15 or 17 could cleave PTH1R. The cleaved PTH1R peptide is absent from extracts of osteoblasts isolated from MMP14-null bones (new Fig. 3E), which suggests that no other major MMP cleaves PTH1R. Regarding other substrates cleaved by MMP14, we review these in the manuscript and discuss the possibility that deficient cleavage of other substrates contributes to the phenotype.

      2) Would it be possible for the authors to detect the truncated PTH1R fragment(s) from the conditioned medium prepared from either 293T or osteoblast culture?

      We tried to detect a cleaved PTH1R fragment in the culture medium by western blot of PCA precipitates of the medium, but could not detect any free peptide using anti-Flag or anti-HA antibodies. It has been reported that the ligand-binding domain is linked by disulphide bonds in vivo. Cleavage of PTH1R at the unstructured loop domain therefore does not necessarily release a cleaved fragment.

      3) The finding that HC-descendants persist and contribute to the anabolic response to PTH in aged mice is interesting. Have the authors examined the changes in MMP14 expression in bone with age and in response to PTH treatment?

      Thank you for your question. We have added data showing induction of MMP14 expression upon PTH treatment in Figure 7 – figure supplement 1. It has also been published that PTH stimulation increases MMP14 expression in osteocytes (1).

    1. Author Response

      Reviewer #2 (Public Review):

      Susswein et al. analyze a fine-scale, novel data stream of human mobility, openly available from Safegraph, based on the usage of mobile apps with GPS and sampled from over 45 million smartphone devices. They define a metric $\sigma_{it}$, properly normalized, that quantifies the propensity for visits to indoor locations relative to outdoor locations in a given county $i$ at week $t$. For each pair of counties $i$ and $j$, they compute the Pearson correlation coefficient $\rho_{ij}$ between the corresponding $\sigma$ metrics. This generates a correlation matrix that can be interpreted as the adjacency matrix of a network. They then perform community detection on this network/matrix, effectively clustering together time series that are correlated. This identifies three main clusters of counties, characterized geographically as either in the north of the country, in the south of the country, and possibly in tourism active areas. They then show, via a simple model, how including over-simplified models of seasonality may affect infectious disease models.

      This work is very interesting for the infectious disease modeling community, as it addresses a complex problem introducing a new data stream.

      This work builds on several strengths, among which:

      It is the first analysis of the Safegraph dataset to capture seasonality in indoor behavior.

      It provides a simple metric to quantify indoor activity, that thanks to the dataset can be computed with a high level of spatial detail.

      It aims at characterizing clusters of counties with a similar pattern of indoor activity.

      It aims at quantifying the impact of neglecting finer-scale patterns of seasonality, for example considering seasonality to be homogeneous at the US level.

      We thank the reviewer for the positive review of our work.

      At the same time, it presents several weaknesses that should be addressed to improve the methodology, its results, and the implication:

      There is no quantitative comparison of the newly introduced metric for indoor activity with other proxies of seasonality (e.g. temperature or relative humidity). The (dis)similarity with other proxies may help in assessing the importance of this metric, showing why it can not be exchanged with other data sources (like temperature data) that are widely available and are not affected by sampling issues (more on that later).

      We have now added supplementary figures (Figure S3) to illustrate how indoor activity seasonality compares with temperature and humidity. We have also added text to the Results and the Discussion to discuss this point.

      A major flaw of the analysis is that community detection is performed on a network defined by the correlation between time series, using an algorithm based on modularity optimization. As explained in MacMahon et al. [1], all modularity optimization methods rely on null assumptions that are violated in the case of correlations between time series. Therefore, there is a very strong potential bias in their results that is not accounted for. Possible solutions could be to proceed via the methodology presented in [1] or via a different type of algorithm (e.g. Infomap [2]). In both cases, as the network is thresholded (considering only correlations larger than 0.9), a more quantitative assessment of the impact of the threshold value should be included.

      References

      [1] Mel MacMahon and Diego Garlaschelli Phys. Rev. X 5, 021006 (2015).

      [2] Martin Rosvall and Carl T. Bergstrom PNAS 105, 1118 (2008).

      We thank the reviewer for making this excellent point. We have now added Supplementary Figures S13 and S14. In Figure S13, we demonstrate the robustness of our clustering results to different correlation thresholds. (We have also corrected a typo in our original Methods section, which mistakenly stated our correlation threshold as 0.9 rather than the 90th percentile, which is what we used.) In Figure S14, we show the clustering results obtained with a different clustering algorithm. To test a non-network-based approach, we use hierarchical clustering and find a partition of the US consistent with our main results.
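      The thresholding step discussed in this exchange can be sketched as follows. The sketch assumes a NumPy array of weekly indoor-activity values with one row per county, standing in for the σ_it metric. It computes pairwise Pearson correlations ρ_ij and keeps only pairs at or above the 90th percentile of the off-diagonal correlations. This is the percentile-based rule the authors clarify they used, rather than a fixed cutoff of 0.9. The function name is hypothetical. The clustering step itself (modularity-based, Infomap, or hierarchical) is deliberately left out, because the reviewer's point is precisely that the choice of method matters for correlation-derived networks.

```python
import numpy as np


def correlation_network(series, percentile=90.0):
    """Threshold a county-by-county correlation matrix into an adjacency matrix.

    series: array of shape (n_counties, n_weeks), e.g. weekly indoor-activity values.
    Returns (rho, threshold, adjacency).
    """
    rho = np.corrcoef(series)        # Pearson correlation between rows
    rho = 0.5 * (rho + rho.T)        # guard against round-off asymmetry
    iu = np.triu_indices_from(rho, k=1)
    # Percentile of the off-diagonal correlations (each pair counted once),
    # not a fixed correlation value such as 0.9.
    threshold = np.percentile(rho[iu], percentile)
    adjacency = (rho >= threshold).astype(int)
    np.fill_diagonal(adjacency, 0)   # no self-loops
    return rho, threshold, adjacency
```

      With 20 counties (190 pairs), the 90th-percentile rule keeps the top 19 pairs however strongly correlated the data are overall. A fixed 0.9 cutoff, by contrast, could keep none of the pairs or nearly all of them. This is one reason a threshold sensitivity analysis such as Figure S13 is useful.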

      It is not clear what the added value of the data on indoor activity is, as no fitting to real data is performed. Although this may be considered beyond the scope of this paper, I think it would be crucial to quantify how much better a data-informed model would describe real epidemic data (for example in the case of COVID-19). For now, only the impact of neglecting heterogeneity in indoor activity is shown, comparing a model with region-average parameters vs a model with county-level average parameters. Given that the dataset comes with potential bias in sampling (more on this later), it would be good to assess how well it predicts real epidemic spread. When showing results from different models, no error bars are shown on the plot. How have the errors been estimated?

      We appreciate this point by the reviewer, and agree that future work will have to consider how indoor activity seasonality affects our ability to capture observed transmission trends. However, such work would additionally need careful characterization of other seasonal factors hypothesized to drive transmission (including environmental and other behavioral factors), and is beyond the scope of our work. Instead, in Figure 4 we aim to (a) provide the infectious disease modeling community with empirically-inferred parameters for a simple sinusoidal model which is commonly used in infectious disease models to capture transmission seasonality; and (b) demonstrate the implications of ignoring geographic heterogeneity in transmission seasonality in theoretical models of disease dynamics, which are commonly used for scenario analysis and model-based intervention design. As we demonstrate, transmission seasonality described by such sinusoidal models, even when they are empirically characterized as in our case, can lead to meaningfully different epidemic dynamics when transmission seasonality varies from the assumptions.
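      As an illustration of how such parameters can be inferred, the sketch below fits a sinusoid with a fixed 52-week period to a weekly series by ordinary least squares. It assumes the parameterization y(t) ≈ mean + amplitude·cos(2π(t − phase)/52). This is one common choice but not necessarily the exact form used in Figure 4, and the function name is hypothetical. Writing the cosine as a cos term plus a sin term makes the problem linear, so no iterative optimizer is needed.

```python
import numpy as np


def fit_sinusoid(t_weeks, y, period=52.0):
    """Fit y ~ mean + amplitude * cos(2*pi*(t - phase) / period) by least squares.

    Uses the identity A*cos(w(t - p)) = a*cos(w t) + b*sin(w t) with
    a = A*cos(w p) and b = A*sin(w p), which makes the fit linear in (mean, a, b).
    Returns (mean, amplitude, phase) with phase in weeks, in [0, period).
    """
    t = np.asarray(t_weeks, dtype=float)
    w = 2.0 * np.pi / period
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    (mean, a, b), *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    amplitude = float(np.hypot(a, b))
    phase = float((np.arctan2(b, a) / w) % period)
    return float(mean), amplitude, phase
```

      Dividing the amplitude by the mean gives a relative seasonal amplitude. This could be used in a seasonally forced transmission rate of the form β(t) = β0[1 + (amplitude/mean)·cos(2π(t − phase)/52)], again as an assumed model form.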

      Additionally, no uncertainty is shown in Figure 4B because transmission seasonality is based either on one empirical data point per time step or on the fitted sinusoidal model (whose estimated parameters have negligible standard errors).

      The dataset is presented as representative of the US population. However, this has not been assessed over time. As adherence to social distancing is influenced by several socio-economic determinants, the lack of representativeness in certain strata of the population at a given time may introduce an important bias in the dataset. Although this is an inherent limitation of the dataset, it should be discussed in the paper more thoroughly.

      We agree with the reviewer that this is a limitation. However, we do not have any way of assessing demographic representation in the dataset over time. We have instead included an additional sentence into the Discussion section acknowledging this point.

      In conclusion, I think that the methodology should be revised to account for the fact that the analysis is performed on a correlation matrix. Capturing seasonal patterns of indoor activity can help in tackling the crucial problem of seasonality in human behavior. This could help in identifying effective strategies of disease containment able to curb disease spread at a lower societal cost than fully-fledged lockdowns.

      We thank the reviewer again for their helpful suggestions.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors characterized the expression of DDR2 in the developing craniofacial skeleton. The authors showed that Ddr2-deficient mice exhibited defects in craniofacial bones including impaired calvarial growth and frontal suture formation, cranial base hypoplasia due to aberrant chondrogenesis, and delayed ossification at growth plate synchondroses. The histological studies are well done. However, the studies as shown in this manuscript do not provide cellular and molecular mechanisms beyond what is already known, particularly beyond what the authors have already published in a similar study in Bone Research (Mohamed et al., 2022 Feb 9;10(1):11). With the same Cre lines and analytic approaches, the authors already showed in the Bone Research paper that Ddr2 in the Gli1+ cells is required for chondrocyte proliferation and polarity in growth plate development and osteoblast differentiation. Cartilage development and bone formation occur in both long bones and the craniofacial skeleton; the authors showed similar functions of Ddr2 in similar skeletal tissues, although the location is different. One new point in this manuscript might be that the authors indicated that loss of Ddr2 led to ectopic chondrocyte hypertrophy (Fig. 7I). But what the data actually showed was delayed chondrocyte hypertrophy and abnormal location of the delayed hypertrophic chondrocytes, which could well be caused by abnormal chondrocyte polarity. This interesting defect was superficially described with no mechanistic investigation at the cellular or molecular level.

      New data is now provided showing that Ddr2 deficiency is associated with abnormal collagen organization and orientation as measured by second harmonic generation (SHG) (Fig 3-figure supplement 1). Specifically, collagen orientation as reflected by SHG anisotropy measurements was disrupted in Ddr2-deficient synchondroses. This result complements data showing that the distribution of type II collagen as measured by immunofluorescence changes with Ddr2 deficiency such that no collagen is seen in the interterritorial matrix between chondrocyte bundles (Fig 3a). This loss of collagen organization provides a potential mechanism to explain the disruption of chondrocyte polarity and altered localization of hypertrophic cells in synchondroses. In further support of this concept, other recently published studies described in the Discussion have shown that Ddr2 deficiency is associated with disruption of collagen fibril orientation in other experimental systems such as in CAF cells surrounding breast tumors as well as at sites of heterotopic ossification and that these abnormalities are associated with defective integrin signaling. Additional studies beyond the scope of the present communication will be required to determine if these matrix changes can explain the observed phenotypes. However, we believe this proposed mechanism is the most likely explanation for DDR2 effects based on current data.

      Reviewer #2 (Public Review):

      DDR2 is a collagen-binding receptor that is required for proper skull development. Ddr2 loss-of-function in humans is associated with the developmental disease spondylo-meta-epiphyseal dysplasia (SMED). Here, the authors aim to elucidate the role of DDR2 in skull development. In this work, the role of DDR2 in skull and face development is studied in mice, which exhibit SMED-like symptoms in the absence of Ddr2. Histological studies showed that Ddr2 knockout disrupts organization and proper differentiation within progenitor-rich regions of the skull from which bone growth occurs. Histology and lineage tracing studies revealed that Ddr2-expressing cells in/around these zones 1) generally also express the proliferation regulator Gli1, and 2) eventually contribute to osteogenic and chondrogenic lineages. Cell-type specific knockout studies were used to show that DDR2 has a development-specific role: knockout of Ddr2 in Gli1+ cells recapitulated the developmental abnormalities observed in global Ddr2 knockout mice; knockout in chondrocytes partially recapitulated developmental abnormalities, and osteoblast-specific knockout mice were indistinguishable from their wild-type littermates. This work also catalogues the locations of Ddr2-positive cells and their lineages at various stages of development. Additionally, the anatomical effects of loss of DDR2 function on skull and face development are thoroughly described in global and cell-type specific knockouts.

      This work is a vital and stimulating contribution to the scientific literature. The authors' claims and conclusions are well supported by the evidence they present.

      The scientific approach is sound and the conclusions important. However, a limitation of the work's discussion is a lack of attention paid to the specific biophysical mechanism that DDR2 is playing during development. The discussion of the positioning of the Golgi is nice, but a lack of Golgi polarity is likely a downstream effect of processes occurring within the cell adhesion and mechanotransduction machinery. Perhaps, like integrins, DDR2 is a mechanosensor that the cell needs to properly sense local collagen orientation, polarize, and secrete properly organized COL2. It would be beneficial to put up some guideposts that will facilitate engagement from the molecular biophysics/mechanobiology community.

      Thank you for this suggestion. In response, we added new studies showing that DDR2 is necessary for ECM organization (please see reviewer 1 comments and additions to the Discussion section). In addition, the Discussion has been revised to include speculation on the relationship between DDR2-dependent ECM organization, mechanical properties of the matrix and cell differentiation. Because very little is known about DDR2 from a mechanistic perspective, much of what we propose is currently conjecture, but hopefully can guide future study.

      Reviewer #3 (Public Review):

      In this work, the authors investigated a number of parameters in order to understand in depth, and demonstrate, the vital role of the ongoing interaction between extracellular matrix components and particular stem cells in normal craniofacial development. The work therefore focused on the impact of genetic manipulation (knockout) of the molecules behind this interaction, and on how such modification is reflected in skull bone morphogenesis.

      Strengths and Weaknesses

      • Using animals of different genetic backgrounds in the same experiment might affect the outcomes of the work.

      • It would be better to place the ethical approval statement at the beginning of the Materials and Methods, as a separate paragraph.

      • It is great that the authors precisely explain all the measurements.

      • A supplementary file detailing the antibodies used might be required.

      • All methods have been written in an academic and clear way.

      • It is nice that there is a concluding sentence at the end of each Results paragraph, which makes it easy for readers to remember and understand the findings.

      • Is it possible to see a reduction in proliferative chondrocytes with no change in apoptosis rate?

      Reductions in proliferation are certainly seen in many systems. Proliferation and apoptosis are not necessarily coupled.

      • Results are supposed to be compatible.

      • Very nice and representative images from the immunofluorescence protocol.

      • Using different techniques to confirm observations is clearly manifested in methods and results.

      It is clear that the authors have used different methods and techniques to meet the objectives of the work. Importantly, more than one procedure was used to confirm observations related to one or more of the aims.

      Although determining to what extent the outcomes of this work could be applied to community needs might require a subspecialist physician's opinion, the observations of the present study will likely require a series of further investigations before they can be translated to humans. Notably, identifying the molecules and pathways behind skull development abnormalities would open the door to early diagnosis of such deformities, thus mitigating future abnormalities either by developing new prevention methods or by discovering novel medications.

      Thank you for these comments. Additional commentary has been added to the Discussion to provide a more mechanistic interpretation of our results, however speculative they may be at this time (lines 555-605).

    1. Author Response

      Reviewer #1 (Public Review):

      King et al. provide an interesting reanalysis of existing fMRI data with a novel functional connectivity modeling approach. Three connectivity models accounting for the relationship between cortical and cerebellar regions are compared, each representing a hypothesis. Evidence is presented that - contrary to a prominent theoretical account in the literature - cortical connectivity converges on cerebellar regions, such that the cerebellum likely integrates information from the cortex (rather than forming parallel loops with the cortex). If true, this would have large implications for understanding the likely computational role of the cerebellum in influencing cortical functions. Further, this paper provides a unique and potentially groundbreaking set of methods for testing alternate connectivity hypotheses in the human brain. However, it appears that insufficient details were provided to properly evaluate these methods and their implications, as described below.

      Strengths:

      • Use of a large task battery performed by every participant, increasing confidence in the generality of the results across a variety of cognitive functions.

      • Multiple regression was used to reduce the chance of confounding (false connections driven by a third region) in the functional connectivity estimates.

      • A focus on the function and connectivity of the cerebellum is important, given that it is clearly essential for a wide variety of cognitive processes but is studied much less often than the cortex.

      • The focus on clear connectivity-based hypotheses and clear descriptions of what would be expected in the results if different hypotheses were true.

      • Generalization of models to a completely held-out dataset further increases confidence in the generalizability of the models.

      Concerns:

      1) The main conclusion of the paper (including in the title) involves a directional inference, and yet it is notoriously difficult to make directional inferences with fMRI. The term "input" into the cerebellum is repeatedly used to describe the prediction of cerebellar activity based on cortical activity, and yet the cerebellum is known to form loops with the cortex. With the slow temporal resolution of fMRI it is typically unclear what is the "input" versus the "output" in the kinds of predictions used in the present study. Critically, this may mean that a cerebellar region could receive input from a single cortical region (i.e., the alternate hypothesis supposedly ruled out by the present study), then output to multiple cortical regions, likely resulting (using the fMRI-based approach used here) in a faulty inference that convergent signals from cortex drove the results. On pg. 4 it is stated: "We chose this direction of prediction, as the cerebellar BOLD signal overwhelmingly reflects mossy-fiber input, with minimal contribution from cerebellar output neurons, the Purkinje cells (Mathiesen et al., 2000; Thomsen et al., 2004)." First, it would be good to know how certain this is in 2022, given the older references and ongoing progress in understanding the relationship between neuronal activity and the BOLD signal (e.g., Drew 2019). Second, given that it's likely that activity in the mossy-fiber inputs has an impact on Purkinje cell outputs, and that some cortical activity supposedly reflects cerebellar output, it is possible that FC could also reflect the opposite direction (cerebellum → cortex). It would seem important to consider these possibilities in the interpretation of the results.

      We agree that making directional inferences with fMRI BOLD signals is difficult. We also note that because of the low temporal resolution of fMRI BOLD signals, we have not tried to extract directional information based on temporal lags. Rather, we emphasize that the relationship between neural activity and BOLD differs between the neocortex and cerebellum. In the cerebellum, mossy fiber activity releases glutamate, which activates granule cells and triggers the release of nitric oxide (NO). NO is mostly released by granule cells and stellate cells. The release of NO increases the diameter of capillaries, which in turn causes changes in blood flow and blood volume, two major contributors to BOLD signal changes (Alahmadi et al. 2016; Alahmadi et al. 2015; Drew 2019; Mapelli et al. 2017; Gagliano et al. 2022). Importantly, there is a negligible contribution of NO from the Purkinje cells. Taken together, these data make a strong case that the BOLD signal in the cerebellar cortex reflects activity at the input stage. We acknowledge that the references cited in our initial submission were somewhat dated. We have now provided additional references (which are in agreement with the findings from the earlier papers). Based on this evidence, we chose to predict cerebellar activity from cortical activity.

      References: Alahmadi, A. A., Samson, R. S., Gasston, D., Pardini, M., Friston, K. J., D’Angelo, E., ... & Wheeler-Kingshott, C. A. (2016). Complex motor task associated with non-linear BOLD responses in cerebro-cortical areas and cerebellum. Brain Structure and Function, 221(5), 2443-2458.

      Alahmadi, A. A., Pardini, M., Samson, R. S., D'Angelo, E., Friston, K. J., Toosy, A. T., & Gandini Wheeler‐Kingshott, C. A. (2015). Differential involvement of cortical and cerebellar areas using dominant and nondominant hands: an FMRI study. Human Brain Mapping, 36(12), 5079-5100.

      Mapelli, L., Gagliano, G., Soda, T., Laforenza, U., Moccia, F., & D'Angelo, E. U. (2017). Granular layer neurons control cerebellar neurovascular coupling through an NMDA receptor/NO-dependent system. Journal of Neuroscience, 37(5), 1340-1351.

      Gagliano, G., Monteverdi, A., Casali, S., Laforenza, U., Gandini Wheeler-Kingshott, C. A., D’Angelo, E., & Mapelli, L. (2022). Non-Linear Frequency Dependence of Neurovascular Coupling in the Cerebellar Cortex Implies Vasodilation–Vasoconstriction Competition. Cells, 11(6), 1047.

      Drew, P. J. (2019). Vascular and neural basis of the BOLD signal. Current Opinion in Neurobiology, 58, 61–69.

      2) It would be helpful to have more details included in the "Connectivity Models" sub-section of the Methods section. The GLM-based connectivity approach is highly non-standard, such that more details on the logic behind it and any validation of the approach would be helpful. More specifically, it would be helpful to have clarity on how this form of functional connectivity relates to more standard forms, such as Pearson correlation and perhaps less standard multiple regression (or partial correlation) approaches. If I understand this approach correctly, each cortical parcel's time series is modulated (up or down) using that parcel's task-evoked beta weights, then "normalized" by the standard deviation of that parcel's time series, with the resulting time series then used in a multiple regression model to explain variance in a given cerebellar voxel's time series. It would be helpful if each of these steps were better explained and justified. For example, it is unclear what modulation of the cortical parcel time series by task-related beta weights does to the functional connectivity estimates, and thus how they should be interpreted.

      All of the models are multiple regression models. The independent variables (X) are the fitted (task-evoked) time series of the cortical parcels and the dependent variables (Y) are the fitted time series of each cerebellar voxel. If both the cortical and cerebellar time series are z-standardized (SD=1), the multiple regression coefficients are standardized coefficients that are closely related to partial correlation coefficients: each coefficient has the same sign as the corresponding partial correlation and is zero exactly when the partial correlation is zero. Here we standardized only the cortical time series. This retains the relative weighting of the different cerebellar voxels (a cerebellar voxel that has a strong task-related signal should contribute more to the overall evaluation than a voxel where the task-related signal is weak); beyond this, the conclusions will be the same as those obtained with a partial correlation analysis.
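      To make this relationship concrete, the NumPy sketch below uses made-up data to check two properties. First, with both sides z-scored, each multiple regression coefficient has the same sign as the corresponding partial correlation (by the Frisch-Waugh-Lovell theorem both are computed from the same pair of residualized series). Second, when only the predictors are z-scored, scaling a voxel's signal scales its coefficients by the same factor, which is what preserves the voxel weighting. The helper names (`zscore`, `regress`, `partial_corr`) and the data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)


def zscore(a):
    """Center each column and scale it to unit standard deviation."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


def regress(X, y):
    """OLS coefficients; data are centered beforehand, so no intercept is fitted."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def partial_corr(X, y, j):
    """Correlation of y with column j after regressing the other columns out of both."""
    others = np.delete(X, j, axis=1)
    rx = X[:, j] - others @ regress(others, X[:, j])
    ry = y - others @ regress(others, y)
    return np.corrcoef(rx, ry)[0, 1]


# Hypothetical data: 3 correlated "cortical" predictors and one "cerebellar" voxel
X = rng.standard_normal((200, 3))
X[:, 1] += 0.6 * X[:, 0]
y = X @ np.array([1.0, -0.5, 0.0]) + 0.5 * rng.standard_normal(200)

Xz = zscore(X)
beta_both_z = regress(Xz, zscore(y))  # both sides standardized
pcorr = np.array([partial_corr(Xz, zscore(y), j) for j in range(3)])

# Standardizing only the predictors keeps the voxel's amplitude in the weights
y_centered = y - y.mean()
beta_x_only = regress(Xz, y_centered)
beta_strong = regress(Xz, 3.0 * y_centered)  # same voxel with three times the signal

print(np.round(beta_both_z, 3), np.round(pcorr, 3))
```

      The standardized coefficients and partial correlations generally differ in magnitude, but they always agree in sign, and `beta_strong` is exactly three times `beta_x_only`.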

      Because the number of predictors (#cortical parcels) approaches or outstrips the number of available observations (#task-related regressors), the ordinary-least-squares (OLS) solution to the multiple regression problem is unstable or not unique. We thus compared 3 common ways of regularizing a multiple regression problem: a) picking only the most important regressor for each cerebellar voxel (winner-take-all, a form of feature selection or optimal subspace selection), b) Ridge regression (L2 regularization), or c) Lasso regression (L1 regularization). Each method biases the solution in a particular way: the winner-take-all solution is obviously very sparse, the Lasso solution somewhat less sparse, and the Ridge solution quite dispersed. Here we exploited these differences in inductive bias, reasoning that the method whose bias best matches the structure of the data-generating process will lead to better prediction performance on independent data.
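      The reasoning above (that the regularizer whose inductive bias matches the data-generating process should predict independent data best) can be checked with a small model-recovery simulation. The NumPy sketch below simulates cortical activity profiles across task conditions and generates cerebellar responses from either a sparse (one parcel per voxel) or a distributed ground-truth connectivity matrix. It then fits winner-take-all, Ridge, and Lasso models on training data, selects each regularization strength on a separate validation set, and scores R² on a held-out test set. The sizes, noise level, alpha grids, and function names are illustrative assumptions, not the settings used in the study, and the Lasso is a plain coordinate-descent implementation rather than a library call.

```python
import numpy as np

rng = np.random.default_rng(0)
N_COND, N_PARCEL, N_VOXEL, NOISE_SD = 100, 50, 20, 0.5


def simulate(W, n):
    """Cortical profiles X (n conditions x parcels) and cerebellar responses Y = X W + noise."""
    X = rng.standard_normal((n, W.shape[0]))
    Y = X @ W + NOISE_SD * rng.standard_normal((n, W.shape[1]))
    return X, Y


def fit_wta(X, Y, _alpha=None):
    """Winner-take-all: each voxel uses only its most correlated parcel."""
    Xz = (X - X.mean(0)) / X.std(0)
    Yz = (Y - Y.mean(0)) / Y.std(0)
    best = np.argmax(np.abs(Xz.T @ Yz), axis=0)
    W = np.zeros((X.shape[1], Y.shape[1]))
    for v, k in enumerate(best):
        W[k, v] = X[:, k] @ Y[:, v] / (X[:, k] @ X[:, k])
    return W


def fit_ridge(X, Y, alpha):
    """L2-penalized least squares, closed form."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)


def fit_lasso(X, Y, alpha, n_iter=100):
    """L1-penalized least squares via cyclic coordinate descent (all voxels at once)."""
    n, p = X.shape
    W = np.zeros((p, Y.shape[1]))
    R = Y.copy()  # residuals for the current W (W starts at zero)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ R / n + col_sq[j] * W[j]
            w_new = np.sign(rho) * np.maximum(np.abs(rho) - alpha, 0.0) / col_sq[j]
            R -= np.outer(X[:, j], w_new - W[j])
            W[j] = w_new
    return W


def r2(Y, Y_hat):
    return 1 - ((Y - Y_hat) ** 2).sum() / ((Y - Y.mean(0)) ** 2).sum()


MODELS = {
    "wta": (fit_wta, [None]),
    "ridge": (fit_ridge, [0.1, 1.0, 10.0, 100.0]),
    "lasso": (fit_lasso, [0.001, 0.01, 0.05, 0.1]),
}


def compare(W_true):
    """Train, tune alpha on validation data, and score R^2 on held-out test data."""
    (X_tr, Y_tr), (X_va, Y_va), (X_te, Y_te) = (simulate(W_true, N_COND) for _ in range(3))
    scores = {}
    for name, (fit, alphas) in MODELS.items():
        best = max(alphas, key=lambda a: r2(Y_va, X_va @ fit(X_tr, Y_tr, a)))
        scores[name] = r2(Y_te, X_te @ fit(X_tr, Y_tr, best))
    return scores


sparse = np.zeros((N_PARCEL, N_VOXEL))
sparse[rng.integers(0, N_PARCEL, N_VOXEL), np.arange(N_VOXEL)] = 1.0
distributed = rng.standard_normal((N_PARCEL, N_VOXEL)) / np.sqrt(N_PARCEL)

results = {"sparse": compare(sparse), "distributed": compare(distributed)}
for truth, scores in results.items():
    print(truth, {k: round(float(v), 2) for k, v in scores.items()})
```

      With these settings one would expect Ridge to beat winner-take-all when the ground truth is distributed, and the reverse when it is sparse. Because each model is scored only on data it never saw during fitting or tuning, the extra free parameters of Ridge and Lasso do not automatically confer an advantage.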

      The results clearly favored a distributed input to each cerebellar voxel from the cortical parcels. We have rewritten the method section on connectivity models to better communicate the main idea.

      3) It appears that task-related functional connectivity is used in the present study, and yet the potential for task-evoked activations to distort such connectivity estimates does not appear to be accounted for (Norman-Haignere et al. 2012; Cole et al. 2019). For example, voxel A may respond to just the left hemifield of visual space while voxel B may respond to just the right hemifield of visual space, yet their correlation will be inflated due to task-evoked activity for any centrally presented visual stimuli. There are multiple methods for accounting for the confounding effect of task-evoked activations, none of which appear to be applied here. For example, the following publications include some options for reducing this confounding bias: (Cole et al. 2019; Norman-Haignere et al. 2012; Ito et al. 2020; Rissman, Gazzaley, and D'Esposito 2004; Al-Aidroos, Said, and Turk-Browne 2012). If this concern does not apply in the current context it would be important to explain/show why.

      The papers cited by the reviewer focus on the problem of how to remove task-evoked activity to estimate the correlation of spontaneous (task-independent) fluctuations. Here we are doing the opposite. We removed almost all spontaneous fluctuations and noise by averaging across trials and runs in order to fit the task-evoked activity. Additionally, we used a crossed approach as a way to control for the influence of task-independent fluctuations on the regression models: Within each task set, cerebellar activity from one half of the runs was predicted from cortical activity from the other half of the runs. Returning to the papers cited by the reviewer, these are designed to look at connectivity not related to task-evoked activity. We briefly summarize each below:

      ● Cole et al. (2019): Demonstrates that the removal of mean task-evoked activations while preserving task-evoked response shape is an important preprocessing step for validating task-based FC.

      ● Ito et al. (2020): Addressed the issue of shared variability between brain regions during task-evoked activity by estimating time series variance. They removed task-evoked activity from the time series in order to get a direct measure of neural-to-neural correlations (e.g., “background connectivity”) rather than task-to-neural associations.

      ● Al-Aidroos et al. (2012): Confronted a similar problem of distinguishing intrinsic correlations related to a goal (e.g., attending to scenes) from correlations related to synchronized stimulus-evoked responses. To mitigate this confound, they removed stimulus-evoked responses from the data, resulting in “background connectivity”, which was then used to assess inter-region coupling.

      ● Rissman et al. (2004): Introduced a new approach to characterize inter-region correlations during event-related activity by allowing inter-regional interactions to be assessed independent of activity at individual stages of a task.

      ● Norman-Haignere et al. (2012): To assess inter-region interactions (between fusiform gyrus and parahippocampal cortex), the authors removed the mean stimulus-evoked response and examined the correlations that occurred in the background of stimulus-locked changes (e.g., background connectivity).

      4) It is stated (pg. 21): "To reduce the influence of these noise correlations, we used a "crossed" approach to train the models: The cerebellar time series for the first session was predicted by the cortical time series from the second session, and vice-versa (see Figure 1). This procedure effectively negates the influence of noise processes, given that noise processes are uncorrelated across sessions." However, this does not appear to be strictly true, given that the task design (parts of which repeat across sessions) could interact with sources of noise. For example, task instruction cues (regardless of the specific task) likely increase arousal, which likely increases breathing and heart rates known to impact global fMRI BOLD signals. The current approach likely reduces the impact of noise relative to other approaches, but such strong certainty that noise processes are uncorrelated across sessions appears to be unwarranted.

      We completely agree. What we meant to say is that the procedure “negates the influence of any noise process that is uncorrelated with the tasks.” If we can predict the cerebellar activity patterns in session 2 by the cortical activity patterns measured in session 1, we can conclude that this prediction must be based on task-related signal changes given that the sequence of tasks is randomized. However, we do not know whether these task-related signals are caused directly by neural processes or indirectly by physiological processes (for example increased heart-rate in some conditions). The procedure only removes the influence of noise processes that are unrelated to the tasks. In our experience, these noise correlations can be quite strong and methods to remove them can introduce biases. For task-related noise processes we relied on high-pass filtering, a standard approach in task-based GLM approaches (see Methods).
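      The distinction drawn above can be illustrated with a toy simulation. In the NumPy sketch below, each simulated session contains a task-evoked signal (the cerebellar response is assumed to be 0.3 times the cortical one), noise shared by both regions within that session only, and private noise. Crossing sessions removes the shared session noise from the cortico-cerebellar covariance, but a task-locked physiological component (a hypothetical arousal term that repeats in every session) survives crossing. All quantities and names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 20_000  # simulated task conditions; large so the covariances are stable

cortex_task = rng.standard_normal(N)
cereb_task = 0.3 * cortex_task  # assumed true cortico-cerebellar coupling


def session(task_locked):
    """Simulate one session for a cortical parcel and a cerebellar voxel."""
    shared = rng.standard_normal(N)  # noise common to both regions in this session only
    cortex = cortex_task + task_locked + shared + 0.5 * rng.standard_normal(N)
    cereb = cereb_task + task_locked + shared + 0.5 * rng.standard_normal(N)
    return cortex, cereb


def cov(a, b):
    return np.cov(a, b)[0, 1]


def within_and_crossed(task_locked):
    c1, b1 = session(task_locked)
    c2, b2 = session(task_locked)
    within = 0.5 * (cov(c1, b1) + cov(c2, b2))
    # cortex from one session paired with cerebellum from the other
    crossed = 0.5 * (cov(c2, b1) + cov(c1, b2))
    return within, crossed


# Population values: within = 0.3 + 1.0 (shared noise), crossed = 0.3
within, crossed = within_and_crossed(np.zeros(N))

# A task-locked component with the same values in every session adds its variance (0.25)
arousal = 0.5 * rng.standard_normal(N)
_, crossed_confounded = within_and_crossed(arousal)

print(round(within, 2), round(crossed, 2), round(crossed_confounded, 2))
```

      In this toy setting, the crossed estimate recovers the true coupling when noise is unrelated to the tasks, but it cannot separate neural coupling from a physiological response that is locked to the tasks.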

      5) It appears possible that the sparse cerebellar model does worse simply because there are fewer predictors than the alternate models. It would be helpful to verify that the methods used, such as cross-validation, rule out (or at least reduce the chance) that this result is a trivial consequence of just having a different number of predictors across the tested models. It appears that the "model recovery" simulations may rule this out, but it is unclear how these simulations were conducted. Additional details in the Methods section would be important for evaluating this portion of the study.

      Our methods ensure full correction for model complexity (see response to major comment #2). Note that the sparse methods select regressors from all available cortical parcels; as such, “model complexity” is not well summarized by the number of non-zero regressors. We have now clarified these issues in the Methods section and have also revised the paper to better describe our model recovery simulations designed to address the issue of possible biases caused by different degrees of collinearity between cortical regressors.

      Reviewer #2 (Public Review):

      The human cerebellum likely has a significant but understudied contribution to cognition and behavior beyond the motor domain. Clarifying its functional relationship with the cerebral cortex is a critical detail necessary for understanding cerebellar functions. This paper addresses this challenge by testing three simple but intuitive models: winner-take-all, one-to-one model versus two converging input models. Results showed that the convergence model outperformed the one-to-one mapping model, indicating that cerebellar regions received multiple converging inputs from the different cortical regions. Overall the paper is well-written, and the results are clean and interesting. The methodological rigor of using cross-validation and generalization is also a strength of this paper.

      1) The authors concluded that some cerebellar regions receive converging inputs from multiple cortical regions because the Ridge and Lasso models outperformed the WTA model. The WTA model has a fixed diagonal pattern, in contrast, Ridge/Lasso models included more weights in the connectivity matrix. Considering what's being estimated in this matrix, then perhaps the findings are not surprising because even after penalizing and regularization, the ridge regression models are still more complex than the WTA model (more elements are allowed to vary). In other words, Lasso/Ridge models allow more variables from the X side to explain variances in Y, similar to how throwing in more regressors can always improve the R square. I am unsure if cross-validation mitigates this issue. It would be more straightforward for the authors to compare model performance in a way that controls for the number of variables in the Ridge/Lasso models.

      We now recognize that we could have done a better job in explaining our approach on this issue in the original submission. The models (including connectivity weights and regularization parameter) are trained solely on data from Task set A. They are tested on 2 independent datasets: 1) Data from the same participants performing novel tasks; 2) Data from new participants performing novel tasks. This allows us to compare models of different structure and complexity.

      2) The authors did an excellent job reviewing the anatomical relationship between the cerebral cortex and the cerebellum. There are several issues that the authors should address in the introduction or discussion. First, if the anatomical relationship between the cerebellum and the cortex is closed-loop as suggested in the intro, then how can convergence arise from multiple cortical inputs, given that there is no physical cross-talk? Second, there are multiple synapses connecting a cerebellar region and the cortex, so could integration occur at other sites rather than in the cerebellum? For example, the caudate, the thalamus, or even the cortex (integrating inputs before sending them to the cerebellum)?

      We agree that the correlation structure of BOLD signals in the neocortex and cerebellum is shaped by the closed-loop (bi-directional) interactions between the two structures. As such, some of the observed convergence could be caused by divergence of cerebellar output. We have added a new section to the discussion on the directionality of the model (Page 18).

      That said, there are strong reasons to believe that our results are mainly determined by how the neocortex sends signals to the cerebellum, and not vice versa. An increasing body of physiological studies (and this includes newer papers, see response to reviewer #1, comment #1 for details) shows that cerebellar blood flow is determined by signal transmission from mossy fibers to granule cells and parallel fibers, followed by nitric oxide signaling from molecular layer interneurons. Importantly, it is clear that Purkinje cells, the only output cells of the cerebellar cortex, are not reflected in the BOLD signal from the cerebellar cortex. (We also note that increases in the firing rate of inhibitory Purkinje cells mean less activation of the neocortex.) Thus, while we acknowledge that cerebellar-cortical connectivity likely plays a role in the correlations we observed, we cannot use fMRI observations from the cerebellar cortex and neocortex to draw conclusions about cerebellar-cortical connectivity. To do so we would need to measure activity in the deep cerebellar nuclei (and likely thalamus).

      The situation is different when considering the other direction (cortico-cerebellar connections). Here we have the advantage that the cerebellar BOLD signal is mostly determined by the mossy fiber input which, at least for the human cerebellum, comes overwhelmingly from cortical sources. On the neocortical side, the story is admittedly less clear: The cortical BOLD signal is likely determined by a mixture of incoming signals from the thalamus (which mixes inputs from the basal ganglia and cerebellum), subcortex, other cortical areas, and local cortical inputs (e.g., across layers). While the cortical BOLD signal (in contrast to the cerebellum) also reflects the firing rate of output cells, not all output cells will send collaterals to the pontine nuclei. These caveats are now clearly expressed in the Discussion section.

      On balance, there is an asymmetry: Cerebellar BOLD signal is dominated by neocortical input without contribution from the output (Purkinje) cells. Neocortical BOLD signal reflects a mixture of many inputs (with the cerebellar input making a small contribution) and cortical output firing. This asymmetry means that the observed correlation structure between cortical and cerebellar BOLD activity (the determinant of the estimated connectivity weights) will be determined more directly by cortico-cerebellar connections than by cerebellar-cortical connections. Given this, we have left the title and abstract largely the same, but have tempered the strength of the claim by discussing the influence of connectivity in the opposite direction.

      3) The dispersion metric quantifying the spread level in cortical inputs is interesting. Could the authors expand this finding and show anatomically what the physical spread is like in cortical space? The metric is novel but hard to interpret. A figure demonstrating the physical spread in the cortex should help readers interpret this result.

      Figure 3 (previously Figure 4) was included to provide examples of differences in the spatial spread of cortical inputs. For example, regions 1 and 2 are explained by a more restricted and spatially contiguous set of cortical inputs (e.g., primary motor cortices) whereas regions 7 & 8 are explained by a set of spatially disparate regions (e.g., angular gyrus, superior and middle frontal cortices, and superior temporal gyrus). Prompted by this comment, we have opted to reverse the order of Figures 3 and 4 to give the reader a chance to visualize differences in physical spread of cortical regions before we walk through the quantitative analysis.

      4) At the end of the discussion section, the authors discussed how results are more likely driven by cortical inputs to the cerebellum but not the other way around. This interpretation is likely overstated given the hemodynamic blurring and low temporal resolution of BOLD. Without a faster imaging sequence and accurate models that account for differences in hemodynamic properties, the more parsimonious interpretation is that results are driven by bidirectional cortico-cerebellar interactions. The results are still very interesting without this added nuance.

      Our analyses do not rely on the exact time course or delays between neocortical and cerebellar activation, but only on the activity profiles across a wide range of tasks. In terms of bidirectionality, please see our response above. We have added a dedicated section in the revised Discussion on this issue.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors sought to define the molecular mechanism of activation of the thrombopoietin receptor (TpoR), a very important cytokine receptor that regulates megakaryocyte differentiation and platelet production. They conducted a thorough series of experiments combining mutagenesis with sophisticated biological assays, including solid-state NMR structural measurements. This work builds on a body of previous studies of TpoR from this group and from others. They focused both on (1) the role and impact of W515 located in the juxtamembrane cytosolic domain and (2) the impact of introducing either Asn at sites in the transmembrane domain to induce various dimerization modes, or pairs of Ala residues to induce helical rotation of the TM domain. There is a lot of nice data in this paper, which is fairly intricate: a tough read, but that's because it's a complicated system. The writing is excellent.

      This paper presents a model for receptor activation in which the inactive receptor is the monomeric form of the receptor in which the juxtamembrane domain, including W515, maintains a helical structure. Activation of the receptor triggers dimerization of the transmembrane domain and loss of helicity of the juxtamembrane segment, which facilitates optimal interactions of the JAK2 kinase domains with their phosphorylation substrates.

      There is a lot to like in this careful work and the resulting manuscript. There is one major shortcoming in this manuscript, which concerns W515. It is known that mutation of W515 to any of 17 of the canonical amino acids, including Phe, is sufficient to trigger homodimerization and receptor activation. The authors present some evidence that the phenomenon behind this is that mutation of W515 to almost any other residues disrupts the helical secondary structure of the critical juxtamembrane segment, which promotes dimerization and receptor activation. What I find puzzling is why a Trp at site 515 promotes helix formation, but nearly all other amino acids at this site disrupt helix formation. This strongly suggests the side chain of W515 must be interacting with another domain of the protein in the inactive state, in a manner that is responsible for how Trp stabilizes the juxtamembrane helix, which is a central feature that helps define that state. I think that for this paper, this dangling missing piece of their mechanistic model should be resolved.

      We agree with the reviewers that the mechanism by which Trp515 stabilizes the TM helix is central to the mechanism of activation. More broadly, our studies over the past decade have sought to address the importance of the entire RWQFP insert in the TM domain. Our working model for this sequence has been that cation-π interactions are central to the role of the Trp and the accompanying amino acids.

      Arginine and tryptophan are both over-represented at the cytoplasmic TM-JM boundaries of membrane proteins. Arginine is positively charged and part of the “positive-inside” rule for membrane protein insertion. Arginine and lysine define the cytoplasmic ends of TM helices and prefer to be accessible to the water-exposed membrane surface. In contrast, tryptophan residues prefer hydrophobic head-group or membrane interior locations. A revealing aspect of the RWQFP motif is that the arginine and tryptophan are located at the membrane-cytosol border. As a result, in order to accommodate arginine in a more water-inaccessible membrane environment, it interacts with the surface of the tryptophan indole ring. Partitioning of the RWQF sequence into a more water-inaccessible environment also drives the formation of helical secondary structure, because an unpaired backbone C=O...NH in a hydrophobic environment is estimated to cost 3-6 kcal/mol of energy.

      We have taken two approaches in response to this essential criticism of the reviewers: one structural and one computational. Additional NMR data (structural approach) have been included in the supporting information (see response to point 2 below). Computational approaches provide a second way to address whether a cation–π interaction between Trp515 and the positively charged Arg514 is responsible for stabilizing the C-terminal TM helix. We have included a new supporting figure using AlphaFold 2.0 that probes the structural changes upon mutation of Trp515. In the wild-type receptor, Arg514 is predicted to form a cation–π interaction with Trp515. In the W515K mutant, the helical secondary structure in the RKQFP sequence is disrupted and Arg514 forms a new cation–π interaction with Trp529. Similar changes occur in other Trp515 mutants (e.g., W515A), highlighting the ability of AlphaFold to predict such interactions and the consequences of mutation. Overall, 15 out of 19 W515X mutants are predicted to be unfolded. Experimentally, 17 out of 19 mutations lead to activation. Importantly, W515C and W515P are the only two amino acid substitutions that do not cause constitutive activity experimentally (Defour, Chachoua, Pecquet, & Constantinescu, 2016), and computationally these two substitutions are not predicted to cause helix unraveling. In short, the AlphaFold predictions agree with the unique nature of tryptophan at position 515.

      In addition, we have expanded the arguments supporting the potential role of cation–π interactions by adding a new section entitled “Unfolding of the RWQF α-helical motif is a common mechanism of receptor activation”.

      These modifications are now in the revised manuscript starting with line 213:

      Our working model for the mechanism of activation in the wild-type or mutant receptors is that the RWQF motif is stabilized in the inactive state as an α-helix as a result of a cation–π interaction between R514 and W515. This interaction allows the RWQF sequence to partition into the more hydrophobic head-group region of the bilayer. Both Arg and Trp are over-represented at the cytoplasmic ends of TM helices (von Heijne, 1992), but whereas Arg prefers a water-accessible environment, Trp prefers to be buried in a more hydrophobic environment (Yau, Wimley, Gawrisch, & White, 1998). Since Arg and Trp are located at the border between membrane and cytosolic domains and Arg precedes Trp in the sequence, partitioning into the membrane head-group region results in a favorable interaction of the positive charge associated with the guanidinium group of the R514 side chain with the partial negative charge associated with the aromatic surface of the W515 side chain. Partitioning of the RWQF sequence into the more water-inaccessible environment drives the formation of helical secondary structure, because an unpaired backbone C=O...NH in a hydrophobic environment is estimated to cost 6 kcal/mol of energy (Engelman, Steitz, & Goldman, 1986). In this model, activation of the receptor results in or is caused by disruption of the R514-W515 cation–π interaction. In the W515 mutants, R514 is no longer stabilized in a membrane environment and the helix containing the RWQFP sequence unravels to allow the positively charged side chain to reach outside of the membrane. In the case of the Asn mutants and in the wild-type receptor with bound Tpo, dimerization of hTpoR (or rotation of the TM helices in the mTpoR dimer) places W515 in the center of the helix-helix interface. The data suggest that a steric clash of the W515 side chains results in unraveling of the cytoplasmic end of the TM helix.

      Computational and additional NMR data are provided in the supplementary figures to support the model of helix unraveling suggested by the solid-state NMR studies. Computationally, we used AlphaFold 2.0 (Jumper et al., 2021) calculations of hTpoR TM-JM peptides to predict the influence of all possible mutations at position 515 on the TM-JM helix structure. Remarkably, α-helix unraveling was predicted for 15 out of 20 possible amino acids at 515 (supplement 2 to Figure 3). Importantly, two of the mutations that are not predicted to cause helix unraveling are W515C and W515P. Experimentally, these two amino acid substitutions are the only ones that do not induce constitutive activity among all possible amino acid substitutions at W515 (Defour et al., 2016). Introducing a Trp at the preceding position 514 instead of R/K in W515K/R mutants reverses helix unfolding in the AlphaFold predictions (supplement 3 to Figure 3). This result agrees with our previous data that the WRQFP mutant is inactive and essentially monomeric (J. P. Defour et al., 2013). Structurally, we have undertaken solution-NMR studies of the wild-type hTpoR TM-JM peptide and its W515K mutant. Relaxation measurements of the backbone 15N resonances show that the W515K mutation leads to association of the TM helices, and that it induces upfield chemical shift changes in the RWQF sequence consistent with helix unraveling (supplement 1 to Figure 3).

      Reviewer #2 (Public Review):

      The thrombopoietin receptor (TpoR) regulates stem cell proliferation, platelet production, and megakaryocyte differentiation. Past cell biology and biophysical studies have established that ligand-induced dimerization constitutes the mechanism of activation of TpoR. Specifically, ligands bind to the extracellular domain of TpoR and generate an allosteric response that is transmitted to the transmembrane domain, activating downstream signaling. However, until now the molecular details of how the allosteric signals are transmitted to the intramembrane domains have been elusive. In this manuscript, Constantinescu and co-workers combined NMR, in vitro, and in vivo assays to investigate the activation and oncogenicity of TpoR. The authors concluded that the unwinding of the juxtamembrane domain is the main structural event that determines TpoR activation and regulates oncogenicity. The solid-state NMR studies were carried out in lipid membranes with polypeptides spanning the juxtamembrane and transmembrane residues. The authors show a series of spectra of 13CO resonances encompassing the juxtamembrane domain that are diagnostic of a structural transition from a helical conformation to a partially disordered state. The unwinding of the helical juxtamembrane domain was confirmed by site-specific mutations in this region. The chemical shift changes clearly indicate the transition from order to disorder (and vice versa) for selected sites. These conclusions are corroborated by INEPT-type experiments that detect the most dynamic regions of the polypeptides. To rationalize the molecular mechanism for activation, the authors also used Ala-Ala insertions at strategic positions along the transmembrane domain. These experiments showed that the specific orientation of the transmembrane residues is central for TpoR activation, and a slight rotation of the helix is critical for activation of the receptor.
Transcriptional activity assays confirm the importance of the proper orientation of the transmembrane domain for receptor activation.

      Overall, I believe the data are solid, and both biophysical and cell biology studies support the conclusions of the authors. These new findings represent a significant advancement in understanding cytokine receptor activation.

      We thank the reviewer for these comments.

      Reviewer #3 (Public Review):

      The authors sought to propose a mechanism by which cancer-causing mutations in the thrombopoietin receptor (TpoR) activate the receptor. To do so, they used a systematic approach of introducing non-native and naturally occurring mutations into the receptor and used a combination of in vivo and cell-based assays and solid-state NMR spectroscopy. They propose that the proximity of the asparagine mutations to the cytosolic boundary influences the secondary structure of the receptor and suggest that this structural change induces receptor activation.

      The strengths of this work are the importance of the system being studied and tackling a problem that is not yet fully resolved. The authors acquired a large and convincing set of biological data, including in vivo experiments that support the gain-of-function/activating role of the mutations studied. The solid-state NMR data are of high quality as well. In particular, the INEPT data in figure 6a display very clear differences within one region of the wild-type compared to the mutants.

      One significant weakness is the validity of the conclusions given the limited atomistic measurements presented. Namely, the authors make rather specific conclusions about protein folding based on a single set of 13C alanine carbonyl chemical shifts in the wild-type and mutant TM peptides. Essentially, the authors observe chemical shift perturbations at this carbonyl carbon when mutations are introduced into a protein and use this information to make conclusions about secondary structure. I am not convinced that the authors have presented sufficient evidence to justify the conclusion that the helix unwinds and that this is responsible for the mechanism of activation. While the other cell-based experiments in mutations are interesting, deciphering such a specific folding mechanism with limited atomistic data is not justified.

      We added both computational data and solution NMR to support our conclusion.

    1. Author Response

      Reviewer #1 (Public Review):

      Proton pumps are necessary to set up gradients required for myriad biological processes. The malaria-causing parasite Plasmodium falciparum uses two main pathways to achieve this: the vacuolar ATPase (V-type ATPase) and a more ancient vacuolar pyrophosphatase (PfVP1). The proton motive force set up across the parasite plasma membrane holds particular significance since it is necessary for transport of nutrients and waste products into and out of the cell. Motivated by the observation that the V-type ATPase is not expressed until several hours after the parasite has entered host cells, the present study examines the function of PfVP1. The authors demonstrate that PfVP1 depletion blocks the early development of Plasmodium, specifically the transition from the ring to the trophozoite stage, and this is associated with changes to cellular pH and pyrophosphate levels, consistent with predicted functions. Complementation of the conditional knockdown suggests that pyrophosphatase activity alone is not sufficient to overcome the loss of PfVP1. Overall, the data supporting a critical role for PfVP1 in parasite energetics are compelling. However, the lack of several key controls somewhat weakens the conclusions of the paper when it comes to complementation of the mutants and the description of which activities are needed for parasite survival. Because the proximal activities of the enzyme, ATP generation, and the proton motive force are incompletely examined, some of the major conclusions from the study remain speculative.

      We thank the reviewer for these constructive comments. We are grateful to the reviewer for recognizing the significance of our study. The major discovery of this manuscript is PfVP1’s essential role in the early-stage development of the 48 h asexual lifecycle of P. falciparum. Our data suggest that PPi is an energy source when ATP levels are likely low in the ring-stage malaria parasite and during its transition to the trophozoite stage. We have performed additional experiments and have tried our best to address each comment from the reviewer.

      Reviewer #2 (Public Review):

      In this work, the authors characterize a proton pump from the parasite Plasmodium falciparum that uses pyrophosphate as an energy source (PfVP1).

      They looked at the expression and localization of the pump in different stages of the parasite and determined that it localizes to the plasma membrane and it is highly expressed in the ring stage. They studied the biochemical function by expressing the gene in Saccharomyces followed by isolation of vesicles and measurements of proton transport and PPi hydrolysis. They also characterized the biological role of PfVP1 in the parasites by creating conditional mutants that express PfVP1 when cultured in the presence of anhydrotetracycline (ATC). Upon removal of ATC the expression of PfVP1 is downregulated, which impacted growth and transition to the trophozoite stage. Mutant parasites struggled to progress through the ring state and failed to become trophozoites in the second intraerythrocytic cycle. They complemented the mutants with the yeast inorganic pyrophosphatase gene and the Arabidopsis vacuolar pyrophosphatase.

      We thank the reviewer for the positive and constructive comments. We have carefully worked on every comment raised by the reviewer and have done our best to perform additional experiments.

      Reviewer #3 (Public Review):

      Solebo and coworkers investigated the energy requirements of blood-stage malaria parasites (the stage of infection that causes symptoms). Traditionally, parasites were thought to be somewhat quiescent during the first half of their life cycle in red blood cells and become metabolically active as they prepare for replication. Consequently, antimalarial drugs are more active against parasites during the second half of their life cycle. In this report, the authors show that the metabolic by-product pyrophosphate is an essential energy source for the development of early-stage malaria parasites and that it is consumed by a vacuolar pyrophosphatase (PfVP1). Knock down studies showed that PfVP1 is required for the development of early-stage parasites and localization studies established that it is located in the parasite plasma membrane. Characterization of PfVP1 heterologously expressed in yeast confirmed that it is a pyrophosphate hydrolyzing proton pump. Consequently, loss of PfVP1 in early-stage parasites results in reduced pyrophosphate consumption and a reduction in pH (accumulation of protons). The authors further show that a similar vacuolar pyrophosphatase from Arabidopsis thaliana can complement the loss of the parasite ortholog, but a general pyrophosphatase enzyme cannot. Consistent with this result, mutations designed to inactivate either the pyrophosphatase activity or the proton-pumping activity demonstrated that both activities are essential for the development and survival of early-stage parasites.

      The conclusions of this paper are firmly supported by data, often from more than one type of experimental approach. The conclusions provide fundamental information about the stage of parasite development that has been hard to target with antimalarial drugs. The most energy-consuming process in a cell is the maintenance of membrane potential and in malaria parasites, it is known that proton pumps (rather than sodium pumps) are responsible for this process. Although PfVP1 was previously reported to be located internally in an organelle of the parasite, the data presented in this report clearly define its location on the plasma membrane and its essential role in maintaining the membrane potential. PfVP1 inhibitors could preferentially target early stage malaria parasites and the current results support efforts to find these inhibitors. Perhaps the most exciting aspect of this work is the potential to act synergistically and enhance the effect of current antimalarial drugs on early stage parasites. In this vein, the authors tested four antimalarial compounds in conjunction with knockdown of PfVP1 to determine whether there was enhanced activity. These experiments were not conducted in a systematic way and this is perhaps the only weakness of the paper.

      We thank the reviewer for positive, constructive, and encouraging comments. We really appreciate that. We are also very excited about our discovery that a non-ATP driven proton pump plays essential roles in the early-stage development of the asexual lifecycle. Our data suggest PPi is an energy source in the malaria parasite P. falciparum.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Xu et al. provides a very thorough characterization and molecular dissection of the role of SSH2 in spermatogenesis. Loss of Ssh2 in germ cells results in germ cell arrest in step 2-3 spermatids and eventually leads to germ cell loss by apoptosis. Molecular characterization of the mutant mice shows that the loss of SSH2 prevents the fusion of proacrosomal vesicles, leading to the formation of a fragmented acrosome. The fragmentation of the acrosome is due to impaired actin bundling and impaired dephosphorylation of COFILIN. In short, this is a comprehensive body of work.

      We thank the referee for these insightful comments.

      Reviewer #2 (Public Review):

      The acrosome is a unique sperm-specific subcellular organelle required for the fertilization process, and it is also an organelle undergoing extensive morphological and structural transformation during sperm development. The mechanism underlying acrosome morphogenesis and biogenesis remains incompletely understood. In their manuscript entitled "The Slingshot phosphatase 2 is required for acrosome biogenesis during spermatogenesis in mice", Xu et al. report that Slingshot phosphatase 2 is essential for acrosome biogenesis and male fertility, based on their characterization of spermatogenic and acrosomal defects in Ssh2 knockout mice they generated. Specifically, the authors provide molecular, genetic, and subcellular evidence that Ssh2 mutation impairs the dephosphorylation of an actin-binding protein, COFILIN, during spermiogenesis and consequently impairs actin cytoskeleton remodeling, which is crucial for proacrosomal vesicle trafficking and acrosome biogenesis.

      We appreciate and thank Referee #2 for the positive feedback and insightful comments.

      Strengths:

      Nicely written manuscript that addresses an important mechanistic question about the roles of cytoskeleton remodeling in acrosome biogenesis, and provides genetic, subcellular, and molecular evidence for the hypothesis that Ssh2 regulates actin cytoskeleton remodeling, a process essential for proacrosomal vesicle trafficking and acrosome biogenesis, through dephosphorylation of an actin-binding protein during spermiogenesis.

      We again thank Referee #2 for appreciating and encouraging our current research work.

      Weaknesses:

      For body weight and testis weight of the mutants, the authors concluded that there is no significant difference between the mutant and wild type (Fig 1E-1G), but they appear to use mice between 6-8 weeks old. Both testis and body weight of males are still increasing at 6-8 weeks, and with only six mice analyzed, a significant difference in testis size and/or body weight could easily be missed given such varied ages and a small sample size.

      We thank the referee for prompting this important discussion point, which we now cover in our revised manuscript. In our originally submitted manuscript, we only presented the data for body weight, testis weight, and T/B ratio for mice aged 6–8 weeks; however, we have now added data from mice older than 8 weeks in the revised manuscript in a new Figure 1E-1G, with a sample size of 12 for each genotype. We have also updated the relevant content in the figure caption. The revised figure caption for Figure 1 panels E–G reads as follows: “(E-G) Body weights (26.3609 ± 0.4914 for WT; 25.1741 ± 0.5189 for Ssh2 KO), weights of the testes (0.0862 ± 0.0036 for WT; 0.0788 ± 0.0023 for Ssh2 KO), and the testis-to-body weight ratio (0.3281 ± 0.0153 for WT; 0.3154 ± 0.0135 for Ssh2 KO) of adult WT and Ssh2 KO males (n = 12). Data are presented as the mean ± SEM; p > 0.05 calculated by Student’s t-test. Bars indicate the range of the data.”
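      As a rough consistency check on the caption above, the Student's t statistic can be approximately reconstructed from the reported mean ± SEM values alone. This is a minimal sketch, not the authors' analysis: it assumes a two-sample, equal-variance Student's t-test with n = 12 per group (df = 22) and uses the standard two-sided table value t(0.975, 22) ≈ 2.074 as the cutoff. With equal group sizes, the pooled standard error of the difference reduces to sqrt(SEM_WT² + SEM_KO²). The dictionary keys and the helper name t_from_mean_sem are illustrative only.

```python
import math

# Reported (mean, SEM) pairs from the revised Figure 1E-G caption; n = 12 per genotype.
N = 12
DF = 2 * N - 2  # degrees of freedom for a two-sample Student's t-test with equal n
T_CRIT = 2.074  # assumption: two-sided critical t at alpha = 0.05, df = 22 (standard table value)

# Hypothetical labels: ((WT mean, WT SEM), (KO mean, KO SEM))
MEASURES = {
    "body_weight": ((26.3609, 0.4914), (25.1741, 0.5189)),
    "testis_weight": ((0.0862, 0.0036), (0.0788, 0.0023)),
    "testis_to_body_ratio": ((0.3281, 0.0153), (0.3154, 0.0135)),
}


def t_from_mean_sem(wt, ko):
    """Student's t from (mean, SEM) of two equal-size groups.

    With equal n, the pooled SE of the difference equals sqrt(SEM1**2 + SEM2**2).
    """
    (m1, sem1), (m2, sem2) = wt, ko
    return (m1 - m2) / math.sqrt(sem1 ** 2 + sem2 ** 2)


results = {}
for name, (wt, ko) in MEASURES.items():
    t = t_from_mean_sem(wt, ko)
    not_significant = abs(t) < T_CRIT  # True means p > 0.05 (two-sided)
    results[name] = (t, not_significant)
    print(f"{name}: t({DF}) = {t:.2f}, p > 0.05: {not_significant}")
```

      Under these assumptions, all three comparisons fall below the critical value, which is consistent with the stated p > 0.05; an exact p-value would require the raw data or a t-distribution CDF.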

      Other points:

      Comments: 1) Could the uniform cytoplasmic distribution of diminutive actin filaments in the wild type and disrupted actin filament remodeling be examined at the EM level on the round spermatids?

      We apologize for the confusion. We previously performed transmission electron microscopy (TEM) on testis samples to examine the distribution and ultrastructural organization of F-actin in WT and Ssh2 KO round spermatids. Unfortunately, even at high magnification (30,000x, right panel of Figure R1-Response Figure 1), TEM of testicular sections revealed no diminutive actin filaments in the cytoplasm of round spermatids, apart from the acroplaxome (an actin-rich specialized structure that anchors the acrosome) in WT spermatids and some thick, bundle-like structures located in the acrosomal region of Ssh2 KO spermatids (Fig. R1). Based on their characteristic appearance, we interpreted these electron-dense bundles as aberrantly aggregated actin filaments whose lengths are consistent with those of COFILIN-saturated F-actin fragments (Bamburg et al., 2021), suggesting that actin filament remodeling during acrosome biogenesis is disrupted by Ssh2 KO. However, owing to the technical limitations of TEM and the complexity of the intracellular environment of round spermatids, we recognized only a few aggregated actin bundles lacking a filamentous appearance in Ssh2 KO spermatids, and we detected none of the typical diminutive actin filaments that have been imaged by high-resolution cryo-TEM (Haviv et al., 2008) or live-cell total internal reflection fluorescence microscopy (Johnson et al., 2015) in purified actin bundles and cultured cells. Given the lack of effective approaches for culturing murine round spermatids in vitro, confocal microscopy of fluorescence-labelled F-actin (e.g., IF staining with FITC-phalloidin) is a more accessible method than EM for visualizing disrupted actin remodeling in murine spermatids, as demonstrated in several other actin-related studies (Djuzenova et al., 2015; Meenderink et al., 2019).

      Comments: 2) Were any other defects seen in the mutant testis besides the acrosome, given the important roles of the actin cytoskeleton network and the high expression of Ssh2 in spermatocytes? Were chromatoid bodies or mitochondria affected in any way? Were there any other defects in the mice overall, including female fertility and other organs, given the previously reported roles in the nervous system? This could be helpful information for others interested in the Ssh2 protein and the actin cytoskeleton's roles in general.

      The referee has raised an interesting point here. Firstly, besides the acrosome-related defects in Ssh2 KO spermatids, we identified increased germ cell apoptosis and aberrant activation of the apoptotic Bcl-2/Caspase-3 pathway in the testes of Ssh2 KO mice, which we speculate is triggered by disordered COFILIN-mediated F-actin remodeling; we intend to elucidate the underlying mechanisms in future work. Secondly, given the high expression of SSH2 in spermatocytes demonstrated by IF staining in Figures 4B and 4C, we performed surface chromosome spreading on spermatocytes to determine whether the morphology of chromatid bodies and meiotic progression were affected by Ssh2 KO, and no obvious defects were observed, as shown in supplementary Figure S3 of the originally submitted manuscript. Thirdly, no obvious morphological abnormality in chromatin or mitochondrial structure was detected under TEM in Ssh2 KO germ cells such as spermatocytes and round spermatids, so we did not pursue this further. Fourthly, we examined the potential effect(s) of Ssh2 KO on female fertility using Ssh2 KO female mice and did not find any obvious fertility defect in Ssh2 KO females compared to their WT littermates, as shown by body weight, ovary weight, ovary-to-body weight ratio, ovary size, and fertility tests, as well as by HE staining of ovarian sections (Fig. R1). Moreover, because Ssh2 KO males and females did not show any defective physical development, aberrant physiological status, or mental disorder during our investigation period, notwithstanding the reported roles of SSH2 in neurite extension (Endo, Ohashi, & Mizuno, 2007), we did not conduct experiments to examine the effect(s) of SSH2 on organs other than those involved in female fertility.

      Fig. R1 No reproductive defects were found in Ssh2 KO females. (A-C) Body weights, weights of the ovaries, and the ovary-to-body weight ratio of adult WT and Ssh2 KO females aged 8-10 weeks (n = 5); p > 0.05 calculated by Student’s t-test. Bars indicate the range of data. (D) Ovaries from Ssh2 KO mice were indistinguishable in size from those of WT mice at 8 weeks of age (n = 4). (E) Histology of the ovaries from WT and Ssh2 KO mice. Sections were stained with hematoxylin and eosin. Scale bars: 200 μm. Images are representative of ovaries extracted from 8-week-old adult female mice per genotype. (F) Number of pups per litter from WT and Ssh2 KO female mice (8 weeks old) after crossing with WT adult male mice (n = 3); p > 0.05 calculated by Student’s t-test. Bars indicate the range of the data.

      Comments: 3) Providing detailed information on the number of animals used and cells analyzed in the legend is nice, but it might be even better for the readers to include sample size and the number of cells examined in the figure/graph if possible.

      We appreciate the suggestions from the reviewer. We have added sample-size information to the figures where appropriate. Firstly, we added sample sizes to Figures 1C, 1E, 1F, 1G and 1I. Secondly, we added the sample size and the number of seminiferous tubules/epididymal ducts evaluated for TUNEL (+) cell counting to Figures 2C and 2D. Thirdly, we added the sample size and the number of spermatids evaluated for co-localization to Figures 6B and 6D.

      Comments: 4) Nice discussion and comparison with GOPC and GM130, how about comparison and discussion with other acrosome defective mutants like PICK1, and ATG to provide some insights into acrosome biogenesis and proacrosomal vesicle trafficking?

      We greatly appreciate the referee for positive appraisal of our work with constructive suggestions, unfortunately, we are unable to address these defective mutants with certainty due to the lack of proper sample accessibility (only 3 of 16-month-old Ssh2 KO mice are accessible now). We compared the cytological staining of GM130 and GOPC in WT and Ssh2 KO spermatids using tubule squash sections as the description in the originally submitted manuscript which are prepared from fresh testes originated from 8-week-old mice and we now have several aged Ssh2 KO mice which prevent us to achieve the staining of PICK1 and ATG. PICK1 was previously reported to facilitate vesicle trafficking from the Golgi apparatus to the acrosome which co-localizes with GOPC in the proacrosomal granules (Xiao et al., 2009) and the phenotypes of Pick1 KO mice share a lot of similar characteristics with that of Ssh2 KO mice such as the fragmentation of the acrosome and increased germ cell apoptosis. Both autophagy-related ATG5 (Huang et al., 2021) and ATG7 (Wang et al., 2014) were reported to participate in the process of acrosome biogenesis and ATG7 is required for proacrosomal vesicle transportation/fusion by conjugating LC3 to the membrane of proacrosomal vesicles. Although the spermatids evaluated in these KO mice models could still be developed into spermatozoa with defective acrosome that is different from the situation in Ssh2 KO mice, it would be meaningful to discover the affects by Ssh2 KO on the localization of these regulators of acrosome biogenesis in spermatids and their potential interactions with SSH2. 
      Indeed, we plan to pursue these issues in future work, and the content related to PICK1 has been added to the Discussion of the revised manuscript as follows: “Moreover, it is intriguing to note that the phenotypes of Ssh2 KO mice share a lot of similarities with that of Pick1 KO model (Xiao et al., 2009) such as acrosome fragmentation and enhanced germ cell apoptosis, suggesting the possibility that SSH2 and PICK1 work together in a same trafficking machinery functioning in acrosome biogenesis which needs to be clarified further.”

      Comments: 5) Given the literature on Cofilin's requirement for male fertility and the increased p-Cofilin in Ssh2 mutant testis by Western and IF, the authors have a strong case for their hypothesis. But given the general roles of phosphatases, it might be prudent to discuss alternative possibilities.

      We thank the reviewer for these valuable suggestions. Because p-COFILIN is the only known substrate of SSH2 reported to date, we focused our investigation principally on this cascade. As a phosphatase, SSH2 is likely to interact with many other proteins involved in cellular processes beyond actin binding, and these interactions remain elusive. As suggested, we have added the following to the Discussion section of the revised manuscript: “Given the diverse physiological roles reported for Slingshot family proteins, the possibility of the alternative mechanism underlying involvement of SSH2 in cellular events beyond the COFILIN-mediated actin remodeling should be noted. According to some publicly accessible databases as the indicators of potential protein–protein interactions such as BioGRID (Oughtred et al., 2019) and IntAct (Del Toro et al., 2022), SSH2 might interact with a set of actin-based molecular motors covering MYH9, MYO19 and MYO18A, which have been implicated in the maintenance of Golgi morphology and Golgi anterograde vesicular trafficking via the PI4P/GOLPH3/MYO18A/F-actin pathway (Rahajeng et al., 2019).”

    1. Author Response

      Reviewer #1 (Public Review):

      Voltage-clamp fluorometry combines electrophysiology, reporting on channel opening, with a fluorescence signal reporting on local conformational changes. Classically, fluorescence changes are reported by an organic fluorophore tethered to the receptor via cysteine chemistry. However, this classical approach does not allow fluorescent labeling of solvent-inaccessible or cytoplasmic regions. Incorporating the fluorescent unnatural amino acid ANAP directly into the protein sequence overcomes these limitations. However, expression of ANAP-containing receptors is usually weak, leading to very small ANAP-related fluorescence changes (ΔFs).

      In this paper, the authors developed an improved method for expressing full-length, ANAP-mutated proteins in Xenopus oocytes. In particular, they managed to increase the ratio of full-length to truncated proteins for C-terminal ANAP incorporation sites. Since C-terminally truncated P2X receptors are usually functional, it is important to maximize the ratio of full-length to truncated protein to obtain a good correspondence between the observed current and fluorescence. Using their improved strategy, they screened for ANAP incorporation sites and ATP-mediated ANAP ΔFs across the whole structure of the P2X7 receptor: the extracellular ligand-binding domain (head domain), the M2 transmembrane segment (gate), and a large intracellular domain specific to the P2X7 subtype, the "ballast" domain. The functional role of this domain and its motions following ATP application are indeed unknown. Monitoring ANAP fluorescence changes in this region following ATP binding provides a unique way to study these questions. By analyzing ATP-induced ΔFs from different parts of the receptor, the authors conclude that the ATP-binding domain mainly follows gating, while intracellular "ballast" motions are largely decoupled from ATP binding.

      Strengths of the paper:

      This paper provides an improved method for efficient unnatural amino acid incorporation in Xenopus oocytes. Thanks to this technique, the authors managed to enhance membrane expression of ANAP-mutated P2X7 receptors and observed strong fluorescence changes upon ATP application. The paper furthermore describes an impressive screen of ANAP incorporation sites along the whole protein sequence, which allows them to monitor conformational changes of solvent-inaccessible regions (transmembrane domains) and cytoplasmic regions that were not accessible to cysteine-reactive fluorophores. This screen was performed very thoroughly, with each ANAP mutant characterized biochemically for membrane expression as well as in terms of fluorescence changes. The limitations of the approach (small ΔF upon ATP application to wt receptors and baseline fluorescence variations in the presence of calcium) are well explained. Overall, this study should not only serve as a guide for anyone wishing to perform VCF on P2X7 receptors but should also be useful to the whole community of researchers using unnatural amino acids. Thanks to orthogonal labeling with TMRM and ANAP, the authors managed to simultaneously monitor the motions of the extracellular and intracellular domains of P2X7. Finally, they propose methods to simultaneously monitor intracellular domain motion and downstream signaling.

      Weaknesses:

      Although the fluorescence screen is impressive and well conducted, the biological conclusions remain superficial at this stage. The paper furthermore lacks quantitative analysis. Finally, the title only reflects a minor part of the paper and is therefore not representative of the paper content.

      Quantitative analyses (DRCs and current rise times) have now been added for the key mutations. In addition, we performed a variety of experiments to address the challenging question of mechanistic insight (mutants that track facilitation) and the effects of intracellular factors (mutation of the calmodulin binding site, FRET experiments with calmodulin). These data confirmed that deletion of a cysteine-rich intracellular region eliminates current facilitation (Roger et al., 2010) and that some of our mutants indeed track facilitation. However, mutation of the CaM binding site and FRET experiments either did not support an effect of calmodulin or were inconclusive. As pointed out above, we think that VCF has limited capacity to identify novel biologically relevant consequences of receptor activation but is better suited to determining the sites and dynamics of already defined interactions.

      The title was changed to: "Improved ANAP incorporation and VCF analysis reveals details of P2X7 current facilitation and a limited conformational interplay between ATP binding and the intracellular ballast domain"

      Reviewer #2 (Public Review):

      The authors aimed to elucidate the structural rearrangements and activation mechanisms of P2X7 upon ATP application by voltage-clamp fluorometry (VCF) using a fluorescent unnatural amino acid (fUAA) and other fluorophores. They improved the fUAA methodology and detected ATP-binding-evoked changes in the ATP binding region and other regions. They also observed facilitation of fluorescence (F) changes with repeated application of ATP, associated with gating. The F change in the cytoplasmic ballast region was minor, and, based on their experimental data, they propose that this region is involved in activation by other cytoplasmic factors, such as Ca2+.

      The strengths of the study are as follows.

      (1) The fUAA methodology was improved to enable experiments with a single injection into oocytes (Fig. 1 and Suppl.).

      (2) They performed an extensive mutagenesis study of as many as 61 mutants (Figs. 3, 4, 5).

      (3) A careful evaluation of the successful Anap incorporation and formation of full length proteins was performed by western blot analysis (Fig. 2).

      (4) By recording F at three wavelengths, they obtained more detailed information, i.e. they classified F changes as quenching, dequenching, increase in polarity and decrease in polarity (Fig. 3E).

      (5) They detected F changes upon ATP application in various regions of P2X7, but few in the ballast region, showing that the ballast region is not strongly involved in ATP-evoked gating.

      (6) They analyzed the kinetics of F and current and their changes upon repeated ATP application to approach the known facilitation mechanisms. The data are very interesting. They concluded that it is intrinsic to the P2X7 molecule and that it is associated not with the ATP binding but with the gating process (Figs. 3F, 4D, 6A).

      (7) They performed an interesting analysis to clarify the mechanisms of activation by cytoplasmic factors, especially Ca2+ entering via P2X7 (Fig. 6).

      The weaknesses of the study are as follows.

      (1) Because structures of P2X in both the open and closed states have already been solved, the structural rearrangements evoked by ATP binding, from the ATP binding site to the gate, are already known in detail. The structural rearrangements detected in the extracellular region (Fig. 3) and TM region (Fig. 4) upon ATP application are just as expected. The impact and scientific merit of this part are rather limited.

      We generally agree that the cryo-EM structures clarified basic principles of receptor function. However, considering the specific features of the P2X7 receptor, its likely regulation/modulation by membrane components and environment, and the fact that the actual states of the receptor structures (e.g. facilitated or not?) are not known, we think that VCF analysis of its dynamics in a more native cellular environment is still required to confirm the predicted motions and also has the potential to identify details of "P2X7 fine tuning".

      (2) The facilitation mechanism is of high interest. The authors showed that it is intrinsic to P2X7 and associated with gating rather than ATP binding. However, the actual mechanism remains unclear to this reviewer. (a) What is the mechanistic trigger of facilitation? Possibilities are discussed, but there appears to be no clear answer supported by experimental evidence yet. (b) How is the memory of the 1st ATP application stored in the molecule, i.e. how does the P2X7 structure just before the 1st application differ from that just before the 2nd application of ATP?

      These are indeed fundamental questions but based on the available information we do not see a rational approach to address this issue any further. Additional extensive "screening" for ideal fluorophore positions would probably be required and is beyond our possibilities in the present study.

      (3) The structural rearrangement of the CaM-M13 region attached at the C-terminus (Fig. 6B, C), caused by Ca2+ influx through P2X7 upon ATP application, is an expected consequence and not very surprising. Moreover, it cannot be taken as evidence that Ca2+ is the mediator of facilitation.

      We apologize, this is a misunderstanding. We only provided protocols for parallel recordings of ANAP with other fluorophores for further analysis of downstream signaling pathways but we did not show or propose any functional consequences of the Ca2+ influx (see also point 7 above).

      (4) As to the ballast region, the data showed its limited involvement in the ATP-induced structural rearrangements. The function of the ballast region is not clear yet. A possible involvement in GDP binding and/or metabolism is discussed, but there is no clear experimental evidence.

      We are aware of these limitations. In the absence of a clear fluorescence change around the GTP/GDP-binding site or information about its role, it is difficult to investigate its molecular function by VCF. The fact that (un-)binding of the guanosine nucleotide does not seem to be related to channel opening (McCarthy et al., 2019) further limits our options to study its function, and it is currently not even known whether GDP/GTP has merely a structural role. However, we identified A564* as a potential reporter for yet undefined processes that might affect GTP/GDP binding and/or metabolism.

      Reviewer #3 (Public Review):

      This research contributes to optimizing the amber stop-codon suppression protocol for voltage-clamp fluorometry (VCF) experiments using the Xenopus oocyte heterologous expression system. By synthesizing the RNAs for the tRNA and tRNA synthetase in vitro, combined with the dominant-negative release factor initially developed by Jason Chin's lab, L-Anap can be site-specifically incorporated into proteins by a single microinjection of a mixture of molecular components into the oocyte cytoplasm. Although this avoids nuclear microinjection into oocytes, it adds more RNA synthesis steps. This strategy of using a dominant-negative eRF variant (eRF1-E55D) was previously applied to the Anap incorporation system in mammalian cell lines with model proteins (Gordon et al., eLife, 2018). In that 2018 paper, eRF1-E55D substantially increased the percentage of full-length protein expression. In oocytes in this paper, this percentage apparently did not increase significantly (Fig. 1D), unlike in the previous paper. Nevertheless, the overall expression level was successfully increased by this method, which could facilitate macroscopic fluorescence measurements, especially considering that L-Anap is a relatively dim fluorophore.

      Anap fluorescence changes were measured mostly through its environmental sensitivity, which provides limited information for interpreting structural changes. The proposed structural mechanisms could be strengthened, and the conclusions further validated, by combining FRET or other distance-ruler experiments with the VCF method. The engineered CaM-M13 FRET experiments mostly report calcium entry rather than measuring the rearrangements of P2X7 directly.

      We tried FRET analyses with ANAP-labeled P2X7 and mNeonGreen-labeled CaM but unfortunately, results were inconclusive.

      In addition, correlating the ATP dose-response relationship for channel activation with the ATP dose dependence of the Anap fluorescence change, especially for sites showing a large ATP-induced percentage change in fluorescence, would provide more insight into the allosteric mechanism of the channel.

      We agree, but unfortunately, bleaching of ANAP and the variation of background fluorescence in individual oocytes prevented such analyses.

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses:

      1) The relevance of the LPS-induced calvarial osteolysis model is not clear. The calvaria is mostly composed of cortical bone-like structures lacking marrow space, though a small marrow space exists near the suture. Osteolysis appears to occur in areas apart from where marrow is located. The authors did not show in the manuscript which cells Adipoq-Cre marks in the calvaria.

      We have shown in a recent publication that MALPs exist in the calvarial bone marrow (2). As shown in Fig. R1A, Td+ cells are present in the calvarial bone marrow, which is surrounded by a thin layer of cortical bone (Fig. R1B, blue arrows). In WT mice, after LPS injection, the normal bone structure, including suture and cortical bone, was mostly eroded and filled with inflammatory cells (green arrows). Thus, osteolysis does occur in the area where bone marrow was originally located. In contrast, calvarial bone structure was preserved in the CKO mice, demonstrating that Csf1 deficiency in MALPs suppresses LPS-induced osteolysis. We have included the H&E staining data in the revised manuscript:

      "H&E staining showed that calvarial bone marrow is surrounded by a thin layer of cortical bone (Fig. 5C). After the LPS injection, normal calvarial structure, including suture and cortical bone, were mostly eroded and filled with inflammatory cells in WT mice, but unaltered in CKO mice."

      Figure R1. Calvarial bone marrow structure. (A) Representative coronal section of 1.5-month-old Adipoq/Td mouse calvaria. Bone surfaces are outlined by dashed lines. Boxed areas in the low-magnification image (top) are enlarged to show the periosteum (bottom left), suture (bottom middle), and bone marrow (BM, bottom right) regions. Red: Td; Blue: DAPI. Adapted from our previous publication (2). (B) H&E staining of coronal sections of WT and Csf1 CKOAdipoq mice after LPS injection. Blue arrows point to bone marrow space close to the suture (indicated by *). Green arrows point to the osteolytic lesion, where cortical bone was eroded and the space was filled with inflammatory cells.

      2) Although the contrast between the two Csf1 conditional deletion models (Adipoq-Cre and Prx1-Cre) is very interesting, the relationship between these two cell populations is not well described. The authors did not clarify whether MALPs are also targeted by Prx1-Cre, or whether these two cell types are from different cell lineages. "Other mesenchymal lineage cells" in the subtitle is not especially helpful for placing this finding in context.

      We thank the Reviewer for this comment. The original article describing the Prx1-Cre mouse line demonstrated that Prx1-Cre targets all mesenchymal cells in the limb bud as early as 10.5 dpc (10). This early expression pattern ensures that all bone marrow mesenchymal lineage cells, including MALPs, are targeted by Prx1-Cre. In addition, based on our scRNA-seq data (1), Adipoq is mainly expressed in MALPs, whereas Prrx1 (Prx1) is highly expressed not only in MALPs but also in EMPs, IMPs, LMPs, LCPs, and OBs (Fig. R2). Thus, the much more severe bone phenotypes of Prx1-Cre-driven CKO mice compared with Adipoq-Cre-driven CKO mice indicate that mesenchymal lineage cells other than MALPs also contribute Csf1 to regulate bone resorption. To avoid confusion, we changed the subtitle and the first sentence of the Results section on Prx1 mice to the following:

      "Csf1 from mesenchymal lineage cells other than MALPs regulate bone structure.

      To explore whether Csf1 from MALPs plays a dominant role in regulating bone structure, we generated Prx1-Cre Csf1flox/flox (Csf1 CKOPrx1) mice to knockout Csf1 in all mesenchymal lineage cells in bone (10), including MALPs."

      Figure R2. Dotplot of Prrx1 and Adipoq expression in bone marrow mesenchymal lineage cells based on our scRNA-seq analysis of 1-month-old mice.

      3) The data supporting defective bone marrow hematopoiesis in Csf1 CKO mice are not particularly strong. They observed a reduction in bone marrow cellularity, but this was only associated with an expected reduction in macrophages and a mild reduction in overall HSPC populations. More in-depth analyses might be required to define mechanisms underlying reduced bone marrow cellularity in CKO mice.

      We thank the Reviewer for this constructive comment. Accordingly, we performed a thorough analysis of bone marrow hematopoietic compartments and observed significant decreases of monocytes and erythroid progenitors in CKO mice compared to WT mice. These results are now included as Fig. 6E.

      4) Some of the phenotypic analyses are still incomplete. The authors did not report whether CHet (Adipoq-Cre Csf1(flox/+)) showed any bone phenotype. Further, the authors did not report whether Csf1 mRNA or M-Csf protein is indeed expressed by MALPs, with current evidence solely reliant on scRNAseq and qPCR data of bulk-isolated cells. More specific histological methods will be helpful to support the premise of the study.

      A pilot microCT study revealed the same femoral trabecular bone structure in WT and Adipoq-Cre Csf1flox/+ (Csf1 Het) mice at 3 months of age (Fig. R3). While the sample number for Het is low, we are confident about this conclusion.

      Figure R3. MicroCT measurement of trabecular bone structural parameters from WT and Csf1 Het mice. BV/TV: bone volume fraction; BMD: bone mineral density; Tb.N: trabecular number; Tb.Th: trabecular thickness; Tb.Sp: trabecular separation; SMI: structure model index. n=3-8 mice/group.

    1. Author Response

      Reviewer #1 (Public Review):

      This study provides evidence for a previously unknown relationship between oncogenic protein kinase A (PKA) signaling and MYC family members. Specifically, the authors employed a combination of systems biology and biochemical assays to capture mediators of oncogenic PKA signaling in fibrolamellar carcinoma and melanoma cell lines. This led to the identification of Aurora A and PIM kinases as potential effectors of constitutively active PKA. Aurora A and PIM kinases have previously been shown to stabilize MYC proteins. Accordingly, evidence is provided that the effects of the PKA/Aurora A and PKA/PIM axes are mediated via MYC. Collectively, these findings suggest a model whereby the effects of aberrant PKA signaling are mediated via Aurora A and PIM kinases and related feedback mechanisms that ultimately result in stabilization of MYC proteins. Importantly, PKA-driven cancer cell lines exhibited high sensitivity to Aurora A kinase inhibitors in cell culture-based assays. These findings not only provide pioneering insights into oncogenic PKA signaling but may also have implications for developing therapeutic approaches for neoplasms that harbor constitutively active PKA.

      Strengths:

      This study addresses the role of aberrant PKA signaling in cancer, which represents a major gap in knowledge in cancer biology. The systems biology approaches and dissection of signaling networks downstream of constitutively active PKA were found to be exciting in the context of this study and are likely to provide a wealth of information for future studies. Results from samples obtained from fibrolamellar carcinoma patients partially confirmed correlations observed in cell lines, which was seen as an advantage. Although it was thought that orthogonal genetic validation may be warranted in some cases, pharmacological approaches using, e.g., Aurora A inhibitors hold promise for accelerated translation of these findings into the clinic.

      We appreciate this positive assessment of our work and are hopeful that we have solidified the significance and potential impact of our findings through additional analysis.

      Weaknesses:

      The major drawback of the study is the lack of in vivo models to validate observations garnered from the cell lines. This is particularly important considering that experiments carried out in samples from fibrolamellar carcinoma patients suggested that additional Aurora A- and PIM kinase-independent mechanisms of PKA-driven increases in MYC levels, and likely in neoplastic growth, may be implicated in vivo. In addition, it was thought that more mechanistic evidence is required to link PKA to PIM kinases, especially because different PIM kinases were implicated in the stabilization of MYC in fibrolamellar carcinoma vs. melanoma cell lines. Finally, although the pharmacological approaches were appreciated, it was thought that, because of potential issues with inhibitor specificity, orthogonal genetic approaches are warranted to further corroborate the proposed model.

      We acknowledge the lack of in vivo treatment modeling in this manuscript. The work presented here provides motivation for these important experiments, but they remain outside the scope of this manuscript. In revision, the manuscript has been expanded with new investigations into protein translation and several additional data sets, creating a more complete systems biology analysis of PKA signaling and PKA-induced signaling dependencies. This expanded scope makes in vivo validation of specific treatments and treatment combinations an even larger undertaking. The text has been modified to emphasize this point. We further acknowledge the accuracy of the reviewer's assessment of our findings on PIM2. The limited reagents available to study PIM kinases made this line of work difficult to expand. We shifted the focus of the work to include assessment of PKA effects on mRNA translation as a mechanism of c-MYC regulation. We have strengthened our assessments with loss- and gain-of-function genetic and pharmacological models, which we believe more completely answer the reviewer's concerns.

      Reviewer #2 (Public Review):

      Protein kinase A (PKA) is often stimulated and contributes to cancer growth, yet the downstream kinase signaling cascades remain unclear. Here the authors use global phosphoproteomics and kinome activity profiling to show that not only is the RAS/MAPK pathway activated, as expected, but also that Aurora kinase A (AURKA) and PIM kinases are activated to stabilize the expression of MYC, a potent oncoprotein associated with poor prognosis and aggressive disease. The authors use a number of different cell lines in this study but focus on fibrolamellar carcinoma, as PKA is known to contribute to this disease.

      Strengths: It has been notoriously difficult to map kinases and their substrates as these protein-protein interactions are not always amenable to traditional biochemical techniques due to their labile nature, and kinase substrate consensus sites are often overlapping and not highly specific. Thus, the authors' pipeline to delineate such kinase cascades is quite novel and useful. They apply it here to determine PKA signaling in cancer using sophisticated computational strategies and then validate with classic molecular techniques.

      We appreciate this positive assessment of our analytical tools and the importance of understanding oncogenic PKA signaling.

      Weaknesses: The lack of mechanistic evidence linking aberrant PKA activation with regulation of MYC family members was considered to be a major weakness of the study. As it stands, it is hard to determine whether the observed changes in the levels of MYC family members are indeed a consequence of aberrant PKA signaling. It also remains unclear which MYC phosphorylation sites are implicated in the context of neoplastic PKA function and whether MYC family members are regulated at the level of protein stability or mRNA translation. Moreover, some methodological issues (e.g. the use of single siRNAs) were also observed. Collectively, it was thought that these weaknesses should be addressed to corroborate the authors' conclusions.

      We acknowledge these concerns about our initially submitted manuscript and present extensive data that advances the manuscript in answering the key questions posed by the reviewer. We note that with the development of data showing PKA-induced phosphorylation of translation initiation components and sensitivity of c-MYC levels to eIF4A inhibition, some detailed evaluations of c-MYC phosphorylation were not undertaken, although key c-MYC mutants were tested in the course of our study and are included for reviewer interest.

    1. Author Response

      Reviewer #1 (Public Review):

      In the current study, the authors reanalyze a prior dataset testing the effects of D2 antagonism on choices in a delay discounting task. While the prior report, using standard analyses, showed no effects, the current study used a DDM to examine more carefully possible effects on different subcomponents of the decision process. This approach revealed contrasting effects of D2 blockade on the effect of reward size differences and on bias. These effects were uncorrelated, possibly suggesting separate mechanisms. The authors speculate that these opposing effects explain the variability in effects across studies, since they mean that effects would depend on which of these factors is more important in a particular design. Overall, the study is novel and well executed, and the explanation offers interesting insight into neural processes.

      We thank the reviewer for judging our study as interesting and well-executed.

      Reviewer #2 (Public Review):

      The authors aim to test the hypothesis that dopamine mediates the evaluation of temporal costs in intertemporal choice in humans, with the specific goal of synthesizing the competing accounts and previous results regarding whether dopamine increases or decreases the evaluation of delays when comparing differently delayed future rewards. To do this, they computationally dissect the impact of the drug amisulpride, a D2R antagonist, using a variant of a sequential sampling model, the drift-diffusion model (DDM), which is well established in the decision-making literature as a cognitive process model of choice. This model allows the starting bias to be dissociated from the rate at which decision evidence is integrated ('drift'), which the authors map to different accounts of the role of dopamine: the temporal proximity of an outcome is proposed to impact the bias, while the cost of a delay is proposed to impact the drift rate of evidence evaluation/accumulation. Consistent with previous results, and perhaps integrating conflicting findings, the authors find that D2R blockade impacts both bias and drift rate in a cohort of 50 participants, demonstrating that dopaminergic action at this receptor is implicated in dissociable components of intertemporal choice: D2R blockade reduces the bias towards sooner, more temporally proximate rewards and enhances the contrast between reward magnitudes irrespective of delay, effectively diminishing the effect of delay in the drug condition. These effects are consistent across a small subset of alternative models, confirming that the multiple cognitive mechanisms through which D2R blockade impacts intertemporal choice are a robust feature of decisions on this task.
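      To make concrete how a DDM separates a starting bias from the drift rate, the following is a minimal Python sketch of a generic two-boundary diffusion process. It is not the authors' fitted model: the parameter names (v for drift, z for relative starting point, a for boundary separation, sigma for noise), the Euler step size, and the idea of reading the upper boundary as the larger-later option are illustrative assumptions. The simulated choice probability is checked against the standard closed-form result for a Wiener process with absorbing boundaries.

```python
import math
import random


def simulate_trial(v, z, a=1.0, sigma=1.0, dt=0.005, max_t=10.0, rng=random):
    """Simulate one drift-diffusion trial with Euler-Maruyama steps.

    Illustrative assumption: the upper boundary stands for the larger-later
    option and the lower boundary for the smaller-sooner option.
    v: drift rate (mean evidence per unit time toward the upper bound)
    z: relative starting point in (0, 1); z = 0.5 means no starting bias
    a: separation between the lower (0) and upper (a) boundaries
    Returns (choice, rt): choice is 1 (upper), 0 (lower), or None on timeout.
    """
    x = z * a                   # accumulator starts at z * a
    sd = sigma * math.sqrt(dt)  # standard deviation of each noise increment
    t = 0.0
    while t < max_t:
        x += v * dt + rng.gauss(0.0, sd)
        t += dt
        if x >= a:
            return 1, t
        if x <= 0.0:
            return 0, t
    return None, t


def p_upper_simulated(v, z, n=1000, seed=0, **kwargs):
    """Monte Carlo estimate of the probability of reaching the upper bound."""
    rng = random.Random(seed)
    hits = [simulate_trial(v, z, rng=rng, **kwargs)[0] for _ in range(n)]
    finished = [h for h in hits if h is not None]  # drop timed-out trials
    return sum(finished) / len(finished)


def p_upper_analytic(v, z, a=1.0, sigma=1.0):
    """Closed-form upper-bound probability for a Wiener process with drift."""
    if v == 0:
        return z  # without drift, only the starting point matters
    k = 2.0 * v * a / sigma ** 2
    return (1.0 - math.exp(-k * z)) / (1.0 - math.exp(-k))


if __name__ == "__main__":
    # Zero drift: moving the starting point alone shifts choices.
    print(p_upper_simulated(0.0, 0.5), p_upper_simulated(0.0, 0.3))
    # Unbiased start, positive drift: evidence strength shifts choices.
    print(p_upper_analytic(2.0, 0.5))  # about 0.881
```

With zero drift, lowering z from 0.5 to 0.3 reduces the probability of reaching the upper boundary on its own, whereas with z fixed at 0.5 a positive drift raises it. This is the sense in which a bias parameter and a drift parameter can be shifted independently by a manipulation such as D2R blockade; the small step size keeps the discretized simulation close to the analytic value.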

      Overall, this study is a detailed dissection of the specific effects of amisulpride on a type of future-oriented, hypothetical intertemporal choice, and provides consistent evidence integrating conflicting accounts that implicate dopaminergic signaling on evaluation of the cognitive costs, such as a delay, on choice. However the specificity of the empirical intervention and the task design limits the interpretation of the broader dopaminergic mechanisms at play in intertemporal choice, especially given the complexity of receptor specificity of this drug, dopamine precursor availability and individual differences and the specifics of the intertemporal choice in this task. As it stands, the results contribute an interesting, synthesized account of how D2R manipulation can impact evaluation of delays in multiple ways, that will likely be useful for motivating future studies and more detailed computational assessments of the cognitive process-level components of intertemporal choice more generally.

      We thank the reviewer for the positive overall evaluation of our study. We revised the manuscript according to the reviewer’s comments, addressing also the receptor specificity of amisulpride and the specifics of the administered intertemporal choice task, which further improved the quality of the manuscript.

      The focus of this study is important, and delineating the role of DA in intertemporal choice is of high relevance given that DA dysfunction is prevalent in many psychiatric disorders and is a key target of pharmacological treatment. While the hypotheses of the current study are framed with respect to "costs", the task used by the authors reduces these to the evaluation of a hypothetical delay, one which participants do not necessarily experience in the context of the task. In some respects this is reasonable, given the prevalence of this task paradigm in testing temporal aspects of choice in humans in an economic sense. However, humans are also notoriously subject to framing effects and the impact of instructions in cognitive tasks like these, which can limit the generality of the conclusions, in particular regarding the specific ways in which a delay can be interpreted as costly (e.g., cost as loss of potential earnings, cost as effortful waiting, cost as computational/simulation cost in future evaluation). Given that the hypothesis invokes the idea of cost in assessing the role of dopamine, testing the generality of the effects of amisulpride in related but differently framed tasks seems critical for making this link in a general sense, and for connecting it to the previous studies in the literature that the authors cite as demonstrating conflicting effects.

      We agree that it is important to discuss whether our findings for delay costs can be generalized to other cost types as well, such as risk, social costs, effort, or opportunity costs. Based on a recent literature review (Soutschek, Jetter, & Tobler, 2022), we speculate that dopamine may moderate proximity effects also for risk and social costs but not for effortful rewards, though we emphasize that these hypotheses still require more direct empirical evidence. We also discuss the issue that delays can be perceived as costly in different ways. While in some tasks participants actually experience the waiting time until reward delivery, such that delayed rewards are associated with opportunity costs, in our current task paradigm delayed rewards were virtually free of opportunity costs, as participants could engage in other reward-related behaviors during the waiting time. Previous studies suggest that lower tonic dopamine levels reduce the sensitivity to opportunity costs (Niv et al., 2007), which seems in line with our finding that amisulpride decreases the influence of delays on the starting bias parameter. Nevertheless, we emphasize that further evidence is needed to decide whether dopamine shows similar effects for experienced and non-experienced waiting costs. In the revised manuscript, we discuss the cost specificity of our findings on p.22:

      “An important question refers to whether our findings for delay costs can be generalized to other types of costs as well, including risk, social costs (i.e., inequity), effort, and opportunity costs. In a recent review, we proposed that dopamine might also moderate proximity effects for reward options differing in risk and social costs, whereas the existing literature provides no evidence for a proximity advantage for effort-free over effortful rewards (Soutschek et al., 2022). However, these hypotheses need to be tested more explicitly by future investigations. Dopamine has also been ascribed a role for moderating opportunity costs, with lower tonic dopamine reducing the sensitivity to opportunity costs (Niv et al., 2007). While this appears consistent with our finding that amisulpride (under the assumption of postsynaptic effects) reduced the impact of delay on the starting bias, it is important to note that choosing delayed rewards did not involve any opportunity costs in our paradigm, given that participants could pursue other rewards during the waiting time. Thus, it needs to be clarified whether our findings for delayed rewards without experienced waiting time can be generalized to choice situations involving experienced opportunity costs.”

      Further, while the study aims to test the actions of dopamine broadly, the empirical manipulation is limited to the action of amisulpride, a D2R antagonist. There is little to no discussion of, or control for, the relationship between dopaminergic action at D2 receptors (the site of amisulpride effects) and wider mechanisms of dopaminergic action at other sites, e.g., D1-like receptors, and the interplay between activation at these two receptor types alongside baseline levels of dopamine concentration. This is necessary for a comprehensive account of dopamine effects on intertemporal choice as the authors aim to test, as opposed to a specific test of the role of the D2 receptor, which is what the study achieves. On a related note, in some preparations at least, amisulpride also acts at some of the 5-HT receptors, raising the possibility of a non-dopaminergic mechanism by which this drug might impact intertemporal decisions. This possibility, while it would not be expected to act without dopaminergic effects as well, is consistent with established effects of serotonin on waiting behaviors and patience. Granted, given the limits of pharmacology in humans, this may not be controllable, but it should be kept in mind with a systemic manipulation such as this.

      We agree with the reviewer that it is important to distinguish between the contributions of D1 and D2 receptors to decision making, given that these receptor families are hypothesized to have dissociable functional roles. We therefore also re-analyzed data on the impact of a D1 agonist on intertemporal decision making (previous findings for this data set were published in Soutschek et al., 2020, Biological Psychiatry). This analysis provided no evidence for significant effects of D1R stimulation on parameters from a drift diffusion model. This suggests that D2R, rather than D1R, activation mediates the impact of proximity on intertemporal choices.

      In the revised manuscript, we report the findings for the D1 agonist study on p.16:

      “To assess the receptor specificity of our findings, we conducted the same analyses on the data from a study (published previously in Soutschek et al., 2020) testing the impact of three doses of a D1 agonist (6 mg, 15 mg, 30 mg) relative to placebo on intertemporal choices (between-subject design). In the intertemporal choice task used in this experiment, the SS reward was always immediately available (delay = 0), unlike the task in the D2 experiment, where the delay of the SS reward varied from 0 to 30 days. Again, the data in the D1 experiment were best explained by DDM-1 (DIC = 19,657) compared with all other DDMs (DDM-2: DIC = 20,934; DDM-3: DIC = 21,710; DDM-5: DIC = 21,982; DDM-6: DIC = 19,660; note that DDM-4 was identical to DDM-1 for the D1 agonist study because the delay of the SS reward was 0). Neither the best-fitting nor any other model yielded significant drug effects on any drift diffusion parameter (see Table 4 for the best-fitting model). Model-free analyses conducted in the same way as for the D2 antagonist study also revealed no significant drug effects (all HDI95% included zero). There was thus no evidence for any influence of D1R stimulation on intertemporal decisions.”
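      The model comparisons here rank DDMs by the deviance information criterion (DIC), where a lower value indicates a better trade-off between fit and complexity. The following is a minimal sketch of the classic Spiegelhalter et al. (2002) definition, assuming posterior deviance samples are available. The function name `dic` is hypothetical, and software such as JAGS may compute the complexity penalty differently, so this is not the authors' pipeline.

```python
from statistics import fmean


def dic(deviance_samples, deviance_at_posterior_mean):
    """Return (DIC, pD) with DIC = mean deviance + pD (Spiegelhalter et al., 2002).

    deviance_samples: deviance (-2 * log-likelihood) at each posterior draw.
    deviance_at_posterior_mean: deviance at the posterior mean of the parameters.
    """
    d_bar = fmean(deviance_samples)             # posterior mean deviance (fit)
    p_d = d_bar - deviance_at_posterior_mean    # effective number of parameters (complexity)
    return d_bar + p_d, p_d


# Toy deviances for illustration only (not study data).
print(dic([100.0, 102.0, 104.0], 99.0))  # -> (105.0, 3.0)

# Ranking the DIC values reported above for the D1 agonist study: the lowest DIC wins.
reported = {"DDM-1": 19657, "DDM-2": 20934, "DDM-3": 21710, "DDM-5": 21982, "DDM-6": 19660}
print(min(reported, key=reported.get))  # -> DDM-1
```

      DIC is computed for the hierarchical model as a whole. For that reason, it does not by itself provide a separate model comparison for each participant.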

      We discuss the specificity of D2 receptors for moderating the proximity bias on p.17: “This finding represents first evidence for the hypothesis that tonic dopamine moderates the impact of proximity (e.g., more concrete versus more abstract rewards) on cost-benefit decision making (Soutschek et al., 2022; Westbrook & Frank, 2018). Pharmacological manipulation of D1R activation, in contrast, showed no significant effects on the decision process. This provides evidence for the receptor specificity of dopamine’s role in intertemporal decision making (though, as a caveat, the differences between the tasks administered in the D1 and the D2 studies should be kept in mind).”

      We also agree that amisulpride additionally acts on 5-HT7 receptors, such that it remains unclear whether such effects also contribute to the observed result pattern. We discuss this limitation in the revised manuscript on p.21:

      “Lastly, while the actions of amisulpride on D2/D3 receptors are relatively selective, it also affects serotonergic 5-HT7 receptors (Abbas et al., 2009). Because serotonin has been related to impulsive behavior (Mori, Tsutsui-Kimura, Mimura, & Tanaka, 2018), it is worth keeping in mind that amisulpride effects on serotonergic, in addition to dopaminergic, activity might contribute to the observed result pattern.”

      Overall, the modeling methods are robust and appropriate for the specific test of decision impacts of D2R blockade, and include several prima facie viable alternative models for comparison. Some caution is warranted, since there are not many trials per subject and some trials are also discarded as outliers, which raises the question of power. Given that the models are fit hierarchically, which gives both group-level and individual-level parameter estimates, the elements are there to probe more deeply into individual differences, to test how reliably this approach can dissociate the dual effects of bias and drift rate at the individual level, and perhaps to correlate these effects with other informative subject measures of either dopamine activity/capacity or other dopamine-dependent behaviors. Alternative DDMs might also capture some of this individual variation, potentially with meaningful differences in model comparison at the individual level. It should be noted that the scope of these models does not exhaust the ways in which proximity (here, temporal) of rewards and contrast between choice options might be incorporated into a cognitive process model account of choice; all alternatives here rest on the same implicit two-alternative forced-choice assumption of the DDM, and the assumptions of this model are not tested here against other accounts of choice, for example the linear ballistic accumulator (LBA) and its derivatives. Further, the concept of proximity as a global feature of a trial (on average, how soon are these options overall?) is, on my reading, never tested in the alternative models.

      We thank the reviewer for these interesting suggestions. First, to explore whether measures of dopaminergic activity correlate with individual differences in drug effects on DDM parameters, we now report correlations between DDM parameters and performance in the digit span backward task as a proxy for dopamine synthesis capacity (Cools et al., 2008). None of these correlation analyses showed significant results. In the revised manuscript, we report these analyses on p.13:

      “However, we observed no evidence that individual random coefficients for the drug effects on the drift rate or on the starting bias correlated with body weight, all r < 0.22, all p > 0.10. There were also no significant correlations between DDM parameters and performance in the digit span backward task as a proxy for baseline dopamine synthesis capacity (Cools, Gibbs, Miyakawa, Jagust, & D'Esposito, 2008), all r < 0.17, all p > 0.22. There was thus no evidence that pharmacological effects on intertemporal choices depended on body weight as a proxy for effective dose or on working memory performance as a proxy for baseline dopaminergic activity.”

      Regarding model comparisons on the individual level, we note that the hierarchical Bayesian modelling approach allows (to the best of our knowledge) computing indices of model fit like DIC only on the group, not the individual level (while accounting for individual differences). However, we agree with the reviewer that theoretically different models might work best in different individuals (depending, for example, on the individual sensitivity to proximity). While such fine-grained model comparisons on the individual level are beyond the scope of the current study (and might not yield robust results given the limited number of trials for each participant), we now discuss this limitation in the revised manuscript (p.17-18):

      “We note that the hierarchical modelling approach allowed us to compare models on the group level only, such that in some individuals behavior might better be explained by a different model than DDM-1. Such model comparisons on the individual level, however, were beyond the scope of the current study and might not yield robust results given the limited number of trials per individual.”

      Likewise, linear ballistic accumulator (LBA) models represent a further class of process models that make different assumptions about the mechanisms underlying the choice process than DDMs. In LBAs, evidence is accumulated separately for each choice alternative, whereas DDMs assume only one accumulation process that integrates attributes from two choice options, limiting the use of DDMs to two-alternative forced-choice scenarios. Nevertheless, proximity effects might also be incorporated in LBA models by modulating the starting point of the option-specific accumulators as a function of proximity. To the best of our knowledge, there is no built-in function in JAGS that allows estimating LBA models in a hierarchical Bayesian fashion (in contrast to, e.g., Stan), such that in the context of the current study it is difficult to directly compare our DDM-based approach with LBA models. It is important to emphasize, however, that similar to other studies we do not make any claims about whether the choice process per se is best explained by DDMs or LBA models; instead, we focus on how rewards and delay costs affect different components of the decision process within a class of decision models. Nevertheless, we discuss such alternative modelling approaches in the revised manuscript on p.18:

      “We also emphasize that alternative process models like the linear ballistic accumulator (LBA) model make different assumptions than DDMs, for example by positing the existence of separate option-specific accumulators rather than only one as assumed by DDMs. However, proximity effects as investigated in the current study might be incorporated in LBA models as well by varying the starting points of the accumulators as a function of proximity.”
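      The LBA alternative mentioned above can be illustrated with a small simulation. This is a minimal sketch that assumes the standard LBA setup: start points are drawn uniformly from [0, A], trial-wise drift rates come from a normal distribution, and each accumulator rises linearly to a threshold b. Proximity is modelled as a hypothetical additive offset on the SS accumulator's start point. The function and parameter names are illustrative and are not taken from the manuscript.

```python
import math
import random


def simulate_lba_ss_share(v_ll, v_ss, ss_start_offset=0.0, A=0.5, b=1.0, s=0.25,
                          n=20000, seed=1):
    """Proportion of simulated trials in which the SS accumulator reaches threshold first.

    ss_start_offset is a hypothetical proximity term added to the SS start point.
    Trials in which neither accumulator has a positive drift never terminate and are
    discarded (one common convention in LBA simulation).
    """
    rng = random.Random(seed)
    ss_wins = 0
    finished = 0
    for _ in range(n):
        times = []
        for v, offset in ((v_ll, 0.0), (v_ss, ss_start_offset)):
            start = rng.uniform(0.0, A) + offset   # start point of this accumulator
            drift = rng.gauss(v, s)                 # trial-wise drift rate
            if start >= b:
                times.append(0.0)                   # already at threshold
            elif drift > 0:
                times.append((b - start) / drift)   # time to rise linearly to b
            else:
                times.append(math.inf)              # never reaches threshold
        ll_time, ss_time = times
        if math.isinf(ll_time) and math.isinf(ss_time):
            continue
        finished += 1
        if ss_time < ll_time:
            ss_wins += 1
    return ss_wins / finished if finished else math.nan


# Equal mean drifts: the SS share is near 0.5; a start-point head start for SS raises it.
print(round(simulate_lba_ss_share(1.0, 1.0), 2))
print(round(simulate_lba_ss_share(1.0, 1.0, ss_start_offset=0.3), 2))
```

      Nondecision time is omitted because it is shared by both accumulators and therefore does not change which option wins.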

      Lastly, we thank the reviewer for the interesting suggestion to assess whether the starting bias parameter is affected by the overall proximity of offers (sum of delays) instead of by the difference in proximity between the options. We ran a further DDM to test this hypothesis, but this model explained the data worse (DIC = 9,492) than the original DDM (DIC = 9,478). Nevertheless, the overall proximity DDM also yielded a significant amisulpride effect on the impact of reward magnitude on the drift rate, HDImean = 0.83, HDI95% = [0.04; 1.75], underlining the robustness of this effect. In the revised manuscript, we report this analysis on p.12:

      “In a further model (DDM-4), we explored whether the starting bias is affected by the overall proximity of the options (sum of delays, Delaysum) rather than the difference in proximity (Delaydiff; see Table 3 for an overview of the parameters included in the various models). Importantly, our original DDM-1 (DIC = 9,478) explained the data better than DDM-2 (DIC = 9,481), DDM-3 (DIC = 10,224), or DDM-4 (DIC = 9,492). Nevertheless, amisulpride moderated the impact of Magnitudediff on the drift rate also in DDM-2, HDImean = 0.86, HDI95% = [0.18; 1.64], and DDM-4, HDImean = 0.83, HDI95% = [0.04; 1.75], and amisulpride also lowered the impact of Delaydiff on the starting bias in DDM-3, HDImean = -0.02, HDI95% = [-0.04; -0.001]. Thus, the dopaminergic effects on these subcomponents of the choice process are robust to the exact specification of the DDM.”
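      DDM-1 and DDM-4 differ only in which delay summary modulates the starting bias. This is a minimal sketch that assumes a logistic link mapping a linear predictor onto a relative starting point w in (0, 1), with the upper boundary coding the LL choice. It also assumes that Delaydiff is the LL delay minus the SS delay. The link function, coefficient values, and delays are hypothetical and are not taken from Table 3.

```python
import math


def starting_point(z0, z1, delay_ss, delay_ll, predictor="diff"):
    """Relative starting point w in (0, 1); w > 0.5 starts closer to the (assumed) LL boundary.

    predictor="diff": DDM-1-style regressor, the difference in delays.
    predictor="sum":  DDM-4-style regressor, the overall delay of both options.
    """
    if predictor == "diff":
        x = delay_ll - delay_ss
    elif predictor == "sum":
        x = delay_ll + delay_ss
    else:
        raise ValueError("predictor must be 'diff' or 'sum'")
    return 1.0 / (1.0 + math.exp(-(z0 + z1 * x)))  # logistic link keeps w inside (0, 1)


# Two hypothetical trials with the same delay difference (30 days) but a different overall delay.
trials = [(0, 30), (30, 60)]
print([round(starting_point(0.2, -0.01, ss, ll, "diff"), 3) for ss, ll in trials])  # identical
print([round(starting_point(0.2, -0.01, ss, ll, "sum"), 3) for ss, ll in trials])   # differ
```

      Under the DDM-1 regressor, both trials receive the same bias. Under the DDM-4 regressor, they do not. This is the contrast that the DIC comparison evaluates.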

      Reviewer #3 (Public Review):

      Soutschek and Tobler provide an intriguing re-analysis of inter-temporal choice data on amisulpride versus placebo which provides evidence for an as-yet untested hypothesis that dopamine interacts with proximity to bias choices.

      The modeling methods are sound, with a robust and reasonably exhaustive set of models for comparison, good posterior predictive checks at the single-subject level, and decent evidence of parameter recoverability. Importantly, they show that while there is no main effect of drug on the proportion of larger, later (LL) versus smaller, sooner (SS) choices, this obscures opposing effects on drift rate and starting point bias that are hidden under the hood, yet anticipated by the hypothesis of interest.
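      A standard closed-form result shows how opposing effects on drift rate and starting point can leave the overall proportion of LL choices unchanged. For a Wiener diffusion process, the probability of absorption at the upper boundary is P(upper) = (1 - exp(-2·v·a·w)) / (1 - exp(-2·v·a)). This minimal sketch assumes unit diffusion noise, boundary separation a, and relative starting point w, with the upper boundary coding the LL choice. The parameter values are illustrative and are not the estimates reported in the manuscript.

```python
import math


def p_upper(v, a, w):
    """P(hitting the upper boundary a before 0) for a Wiener process with drift v,
    unit diffusion coefficient, and starting point w * a."""
    if abs(v) < 1e-12:
        return w  # driftless limit
    return (1.0 - math.exp(-2.0 * v * a * w)) / (1.0 - math.exp(-2.0 * v * a))


def matching_start(v, a, target, iterations=100):
    """Bisection for the starting point w at which p_upper(v, a, w) equals target.
    Valid because p_upper increases monotonically in w."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if p_upper(v, a, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a = 2.0
baseline = p_upper(0.5, a, 0.5)           # unbiased start, moderate drift toward LL
w_new = matching_start(0.8, a, baseline)  # stronger drift toward LL ...
print(round(baseline, 3), round(w_new, 3))  # ... offset by a start shifted toward SS (w < 0.5)
```

      Because the stronger drift and the SS-shifted starting point cancel, the P(LL) printed for both settings is identical. This is why a null effect on choice proportions is compatible with drug effects on both parameters.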

      We thank the reviewer for judging our findings as intriguing and the modelling approach as robust and convincing.

      While I have no major concerns about methodology, I think the Authors should consider an alternative interpretation, albeit one that would actually support the hypothesis in question more directly than their current interpretation. Namely, the Authors should reconsider the possibility that amisulpride's effects are mediated primarily by acting at pre-synaptic receptors. If the D2R antagonist were to act pre-synaptically, it would drive more rather than less post-synaptic dopamine signaling.

      There are multiple reasons for this inference. First, the Authors observe that the drug increases sensitivity to differences in the relative offer amounts (in terms of effects on the drift rate). With respect to the canonical model of dopamine signaling in the direct versus indirect pathway, greater post-synaptic signaling should amplify sensitivity to reward benefits, which is what the Authors observe.

      Second, the Authors also observe an effect on the starting bias which may also be consistent with an increase in post-synaptic dopamine signaling. Note that according to the Westbrook & Frank hypothesis, a proximity bias in delay discounting should favor the SS over the LL reward, yet the Authors primarily observe a starting bias in the direction of the LL reward. This contradiction can be resolved with the ancillary assumption that, independent of any choice attribute, participants are on average predisposed to select the LL option. Indeed, the Authors observe a reliable non-zero intercept in their logistic regression model indicating that participants selected the LL more often, on average. As such, the estimated starting point may reflect a combination of a heightened predisposition to select the LL option, opposed by a proximity bias towards the sooner option. Perhaps the estimated DDM starting point is positive because the predisposition to select the LL option has a larger effect on choices than the proximity bias towards sooner rewards does in this data set. To the extent that amisulpride increases post-synaptic dopamine signaling (by antagonizing pre-synaptic D2Rs) it should amplify the proximity bias arising from the differences in delay, shifting the starting bias towards the SS option. Indeed, this is also what the Authors observe.

      Note that it remains unclear why an increase in post-synaptic dopamine signaling would amplify one kind of proximity bias (towards sooner over later rewards) without amplifying the other (towards a predisposition to select the LL option). Perhaps the cognitive / psychological nature of the sooner bias is more amenable to interacting with dopamine signaling than the latter. Or maybe proximity bias effects are most sensitive to dopamine signaling when they are smaller, and the LL predisposition bias is already at ceiling in the context of this task. These assumptions would help explain why a potential increase in post-synaptic dopamine signaling both amplified the proximity effect of delay when it was smallest (when the differences in delay were smaller), and also failed to amplify the predisposition to select the LL option (which may already be maxed out). More importantly, the assumption that there are opposing proximity biases would also help explain why there is a negative effect of delay magnitude on the estimated starting point on placebo. Namely - as the delay gets larger, the psychological proximity of sooner over later rewards grows, counteracting the proximity bias arising from choice predisposition / repetition.

      We thank the reviewer for suggesting this alternative interpretation of our data. We agree that the administered dose of 400 mg amisulpride can show both postsynaptic (reducing D2R activation) and presynaptic effects (enhancing D2R activation), which in many studies makes it difficult to decide whether the observed behavioral effects are caused by presynaptic or postsynaptic mechanisms.

      The reviewer suggests that the observed stronger influence of reward magnitudes on drift rates under amisulpride compared with placebo speaks in favor of presynaptic effects, because according to theoretical accounts higher dopamine levels should increase reward seeking (e.g., Frank & O’Reilly, 2006). On the other hand, Figure 2C suggests that amisulpride (compared with placebo) increased the preference only for relatively high, above-average rewards. If the difference between reward magnitudes was below average, amisulpride reduced rather than increased the preference for the larger reward. In our view, this is consistent with the hypothesis that D2R activation implements a cost control, with higher D2R activation increasing the attractiveness of costly rewards and lower D2R activation reducing it. In other words, under low dopamine levels individuals should decide for the costlier reward only if the magnitude of the costlier reward is sufficiently large compared with the lower, less costly reward. In fact, this is exactly what we find in our data according to Figure 2C. In our view, the amisulpride effect on drift rates is thus compatible with both presynaptic and postsynaptic mechanisms of action, depending on the underlying conceptual account of dopamine, as we now discuss in the revised manuscript.

      According to the reviewer, the observed influence of amisulpride on the starting bias also speaks in favor of increased rather than reduced dopamine levels. We agree with the reviewer that the result pattern for the starting bias is somewhat complex and seems to combine the effects of two different biases: a general tendency to choose LL over SS rewards (intercept of starting bias where the difference in delays is close to zero), and a shift towards the SS option under placebo if one option has a strong (temporal) proximity advantage over the other. Amisulpride shows opposite effects on the two different biases, as it shifts the intercept of the starting bias further away from the LL option but also reduces the proximity advantage of the SS over the LL reward for larger differences in delay. The reviewer writes that “To the extent that amisulpride increases post-synaptic dopamine signaling (by antagonizing pre-synaptic D2Rs) it should amplify the proximity bias arising from the differences in delay, shifting the starting bias towards the SS option. Indeed, this is also what the Authors observe.” In contrast to that statement, in our study amisulpride reduced rather than increased the starting bias arising from delay (as in Figure 2K the regression line is flatter under amisulpride compared with placebo, despite the differences regarding the intercept). We believe that the amisulpride effects on both the intercept and the delay-dependent slope can be explained via postsynaptic effects: First, the shift of the intercept of the starting bias (small differences in proximity) from the LL towards the SS option under amisulpride is consistent with the assumption that lower dopamine reduces the preference for larger rewards (e.g., Beeler & Mourra, 2018; Salamone & Correa, 2012). Second, the finding that amisulpride weakens the proximity advantage of SS over LL rewards (delay-dependent slope) is consistent with the proximity account by Westbrook & Frank (2018), according to which lower tonic dopamine should reduce proximity effects. Thus, if we assume that the result pattern for the starting bias parameter is driven by dopaminergic effects on two separate decision biases (as suggested by the reviewer), we believe that both effects can better be explained by pharmacologically reduced rather than increased dopamine levels.

      In the revised manuscript, we extensively discuss the question as to whether the observed drug effects are caused by postsynaptic versus presynaptic effects. We clarify that the amisulpride effect on drift rates seems consistent with both presynaptic and postsynaptic effects (depending on the underlying conceptual account). We moreover discuss that the starting bias effects may reflect the interaction between two different bias types, and the drug effects on both bias types can more easily be reconciled with postsynaptic than presynaptic effects. On balance, we believe that the observed effects are more likely to reflect lower as compared to higher dopamine levels, but the extended discussion of this issue gives all readers the opportunity to weigh the arguments for and against these alternatives. If the reviewer should not agree with some aspects of our argumentation as outlined above, we would of course be happy to modify the discussion according to the reviewer’s advice.

      In the revised manuscript, we modified the discussion of presynaptic versus postsynaptic effects as follows (p.20-21):

      “While higher doses of amisulpride (as administered in the current study) antagonize post-synaptic D2Rs, lower doses (50-300 mg) were found to primarily block pre-synaptic dopamine receptors (Schoemaker et al., 1997), which may result in amplified phasic dopamine release and thus increased sensitivity to benefits (Frank & O'Reilly, 2006). At first glance, the stronger influence of differences in reward magnitude on drift rates under amisulpride compared with placebo might therefore speak in favor of presynaptic (higher dopamine levels) rather than postsynaptic mechanisms of action in the current study. On the other hand, one could argue that amisulpride reduced the preference for the LL reward if the gain from the costlier LL option compared with the SS option was small (as suggested by Figure 2C), which is consistent with the cost control hypothesis of dopamine (Beeler & Mourra, 2018). The impact of amisulpride on the drift rate thus appears ambiguous regarding the question of pre- versus postsynaptic effects. The result pattern for the starting bias parameter, in turn, suggests the presence of two distinct response biases, reflected by the intercept and the delay-dependent slope of the bias parameter (see Figure 2K), which are both under dopaminergic control but in opposite directions. First, participants seem to have a general bias towards the LL option in the current task (intercept), which is reduced under amisulpride compared with placebo, consistent with the assumption that dopamine strengthens the preference for larger rewards (Beeler & Mourra, 2018; Salamone & Correa, 2012; Schultz, 2015). Second, amisulpride reduced the proximity advantage of SS over LL rewards with increasing differences in delay, as predicted by the proximity account of tonic dopamine (Westbrook & Frank, 2018). On balance, the current results thus appear more likely under the assumption of postsynaptic rather than presynaptic effects. Unfortunately, the lack of a significant amisulpride effect on decision times (which should be reduced or increased as a consequence of presynaptic or postsynaptic effects, respectively) sheds no additional light on the issue.”

      Regardless of the final interpretation, showing that pharmacological intervention into striatal dopamine signaling can simultaneously modify a starting point bias and drift rate (in opposite directions - thus having systematic effects on choice biases without altering the average proportion of LL choices) provides crucial first evidence for the hypothesis that dopamine and proximity interact to influence decision-making. These results thereby enrich our understanding of the neuromodulatory mechanisms influencing inter-temporal choice, and take an important step towards resolving prior contradictions in this literature. They also have implications for how striatal dopamine might impact decision-making in diverse domains of impulsivity beyond inter-temporal choice, ranging from cognitive neuroscience (e.g. in numerous cognitive control tasks) to psychiatry (treating diverse disorders of impulse control).

      We thank the reviewer for highlighting the importance of the current findings for understanding dopamine’s role in decision making.

    1. Author Response

      Reviewer #1 (Public Review):

      Liau and colleagues have previously reported an approach that uses PAM-saturating CRISPR screens to identify mechanisms of resistance to active site enzyme inhibitors, allosteric inhibitors, and molecular glue degraders. Here, Ngan et al report a PAM-saturating CRISPR screen for resistance to the hypomethylating agent, decitabine, and focus on putatively allosteric regulatory sites. Integrating multiple computational approaches, they validate known - and discover new - mechanisms that increase DNMT1 activity. The work described is of the typical high quality expected from this outstanding group of scientists, but I find several claims to be slightly overreaching.

      Major points:

      The paper is presented as a new method - activity-based CRISPR scanning - to identify allosteric regulatory sites using DNMT1 as a proof-of-concept. Methodologically, the key differentiating feature from past work is that the inhibitor being used is an activity-based substrate analog inhibitor that forms a covalent adduct with the enzyme. I find the argument that this represents a new method for identifying allosteric sites to be relatively unconvincing and I would have preferred more follow-up of the compelling screening hits instead. The basic biology of DNMT1 and the translational relevance of decitabine resistance are undoubtedly of interest to researchers in diverse fields. In contrast, I am unconvinced that there is any qualitative or quantitative difference in the insights that can be derived from "activity-based CRISPR scanning" (using an activity-based inhibitor) compared to their standard "CRISPR suppressor scanning" (not using an activity-based inhibitor). Key to their argument, which is expanded upon at length in the manuscript, is that decitabine - being an activity-based inhibitor that only differs from the substrate by 2 atoms - will enrich for mutations in allosteric sites versus orthosteric sites because it will be more difficult to find mutations that selectively impact analog binding than it is for other active-site inhibitors. However, other work from this group clearly shows that non-activity-based allosteric and orthosteric inhibitors can just as easily identify resistance mutations in allosteric sites distal from the active site of an enzyme (https://www.biorxiv.org/content/10.1101/2022.04.04.486977v1). If the authors had compared their decitabine screen to a reversible DNMT1 inhibitor, such as GSK3685032, and found that decitabine was uniquely able to identify resistance mutations in allosteric sites, then I would be convinced. But with the data currently available, I see no reason to conclude that "activity-based CRISPR scanning" biases for different functional outcomes compared to the "CRISPR suppressor scanning" approach.

      We appreciate the reviewer’s comments and thank them for their constructive feedback. We agree with the reviewer that our claims regarding the utility of activity-based CRISPR scanning would be more strongly supported with a head-to-head comparison against a non-covalent, reversible inhibitor. To address this point, we conducted a CRISPR scanning experiment on DNMT1 and UHRF1 using GSK3484862 (GSKi), which is shown in Fig. 1e–h. We observed that the top enriched sgRNA under GSKi treatment targets H1507, which directly interacts with the drug and contributes to compound binding (Fig. 1e,h, Supplementary Fig. 1e). Our results are consistent with previous structural and biochemical studies of these inhibitors (reported in Pappalardi, M.B. et al., Nat. Cancer 2021), in which they demonstrate that the H1507Y mutation reduces GSK3685032 (a derivative of GSK3484862) inhibition of DNMT1 by >350-fold compared to wild-type DNMT1. By contrast, the top enriched sgRNA under decitabine (DAC) treatment targets D702 in the autoinhibitory linker region (Fig. 1c). Furthermore, comparison of sgRNA resistance scores across DAC and GSKi treatment conditions reveals highly distinct sgRNA enrichment profiles (Fig. 1g). Taken together, our data suggest that these two mechanistic classes of inhibitors may exert differential selective pressures that lead to unique enrichment profiles.

      While we consider these data to strengthen our claim that activity-based CRISPR scanning can preferentially enrich for mutations in allosteric sites versus orthosteric sites, we also recognize that allosteric site mutations can be identified without the use of activity-based inhibitors, as the reviewer points out. To address this point, we have modified the text to suggest that the use of activity-based inhibitors may exert a greater bias for the enrichment of allosteric site mutations, while clarifying that the enrichment of such mutations is not exclusive to the use of activity-based inhibitors.

      How can LOF mutations from cluster 2 be leading to drug resistance? It is speculated in the paper that a change in gene dosage decreases the DNA crosslinks that cause toxicity. However, the immediate question then would be why do the resistance mutations cluster around the catalytic site? If it's just gene dosage from LOF editing outcomes, would you not expect the effect to occur more or less equally across the entire CDS?

      This is an excellent point. As outlined above, we recognize that our original explanation of the gene dosage hypothesis for cluster 2 sgRNAs may not have conveyed our reasoning clearly, and we have added text and data to clarify and support our claim.

      Mutations that are highly likely to lead to a nonfunctional protein product (i.e., frameshift, nonsense, splice site disrupting) are annotated as “loss-of-function” (LOF) in the text, with all other protein coding mutations designated as “in-frame.” The key insight underlying our gene dosage hypothesis is that sgRNAs targeting essential protein regions and functional domains generate greater proportions of null (i.e., knockout) mutations and undergo stronger negative selection than sgRNAs targeting non-essential protein regions (see Shi, J. et al., Nat. Biotechnol. 2015). This is because in-frame coding mutations in functionally important protein regions (e.g., the DNMT1 catalytic domain) are more likely to disrupt protein function than those in non-essential regions. As a result, in-frame mutations generated by sgRNAs targeting functional protein regions are more likely to produce a null allele and are thus “effectively LOF.” Importantly, the observation that sgRNAs targeting specific protein regions are more likely to produce null mutations also implies that (1) not all CDS-targeting sgRNAs are equally effective at inducing LOF effects, and (2) sgRNAs that are more effective at generating null mutations may cluster preferentially within functionally important protein regions.

      In this context, we reasoned that cluster 2 sgRNAs, which target the essential catalytic domain, may be more effective at reducing DNMT1 gene dosage than other DNMT1-targeting sgRNAs because the in-frame mutations they generate are more likely to yield nonfunctional DNMT1 protein. That is, cluster 2 sgRNAs may generate greater proportions of “effectively LOF” in-frame mutations that disrupt DNMT1’s essential function. We therefore posited that the observed clustering of these sgRNAs in the catalytic domain likely reflects its functional importance. To test this idea, we transduced WT K562 cells with 6 individual sgRNAs targeting the N-terminus, RFTS domain, and catalytic domain of DNMT1, and genotyped the cellular pools over 28 days (Fig. 4f). We observed that sgRNAs targeting outside of the catalytic domain exhibited increasing frequencies of in-frame mutations over time, consistent with the idea that these sgRNAs generate functional in-frame mutations that are not under strong negative selection. By contrast, catalytic-targeting sgRNAs exhibited significant depletion of in-frame mutations over time, supporting the notion that in-frame mutations in essential regions are functional knockouts and are thus negatively selected under normal growth conditions. Consequently, the ability of catalytic-targeting sgRNAs to generate greater proportions of null mutations would make them more effective at conferring resistance through gene dosage reduction than other DNMT1-targeting sgRNAs.

      Our hypothesis implies that a large proportion of in-frame mutations generated by cluster 2 sgRNAs are functionally equivalent to LOF mutations (i.e., frameshift, nonsense, splice site disruption), and therefore neither in-frame nor LOF mutations should be preferentially selected under DAC treatment, in contrast to the positive selection of gain-of-function (GOF) in-frame mutations in cluster 1 sgRNAs. Consistent with this idea, our data indicate that the relative proportions of in-frame and LOF mutations in cluster 2 sgRNAs remain comparable across vehicle and DAC treatments (Fig. 4b). Furthermore, if in-frame and LOF mutations are functionally equivalent, the selective pressure on them should be similar, so the relative proportions of in-frame versus LOF mutations in cluster 2 sgRNAs should be dictated primarily by their frequencies as editing outcomes. Consistent with this idea, the observed proportions of in-frame versus LOF mutations in cluster 2 sgRNAs under DAC treatment do not deviate significantly from the expected proportions predicted by inDelphi (Supplementary Fig. 4c). Conversely, cluster 1 sgRNAs exhibit greater ratios of in-frame versus LOF mutations under DAC treatment than their predicted ratios from inDelphi (Supplementary Fig. 4c,d). Altogether, these data are consistent with the notion that cluster 2 sgRNAs may act through a gene dosage reduction effect.

      In general, I found the screens, and integrative analyses, highly compelling. But the follow-up was rather narrow. For example, how much do these mutations shift the IC50 curves for DAC?

      To address this point, we derived two clonal cell lines from the screen harboring endogenous DNMT1 mutations in either the autoinhibitory linker or the RFTS domain (Supplementary Fig. 3g). We treated these cell lines, in addition to WT K562 cells, with varying concentrations of DAC and observed a partial growth rescue in the mutant cell lines relative to WT K562 cells (Fig. 3i). We also show that these mutant cell lines exhibit DAC-mediated degradation of DNMT1, consistent with our fluorescent reporter results (Supplementary Fig. 3h). To further validate whether these endogenous DNMT1 mutations confer partial resistance to DAC, we transduced WT K562 cells with vectors encoding an shRNA targeting the 3' UTR of the endogenous DNMT1 transcript and a DNMT1 overexpression vector encoding WT and mutant DNMT1 constructs (Supplementary Fig. 3i). Upon treating these knockdown and overexpression cells with varying concentrations of DAC, we again observed a partial growth rescue in the presence of mutant versus WT DNMT1 (Fig. 3j).

      What kinetic parameters have changed to increase catalytic activity?

      We performed enzyme activity assays at various temperatures with recombinant WT and mutant DNMT1 protein, observing that mutant DNMT1 constructs exhibit varying degrees of overactivity relative to WT DNMT1 at different temperatures (Fig. 3h, Supplementary Fig. 4f). Whereas the autoinhibitory linker mutations display consistently higher activity than WT DNMT1 at all temperatures tested, RFTS and CXXC mutants exhibited decreasing overactivity with increasing temperature (Fig. 3h). Previous studies (see Berkyurek, A.C. et al., J. Biol. Chem. 2014) have observed similar behavior with RFTS mutations, suggesting that these mutations may disrupt critical hydrogen bonds at the autoinhibitory interface, thereby reducing the activation energy required to release DNMT1 from an autoinhibited to an active conformation. The behavior of our RFTS and CXXC mutations is consistent with this hypothesis, which may explain their decreasing overactivity with increasing temperature.

      Do the mutants with increased catalytic activity alter the abundance of methylated DNA (naively or in response to the drug)? It is speculated that several UHRF1 sgRNAs disrupt PPIs and not DNA binding, but this is never tested.

      Although we derived clonal cell lines containing DNMT1 mutations, as noted above, comparing these drug-resistant cells to naïve cells proved too difficult because they had been cultured in the presence of DAC for 2 months. This led to large changes in DNA methylation that could confound any conclusions about the effects of the mutations alone. The reviewer also raises valid limitations regarding our studies on UHRF1, which proved very difficult to purify biochemically and is beyond our expertise. After some initial studies, we chose not to pursue these experiments further and instead prioritized the GSKi CRISPR-suppressor scan and the cluster 2 studies, as suggested by the reviewers. We acknowledge these limitations in the text.

      Reviewer #2 (Public Review):

      In this manuscript, Ngan and coworkers describe a CRISPR-based screening approach to identify potential variants of DNMT1 and UHRF1 that can suppress the anti-proliferative effect of decitabine. In theory, such an effect can be achieved by at least two types of gain-of-activity DNMT1/UHRF1 mutants: those that directly boost enzymatic activity and those that indirectly abolish the intrinsic inhibitory activity of the DNMT1-UHRF1 axis. By systematically targeting the DNMT1 and UHRF1 reading frames with a rationally designed sgRNA library, the authors identified and characterized a few potential hotspots within multiple autoinhibitory motifs. While the approach has merit in allowing unbiased screening of the target proteins in living cells, there are serious concerns regarding how the data were interpreted and the limitations of the approach itself, as detailed below.

      (1) Although the authors identified multiple hotspots in the DNMT1-UHRF1 complex whose alteration is associated with resistance to decitabine, it is risky to argue that these mutations increase DNMT1 activity simply because they cluster within known autoinhibitory regions. There are many alternative explanations for this observation. For instance, some mutants may allosterically alter how DNMT1 recognizes decitabine-containing versus native CpG motifs; others may recruit other proteins as modulators. The key gap here is linking the decitabine-resistance phenotype to the loss of autoinhibitory function, rather than inferring it from the location of multiple hotspots in the autoinhibitory regions.

      In our original manuscript, we supported our claim that gain-of-function DNMT1 mutations enhance DNMT1 activity with experimental data using purified DNMT1 protein constructs in enzyme activity assays (Fig. 3g, Fig. 4g), so our conclusion was not solely inferred from sgRNA clustering at the autoinhibitory interface, but also experimentally validated. In our revised manuscript, we provide additional experimental biochemical characterization to further support the claim that autoinhibition is weakened in the DNMT1 mutants we identified (Fig. 3h, Supplementary Fig. 4f). Moreover, we provide cellular data using clonal cell lines harboring endogenous DNMT1 mutations in addition to knockdown/overexpression experiments, demonstrating that RFTS and autoinhibitory linker mutations confer partial growth rescue to DAC treatment (Fig. 3i,j). We agree that we cannot rule out the possibility that these mutations may exert other effects that independently contribute to the observed resistance phenotype (e.g., altered CpG recognition), and we have added a statement acknowledging this limitation.

      (2) Lack of general biological relevance of the findings. In this work, the authors identified multiple DNMT1-UHRF1 variants that alter the anti-proliferative effect of decitabine. However, the observation that multiple mutants cluster in a hotspot does not mean that these mutants must act via the same mechanism. The authors seem to underestimate the complexity of how these mutants can produce the same biological readout, and have not even considered the possibility of transcriptional modulation of antagonists or agonists of the DNMT1-UHRF1 axis. Therefore, the biological relevance of these findings remains unclear.

      We agree that although the cluster 1 mutations share a common property of increased DNMT1 activity, this does not preclude alternative mechanisms. Indeed, these mutations likely have complex and nuanced mechanistic differences in the biochemical alterations underlying their increased DNMT1 activity. For example, we have included enzyme activity data suggesting that autoinhibitory linker mutations may have a different biochemical basis for increased DNMT1 activity than RFTS and CXXC mutations. That said, we did not intend to make broader claims regarding biological relevance; instead, we focused on conveying that this activity-based methodology can identify gain-of-function mutations, which we directly support with experimental data. To clarify these points, we have revised the text to convey our intended claims more precisely and have acknowledged that other complex mechanisms may also be involved.

      (3) Collectively for reasons (1) and (2), the mechanistic analysis seems only to associate the current findings with known regulatory pathways. Without detailed in vitro and in-cell characterization of the DNMT1-UHRF1 mutants, the novel regulatory mechanisms, which may exist, could be largely missed.

      We have added further characterization of these mutations in the revised manuscript, as detailed above, and we note that our allele analysis identified new sites in DNMT1 and UHRF1 that may be functional. However, since this manuscript is intended primarily as a methodology report, we believe that exploring these novel regulatory mechanisms in depth is beyond its scope.

      (4) The current CRISPR-based screening approach has the technical limitation of mainly screening deletions, with some exceptions for point mutations. As a result, the majority of loss-/gain-of-function point mutations will be missed by the CRISPR-based screening method.

      We acknowledge that a technical limitation of this Cas nuclease-based mutational scan is that it is biased toward insertion/deletion mutations versus point mutations. However, we disagree with the reviewer’s claim that this means the majority of loss-/gain-of-function mutations will be missed, since insertions/deletions are often larger perturbations than point mutations and thus have stronger effect sizes in many cases. In principle, the selection modalities used here (e.g., activity-based inhibitors), which are the primary focus of the study, can also be combined with alternative genome editing approaches to assess distinct mutational perturbations, such as base editing for point mutations (see Lue, N.Z. et al., Nat. Chem. Biol. 2022). To address the reviewer’s concern, however, we have added text explicitly stating that the screen is biased against point mutations and that future integration with base editing and other mutational modalities may usefully complement our nuclease-based approach.

      (5) The current CRISPR-based screening approach works only in the context of living cells. As a result, robust cellular readouts are needed. DNMT1-UHRF1 in combination with decitabine is among the few suitable targets for such an application.

      While running CRISPR-based screens requires robust cellular assays, the main advantage of CRISPR-based mutational scanning is the ability to mutagenize the endogenous protein target in situ and assess the effect of the perturbation in the native cellular and genomic context. Resistance to activity-based probes, and to small molecules more broadly, provides a robust phenotypic readout that has been used extensively by our group and many others. Alternatively, other types of phenotypic readouts that do not rely on cell viability can also be employed with these screens, including those used to assess DNA methylation (see Lue, N.Z. et al., Nat. Chem. Biol. 2022). Given the increasingly large body of literature applying CRISPR-based screens to a multitude of biological pathways and diverse targets, we disagree with the reviewer’s claim that only a few targets can be evaluated in this manner.

      (6) Although the authors claim that their mutants are "gain-of-function" for DNMT1/UHRF1, they in fact result from the loss of inhibitory regulation. This is somewhat disappointing because the screening outcomes still fall within the conventional expectation of loss-of-function variants.

      We agree that the mutations are not truly neomorphic, but are instead likely hypermorphic due to the loss of an autoinhibitory mechanism, resulting in a gain-of-function increase in catalytic activity. While discovering neomorphic mutations would be extraordinary, we do not believe that our results are disappointing, since the identification of autoinhibitory mechanisms is nevertheless impactful.

      Collectively, the manuscript in its current form falls short in terms of the impact of its technology and biological findings.

      We respectfully disagree with the reviewer’s comment, as we believe that the experimental and computational methodology may be broadly useful to the field. Indeed, we have already implemented many of the tools developed here in our ongoing work.

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript presents a rather technical modelling analysis of the impact of local lockdowns on Covid-19 hospitalisations in the Netherlands. The major strength of the study is that the authors attempt to calibrate their model to a novel data source, a commercial database of mobility patterns between municipalities. The major weakness is that the model seems overly complicated and many parameters seem to have been 'guessed' without a formal uncertainty analysis (e.g., within a Bayesian framework), so it is impossible to judge how robust the results, and therefore the conclusions, are.

      Major points:

      1) In some aspects the structure of the model presented seems overly complicated: it is not clear why the authors chose the 1:100 population scale rather than modelling the full population directly. Artificially reducing the population size has important stochastic effects in the early phase of the epidemic. It is also not clear what it means when 1:100 of one municipality mixes with 1:100 of another municipality. The authors should at least attempt to see what impact this has on the output, i.e., conduct a sensitivity analysis.

      The reason for choosing a 1:100 population scale instead of the full population is computational speed. This choice (and its consequences) was not mentioned explicitly, and we have now added it. Moreover, to assess the sensitivity of the results to population scale, we have added runs at different population scales.

      • Added reasoning and consequences associated with the 1:100 population scale in SI C.1.

      • The sensitivity of the results to population scale is now discussed in SI C.1 using runs with other population resolutions.

      2) On the other hand, the model goes into (too) much detail regarding mixing behaviour and attempts to model processes during each hour of the day. This does not seem to be informed by actual data; rather, the data seem to be made up, e.g., in A.6. As an ex-student and the father of a teenager, I can tell you that the susceptibility profile guessed in Table 3 does not seem very realistic. As stated in the appendix, the Mezuro data set only provides daily averages of travel between communities, so it is not clear why hourly resolution is actually needed in the model.

      Indeed, several aspects of the model are informed by “secondary statistics,” which unfortunately add uncertainty. An example is the normalization of the mobility matrices using data on how people spend their time (see SI A.3). Note, however, that the susceptibility profile the reviewer mentions does not involve such secondary statistics; it is reported exactly by the Dutch health agency (cited in SI A.5).

      We agree that the model includes much detail, which potentially has weaknesses as the reviewer rightfully mentions. However, one of the main points of this paper is that in order to address the questions of local interventions, geographical spread and associated hospital admissions, we simply need this level of detail, or even higher. In other words, assessments of such mechanics would be even more uncertain if this level of detail is not included.

      We agree that the motivation for hourly resolution was not well described in the manuscript, and we have added it. The reasoning is that mixing of the population is highly heterogeneous throughout the day: as clearly seen in Fig. S5 (SI A.7), mixing at work is completely different from mixing at school or at home.

      Moreover, people meet at work in different municipalities and then return home, where they can spread the disease further. It is exactly such mechanics that we are after in our analysis.

      • Added a more in-depth discussion of the mobility data in SI C.2.

      • Added the motivation for hourly resolution in SI A.1-A.3.

      3) It is not clear why the authors rely on only one short period of the Mezuro data set in March 2019 rather than investigating the same data source during the actual lockdown in 2020, or even over the full year, as travel is likely to be strongly season dependent. This would provide much better estimates of the effects of lockdown on travel patterns. The analysis presented and the categorisation into frequent, regular and incidental movements also need further explanation. It is not clear how international travel is accounted for in the mobility data.

      The reviewer is correct that using a longer mobility dataset, or one covering the period of the actual lockdown, would be beneficial. We did not do so simply because such data are not available.

      The model accounts for international travel through its initialization, but not beyond that. In practice, international travel was severely reduced throughout this period. Hence, we consider the uncertainties associated with not accounting for international travel to be limited.

      • Added a discussion on the effect of using this mobility dataset in SI C.2.

      • Added a further explanation of how the movements are categorized (in SI C.2).

      • Added a discussion on international travel in SI C.2.

      4) Beyond the technical points on the modelling, the main hypothesis, whether local lockdowns can work, has not been sufficiently discussed outside the Dutch context. The authors fail to mention that this was the approach chosen in Northern Italy at the start of the epidemic (https://en.wikipedia.org/wiki/COVID-19_lockdowns_in_Italy), where it did not work, as we all know. On the other hand, more recent local lockdowns in China appeared to be successful, albeit at a great societal cost in terms of restrictions on freedom (https://en.wikipedia.org/wiki/COVID19_lockdown_in_China).

      The reviewer is correct that we only show this in the Dutch context. We can reason about other situations, but these clearly differ vastly from country to country.

      Reviewer #3 (Public Review):

      This work uses an agent-based model of SARS-CoV-2 transmission (calibrated to the first wave in the Netherlands) to examine how the societal impact of interventions could have been reduced - while maintaining epidemiological impact - if they were implemented at a subnational (eg, municipality) rather than a national level. After more than two years of lockdowns and mobility restrictions, the societal cost of such measures is becoming better understood, and it is important to evaluate the effectiveness of such measures and reflect upon how they can be deployed in a minimally disruptive fashion. Mathematical and computational models are a natural choice for such investigations as they enable researchers to explore counter-factual scenarios ("what might have happened had we acted differently?")

      The authors conclude that subnational interventions, triggered via prevalence in a particular municipality, could have controlled the first wave of SARS-CoV-2 in the Netherlands with minimal health cost but less societal disruption than national interventions. This claim is supported by reference to Figure 4 showing the impact on (a) hospital admissions and (b) municipalities without interventions through different phases of the outbreak. For more remote/rural municipalities, the use of interventions is delayed by ~1 week, although some (6%) of municipalities avoid interventions altogether.

      Strengths:

      As noted above, the general objective of this study is important and of potentially broad interest. The agent-based model is complex, but not unreasonably so, and makes good use of rich demographic, mobility, epidemiological/clinical, etc. data for calibration. The simulations conducted using the model support the specific conclusions of the manuscript.

      Weaknesses:

      While the motivation and approach are strong points of this work, the analysis and interpretation would benefit from further development. The robustness of model behaviour to the threshold used to trigger subnational interventions is explored; however, there are other aspects of the model that are not subjected to sensitivity analysis, including:

      1) The impact of imperfect surveillance (eg, due to asymptomatic transmission, reporting delays, etc);

      2) The impact of non-compliance, which could potentially differ for subnational versus national interventions;

      3) The impact of pathogens/variants with transmission/severity characteristics different from the original SARS-CoV-2 strain.

      In the absence of such analyses, it is difficult to generalise the findings beyond "this is how subnational interventions could have been used to control the first wave of SARS-CoV-2 in the Netherlands" to "this is how subnational interventions could be used effectively in the event of future outbreaks" (of a SARS-CoV-2 variant or other pathogen).

      The discussion focuses on limitations associated with the model but does not consider other potential implications of subnational interventions. For example:

      1) Subnational interventions may produce unintended consequences if populations respond by relocating from regions with interventions (high prevalence) to regions without interventions (low prevalence).

      2) Subnational interventions would require extremely effective public health messaging to avoid confusing populations. Particularly in densely populated regions where municipalities may be small and tightly connected, the feasibility of communicating (and enforcing compliance with) interventions may be challenging.

      3) A proposal to implement subnational interventions, following the results of this work, may raise ethical questions about cost-benefit trade-offs (e.g., whether 355 additional hospital admissions is an acceptable price to pay for 36 million person-days without interventions, i.e., two days per citizen on average). The fact that such decisions would (in the event of a future outbreak) need to be made rapidly, in the face of potential uncertainty about pathogen characteristics, heightens the need for a clear understanding of how situational factors may affect the likely effectiveness of interventions (at any scale).

      Impact and broader utility:

      As noted, the question addressed (how we can reduce the disruption caused by interventions for transmission control) is important. Thus, the work presented in this manuscript has the potential for broad utility. Currently, this is limited by the focus on a specific outbreak instance.

      In general terms, we agree with the reviewer. That said, the “possibility space” of policymaking is infinite-dimensional, in the sense that intervention measures can take infinitely many forms, starting times and durations. The framework we have built, which combines data sources such as demography, mobility, interactions and disease parameters, now makes it possible to explore these possibilities. We will explore them in future work.

    1. Author Response

      Reviewer #1 (Public Review):

      The data that is presented is quite clear, and expected given the prior in vitro work, as well as prior work in vivo with helminth infection and BCG vaccination. Overall, it is important to demonstrate that observations from in vitro experiments are relevant in vivo, however, there are concerns with the design of this study which limits its impact. In addition, the study confirms what is expected from prior work, but falls short of adding any new mechanistic insight.

      We thank the Reviewer for evaluation of the manuscript and for the comments. Indeed, published studies have shown that helminth infection can impair the response to the BCG vaccine. However, this manuscript shows for the first time that IL-4 and helminth infection impair MINCLE expression in vivo. In addition, it is the first report demonstrating a negative effect of helminth infections on the antigen-specific Th1/Th17 response after vaccination with a MINCLE-dependent adjuvant.

      Regarding mechanistic insight, we have employed mice deficient in IL-4/IL-13 to determine whether the thwarted Th1/Th17 response in helminth-infected mice is caused by these Th2 cytokines. New Figure 6 in the revised manuscript indeed demonstrates recovery of antigen-specific IFN-γ and IL-17 production in the absence of IL-4/IL-13.

      In terms of the in vivo experimental design, it is unclear why the authors chose to administer BCG IP, when the vaccine is given SC (and then based on more recent data, IV could be arguably interesting and relevant). The focus on the peritoneum limits the potential application of these findings to address the important question of the effects of helminth infection on BCG vaccine responses. The ultimate in vivo experiment to be able to demonstrate a physiological relevance of the mechanisms explored here would be to see what the effect was on Mtb infection in the lung.

      BCG was injected i.p. to induce upregulation of MINCLE on peritoneal cells, allowing us to ask whether IL-4 and/or helminth infection leads to downregulation of MINCLE expression on myeloid cells in vivo. Thus, in this context we were not interested in the adaptive immune response to BCG. Instead, peritoneal BCG injection provided access to myeloid cells exposed to Th2 immune conditions in vivo for analysis of surface MINCLE protein levels. As stated in the Discussion section (lines 400-405 in the revised manuscript), detection of MINCLE by flow cytometry on tissue cells is complicated by the loss of cell surface protein during enzymatic organ digestion.

      We agree that it would be interesting to study the impact of helminth infection on BCG-induced protection against Mtb challenge infection in the lung. Because we have described here the impairment of Th1/Th17 immune responses after immunization with H1/CAF01, which induces protection (Werninghaus et al. 2009 J Exp Med), it would make most sense to perform such challenge infections first in this setting. However, Mtb infection requires a dedicated BSL3 animal facility; we therefore consider such challenge experiments beyond the scope of this manuscript.

      The authors do report different responses in the spleen and lymph node, which is interesting, but lines 336-337 accurately point out that compartmentalized overexpression of IL-10 in the spleens but not the lymph nodes has been described in mice with chronic schistosomiasis. Mechanistic insight into this phenomenon is lacking, and its relevance to Mtb infection is still unknown.

      We agree that the mechanism for the compartmentalized regulation of adaptive immune differentiation in helminth-infected mice is not clear.

      Reviewer #2 (Public Review):

      The manuscript entitled "IL-4 and helminth infection downregulate Mincle-dependent macrophage response to mycobacteria and Th17 adjuvanticity" by Schick et al. demonstrate the inhibitory activity of IL-4 and helminth infection on mycobacteria-mediated Th17 immunity. Overall, the authors reported interesting findings with solid data that advance our understanding of CLR function in fungal-bacterial co-infection.

      We thank the Reviewer for the appreciation of our study.

      Reviewer #3 (Public Review):

      The authors first demonstrated in bone marrow-derived macrophages (BMMs) that IL-4 treatment of BMMs led to a significant reduction of BCG- and TDB-induced MINCLE expression (Fig. 1). While IL-4 treatment did not impact BCG phagocytosis by BMMs, it led to a reduced production of the cytokines G-CSF and TNF by BMMs (Fig. 2). In an elegant model using hydrodynamic injection of mini-circle DNA encoding IL-4, the authors show that IL-4 overexpression abrogated the increased MINCLE expression in monocytes upon BCG infection in vivo. Similar findings were observed in a co-infection model with the hookworm Nippostrongylus brasiliensis, where MINCLE expression on inflammatory monocytes from BCG-infected mice was reduced compared to control mice infected only with BCG (Fig. 3). The key findings of the manuscript include the two murine helminth infection models, S. mansoni as a chronic infection, and N. brasiliensis as a transient infection, in both of which the authors showed an organ-specific inhibition of the Th17 response in a vaccination setting with a MINCLE-dependent adjuvant (Fig. 4 and 5).

      The data shown in the manuscript represent a major advance over previous studies because, for the first time, a relation between IL-4 and MINCLE expression and function is demonstrated in vivo in relevant co-infection models. All experiments have been done with care. Appropriate controls have been included, and the conclusions are largely supported by the data. Future studies in human patients will be needed to determine the clinical relevance of the findings observed in the murine helminth infection models.

      We thank the Reviewer for the positive comments and agree that it will be interesting to study the impact of helminth infection on CLR expression and function in human infection and vaccination settings.

    1. Author Response

      Reviewer #1 (Public Review):

      COVID-19 severity has been previously linked to a genetic region on chromosome 3 introgressed from Neandertals. The authors use several computational methods to, within this region, identify specific regions that putatively regulate gene expression, and to identify genes within these regions associated with COVID-19 severity. The use of several complementary computational approaches is a major strength of the paper as it bolsters confidence in the findings and narrows the search for significant genomic regions down to most likely candidates. They find 14 genes that exhibit expression regulated by the identified introgressed genomic regions. Among these are several chemokine receptors including two - CCR1 and CCR5 - whose upregulation is associated with severe COVID-19. The authors then use functional genomics to determine whether the identified regions do regulate gene expression.

      We thank this Reviewer for highlighting these strengths.

      In contrast to the robustness of the computational findings, the authors' MPRA results are less robust with respect to the paper's relevance to the clinical severity of COVID-19. The MPRA shows that the computational methods were reasonably effective at identifying regulatory elements within the introgressed region (53%). The authors then focus on emVars where the H.n. allele differentially regulates expression and identify 4 putative emVars that may regulate expression of CCR1 and CCR5. However, the authors found in their MPRA that these emVars downregulate reporter gene expression, whereas the genes of interest, CCR1 and CCR5, are upregulated during severe COVID-19.

      This result highlights the principal weakness of using the MPRA in this context, as it assumes that reporter gene expression driven by a minimal promoter has the same regulatory determinants as expression of the gene of interest. Its strength is the high-throughput nature of the assay, but its weakness is the lack of specificity with respect to the question at hand. This lack of specificity limits the impact of the functional aspect of the work. The authors' computational findings certainly bolster previous work showing that H.n. introgressed alleles are associated with COVID-19 severity and that this association may depend at least partly on gene expression differences between the archaic and modern alleles. However, the specific question at hand, whether chemokine receptor expression is linked to the clinical phenotype, remains unaddressed.

      Ultimately, the authors' results support the conclusion that the 4 emVars identified do regulate gene expression. However, the hypothesis that these specific regions are linked to COVID-19 severity is not supported. The authors' speculation as to why their results may differ from the upregulation observed during disease is intriguing but lacks support.

      We thank the Reviewer for raising these important points, and we hope that our new experimental approach has helped to strengthen our findings. We have also modified the manuscript to be more critical of our findings in light of the issues the Reviewer has raised. This is shown in our updated Discussion, parts of which are provided above in the section addressed to the Editor, as well as in the newly revised manuscript.

      Reviewer #2 (Public Review):

      Previous research using GWAS and population genetics approach identified a genetic haplotype on chromosome 3 derived from Neanderthals as the major risk factor for severe COVID-19. However, the specific variants that are causative of the severe COVID-19 phenotype remain unknown. Here, Jagoda et al. aim to identify the causative variants for the severe COVID-19 by leveraging eQTL analysis followed by Massively parallel reporter assays (MPRA). Their datasets and results are unique and novel. Their research is well designed, and will serve as a model strategy for future studies of functional annotation of disease-associated variants.

      We thank Reviewer #2 for these compliments.

      However, the following critical weaknesses in this manuscript reduce the impact of this work: (1) The quantitative interpretation of the MPRA output is questionable because MPRA activity is incompletely defined: it is based on absolute barcode counts without comparison to negative controls. (2) The molecular mechanisms (e.g., transcription factor binding) by which the causative variants regulate CCR1/5 expression and COVID-19 severity are neither analyzed nor validated.

      We hope that below we have addressed these comments through our analyses and new experiments.

      Reviewer #3 (Public Review):

      This manuscript by Jagoda et al. addresses the genetic mechanism of the haplotype on chromosome 3, introgressed from Neanderthals, that shows a strong association with COVID-19 severity in Europeans. They re-evaluate the adaptively introgressed segment using Sprime and the U and Q95 methods and analyze cis- and trans-eQTLs based on the whole blood dataset. All 361 Sprime-identified introgressed variants act as eQTLs in whole blood and alter the expression of 14 genes, including seven chemokine receptor genes. They then tested 613 variants using a Massively Parallel Reporter Assay (MPRA) in K562 cells and narrowed these down to 20 emVars. Finally, they selected four variants based on four criteria: association with COVID-19 severity, eQTL data, chromosomal interaction, and epigenetic marks in immune cells. They highlighted variants rs35454877 (CCR5 regulation) and rs71327024, rs71327057, and rs34041956 (CCR1 regulation).

      Narrowing the roughly 800 kb introgressed region down to four critical variants is impressive work. However, the MPRA and eQTL data are not consistent, and these data do not agree with clinical gene expression data (increased expression of CCR1 in severe COVID-19 patients).

      We thank this Reviewer for describing our work as impressive; we have now addressed these concerns.

    1. Author Response

      Reviewer #1 (Public Review):

      This is an interesting and timely paper investigating the impact of the COVID-19 pandemic, during which there were massive disruptions to health services, on participation in cancer screening programs across Italy. Of particular interest in this analysis is the investigation of social, educational, and cultural factors that might have affected access to and participation in screening.

      • In the present study, the authors analyzed data collected by PASSI between 2017 and 2021 from interviews with more than 106,000 people. A representative sample of the Italian population aged 25-69 was selected, but it is not clear how representative it was by region, gender, age, and educational attainment. Also, what is the total population (so I don't have to look it up)? I am wondering whether participation differed by these characteristics and what approach was taken to achieve a representative sample (e.g., replacement of individuals or oversampling of strata where participation was lower).

      PASSI is one of the two routinely collected Italian National Health Interviews. It has been described in several papers and there is a website reporting in detail methods, percentage of refusals, and numbers of interviews. Nevertheless, we agree with the reviewer that a brief summary of the methods is needed, and we added some details on data collection. Furthermore, details on the number of interviews according to the selected period, age, and sex strata cannot be found in the general description of the survey. Therefore, we gave more details also on the sample used for this study in supplementary table 1.

      • For figures 5-8 what is the N for the different groups not just the %?

      We agree with the reviewer that giving the actual numbers on which the percentages are computed is necessary. Nevertheless, as with any stratified sample, estimates from PASSI are computed using weights, therefore percentages cannot be computed directly from the observed numbers. We decided to add supplementary table 1, which reports the number of valid interviews on which percentages are estimated.

      • Table 2 is, to me, a key and very interesting piece of information. Can the authors formally test whether there are significant differences between the time periods?

      We thank the reviewer for this suggestion. First, we added a table in which we analyzed all the data together and included the calendar period, categorized as before and after the pandemic, among the covariates. Second, we checked whether any of the prevalence ratios observed in the two periods differed significantly at a 0.05 alpha threshold, and we added a comment in the text: “Nevertheless, the differences could be due to random fluctuations”. We did not add p-values for the interaction of all the variables in each cancer screening program because the table is already very complex, and three more columns would make it difficult to read.

      Reviewer #3 (Public Review):

      This study is primarily a descriptive analysis that provides a clear and accessible account of how screening activity varied across Italy and between groups. While it is primarily a simple descriptive account, such work is important to document the impacts of the pandemic on preventative health services and to understand how they differed across groups. Combining survey responses from regional screening programmes with those from individuals makes good use of two data sources. The study is very clearly written and does not over-interpret the presented data.

      The methods description states that the analysis presents the "standard months" required for the programmes to recover from the service delays. The subsequent reporting of these delays in the results section did not use the same terminology and I see scope for clarification by using common language regarding this assessment throughout the paper. I see scope for further disaggregation of the regional results within the study but equally I understand why the authors might not wish to report outcomes for specific regions. I see scope for improvement in the figures within the manuscript but this is a relatively presentational matter. I would like to see some further description of the Poisson regression analysis as what is included within the manuscript appears rather brief. There is also one section of the methods that seems as if it would better belong in the introduction, but overall the manuscript was very clearly structured.

      We thank the reviewer for his encouraging comments. We checked the whole manuscript and tried to use the same term for each concept consistently. We expanded the methods section, giving more details on the models and statistical analysis. We decided not to report data at the regional level, but we report the variability within macro areas.

      The analysis presented achieves the authors' stated aims in my view. I see a useful contribution in documenting the impact of the COVID-19 pandemic on screening in Italy. This may inform further work assessing the eventual health impact of delays, as well as work considering how best to make screening programmes more resilient to such shocks. Ultimately, it will take time to observe just how significant the impacts of service interruptions were on cancer prevention. Readers should remember that many screening services may still provide good protection against cancer as long as the interruptions are limited simply to delays in coverage rather than longer-term loss of participation, especially for those with incomplete screening histories or otherwise elevated risk of disease.

      Further work may wish to consider how programmes prioritised capacity or what efforts have been made to restart screening. Similarly, there is scope for more detailed disaggregation assessment of who received screening as programs restarted. Both these issues are beyond the scope of the present study however. The present submission provides a good basis for any further such exploration.

      We thank the reviewer for these comments. We tried to capture some of the concepts in our discussion.

    1. Author Response

      Reviewer #3 (Public Review):

      The authors explore the use of SRT as a host-directed therapy for use in combination with other first-line TB antibiotics. This manuscript is of substantial importance since TB is a major world health concern, and there is growing interest in the development of host-directed therapies to augment existing therapies for TB. Demonstrating the effectiveness of adding an FDA-approved drug to existing cocktails of anti-TB drugs has potentially exciting implications.

      The manuscript is bolstered by their use of multiple in vitro and in vivo models of infection, as well as a clinically relevant strain of TB. While their findings generally support the use of SRT as an effective HDT/treatment, the mechanistic details underlying the effectiveness of SRT remain somewhat obscure, and as presented, the in vitro experiments support more limited conclusions.

      Major concerns:

      In vitro studies (i.e., bacterial culture) were performed with SRT only up to 6 µM, while the cultured-cell experiments used a range up to 20 µM, and 5 µM had almost no effect on the viability/growth of Mtb in macrophages. The authors should use the same concentrations in vitro as in their macrophage studies to determine whether SRT directly affects Mtb viability in culture.

      We have not seen any appreciable decrease in the growth of Mtb at up to 20 µM in in vitro experiments, only about 30-40% restriction after 8 days of culture. In combination with HR, we used a lower dose of 6 µM to offset the minimal direct inhibitory effect of SRT, so that the contribution of SRT in the combination could be isolated.

      The mechanism of action of SRT during TB infection and the conclusions drawn by the authors are not supported by the limited experimentation. SRT is presented as an antagonist of polyI:C-induced type I IFNs, but during TB infection, cytosolic DNA sensing via the cGAS/STING axis constitutes the major pathway through which type I IFNs are induced in macrophages.

      To offer more support that SRT inhibits type I IFN, the authors should consider measuring the actual amount of type I IFN using an IFNb ELISA. Additionally, the authors should use human/mouse primary macrophages (not just THP1 reporter cells) and measure transcript levels (at key time points post infection) and protein levels of type I IFN and other proinflammatory mediators (e.g., TNFa, IL-1, IL-6) +/- SRT to determine whether SRT is specific to the type I IFN response. If this is indeed the case, other NFkB genes/cytokines should not be affected.

      Moreover, to support the conclusion that the "augmentation property of SRT is due to its ability to inhibit IFN signalling", a set of experiments using an IFN-blocking antibody would enhance Figure 2, as both cGAS and STING KO macrophages have significant differences in basal gene expression and in their ability to respond to innate immune stimuli.

      Because the first half of the paper focuses on type I IFNs during macrophage infection to explain the mechanism of action for SRT, additional analysis of the mouse infections to examine levels of type I IFNs, as well as IL-1B and IFN-g (in serum/tissues?), is important for connecting the two halves of the manuscript. The in vivo data would also be strengthened by quantitative analysis of histological changes by, for example, blinded pathology scoring. This type of quantitation would also permit statistical analyses of this important pathology readout.

      We have analysed tissue cytokine levels and did not see stark differences between HRZE and HRZES at two time points, 4 and 8 weeks post treatment (Figure below). We feel that such studies would need a more comprehensive analysis of the immunological response induced in the host by the treatment at multiple time points. Such studies will be part of more focussed future proposals and manuscripts. We have also manually scored the lesions in each group and have included these data in the manuscript (Fig.4-figure supplement 1).

      The authors conclude that SRT functions through an inflammasome-related function, but this conclusion requires further support of actual inflammasome activation, such as IL-1B secretion by ELISA or IL-1B processing by western blot analysis, rather than Il1b gene expression alone. Additional functional readouts of inflammasome activation like cell death assays would also strengthen this conclusion.

      We thank the reviewer for these suggestions. These studies are currently underway and will be part of a future manuscript detailing the mechanisms of the SRT-mediated increase in antibiotic efficacy.

      What strain of TB was used in these studies? The results and methods do not indicate the strain used, which is critical to know since different strains have varying pathogenesis phenotypes.

      We used Mtb Erdman for the routine drug-sensitive studies and N73 for the drug-tolerant studies. This information has been added to the text.

      Minor concerns:

      It might be worth consistently using the more common INH and RIF abbreviations to increase the clarity/readability of the MS and figures.

      We have used the conventional clinical abbreviations for INH and rifampicin.

      What is the physiological concentration of SRT when taken for depression, and how does that compare to the concentrations used in vitro? Are the in vitro concentrations feasible to achieve in patients?

      In Figure 3B, why is there a spike in TNF-a in the HRS treated cells only at 42h?

      The authors wish to thank the reviewer for this query. We have reanalysed the data and show the modified figures in the current version of the text. The spike in TNF at 42 h was an oversight, caused by an erroneous representation of the values in the figure.

      Was statistical analysis performed on the data in Figure 3B and D?

      Yes, we have incorporated this information in the modified figure.

      A description/discussion of the different mouse strains used for infection (what benefits each has as a model and why several were used) would help convey the impact of the in vivo studies.

      A discussion of the mouse strains and their immunopathology during infection has been included in the text.

      Since antibiotics and SRT were administered ad libitum, how did the authors ensure that mice took enough of the antibiotics and especially SRT? Is it known whether these drugs affect the water taste enough to affect a mouse's willingness to drink them?

      We preferred ad libitum delivery of TB drugs in drinking water, as used in previous studies by Vilchèze et al., 2018 Antimicrob Agents Chemother 23;62(3):e02165-17. To ensure that the mice drank the water, we added 5% glucose to the water of all animals, including the non-antibiotic-treated groups. We also monitored water intake during the treatment and found comparable consumption between the groups.

      Was statistical analysis performed on time-to-death experiments?

      Because of the inherent differences in susceptibility and response between male and female C3HeB/FeJ mice, we did not perform statistical analyses between the groups.

      Were CFUs measured in mice from Figure 4 to determine empirically how effective the antibiotic treatments were? And if SRT impacted their effectiveness?

      We have not tested the effect of SRT on bacterial burdens in mice treated with HR alone, as these studies were aimed at deciphering chronic pathology. We have tested the effect on bacterial loads in the C3HeB/FeJ model with the four-drug therapy and in the C57BL/6 and BALB/c models of infection.

      The H&E images could use some additional labels to more easily discern what groups they belong to.

      These labels have been added to the figure.

    1. Author Response

      eLife assessment

      The purpose of this study was to determine whether heme oxygenase-2 deficiency translates to deficiencies in motor neuron function. This paper proposes a plausible mechanism by which heme oxygenase-2 deficiency can lead to obstructive apneas. Indeed, this is among the first papers to comprehensively describe a signaling pathway in motor neurons and the consequences of its deficiency. Furthermore, the work completed here may be relevant to other diseases in which motor neuron signal transmission is a key contributor.

      We thank the editors and reviewers for their assessment and constructive comments. Based on their input below, we performed additional analyses focused on the impact of HO-2 dysregulation on rhythmogenesis in the preBötC.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript discusses the combined use of pyrotinib, tamoxifen, and dalpiciclib against HER2+/HR+ breast cancer cells. Through a series of in vitro drug sensitivity studies and in vivo drug susceptibility studies, the authors revealed that pyrotinib combined with dalpiciclib exhibits better therapeutic efficacy than pyrotinib combined with tamoxifen. Moreover, the authors found that CALML5 may serve as a biomarker in the treatment of HER2+/HR+ breast cancer.

      The authors provide solid evidence for the following:

      1) The combined use of pyrotinib with dalpiciclib exhibits better therapeutic efficacy than the combined use of pyrotinib with tamoxifen.

      2) Nuclear ER distribution is increased upon anti-HER2 therapy and could be partially abrogated by the treatment of dalpiciclib.

      3) CALML5 may serve as a putative risk biomarker in the treatment of HER2+/HR+ breast cancer.

      The manuscript has significant strengths and several weaknesses. The strengths include the identification of the novel role of dalpiciclib in the treatment of HER2+/HR+ breast cancer. Moreover, the authors provide solid evidence that the combined use of dalpiciclib with pyrotinib significantly decreased the total and nuclear expression of ER. The main weakness of the manuscript is that the manuscript is difficult to read due to language inconsistency. In addition, some figure captions and figure legends should be carefully amended.

      Thank you for your comments on our manuscript. We sincerely apologize for the language inconsistencies in the manuscript. We have improved the manuscript as well as the figures according to your valuable suggestions.

      Reviewer #2 (Public Review):

      The authors performed preclinical studies to investigate the underlying mechanism of how the combination of pyrotinib, letrozole and dalpiciclib achieved satisfactory clinical outcomes in the MUKDEN 01 clinical trial (NCT04486911). Mechanistically, using anti-HER2 drugs such as pyrotinib and trastuzumab could degrade HER2 and facilitate the nuclear transportation of ER in HER2+HR+ breast cancer, which enhanced the function of ER signaling pathway. The introduction of dalpiciclib partially abrogated the nuclear transportation of ER and exerted its canonical function as cell cycle blockers, which led to the optimal cytotoxicity effect in treating HER2+HR+ breast cancer. Furthermore, using mRNA-seq analysis and in vivo drug susceptibility test, the authors succeeded in identifying CALML5 as a novel risk factor in the treatment of HER2+HR+ breast cancer.

      Thank you for your comments and valuable suggestions; we have improved our manuscript accordingly.

      Reviewer #3 (Public Review):

      In this research, the authors explore a novel mechanism of the CDK4/6 inhibitor dalpiciclib in HER2+HR+ breast cancers, in which dalpiciclib could reverse the intra-nuclear transport of ER upon HER2 degradation. The conclusions are significant for gaining insight into the biological behavior of TPBC and provide a conceptual basis for the favorable efficacy in the published clinical trial. The findings are supported by supplementary in vivo assays and transcriptomic analysis.

      Thank you for your comments and valuable suggestions, which helped us improve this manuscript.

  2. Dec 2022
    1. Author Response

      Reviewer #2 (Public Review):

      The majority of genetic effects discovered in genome-wide association studies (GWAS) of common human diseases point to non-coding variants with putative gene regulatory effects. In principle, studying genetic effects on gene expression phenotypes, as mediators between genotype and disease, can help understand the underlying function of GWAS variants.

      Lafferty et al. set out to study the regulation of microRNA (miRNA) levels in mid-gestation human neocortical tissues as a potential contributor to brain-related phenotypes. To this end, they performed miRNA expression profiling via small-RNA sequencing, followed by mapping of expression quantitative trait loci (eQTLs) that locally regulate miRNA genes.

      In addition to reporting some properties of miRNA-eQTLs, e.g., their tissue specificity, the authors searched for potential overlap or "colocalization" between these eQTL loci and GWAS loci for several putatively brain-related phenotypes. They reported colocalization at the locus containing the SNP rs4981455, which is an eQTL for miR-4707-3p and is also associated with global cortical surface area (GSA) and educational attainment in GWAS. They further showed that exogenously increasing the expression of miR-4707-3p in primary human neural progenitor cells (as a model to study neurogenesis) drives an increased rate of proliferation.

      The reported results are interesting and important, particularly for the understanding of miRNA biology. That said, as I detail below, the claim that miR-4707-3p expression modulates brain size and thus cognitive ability, although potentially consistent with the data, is not unequivocally supported by the analyses. As such, considering the potential social impact of the misinterpretations of these results, I believe the authors should explicitly discuss caveats, alternative explanations consistent with the data, and broader implications:

      We thank the reviewer for their positive evaluation of our work and detailed comments. We agree that misinterpretation of our results could have negative social impacts, and we have now added caveats and alternative explanations to our discussion section.

      1) The colocalization analysis used effectively tests whether miRNA-eQTL and GWAS variants are in linkage disequilibrium (LD), and does not formally test whether the miRNA-eQTL and GWAS signals are explained by the same genetic variant which is necessary for establishing causality. In this study, a formal test of colocalization is challenging given that the LD patterns in the eQTL data (from mixed ancestries) are dissimilar to the GWAS data (from European-descent samples). Furthermore, even if GWAS and miRNA-eQTL signals are explained by the same variant, this could be due to confounding (a confounder affecting both), or pleiotropy (genotype independently affecting both), and not necessarily that the miRNA-eQTL signal mediates the GWAS signal. These are also true for colocalization analyses of miRNA-eQTLs with mRNA-eQTLs or splicing-QTLs. One practical suggestion is whether authors can perform the colocalization analysis better, e.g., with methods such as SMR (https://yanglab.westlake.edu.cn/software/smr/#Overview).

      As the reviewer mentioned, testing colocalized genetic signals using the eQTL dataset presented in this study remains challenging given the mixed-ancestry of the samples. We believe our primary test for colocalization, conditioning the miRNA-eQTL association using a secondary signal index variant, is sufficient evidence for a shared genetic signal (Nica et al., 2010). This is particularly true when looking for colocalizations between the miRNA-eQTLs and mRNA-e/sQTLs because both datasets used largely the same samples for expression quantification. However, the colocalization between the miRNA-eQTL for miR-4707-3p expression and the GWAS signal for educational attainment warrants greater scrutiny because the GWAS signal was discovered in European-descent samples.

      To address this concern, we have conducted an additional colocalization test using the SMR and HEIDI methods as suggested by the reviewer (Zhu et al., 2016). We have updated the results section, “Colocalization of miR-4707-3p miRNA-eQTL with brain size and cognitive ability GWAS”:

      "In addition to the HAUS4 mRNA-eQTL colocalization, the miRNA-eQTL for miR-4707-3p expression is also colocalized with a locus associated with educational attainment (Figure 5A)(2). Conditioning the miR-4707-3p associations on the educational attainment index SNP at this locus (rs1043209) decreases association significance, which is a hallmark of colocalized genetic signals (Figure 5-figure supplement 2A)(58,59). Additionally, the significance of variants at this locus for association with miR-4707-3p expression is correlated with their significance for association with educational attainment (Pearson correlation=0.898, p=5.1×10⁻⁷, Figure 5-figure supplement 2B). To further test this colocalization, we ran Summary-data-based Mendelian Randomization (SMR) at this locus, which found a single causal variant to be associated with both miR-4707-3p expression and educational attainment (p=7.26×10⁻⁷)(60). Finally, the heterogeneity in dependent instruments (HEIDI) test, as implemented in the SMR package to test for two causal variants linked by LD, failed to reject the null hypothesis that there is a single causal variant affecting both gene expression and educational attainment when the mixed-ancestry samples in this study were used as the reference population (p=0.159). The HEIDI test yielded similar results when LD was estimated with 1000 Genomes European samples (p=0.120). All of this evidence points to a robust colocalization between variants associated with both miR-4707-3p expression and educational attainment, despite the different populations in which each study discovered the genetic associations."

      To strengthen the argument for colocalization, we added Figure 5-figure supplement 2.
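      For readers less familiar with SMR, its core test statistic, as described by Zhu et al. (2016), combines the z-score of the top cis-QTL variant for expression (z_zx) with the z-score of the same variant for the trait (z_zy). The Python sketch below is illustrative only: the function name and the z-scores are hypothetical, it omits the HEIDI test and the LD-based steps of the SMR software, and it is not the analysis run in this study.

```python
import math


def smr_test(z_zx: float, z_zy: float) -> tuple:
    """Approximate SMR test statistic (Zhu et al., 2016); illustrative sketch.

    z_zx: z-score of the top cis-QTL variant for expression (e.g., a miRNA).
    z_zy: z-score of the same variant in the GWAS of the trait.
    Returns (t_smr, p_value); t_smr is approximately chi-square with 1 df.
    """
    if z_zx == 0 and z_zy == 0:
        raise ValueError("at least one z-score must be non-zero")
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Upper tail of a 1-df chi-square: P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return t_smr, p_value


# Hypothetical z-scores, chosen only for illustration.
t, p = smr_test(z_zx=3.0, z_zy=4.0)
print(f"T_SMR = {t:.2f}, p = {p:.4f}")
```

      Because T_SMR can never exceed the smaller of the two squared z-scores, the SMR test is only as strong as the weaker of the two associations it combines.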

      Given the unique problem of colocalizing genetic signals from datasets with different LD patterns, we also attempted to colocalize the miRNA-eQTL for miR-4707-3p and the educational attainment GWAS signal using eCAVIAR and coloc (Hormozdiari et al., 2016; Wang et al., 2020). Neither method produced a significant colocalization between these two genetic signals at this locus. However, neither method was designed or tested using mixed-ancestry reference populations, and therefore we remain confident in declaring a shared genetic signal at this locus.

      2) Although possible, there is no direct evidence that the GWAS signals at rs4981455 for educational attainment and GSA are driven by variation in miRNA levels in the studied tissue. As the authors noted, rs4981455 is also an eQTL for the gene HAUS4. Furthermore, rs4981455 is a significant e/sQTL across almost all adult tissues in GTEx, and so likely has regulatory activity across wide ranges of cell or tissue types. Therefore, pinpointing the causal contexts mediating the effect in GWAS is impossible with the current data.

      We agree that fully understanding the causal relationship, or mechanism, between rs4981455 and educational attainment is impossible with the current data. However, we believe the miRNA-eQTL at rs4981455, discovered in developing brain tissue, provides clues about the causal context of this locus for educational attainment. We have revised the language throughout the manuscript to temper the notion that expression differences in miR-4707-3p are causal for changes in educational attainment (discussed below), yet we maintain that the evidence provided is consistent with miR-4707-3p playing a role in brain development that ultimately leads to changes in adult educational attainment. The updated hypothesized causal relationship is shown in Figure 6H, and an expanded discussion of the caveats of this study, addressed in the next section, also serves to mitigate this concern.

      3) Orthogonal to the issues above, the genotype-to-phenotype pathway as hypothesized, i.e., genotype → miRNA levels → brain structure → educational attainment, is oversimplified and rests on an implicit prior belief that genetic associations with educational attainment can be trivially mapped to fundamental brain features that determine cognitive ability. To illustrate the problem with this prior I refer to an old example by Christopher Jencks: in a society that prevents red-haired children from going to school, genetic effects on hair color would be associated with educational attainment, despite having no intrinsic biological relationship with cognition. I give two scenarios consistent with the specific case of rs4981455 that are fundamentally different from what is implied in the paper: (i) The case of indirect genetic effects (see Kong et al., Science 2018). In this case, rs4981455 affects the nurturing behavior of an individual's parents, which in turn influences the individual's educational achievements and consequently brain structure features. (ii) The case of confounding. In this case, the genetic effects on brain structure are shared with another feature, such as facial shape (see Naqvi et al., Nature Genetics 2021). Variation in facial shape in a discriminatory educational environment can covary with educational attainment.

      The causal pathway presented in the original version of this manuscript was indeed too simplistic and implied a link between rs4981455 and educational attainment that was neither fully supported by our data nor experimentally provable. The point we had hoped to make, which is better represented by the updated version of Figure 6H, is that if there is a causal relationship between rs4981455 and educational attainment mediated by miR-4707-3p expression, we may be able to detect the influence of miR-4707-3p on a cellular phenotype that would explain the association of rs4981455 with cortical surface area, intracranial volume, and head size.

      An updated discussion summarizes how we were not able to find evidence for a molecular mechanism consistent with the radial unit hypothesis, but that a biological link between the miRNA-eQTL and GWAS phenotypes may yet be uncovered:

      "We did find one colocalization between a miRNA-eQTL for miR-4707-3p expression and GWAS signals for brain size phenotypes and educational attainment. This revealed a possible molecular mechanism by which genetic variation causing expression differences in this miRNA during fetal cortical development may influence adult brain size and cognition (Figure 6H). Experimental overexpression of miR-4707-3p in proliferating phNPCs showed an increase in both proliferative and neurogenic gene markers with an overall increase in proliferation rate. At two weeks in differentiating phNPCs, we observed an overall increase in the number of cells upon miR-4707-3p overexpression, but we did not detect a difference in the number of neurons at this time point. Based on the radial unit hypothesis (26,73), we expected to find an overall decrease in proliferation or increase in neurogenesis upon miR-4707-3p overexpression which would explain decreased cortical surface area. However, our in vitro observations with phNPCs do not point to a mechanism consistent with the radial unit hypothesis by which increased miR-4707-3p expression during cortical development leads to decreased brain size. This has also been seen in similar studies using stem cells to model brain size differences linked with genetic variation (74). Nevertheless, the transcriptomic differences associated with overexpression of miR-4707-3p in differentiating phNPCs suggest this miRNA may influence synaptogenesis or neuronal maturation, but these phenotypes may be better interrogated at later differentiation time points, by jointly expressing HAUS4 and mir-4707, or with assays to directly measure neuronal migration, maturation, or synaptic activity."

      We believe that the two scenarios raised by the reviewer, indirect genetic effects and confounding, which may in fact explain the association between rs4981455 and educational attainment, are less likely to influence the expression of miR-4707-3p measured in developing cortical tissue. We address this point, together with a discussion of the caveats of our findings, in the next section.

      4) The paper lacks a discussion on caveats to protect against potential misinterpretation of findings, especially considering the troubled history of linking facial shape and head morphology to human behavior and intelligence. I refer to the last paragraph of Naqvi et al., Nature Genetics 2021, as an example of such discussion. This is particularly crucial given that the frequency of rs4981455 varies across human populations. For example, it is important to state that the GSA and education attainment GWAS findings are in individuals of European descent, and may not necessarily point to an effect in other ancestries or even in European-descent individuals that differ from the GWAS samples in various ways, e.g., socioeconomic status (see Mostafavi et al., eLife 2020). In other words, these findings pertain to variation within the studied samples. On this note, it is important to state the amount of variation in multiple phenotypes explained by rs4981455 (which is likely tiny), and that it by no means determines the phenotype.

      We have added a paragraph to the discussion that highlights the caveats of our analysis and guards against overinterpretation of our findings:

      "Here we have proposed a biological mechanism linking genetic variation to inter-individual differences in educational attainment. Given the important role education plays in health, mortality, and social stratification, a proposed causal mechanism between genes and education warrants greater scrutiny (75,76). Any given locus associated with educational attainment may be driven by a direct effect on brain development, structure, and function, an indirect genetic effect such as parental nurturing behavior, or confounding caused by discriminatory practices or societal biases (77,78). Given that expression was measured in prenatal cortical tissue, where confounding societal biases are less likely to drive genetic associations, and that experimental overexpression of miR-4707 affected molecular and cellular processes in human neural progenitors, the evidence at this locus is consistent with a direct effect of genetic variation on brain development, structure, and function rather than confounding or indirect effects. However, there are some important caveats to this statement. First, our study only provides evidence for a direct effect on the brain at this one educational attainment locus; it does not provide evidence for direct brain effects of any other locus identified in the educational attainment GWAS. Second, common variation at this locus explains a mere 0.00802% of the variation in educational attainment in a population, so this locus is clearly neither predictive of nor the sole determinant of this phenotype. Third, the GWAS for educational attainment and brain structure were conducted in populations of European ancestry, and allele frequency differences at these loci cannot be used to predict differences in educational attainment or brain size across populations.
Finally, though both experimental and association evidence suggests a causal link between this locus and educational attainment mediated through brain development, we are unable to directly test the influence of miR-4707-3p expression during fetal cortical development on adult brain structure and function phenotypes. Therefore, we cannot rule out the possibility that the causal mechanism between rs4981455 and adult cognition may be a result of genetic pleiotropy rather than mediation at this locus. Despite these caveats, identifying the mechanisms leading from genetic variation to inter-individual differences in educational attainment will likely be useful for understanding the basis of psychiatric disorders because educational attainment is genetically correlated with many psychiatric disorders and brain-related traits (2,79)."
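      As a rough guide to the scale of 0.00802% of variance, a common approximation for the phenotypic variance explained by a single biallelic variant under Hardy-Weinberg equilibrium uses its allele frequency f and its per-allele effect β on the phenotype Y. This is not necessarily how the figure quoted above was computed, and the allele frequency f = 0.5 below is a hypothetical value used only to show the size of the implied effect on a standardized phenotype:

```latex
R^2 \approx \frac{2f(1-f)\,\beta^2}{\operatorname{Var}(Y)}, \qquad
R^2 = 0.00802\% = 8.02 \times 10^{-5}

\text{With } \operatorname{Var}(Y) = 1 \text{ and } f = 0.5:\quad
\beta = \sqrt{\frac{8.02 \times 10^{-5}}{2 \cdot 0.5 \cdot 0.5}}
      = \sqrt{1.604 \times 10^{-4}} \approx 0.013 \text{ SD per allele}
```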

      We hope that this paragraph contextualizes our results sufficiently to emphasize the high bar that must be cleared to propose a biological link between a miRNA-eQTL and a risk locus for brain-related traits, while acknowledging that we cannot completely rule out the possibility of genetic pleiotropy.

      5) The main colocalization signal is tentatively shown for GSA. However, the authors casually refer to links with "brain size" or "head size" throughout the paper.

      In addition to the locus showing a sub-genome-wide significant association with global cortical surface area (GSA) presented in Figure 5, a GWAS for head size (Knol et al., 2020) and a GWAS for intracranial volume (Nawaz et al., 2022; published after submission of the original manuscript) both report associations at the miR-4707-3p miRNA-eQTL locus. The index variants for both traits colocalize with the miRNA-eQTL for miR-4707-3p, and their effect directions match: alleles that increase expression of miR-4707-3p are associated with decreased head size and decreased intracranial volume. For both studies, the summary statistics are not yet publicly available, preventing us from constructing plots at this locus (similar to those shown in Figure 5) or conducting additional colocalization analyses. To be more consistent throughout the paper, we have replaced many “head size” references with “brain size” when discussing this locus.

    1. Author Response

      Reviewer #2 (Public Review):

      I am not a specialist in cryo-EM, so cannot comment on the technicalities of the structure reconstruction or methods used. I thus focus on the conclusions and observations that the authors provide in the manuscript and their relevance to functional photosynthesis.

      The authors attempt to resolve the structure of PSII from Dunaliella and noticed that three types of PSII could be identified: two conformational states, and a stacked configuration. There is no doubt that these structures add to our current knowledge of PSII and that they exist in abundance upon solubilisation of the sample. My main issue however is the relevance to in vivo conditions, and the efforts to exclude the possibility that pigment loss and conformational states and stacking are a reflection of ex-vivo manipulations.

      Our compact model contains 202 Chl molecules, while the stretched conformation contains 206 Chls. All of the differences in Chl binding are attributed to CP29. We have compiled a table enumerating the CP29 structures currently available from plants and green algae at a resolution similar to our work (Supplementary table 2). In the larger plant complexes (C2S2M2), CP29 contains 14 Chls, while CP29 in the smaller C2S2 complexes contains 10-13 Chls, so it appears that some Chl loss from CP29 is associated with the release of LHCII-M. In the green algal structures, CP29 generally contains fewer Chls and shows a similar trend. The currently published structure most relevant to our work (6KAC) contains 8 Chls in CP29, somewhat fewer than both our compact and stretched models (9 and 11 Chls, respectively). The stretched conformation, which is the closest match to the known PSII core arrangement, therefore contains more Chls than comparable models. Although the in vivo configuration could contain more Chls, the current structure appears to be the closest available representation of it.

      The presence of CP29 with a lower Chl content in the Chlamydomonas C2S2 structure (6KAC, which is in a stretched orientation) suggests that pigment loss from CP29 alone is not sufficient to trigger the stretched-to-compact transition, although it is associated with it. In general, the precise orientation of CP29 is variable and seems to depend on the binding of additional LHCII; it is possible that some Chl loss accompanies these changes in vivo.

      I see a number of questions pertaining to this work. Starting from the two conformations of PSII, compact and stretched, the authors say that both are highly active based on oxygen measurements at a saturating light intensity. In the meantime, they report large variations in the chl content and positions of the chlorophyll molecules in these structures (also compared to other known PSIIs). This gives the impression that one can lose two chlorophylls, and freely modify the distance between others without losing efficiency, certainly a risky conclusion. Are the samples highly active also in light-limiting conditions? It is thought that even tiny movements and alterations in chl-chl distances alter their coupling and spectral properties, how come the variations in this report are so huge? In other words, the assay tests the charge separation activity of the PSII RC in the preps, but not the light-harvesting efficiency.

      The Chl content differences reported in this work amount to 2%. In our opinion, this is a low variation in pigment content, of the kind that exists in virtually any experiment involving large complexes. We agree that activity measurements under light-limiting conditions would be interesting; however, they go beyond the scope of the current work. Light-harvesting efficiency in PSII is known to vary substantially as a result of additional mechanisms (some forms of NPQ) that are not associated with Chl loss or gain. While the formation of quenching centers is attributed to small structural changes within specific pigment-protein complexes, what we show in this work are structural changes between pigment-protein complexes. These can affect transfer rates between the different complexes but are distinct from the structural changes thought to accompany the formation of quenching centers within specific pigment-protein complexes.
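      The 2% figure follows from the Chl counts of the two models given in our previous answer; the 4-Chl difference corresponds to 2 Chls per CP29 copy (11 vs. 9) in each of the two monomers:

```latex
\frac{206 - 202}{202} = \frac{4}{202} \approx 0.0198 \approx 2\%,
\qquad 206 - 202 = 2 \times (11 - 9)
```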

      How does one ascertain that the lost chlorophyll molecules in CP29 are not a preparation error? Does slightly increasing the detergent concentration impact the proportion of stretched:compact forms?

      We did not directly test the effect of detergent concentration on the proportion of the different forms. However, we do not detect substantial differences in lipid or bound detergent content between the two conformations, suggesting that these “ligands” do not differ markedly. We can distinguish the two forms only at the very last stages of data processing; given the present cost of cryoEM and instrument time, mapping the effect of detergent concentration on the different orientations is beyond our reach.

      On a similar note, how do the authors exclude that a certain interaction with this type of grid impacts the distribution of these complexes? Is it identical to a biologically separate preparation of algae? In case of discoveries of this type, it is of high importance to exclude as many possibilities of non-native conditions or influences on the structure.

      It is hard to completely exclude grid and sample preparation issues. However, we used relatively standard grids and vitrification conditions. The observed complexes are embedded in vitrified ice and do not interact with the grid directly. The differences we observed are mainly in the orientations of the PSII cores; all the interactions between PSII subunits within each core are preserved and agree with previously published structures. Since the interactions within a core and between cores involve the same physical principles, we think it is fairly conservative to conclude that the observed core orientations are not an artefact of sample preparation.

      I would further like to encourage the authors to elaborate on the CP29 phosphorylation. What is the proportion of PSIIcomp that are phosphorylated? I assume it is not 100%, as in this case, the authors would propose that this is the effect that modulates between compact and stretched architectures.

      It is difficult to estimate the proportion of observed phosphorylation/sulfinylation. To be detected in the maps, most of the residues (above 50%) are probably modified. We attempted to estimate this by refining the atom occupancies of the phosphate group on Ser84 and of the oxygens attached to Cys218; both values suggested that about 70% of the complexes are modified. Regarding the possibility that these modifications promote the formation of the compact state, we think this is certainly possible, since these modifications were detected in this state and are in close proximity to each other. However, this could also result from the resolution differences between the maps, and the structural implications of both modifications are hard to predict. At this point we prefer to note their existence without further interpretation.

      In line 290, the authors highlight the structural heterogeneity within the two groups' PSII conformations. I would like to see how does the distribution look like for all the structures together: are the two (stretched and compact) specifically forming two heterogenous distributions? Or is it possible that the distribution between the two is quasi-continuous? In other words, if the structures are not perfectly defined, how do the authors decide that two- and not more or less subtypes exist?

      We went back and refined the initial particle group (containing both compact and stretched orientations) using multibody refinement with masks defining the two PSII monomers. This analysis showed the expected two peaks only along the first principal component, which accounted for ~38% of the variance in the dataset.

      Multibody refinement carried out on the combined particle dataset shows one very large PC accounting for about 38% of the variance and the presence of two distinct peaks in the particle distribution of the first PC.

      From this analysis it is clear that there are two distinct classes in this particle set (as expected). As none of the other PCs shows any sign of multiple peaks, this analysis suggests that two distinct models are the best representation of this eukaryotic PSII. Whether these states are quasi-continuous or distinct is a more complex question. There is continuity in this representation (particle distributions along the PC); a different picture may emerge if features such as the CP29 state are considered, but the size of CP29 and the remaining heterogeneity do not currently provide enough signal to carry out this classification.
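      The kind of analysis described above can be illustrated with a minimal Python sketch, assuming numpy: principal component analysis of per-particle rigid-body parameters, reporting the fraction of variance per component and checking whether the scores along the first component split into two groups. The data are synthetic (two hypothetical populations offset along one parameter), the function name is hypothetical, and this is not RELION's multibody implementation.

```python
import numpy as np


def pca(params: np.ndarray):
    """PCA of an (n_particles, n_params) array via SVD.

    Returns (explained_variance_ratio, scores), where scores[:, k] are the
    per-particle amplitudes along principal component k.
    """
    centered = params - params.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2  # proportional to the variance along each PC
    ratio = variance / variance.sum()
    scores = u * s  # projection of each particle onto each PC
    return ratio, scores


# Synthetic stand-in for multibody output: 6 parameters per particle,
# two populations separated by +/-3 units along the first parameter.
rng = np.random.default_rng(0)
n = 1000
params = rng.normal(size=(2 * n, 6))
params[:n, 0] += 3.0
params[n:, 0] -= 3.0

ratio, scores = pca(params)
pc1 = scores[:, 0]
print("variance fraction per PC:", np.round(ratio, 2))
# A bimodal PC1 has few particles near its centre and many near +/-3.
print("near centre:", int(np.sum(np.abs(pc1) < 0.5)))
print("near the two modes:", int(np.sum(np.abs(np.abs(pc1) - 3.0) < 0.5)))
```

      In a real dataset the parameters would be the relative rotations and translations between the two bodies for each particle, and a histogram of the PC1 amplitudes would be inspected for two peaks.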

      Considering the stacked PSII, I also have a few concerns. Contrary to previous studies the authors do not assign a functional role to the stacking beyond the structural aspect. This could be better backed by a discussion about the closest chlorophyll a molecules across the stacked PSII, which given the rather large distance shown in fig. 4L seems to be too large for any EET across the stromal gap.

      The closest Chl-Chl distance that we can measure in the stacked PSII dimer is ~54 Å, with most distances in the ~70 Å range, making EET between stacked complexes very slow. We have added a statement clarifying this to our manuscript. In our opinion, a structural role for the stacked PSII dimer is more likely.
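      The steep distance dependence behind "very slow" can be made explicit with the Förster expression, in which the transfer rate scales as R^-6, so that rate ratios between two distances do not depend on the Förster radius R_0. The 20 Å reference distance below is an arbitrary value chosen only for illustration, not a distance measured in our models:

```latex
k_{\mathrm{EET}}(R) = \frac{1}{\tau_D}\left(\frac{R_0}{R}\right)^{6}
\quad\Rightarrow\quad
\frac{k(R_1)}{k(R_2)} = \left(\frac{R_2}{R_1}\right)^{6}

\frac{k(70\,\text{\AA})}{k(54\,\text{\AA})} = \left(\frac{54}{70}\right)^{6} \approx 0.21,
\qquad
\frac{k(54\,\text{\AA})}{k(20\,\text{\AA})} = \left(\frac{20}{54}\right)^{6} \approx 2.6 \times 10^{-3}
```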

      There is a report that suggests the presence of some density between the stacked PSII - could the authors comment on the differences between it and their work? Are the angles and positions conserved between these types of stacks? https://doi.org/10.1038/s41598-017-10700-8

      We cite Albanese et al. in our manuscript. We isolated the C2S2 complex from a green alga, whereas the analysis in Albanese et al. was done on C2S2M1 complexes from pea, which can account for some of the differences. In any case, we clearly state our conclusion that we find no evidence for protein linkers in the stacked complex. The angles described in Albanese et al. are consistent with our analysis.

      Line 387, the authors state that due to the transient nature of the interactions across the stromal gap, the stacks could be "under-detected" in cryo-ET data. This statement is in my opinion misformulated. First, the transient interaction argument would apply the same (if not more due to changing conditions induced by the purification process) to the single particle analysis performed in this paper. Second, tomographic volumes detect hundreds of PSII in a suspended state. Any transient interaction that adds up to 25% of particle population in a steady state cell should be clearly visible, while the in situ data suggests not more than random cross-stromal-gap orientations. Of course, this can be a specificity of Chlamydomonas or a particular growth condition. The statement used by the authors could be indeed converted into: the PSII stacks are over-detected in vitro, and it is certainly a simpler explanation for their presence. It is also important to mention that PSII stacking alone is not the only reason for grana architecture - stacking with the antenna of larger complexes, absent in the authors' preparation could also contribute to grana maintenance; and auxiliary proteins such as CURT help with this issue as well. Here a recent demonstration of the importance of minor antenna should probably be also cited: https://doi.org/10.1101/2021.12.31.474624

      We used the term “flexible” rather than “transient” to describe the interactions within the stacked PSII dimer; neither our data nor tomographic data contain any temporal component. By “under-detected” we refer to the fact that PSII is mainly detected by its luminal extrinsic subunits. The flexibility detected in our analysis may affect the concurrent visibility of these features in the PSII complexes making up an individual PSII stack. Specifically, Wietrzynski et al. mainly analyzed C2S2M2L2 complexes, while our analysis contained only C2S2 complexes. It is likely that the different amounts of bound LHCII affect PSII stacking as well. For example, Wietrzynski et al. show some overlap between LHCII complexes and little overlap between cores in the larger complexes they analyzed. We observe mainly core-to-core overlap with little LHCII overlap in the smaller C2S2, although we did not observe any states in which LHCs were absent from what appears to be the binding interface. We agree with the reviewer on the relevance of the Lhcb proteins and CURT to stacking but prefer to focus on what was directly demonstrated in our data. We clearly note that we are discussing in vitro results.

      Taking these last thoughts, I would like to finish by mentioning one more thing - almost philosophical. The authors are certainly at the forefront of the booming cryoEM revolution in biology which is profoundly changing the way we understand the living. There is absolutely zero doubt that this powerful technique is of the highest interest. But a growing number of structures of photosynthetic complexes remain puzzling, in particular with regard to their abundance in vivo (such as the PSII stacks) and functional relevance. How do we ascertain that these interactions are not due to in vitro preparation (isolation from cells, solubilisation)? Which ways can we use to try to exclude this (simple) hypothesis? I suggest that at least a small extent of biological replicas - experiments performed on separate batches, in different technical conditions, with slightly altered solubilization conditions, and so on - could shed light on the nature of these structures and their occurrence in vivo. Technical reps of the freezing+analysis pipeline could also be tried to see the variability. This would strongly reinforce this manuscript and its conclusions, and while not completely unequivocal (the stacked PSII, for example, could form upon each purification), a quantification of the effects would be of high interest.

      We certainly share the reviewer's hope of being able to conduct cause-and-effect cryoEM experiments covering a complete set of experimental parameters. This is still beyond reach in terms of time and cost. Within each cryoEM experiment, however, all the analysis is consistent and, more importantly, transparent with regard to image analysis, which is the most important factor in our opinion. Preparation artefacts are always a possibility but, in our opinion, cryoEM is not affected by them differently from other techniques. As we mentioned above, the particles are observed suspended in vitreous ice; this is no different from, and arguably better than, numerous low-temperature spectroscopic observations on samples suspended in a glassy state, or crystals obtained in the presence of high concentrations of various agents. One thing that validates structural studies is the chemical detail (bond lengths, angles, etc.) underlying every model, which is consistent with known values to close tolerances.

      Reviewer #3 (Public Review):

      In this manuscript, Caspy et al. present a detailed structural analysis of eukaryotic photosystem II (PSII) isolated from the green alga Dunaliella salina. By combining single-particle cryo-EM with multibody refinement, the authors not only reveal a high-resolution (2.4Å) structure of the eukaryotic PSII, but also demonstrate alternate conformations and intrinsic flexibility of the overall complex. Stretched and compact conformations of the PSII dimer were readily identified within the single-particle dataset. From this structural analysis, the authors propose that excitation energy transfer properties may be modulated by changes in transfer distance between key chlorophyll molecules observed in different conformational states of the PSII dimer. Due to the high resolution of the maps obtained, the authors identify post-translational modifications and a sodium binding site based on the observed cryo-EM maps. Additionally, the authors analyze PSII complexes in stacked and unstacked configurations, and find that compact and stretched states also exist within the stacked PSII complexes. From their cryo-EM maps, the authors demonstrate that there is no direct protein-protein interaction between stacked PSII complexes, and rather propose a model wherein long-range electrostatic interactions mediated by divalent cations such as magnesium, can facilitate PSII stacking.

      The conclusions and models presented in the manuscript are mostly well justified by the data. The cryo-EM maps are high quality and the models appear generally well refined. However, some aspects of data processing and analysis, as well as the resultant conclusions need to be clarified.

      1) In general, it is not clear from the cryo-EM processing workflow (suppl. Fig 1) or the methods section when exactly symmetry was applied during 3D classification and refinement. In the case of C2S2 unstacked particles, when was symmetry first applied in the overall processing workflow? To identify the compact and stretched configurations of C2S2, did the 3D classification without alignment (and/or the refinement preceding this classification) have C2 symmetry applied? If so, have you considered the possibility that some particles may actually be asymmetric in some regions?

      We modified figure S1 to clearly indicate the use of symmetry and particle expansion. In general, we refined most of the particle sets without symmetry (C1). At the final processing stage of the unstacked PSII sets, after we separated both conformations, we used C2 symmetry to expand the data, this was followed by multibody refinement. No symmetry or symmetry expansion was used for the stacked PSII particle sets.
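      What C2 symmetry expansion does to a particle set can be sketched as follows, assuming RELION's conventions that the C2 axis lies along z and that orientations are stored as ZYZ Euler angles (rot, tilt, psi). Under those assumptions the single non-identity operator, a 180° rotation about z, amounts to offsetting rot by 180°, so each particle appears twice and either monomer can then be treated as the same body in subsequent refinement. The Particle class and the expand_c2 and wrap_deg functions are hypothetical illustrations; in practice this step is performed with relion_particle_symmetry_expand.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Particle:
    """Hypothetical minimal particle record (angles in degrees)."""
    name: str
    rot: float
    tilt: float
    psi: float


def wrap_deg(angle: float) -> float:
    """Wrap an angle into the interval (-180, 180]."""
    wrapped = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped


def expand_c2(particles):
    """Return each particle followed by its symmetry-related copy."""
    expanded = []
    for p in particles:
        expanded.append(p)
        # For a C2 axis along z, the only other equivalent orientation
        # differs from the original by 180 degrees in rot.
        expanded.append(replace(p, rot=wrap_deg(p.rot + 180.0)))
    return expanded


stack = [Particle("p1", rot=10.0, tilt=90.0, psi=45.0)]
for particle in expand_c2(stack):
    print(particle)
```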

      2) Following multibody refinement in Relion individual maps and half-maps for each body will be generated. There is no mention in the methods of how these individual maps for each C2S2 "monomer" were combined to produce an overall map of the dimer following multibody refinement. There are several methods currently used to combine such maps, including taking the maximum or average of the two maps or using a model-based approach in phenix. The authors should be explicit about the method they used, any potential artifacts that may develop from this map combination process, and/or the interface between masks used in multibody refinement.

      We used phenix.combine_focused_maps to combine the maps. This is now indicated in the Methods section.

      3) In addition to the point raised above, following multibody refinement there will be an individual FSC curve and resolution for each body. However, in supplemental figure 2 and supplemental table 1, only a single FSC curve and resolution are reported. Are these FSC curves/resolutions only reported for the better of the two bodies? If not, how was a single resolution calculated for the overall map of combined bodies?

      Both FSC curves were calculated and were highly similar, as expected following C2 expansion. This can also be evaluated from the local resolution maps which are highly similar between the two bodies. The reported resolutions are all taken from the displayed FSC curves generated through relion PostProcess.
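      For reference, the FSC between two half-maps is the normalized cross-correlation of their Fourier transforms within each spatial-frequency shell, and the 0.143 criterion is conventionally used to report resolution. The numpy sketch below is a simplified illustration with hypothetical function names and synthetic maps: it uses unmasked cubic maps and nearest-integer shells and reports the first shell below the threshold without interpolation, unlike relion PostProcess, which also applies masking and a phase-randomization correction.

```python
import numpy as np


def fsc_curve(half1: np.ndarray, half2: np.ndarray) -> np.ndarray:
    """FSC between two cubic half-maps, one value per integer shell below Nyquist."""
    if half1.shape != half2.shape or len(set(half1.shape)) != 1:
        raise ValueError("half-maps must be cubic and of equal shape")
    n = half1.shape[0]
    f1 = np.fft.fftn(half1)
    f2 = np.fft.fftn(half2)
    freq = np.fft.fftfreq(n) * n  # integer frequency indices
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)).astype(int)
    curve = np.zeros(n // 2)
    for s in range(n // 2):
        in_shell = shell == s
        num = np.sum(f1[in_shell] * np.conj(f2[in_shell])).real
        den = np.sqrt(np.sum(np.abs(f1[in_shell]) ** 2)
                      * np.sum(np.abs(f2[in_shell]) ** 2))
        curve[s] = num / den if den > 0 else 0.0
    return curve


def resolution(curve: np.ndarray, n: int, pixel_size: float,
               threshold: float = 0.143) -> float:
    """Resolution (in pixel_size units) at the first shell with FSC < threshold."""
    for s in range(1, len(curve)):
        if curve[s] < threshold:
            return n * pixel_size / s
    return 2.0 * pixel_size  # never dropped below threshold: Nyquist limit


# Synthetic half-maps: a shared random "signal" plus independent noise.
rng = np.random.default_rng(1)
signal = rng.normal(size=(32, 32, 32))
half1 = signal + 0.5 * rng.normal(size=signal.shape)
half2 = signal + 0.5 * rng.normal(size=signal.shape)
curve = fsc_curve(half1, half2)
print(np.round(curve, 2), resolution(curve, n=32, pixel_size=1.0))
```

      Because two bodies related by C2 expansion are refined from the same particle images, their FSC curves are expected to be nearly identical, which is consistent with the observation above.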

      4) One of the major conclusions from the 3D classification and multibody refinement is that conformational changes and inherent flexibility of the PSII dimers have the potential to change distances between cofactors in the complex, ultimately leading to altered excitation energy transfer. However, it is unclear whether or not the authors believe one conformation over another may more readily support the evolution of oxygen. It would be nice if the authors could elaborate slightly upon this topic in the discussion.

      As discussed above, the structural changes associated with the formation of quenching centers are not expected to be detected in the current work. The changes we observe can, however, affect energy transfer to such centers and in doing so may play an important part in PSII biology. We do not detect any changes around the OEC, and we find no reason to think that the two conformations differ with respect to their electron transfer chain (ETC).

      5) Along the lines of point 4 above, on line 95 the authors claim that "the high specific activity of 816 µmol O2/(mg Chl * hr) suggest that" both the C2S2 compact and stretched conformation are highly active. However, it is not clear to me why this measure of specific activity would suggest that both PSII conformations should have "high" activity. Maybe a reference here would help guide readers to previous measures of specific activity?

      Looking at the specific activities reported in previous structural studies of eukaryotic PSII, Sheng et al. (2019) reported a specific activity of 272 µmol O2/(mg Chl * hr); this difference may stem partially from the presence of larger complexes in their preparation, and the value is comparable to the activity we measured in our As fraction (276 µmol O2/(mg Chl * hr), Figure 1-figure supplement 9). Reported specific activity values from plants (Pisum sativum) are also similar: Su et al. reported a maximal value of 288 µmol O2/(mg Chl * hr), again for larger complexes, which can explain some of the difference. However, the specific activity measured for the C2S2 PSII isolated in the current study is 2.8 X higher than this value, exceeding the difference in Chl content, which ranges between 1.5 X and 2 X in favor of the larger complexes. If one of the conformations were less active, the other conformation would have to display an even higher specific activity, which seems unlikely. In addition, we find no difference in either shape or map strength around the oxygen evolution center or in the peripheral luminal subunits; both conformations show highly similar structures in these regions, which determine the oxygen evolution activity.

      6) It is claimed that "more than 2100 water molecules were detected in the C2S2 compressed model", and the water distribution is shown in Figure 3. Obtaining resolutions capable of visualizing waters with cryo-EM is still a significant challenge. Upon visual inspection of the map supplied, it appears that several of the waters that were built into the atomic model simply do not have supporting peaks in the Coulomb potential map above the level of noise. While some of the modeled waters are certainly supported by the map, in my opinion, there are many waters that simply are not, or at best are questionable. What method or tool was originally used to build waters into the model, and how were these waters subsequently validated during structure refinement?

      We followed standard methods for water placement and refinement in the preparation of the model, in addition to manually curating the water structure. However, in light of the reviewer's comment we undertook additional rounds of refinement and inspection of the water molecules in the model. We removed a few hundred water molecules, so that the total number of water molecules is now around 1700. All the water molecules in the present model should be well supported at map values higher than 2.5 sigma, and in our opinion the current water model should be regarded as conservative, underestimating the number of bound water molecules. This also led to some improvements in additional validation statistics of the model, which are listed in Table 1. The new model has been deposited in the PDB and the new PDB validation report is included in our resubmission.

      7) The authors claim to identify several unique map densities during model building. One of these is a sodium ion close to the OEC, which is coordinated by D1-His337, several backbone carbonyls, and a water molecule. When looking closely at the cryo-EM map supplied, it appears that the Coulomb potential map is quite weak for this sodium, and is only visible at quite low contour levels. In fact, the features for the coordinating water, and chloride ions located ~7-9A away are much stronger than the sodium. Do the authors have any explanation for why the cryo-EM map is significantly weaker for the sodium compared to the coordinating water or chloride ions in the same general vicinity? Similar to what they did for the other post-translational modifications, the authors should consider showing the actual cryo-EM map for the bound sodium in supplemental Figure 10 a,b.

      Our main support for the placement of a Na+ ion in this location stems from the analysis of Wang et al. Our maps show a density that is discernible at 4 σ, with an elongated shape suggesting the presence of multiple atoms/waters. Although in principle positive ions should have very strong densities in cryo-EM maps due to their interactions with electrons, other factors such as occupancy, coordination and B-factor also play a role, making the distinction between water and sodium complicated and case specific. The sodium peak is not observed in unsharpened maps (nor are most of the water molecules that occupy conserved positions).

      We collected a few examples from comparable cases (cryo-EM maps in similar resolution ranges) where the presence of sodium ions is highly probable based on additional evidence. These map densities highlight the factors we discussed above. In cases ‘a’ (dual oxidase 1 prepared in high-sodium conditions) and ‘b’ (human voltage-gated sodium channel), Na+ is observed in highly coordinated states and, especially in ‘a’, shows the expected increase in density values compared to water molecules. However, cases ‘d’ (human Na+/K+ P-type ATPase) and ‘e’ (voltage-gated sodium channel) appear very similar to the proposed Na+ assignment in PSII. We conclude that map density alone is not enough to distinguish between Na+ and water molecules, and we rely on the additional experiments described by Wang et al., which show increased PSII activity at elevated Na+ levels under basic conditions.

      8) The cryo-EM maps showing CP29-Ser84 phosphorylation and CP47-Cys218 sulfinylation are quite convincing. However, it is interesting that these modifications are only observed in the compact conformation, and not in the stretched conformation. Can the authors elaborate on whether or not they believe the compact and stretched conformations could be a result of these posttranslational modifications, or vice versa?

      This is an interesting suggestion. In our opinion, it is less likely that the modifications themselves trigger the transition between the compact and stretched states, as it is not clear how these modifications would stabilize the compact vs the stretched state. It is equally likely that these modifications are somehow triggered by the structural change. We cannot be certain that these modifications are absent from the stretched conformation; they may be present but remain unobserved due to resolution differences. The correlation between the states and the post-translational modifications should be verified before discussing their possible roles in the transitions.

      9) Do the authors believe that PSII dimers in the solution can readily interconvert between compact and stretched conformations? Or is the relative ratio of these conformations fixed at the time of membrane solubilization with decyl-maltoside?

      We think it is more probable that the transition between these states occurs in the membrane phase. The main reason is that pigment loss and structural transitions in CP29 are more likely to occur in the membrane than in aqueous/micelle environments.

      10) The model proposed for divalent cation-mediated stacking of PSII dimers is compelling, and seems to be in agreement with previous investigations that observed a lack of stacked dimers in cryo-EM preparations lacking calcium/magnesium. However, my understanding from reading the methods section is that the observed lack of density between the stacked PSII dimers was inferred from maps obtained after multibody refinement. Based on the way the masks to define bodies were created for multibody refinement (Fig. 4A), the region between stacked dimers would be highly prone to map artifacts following multibody refinement. Have the authors looked closely at the interfacial region between stacked dimers following conventional 3D classification/refinement to ensure that there are indeed no features observed in the interfacial region even at low contour levels?

      We made several attempts to resolve differences in the space between the stacked PSII dimers. These included focused classification with masks containing selected volumes from this region, and with masks that include only one of the stacked PSII dimers to avoid signal subtraction in this region. None of these revealed any discernible features in this region. In addition, any stable binding of a bridging protein across the stacked dimers would probably be at least partially visible as additional density relative to the unstacked PSII. We searched for such features and found none.

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses

      The authors' approach, as with traditional approaches to molecular identification of vector species, relies on expert entomologists capable of identifying mosquitoes in the field, who are rare in most places. The authors do not provide citations for the taxonomic keys used for morphological identification, which in many places are outdated or unavailable for specific locations.

      We have added references for taxonomic identification keys in lines 677–679.

      The authors give no explanation as to why they chose rRNA-seq as their method of next-generation sequencing, which is most commonly used for transcriptomics, instead of traditional DNA-based metagenomics which is more commonly used to define community relationships as would be more appropriate for this study.

      We have added a sentence in the Introduction (lines 65–66) to explain why RNA-seq is a frequent choice for surveillance and virus discovery in mosquitoes.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper shows that nuclear pore complex components are required for Kras/p53 driven liver tumors in zebrafish. The authors previously found that nonsense mutation in ahctf1 disrupted nuclear pore formation and caused cell death in highly proliferative cells in vivo. In the absence of this gene, there are multiple mitotic functions involving the nuclear pore that are defective, leading to p53 dependent cell death. Heterozygous fish are viable but have reduced kras/p53 liver tumor growth, and this is associated with multiple nuclear and mitotic defects that lead to cancer cell death/lack of growth. This therapeutic window suggests targetability of this pathway in cancer. I think the data are robust, rigorous, and clearly presented. I believe this in vivo work will encourage therapeutic targeting of NPCs in cancer.

      We are pleased that this reviewer believes that our data are robust, rigorous, and clearly presented and that our in vivo work will encourage therapeutic targeting of NPCs in cancer.

      Reviewer #2 (Public Review):

      Overall this is a very interesting and important paper that demonstrates a novel synthetic interaction between nucleoporin inhibition and oncogene-driven hyperproliferation. This work is especially significant because of the paucity of effective treatments for hepatocellular carcinoma (HCC). The authors' demonstration that the Nup inhibitor Selinexor decreases larval liver size in KRAS-overexpressing zebrafish but does not cause toxicity in wild-type animals lays the groundwork for exploiting this class of drugs in HCC treatment. This paper represents an elegant demonstration of the utility of zebrafish models in cancer studies. The relevance of this work to human cancer is supported by the authors' studies using TCGA data, wherein they demonstrate that decreased NUP expression is associated with increased survival in HCC.

      Other major strengths of the paper include beautiful pictures demonstrating that ahctf1+/- decreases the density and volume of nuclear pores in TO(kras) larvae and increases the rate of multipolar spindle formation, misaligned chromosomes, and anaphase bridges. The experiments are very well-controlled, including detailed analysis of the effects of ahctf1 heterozygosity and Selinexor on wild-type animals. The inclusion of distinct methods for disrupting nucleoporins (ranbp2 heterozygosity and drug treatment) bolsters the authors' conclusion that this represents a viable drug target in HCC.

      My major concerns are as follows:

      1) The authors state that "the beneficial effect of ahctf1 heterozygosity to reduce tumour burden persists in the absence of functional Tp53, due to compensatory increases in the levels of tp63 and tp73". However, tp63 and tp73 appear similarly upregulated in ahctf1 heterozygotes regardless of tp53 status. The authors do not provide enough evidence that tp63 and tp73 are compensating for tp53 loss. An alternative possibility based on the data presented is that the effects of ahctf1+/- are independent of tp53 family members, and the effects on apoptosis go through a different pathway.

      We agree with this reviewer that we did not provide enough evidence that tp63 and tp73 are compensating for tp53 loss. Accordingly, we have addressed this issue comprehensively.

      2) The authors state in multiple locations that nucleoporin inhibition decreases tumor burden. In my opinion, this is not strictly correct. The TO(kras) model clearly results in HCC in adults, but it's a little unclear whether the larval liver overgrowth is truly HCC or not based on the original paper by Nguyen et al. (2012 Dis Model Mech).

      We agree with these comments and accordingly, we performed several new experiments in adult fish.

      Reviewer #3 (Public Review):

      The nuclear transport machinery is aberrantly regulated in many cancers in a context-dependent fashion, and mounting evidence with cultured cell and animal models indicates that reducing the activity or expression of certain nuclear transport proteins can selectively kill cancer cells while sparing nontransformed cells. Here the authors further explore this concept using a zebrafish model for hepatocellular carcinoma (HCC) induced by liver-specific transgenic expression of oncogenic krasG12V. The transgene causes greatly increased liver size by day 7 in larvae, associated with a gene expression profile that resembles early-stage human HCC. This study focuses on Ahctf1, a nuclear pore complex (NPC) protein known to be essential for postmitotic NPC assembly. Using the krasG12V background, the authors analyze animals that are heterozygous for a recessive mutation in the ahctf1 gene that leads to ~50% reduction in ahctf1 mRNA (and likely the encoded protein). The authors show that the ~4-fold increase in liver volume of krasG12V animals is reduced by ~1/3 in the ahctf1 heterozygous mutants. This is associated with increased apoptosis, decreased DNA replication, up-regulation of pro-apoptotic and cdk-inhibitor genes, and down-regulation of anti-apoptotic genes. These effects were found to be substantially Tp53-dependent. Consistent with previous Ahctf1 depletion studies, hepatocytes of ahctf1 heterozygotes show decreased NPC density at the nuclear surface, elevated levels of aberrant mitoses and increased DNA damage/double stranded breaks. Finally, the authors show that combining the ahctf1 heterozygous mutant with a heterozygous mutation in another NPC protein (RanBP2), or treating animals with a chemical inhibitor of exportin-1 (Selinexor), can further reduce liver volume. Overall they suggest that combinatorial targeting of the nuclear transport machinery can provide a therapeutic approach for targeting HCC.

      This is an interesting study that bolsters the notion that reduction in the levels of discrete nucleoporins (and/or inhibiting specific nuclear transport pathways) can result in cancer cell-selective killing. Moreover, the work extends previous studies involving cultured cell and mouse xenografts to a new cancer model (HCC) and nucleoporin (Ahctf1). Whereas the authors describe multiple aberrant cellular phenotypes associated with the dosage reduction in ahctf1, the exact causes for reduction in liver size in the krasG12V model remain unclear. Although it would be desirable to parse effects of Ahctf1 related to NPC number, aberrant mitoses, licensing of DNA replication and chromatin regulation, this is a tall order at present, given the limited understanding of Ahctf1. However, useful insight on these and related questions could be gained with further analysis of the system as outlined below.

      We are pleased this reviewer thinks this is an interesting study that bolsters the notion that reduction in the levels of discrete nucleoporins (and/or inhibiting specific nuclear transport pathways) can result in cancer cell-selective killing. This reviewer also suggests that useful insight on these and related questions could be gained with further analysis of the system as outlined below:

      1) In the krasG12V model, it would be helpful to distinguish the contribution of increased cell death vs decreased cell proliferation to the change in liver size seen with heterozygous ahctf1. Is this predominantly due to decreased proliferation?

      We think this question is difficult to address, because the relative contributions of the two processes may vary with time. Our data show definitively that by 7 dpf, the impact of ahctf1 heterozygous mutation has disrupted multiple cellular processes, leading to a 40% increase in the number of hepatocytes expressing Annexin 5 (dying cells), and a 40% decrease in the number of hepatocytes incorporating EdU over a 2 h incubation (fewer cells in S-phase). Both responses are likely to contribute to the reduction in liver volume observed in response to ahctf1 heterozygosity. It is worth stating that in our experiments, we captured snapshots of apoptosis and DNA replication in the livers of larvae at 7 days post-fertilisation after 5d of dox treatment/KrasG12V expression. To answer the Reviewer’s question properly, we would need to monitor the behaviour of individual cells over time. If such experiments were technically possible, we think that some cells that undergo growth arrest in response to dox treatment might ultimately succumb to apoptosis (unless dox treatment is withdrawn) while other cells might enter into a state of prolonged senescence. However, given the technical challenges, we did not attempt to test this in the current manuscript.

      2) It would be good to know whether the heterozygous ahctf1 state blunts the overall level of Ras activity in krasG12V animals.

      We have addressed this interesting question thoroughly in new Fig. 1g, h. To do this, we used a commercial RAS-RBD pulldown kit followed by western blot analysis to determine the levels of activated GTP-bound Kras protein. Our results demonstrate that the levels of GTP-bound Kras protein, expressed as a proportion of total Kras protein, do not change in response to ahctf1 heterozygosity. We conclude from these data that the potential therapeutic value of reduced ahctf1 expression in a cancer setting is not caused by inhibiting Kras activity.

      3) Notwithstanding the analysis of Tp53 target genes presented in this study, it would be helpful to see detailed transcriptional profiling of hepatocytes in the krasG12V model with the heterozygous ahctf1 mutation, and to assess the effects of Selinexor. GSEA type analysis offers a way to start untangling the effects of these pathways. Moreover this analysis could provide insight on the relevance of this model to human HCC.

      We used RNAseq to address the relevance of our larval model to human HCC. Specifically, we performed differential gene expression analysis to identify up- and downregulated genes in cohorts of ahctf1+/+ (WT) larvae versus dox-treated ahctf1+/+(WT);krasG12V larvae. We used gene set enrichment analysis to compare these differentially regulated transcripts with the gene expression signature of 369 patient samples in the Liver hepatocellular carcinoma (LIHC) dataset versus healthy liver samples in the TCGA. These analyses revealed a significant association between the patterns of gene expression in our larval model of zebrafish HCC and those of human HCC (Fig. 1-figure supplement 1c, d).

      The genetic experiments we report in Figures 4, 5, 6 show that WT Tp53 is required for the reductions in liver enlargement (Fig. 4), apoptosis (Fig. 5) and DNA replication (Fig. 6) that occur in response to ahctf1 heterozygosity in dox-treated krasG12V larvae. We also used RT-qPCR to show that a Tp53-mediated transcriptional program was activated in these ahctf1 heterozygous livers (Fig. 5). Similarly, in adult livers, ahctf1 heterozygosity triggered the upregulation of Tp53 target genes, including pro-apoptotic genes (pmaip1, bbc3, bim and bax) and cell cycle arrest genes (cdkn1a and ccng1) (new Fig. 6-figure supplement 1). These results show that obtaining the full potential of ahctf1 heterozygosity in reducing growth and survival of KrasG12V-expressing hyperplastic hepatocytes requires activation of WT Tp53. This is an important conclusion from our paper that is likely to be relevant in a clinical setting, for instance in patient selection, if ELYS inhibitors are developed for the treatment of HCC in which the KRAS/MAPK pathway is activated.

      Also, one reviewer mentions performing genome-wide transcriptional profiling of hepatocytes in the krasG12V model in response to ahctf1 heterozygosity and the presence and absence of Selinexor treatment. While these are potentially interesting experiments, they are substantial in nature and not crucial for the main messages of our paper. Therefore, we respectfully contend that they are beyond the scope of the current manuscript.

      4) Functions of Ahctf1 in regard to chromatin regulation could be compromised in this model. Scholz et al (Nat Gen 2019) report that Ahctf1 is involved in increasing Myc expression via a gene-gating mechanism. It would be good to know what the effects are in this system.

      The Scholz, 2019 and Gondor, 2022 papers from the same group are very interesting in that they demonstrate a novel role for the ELYS protein in addition to the ones we pursued in our paper. The authors showed that in HCT116 cells, a human colorectal cancer cell line in which proliferation is driven by aberrant WNT/CTNNB1 signalling, the longevity of nascent MYC mRNA was increased by accelerating its movement from the nucleus to the cytoplasm, thereby preventing its degradation by nuclear surveillance mechanisms. The authors showed that siRNA knockdown of AHCTF1 in HCT-116 cells reduced the rate of nuclear export of MYC transcripts without changing the transcriptional rate of the MYC gene. They proposed a mechanism that depended on the formation of a complex chromatin architecture comprising transcriptionally active MYC and CCAT1 alleles plus proteins including β-Catenin, CTCF and ELYS. Together these interacting components guided nascent MYC mRNA molecules to nuclear pores and enhanced their export to the cytoplasm to be translated, resulting in activation of a MYC transcriptional program that induced expression of pro-proliferation genes.

      In theory, this role of ELYS in protecting MYC from nuclear degradation might extrapolate to other cancer settings where MYC expression is elevated. While interplay between MYC and mutant KRAS to enhance cancer growth has been previously reported, to date, most emphasis on this interaction has focused on the role of mutant KRAS in increasing the stability of the MYC protein, for example via RAS effector protein kinases (ERK1/2 and ERK5) that stabilise MYC by phosphorylation at S62 (Farrell and Sears, 2014: https://doi.org/10.1101/cshperspect.a014365) (Vaseva and Blake 2018: DOI:https://doi.org/10.1016/j.ccell.2018.10.001). While we appreciate the novelty of the recent papers, the current findings are limited to β-Catenin-activated HCT-116 cells and may not be relevant to our zebrafish model of mutant Kras-driven HCC. Accordingly, we have not allocated a high priority to following this up in our current manuscript.

      6) The synthetic lethality argument pressed in this manuscript seems exaggerated. Standard anti-cancer treatments typically target several cellular pathways, and nucleoporins directly affect a multiplicity of pathways besides nuclear transport.

      While we do not disagree that standard anti-cancer treatments may target several cellular pathways, we believe our data are consistent with the accepted definition of a synthetic lethal interaction whereby single mutations in two separate genes (kras and ahctf1) cooperate to cause cell death, whereas cells harbouring just one of these mutations are spared.

    1. Author Response

      Reviewer #1 (Public Review):

      1) Context and definitions for stochasticity and heritability: The authors provide well-referenced introductions and explanations throughout the manuscript. However, key concepts underlying their central hypothesis on transient heritability are not introduced until well into the Results section (Lines 215-227), leaving the introduction somewhat unclear on the authors' thinking and motivation. The manuscript would benefit by including clear definitions of "stochastic", "transiently heritable", and "heritable" and their relationships to "intrinsic" and "deterministic" in the introduction.

      Regarding the first point, we agree it is important to include clear definitions early. Therefore, we added much more detail to the introduction (see tracked changes), including the following definitions and additional explanations:

      Multilayered stochasticity: “stochasticity originating from different levels over the course of an infection.“

      “Importantly, although the terms stochasticity and determinism seem highly dichotomous, deterministic features (e.g., epigenetic regulation) are often, if not always, stochastically regulated (Zernicka-Goetz and Huang, 2010). However, in cellular decision-making, the major difference between a stochastic process and a deterministic process boils down to the effects of (varying) inputs on dictating (varying) outputs. In fact, a stochastic process is characterized by the exact same stimulus leading to varying response outcomes, often as a result of varying host-intrinsic factors (Symmons and Raj, 2016). In contrast, a deterministic process is characterized by an outcome (e.g., IFN-I production) that is fixed, or at least fixed to a large degree, while the input can be variable. How cells are epigenetically predisposed, in turn, can again be a stochastic process, similar to the fundamentals of developmental biology in which cells are randomly pushed towards deterministic outcomes (Zernicka-Goetz and Huang, 2010).”

      “Transient heritability refers to heritable epigenetic profiles [e.g., profiles encoding cellular fates for the production of IFN-Is] that only transfer over a couple of generations, as observed across cell types and systems including cancer drug resistance (Shaffer et al., 2020), cancer fitness (Fennell et al., 2022; Oren et al., 2021), NK cell memory (Rückert et al., 2022), HIV reactivation in T cells (Lu et al., 2021), epithelial immunity (Clark et al., 2021), and trained immunity (Katzmarski et al., 2021).”

      “Besides a growing body of evidence on the role of transient heritable fates dictating cellular behaviors, the effects of population density, often referred to as quorum sensing, are becoming increasingly established for immune (signaling) systems (Antonioli et al., 2019; Polonsky et al., 2018; Van Eyndhoven and Tel, 2022). On top of the intrinsic features characterized by stochasticity and determinism, individual immune cells can communicate in various ways to elicit appropriate systemic immune responses. Typically, cytokine-mediated communication is categorized into two types: autocrine and paracrine signaling. Autocrine signaling is defined by cells secreting signaling molecules while simultaneously expressing the cognate receptor. Paracrine signaling is defined by cells either secreting signaling molecules without expressing the cognate receptor, or cells expressing the receptor without secreting the molecule. In essence, quorum sensing can be considered a phenomenon in which autocrine cells determine their population density based on cells engaging in neighbor communication, but without self-communication (Doğaner et al., 2016; Van Eyndhoven and Tel, 2022). Especially in the presence of other competitive decision makers [i.e., cytokine consumers and producers], it is critical for individual cells to assess cellular density, and act accordingly (Oyler-Yaniv et al., 2017).”

      2) Generalizability of findings to other cell types, systems, and triggers: The cell line and Poly(I:C) delivery method used by the authors lack sufficient characterization to extend the conclusions derived from their use. Notably, the NIH3T3-IRF7-CFP cell line expresses IRF7 constitutively and thus may only be a good model for cells with similar expression levels; many primary cells only express IRF7 at low levels or not at all until stimulated (PMID: 2140621). The conclusions would be greatly strengthened by demonstrating similar first responder dynamics/heritability in other cell types. The experiments measuring the efficiency of Poly(I:C) delivery by transfection lack sufficient resolution to determine if the Poly(I:C) is intracellular or membrane bound. IFN-I response kinetics, and potentially quality, would likely be distinct between cytosolic and endosomal sensing and may impact the likelihood of becoming a first responder.

      Regarding the generalizability of findings to other cell types, systems, and triggers, we thank reviewer 1 for bringing up this crucial point. Regarding IRF7 expression, IRF7 is expressed at low levels in most cells and is strongly induced by type I IFN-mediated signaling (Marie et al., 1998; Sato et al., 1998b; Honda et al., 2006). Our use of the word “constitutively” refers to the IRF7 molecules always being fluorescent, not to IRF7 always being highly expressed in these cells. Therefore, NIH3T3 cells are similar to all other cells except plasmacytoid dendritic cells, which are known for their high background levels of IRF7. We changed the revised manuscript accordingly:

      “Accordingly, we used a NIH3T3:IRF7-CFP reporter cell line, expressing low, physiological background levels of IRF7-CFP fusion proteins, to monitor signaling dynamics during early phase IFN-I response dynamics (Figure 1b).”

      Regarding the comparison with other cell types, we emphasized the similar responder numbers observed in plasmacytoid dendritic cells (an argument that intrinsic differences in background IRF7 levels do not determine responders). We changed the revised manuscript accordingly:

      “In short, IFN-I responses are elicited by fractions of so-called first responding cells, also referred to as ‘precocious cells’ or ‘early responding cells’, which start the initial IFN-I production upon viral detection, both validated in vitro, in vivo, and across cell types (Bauer et al., 2016; Hjorton et al., 2020; Patil et al., 2015; Shalek et al., 2014; Van Eyndhoven et al., 2021a; Wimmers et al., 2018).”

      “This percentage is in line with what has been found across literature, species [i.e., human and mice] and cell types [i.e., fibroblasts, monocyte derived dendritic cells, plasmacytoid dendritic cells], which ranges from 0.8 to 10% of early responders, emphasizing the elegant yet robust feature of only a fraction of first responding cells driving the population-wide IFN-I system (Bauer et al., 2016; Drayman et al., 2019; Patil et al., 2015; Shalek et al., 2014; Van Eyndhoven et al., 2021a; Wimmers et al., 2018).”

      Regarding the hypothesis brought up by the reviewer on the role of cytosolic versus endosomal sensing impacting IFN-I response kinetics, and potentially quality, we hypothesize otherwise. Shalek and colleagues tested LPS (TLR4 ligand), PIC (TLR3 ligand, endosomal), and PAM (TLR2 ligand), all eliciting similar early responding cells, which they called precocious cells. This argues that the phenomenon of first responders is independent of the type of stimulation. Besides, for plasmacytoid dendritic cells, both R848 (TLR7/8 ligand), and CpG-C (TLR9 ligand) elicit very similar early IFN-I responses. In contrast, R848 and CpG-C elicit very different late IFN-I response dynamics, reflected by the fraction and activation dynamic of second responders (yet unpublished). We clarified accordingly:

      “Moreover, various stimuli (live and synthetic) targeted membrane, cytosolic, and endosomal receptors, arguing that the mode of activation is not driving the discrepancies in responder fates.”

      3) Epigenetic regulation of transient heritability: To test the contribution of epigenetic regulation on first responder fate, the authors treat their cells with DNMTi. While treatment with this drug does increase the proportion of first responder cells, the authors don't provide evidence that the mechanism of action is mediated by inhibition of DNA methylation. This is further confounded by the reduced responder frequencies in DNMTi-treated cells transfected with Poly(I:C) (Fig 4g). The authors offer an explanation for this observation, but their reported data (Fig 4h) don't measure whether DNMTi leads to latent retrovirus activation, broader demethylation, or a combination of the two.

      We are well aware that our hypothesis on retrovirus activation is inconclusive. Unfortunately, we currently do not have access to the tools needed to properly assess this hypothesis, and can therefore only speculate. However, we were able to assess the effects of a different epigenetic drug [i.e., an HDACi], as suggested later by the reviewer. Therefore, to strengthen our data interpretation, we added the following information and experimental data to the revised manuscript:

      “Treatment with varying doses and durations of trichostatin A, a histone deacetylase inhibitor (HDACi), also increased the number of responding cells (Supplementary Figure 5).”

      “The rather long timescales of switching from responders to non-responders, and the other way around, imply epigenetic mechanisms at play, and indeed, prior work has indicated an important role for epigenetics dictating IFN-I response dynamics (reviewed in (Barrat et al., 2019)).”

      “Both methylation and histone acetylation have been implicated in dictating transient heritable cellular fates (Clark et al., 2021; Lu et al., 2021; Shaffer et al., 2020).”

      4) Temporal experimental data to validate and extend transient heritability and quorum sensing: Developing a model for cellular decision-making during early IFN-I responses, the authors formalize and test the hypothesis of transient heritability. While the data largely fit the proposed model (Fig 6D-F), the reported data points lack sufficient temporal resolution to validate the model during the earlier and more variable generations. Given that by generation 9 variability in first responder frequency has almost stabilized, there is only one data point (generation 6) to evaluate the fit of the ODE described. More densely sampled data points below generation 10 are necessary to validate the model. Moreover, a discussion of Kon calculation/observation, meaning, and validation is missing. To partially test their claim that Kon is a function of density (i.e., quorum sensing), the authors plate cells at different densities and measure the responder frequency at generation 6. This analysis lacks contextualization of other autocrine and paracrine signals potentially impacting IFN-I response. Moreover, these signals will be diverse in different cell types and could impact Kon and/or the overall model.

      We agree that our first model validation was suboptimal, indeed because it lacked sufficient temporal resolution. Therefore, we performed additional experiments on clones of generations 1, 2, 3, 4, and 5, and the results turned out to be remarkably robust. We changed the revised manuscript accordingly:

      “Surprisingly, the data obtained from clones of generations one through nine yielded a mean higher than 2.134% (Figure 6d; Supplementary Figure 9) and a fluctuating CV (Figure 6e). From generation 13 onwards, both the mean and the CV start to match the data obtained from the regular cultures again, which are similar to the theoretical outcomes of a stochastic process. Accordingly, we modeled first responders as a binary switch, where individual cells are either responding (ON) or nonresponding (OFF), similar to the transient heritable fates characterized and modeled before (Shaffer et al., 2020). Details on the ODE model are provided in the Materials and Methods section. We could fit the transient heritability model to the data when starting from 100% responders at generation zero [i.e., a single cell isolated from the regular culture]. Cells switch OFF after 5 generations on average, with a constant kon rate throughout. Interestingly, in generation zero we observed (nearly) only IFN-I responders, which we believe might be caused by single cells being deprived of any paracrine cues, which could include inhibitory factors that normally limit responsiveness. However, single IFN-I-producing cells [i.e., plasmacytoid dendritic cells and monocyte derived dendritic cells] encapsulated in picoliter droplets or captured in small microfluidic chambers did not display this behavior (Shalek et al., 2014; Wimmers et al., 2018). Instead, one could argue that single cells establish a different microenvironment compared to cells that are close to neighboring cells, which elicits behavioral changes accordingly. The dimensions of microfluidic droplets and chambers are in the same range as cell-to-cell contacts in vitro, while single cells seeded for cloning are surrounded by large areas and volumes without other cells present.
      Therefore, we hypothesize that these single cells lack biochemical, and perhaps biomechanical, cues provided by dense cell populations, which results in behavioral changes in these cells, in our case making them more responsive. Similarly, in quorum sensing, cells secrete soluble signaling molecules (called autoinducers), which enable cells to sense their cell density (Postat and Bousso, 2019; Waters and Bassler, 2005). In the absence of these molecules, cells perceive themselves as isolated. In microfluidic droplets and chambers, these molecules accumulate, given the relatively small volumes.”
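      To make the binary-switch picture concrete, a generic two-state (ON/OFF) model can be written as dR/dg = kon*(1 - R) - koff*R, where R is the responder fraction and g the generation. The sketch below is not the authors' ODE (theirs is given in their Materials and Methods); it assumes continuous first-order switching, koff = 1/5 per generation (taken from the stated average ON duration of 5 generations), and a hypothetical kon chosen so that the steady state equals 2.134%, which we assume corresponds to the regular-culture mean. The function and constant names are illustrative only.

```python
import math


def responder_fraction(generation: float, k_on: float, k_off: float,
                       r0: float = 1.0) -> float:
    """Fraction of ON (responding) cells after `generation` generations.

    Analytical solution of dR/dg = k_on * (1 - R) - k_off * R with R(0) = r0.
    k_on: OFF -> ON switching rate per generation (assumed constant).
    k_off: ON -> OFF switching rate per generation.
    """
    k_total = k_on + k_off
    r_ss = k_on / k_total  # steady-state responder fraction
    return r_ss + (r0 - r_ss) * math.exp(-k_total * generation)


# Assumed parameters (illustrative, not fitted to the authors' data):
K_OFF = 1 / 5  # mean ON duration of ~5 generations
TARGET_SS = 0.02134  # assumed regular-culture responder fraction
K_ON = TARGET_SS * K_OFF / (1 - TARGET_SS)  # gives k_on / (k_on + k_off) = TARGET_SS

if __name__ == "__main__":
    # Start from a single responding cell (100% ON at generation zero).
    for g in range(0, 16, 3):
        print(g, round(100 * responder_fraction(g, K_ON, K_OFF), 2))
```

In this simple form the relaxation rate toward the steady state is kon + koff, so with koff dominating, most of the decay from 100% happens within the first ~10 generations, which is the window the reviewer asked to sample more densely.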

      Regarding the contextualization of autocrine and paracrine signaling impacting IFN-I response dynamics in these studies, we added the following additional information:

      “On top of the intrinsic features characterized by stochasticity and determinism, individual immune cells can communicate in various ways to elicit appropriate systemic immune responses. Typically, cytokine-mediated communication is categorized into two types: autocrine and paracrine signaling. Autocrine signaling is defined by cells secreting signaling molecules while simultaneously expressing the cognate receptor. Paracrine signaling is defined by cells either secreting signaling molecules without expressing the cognate receptor, or cells expressing the receptor without secreting the molecule. In essence, quorum sensing can be considered a phenomenon in which autocrine cells determine their population density based on cells engaging in neighbor communication, but without self-communication (Doğaner et al., 2016; Van Eyndhoven and Tel, 2022).”

      We agree that signals will differ between cell types and could impact Kon and/or the overall model, but we expect this effect to be minor. Moreover, the model can easily be adjusted to cell-type-specific parameters (see Saint-Antoine et al., 2022).

      Reviewer #3 (Public Review):

      1) For the small fraction of cells that respond in the absence of Poly(I:C), are these cells just showing IRF7 translocation or are they fully responding with IFNB production? Has this been observed in other experimental systems or contexts? Do you also observe secondary responders in the unstimulated samples (as shown in the stimulated in Fig. 2G-I)?

      Regarding the first point, on the translocated cells in unstimulated samples: this is an excellent point. Although we have not validated it experimentally, we hypothesize that cells are able to produce constitutive levels of IFN-Is, as thoroughly described in the literature, so we assume that these translocated cells produce IFN-Is. We added further speculation to the revised manuscript:

      “Besides, the background numbers of translocated cells possibly reflect the intrinsic feature of the IFN-I system to ensure basal IFN-I expression and IFNAR signaling to equip immune cells to rapidly mobilize effective antiviral immune responses, and homeostatic balance through tonic signaling (Gough et al., 2012; Ivashkiv and Donlin, 2014).”

      2) Do the second responders only arise through direct IFN-I production by first responders? Is it possible that this response has any relationship with the initial transfection with Poly(I:C)?

      From the droplet-based experiments with plasmacytoid dendritic cells performed before (Wimmers et al., 2018; Van Eyndhoven et al., 2021), we could conclude that the second responders indeed required the activation and subsequent early IFN-I production of first responders. Because droplet-based microfluidics is a very stable and controlled method, producing thousands of homogeneous droplets, we concluded that the difference between first and second responders is not caused by variations in activation (e.g., transfection discrepancies).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors use their expertise in live-cell imaging and mathematical modeling to further explore the relationship between chromatin structure, gene positioning and transcriptional coregulation. One of the strengths of the manuscript arises from the authors' analysis of two publicly available datasets encompassing chromatin tracing and transcriptional activity. Using spatial analysis and modeling, the authors have impressively extended the findings of Su et al., Cell 2020, who generated the analyzed dataset. A number of important concepts were explored, including (1) whether genes reposition upon activation and (2) whether spatial proximity can be correlated with transcriptional co-regulation. In general, the authors' conclusions are supported by their findings and should provide a blueprint for analysis of additional related big imaging datasets in the future.

      However, there are a number of weaknesses, including lack of statistical analysis or incomplete description (e.g. bootstrapping parameters, statistical methods, number of genes/cells/measurements, etc.) in some figures, that make it difficult to interpret the significance of the trends. In addition, the modeling using live-cell studies is generalized based on the behavior (e.g. diffusion) of a single gene. The manuscript is densely written in a way that may be inaccessible to non-specialists. A final schematic model that summarizes the biological findings would help alleviate this weakness.

      We are glad that the reviewer considers the observed phenomenon important and that our conclusions are supported by our findings. We implemented changes in response to each of the above requests:

      1) we added additional explanation of test statistics;

      2) we analyzed diffusion of additional genes;

      3) we tried to simplify the text;

      4) we added a final schematic.

      Reviewer #2 (Public Review):

      In their manuscript, Bohrer and Larson reanalyse previously published imaging datasets in order to tackle a long-standing question in modern genome biology: does the physical proximity of transcribed genes correlate with their co-expression?

      The authors start off by reanalysing fixed-cell data, in which they find that active genes (i.e., any gene with RNA FISH signal) often reposition towards the centroid of the imaged chromatin environment once transcriptionally active. The analysis is straightforward, but the notion of "closer to the centroid" remains a bit vague to me, and its functional significance is not well defined. There is no doubt about the clear trend in the analysed data, but the interpretation could be strengthened.
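      One way to make "closer to the centroid" concrete is sketched below, assuming each chromatin trace is an (n_segments x 3) array of coordinates with NaN rows for undetected segments, and that the centroid is the mean position of all detected segments in that trace. The function names are hypothetical, and the reanalysed study's exact definition may differ.

```python
import numpy as np


def distance_to_centroid(trace_xyz, locus_index):
    """Euclidean distance from one traced locus to the centroid of its trace.

    trace_xyz: (n_segments, 3) coordinates for one chromatin trace in one
    cell; NaN rows (undetected segments) are ignored when computing the
    centroid. Assumes the locus itself was detected.
    """
    trace_xyz = np.asarray(trace_xyz, dtype=float)
    centroid = np.nanmean(trace_xyz, axis=0)
    return float(np.linalg.norm(trace_xyz[locus_index] - centroid))


def mean_distance_by_state(traces, locus_index, is_active):
    """Mean locus-to-centroid distance in active vs. inactive cells.

    traces: one (n_segments, 3) array per cell.
    is_active: per-cell booleans (e.g. RNA FISH signal at the locus).
    Returns (mean_active, mean_inactive).
    """
    d = np.array([distance_to_centroid(t, locus_index) for t in traces])
    is_active = np.asarray(is_active, dtype=bool)
    return float(d[is_active].mean()), float(d[~is_active].mean())
```

A smaller mean distance in active cells corresponds to the "repositioning towards the centroid" trend; reporting the number of cells per group alongside it would address the reviewer's request for sample sizes.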

      We tried to clarify this part of the text and also added a schematic illustration to the discussion to help clarify this important point (Fig. 5).

      Then, using the same dataset, the question of physical gene proximity is addressed. This is not only an important and timely question, but also one that the authors address very nicely. They deduce that when a pair of loci is brought into sufficiently close physical 3D proximity (unrelated to their genomic distance), the loci are more likely than expected to be co-expressed. In cis, this proximity can be defined as approx. <2.5 Mb of genomic separation. They also looked in trans, via a complex transfer of knowledge from live-cell imaging to the fixed-cell dataset, to show that genes brought within approx. 400 nm of one another display quite a high co-expression correlation. Despite the parsimonious nature of the model and assumptions that the authors use for this (testing more complex parameters might prove beneficial here), their postulations can quite adequately explain observations published by others that were previously left largely without interpretation.
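      The core comparison, co-expression of two loci in cells where they are close versus far apart, can be sketched as follows. This is an illustration, not the authors' pipeline (which transfers live-cell parameters to the fixed-cell data); it assumes per-cell binary activity calls and per-cell 3D distances, uses the Pearson correlation of 0/1 vectors (the phi coefficient) as the co-expression measure, and takes the 400 nm cutoff from the text. Names are hypothetical.

```python
import numpy as np


def coexpression_by_proximity(active_a, active_b, distance_nm, cutoff_nm=400.0):
    """Phi correlation of two loci's ON/OFF states, split by 3D distance.

    active_a, active_b: per-cell booleans (e.g. RNA FISH signal present).
    distance_nm: per-cell 3D distance between the two loci.
    Returns (phi_near, phi_far) for cells below / at-or-above the cutoff.
    Each group needs variation in both loci, otherwise the result is NaN.
    """
    a = np.asarray(active_a, dtype=float)
    b = np.asarray(active_b, dtype=float)
    near = np.asarray(distance_nm, dtype=float) < cutoff_nm

    def phi(mask):
        # The Pearson correlation of two 0/1 vectors equals the phi coefficient.
        return float(np.corrcoef(a[mask], b[mask])[0, 1])

    return phi(near), phi(~near)
```

Comparing phi_near against phi_far (or against a distance-shuffled null) is one simple way to ask whether proximity below the cutoff is associated with coordinated expression.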

      In my opinion, the main strength of this manuscript lies with the initial analysis of the fixed-cell data and the clear trends therein. The latter part, which nicely identifies caveats in available data and analyses and which makes a solid effort to combine live-cell with fixed-cell data, leaves more scenarios to be tested. Nevertheless, based on the outcome of this analysis (mostly found in Fig. 4), the value of ~400 nm as a physical proximity cutoff for co-expression is reasonable (based on previous knowledge) and does provide a solid first step in the direction of deciphering the rules that allow coordinated gene expression in mammalian cells.

      We agree that the modelling section is more of a first step and that future work will need to be done to investigate further. In the revision, we make this point explicit within the main text (see below).

      Overall, this is a conceptual advance of merit that can re-shape ways of approaching the still-open issue of gene co-bursting in light of novel (mostly imaging) technologies.

      We appreciate the comment.

    1. Author Response

      Reviewer #2 (Public Review):

      This paper by Angueyra, et al., adds to the field’s current understanding of photoreceptor specification and factors regulating opsin expression in vertebrates. Current models of specification of vertebrate photoreceptors are largely based on studies of mammals. However, a great number of animals including teleosts express a wider array of photoreceptor subtypes. Zebrafish for example have 4 distinct cone subtypes and rods. The approach is sound and the data are quite convincing. The only minor weaknesses are that the statistical analyses need to be revisited and the discussion should be a bit more focused.

      To identify differentially expressed transcription factors, the authors performed bulk RNA-seq of pooled, hand-sorted photoreceptors. The selection criterion was tightly controlled to limit unhealthy cells and cellular debris from other photoreceptor subtypes. The pooling of cells provided a considerable depth of sequencing, orders of magnitude better than scSeq. The authors identified known transcription factors and several that appear to be novel or whose role has not been determined. The data are made available on the PI's website, as is a program to access and compare the gene expression data.

      The authors then used CRISPR/Cas9 gene targeting of two known and several novel factors identified in their analysis to test for effects on cell fate decisions and opsin expression. Phenotyping performed on the injected larvae is possible, and the target genes were amplified and sequenced to demonstrate the efficiency of the gene targeting. Targeting of 2 genes with known functions in photoreceptor specification in zebrafish, Tbx2b and Foxq2, resulted in the anticipated changes in cell fate, although the strength of the alterations in cell fate in the F0 larvae appears to be less than the published phenotypes for the inherited alleles. Interestingly, the authors also identified expression of an RH2 opsin in SWS2 cones, another cone type. The changes are subtle but important.

      The authors then targeted tbx2a, the function of which was not known. The result is quite interesting as it matches the increase of rods and decrease of UV cones observed in tbx2b mutants. However, the injected animals also showed RH2 opsin expression, but now in the LWS cone subtype. These data suggest that Tbx2 transcription factors repress misexpression of opsins in the wrong cell type.

      The authors also show that targeting additional differentially expressed factors does not affect photoreceptor fate or survival in the time frame investigated. These are important data to present. For these or any of the other targeted genes above, did the authors test for changes in photoreceptor number or survival?

      We have attempted to address this point, but the answer is not clear-cut. We used activated caspase-3 immunolabeling as a marker of apoptosis (Lusk and Kwan 2022). At 5 dpf, the age we chose to make quantifications, we don’t see an increase in activated caspase-3 positive cells when we compare control and tbx2a F0 mutants (Reviewer Figure 1A-B). Labeled cells are very rare and located near the ciliary marginal zone irrespective of genotype. This suggests that there is no detectable active death at this late stage of development in tbx2 F0 mutants. Earlier in development, at 3 dpf, when photoreceptor subtypes first appear, there is also a normal wave of apoptosis in the retina (Blume et al. 2020; Biehlmaier, Neuhauss, and Kohler 2001), resulting in many cells positive for activated caspase-3; our preliminary quantifications don’t show a marked increase in the number of labeled cells in tbx2a F0 mutants, but we consider it likely that subtle effects might be obscured by the physiological wave of apoptosis (Reviewer Figure 1C-D).

      Reviewer Figure 1 - Assessment of apoptosis in tbx2a F0 mutants. (A-B) Confocal images of 5 dpf larval eyes of control (A and A’) and tbx2a F0 mutants (B and B’) counterstained with DAPI (grey) and immunolabeled against activated Caspase 3 (yellow) show sparse and dim labeling, restricted to cells located in the ciliary marginal zone, without clear differences between groups. (C-D) Confocal images of 3 dpf larval eyes of control (C and C’) and tbx2a F0 mutants (D and D’) immunolabeled against activated Caspase 3 show many positive cells, located in all retinal layers, as expected from physiological apoptosis at this stage of development and without clear differences between groups.

      Furthermore, the additional single-cell RNA-seq datasets we have reanalyzed suggest that tbx2a and tbx2b are expressed by other retinal neurons and progenitors and not just photoreceptors (Reviewer Figure 2), further confounding attempts at the quantification of apoptosis specifically in photoreceptor progenitors.

      Reviewer Figure 2 – Expression of tbx2 paralogues across retinal cell types. The transcription factors tbx2a and tbx2b are expressed by many retinal cells. Plots show average counts across clusters in RNA-seq data obtained by Hoang et al. (2020).

      At this stage, we consider that fully resolving this issue is important and will require considerably more work, which we will pursue in the future using full germline mutants and live-imaging experiments.

      Reviewer #3 (Public Review):

      Angueyra et al. tried to establish a method to identify key factors regulating fate decisions in retinal photoreceptor cells by combining transcriptomic and fast genome editing approaches. First, they isolated and pooled five subtypes of photoreceptor cells from transgenic lines, in each of which a specific subtype of photoreceptor cells is labeled by a fluorescent protein, and then subjected them to RNA-seq analyses. Second, by comparing the transcriptome data, they extracted the list of the transcription factor genes enriched in the pooled samples. Third, they applied CRISPR-based F0 knockout to functionally identify transcription factor genes involved in cell fate decisions of photoreceptor subtypes. To benchmark this approach, they initially targeted foxq2 and nr2e3 genes, which have been previously shown to regulate S-opsin expression and S-cone cell fate (foxq2) and to regulate rhodopsin expression and rod fate (nr2e3). They then targeted other transcription factor genes in the candidate list and found that tbx2a and tbx2b are independently required for UV-cone specification. They also found that tbx2a expressed in the L-cone subtype and tbx2b expressed in S-cones inhibit M-opsin gene expression in the respective cone subtypes. From these data, the authors concluded that the transcription factors Tbx2a and Tbx2b play a central role in controlling the identity of all photoreceptor subtypes within the retina.

      Overall, the contents of this manuscript are well organized and technically sound. The authors presented convincing data, and carefully analyzed and interpreted them. It includes an evaluation of the presented data on cell-type specific transcriptome by comparing it with previously published ones. I think the current transcriptomic data will be a valuable platform to identify the genes regulating cell-type specific functions, especially in combination with the fast CRISPR-based in vivo screening methods provided here. I hope that the following points will be helpful for the authors to improve the manuscript appropriately.

      1) The manuscript uses the word “FØ” quite often without any proper definition. I wonder how “Ø” should be pronounced - zero or phi? This word is not common and has not been used in previous publications. I feel the phrase “F0 knockout,” which was used in the paper cited by the authors (Kroll et al 2021), is more straightforward. If it is to be used in the manuscript, please define “FØ” and “CRISPR-FØ screening” appropriately, especially in the abstract.

      We have replaced “FØ” with “F0.” In our other citation (Hoshijima et al., 2019), “F0 embryo” was used throughout the paper. Following our references and Dr Kojima’s suggestion, we adopted “F0 mutant larva” as the most straightforward and least confusing term. We have also made changes in the abstract to define our approach more clearly and made appropriate changes throughout the manuscript.

      2) Figure 1-supplement 1 shows that opn1mw4 has quite high (normalized) FPKM in one of the S-cone samples, in contrast to the low (or absent) expression in the M-cone samples, in which opn1mw4 is expected to be detected. The authors should address a possible origin of this inconsistent result for opn1mw4 expression as well as a technical limitation of using the Tg(opn1mw2:egfp) line for detection of opn1mw4 expression in the GFP-positive cells.

      In Figure 1 - Supplement 1, we had attempted to provide a summarized figure of all phototransduction genes, but the big differences in expression levels, in particular the high expression of opsin genes, forced us to use gene-by-gene normalization for display. Without normalization, the expression of opn1mw4 is very low across all samples, and its detection in that sole S-cone sample can likely be attributed to some degree of inherent noise in our methods. We have revised Figure 1 - Supplement 1: we find that we can avoid gene-by-gene normalization and still provide a good summary of the expression of phototransduction genes if the heatmap is broken down by gene families, which have more similar expression levels. In addition, we have added caveats to the use of the Tg(opn1mw2:egfp) line as our sole M-cone marker in the results section describing our RNA-seq approach, including our inability to provide data on Opn1mw4-expressing M cones.
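      Why gene-by-gene normalization can make a barely detected transcript look prominent is easy to see in a minimal sketch, assuming an FPKM matrix with genes as rows and samples as columns and scaling each row to a maximum. The values below are hypothetical, not the authors' data, and the function names are illustrative.

```python
import numpy as np


def normalize_per_gene(fpkm):
    """Scale each gene (row) to its own maximum across samples.

    A row whose maximum is 0 would divide by zero; real data need a guard.
    """
    fpkm = np.asarray(fpkm, dtype=float)
    return fpkm / fpkm.max(axis=1, keepdims=True)


def normalize_per_family(fpkm, family_labels):
    """Scale each gene by the maximum over all genes in the same family."""
    fpkm = np.asarray(fpkm, dtype=float)
    labels = np.asarray(family_labels)
    out = np.empty_like(fpkm)
    for family in np.unique(labels):
        rows = labels == family
        out[rows] = fpkm[rows] / fpkm[rows].max()
    return out


# Hypothetical FPKM values: rows = genes, columns = samples.
FPKM = [
    [5000.0, 12.0, 8.0],  # highly expressed opsin
    [0.3, 1.5, 0.2],      # barely detected opsin (noise level)
]
FAMILIES = ["opsin", "opsin"]

if __name__ == "__main__":
    print(normalize_per_gene(FPKM).round(3))
    print(normalize_per_family(FPKM, FAMILIES).round(4))
```

With per-gene scaling, the noise-level gene reaches 1.0 in one sample, just like the true opsin peak; scaling within a gene family keeps it near zero, which matches the rationale for splitting the heatmap by family.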

      3) The manuscript lacks a description of the sampling time point. It is well known that many genes are expressed with daily (or circadian) fluctuation (cf. Doherty & Kay, 2010 Annu. Rev. Genet.). For example, the cone-specific gene list in Fig.2C includes a circadian clock gene, per3, whose expression was reported to fluctuate in a circadian manner in many tissues of zebrafish including the retina (Kaneko et al. 2006 PNAS). It appears to be cone-specific at this time point of sample collection as shown in Fig.2, but might be expressed in a different pattern at other time points (eg, rod expression). The authors should add, at least, a clear description of the sampling time points so as to make their data more informative.

      We have included this information in the Materials and Methods. We collected all our samples during the most active phase of the zebrafish circadian rhythm, between 11 am and 2 pm (3 h to 6 h after light onset), to avoid the influence of circadian fluctuations on our analysis.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors set out to develop an in vitro model of multiple species representing diversity in the CF airway as a platform for a range of studies on why polymicrobial communities resist therapy. The rationale for their design is sound and the methods appear justifiable and reproducible. The major strength of this work is in producing a method for a range of future work, ideally for multiple groups in the field. The primary findings are interesting but not groundbreaking. One weakness in the method of reporting interspecies interactions and another in evaluating alternative causes of lasR advantages present opportunities for a stronger research contribution beyond this terrific method.

      We thank the reviewer for this accurate summary of the data presented in our manuscript. We have addressed the concerns raised in the revised document. The modifications and comments can be seen in the “Essential Revisions” section above.

      Reviewer #2 (Public Review):

      Differences between the infection environment and in vitro model systems likely contribute to disconnects between the antimicrobial susceptibility profile of bacterial isolates and the clinical response of patients. The authors of this paper focus on a specific aspect of the infection environment, the polymicrobial nature of some chronic infections like those in people with Cystic Fibrosis (CF), as a factor that could impact antibiotic tolerance. They first use published genomic datasets and computational techniques to identify a clinically relevant, four-member polymicrobial community composed of Pseudomonas aeruginosa, Staphylococcus aureus, Streptococcus spp., and Prevotella spp. They then develop a high throughput methodology in which this community grows and persists in a CF-like environment and in which antibiotic susceptibility can be tested. The authors determine that living as a member of this community decreases the antibiotic tolerance of some strains of biofilm-associated P. aeruginosa and increases the tolerance of most strains of planktonic and biofilm-associated S. aureus and planktonic and biofilm-associated Streptococcus. They focus on the decreased tolerance of P. aeruginosa and determine that a ΔlasR mutant of P. aeruginosa does not display increased tobramycin susceptibility in the mixed community. One of the phenotypes associated with a ΔlasR mutant is an overproduction of phenazines. The authors find that by deleting the phenazine biosynthesis genes from ΔlasR, they can restore community-acquired susceptibility. They further investigate this phenomenon by showing that a specific type of phenazine, PCA, is significantly increased in mixed communities with the ΔlasR mutant compared to WT. Finally, they demonstrate that adding a specific phenazine, pyocyanin, to mixed communities can restore the tolerance of WT P. aeruginosa.

      Strengths:

      With this study the authors address a very important problem in infectious disease microbiology - our in vitro drug susceptibility assays do a poor job of mimicking the infection environment and therefore do a poor job of predicting how effective particular drugs will be for a particular patient. By demonstrating how an infection-relevant community modifies tolerance to a clinically relevant drug, tobramycin, the authors identify specific interactions that could be targeted with therapeutics to improve our ability to treat the chronic infections associated with CF. In addition, this study provides a framework for how to effectively model polymicrobial infections in vitro.

      The experiments in the paper are very rigorous and well-controlled. Statistical analysis is appropriate. The paper is very well-written and clear.

      The authors do an admirable job of using in silico analysis to inform their in vitro studies. Specifically, they provide a comprehensive rationale for why they chose and studied the specific community they did.

      The authors provide a very robust dataset which includes determining how strain differences of each of their four community members affect community dynamics and antibiotic tolerance. These types of analyses are laborious but very important for understanding how broadly applicable any given result is.

      We appreciate the reviewer’s thorough summary of our work and their positive comments.

      Weaknesses:

      The authors very clearly and convincingly demonstrate that WT P. aeruginosa becomes more susceptible to tobramycin in their mixed community. Our ability to turn these types of observations into therapeutic development depends on mechanistic insight. That said, it is unclear if the authors can make any solid conclusions about what specific aspects of the polymicrobial environment cause WT P. aeruginosa to become more susceptible. The authors make a compelling case that increased phenazine production by the ΔlasR mutant restores tolerance in the mixed community and that exogenous phenazine addition increases the survival of WT P. aeruginosa in the mixed community. However, it remains a plausible explanation that the effects of phenazines on tobramycin susceptibility are independent of the initial observation that WT P. aeruginosa becomes susceptible to tobramycin in the mixed community.

      We agree with the reviewer’s comment here as it pertains to the initial observation of P. aeruginosa becoming more susceptible to tobramycin in the mixed community. However, as mentioned by the reviewer, we provide several lines of evidence that phenazines play a key role in the tolerance of the lasR mutant to tobramycin, including genetic studies and feeding studies wherein exogenous addition of this molecule to WT P. aeruginosa phenocopies the lasR mutant exposed to tobramycin. Why the community impacts phenazine production of the WT strain is an open question, and the subject of future work. We have modified the abstract of the manuscript as follows at Lines 41–43:

      “Our data suggest that the molecular basis of this community-specific recalcitrance to tobramycin for the P. aeruginosa LasR mutant is increased production of phenazines.”

      Some aspects of the methodology are unclear. Specifically, the authors note that they use a specific sealed container system to grow their strains in anoxic conditions, which mimic portions of CF sputum. However, it is unclear how the authors change medium over the course of their experiments, or how they test susceptibility to tobramycin, without exposing the cells to oxygen. It is well understood that oxygen exposure impacts the susceptibility of P. aeruginosa to tobramycin, so it is very important that the methodology involving oxygen deprivation and exposure is described in detail.

      We have made the necessary modifications to the manuscript as indicated in the “Essential Revisions” section to address these concerns (see Comment #3). Furthermore, new validation experiments were performed in a controlled anoxic environmental chamber that yielded observations similar to the data presented in the original manuscript, thereby confirming that we were using anoxic conditions with the GasPak anaerobic jar system (see Figure 1 - figure supplement 2 and Figure 2 - figure supplement 7).

      Lines 198–204: “The impact of residual oxygen negatively influencing the growth of P. melaninogenica in monoculture was ruled out by performing these experiments using an anoxic environmental chamber (Figure 1 – figure supplement 2). That is, we did not detect CFU counts for either planktonic or biofilm populations of P. melaninogenica when grown in ASM in the anaerobic chamber, but as a positive control, significant growth was detected when using a medium shown previously to support growth of this microbe (10) (Prevotella Growth Medium, or PGM) (Figure 1 – figure supplement 2).”

      Lines 406–414: “Also, we ruled out the possibility of remaining oxygen in ASM negatively impacting the viability of P. melaninogenica by reproducing our results using an anoxic chamber (Figure 1 – figure supplement 2). That is, we observed that P. melaninogenica can robustly grow as a planktonic or biofilm monospecies community in a medium capable of sustaining its growth (PGM) while this microbe fails to grow in ASM (Figure 1 – figure supplement 2). Thus, we argue that the mixed-community-specific growth of Prevotella spp. we observed across several conditions (Figure 1C, Figure 1 – figure supplement 5, Figure 2 – figure supplement 6) is not due to residual oxygen.”

      Lines 290–293: “Growing and replenishing the preformed biofilm communities with fresh ASM supplemented or not with tobramycin using an anoxic environmental chamber resulted in similar phenotypes for all tested microorganisms (Figure 2 – figure supplement 7), indicating that the use of the GasPak system provides a robust anoxic environment.”

      Lines 533–540: “Plates were incubated using an AnaeroPak-Anaerobic container with a GasPak sachet (ThermoFisher) at 37 °C for 24 hours. Then, unattached cells were aspirated with a multichannel pipette and the pre-formed biofilms replenished with 100 µl of fresh ASM on the bench and incubated for an additional 24 hours at 37 °C using an AnaeroPak-Anaerobic container with a GasPak sachet (ThermoFisher). Similar experiments were performed using an anoxic environmental chamber (Whitley A55 - Don Whitley Scientific, Victoria Works, UK) with 10% CO2, 10% H2, 80% N2 mixed gas at 37 °C, yielding results identical to those observed for the GasPak system.”

      Reviewer #3 (Public Review):

      This manuscript by Jean-Pierre et al. describes the creation and experimentation with a model CF lung community in an artificial sputum medium. The group uses data from 16S rRNA sequencing studies to select organisms for creating the model and then performs experiments to determine outcomes of growth competition and antibiotic tolerance in a community context. The main finding of the manuscript is that P. aeruginosa, notorious for its antimicrobial resistance phenotypes, is more susceptible to tobramycin in the community context than when grown alone. The manuscript is well prepared, and follow-up experiments with mutant strains and phenazines greatly strengthen the project overall. The initial results paragraph, where the authors go through the rationale for selecting the different organisms, is perhaps a bit overkill; the organisms selected make sense based on their prevalence in CF airways, which in and of itself is a strong enough rationale. This aspect of the manuscript could be minimized to focus more on the exciting culture experiments in the latter parts of the results. Overall, this is a strong and well-crafted manuscript that will be of broad interest to the CF and microbial ecology fields.

      We thank the reviewer for this thoughtful review of our manuscript. We have not minimized the “front-end” of the paper because we believe the rationale for selecting the community and its members, and the validation of the model system are key for placing the resulting observations in a robust context, and for providing the underlying rationale to support the relevance of the findings.

      Major Critiques. I have two major critiques of this study.

      (1) Prevotella growth in monoculture. After reading the methods section, it appears that the cultures were extensively washed and prepped prior to inoculation into ASM. Prevotella did not grow alone; is this due to oxygen penetration of the cells during preparation? Perhaps oxygen is present in ASM prior to placement in an anaerobic bag? It is interesting, and perhaps worth exploring, whether the mixed community draws down oxygen from the medium, which would explain the ability of Prevotella to grow. I suspect this is the case, but more detail is needed in the methods, and this experiment would help us understand this interesting result.

      As presented in the “Essential Revisions” section (Comment #3), we repeated the experiment under fully anoxic conditions (i.e., using an anoxic environmental chamber in which the cultures were grown, washed and mixed before incubation) and still observed no growth of Prevotella in ASM, in either biofilm or planktonic populations. In contrast, the positive control, Prevotella Growth Medium, supported robust growth of this microbe. Taken together, our data suggest that residual oxygen in ASM is not the driver of the community-specific growth of P. melaninogenica.

      (2) Dilution of the community reproducing tobramycin tolerance of P. aeruginosa. In the supplemental figures, the replication of the 1:1000 dilution of the mixed community with P. aeruginosa shows poor replication and very large error bars. This experiment should be repeated to ensure it is reproducible.

      The diluted mixed community experiment was repeated a fourth time, yielding the same statistical conclusions. An updated “Figure 2 – figure supplement 1” was added to the paper. The highest (1:1000) dilution still yielded high variation, which could perhaps be explained by the low inoculum (i.e., ~10³ CFU/mL) of S. aureus, S. sanguinis and P. melaninogenica used in these experiments (see the updated “Microbial assays” paragraph of the “Materials and Methods” section). Thus, the high variation at low inoculum is itself robust and reproducible. The Materials and Methods section was also updated to clarify the CFU counts used for those experiments. We have modified the text as follows to address this critique:

      Lines 526–532: “The optical density (OD600) was then measured for each bacterial suspension and diluted to an OD600 of 0.2 in ASM. Monocultures and co-culture conditions were prepared from the OD600 = 0.2 suspension and diluted to a final OD600 of 0.01 for each microbial species in ASM corresponding to final bacterial concentrations of 1 × 10⁷ CFU/mL, 3.5 × 10⁶ CFU/mL, 1.2 × 10⁶ CFU/mL and 4.6 × 10⁶ CFU/mL of P. aeruginosa, S. aureus, Streptococcus spp. and Prevotella spp. respectively. A volume of 100 µl of bacterial suspension all at a final OD600 of 0.01 each in the mix was added to three wells.”

      Lines 558–570: “For experiments with varying concentrations of S. aureus, S. sanguinis and P. melaninogenica in monocultures and co-cultures, the organisms were grown from bacterial suspensions adjusted to an OD600 = 0.8 in ASM. Suspensions were further diluted in ASM to an OD600 of either 0.1, 0.001, 0.0001 or 0.00001 while maintaining P. aeruginosa at OD600 = 0.01 (approximating 1 × 10⁷ CFU/mL) in all conditions. The OD600 = 0.1 dilution factor resulted in CFU/mL count average of 3.8 × 10⁸ CFU/mL for S. aureus, 1.6 × 10⁸ CFU/mL for S. sanguinis and 1.0 × 10⁸ CFU/mL for P. melaninogenica. The OD600 = 0.001 dilution factor resulted in a CFU/mL count average of 6.7 × 10⁵ CFU/mL for S. aureus, 1.1 × 10⁵ CFU/mL for S. sanguinis and 1.4 × 10⁵ CFU/mL for P. melaninogenica. The OD600 = 0.0001 dilution factor resulted in a CFU/mL count average of 4.2 × 10⁴ CFU/mL for S. aureus, 3.3 × 10⁴ CFU/mL for S. sanguinis and 4.6 × 10⁴ CFU/mL for P. melaninogenica. The OD600 = 0.00001 dilution factor resulted in a CFU/mL count average of 5.6 × 10³ CFU/mL for S. aureus, 4.4 × 10³ CFU/mL for S. sanguinis and 6.2 × 10³ CFU/mL for P. melaninogenica.”

    1. Author Response

      Reviewer #4 (Public Review):

      The study employs a number of methods, including TEM morphometric analysis, immunochemistry, western blotting, genomics, genetically modified models, and whole-heart measurements.

      However, the manuscript seems to be a collection of two unfinished works: one on the P20-P60 transition in postnatal development of the heart, and a second on the role of ephrinB1 in the maturation of the sarcolemmal crests. Otherwise, it is not clear why the first figure shows no staining for ephrinB1, and why it shows staining for claudin-5 instead.

      The reason is clearly explained in the text on page 6. The first figure explores the postnatal maturation of the CM crests and their molecular determinants, and our previous paper described claudin-5 as the first molecular determinant of the crests (Guilbeau-Frugier et al, Cardiovasc Research 2019). Based on our previous demonstration of ephrin-B1 as a direct claudin-5 partner and regulator (Genet et al, Circulation Research 2012), we therefore proposed ephrin-B1 as another potential molecular determinant of the crests, which we explore for the first time in the present revised manuscript. Moreover, because ephrin-B1 belongs to a large family of proteins mediating direct physical cell-cell communication (the Eph-ephrin system), a role in lateral crest-crest interaction was also an obvious hypothesis.

      This is why the paper first examines claudin-5 and then ephrin-B1, using the Efnb1 KO mouse model already established in our laboratory to further explore the functional role of the crests.

      The authors are trying to defend the idea that development of the heart in rats doesn't finish on postnatal day 20 and goes on for up to day 60. However, it is not convincing.

      It is no surprise that the transcription profile differs between day 20 and day 60. I am sure that as life goes on, development continues into aging, and any comparison of samples collected with a sufficient time lapse will show transcriptional differences. Whether these differences represent a truly separate developmental stage is not a clear-cut story.

      Most of the argument is based on morphometric study of TEM images.

      But also on confocal microscopy studies and, more importantly, on transcriptomic data.

      If it were evident that the transcriptional profile differs between day 20 and day 60, most studies in this postnatal field would have extended their study window beyond P20, which is not the case. As we mention in the manuscript, most researchers in the field have assumed terminal maturity of the CM based essentially on its typical rod shape, which is already acquired at P20. Growth of the heart between P20 and P60 was then assumed to rely only on a quantitative increase in tissue content and not on transcriptomic, i.e. qualitative, changes.

      However, the method is not described at all. There is a reference to another paper by the authors, but that paper does not provide a concise description of the morphometry either. It is unclear how images and fields of view were randomized and what statistical methods were used. In TEM it is often possible to find all sorts of oddities depending on how the images are chosen.

      We agree with the reviewer that TEM is often associated with “all sorts of oddities”, and that is why our recent paper (Guilbeau-Frugier et al, Cardiovasc Research 2019) was dedicated to the analysis of technical pitfalls and quantification. That paper addresses exactly these questions: how to process cardiac tissue to avoid artifacts in crest/SSM visualization, and how to quantify them.

      Now, instead of only citing our previous paper, we have expanded the “Material and methods” / “Transmission electron microscopy (TEM) and quantitative analysis” section (main manuscript, pages 20-21) with a detailed description of all TEM observation and quantification procedures.

      The question of randomizing images and the number of fields of view is a general question for all imaging techniques and is not at all specific to our TEM study. In imaging, there is no randomization.

      All statistical analyses of TEM quantifications are accurately described in the figure legends. For instance, in Figure 1: (B) Quantification of crest heights / sarcomere length (left panel), SSM number / crest (middle panel) and SSM area (right panel) from TEM micrographs obtained from P20 or P60 rat hearts (P20 n=6, P60 n=6; 4 to 8 CMs/rat, ~ 70 crests/rat). However, to better clarify “P20 n=6, P60 n=6”, we now specify “P20 or P60 n=6 rats”. This has now been specified in the figure legends for all statistical analyses (highlighted in yellow in the revised manuscript).

      Why didn't the authors use microscopy of live isolated cells, which may be more relevant to study crest height?

      We explained this at the very beginning of the Results section (first paragraph, second sentence, points i and ii). Based on our two previous papers on this topic (Dague et al JMCC 2014 and Guilbeau-Frugier et al, Cardiovasc Research 2019), the use of living CMs is not appropriate for this question. Our first paper was essentially based on AFM studies of isolated CMs, and we found that rapidly after isolation, CM surface crests/SSM have a strong tendency to shrink and disappear in control mice. This is why the second paper was based on an extensive characterization of the crests within the tissue using TEM, and it also highlights the comparison of CM crests between tissue and living cells. More importantly, in this recent paper we described for the first time, using high-resolution imaging techniques (TEM and STED), the existence of intermittent physical interactions between neighboring CMs on their lateral side through crest-crest interaction via the extracellular domain of claudin-5. This crest-crest physical interaction can only be observed within the tissue, since isolated adult CMs remain isolated and do not reproduce CM-CM physical interactions, whether lateral or longitudinal (i.e. at the intercalated disk).

      Both claudin5 and EphrinB1 seem to be expressed highly after p5, which doesn't correlate with the proposed maturation of crests at days 20 to 60.

      Many processes rely not only on gene/protein expression but also on post-translational processes and on protein localization/trafficking within the cell. This is exactly what we show with ephrin-B1 and claudin-5, which traffic from the cytoplasm to the lateral membrane at the CM surface between P20 and P60, as shown by our confocal images of cardiac tissue, while the global expression level of these two proteins does not change (western blot results).

      There is no causative relationship between the lack of ephrinb1 and crest maturing, at least to my mind.

      Comparing cardiac tissue between P20 and P60 and showing both ephrin-B1 trafficking to the CM lateral surface and crest maturation obviously does not by itself establish a relationship between these two events. However, when a specific protein (ephrin-B1) is deleted from a specific cell type (the CM) and the phenotype of the KO mice is again a lack of crest maturation, one can at least deduce that ephrin-B1 is involved, directly or indirectly (we do not know which), in the maturation of CM crests.

      Because Efnb1 deletion was constitutive, we could not determine whether the phenotype of the constitutive Efnb1 CM-KO mice we described at the adult stage was directly related to a specific alteration of CM surface crests/diastolic function at the adult stage, or rather to earlier developmental defects (secondary mechanisms). Therefore, to discriminate between these two possibilities, in the revision we used a tamoxifen-inducible conditional knockout (MerCreMer) of Efnb1 in the CM (MHC promoter). This mouse model has not been reported before; its characterization (new Supplementary Figure 16) indicated that tamoxifen injection can lead to up to 50% Efnb1 deletion in CMs. Under these conditions, Efnb1 deletion (tamoxifen injection) was initiated at the young adult stage (2 months of age), and systolic and diastolic function (echo-Doppler and LV catheterization) as well as the CM crest phenotype (TEM) were examined one month later. As shown in the new Figure 7, deletion of Efnb1 at the adult stage led to a partial loss of CM surface crests (new Fig. 7B), consistent with the partial deletion of Efnb1, associated with a significant increase in IVRT (echo-Doppler) and LVEDP (LV catheterization) with no change in ejection fraction (echo) compared with tamoxifen-injected control littermates (new Fig. 7C, D). Thus, these data clearly demonstrate that ephrin-B1 is a specific determinant of crest architecture at the CM surface and of diastolic function at the adult stage.

    1. Author Response

      Reviewer #3 (Public Review):

      The manuscript by Le T.D.V. et al used in vitro cell culture and inhibitors for cellular signaling molecules and found that GLP-1 receptor activation stimulated the phosphorylation of Raptor, which was PKA-mediated and Akt-independent. The authors reported the physiological function of this GLP-1R-PKA-Raptor in liraglutide stimulated weight loss. This timely study has high significance in the field of metabolic research for the following reasons.

      (1) The authors' findings are significant in the field of obesity research. GLP-1 receptor (GLP-1R) is a successful target for diabetes (and weight loss) therapeutics. However, the mechanisms of action for the weight-loss effect of GLP-1 agonists are not fully understood. Therefore, mechanistic studies to elucidate the signaling pathways of GLP-1 receptors pertaining to weight loss at the cellular level are timely.

      (2) G protein-coupled receptors (GPCRs) induce various signaling activities, which can be cell- and tissue-specific. As these receptors are an important protein family for drug targeting, understanding their basic biology is of interest to a broad readership.

      (3) The authors have made the important discovery that Exendin-4-stimulated mTORC1 signaling is essential for the anorectic effect induced by Exendin-4. The study reported in the current manuscript provides more detail on brain GLP-1R signaling pathways and is innovative.

      Overall, the authors have presented sufficient background in a clear and logically organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, took potential caveats into consideration, and made a justified conclusion.

      Recommendations for the authors:

      The manuscript can be further strengthened with more clarification on the following points.

      1) In Figure 1 panels B and C, please provide the quantification for pCREB/CREB. In Figure 1 panel D, please provide the quantification for pAkt/Akt.

      We thank the Reviewer for this suggestion. We now provide quantification of pCREB and pAkt expression in Supp. Fig. 1.

      2) The western blots to assess the signaling activities revealed the phosphorylation status of the key signaling molecules at a single time point. Whether the overall signaling dynamics have been affected is unclear.

      We agree with the reviewer on this point. We conducted initial time course experiments to identify a suitable time point for the subsequent experiments conducted in the present studies. The 1 h time point presented in our results was chosen because it was the earliest time point at which liraglutide stimulated mTORC1 signaling and this effect was inhibited by the various pharmacological inhibitors. We agree with the reviewer that at this point it is not clear whether the various inhibitors or the Ser791Ala mutation in Raptor modify the dynamics of mTORC1 signaling. Although we have preliminary data in CHO-K1 cells suggesting that the temporal dynamics of these signaling events are not affected, this does not necessarily translate to the in vivo setting. Once we identify the key target tissue/cell type(s) mediating the weight loss effect of liraglutide via the PKA-Raptor interaction and generate the necessary mutant mice, we will test whether this affects signaling dynamics in vivo.

      3) Figure 3 panels A and B demonstrated the remarkable importance of the Ser791 Raptor. However, this PKA-resistant mutant did not completely abolish the weight loss effect of liraglutide. The authors pointed out the importance of AMPK in mTORC1 signaling. Other pathways that may complement GLP-1R-PKA-Raptor signaling can be further discussed.

      We agree with the reviewer that other signaling pathways are likely involved that contribute to the remaining weight-lowering effect of liraglutide. Besides AMPK, we have also included a discussion of Akt being a potential molecule that interacts with these pathways in vivo (lines 218-225). The word limitations of a Short Communication prevent us from further expanding on these possible mechanisms.

      4) Food intake was decreased on day 2 in Figure 3D but became comparable between WT and S791A Raptor groups on the following days. Could this be due to some compensatory mechanisms?

      This pattern of food intake in response to GLP-1R agonists has been previously reported by our group and others (please see Brown JD et al. Am J Physiol Regul Integr Comp Physiol 2018 and Adams JM et al. Diabetes 2018). The reason for this is unclear at this moment, but we speculate that the rebound in food intake is a compensatory mechanism that prevents the organism from continuously losing weight. We now also present data showing an initial drop in energy expenditure with liraglutide treatment that progressively returns to pre-treatment levels.

    1. Author Response

      Reviewer #3 (Public Review):

      The size of the excitation region and the size of the aster are linearly correlated but are drastically different in size. This provokes several questions.

      • Why does only one aster form if the region of excitation is over 10x the size? Why are there not multiple asters formed within this activation region?

      • A much larger excitation diameter than the size of the resultant structure suggests the amount of dimeric motor is not limiting. Why then does the size of the aster increase with excitation diameter?

      • A linear relationship between excitation region and aster size may suggest a constant density of material within the aster. While the intensity profile of a single aster is given in Fig 1C, the magnitude of intensity versus the estimated size of the aster would determine whether the system is reducible purely to changes in size/radial distribution.

      We thank the reviewer for the careful consideration of our work. In the experiments performed for this study, we were careful to be in a regime in which a single aster formed within the excitation region. However, by varying the concentration of components in the system, it is possible for multiple asters to form. See Figure R2 for example images of cases in which multiple asters formed.

      The increase in aster size with the size of the excitation region was also described previously in Ross, et al. 2019. In that work, we found that aster size scales with the volume of the excitation region, suggesting that the number of microtubules limits aster size. This supports the hypothesis that there may be a density limit for the microtubules, likely due to steric interactions between them. We clarified this and added a reference to the Ross, et al. findings in lines 115-118, as follows:

      “In Ross, et al., it was determined that the aster size roughly scaled with the volume of the excitation area, suggesting that the number of microtubules limits the size of the aster. This hints that there may be a density limit to the microtubules in an aster.”

      Is dimerization reversible after activation? If the motors cannot unbind from each other, and act as crosslinkers (for as long as they remain bound) are they likely to accumulate within the aster over time? This may challenge the steady state assumption.

      We thank the reviewer for the thoughtful analysis. Dimerization is reversible after activation: the lifetime of the optogenetic bond is about 20 seconds (Guntas et al., 2015). To form an aster, we repeatedly activate the sample at 20-second intervals, so there is a balance between motors unbinding from each other and motors becoming dimerized. This balance can create a non-equilibrium steady state. We have clarified this in lines 78-80, as follows:

      “The optogenetic bond lasts for about 20 seconds before reverting to the undimerized state, thus in our experiments, we repeatedly illuminate the sample every 20 seconds (Guntas, et al. 2015).”
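      To make the steady-state argument concrete, here is a toy sketch (not a model used in the study). It assumes, as our own simplification, that the ~20 s bond lifetime acts as the time constant of first-order reversion, and that each light pulse dimerizes a fixed fraction `p_activate` (a hypothetical parameter) of the remaining undimerized motors. Under these assumptions, pulsing every 20 s drives the dimerized fraction to a periodic steady state, which can also be computed in closed form.

```python
import math


def pulse_train(p_activate: float, period_s: float, lifetime_s: float, n_pulses: int) -> float:
    """Iterate the toy model; return the dimerized fraction just after the last pulse.

    Hypothetical assumptions, for illustration only:
    - each pulse converts a fraction p_activate of undimerized motors into dimers;
    - between pulses, dimers revert with first-order kinetics (time constant lifetime_s).
    """
    decay = math.exp(-period_s / lifetime_s)  # fraction of dimers surviving one dark interval
    dimerized = 0.0
    after_pulse = 0.0
    for _ in range(n_pulses):
        after_pulse = dimerized + p_activate * (1.0 - dimerized)  # light pulse
        dimerized = after_pulse * decay  # reversion until the next pulse
    return after_pulse


def steady_state(p_activate: float, period_s: float, lifetime_s: float) -> float:
    """Fixed point D of D = p + (1 - p) * decay * D (value just after each pulse)."""
    decay = math.exp(-period_s / lifetime_s)
    return p_activate / (1.0 - decay + p_activate * decay)


# Pulsing every 20 s with a ~20 s lifetime, as in the response; p_activate is made up.
simulated = pulse_train(p_activate=0.5, period_s=20.0, lifetime_s=20.0, n_pulses=50)
predicted = steady_state(p_activate=0.5, period_s=20.0, lifetime_s=20.0)
print(f"simulated {simulated:.4f}, closed form {predicted:.4f}")
```

      The point of the sketch is only that continual re-activation balanced against reversion settles to a stable, non-equilibrium level rather than accumulating without bound; real motor-microtubule binding is not modeled.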

    1. Author Response

      Reviewer #3 (Public Review):

      Gomolka et al. are trying to establish how aquaporin-4 (AQP4) water channels, a key component of the glymphatic system, facilitate brain-wide movement of interstitial fluid (ISF) into and through the interstitial space of the brain parenchyma. The authors employ a number of advanced non-invasive techniques (diffusion-weighted MRI and high-resolution 3D non-contrast cisternography), invasive dynamic-contrast enhanced (DCE-) MRI, and ex-vivo histology to build a robust picture of the effects of the removal of AQP4 on the structure and the fluid dynamics in the mouse brain. This work is a further step toward the implementation of non-invasive tools for studying the glymphatic system.

      The main strengths of the manuscript are in the extensive brain-wide and regional analysis, interrogating potential changes in the structural composition, tissue architecture, and interstitial fluid dynamics due to the removal of AQP4. The authors demonstrate an increase in the interstitial fluid volume space, an increase in total brain volume, and a higher brain water content in AQP4 knockout mice. Importantly, an increase in apparent diffusion coefficient (ADC) was reported in most brain regions in the AQP4-KO animals, suggesting increased fluid movement; this is supported by an increase in interstitial fluid space measured by real-time iontophoresis with tetramethylammonium (TMA). There is a reduction in the ventricular CSF compartment, while the perivascular space remains unchanged. A reduction in gadolinium-based MRI tracer influx into many regions of the AQP4-KO mouse brain parenchyma supports the conclusion that fluid transfer slows down, although the tracer dynamics in the main CSF compartments show no significant differences.

      The interpretation of non-invasive measures of the interstitial fluid dynamics in relationship to regional AQP4 expression is less well supported. The regional AQP4 channel expression in WT mice positively correlates with the ADC and extravascular diffusivity (D) measures. However, their finding that regional ADC also increases when AQP4 is removed weakens the conclusion that the removal of AQP4 leads to interstitial fluid stagnation.

      We thank the reviewer for the positive feedback. Indeed, we aimed to provide the scientific field with the most detailed and objective assessment of the effect of congenital loss of the AQP4 channel on brain water homeostasis and glymphatic transport. We therefore predominantly employed MRI techniques enabling non-invasive assessment, and complemented these findings with standard DCE-MRI and physiological evaluation in vivo and ex vivo.

      In response to this remark, it is indeed difficult to interpret this phenomenon other than by relating regional AQP4 expression to specific metabolic or morphological features of the WT mouse brain, thereby associating AQP4 channel expression with regional water distribution. This interpretation is supported not only by existing reports of AQP4 upregulation in response to fluid stagnation, but also by the possibility of rapid AQP4 relocalization after acute water intoxication (as comprehensively reviewed by Salman et al. 2022). It also does not exclude the possibility that AQP4 is constitutively expressed at higher levels in regions of functionally higher water content, as reflected by higher ADC values.

      In KO mice, we found that deletion of the AQP4 channel mainly affects brain water homeostasis (Figure 1); the increased slow MR diffusion metrics would thus be related to increased brain swelling and an enlarged ISF space compared with WT littermates (Figure 2). However, we cannot exclude that this instead reflects a superposition of two opposing effects: a decrease in measured ADC due to decreased water exchange, and an even larger increase in ADC resulting from the enlarged ISF space that this reduced exchange produces. Such an explanation was previously proposed based on estimates using Latour’s model of long-time diffusion behavior (Pavlin et al. 2017, https://pubmed.ncbi.nlm.nih.gov/28039592/) and was linked to an enlarged interstitial space (Urushihata et al. 2021, https://pubmed.ncbi.nlm.nih.gov/34617156/) that is not accompanied by changes in blood perfusion between genotypes (Zhang et al. 2019, https://pubmed.ncbi.nlm.nih.gov/31220136/).
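      For readers less familiar with the diffusion metric discussed here, the sketch below shows how an ADC value is conventionally derived from two diffusion weightings under the standard monoexponential model S(b) = S0·exp(−b·ADC). It is not the fitting pipeline used in the study, and the signal values and b-value are hypothetical. A larger ADC means stronger signal attenuation at a given b, i.e. less hindered water diffusion in the voxel.

```python
import math


def adc_two_point(s0: float, s_b: float, b_value: float) -> float:
    """Apparent diffusion coefficient from a b=0 signal and one diffusion-weighted signal.

    Monoexponential model S(b) = S0 * exp(-b * ADC), hence ADC = ln(S0 / S(b)) / b.
    With b in s/mm^2, ADC is returned in mm^2/s.
    """
    if s0 <= 0 or s_b <= 0 or b_value <= 0:
        raise ValueError("signals and b-value must be positive")
    return math.log(s0 / s_b) / b_value


# Hypothetical voxel: S0 = 1000 a.u., b = 1000 s/mm^2, underlying ADC = 0.8e-3 mm^2/s.
s0, b = 1000.0, 1000.0
s_b = s0 * math.exp(-b * 0.8e-3)
print(f"ADC = {adc_two_point(s0, s_b, b):.2e} mm^2/s")
```

      A two-point estimate is the simplest case; multi-b acquisitions are usually fitted by least squares on log-signal, and non-Gaussian models are needed at high b-values.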

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors investigate the genes involved in the retention of eggs in Aedes aegypti females. They do so by identifying two candidate genes that are differentially expressed across the different reproductive phases and also show that the transcripts of those two genes are present in ovaries and in the proteome. Overall, I think this is interesting and impressive work that characterizes the function of those two specific protein-coding genes thoroughly. I also really enjoyed the figures. Although they were a bit packed, the visuals made it easy to follow the authors' arguments. I have a few concerns and suggested changes, listed below.

      1) These two genes/loci are definitely rapidly evolving. However, that does not automatically imply that positive selection has occurred in these genes. Clearly, you have demonstrated that these gene sequences might be important for fitness in Aedes aegypti. However, if these happen to be disordered proteins, then they would evolve rapidly, i.e., under fewer sequence constraints. In such a scenario, dN/dS values are likely to be high. Another possibility is that as these are expressed only in one tissue and most likely not expressed constitutively, they could be under relaxed constraints relative to all other genes in the genome. For instance, we know that average expression levels of protein-coding genes are highly correlated with their rate of molecular evolution (Drummond et al., 2005). Moreover, there have clearly been genome rearrangements and/or insertions/deletions in the studied gene sequences between closely related species (as you have nicely shown), thus again dN/dS values will naturally be high. Thus, high values of dN/dS are neither surprising nor do they directly imply positive selection in this case. If the authors really want to investigate this further, they can use the McDonald-Kreitman test (McDonald and Kreitman 1991) to ask if nonsynonymous divergence is higher than expected. However, this test would require population-level data. Alternatively, the authors can simply discuss adaptation as a possibility along with the others suggested above. A discussion of alternative hypotheses is extremely important and must be clearly laid out.

      We agree with the reviewer’s point that rapid evolution is not the same as positive selection. We also agree that the McDonald-Kreitman test (MK test) is more powerful than dN/dS analysis. We took advantage of a large population dataset from Rose et al. 2020; after filtering the data, we retained 454 genomes for MK tests. The MK test was marginally significant for tweedledum (p = 0.048) and marginally non-significant for tweedledee (p = 0.068), despite these being small genes with low Pn values. This suggests that these genes likely evolve under positive selection.
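      For context, the MK test compares the ratio of nonsynonymous to synonymous differences fixed between species (Dn/Ds) with the same ratio among polymorphisms within a species (Pn/Ps), typically using Fisher's exact test on the 2×2 table. The sketch below is a minimal, dependency-free illustration of that logic; it is not the pipeline used for the analysis above, and the counts in the example are hypothetical. The neutrality index NI = (Pn/Ps)/(Dn/Ds) falls below 1 (and α = 1 − NI rises above 0) when there is an excess of nonsynonymous divergence, the pattern expected under positive selection.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities (margins fixed) of every table
    that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so floating-point ties with p_observed are counted.
    tail = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_observed * (1 + 1e-7))
    return min(1.0, tail)


def mk_test(dn: int, ds: int, pn: int, ps: int) -> tuple[float, float, float]:
    """McDonald-Kreitman test on fixed (D) and polymorphic (P) counts.

    Returns (neutrality index, alpha, two-sided Fisher p-value), where
    NI = (Pn/Ps) / (Dn/Ds) and alpha = 1 - NI. Requires non-zero dn, ds and ps.
    """
    ni = (pn / ps) / (dn / ds)
    return ni, 1.0 - ni, fisher_exact_two_sided(dn, ds, pn, ps)


# Hypothetical counts, for illustration only (not the tweedledee/tweedledum data).
ni, alpha, p = mk_test(dn=20, ds=10, pn=2, ps=8)
print(f"NI = {ni:.3f}, alpha = {alpha:.3f}, Fisher p = {p:.4f}")
```

      With few polymorphic sites (low Pn, as for these small genes), the 2×2 table is sparse and the test has limited power, which is one reason p-values near 0.05 are hard to interpret in either direction.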

      In line with the reviewer’s suggestion, we performed an additional analysis using a large amount of population data: we asked whether the SNP frequencies of tweedledee and tweedledum are correlated with environmental variables. Compared to a distribution of 10,000 simulated genes built from randomly sampled genetic variants, both tweedledee and tweedledum showed significant correlations (p < 0.05) with multiple ecological variables reflecting climate variability, such as mean diurnal range, temperature seasonality, and precipitation seasonality. These results are now incorporated into the manuscript in Figure 5 and Figure 5 – Figure supplement 1.
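      The comparison against 10,000 simulated genes amounts to an empirical (permutation-style) p-value. The sketch below shows one way to compute it; the gene-level score (mean absolute SNP-environment association), the function names, and the toy data are all hypothetical assumptions, not the statistic used in the manuscript.

```python
import random


def gene_score(snp_stats):
    """Hypothetical gene-level score: mean absolute SNP-environment association."""
    return sum(abs(s) for s in snp_stats) / len(snp_stats)


def null_scores(genome_snp_stats, n_snps, n_genes=10_000, seed=0):
    """Scores of simulated 'genes' built from randomly sampled genome-wide variants."""
    rng = random.Random(seed)
    return [gene_score(rng.sample(genome_snp_stats, n_snps)) for _ in range(n_genes)]


def empirical_p(observed, null):
    """One-sided empirical p-value; the +1 terms count the observed gene itself."""
    return (1 + sum(s >= observed for s in null)) / (1 + len(null))


rng = random.Random(1)
genome = [rng.gauss(0.0, 1.0) for _ in range(5_000)]    # toy genome-wide SNP statistics
candidate = [3.0, -2.8, 3.1, 2.9, -3.2, 3.0, 2.7, 3.3]  # toy candidate-gene SNP statistics
null = null_scores(genome, n_snps=len(candidate))
print(empirical_p(gene_score(candidate), null))
```

      Matching the number of sampled variants to the candidate gene's SNP count keeps the null comparable in size; the +1 correction prevents reporting an empirical p-value of exactly zero.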

      2) The authors show that the two genes under study are important for the retention of viable eggs. However, as these genes are close to two other conserved genes (scratch and peritrophin-like gene), it is unclear to me how it is possible to rule out the contribution of the conserved genes to the same phenotype. Is it possible that the CRISPR deletion disrupts the expression of one of the other important genes nearby (i.e., scratch or the peritrophin-like gene), for instance because the deleted region included a promoter region, and that this causes the phenotype you observe? Since all of these genes are so close to each other, it is possible that they are co-regulated and that tweedledee and tweedledum are expressed and translated along with scratch and the peritrophin-like gene. Do we know whether their expression patterns diverge and that scratch and peritrophin-like genes do not play a role in the retention of viable eggs?

      This is a fair criticism; however, we think the chance that the phenotypes are caused by disrupting nearby genes is very low. First, peritrophin-like acts in the immune response, and scratch is a brain-biased transcription factor. Neither gene is expressed in the ovary before or after blood feeding: genes with TPM below 1 or 2 are generally considered unexpressed, and scratch and peritrophin-like are expressed at less than 0.1 TPM overall.

      This suggests that peritrophin-like and scratch are unlikely to function in the ovary. Thus, although we cannot completely rule out that the knockouts affect the regulation of more distant genes, this is unlikely. Given the mounting evidence in this manuscript that tweedledee and tweedledum are highly translated in the ovary after blood feeding, the principle of parsimony suggests that the phenotypes result from knocking out these highly expressed and translated genes.

      Reviewer #2 (Public Review):

      This manuscript is overall quite convincing, presenting a well-thought-out approach to candidate gene detection and systematic follow-ups on two genes that meet their candidate gene criteria. There are several major claims made by the authors, and some have more compelling evidence than others, but in general, the conclusions are quite sound. My main issues stem from how the strategy to identify genes playing a role in egg retention success has led to very particular genes being examined, and so I question some of the elements of the discussion focusing on the rapid evolution and taxon-uniqueness of the identified genes. In short, while I believe the authors have demonstrated that tweedledee and tweedledum play an important role in egg retention, I'm not sure whether this study should be taken as evidence that taxon-specific or rapidly evolving genes, in general, are responsible for this adaptation, or simply play an important role in it.

      We have revised the paper to make it clearer that the focus is indeed on these two genes and not on the broader question of taxon-specific or rapidly evolving genes.

      First, the authors present evidence that Aedes aegypti females can retain eggs when a source of fresh water is lacking, confirming that females are not attracted to human forearms while retaining eggs and that up to 70% of the retained eggs hatch after retaining them for nearly a week. This ability is likely an important adaptation that allows Aedes aegypti to thrive in a broad range of conditions. The data here seem fairly compelling.

      Based on this observation, the authors reason that genes responsible for the ability to retain eggs must: 1) be highly expressed in ovaries during retention, but not before or after. 2) be taxon-specific (as this behavior seems limited to Aedes aegypti). While this approach to enriching candidate genes has proven fruitful in this particular case, I'm not sure I agree with the authors' rationale. First, even genes at a low expression in the ovaries may be crucial to egg retention. Second, while egg-laying behavior is vastly varied in insects, I'm not sure focusing on taxon-restricted genes is necessary. It is entirely possible that many of the genes identified in Figure 2E play a crucial role in egg retention evolution. These are minor issues, but they are relevant to some later points made by the authors.

      We regret framing the discovery of tweedledee and tweedledum in the original submission using this somewhat artificial set of filtering criteria. The reality is that the genes caught our attention for their novel sequence, tight genetic linkage, and interesting expression profile. These genes are the real focus of the paper, not the peripheral questions that have drawn the reviewers' attention. We sincerely apologize for the confusion about what this paper is about.

      Nonetheless, the authors provide very compelling evidence that the two genes meeting their criteria - tweedledee and tweedledum, play an important role in egg retention. The genes seem to be expressed primarily in ovaries during egg retention (some observed expression in brain/testes is expected for any gene), and the proteins they code seem to be found in elevated quantities in both ovaries and hemolymph during and immediately after egg retention. RNA for the genes is detected in follicles within the ovary, and CRISPR knockouts of both the genes lead to a large decrease in egg viability post retention.

      My earlier qualms about their search strategy relay into some issues with Figure 4, which describes how the two genes are 1) taxon-restricted and 2) have evolved very rapidly. Neither of the two statements is unexpected given the authors' search strategy. Of course, the genes examined precisely for their lack of homologs do not have any homologs. Similarly, by limiting themselves to genes that show a lack of homology (i.e. low sequence similarity) to other genes as well as genes with high expression levels in the ovaries, a higher rate of evolution is almost inevitable to infer (as ovary expressed genes tend to evolve more rapidly in mosquitoes). I agree with the authors that inferences of the evolutionary history of these genes are quite difficult because of their uniqueness, and I especially appreciate their attempts to identify homologs (although I really dislike the term "conceptualog").

      We have removed our term “conceptualog” and replaced it with the more conventional “putative ortholog”.

      This leads to my main (fairly minor) issue of the paper - the discussion on the evolutionary history of these genes and its implications (sections "Taxon-restricted genes underlie tailored adaptations in a diverse world" and "Evolutionary histories and catering to different natural histories"). As noted, inferring this history is very difficult because the authors have focused on two rapidly evolving, taxon-restricted genes. The analyses they have performed here definitely demonstrate that the genes play an important role in egg retention, however, they do not show that taxon-restricted genes play a disproportionate role in egg retention evolution. Indeed, the only data relevant to this point would be the proportion of genes in Figure 2E that are taxon-restricted (3/9), but I'm not sure what the null expectation for this proportion for highly expressed ovary genes is to begin with. Furthermore, the extremely rapid evolution of this gene makes it hard to judge how truly taxon-restricted it is. My own search of tweedle homologs identified multiple as previously having been predicted to be "Knr4/Smi1-like", and while no similar genes are located in a similar location in melanogaster, there is generally little synteny conservation in Drosophila (for instance Bhutkar et al 2008), so I'm unsure what can really be said about their evolutionary origins/lack of homologs in Drosophila.

      In short - the manuscript makes clear that tweedledee and tweedledum play an important role in egg retention in A. aegypti, nonetheless, it is not clear that this is a demonstration of how important taxon-restricted genes are to understanding the evolution of life-history strategies.

      Again, we should have never framed the paper the way we did in the original version. We make no claims whatsoever that taxon-restricted genes in general should play a role in this biology, only that the two candidate genes under study influence egg viability after extended retention. We hope that the framing is clearer in this revision.

    1. Author Response

      Reviewer #1 (Public Review):

      Understanding the evolution of broadly neutralizing influenza antibodies is key to developing a more universal vaccine. In this study, Phillips et al. performed a comprehensive analysis of the evolutionary pathway of CH65, which is an H1-specific broadly neutralizing antibody. The authors generated a combinatorial mutant library with 2^16 members that contained all possible evolutionary intermediates between the unmutated common ancestor (UCA) and CH65, less two mutations that did not affect binding. The binding affinity of each member in the library was measured against HAs from MA90 and SI06, which were isolated 16 years apart, as well as MA90 with a UCA escape mutation G189E. The binding affinity was measured using a high-throughput approach that combined yeast display and Tite-Seq, with careful experimental validation. The results showed that epistasis between mutations within the heavy chain and also across heavy and light chains plays an important role in CH65 to evolve breadth. Although this study highly resembles a previous study by the authors that focused on another broadly neutralizing influenza antibody called CR9114 (Phillips et al., eLife 2021), there are several key differences. Firstly, CR9114 is a HA stem-directed antibody, whereas CH65 binds to the receptor-binding site of HA. Secondly, their previous study only studied the mutations in the heavy chain, whereas the present study looked at mutations in both heavy and light chains. Lastly, the present study provided a structural mechanism of epistasis by solving crystal structures. Such investigation of structural mechanisms was absent in their previous study. Overall, the data quality in this study is very high. In addition, the results have important implications for vaccine development.

      We thank Reviewer #1 for their review of our work and have implemented each of their suggestions to improve the clarity of our manuscript.

      Reviewer #2 (Public Review):

      Although many broadly neutralizing antibodies have been discovered against rapidly mutating viruses such as HIV, influenza, and SARS-CoV-2, methods to induce or design such antibodies are in high demand. The authors take the broadly neutralizing antibody CH65 as a model and try to recapitulate its generation from an unmutated common ancestor over time. By performing Tite-Seq assays, epistasis analysis, pathway analysis, affinity measurements, and structural studies, the authors propose a scenario for the evolution of CH65.

      Strengths

      Combining the models with affinity and structural data, the authors show a possible trajectory by which the CH65 antibody gains breadth from the unmutated repertoire. Using the Tite-Seq assay, the authors took a forward genetics approach that is high-throughput and unbiased and that mimics the evolution of a B cell repertoire in an individual over time. The data are robust, and the results will provide an opportunity to build prediction models for designing antibodies in silico. In particular, their identification of amino acid positions important for epistasis in antibody evolution is valuable. Antigen selection scenarios are decisive in this study.

      Weakness

      The proposed scenarios cannot be tested using human CH65. Readers would be very interested in how well these hypothetical scenarios fit the evolution that occurs in vivo, especially in a quantitative way. Broadly neutralizing antibodies often react with self-antigens, as shown in previous work cited by the authors (ref 19). How do such environmental factors affect the evolution of the antibody? These already-known facts could be mentioned and discussed in detail.

      We thank Reviewer #2 for these comments and agree that applying these insights to understand in vivo antibody affinity maturation would be fascinating. As the Reviewer points out, our study is limited to examining antigen affinity and neglects other properties that are known to impact antibody affinity maturation (e.g., autoreactivity). As we mention in the Discussion, our work shows how the acquisition of breadth is shaped by mutations that interact epistatically to determine binding affinity, and future work is required to understand how these mutations and interactions may also impact the myriad other properties relevant to antibody maturation.

    1. Author Response

      Reviewer #2 (Public Review):

      The paper has two key messages: the discovery and the function of LncSox17. Claims of gene discovery are non-trivial today, given the large number of genome-wide datasets. Of course, I understand the authors cannot check everything, but I feel that a clearer and deeper analysis of current databases is lacking.

      The reviewer is right that there is an extremely high number of publicly available datasets and resources. In the current manuscript, we used Ensembl genes, GENCODE V36 and GENCODE V36 lncRNAs (commonly used datasets for gene and transcript annotation) and could not find reports of long non-coding RNAs with a location, length and strand similar to those of T-REX17 (see Fig. 1). To ensure that we did not overlook it, during the revision we inspected these datasets again and came to the same conclusion: T-REX17 has not been previously reported at this locus.

      As we show, T-REX17 is only very transiently expressed in definitive endoderm, and given that few available RNA-seq datasets cover this developmental transition from hiPSCs, it is not entirely surprising that it has been missed in the past.

      Also, the exact coordinates of the lncRNA are not easy to find in the manuscript.

      This is certainly an important annotation that we missed in the manuscript. We have now updated the legend of Figure 1A to include the exact genomic location of T-REX17.

      Many statistical analyses are rather lacking. In particular I did not find details of how the DEGs were identified during differentiation (FDR? How many replicates?).

      We thank the reviewer for pointing this out. We now specify in the Methods section (page 42, lines 1037-1039) and in the figure legends (page 54, lines 1269-1271) how the DEGs have been identified, which thresholds have been used, and number of replicates performed.

      The results of the smFISH are surprising, since the level of expression seems rather low in comparison to the qPCR (only 4 times less expressed than Sox17) or the RNA-seq.

      Direct quantitative comparisons between smFISH and qPCR (or RNA-seq) assays are in general quite hard, since the two technologies rely on different biochemical principles. qPCR and RNA-seq include an amplification step, and their results should therefore be interpreted as relative rather than absolute. smFISH, on the other hand, offers more absolute quantitative information and provides clues about the subcellular localization of the investigated target. At the same time, in smFISH experiments individual foci could represent the accumulation of more than one molecule, making it hard to accurately infer gene expression levels from images. Throughout the manuscript we combine the two assays to provide more robust information about T-REX17 expression dynamics.

      We would also like to note the high specificity of our smFISH signal, given that we do not observe any detectable foci for T-REX17 in undifferentiated cells (Fig. 2C) or T-REX17 depleted endoderm cells (Fig. 3C).

    1. Author Response

      1) Response to the Editor

      We thank the Editor and the Reviewers for the kind words, the helpful suggestions, and the points of critique, which have all helped us substantially strengthen the manuscript in this revised version. Regarding the 3 general critiques highlighted by the Editor:

      Essential Revisions:

      1) Some hypotheses, and in particular the one that all individuals have the same inter-burst interval distribution, should be tested/justified/discussed.

      (a) We have generalized the theory to directly address this point by relaxing the assumption of an identical inter-burst interval for all individuals. In short: the main insights continue to hold and we discuss the nuances in the text.

      (b) Experimentally, the hypothesis that all single fireflies isolated from the group exhibit the same interburst interval (IBI) distribution could not be rigorously tested. The main reason is practical: in order to compare IBI distributions across individuals, we would need to collect a large number of fireflies and track them for long durations, which was not realistic given our experimental setup and the short window of firefly emergence. In addition, external environmental factors might slightly alter behavior, making comparisons even more complex. Thus, given the paucity of field data, we adopt the assumption that all individual fireflies follow the same IBI distribution.

      2) Comparison between the models and the data must be improved, in particular through a quantification of the differences between distributions and sensitivity analysis of the numerical results.

      (a) Regarding the comparison of the agent-based simulations with experimental data, in Fig. 7 we compare the underlying distributions using the two-sided Kolmogorov-Smirnov test for goodness of fit. This appears to us to be the most straightforward and informative approach, without over-fitting.

      (b) Regarding sensitivity analysis for the agent-based simulations, for each β value from 0 to 1 we statistically compared the simulations to the experimental distributions to find the best-fitting β.

      (c) Finally, owing to experimental constraints that limit the data available for characterizing the interburst distribution, we strive to strike a delicate balance between sophisticated statistical tools for comparing theoretical and simulated distributions (with unrestricted access to large sample sizes) and the finite samples of the empirical distributions. As such, we think it is apposite to use the first two moments of the respective distributions in Fig. 3 to show the striking similarity of trends.
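      Steps (a) and (b) can be illustrated with a short sketch: a two-sample Kolmogorov-Smirnov statistic and a grid search over β that keeps the value whose simulated distribution is closest to the data. `toy_simulate` is a hypothetical stand-in for the agent-based simulation (it simply shifts a Gaussian), and selecting β by minimum KS distance is an assumption about how such a comparison could be done; `scipy.stats.ks_2samp` computes the same statistic along with a p-value.

```python
import bisect
import random


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in xs + ys:  # the supremum is attained at one of the sample points
        fx = bisect.bisect_right(xs, t) / len(xs)
        fy = bisect.bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def toy_simulate(beta, n=500, seed=0):
    """Hypothetical stand-in for the agent-based simulation: stronger
    coupling (larger beta) shortens the collective burst interval."""
    rng = random.Random(seed)
    return [rng.gauss(12.0 + 10.0 * (1.0 - beta), 1.0) for _ in range(n)]


def best_beta(simulate, observed, betas):
    """Grid search: the beta whose simulated distribution is closest in KS distance."""
    scores = {b: ks_statistic(simulate(b), observed) for b in betas}
    return min(scores, key=scores.get), scores


betas = [round(0.1 * i, 1) for i in range(11)]  # beta from 0 to 1
observed = toy_simulate(0.7, seed=42)            # toy "experimental" data
beta_hat, scores = best_beta(toy_simulate, observed, betas)
print(beta_hat, round(scores[beta_hat], 3))
```

      Keeping the full `scores` dictionary, rather than only the minimum, also shows how sharply the fit degrades as β moves away from the best value, which is the information a sensitivity analysis needs.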

      3) More discussion of the modeling in connection to past theoretical results and existing literature is necessary to better contextualize the present work and assess its originality.

      We have done this closely following the specific suggestions from reviewers.

      2) Revised terminology: removing usage of “model”

      Since unintended ambiguity may be caused by use of the word “model”, which could refer either to (1) the theoretical framework, principle of emergent periodicity, and attendant analytic calculation, or (2) the agent-based simulation in the computational realization, we have removed all instances of the word “model” from the results presented in the paper and replaced it with the specific meaning (theory or simulation) in each context.

      Similarly, in responding to Reviewers’ comments, we clarify what we understand by their use of the word “model” in each case.

      3) Addressing an error in the agent-based simulation code

      We (OM and OP) have now fixed an inadvertent unit typo in the agent-based simulation code. Before the fix, the discharging time (Td) was set to 10000 ms; it is now correctly set to 100 ms. The erroneous value caused very slow discharges, keeping the voltage high until any beta addition was received and resulting in more frequent bursts than we would expect from the model dynamics. In our responses to the reviewers, we refer to this fix as the “unit typo”. We corrected the agent-based simulation panels in Figs. 3 and 5 to reflect the new numerical results, as well as the corresponding sections of the text.
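      To see why the unit typo mattered, the sketch below assumes exponential discharge with time constant Td (the functional form is an assumption; the text above only specifies the value of Td) and compares how much charge remains after one second for the corrected and erroneous values.

```python
import math


def remaining_fraction(elapsed_ms, td_ms):
    """Fraction of the charge left after elapsed_ms of exponential discharge
    with time constant td_ms (the exponential form is an assumption)."""
    return math.exp(-elapsed_ms / td_ms)


for td in (100.0, 10_000.0):  # corrected value vs. the unit typo
    print(f"Td = {td:>7.0f} ms: {remaining_fraction(1_000.0, td):.3g} left after 1 s")
```

      Under this assumption, with Td = 100 ms essentially nothing remains after one second (about 5 × 10⁻⁵ of the charge), whereas with Td = 10000 ms about 90% remains, which is consistent with the voltage staying high and bursts occurring too often.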

      4) Addressing changes to experimental dataset

      We increased the size of our N = 1 dataset (N is the number of fireflies) to match the 10 samples reported in the original text. Additionally, we have added a characterization of the dataset sizes for N = 5, 10, 15, and 20 fireflies.

      5) Response to Reviewer 1

      We thank the Reviewer for kind remarks, and the highlights of the strengths of the paper.

      Regarding concerns raised, point by point:

      Reviewer #1 (Public Review):

      Weaknesses:

      The work presented here is an excellent start at understanding the collective behavior of this particular species of firefly. However, the model does not apply to other species in which individual males are intrinsically rhythmic. So the model is less general than it may appear at first.

      We take the Reviewer’s point well. We have added text to the paper to clearly highlight this point.

      The modeling framework is also developed under the very stylized conditions of experiments conducted in a small tent. While that is a natural place to begin, future work should consider the conditions that fireflies encounter in the wild. Swarms that are spread out in space would require a model with a more complicated structure, perhaps with network connectivity and coupling strengths that both change in time as fireflies move around. This is not so much a weakness of the present work as a call to arms for future research.

      We agree with the Reviewer that this is an exciting call to arms for future research!

      Other comments:

      This assumption that all individuals have the same IBI distribution could be directly tested. Has this been done? If not, why not? e.g. Are there difficulties with letting one firefly flash long enough to collect sufficient data to fill out the distribution?

      1. We have generalized the theory to directly address this point by relaxing the assumption that all individuals exhibit the same inter-burst interval distribution. In short: the main insights continue to hold and we discuss the nuances in the text.

      2. Experimentally, the hypothesis that all single fireflies isolated from the group exhibit the same interburst interval (IBI) distribution could not be rigorously tested. The main reason is practical: in order to compare IBI distributions across individuals, we would need to collect a large number of fireflies and track them for long durations, which was not realistic given our experimental setup and the short window of firefly emergence. In addition, external environmental factors might slightly alter behavior, making comparisons even more complex. Thus, given the paucity of field data, we adopt the assumption that all individual fireflies follow the same IBI distribution.

      The derivation given in 6.2.1 is clearer than the approach taken here, which unnecessarily introduces Q, q, and c and then never uses them again.

      We agree with the Reviewer and have accordingly revised the manuscript.

      We have also implemented the suggested edits in the marked up manuscript. We are grateful for the detailed feedback, which helped us substantially extend results, and improve presentation and clarity.

      6) Response to Reviewer 2

      We thank the Reviewer for their thorough feedback. We provide point by point responses below.

      Reviewer #2 (Public Review):

      1) The biological relevance of certain hypotheses is insufficiently discussed. This is important because if the observed behaviour is a universal one, alternative models may explain it as well.

      We thank the reviewer for raising this point. The main hypotheses underlying our models are: 1) individual fireflies in isolation flash at random intervals; 2) these random intervals are drawn from the empirical distribution reported (implicitly: all fireflies follow the same distribution); 3) once a firefly flashes, it triggers all others. Hypothesis 1) is directly supported by the data presented. Hypothesis 2) is comprehensively addressed in the revised manuscript, as discussed previously. Hypothesis 3) is central to the proposed principle, and enables intrinsically non-oscillating individuals to oscillate periodically when in a group. The resulting phenomenon has been compared to experimental data and extensively discussed in the manuscript. Further, we have also simulated the effect of changing the strength of coupling between fireflies based on this hypothesis in the revised section on agent-based simulation.
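      Hypotheses 1–3 imply that each collective burst interval is the earliest of N independent individual intervals. If a single firefly's interval has density f and CDF F, the standard order-statistic result gives the density of that minimum as $p_N(t) = N f(t)\,[1 - F(t)]^{N-1}$: one firefly flashes at t while the other N − 1 have not yet flashed. The Monte Carlo sketch below illustrates this, assuming a hypothetical single-firefly distribution (a 12 s floor plus an exponential tail with mean 20 s) rather than the empirical one.

```python
import random
import statistics


def draw_interval(rng):
    """Hypothetical single-firefly inter-burst interval (s): a 12 s floor plus
    an exponential tail with mean 20 s. Not the empirical distribution."""
    return 12.0 + rng.expovariate(1 / 20.0)


def collective_intervals(n_fireflies, n_bursts, seed=0):
    """With instantaneous triggering, each group interval is the earliest of
    N independent individual intervals; all fireflies then restart."""
    rng = random.Random(seed)
    return [min(draw_interval(rng) for _ in range(n_fireflies))
            for _ in range(n_bursts)]


for n in (1, 5, 20):
    tb = collective_intervals(n, n_bursts=5_000)
    print(n, round(statistics.mean(tb), 2), round(statistics.pstdev(tb), 2))
```

      For this toy distribution the mean of the minimum is 12 + 20/N seconds, so as N grows the collective interval concentrates just above the 12 s floor and its spread shrinks: the group becomes more periodic even though no individual is.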

      2) Comparison between the models and the data could be improved, in particular through quantification of the differences between distributions and sensitivity analysis of the numerical results.

      1. Regarding the comparison of the agent-based simulations with experimental data, in Fig. 7 we compare the underlying distributions using the two-sided Kolmogorov-Smirnov test for goodness of fit. This appears to us to be the most straightforward and informative approach, without over-fitting.

      2. Regarding sensitivity analysis for the agent-based simulations, for each β value from 0 to 1 we statistically compared the simulations to the experimental distributions to find the best-fitting β.

      3. Finally, owing to experimental constraints that limit the data available for characterizing the interburst distribution, we strive to strike a delicate balance between sophisticated statistical tools for comparing theoretical and simulated distributions (with unrestricted access to large sample sizes) and the finite samples of the empirical distributions. As such, we think it is apposite to use the first two moments of the respective distributions in Fig. 3 to show the striking similarity of trends.

      Reviewer #2 (Recommendations for the authors):

      A. The assumption that single-firefly spikes obey the same distribution (i.e., that there is no individual variation in the frequency, or even in the number of bursts composing the flash) does not seem to have been verified on the data, which are instead pooled into a single distribution (Fig. 1D). Moreover, the main features of this distribution are that it has a minimum at 12 s (discarding the faster bursts that are not considered in the model) and that it is sufficiently skewed that a minimal coupling is needed for collective synchrony to emerge. I think that the agreement between the distributions for different N would be more meaningfully discussed with previous work as a reference, whereas this is currently relegated to the discussion, so that it is unclear how much of the theoretical results are novel and/or unexpected. Quantifying the distance between distributions would also be interesting: the two models (analytical and simulations) seem to disagree more with each other than with the data.

      Regarding the hypothesis that all individual fireflies exhibit the same interflash interval distribution, please see our response to Main Point 1. Regarding the comparison between the analytical theory and the numerical simulations, Figs. 3 and 5 have been revised after a unit typo was found in the code (see Section 3). Following the update, the analytical and numerical results agree in (1) the location of the peak in Fig. 3 for all N values, and (2) the approach of the peak to the minimum of the input distribution as N increases.

      B. If I understand correctly, simulations are introduced as a way to obtain the dependence on the coupling strength (β). There are several issues here. First, I do not see how the coupling constant could change in the present experimental setup, where all fireflies presumably see each other (unlike when there is vegetation). Second, looking at Fig. 3, the critical coupling strength appears to depend very weakly on N, and it is not clear how the 'detailed comparison' that leads to the fit is carried out (in fact, the fitted βs look larger than those at which the transition occurs in Fig. 3A). I think a sensitivity analysis is needed to understand how the results change when β is changed, and also what the effect of the natural Tb distribution is (Fig. 2F). The results of the simulations might be clearer if, instead of using the envelope of the experimental results, the authors fitted it with a standard distribution (e.g., Poisson) so that it can be regularized. This should make it possible to trace the boundary between asynchronous and synchronous firing with higher resolution.

      We have included agent-based numerical simulations to provide a concrete instantiation of the theoretical principle and the analytical results in the preceding section. While the analytic theory has no fitting parameters, in the agent-based simulations we introduce one additional fitting parameter to see what happens when we relax one hypothesis of the analytical theory: the instantaneous triggering of all fireflies by an initial flasher. Additionally, the agent-based simulations pave the way for future work, allowing convenient exploration of the connectivity between individuals and analysis of the behavior of individual fireflies. In this context, please note that Fig. 5 has been corrected (see above), leading to a stronger co-dependence of β and N. In addition to the envelopes, we also report the trends in the first empirical moments (mean and standard deviation) for comparison and for tracking the transition to synchrony.

      C. More care should be put in explaining what are the initial conditions hypothesized for the different models. For instance, the results of paragraph 3 are understandable if all fireflies are initialized just after firing, something that is only learnt at the end of the paragraph. I also wonder whether initial conditions may be involved with T_bs in the low-coupling region of Fig. 3A not being uniformly distributed, as I would have expected for a desynchronized population.

      We have clarified that, indeed, all fireflies are re-initialized after firing. The initial conditions then become a new random vector of interflash intervals. Importantly, we found after receiving the reviews that, due to inconsistent units in our numerical simulation code, Fig. 5 was incorrect. With proper units, the new results show a much more widespread distribution at low coupling, as expected by the Reviewer.

      D. I found the equations hard to understand, either because a variable was not precisely (or at all) defined or because some information was missing. Eq. 1: q is not defined. Eq. 2: explain what it means, i.e., the probability that the others have not flashed times the probability that this one flashes. Also, state explicitly what the 'corresponding PDF' is. Eq. 3: the equation for ε(t), to which this one is coupled, is missing. Why introduce β_{i,j} and T_bi if they are then taken to be independent of the indices? Definitions of the collective and group burst intervals should be provided. It would be clearer if t_b0 were defined in the first paragraph of the results, so as to also clarify its relation to T_b. Define T^i_b in the caption of Fig. 3 (they are defined later than the figure is first discussed). The definition of 'the vertical axis label' (maybe find a word for that...) is rather cumbersome. I could imagine that other definitions would allow the lines in Fig. 3E to converge to the same line for large β, which would make more sense: in the strong coupling limit, I see no reason why the collective spiking should differ across N (the analytical model could help here).

      Thank you for these comments; we have incorporated these and related changes.

      E. I think that the authors' reading of the two 'dynamical quorum sensing' papers they cite is incorrect: De Monte et al. was not about the Kuramoto model, but about the same limit-cycle oscillators as in Strogatz; Taylor et al. consider excitable systems, potentially closer to noisy integrate-and-fire models, at least in that they do not have self-sustained oscillations. Both papers show that oscillations appear above a certain density threshold, and that the frequency of oscillations increases with density, as found in this work. A more accurate link to previous publications in the field of synchronization theory, including the models by Kurths and colleagues for fireflies, would be useful both in the introduction and in the discussion, and would help the reader to position this work and appreciate its original contributions.

      1. Thank you for pointing out an inaccuracy in our literature citations regarding synchronization. We have now made corrections to address this point.

      2. While we take the Reviewer’s points, our theoretical framework (“model”), building on the principle of emergent periodicity we propose here, differs fundamentally from extant “models” in the nature of its individuals. In the reference in question, individuals are oscillators, and the fastest frequency is the frequency of the fastest individual oscillator. In contrast, in our work there is no fastest individual oscillator, and the “fastest frequency” has a completely different meaning, since individuals do not have a particular frequency associated with them. In this sense, our work is not inspired by theirs. That said, we have included citations as suggested by the Reviewer.

      F. The authors say that part of the data is unpublished. I guess they mean that the whole data set will be published with this manuscript. I think the formulation is ambiguous.

      Thank you for this comment. We have now clarified that the data will indeed be published with the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper tests whether people vary their reliance on episodic memory vs. incremental learning as a function of the uncertainty of the environment. The authors posit that higher uncertainty environments should lead to more reliance on episodic memory, and they find evidence for this effect across several kinds of analyses and across two independent samples.

      The paper is beautifully written and motivated, and the results and figures are clear and compelling. The replication in an independent sample is especially useful. I think this will be an important paper of interest to a broad group of learning, memory, and decision-making researchers. I have only two points of concern about the interpretation of the results:

      1) My main concern regards the indirect indicator of participants' use of episodic memory on a given trial. The authors assume that episodic memory is used if the value of the chosen object (as determined by its value the last time it was presented) does not match the current value of the deck it is presented in. They find that these mismatch choices happen more often in the high-volatility environment. But if participants simply choose in a more noisy/exploratory way in the high volatility environment, I believe that would also result in more mismatched judgments. What proportion of the trials labeled as episodic should we expect to be a result of noise or exploration? It seems conceivable that a judgment to explore could take longer, and result in the observed RT effects. Perhaps it could be useful to match up putative episodic trials with later recognition memory for those particular items. The across-subjects correlations are an indirect version of this, but could potentially be subject to a related concern if participants who explore more (and are then judged as more episodic) also simply have a better memory.

      Thank you for this important suggestion. We agree that noisy/exploratory choices could potentially masquerade as episodic on the episodic-based choice index used as one of our behavioral measures. As pointed out, this is because participants may be more likely to make noisier incremental value-based decisions in the high volatility environment than in the low volatility environment. In our revision, we provide a new analysis showing that, as the reviewer predicted, choices are indeed noisier in the high volatility environment. We address this concern in two ways. First, we took this noise into account in our analysis of the episodic/incremental tradeoff and show that it does not account for the main findings. Second, we provide a new analysis of subsequent memory showing that choices defined as episodic during the decision task are also associated with better recognition memory later on. These new analyses are described below.

      We used a mixed-effects logistic regression model to test for an interaction effect of environment and model-estimated deck value on whether the orange deck was chosen. We fit this model only to trials without a previously seen object in order to obtain a more accurate measure of noise specific to incremental learning. In both the main and replication samples, participants did indeed make noisier incremental decisions in the high compared to the low volatility environment (Main: 𝛽 = −1.589, 95% 𝐶𝐼 = [−2.091, −1.096], Replication: 𝛽 = −1.255, 95% 𝐶𝐼 = [−1.824, −0.675]). To account for the possibility that the measured difference between environments in our episodic-based choice index may be related to this difference in incremental noise between the environments, we included each participant’s random effect of the environment by deck value interaction from this model as a covariate in our analysis of the effect of environment on the episodic-based choice index. While each participant’s propensity to choose with greater noise in the high volatility environment did have an effect on the episodic-based choice index (Main: 𝛽 = 0.042, 95% 𝐶𝐼 = [0.012, 0.072], Replication: 𝛽 = 0.055, 95% 𝐶𝐼 = [0.027, 0.082]), the effect of environment remained similar to that originally reported in the manuscript for both samples following this adjustment. The reported effects (line 178 and Appendix 1) and methods (lines 643-655) have been updated to reflect these changes.
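      A negative environment-by-value interaction means that estimated deck value predicts choice less steeply in the high volatility environment. The minimal Python sketch below illustrates this with a single-subject logistic choice rule; it omits the random-effects structure of the mixed-effects model, assumes high volatility is coded as 1, and uses hypothetical slope values rather than the fitted estimates.

```python
import math


def p_choose_orange(value_diff: float, slope: float, bias: float = 0.0) -> float:
    """Logistic probability of choosing the orange deck.

    value_diff: model-estimated value of the orange deck minus the other deck
    slope: sensitivity of choice to value; a smaller slope means noisier choices
    """
    return 1.0 / (1.0 + math.exp(-(bias + slope * value_diff)))


# Hypothetical coefficients for illustration only (not the reported estimates).
value_slope = 3.0     # effect of deck value in the low volatility environment
env_by_value = -1.5   # environment x value interaction (high volatility coded 1)

low_vol = p_choose_orange(0.5, value_slope)
high_vol = p_choose_orange(0.5, value_slope + env_by_value)
print(f"low volatility: {low_vol:.3f}, high volatility: {high_vol:.3f}")
```

      With these illustrative numbers, the same value advantage yields a choice probability closer to 0.5 in the high volatility environment, which is what "noisier" means here.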

      We applied similar logic to the reaction time analysis to address the possibility that decisions based on exploration may take longer than decisions based on exploitation of learned deck value. We included a covariate in the analysis of the effect of episodic-based choices on reaction time that captured possible slowing due to switching from one deck to the other (lines 656-662) and found that the slower reaction times on episodic choices are not fully explained by exploration. Because in this task a decision to explore is captured by switching from one deck to the other, the effect of episodic-based choices on reaction time reported in the manuscript should account for this behavior. We have clarified this reasoning in the methods (lines 661-662).

      Finally, thank you for the suggestion to sort objects in the recognition memory test by whether they came from episodic- or incremental-based choice trials, as an independent test of our approach for classifying episodic decisions. We performed this analysis and found that, in both samples, participants had better memory for objects from episodic-based choice trials. This result provides further support for the putative episodic nature of these trials and is now reported in the Results (lines 300-304 and Appendix 1) and Methods (lines 737-742), and appears as a new panel in Figure 5 (Figure 5A).

      2) The paper is framed as tapping into a trade-off between the use of episodic memory vs. incremental learning, but it is not clear why participants would not use episodic memory in this particular task setup whenever it is available to them. The authors mention that there is "computational expense" to episodic memory, but retrieval of an already-established strong episodic memory could be quite effortless and even automatic. Why not always use it, since it is guaranteed in this task to be a better source of information for the decision? If it is true that RT is higher when using episodic memory, that is helpful toward establishing the trade-off, so this links to the concern above about how confident we can be about the use of episodic memory in particular trials.

      Thank you for raising this important point and for giving us the opportunity to clarify. We now address this point in two ways: first, we provide a new analysis of episodic memory and choice behavior; second, we address this point explicitly in the discussion.

      As now emphasized in the paper (lines 118-122 and lines 384-388), in this task it is true that an observer with perfect episodic memory should always make use of it whenever available (i.e., on trials featuring previously seen objects). However, human memory is fallible and resource-limited, and we find that participants with less reliable episodic memory overall actually relied less on this strategy and more on incremental learning throughout the task (Figure 5C and 5D). In other words, there is noise and uncertainty in the episodic memory trace as well. While it is not the main focus of our study, the noise in episodic memory is indeed another reason why trading off between episodic memory and incremental learning is advantageous for behavior. We further agree that, while the RT effects show that episodic memory retrieval takes longer than using incremental value, we cannot make strong statements about effort or “computational expense” per se from our data. Accordingly, we have removed the “computational expense” phrase (line 491), as well as our suggestion that episodic retrieval is “perhaps more effortful overall” (line 181), from the paper.

      Reviewer #2 (Public Review):

      This manuscript addresses the broad question of when humans use different learning and memory systems in the service of decision-making. Previous studies have shown that, even in tasks that can be performed well using incremental trial-and-error learning, choices can sometimes be based on memories of individual past episodes. This manuscript asks what determines the balance between incremental learning and episodic memory, and specifically tests the idea that the uncertainty associated with each alters the balance between them in a rational way. Using a task that can separate the influence of incremental learning and episodic memory on choice in two large online samples, several lines of evidence supporting this hypothesis are reported. People are more likely to rely on episodic memory in more volatile environments when incremental learning is more uncertain and during periods of increased uncertainty within a given environment. Individuals with more accurate episodic memories are also more likely to rely on episodic memory and less likely to rely on incremental learning. These data are compelling, even more so because all of the main findings are directly replicated in a second sample. These data extend the notion of uncertainty-based arbitration between different forms of learning/memory, which has been proposed and evaluated in other contexts, to the case of episodic memory versus incremental learning.

      The weaknesses in the paper are mostly minor. One potential weakness is the nature of the online sample. Many participants apparently did not respond to the volatility manipulation, making it impossible to test whether this altered their choices. It is unclear whether this is a feature of online samples (where people can be distracted, unmotivated, etc.) or of human performance more generally.

      Thank you for your comments. Indeed, we also found it interesting that many participants were insensitive to the manipulation of volatility in our study, as assessed and filtered based on the initial deck learning task. As you note, our study is not positioned to determine whether this is due to the online population or to human performance more generally, and we have added a discussion of this point to the paper (lines 477-485). Also, fractions of apparently inattentive participants exceeding 1/3 are very much the norm in our experience with other online studies across many tasks. While there is much to say about the implications of this (see e.g. Zorowitz, Niv & Bennett PsyArXiv 2021), our basic philosophy (which we follow here) is that it is best practice, and conservative, to exclude aggressively so as to focus analyses on those participants for whom the experimental questions can meaningfully be asked.

      Reviewer #3 (Public Review):

      The purpose of this work is to test the hypothesis that uncertainty modulates the relative contributions of episodic and incremental learning to decisions. The authors test this using a "deck learning and card memory task" featuring a 2-alternative forced choice between two cards, each showing a color and an object. The cards are drawn from different colored decks with different average values that stochastically reverse with fixed volatility, and also feature objects that can be unfamiliar or familiar. Objects are not shown more than twice, and familiar objects have the same value as they did when shown previously. This allows the authors to construct an index of episodic contributions to decision-making: in cases where the previous value of the object is incongruous with the incrementally observed value, the subject's choice reveals which strategy they are relying on.
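      The logic of the episodic-based choice index can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: each trial is reduced to which card (0 or 1) the incremental deck values favor, which card the remembered object value favors (or None when no familiar object is informative), and which card was chosen; the index is taken here as the fraction of incongruent trials on which the choice followed the object's previous value.

```python
import math
from typing import Optional, Sequence, Tuple

# One trial: (incremental_pref, episodic_pref, choice), each the index (0 or 1)
# of a card; episodic_pref is None when no familiar object informs the choice.
Trial = Tuple[int, Optional[int], int]


def episodic_choice_index(trials: Sequence[Trial]) -> float:
    """Fraction of incongruent trials on which the choice followed episodic value."""
    # Keep only trials where remembered object value and deck value disagree.
    incongruent = [t for t in trials if t[1] is not None and t[1] != t[0]]
    if not incongruent:
        return math.nan  # index is undefined without incongruent trials
    followed_episodic = sum(1 for _, episodic, choice in incongruent if choice == episodic)
    return followed_episodic / len(incongruent)


# Hypothetical example: three incongruent trials, two resolved episodically.
example = [(0, 1, 1), (0, 1, 0), (1, 0, 0), (0, 0, 0), (1, None, 1)]
print(episodic_choice_index(example))
```

      On this simplified version, a participant who ignored both sources and chose at random would score about 0.5 in expectation, which illustrates why noisy or exploratory choices can inflate an episodic index.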

      The key manipulation is to introduce high- and low-volatility conditions, as high volatility has been shown to induce uncertainty in incremental learning, to which the optimal response is a higher learning rate. The authors find that subjects show a higher episodic choice index in the high-volatility condition, particularly immediately after reversals, when the model predicts uncertainty is at a maximum. The authors also construct a trial-wise index of uncertainty and show that the episodic index correlates with this measure. At the subject level, the authors further find that the overall episodic choice index correlates with the ability to accurately identify familiar objects, and they reason that this indicates that higher certainty in episodic memory predicts the use of episodic strategies. The authors replicate all of their findings in a second subject population.

      This is a very interesting study with compelling results on an important topic. The task design was a clever way to disentangle and measure different learning strategies, which could be adopted by others seeking to further understand the contributions of different strategies to decision-making and its neural underpinnings. The article is also very clearly written and the results clearly communicated.

      A number of questions remain regarding the interpretation of the results that I think would be addressed with further analysis and modeling.

      At a conceptual level, I was unsure about the equivalence drawn between volatility and uncertainty: the main experiments and analyses all concern reversals and comparisons of volatility conditions, but the conclusions are more broadly about uncertainty. Volatility, as the authors note, is only one way to induce uncertainty. It also doesn't seem like the most obvious way to intervene on uncertainty (e.g., manipulating trial-wise outcome variance seems more obvious). The trial-wise relative uncertainty measurements in Fig 4 speak a bit more to the question of uncertainty in general, but these were not the main focus and also do not disambiguate between trial-wise uncertainty derived from reversals and within-block variation.

      Thank you for your comments. We agree that this distinction was unclear and appreciate the opportunity to clarify. We hope the manuscript is now clear about the conceptual distinction between uncertainty as the construct of theoretical interest vs. volatility as the operational manipulation being used to access it. We have adjusted the presentation and added discussion to clarify this, and also enhanced the trial-wise analyses to strengthen the interpretation of results in terms of uncertainty more generally. Regarding obviousness, we think perhaps there is a difference between areas of study on this point. While trial-wise outcome variance (which we call stochasticity) has been widely used to manipulate uncertainty in perceptual and sensorimotor studies, it has been more rarely manipulated in reward learning studies, where instead the volatility manipulation we use has predominated. We have a recent paper reviewing examples of both and arguing that the field has underemphasized the importance of stochasticity, so we are sympathetic here (Piray and Daw, Nature Communications 2021).

      In any case, to address these points on revision, we have reframed the first section of the results, where we look at effects of environment on episodic-based choice, to focus primarily on volatility. Specifically, we have expanded our explanation of how volatility induces uncertainty, changed the subtitle of the section from ‘uncertainty’ to ‘volatility’, and specified that the prediction in this section is primarily about volatility (lines 97 and 116-123). We also reframed the second section of the results to be primarily about the uncertainty induced by volatility: while differences between the environments capture coarse effects of volatility, trial-wise uncertainty should be present following reversals in both environments. We have now focused our explanation in this section on trial-wise uncertainty within the environments rather than volatility between the environments (lines 184-192). Further, we agree that there are sources of uncertainty besides volatility that we did not manipulate in the paper, and that it remains for future work whether manipulating them would produce similar results. To address this, we have added a new paragraph to the discussion covering these alternative sources and further qualifying the scope of our conclusions (lines 434-446).

      We also agree that our analyses in Figure 4 did not yet speak to differences in episodic-based choice that may arise due to blockwise volatility (as captured by the categorical effect of environment) vs. trial-to-trial fluctuations in uncertainty (as captured by relative uncertainty, over and above the blockwise effect). We have addressed this by adding an additional, separate effect of the interaction between environment and episodic value to our combined choice models which is explained in more detail in the recommendations for the authors portion of our response. These changes and results are described in the Methods (lines 686-694) and Results (lines 276-277; Figure 4C).

      Another key question I had concerns the design choice to use binary rather than drifting values. Because of this, the subjects could be inferring which context they are in (e.g., Gershman et al., 2012; Akam et al., 2015) rather than continuously incrementing value estimates and tracking the instantaneous value and its uncertainty. I am not sure this would qualitatively affect the results, as volatility would also affect context confidence, but it is a rather different interpretation and could yield different quantitative predictions. It might also have some qualitative bearing on the results: the subjects have expectations about how long they will stay in a particular environment, and they might start anticipating a context change after a certain amount of time, which would increase uncertainty not just immediately after switches, but also after a long stay in the same environment. Moreover, depending on the variance within a context, there may be little uncertainty following context shifts.

      Thank you for raising this important point. To address the possibility that the task structure could have encouraged participants to infer context rather than engage in incremental learning, we added an alternative contextual inference (CI) model, based on a hidden Markov model with two hidden states (i.e., either the red deck is lucky and the blue deck unlucky, or vice versa). This model is now described in the Results of the main text (lines 226-228), listed in the Methods (line 674), and explained in detail in Appendix 3 alongside the computational models of incremental learning. In model comparison, this model provided a worse fit than the incremental learning models we previously presented in both samples, suggesting that incremental learning describes participants’ choices in this task better than contextual inference. The results of this comparison are reflected in an updated Figure 3A.
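      For readers unfamiliar with contextual inference, the core of a two-state hidden Markov model is a forward (filtering) update of the belief that, say, the red deck is currently the lucky one. The Python sketch below assumes binary rewarded/unrewarded outcomes with hypothetical reward probabilities, and a per-trial hazard rate standing in for volatility; the authors' CI model in Appendix 3 may parameterize outcomes differently.

```python
from typing import List, Sequence, Tuple


def hmm_belief_update(belief_red_lucky: float, chose_red: bool, rewarded: bool,
                      hazard: float, p_lucky: float = 0.8, p_unlucky: float = 0.2) -> float:
    """One forward-filtering step of a two-state HMM.

    belief_red_lucky: current probability that the red deck is the lucky one
    hazard: per-trial probability that the lucky/unlucky assignment reverses
    p_lucky, p_unlucky: hypothetical reward probabilities of the lucky/unlucky deck
    """
    # Predict: the hidden context may have switched since the last trial.
    predicted = belief_red_lucky * (1.0 - hazard) + (1.0 - belief_red_lucky) * hazard
    # Reward probability of the chosen deck under each hidden state.
    p_reward_if_red_lucky = p_lucky if chose_red else p_unlucky
    p_reward_if_blue_lucky = p_unlucky if chose_red else p_lucky
    # Likelihood of the observed outcome under each state.
    lik_red = p_reward_if_red_lucky if rewarded else 1.0 - p_reward_if_red_lucky
    lik_blue = p_reward_if_blue_lucky if rewarded else 1.0 - p_reward_if_blue_lucky
    # Update: Bayes' rule over the two states.
    return predicted * lik_red / (predicted * lik_red + (1.0 - predicted) * lik_blue)


def filter_beliefs(observations: Sequence[Tuple[bool, bool]], hazard: float,
                   prior: float = 0.5) -> List[float]:
    """Run the update over a sequence of (chose_red, rewarded) observations."""
    beliefs = []
    belief = prior
    for chose_red, rewarded in observations:
        belief = hmm_belief_update(belief, chose_red, rewarded, hazard)
        beliefs.append(belief)
    return beliefs


print(filter_beliefs([(True, True), (True, True), (True, False)], hazard=0.1))
```

      Because a higher hazard pulls the predicted belief toward 0.5 on every trial, a high-volatility environment keeps context beliefs less certain in this kind of model as well, which is why the two accounts are distinguished here by model comparison rather than by the volatility effect alone.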

    1. Author Response

      Reviewer #1 (Public Review):

      Slusarczyk et al. present a very well-written manuscript focused on understanding the mechanisms underlying aging of erythrophagocytic red pulp macrophages in the spleen (RPMs) and its relationship to iron loading with age. The manuscript is diffuse, with a broad swath of data elements. Importantly, the manuscript demonstrates that RPM erythrophagocytic capacity is diminished with age and is restored in aged mice fed an iron-restricted diet. In addition, the mechanism of declining RPM erythrophagocytic capacity appears to be ferroptosis-mediated, insensitive to heme as it is to iron, and to occur independently of ROS generation. These are compelling findings. However, some of the conclusions rely on conjecture, and a clear causal association is not established. The main conclusion of the manuscript points to the accumulation of unavailable insoluble forms of iron as both causing and resulting from decreased RPM erythrophagocytic capacity.

      We are proposing that intracellular iron accumulation progresses first and leads to global proteotoxic damage and increased lipid peroxidation. This eventually triggers the death of a fraction of aging RPMs, thus promoting the formation of extracellular iron-rich protein aggregates. More explanation can be found below. Besides, iron loading suppresses the erythrophagocytic activity of RPMs, hence further contributing to their functional impairment during aging.

      In addition, the finding that IR diet leads to increased TF saturation in aged mice is surprising.

      We believe that this observation implies better mobilization of splenic iron stores, and corroborates our conclusion that mice that age on an iron-reduced diet benefit from higher iron bioavailability, although these differences are relatively mild. More explanation can be found in our replies to Reviewer #2.

      Furthermore, whether the finding in RPMs is intrinsic or related to RBC-related changes with aging is not addressed.

      We have now addressed this issue by characterizing both iron and ROS levels in RBCs in more detail.

      Finally, these findings, obtained in a single strain and only in female mice, are intriguing but warrant tempered conclusions.

      We have tempered the conclusions and now provide a basic characterization of the RPM aging phenotype in Balb/c female mice.

      Major points:

      1) The main concern is that there is no clear explanation of why iron increases during aging, although the authors appear to be saying that iron accumulation is both the cause and a consequence of decreased RPM erythrophagocytic capacity. This requires more clarification of the main hypothesis on page 4, lines 17-18.

      We thank the reviewer for this comment. It was previously reported that iron accumulates substantially in the spleen during aging, especially in female mice (Altamura et al., 2014). Since RPMs are the cells that process most of the iron in the spleen, we aimed to explore the relationship between iron accumulation and RPM functions during aging. This investigation led us to uncover that iron accumulation is indeed both the cause and the consequence of RPM dysfunction. Specifically, we propose that intracellular iron loading of RPMs precedes extracellular deposition of iron in the form of protein-rich aggregates, driven by RPM damage. To support this, we now show that the proteome of RPMs overlaps with the proteins present in the age-triggered aggregates (Fig. 3F). Furthermore, corroborating our model, we now demonstrate that transient iron loading of RPMs via iron-dextran injection (new Fig. 3G) leads to the formation of protein-rich aggregates closely resembling those present in aged spleens (new Fig. 3H). This implies that high iron content in RPMs is indeed a major factor driving the aggregation of their proteome and cell damage. Importantly, we now support this model with studies using iRPMs. We demonstrate that iron loading and blockage of ferroportin by synthetic mini-hepcidin (PR73) (Stefanova et al., 2018) cause protein aggregation in iRPMs and decrease their viability only in cells that were exposed to heat shock, a well-established trigger of proteotoxicity (new Fig. 5K and L). We propose that these two factors, namely the age-triggered decline in protein homeostasis and exposure to excessive iron levels, act in concert and render RPMs particularly sensitive to damage during aging (see also Discussion, p. 16).

      In parallel, our data imply that the increased iron content in aged RPMs drives their decreased erythrophagocytic activity, as we now document more thoroughly with more extensive in vitro experiments in iRPMs (new Fig. 6E-H). We cannot exclude that some senescent splenic RBCs that are retained in the red pulp and evade erythrophagocytosis due to RPM defects in aging may also contribute to the formation of the aggregates. This is supported by the facts that mice lacking RPMs also exhibit iron loading in the spleen (Kohyama et al., 2009; Okreglicka et al., 2021), and that the proteome of the aggregates overlaps to some extent with the proteome of erythrocytes (new Fig. 3F).

      We believe that during aging, intracellular iron accumulation is chiefly driven by ferroportin downregulation, as also suggested by Reviewer #3. We now show that ferroportin drops significantly already in mice aged 4 and 5 months (new Fig. 4H), preceding most of the other impairments. This drop coincides with the increase in hepcidin expression, but whether this is the sole reason for ferroportin suppression during early aging would require further investigation outside the scope of the present manuscript.

      In sum, to address this comment, we have modified the part of the introduction that describes our hypothesis and major findings to make it clearer (p. 4), improved our manuscript by providing the new data mentioned above, and added more explanation in the corresponding sections of the Results and Discussion.

      2) It is unclear if RPMs are in limited supply. Based on the introduction (page 4, line 13-15), they have limited self-renewal capacity and blood monocytes only partially replenished. Fig 4D suggests that there is a decrease in RPMs from aged mice. The %RPM from CD45+ compartment suggests that there may just be relatively more neutrophils or fewer monocytes recruited. There is not enough clarity on the meaning of this data point.

      Thank you for this comment. We fully agree that %RPMs of CD45+ splenocytes, although well-accepted in literature (Kohyama et al., 2009; Okreglicka et al., 2021), is only a relative number. Hence, we now included additional data and explanations regarding the loss of RPMs during aging.

      It was reported that the proportion of RPMs derived from bone marrow monocytes increases mildly but progressively during aging (Liu et al., 2019). Given the loss of the total RPM population illustrated by our data, this implies that the cells of embryonic origin are likely even more affected. We confirmed this assumption by re-analyzing the data from Liu et al., which we now include in the manuscript as Fig. 5E. These data clearly show that the representation of embryonically derived RPMs drops more drastically than the percentage of total RPMs, whereas the replenishment rate from monocytes is not significantly affected during aging. Consistent with this, we have not observed any robust change in the population of monocytes (F4/80-low, CD11b-high) or pre-RPMs (F4/80-high, CD11b-high) in the spleen at the age of 10 months (Figure 5-figure supplement 2A and B). We also detected a mild decrease, not an increase, in the number of granulocytes (new Figure 5-figure supplement 2C). Furthermore, we measured an in situ apoptosis marker and found a clear sign of apoptosis in the aged spleen (especially in the red pulp area), a phenotype that is less pronounced in mice on an IR diet (new Fig. 5O). This is consistent with the observations that apoptosis markers can be elevated in tissues upon ferroptosis induction (Friedmann Angeli et al., 2014) and that the proteotoxic stress in aged RPMs, which we now emphasize more in our manuscript, may also lead to apoptosis (Brancolini & Iuliano, 2020). Taken together, we strongly believe that the functional defect of embryonically derived RPMs chiefly contributes to their shortage during aging.
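      The step from "the monocyte-derived proportion rises while total RPMs fall" to "embryonically derived RPMs are even more affected" is a bookkeeping identity. A minimal sketch, assuming every RPM is either monocyte-derived or embryonically derived; the symbols are ours, introduced only for illustration (N: total RPM number, f: monocyte-derived fraction, y: young, a: aged):

```latex
N^{\mathrm{emb}} = N\,(1 - f)
\qquad\Longrightarrow\qquad
\frac{N^{\mathrm{emb}}_{a}}{N^{\mathrm{emb}}_{y}}
  = \frac{N_{a}}{N_{y}} \cdot \frac{1 - f_{a}}{1 - f_{y}}
  < \frac{N_{a}}{N_{y}}
\quad \text{whenever } f_{a} > f_{y}.
```

      The second factor is below 1 whenever the monocyte-derived fraction increases with age, so the embryonic population shrinks by more than the total RPM population does.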

      3) Anemia of aging is complex and mechanistically poorly understood. In general, it is considered similar to anemia of chronic inflammation, with increased Epo, a mild drop in Hb, and erythroid expansion, similar to ineffective erythropoiesis / low Epo responsiveness. It is not surprising that the IR diet did not impact this mild anemia. However, was the MCV or MCH altered in aged and IR aged mice?

      We now included the data for hematocrit, RBC counts, MCV, and MCH in Figure 1-figure supplement 5. Hematocrit shows a similar tendency as hemoglobin levels, but the values for RBC counts, MCV, and MCH seem not to be altered. We also show now that the erythropoietic activity in the bone marrow is not affected in aged versus young mice. Taken together, the anemic phenotype in female C57BL/6J mice at this age is very mild, which we emphasized in the main text, and is likely affected by other factors than serum iron levels (p. 6).

      4) Page 6, line 23 onward: the conclusion is that KCs compensate for the decreased function of RPMs in the spleen, based on the expansion of the KC fraction in the liver. Is there evidence that KCs engage in more erythrophagocytosis in aged mice? Furthermore, iron accumulation in the liver with age does not specifically demonstrate enhanced erythrophagocytosis by KCs. Please clarify why liver iron accumulation would not simply be a consequence of increased parenchymal iron, similar to increased splenic iron with age, independent of erythrophagocytic activity in resident macrophages in either organ.

      Thanks for these questions. To quantify the erythrophagocytosis rate in KCs, we show, as for the RPMs (Fig. 1K), the % of PKH67-positive macrophages following transfusion of PKH67-stained stressed RBCs (Fig. 1M). The data imply a mild (not statistically significant) drop (of approx. 30%) in EP activity. We believe that it is overridden by a more pronounced (on average, 2-fold) increase in the representation of KCs (Fig. 1N). The mechanisms of iron accumulation in the spleen and the liver are very different. In the liver, we observed iron deposition in the parenchymal cells (not in non-parenchymal cells, new Fig. 1P), which we are currently characterizing in more detail in a parallel manuscript. Our data demonstrate a drop in transferrin saturation in aged mice. Hence, it is highly unlikely that aging would be hallmarked by the presence of circulating non-transferrin-bound iron that would be sequestered by hepatocytes, as shown previously (Jenkitkasemwong et al., 2015). Thus, the iron released locally by KCs is the most likely contributor to progressive hepatocytic iron loading during aging. The mechanism of iron delivery to hepatocytes from erythrophagocytosing KCs was demonstrated by Theurl et al. (2016), and we propose that it may be operational during aging, although on a much longer time scale. We now discuss this point in more detail in the Results section (p. 7).
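      The compensation argument can be made explicit with a back-of-the-envelope calculation. It assumes (our assumption, not a claim from the data) that the total erythrophagocytic capacity E of liver KCs scales as the product of KC abundance n and the fraction r of KCs that take up transfused RBCs, and it uses the approximate values quoted above:

```latex
\frac{E_{\text{aged}}}{E_{\text{young}}}
  = \frac{n_{\text{aged}}}{n_{\text{young}}} \cdot \frac{r_{\text{aged}}}{r_{\text{young}}}
  \approx 2 \times (1 - 0.3) = 1.4
```

      Under this assumption, a roughly 30% drop per cell is more than offset by a 2-fold expansion, giving about a 1.4-fold net increase; the estimate inherits the uncertainty of both approximate inputs, including the non-significant per-cell drop.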

      5) It is unclear whether the effect on RPMs is intrinsic or extrinsic. It would be helpful to evaluate aged iRPMs using young RBCs vs. young iRPMs using old RBCs.

      We are skeptical that generating iRPMs from aged mice would be helpful: these cells are a specific type of primary macrophage culture, derived from bone marrow monocytes with MCSF1 and additionally exposed to heme and IL-33 for 4 days. We do not expect bone marrow monocytes to be heavily affected by aging, nor iRPMs derived from them to recapitulate aspects of aged splenic RPMs, especially after an 8-day in vitro culture. However, to address the reviewer's concerns, we now provide additional data regarding RBC fitness. Consistent with the life-span experiment (Fig. 2A), we show that oxidative stress is increased only in splenic, but not circulating, RBCs (new Fig. 2C, replacing the old Fig. 2B and C). In addition, we show no signs of age-triggered iron loading in RBCs, either in the spleen (new Fig. 2F) or in the circulation (new Fig. 2B). Hence, we do not envision that RPMs become iron-loaded during aging as a result of erythrophagocytosis of iron-loaded RBCs. In support of this, we also observed that during aging, FPN levels in RPMs drop first, the erythrophagocytosis rate decreases afterward, and only then do RBCs start to exhibit significantly increased oxidative stress (now presented in new Fig. 4H, J and K).

      6) The discussion of aggregates in the spleen of aged mice (Fig 2G-2K and Fig 3) is very descriptive and non-specific. For example, if the iron-rich aggregates are hemosiderin, a hemosiderin-specific stain would be helpful. These data specifically are correlative, and it is difficult to extract value from them.

      Thank you for these comments. To the best of our knowledge, Perls' Prussian blue staining (Fig. 2J) is considered a hemosiderin stain. Our investigations aimed to better understand the nature and origin of the splenic iron deposits that are, to some extent, referred to as hemosiderin. Most importantly, as mentioned in our reply to Reviewer 1 (point 1), to assign causality to our data, we now demonstrate that iron accumulation in RPMs in response to iron-dextran (Fig. 3G) increases lipid peroxidation (Fig. 5F), tends to provoke RPM depletion (Fig. 5G), and triggers the formation of protein-rich aggregates (new Fig. 3H). Of note, we assume that the loss of embryonically derived RPMs in this model may be masked by simultaneous replenishment of the niche from monocytes, a phenomenon that may be addressed by future studies using Ms4a3-driven reporter mice (as shown for aged mice in our new Fig. 5E).

      7) The aging phenotype in RPMs appears to be initiated sometime after 2 months of age. However, there is some reversal of the phenotype with increasing age, e.g. Fig 4B, with decreased lipid peroxidation in 9-month-old relative to 6-month-old RPMs. What does this mean? Why is there a partial spontaneous normalization?

      Thank you for this comment and these questions. Indeed, the degree of lipid peroxidation exhibits some kinetics suggestive of partial normalization. Of note, such a tendency is not evident for other aging phenotypes of RPMs; hence, we did not emphasize this in the original manuscript. However, in the revised version of the manuscript, we now present a re-analysis of published data implying that the number of embryonically derived RPMs drops substantially between 20 and 36 weeks of age (new Fig. 5E). We think that the higher proportion of monocyte-derived RPMs in the total RPM population later in aging (9 months) might be responsible for the partial alleviation of lipid peroxidation. We now discuss this possibility in the Results section (p. 12).

      8) Does the aging phenotype in RPMs respond to ferrostatin? It appears that NAC, which is a glutathione precursor and can reverse ferroptosis, does not reverse the decreased RPM erythrophagocytic capacity observed with age, yet the authors still propose that ferroptosis is involved. A response to ferrostatin is a standard and acceptable approach to evaluating ferroptosis.

      We fully agree with the Reviewer that using ferrostatin or Liproxstatin-1 would be very helpful to fully characterize the mechanism of RPM depletion in mice. However, previous in vivo studies involving Liproxstatin-1 administration required daily injections of this ferroptosis inhibitor (Friedmann Angeli et al., 2014), which would hardly be feasible over the course of aging. For the experiments involving iron-dextran injection, using Liproxstatin-1 would require additional permission from the ethics committee, which takes time to obtain. However, to address this question, we now provide data from iRPM cell cultures (new Fig. 5K-L). In essence, our results imply that proteotoxic stress and iron overload act in concert to trigger cytotoxicity in the in vitro RPM model. Interestingly, this phenomenon does not depend solely on increased lipid peroxidation; nevertheless, neutralizing the latter with Liproxstatin-1 diminishes the cytotoxic effect (please see also the Results on p. 13 and the Discussion on pp. 15-16).

      9) The possible central role of HO-1 in the pathophysiology of decreased RPM erythrophagocytic capacity with age is interesting. However, it is not clear how the authors arrived at this hypothesis, and it would be useful, at the least, to evaluate whether RBCs in young vs. aged mice have more hemoglobin, as such changes may be primary drivers of how much HO-1 is needed during erythrophagocytosis.

      Thank you for this comment. We became interested in HO-1 levels based on the RNA sequencing data, which detected lower Hmox-1 expression in aged RPMs (Figure 3-figure supplement 1). We now show that hemoglobin content is not significantly altered in aged RBCs (MCH parameter, Figure 1-figure supplement 5E); hence, we do not think that this is the major driver of Hmox-1 downregulation. Likewise, according to the RNA-seq data, levels of Bach1 mRNA, which encodes a transcriptional repressor of Hmox-1, are not significantly altered. Hence, the reason for the transcriptional downregulation of Hmox-1 is not clear. Of note, HO-1 protein levels in the total spleen are higher in aged than in young mice, and we also detected the clear appearance of its truncated, enzymatically inactive nuclear form (see the figure below; we opted not to include this in the manuscript for clarity). The appearance of truncated HO-1 seems to be partially rescued by the IR diet. It is well established that the nuclear form of HO-1 emerges via proteolytic cleavage and migrates to the nucleus under conditions of oxidative stress (Mascaro et al., 2021). This further confirms that the aging spleen is hallmarked by an increased ROS burden. Moreover, we also detected HO-1 as one of the components of the iron-rich protein aggregates. Thus, we propose that the low levels of the cytoplasmic, enzymatically active form of HO-1 in RPMs (which we preferentially detect with our intracellular staining and flow cytometry) may be explained by its nuclear translocation and by its sequestration in protein aggregates that evade antibody binding. The latter is also supported by our observation that the protein aggregates, despite their high ferritin content (as indicated by MS analysis), are negative for L-ferritin staining. Of note, we also cannot exclude that other cell types in the aging spleen (e.g., lymphocytes) express higher levels of HO-1 in response to splenic oxidative stress.

      Fig. Total splenic levels of HO-1 in young, aged IR and aged mice.

      Reviewer #2 (Public Review):

      Slusarczyk et al. investigate the functional impairment of red pulp macrophages (RPMs) during aging. When red blood cells (RBCs) become senescent, they are recycled by RPMs via erythrophagocytosis (EP). This leads to an increase in intracellular heme and iron, both of which are cytotoxic. The authors hypothesize that the continuous processing of iron by RPMs could alter their functions in an age-dependent manner. The authors used a wide variety of models: an in vivo model using female mice fed a standard (200 ppm) or restricted (25 ppm) iron diet, an ex vivo model of EP using splenocytes, and an in vitro model of EP using iRPMs. The authors found iron accumulation in organs but serum markers of iron deficiency. They show that during aging, RPMs have a higher labile iron pool (LIP) and decreased lysosomal activity, with a concomitant reduction in EP. Furthermore, aging RPMs undergo ferroptosis, resulting in non-bioavailable iron deposition as intra- and extracellular aggregates. Feeding aged mice an iron-restricted diet restores most of the iron-recycling capacity of RPMs, even though the mild anemia remains unchanged.

      Overall, I find the manuscript to be of significant potential interest. But there are important discrepancies that need to be resolved first. The proposed model is that during aging both EP and HO-1 expression decrease in RPMs but iron and ferroportin levels are elevated. In their model, the authors show intracellular iron-rich proteinaceous aggregates. But if HO-1 levels decrease, intracellular heme levels should increase. If Fpn levels increase, intracellular iron levels should decrease. How does LIP stay high in RPMs under these conditions? I find these to be major inconsistencies in the model.

      We thank the Reviewer for her/his valuable feedback. As mentioned in our replies, we can only assume that a small misunderstanding in the interpretation of the presented data underlies this comment. We show that ferroportin levels in RPMs (Fig. 1F) are modulated in a manner that fully reflects the iron status of these cells (both labile and total iron levels, Figs. 1H and I). FPN levels drop in aged RPMs and are rescued when mice are maintained on a reduced-iron diet. As pointed out by Reviewer #3, and as explained in our replies, we believe that ferroportin levels are critical for the observed phenotypes in aging. We now describe our data more clearly to avoid any potential misinterpretation (p. 6).

      Reviewer #3 (Public Review):

      This is a comprehensive study of the effects of aging on the function of red pulp macrophages (RPM) involved in iron recycling from erythrocytes. The authors document that insoluble iron accumulates in the spleen, that RPM become functionally impaired, and that these effects can be ameliorated by an iron-restricted diet. The study is well written, carefully done, extensively documented, and its conclusions are well supported. It is a useful and important addition for at least three distinct fields: aging, iron and macrophage biology.

      The authors do not explain why an iron-restricted diet has such a strong beneficial effect on RPM aging. This is not at all obvious. I assume that the number of erythrocytes that are recycled in the spleen, and are by far the largest source of splenic iron, is not changed much by iron restriction. Is the iron retention time in macrophages changed by the diet, i.e. the recycled iron is retained for a short time when diet is iron-restricted (making hepcidin low and ferroportin high), and long time when iron is sufficient (making hepcidin high and ferroportin low)? Longer iron retention could increase damage and account for the effect. Possibly, macrophages may not empty completely of iron before having to ingest another senescent erythrocyte, and so gradually accumulate iron.

      We are very grateful to this Reviewer for emphasizing the importance of the iron export capacity of RPMs as a possible driver of the observed phenotypes. Indeed, as mentioned above, we now show in the revised version of the manuscript that ferroportin drops early during aging (revised Fig. 4). Importantly, we also observed that iron loading and limitation of iron export from iRPMs via ferroportin aggravate the impact of heat shock (a well-accepted trigger of proteotoxicity) on both protein aggregation and cell viability (new Fig. 5K and L). Physiologically, recent findings show that aging promotes a global decrease in protein solubility [bioRxiv manuscript (Sui X. et al., 2022)], and it is very likely that the constant exposure of RPMs to high iron fluxes renders these specialized cells particularly sensitive to proteome instability. This could be further aggravated by a build-up of iron due to the early drop in ferroportin during aging, ultimately leading to the appearance of protein aggregates as early as 5 months of age in C57BL/6J females. Based on the new data, we emphasize this model in the revised version of the manuscript (please see the Discussion on p. 16).

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript clearly demonstrates that murine malaria infection with Plasmodium chabaudi impairs B cells' interaction with T cells, rather than DCs' interaction with T cells. The authors elegantly showed that DCs were activated and capable of acquiring antigens and priming T cells during P. chabaudi infection. B cells are the main APCs that capture particulate antigens such as infected RBCs (iRBCs), while DCs preferentially take up soluble antigens. This study is important for understanding how ongoing infections such as malaria may negatively affect heterologous immunizations.

      Overall, the experimental designs are straightforward, and the manuscript is well-written. However, there were several limitations in this study.

      Specific comments:

      1) The mechanism by which prior capture of iRBCs by B cells leads to impaired B-T interaction remains unknown. It is unclear whether the impairment of B-T cell interaction is due to direct BCR interaction with iRBCs or is an indirect response to extrinsic factors induced by malaria infection.

      We believe we have carefully demonstrated that impairment of B-T interactions does not require specific BCR-antigen interactions between B cells and iRBCs (for a complete explanation of this point, please see the response to the next comment). However, the question remains whether direct, antigen-nonspecific iRBC-B cell interactions (i.e., not mediated by the BCR) or additional extrinsic factors, or a combination, are responsible for the observed defects in Tfh and GC B cell populations.

      Existing studies from other infection models are informative in answering this question. Daugan et al (Front Immunol 2016; PMID 27994594) previously published experiments similar to ours, but used LCMV instead of Plasmodium. That is, they immunized uninfected or LCMV-infected mice with the well-studied immunogen NPP-CGG and measured NP-specific antibody production and other parameters. They found that LCMV infection concurrent with immunization (or 4-8 days before) significantly decreased the numbers of NP-specific splenic antibody-secreting cells and IgG1 titers, and caused major disruptions to splenic architecture. These defects were shown to require type I interferon (T1IFN) signaling in B cells. However, T1IFN is unlikely to be solely responsible for the observed phenotypes, because simultaneous infection with VSV, another virus that also induces T1IFN, did not cause any defects in NP-specific antibody production. Contrasting with the work of Daugan et al, Banga et al (PloS One 2015; PMID 25919588) found that infecting with LCMV (or with Listeria monocytogenes) two days after heterologous immunization did not disrupt immunogen-specific responses, whereas P. yoelii did. Examining both these studies, we hypothesize that both LCMV and Plasmodium infections can disrupt humoral responses, but that LCMV does so within a narrower time frame, thereby yielding different results depending on whether infection comes a few days before or a few days after immunization.

      Complementing these studies of heterologous immunization, additional publications have reported that cytokines induced by several different pathogenic infections drive disruption of germinal centers and decreases in antibody titers specific for the pathogen itself, often correlated with disordered splenic architecture. Glatman Zaretsky et al. (Infect Immun 2012; PMID 22851754) showed that Toxoplasma gondii infection causes transient disruption of splenic architecture and loss of defined GCs by microscopy. These defects were partially due to decreased lymphotoxin expression by B cells, and were rescued by a lymphotoxin receptor agonist. Similarly, we previously reported that blood-stage Plasmodium infection disrupted germinal center responses to a Plasmodium liver-stage antigen (Keitany et al. Cell Rep 2016; PMID 28009289). In this context, however, the same lymphotoxin receptor agonist had no effect on GCs; instead, blockade of the pro-inflammatory cytokine interferon gamma partially restored antibody responses to the liver-stage antigen. Overall, we favor the hypothesis that several different pathogens can disrupt GCs and antibody responses indirectly by inducing inflammation and a disordered splenic environment; however, the precise mechanisms of disruption likely differ from infection to infection, with different cytokines or other effectors playing key roles in some but not other settings. Importantly, not all pathogens disrupt antibody production: as noted above, infection with VSV or L. monocytogenes did not affect immunogen-specific titers in immunized mice (Daugan et al., Front Immunol 2016; Banga et al., PloS One 2015). We have now addressed this topic at length in the Discussion (lines 399-418).

      The existence of indirect, inflammation- or cytokine-related mechanisms that may interfere with germinal center formation and antibody production does not preclude additional direct interactions between B cells and iRBCs that might also affect B cell function. We address this possibility more fully in the response to the next comment.

      2) Would malaria infection in MD4 mice, which carry a transgenic BCR that does not recognize the malaria parasite, impair the subsequent B cell response to HEL immunization? This may clarify whether the impairment of the subsequent B cell response is BCR-specific. If malaria impairs the subsequent B cell response to HEL in MD4 mice, it might suggest that other cell types and B cell-extrinsic factors are involved in causing the impaired B cell responses, rather than malaria affecting B cells directly.

      The question of whether the impairments we observe require BCR-specific interactions with iRBCs is an important one. However, we believe that the experiment the reviewer proposes to address this question has technical limitations; further, we assert that we have already provided data to address a requirement for BCR specificity.

      With regard to the proposed experiment of immunizing MD4 mice with HEL in the presence or absence of malaria infection: MD4 mice, in which B cells express a transgenic receptor specific for HEL, can be expected to mount a massive, monoclonal response to direct immunization with HEL that would be very different from the physiological context of a polyclonal B cell population. We are doubtful that this experimental setup would be informative for the question at hand, especially because we are studying the effects of B-Tfh interactions, which are already limiting in the physiological setting of a polyclonal B cell response, but would be massively unbalanced in an MD4 mouse where all B cells express the receptor for HEL.

      Usually, investigators studying MD4 B cell responses generate a more physiological setting by adoptively transferring a small but detectable number of MD4 transgenic B cells into a mouse with a normal polyclonal B cell population, and immunizing that mouse. We maintain that this approach is essentially what we have done in our study, except that instead of using transferred transgenic cells to identify a B cell population of known specificity, we have used tetramers to detect a specific population of endogenous B cells in a polyclonal setting. By examining GP-specific B cells in our immunization experiments, we restricted our analysis to B cells that could not have had any BCR-mediated, antigen-specific interactions with iRBCs (because the GP antigen is not present in the iRBCs; it is delivered as a soluble protein antigen, 5 days after initiation of infection). Because we see dysfunction in the GP-specific T and B cell populations despite the absence of this antigen within iRBCs, we can conclude that the disruptions to these populations are not due to antigen-specific iRBC-BCR interactions.

      We do also show (using MD4 B cells in Fig. S1B) that selective interactions between iRBCs and B cells do not require an antigen-specific BCR. Thus, it is still possible that direct interactions between iRBCs and B cells (that are independent of antigen binding to the BCR) are responsible for disrupting subsequent adaptive responses, perhaps in addition to the more indirect factors that we discuss in the response to Comment #1 above. We are very interested in this possibility, which is discussed in lines 428-436 of the manuscript. But the use of MD4 B cells would not address this specific question. Instead, we would need to identify an alternative pathway or receptor that mediates the iRBC-B cell interaction, and study the effects of blocking that pathway on downstream adaptive responses. We have spent considerable time and energy on this question, but have not yet been able to identify such a pathway; this remains a matter for further study.

      3) MD4 mice were mentioned in the Methods section on in vitro RBC binding, although none of the figures described the use of MD4 mice. These data might be important to show whether RBC binding to B cells is mediated through the BCR.

      Cells from MD4 mice were used in Figure S1B to show that in vitro binding of iRBCs to B cells did not require interaction with an antigen-specific BCR. We agree that this is an important point and have revised the text (lines 152-156) to outline it more clearly.

      4) Does P. chabaudi infection have any effects on B cell uptake of subsequent antigens, such as soluble antigen PE or particulate antigen CFSE-labeled P. yoelii iRBC?

      We examined uptake of PE by B cells in P. chabaudi-infected mice (5 days post-infection) compared to naïve mice. There was a trend towards increased uptake in the infected mice, but this difference was not significant. These data are taken from the same samples that did reveal a significant increase in PE uptake by DCs in infected mice (Fig. 3C). We have now included the B cell data in the paper as Figure 3D, and discussed them in lines 231-232.

      5) Is this phenomenon specific to malaria infection? Does malaria-irrelevant particulate immunization affect T-B interaction of subsequent heterologous immunization?

      We do not believe this phenomenon is specific to malaria infection; please see the extensive discussion of this point in the response to Comment #1 above. However, we would hypothesize that malaria-irrelevant particle immunization (as with nanoparticles) would not affect T-B interactions for subsequent heterologous immunizations, since the disruption seems to be associated with the massive inflammation and splenic disorganization that occurs following certain infections.

      6) Despite the impaired Tfh and GC responses 8 days after immunization following malaria infection, Fig. 5F shows that GP-specific IgG eventually increased to the same level as in uninfected immunized mice on day 23. Did the authors check whether these mice had a delayed Tfh and GC response that eventually increased by day 23? Are these antibody responses derived from GCs, or are they GC-independent?

      We have now examined GP-specific T cell numbers and polarization between days 23 and 35 post-immunization. We found that although a defect persists in the percentage of GP66-specific T cells that exhibit a GC Tfh phenotype at later timepoints, the absolute number of GC Tfh cells is not significantly defective in infected mice at these times. Concurrently there is a slight (though nonsignificant) increase in the total numbers of GP66+ T cells in the infected mice; we believe that this modest overall expansion permits recovery of the GC Tfh population numbers despite the continued defect in their frequency. These findings are consistent with our observation that antibody levels recover in infected mice by 3 weeks post-infection. We have added these data to Figure 4 (E-G) and discuss them in lines 283-293.

      7) Does recovery from malaria infection by antimalarial treatment rescue the B cell response to subsequent heterologous immunization?

      We have shown previously that drug-mediated clearance of blood-stage Plasmodium infection restores GC and antibody responses to a liver-stage-specific antigen, which normally are disrupted by emergence of the blood-stage (Keitany et al. Cell Rep 2016). We have also shown that antimalarial drug treatment restores GC responses in mice lacking the innate immune sensor CGAS, which have higher parasitemia, exacerbated splenic disruption, and diminished GC responses following P. yoelii infection (Hahn et al., JCI Insight 2018). Based on these results we hypothesize that drug-mediated clearance of blood-stage infection would also rescue B cell responses to heterologous immunization.

      8) Fig. 1C shows that B cells took up more nRBCs than iRBCs, but line 142 states that "B cells bound significantly more iRBC than nRBC." Is there a mistake in the figure arrangement? Why would B cells take up more naïve RBCs than iRBCs?

      The symbols in the figure legend were switched in error; the filled circles are actually iRBC+ and the outlined circles are nRBC+. We regret the error and appreciate the reviewer bringing it to our attention. We have corrected the figure.

      9) Figs. S1C and D are confusing. The CD45.1+ CD45.2+ mouse did not receive labeled iRBCs, so why were iRBCs detected at levels as high as 40% in the spleen of this naïve mouse?

      The experiment depicted in Figs. S1 C and D was designed to test whether B cells actually bound injected iRBCs in vivo, or whether the binding occurred during processing of the tissue. With this experimental setup (injecting labeled iRBCs into CD45.2+ mice, then excising and disrupting the spleen together with an untreated CD45.1+ CD45.2+ spleen), iRBC signal from in vivo uptake should be observed only in CD45.2+ splenocytes, whereas iRBC binding that occurs during tissue processing will be distributed between the two genotypes. Thus, the ~40% of iRBC signal observed in CD45.1+ CD45.2+ B cells leads us to conclude that much of the observed B cell binding from our in vivo experiments occurs during processing, as we state in the text (lines 151-152). Even so, in vitro experiments clearly show that B cells selectively bind iRBCs over naïve RBCs in a setting where processing is not a confounder (Fig. S1B). To clear up any confusion, we have expanded the description of the experiment and its interpretation in the Supplemental Figure Legend.

      Reviewer #2 (Public Review):

      The data presented support the conclusions of the paper, and my concerns are largely conceptual, regarding how we understand these data in the context of malaria infection and vaccination in endemic areas.

      1) The data are presented based on the idea that antigen uptake and presentation differ between particulate and soluble antigens, and that during malaria infection particulate uptake is more important due to circulating iRBCs. However, during parasite invasion of RBCs, the parasite sheds large amounts of antigen into the circulation, at least some of which would then be found in soluble form. Can the authors comment on this aspect of infection and on whether and how it may affect the interpretation of the results? Do the authors assume that any soluble antigen taken up and presented (via DCs?) during infection would be affected in the same way as the soluble GP66 antigen? Or could immune responses be shaped by an interaction between the two pathways when an antigen is presented via both particulate and soluble routes?

      This is an important point and we have now discussed it further in the text (lines 111-115, 204-210, and 356-357). In our previously published study, where we extensively characterized CD4 T cell responses to the GP66 epitope expressed by P. yoelii, the epitope was fused to a parasite protein (Hep17) that localizes to the parasitophorous vacuole membrane, and so we do assume that the majority of this antigen is encountered by APCs in the context of an iRBC, rather than shed in soluble form. In contrast, some merozoite surface antigens such as cleaved MSP1 are shed copiously from the parasite coat upon RBC invasion, and therefore would be expected to exist in soluble as well as parasite-associated form.

      Unfortunately, our laboratory does not currently have tetramer reagents or access to transgenic mice that would allow us to assess T cell responses specific for shed or soluble parasite antigens. But a previous study from Stephens et al. (Blood 2005; PMID 15890689) reported that T cells with a transgenic TCR specific for an epitope in the shed portion of MSP1 could boost antibody production when transferred into T cell-deficient mice infected with P. chabaudi, suggesting that at least some fraction of the MSP1-specific T cells differentiate into T helper cells capable of supporting B cell activity. However, antibody production was significantly delayed in this setting compared to its usual kinetics in wild-type mice. Further side-by-side characterization would be needed to assess differentiation of these MSP1-specific transgenic T cells during infection, and determine whether they are primed from B cells or from DCs (or a combination).

      We note that we have extensively characterized B cell responses to MSP1 during both infection and immunization. While robust and T-dependent, MSP1-specific B cell responses in infected mice are delayed relative to their kinetics in mice immunized with recombinant MSP1 or other protein antigens. This may indicate that MSP1-specific T cell activation or cognate B-T interactions are defective in infected mice relative to immunized mice, despite the presumed presence of soluble (shed) MSP1 during infection. If this is the case, it suggests that the defects we describe in the current manuscript exist for both particle-associated and soluble parasite antigens. However, as mentioned above, a careful characterization of MSP1-specific T cell differentiation is needed to really understand this, and that requires additional tools that we cannot easily access at this time.

      2) Impact of particle antigen opsonisation on antigen uptake and presentation. The authors use parasites isolated from mice that have been infected for 6-7 days to investigate the ability of different subsets to take up particulate antigens. At this time point, have mice developed antibody responses that opsonise these parasites, or are antibody levels low and the parasites unopsonised? Would opsonised parasites, such as those coated with sera from children in a setting of chronic infection, show a different pattern of uptake by different immune cell subsets? And/or would opsonisation change how DCs and other cell types process and present antigens? While these issues could be addressed experimentally either now or in the future, the manuscript should at least consider this issue because, during human infection in areas of high exposure, individuals are likely to have reasonable levels of antibodies, with opsonised parasites circulating.

      We ourselves have been very interested in the question of whether host antibodies (or other host factors such as complement) might affect uptake of iRBCs. As the reviewer notes, the iRBCs we use in our experiments are taken from mice 6-7 days post-infection, at which time some amount of anti-parasite antibody has developed. We spent a considerable amount of time trying to measure effects of opsonizing antibody, or even deposited complement, on uptake of iRBCs. However, we did not see any change in B cell binding of iRBCs in vitro when we blocked complement receptor with anti-CD21; blocked antibody receptors (Fc receptors) with anti-CD16/CD32 or excess normal mouse serum; or used iRBCs taken from complement-depleted mice (treated with cobra venom factor) or from uMT mice (which entirely lack B cells and antibody). Thus, we have not been able to find any role for opsonizing antibody (or complement) in iRBC uptake. We have not included these experiments in the manuscript because they yielded only negative data, and we were not able to design positive controls robust enough to give us confidence that the experiments were technically sound (and therefore that the negative results were due to the underlying biology and not to experiment failure). We have added a discussion point about this issue (lines 438-442).

      3) While authors show that malaria infection disrupts the response to soluble antigens, the relevance directly to vaccination should be considered carefully, specifically because vaccines of soluble antigens are largely given alongside adjuvants which also will modulate DC function. Again, this could be addressed experimentally now or in the future, but definitely should be mentioned and considered when interpreting the results.

      Whenever we performed soluble protein immunizations to examine adaptive immune responses in this study, the immunogen was delivered in adjuvant, specifically Sigma Adjuvant System (SAS), as described in the Methods. This adjuvant contains the Monophosphoryl Lipid A component from Salmonella in an oil-water emulsion, and as such, its formulation is at least roughly similar to the AS01 adjuvant used in Mosquirix (RTS,S), the only licensed anti-malaria vaccine, as well as other vaccines currently used in humans. SAS has been shown to elicit very high titers of neutralizing antibodies in mice (Sastry et al., PLoS One 2017, PMID 29073183). Therefore, our results should have relevance for vaccination in humans. We have modified the manuscript text (lines 454-455) to highlight that in this study, protein immunogens were administered with adjuvant.

    1. Author Response

      Reviewer #1 (Public Review):

      The study by Xie et al. investigates whether the entorhinal-DG/CA3 pathway is involved in working memory maintenance. The main findings include a correlation between stimulus and neural similarities that was specific to the cued stimulus and to entorhinal-DG/CA3 locations. The authors observed similar results (cue and region specificity) using an inverted encoding modeling approach. Finally, they also showed that trials in which participants made smaller errors had better reconstruction fidelity on the cued side (compared to the uncued side). This effect was absent for larger-error trials.

      The study challenges a widely held traditional view that working memory and episodic memory have largely independent neural implementations, with the MTL being critical for episodic memory but not for working memory. The study adds to a large body of evidence showing involvement of the hippocampus across a range of different working memory tasks and stimuli. Nevertheless, it still remains unclear what function the hippocampus plays in working memory.

      We thank the reviewer for the positive appraisal of the current research, which adds to the growing research interest in the MTL’s contribution to WM.

      Reviewer #2 (Public Review):

      Xie et al. investigated the medial temporal lobe (MTL) circuitry contributions to pattern separation, a neurocomputational operation to distinguish neural representations of similar information. This operation presumably engages both long-term memory (LTM) and working memory (WM), bridging the traditional WM/LTM distinction. Specifically, the authors combined an established retro-cue orientation WM task with high-resolution fMRI to test the hypothesis that the entorhinal-DG/CA3 pathway retains visual WM for a simple surface feature. They found that the anterior-lateral entorhinal cortex (aLEC) and the hippocampal DG/CA3 subfield both retained item-specific WM information that is associated with the fidelity of subsequent recall. These findings highlight the contribution of MTL circuitry to item-specific WM representation, challenging the classic memory models.

      I am a long-term memory researcher with expertise in representational similarity analysis, but not in inverted encoding modeling (IEM). Therefore, I cannot verify the correctness of these models and I will leave it to the other reviewers and editors. However, after an in-depth reading of the manuscript, I could evaluate the significance of the present findings and the strength of evidence supporting these findings. The conclusions of this paper are mostly well supported by data, but some aspects of image acquisition and data analysis need to be clarified.

      We thank the reviewer for the positive appraisal of the current study.

      I would like to list several strengths and weaknesses of this manuscript:

      Strengths:

      • Methodologically, the authors addressed uncertainty in previous research resulting from several challenges. Namely, they used a high-resolution fMRI protocol to infer signals from the MTL substructures and an established retro-cue orientation WM task to minimize the task load.

      • The authors selected a control ROI - amygdala - irrelevant for the experimental task, and at the same time adjacent to the other MTL ROIs, thus possibly having a similar signal-to-noise ratio. The reported effects were observed in the aLEC and DG/CA3, but not in the amygdala.

      • Memory performance, quantified as recall error, was at ceiling: the average recall error of 12 degrees was only marginally away from the correct grating, in the direction of the closest incorrect grating (orientations were predefined with a minimum increment of 20 degrees). However, the authors controlled for the effects of recall fidelity on MTL representations by comparing the IEM reconstructions between precise-recall trials and imprecise-recall trials (resampled to an equal number of trials). The authors found that precise-recall trials yielded better IEM reconstruction quality.

      • The authors performed a control analysis with a time-varying IEM to exclude the possibility that the item-specific information in mid-delay aLEC-DG/CA3 activity could be attributed to perceptual processing. This analysis showed that earlier TRs in the delay period contain information about both cued and uncued items, whereas mid-delay activity contains the most information about the cued item relative to the uncued item.

      We thank the reviewer for highlighting the multiple strengths of the current study.

      Weaknesses:

      • The authors formulate their main hypothesis building on an assumption related to the experimental task. This task requires correctly selecting the cued grating orientation while resisting the interference from internal representations of the other orientation gratings. The authors hypothesize that if this post-encoding information selection function is supported by the MTL's entorhinal-DG/CA3 pathway, the recorded delay-period activity should contain more information about the cued item than about the uncued item (even if both are similarly remembered). Thus, the assumption here is that resolving the interference would be reflected by a more distinct representation in the MTL for the cued item. Could it be the opposite, namely that the MTL better represents the unresolved interference, for example through the mechanism of hippocampal repulsion (Chanales et al., 2017)? It could strengthen the findings if the authors commented on this contrary hypothesis as well.

      We thank the reviewer for pointing out this interesting alternative hypothesis. Because of the different task design (e.g., learning over many repetitions vs. WM) and stimuli (e.g., spatial memory vs. orientation gratings), it is hard to directly compare Chanales et al.’s findings with the current results. That said, we think the idea that the representation of similar information would place greater task demands on the MTL is consistent with our intuition regarding the role of the MTL in supporting the qualitative aspect of WM representation. We have now further discussed this issue in our revised manuscript to invite further consideration of the suggested alternative hypothesis:

      “Our data suggest that this process would result in more similar and stable representations for the same remembered item across trials, as detected by multivariate correlational and decoding analyses in the current study. However, under certain task conditions (e.g., learning spatial routes in a naturalistic task over many repetitions), the MTL may maximally orthogonalize overlapping information to opposite representational patterns (hence “repulsion”) to minimize mnemonic interference (Chanales et al., 2017). It remains to be determined how these learning-related mechanisms in a more complex setting are related to MTL’s contributions to WM of simple stimulus features.”

      • It is not clear to me why the authors chose the inverted encoding modelling approach and what its advantage is over other multivoxel pattern analysis approaches, for example the representational similarity analysis also used in this study. How are these two complementary? Since the IEM is still a relatively new approach, a brief comment in the manuscript could help emphasize the strength of the paper, especially since this paper is of interest to researchers in the fields of both working memory and long-term memory, the latter possibly not being familiar with the IEM.

      We thank the reviewer for this suggestion. In principle, the IEM is a multivariate pattern classification analysis based on an encoding model. There is no fundamental difference between this approach and other machine-learning or classification approaches, except that the IEM is more model-based and can therefore be more computationally efficient (see Xie et al., 2023 for a conceptual overview of multivariate analysis of high-dimensional neural data). The relationship between the IEM and representational similarity analysis is grounded in item-specific information that can lead to shared neural variance. How these two analyses complement each other is well characterized by a recent theoretical review (Kriegeskorte & Wei, 2021). The rationale is that trial-wise RSA reveals shared neural variance between items, implying the presence of item-specific information in the recorded neural data, whereas the IEM approach or other classification algorithms can test this item-specific information more directly under a prediction-based framework (e.g., train on part of the data and test on a held-out set). As a result, the findings of these two methods are correlated at the subject level (Figure S4), which is important to note for the purpose of analytical reliability. Furthermore, using the IEM also allows us to compare our current findings with those from previous research (Figure S3), addressing some replicability questions in the field (e.g., Ester et al., 2015).
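      To make the trial-wise RSA logic concrete, here is a minimal sketch on simulated data (not the authors' pipeline): stimulus similarity for every pair of trials is taken as the cosine of the doubled orientation difference, because orientation repeats every 180 degrees, and is correlated with the Pearson correlation between the two trials' multivoxel patterns. The simulated voxel model, the similarity measure, and the use of a Pearson rather than a rank correlation are all illustrative assumptions.

```python
import numpy as np


def stimulus_similarity(ori_deg):
    """Pairwise orientation similarity: 1 = identical, -1 = orthogonal (180-degree period)."""
    diff = np.deg2rad(ori_deg[:, None] - ori_deg[None, :])
    return np.cos(2 * diff)


def rsa_score(ori_deg, patterns):
    """Correlate stimulus similarity with neural pattern similarity across trial pairs.

    patterns: array of shape (n_trials, n_voxels).
    """
    upper = np.triu_indices(len(ori_deg), k=1)   # each trial pair once, diagonal excluded
    stim = stimulus_similarity(ori_deg)[upper]
    neural = np.corrcoef(patterns)[upper]         # trial-by-trial Pearson r of voxel patterns
    return np.corrcoef(stim, neural)[0, 1]


# Simulated (hypothetical) data: each voxel responds linearly to (cos 2θ, sin 2θ) plus noise.
rng = np.random.default_rng(0)
n_trials, n_voxels = 60, 100
ori = rng.uniform(0, 180, size=n_trials)
features = np.column_stack([np.cos(np.deg2rad(2 * ori)), np.sin(np.deg2rad(2 * ori))])
patterns = features @ rng.normal(size=(2, n_voxels)) + rng.normal(size=(n_trials, n_voxels))

score = rsa_score(ori, patterns)                        # item information present
shuffled = rsa_score(rng.permutation(ori), patterns)    # labels no longer match patterns
```

      Shuffling the orientation labels drives the score toward zero, which is the logic behind a permutation test for this kind of analysis; a prediction-based method such as the IEM instead asks whether held-out patterns can be decoded.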

      We have clarified this issue further in the paragraph where we first introduce the IEM:

      “To directly reveal the item-specific WM content, we next modeled the multivoxel patterns in subject-specific ROIs using an established inverted encoding modeling (IEM) method (Ester et al., 2015). This method assumes that the multivoxel pattern in each ROI can be considered as a weighted summation of a set of orientation information channels (Figure 3A). By using partial data to train the weights of the orientation information channels and applying these weights to an independent hold-out test set, we reconstruct the assumed orientation information channels to infer item-specific information for the remembered item, operationalized as the resultant vector length of the reconstructed orientation information channel normalized at 0° reconstruction error (Figure S2). As this approach verifies the assumed information content based on observed neural data, its results can be efficiently computed and interpreted within the assumed model even when the underlying neuronal tuning properties are unknown (Ester et al., 2015; Sprague et al., 2018). This approach, therefore, complements the model-free similarity-based analysis by linking representational geometry embedded in the neural data with item-specific information under a prediction-based framework (Kriegeskorte and Wei, 2021; Xie et al., 2023). Based on this method, previous research has revealed item-specific WM information in distributed neocortical areas, including the parietal, frontal, and occipital-temporal areas (Bettencourt and Xu, 2015; Ester et al., 2015; Rademaker et al., 2019; Sprague et al., 2016), which are similar to those revealed by other multivariate classification methods (e.g., support vector machine, SVM, Ester et al., 2015). We have also replicated these IEM effects in the current dataset (Figure S3).”
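      The train/invert logic described in the quoted paragraph can be sketched on simulated data as follows. This is a hedged illustration, not the study's code: the nine 180-degree-periodic basis channels (cos^8), the least-squares solutions, and the summary score (mean re-centered reconstruction projected onto 0°, a common "fidelity" measure) are assumptions and may differ from the exact basis set and vector-length metric the authors used.

```python
import numpy as np

N_CHANNELS = 9
CENTERS = np.arange(N_CHANNELS) * 180.0 / N_CHANNELS   # assumed channel centers: 0, 20, ..., 160 deg


def basis(ori_deg):
    """Idealized channel responses, shape (n_channels, n_trials)."""
    delta = np.deg2rad(ori_deg[None, :] - CENTERS[:, None])
    # cos(delta)**8 is 180-degree periodic and peaks at each channel's preferred orientation
    return np.cos(delta) ** 8


def train(B_train, C_train):
    """Estimate voxel weights W in B = W @ C by least squares. B: (voxels, trials)."""
    return np.linalg.lstsq(C_train.T, B_train.T, rcond=None)[0].T


def invert(W, B_test):
    """Reconstruct channel responses for held-out trials, shape (channels, trials)."""
    return np.linalg.lstsq(W, B_test, rcond=None)[0]


def fidelity(C_hat, ori_deg):
    """Re-center each trial on its true orientation, average, and project onto 0 degrees."""
    step = 180.0 / N_CHANNELS
    mid = N_CHANNELS // 2
    centered = np.empty_like(C_hat)
    for t, ori in enumerate(ori_deg):
        idx = int(round(ori / step)) % N_CHANNELS
        centered[:, t] = np.roll(C_hat[:, t], mid - idx)   # true channel moves to the middle
    offsets = np.deg2rad((np.arange(N_CHANNELS) - mid) * step)
    mean_resp = centered.mean(axis=1)
    # doubled angle because orientation wraps every 180 degrees
    return np.mean(mean_resp * np.cos(2 * offsets))


# Simulated (hypothetical) ROI: 50 voxels, each a random mixture of the channels plus noise.
rng = np.random.default_rng(1)
n_voxels = 50
W_true = rng.normal(size=(n_voxels, N_CHANNELS))
ori_train = np.tile(CENTERS, 20)   # 180 training trials, every orientation present
ori_test = np.tile(CENTERS, 10)    # 90 held-out trials
B_train = W_true @ basis(ori_train) + rng.normal(scale=0.5, size=(n_voxels, ori_train.size))
B_test = W_true @ basis(ori_test) + rng.normal(scale=0.5, size=(n_voxels, ori_test.size))

W_hat = train(B_train, basis(ori_train))
score = fidelity(invert(W_hat, B_test), ori_test)
```

      Because the weights are estimated on one set of trials and applied to independent held-out trials, a positive score reflects item-specific information that generalizes across trials rather than overfitting.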

      Overall, this work can have a substantial impact on the field due to its theoretical and conceptual novelty. Namely, the authors leveraged an established retro-cue task to demonstrate that a neurocomputational operation of pattern separation engages both working memory and long-term memory, both mediated by the MTL circuitry, beyond the distinction in classic memory models. Moreover, on the methodological side, using multivariate pattern analyses (especially the IEM) to study neural computations engaged in WM and LTM seems to be a novel and promising direction for the field.

      We thank the reviewer for this positive appraisal of the current study.

      Reviewer #3 (Public Review):

      This work addresses a long-standing gap in the literature, showing that the medial temporal lobe (MTL) is involved in representing simple feature information during a low-load working memory (WM) delay period. Previously, this area was suggested to be relevant for episodic long-term memory, and only implicated in working memory under conditions of high memory load or conjunction features. Using well-rounded analyses of task-dependent fMRI data in connection with a straightforward behavioural experiment, this paper suggests a more general role of the medial temporal lobe in working memory delay activity. It also provides a replication of previous findings on item-specific information during working memory delay in neocortical areas.

      We thank the reviewer for highlighting the contribution of the current study to fill a gap in the literature.

      Strengths:

      The study has strengths in its methods and analyses. Firstly, choosing a well-established cueing paradigm allows for straightforward comparison with past and future studies using similar paradigms. The authors themselves show this by replicating previous findings on delay-period activity in parietal, frontal, and occipito-temporal areas, strengthening their own and previous findings. Secondly, they use a template with relatively fine-grained MTL subregions and choose the amygdala as a control area within the MTL. This increases confidence in the finding that the hippocampus in particular is involved in WM delay-period activity. Thirdly, their combined use of stimulus-based representational similarity analysis and Inverted Encoding Modeling, and the convergence of both on the same result, is encouraging. Finally, despite focusing on the delay period in their main findings, extensive supplementary materials give insight into the time course of processing (encoding), which will be helpful for future studies.

      We thank the reviewer for highlighting multiple strengths of this current study.

      Weaknesses:

      While the evidence generally supports the conclusions, there are some weaknesses in the behavioural data analysis. The authors demonstrated fine stimulus discrimination in the neural data using Inverted Encoding Modeling (IEM); however, the same standard is not applied in the behavioural data analysis. In this analysis, trials with memory errors below 20 degrees and trials with errors above 20 degrees are pooled into two groups, and IEM decoding error is compared between them. As a result, the "small recall error" group encompasses a total range of 40 degrees and includes neighbouring stimuli. While this is enough to demonstrate that there was information about the remembered stimulus, it does not clarify whether aLEC/CA3 activity is associated with target selection only or also with reproduction fidelity. It leaves open whether fine-grained neural information in the MTL is related to memory fidelity.
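      The split the reviewer describes, combined with the resampling to equal trial counts mentioned among the strengths, can be sketched as below. The wrap-around convention and the single random draw are assumptions for illustration; in practice one would presumably repeat the draw many times and average the resulting IEM scores.

```python
import numpy as np


def recall_error(reported_deg, target_deg):
    """Signed orientation error wrapped into [-90, 90) degrees."""
    return (np.asarray(reported_deg) - np.asarray(target_deg) + 90.0) % 180.0 - 90.0


def split_and_balance(errors_deg, threshold=20.0, rng=None):
    """Return equal-sized index sets of small- and large-error trials."""
    rng = np.random.default_rng() if rng is None else rng
    small = np.flatnonzero(np.abs(errors_deg) < threshold)
    large = np.flatnonzero(np.abs(errors_deg) >= threshold)
    n = min(small.size, large.size)
    # subsample without replacement so both groups contribute the same number of trials
    return (rng.choice(small, size=n, replace=False),
            rng.choice(large, size=n, replace=False))


errors = recall_error([175.0, 10.0, 95.0], [5.0, 170.0, 60.0])
# errors -> [-10., 20., 35.]
```

      Note that with `threshold=20.0` the "small" group spans signed errors from -20 to +20 degrees, the 40-degree range the reviewer points out; a finer-grained analysis would use more, narrower error bins.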

      We thank the reviewer for this cautious note. As the current task is optimized to reveal the neural representation during visual WM, and as our participants are cognitively normal college students, participants’ behavioral performance in the current experiment tends to be very good (Figure 1). This leaves relatively little variation with which to further probe the behavioral outcomes of the task. We have recently generalized our findings using intracranial EEG and confirmed that trial-by-trial mnemonic discrimination during a short delay is indeed associated with the fidelity of item-specific WM representation (Xie, Chapeton, et al., in press).

      We have further discussed this issue in the revised Discussion,

      “… These two approaches are therefore complementary to each other. Nevertheless, these analyses are correlational in nature. Hence, although fine-grained neural representations revealed by these analyses are associated with participants’ behavioral outcomes (Figure 4), it remains to be determined whether the entorhinal-DG/CA3 pathway contributes to the fidelity of the selected WM representation or also to the selection of task-relevant information. Strategies for resolving this issue can involve generalizing the current findings to other WM tasks without an explicit requirement of information selection (e.g., intracranial stimulation of the MTL in a regular WM task without a retro-cue manipulation, Xie et al., in press) and/or further exploring how the frontal-parietal mechanisms related to visual selection and attention interact with the MTL system (Panichello and Buschman, 2021).”

      Moreover, the authors could be more precise about the limitations of the study and their conclusions. In particular, the paper at times suggests that the results contribute to elucidating common roles of the MTL in long-term memory and WM, potentially implementing a process called pattern separation. However, while the paper convincingly shows MTL-involvement in WM, there is no comparison to an episodic memory condition. It therefore remains an open question whether it fulfils the same role in both scenarios. Moreover, the paradigm might not place adequate pattern separation demands on the system since information about the un-cued item may be discarded after the cue.

      We thank the reviewer for this cautious note. We have now included a more detailed discussion on this issue.

      In the Discussion,

      “To more precisely reveal the MTL mechanisms that are shared across WM and long-term memory, future research should examine the extent to which MTL voxels evoked by a long-term memory task (e.g., mnemonic similarity task, Bakker et al., 2008) can be used to directly decode mnemonic content in visual WM tasks using different simple stimulus features.”

    1. Author Response

      Reviewer #2 (Public Review):

      In regions that implement an elimination strategy, prolonged periods of no local transmission mean that there are no data available to estimate Reff using the currently available methods. Transmission rates from travellers to community members, and between community members, are different when border restrictions occur, as is frequently the case when implementing an elimination strategy. When cases are low and importation risk is high, a reasonable estimation method must acknowledge this transmission heterogeneity, for example, as shown in equations 5-8 and 10-11 of this paper.

      The calculation of transmission potential adds significant data requirements (summarized in Figure 1), such that some regions where the methodology would be valuable may lack the data to estimate the macro- and micro-distancing parameters. In the paper, such parameters are estimated from weekly surveys performed by market research groups and the University of Melbourne. In contrast, in regions where local spread does occur, existing methods can calculate Reff and generate reasonable insight with relatively little data. Due to the additional data requirements, the calculation of transmission potential is less accessible than some current approaches to calculating Reff in regions with local spread.

      We agree with these comments about the need for behavioural data. We believe this point is made clearly in our existing discussion text, copied below:

      Despite its demonstrated impact, there are limitations to our approach. Firstly, it relies on data from frequent, population-wide surveys. In Australia, these data are collected for government and made available to our analysis team by a market research company which has access to an established “panel” of individuals who have agreed to take part in surveys of public opinion. Researchers and governments in many other countries have used such companies for rapid data collection to support pandemic response [23, 25]. However, these survey platforms are not readily available in all settings.

      We also believe it is clear throughout the manuscript that transmission potential provides complementary information to Reff, and unlike Reff can be calculated in the absence of transmission.

      The authors describe "macro-distancing": the rate of non-household contacts; and "micro-distancing": the transmission probability per non-household contact. This terminology "micro-distancing" gives the false impression that transmission probability depends solely on distance. In the paper, transmission probability is estimated from survey responses to the question 'are you staying 1.5m away from people who are not members of your household?'. This data is limited to estimate the transmission probability and overlooks the impact of mask use, improved ventilation, and meeting outdoors (all non-distance-based approaches). The paper mentions that self-reported hand hygiene could be used to estimate micro-distancing. COVID-19 spreads through airborne transmission, but the paper gives no mention of ventilation or mask-wearing.

      We agree with these important points and have adjusted the terminology for micro-distancing behaviour to improve clarity. We now refer to it as “precautionary micro-behaviour”, since adherence to the 1.5 metre rule is used as a proxy/indicator for change over time in all behaviours that influence transmission (other than those reducing the number of contacts). This includes behaviours such as mask-wearing, preference for outdoor gatherings, hand hygiene, etc.

      In addition to changing the terminology for this metric throughout the manuscript, we have added the following explanation to the “Model” section of the manuscript (lines 100-105):

      The modelling framework uses adherence to the 1.5 metre rule as a proxy for all behaviours (other than those reducing the number of contacts) that may influence transmission, and so is intended to capture the use of masks, preference for outdoor gatherings, and hand hygiene, among other factors. The 1.5 metre rule was a suitable proxy because it was consistent public health advice throughout the analysis period and time-series data were available to track adherence to this metric over time.
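      To illustrate how the two behavioural components could combine without any local case data, the sketch below uses a deliberately simplified, hypothetical multiplicative decomposition: household transmission is held fixed, and baseline non-household transmission is scaled by the relative non-household contact rate (macro-distancing) and the relative per-contact transmission probability (precautionary micro-behaviour). This is NOT the paper's set of equations; all parameter names and values are assumptions for illustration only.

```python
def transmission_potential(r_household, r_nonhousehold_baseline,
                           rel_contact_rate, rel_transmission_prob):
    """Hypothetical decomposition, not the paper's model.

    r_household: expected infections via household contacts (assumed unaffected by distancing)
    r_nonhousehold_baseline: expected non-household infections before any behaviour change
    rel_contact_rate: non-household contact rate relative to baseline (macro-distancing)
    rel_transmission_prob: per-contact transmission probability relative to baseline
        (precautionary micro-behaviour, proxied for example by 1.5 m rule adherence)
    """
    return r_household + r_nonhousehold_baseline * rel_contact_rate * rel_transmission_prob


# Halving non-household contacts and reducing per-contact transmission to 60% of baseline
tp = transmission_potential(0.5, 2.0, 0.5, 0.6)   # approximately 1.1
```

      The multiplicative form captures the intuition that fewer contacts and safer contacts compound; because every input comes from behavioural surveys rather than case counts, such a quantity remains computable during periods with zero local transmission.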

      Some of the writing lacks precision around the descriptions of Reff. Notably, Reff is not a rate because it does not have units 'per time'. There is a lack of clarity that Reff is infections generated over an individual's entire infectious period. Other metrics of outbreak growth are rates, for example, an exponential growth rate parameter. This lack of clarity in the writing does not impact the methodology.

      Thank you for pointing out this lack of clarity; we have removed references to Reff as a ‘rate’ throughout. We have added to our initial definition of Reff (lines 29-32) that the infections cover the entire infectious period:

      A key element of epidemic response is the close monitoring of the speed of disease spread, via estimation of the effective reproduction number (Reff) — the average number of new infections caused by an infected individual over their entire infectious period, in the presence of public health interventions and where no assumption of 100% susceptibility is made.
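      The distinction between Reff (a dimensionless count of infections over the entire infectious period) and a growth rate (units of per day) can be illustrated with the standard relation of Wallinga and Lipsitch (2007), which links the two through the generation-interval distribution. The sketch below assumes a gamma-distributed generation interval, for which R = (1 + r × scale)^shape; it is offered only as an illustration of the units issue and is not the method used in this paper. The generation-interval values are hypothetical.

```python
def reff_from_growth_rate(r_per_day, gi_mean_days, gi_sd_days):
    """Reproduction number implied by exponential growth rate r (per day),
    assuming a gamma-distributed generation interval (valid for r > -1/scale)."""
    shape = (gi_mean_days / gi_sd_days) ** 2   # gamma shape from mean and SD
    scale = gi_sd_days ** 2 / gi_mean_days     # gamma scale, in days
    return (1.0 + r_per_day * scale) ** shape  # dimensionless


no_growth = reff_from_growth_rate(0.0, 5.0, 2.0)   # 1.0: zero growth means R = 1
exponential_gi = reff_from_growth_rate(0.1, 5.0, 5.0)   # shape 1 reduces to 1 + r * mean = 1.5
```

      The same growth rate of 0.1 per day implies R of about 1.6 with a 5-day mean (SD 2) generation interval but only about 1.3 with a 3-day mean (SD 1.2), which shows why a per-time rate and a reproduction number are not interchangeable.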

      In the paper, model parameters are estimated from multiple independent data sources using carefully derived inference models that include complex considerations such as right-censoring of reported cases. While data availability may be a limitation to calculating the transmission potential, the modelling and statistics in the paper are rigorous, and calculation of the transmission potential fills a gap by allowing regions that implement elimination strategies to estimate a quantity similar to Reff.

      We thank the reviewer for their positive feedback.

    1. Author Response

      Reviewer #2 (Public Review):

      In the current manuscript, Feng et al. investigate the mechanisms used by acute leukemia to get an advantage for the access to the hematopoietic niches at the expense of normal hematopoietic cells. They propose that B-ALLs hijack the niche by inducing the downmodulation of IL7 and CXCL12 by stimulating LepR+ MSCs through LTab/LTbR signaling. In order to prove the importance of LTab expression in B-ALL growth, they block LTab/LTbR signaling either through ligand/receptor inactivation or by using a LTbR-Ig decoy. They also show that CXCL12 and the DNA damage response induce LTab expression by B-ALL. They finally propose that similar mechanisms also favor the growth of acute myeloid leukemia.

      Although the proposed mechanism is of particular interest, further experiments and controls are needed to strongly support the conclusions.

      1/ Globally, statistics have to be revised. The authors have to include a "statistical analysis" section in the Material and Methods to explain how they proceeded and specify for each panel in the figure legend which tests they used according to the general rules of statistics.

      We apologize for the lack of details. This has been corrected in the revised manuscript.

      2/ The setup of each experiment is confusing and needs to be detailed. Cell numbers are not coherent from one experiment to the other. As an example, there are discrepancies between Fig1 and Fig2. Based on the setup of the experiment in Fig.2 (Injection of B-ALL to mice followed by 2 injections of treatment every 5 days), mice have probably been sacrificed 12-14 days post leukemic cell injection. However, according to Fig.1, B cells and erythroid cells at this time point should be decreased >10 times while they are only decreased 2-4 times in Fig.2. This is also the case in Fig.4B-J or Fig.5D with even a lower decrease in B cells and erythroid cells despite a high number of leukemic cells. Please explain and give the end point for each experiment in each figure (main and supplemental).

      We understand the reviewer's concern, but we would like to point out the following. Kinetic experiments such as these were reproduced multiple times in the laboratory. However, when comparing side-by-side experiments performed over the course of several months, discrepancies in the exact day on which leukemia shuts down hematopoiesis are bound to happen. This is because there are numerous variables at play that we can minimize to the extent possible but cannot completely eliminate. For example, we took all possible steps to work with stable batches of preB-ALL cells. However, it is impossible to be absolutely certain that the batch in one experiment is identical to that in another experiment. Cells have to be expanded for adoptive transfer, which inevitably carries some variability (all biological systems undergo random mutations, including C57BL/6J mice purchased from reputable vendors); slight differences in ALL engraftment (i.e., injection variability) can shift the kinetics by a couple of days; and so on. The following findings we reported here are highly reproducible: that ALL shuts down lymphopoiesis and erythropoiesis acutely, and myelopoiesis less so; that LTbR signaling is the major mechanism shutting down lymphopoiesis but not erythropoiesis; that ALLs up-regulate LTbR ligands when compared to non-leukemic cells of the same lineage and at a similar developmental stage; and that the CXCR4 and DSB pathways both promote lymphotoxin a1b2 expression. The exact kinetics of these experiments will vary, or at least carry a margin of error that, to the best of our ability, cannot be eliminated.

      3/ To formally prove that the observed effect is really due to LTab/LTbR signaling, the authors must perform further control experiments. LTbR signaling is better known for its positive role in lymphocyte migration. They cannot rule out that blocking LTbR signaling inhibits homing of leukemic cells into the bone marrow through a systemic/peripheral effect rather than through an impaired crosstalk with BM LepR+ cells. They must confirm, for inhibited/deficient LTbR signaling conditions as compared to controls, that similar B-ALL numbers home to the BM parenchyma at an early time point after injection. Furthermore, they cannot exclude that the effect on the expression of IL7 (and other genes), and consequently the effect on B cell numbers, is simply due to the tumor burden. Indeed, B-ALL numbers/frequencies are different between control and inhibited/deficient signaling conditions at the time of analysis. The analyses should thus be performed at similar low and high tumor burdens in the BM for both control and inhibited/deficient LTbR signaling conditions.

      We performed ALL homing experiments into control and LTbR∆ mice and found no significant differences in ALL frequency or number in the BM 24h after transplantation. These data have been included in Figure 4A.

      We also performed experiments to control for the number of ALL cells in the bone marrow. Briefly, we compared the impact of 3 million WT ALLs with that of 3 and 9 million Ltb-deficient ALLs on Il7-GFP expression in BM MSCs. The number of Ltb-deficient ALLs in the BM of mice recipient of 9 million ALLs was equivalent to that of mice that received 3 million WT ALLs 7 days after transplantation. Importantly, Il7 was only downregulated in mice transplanted with WT ALLs. These data have been included in Figure 4R and 4S.

      4/ LT/LTbR signaling is particularly known for its capacity to stimulate Cxcl12 expression. How do the authors explain that they see the opposite?

      The reviewer is alluding to a well-known role of LTbR signaling as an organizer of immune cells in secondary lymphoid organs such as the spleen and lymph nodes, and particularly its role in promoting CXCL13, CCL19, and CCL21 production by fibroblastic reticular cells of these organs. Neither the B cell follicle nor the T-zone expresses CXCL12 abundantly. Furthermore, in the B cell follicle niche, LTbR signaling is critical for the maturation of Follicular Dendritic Cells, yet FDCs also produce little CXCL12. So, while LTbR is a well-known regulator of cell organization through the production of homeostatic chemokines and lipid chemoattractants, CXCL12 itself is not one of the major chemokines controlled by this pathway. In summary, we do not think our data are in any way incompatible with prior studies on the LTbR pathway, and even if they were, to our knowledge this is the first study on cell-intrinsic effects of LTbR signaling in BM MSCs.

      5/ The authors show that CXCL12 stimulates LTa expression in their cell line. They then propose that CXCR4 signaling in leukemic cells potentiates ALL lethality by showing that a CXCR4 antagonist reverses the decrease in IL7 and improves survival of the mice. This experiment is difficult to interpret. CXCL12 has been shown to be important for migration/retention of B-ALL in the BM and the decreased tumor burden is probably linked to a decreased migration more than an impaired crosstalk with LepR+ cells (see also point 3). If CXCL12 increases LTab expression, CXCR4 blockade should do the opposite. This result should be presented. The contradiction is that if B-ALLs induce a decrease in CXCL12 in the BM (in addition to IL7) and that CXCL12 regulates LTab levels, leukemic cells should be exhausted. Similarly, IL7 has been previously shown to stimulate LTab expression and B-ALL cells express the IL7R. Again, a decrease in IL7 should be unfavorable to B-ALL. How do they explain these discrepancies?

      We thank the reviewer for the suggestion to test the impact of in vivo CXCR4 blockade on LTa1b2 expression. We performed these experiments, which are now included in the revised manuscript (Fig. 5C and 5D). In summary, we observed reduced LTa1b2 on ALLs transplanted into mice treated with AMD3100, a well-known CXCR4 antagonist. These data also show that CXCR4 signaling is not the only mechanism driving LTa1b2 expression. These results further strengthen the main conclusions of the manuscript. Finally, to our knowledge no study has reported IL-7-induced Lymphotoxin a1b2 upregulation in B-ALLs.

      6/ In Supp 4A, RAG-/- mice are blocked at the pro-B cell stage and do not have pre-B cells. Please compare LTa and LTb expression by Artemis-deficient pre-B cells with that of WT pre-B cells. In this experiment, the authors show that, similarly to B-ALL, Artemis-/- pre-leukemic pre-B cells express high levels of LTab and induce IL7 downmodulation. Using mice deficient for LTbR in LepR+ cells, they show that IL7 expression is increased. However, in contrast to leukemic cells (see Figure 4F), pre-leukemic cells are increased in the absence of LTab/LTbR signaling. Please explain this discrepancy. The authors use only one B-ALL model cell line for their demonstration (BCR-ABL expressing B-ALL). Another model should be used to confirm whether LTab/LTbR signaling does favor leukemic/pre-leukemic B cell growth.

      We apologize for the confusion. The mice used in this study were initially described by Barry Sleckman and colleagues (Bredemeyer et al. Nature 2008). Briefly, they crossed Artemis-deficient mice with VH147 IgH transgenic and EμBcl-2 transgenic mice to generate mice in which B cell development is arrested at the preB cell stage. The VH147 heavy chain allows their development to the pre-BCR+ preB cell stage, but Artemis deficiency prevents Rag protein re-expression, and hence the B cells cannot recombine light chain genes. The EμBcl-2 transgene allows preB cells to survive despite carrying unrepaired double-strand DNA breaks (DSBs).

      Regarding the discrepancy noted by the reviewer, we argue that this is not in fact a discrepancy. While ALLs can grow in vitro and in vivo in the absence of IL7, non-leukemic developing B cells are strictly IL7-dependent. PreB cells carrying unrepaired DSBs still express the IL7 receptor and, although no data are currently available on whether these cells are also IL7-dependent, we speculate that they are. Because up-regulation of Lymphotoxin a1b2 in preB cells carrying unrepaired DSBs promotes IL7 downregulation, we speculate that this mechanism may contribute to the efficient elimination of pre-leukemic preB cells in vivo. We have revised the manuscript to include this explanation of the mouse model and a discussion of how we think the LTbR pathway may play a role in pre-leukemic states.

      Finally, the data presented in this study include two distinct leukemia mouse models. They also include data from human B-ALL and AML samples that are in agreement with the mouse data presented here. We respectfully disagree with the reviewer that a third model is needed to confirm a role for the LTa1b2/LTbR pathway in leukemia.

      7/ Pre-B cells are composed of large pre-B cells (pre-BCR+) and small pre-B cells (pre-BCR-). BCR-ABL B-ALL cells express the pre-BCR. What is the level of expression of LTa and LTb by each of these 2 subsets as compared to BCR-ABL B-ALL?

      This is a misconception. The difference between large and small preB cells is simply that large preB cells are in S/G2 phase of the cell cycle. Their increased size is a mere consequence of doubling DNA, protein, membrane content, etc.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, they demonstrate that neonatal mice produce more CD43- B cell-derived IL-10 following anti-BCR stimulation than adult mice. This is due to an autocrine mechanism whereby anti-BCR stimulation leads to pSTAT5 upregulation and production of IL-6, which then enhances IL-10 production via pSTAT3. These are interesting results for the regulatory B cell field, demonstrating that signaling differs between adult and neonatal B cells, and in particular for researchers studying the mechanisms underpinning enhanced susceptibility to infection. The authors largely achieved their aim, and the results support their conclusions. However, because other studies have previously addressed the mechanisms contributing to enhanced IL-10 production in neonates, some experimental and data presentation decisions in the manuscript need more explanation. An important additional comment is that the introduction/discussion is at times insufficiently referenced to put the data in context for non-experts in this field, and that sample numbers are generally low for an in vitro study.

      We have now updated the introduction and discussion to provide more insight into our study. We hope that our study is now more understandable for non-experts.

      Reviewer #2 (Public Review):

      This paper reports that neonatal CD43- B cells produce IL-10 upon BCR stimulation, which inhibits TNF-alpha secretion by peritoneal macrophages. In neonatal CD43- B cells, the BCR-mediated signal induced Stat5 activation and IL-6 production; the secreted IL-6 then activated Stat3, ultimately leading to IL-10 production. The authors identified a unique signaling pathway leading to IL-10 production and revealed differences between CD43+ and CD43- B cells in their responses to BCR crosslinking. A weakness of this study is that the IL-10-secreting neonatal CD43- B cell subset has not been well characterized or discussed. Differences in BCR expression levels between adult and neonatal CD43- B cells, which could explain their different reactivity, have been overlooked. Clarity on these points would substantially enhance the impact of the manuscript.

      We thank the reviewer for the suggestion to measure BCR levels. We have now measured IgM and IgD levels on neonatal and adult CD43+ and CD43- B cell subsets (Figure 1–figure supplement 5).

    1. Author Response

      Reviewer #1 (Public Review):

      This is an exciting paper describing the development of a robust differentiation of common marmoset induced pluripotent stem cells (iPSCs) into primordial germ cell-like cells and subsequently into spermatogonia-like cells when combined with testis somatic cells. The work is of high quality, but some experimental details and protocols that are necessary for a new protocol are missing: for example, reconstitution methods and protocols are missing completely from the manuscript, and details on various aspects of the differentiation and cell maintenance are also lacking. Despite this, the work is valuable and would be of interest to the germ cell and in vitro gametogenesis communities. The data suggest that marmosets are very similar to humans and macaques, and indeed previously established protocols for PGCLC induction, and likely previously published testis reconstitution/differentiation methods, were employed here to generate the spermatogonia-like cells.

      We greatly appreciate the positive comments of the reviewer on our manuscript. We have added experimental details of our germ cell differentiation schemes in Materials and Methods.

      Reviewer #2 (Public Review):

      This paper identifies the need for improved pre-clinical models for the study of human primordial germ cells (PGCs) and suggests the common marmoset (Callithrix jacchus) as a suitable primate model. In vitro gametogenesis offers an alternative method to generate germ cells from pluripotent stem cells for study and potential pre-clinical applications. Therefore, the authors aimed to take the first steps toward developing this technology for the marmoset. Here, iPSCs have been derived from the marmoset and differentiated in vitro into PGC-like cells (PGCLCs) whose gene expression resembles that of PGCs identified in single-cell studies of marmoset embryos, as demonstrated through immunofluorescence and RT-qPCR approaches, as well as RNA-sequencing.

      The authors have successfully developed a protocol that produces PGCLCs from marmoset iPSCs. These are shown to express key germline gene markers and are further shown to correlate in gene expression with PGCs from the marmoset. This study uses a 2D culture system for further expansion of the PGCLCs. When cultured with mouse testicular cells in a xenogeneic reconstituted testis culture, evidence is provided that cjPGCLCs have the capacity to develop further, expressing marker genes for later germline differentiation. However, the efficiency of generating these prospermatogonia-like cells in culture is unclear. Nonetheless, with the importance of developing protocols across species for in vitro gametogenesis, this paper takes a key step towards generating a robust preclinical system for the study of germ cells in the marmoset.

      We thank the reviewer for the encouraging comment. By IF analyses, we found that 0.86% and 3.3% of TFAP2C+ cells were DAZL+ and DDX4+, respectively (DAZL+TFAP2C+ cells: 2/232, 0.86% of all TFAP2C+ cells; DDX4+TFAP2C+ cells: 4/123, 3.3% of all TFAP2C+ cells). Owing to the overall scarcity of these cells and the lack of fluorescent reporters (DAZL and DDX4 are cytoplasmic proteins, so assessing them by flow cytometry would require a technically challenging intracellular staining procedure), we were not able to provide flow cytometric plots in this study. This has been described in the revised manuscript (page 11, Results, “Maturation of cjPGCLCs into early prospermatogonia-like state”).

      The claims of the authors are generally justified by the data provided; however, some conclusions should be clarified. In particular, the authors have failed to show convincingly that cjPGCLCs are a cell type distinct from the iPSCs that generated them. cjiPSCs cultured in feeder conditions (OF) with IWR1 are reported to cluster closely with the derived cjPGCLCs in principal component analysis of RNA-Seq data. This contrasts with cjiPSCs cultured in feeder-free (FF) conditions, which maintain a more undifferentiated/less primed state and are not capable of differentiating into the germline lineage. Therefore, the OF/IWR1 cjiPSCs could instead represent an intermediate cell state between iPSCs and cjPGCLCs.

      Although OF/IWR1 cjiPSCs are closer to cjPGCLCs than cjiPSCs cultured in other conditions, they are pluripotent (as evidenced by trilineage differentiation assay, morphological assessment, and expression of pluripotency markers, Figure 3–figure supplement 2) and do not express most of key germ cell markers (Figure 6–figure supplement 1C). Our newly added scRNA-seq analyses also highlighted the differences between OF/IWR1 and cjPGCLCs and the molecular dynamics associated with the transition.

      The reasons behind improved germline competence of iPSCs in the different media conditions are unclear. The authors reject the idea that this is due to the presence of IWR1, since this condition has not affected FF iPSCs. However, the efficiency of differentiation was greatly increased in OF conditions when IWR1 was used, indicating inhibition of WNT does indeed have a positive effect on induction to the germline lineage. This area requires further clarification.

      As the reviewer pointed out, inclusion of IWR1 in cultures of OF cjiPSCs upregulates some pluripotency markers (SSEA3, SSEA4) and reduces meso/endodermal differentiation. Thus, the undifferentiated/less primed state under the Wnt inhibition might positively affect germ cell differentiation of OF cjiPSCs. However, FF cjiPSCs are pluripotent and are not germline competent, even in the presence of IWR1, suggesting that there are factors in FF culture conditions that make them incompetent for germline differentiation. Because FF cultures utilize PluriStem™ medium, a proprietary product of MilliporeSigma, we were unable to define the factor that confers such germline incompetence.

      Another area requiring clarification is the reporting of RNA sequencing data as representative of a developmental trajectory, without defining which cell lines produced clusters, or defining the stages of this trajectory. The authors refer to the identification of four clusters representative of a developmental trajectory, however, they provide unclear information as to what this refers to. Importantly, detailed transcriptomic comparisons between in vivo-derived PGCs and in vitro PGCLCs are not provided.

      Our original analysis revealed which cell lines produced clusters (Figure 6A) and defined the stages of the trajectory (iPSCs feeder free, iPSCs on feeder, PGCLCs, expansion, Figure 6C). The four clusters to which the reviewer refers are gene clusters that are defined by unsupervised clustering analysis of variably expressed genes across the samples (Figure 6D). As it is defined computationally, it is not possible to unequivocally define gene clusters by particular cell types. However, we found that these gene clusters revealed insightful patterns (1, genes higher in cjiPSCs; 2, genes higher in cjPGCLCs; 3, genes higher in expansion culture cjPGCLCs; 4, genes higher in d2 cjPGCLCs). We have added sample information to the Figure 6D to further clarify the meaning of the data and a brief explanation of gene clusters in the figure legend. To define the trajectory in a more unbiased manner, we performed scRNA-seq and have added additional trajectory analyses (Figure 7A-K in the revised manuscript). Moreover, we also added the transcriptomic comparison as the reviewer suggested (Figure 7L, M in the revised manuscript).

      Functional validation of the iPSC lines generated in the study is not provided beyond confirming that the cells express the pluripotency markers OCT3/4, SOX2, and NANOG. It is important to confirm tri-lineage differentiation of the iPSCs, e.g., through an embryoid body assay. Since FF cjiPSCs were unable to differentiate into cjPGCLCs, it is even more important to confirm that the cells are genuine iPSCs.

      We performed a trilineage differentiation assay and confirmed that they can generate three germ layers.

      In summary, although there are issues surrounding clarity, this paper is generally justified in its conclusions. The authors present an optimised protocol for the derivation of PGCLCs from marmoset iPSC-like cells, with defined expansion conditions and evidence of further differentiation to prospermatogonia-like cells.

      We thank the reviewer for the encouraging comment.

    1. Author Response

      Reviewer #1 (Public Review):

      Sayin et al. sought to determine whether bacterial drug resistance has an impact on drug efficacy. They focused on gemcitabine, a drug used for pancreatic cancer that is metabolized by E. coli. They used an innovative combination of genetic screens, experimental evolution, and cancer cell co-cultures to reveal that E. coli can evolve resistance to gemcitabine through loss-of-function mutations in nupC, with potential downstream consequences for drug efficacy.

      Major strengths include:

      • Paired use of genetic screens and experimental evolution

      • The spheroid model is a creative approach to modeling the tumor microbiome that I hadn't seen before

      • Rigorous microbiology, including accounting for mutation rate in both selective and non-selective conditions

      • Timely research question

      Major weaknesses of the methods and results include the following:

      1) Limited scope of the current work. Just a single drug-bacterial pair is evaluated and there are no experiments with microbial communities, animal models, or attempts to test the translational relevance of these findings using human microbiome datasets.

      We agree with the reviewer that uncovering evidence from human microbiome datasets would be very exciting and complementary to our study. However, since gemcitabine is administered intravenously, it is unclear whether it imposes a considerable selective pressure on the gut microbiome. Therefore, it also remains unclear whether adaptive mutations such as those we identified should be expected in gut microbiome datasets. Bacteria-centric metagenomic datasets from infected pancreatic tumors would be ideal for addressing the reviewer’s suggestion, but to the best of our knowledge they do not exist. It should be noted, however, that our work generated hypotheses that can be tested in the future in pancreatic tumor tissues infected with gammaproteobacteria, for example by targeted sequencing of the specific genes of interest (e.g., nupC and cytR).

      2) No direct validation of the primary genetic screen. The authors use a very strict cutoff (16-fold-change) without any rationale for why this was necessary. More importantly, a secondary screen is necessary to evaluate the reproducibility of the results, either by testing each KO in isolation or by testing a subset of the library again.

      We used a strict cutoff to allow the reader to focus on a manageable list of gene names in the main figure (2E). To partly address this limitation in scope, we also included results from pathway enrichment analysis in the same figure (2F). This analysis utilizes all enrichment values and is therefore independent from any choice of cutoff value. We also now refer the reader to explore more of the hit genes in the supplementary information (line 152).

      As the reviewer suggested, we evaluated the reproducibility of the results by performing two validation screens. The first validation screen was a biological replicate of the original screen and relied on the original collection of knockout strains. The second validation screen used a knockout strain collection that was cloned independently of the strains used in our original screen. The results from these two completely independent biological replicates are presented in Supplementary Figure 1D. The results (resistance/sensitivity) from the two screens are highly correlated. We refer to this comparison in the main text (lines 142-147).

      3) Some methodological concerns about the spheroid system. As I understood it, these cells are growing aerobically, which may not be the best model for the microbiome. Furthermore, bacterial auxotrophs are used and only added for 4 hours, which will really limit their impact. It also was unclear if the spheroids are truly sterile. Finally, the data lacks statistical analysis, making it unclear which KOs are meaningful. Delta-cdd looks clearly distinct by eye, but the other two genes are more subtle.

      The 4-hour interval was chosen to balance two opposing requirements of the co-culture system: mitigating overgrowth of the bacterial cultures (which hinders spheroid growth irrespective of the drug) while still allowing enough incubation time for drug degradation. As the reviewer notes, removal after 4 hours may limit the bacterial impact. However, such a limitation would only lead to underestimation of the bacterial impact and would not affect how we evaluate strains relative to one another. We now comment on this in the methods section (lines 699-705).

      We do not expect the spheroids to remain infected after bacterial removal, since we treat them with antibiotics. We did not detect any bacterial growth by microscopy during the 7 days post infection, and we did not observe any effect on spheroid growth compared with uninfected spheroids. Spheroids were grown without antibiotics before infection, and we did not detect any evidence of bacterial growth prior to intentionally introducing the bacteria (the cell line itself was also tested for animal pathogens and bacterial contamination prior to the experiments).

      We repeated the spheroid experiments and observed similar shifts in the EC50 fronts. We now include these replicates as supplementary figure 7. We comment on these replicates in the main text (lines 273-274).

    1. Author Response

      Reviewer #1 (Public Review):

      This is an elegant and fascinating paper on individuality of structural covariance networks in the mouse. The core precepts are based on a series of landmark papers by the same authors that have found that individuality exists in inbred mice and becomes entrenched when richer environments are available. Here they used structural MRI to provide whole-brain analyses of differences in brain structure. They first replicated brain (mostly hippocampal) effects of enrichment. Next, after dividing their mice into two groups based on roaming entropy measurements, they showed that there were no differences in brain structure between the two groups, yet significant differences in brain networks as measured by structural covariance. Overall I enjoyed this paper, though I am confused (and possibly concerned) about how they arrived at their two groups and have some less important methods questions.

      The division of mice into two groups (down and flat) is confusing. The methods appear to suggest that k-means clustering combined with the silhouette method was used (line 380). The actual analyses used involves 2 groups of 15 mice each. The body of the manuscript suggests that 10 intermediate mice were excluded (line 100), but the methods (line 390) suggest that 8 mice were excluded, 2 for having intermediate results and 6 for having high RE slope values.

      This leads to a series of questions:

      • How many mice were excluded and for what reasons, given the discrepancy between body and methods?

      The discrepancy was an oversight that has been corrected. The statement with the exclusion of six upward sloping and two intermediates is correct. For the rationale see above and the inserted text in the discussion.

      • Was the k-means clustering actually used? It appears that the main division of mice was based on visual assessments.

      The superfluous paragraph in the method section was removed.

      • If the clustering was used, did it result in 2 or 3 groups?

      Slope distribution did not reveal clear groups, so it did not offer an advantage over the arbitrary decision based on slope values and described above. We have now added a graphic depiction of the slope values next to the ‘flat’ or ‘down’ matrices for greater clarity (Fig. 3B).

      • The intermediate group bothers me (if it was indeed 10 intermediate mice as indicated by the body rather than 2 as indicated in the methods): if these are indeed intermediate shouldn't they be analyzed and shown to be intermediate on the graph or other measures?

      There were only 2 such mice, too few for a covariance matrix to be calculated.

      • Please explain the reasoning for excluding mice for having too high of a slope (if there were indeed mice excluded for having too high of a slope).

      We had long discussions among the authors and finally decided in favor of two equally sized groups with homogeneous patterns. The effect that we observed is so large and obvious that it survives all sorts of regrouping. We have also followed the suggestion to present the continuous correlation across the whole range of animals (Fig. 2).

      I'd also appreciate more discussion about the structural covariance differences between flat and down mice. It is not clear what the direction of effects are - it appears that flats show mostly increases in covariance?

      Yes, covariance is greater in the top (flat) than bottom (down) group.

      The structural covariance matrix for those mice with a ‘flat’ RE suggests a much higher degree of inter-regional correlation in comparison to ‘down’ or STD mice, findings confirmed and extended by the NBS analysis.

      Reviewer #2 (Public Review):

      Lopes et al. use genetically identical mice to address a topic of broad interest: how does variation in roaming behaviour across individuals (here, quantified via the roaming entropy) arise over time when exposed to an enriched environment, and how does this variation in behaviour relate to brain structure and networks. Specifically, by examining the roaming entropy of mice and the sizes of brain structures, the authors convincingly show 1) an increase in variability in roaming behaviour over a period of 12 weeks, 2) that mice that roam more contain an increased number of doublecortin positive cells in the dentate gyrus (indicating higher levels of neurogenesis), and 3) that roaming is associated with widespread differences in neuroanatomy. The authors additionally partition mice into two groups characterized by roaming trajectories (continuous "flat" roamers and habituating "down" roamers), construct structural covariance networks for these groups, and show that the structural covariance network for "down" roamers is similar to mice housed in standard conditions and contrasts that of "flat" roamers.

      A major strength of this study is the wealth of roaming data generated by the RFID setup; the high temporal resolution, fair spatial resolution, and long period of observation (3 months) allow for measures such as roaming entropy to be precisely quantified and tracked over time. Coupled with high-resolution whole brain structural MRI and histological measurements of neurogenesis in the dentate gyrus, the dataset generated is an incredibly valuable one to probe brain-behaviour relationships. Importantly, this study showcases the power of animal studies--because the subject mice are inbred, they are virtually identical in their genetics, and therefore any variation in the data collected should arise from the non-shared environment.

      An area of improvement for this study follows from its strength: the dataset collected here contains far more information on mouse behaviours than is analyzed. For instance, the sizes of a broad set of regions were found to be statistically associated with roaming behaviour, but determining how much of this anatomical variation is specifically related to differential exploration of the static environment as opposed to social contact with other animals (which could presumably be determined from the RFID data) would make this study much more impactful and interesting to the community.

      An important limitation in the network analyses performed is the small number of mice. Due to sampling variation, a large number of individuals are required to estimate correlation coefficients with reasonable precision. While large-scale similarities and differences between the structural covariance (correlation) matrices are visually apparent and quite striking, confidence in these results would be increased with the inclusion of more subjects, and/or a replication cohort.

      We fully agree with this judgement. It is not straightforward, however, to further increase N in these studies, for both cost and logistic reasons. Rather than investing in further improving the current study, we decided to learn from our findings and design follow-up studies that take the next steps.

      Finally, while both roaming behaviour and brain structure are assessed, relationships between these measures are associative. Since brain structure was only examined at one timepoint (post-enrichment), the direction of causation cannot be assessed. It remains to be seen whether behavioural variation drives anatomical variation through plasticity, or whether anatomical variation present before enrichment is predictive of future behaviours. To their credit, the authors are careful not to make causal inferences. In the context of brain-behaviour studies, this is an important limitation to recognize, but it does not detract from the important relationships between roaming behaviour and brain structure found by the authors in this study.

      In summary, while there is much more to do in studying relationships between the environment, brain structure, and behaviour, Lopes et al. take an important step ahead in describing relationships between individual roaming behavioural trajectories, brain structure, and structural covariance networks.

    1. Author Response

      Reviewer #1 (Public Review):

      This study elucidates a role of EHD2 as a tumor/metastasis promoting protein. Prior work has found varying results indicating that high expression of EHD2 is either associated with good or poor outcomes. In this work the authors find that EHD2 is expressed in both the nucleus and cytoplasm, and that high cytoplasmic to nuclear expression is associated with a poor prognosis. Using WT and either shRNA knockdown or CRISPR KO cells, they show that EHD2 promotes 3D growth, migration and invasion in vitro, and tumor growth and metastasis in vivo. Importantly, re-expression of EHD2 in KO cells rescues the loss of function phenotype. Mechanistically, the investigators show that the loss of EHD2 decreases the calveoli and that this decreases the Orai1/Stim induced calcium influx. Finally, they show that inhibitors of store operated calcium entry (SOCE) phenocopies the loss of EHD2. Together the data support a protumorigenic role for EHD2 via store-operated calcium entry and reinforce the utility of targeting calveoli and SOCE in tumors with high cytosolic EHD2. This study provides a rationale for using SOCE inhibitors in a subset of breast cancers, and a potential predictive biomarker for using SOCE inhibitors based on high expression of EHD2.

      We are grateful for the positive comments. Since this paragraph is to be published in the event of our manuscript being accepted, we request the correction of one typo in the paragraph: “calveoli” should be “caveolae”.

      Reviewer #2 (Public Review):

      The manuscript by Luan et al. describes the role of EHD2 in promoting breast tumor growth. They showed that EHD2 cytoplasmic staining predicts poor patient outcome. Both EHD2 KO and knockdown cells showed decreased cell migration/invasion abilities and significant reduction of tumor growth and metastasis in mice. The authors further showed that the levels of EHD2 and Cav1/2 correlate with each other. EHD2 KO cells showed defects in Ca2+ trafficking. Overexpressing the SOCE factor STIM1 partially rescued SOCE defects in EHD2 KO cells. Treatment with the SOCE inhibitor SKF96365 inhibited tumor cell migration in vitro and tumor growth in vivo.

      Major strengths: The authors showed that EHD2 cytoplasmic levels predict patient survival and provided strong evidence that EHD2 knockout or knockdown inhibits tumor cell migration in vitro and tumor growth in vivo. The authors also showed that SKF96365, which inhibits SOCE, suppresses tumor growth in vivo.

      Major weaknesses: The connection between EHD2 and SOCE is weak.

      We are thankful to the reviewer for her/his assessment of the strengths in our manuscript and appreciate her/his pointing to its weaknesses. We agree that more studies will be needed to fully establish the connection of EHD2 to SOCE and have appropriately moderated our statements in the results and discussion sections of the manuscript. We have also added statements about the need for such future studies.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript by Ramaprasad et al., the authors report the functional characterization of the P. falciparum glycerophosphodiester phosphodiesterase (GDPD) to assess its role in phospholipid biosynthesis and in the development of asexual stages of the parasite. The authors used a loxP strategy to conditionally knock out the target gene, and they also carried out complementation assays to show recovery of the knock-out parasites. They further show that choline supplementation is also able to rescue the knock-out phenotype. Quantitative lipidomic analyses show an effect on the majority of membrane phospholipids. In vitro activity assays and metabolic labelling assays show a role of GDPD in the metabolism of exogenous lysoPC for PC synthesis. The manuscript deciphers the functional role of an important component of lipid metabolism and phospholipid synthesis in the parasite, which are crucial metabolic pathways required for replication of the parasite in the host cell.

      We thank the Reviewer for assessing our work and for the following helpful suggestions.

      Reviewer #2 (Public Review):

      The authors use a conditional Lox/Cre knock-out system to test and confirm the essentiality of glycerophosphodiester phosphodiesterase (GDPD) for blood-stage parasites and a key role in mobilizing choline from precursor lysophosphocholine (LPC) for parasite phospholipid synthesis. Prior works had identified serum LPC as the key choline source for parasites, localized this enzyme in parasites, and suggested an essential function in releasing choline, but this key function had remained untested in parasites. This manuscript critically advances mechanistic understanding of parasite phospholipid metabolism and its essentiality for blood-stage Plasmodium and identifies a potential new drug target.

      Overall, this study is well constructed and rigorously performed, and the data provide strong support for the central conclusions about GDPD essentiality and functional contribution to parasite phosphocholine metabolism. The observation that exogenous choline largely rescues parasites from lethal deletion of GDPD is especially compelling evidence for a critical and dominant role in choline mobilization. A few aspects of the paper, however, are not fully supported by the current data and/or need clarification.

      We thank the reviewer for this very positive assessment and the helpful suggestions below.

      1) GDPD localization

      a) The authors conclude that GDPD is localized to the parasitophorous vacuole (PV) and parasite cytoplasm (lines 114-115), which is consistent with the prior 2012 Klemba paper. However, the data in the present paper (Figures 2A and 2E) only seem to support cytoplasmic localization but don’t obviously suggest a population in the PV, in part because no co-staining with a PV marker is shown. The legend for Fig. 2E indicates staining with the PV marker, SERA5, but such co-stain is not shown in the figures or figure supplements. This data should ideally be included and described.

      We apologise for this error and omission in our original submission. In response to this suggestion, we have now generated new data that demonstrate co-localisation of the PV marker SP-mScarlet (Mesen-Ramirez et al., 2019) with GDPD in our GDPD-GFP line. In the revised manuscript we now include those new data in Fig 2A and we have also corrected the legend of the revised Fig 2E to reflect what is being shown.

      b) How do the authors explain cytoplasmic localization for GDPD? This protein contains an N-terminal signal peptide, which can account for secretion to the PV but would contradict a cytoplasmic population. The 2012 Klemba paper suggested that internal Met19 might provide an alternate site for translation initiation without a signal peptide and thus result in cytoplasmic localization. Some discussion of this ambiguity, its relation to understanding GDPD function, and a possible experimental path to resolve it seem necessary, especially as the authors suggest from data in Fig. 7 that this enzyme may have functions beyond choline mobilization, which may relate to distinct forms in different sub-cellular compartments.

      The Reviewer raises an excellent point here. We agree that the apparent dual localization of GDPD and the question of its potential function in both compartments is intriguing. Since lysoPC is efficiently internalised into the parasite, one simple possible explanation (which we failed to state earlier) is that GDPD performs a similar enzymatic function in both compartments. Given the importance of choline for parasite membrane biogenesis, it would not be surprising for GDPD activity to be required at high abundance in order to maintain sufficient choline levels in the parasite. We have now modified lines 403 onwards in the revised Discussion to provide more perspective on this point, as follows: “Based on protein localisation, ligand docking and sequence homology analyses, we can further speculate regarding aspects of PfGDPD function not explored in this study. It has been previously suggested that the gene could use alternative start codons via ribosomal skipping to produce distinct PV-located and cytosolic variants of the protein (Denloye et al., 2012). PfGDPD could potentially perform similar functions in both compartments by facilitating the breakdown of exogenous lysoPC both within the PV and within the parasite cytosol (Brancucci et al., 2017). This scale of enzyme activity may be essential for the parasite to meet its choline needs, given the high levels of PC synthesis during parasite development and its crucial importance for intraerythrocytic membrane biogenesis. PfGDPD may also have other roles during asexual stages such as temporal and localised recycling of intracellular PC or GPC by the PfGDPD fraction expressed in the cytosol. Finally, our ligand docking simulations also do not rule out catalytic activity towards additional glycerophosphodiester substrates such as glycerophosphoethanolamine and glycerophosphoserine (Figure 6-figure supplement 1A and B). Further investigation that involves variant-specific conditional knockout of the gdpd gene could help us further dissect the role of PfGDPD in the parasite.”

      2) The phenotypes depicted by representative microscopy images in panel 4E (especially for choline rescue) should be supported by population-level analysis by flow cytometry or microscopy of many parasites to establish generality.

      We agree that this would be informative, and in the revised manuscript we have now added a representative microscopy image as source data (Figure 4E_G1+Cho48h-sourcedata.png). It is also worth pointing out that G1 is a clonal line generated from the RAP+ Choline+ parasite population. Both population-level analysis by flow cytometry (Fig 4A) and microscopic images (Fig 4D) are therefore also applicable to the G1 line.

      3) The analysis in the last results section (starting on line 296) seems preliminary.

      a) For panel 7B, a population analysis of many parasites, with appropriate statistics, is important to establish a generalizable defect beyond the single image currently provided.

      b) The data here would seem to be equally explained by an alternative model that GDPD∆ parasites are competent to form gametocytes but their developmental stall (due to choline deficiency) prevents progression to gametocytogenesis. The authors speculate that GDPD may play other roles in phospholipid metabolism beyond choline mobilization that are essential for gametocytogenesis. Their model, if correct, predicts that a GDPD deletion clone from +RAP treatment that is rescued by exogenous choline should not form gametocytes. Testing this prediction would be important to strongly support the conclusion of broader roles for GDPD in sexual development beyond choline mobilization.

      We interpreted our results precisely as the reviewer suggests here – that the developmental stall during trophozoite stages is severe enough to prevent sexual differentiation. A priori, we have no reason to suspect that GDPD plays other roles that are selectively essential for gametocyte development. We speculated that GDPD might have other roles in asexual stages but not necessarily based on this experiment. In the revised manuscript we have modified line 313 accordingly to remove ambiguity: “This result implies that the loss of PfGDPD causes a severe block in PC synthesis resulting in the death of asexual parasites before they get to form gametocytes.”

      We have also altered line 411 in the Discussion to: “PfGDPD may also have other roles during asexual stages such as temporal and localised recycling of intracellular PC or GPC by the PfGDPD fraction expressed in the cytosol.”

      We agree with the reviewer that the analysis is preliminary. Since we lose RAP-treated GDPD:HA:loxPintNF54 populations after cycle 1, we were unable to do more detailed analysis with the line. We also wished to carry out the experiment that the reviewer suggests here to analyze choline-rescued mutants. However, we would be unable to test for this as choline supply alone would suppress sexual differentiation in these parasites (as shown in Brancucci et al., 2017).

      Reviewer #3 (Public Review):

      In this work, Ramaprasad et al. aimed to investigate the roles of a glycerophosphodiesterase (PfGDPD) in blood stage malaria parasites. To determine its role, they generated a conditional disruption parasite line of PfGDPD using the DiCre system. RAP-induced DiCre-mediated excision results in removal of the catalytic domain of this protein. Loss of this domain leads to a significant reduction of parasite survival, specifically affecting trophozoite stages. They suggest that there is an invasion defect when this protein domain is deleted. They additionally show that episomal expression of PfGDPD can rescue the activity of the protein and restore parasite development. Interestingly, exogenous choline can rescue the effects of the loss of PfGDPD. Lipidomic analyses with labelled LPC show that choline release from LPC is severely reduced upon protein ablation and in turn prevents de novo PC synthesis. These experiments also show an increase in DAG levels, suggesting a defect in the Kennedy pathway. The authors purified PfGDPD and show enzymatically that this protein facilitates the release of choline from GPC. Additionally, the paper also briefly looks at the effects of the protein during sexual blood stages and shows that it is unlikely to be involved in sexual differentiation.

      This paper is of interest to the community following the breakthrough paper of Brancucci et al. (2017), which showed that decreased LPC levels induce sexual differentiation. This work brings novel insight into a GDPD responsible for the release of choline from GPC, which actually seems more relevant to asexual stages than to sexual stage parasites. The authors have been extremely thorough in their experiments on parasite viability and the exact role of this protein.

      We thank the reviewer for this positive assessment and the helpful comments.

    1. Author Response

      Reviewer #1 (Public Review):

      It is a strength of the current manuscript that it provides a near-complete picture of how the metamorphosis of a higher brain centre comes about at the cellular level. The visualization of the data and analyses is a weakness.

      I do not see any point where the conclusions of the authors need to be doubted, in particular as speculations are expressly defined as such whenever they are presented.

      The fact that molecular or genetic analyses of how the described metamorphic processes are organized are not presented should, I think, not compromise enthusiasm about what is provided at the cellular level.

      We appreciate the comments and guidance that Reviewer #1 has given us on data presentation. We have tried to simplify figures and make the images larger. For the developmental figures, a couple of illustrative examples are provided in the main figure, with the remainder given in “figure supplements”.

      Reviewer #2 (Public Review):

      This very nice piece of work describes and discusses the developmental progression of larval neurons of the mushroom body into those in the adult Drosophila brain. There are many surprising findings that reveal a number of strategies for how brain development has evolved to serve both the early functions specific to the larval brain and then their eventual roles in the adult brain. I think it is fascinating biology and I was educated while reviewing the paper.

      Line 115-116. 'Output from PPL1 compartments direct avoidance behavior, while that from PAM compartments results in attraction'. This is not correct and is actually reversed. The learning rule is depression so that aversive learning reduces the drive to approach pathways whereas appetitive learning reduces the drive to avoidance pathways. This should be corrected and reference made to studies demonstrating learning-directed depression.

      Line 222. It provides feed-forward inhibition from y4>2>1. I could be wrong but I'm not aware that there is functional evidence for this glutamatergic neuron being inhibitory. It's currently speculation.

      We have noted that this function was proposed by Aso et al.

      Line 242. I think it would be nice if the authors focused on extreme changes and showed larger and nicer images. The rest can be summarized but why not pick a few of the best examples to illustrate the strategies they consider in the discussion?

      We have reduced the number of neurons shown in the new Figs 5 and 6. Hopefully, the images are now large enough to appreciate. Data for the remaining neurons are now in Figure Supplements for Figs 5 and 6.

      Line 249 'became sexually dimorphic'. I may have missed it somewhere but this immediately made me think about the sex of all the images that are shown. Is this explicitly stated somewhere? Was it tracked in all larvae, pupae, and adults?

      We now begin the Methods by addressing this point. We did an initial screen and found sex-specific differences only in MBIN-b1 and -b2. After this, we kept no records of the sex of the flies used, except for the latter cells.

      Reviewer #3 (Public Review):

      Truman et al. investigated the contribution and remodeling of individual larval neurons that provide input and output to the Drosophila mushroom body through metamorphosis. To this end, they used a collection of split-GAL4 lines targeting specific larval mushroom body input and output neurons, in combination with a conditional flip-switch and imaging, to follow the fates of these cells.

      Interestingly, most of these larval neurons survive metamorphosis and persist in the adult brain and only a small percentage of neurons die. The authors also elegantly show that a substantial number of neurons actually trans-differentiate and exert a different role in the larval brain, compared to their final adult functionality (similar to their role in hemimetabolous insects). This process is relatively understudied in neuroscience and of great interest.

      Using the ventral nerve cord as a proxy, the authors claim that the larval state of the neuron would be their derived state, while their adult identity is ancestral. While the authors did not show this directly for the mushroom body neurons under study, it is a very compelling hypothesis. However, writing the manuscript from this perspective and not from the perspective of the neuron (which first goes through a larval state, metamorphosis, and finally adult state), results in confusing language and I would suggest the authors adjust the manuscript to the 'lifeline' of the neuron.

      We have tried to be more “linear” in our presentation. This should make the text less confusing.

      In general, this manuscript does not explain how the larval brain has evolved as the title suggests but instead describes how the larval brain is remodeled during metamorphosis. It thus generates perspectives on the evolution of metamorphosis, rather than the larval state. Additionally, this manuscript would benefit from major rearrangements in both text and figures for the story to be better comprehended.

      We think that the end of the Discussion does relate to how a larval brain evolves. The evolution of the larval brain is faced with constraints related to the shortened period of embryonic development and the highly conserved temporal and spatial mechanisms that insects use to generate their neuronal phenotypes. These constraints result in a potential mismatch between the neurons that are needed and those that are actually made (revealed by the adult phenotypes of these neurons). The larva then turns to trans-differentiation to temporarily transform unneeded (or dead) neurons into the missing cell types to build its larval circuits.

      We think that these ideas provide some new insights into how a larval brain may have evolved and that our title is appropriate.

      The introduction is very focused on the temporal patterning of the insect nervous system, while none of the data collected incorporate this temporal code. Temporal patterning comes back in the discussion but is purely speculative.

      The speculation about the importance of temporal patterning is now brought in late in the Discussion in reference to Figure 12.

      Furthermore, the second part of the introduction describes one strategy for remodeling and why that strategy is not likely but does not present an alternative hypothesis. The first section of the results might serve as a better introduction to the paper instead, as it places the results of the paper better and concludes with the main findings. The accompanying Figure 1 would also benefit from a schematic overview of the larval and adult mushroom bodies as presented in Fig. 2A (left).

      This has been revised in the spirit of these comments.

      In the second results section, the authors show the post-metamorphic fates of mushroom body input and output neurons and introduce the concept of trans-differentiation. Readers might benefit from a short explanation of this process. I also encourage the authors to revisit this part of the text since it gives the impression that the neurons themselves undergo active migration (instead of axon remodeling).

      We have tried to make it clear that there is no cell migration. Rather, there is retraction/fragmentation of larval arbors followed by outgrowth to new, adult targets.

      The discussion starts with a very comprehensive overview of the different strategies that neurons could use during metamorphosis (here too, re-writing the text from the neurons' perspective would increase the reflection of what actually happens to them).

      The Discussion now begins by dealing with gross changes in the MB, with reference to the compartments, and eventually moves to changes in individual cells. We have reduced our discussion of the metamorphic strategies of cells and no longer include Fig 8A.

      The discussion covers multiple topics concerning trans-differentiation, metamorphosis, memory, and evolution and is often disconnected from the results. It could be significantly shortened to discuss the results of the paper and place them in current literature. Generally, the figures supporting the discussion are hard to comprehend and often do not reflect what the text is saying they are showing.

      The Discussion is still long, but, hopefully, our organization now makes it much easier to read and comprehend.

    1. Author Response:

      Reviewer #1 (Public Review):

      Monfared et al. construct a three-dimensional phase-field model of cell layers and use it to examine cellular extrusion by independently tuning cell-substrate and cell-cell adhesion. In line with earlier studies (in some of which some of the authors were involved), they find that extrusion is linked to topological defects in cellular arrangement and relieving stress.

      The authors claim that their development of the three-dimensional phase field model is crucial for understanding cell extrusion (which I agree with the authors is inherently three-dimensional). However, I don't think the data they currently present clearly demonstrate that the three-dimensional model adds significantly more to our understanding of extrusion events than earlier two-dimensional models.

      In the end, I think that the more important achievement of this work -- and one that is likely to be more influential -- is developing a three-dimensional phase field model for cell monolayers rather than any specific result regarding extrusion.

      We sincerely thank the reviewer for their time examining our manuscript and providing critical feedback. We are confident that our detailed response provided below and additional analyses have further highlighted the importance of three-dimensional stresses.

      Reviewer #2 (Public Review):

      The paper provides a natural extension of 2D multiphase field models for cell monolayers to 3D, addressing cell deformations, cell-cell interaction, cell-substrate interactions and active components for the cells. As known from 2D, the cell arrangement leads to positional (hexatic) defects and, if the elongation of the cells is coarse-grained to define a global nematic order, also to orientational (nematic) defects. These defects are characterized, see Figure 2. However, this is done in 2D and it remains unclear whether the projected basal or apical side is considered in this figure and the following statistics. The authors identify correlations between orientational defects and extrusion events. In terms of positional defects, such statistics seem not to be considered and the relation between positional defects and cell extrusion events remains vague. Also, in-plane and out-of-plane stresses are computed. These results confirm a mechanical origin for cell extrusions. However, these are the only 3D information provided. The final claim that the results clearly demonstrate the existence of a mechanical route related to hexatic and nematic disclinations is not clear to me. 3D vertex models for such systems, e.g., showed the importance of different mechanical behavior of the apical and basal side and identified scutoids as an essential geometric 3D feature in cell monolayers. These results are not discussed at all. A comparison of the 3D multiphase field model with such results would have been nice.

      We thank the reviewer for bringing to our attention the work on scutoids, which we now discuss in the manuscript as an important geometric feature of 3D layers on curved surfaces. We shall, however, emphasize that scutoids are specific to monolayers on curved surfaces, while we focus on a cell monolayer on flat substrates here. Moreover, we shall argue that the difference between apical and basal sides is just one element of the 3D complexity of cell layers. Here, we focus on another aspect of 3D complexity that is not accessible in 2D: the development of 3D mechanical stress and its role in an inherently 3D problem of cell extrusion. Nevertheless, as discussed in the detailed responses below, we have now added additional analyses varying the monolayer interaction with the substrate on the basal side.

      Reviewer #3 (Public Review):

      In this paper, the authors studied the influence of topological defects on extrusion events using 3D multi-phase field simulations. By varying cell-cell and cell-substrate parameters, this study helps to better understand the influence of mechanical and geometrical parameters on cell extrusion and their linkage to topological defects.

      First, the authors show that extrusion events and topological defects of nematic and hexatic order are typically found in their system, and then that extrusions occur, on average, at a distance of a few cell sizes from +1/2 and -1/2 defects. Next, the authors analyse at extrusion events the temporal evolution of the local isotropic stress and the local out-of-plane shear stress, showing that near the instant of extrusion, the isotropic stresses relax and the shear stresses fluctuate around a vanishing value. Finally, the authors analyse both the distribution of isotropic stress and the average isotropic stress pattern near +1/2 defects.

      We are grateful to the reviewer for their time examining our manuscript and providing critical feedback that has certainly improved our manuscript. In what follows, we provide detailed responses to each comment, including additional statistics that we have computed and now include in the manuscript for completion.

    1. Author Response

      Reviewer #1 (Public Review):

      Junctophilin is mostly known as a structural anchor to keep excitation-contraction (E-C) proteins in place for healthy contractile function of skeletal muscle. Here, the authors describe a new and interesting role in skeletal muscle for a junctophilin segment (44 kD segment, JPh44), which translocates to the nuclei and influences gene transcription. Also, the authors have shown that Calpain 1 can digest junctophilin to generate the 44 kDa segment. The field of skeletal muscle generally knows little about how E-C coupling proteins may have a dual role and influence gene regulation that subsequently may alter muscle function and metabolism. This part of the manuscript is solid, informative, and novel. The authors use advanced imaging and genetic manipulations of junctophilin, etc., to support their hypothesis. The authors then also aim to link this mechanism to hyperglycemia in individuals susceptible to malignant hyperthermia, as they have elevated levels of the 44 kDa segment. However, the power of the analyses is low and the included data comparisons complicate the interpretation of the results and their relevance. Nevertheless, the data supporting the novel dual role of junctophilin would likely be appreciated and gain attention in the muscle field.

      Thanks for your constructive reading. We agreed (in our answer to Item 1) with your concern regarding the power of the tests. To improve it, we would need many more individual patients (who, after the pandemic peaks, are starting to be recruited again, although at a pace of no more than 2 per month). We are committed to updating the present report as soon as we obtain, say, 20 more MHS and MHN patients, the minimum needed to have an impact on the power of the tests. In any case, we claim that power is not an acute concern, as this communication deals mainly with positive results, where significance is of the essence.

      We have established significance in most of the observations communicated here; in the few cases where p is marginal, significance is inferred by correlations.

      Reviewer #2 (Public Review):

      Skeletal muscle is the main regulator of glycemia in mammals, and a major puzzle in the field of diabetes is the mechanism by which skeletal muscle (as well as other tissues) becomes insensitive to insulin or decreases glucose uptake. The authors had proposed in a previous publication that high intracellular calcium, by means of calpain activation, could cleave and decrease the availability of GLUT4 glucose transporters. In this manuscript, the authors identify two additional targets of calpain activation. One of them is GSK3β, a specialized kinase that, when cleaved, inhibits glycogen synthase and impairs glucose utilization. The second target is junctophilin 1, a protein involved in the structure of the complex responsible for E-C coupling in skeletal muscle. The authors succeeded in showing that a fragment of junctophilin 1 (JPh44) moves from the triad to other cytosolic regions, including the nuclei, and they show changes in gene expression under these conditions, some of them linked to glucose metabolism.

      Overall, the manuscript shows a novel and audacious approach with a careful treatment of the data (that was not always easy nor obvious) that allow sensible conclusions and definitively constitutes a step forward in this field.

      Thanks for the generous report.

      Reviewer #3 (Public Review):

      First, we express utmost gratitude for your critical work on our manuscript. Your concerns made us perform additional experiments and validations, eventually forcing us to abandon a couple of erroneous notions and therefore improving our understanding and interpretations. Because your concerns were already in the “Essentials” list assembled by the Editor, our responses here will mostly refer to our earlier answers to the items in that list.

      1) Figures 1A and B show a western blot of proteins isolated from muscles of MHN and MHS individuals decorated with two different antibodies directed against JPH1. According to the manufacturer, antibody A is directed against the JPH1 protein sequence encompassing amino acids 387 to 512, while antibody B is directed against an otherwise unspecified C-terminal region of JPH1. Surprisingly, antibody B appears not to detect the full-length protein in lysates from human muscles, but recognizes only the 44 kDa fragment of JPH1. However, to the best of the reviewer's knowledge, antibody B has been reported by other laboratories to recognize the full-length JPH1 protein.

      The reason for the failure of ab B to recognize the full human protein may be that it was raised against a murine immunogen (this interpretation was communicated to us by G.D. Lamb, who co-authored the 2013 paper by Murphy et al. where the failure was noted). It recognizes both JPh1 and JPh44 of murine muscle in our hands.

      Thus, it is not obvious why here this antibody should recognize only the shorter fragment.

      We agree entirely. In spite of the difficulties in interpretation, the recognition of human JPh44 by the ab is, however, a fact, repeatedly demonstrated in the present study, which can be used to advantage.

      In addition, in MHS individuals there is no direct correlation between reduction in the content of the full-length JPH1 protein and appearance of the 44 kDa JPH1 fragment, since, as also reported by the authors, no significant difference between MHN and MHS can be observed in the amount of the 44 kDa JPH1 fragment.

      Tentative interpretations of the lack of correlation have been presented in the response to Item 14, above.

      Based on the data presented, it is very difficult to accept that antibody A and B have specific selectivity for JPH1 and the 44 kDa fragment of JPH1.

      Indeed, we now acknowledge that Ab A reacts equally with JPh1 and the 44 kDa fragment (and provide quantitative evidence for it in Supplement 1 to Fig. 8). We also provide conclusive evidence of the specificity of ab B (e.g., Supplement 2 to Fig. 1).

      2) In Figure 2B, staining of a nucleus is shown only with antibody B against the 44 kDa JPH1 fragment, while no nucleus stained with antibody A is shown in Fig 2A. Images should all be at the same level of magnification, and nuclear staining with antibody A should be reported. In Figure 2Db, labeling of JPH1 covers both the nucleus and the cytoplasm; does this mean that JPH1 also goes to the nucleus? One would rather think that background immunofluorescence may provide a confounding staining, and the authors should be more cautious in interpreting these data.

      These items are fully covered in our response to Item 16.

      Images in 2D and 2E refer to primary myotubes derived from patients. The authors show that RyR1 signals co-localizes with full-length JPH1, but not with the 44 kDa fragment, recognized by antibody B. How do the authors establish myotube differentiation?

      Myotubes are studied 5-10 days after switching cells to differentiation medium, which is DMEM-F12 supplemented with 2.5% horse serum, as explained in Figueroa et al 2019. Cells with more than 3 nuclei were considered myotubes. Myotubes with similar degree of maturation (number of nuclei) were selected for experimental comparisons.

      3) Figure 3 A-C. The authors show images of a full-length JPH1 tagged with GFP at the N-terminus and FLAG at the C- terminus. In Figure 3Ad and Cd the Flag signal is all over the cytoplasm and the nuclei: since these are normal mouse cells and fibers, it is surprising that the FLAG signal is in the nuclei with an intensity of signal higher than in patient's muscle.

      Can the authors supply images of entire myotubes, possibly captured in different Z planes? How can they distinguish between the cleaved and uncleaved JPH1 signals, especially in mouse myofibers, where calpain is supposed not to be so active as in MHS muscle fibers?

      A full answer is provided in our responses to Items 16b and 17 in the Essentials list.

      4) If the 44 kDa JPH1 fragment contains a transmembrane domain, it is difficult to understand the dual sarcoplasmic reticulum and nuclear localization. To justify this, the authors, in the Discussion section, mention a hypothetical vesicular transport of the 44 kDa JPH1 fragment. Traffic of proteins to the nucleus usually occurs through the nuclear pores and does not require vesicles. Even if diffusion from the SR membrane to the nuclear envelope occurs, the protein should remain in the compartment of the membrane envelope. There is no established evidence to support such an unusual movement inside the cells.

      In agreement with the criticism, we have removed the speculation from the Discussion.

      5) In Figure 5, the authors show the effect of Calpain1 on the full-length and 44 kDa JPH1 fragment in muscles from MHS patients. Can the authors repeat the same analysis on recombinant JPH1 tagged with GFP and FLAG?

      We agree that confirmatory evidence of the calpain effect on dual-tagged recombinant JPh1 would be desirable. However, we think an in-depth study is required to follow up on the number of JPh1 fragments generated by calpain (or by different calpain isoforms) and their positions, similar to the detailed study of JPh2 fragmentation by Wang et al. in 2021 (5).

      Can the authors provide images of MHN muscle fibers stained for JPH1 and Calpain1?

      We complied with the request.

      6) In Figure 6, the authors show images of MHS-derived myotubes transfected with FLAG Calpain1 and compare the distribution of endogenous JPH1 and RYR1 in two cells: one expressing FLAG Calpain1 (cell1) and one not expressing the recombinant protein (cell2). They state that cell1 shows a strong JPH1 signal in the nucleus, while this is not observed in cell2. Nevertheless, it is not clear where the nucleus is located within cell2, since the distribution of JPH1 is homogeneous across the cell. Can the authors show a different cell?

      We agree, and we now show a comparison between cultures with and without transfection in Supplement 1 to Fig. 6.

      7) In Figure 7, panels Bb and Db, nuclei appear to stain positive for JPH1. It is not clear why panels Ac and Bc show RYR1 staining while panels Cc and Dc show N-myc staining. The differential localization to nuclei also appears rather weak in these panels.

      We have entirely removed from the manuscript the description of experiments of exposure to extracellular calpain, including Fig. 7 and three associated tables.

      8) The strong nuclear staining in Figure 8, panels C and D is very different from the staining observed in Fig. 2 and Fig. 3. Transfection should not change the ratio between nuclear and cytoplasmic distribution.

      Transfection is an intrusive procedure, which requires production and trafficking of an exogenous protein. This protein, furthermore, is an artificial construct (in this case, a “stand-in”, which adds to the native protein and is therefore akin to overexpression). For the above reasons, we believe that differences in the intensity of nuclear staining may have multiple causes and should not be especially concerning.

    1. Author Response

      Reviewer #1 (Public Review):

      1) This study performs an interesting analysis of evolutionary variation and integration in forelimb/hand bone shapes in relation to functional and developmental variation along the proximo-distal axis. They found expected patterns of evolutionary shape variation along the proximo-distal axis but less expected patterns of shape integration. This study provides a strong follow-up to previous studies on mammal forelimb variation, adding and testing interesting hypotheses with an impressive dataset. However, this study could better highlight the relevance of this work beyond mammalian forelimbs. The study primarily cites and discusses mammalian limb studies, despite the relevance of the suggested findings beyond mammals and forelimbs. Furthermore, relevant work exists in other tetrapod clades and structures related to later-developing traits and proximo-distal variation. Finally, variations in bone size and shape along the proximo-distal axis could be affecting evolutionary patterns found here and it would be great to make sure they are not influencing the analysis/results.

      We appreciate the reviewer’s comments, and we acknowledge the importance of including examples of non-mammalian lineages in our study. We followed the recommendation and included more examples of other tetrapod taxa in our text and references, providing a more inclusive discussion of limb bone diversity beyond mammals. We also explain below why the results obtained are not inflated by differences between larger and smaller bones.

      Reviewer #2 (Public Review):

      10) Congratulations on producing a very nice study. Your study aims to examine the morphological diversity of different mammalian limb elements, with the ultimate goal seemingly to test expectations based on the different timing of development of the limb bones. There's a lot to like: the sample size is impressive, the methods seem appropriate and sound, the results are interesting, the figures are clear, and the paper is very well written. You find greater diversity and integration in distal limb segments compared to proximal elements, and this may be due to the developmental timing and/or functional specialization of the limb segments. These are interesting results and conclusions that will be of interest to a broad readership. And the large dataset will likely be valuable to future researchers who are interested in mammalian limb morphology and evolution. I have one major concern with how you frame your discussion and conclusions, which I explain below. But I think you can address this issue with some text edits.

      We sincerely thank the reviewer for his constructive recommendations and for his appreciation of our work. We addressed the issue raised as detailed below.

      11) Major concern - is developmental timing the best hypothesis?

      You discuss two potential drivers for the relatively greater diversity in distal elements: 1) later development and 2) greater functional specialization. Your data doesn't allow you to fully test these two hypotheses (e.g. you don't have detailed evo-devo data to infer developmental constraints), and I think you realize this - you use phrases like "consistent with the hypothesis that ...". You seem to compromise and conclude that both factors (development + function) are likely driving greater autopod diversity (e.g. Lines 302-306). Being unable to fully test these hypotheses weakens the impact of your conclusions, making them a bit more speculative, but otherwise, it isn't a critical issue.

      But my concern is that you seem to favor developmental factors over functional factors as the primary drivers of your results, and that seems backwards to me. For instance, early in the Abstract (Line 32) and early in the Discussion (Line 201) you mention that your results are consistent with the developmental timing hypothesis, but it's not until later in the Abstract or Discussion that you mention the role of functional diversity/specialization/selection. The problem with favoring the development hypothesis is that your integration results seem to contradict that hypothesis, at least based on your prediction in the Introduction (Line 126; although you spend some of the Discussion trying to make them compatible). Later in the paper, you acknowledge that functional specialization (rather than developmental factors) might be a better explanation for the integration results (Lines 282-284, 345-347), but, again, this is only after discussions about developmental factors.

      When you first start discussing functional diversity, you say, "high integration in the phalanx and metacarpus, possibly favoured the evolution of functionally specialized autopod structures, contributing to the high variation observed in mammalian hand bones." (Line 282). This implies that integration led to functional diversity in the autopod. But I'd flip that: I think the functional specialization of the hand led to greater integration. Integration does not result solely from genetic/developmental factors. It can also result from traits evolving together because they are linked to the same function. From Zelditch & Goswami (2021, Evol. & Dev.): "Within individuals, integration is customarily ascribed to developmental and/or functional interdependencies among traits (Bissell & Diggle, 2010; Cheverud, 1982; Wagner, 1996) and modularity is thus due to their developmental and/or functional independence."

      In sum, I think your results capture evidence of greater functional specialization in hands relative to other segments. You're seeing greater 1) disparity and 2) integration in hands, and both of those are expected outcomes of greater functional specialization. In contrast, I think it's harder to fit your results to the developmental timing hypothesis. Thus, I recommend that throughout the paper (Abstract, Intro, Discussion) you flip your discussion of the two hypotheses and start with a discussion on how functional specialization is likely driving your results, and then you can also note that some results are consistent with the development hypothesis. You could maintain most of your current text, but I'd simply rearrange it, and maybe add more discussion on functional diversity to the Intro.

      Or, if you disagree and think that there's more support for the development hypothesis, then you need to make a better case for it in the paper. Right now, it feels like you're trying to force a conclusion about development without much evidence to back it up.

      We thank the reviewer for his thoughtful and thorough comment. We agree that the results provided, particularly those on integration, support the hypothesis that functional specialization contributes to the uneven diversity of limb bones. We addressed these concerns by substantially changing our discussion, in particular moderating (and removing) sections on developmental constraints and adding new arguments for other possible drivers of limb bone diversity, such as function. However, the goal of the paper was to test whether or not the data corroborate the predictions derived from the developmental hypothesis, and they largely do. Therefore, we decided to keep the developmental hypothesis first in both the introduction and the discussion, as we believe this order is more coherent with the hypothesis tested (we believe that detailing the role of functional specialization, particularly in the introduction, would mislead readers into thinking that we directly tested these parameters). Following the discussion of the integration results, we then discuss the possible role of functional specialization in the results obtained (lines 262-285, see also lines 216-234). These factors are not tested in this paper, however, and remain to be tested in a future analysis focusing specifically on the role of ecology and function in driving variation in the mammalian limb.

      12) Limitations of the dataset

      Using linear measurements is fine, but they mainly just capture simple aspects of the elements (lengths and widths). You should acknowledge in your paper the limitations of that type of data. For example, the deltoid tuberosity of the humerus can vary considerably in size and shape among mammals, but you don’t measure that structure. The autopod elements don’t have a comparable process, meaning that if you were to measure the deltoid tuberosity then you’d likely see a relative increase in humerus disparity (although my guess is that it’d still be well below that of the autopod). And you omit the ulna from your study, and its olecranon process varies considerably among taxa and its length is a very strong correlate of locomotor mode. In other words, your finding of the greatest disparity in the hand might be due in part to your choice of measurements and the omission of measurements of specific processes/elements. I recommend that you add to your paper a brief discussion of the limitations of using linear measurements and how you might expect the results to change if you were to include more detailed measurements and/or more elements.

      We followed the recommendation and included a discussion of the dataset limitations, acknowledging the possible impact of the chosen measurements and bones on the results obtained (Lines 235-260).

      Reviewer #3 (Public Review):

      32) This paper uses a large (638 species representing 598 genera in 138 families) extant sample of osteologically adult mammals to address the question of proximodistal patterns of cross-taxonomic diversity in forelimb bony elements. The paper concludes, based on a solid phylogenetically controlled multivariate analysis of linear measurements, that proximal forelimb elements are less morphologically diverse and evolutionarily flexible than distal forelimb elements, which the paper concludes is consistent with a developmental constraint axis tied to limb bud growth and development. This paper is of interest to researchers working on macroevolutionary patterns and sources of morphological diversity.

      Methodological review

      Strengths:

      The taxonomic dataset is very comprehensive for this sort of study and the authors have given consideration to how to identify bony elements present in all mammalian taxa (no small task with this level of taxonomic breadth). Multivariate approaches as used in this study are the gold standard for addressing questions of morphological variations.

      The authors give consideration to two significant confounders of analyses operating at this scale: phylogeny and body size. The methods they use to address these are appropriate, although as I note below body size itself may merit more consideration.

      We sincerely thank the reviewer for his appreciation of our study. We addressed the main concerns pointed out below.

      Weaknesses:

      33) The authors assume a lot of knowledge on the part of the reader regarding their methods. Given that one of their key metrics (stationary variance) is, as I understand it, largely a property of OU models, more explanation of the authors' biological interpretation of stationary variance would help assess the strength of their conclusions, especially as OU models are not as straightforward in their biological interpretation as they first appear (Cooper et al., 2016).

      We acknowledge that this may not be straightforward and now include a more extensive explanation of the approach and the metrics used. We expanded the explanation of stationary variances in the Methods, contextualizing their biological meaning (lines 456-469).
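      For readers less familiar with OU models, the textbook definition of stationary variance may help. The symbols below (α, θ, σ, A, Σ, V) are our own illustrative notation and are not necessarily the parameterization used in the revised Methods:

```latex
% Univariate OU: attraction strength \alpha > 0, optimum \theta, diffusion rate \sigma^2
dX_t = \alpha(\theta - X_t)\,dt + \sigma\,dW_t
\quad\Longrightarrow\quad
\operatorname{Var}_{\infty}(X) = \frac{\sigma^2}{2\alpha}

% Multivariate OU (eigenvalues of A with positive real parts):
% the stationary covariance V solves the Lyapunov equation
dX_t = -A(X_t - \theta)\,dt + \Sigma^{1/2}\,dW_t
\quad\Longrightarrow\quad
A V + V A^{\top} = \Sigma
```

      With A = α and Σ = σ², the second result reduces to the first (2αV = σ²). Read biologically, V describes how widely trait values spread around the optimum once stochastic diffusion (Σ) and the pull toward the optimum (A) have reached a balance; scalar summaries such as det(V) and tr(V), the determinant and trace mentioned in the response to comment 34, condense that multivariate spread into a single number.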

      34) It is unclear what the authors mean when they say they "simulated the trait evolution under OU processes on 100 datasets". Are the 100 datasets 100 different tree topologies (as seems to be the case later: "we replicated the body mass linear regressions with 100 trees from Upham et al (2019).")? If so, what is the rationale for choosing 100 topologies, and what criteria were used to select them?

      We understand the explanation may have been confusing. Overall, we used a parametric bootstrap approach to assess the uncertainty around the point estimates of morphological diversity and integration. That is, we first simulated 100 datasets on the maximum clade credibility tree (MCC tree, which summarizes 10,000 trees from Upham et al. 2019), using the best-fit model for our original data (i.e., an OU process) with the parameter estimates from that model fit. The OU model was then fit to each of these 100 simulated datasets, and the resulting distribution of parameter estimates was used to assess the variability around the point estimates (for the determinant, the trace, and the measure of integration) obtained from the empirical data. We did not use the simulated datasets to estimate the significance of the stationary variances. To further assess the variability due to uncertainty in tree topology and branching times, we also fitted the empirical datasets on 100 trees randomly sampled from the credible set of 10,000 trees of Upham et al. (2019), instead of using only the MCC tree. We included this expanded explanation in the Methods (lines 421-428 and 471).
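      The logic of the parametric bootstrap described above can be illustrated with a deliberately simplified sketch: a single univariate OU trait sampled at evenly spaced times, rather than multivariate traits on a phylogeny. The function names, the AR(1) least-squares estimator, and all parameter values are illustrative assumptions, not the authors' pipeline:

```python
import math
import random
import statistics


def simulate_ou(n, dt, alpha, theta, sigma, rng):
    """Simulate n evenly spaced samples of a univariate OU process.

    Uses the exact Gaussian transition, starting from the stationary
    distribution: X[t+dt] = theta + (X[t] - theta) * exp(-alpha*dt) + noise.
    """
    decay = math.exp(-alpha * dt)
    stat_var = sigma ** 2 / (2 * alpha)
    step_sd = math.sqrt(stat_var * (1 - decay ** 2))
    x = rng.gauss(theta, math.sqrt(stat_var))
    series = [x]
    for _ in range(n - 1):
        x = theta + (x - theta) * decay + rng.gauss(0.0, step_sd)
        series.append(x)
    return series


def fit_ou(series, dt):
    """Fit (alpha, theta, sigma) by least-squares AR(1) regression of X[t+1] on X[t]."""
    xs, ys = series[:-1], series[1:]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # estimates exp(-alpha * dt)
    if not 0.0 < b < 1.0:
        raise ValueError("AR(1) slope outside (0, 1); OU fit undefined")
    a = my - b * mx  # estimates theta * (1 - b)
    resid_var = statistics.fmean((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    alpha = -math.log(b) / dt
    theta = a / (1 - b)
    stat_var = resid_var / (1 - b ** 2)
    sigma = math.sqrt(2 * alpha * stat_var)
    return alpha, theta, sigma


def stationary_variance(alpha, sigma):
    """Stationary variance sigma^2 / (2 * alpha) of a univariate OU process."""
    return sigma ** 2 / (2 * alpha)


def parametric_bootstrap(series, dt, n_boot=100, seed=1):
    """Point estimate and 95% percentile interval for the stationary variance.

    Fits the OU model once, simulates n_boot datasets from the fitted
    parameters, refits each, and summarizes the refitted estimates.
    """
    rng = random.Random(seed)
    alpha, theta, sigma = fit_ou(series, dt)
    point = stationary_variance(alpha, sigma)
    boot = []
    for _ in range(n_boot):
        sim = simulate_ou(len(series), dt, alpha, theta, sigma, rng)
        alpha_b, _, sigma_b = fit_ou(sim, dt)
        boot.append(stationary_variance(alpha_b, sigma_b))
    cuts = statistics.quantiles(boot, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return point, (cuts[0], cuts[-1])


if __name__ == "__main__":
    data = simulate_ou(500, 1.0, alpha=0.5, theta=0.0, sigma=1.0,
                       rng=random.Random(0))
    point, (lo, hi) = parametric_bootstrap(data, dt=1.0)
    print(f"stationary variance ~ {point:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

      The analysis described in the response differs in substance: it fits multivariate OU models to linear measurements on the MCC tree, summarizes diversity with the determinant and trace of the stationary covariance, and separately refits the empirical data on 100 posterior trees. The sketch only shows the shared bootstrap logic: fit once, simulate from the fit, refit, and read the spread of the refitted estimates as uncertainty around the empirical point estimate.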

      35) The way the authors approach body mass and allometry, while mathematically correct, ignores the potential contribution of body mass to the questions the authors are interested in. Jenkins (1974) for example argued that small mammals would converge on similar body posture and functional morphology because, at small sizes, all mammals are scansorial if they are not volant. Similarly, Biewener (1989) argued that many traits we view as cursorial adaptations are actually necessary for stability at large body sizes. Thus size may actually be important in determining patterns of variation in limb bone morphology.

      We agree with the observation. We believe that categorizing the groups according to size would provide a meaningful overview of the effect of size on the diversity and evolution of limb bones. Although such an analysis would be insightful and worthy of investigation, we were particularly interested in understanding whether developmental timing corresponds to bone diversification more broadly across Mammalia, and thus considered only the size-corrected residual values. This issue will be addressed in our future work. In lines 329-341 we discuss the potential contribution of body size to limb segment diversification and the importance of considering this aspect in future studies.

      36) Review of interpretation.

      The authors conclude that their result, in showing a proximo-distal gradient of increasing disparity and stationary variance in forelimb bone morphology, supports the idea that proximo-distal patterning of limb bone development constrains the range of morphological diversity of the proximal limb elements. However, this correlation ignores two important considerations. The first is that the stylopod connects to the pectoral girdle and the axial skeleton, and so is plausibly more constrained functionally, not developmentally, in its morphological evolution. The second, related, issue arises from the authors' study itself, which shows that the lowest morphological integration is found in the stylopod and zeugopod, whereas the autopod elements are highly integrated. This suggests a greater tendency towards modularity in the stylopod and zeugopod, which is itself a measure of evolutionary lability (Klingenberg, 2008). Indeed, the mammalian stylopod is developmentally composed of multiple elements (the epiphyses and diaphysis) that respond to very different developmental and biomechanical signals. Thus, for example, the functional signal specifically in the articular surfaces of the stylopod (Gould, 2016) and zeugopod (MacLeod and Rose, 1993) is very high. What is missing to fully resolve the question posed by the authors is developmental data indicating whether or not the degree of morphological disparity in the hard tissues of the forelimb changes over the course of ontogeny throughout the mammalian tree, and whether changing functional constraints over ontogeny (as is the case in marsupials) affect these patterns.

      We thank the reviewer for sharing such an interesting reinterpretation of the results. Together with the recommendations from the other two reviewers, this led us to substantially change our discussion, especially the interpretation of the results on trait integration. We discuss the possible role of functional variation at the articulations in element integration in lines 263-285.

    1. Author Response

      Reviewer #2 (Public Review):

      This paper investigates the maintenance and function of memory follicular helper T (Tfh) cell subsets using in vitro approaches, murine immunization models and vaccine-challenged humans. Murine Tfh cell subsets (Tfh1, Tfh2, Tfh17) were generated using in vitro polarization (iTfh1, iTfh2, iTfh17), and then tested for support of the humoral response following adoptive transfer, or adoptive transfer followed by resting in vivo for 35 days. iTfh17 cells were significantly better than iTfh1 and iTfh2 cells in promoting GC B cell and plasma cell maturation after resting in vivo, although all 3 populations were capable of B cell help. Tfh17 cells were comparatively enriched among blood-borne Tfh central memory cells in humans, and were enriched at the memory phase of vaccination with hepatitis B and influenza vaccines compared to the effector phase, suggesting that they may be comparatively superior in Tfh cell memory formation, with greater persistence in aged individuals.

      Significance

      The enrichment of Tfh17 cells in Tfh cell central memory compartment and the dominance of Tfh17 cell population and the Tfh17 transcriptional signature in circulating Tfh cells at the memory phase are nicely demonstrated, and may well be helpful for understanding the heterogeneity of memory Tfh cells and potentially providing clues for vaccine design. The in vitro differentiation system for mouse Tfh cells also provides a strategy for others to build upon in dissection of Tfh cell development and function.

      Points to consider

      1) Even though Tfh17 cells are more likely to persist at memory timepoints in mice and in humans, or produce more GC B cells or plasma cells following transfer, all subsets can do this. Is GC output otherwise distinguishable following transfer of the individual subsets, or is their effect (cytokine related perhaps) pre-GC with differential CSR? It is also not clear if the individual subsets populate the GC and assuming they do so, if their respective phenotypes persist when they become GC Tfh cells.

      We have conducted new experiments showing that iTfh17 cells preferentially generate more GC-Tfh cells in the delayed immunization (after 35 days of resting in vivo). Furthermore, the different iTfh subsets maintained polarized cytokine profiles after antigen re-exposure and promoted specific CSR, as do their Th1 or Th2 counterparts. Please refer to response (2) to Essential Revisions for details.

      2) iTfh17 cells induce more GC B cells and antibodies after resting and antigen challenge (Figures 1, 2). However, it's not clear whether this effect is a consequence of comparatively enhanced iTfh17 survival during resting (as suggested by latter figures), or better expansion or differential skewing to Tfh differentiation during challenge (as suggested by Figure 1 J,K). The total number of remaining adoptively-transferred cells right before challenge and 7 days post challenge will be helpful to understand that.

      We have conducted new experiments, and our results suggest that the superior immunological memory maintenance of iTfh17 cells is attributable to their better survival capacity and better maintenance of the potential to differentiate into GC-Tfh cells. Please refer to response (2) to Essential Revisions for details.

      3) The authors tried to address whether Tfh17 cells have better ability to survive till memory phase or Tfh17 cells with memory potential are generated at higher frequency at the effector phase of vaccination (Figure 5); however, the experiment is not conclusive. The cTfh population 7 days post vaccination is a mixed population with effector Tph cells and Tfh memory precursors. The increased frequency of Th17 cells at day 28 compared to day 7 could be a consequence of superior survival ability, or Tfh memory precursors with Tfh17 signature are better generated.

      As indicated by our gating strategy and by the widely accepted definition of cTfh cells as CD4+ CD45RA- CXCR5+ (line 69), we respectfully disagree with the reviewer’s comment that ‘The cTfh population 7 days post vaccination is a mixed population with effector Tph cells and Tfh memory precursors’. The effector Tph population is defined as PD-1hiCXCR5-CD4+ T cells (Rao DA et al. Pathologically expanded peripheral T helper cell subset drives B cells in rheumatoid arthritis, Nature 2017).

      4) Experiments to confirm expansion ability of the human subsets or their B cell helper ability were not performed.

      In our new experiments, we demonstrated that iTfh1/2/17 cells showed comparable expansion ability.

      Human cTfh1/2/17 cells’ expansion ability and B helper ability were reported previously by Morita et al. (Human blood CXCR5(+)CD4(+) T cells are counterparts of T follicular cells and contain specific subsets that differentially support antibody secretion, Immunity 2011, Figure 4C-D). Human cTfh1/2/17 cells showed comparable expansion ability when co-culturing with SEB-pulsed naive B cells, and cTfh17 cells had superior B cell helper function over cTfh1 but not cTfh2 cells in promoting the B cell expansion and plasma cell formation.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Goering et al. investigate subcellular RNA localization across different cell types focusing on epithelial cells (mouse C2bbe1 and human HCA-7 enterocyte monolayers, canine MDCK epithelial cells) as well as neuronal cultures (mouse CAD cells). They use their recently established Halo-seq method to investigate transcriptome-wide RNA localization biases in C2bbe1 enterocyte monolayers and find that 5'TOP-motif containing mRNAs, which encode ribosomal proteins (RPs), are enriched on the basal side of these cells. These results are supported by smFISH against endogenous RP-encoding mRNAs (RPL7 and RPS28) as well as Firefly luciferase reporter transcripts with and without mutated 5'TOP sequences. Furthermore, they find that 5'TOP-motifs are not only driving localization to the basal side of epithelial cells but also to neuronal processes. To investigate the molecular mechanism behind the observed RNA localization biases, they reduce expression of several Larp proteins and find that RNA localization is consistently Larp1-dependent. Additionally, the localization depends on the placement of the TOP sequence in the 5'UTR and not the 3'UTR. To confirm that similar RNA localization biases can be conserved across cell types for other classes of transcripts, they perform similar experiments with a GA-rich element containing Net1 3'UTR transcript, which has previously been shown to exhibit a strong localization bias in several cell types. In order to determine if motor proteins contribute to these RNA distributions, they use motor protein inhibitors to confirm that the localization of individual members of both classes of transcripts, 5'TOP and GA-rich, is kinesin-dependent and that RNA localization to specific subcellular regions is likely to coincide with RNA localization to microtubule plus ends that concentrate in the basal side of epithelial cells as well as in neuronal processes.

      In summary, Goering et al. present an interesting study that contributes to our understanding of RNA localization. While RNA localization has predominantly been studied in a single cell type or experimental system, this work looks for commonalities to explain general principles. I believe that this is an important advance, but there are several points that should be addressed.

      Comments:

      1) The Mili lab has previously characterized the localization of ribosomal proteins and NET1 to protrusions (Wang et al, 2017, Moissoglu et al 2019, Crisafis et al., 2020) and the role of kinesins in this localization (Pichon et al, 2021). These papers should be cited and their work discussed. I do not believe this reduces the novelty of this study; rather, it supports the generality of the RNA localization patterns to additional cellular locations in other cell types.

      This was an unintentional oversight on our part, and we apologize. We have added citations for the mentioned publications and discussed our work in the context of theirs.

      2) The 5'TOP motif begins with an invariant C nucleotide, and mutation of this first nucleotide next to the cap has been shown to reduce translation regulation during mTOR inhibition (Avni et al, 1994 and Biberman et al 1997) and also Larp1 binding (Lahr et al, 2017). Consequently, it is not clear to me whether RPS28 initiates transcription with an A as indicated in Figure 3B. There also seem to be some differences among published CAGE datasets, but this point needs to be clarified. Additionally, it is not clear to me how the 5'TOP Firefly luciferase reporters were generated and whether the transcription start site and exact 5'-ends of these constructs were determined. This is again essential to determine whether it is a pyrimidine sequence in the 5'UTR or the 5'TOP motif that is important for localization, and whether Larp1 directly regulates the localization by binding to the 5'TOP motif or the effect they observe is indirect (e.g. is Larp1 also basally localized?). It should also be noted that Larp1 has been suggested to bind pyrimidine-rich sequences in the 5'UTR that are not next to the cap, but the details of this interaction are less clear (Al-Ashtal et al, 2021).

      We did not fully appreciate the subtleties related to TOP motif location when we submitted this manuscript, so we thank the reviewer for pointing them out.

      We also analyzed public CAGE datasets (Andersson et al, 2014 Nat Comm) and found that the start sites for both RPL7 and RPS28 were quite variable within a window of several nucleotides (as is the case for the vast majority of genes), suggesting that a substantial fraction of both do not begin with pyrimidines (Reviewer Figure 1). Yet, by smFISH, endogenous RPL7 and RPS28 are clearly basally/neurite localized (see new figure 3C).

      Reviewer Figure 1. Analysis of transcription start sites for RPL7 (A) and RPS28 (B) using CAGE data (Andersson et al, 2014 Nat Comm). Both genes show a window of transcription start sites upstream of current gene models (blue bars at bottom).

      A more detailed analysis of our PRRE-containing reporter transcripts led us to find that in these reporters, the pyrimidine-rich element was approximately 90 nucleotides into the body of the 5’ UTR. Yet these reporters are also basally/neurite localized. The organization of the PRRE-containing reporters is now more clearly shown in an updated figure 3D.

      From these results, it would seem that the pyrimidine-rich element need not be next to the 5’ cap in order to regulate RNA localization. To generalize this result, we first used previously identified 5’ UTR pyrimidine-rich elements that had been found to regulate translation in an mTOR-dependent manner (Hsieh et al 2012). We found that, as a class, RNAs containing these motifs were similarly basally/neurite localized as RP mRNAs. These results are presented in figures 3A and 3I.

      We then asked if the position of the pyrimidine-rich element within the 5’ UTR of these RNAs was related to their localization. We found no relationship between element position and transcript localization as elements within the bodies of 5’ UTRs were seemingly just as able to promote basal/neurite localization as elements immediately next to the 5’ cap. These results are presented in figures 3B and 3J.

      To further confirm that pyrimidine-rich elements need not be immediately next to the 5’ cap, we redesigned our RPL7-derived reporter transcripts such that the pyrimidine-rich motif was immediately adjacent to the 5’ cap. This was possible because the reporter uses a CMV promoter that reliably starts transcription at a known nucleotide. We then compared the localization of this reporter (called “RPL7 True TOP”) to our previous reporter in which the pyrimidine-rich element was ~90 nt into the 5’ UTR (called “RPL7 PRRE”) (Reviewer Figure 2). As with the PRRE reporter, the True TOP reporter drove RNA localization in both epithelial and neuronal cells, while purine-containing mutant versions of the True TOP reporter did not (Reviewer Figure 2A-D). In the epithelial cells, the True TOP was modestly but significantly better at driving basal RNA localization than the PRRE (Reviewer Figure 2E), while in neuronal cells the True TOPs were modestly but not significantly better. Again, this suggests that pyrimidine-rich motifs need not be immediately cap-adjacent in order to regulate RNA localization.

      Reviewer Figure 2. Experimental confirmation that pyrimidine-rich motif location within 5’ UTRs is not critical for RNA localization. (A) RPL7 True TOP smFISH in epithelial cells. (B) RPL7 True TOP smFISH in neuronal cells. (C) Quantification of epithelial cell smFISH in A. (D) Quantification of neuronal cell smFISH in B. (E) Comparison of the location in epithelial cells of endogenous RPL7 transcripts, RPL7 PRRE reporter transcripts, and RPL7 True TOP reporter transcripts. (F) Comparison of the neurite-enrichment of RPL7 PRRE reporters and RPL7 True TOP reporters. In C-F, the number of cells included in each analysis is shown.

      In response to the point about whether the localization results are direct effects of LARP1, we did not assay the binding of LARP1 to our PRRE-containing reporters, so we cannot say for sure. However, given that PRRE-dependent localization required LARP1 and there is much evidence about LARP1 binding pyrimidine-rich elements (including those that are not cap-proximal as the reviewer notes), we believe this to be the most likely explanation.

      It should also be noted here that while pyrimidine-rich motif position within the 5’ UTR may not matter, its location within the transcript does. PRREs located within 3’ UTRs were unable to direct RNA localization (Figure 5).

      3) In figure 1A, they indicate that mRNA stability can contribute to RNA localization, but this point is never discussed. This may be important to their work since Larp1 has also been found to impact mRNA half-lives (Aoki et al, 2013 and Mattijssen et al 2020, Al-Ashtal et al 2021). Is it possible the effect they see when Larp1 is depleted comes from decreased stability?

      We found that PRRE-containing reporter transcripts were generally less abundant than their mutant counterparts in C2bbe1, HCA7, and MDCK cells (figure 3 – figure supplements 5, 6, and 8) although the effect was not consistent in mouse neuronal cells (figure 3 – figure supplement 13).

      However, we don’t think it is likely that the changes in localization are due to stability changes. This abundance effect did not seem to be LARP1-dependent as both PRRE-containing and PRRE-mutant reporters were generally more expressed in LARP1-rescue epithelial cells than in LARP1 KO cells (figure 4 – figure supplement 9).

      It should be noted here that we never actually measure transcript stability, only steady-state abundance. It therefore cannot be ruled out that LARP1 regulates the stability of our PRRE reporters. Given, though, that their localization was dependent on kinesin activity (figures 7F, 7G), we believe the most likely explanation for the localization effects is active transport.

      4) Also Moor et al, 2017 saw that feeding cycles changed the localization of 5'TOP mRNAs. Similarly, does mTOR inhibition or activation or simply active translation alter the localization patterns they observe? Further evidence for dynamic regulation of RNA localization would strengthen this paper.

      We are very interested in this and have begun exploring it. We have data suggesting that PRREs also mediate the feeding cycle-dependent relocalization of RP mRNAs. As the reviewer says, we think this leads to a very attractive model involving mTOR, and we are currently working to test this model. However, we don’t have the room to include those results in this manuscript and would instead prefer to include them in a later manuscript that focuses on nutrient-induced dynamic relocalization.

      5) For smFISH quantification, is every mRNA treated as an independent measurement so that the statistics are calculated on hundreds of mRNAs? Large sample sizes can give significant p-values for very small differences, as observed for Firefly vs. OSBPL3 localization. Since the biological interpretation of effect size is not always clear, I would suggest plotting RNA position per cell or treating only biological replicates as independent measurements to determine statistical significance. This should also be done for other smFISH comparisons.

      This is a good suggestion, and we agree that using individual puncta as independent observations will artificially inflate the statistical power in the experiment. To remedy this in the epithelial cell images, we first reanalyzed the smFISH images using each of the following as a unique observation: the mean location of all smFISH puncta in one cell, the mean location of all puncta in a field of view, and the mean location of all puncta in one coverslip. With each metric, the results we observed were very similar (Reviewer Figure 3) while the statistical power of course decreased. We therefore chose to go with the reviewer-suggested metric of mean transcript position per cell.

      Reviewer Figure 3. C2bbe1 monolayer smFISH spot position analysis. RNA localization across the apicobasal axis is measured by smFISH spot position in the Z axis. This can be plotted for each spot, but the thousands of spots then overpower the statistics. Spot position can instead be averaged per cell, with cells outlined manually within the FISH-quant software. This reduces sample size and allows for more accurate statistical analysis. When spot position is averaged per field of view, sample size decreases further and the statistics are less powered, but the localization trends are still robust. Finally, we can average spot position per coverslip, which represents biological replicates. We lose almost all statistical power as sample size is limited to 3 coverslips. Despite this, the localization trends are still recognizable.

      When we use this metric, all results remain the same with the exception of the smFISH validation of endogenous OSBPL3 localization. That result loses its statistical significance and has now been omitted from the manuscript. All epithelial smFISH panels have been updated to use this new metric, and the number of cells associated with each observation is indicated for each sample.
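      To illustrate the aggregation step described above, the Python sketch below pools smFISH puncta into one mean z-position per cell, per field of view, or per coverslip, so that the unit of statistical analysis is the aggregated unit rather than the individual punctum. The record layout (coverslip, field, cell, z) and the helper names are hypothetical and do not reflect FISH-quant's export format; Welch's t statistic is included only as an example, since the response does not state which test was applied.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

# Hypothetical record layout: (coverslip_id, field_id, cell_id, z_um).
# Illustrative only; not FISH-quant's actual output schema.
LEVELS = {"coverslip": 1, "field": 2, "cell": 3}


def aggregate(puncta, level="cell"):
    """Pool puncta into one mean z-position per unit at the requested level."""
    depth = LEVELS[level]
    groups = defaultdict(list)
    for record in puncta:
        key = record[:depth]            # e.g. (coverslip, field, cell) when level="cell"
        groups[key].append(record[3])   # z-position of this punctum
    return {key: mean(zs) for key, zs in groups.items()}


def welch_t(a, b):
    """Welch's t statistic comparing two samples of per-unit means."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Usage: four puncta from three cells in two fields on one coverslip.
puncta = [(1, 1, 1, 1.0), (1, 1, 1, 3.0), (1, 1, 2, 5.0), (1, 2, 3, 8.0)]
per_cell = aggregate(puncta, "cell")  # {(1, 1, 1): 2.0, (1, 1, 2): 5.0, (1, 2, 3): 8.0}
```

      Each coarser level shrinks the number of observations (here from four puncta to three cells, two fields and one coverslip), which is why statistical power falls as the aggregation level rises.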

      For the neuronal images, these were already quantified at the per-cell level as we compare soma and neurite transcript counts from the same cell. In lieu of more imaging of these samples, we chose to perform subcellular fractionation into soma and neurite samples followed by RT-qPCR as an orthogonal technique (figure 3K, figure 3 supplement 14). This technique profiles the population average of approximately 3 million cells.

      6) F: How was the segmentation of soma vs. neurites performed? It would be good to have a larger image as a supplemental figure so that it is clear whether proximal or distal neurite segments are being compared.

      All neurite vs. soma segmentations were done manually. An example of this segmentation is included as Reviewer Figure 4. Because it is often difficult to find an entire soma and an entire neurite in one field of view, often only proximal neurite segments are included in the analysis. However, in our experience, inclusion of more distal neurite segments would likely only strengthen the smFISH results, as we often observe many molecules of localized transcripts in the distal tips of these neurites.

      Reviewer Figure 4. Manual segmentation of differentiated CAD soma and neurite in FISH-quant software. Neurites that do not overlap adjacent neurites are selected for imaging. Often neurites extend beyond the field of view, limiting this assay to RNA localization in proximal neurites.

      Also, it should be noted that the neuronal smFISH results are now supplemented by experiments involving subcellular fractionation and RT-qPCR (figure 3 supplement 14). These subcellular fractionation experiments collect the whole neurite, both the proximal and distal portions.

      Text has been added to the methods under the header “smFISH computational analysis” to clarify how the segmentation was done.

    1. Author Response

      Reviewer #1 (Public Review):

      This is timely and foundational work that links cellular neurophysiology with extracellular single-unit recordings used to study LC function during behavior.

      The strengths of this paper include:

      1. Providing an updated assessment of LC cell morphology and cell types since much of the prior work was completed in the late 1970s and early to mid-1980s.

      2. Connecting LC cell morphology with membrane properties and action potential shape.

      3. Showing that neurons of the same type have electrical coupling.

      Collectively, these findings help to link LC neuron morphology and firing properties with recent work using extracellular recordings that identify different types of LC single units by waveform shape.

      Another strength of this work is that it addresses recent findings suggesting the LC neurons may release glutamate by showing that, at least within the LC, there is no local glutamatergic excitatory transmission.

      Weaknesses:

      The authors also propose to test the role of single LC neuron activity in evoking lateral inhibition, as well as proposing that electrical coupling between LC cell pairs is organized into a train pattern. The former point is based on a weak premise and the latter point has weak support in their data given the analyses performed.

      Point 1: lateral inhibition in the LC

      The authors write in the abstract that "chemical transmission among LC noradrenergic neurons was not detected" and this was a surprising claim given the wealth of prior evidence supporting this in vitro and in vivo (Ennis & Aston-Jones 1986. Brain Res 374, 299-305; Aghajanian, Cedarbaum & Wang 1977. Brain Res 136, 570-577; Cedarbaum & Aghajanian. 1978 Life Sci 23, 1383-1392).

      Huang et al. 2007 (Huang et al. 2007. Proc National Acad Sci 104, 1401-1406) showed that local inhibition in the LC is highly dependent on action potential frequency: local release requires multiple APs in short succession, and the resulting hyperpolarization then takes time to appear (even over 1 sec). This work suggests that it is not a "concentration issue" per se; rather, a single AP simply will not cause local NE release in the LC. Although the authors did try 5 APs at 50 Hz, according to this prior work this may not be enough to generate local NE release, and a longer duration may be needed. Additionally, although the authors incubated the slices with a NET inhibitor, this will not increase volume transmission unless NE is actually released, which may not have happened under the conditions tested. In sum, there is no reason to expect that a single AP from one neuron would cause an immediate (within the 100 msec shown in Fig 3B) hyperpolarization of a nearby neuron. Therefore, driving one neuron to fire one AP (or, in some cases, 5 APs at 50 Hz) is not an adequate test of lateral inhibition mediated by NE volume transmission in the LC. Strong claims that "chemical transmission...was not detected" require substantial support and testing of a range of AP frequencies and durations. Given the wealth of evidence supporting lateral inhibition in the LC, this claim seems unwarranted.

      We thank the reviewers for their constructive comments and interpretations of the data regarding lateral inhibition. We were fully aware of the wealth of prior data supporting the existence of lateral inhibition and had discussed possible reasons for its absence in our dataset. Both reviewers have now provided additional potential explanations for this absence. The most plausible explanation appears to be that α2AR-mediated lateral inhibition is a population phenomenon, which would not be readily detected at the single-cell level in vitro. As the reviewers suggested, we may need to vary spike frequency and timing to identify optimal spiking parameters (or stimulate multiple LC neurons at once) to detect this phenomenon in slices. Alternatively, we could use other approaches (optogenetic or chemogenetic) to activate a group of neurons simultaneously, as a recent preprint demonstrated (Lines 528-535). These are all excellent suggestions, but completing these experiments may take more than six months, since we would need to train another person from scratch in LC recordings (the first author has graduated and left the lab). We therefore chose to remove most of the data on α2AR-mediated lateral inhibition from the revised paper, as the reviewers suggested. We plan to explore this interesting topic further in our next study.

      Point 2: Train-like connection pattern

      Demonstrating that connected cell pairs often share a common member is an important demonstration of a connection motif in the LC. However, a "train" connection implies that you can pass from A to B to C to D (and in reverse), and the authors do not perform an analysis to test whether this occurs. Therefore, "train" is not an appropriate term to describe the interesting connection motif that they observed.

      In fact, writing A↔B↔C in the paper implies a train without direct support for that form of electrical transmission. For example, in Fig. 6C, it is clear that cell 6 is coupled to cell 1 and that cell 6 is also coupled to cell 8. In both cases, the connection is bilateral. In the authors' A↔B↔C notation, cell 6 would be B and cells 1 and 8 would be A and C (or vice versa). However, writing A↔B↔C implies a train, whereas one can instead draw this connection pattern with B as a common source:

      A   C
       \ /
        B

      An analysis showing that spikes in A can pass through B and later appear in C is necessary to support the use of "train". The example in Fig. 6C argues against a train, at least in this case.

      Although this analysis is feasible with the authors' substantial and unique data set, it should also be noted that prior work on putative electrical coupling in extracellular recordings from rat LC showed that trains among 3 single units occurred at an almost negligible rate: out of 12 rats, "Only 1 triplet out of 22,100 possible triplet patterns (0.005%) was found in one rat and 4 triplets out of 1,330 possible triplet patterns (0.301%) were found in the other rat." Moreover, patterns involving more than 3 units were never observed (Totah et al 2018. Neuron 99, 1055-1068.e6).

      We thank the reviewer for this astute argument and agree that the term “train-like connection” implies a physiological, functional relationship A→B→C, which the data do not show. We therefore now term these connections “chain-like” to indicate the structural nature of the connection, which we believe leaves no room for interpreting a functional, physiological connection among the three neurons. In fact, we discussed this as a first-order vs. second-order coupling issue in our original manuscript (Lines 632-639) and concluded that electrical signals hardly pass through second-order gap junctions in the LC. That is, for two connections sharing the same partner, as in A↔B↔C (where A and C are not directly connected but are coupled at second order), spikes in A hardly pass through B to appear in C.

      Reviewer #2 (Public Review):

      McKinney et al set out to better understand local circuit organization within the mouse locus coeruleus (LC). To do so, the authors achieved the technical feat of performing multiple simultaneous whole-cell recordings (up to 8 LC neurons at once). This approach gives the authors a powerful and relatively high-throughput means of assessing LC neuronal activity and potentially its rate of interconnectedness. In addition to recording from these cells, the authors filled many of them with biocytin to recover their morphology. Using traditional reconstruction approaches, the authors identified two morphological classes of LC neurons, fusiform (FF) and multipolar (MP). Although the selection of these classes was biased by previous literature, the authors used machine classification to rigorously demonstrate that these classes indeed exist. From there, the electrical properties of these distinct LC neurons were compared, and a number of distinct action potential properties were identified between the two groups. Although firing in response to injected current showed that the FF class could maintain a higher firing rate, basal firing was not explicitly compared, as the cells were prevented from firing upon entering whole-cell mode. The authors next explored the extent to which local chemical transmission occurs within the LC. Although there is evidence of glutamatergic transmission from LC neurons, the authors did not directly observe any evidence of local glutamate release from these neurons. This result might be expected given the severing of axons in the slice preparation. Somewhat less expected is the authors' claim that they could not find evidence of local NE release via alpha2 adrenergic receptor activation. This lack of evidence might well arise because this phenomenon does not occur, but it also remains possible that we do not understand volume transmission well enough to properly detect a change, particularly in whole-cell current clamp. The evidence that alpha2-mediated hyperpolarization is intact is only tangential to this question, as the concentrations of NE and clonidine used to show this robust suppression of firing are well above what these neurons likely release physiologically. One thing the authors do not consider is that slice orientation (horizontal vs. coronal) greatly alters local glutamatergic input, to the point that glutamate-mediated phasic bursts often do not occur in horizontal slices.

      A major strength of the multi-patch approach used here is the ability to identify electrical connections between LC neurons. While gap junction coupling has long been established in these neurons, multiple reports suggest that this coupling decreases as the animal matures into adulthood. Here the authors provide clear evidence for a stable rate of electrical coupling well into adulthood. This approach also gives the authors the relatively unique ability to look for second-order connections between LC neurons, and the amount of coupling was elegantly used to model how the LC might wire together more broadly. Although this approach is very powerful and likely at the edge of what is physically possible for whole-cell recordings in this brain structure, it still likely undersamples LC local circuitry and biases investigations toward neurons that are relatively close to one another. While the authors rightfully consider the intersoma distance (ISD), the longest ISD in these studies is still smaller than most anatomical axes of the LC. This is an important limitation because electrical coupling in both FF-FF and FF-MP pairs appears to increase as ISD increases, suggesting that more coupling could be occurring in distal dendrites. Furthermore, coupling in distal dendrites may be harder to detect, as shunting in these dendrites could prevent signal detection.

      This work is timely and important to the LC field which is on the precipice of having a greater understanding of heterogeneity based on a number of different principles, and this work adds local circuit dynamics as one of these principles. It will be important for the field to see how different efferent anatomical modules align with the cell types and circuit properties identified here.

      We appreciate the reviewer’s constructive comments and suggestions.

    1. Author Response:

      Reviewer #2 (Public Review):

      This study addresses the ways in which bacteriophages antagonize or coopt the DNA restriction or recombination functions of the bacterial RecBCD helicase-nuclease.

      The strength of the paper lies in the marriage of biochemistry and structural biology.

      A cryo-EM structure of the RecBCD•gp5.9 complex establishes that gp5.9 is a DNA-mimetic dimer composed of an acidic parallel coiled coil that occupies the dsDNA binding site on the RecB and RecC subunits. The structure of gp5.9 is different from that of the RecBCD-inhibiting DNA mimetic protein phage λ Gam.

      Cryo-EM structures of Abc2 are solved in complex with RecBCD bound to a forked DNA duplex, revealing that Abc2 interacts with the RecC subunit. A companion structure is solved containing PPI that copurifies with RecBCD•Abc2.

      Whereas the gp5.9 structure fully rationalizes the effect of gp5.9 on RecBCD activity, the Abc2 structure - while illuminating the docking site on RecBCD, a clear advance - does not clarify how Abc2 impacts RecBCD function.

      The authors speculate that Abc2 binding prevents RecA loading on the unwound DNA 3' strand while favoring the loading of the phage recombinase Erf.

      Does the structure provide impetus and clues for further experiments to elaborate on that question and, if so, how?

      Regarding the first point (Murphy’s results): we have now included more detail about Murphy’s results and a brief comparative discussion of our own (page 13). An important caveat in interpreting small (<5-fold) effects on RecBCD activity is that the complex is known to possess different levels of specific activity between preparations (from 20% to 100% active based on titration of DNA ends). This is especially problematic when assessing the effect of Abc2 on RecBCD because (unlike gp5.9, for instance) the protein cannot be purified in isolation and titrated into free RecBCD to monitor how activity changes. Instead, one is comparing activity between different preparations either including Abc2 or not. Regarding the second point (how much does the structure tell us about the mechanism of Abc2?), we agree with the general sentiment here: the mechanism of RecBCD hijacking by Abc2 is still a “work in progress”. Nevertheless, the structure is suggestive of effects on Chi recognition and/or RecA loading, which is both consistent with biochemical results and suggests new avenues for further investigation.

      While the RecBCD-gp5.9 structure “nails” the inhibition mechanism as steric exclusion of substrate, the RecBCD-Abc2 structure is less informative. Previously published biochemical and in vivo analyses of Abc2 show that it modulates rather than completely inhibits the enzyme. The hypothesis is that Abc2 modifies the process of Chi recognition and/or RecA loading (which are themselves coupled processes) in order to facilitate loading of the phage recombinase Erf. Given current structural models for the mechanism of RecBCD, it is not entirely obvious from the structure of RecBCD-Abc2 what exactly this small phage protein is doing, because (a) Abc2 binding induces no significant change in the structure of RecBCD and (b) no known protein interaction site (e.g. with RecA) is blocked. Indeed, our original manuscript ended with an acknowledgement that understanding how P22 controls recombination in E. coli was ongoing work. As we see it, in addition to simply revealing the binding site of Abc2, our structure has two significant impacts. Firstly, it is consistent with and extends the existing hypothesis. For example, (a) the interaction of Abc2 with RecC is precisely with the domains of the protein that are responsible for Chi recognition and close to a putative site of RecA loading; (b) the recognition that a conserved proline in Abc2 interacts with the active site of PPI implies that Abc2 function is dependent on proline isomerisation; (c) the possible bridging of RecB and RecC by the C- and N-terminal regions of the protein suggests that Abc2 might hinder intersubunit conformational changes. Secondly, the structure facilitates the testing of this hypothesis. For example, (a) does RecA and/or Erf loading depend on interactions with the surfaces destroyed or created by Abc2 at the interface with RecC? (b) does a P68A mutation inactivate Abc2? (c) does failure to recognise and respond to Chi require bridging of RecB and RecC that limits conformational transitions? Crucially, as we explain in the discussion, the future study of the P22 recombination system will require the purification and characterisation of additional factors (Abc2, Arf and Erf) beyond just Abc2. This is something we are working on currently in the lab and consider to be beyond the scope of this work.

    1. Author Response

      We thank the reviewers for their comments and helpful suggestions. We are currently preparing a revised version of this manuscript. Notable changes we are making include:

      • adding a diagram to the introduction to show the overall workflow of the study,
      • quantitatively analyzing the fraction of OCT4+ and DDX4+ cells in our immunofluorescence images over time,
      • collecting and analyzing additional bulk RNA-seq data on KGN cells and adult human ovarian tissue,
      • performing estradiol assays on additional lines of hiPSC-derived granulosa-like cells,
      • presenting images from day 70 ovaroids which clearly show follicle formation,
      • changing the colors in the figures to be more accessible to colorblind readers,
      • clarifying which TFs are present in which of our clonal lines.

      These changes will address the weaknesses identified by the reviewers. Along with our revised manuscript, we will also prepare a more comprehensive author response for these reviewer comments.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper reports an analysis of the inhibition of the serotonin transporter, SERT, by a novel compound, ECSI#6. The authors perform a comprehensive analysis of SERT transport inhibition for the new agent and compare its properties to those of other well-characterized agents, cocaine and noribogaine, with the data pointing to an unusual noncompetitive mechanism of inhibition, a model also supported by electrophysiological recordings of transport currents. Based on the results of these experiments, the authors conclude that ECSI#6 binds essentially exclusively to the inward-facing state of the transporter. The authors further present experiments suggesting that ECSI#6 can stabilize the folded form of an ER-arrested SERT mutant and recover its trafficking to the plasma membrane, with some in vivo Drosophila experiments possibly also supporting this conclusion. Finally, kinetic simulations using a transport model with rate constants from previous experiments support the basic conclusions of the first sections of the paper.

      Strengths:

      The transport experiments and simulations here are thorough, carefully performed, and reasonably interpreted. The authors' arguments for noncompetitive inhibition seem well-thought-out and reasonable, as is the conclusion that ECSI#6 binds to the inward-facing state of the transporter. The simulations are also thorough and support the conclusions. In the discussion, the comparison of enzyme noncompetitive inhibition to the process studied here was thoughtful and interesting. Also, the care and analysis of the uptake data are a strength of the paper, with well-presented evidence of reproducibility and statistics. The electrophysiology data are more limited but do communicate the essential conclusion.

      Weaknesses:

      The most important concern about the work is the interpretation of the in vivo Drosophila data. Though the SERT fluorescence with WT protein is strong, I cannot see any fluorescence in either drug-treated image from the PG mutant. In this context, shouldn't there be additional intracellular staining for ER-resident SERT? If the cell bodies of these cells are elsewhere, this should be clearly pointed out.

      We have modified Fig. 6 to include, in all instances, images of the posterior brain, where the neurons (FB6K) from which the serotonergic projections originate reside. These images visualize expression of membrane-anchored GFP (mCD8GFP; in panel B), immunoreactivity of serotonin (panel B’), wild-type SERT (panels C’,D’,E’) and mutant SERT-PG601,602AA (panels F’,G’,H’) in the soma. The description of these panels has been added to the pertinent sentences, from p. 20, line 6 from the bottom, to the end of the first paragraph on p. 21, which read:

      “These projections (Fig. 6A-A’’) and the FB6K-type neurons, from which they originate in the posterior brain (Fig. 6B-B’’) can be visualized by expressing membrane-anchored GFP (i.e. GFP fused to the C-terminus of murine CD8; [36]) under the control of TRH-T2A-Gal4. Similarly, when placed under the control of TRH-T2A-Gal4, YFP-tagged wild-type human SERT was expressed in the FB6K-type neurons (Fig. 6C’) and delivered to the fan-shaped body (Fig. 6C). In contrast, in flies harboring human SERT-PG601,602AA, the transporter was visualized in the soma of FB6K-type neurons (Fig. 6F’), but the fan-shaped body was devoid of any specific fluorescence (Fig. 6F). However, if three-day-old male flies expressing human SERT-PG601,602AA were fed with food pellets containing 100 μM ECSI#6 or 100 μM noribogaine for 48 h, fluorescence accumulated to a level that allowed for delineating the fan-shaped body (Fig. 6G and H, respectively). This shows that ECSI#6 and noribogaine exerted a pharmacochaperoning action in vivo, which partially restored the delivery of the mutant transporter to the presynaptic territory. As expected, in flies harboring wild-type human SERT, feeding of ECSI#6 and noribogaine did not have any appreciable effect on the level of fluorescence in the fan-shaped body (Fig. 6D and E, respectively).”

      Similarly, the single Western blot demonstrating enhanced glycosylation in the presence of Noribogaine or ECSI#6 could be strengthened. I can see the increased band at a high molecular weight that the authors attribute to the fully glycosylated form, but this smear, and the band below, look quite different from those in the blot shown in the El-Kasaby et al reference, raising concerns that the band could be aggregated or dimerized protein rather than a glycosylated form. This concern could easily be addressed by control experiments with appropriate glycosidases, as shown in the reference.

      We understand that the appearance of the mature glycosylated species is being criticized, at least in part, because it differs from the sharper bands found in our previously published papers. We stress that the resolution depends very much on the electrophoretic conditions. We addressed the reviewers’ criticism by carrying out the recommended deglycosylation experiments: a representative experiment is shown in (the new) panel F of Fig. 5, with lysates prepared from HEK293 cells expressing wild-type SERT, from untransfected HEK293 cells, and from HEK293 cells that had been preincubated with 30 μM cocaine, 100 μM ECSI#6 or 30 μM noribogaine. The experiment confirms the band assignment: the upper band(s) M represent the mature glycosylated species (which are resistant to deglycosylation by endoglycosidase H), and the lower band C corresponds to the core-glycosylated species (which, as expected, is susceptible to cleavage by endoglycosidase H). We also think that the immunoblot in panel F ought to satisfy the aesthetic criticism: the bands are sharper and less smeared.

      The description of panel F can be found on p. 18, starting in line 7 from the bottom to the end of the page, and reads: “We confirmed the band assignment by enzymatic deglycosylation (Fig. 5F): the upper bands (labeled M), which appeared in cells incubated in the presence of ECSI#6 and of noribogaine, were resistant to deglycosylation by endoglycosidase H (which cannot cleave mature glycans). In contrast, the core-glycosylated species (labeled C) was susceptible to cleavage by endoglycosidase H, resulting in the appearance of the deglycosylated band D.”

      The overall interest in the work is reduced given the quite low affinity of ECSI#6 for the transporter.

      We agree that it would be preferable to have a compound that works in the submicromolar/nanomolar range. However, it is worth pointing out that the EC50 is low enough to allow in vivo rescue of the folding-deficient SERT-PG: feeding the compound to flies restores trafficking of the mutant to the cell surface and to the presynaptic specialization. Obviously, there is room for improvement, but ECSI#6 provides a starting point.

      Reviewer #3 (Public Review):

      This is interesting research that uncovers a novel inhibition mechanism for serotonin (SERT) transporters, which is akin to traditional uncompetitive inhibitors in enzyme kinetics. These inhibitors are known to preferentially bind to the enzyme-substrate complex, thus stabilizing it, resulting in a decrease of the IC50 with increasing substrate concentrations. In contrast to this classic enzyme inhibition mechanism, the authors show for SERT, through detailed kinetic analysis as well as kinetic modeling, that the inhibitor, ECSI#6, binds preferentially to the inward-facing state of the transporter, which is stabilized by K+. Therefore, inhibition becomes "use-dependent", i.e. increasing substrate concentrations push the transporter to the inward-facing configuration, which then leads to the increased apparent affinity of ECSI#6 binding. Interestingly, this mechanism of action predicts that the inhibitor should be able to rescue SERT misfolding variants. The authors tested this possibility and found that surface expression and function of a misfolding mutant SERT is increased, an important experimental finding. Another strength of the manuscript is the quantitative analysis of the kinetic data, including kinetic modeling, the results of which can reconcile the experimental data very well. Overall, this is important and, in my view, novel work, which may lead to new future approaches in SERT pharmacology.

      With that said, some weaknesses of the manuscript should be mentioned.

      1) The authors suggest that serotonin and ECSI#6 cannot bind simultaneously to the transporter, however, no direct evidence for this conclusion is provided.

      We assessed this point by plotting the data in Fig. 2A,B,C as Dixon plots in (the new) panels D,E,F of Fig. 2. We refer the reader to Segel’s textbook on enzyme kinetics (new ref. 18) on using Dixon plots in the presence of two inhibitors. The pertinent description is on p. 9, lines 12-22 and reads as follows: “We transformed the data summarized in Figs. 2A-C by plotting the reciprocal of bound radioligand as a function of inhibitor concentration to yield Dixon plots (Fig. 2D-F): the x-intercept corresponds to -IC50 of the inhibitor [18]. Thus, Dixon plots allow for differentiating mutually exclusive from mutually non-exclusive binding, if one inhibitor (i.e., cocaine, noribogaine or ECSI#6) is examined at a fixed concentration of the second inhibitor (i.e., serotonin) [18]: if binding of the two inhibitors is mutually non-exclusive, a family of lines of progressively increasing slope, which intersect at -IC50, is to be seen. In contrast, if the two inhibitors bind to the same site, the slope of the inhibition curves is not affected and the x-intercept (i.e., -IC50 of the variable inhibitor) is shifted to more negative values. It is evident from Fig. 2D-F that the presence of 1 and 10 μM serotonin progressively shifted the (expected) x-intercept for cocaine (Fig. 2D), noribogaine (Fig. 2E) and ECSI#6 (Fig. 2F). Thus, binding to SERT of serotonin and of these three ligands was mutually exclusive.” Based on the Dixon plots, we feel that our conclusion is justified, i.e., binding of serotonin and ECSI#6 (and of the other ligands) is mutually exclusive.
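      The Dixon-plot logic described above can be reproduced numerically. The sketch below is illustrative only and rests on assumptions not stated in the response: radioligand binding is expressed as a fraction of the binding seen without either inhibitor; the two inhibitors follow textbook rate-equation forms with apparent constants k_i (the variable inhibitor, e.g. ECSI#6) and k_x (the fixed inhibitor, e.g. serotonin); and, in the non-exclusive case, an interaction factor alpha = 1 means the two ligands bind independently. All function names and constants are hypothetical.

```python
def fractional_binding(i, x, k_i, k_x, exclusive=True, alpha=1.0):
    """Radioligand binding at inhibitor concentrations i and x, as a fraction of
    the binding seen without either inhibitor (apparent constants k_i, k_x)."""
    denom = 1.0 + i / k_i + x / k_x
    if not exclusive:
        # Both inhibitors can bind at once; alpha scales the ternary complex.
        denom += i * x / (alpha * k_i * k_x)
    return 1.0 / denom


def dixon_line(x, k_i, k_x, exclusive=True, alpha=1.0):
    """Slope, y-intercept and x-intercept of 1/binding versus i at fixed x.

    1/binding is linear in i for both models, so two points define the line.
    """
    y0 = 1.0 / fractional_binding(0.0, x, k_i, k_x, exclusive, alpha)
    y1 = 1.0 / fractional_binding(1.0, x, k_i, k_x, exclusive, alpha)
    slope = y1 - y0
    return slope, y0, -y0 / slope


if __name__ == "__main__":
    # Illustrative constants in arbitrary units (not fitted to any data).
    for exclusive in (True, False):
        label = "exclusive" if exclusive else "non-exclusive"
        for x in (0.0, 1.0, 10.0):
            slope, _, x_int = dixon_line(x, k_i=10.0, k_x=1.0, exclusive=exclusive)
            print(f"{label:13s} [X]={x:5.1f} slope={slope:.3f} x-intercept={x_int:.1f}")
```

      With these arbitrary constants, the mutually exclusive lines keep a slope of 0.1 while the x-intercept moves from -10 to -20 to -110 as [X] rises; the non-exclusive lines steepen (0.1, 0.2, 1.1) but all cross the x-axis at -10, i.e. at -k_i. The qualitative pattern, not the numbers, is what distinguishes the two binding modes.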

      2) How does ECSI#6 access the inward-facing binding site? Does it permeate the membrane and bind from the inward-facing conformation, or is it just a very slowly transported low-affinity substrate that stabilizes the inward-facing state with much higher affinity? Including ECSI#6 in the recording electrode may provide further information on this point.

      We did the suggested experiments: the data are summarized in panel I of Fig. 4 and described in the first paragraph on p. 15, which also explains why this experiment is possibly inconclusive due to the high diffusivity of ECSI#6:

      “Fig. 4I shows representative traces of 5-HT induced currents recorded from SERT expressing cells in the absence (in blue) and presence of 100 μM ECSI#6 (in red) in the electrode solution: when thus applied from the intracellular side, ECSI#6 did not cause an appreciable current block. The right-hand panel summarizes the current amplitude obtained from cells measured in the absence (blue open circles) and presence of intracellular ECSI#6 (open circles in red). These data seem to indicate that ECSI#6 binds to SERT from the extracellular side. Yet this conclusion can be challenged based on the following consideration: in earlier experiments, ibogaine, the parent compound of noribogaine, was found to block HERG channels when applied from the bath solution but failed to do so when added to the electrode solution [27]. However, at a lower intracellular pH (i.e., pH 5.5), ibogaine gained the ability to inhibit HERG from the intracellular side (i.e., via application through the electrode). Conversely, ibogaine was less effective when applied to an acidified bath solution. These observations led to the conclusion that ibogaine blocked HERG from the cytosolic side: because the molecule in its neutral form was so diffusive, a low intracellular pH was required to force its protonation and thus preclude diffusion from the interior of the cell. ECSI#6 is presumed to also be very diffusible given its estimated logP value and polar surface area of 2.48 and 66 Å², respectively. However, ECSI#6 harbors an amide nitrogen (see Fig. 1A) and thus remains neutral in the experimentally accessible pH range. Hence, it is not possible to verify to which side of SERT it binds.”

      Additionally, it is not clear why displacement experiments were not carried out with cocaine. Since cocaine is a competitive inhibitor but does not induce transport (i.e. doesn't induce the formation of the inward-facing conformation), it should act in a competitive mechanism with ECSI#6.

      We did not quite understand this comment, because displacement experiments were done with cocaine (Fig. 2A, new Fig. 2G/previous Fig. 2D). However, if the reviewer is asking why we did not use cocaine rather than 5-HT in the three-way competition experiment, it is precisely because we wanted to compare the action/binding mode of ECSI#6 to that of cocaine.

      3) Why are dose-response relationships not shown for electrophysiological experiments? These would be a good double-check for the radiotracer flux data.

      These experiments were done and are shown in (the new) panels G and H of Fig. 4; the pertinent description is in the second paragraph of p. 14 and reads:

      “The protocol depicted in Fig. 4B can also be used to gauge the apparent affinity of ECSI#6 for SERT in the presence of 5-HT. Plotted in Fig. 4G is the block of the serotonin-induced current as a function of the co-applied ECSI#6 concentration. The current was evoked by a saturating concentration of 5-HT (30 μM) and inhibited by 3, 10, 30 and 100 μM co-applied ECSI#6, respectively (the inset in Fig. 4G shows representative current traces). A fit of an inhibition curve to the data points yielded an IC50 value of approx. 5 μM. This value was lower than, but still in reasonable agreement with, the IC50 obtained in the radioligand uptake assay for the condition where the 5-HT concentration had been saturating (cf. dashed line in Fig. 1C; 10 μM 5-HT). In the uptake assay, the IC50 value of ECSI#6 was about 0.5 mM in the presence of a low 5-HT concentration (i.e., 0.1 μM). In contrast to uptake experiments, electrophysiological recordings also allow for assessing the apparent affinity of ECSI#6 for SERT in the absence of the substrate. This can be achieved by employing the protocol depicted in Fig. 4H (see representative current traces on the left-hand side): we first applied 30 μM 5-HT to a cell expressing SERT for 0.5 s to elicit a peak current (i.e., a control pulse). We then reapplied 30 μM 5-HT after superfusing the cell with 100 μM ECSI#6 for 1 s (second upper trace in panel H). We chose this time period because it had been sufficient to allow for full current block in the other protocol (see Fig. 4G): the amplitude of the peak current following pre-application of 100 μM ECSI#6 was essentially identical to that of the prior control pulse. When we pre-applied 100 μM ECSI#6 for a longer period (i.e., 3 s), the amplitudes of the two peak currents also remained the same (cf. lower traces in panel H). The right-hand panel summarizes several experiments. Plotted in the graph is the ratio of the second to the first peak current, which was always close to one.
      We previously used this protocol to assess the binding kinetics of cocaine, methylphenidate and desipramine on SERT and DAT. Pre-application of these inhibitors consistently led to a concentration-dependent reduction in the peak current amplitude of the second pulse in comparison to the first [23]. The lack of inhibition, thus, indicates that the affinity of ECSI#6 in the absence of 5-HT is low. To obtain estimates of the affinity of ECSI#6 for SERT in the absence of 5-HT, we would need to apply this compound at much higher concentrations. This, however, is not possible, because ECSI#6 is poorly soluble in aqueous solutions (i.e., max. 0.03 mg/ml).”
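      An inhibition curve of the kind fitted to the Fig. 4G data can be sketched as follows. This is a minimal illustration, not the authors' fitting procedure (which the response does not specify): it assumes a one-site curve with a fixed Hill slope of 1, expresses the current as the fraction remaining relative to the uninhibited 5-HT response (block = 1 minus that fraction), and replaces nonlinear regression with a log-spaced grid search so that only the Python standard library is needed. The concentrations mirror those in the protocol, but the "data" are synthetic values generated from the model itself.

```python
import math


def fraction_remaining(conc, ic50, hill=1.0):
    """Fraction of the control current left at inhibitor concentration conc."""
    return 1.0 / (1.0 + (conc / ic50) ** hill)


def fit_ic50(concs, fractions, hill=1.0, lo=1e-2, hi=1e4, steps=2001):
    """Least-squares IC50 from a log-spaced grid search, Hill slope held fixed."""
    best_ic50, best_sse = None, math.inf
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    for k in range(steps):
        ic50 = 10 ** (log_lo + (log_hi - log_lo) * k / (steps - 1))
        sse = sum((f - fraction_remaining(c, ic50, hill)) ** 2
                  for c, f in zip(concs, fractions))
        if sse < best_sse:
            best_ic50, best_sse = ic50, sse
    return best_ic50


if __name__ == "__main__":
    concs_um = [3.0, 10.0, 30.0, 100.0]  # co-applied ECSI#6 concentrations (uM)
    # Synthetic, noise-free fractions generated with IC50 = 5 uM (not real data).
    fractions = [fraction_remaining(c, 5.0) for c in concs_um]
    print(f"fitted IC50 ~ {fit_ic50(concs_um, fractions):.2f} uM")
```

      On such noise-free synthetic data the grid search recovers an IC50 close to the 5 μM used to generate it. Real recordings would call for a proper nonlinear fit with confidence intervals, and possibly a free Hill slope.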

    1. Author Response

      Reviewer #1 (Public Review):

      The authors succeeded in fitting their Jansen-Rit model parameters to accurately reproduce individual TEPs. This is a major success already and the first study of this kind to the best of my knowledge. Then the authors make use of this fitted model to introduce virtual lesions in specific time windows after stimulation to analyze which of the response waveforms are local and which come from recurrent loops inside the network. The methodological steps are nicely explained. The authors use a novel parameter fitting method that proves very successful. They use completely openly available data sets and publish their code in a manner that makes reproduction easy. I really enjoyed reading this paper and expect its methodology to set a new landmark in the field of brain stimulation simulation. The conclusions of the authors are well supported by their results; however, some analysis steps, specified in the essential revisions, should be clarified.

      We are delighted and flattered by the Reviewer’s positive evaluation of our work, and appreciation of our efforts to maximize its reproducibility. We wish also to thank the Reviewer for their compelling and interesting points, which we have addressed in full and which we believe further enhance the quality of the paper. Thanks again!

      Reviewer #2 (Public Review):

      Here the authors tackle the problem of identifying which parts of a TMS-evoked response are local to the stimulation site versus driven by reverberant activity from other regions. To do this they use a dataset of EEG recorded simultaneously with TMS pulses, and examine virtual lesions of a network of neural masses fitted to the data. The fitting uses a very recent model inversion method developed by the authors, able to fit time series directly rather than just summary statistics thereof. And it apparently works rather well indeed, at least after the first ~50 ms post-stimulus. I expect many readers will be keen to try this fitting method in their own work.

      We are delighted by the Reviewer’s appreciation and consideration of our paper. We have addressed the comments and requested revisions following the order of the Reviewer’s comments. We would like to take this opportunity to thank the Reviewer for his/her contribution and for helping us to improve the manuscript.

      Reviewer #3 (Public Review):

      The manuscript is very well written and the graphics are quite iconic. Moreover, the hypothesis is clear and the rationale is very convincing. Overall, the paper has the potential of being of paramount importance for the TMS-EEG community because it provides a valuable tool for a proper interpretation of several previously published TMS-EEG results.

      Unfortunately, in my opinion, the dataset used to train and validate the method does not support the implication and interpretation of the results. Indeed, as clearly visible from most of the figures and mentioned by the authors of the database, the data contains residual sensory artifacts (auditory or somatosensory) that can completely bias the authors' interpretation of the re-entrant activity.

      We are most grateful to the Reviewer for their positive evaluation of our manuscript. We also sincerely appreciate all the comments and suggestions raised, and thank the Reviewer for contributing their evident expertise with TMS-EEG data towards the constructive improvement of this research. We hope the Reviewer will appreciate our efforts to address their excellent points and will be pleased with the resulting strengthening of the paper.

    1. Author Response

      Reviewer #2 (Public Review):

      Wen et al. developed a useful tool for causal network inference based on scRNA-seq data. The authors comprehensively benchmarked 9 feature selection and 9 causal discovery algorithms using both synthetic data and real scRNA-seq data. Their conclusions regarding the performance of these algorithms on synthetic data are solid and valuable. I believe this tool or platform has the potential to help biologists discover novel cell type-specific signaling pathways or gene regulatory events, since it requires no prior knowledge (such as known pathway annotations) as input. However, several major concerns below need to be addressed to improve the paper.

      1) Current validation of the inferred causal networks using real scRNA-seq datasets seems quite simple and is not sufficient to support the accuracy and reliability of results. Annotations from the STRING database do not contain directions of edges among genes or proteins. However, the edge direction in the inferred network is a crucial aspect to explain the causal relationships. Besides using "spike-in" data, a systematic validation of the inferred network, especially the edge directions, should be provided.

      We have systematically validated network inference using the data of the five lung cancer cell lines and of alveolar cells, together with the genes of several pathways in the KEGG and WikiPathways databases in which causal interactions are better annotated. Please see the responses to the Essential Revisions (for the authors).

      2) In order to illustrate the novel discovery, CausalCell should be further compared to existing gene network construction methods based on scRNA-seq data such as SCENIC (Aibar et al. Nature Methods, 2017).

      (a) We have added a "TF=No/Yes" option to feature selection. If this option is left unset, feature selection proceeds as before. If "TF=Yes" is selected, all feature genes are TFs; if "TF=No" is selected, all feature genes are non-TFs. With this option, two rounds of feature selection are normally performed. The first round (with "TF=Yes") selects TFs as feature genes of a response variable (RV), and the second round (with the option unset) selects feature genes as before (i.e., both TFs and non-TFs). The user then selects genes from the results of both rounds to build the input to causal discovery.

      (b) The networks inferred by SCENIC are TF-centered: SCENIC searches for genes co-expressed with each TF (through GENIE3/GRNBoost), each TF and its potential target genes form a regulon, and the union of all or some of the regulons forms a network. Thus, SCENIC helps uncover "one TF->all targets" relationships. In contrast, the "TF=No/Yes" option provides a target-centered transcription regulation analysis and helps uncover "all TF->one target" relationships (the target being the response variable). The two approaches are therefore complementary. Feature selection based on the "TF=No/Yes" option also differs from SCENIC in that no predefined regulons (defined from "cisTarget" databases) are needed.

      (c) We used SCENIC in our initial analysis of young and old mouse CD4 T cells (see Figure 5 in Elyahu et al. 2019). In the re-analysis of tumor-infiltrating exhausted CD8 T cells, we found that the "TF=No/Yes" option helps uncover transcription regulation. For example, the transcription factor TOX has been reported to be a critical regulator of PDCD1 in mice. When we performed feature selection to identify feature genes of PDCD1, TOX was among the top 50 feature genes in the colorectal cancer dataset but not in the lung and liver cancer datasets (Supplementary file 1: Table 1). To re-examine whether TOX critically regulates PDCD1 in the latter two datasets, we performed feature selection with "TF=Yes" and found that TOX is a top TF of PDCD1.

      3) The authors should also state what type of network the inferred causal networks represent from a biological perspective (e.g., signaling networks or gene regulatory networks?).

      (a) Although methods have been developed specifically for inferring signaling and regulatory networks, whether a network is a signaling network or a gene regulatory network depends on the input data. Also, many proteins and noncoding RNAs function as complexes rather than individually in both kinds of networks, whereas RNA-seq and scRNA-seq data contain only transcripts. Thus, researchers must infer signaling and gene regulation in cells from the inferred networks.

      (b) The input to causal discovery can be (i) a target gene and its potential TFs, (ii) a TF and its potential targets, or (iii) genes encoding both TFs and non-TFs. Thus, whether an inferred network is a signaling or a gene regulatory network depends on the input. We have made this clear in the Discussion.

      4) Besides edge direction, an important feature of CausalCell is the determination of edge sign (i.e. activation or inhibition). The authors should describe its related procedures.

      In the revised section "2.5 Causal discovery", we wrote: "In all inferred causal networks, edges have a sign that indicates activation or repression and a thickness that indicates the statistical significance of the CI test. The sign of the edge from A to B is determined by computing a Pearson correlation coefficient between A and B, which is ‘repression’ if the coefficient is negative or ‘activation’ if the coefficient is positive. In most cases, ‘A activating B’ and ‘A repressing B’ correspond to up-regulated A in the case dataset compared with down-regulated B in the control dataset."
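      The sign rule quoted above is simple enough to state in code. The sketch below assumes that A and B are per-cell expression vectors of equal length and that the edge direction A -> B has already been supplied by causal discovery; it only labels the sign. The function names and the "undetermined" label for a zero or undefined correlation are illustrative additions, not part of CausalCell.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_a * var_b)


def edge_sign(expr_a, expr_b):
    """Label a directed edge A -> B by the sign of corr(A, B)."""
    r = pearson(expr_a, expr_b)
    if r > 0:
        return "activation"
    if r < 0:
        return "repression"
    return "undetermined"  # r == 0: the quoted rule does not cover this case


if __name__ == "__main__":
    a = [0.0, 1.2, 2.5, 3.1, 4.8]        # hypothetical per-cell expression of A
    b_up = [0.1, 0.9, 2.2, 3.5, 4.1]     # hypothetical B rising with A
    b_down = [4.0, 3.1, 2.4, 1.0, 0.2]   # hypothetical B falling as A rises
    print(edge_sign(a, b_up), edge_sign(a, b_down))
```

      Note that a correlation carries no direction: the arrow comes from the causal discovery step, and the correlation only decorates it with a sign.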

      5) The authors did not provide an example of constructing a causal network between cells or cell types, although they mentioned its importance in the Abstract. Such intercellular network examples can distinguish the utility of CausalCell in single-cell data analysis from bulk data analysis.

      Constructing causal networks between cells is quite a different task. We have deleted this sentence from the manuscript because we are still working on it.

      6) If the control dataset is available, it is currently not clear whether batch effects of the query and control datasets will be removed in the data preprocessing step. Differentially expressed genes cannot be selected correctly if batch effects exist.

      Please see our responses to the second point of Essential Revisions.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper investigates waves in embryonic mouse retinas. These stage 1 waves have been studied less than the post-natal (stage 2) waves. The paper uses calcium imaging in whole retinas to determine the properties of the waves and their dependence on cholinergic and electrical synapses. A strength of the work is the ability to monitor waves over the entire retina at high resolution and weaknesses include reliance on pharmacology and some missing details in analysis.

      Reliance on pharmacology

      The results in the paper depend largely on pharmacological manipulations. Not enough consideration is given to the possible unintended effects of those manipulations. This is particularly true for the gap junction inhibitors. The Discussion brings up the possibility of such effects, but they need to come up much earlier. Is there anything else that can be done to mitigate concerns about the drugs - e.g. does MFA affect waves in Cx36 KO mice?

      We have added experiments based on whole-cell recordings to address some off-target effects of MFA (see Figure 2 Supplemental Figure 1). However, we note the limitations of these new controls: we observed significant variability in voltage-gated conductances across RGCs at this age, and our ability to maintain stable recordings for the time required for within-cell MFA controls was limited.

      Over the years we have done several experiments assessing retinal waves in different Cx knockouts (e.g. F. Caval-Holme, et al, “The Retinal Basis of Light Avoidance in Neonatal Mice”, Journal of Neuroscience 42:2022; Blankenship A.G., et al, “The role of neuronal connexins 36 and 45 in shaping spontaneous firing patterns in the developing retina”, Journal of Neuroscience, 3, 2011). It appears that multiple connexins in RGCs beyond Cx36 and Cx45 regulate Stage 1 retinal waves, and it is therefore difficult to use these mice as controls for general gap junction antagonists.

      In the revision, we are more explicit about the caveats of using MFA in both the Results (page 5) and the Discussion (page 10). Notably, we draw attention to past studies in which we performed several controls regarding MFA and RGC activity in older retinas, in addition to the more limited controls we were able to carry out in E16-E18 retinas.

      Comparison of ACh receptor block and knockout mice

      The ACh receptor knockout mouse provides a useful alternative to the pharmacological block of ACh receptors. But different features are described in Figures 2 and 3, preventing direct comparison of the two.

      Our intention was not to use the knockout mice as an alternative to the pharmacological block since we knew that there are compensatory wave mechanisms in the knockout. Rather we are using the β2-nAChR-KO to establish the effectiveness of this KO as a means of testing the role of Stage 1 waves in developmental processes. We do hope the revised manuscript explains this motivation more clearly.

      A related point is the apparent increased role of gap junctions in mediating waves in the absence of ACh receptors. On this point, the description of the effect of MFA (page 8, second paragraph, 3rd sentence) was confusing. It looks to me like MFA almost completely eliminates waves in both WT and KO - so the connection to an altered role of gap junctions was not clear.

      We clarified our description of the MFA result (page 5):

      Application of the gap junction blocker meclofenamic acid (MFA, 50μM) nearly abolished Stage 1 waves, causing a significant reduction in frequency of waves and cell participation during waves (Fig 2A & 2F).

      ipRGC densities

      The goal of the measurements of ipRGC densities was not entirely clear. Why are ipRGCs a reasonable way to determine the importance of waves for development? For example, the introduction raises the issue that changes in RGC proliferation depend on RGC type. Is there reason to think ipRGCs are "special" or, alternatively, that they are following the same developmental trajectory as other RGCs? Is it possible to track another RGC type (e.g. using SMI32 staining)? Related to this general point, page 9 (top) sets up the need to identify the mechanism of RGC cell death but then jumps to waves without a clear connection between the two. It would also be good to mention early that the measurements include multiple ipRGC types, so that issue does not come up only as an explanation for why the ipRGCs are not organized spatially (page 10 top).

      We have extensively revised the text to better motivate our selection of ipRGCs (page 6). Our goal was to use an identified, differentiated RGC subtype to which we had genetic access in order to assess the impact of reduced retinal waves on cell number. We settled on ipRGCs because: 1) ipRGCs undergo a significant amount of cell death during the same period in which retinal waves occur (Chen et al, Elife 2013) and 2) we show that ipRGCs participate in retinal waves.

      Analysis

      Quantitative analysis of the calcium measurements relies on the discretization of the signals measured in small ROIs. It was not clear how closely the discretized signals represented the original recordings. The traces illustrated in Figures 1 and 2, for example, appear to be measured in larger ROIs. Two things would be helpful here: (1) an illustration of several original recorded traces in the small ROIs plotted with the discretized versions of those traces; (2) a determination of how sensitive the results are to specifics of the discretization process.

      We have modified Figure 1 to include example traces of the fractional change in fluorescence computed across the small ROIs used for the analysis of waves on the macroscope. They are at the top of Figure 1B. As these traces show, the signal-to-noise ratio is excellent.

      Reviewer #2 (Public Review):

      The overall goal of this study is to determine the mechanism of early retinal wave initiation and propagation. Despite a number of earlier studies, the precise mechanism of Stage 1 waves and how they differ from Stage 2 waves remained controversial. To address this, the authors describe the timing and character of Stage 1 retinal waves using a custom-built imaging system allowing for wide-field monitoring of neuronal activity while preserving high spatial resolution. In a set of elegantly designed experiments, they reveal that the initiation and propagation of Stage 1 waves are driven by distinct mechanisms involving complex circuitry of nAChRs and gap junctions. Interestingly, the data also demonstrate that Stage 1 and Stage 2 waves rely on different subtypes of AChRs. The signaling via beta2AChRs appears to be the driver of Stage 2 waves. However, the precise identity of nAChRs and GJs contributing to Stage 1 waves remains a mystery. Next, to determine the impact of early retinal waves on retinal circuit formation, the authors evaluate their impact on the survival of ipRGC. They show that ipRGC cell survival and their distribution mosaics are not influenced by spontaneous activity. While the observation of ipRGC data and their mosaic are interesting, the rationale for these experiments in the context of this study is not well presented.

      We thank the reviewer for the positive comments. We hope the revised rationale for the ipRGC measurements addresses these comments. It is included here for convenience (page 7):

      RGCs undergo a period of dramatic cell death during the first two postnatal weeks of development, the majority occurring during the first postnatal week (Abed et al., 2022; Braunger et al., 2014). Whether this cell death process is regulated by retinal waves is unknown. We looked specifically at intrinsically photosensitive retinal ganglion cells (ipRGCs) for several reasons. First, ipRGCs have completed proliferation (Lucas and Schmidt, 2019; McNeill et al., 2011) and appear to be fully differentiated by E16 (Shekhar et al., 2022; Whitney et al., 2022), the onset of Stage 1 waves. Second, ipRGCs also undergo this period of cell death, and preventing it profoundly disrupts several important developmental processes in the retina, including the spacing of ipRGC somas as well as rod- and cone-mediated circadian entrainment through the activation of ipRGCs (Chen et al., 2013). However, the exact mechanism regulating ipRGC cell death is unknown. Here we assessed the impact of disrupting Stage 1 and Stage 2 waves on the number and distribution of ipRGCs.

      Reviewer #3 (Public Review):

      The manuscript by Voufo et al. aims to advance our understanding of the mechanisms responsible for the earliest pattern of spontaneous activity in the mouse retina, stage I retinal waves. These waves occur during embryonic development (E16-18) and are the least known form of activity in the immature retina.

      The authors show that stage I waves have broad spatiotemporal features and are mediated by circuitry involving subtypes of nicotinic acetylcholine receptors (nAChRs) and gap junctions. The authors also found that the developmental decrease of intrinsically photosensitive retinal ganglion cell (ipRGC) density is preserved between control and ß2-nAChR-KO mice, indicating that processes regulating ipRGC distribution are not influenced by early spontaneous activity.

      The quality of the data is excellent, and the conclusions of this paper are mostly well supported by data, but the presentation of the data and the analysis need to be clarified and extended.

      Strengths:

      The earliest patterns of spontaneous activity are crucial for the correct development of sensory circuits. In the visual system, most studies focus on postnatal activity (stage 2 and 3 retinal waves), overlooking embryonic stages, likely due to challenges related to methods and animal handling. Therefore, in this manuscript, the authors, from a laboratory that pioneered the study of retinal waves in the mouse, tackle a very relevant subject that has not been explored in detail. Most of the current knowledge about stage 1 retinal waves in mammals is condensed into three fairly dated publications: Galli and Maffei 1988, Bansal et al 2000, and Syed et al 2004. These publications were pioneering attempts to describe early spontaneous activity; however, much work remained to be done regarding the molecular and cellular mechanisms involved. Here, Voufo and colleagues provide additional fundamental details about the properties and components of stage 1 waves. The dataset is of excellent quality and plenty of information could be extracted from it. The authors used a macroscope that allows the acquisition of images from the entire retina while preserving a good spatial resolution.

      Weakness:

      The authors distinguish different subtypes of activity during embryonic stages in the retina of mice. However, they do not provide a detailed characterization that allows a clear definition of these subtypes (and specifically stage 1 waves). Moreover, throughout the manuscript, there are many technical details of the analysis that are missing and preclude a complete understanding of the robustness of the data. The authors have an excellent dataset that needs more analysis and an improvement in the presentation of the results.

      We hope the extensive revisions satisfy the reviewer.

    1. Author Response

      Reviewer #1 (Public Review):

      Ciliary length control is a basic and fascinating question in cell biology. Regulation of IFT via calcium is a simple model that can explain length control. In this model, ciliary elongation is associated with an increase in intraciliary calcium level that leads to a calcium increase at the ciliary base. This calcium increase acts to reduce IFT injection and thus the ciliary assembly rate. The longer the cilia, the greater the increase in calcium level and the greater the reduction in IFT injection and thus in ciliary assembly rate. When the cilia approach the genetically defined length, the gradually decreasing assembly rate eventually balances the constitutive disassembly activity. Cilia then stop elongating and a final length is achieved. This work tested this model by manipulating the calcium level in cilia using an ion channel mutant and by treating the cells with EGTA. In addition, IFT injection was measured before and after ciliary calcium influx. Based on the outcome of these and other experiments, it was concluded that there is no correlation between changes in calcium level and IFT injection, thus challenging the previous model. This work is well written and the experiments appear to be properly executed. It nicely showed an increase of intraciliary calcium during cilia elongation, and beautifully showed that ciliary calcium influx depends on extracellular calcium. However, I felt the current data are inadequate to support the authors' conclusion.
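      The balance-point logic of the calcium model summarized above can be made concrete with a toy simulation. Everything in the sketch is a hypothetical assumption for illustration: the assembly rate is taken to fall with length as a / (1 + b*L), standing in for length-dependent calcium suppressing IFT injection; disassembly is a constant D; and all parameters are in arbitrary units. The steady length is reached where assembly equals disassembly, L* = (a/D - 1)/b.

```python
def assembly_rate(length, a=1.0, b=0.4):
    """Hypothetical assembly rate: IFT injection, and hence assembly, falls as
    a / (1 + b * length) because ciliary calcium is assumed to rise with length."""
    return a / (1.0 + b * length)


def simulate_length(disassembly=0.2, a=1.0, b=0.4, t_end=1000.0, dt=0.01):
    """Forward-Euler integration of dL/dt = assembly_rate(L) - disassembly."""
    length = 0.0
    for _ in range(round(t_end / dt)):
        length += (assembly_rate(length, a, b) - disassembly) * dt
    return length


def balance_point(disassembly=0.2, a=1.0, b=0.4):
    """Length at which a / (1 + b * L) equals the disassembly rate.

    Only meaningful when a > disassembly (otherwise cilia would not grow)."""
    return (a / disassembly - 1.0) / b


if __name__ == "__main__":
    print(f"simulated {simulate_length():.3f}, predicted {balance_point():.3f}")
```

      With the default parameters the simulated length settles at the predicted balance point of 10 (arbitrary units). Under this toy model, weakening the calcium-dependent suppression (smaller b) lengthens the predicted steady state, which is the kind of prediction the manipulations discussed here are designed to probe.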

      We thank the reviewer for the positive assessment of the interest of our work, and we have performed additional experiments to address the reviewer's concerns, as discussed below.

      The authors showed that ciliary calcium increases along with ciliary elongation, which correlates with reduction of IFT injection. Thus, this result would support that calcium increase reduces IFT injection. To test whether reducing calcium influx would alter the IFT injection, the authors used an ion channel mutant cav2. Indeed, the ciliary calcium level in the mutant cilia appears to be lower than in the control on average. After measuring ciliary calcium level and IFT injection during ciliary elongation with mathematical analysis, it was concluded that reducing ciliary calcium level did not lead to increased IFT injection, which is distinct from the control cells. Thus, the authors concluded that calcium does not act as a negative regulator of IFT injection. However, if one examines the calcium flux in Figure 3B and IFT injection in Figure 4B of cilia less than 6 micron, one may draw a different conclusion. For the mutant cilia, the calcium influx is higher than that in control cilia and IFT injection is reduced compared to the control. Thus, this analysis contradicts the authors' conclusion and supports the previous model. There is a rapid change in ciliary assembly rate at the early stages of ciliary assembly (see Figure 1C); thus, the changes in calcium influx and IFT injection at the early assembly stage would be more appropriate for assessing the relationship between intraciliary calcium level and IFT injection.

      We thank the reviewer for raising this issue, which led us to examine the data more carefully. On looking at the number of cells with flagella in each length range, we became concerned that the apparently low calcium influx in shorter flagella of control cells, compared with ppr2 or EGTA-treated cells, might actually be due to a bias arising from technical issues. Shorter flagella are relatively difficult to image in our TIRF setup, because they have less surface area with which to attach to the coverslip. The more motile the flagella, the more likely cells with short flagella are to detach, because the bending force of the flagella is strong enough to pull them away from their small area of adhesion. This effect is much stronger in control cells than in either ppr2 mutants or EGTA-treated cells, whose flagella are less motile. As a result, fewer cells were examined with flagella shorter than 6 µm (17 versus 34 for control and ppr2 cells, respectively). To overcome this difficulty and the resulting bias, we imaged more flagella in control cells. The new data have now been integrated with our previous data and are shown in Figure 3. The new results show that calcium influx in control cells is in fact higher than in ppr2 mutant cells. Our results therefore remain consistent with our conclusion, and we do not believe it is useful to analyze the shorter flagella separately.

      The authors used EGTA treatment to support their conclusion. However, EGTA treatment may induce a global change in cellular calcium, so the outcome may not reflect the actual regulation of IFT injection by ciliary calcium influx. For example, as reported elsewhere, changes in cAMP levels in the cell body and in cilia have different effects on ciliary length and on hedgehog regulation. The slower ciliary assembly in EGTA-treated cells may be caused by many other factors rather than solely by regulation of IFT.

      It is certainly possible that EGTA is affecting some process inside the cell that then indirectly affects IFT. Our experiments cannot rule this out. The fact that similar effects are seen with the ppr2 mutant argues against this idea, but again cannot rule it out. We have added the following caveat to the discussion:

      "Other calcium-dependent processes in the cytoplasm might also affect IFT, and our results cannot rule out this possibility. However, we note that the ppr2 mutant also fails to show the effect on IFT or regeneration predicted by the ion current model."

      The authors only examined the impact of reducing ciliary calcium influx. To further support their conclusion, the authors should examine IFT injection under conditions in which the ciliary calcium level is increased. A calcium ionophore may not be a good choice, as it may change the global calcium level. One approach to consider is to use mutants of a calcium pump present in cilia.

      We thank the reviewer for this suggestion. The calcium current model would predict that if a calcium pump mutant failed to export calcium, the resulting build-up of calcium inside the flagellum should lead to decreased IFT entry and shorter flagella. We found at least two calcium pumps in the published Chlamydomonas flagellar proteome (Pazour et al., 2005) and ordered several mutant strains annotated as affecting these pumps from the Chlamydomonas Library Project (CLiP). We measured the flagellar length of these potential calcium pump mutant strains, but none showed a statistically significant difference in length relative to control cells. We have now included these data as Figure S4. Because no length change was observed, we did not undertake the extremely time-consuming process of constructing strains that combine these mutations with DRC4-GCaMP and KAP-GFP.

      As an alternative strategy to address this reviewer's suggestion, we measured DRC4-GCaMP and KAP-GFP intensity in flagella treated with 1 mM CaCl2 and found that CaCl2 treatment increases both the flagellar calcium level (Figure 3, see below) and IFT injection (Figure 4). This increase in IFT injection is the opposite of what the calcium current model predicts.

      Based on these results, we think the calcium pump experiment is not necessary, for the following reasons. 1. These calcium pump mutants might not increase the flagellar calcium level. 2. Even if flagellar calcium were increased in these mutants, the mutations do not affect flagellar length, so our conclusions would still hold. 3. These mutant strains might still have functional calcium pumps, since the existing data on calcium pumps in flagella are likely to be incomplete. 4. The CaCl2 experiment clearly increased the calcium level inside flagella, directly addressing the reviewer's point.

      The conclusion on lines 272-273 may need more evidence. The authors showed that adding 1 mM CaCl2 does not change ciliary assembly and used this as one line of evidence against the ion-current model. However, adding calcium extracellularly may not alter the intracellular or intraciliary calcium level, given that cells have robust systems for controlling calcium homeostasis. To support their conclusion, the authors should measure the changes in calcium level in the cell and cilia, or revise the conclusion.

      We have now performed these measurements and have included the data in Figure 3D.

      The authors nicely showed the changes in IFT properties before, during and after ciliary calcium influx, and found that the intensity and frequency of IFT do not correlate with calcium influx, although calcium influx restarts paused IFT trains for retrograde transport, as previously reported (Collingridge et al., 2013). The authors again concluded that this supports their conclusion that there is no correlation between IFT injection and calcium influx. However, I am not sure whether brief pulses of calcium influx at a single time point would change the calcium level throughout the cilium enough to alter IFT injection at the ciliary base.

      We agree that individual pulses might not have an effect on the average level of IFT injection. We were specifically trying to see if, having previously ruled out the predicted correlation at the level of average rates, there might still be a trace of the correlation for individual events.

      Reviewer #2 (Public Review):

      The authors use a genetically encoded calcium indicator to measure Ca in flagella and establish that Ca influx correlates with flagellar length. (Despite this correlation, there is so much noise that it is doubtful that the Ca level can regulate flagellar length.) They then show that reduced Ca decreases the rate at which IFT trains enter flagella, which undermines the ion-current model of flagellar length regulation. (Ca can still be one of the factors that set the target length.) Ca does not seem to change the disassembly rate either. There are also no correlations between Ca influx spikes and IFT injection events. Curiously, these spikes ended pauses of retrograde IFT trains, but this still did not affect the dynamics of IFT entry.

      Some other possibilities like Ca regulating unloading rates are discussed and convincingly rejected.

      The study ends with an interesting Discussion of other possible models, and concludes that the only model not easily rejected so far is the one in which the time for kinesin to diffuse from the flagellum back to the cell body increases with flagellar length.

      The paper is well written and very thorough, and contains significant results.

      We thank the reviewer for this strong positive assessment.

      Reviewer #3 (Public Review):

      This work by Ishikawa et al. focuses on testing the hypothesis, first proposed by Rosenbaum, that Ca2+ levels in the primary cilia act as an internal regulator of cilia length by negatively regulating intraflagellar transport (IFT) injection and/or microtubule assembly. The authors first built a mathematical model for Ca2+-based regulation of cilia length through the activity of a Ca2+-dependent kinase. They then tested this model in the growing cilia of Chlamydomonas cells expressing an axonemally localized GCaMP. Ca2+ levels were manipulated genetically, with a calcium channel-deficient mutant line, and by the addition of EGTA. While increases in Ca2+ levels do correlate with cilia length, as predicted by the model, they found that Ca2+ levels were positively correlated with IFT injection and with increased axonemal stability, which contradicts their potential as a mechanism for the cell to internally regulate cilia length.

      Overall, the conclusions of the paper are supported by the data. The study greatly benefits from first establishing the model in a clear form and then experimentally interrogating it from multiple angles to test its viability. The importance of cilia length to our understanding of human health has only grown in recent years, and the authors are making a significant contribution to our understanding of ciliary length regulation.

      We thank the reviewer for this positive assessment, including of the relevance of the model. We have attempted to address all suggestions.

    1. Author Response

      Evaluation summary

      This important study advances our understanding of respiratory complex I. The authors present convincing structural data for the enzyme from Drosophila melanogaster although the interpretation of conformational states is still not conclusively settled. This work will be of interest to researchers studying respiratory enzymes, the evolution of respiration, and mitochondrial diseases.

      Thank you for this positive evaluation of our work. Although we have presented a robust and coherent interpretation of the conformational states we observe, we accept that different opinions on this topic still exist in the field.

      Reviewer #1 (Public Review):

      Agip et al. have resolved the first cryo-EM structure of mitochondrial Complex I from Drosophila melanogaster, an important model organism in biology. The structure revealed a 43-subunit enzyme complex that closely resembles mammalian Complex I. The authors resolved Complex I in three different conformational states at 3.3-4.0 Å global resolution, with an overall resemblance to the active form of mammalian Complex I, but also with some characteristic conformational changes near the quinone substrate pocket and surrounding subunits that resemble, at least in part, the deactive form of the mammalian enzyme. The third resolved class was considered 'damaged/broken', possibly an artifact arising from sample preparation. Biochemical assays showed that Drosophila Complex I does not undergo an active/deactive transition (as characterized by N-ethylmaleimide sensitivity), although the structures revealed an exposed ND3 loop that has been linked to this transition. The authors also showed that a conformational change between the α and π forms of a transmembrane helix (TM3-ND6) is likely to be involved in catalysis, and is distinct from the deactivation mechanism of the mammalian enzyme. At 3.3 Å global resolution, water molecules could not be experimentally resolved, so how the observed conformational changes link to proton pumping remains an open question and a basis for future studies. Overall, I find that this work provides an important basis for understanding the mechanistic principles of mitochondrial Complex I, and more specifically a starting point for detailed genetic studies on fruit fly Complex I.

      We thank the reviewer for their positive evaluation of our manuscript.

      We would like to note that in all three conformational states of Drosophila complex I observed in our study, we do not observe an exposed ND3 loop (Cys39 in particular), as outlined in Figures 3 & 6 and Figure 6 – Figure Supplement 1 (as well as in Figures 5 and 7). This observation is fully consistent with the lack of N-ethylmaleimide (NEM) sensitivity observed in our Drosophila preparation.

      We discuss the relevance of the π-bulge/α-helical nature of ND6-TMH3 to catalysis in the Discussion section, in the context of an intercalated phospholipid molecule in the Dm1 structure: “Indeed, if ND6-TMH3 converts between its π-bulge and α-helical structures during catalysis (Agip et al., 2018; Kampjut and Sazanov, 2020; Kravchuk et al., 2022; Parey et al., 2021; Röpke et al., 2021), then the intercalating phospholipid is very unlikely to be present in the π-helical state, moving repeatedly in and out.” While our structures are consistent with this helical change being involved in catalysis, they are resting-state structures and therefore do not provide further evidence in this regard.

      Finally, the reviewer is correct that the resolutions of the structures reported here are insufficient to model water molecules, and that how the conformational changes observed here relate to the coupling mechanism, which is currently poorly understood, remains an open question.

      Reviewer #2 (Public Review):

      • Aim of the study:

      Agip et al. studied the structure of respiratory complex I from Drosophila melanogaster, an important model organism with a well-developed genetic toolkit and a sufficiently close phylogenetic relationship to mammals. They isolated the complex and analyzed its structure by single-particle electron cryo-microscopy (cryo-EM). They also used mass spectrometry to characterize new subunits. Structures of complex I have been reported for several organisms, including mammals, plants, ciliates, fungi and bacteria, but structures from insects have been missing. This study aims to fill this gap and shed light on some of the key questions pertaining to complex I biology, such as 1) the conservation of supernumerary subunits, 2) the mechanisms and physiological relevance of the active/deactive transition, and 3) the correspondence between the structurally defined closed/open conformations and the biochemically defined active/deactive states.

      We thank the reviewer for clearly summarising the key aims of the study relative to the current status of the complex I field.

      • Strengths:

      The study provides the first structure of complex I from insects, organisms at an important phylogenetic branch that diverged from mammals more recently than other eukaryotes such as plants and fungi. Using purification methods they previously developed for mammalian enzymes, the authors successfully purified the insect enzyme with high quality: a monodisperse peak in gel filtration, NADH oxidation activity comparable to that of mammalian enzymes, and a homogeneous subunit composition, as confirmed by single-particle analyses. It is noteworthy that the authors used state-of-the-art tools for model building and validation, such as ISOLDE and MapQ, which makes this model of a high standard. In my opinion, such careful validation is particularly important for modelling such a gigantic complex, since without care one can easily misinterpret the density and draw wrong conclusions.

      The resolution is 3.3 Å for the best class (Dm1), which allowed modelling of side chains and comparison between the observed 3D classes and known structures. The model confirms the presence of 43 subunits, akin to mammalian enzymes, composed of 14 conserved core subunits, 28 supernumerary subunits that have close homologs in mammals, and one supernumerary subunit, CG9034, that had not been predicted. The subunits are also structurally similar to those of mammalian enzymes except for minor local differences. Two supernumerary subunits present in mammals (NDUFC1 and NDUFA2) are missing. The authors discuss evidence that NDUFC1 is absent from the Drosophila genome and that NDUFA2 is present in the genome but its expression is restricted to the male germline. Together, the overall similarity to the mammalian enzyme supports the use of Drosophila complex I as a model system.

      One of the remarkable findings is that common biochemical treatments used to deactivate mammalian complex I (heat treatment or NEM treatment) did not reveal a deactive state of Drosophila complex I. This is in agreement with the authors' observation that most structural elements are in the active state. The major Dm1 conformation shows all structural features in the active conformation, whereas the Dm2 state shows two features in the deactive conformation. Here the authors raise an interesting point: structural elements formerly believed to behave in a concerted manner are actually not coupled, which provides a new perspective for interpreting the complex I structures presented so far and in the future. Notably, the authors adopted the same purification procedure used for bovine and murine samples. A particular strength is that, despite using a similar procedure, they still observed different behavior for Drosophila (the absence of the deactive state).

      We thank the reviewer for their positive evaluation of the strengths of the paper.

      • Weaknesses:

      As the authors point out in the Discussion, the biochemical states of the two described conformations, Dm1 and Dm2, are uncertain. If we assume that Dm1 is a ready-to-go active state, Dm2 could represent any of several possible states: a partially broken state due to delipidation by detergent, a meta-stable state during enzyme turnover, an intermediate in a "full deactivating" structural transition (which the authors argue is unlikely to occur), or a fully reversible state in equilibrium with Dm1. Despite these uncertainties, the structure will serve as an excellent starting point for addressing many open questions in the complex I field in the future.

      We agree that the biochemical status of Dm2 is uncertain and as the reviewer notes, we made an attempt to address this question in the Discussion section.

      In the final 3D classification, the number of classes was set to 3 (K = 3). This is an arbitrary human decision and implicitly forces particles to separate into 3 discrete classes. It would be helpful to state whether the authors tried different classification parameters and, if so, whether these led to similar classification results. There are methods other than simple 3D classification for dissecting conformational heterogeneity. For example, focused classification can differentiate local structural features, and multi-body refinement and 3D variability analysis can analyze continuous conformational changes. The simple 3D classification with local angular sampling employed here may oversimplify a more complex structural heterogeneity.

      First, the number of classes was set to 5 (K = 5) as written in the Materials and methods section (page 20), which is greater than the number of complex I conformations observed in this study. We apologise if this was not clear and we have amended Figure 1 – Figure Supplement 2 to clarify it.

      Second, as the reviewer correctly points out, there are many different methods for dissecting conformational heterogeneity, and for this reason we deliberately tested several classification strategies and confirmed that the Global 3D classification approach used here (with local angular searches extending to 0.2º sampling) yielded comparable (or even better) results. These additional classification strategies include:

      (i) Focus-revert-classify (a strategy often used for complex I (Kampjut and Sazanov, 2020; Klusch et al., 2021; Kravchuk et al., 2022; Letts et al., 2019)) in RELION, where the membrane arm of complex I is first subtracted to focus-refine on the hydrophilic arm, the subtraction reverted, and then focus-classification performed without alignment on the membrane arm. Here, we used a regularisation parameter, t = 8, and K = 5, and the process yielded three complex I classes matching Dm1, Dm2, and Dm3 with comparable population distribution to the aforementioned Global 3D classification method, plus two junk classes.

      (ii) A 3D classification without alignment approach (a strategy also used for complex I (Gu et al., 2022)) in RELION. We used t = 20 and up to K = 12 classes, which resulted in two < 4 Å resolution complex I classes, with the major class matching Dm1 and the minor class a likely mixture of Dm2 and Dm3.

      Based on these three classification strategies, we chose to work with the Global 3D classification approach that has previously proven robust for separating complex I heterogeneity in our hands (Agip, 2018; Chung et al., 2022b; Zhu et al., 2016). However, we agree with the reviewer that it would be valuable to provide this extra information. Therefore, we have amended the Materials and methods section on page 20: “The ‘Focus-Revert-Classify’ classification strategy (Letts et al., 2019), applied using the regularisation parameter t = 8 and K = 5, yielded comparable population distributions (three complex I classes matching Dm1, Dm2, and Dm3, plus two junk classes) whilst 3D classification without alignment using t = 20 and K ≤ 12 yielded two < 4 Å complex I classes, with the major class matching Dm1 and the minor class an apparent mixture of Dm2 and Dm3. The 3D classification approach with local angular sampling was therefore employed to give the final set of Dm1, Dm2 and Dm3 particles as described above.” Furthermore, clear cryo-EM densities for Dm2-specific local features, including the ‘flipped’ ND1-TMH4-Tyr149 and the ND6-TMH3 π-bulge, revealed no evidence for Dm1 contamination in the Dm2 population. This is also now noted on page 20.

      Although 37 °C heat treatment and NEM treatment did not reveal any sign of deactivation in Drosophila complex I, this does not rule out the possibility that insect complex I uses a different mechanism of deactivation to prevent ROS production. This probably reflects a limitation of applying assays originally developed for mammalian and fungal enzymes to the study of insect enzymes.

      The reviewer is correct that Drosophila complex I may have a different way to ‘deactivate’ that does not involve an exposure of ND3-Cys39, and it is also possible that the conditions used for deactivation of mammalian mitochondrial membranes (i.e. 37 ºC heat treatment for 30 min) may not be sufficient to deactivate the Drosophila enzyme. Our approach here was to evaluate if Drosophila complex I undergoes the same active/deactive transition as the mammalian enzyme both structurally and biochemically (and our results suggest that it doesn’t). Moving forward, deactivation mechanisms in different phylogenetic lineages will be an important question to address in the complex I field. We have addressed this question in the first paragraph of the Discussion.

      • Whether they achieved the aims and whether the conclusions are supported by the results:

      Overall, they successfully isolated the active enzyme and determined its structure at 3.3 Å resolution, which meets the current state of the art for single-particle cryo-EM and provides an atomic picture of the enzyme composition. The study confirms that Drosophila complex I is structurally similar to mammalian complex I, but biochemically different in that it does not show the deactive state. This still does not exclude the possibility that Drosophila complex I can transition into a currently unknown state that prevents reverse electron transfer. This question, however, can be tackled in the future by mutagenesis analyses, as Drosophila is a genetically tractable organism.

      We agree with the reviewer's evaluation of the study, and the genetic tractability of Drosophila will serve as a crucial tool for future studies.

      • Impact to the field and utility of the data to the community:

      Complex I is important not only for human health but also for understanding the universal principles of biological respiration, because it is present in most organisms on Earth. This study provides a basis for relating mammalian complex I to complex I from other branches of life. The current structures will allow Drosophila researchers to interpret and design mutations that affect complex I function, and to relate them to behavioral, developmental and metabolic changes at the tissue, organ and organism levels.

      We agree with the reviewer's evaluation of the impact of the study, and thank the reviewer for their comments on the manuscript.

      Reviewer #3 (Public Review):

      The mitochondrial NADH dehydrogenase complex (complex I) is of prime importance for cellular respiration. It has been biochemically and structurally characterized for several groups of organisms, including mammals, fungi, algae, seed plants and protozoa. Furthermore, different complex I conformations have been reported, which may represent distinct physiological states of the enzyme complex. For example, in mammalian mitochondria two resting states can be distinguished, designated the 'ready-to-go' resting state and the 'deactive' resting state. To better understand the physiological relevance of these states, complex I is investigated here from the fruit fly Drosophila melanogaster, which serves as a model not only for insects but also for metazoans in general, and which can easily be genetically modified.

      Complex I from Drosophila is presented at up to 3.3 Å resolution. It includes 43 of the 45 complex I subunits defined for mammalian complex I. Subunit NDUFA3 has been found in Drosophila complex I for the first time. Overall, Drosophila complex I is remarkably similar in its composition and structure to the mammalian enzyme. Only minor topological differences were found in some subunits. Furthermore, three different complex I states are described, termed Dm1, Dm2 and Dm3. The three states are extensively discussed and compared with the states found in mammalian complex I. Dm1, which is the dominant class, likely represents the active resting state. In Dm2, the two complex I arms are slightly twisted with respect to Dm1. In Dm3, the membrane arm appears to be 'cracked' at the interface between ND2 and ND4; this state possibly represents an artefact resulting from detergent-induced loss of stability in the distal membrane domain of the Dm2 state. Both Dm2 and Dm3 most closely correspond to the mammalian active state. A state resembling the mammalian deactive state could not be found. This result is further supported by biochemical experiments. It is concluded that Drosophila complex I, despite its remarkable similarity to the mammalian enzyme, does not undergo the mammalian-type active/deactive transition.

      In conclusion, the complex I structure from Drosophila is of limited value for better understanding the states of mammalian complex I (which could be stated more clearly). However, insights into complex I structure and function in an insect are highly interesting. The conclusions are justified by the presented data. The manuscript is well written and the figures are thoroughly prepared. The discussion focuses heavily on the interpretation of the three complex I states. The deactive state, which is interpreted as protecting mammalian mitochondria from ROS production during reverse electron transfer, might be dispensable in species with a comparatively short life cycle, such as Drosophila, whose life cycle is in the range of weeks.

      We thank the reviewer for clearly summarising the key findings of the study. We agree that Drosophila complex I may have limitations for studying the full active/deactive transition so far observed exclusively in mammalian enzymes, but we argue that the lack of a fully deactivated state also provides a good system to study which local elements in complex I may offer protection against RET. Despite these limitations, Drosophila remains a powerful model system to study complex I mechanism, assembly, and regulation in physiological contexts.

    1. Author Response

      Reviewer #1 (Public Review):

      Neuronal tissues are very complex and are composed of a large number of neuronal types. With the advent of single-cell sequencing, many researchers have used this technology to generate atlases of neuronal structures that describe in detail the transcriptome profiles of the different cell types. Along these lines, in this manuscript the authors present single-cell transcriptomic data from fruitless-expressing neurons in the Drosophila male and female central nervous systems. The authors first compare cell cluster composition between male and female flies. They then use the expression of known markers (such as Hox genes and KC neuronal markers) to annotate several of their clusters. Next, they look in detail at the expression of different terminal neuronal genes in their transcriptomic data. First, they examine neurotransmitter-related genes and how these are expressed in fruitless-expressing neurons; they describe these populations in detail and then verify the expression patterns using genetic intersections of Fru with different neurotransmitter-related genes. They then look at Fru neurons that express circadian clock genes, different neuropeptides and neuropeptide receptors, and different subunits of acetylcholine receptors. They also look into genes that are differentially expressed between male and female neurons that belong to the same clusters. They find a large number of such genes; through GO term enrichment analysis, they conclude that many IgSF proteins are differentially expressed, so they examine their expression in Fru neurons in more detail. Finally, they compare transcription factor expression between male and female neurons of the same cluster and identify 69 TFs with cluster-specific sex-differential expression.

      In general, the authors achieved their goal of generating and presenting a large and very useful dataset that will open many research avenues and has already raised a number of interesting hypotheses. The data seem to be of good quality, and the authors present different aspects of their atlas.

      The main drawback is that many of the analyses are very superficial, making the manuscript handwavy and its claims unsupported. The manuscript would benefit from limiting the analyses to those that are also validated in vivo and from discussing some of the drawbacks inherent to the experimental procedure.

      By their nature, scRNA-seq studies generate descriptive atlases. We therefore decided to keep in the paper the interesting gene-expression analyses based on the scRNA-seq results, especially discoveries that point to exciting avenues for future work. We shortened the text as suggested.

      1) The authors treat their male, female, and full datasets as three different samples. Ultimately, these are, for the most part, equivalent neuronal types. The authors should either a) use only the full dataset and present all analyses on it, or b) give a clear correspondence of the male and female clusters to the full ones.

      In this paper, all the analyses presented are on the full data set, with some links to the male or female data sets included. We now make clear that the full data set is the focus of the paper (lines 137-141). We provide the male and female data sets for our reader, with the individual Seurat objects uploaded to GEO, to make it easy for the reader to do follow-up analyses using the same criteria we used. We think this is helpful for our research community. We also compare the male and female clusters to the full data set using ClustifyR and report which clusters in the male or female data set analyses correspond to those in the full data set (Source data 2), as suggested by the reviewer, though ClustifyR has some limitations based on our evaluation of this tool for other annotations (see below).
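      The cluster correspondence described above can be illustrated with a minimal correlation-based mapping, written here as a plain-Python sketch rather than the clustifyr implementation. It assumes each cluster is summarized by its average expression over a shared, identically ordered gene list, and it uses Spearman correlation in the spirit of clustifyr's correlation-based approach; the choice of Spearman, the function names and the toy profiles are all assumptions for illustration, not the authors' pipeline.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def map_clusters(query, reference):
    """Assign each query cluster to its best-correlated reference cluster.

    Both arguments map a cluster name to a list of average expression
    values over the same, identically ordered set of genes.
    """
    mapping = {}
    for q_name, q_profile in query.items():
        scores = {r_name: spearman(q_profile, r_profile)
                  for r_name, r_profile in reference.items()}
        best = max(scores, key=scores.get)
        mapping[q_name] = (best, scores[best])
    return mapping


if __name__ == "__main__":
    # Hypothetical toy profiles over four shared genes.
    reference = {"full_0": [5, 1, 0, 2], "full_1": [0, 4, 6, 1]}
    query = {"male_0": [4, 2, 0, 3], "male_1": [1, 5, 7, 0]}
    print(map_clusters(query, reference))
```

      Rank-based correlation is used here because it is insensitive to overall scaling differences between separately normalized datasets; a best match with a low correlation score would flag a query cluster with no clear counterpart in the reference.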

      2) Most of their sections are heavily reliant on marker genes. In fact, in almost every section they mention how many of their genes of interest are marker genes. This depends heavily on specific cutoffs, making the conclusions fragile. Similarly, GO terms are used selectively and are, in many cases, vague (such as “signaling”, “neurogenesis”, “translation”).

      We evaluated marker genes because they provide molecular identities for the clusters: by definition, they are significantly more highly expressed in a specific cluster than in all other clusters. We used a Wilcoxon rank-sum test with the following parameters in Seurat: (min.pct=0.25, logfc.threshold=0.25), which resulted in all called marker genes having p values < 0.05. We did not use more stringent criteria because most of the marker gene analyses are descriptive, and it is important to capture a broad range of genes. Our criteria are similar to those of Ma et al. 2021 (PMID: 33438579) and Corrales et al. 2022 (PMID: 36289550). In the text, we refer to the top 5 marker genes in several analyses, and these top markers are enriched far more significantly than the p < 0.05 cutoff. We agree with the reviewers’ criticisms regarding the cluster-specific GO-term analyses in the text, and those analyses have been removed from the manuscript.
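
      The marker-gene criteria described above can be sketched in Python as a simplified re-implementation, not Seurat's code. A gene is called a cluster marker if it meets three conditions: it is detected in at least `min.pct` of cells in either group, its log2 fold change reaches `logfc.threshold`, and a two-sided Wilcoxon rank-sum test gives p < 0.05. The test here uses a normal approximation with a tie correction and no continuity correction. The function names, the log1p-scaled input, the pseudocount of 1 in the fold change, and the restriction to positive markers are all assumptions. Seurat's exact fold-change formula also differs between versions.

```python
import math
from collections import Counter


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p value (normal approximation, tie-corrected)."""
    values = sorted(x + y)
    # Each distinct value gets the average of the ranks it occupies (ties share a rank).
    avg_rank = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        avg_rank[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u1 = sum(avg_rank[v] for v in x) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(values).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # every value identical: no evidence of a difference
        return 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def call_markers(expr, in_cluster, min_pct=0.25, logfc_threshold=0.25, alpha=0.05):
    """Return (gene, log2FC, p) for genes passing the min.pct, logFC and p-value filters.

    expr: dict mapping gene -> per-cell log-normalized (log1p) values (assumption).
    in_cluster: per-cell booleans, True for cells in the cluster being tested.
    """
    markers = []
    for gene, vals in expr.items():
        a = [v for v, c in zip(vals, in_cluster) if c]
        b = [v for v, c in zip(vals, in_cluster) if not c]
        # min.pct: detected in at least this fraction of cells in either group.
        pct_a = sum(v > 0 for v in a) / len(a)
        pct_b = sum(v > 0 for v in b) / len(b)
        if max(pct_a, pct_b) < min_pct:
            continue
        # Fold change of mean expression on the linear scale, pseudocount 1 (assumption).
        mean_a = sum(math.expm1(v) for v in a) / len(a)
        mean_b = sum(math.expm1(v) for v in b) / len(b)
        log2fc = math.log2(mean_a + 1) - math.log2(mean_b + 1)
        if log2fc < logfc_threshold:  # keep positive (cluster-enriched) markers only
            continue
        p = rank_sum_p(a, b)
        if p < alpha:
            markers.append((gene, log2fc, p))
    return markers


# Toy example: 10 cells in the cluster, 10 outside (hypothetical values).
labels = [True] * 10 + [False] * 10
expr = {
    "A": [2.0] * 10 + [0.0] * 10,  # cluster-enriched
    "B": [0.0] * 20,               # never detected: fails min.pct
    "C": [1.0] * 20,               # uniform: fails logfc.threshold
}
markers = call_markers(expr, labels)
```

      With this toy input, only gene "A" is called. "B" fails the detection filter and "C" fails the fold-change filter, so neither reaches the statistical test.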

      3) A few of the results are not confirmed in vivo. The authors should add a Discussion section where they discuss the inherent issues of their analyses. Are there clusters of low quality? Are there many doublets?

      We have added discussion around these topics to the conclusions section of the manuscript and the results, when appropriate.

      On the same note, their clusters are obviously non-homogeneous (i.e. they house more than one cell type). This could affect the authors' cluster-specific sex-differential expression analysis, as differences could also be attributed to the differential composition of the male and female subclusters.

      We discuss this potential limitation in the discussion of sex differences in gene expression (Lines 959-961).

      4) Immunostainings are often unannotated and, in some cases, especially in the Supplement, they are blurry. The authors should annotate their images and provide better images whenever possible.

      We appreciate this being pointed out and have provided higher-resolution figures. The issue was that our initial submission exceeded the manuscript file-size limit.

      5) I believe that the manuscript would benefit significantly by being heavily reduced in size and being focused on in vivo rigorously confirmed observations.

      We have addressed this comment by removing some of the analyses.

      Reviewer #3 (Public Review):

      This paper uses single-cell transcriptome sequencing to identify and characterize some of the neuronal populations responsible for sex-specific behaviour and physiology. This question is of interest to many biologists, and the approach taken by the authors is productive and will lead to new insights into the molecular programs that underpin sexually dimorphic development in the CNS. The dataset produced by the authors is of high quality, the analyses are detailed and well described, and the authors have made substantial progress toward the identification and characterization of some of the neuron populations. At the same time, many other cell types whose existence is suggested by this dataset remain to be identified and matched to specific neuron populations or circuits. We expect the value of this dataset to increase as other groups begin to follow up on the data and analyses reported in this paper. In general, the value of this paper to the field of Drosophila neurobiology will be high even if it is published in close to its present form. On the other hand, the current manuscript does not succeed in presenting the key take-home messages to a broader audience. A modest effort in this direction, especially re-writing the Conclusions section, will greatly enhance the accessibility and broader impact of this paper.

      While the biological conclusions reached by the authors are generally robust and of high interest, we believe that some conclusions are not sufficiently supported by the analyses that have been performed so far and need to be reexamined and confirmed. A major question concerns the authors' ability to distinguish a shared cell type with sex-biased gene expression from a pair of closely related, sex-limited cell types. There appear to be many cases that fall into this grey area, and the current analysis does not provide an objective criterion for distinguishing between sex-specific and sexually dimorphic clusters. Below we suggest some technical approaches that could be used to examine this issue. A second problem, which we do not believe to be fatal but that needs to be discussed, concerns potential differences in developmental timing and cell cycle phase between males and females, and how these differences might impact the inferences of sexual dimorphism in cell numbers and gene expression. Finally, we identify several areas, including the expression of transcription factors in different neuronal populations, that we believe could be described in more biologically insightful ways.

      For our review, we focus on three levels of evaluation:

      1) Is the dataset of high quality, useful to a large number of people, well annotated, and clearly described?

      The data appear to be high quality. The authors use reasonable neuronal markers to infer that 99% of their cells are neuronal in origin, suggesting extremely low levels of contamination from non-neuronal cells. Moreover, the numbers of genes and UMIs detected per cell are high relative to what has been reported in previous Drosophila scRNA-seq neuron papers (e.g. Allen et al., 2020). The cluster annotations are incomplete, which is not surprising given the complexity of the cell population the authors are working with. 46 of the 113 clusters in the full dataset are named based on published expression data, gene ontology enrichments of cluster marker genes, and overlap with other CNS single cell datasets. This leaves rather a lot outstanding. It is probably unrealistic to aim for a 100% complete annotation of this dataset. But if we're thinking about how this dataset might be used by other researchers, in most cases the validation that a given cluster corresponds to a real, distinct neuron subpopulation will be left to the user.

      A major comment we have about the quality of the dataset relates to how doublets are identified and dealt with. The presence of doublets, an unavoidable byproduct of droplet-based scRNAseq protocols (like the 10x protocol used by the authors), could affect the clustering or at least bias the detection of marker genes. In large clusters, one might expect the influence of doublets on marker gene detection to be diluted, but in smaller clusters it could cause more significant problems. In extreme cases, a high proportion of doublets can produce artifactual clusters. The potential for problems is particularly high in cases where the authors identify cells with hybrid properties, such as clusters 86 and 92, which the authors describe as being serotonergic, glutamatergic, and peptidergic. Currently, the authors filter out cells with high UMI/gene counts, but it's unclear how many are removed based on these criteria, and cells can naturally vary in these values so it is not clear to us whether this approach will reliably remove doublets. That said, we acknowledge that by limiting their 'FindMarkers' analysis to genes detected in >25% of cells in a cluster the authors are likely excluding genes derived from doublets that contaminate clusters in low (but not high) numbers. We think it would be useful for the authors to report the number of cells that are filtered out because they met their doublet criteria and compare this value to the number of expected doublets for the number of cells they recovered (10x provides these figures). We would also recommend that the authors trial a doublet detection algorithm (e.g. DoubletFinder) on the unfiltered datasets (that is, unfiltered at the top end of the UMI/gene distribution). Does this identify the same cells as doublets as those the authors were filtering out?

      We appreciate this suggestion and have now added results from the doublet detection algorithm DoubletFinder to our manuscript. Please see our response above in the editorial comments. We provide a table in Figure 1 – supplement 1 that indicates the number of cells removed by our filtering criteria. In the discussion (Lines 1098-1102), we acknowledge that there may be additional doublets in our data set that were not removed by our filtering criteria. We have also provided a new table in Source data 2 indicating the number of potential doublets identified by DoubletFinder in each cluster.
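
      The reviewers suggest a specific comparison: the number of cells removed by the UMI/gene-count filters against the number of doublets expected for the number of cells recovered. A minimal sketch of that comparison follows. It assumes the commonly quoted 10x Chromium rule of thumb of roughly 0.8% multiplets per 1,000 cells recovered. The exact rate depends on the chemistry and should be taken from 10x's own tables. The function names and example numbers are hypothetical. An expected doublet count of this kind is also what DoubletFinder takes as its `nExp` argument.

```python
def expected_multiplet_rate(cells_recovered, rate_per_1000=0.008):
    """Approximate multiplet rate, assuming ~0.8% per 1,000 cells recovered (linear rule of thumb)."""
    return rate_per_1000 * cells_recovered / 1000


def doublet_report(cells_recovered, cells_filtered, rate_per_1000=0.008):
    """Compare cells removed by UMI/gene-count filters with the expected number of doublets."""
    rate = expected_multiplet_rate(cells_recovered, rate_per_1000)
    expected = round(rate * cells_recovered)
    return {
        "expected_rate": rate,
        "expected_doublets": expected,
        "filtered": cells_filtered,
        # Below 1: the filters removed fewer cells than the expected number of doublets.
        "filtered_over_expected": cells_filtered / expected if expected else float("inf"),
    }


# Hypothetical run: 10,000 cells recovered, 400 removed by the high-UMI/high-gene filters.
report = doublet_report(cells_recovered=10_000, cells_filtered=400)
```

      In this hypothetical case, about 800 doublets would be expected. The filters removed only half that number, so a dedicated detector would be worth running on the unfiltered data.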

      2) What is the value of this study to its immediate field, Drosophila neurobiology? Are the annotation and analysis of specific cell clusters as precise and insightful as they could be? Has all the most important and novel information been extracted from this dataset?

      This is the part that we are least qualified to assess, since we, unlike the authors, are not neurobiologists. We hope some of the other referees will have sufficient expertise to evaluate the paper at this level.

      One thing we noticed (more on that in Part 3) is that the authors rely on JackStraw plots and clustree plots to identify the optimal combination of PCs and resolution to guide their clustering. This represents a relatively objective way of settling on clustering parameters. However, in a number of the UMAPs it looks like there are subclusters that go undiscussed. E.g. in Fig. 2E, clusters 1 and 3 are associated with smaller, distinct clusters, and the same is true of clusters 2 and 6 in Fig. 4B. Given that the authors are attempting to assemble a comprehensive atlas of fru+ neurons, it seems important for them to assess (at least transcriptomically) whether these are likely to represent distinct subpopulations.

      We appreciate these comments and address this above in the editorial comments section.
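
      JackStraw and clustree are R tools. The Python sketch below only illustrates the kind of information clustree plots: how cells move between cluster assignments at a lower and a higher clustering resolution. A lower-resolution cluster whose cells divide cleanly into several higher-resolution clusters is a candidate for the distinct subclusters the reviewers point to. The `in_prop` statistic is loosely modelled on clustree's in-proportion edges. The 0.9 cutoff, the function names, and the labels are hypothetical.

```python
from collections import Counter


def cluster_transitions(low_res, high_res):
    """Map each (low, high) cluster pair to (shared cells, in_prop).

    in_prop is the fraction of the high-resolution cluster's cells that come
    from the given low-resolution cluster.
    """
    shared = Counter(zip(low_res, high_res))
    high_sizes = Counter(high_res)
    return {(lo, hi): (n, n / high_sizes[hi]) for (lo, hi), n in shared.items()}


def candidate_splits(low_res, high_res, min_in_prop=0.9):
    """Low-resolution clusters that split cleanly into more than one high-resolution cluster."""
    children = {}
    for (lo, hi), (_, in_prop) in cluster_transitions(low_res, high_res).items():
        if in_prop >= min_in_prop:
            children.setdefault(lo, []).append(hi)
    return {lo: sorted(his) for lo, his in children.items() if len(his) > 1}


# Hypothetical labels for 8 cells at two resolutions: cluster 1 splits into "a" and "b".
low = [1, 1, 1, 1, 2, 2, 2, 2]
high = ["a", "a", "b", "b", "c", "c", "c", "c"]
splits = candidate_splits(low, high)
```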

      3) How interesting, and how accessible is this paper to people outside of the authors' immediate field? What does it contribute to the "big picture" science?

      Here, we think the authors missed an important opportunity by under-utilizing the Conclusions section. The manuscript has a combined "Results and Discussion" section, where the authors talk about their identification and analysis of specific cell clusters / cell types. Frankly, to a non-specialist this often reads like a laundry list, and the key conclusions are swamped by a flood of details. This is not to criticize that section - given the complexity and potential value of this dataset, we think it is entirely appropriate to describe all these details in the Results and Discussion. However, the Conclusions section does not, in its present form, pull it all back together. We recommend using that section to summarize the 5-8 most important high-level conclusions that the authors see emerging from their work. What are the most important take-home messages they want to convey to a developmental biologist who does not work on brains, or to a neurobiologist who does not work on Drosophila? The authors can enhance the value of this paper by making it more interesting and more accessible to a broader audience.

      We appreciate this suggestion and have made changes to the conclusions section to address this comment.

    1. Author Response

      eLife assessment

      The author customises an alpha-fold multimer neural network to predict TCR-pMHC and applies this to the problem of identifying peptides from a limited library, that might engage TCR with a known sequence from a limited list of potential peptides. This is an important structural problem and a useful step that can be further improved through better metrics, comparison to existing approaches, and consideration of the sensitivity of the recognition processes to small changes in structure.

      I appreciate the time taken by the editor and reviewers to assess this manuscript. In response to their comments, I've made significant changes and additions to the manuscript, most importantly adding (1) comparisons to TCRpMHCmodels and sequence-similarity based template selection, (2) analysis of peptide modeling accuracy in structure prediction and epitope prediction, (3) analysis and discussion of bias in the ternary structure database, (4) identification of key factors driving structure prediction accuracy, (5) binding predictions for three experimental systems with altered peptide ligand data, and (6) additional discussion of the generalizability of the epitope specificity prediction results to systems without structural characterization.

      One minor correction to the wording of the above assessment: the alphafold network used as the basis of our protocol is the original "monomer" network, not the multimer network. We chose to start from the monomer network because it was not trained on complexes, allowing for a more accurate assessment of the expected performance when modeling unseen TCR:pMHC complexes. On the other hand, performance comparisons such as in Fig. 2 are made to the AlphaFold multimer pipeline, since that pipeline can directly build models of complexes.

      Reviewer #1 (Public Review):

      The author has generated a specific version of the AlphaFold deep neural network-based protein structure prediction programme for TCR-pMHC docking. The AlphaFold multimer programme doesn't perform well for TCR-pMHC docking, as TCRs use random amino acids in the CDRs and the docking geometry is flexible. A version of AlphaFold was developed that provides templates for TCR alpha-beta pairing and docking with class I pMHC. This enables structural predictions that can be used to rank a TCR's docking with a set of peptides and to identify the best peptide based on the quality of the structural prediction, with the best binders having the smallest residuals. This approach provides a step toward more general prediction and may immediately solve a class of practical problems in which one wants to determine what pMHC a given TCR recognizes from a limited set of possible peptides.

      Very minor point: the structure prediction pipeline (Fig. 2) handles both MHC class I and class II complexes. For epitope binding specificity prediction (Figs. 3-6), I only tested MHC class I targets due to limitations in data availability (very few class II epitopes have had their TCR repertoires mapped and also ternary complexes solved).

      Reviewer #2 (Public Review):

      The application of AlphaFold to the prediction of the peptide-TCR recognition process is not without challenge; at heart, this is a multi-protein recognition event. While AlphaFold does very well at modelling single protein chains, its performance on multi-chain interactions, such as antibody-antigen pairs, has been substantially lower than on other targets (Ghani et al. 2021). This has led to the development of specialised pipelines that tweak the prediction process to improve the prediction of such key biological interactions. Prediction of individual TCR:pMHC complexes shares many of the challenges apparent in antibody-antigen prediction but also has its own unique possibilities for error.

      One of the current limitations of AlphaFold Multimer is that it doesn't support multi-chain templating. As with antibodies, this is a major issue for the prediction of TCR:pMHC complexes, as the nearest model for a given pMHC, TRAV, or TRBV sequence may be in entirely different files. Bradley's pipeline creates a diverse set of 12 hybrid AlphaFold templates to circumvent this limitation. This approach constrains inter-chain docking, and the pipeline also speeds up predictions by removing the time-consuming MSA step of the AlphaFold pipeline. This adapted pipeline produces higher-quality models when benchmarked on 20 targets without a close homolog within the training data.

      The challenge for this work is, of course, not generating predictions but establishing a functional scoring system for the docked poses of the pMHC:TCR and, most importantly, clearly understanding and communicating when modelling has failed. Importantly, Bradley's pipeline shows a strong correlation between its predicted and observed model accuracy. Bradley also uses a receiver operating characteristic (ROC) curve to discriminate between a TCR's actual antigen and 9 test decoys. This is an interesting testing regime, which appears to function well for the 8 case studies reported. It certainly leaves me wanting to better understand the failure mode for the two outliers: have these correctly modelled the pMHC but failed to dock the TCRs, for example, or vice versa?

      From the analysis in Figure 5 and Figure 5, supplement 2, it looks to me like the pMHC is pretty well modeled in all cases, and the main difference between the working and non-working targets is in the docking of TCR to pMHC. But as the reviewer rightly points out below, binding specificity is likely sensitive to small details of the structure that may not be well captured by these RMSD metrics. With an N of 8, it's hard to make definitive conclusions. As additional systems with ternary structures and TCR repertoires become available, we should be able to provide better answers.
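
      The antigen-versus-decoy test described by Reviewer #2 can be sketched as follows. Consider a single TCR with one cognate peptide and 9 decoys. For that case, the ROC area under the curve reduces to the fraction of decoys that score worse than the cognate peptide, with ties counted as half. The PAE-like values are hypothetical. They are negated so that a higher score means stronger predicted binding. The pipeline's actual score, and how AUCs are pooled across TCRs, may differ.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC in its Mann-Whitney form: P(positive outscores negative), ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical PAE-like values for one TCR (lower = more confident interface),
# negated so that higher means stronger predicted binding.
cognate_pae = 4.0
decoy_pae = [5.0, 6.0, 3.5, 7.0, 8.0, 9.0, 6.5, 5.5, 10.0]
auc = roc_auc([-cognate_pae], [-d for d in decoy_pae])
```

      Here, one decoy (PAE 3.5) outscores the cognate peptide, so the AUC is 8/9.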

      The real test of the current work, or its future iteration, will be the ability to make predictions from large tetramer-sorted datasets that are then coupled with experimental testing. The pipeline's current iteration may have some utility here, but future improvements will make for exciting changes to current experimental methods. Overall, the work is a step towards applying structural understanding to the vast amount of next-generation TCR sequence data currently being produced and improves upon current AlphaFold capability.

      I completely agree. I am also excited about using this pipeline for design of TCR sequences with altered specificity and/or enhanced affinity. Even an imperfect in silico specificity prediction method can be a useful filter for designed TCRs (in other words, we want TCR designs that are predicted to have specificity for their intended targets). This has been amply demonstrated for protein fold design, where re-prediction of the structure from the designed sequence provides one of the most powerful quality metrics.

      Reviewer #3 (Public Review):

      This manuscript is well organized, and the author has generally shown good rigor in generating and presenting results. For instance, the author utilized TCRdist and structure-based metrics to remove redundancies and cluster complex structures. Additionally, the consideration of only recent structures (Fig. 2B) and structures that do not overlap with the finetuning dataset (Fig. 2D) is highly warranted.

      In some cases, it seems possible that there may be train/test overlap, including the binding specificity prediction section and results, where native complexes being studied in that section may be closely related to or matching with structures that were previously used by the author to fine-tune the AlphaFold model. This could possibly bias the structure prediction accuracy and should be addressed by the author.

      Other areas of the results and methods require some clarification, including the generation and composition of the hybrid templates, and the benchmark sets shown in some panels of Figure 2. Overall this is a very good manuscript with interesting results, and the author is encouraged to address the specific comments below related to the above concerns.

      1) In the Results section, the statement "visual inspection revealed that many of the predicted models had displaced peptides and/or TCR:pMHC docking modes that were outside the range observed in native proteins" only references Fig. S1. However, with the UMAP representation in that figure, it is difficult for readers to readily see the displaced peptides noted by the author; only two example models are shown in that figure, and neither seems to have displaced peptides. The author should provide more details to support this statement, specifically structures of example models/complexes where the peptide was displaced, and/or summary statistics noting (out of the 130 tested) how many exhibited displaced peptides and aberrant TCR binding modes.

      This is a good point, especially since what constitutes a "displaced peptide" is open to interpretation. I've added an analysis of peptide backbone RMSD (Fig. 2, supplement 2) that should make it possible for readers to assess this more quantitatively using an RMSD threshold (e.g. 10 Å) that makes sense to them.
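
      A minimal sketch of counting displaced peptides with an RMSD threshold follows. It assumes that each model has already been superposed onto the native structure using the MHC atoms; the superposition step is not shown. It also assumes that matching peptide backbone atoms are listed in the same order in both structures. The 10 Å cutoff is the example threshold from the response. The coordinates and function names are hypothetical.

```python
import math


def peptide_rmsd(model_coords, native_coords):
    """RMSD over matched peptide backbone atoms; assumes both are already superposed on the MHC."""
    if len(model_coords) != len(native_coords) or not native_coords:
        raise ValueError("need equal, non-empty atom lists")
    sq = sum(
        (xm - xn) ** 2 + (ym - yn) ** 2 + (zm - zn) ** 2
        for (xm, ym, zm), (xn, yn, zn) in zip(model_coords, native_coords)
    )
    return math.sqrt(sq / len(native_coords))


def count_displaced(models, native_coords, threshold=10.0):
    """Number of models whose peptide RMSD exceeds the threshold (in Å)."""
    return sum(peptide_rmsd(m, native_coords) > threshold for m in models)


# Hypothetical two-atom "peptides" (coordinates in Å).
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted_far = [(12.0, 0.0, 0.0), (13.0, 0.0, 0.0)]  # RMSD 12 Å
shifted_near = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]   # RMSD 5 Å
n_displaced = count_displaced([shifted_far, shifted_near], native)
```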

      2) The template selection protocol described in Figure 1 and in the Results and Methods should be clarified further. It seems that the use of 12 docking geometries in addition to four individual templates for each of TCR alpha, TCR beta, and peptide-MHC would lead to a large combinatorial number of hybrid templates, yet only 12 hybrid templates are described in the text and depicted in Figure 1. It's not clear whether the individual chain templates are randomly assigned within the 12 docking geometries, as an exhaustive combination of individual chains and docking geometries does not seem possible within the 12 hybrid models.

      This was poorly explained; I hope I've clarified it now in the methods. The same four templates for each of the individual chains are used in each of the three AlphaFold runs, only the docking geometries vary between the runs. In other words, not all combinations of chain template and docking geometry are provided to AlphaFold.
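
      The template bookkeeping described in this response can be sketched as follows. The sketch assumes four templates per AlphaFold run, which is AlphaFold's usual template limit. It also pairs the k-th chain templates with the k-th docking geometry of each run; that index-based pairing and all template names are hypothetical. The sketch shows why only 12 hybrid templates arise. An exhaustive combination of 4 × 4 × 4 chain templates and 12 docking geometries would instead give 768.

```python
def hybrid_template_plan(alpha_tmpls, beta_tmpls, pmhc_tmpls, dock_geoms, n_runs=3):
    """Group hybrid templates into AlphaFold runs: same chain templates, different docking geometries."""
    per_run = len(alpha_tmpls)
    assert len(beta_tmpls) == len(pmhc_tmpls) == per_run
    assert len(dock_geoms) == n_runs * per_run
    plan = []
    for r in range(n_runs):
        # Hypothetical index-based pairing of chain templates with this run's docking geometries.
        run = [
            (alpha_tmpls[k], beta_tmpls[k], pmhc_tmpls[k], dock_geoms[r * per_run + k])
            for k in range(per_run)
        ]
        plan.append(run)
    return plan


alpha = [f"TRA_{i}" for i in range(4)]
beta = [f"TRB_{i}" for i in range(4)]
pmhc = [f"pMHC_{i}" for i in range(4)]
docks = [f"dock_{i}" for i in range(12)]
plan = hybrid_template_plan(alpha, beta, pmhc, docks)
n_hybrids = sum(len(run) for run in plan)                       # 12
n_exhaustive = len(alpha) * len(beta) * len(pmhc) * len(docks)  # 768
```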

      3) Neither the docking RMSD nor the CDR RMSD metrics used in Figure 2 will show whether the peptide is modeled in the MHC groove and in the correct register. This would be an important element to gauge whether the TCR-pMHC interface is correctly modeled, particularly in light of the author's note regarding peptide displacement out of the groove with AlphaFold-Multimer. The author should provide an assessment of the models for peptide RMSD (after MHC superposition), possibly as a scatterplot along with docking RMSD or CDR RMSD to view both the TCR and peptide modeling fidelity of individual models. Otherwise, or in addition, another metric of interface quality that would account for the peptide, such as interface RMSD or CAPRI docking accuracy, could be included.

      This is an excellent suggestion. The new Figure 2, supplement 2, addresses this.

      4) It is not clear what benchmark set is being considered in Fig. 2E and 2F; that should be noted in the figure legend and the Results text. If needed, the author should discuss possible overlap in training and test sets for those results, particularly if the analysis in Fig. 2E and 2F includes the fine-tuned model noted in Fig. 2D and the test set in Fig. 2E and 2F is not the set of murine TCR-pMHC complexes shown in Fig. 2D. Likewise, the set being considered in Fig. 2C (which may possibly be the same set as Fig. 2E and 2F) is not clear based on the figure legend and text.

      This has been fixed. More details below.

      5) The docking accuracy results reported in Fig. 2 do not seem to have a comparison with an existing TCR-pMHC modeling method, even though several of them are currently available. At least for the set of new cases shown in Fig. 2B, it would be helpful for readers to see RMSD results with an existing template-based method as a baseline, for instance, either ImmuneScape (https://sysimm.org/immune-scape/) or TCRpMHCmodels (https://services.healthtech.dtu.dk/service.php?TCRpMHCmodels-1.0; this only appears to model Class I complexes, so Class I-only cases could be considered here).

      This is a great suggestion. We've now added a comparison to TCRpMHCmodels (Fig. 2, supplement 3), which shows that the AlphaFold-based TCR pipeline significantly improves over that baseline method on MHC Class I complexes. Unfortunately, ImmuneScape is not available as a stand-alone software package, and the web interface doesn't allow customization of the template selection process to exclude closely-related templates, which is necessary for benchmarking. Given that ImmuneScape selects a single docking template based on sequence similarity, I compared the AF_TCR dock RMSDs to the dock RMSDs of the closest sequence template (excluding related complexes). This analysis (Fig. 2, supplement 3) shows that AlphaFold modeling produces significantly better docking geometries than simply taking the closest template by sequence similarity.
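
      The response does not specify which statistical test underlies "significantly better". The sketch below therefore uses a two-sided exact sign test on paired per-target dock RMSDs, as one simple, assumption-light option. The author's actual test may differ (for example, a Wilcoxon signed-rank test). The RMSD values are hypothetical.

```python
import math


def sign_test(rmsd_method, rmsd_baseline):
    """Two-sided exact sign test on paired per-target RMSDs (ties are dropped)."""
    wins = sum(a < b for a, b in zip(rmsd_method, rmsd_baseline))
    losses = sum(a > b for a, b in zip(rmsd_method, rmsd_baseline))
    n = wins + losses
    k = min(wins, losses)
    # Probability of a split at least this lopsided under a fair coin, doubled for two sides.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins, losses, min(1.0, 2 * tail)


# Hypothetical dock RMSDs (Å) for five targets.
af_tcr = [1.0, 2.0, 3.0, 1.5, 2.5]
closest_template = [3.0, 4.0, 5.0, 4.0, 6.0]
wins, losses, p = sign_test(af_tcr, closest_template)
```

      With only five targets, even a clean 5-0 split gives p = 0.0625. This illustrates why a benchmark of this kind needs a reasonable number of targets.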

      6) As noted in the text, the epitopes noted in Table 1 for the specificity prediction are present in existing structures, and most of those are human epitopes that may have been represented in the AF_TCR finetuning dataset. Were there any controls put in place to prevent the finetuning set from including complexes that are redundant with the TCRs and epitopes being used in the docking-based and specificity predictions if the AF_TCR finetuned model was used in those predictions? For instance, the GILGFVFTL epitope has many known TCR-pMHC structures and the TCRs and TCR-pMHC interfaces are known to have common structural and sequence motifs in those structures. Is it possible that the finetuning dataset included such a complex in its training, which could have influenced the success in Figure 3? The docking RMSD accuracy results in Fig. 5A, where certain epitopes seem to have very accurate docking (low docking RMSDs) and may have representative complex structures in the AF_TCR finetuning set, may be impacted by this train/test overlap. If so, the author should consider using an altered finetuned model with no train/test overlap for the binding specificity prediction section and results, or else remove the epitopes and TCRs that would be redundant with the complex structures present in the finetuning set.

      This is an excellent point. It wasn't at all clear in the original submission, but the AlphaFold model that was fine-tuned on TCR complexes was only used for the mouse comparison in Fig. 2D (now Fig. 2F), and for exactly the reasons you mention. There is too much overlap between the epitopes with well-characterized repertoires and the epitopes with solved structures. This is also the reason we used the original AlphaFold monomer network, which was only trained on individual protein chains, rather than the AlphaFold multimer network, as the basis of the AF_TCR pipeline. As noted in the discussion, there is still the possibility that individual TCR chain structures in the benchmark or specificity prediction sets were part of the AlphaFold monomer training set, which could make the docking and specificity prediction results look better than they should (though not in Fig. 2B).

      7) The alanine scanning results (Figure 6) do not seem to be validated against any experimental data, so it's not possible to gauge their accuracy. For peptide-MHC targets where there is a clear signal of disruption, it seems to correspond to prominently exposed side chains on the peptide which could likely be detected by a more simplistic structural analysis of the peptide-MHC itself. Thus the utility of the described approach in real-world scenarios (e.g. to detect viral escape mutants) is not clear. It would be helpful if the author can show results for a viral epitope variant (e.g. from one of the influenza epitopes, or the HCV epitope, in Table 1) that is known to disrupt binding for single or multiple TCRs, if such an example is available from the literature.

      This is another great point. For me, the main motivation for the alanine scanning results was to further "stress test" the pipeline to see if it produced plausible results. A particular worry was that the use of pMHC:TCR confidence scores might allow the results to be skewed by peptide-MHC binding strength, rather than the intended TCR:pMHC interaction strength. We've seen in other work that the AlphaFold confidence scores for the peptide are correlated with peptide-MHC affinity. In the AF_TCR specificity predictions, we use the mean binding scores for the "irrelevant" background TCRs to subtract out peptide-intrinsic effects. The fact that we don't see a strong signal in Figure 6 at the peptide anchor positions suggests that this is working, at least to some extent. It is also encouraging that the native peptide-MHC has stronger predicted binding than the majority of the alanine variants (excepting the two epitopes with poor performance).

      I agree that comparing the repertoire-level mutation sensitivity predictions to real-world experimental data is challenging, given uncertainty about which TCR clones drive selection for escape, and other viral fitness pressures that influence the escape process. The fact that some of the positions predicted to be most sensitive are also the sites of escape mutations (examples now given in the text) is encouraging. But the new peptide-variant results (Fig. 6, supplement 1) highlight the challenges that remain in discriminating between very similar peptides (especially in the single-TCR setting).
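
      The background correction described in this response can be sketched as follows. For each candidate peptide, the mean predicted binding score of the irrelevant background TCRs is subtracted from the query TCR's score. As a result, a peptide that scores well with every TCR (for instance, because of strong peptide-MHC binding) is not automatically favored. Scores are assumed to be oriented so that higher means stronger predicted binding. The data and names are hypothetical.

```python
def background_adjusted(scores, target_tcr, background_tcrs):
    """Subtract each peptide's mean background-TCR score from the target TCR's score.

    scores[tcr][peptide]: predicted binding score, higher = stronger (assumed convention).
    """
    adjusted = {}
    for pep, s in scores[target_tcr].items():
        bg = [scores[t][pep] for t in background_tcrs]
        adjusted[pep] = s - sum(bg) / len(bg)
    return adjusted


# Hypothetical scores: P2 scores well with every TCR (a peptide-intrinsic effect).
scores = {
    "tcr_query": {"P1": 5.0, "P2": 6.0},
    "bg1": {"P1": 1.0, "P2": 5.0},
    "bg2": {"P1": 1.0, "P2": 5.0},
}
adj = background_adjusted(scores, "tcr_query", ["bg1", "bg2"])
best = max(adj, key=adj.get)
```

      On raw scores, P2 would be chosen. After subtracting the background means (P1: 5 - 1 = 4, P2: 6 - 5 = 1), P1 is ranked first.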

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Menjivar et al. examine the specific role of the enzyme arginase 1 (Arg1), which is expressed in immunosuppressive macrophages and catabolizes arginine to ornithine, in pancreatic cancer. They use an elegant genetic approach that leverages a dual recombinase-based genetically engineered mouse model of pancreatic cancer, which efficiently deletes Arg1 and recovers extracellular arginine in cultured macrophages. Within the pancreas, macrophage Arg1 deletion increased T cell infiltration and fewer mice developed invasive pancreatic cancer. Interestingly, when tumors did develop, the authors observed that compensatory mechanisms of arginine depletion were induced, including Arg1 overexpression in epithelial cells identified as tuft cells or Arg2 overexpression in macrophages. To overcome these compensatory mechanisms, pharmacological targeting of arginase was tested and found to increase T cell infiltration and sensitize to immune checkpoint blockade, suggesting this is a promising approach for pancreatic cancer.

      Strengths:

      This is a very rigorous, well-designed study and the findings are broadly interesting for the metabolism, immunometabolism, and pancreatic cancer communities. The methods are comprehensive and the experimental details in the legends are complete.

      Weaknesses:

      The claim that Arg1 deletion in macrophages delayed the formation of invasive disease is not completely justified by the data presented. Only a small number of mice are analyzed, and no statistics are included.

      While in the original submission this claim was based on a relatively small number of animals, we have now increased the size of each cohort. The new graph is included in Figure 2E (Response Figure 1); statistical analysis is also included and shows the differences to be significant.

      Moreover, the abstract does not comprehensively summarize the findings. Many findings, including compensatory upregulation of ARG1 in tuft cells and ARG2 in myeloid cells, are not mentioned, nor is the rationale for the pharmacological approach. Finally, the claim that their data demonstrate that Arg1 is more than simply a marker of macrophage function is an overstatement of the findings. While this is the first time this has been examined in pancreatic cancer, a general role for Arg1 and arginine metabolism by myeloid cells in immunosuppression has already been established by multiple studies, including those cited by the authors, in multiple tumor types.

      We apologize for the lack of clarity, which resulted from our attempt to meet the word limit for the abstract. We have now amended the abstract to better reflect the full scope of our findings and the context of our work.

    1. Author Response

      Reviewer #3 (Public Review):

      Yamada et al. utilize the full strength of Drosophila neural circuit approaches to investigate second-order conditioning. The new insights into the mechanisms by which a learned cue can act as reinforcement are relevant beyond the fly field and have the potential to spark broad interest. The main conclusions of the authors are justified and the experiments, to my understanding, are well done.

      Some minor aspects must be addressed. To avoid misunderstandings, a clear distinction should be made between experiments using real-world sugar and those using artificial activation of dopamine neurons as reward. For example, the proposed teacher-student model is mostly based on work with artificial activation.

      We split Figure 1 and made two separate figures. The new Figure 1 displays experiments with only real sugar or optogenetic activation of sugar receptor neurons (new data), whereas the new Figure 2 shows mostly experiments with direct DAN activations. This new figure arrangement makes a clear distinction between experiments with sugar and DAN activation, and allows readers to compare them more easily. We also modified the second paragraph of the discussion to clarify this point.

      To emphasize the generality of the model, it might help to provide further evidence using real-world sugar, especially since the only known sugar-reward-driven plasticity is reported in the student (γ5β′2a) compartment but not in the teacher compartments. Along these lines, it would be useful to extend the functional interference used during the sugar experiments beyond the α1 compartment.

      In response to the reviewer’s comment, we added new data in Figure 2D to show that blocking PAM-DANs in γ4, γ5 and β′2a compartments impairs second-order conditioning following odor-sugar first-order conditioning. In contrast to blocking α1 DANs, blocking those non-α1 PAM-DANs did not impair one-day first-order memory (Figure 2D), which further strengthens our model of differential requirement of compartments for first-order and second-order memory formation.

      We think that transient blocks of individual DAN cell types during second-order conditioning after odor-sugar conditioning will be informative for mapping second-order memories to specific compartments in naturalistic settings. For the reasons detailed above, however, we will need to develop a new method of transient perturbation for that.

      We would also point out that, to our knowledge, sugar-reward-driven plasticity has not been fully demonstrated in MBON-γ5β′2a. Owald et al., 2015 Neuron (10.1016/j.neuron.2015.03.025) showed a reduced CS+ odor response after odor-sugar conditioning in MBON-β′2mp (their Fig 3). However, they could not investigate the plasticity of MBON-γ5β′2a because the magnitude of odor response was too low (their Figure S3).

      Further, general statements about the compartments, for example γ5 and α1, might need adjustment, since the tools used (the respective driver lines) often do not label all dopamine neurons in a given compartment. In fact, functional heterogeneity among dopamine neurons innervating the γ5 compartment has recently been established (sugar reward, extinction) and might apply here.

      To clarify the point that we are manipulating a subset of DANs in each compartment, we added “cell count” information in Figure 2A. Also, we made Figure 4-figure supplement 2 to show which subtypes of DANs are connected with SMP108.

      Lastly, I would like to recommend that the authors discuss alternative feedback pathways that might serve similar or parallel functions.

      Despite these minor points, the study is impressive.

      Figure 4C shows several candidate interneurons that may have similar functions as SMP108. For instance, CRE011 may acquire enhanced response to reward-predicting odor as an outcome of reduced inhibition from MBON-γ5β′2a, and send excitatory inputs to DANs.

      In Figure 4-figure supplement 3, we made additional scatter plots to visualize other outlier cell types in terms of their connectivity with MBONs and DANs.

    1. Author Response:

      We thank the three reviewers for their thoughtful comments and constructive critique.

      Reviewer #1 (Public Review):

      Hu et al. present findings that extend our understanding of the cellular and synaptic basis of fast network oscillations in the sensory cortex. They developed an ex vivo model system to study the synaptic mechanisms of ultrafast (>400 Hz) network oscillations ("ripplets") elicited in layer 4 (L4) of the barrel cortex in mouse brain slices by optogenetically activating thalamocortical axon terminals in L4, which mimics the thalamic transmission of somatosensory information to the cortex. This model allowed them to reproduce extracellular ripplet oscillations in the slice preparation and to investigate the temporal relationship between these ripplets and the cellular and synaptic responses of fast-spiking (FS) inhibitory interneurons and regular-spiking (RS) cells to common excitatory inputs. FS cells fire precisely timed spike bursts at ripplet frequency, and these spikes are highly synchronized with those of neighboring FS cells. Moreover, the responses of FS and RS cells to thalamocortical activation are phase-locked to the ripplets, albeit at different phases, and closely coincide with EPSCs in RS cells. This suggests that common excitatory inputs to FS and RS cells, and their synaptic connectivity, are essential for generating reverberating network activity in the form of ripplet oscillations. Additionally, they show that FS cell spiking in layer 5 (L5) is reduced in slices with a cut between L4 and L5, suggesting that recurrent excitation from L4 excitatory cells, induced by thalamocortical optogenetic stimulation, is necessary to drive FS spike bursts in L5.

      Overall, this study helps extend our knowledge of the synaptic mechanisms of ultrafast oscillations in the sensory cortex. However, it would have been nice if the authors had used a broader range of methodologies and model systems.

      Although the overall findings are interesting, the conclusions of the study could be strengthened by addressing the following points:

      1. The authors investigate the temporal relationship between ripplets and the responses of FS and RS cells elicited by optogenetic activation of TC axon terminals, mainly on the basis of phase-locked FS and RS responses to local ripplet oscillations. They also show highly synchronized FS-FS firing and exclude contributions of electrical gap junctions and inhibitory synaptic connections to this synchrony. Based on these findings, the authors suggest that common excitatory inputs to FS and RS cells in L4 are essential for generating these local ripplets. However, the logical flow linking this conclusion to their other findings on phase-locked FS and RS responses during L4 ripplet oscillations is hard to follow.

      We understand the reviewer’s issue with the logical flow of our argument. We will address this concern by textual changes and/or by rearranging the order of the presentation and figures.

      2. The authors suggest that optogenetic activation of TC axon terminals elicits local ripplet oscillations via synchronized spike bursts of FS inhibitory interneurons and alternating EPSCs and IPSCs in RS cells, phase-locked to ripplets in L4 of the barrel cortex, and that these are generated by common excitatory inputs from the local circuit arriving at these cells at ripplet frequency. They therefore set out to identify the source of these excitatory inputs in the local L4 network by suppressing the firing of L4 RS cells. However, due to technical limitations of their experimental setup, as described in the manuscript, they show FS spike bursts in L5B instead of L4. Although L5 FS spike bursts decrease after cutting the L4/L5 boundary, which presumably removes excitatory input from L4 (as depicted in Fig 6D of the manuscript), this interpretation of the data seems overextended, because L5 activity does not necessarily represent the cellular and synaptic activities that are phase-locked with the ripplets observed in L4.

      We have not studied network oscillations in layer 5 at the same level of detail as in layer 4; however, the oscillations in both layers are phase-locked. We will show this as supplemental data in the revised manuscript.

      3. The authors propose a circuit model. We recommend that the authors perform an in silico analysis of this model, exploring the effect of thalamocortical axons on fast-spiking and regular-spiking neurons, to support their circuit model.

      We agree that a computational model of the layer 4 network, demonstrating ripplets in silico, would enhance our understanding of this re-discovered ultrafast oscillation. Moreover, such a model would also help constrain the allowable parameter space of other, existing models of layer 4 or of the complete cortical column, as the ability of these existing models to recreate ripplets in response to strong, synchronous thalamocortical activation could now be used as a stringent test of the assumptions underlying these models. We hope to reproduce ripplets in silico, within an experimentally constrained parameter space, in a study in the near future.

      Reviewer #2 (Public Review):

      This manuscript studies potential cellular mechanisms that generate ultrafast oscillations (250-600 Hz) in the cortex. These oscillations correlate with sensory stimulation and might contribute to the perception of relevant sensory inputs. The authors combined ex-vivo whole-cell patch-clamp recordings, local field potential (LFP) recordings, and optogenetic stimulation of thalamocortical afferents. In a technical tour de force, they recorded pairs of fast-spiking (FS)-FS and FS-regular-spiking (RS) neurons in the cortex and correlated their activity with the LFP signal.

      Optogenetic activation of thalamic afferents generated ripple-like extracellular waveforms in the cortex, which the authors referred to as ripplets. The timing of the peaks and troughs within these ripplets was consistent across slices and animals. Activation of thalamic inputs induced precisely timed FS spike bursts and RS spikes, which were phase-locked to the ripplet oscillation. The authors described the sequences of RS and FS neuron discharge and how they phase-locked to the ripplet, providing a model for the cellular mechanism generating the ripplet.
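
      Phase locking of spikes to an oscillation, as described here, is commonly quantified by the vector strength (mean resultant length) of spike phases read from the Hilbert-transformed LFP. The sketch below is a generic illustration on synthetic data, not the analysis pipeline of Hu et al.; the 500 Hz oscillation, 20 kHz sampling rate, 24 ms window, and 0.1 ms spike jitter are assumptions chosen to resemble ripplet time scales.

```python
import numpy as np
from scipy.signal import hilbert

def spike_phases(lfp, spike_idx):
    """LFP phase (radians; 0 = peak, +/-pi = trough) at each spike sample."""
    phase = np.angle(hilbert(lfp))
    return phase[spike_idx]

def vector_strength(phases):
    """Mean resultant length: 1 = perfect phase locking, near 0 = none."""
    return float(np.abs(np.mean(np.exp(1j * phases))))

fs = 20_000.0                        # sampling rate in Hz (assumed)
f_osc = 500.0                        # oscillation frequency in Hz (assumed)
n = 480                              # 24 ms window = exactly 12 cycles at 500 Hz
t = np.arange(n) / fs
lfp = np.cos(2 * np.pi * f_osc * t)  # idealized, noise-free "ripplet"

rng = np.random.default_rng(1)
# One spike per cycle near the LFP trough, with 0.1 ms Gaussian timing jitter
trough_times = (np.arange(12) + 0.5) / f_osc
locked_times = trough_times + rng.normal(0.0, 1e-4, trough_times.size)
locked_idx = np.clip(np.round(locked_times * fs).astype(int), 0, n - 1)

# Control: spikes at random samples, unrelated to the oscillation
random_idx = rng.integers(0, n, size=200)

vs_locked = vector_strength(spike_phases(lfp, locked_idx))
vs_random = vector_strength(spike_phases(lfp, random_idx))
mean_locked_phase = float(np.angle(np.mean(np.exp(1j * spike_phases(lfp, locked_idx)))))
```

      Locked spikes give a vector strength close to 1 with a mean phase near the trough, while the random control stays near 0; a window holding a whole number of cycles is used so the FFT-based Hilbert transform has no edge effects on this toy signal.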

      The manuscript is well-written and guides the reader step by step into the detailed analysis of the timing of ripplets and cellular discharges. The authors appropriately cite the known literature about ultrafast oscillations and carefully compare the novel ripplets to the well-known hippocampal ripples. The methods used (ex-vivo patch-clamp and LFP) were appropriate to study the cellular mechanisms underlying the ripplets.

      Overall, this manuscript develops means for studying the role of cortical ultrafast oscillations and proposes a coherent model for the cellular mechanism underlying these cortical ultrafast oscillations.

      We thank the reviewer for his supportive comments.

      Reviewer #3 (Public Review):

      In this study, Hu et al. aimed to identify the neuronal basis of ultrafast network oscillations in S1 layers 4 and 5 evoked by optogenetic activation of thalamocortical afferents in vitro. Although earlier in vivo demonstrations of this short-lived (~25 ms) oscillation are sparse and its significance in detecting salient stimuli is not known, the available publications clearly show that the phenomenon is consistently present in the sensory systems of several species, including humans.

      In this study, which uses optogenetic activation of thalamocortical (TC) fibers as a proxy for a strong sensory stimulus, the in vitro model accurately captures the in vivo phenomenon. The authors measure the features of oscillatory LFP signals together with the intracellular activity of fast-spiking (FS) interneurons in layers 4 and 5, as well as of layer 4 regular-spiking (RS) cells. They accurately measure the coherence of intra- and extracellular activity and convincingly demonstrate the synchronous firing of FS cells and the antiphase firing of RS and FS cells relative to the field oscillation.

      Major points:

      1) The authors conclude that the FS cell network has a primary role in setting the frequency of the oscillation. While these data are highly plausible and entirely consistent with the literature, only correlational, not causal, results are shown; thus, a direct demonstration of the critical role of GABAergic mechanisms is missing.

      We find that blocking fast inhibition (by puffing a gabazine solution locally) converts ripplets into long-duration paroxysmal events with high-frequency firing of both RS and FS cells. While we do not think that this experiment is diagnostic in distinguishing between competing models (in all models fast inhibition is a necessary component), we will add these experiments as supplemental material.

      2) The authors put a strong emphasis on the role of RS-RS interactions in maintaining the oscillation once it has been launched by TC activity. A direct demonstration of this, however, is not presented. The alternative scenario is that TC excitation provides a tonic excitatory background drive (or envelope) for interacting FS cells, which then impose ultrafast, synchronized IPSPs on RS cells. As with the RS-RS drive, in this scenario RS cells can fire only in the "windows of opportunity", which explains their antiphase activity relative to FS cells, but RS cells themselves do not participate in maintaining the oscillation. Distinguishing between these two scenarios is critical for assessing the potential impact of the ultrafast oscillation on sensory transmission. If TC inputs are critical, the magnitude of thalamic activity will set the threshold for the oscillation; if RS-RS interactions are important, intracortical processing will build up the activity in a graded manner.

      Earlier theoretical studies (e.g., Brunel and Wang, 2003; Geisler et al., 2005) strongly suggested that even for the much slower hippocampal ripples (below 200 Hz), phasic activation of local excitatory cells cannot operate at these frequencies. Indeed, the rise time, propagation, and integration of EPSPs likely cannot take place in the millisecond (or submillisecond) range required for efficient RS-RS interactions. The alternative scenario (tonic excitatory background coupled with FS-FS interactions), on the other hand, has been clearly demonstrated for CA3 ripples in the hippocampus (Schlingloff et al., 2014, J. Neurosci.).

      The Schlingloff et al. study is important, and we actually think that their results, and many of their conclusions, are consistent with our own. We agree with these authors that “…PV cells are essential for the initiation and maintenance of sharp waves and the generation of ripple oscillations”, that “…perisomatic inhibition enforces ripple synchrony by phase-locking firing during SWRs”, and also that “…neuronal coupling via gap junctions is not essential in ripple synchronization”. We also agree that “The tonic excitatory ‘envelope’ arising from the buildup of activity of PCs drives the firing of PV cells”, as far as initiation of ripples in CA3 is concerned. In our model system, thalamocortical excitation serves the same role of initiating the oscillation. However, I do not see how the data of Schlingloff et al. support the conclusion that (in the legend to their Fig. 11) “…there is no cycle-by-cycle reciprocal interaction between the PCs and the PV [interneurons]”, or the implication that FS cells function as independent pacemakers “…because of their reciprocal inhibition”, as their FINO model suggests. The Schlingloff et al. data clearly show cycle-by-cycle alternations of EPSCs and IPSCs (their Fig. 1C, D, as well as their Fig. 7B), as we show in our Fig. 5A. These phasic EPSCs, occurring at ripple frequency, by necessity originate from pyramidal cells synchronized (as a population) to the ripple oscillation, as indeed shown in their multi-unit recordings. This precise, phasic (and clearly not “tonic”) excitatory drive cannot be uncoupled from the ripple (or ripplet) oscillation, and therefore cannot be dismissed as a factor driving the oscillation.

      The strongest evidence the Schlingloff et al. study provides that FS cells synchronize independently of excitatory cells, and then impose this oscillation on the excitatory cells, is their demonstration of ripples generated by prolonged direct optogenetic stimulation of PV cells in the presence of glutamatergic blockers (their Fig. 6). However, this manipulation worked only in some of their slices, and the oscillations lasted only as long as the light stimulus and were therefore exogenously driven rather than network driven. They do not show intracellular responses from either inhibitory or excitatory cells, nor multi-unit activity, during this manipulation, so it is difficult to know whether excitatory cells were indeed entrained to the same frequency, as the FINO model posits. Nevertheless, this is a very interesting experiment, which we plan to attempt in our own model system in a future study.

      When the properties of the ultrafast oscillation were tested at various stimulation strengths (Figure 2), weaker stimulation resulted in less precise timing. If TC input is indeed required only to launch the oscillation and not to maintain it, this is not expected, since once a critical number of RS cells are involved in starting the activity, their rhythmicity should no longer depend on the magnitude of the initial input. On the other hand, if the entire transient oscillation depends on TC excitation, weaker input would result in less precise firing.

      Our interpretation of the lower spike precision with weaker optogenetic stimulation is that fewer FS cells fired upon the initial thalamocortical volley, and therefore a weaker IPSP wavefront was propagated to RS cells, allowing for a wider “window of opportunity” for RS firing, and this loss of synchrony then propagated from cycle to cycle. This interpretation will be added in the revised manuscript.

      3) The experiments indicating the spread of phasic activity from L4 RS to L5 FS cells cannot be accepted as fully conclusive. The horizontal cut severed not only the L4 RS to L5 FS connections but also many TC inputs to the L5 FS apical dendrites, as well as the axons of L4 FS cells to L5 FS cells, both of which could be pivotal in the translaminar spread.

      FS cells do not have apical dendrites, so we assume the reviewer meant to say “L5 RS apical dendrites”; however, if the cut reduced the excitability of L5 RS cells, that only strengthens our conclusion that RS firing is required for maintaining the oscillation. While the cut could also have disrupted L4 FS to L5 FS connections, we are not aware of any evidence that such inter-laminar connections exist. On the other hand, the Pluta et al. 2015 study shows very robust excitatory connections between L4 RS and L5 FS cells.

      Having said that, we agree with the reviewer (indeed, with all three reviewers) that the L4/L5 cut experiments are not conclusive, and we will make this clear in our discussion in the revised manuscript. We plan to do a more conclusive test of our model by using a transgenic line to express inhibitory opsins specifically in L4. This will require expressing ChR2 in the thalamus by virus injection and a careful comparison of ripplets between the two models; we therefore reserve these experiments to a future study.

    1. Author Response

      Reviewer #2 (Public Review):

      "The cellular architecture of memory modules in Drosophila supports stochastic input integration" is a classical biophysical compartmental modelling study. It takes advantage of some simple current injection protocols in a massively complex mushroom body neuron called MBON-α3 and compartmental models that simulate the electrophysiological behaviour given a detailed description of the anatomical extent of its neurites.

      This work is interesting in a number of ways:

      • The input structure information comes from EM data (Kenyon cells), although this is not discussed much in the paper.

      • The paper predicts a potentially novel normalization of the throughput of KC inputs at the level of the proximal dendrite and soma.

      • It claims a new computational principle in dendrites; this did not become very clear to me.

      Problems I see:

      • The current injections did not last long enough to reach steady state (e.g. Figure 1FG), and the model current injection traces have two time constants but the data only one (Figure 2DF). This does not make me very confident in the results and conclusions.

      These are two important but separate questions that we would like to address in turn.

      As for the first, in our new recordings using cytoplasmic GFP to identify MBON-alpha3, we performed both 200 ms current injections and prolonged 400 ms recordings to reach steady state (for all 4 new cells 1’-4’). For comparison with the original dataset, we mainly present the raw traces of the 200 ms recordings in Figure 1 Supplement 2. In addition, we now provide a direct comparison of these recordings (200 ms versus 400 ms) and did not observe significant differences in tau between them (Figure 1 Supplement 2 K). This comparison illustrates that the 200 ms current injection reaches a maximum voltage deflection that is close to the steady-state level of the prolonged protocol. Importantly, the critical parameter (tau) did not change between these datasets.
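
      As a back-of-the-envelope check, assume an idealized single-exponential charging curve with the Table 1 time constant (τ ≈ 16 ms). The fraction of the steady-state deflection reached by the end of a 200 ms step is then:

```latex
\frac{V(200\,\mathrm{ms})}{V_\infty} = 1 - e^{-200/16} = 1 - e^{-12.5} \approx 1 - 3.7 \times 10^{-6}
```

      Any slower component not captured by a single τ would make this estimate optimistic, which is why the direct comparison with the 400 ms protocol is the more informative test.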

      Regarding the second question, the two different time constants, we thank the reviewer for pointing this out. Indeed, while the simulated voltage follows an approximately exponential decay which is, by design, essentially identical to the measured value (τ ≈ 16 ms, from Table 1; see Figure 1 Supplement 2 for details), the voltage decays and rises much faster immediately following the onset and offset of the current injections. We believe that this is due to the morphology of this neuron. Current injection and voltage recording take place at the soma, which is connected to the remainder of the neuron by a long, thin neurite. This ’remainder’ is, of course, much larger than the soma in linear size, volume, and surface (membrane) area; see Fig 2A. As a result, a current injection first quickly charges the membrane of the soma, resulting in the initial fast voltage changes seen in Fig 2D,F, before the membrane in the remainder of the cell is charged with the cell’s time constant τ.

      We confirmed this intuition by running various simplified simulations in NEURON, which indeed show a much more rapid change at step changes in injected current than over the long term. Indeed, we found that the pattern appears even in the simplest possible two-compartment version of the neuron’s equivalent circuit, which we solved in an all-purpose numerical simulator of electrical circuitry (https://www.falstad.com/circuit). The circuit is shown in Figure 1. We chose rather generic values for the circuit components, with the constraints that the cell capacitance, chosen as 15 pF, and membrane resistance, chosen as 1 GΩ, are in the range of the observed data (as, consequently, is its time constant, which is 15 ms with these choices); see Table 1 of the manuscript. We chose the capacitance of the soma as 1.5 pF, making the time constant of the soma (1.5 ms) an order of magnitude shorter than that of the cell.

      Figure 1: Simplified circuit of a small soma (left parallel RC circuit) and the much larger remainder of a cell (right parallel RC circuit) connected by a neurite (right 100 MΩ resistor). A current source (far left) injects constant current into the soma through the left 100 MΩ resistor.

      Figure 2 shows the somatic voltage in this circuit (i.e., at the upper terminal of the 1.5 pF capacitor) while a -10 pA current is injected for about 4.5 ms, after which the current is set back to zero. The combination of an initial rapid change followed by a gradual change with a time constant of ≈ 15 ms is visible at both onset and offset of the current injection. Figure 3 shows the voltage traces plotted for a duration of approximately one time constant, and Fig 4 shows the detailed shape right after current onset.

      Figure 2: Somatic voltage in the circuit in Fig. 1 with current injection for about 4.5 ms, followed by zero current injection for another ≈ 3.5 ms.

      Figure 3: Somatic voltage in the circuit, as in Fig. 2 but with current injected for approx. 15 ms.

      While we did not try to quantitatively assess the deviation from a single-exponential shape of the voltage in Fig. 2E, a more rapid change at the onset and offset of the current injection is clearly visible in this figure. This deviation from a single exponential is smaller than what we see in the simulation (both in Fig 2D of the manuscript and in the results of the simplified circuit in this rebuttal). We believe that the effect is smaller in Fig. 2E because it shows the average over many traces. It is much more visible in the ’raw’ (not averaged) traces. Two randomly selected traces from the first of the recorded neurons are shown in Figure 2 Supplement 2 C. While the non-averaged traces are plagued by artifacts and noise, the rapid voltage changes are visible at essentially all onsets and offsets of the current injection.

      Figure 4: Somatic voltage in the circuit, as in Fig. 2 but showing only the time right after current onset, about 2.3 ms.
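
      The two-compartment circuit of Figure 1 in this response can also be integrated numerically in a few lines. This is a minimal forward-Euler sketch, not the Falstad or NEURON setup used for the figures. The somatic membrane resistance of 1 GΩ is our assumption (the value implied by the stated 1.5 ms somatic time constant with 1.5 pF), the remainder uses the stated 15 pF and 1 GΩ, and the series resistor in front of the current source is omitted because it does not change the current delivered by an ideal current source.

```python
import numpy as np

def simulate_two_compartment(i_inj=-10e-12, t_on=15e-3, t_total=30e-3, dt=1e-6,
                             c_soma=1.5e-12, r_soma=1e9,
                             c_rest=15e-12, r_rest=1e9, r_neurite=100e6):
    """Forward-Euler integration of a soma RC compartment coupled through a
    neurite resistor to a larger 'remainder' RC compartment.

    Voltages are relative to rest (V). A constant current i_inj (A) is injected
    into the soma for 0 <= t < t_on and set to zero afterwards.
    """
    n = int(round(t_total / dt))
    t = np.arange(n) * dt
    v_s = np.zeros(n)   # somatic voltage (where the electrode records)
    v_r = np.zeros(n)   # voltage of the remainder of the cell
    for k in range(n - 1):
        i = i_inj if t[k] < t_on else 0.0
        i_axial = (v_s[k] - v_r[k]) / r_neurite          # soma -> remainder
        dv_s = (i - v_s[k] / r_soma - i_axial) / c_soma
        dv_r = (i_axial - v_r[k] / r_rest) / c_rest
        v_s[k + 1] = v_s[k] + dt * dv_s
        v_r[k + 1] = v_r[k] + dt * dv_r
    return t, v_s, v_r

t, v_s, v_r = simulate_two_compartment()
i_end = int(round(15e-3 / 1e-6)) - 1   # last sample before current offset
v_end = v_s[i_end]
early = v_s[500] / v_end               # fraction of the end-of-step deflection at 0.5 ms
```

      With these assumed values the somatic voltage covers more than a fifth of its end-of-step deflection within the first 0.5 ms, whereas a single exponential with τ ≈ 15 ms would cover only about 5% of it over the same interval; plotting `v_s` against `t` shows the same fast-then-slow shape at current offset.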

      We have added a short discussion of this at the end of Section 2.3 to briefly point out this observation and its explanation. We there also refer to the simplified circuit simulation and comparison with raw voltage traces which is now shown in the new Figure 2 Supplement 2.

      • The time constant in Table 1 is much shorter than in Figure 1FG?

      No, these values are in agreement. To facilitate the comparison we now include a graphical measurement of tau from our traces in Figure 1 Supplement 2 J.
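
      One common convention for a graphical estimate of τ is the time at which the voltage deflection reaches 1 - 1/e (about 63.2%) of its steady-state value. The response does not spell out the exact graphical procedure used in Figure 1 Supplement 2 J, so the sketch below is only an illustration on an assumed noise-free, single-exponential trace (-5 mV, τ = 16 ms); a fast initial somatic component would shorten such an estimate.

```python
import numpy as np

def tau_63(t, v, v_baseline=0.0):
    """Time (s) for the deflection from v_baseline to first reach 1 - 1/e
    (about 63.2%) of its final value, taken as the last sample.
    Assumes single-exponential charging that has reached steady state."""
    deflection = np.asarray(v) - v_baseline
    target = (1.0 - np.exp(-1.0)) * abs(deflection[-1])
    idx = int(np.argmax(np.abs(deflection) >= target))  # first crossing
    return float(t[idx])

# Synthetic trace (assumed values): -5 mV step response with tau = 16 ms
t = np.arange(0.0, 0.4, 1e-5)
v = -5e-3 * (1.0 - np.exp(-t / 16e-3))
tau_est = tau_63(t, v)
```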

      • Related to this, the capacitance values are very low; maybe this can be explained by the model’s wrong assumption of tau?

      Indeed, the measured time constants are somewhat lower than what might be expected. We believe that this is because after a step change of the injected current, an initial rapid voltage change occurs in the soma, where the recordings are taken. The measured time constant is a combination of the ’actual’ time constant of the cell and the ’somatic’ (very short) time constant of the soma. Please see our explanations above.

      Importantly, the value for tau from Table 1 is not used explicitly in the model, as the parameters used in our simulation are determined by optimal fits of the simulated voltage curves to experimentally obtained data.

      • This, in turn, could be due to space-clamp issues in this hugely complex cell, or to bad model predictions caused by incomplete reconstructions, a bad match between morphology and electrophysiology (both are from different datasets?), or unknown ion channels that produce non-linear behaviour during the current injections.

      Please see our detailed discussion above. Furthermore, we now provide additional recordings using cytoplasmic GFP as a marker for the identification of MBON-alpha3, which confirm our findings. We agree that space-clamp issues could interfere with our recordings in such a complex cell. However, our approach using electrophysiological data should still be superior to any other approach (such as picking textbook values). As we injected negative currents for our analysis, at least voltage-gated ion channels should not influence our recordings.

      • The PRAXIS method in NEURON seems too ad hoc. Passive properties of a neuron should probably rather be explored in parameter scans.

      We are a bit at a loss as to what is meant by the PRAXIS method being "too ad hoc." The PRAXIS method (Brent's principal-axis method) is essentially a conjugate-direction optimization algorithm: since no explicit derivatives are available, it assumes that the objective function is quadratic. This seems to us a systematic way of doing a parameter scan, and the procedure has been used in other related models, e.g., the cited Gouwens & Wilson (2009) study.
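
      To make the contrast between a derivative-free local optimizer and an explicit parameter scan concrete, the sketch below fits a synthetic charging curve both ways. PRAXIS is a refinement of Powell's conjugate-direction method; SciPy does not provide PRAXIS, so `scipy.optimize.minimize(method="Powell")` is used here as a stand-in. The amplitude (-5 mV), τ (16 ms), noise level, bounds, and grid are assumptions, not the manuscript's data or its NEURON fitting setup.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic "recording" (assumed values): single-exponential charging to -5 mV
# with tau = 16 ms, plus 0.1 mV Gaussian noise, sampled over 200 ms.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 0.2, 2001)
v_meas_mv = 1e3 * (-5e-3 * (1.0 - np.exp(-t / 16e-3)) + rng.normal(0.0, 1e-4, t.size))

def sse_mv2(params):
    """Sum of squared errors in mV^2; params = (v_inf [mV], tau [ms])."""
    v_inf_mv, tau_ms = params
    model = v_inf_mv * (1.0 - np.exp(-t / (tau_ms * 1e-3)))
    return float(np.sum((model - v_meas_mv) ** 2))

# (1) Derivative-free conjugate-direction search (Powell, stand-in for PRAXIS)
res = minimize(sse_mv2, x0=[-2.0, 10.0], method="Powell",
               bounds=[(-20.0, 0.0), (0.5, 100.0)])
v_inf_powell, tau_powell = res.x

# (2) Brute-force scan over tau; for each tau the best v_inf follows from
#     linear least squares, so only one parameter has to be gridded.
best_err, tau_scan, v_inf_scan = np.inf, None, None
for tau_ms in np.arange(0.5, 100.0, 0.05):
    basis = 1.0 - np.exp(-t / (tau_ms * 1e-3))
    v_inf = float(basis @ v_meas_mv / (basis @ basis))
    err = float(np.sum((v_inf * basis - v_meas_mv) ** 2))
    if err < best_err:
        best_err, tau_scan, v_inf_scan = err, float(tau_ms), v_inf
```

      On a smooth, single-minimum objective like this one the two estimates should agree closely. A grid scan grows expensive as the number of passive parameters increases, which is one reason local optimizers are commonly used, but a scan can also reveal multiple minima that a local search started from one point might miss.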

      Questions I have:

      • Computational aspects were previously addressed by e.g. Larry Abbott and Gilles Laurent (sparse coding); how do the findings here distinguish themselves from this work?

      In contrast to the work by Abbott and Laurent, which addressed the principal relevance and suitability of sparse and random coding for encoding sensory information in decision making, here we address the cellular and computational role that an individual node (KC>MBON) plays within the circuitry. As we use functionally and morphologically relevant data, this study builds upon the prior work but significantly extends the general models to a specific case. We think this is essential for the further exploration of the topic.

      • What is valence information?

      Valence information indicates whether a stimulus is good or bad (positive valence, e.g., sugar in appetitive memory paradigms; negative valence, e.g., the electric shock in aversive olfactory conditioning). Valence information is provided by the dopaminergic system. Dopaminergic neurons are in direct contact with the KC>MBON circuitry and modify its synaptic connectivity when olfactory information is paired with a positive or negative stimulus.

      • It seems that Martin Nawrot’s work would be relevant to this work

      We are aware of the work by the Nawrot group that provided important insights into the processing of information within the olfactory mushroom body circuitry. We now highlight some of his work. His recent work will certainly be relevant for our future studies when we try to extend our work from an individual cell to networks.

      • Compactification and democratization could be related to other work like Otopalik et al 2017 eLife but also passive normalization. The equal efficiency in line 427 reminds me of dendritic/synaptic democracy and dendritic constancy

      Many thanks for pointing this out. This is in line with the comments from reviewer 1 and we now highlight these papers in the relevant paragraph in the discussion (line 442ff).

      • The morphology does not obviously seem compact, how unusual would it be that such a complex dendrite is so compact?

      We should have been more careful in our terminology, making clear that when we write ’compact’ we always mean ’electrotonically compact’, in the sense that the physical dimensions of the neuron are small compared to its characteristic electrotonic length (usually called λ). The degree to which a dendritic structure is electrotonically compact is determined by the interaction of morphology, size, and conductances (across the membrane and along the neurites). We don’t believe that one of these factors alone (e.g. morphology) is sufficient to characterize the electrical properties of a dendritic tree. We have now clarified this in the relevant section.
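
      The notion of electrotonic compactness can be made concrete with standard passive cable theory. For a uniform cylinder, λ = sqrt(R_m d / (4 R_i)), where R_m is the specific membrane resistance, R_i the axial resistivity, and d the diameter; a neurite of physical length ℓ has electrotonic length L = ℓ/λ, and with a sealed end the steady-state voltage at the far end is 1/cosh(L) of the voltage at the near end. The numbers below are illustrative assumptions, not MBON-α3 measurements, and a branched tree is not a single cylinder, so this only shows how the quantities relate.

```python
import math

def length_constant_um(r_m_ohm_cm2, r_i_ohm_cm, diameter_um):
    """Steady-state length constant of a uniform cylindrical neurite, in um:
    lambda = sqrt(R_m * d / (4 * R_i)), evaluated in cm and converted to um."""
    d_cm = diameter_um * 1e-4
    lam_cm = math.sqrt(r_m_ohm_cm2 * d_cm / (4.0 * r_i_ohm_cm))
    return lam_cm * 1e4

def sealed_end_attenuation(length_um, lam_um):
    """DC voltage ratio V(far end) / V(near end) for a sealed-end cable: 1/cosh(L)."""
    return 1.0 / math.cosh(length_um / lam_um)

# Hypothetical values: R_m = 20,000 ohm*cm^2, R_i = 100 ohm*cm, d = 1 um, l = 100 um
lam = length_constant_um(20_000.0, 100.0, 1.0)
L = 100.0 / lam
att = sealed_end_attenuation(100.0, lam)
```

      Here λ is about 707 µm, so a 100 µm branch has L ≈ 0.14 and loses only about 1% of a steady-state voltage along its length, which is what "electrotonically compact" means in practice.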

      • What were the advantages of using the EM circuit?

      The purpose of our study is to provide a "realistic" model of a KC>MBON node within the memory circuitry. We started our simulations with random synaptic locations but wondered whether such a stochastic model is correct, or whether taking into account the detailed locations and numbers of synaptic connections of individual KCs would make a difference to the computation. Therefore we repeated the simulations using the EM data. We now compare random and realistic synaptic connectivity in Figure 4F. We do not observe a significant difference, but this may become more relevant in future studies if we compute the interplay between MBONs activated by overlapping sets of KCs. We simply think that utilizing the EM data gets us one step closer to realistic models.

      • Isn’t Fig 4E rather trivial if the cell is compact?

      We believe this figure is a visually striking illustration of how electrotonically compact the cell is. Such a finding may seem trivial in retrospect, once the data are visualized, but we believe it provides a very intuitive description of the cell's behavior.

      Overall, I am worried that the passive modelling study of the MBON-α3 does not provide enough evidence to explain the electrophysiological behaviour of the cell and to make accurate predictions of the cell’s responses to a variety of stochastic KC inputs.

      In our view, our model adequately describes the behavior of the MBON with the most minimal (passive) model. Our approach makes as few assumptions as possible about the electrophysiological properties of the cell. Based on current knowledge, we think this is the best possible approach, as no active components within the dendritic or axonal compartments of Drosophila MBONs have been described thus far. As such, our model reflects the current state of knowledge and explains the behavior of the cell very well. We aim to refine this model in the future if experimental evidence requires such adaptations.

      Reviewer #3 (Public Review):

      This manuscript presents an analysis of the cellular integration properties of a specific mushroom body output neuron, MBON-α3, using a combination of patch clamp recordings and data from electron microscopy. The study demonstrates that the neuron is electrotonically compact permitting linear integration of synaptic input from Kenyon cells that represent odor identity.

      Strengths of the manuscript:

      1) The study integrates morphological data about MBON-α3 along with parameters derived from electrophysiological measurements to build a detailed model.

      2) The modeling provides support for existing models of how olfactory memory is related to integration at the MBON.

      Weaknesses of the manuscript:

      The study does not provide experimental validation of the results of the computational model.

      The goal of our study is to use computational approaches to provide insights into the computation performed by the MBON as part of the olfactory memory circuitry. Our data are in agreement with the current model of the circuitry. Our study therefore forms the basis for future experimental studies, which would, however, go beyond the scope of the current work.

      The conclusion of the modeling analysis is that the neuron integrates synaptic inputs almost completely linearly. All the subsequent analyses are straightforward consequences of this result.

      We do, indeed, find that synaptic integration in this neuron is almost completely linear. We demonstrate in a variety of ways that this result holds, and all analyses in the study serve this purpose. These results are in line with the findings of Hige and Turner (2013), who demonstrated that synaptic integration at PN>KC synapses is also highly linear. Our data therefore point to the conservation of this feature at the next node of this circuit.

      The manuscript does not provide much explanation or intuition as to why this linear conclusion holds.

      We respectfully disagree. We demonstrate that this linear integration results from the combination of the cell's size and its biophysical parameters, mainly the conductances across and along the neurites. As to why it holds, our main argument is that results based on the linear model agree with all empirical results known to us, and that this is the simplest model.

      In general, there is a clear takeaway here, which is that the dendritic tree of MBON-α3 in the lobes is highly electrotonically compact. The authors did not provide much explanation as to why this is, and the paper would benefit from a clearer conclusion. Furthermore, I found the results of Figures 4 and 5 rather straightforward given this previous observation. I am sceptical about whether the tiny variations in, e.g. Figs. 3I and 5F-H, are meaningful biologically.

      Please see our comment above as to why we believe the neuron is electrotonically compact: a model with this assumption agrees well with empirically observed results.

      We agree that the small variations in Fig 5F-H are likely not biologically meaningful, and we now state this more clearly in the figure legends and in the text. It is nevertheless important to show this result. It is precisely because these variations are small compared with the voltage differences between different numbers of activated KCs (Fig 5D) or different levels of activated synapses (Fig 5E) that we can conclude that a 25% change in either synaptic strength or synapse number can represent clearly distinguishable internal states, and that both changes have the same effect. Showing these data allows the reader to compare the differences that DO matter (Fig 5D,E) with those that DON’T (Fig 5F-H).
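      The equivalence of the two manipulations follows directly from near-linear summation. In the idealized sketch below (hypothetical numbers, not values from the model), the somatic depolarization is the number of active synapses times a fixed unitary amplitude, so a 25% increase in synapse number and a 25% increase in synaptic strength give the same response. In a strongly non-linear (e.g. saturating) dendrite the two would not generally be interchangeable, which is why demonstrating near-linearity matters.

```python
import math


def somatic_depolarization_mV(n_active: int, unitary_mV: float) -> float:
    """Idealized linear summation: every active synapse adds the same unitary EPSP."""
    return n_active * unitary_mV


baseline = somatic_depolarization_mV(n_active=100, unitary_mV=0.05)
more_synapses = somatic_depolarization_mV(n_active=125, unitary_mV=0.05)       # +25% number
stronger_synapses = somatic_depolarization_mV(n_active=100, unitary_mV=0.0625)  # +25% strength

print(math.isclose(more_synapses, stronger_synapses))  # True
print(round(more_synapses / baseline, 3))              # 1.25
```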

      The same applies to Fig 3I. The reviewer is entirely correct: the differences in the somatic voltage shown in Figure 3I are minuscule, less than a microvolt, and it is very unlikely that these differences have any biological meaning. The point of this figure is exactly to show this: it demonstrates quantitatively how the large voltage differences within the dendritic tree are transformed into a nearly uniform voltage at the soma. We feel that this shows very clearly the extreme "democratization" of the synaptic input.

    1. Author Response

      Public Review:

      In this article, the authors have taken up the substantial task of combing through thousands of published meta-analyses and systematic reviews, with the goal of identifying the subset that specifically seeks to measure the association between elapsed time ("lag-time") in various milestones of cancer diagnosis or treatment (e.g. time elapsed from symptom onset to first visit with a primary care physician) and cancer outcomes. Within this subset, they have identified and summarized the findings on how these lag times are related to certain cancer outcomes. For example, how much does a delay in the start of adjuvant chemotherapy after surgery for breast cancer increase the mortality rate for these patients? The overarching goal of this work is to characterize the pre-Covid-19 landscape of these relationships and thereby provide a basis for studying how much the pandemic worsened outcomes for cancer patients through treatment delays. The authors have done an excellent job in their review of systematic reviews and meta-analyses, both describing their methodology well and interpreting their findings. The immediate connection to the Covid-19 pandemic is somewhat tenuous and primarily left to the reader to determine.

      We thank Dr. Boonstra for this positive feedback regarding our detail-oriented systematic search and review process. Dr. Boonstra's main concern was the need to elaborate on how our results translate to the pandemic. We clarify the utility of contextualizing our findings within the pandemic and describe the corresponding revisions to our manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      It appears in the text that "there are key differences between the model and actual bacteria-phage systems, and the model should not be interpreted as one that will directly map onto a biological scenario". I agree with this statement. However, distancing the model from biological scenarios makes its predictions hard to validate in a real system, leaving us with no obvious way to infer how to apply its conclusions. Indeed, neither of the explicit examples given in lines 125-130 (phage-bacteria and T-cell-antigen) is quite captured by the modeling choices. I would have much preferred a specific biological system to be fixed in mind and then modeled minimally, so that there is hope of directly linking the modeling results to experiments. This is especially true given the wealth of available microbial population data, with much more being generated.

      I do believe that the model can be related to, or at least adapted for, experimental comparison, specifically once there are sufficiently many datasets measuring binding affinities between the proteins that govern the types of interactions described herein. This is starting to happen for TCR-antigen pairs (e.g. VDJdb), but this database is still far from large enough to fit a reasonable model or perform a controlled experiment. I am not aware of an equivalent database for phage binding proteins and their relevant binding rates. As the reviewer notes, the model will need to be tailored to certain particularities of T cell-pathogen, T cell-tumor, or phage-bacteria dynamics, but these adaptations are achievable and should not affect the qualitative results too much. The current model is instead a minimal model that captures essential aspects of these systems, both of which have been modeled as predator-prey populations in the literature.

      As stated, "the population fitness distribution is never able to 'settle'..." is indicative of the driven nature (driven by strong noise) of the quasi steady state as opposed to a stability that arises from the system dynamics.

      I agree with this. The steady state is a sort of “statistical” one rather than an “explicit” one. I think I have made this fairly clear in the text, but please let me know if there are any specific suggestions with respect to clarifying this point.

      Reviewer #2 (Public Review):

      This work by Martis illustrates, in a predator-prey or parasite-host eco-evolutionary context, the classical idea of bet hedging or biological insurance: where a single population would fluctuate and perhaps risk extinction, summing over multiple sub-populations with asynchronous dynamics (some going up while others go down) allows a stabler total abundance.
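      The statistical core of this insurance argument can be illustrated with a textbook formula. Assuming, for simplicity, n identical sub-populations with mean μ, standard deviation σ and a common pairwise correlation ρ, the coefficient of variation (CV) of their total is σ·sqrt(n + n(n-1)ρ)/(nμ): independent sub-populations (ρ = 0) reduce the CV by a factor √n, perfectly synchronized ones (ρ = 1) gain nothing, and anti-correlated ones (ρ < 0, some going up while others go down) do even better. This is only a sketch of the general portfolio effect, not the eco-evolutionary model of the manuscript.

```python
import math


def cv_of_total(n: int, mean: float, sd: float, rho: float = 0.0) -> float:
    """Coefficient of variation of the sum of n identical sub-populations.

    Assumes every pair of sub-populations has the same correlation rho,
    which must satisfy rho >= -1 / (n - 1) for the variance to be valid.
    """
    variance = n * sd**2 + n * (n - 1) * rho * sd**2
    if variance < 0:
        raise ValueError("rho is too negative for this n")
    return math.sqrt(variance) / (n * mean)


print(cv_of_total(1, 100.0, 50.0))              # single population: 0.5
print(cv_of_total(16, 100.0, 50.0, rho=0.0))    # independent: 0.125
print(cv_of_total(16, 100.0, 50.0, rho=1.0))    # synchronized: 0.5
print(cv_of_total(16, 100.0, 50.0, rho=-0.05))  # anti-correlated: 0.0625
```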

      Here the sub-populations are various genotypes of one predator and one prey species, fluctuations are due to their ecological interactions, their dynamics are more asynchronous when predation is more specialized (i.e. the various predator genotypes differ more in which prey types they can eat), and mutations allow the regeneration of genotypes that have gone extinct, thus ensuring that the diversity of subpopulations is not lost (corresponding to a "clonal interference" regime with multiple coexisting genotypes).

      While the general idea of bet hedging has been explored in many settings, the devil is usually in the details: for instance, sub-populations should be connected enough to allow the rescue of those going extinct, but a too strong connection would simply synchronize their temporal dynamics and lose the benefit of bet hedging. In some cases, connections between sub-populations could even be destabilizing (e.g. Turing instabilities in space).

      In a recent surge of physics-inspired many-species theories, where fluctuations arise from ecological dynamics, these details are notably starting to be understood in the case of spatial bet hedging, i.e. genetically identical subpopulations in multiple patches connected by migration (see e.g. Roy et al PLoS Comp Bio 2020 or Pierce et al PNAS 2020).

      These spatial models certainly served as inspiration and have been cited. However, there is a key difference: the spatial models rely on something akin to the “storage effect,” where (loosely speaking) strains persist by transiently living on islands with a somewhat favorable ecological context. In my model there is no such storage effect, and the persistence of the whole population relies on the chance generation, by mutation, of strains that are favorable in the current context. There is an analogy to be made with antigen escape, or more generally with “Kill-The-Winner” type dynamics. However, the dynamics in my model are more complex than this; specifically, they are “high-dimensional”, and there can be several prey “Winners” with multiple predators in pursuit. I clarify the differences between my model and spatial models in Appendix 6.

      In the non-spatial eco-evolutionary setting considered here, the connecting flux is one of mutations rather than migrations, and a predator genotype can in principle interact with all prey genotypes (whereas in usual spatialized models, interactions cannot occur between different patches). Another possibly important detail here is that similar genotypes do not have similar interaction phenotypes, meaning there is no risk of evolution being confined in a neighborhood of similar phenotypes. According to the author and my own cursory exploration of the relevant eco-evo literature (with which I am less familiar than pure ecology), this setting has yet to see many developments in the spirit of the many-species theories mentioned above.

      These differences make this new inquiry worthwhile and I applaud the author for undertaking it. From a theoretical perspective, three results emerging from the simulations stand out in this article as potentially very interesting:

      • rather sharp transitions in extinction probability and strain diversity as mutation flux and predator specialization increase.

      • how mutation rate and interaction strength combine, notably in power-law expressions for total population abundance

      • the discussion of susceptibilities, i.e. how predator and prey populations respond to perturbations, as a key ingredient in understanding the previous results, in particular with counter-intuitive negative susceptibilities indicating positive feedback loops.

      It is a bit unfortunate that these more novel points are only briefly explored in the main text: while they are more developed in appendices, these arguments are not always as complete, polished and distilled as they might have been in a main text, so an article focusing entirely on explaining them deeply and intuitively would have been far more exciting to me.

      Thank you for expressing such interest in the work. I also understand the point about the structure of the manuscript. This was a compromise on my part to make the text readable for a more diverse audience. There are “intuitive” descriptions in the main text, and more extensive intuitive descriptions in the supplement. The technical details are also primarily in the supplement. I have tried my best to make the supplement as readable as possible and to cross-reference it with the relevant sections in the main text, but I understand that it is nonetheless particularly long and dense. I certainly understand the difficulty of reading and internalizing it all on a constrained timeframe.

      Finally, I will note that I am not convinced by the framing of the current manuscript as a counterpoint to Robert May's idea of destabilizing diversity - in many ways I think this is a less relevant context than that of bet hedging, and it does a worse job at showcasing what is genuinely interesting and original here; I would thus encourage readers to read this paper in the framing I propose above.

      As mentioned above, I reduced the emphasis on the May result and have explicitly mentioned the analogy to bet-hedging in the main text. I’ve also made a direct comparison to spatial models with a mainland in the supplement.

    1. Author Response

      Reviewer #2 (Public Review):

      The authors performed a series of impressive experiments to systematically establish each part of their CRISPRi method. They provided one of the most compact designs of a genome-wide CRISPRi dual-guideRNA library; they confirmed prior findings on the optimal repressor domain to generate a set of useful vectors for expressing the repressor; and they showcased the use of the system in multiple common cancer cell lines. The authors also took an important step towards providing a detailed and well-annotated protocol (in the supplementary materials) to help users of their methods. The items listed below would be helpful to further improve this work:

      First, while the dual guideRNA design is a useful development, the authors also noted the significant rate (~30%) of recombination between the two sgRNAs. This should be further discussed and evaluated in the manuscript to help readers understand the implications of this high recombination rate. For example, across replicate experiments or across the cell types tested, is the recombination stochastic, or is there some bias in which guide is recombined? Are there any cell-type dependencies in the recombination rate? This would also help future users decide whether they need to check for this effect during functional screening.

      We agree that recombination is an important limitation of dual-sgRNA screens. We included additional analyses and data in the revised manuscript to help readers understand the implications of the observed recombination.

      First, we performed growth screens using dual-sgRNA libraries in two additional cell lines (RPE1 and Jurkat) to address the potential cell type specificity of lentiviral recombination. We cloned a dual-sgRNA library targeting DepMap Common Essential genes (n=2291 dual-sgRNA elements). We transduced cells with this library, harvested cells at day 7 post-transduction, amplified sgRNA cassettes from extracted genomic DNA, and sequenced to quantify sgRNA recombination rates. We found similar recombination rates of dual-sgRNA constructs isolated from these three cell types (observed K562 recombination rate 29%; observed RPE1 recombination rate 26%; observed Jurkat recombination rate 24%).

      Next, we compared the recombination rates of each dual-sgRNA element. We expected lentiviral recombination to be largely stochastic per element, based on the known mechanism of lentiviral recombination previously discussed in Adamson et al. 2018 (https://www.biorxiv.org/content/10.1101/298349v1.full) and given that the constant region between sgRNAs (400bp) far exceeds the length of the sgRNA targeting regions (20bp). However, we would also expect apparent recombination rates to be artificially inflated for dual-sgRNAs with strong growth phenotypes: because unrecombined dual-sgRNAs have stronger growth phenotypes than recombined dual-sgRNAs, they are preferentially depleted. To account for this bias, we first compared the recombination rates of non-targeting control dual-sgRNAs, excluding those with growth phenotypes, across replicates of our K562 screens. There was only a weak correlation between replicates in the recombination rate of non-targeting control dual-sgRNAs (r = 0.30; Figure 1 – Figure Supplement 1E). We next compared the recombination rates of all dual-sgRNA elements (both targeting and non-targeting) across replicates of our K562 screens. As expected, the recombination rate of elements was correlated across replicates (r = 0.77; Figure 1 – Figure Supplement 1F), and the recombination rate was strongly anticorrelated with the growth phenotype of dual-sgRNAs in K562 cells (r = -0.84; Figure 1 – Figure Supplement 1G). We have integrated these data into the manuscript.
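      The expected direction of this bias can be illustrated with a deliberately simple toy model. The sketch below assumes (hypothetically) that recombination happens only at transduction, with the same initial fraction r0 for every element, that recombined constructs behave like neutral controls, and that unrecombined constructs of a given element have a constant per-doubling relative fitness below 1. The parameter values are arbitrary and are not fitted to the screen data; the point is only that stronger depletion of unrecombined constructs raises the apparent recombination rate, consistent in direction with the anticorrelation between recombination rate and growth phenotype described above.

```python
def apparent_recombination_rate(r0: float, relative_fitness: float, doublings: float) -> float:
    """Fraction of constructs that are recombined after growth selection.

    r0: recombined fraction at transduction (assumed identical for all elements).
    relative_fitness: per-doubling growth of unrecombined cells relative to
        recombined (neutral) cells; 1.0 means no growth phenotype.
    doublings: number of population doublings before harvest.
    """
    recombined = r0
    unrecombined = (1.0 - r0) * relative_fitness ** doublings
    return recombined / (recombined + unrecombined)


# Illustrative only: a neutral element versus a strongly depleting one.
neutral = apparent_recombination_rate(r0=0.25, relative_fitness=1.0, doublings=5)
depleting = apparent_recombination_rate(r0=0.25, relative_fitness=0.8, doublings=5)
print(round(neutral, 3), round(depleting, 3))  # 0.25 0.504
```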

      Second, on the repressor development and evaluation: as the authors mentioned in the text, the expression level of the repressor can confound their conclusions on fitness/efficiency comparisons of CRISPR repressors. Thus, it would be helpful to perform protein-level validation using the cell lines they generated, such as a Western blot comparison, to rule out this potential issue.

      We agree that differences in expression levels of the effectors can confound comparisons and that Western Blotting for such differences would be valuable. That said, any such analyses would not substantively alter the main claim of our paper, which is that Zim3-dCas9 provides excellent on-target knockdown in the absence of non-specific effects on cell growth or gene expression. This finding is of immediate practical use to the community. By no means are we claiming that we eliminated all possible confounding factors nor do we think that it is possible to do so. To avoid overstating our findings, we had acknowledged in the discussion that expression levels may indeed be a confounding factor, we had noted in the methods section that the dCas9-MeCP2 effector uses a different coding sequence for dCas9, which may contribute to differences in expression, and we had noted that other effectors may prove useful in some settings. We have further emphasized that differences in expression levels may contribute to our results in the revised manuscript.

      This work would also benefit from including cell proliferation/viability measurement using the selected Zim3-dCas9 repressor in multiple cell lines, as it seems this was only done initially in K562 cells. As authors noted, the fitness effects of the CRISPR repressor would be a major concern when performing functional genomics screening, so such validation of fitness-neutrality of the repressor can be very helpful for potential users of their method and approach.

      To address this point, we assessed the proliferation of HepG2, HuTu-80, and HT29 cells expressing Zim3-dCas9. Expression of Zim3-dCas9 did not have a negative impact on proliferation in any of these cell types, providing further evidence that the Zim3-dCas9 will be broadly useful. We included these data in Figure 4 – Figure Supplement 2 in the revised manuscript. That said, we cannot rule out that expression of Zim3-dCas9 may be detrimental in other cell types. Indeed, we want to emphasize that users should evaluate both on-target knockdown and lack of non-specific effects of effectors in new cell models before proceeding to large-scale experiments. The assays and protocols we describe are ideally suited for this purpose. We have further emphasized this point in the discussion section to guide users.

      Third, a major resource from this work, as the authors noted, is a suite of useful Zim3-dCas9 cell lines. The authors have performed a set of experiments to demonstrate the knockdown efficiency with dozens of guideRNAs. While this is a good initial validation, to really ensure the cell lines are performing as expected, a small scale screening in pooled fashion will be more convincing. This would be a setting more relevant for potential readers, given that pooled screening would likely be the most powerful application of these cell lines.

      While conducting the work described in this manuscript, we had used the Zim3-dCas9 RPE1 cell line for a large-scale pooled screen with single-cell RNA-seq readout (Perturb-seq, Replogle et al. 2022). Across greater than 2000 target genes, the median knockdown was 91.6%, which provides strong validation that Zim3-dCas9 performs as expected in this cell line. We had noted this point in the discussion section of our manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Oxidation of some Kv7 (KCNQ) channels enhances channel activity. The manuscript by Nuñez and coauthors concluded that oxidation in the S2S3 linker of these channels disrupted the interaction between S2S3 and CaM EF-hand 3 (EF3). This mechanism is Ca2+-dependent. The apo EF3 no longer interacted with S2S3, and H2O2 no longer activated the channel. Electrophysiological recordings and fluorescence and NMR measurements of CaM with isolated helices A and B (CRD) and S2S3 of the channel were performed. While the results were in general clear and of good quality, how the results support the conclusion was not clearly described. The approach using isolated molecular components in the study needs further validation, since some of the results seem to conflict substantially with the results and mechanisms proposed in previous studies.

      1) Previous studies showed differential responses of Kv7 channels to oxidation; Kv7.2, 4, and 5 are sensitive to oxidation regulation, but Kv7.1 and 3 do not change upon H2O2 treatment. These differences were attributed at least partially to the sequence differences in S2S3 among Kv7 channels (ref 10 of this manuscript). The results in this manuscript show some major differences from the previous study. First, in all experiments, no difference was observed among Kv7 channels. Second, in Fig 3-6, S2S3 from KCNQ1 was used. The rationale for using KCNQ1 S2S3 and the interpretation of the results are not justified, considering that KCNQ1 S2S3 has fewer Cys residues and was least affected by oxidation in the previous study.

      We addressed the issue of differential sensitivity of Kv7 channels to H2O2 in section 3.2 above (and in the Discussion, lines 364-380). In brief, Kv7.3 is likely to display diminished redox sensitivity due to its high tonic Po (as discussed in ref 10). Kv7.1 does have a reduced number of Cys residues in the S2S3 linker and is also insensitive to H2O2, but introducing additional cysteine residues into the Kv7.1 S2S3 linker confers only fairly weak redox sensitivity. Hence, we think that, at the structural level, all Kv7 channels have a redox-responsive element (the S2S3 linker), but Kv7.1 and Kv7.3 have other constraints that prevent their activity from being modulated by their redox-responsive domains.

      We have performed new experiments with Kv7.2 and Kv7.4 peptides (3 cysteine residues). These new data confirm our conclusions, and are now included in Figure 6.

      2) In Fig 6, oxidation of S2S3 leads to a reduction of S2S3-CaM interaction, which leads to an increase of currents (Fig 1C). In Fig 4, Ca2+ loading leads to a reduced S2S3-CaM (EF3) interaction, which should also lead to an increase of currents based on Fig 6 conclusions. However, it is the EF3 mutation (destroying Ca2+ binding) that leads to the current increase (Fig 1B), contradictory to what Fig 6 data suggested.

      Figure 6 and Supplemental Figure 12 suggest that the effect of the peptides on the CRD is lost or reduced after oxidation. These data suggest that the oxidized S2S3 can no longer affect the CRD-CaM interaction. We propose that when EF3 is able to bind Ca2+, there is a tonic inhibition, and that oxidation relieves this inhibition, leading to a current increase.

      As we explain above (see response 2.1), the effect is complicated by the CaM-dependent promotion of surface expression.

    1. Author Response

      Reviewer #1 (Public Review):

      Major

      The observations on the hook lipids are critical and should be documented better. Based on previous work, it had been proposed that the hook lipids are associated with the inner leaflet and that they leave upon (partial) channel opening. In contrast, the present MD simulations indicate these lipids are associated with the outer leaflet and that their association to the channel persists on opening. These critical observations need to be documented better.

      i) Do the authors observe hook lipids in the cryoEM structure of the open channel? If yes, data should be shown. If no, then the discrepancy between MD and EM should be explicitly addressed.

      The resolution of the original cryo-EM density map of MscS in PC14 nanodiscs was not sufficient to reveal clear densities for the “hook” lipids. However, through further analysis we have now obtained an improved map at 3.1-Å resolution that offers new insights into this question (see Figure 2 – Figure Supplement 1). The new map confirms all the characteristics previously determined for the open conformation: the same helical movements resulting in a similar opening of the pore, and the absence of lipids blocking it, all indicating a conducting conformation. In addition, the new map reveals a series of densities consistent with the dimensions of a phospholipid headgroup near the C-terminus of TM2 (facing the outside), filling a small cavity between adjacent TM1 helices. This position is precisely that occupied by the hook lipids in the closed MscS structure obtained in PC18 nanodiscs. A headgroup residing in this density would also be well positioned to interact directly with Arg88, a key element of the hook-lipid interaction site, whose mutation leads to a strong loss-of-function phenotype (Reddy et al, 2019). These consistencies notwithstanding, we want to be cautious in this interpretation; these densities have the same intensity as, and blend with, the density of the nanodisc lipids, so it is not possible to discern the acyl chains, which were more clearly resolved in the closed state. Therefore, while the new densities are consistent with a model in which the hook lipids are a structural feature of both closed and open states, as indicated by the simulation data, additional experimental data (or further improvements in the map) will be needed for an unequivocal assignment.

      ii) Please show the comparison of the position and coordination of the hook lipids in MD simulations and in the closed (and/or open) structures.

      See new Figure 2 – Figure Supplement 1 in comparison with Figure 5 and new Figure 4 – Figure Supplement 1.

      iii) The authors acknowledge that the volume of the cavity where the hook lipids are located decreases on channel opening. How does this not affect the association of the hook lipids with the protein?

      There appears to be a misunderstanding. The hydrophobic cavities that explain the membrane protrusions discussed in the manuscript are not where the “hook” lipids are observed; we hope to have fully clarified this in the new Figure 4 – Figure Supplement 1. These hydrophobic cavities are underneath each of the TM1-TM2 hairpins, on the cytoplasmic side of the transmembrane domain of the channel; accordingly, the protrusions are formed in, and exchange lipids with, the inner leaflet of the bilayer. Upon reorientation of the TM1-TM2 hairpin, i.e. in the open state, these cavities indeed become smaller but, more importantly, they become embedded in the membrane, and hence the protrusions are largely eliminated (see Figure 8 – Figure Supplement 1). The sites where the “hook” lipids are observed are elsewhere in the structure, towards the outer entrance of the pore; these lipids originate in the outer leaflet. As discussed in the manuscript, the geometry of these sites in the experimentally determined structures of closed and open states is largely invariant; consistent with that observation, the occupancy of the “hook” lipid sites is also similar when simulations of closed and open states are compared. At this point, therefore, it is unclear whether the “hook” lipids are involved in tension sensing; it is plausible that their primary role is structural (for both open and closed states).

      iv) Past work revealed several lipids in MscS structures near these cavities besides the hook lipids, and their ordered dissociation from the channel was proposed to be important for gating. Do the simulations show lipids in these cavities?

      Yes. Previous structural studies identified individual lipid densities under the TM2-TM3 hairpins. Our data show these lipids are not isolated sites but integrated into a larger morphological feature.

      v) Does the occupancy of the hook lipids in MD simulations change between the open and closed conformations? This should be analyzed.

      Please see our answer to point (iii).

      vi) Is the occupancy of other lipids in the nearby cavity altered upon channel opening?

      Please see our answer to point (iii).

      vii) Is the exchange of lipids near Ile150 affected by the conformational change?

      Please see our answer to point (iii).

      I am a bit confused by the claim that "The comparison clearly highlights the reduction in the width of the transmembrane span of the channel upon opening, and how this changed is well matched by the thickness of the corresponding lipid nanodiscs (approximately from 38 to 23 Å)."

      This statement has been clarified. What we intended to state is that, in the open conformation stabilized by PC14, the increased tilt of the TM1-TM2 hairpins towards the midplane of the bilayer leads to a reduction in the hydrophobic width of the protein parallel to the membrane normal. (This reduction is clearly illustrated by our simulation data; see Figure 8 – Figure Supplement 1.) This change correlates with the reduction in thickness from the PC18 to the PC14 nanodiscs, explaining why the latter stabilizes the open state while the former stabilizes the closed state.

      i. How was the nanodisc membrane thickness determined? This should be described.

      ii. I do not see a ~15A change in the vertical length of the channel protein or of the nanodisc. While the panels in Fig.2 clearly show a vertical compression of the membrane, it appears that the ~15 A claim might be overstated. Adding a panel with measurements would be helpful to quantify this claim. If this is difficult on the membrane, maybe measurements could be performed on the protein.

      The reviewer is correct. The original estimate, based on a cursory measurement of distances between two sets of protein atoms seemingly aligned with the water-lipid interface, turned out to be less accurate than expected. A better and more reproducible estimate has now been derived from the OPM database (https://opm.phar.umich.edu/). Using V3 of the database, the hydrophobic thickness is 32.6 Å for the closed state and 25.8 Å for the open state, a change of 6.8 Å. This is the value we now report.

      iii. What happens to the N-terminal cap structure in the open state? What are the rearrangements that allow the extracellular ends of the TM1 to disassemble the cap.

      In the open conformation, part of the N-terminal cap appears to refold into TM1, extending the length of this helix as it tilts to embed itself at the membrane/water interface. The detailed side-chain structure of this domain is not clearly resolved, but the Cα trace can be approximately inferred.

      The data shown in Fig. 6 is cryptic and should be explained better in the main text. As it stands there is a cursory mention in pg. 12 and not much else.

      i. It would be helpful if the authors showed the position of Ile150 in the structure.

      Please see the revised version of Figure 6 and the corresponding caption.

      ii. Does the total number of lipids in proximity of Ile150 change over time? Or the fold change represents ~1:1 exchange of lipids in the pocket?

      Please see the revised version of Figure 6. The total number of lipids in proximity of Ile150 in closed MscS, i.e. the number of lipids filling the cavities under the TM1-TM2 hairpins, is approximately constant at any given timepoint; in both the CG and AA representations, we find about 4 lipids for each of the 7 subunits. However, these are not always the same lipid molecules. For example, in a 20 µs period of CG simulation, 40 different lipid molecules were observed to reside transiently in each of the protrusions. We trust that this new format of the figure will be more intuitive than the original version.

      iii. I am confused by the difference in the maximum possible fold-change in unique lipids, does this reflect the difference in total number of lipids in each leaflet in each system? If so, I am a bit confused as to why there is a ~30% difference in the AA simulations whereas the values are nearly identical for the CG one.

      Please see the revised version of Figure 6. For clarity we have eliminated the concept of fold-change (and maximum fold-change, relative to the total number of lipids in each leaflet), and now simply quantify the number of lipids in proximity to each site.

      iv. Is it possible to quantify the residence time of the lipids in the pocket of each subunit?

      Please see the revised version of Figure 6. From the data presented in panels C and D, it can be deduced that a full turnover takes 2-4 microseconds in the CG representation of the system; in the AA representation, we observe a turnover of about 75% in 10 microseconds, on average over all subunits.
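      One way such exchange statistics can be computed is sketched below, under stated assumptions: the input is a hypothetical boolean occupancy matrix occ[t, j] that is True when lipid j lies within some distance cutoff of a given pocket at frame t. The cutoff, the frame spacing and how occ is extracted from a trajectory are not specified in this response, and the sketch is not necessarily the analysis behind Figure 6. It reports the number of lipids in the pocket per frame, the cumulative number of distinct lipids that have visited it, and the fraction of the lipids present at frame 0 that have left.

```python
import numpy as np

def pocket_occupancy_stats(occ):
    """Summarise lipid exchange at one pocket (illustrative sketch).

    occ: boolean array of shape (n_frames, n_lipids); occ[t, j] is True if
    lipid j is within an assumed distance cutoff of the pocket at frame t.
    """
    occ = np.asarray(occ, dtype=bool)
    # Number of lipids residing in the pocket at each frame.
    n_present = occ.sum(axis=1)
    # Running OR over frames: which lipids have visited the pocket so far.
    ever_seen = np.logical_or.accumulate(occ, axis=0)
    n_unique = ever_seen.sum(axis=1)
    # Turnover: fraction of the frame-0 occupants that are absent at frame t.
    initial = occ[0]
    if initial.any():
        turnover = 1.0 - (occ & initial).sum(axis=1) / initial.sum()
    else:
        turnover = np.zeros(len(occ))
    return n_present, n_unique, turnover
```

      Because a lipid can leave and later return, turnover defined relative to frame 0 may fluctuate rather than rise monotonically; averaging over subunits and over several reference frames smooths this out.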

      The authors state on Pg. 21 "Nevertheless, we question the prevailing view that density signals of this kind are evidence of regulatory lipid binding sites; that is, we do not concur with the assumption that lipids regulate the gating equilibrium of MscS just like an agonist or antagonist would for a ligand-gated receptor-channel." I am a bit confused by this statement. In principle, binding and unbinding of modulatory ligands can happen on relatively fast time scales, so the observation that in MD simulations lipids exchange on a faster time scale than that of channel gating is not sufficient to make this inference. Indeed, there is ample evidence from other channels (i.e. Trp channels, HCN channels etc) where visualization of similar signals led to the identification of modulatory lipid binding sites. Thus, while I do not necessarily disagree with the authors, I would encourage them to tone down the general portion of the statement.

      The statement has been rephrased as “Nevertheless, our data puts into question the prevailing view that density signals of this kind necessarily reflect long-lasting lipid immobilization, as one might expect for an agonist or antagonist of a ligand-gated receptor-channel.”

      Reviewer #2 (Public Review):

      1) Are the structures stable in the membrane also without the weak restraints on the dihedral angles? Continuing at least one of the atomistic simulations without restraints for about 1 microsecond in a tension-free membrane would address a possible concern that the severe membrane distortion could go away by a more extensive relaxation of the channel structure.

      Please see our responses to the Editor.

      2) Does the observed effect occur also in membranes with physiologically relevant PE lipids? Performing a simulation with a lipid mix closer to that in E. coli (and thus high in PE) would address a possible concern that the observed effect is not physiologically relevant.

      Please see our responses to the Editor.

      3) Please include a figure showing that the lipid positions in the MD simulations match the lipid densities in the cryo-EM maps.

      Rather than re-rendering images already published, or generating new images that might not adequately represent the authors’ interpretation of their own data, we have opted to cite the specific figures in previous studies where lipid densities under the TM1-TM2 hairpin have been clearly highlighted, for both MscS and MSL1. Specifically, for MscS, see Figure 4 in Zhang et al. [Ref. 16] and Figures 3-5 and Supplementary Figure 11 in Flegler et al. [Ref. 15]; for MSL1, see Supplementary Figure 8 in Deng et al. [Ref. 18].

      4) Is the reported mobility of helices TM2-TM3 of MSL1, as deduced from a comparison of different cryo-EM structures (ref 18), sufficient to impact the lipid organisation?

      In the naming convention used in Ref. 18, TM3 in MSL1 corresponds to TM1 in MscS. Different channels in this family feature different N-terminal domains preceding TM1. MscS features a short helix, referred to as the N-cap, which lies on the membrane surface. MSL1 from Arabidopsis, however, features two additional TM helices – which, confusingly, Ref. 18 refers to as TM1 and TM2, while the key hairpin adjacent to the pore domain is referred to as TM3-TM4. Neither TM1 nor TM2 in MSL1 is clearly resolved, presumably because they are indeed mobile, but they are in any case peripheral and therefore unlikely to be critically influential for the morphological changes in the membrane that we discuss in the manuscript. Indeed, our simulations of MSL1 do not, by design, include those two N-terminal helices, in part because, as mentioned, they are poorly resolved, but also so that the results can be directly contrasted with MscS. Nevertheless, both channels show very similar deformations in the membrane for the closed state, and an elimination of these deformations in the open state.

      5) Did the initial lipid configuration in atomistic MD simulations already contain the deformations of the inner leaflet, or did these form spontaneously both in coarse-grained and atomistic simulations?

      Please see our responses to the Editor.

      6) Did the earlier MD simulations of the closed-state structure 6PWN of MscS give any indications of the membrane deformation?

      The simulation reported in Reddy et al alongside the structure of closed MscS in PC18 [Ref. 17] did not reveal the kind of deformations observed in this study, most probably due to insufficient equilibration time. However, that simulation did reveal a translational displacement of the channel relative to what had been previously assumed to be the transmembrane span. In retrospect, it seems clear that the observed translation was driven by the strong hydrophobic mismatch between the protein surface and the flat lipid bilayer; the membrane deformations we now observe represent the adaptation that ultimately minimizes that mismatch.

      7) Are there distinct interactions between the headgroups of distorted inner-leaflet lipids with charged amino acids? If so, are these amino acids conserved?

      Please see the new Figure 4 – Figure Supplement 1. As discussed in the manuscript, the interiors of the cavities formed under the TM1-TM2 hairpins, and flanked by TM3a and TM3b, are lined almost entirely by hydrophobic residues. Charged and polar amino acids are only observed on the outer face of the TM1-TM2 hairpin and are primarily in contact with water.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors focused on linking physiological data on theta phase precession and spike-timing-dependent plasticity to the more abstract successor representation used in reinforcement learning models of spatial behavior. The model is presented clearly and effectively shows biological mechanisms for learning the successor representation. Thus, it provides an important step toward developing mathematical models that can be used to understand the function of neural circuits for guiding spatial memory behavior.

      However, as often happens in the Reinforcement Learning (RL) literature, there is a lack of attention to non-RL models, even though these might be more effective at modeling both hippocampal physiology and its role in behavior. There should be some discussion of the relationship to these other models, without assuming that the successor representation is the only way to model the role of the hippocampus in guiding spatial memory function.

      We thank the reviewer for the positive comments about the work, and for the detailed and constructive feedback. We agree with the reviewer that the manuscript will benefit from significantly more discussion of non-RL models, and we’ve detailed below a number of modifications to the manuscript to better incorporate prior work from the hippocampal literature, including the citations the reviewer has listed. Since our goal with this paper is to interpret hippocampal phenomena in terms of an RL learning rule, this is really important and we appreciate the reviewer’s recommendations. We have added text (outlined in the point-by-point responses below) to the introduction and to the discussion that we hope better demonstrates the connections between the SR and existing computational models of hippocampus, and communicates clearly that the SR is not unique in capturing phenomena such as the factorization of space and reward or sequence statistics, but is rather a model that captures these phenomena while also connecting with downstream RL computations. Existing RL accounts of hippocampal representation often do not connect with known properties of hippocampus (as illustrated by the fact that TD learning was proposed in prior work as the learning mechanism for SRs, even though it has no obvious mechanism in HPC), so the purpose of this work is to explore the extent to which TD learning effectively overlaps with the well-studied properties of STDP and theta oscillations. In that sense, this paper is an effort to connect RL models of hippocampus to more physiologically plausible mechanisms rather than an attempt to model phenomena that the existing computational hippocampus literature could not capture.

      1) Page 1 - "coincides with the time window of STDP" - This model shows effectively how theta phase precession allows spikes to fall within the window of spike-timing-dependent synaptic plasticity to form successor representations. However, this combination of precession and STDP has been used in many previous models to allow the storage of sequences useful for guiding behavior (e.g. Jensen and Lisman, Learning and Memory, 1996; Koene, Gorchetchnikov, Cannon, Hasselmo, Neural Networks, 2003). These previous models should be cited here as earlier models using STDP and phase precession to store sequences. The authors should discuss the advantages of an RL successor representation over the types of associative sequence coding in these previous models.

      We agree that the idea of using theta precession to compress sequences onto the timescale of synaptic learning is a long-standing concept in sequence learning, and that we need to be careful to communicate what the advantages are of considering this in the RL context. We have added these citations to the introduction:

      “One of the consequences of phase precession is that correlates of behaviour, such as position in space, are compressed onto the timescale of a single theta cycle and thus coincide with the time-window of STDP O(20 − 50 ms) [8, 18, 20, 21]. This combination of theta sweeps and STDP has been applied to model a wide range of sequence learning tasks [22, 23, 24], and as such, potentially provides an efficient mechanism to learn from an animal’s experience – forming associations between cells which are separated by behavioural timescales much larger than that of STDP.”

      We have also added a paragraph to the discussion that makes this clear:

      “That the predictive skew of place fields can be accomplished with a STDP-type learning rule is a long-standing hypothesis; in fact, the authors that originally reported this effect also proposed a STDP-type mechanism for learning these fields [18, 20]. Similarly, the possible accelerating effect of theta phase precession on sequence learning has also been described in a number of previous works [22, 55, 23, 24]. Until recently [40, 41], SR models have largely not connected with this literature: they either remain agnostic to the learning rule or assume temporal difference learning (which has been well-mapped onto striatal mechanisms [37, 56], but it is unclear how this is implemented in hippocampus) [54, 31, 36, 57, 58]. Thus, one contribution of this paper is to quantitatively and qualitatively compare theta-augmented STDP to temporal difference learning, and demonstrate where these functionally overlap. This explicit link permits some insights about the physiology, such as the observation that the biologically observed parameters for phase precession and STDP resemble those that are optimal for learning the SR (Fig 3), and that the topographic organisation of place cell sizes is useful for learning representations over multiple discount timescales (Fig 4). It also permits some insights for RL, such as that the approximate SR learned with theta-augmented STDP, while provably theoretically different from TD (Section 5.8), is sufficient to capture key qualitative phenomena.”

      2) On this same point, in the introduction, the successor representation is presented as a model that forms representations of space independent of reward. However, this independence of spatial associations and reward has been a feature of most hippocampal models, that then guide behavior based on interactions between a reward representation and the spatial representation (e.g. Redish and Touretzky, Neural Comp. 1998; Burgess, Donnett, Jeffery, O'Keefe, Phil Trans, 1997; Koene et al. Neural Networks 2003; Hasselmo and Eichenbaum, Neural Networks 2005; Erdem and Hasselmo, Eur. J. Neurosci. 2012). The successor representation should not be presented as if it is the only model that ever separated spatial representations and reward. There should be some discussion of what (if any) advantages the successor representation has over these other modeling frameworks (other than connecting to a large body of RL researchers who never read about non-RL hippocampal models). To my knowledge, the successor representation has not been explicitly tested on all the behaviors addressed in these earlier models.

      We agree – a long-standing property of computational models in the hippocampal literature is a factorization of spatial and reward representations, and we have edited the text of the paper to make it clear that this is not a unique contribution of the SR. We have modified our description of the SR to better place it in the context of existing theories about hippocampal contributions to the factorised representations of space and goals, and included all citations mentioned here by adding the following text.

      We have added a sentence to the introduction:

      “However, the computation of expected reward can be decomposed into two components – the successor representation, a predictive map capturing the expected location of the agent discounted into the future, and the expected reward associated with each state [26]. Such segregation yields several advantages, since information about available transitions can be learnt independently of rewards, and thus changes in the locations of rewards do not require the value of all states to be re-learnt. This recapitulates a number of long-standing theories of hippocampus which state that hippocampus provides spatial representations that are independent of the animal’s particular goal and support goal-directed spatial navigation [27, 28, 23, 29, 30].”
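      To make the decomposition concrete, a minimal sketch under illustrative assumptions (a hypothetical 5-state linear track, an unbiased random-walk policy that stays put at the walls, and γ = 0.9, none of which are taken from the manuscript): for a fixed policy with state-transition matrix T, the SR is M = Σ_t γ^t T^t = (I − γT)^{-1}, and the value function is V = M r, where r is the expected reward in each state. Relocating the reward changes only r, so the same M is reused.

```python
import numpy as np

def successor_matrix(T, gamma):
    """Closed-form SR for a fixed policy: M = sum_t gamma^t T^t = (I - gamma T)^-1."""
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)

# Hypothetical example: unbiased random walk on a 5-state linear track;
# a step into a wall leaves the agent where it is.
n = 5
T = np.zeros((n, n))
for s in range(n):
    left, right = max(s - 1, 0), min(s + 1, n - 1)
    T[s, left] += 0.5
    T[s, right] += 0.5

gamma = 0.9
M = successor_matrix(T, gamma)

# Value = SR times expected reward per state; moving the reward only changes r.
r_end = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
r_start = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
V_end = M @ r_end      # reward at the right end of the track
V_start = M @ r_start  # reward moved to the left end, same M
```

      Each row of M sums to 1/(1 − γ), and V satisfies the Bellman equation V = r + γTV, which is a quick check on any learnt estimate of M.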

      We have also added a paragraph to the discussion:

      “The SR model has a number of connections to other models from the computational hippocampus literature that bear on the interpretation of these results. A long-standing property of computational models in the hippocampal literature is a factorisation of spatial and reward representations [27, 28, 23, 29, 30], which permits spatial navigation to rapidly adapt to changing goal locations. Even within RL, the SR is not unique in factorising spatial and reward representations, as purely model-based approaches do this too [26, 25, 67]. The SR occupies a much narrower niche: it factorises reward from spatial representations while caching long-term occupancy predictions [26, 68]. Thus, it may be possible to retain some of the flexibility of model-based approaches while preserving the rapid computation of model-free learning.”

      3) Related to this, successes of the successor representation are presented as showing the backward expansion of place cells. But this was modeled at the start by Mehta and colleagues using STDP-type mechanisms during sequence encoding, so why was the successor representation necessary for that? I don't want to turn this into a review paper comparing hippocampal models, but the body of previous models of the role of the hippocampus in behavior warrants at least a paragraph in each of the introduction and discussion sections. In particular, it should not be assumed that the successor representation is the best model; instead, there should be some comparison with other models and discussion about whether the successor representation resembles or differs from those earlier models.

      We agree this was not clear. This is a nuanced point that warrants substantial discussion, and we have added a paragraph to the discussion (see the paragraph in the response to point 1 that begins “That the predictive skew of place fields can be accomplished…”).

      4) The text seems to interchangeably use the term "successor representation" and "TD trained network" but I think it would be more accurate to contrast the new STDP trained network with a network trained by Temporal Difference learning because one could argue that both of them are creating a successor representation.

      We now refer to these as “STDP successor features” and “TD successor features”. We have also replaced all references to “true successor representation/features” with “TD successor representation/features” and have edited the text at the beginning of the results section to reflect this:

      “The STDP synaptic weight matrix Wij (Fig. 1d) can then be directly compared to the temporal difference (TD) successor matrix Mij (Fig. 1e), learnt via TD learning on the CA3 basis features (the full learning rule is derived in Methods and shown in Eqn. 27). Further, the TD successor matrix Mij can also be used to generate the ‘TD successor features’...”
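      For readers less familiar with TD-learned successor matrices, a hedged tabular sketch follows. It uses one-hot states on a hypothetical linear track with a random-walk policy, not the CA3 basis features or the exact rule of Eqn. 27; the step size and number of steps are illustrative choices. Each observed transition s → s′ moves row s of M towards the bootstrapped target 1_s + γM[s′], and with a small constant step size the estimate settles near the closed form (I − γT)^{-1}.

```python
import numpy as np

def td_successor_matrix(n_states, gamma, alpha, n_steps, seed=0):
    """Tabular TD(0) learning of the successor matrix along one random walk."""
    rng = np.random.default_rng(seed)
    M = np.zeros((n_states, n_states))
    identity = np.eye(n_states)
    steps = rng.choice([-1, 1], size=n_steps)  # unbiased left/right moves
    s = n_states // 2
    for step in steps:
        # A step into a wall leaves the agent in place.
        s_next = min(max(s + step, 0), n_states - 1)
        # TD error: one-hot for the current state plus discounted SR of the next state.
        td_error = identity[s] + gamma * M[s_next] - M[s]
        M[s] += alpha * td_error
        s = s_next
    return M

# Closed-form SR of the same random walk, for comparison.
n, gamma = 5, 0.9
T = np.zeros((n, n))
for s in range(n):
    T[s, max(s - 1, 0)] += 0.5
    T[s, min(s + 1, n - 1)] += 0.5
M_true = np.linalg.inv(np.eye(n) - gamma * T)

M_td = td_successor_matrix(n, gamma, alpha=0.001, n_steps=400_000)
print("max |M_td - M_true| =", np.abs(M_td - M_true).max())
```

      A constant step size leaves some residual noise in M_td; decaying the step size over visits would shrink it further, at the cost of slower learning after the environment changes.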

      Reviewer #2 (Public Review):

      The authors present a set of simulations that show how hippocampal theta sequences may be combined with spike time-dependent plasticity to learn a predictive map - the successor representation - in a biologically plausible manner. This study addresses an important question in the field: how might hippocampal theta sequences be combined with STDP to learn predictive maps? The conclusions are interesting and thought-provoking. However, there were a number of issues that made it hard to judge whether the conclusions of the study are justified. These concerns mainly surround the biological plausibility of the model and parameter settings, the lack of any mathematical analysis of the model, and the lack of direct quantitative comparison of the findings to experimental data.

      While the model uses broadly realistic biological elements to learn the successor representation, there remain a number of important concerns with regard to the biological plausibility of the model. For example, the model assumes that each CA3 cell connects to exactly 1 CA1 cell throughout the whole learning process so that each CA1 cell simply inherits the activity of a single CA3 cell. Moreover, neurons in the model interact directly via their firing rate, yet produce spikes that are used only for the weight updates. Certain model parameters also appeared to be unrealistic, for example, the model combined very wide place fields with slow running speeds. This leaves open the question as to whether the proposed learning mechanism would function correctly in more realistic parameter settings. Simulations were performed for a fixed running speed, thereby omitting various potentially important effects of running speed on the phase precession and firing rate of place cells. Indeed, the phase precession of CA1 place cells was not shown or discussed, so it is unclear as to whether CA1 cells produce realistic patterns of phase precession in the model.

      The fact that a successor-like representation emerges in the model is an interesting result and is likely to be of substantial interest to those working at the intersection between neuroscience and artificial intelligence. However, because no theoretical analysis of the model was performed, it remains unclear why this interesting correspondence emerges. Was it a coincidence? When will it generalise? These questions are best answered by mathematical analysis of the model (or a reduced form of it).

      Several aspects of the model are qualitatively consistent with experimental data. For example, CA1 place fields clustered around doorways and were elongated along walls. While these findings are important and provide some support for the model, considerable work is required to draw a firm correspondence between the model and experimental data. Thus, without a quantitative comparison of the place field maps in experimental data and the model, it is hard to draw strong conclusions from these findings.

      Overall, this study promises to make an important contribution to the field, and will likely be read with interest by those working in the fields of both neuroscience and artificial intelligence. However, given the above caveats, further work is required to establish the biological plausibility of the model, develop a theoretical understanding of the proposed learning process, and establish a quantitative comparison of the findings to experimental data.

      Thank you for the positive comments about the work, and for the detailed and constructive review. We appreciate the time spent evaluating the model and understanding its features at a deep level. Your comments and suggestions have led to exciting new simulation results and a theoretical analysis which shed light on the connections between TD learning, STDP and phase precession.

      We have incorporated a number of new simulations to tackle what we believe are your most pressing concerns surrounding the model’s biological plausibility. We have extended the hyperparameter sweep (Supp. Fig 3) to include the phase precession parameters you recommended, and have added three new multipanel supplementary figures addressing your recommendations (Supp. Figs. 1, 2 & 4). Collectively, these figures show that our results, which as you pointed out might have been produced with biologically implausible values (place cell size, movement speed/statistics, weight initialisation, weight updating schedule and phase precession parameters), do not fundamentally depend on the specific values of these parameters: the mechanism still learns predictive maps close in form to the TD successor features. In the hyperparameter sweep, we do find that results are sensitive to specific parameter values (Supp. Fig 3), but, interestingly, the optimal values of these parameters are remarkably close to those observed experimentally. We have also written an extensive new theory section analysing why theta sequences plus STDP approximate TD learning. In addition, the methods section has been expanded and reordered to make some of the subtler aspects of our model (i.e. the mapping of rates-to-rates and weight fixing during learning) clearer.

      At a high level, regarding our claim of biological plausibility, we would like to clarify our intended contribution and give context to some responses below. We have added the following paragraph to the discussion in order to accurately represent the scope of our work:

      “While our model is biologically plausible in several respects, there remain a number of aspects of the biology that we do not interface with, such as different cell types, interneurons and membrane dynamics. Further, we do not consider anything beyond the simplest model of phase precession, which directly results in theta sweeps rather than having them develop and synchronise across place cells over time [60]. Rather, our philosophy is to reconsider the most pressing issues with the standard model of predictive map learning in the context of hippocampus (e.g., the absence of dopaminergic error signals in CA1 and the inadequacy of synaptic plasticity timescales). We believe this minimalism is helpful, both for interpreting the results presented here and for providing a foundation for further work to examine these biological intricacies, such as the possible effect of phase offsets in CA3, CA1 [61] and across the dorsoventral axis [62, 63], as well as whether the model’s theta sweeps can alternately represent future routes [64], e.g. by the inclusion of attractor dynamics [65].”

    1. Author Response:

      Reviewer #1 (Public Review):

      Chakrabarti et al study inner hair cell synapses using electron tomography of tissue rapidly frozen after optogenetic stimulation. Surprisingly, they find a nearly complete absence of docked vesicles at rest and after stimulation, but upon stimulation vesicles rapidly associate with the ribbon. Interestingly, no changes in vesicle size were found along or near the ribbon. This would have indicated a process of compound fusion prior to plasma membrane fusion, as proposed for retinal bipolar cell ribbons. This lack of compound fusion is used to argue against MVR at the IHC synapse. However, that is only one form of MVR. Another form, coordinated and rapid fusion of multiple docked vesicles at the bottom of the ribbon, is not ruled out. Therefore, I agree that the data set provides good evidence for rapid replenishment of the ribbon-associated vesicles, but I do not find the evidence against MVR convincing. The work provides fundamental insight into the mechanisms of sensory synapses.

      We thank the reviewer for the appreciation of our work and the constructive comments. As detailed below, we have now included this discussion (from line 679 onwards).

      We wrote:

      “This might reflect spontaneous univesicular release (UVR) via a dynamic fusion pore (i.e. ‘kiss and run’; Ceccarelli et al., 1979), which was suggested previously for IHC ribbon synapses (Chapochnikov et al., 2014; Grabner and Moser, 2018; Huang and Moser, 2018; Takago et al., 2019), and/or rapid undocking of vesicles (e.g. Dinkelacker et al., 2000; He et al., 2017; Nagy et al., 2004; Smith et al., 1998). In the UVR framework, stimulation by ensuing Ca2+ influx triggers the statistically independent release of several SVs. Coordinated multivesicular release (MVR) has been indicated to occur at hair cell synapses (Glowatzki and Fuchs, 2002; Goutman and Glowatzki, 2007; Li et al., 2009) and retinal ribbon synapses (Hays et al., 2020; Mehta et al., 2013; Singer et al., 2004) during both spontaneous and evoked release. We could not observe structures that might hint towards compound or cumulative fusion, either at the ribbon or at the AZ membrane, under our experimental conditions. Upon short and long stimulation, RA-SVs as well as docked SVs even showed a slightly reduced size compared to controls. However, since some AZs harbored more than one docked SV in stimulated conditions, we cannot fully exclude the possibility of coordinated release of a few SVs upon depolarization.”

      Reviewer #2 (Public Review):

      Chakrabarti et al. aimed to investigate exocytosis from ribbon synapses of cochlear inner hair cells with high-resolution electron microscopy with tomography. Current methods to capture the ultrastructure of the dynamics of synaptic vesicle release in IHCs rely on the application of potassium for stimulation, which constrains temporal resolution to minutes rather than the millisecond resolution required to analyse synaptic transmission. Here the authors implemented a high-pressure freezing method relying on optogenetics for stimulation (Opto-HPF), granting them both high spatial and temporal resolution. They provide an extremely well-detailed and rigorously controlled description of the method, in line with previous uses of such "Opto-HPF" studies. They successfully applied Opto-HPF to IHCs and had several findings at this highly specialised ribbon synapse. They observed a stimulation-dependent accumulation of docked synaptic vesicles at IHC active zones, and a stimulation-dependent reduction in the distance of non-docked vesicles to the active zone membrane, while the total number of ribbon-associated vesicles remained unchanged. Finally, they did not observe increases in the diameter of synaptic vesicles proximal to the active zone, or other potential correlates of compound fusion, a potential mode of multivesicular release. The conclusions of the paper are mostly well supported by data, but some aspects of their findings and pitfalls of the methods should be better discussed.

      We thank the reviewer for the appreciation of our work and the constructive comments.

      Strengths:

      While a few different groups have now used "Opto-HPF" methods (also referred to as "Flash and Freeze") in different ways and at different synapses, the current study implemented the method with rigorous controls in a novel way to apply it specifically to cochlear IHCs, a different sample preparation from the neuronal cultures, brain slices or C. elegans used so far. The analysis of exocytosis dynamics of IHCs with electron microscopy with stimulation has been limited to the application of potassium, which is not physiological. While much has been learned from these methods, they lacked time resolution. With Opto-HPF the authors were able to investigate synaptic transmission with millisecond precision, with electron tomography analysis of active zones. I have no overall questions regarding the methodology as it was very thoroughly described. The authors also employed electrophysiology with optogenetics to characterise the optical stimulation parameters and provided a well-described analysis of the results with different pulse durations and irradiances, which is crucial for Opto-HPF.

      Thank you very much.

      Further, the authors did a superb job of providing several tables with data and information on all mouse lines used, experimental conditions and statistical tests, including source code for the diverse analyses performed. The figures are overall clear and the manuscript is well written. Such a clear representation of the data makes it easier to review the manuscript.

      Thank you very much.

      Weaknesses:

      There are two main points that I think need to be better discussed by the authors.

      The first refers to the pitfalls of using optogenetics to analyse synaptic transmission. While ChR2 provides better time resolution than potassium application, one cannot rule out the possibility that calcium influx through ChR2 alters neurotransmitter release. This important limitation of the technique should be properly acknowledged by the authors and its consequences discussed, specifically in the context in which they applied it: a single sustained light pulse of ~20 ms (ShortStim) or ~50 ms (LongStim). While longer, sustained stimulation is characteristic for IHCs, these are quite long pulses by optogenetic standards, with potential consequences for intrinsic or synaptic properties.

      We thank the reviewer for pointing this out. We would like to mention that upon 15 min of high-potassium depolarization, the number of docked SVs increased only slightly, as shown in Chakrabarti et al., 2018, EMBO Rep and Kroll et al., 2020, JCS, and this increase was not statistically significant. In the current study, we report a similar phenomenon, but here light-induced depolarization resulted in a more robust increase in the number of docked SVs.

      To compare the data from the previous studies with the current study, we have now added Table 3 (line 676) to the Discussion, listing the total counts (and averages per AZ) of docked SVs.

      Furthermore, in response to the reviewer’s concern, we now discuss the Ca2+ permeability of ChR2, in addition to the above comparison to our previous studies that demonstrated very few docked SVs in the absence of K+ channel blockers and ChR2 expression in IHCs. We are not entirely certain whether the reviewer refers to potential dark currents of ChR2 (e.g. as an explanation for a depletion of docked vesicles under non-stimulated conditions) or to photocurrents, i.e. the influx of Ca2+ through ChR2 itself and its contribution to the Ca2+ concentration at the active zone.

      Regardless of this, however, we consider it unlikely that Ca2+ influx via ChR2 evokes SV fusion at the hair cell active zone.

      First of all, we note that the Ca2+ affinity of IHC exocytosis is very low. As first shown in Beutner et al., 2001 and confirmed thereafter (e.g. Pangrsic et al., 2010), there is little if any IHC exocytosis for Ca2+ concentrations at the release sites below 10 µM. Two studies using CatCh, a ChR2 mutant with higher Ca2+ permeability than wild-type ChR2 (Kleinlogel et al., 2011; Mager et al., 2017), estimated a maximal intracellular Ca2+ increase below 10 µM, even at very negative potentials that promote Ca2+ influx along the electrochemical gradient or at high extracellular Ca2+ concentrations of 90 mM. In our experiments, IHCs were instead depolarized to values for which extrapolation of the data of Mager et al., 2017 indicates a submicromolar Ca2+ concentration. In addition, we and others have demonstrated powerful Ca2+ buffering and extrusion in hair cells (e.g. Tucker and Fettiplace, 1995; Issa and Hudspeth, 1996; Frank et al., 2009; Pangrsic et al., 2015). As a result, hair cells efficiently clear even massive synaptic Ca2+ influx and maintain a low bulk cytosolic Ca2+ concentration (Beutner and Moser, 2001; Frank et al., 2009). We reason that these clearance mechanisms efficiently counter any Ca2+ influx through ChR2. This will likely limit potential effects of ChR2-mediated Ca2+ influx on Ca2+-dependent replenishment of synaptic vesicles during ongoing stimulation.

      We have now added the following to the Discussion (starting at line 620):

      “We note that ChR2, in addition to monovalent cations, also permeates Ca2+ ions, which raises the question of whether optogenetic stimulation of IHCs could trigger release via direct Ca2+ influx through ChR2. We do not consider such Ca2+ influx to trigger exocytosis of synaptic vesicles in IHCs. Optogenetic stimulation of HEK293 cells overexpressing ChR2 (wild-type version) raises the intracellular Ca2+ concentration only up to 90 nM, even with an extracellular Ca2+ concentration of 90 mM (Kleinlogel et al., 2011). IHC exocytosis shows a low Ca2+ affinity (~70 µM, Beutner et al., 2001), and there is little if any IHC exocytosis for Ca2+ concentrations below 10 µM, which is far beyond what could be achieved even by the highly Ca2+-permeable ChR2 mutant CatCh (Ca2+ translocating channelrhodopsin; Mager et al., 2017). In addition, we reason that the powerful Ca2+ buffering and extrusion by hair cells (e.g., Frank et al., 2009; Issa and Hudspeth, 1996; Pangršič et al., 2015; Tucker and Fettiplace, 1995) will efficiently counter Ca2+ influx through ChR2 and thereby limit potential effects on Ca2+-dependent replenishment of synaptic vesicles during ongoing stimulation.”

      The second refers to the finding that the authors did not observe evidence of compound fusion (or homotypic fusion) in their data. This is an interesting finding in the context of multivesicular release in general, as well as specifically for IHCs. While the authors discussed the potential for "kiss-and-run" and/or "kiss-and-stay", it would be valuable if they could discuss their findings further in the context of the field of multivesicular release, for example the evidence in support of multiple independent release events. Further, with such optical quick-freezing methods that correlate structure and function, it is not unusual to fail to capture fusion events (so-called omega shapes or vesicles with fusion pores); this is largely because these are very fast events (less than 10 ms) that are not easily captured with optical stimulation.

      We agree with the reviewer that the discussion of MVR and UVR should be extended. We have now added the following paragraph to the Discussion (from line 679 onwards):

      “This might reflect spontaneous univesicular release (UVR) via a dynamic fusion pore (i.e. ‘kiss and run’; Ceccarelli et al., 1979), which was suggested previously for IHC ribbon synapses (Chapochnikov et al., 2014; Grabner and Moser, 2018; Huang and Moser, 2018; Takago et al., 2019), and/or rapid undocking of vesicles (e.g. Dinkelacker et al., 2000; He et al., 2017; Nagy et al., 2004; Smith et al., 1998). In the UVR framework, stimulation by ensuing Ca2+ influx triggers the statistically independent release of several SVs. Coordinated multivesicular release (MVR) has been indicated to occur at hair cell synapses (Glowatzki and Fuchs, 2002; Goutman and Glowatzki, 2007; Li et al., 2009) and retinal ribbon synapses (Hays et al., 2020; Mehta et al., 2013; Singer et al., 2004) during both spontaneous and evoked release. We could not observe structures that might hint towards compound or cumulative fusion, either at the ribbon or at the AZ membrane, under our experimental conditions. Upon short and long stimulation, RA-SVs as well as docked SVs even showed a slightly reduced size compared to controls. However, since some AZs harbored more than one docked SV in stimulated conditions, we cannot fully exclude the possibility of coordinated release of a few SVs upon depolarization.”

      Reviewer #3 (Public Review):

      Precise methods were developed to validate the expression of channelrhodopsin in inner hair cells of the Organ of Corti, to quantify the relationship between blue light irradiance and auditory nerve fiber depolarization, to control light stimulation within the chamber of a high-pressure freezing device, and to measure with good precision the delay between stimulation and freezing of the specimen. These methods represent a clear advance over previous experimental designs used to study this synaptic system and are an initial application of rapid high-pressure freezing with freeze substitution, followed by high-resolution electron tomography (ET), to sensory cells that operate via graded potentials.

      Short-duration stimuli were used to assess the redistribution of vesicles among pools at hair cell ribbon synapses. The number of vesicles linked to the synaptic ribbon did not change, but vesicles redistributed within the membrane-proximal pool to docked locations. No evidence was found for vesicle-to-vesicle fusion prior to vesicle fusion to the membrane, which is an important, ongoing question for this synapse type. The data for quantifying numbers of vesicles in membrane-tethered, non-tethered, and docked vesicle pools are compelling and important.

      We thank the reviewer for the appreciation of our work and the constructive comments.

      These quantifications would benefit from additional presentation of raw images so that the reader can better assess their generality and variability across synaptic sites.

      The images shown for each of the two control and two experimental (stimulated) preparation classes should be more representative. Variation in synaptic cleft dimensions and in the numbers of ribbon-associated and membrane-proximal vesicles does not track the averaged data. Since the preparation has novel stimulus features, additional images (as the authors employed in previous publications) exhibiting tethered vesicles, non-tethered vesicles, docked vesicles, several sections through individual ribbons, and the segmentation of these structures will provide greater confidence that the data reflect the images.

      Thank you very much for pointing this out. We have now included more details in the supplemental figures and in the text.

      Precisely, we added:

      • More details about the morphological sub-pools (analysis and images):

        -We now show a sequence of images with different tethering states of membrane-proximal SVs, together with examples of docked and non-tethered SVs, for each condition, as we did in Chakrabarti et al., 2018 (Fig. 6-figure supplement 2, line 438). Moreover, to provide additional information for each condition, we selected one further tomogram per condition and depict two additional virtual sections of each (Fig. 6-figure supplement 2).

        -Moreover, we present a more detailed quantification of the different morphological sub-pools. For the MP-SV pool, we analyzed the SV diameters and the distances to the AZ membrane and PD separately for the different SV sub-pools; this information is now included in Fig. 7. For the RA-SVs, we additionally analyzed the morphological sub-pools and the SV diameters in the distal and proximal parts of the ribbon, as done in Chakrabarti et al., 2018. We have added a new supplementary figure (Fig. 7-figure supplement 2, line 558) and Supplementary file 2.

      • We replaced the virtual section in panel 6D. In the old version, the ribbon appeared to contact the membrane, and we realized that this virtual section was not representative: the ribbon was actually not in direct contact with the AZ membrane, and a presynaptic density was still visible adjacent to the docked SVs. To avoid confusion, we selected a different virtual section of the same tomogram and now indicate the presynaptic density as a graphical aid in Fig. 6.

      The introduction raises questions about the length of membrane tethers in relation to vesicle movement toward the active zone, but this topic was not addressed in the manuscript.

      We apologize for not stating this clearly enough and have rephrased the sentence, which now reads:

      “…and seem to be organized in sub-pools based on the number of tethers and the structures to which these tethers are connected.”

      Quantification of this metric, and of the number of tethers especially for vesicles near the membrane, would seem to be straightforward. The topic of EPSC amplitude as representing unitary events due to variation in vesicle volume, size of the fusion pore, or vesicle-vesicle fusion was partially addressed. Membrane fusion events were not evident in the few images shown, but these presumably occurred and could be quantified. Likewise, sites of membrane retrieval could also be marked. These analyses will broaden the scope of the presentation and contribute to a more complete story.

      Regarding the presence or absence of membrane fusion events, we agree with the reviewer that this should be clearly addressed in the MS. We would like to point out the following:

      (i) We did not observe any omega shapes at the AZ membrane, which we also mention in the MS. Nor could we find them in data sets from previous publications (Vogl et al., 2015, JCS; Jung et al., 2015, PNAS).

      (ii) To make our observations on potential SV-SV fusion events clear, we now state in the Discussion (line 688 ff.):

      “We could not observe structures that might hint towards compound or cumulative fusion, either at the ribbon or at the AZ membrane, under our experimental conditions. Upon short and long stimulation, RA-SVs as well as docked SVs even showed a slightly reduced size compared to controls. However, since some AZs harbored more than one docked SV in stimulated conditions, we cannot fully exclude the possibility of coordinated release of a few SVs upon depolarization.”

      Furthermore, we agree with the reviewer that a complete presentation of endo-exocytosis structural correlates is very important. However, we focused our study on exocytosis events and therefore mainly analyzed membrane proximal SVs at active zones.

      Nonetheless, in response to the reviewer’s comment, we now include a quantification of clathrin-coated (CC) structures. We determined the occurrence of CC vesicles (CCVs) and CC invaginations within 0-500 nm of the PD, and measured the diameter of the CCVs and their distances to the membrane and the PD. We found only very few CC structures in our tomograms (now listed in a table in the Results section; Supplementary file 1). Sites of endocytic membrane retrieval are likely located in the peri-active zone area or even beyond. We did not observe obvious bulk endocytosis events connected to the AZ membrane. However, we did observe large endosome-like vesicles, which we did not quantify in this study. More details, albeit under different stimulation conditions, were presented in two of our previous studies (Kroll et al., 2019 and 2020).

      Overall, the methodology forms the basis for future studies by this group and others to investigate rapid changes in synaptic vesicle distribution at this synapse.

      Reviewer #4 (Public Review):

      This manuscript investigates the process of neurotransmitter release from hair cell synapses using electron microscopy of tissue rapidly frozen after optogenetic stimulation. The primary finding is that in the absence of a stimulus very few vesicles appear docked at the membrane, but upon stimulation vesicles rapidly associate with the membrane. In contrast, the number of vesicles associated with the ribbon and within 50 nm of the membrane remains unchanged. Additionally, the authors find no changes in vesicle size that might be predicted if vesicles fuse to one-another prior to fusing with the membrane. The paper claims that these findings argue for rapid replenishment and against a mechanism of multi-vesicular release, but neither argument is that convincing. Nonetheless, the work is of high quality, the results are intriguing, and will be of interest to the field.

      We thank the reviewer for the appreciation of our work and the constructive comments.

      1) The abstract states that their results "argue against synchronized multiquantal release". While I might agree that the lack of larger structures is suggestive that homotypic fusion may not be common, this is far from an argument against any mechanisms of multi-quantal release. At least one definition of synchronized multiquantal release posits that multiple vesicles are fusing at the same time through some coordinated mechanism. Given that they do not report evidence of fusion itself, I fail to see how these results inform us one way or the other.

      We agree with the reviewer that the discussion of MVR and UVR should be extended. It is important to point out that we do not claim that evoked release is mediated by a single SV. As discussed in the paper (line 672), we consider that our optogenetic stimulation of IHCs triggers the release of more than 10 SVs per AZ. This is in line with previous reports of several SVs fusing upon stimulation. This type of evoked MVR is probably mediated by the opening of Ca2+ channels in close proximity to the Ca2+ sensor of each SV. We indeed sometimes observed more than one docked SV per AZ upon long optogenetic stimulation, which could reflect this possibility. However, given the absence of large structures directly at the ribbon or the AZ membrane that would suggest compound fusion of several SVs prior to or during exocytosis, we argue against compound MVR at IHCs. As mentioned above, we have extended the Discussion accordingly (from line 679 onwards).

      We wrote:

      “This might reflect spontaneous univesicular release (UVR) via a dynamic fusion pore (i.e. ‘kiss and run’; Ceccarelli et al., 1979), which was suggested previously for IHC ribbon synapses (Chapochnikov et al., 2014; Grabner and Moser, 2018; Huang and Moser, 2018; Takago et al., 2019), and/or rapid undocking of vesicles (e.g. Dinkelacker et al., 2000; He et al., 2017; Nagy et al., 2004; Smith et al., 1998). In the UVR framework, stimulation by ensuing Ca2+ influx triggers the statistically independent release of several SVs. Coordinated multivesicular release (MVR) has been indicated to occur at hair cell synapses (Glowatzki and Fuchs, 2002; Goutman and Glowatzki, 2007; Li et al., 2009) and retinal ribbon synapses (Hays et al., 2020; Mehta et al., 2013; Singer et al., 2004) during both spontaneous and evoked release. We could not observe structures that might hint towards compound or cumulative fusion, either at the ribbon or at the AZ membrane, under our experimental conditions. Upon short and long stimulation, RA-SVs as well as docked SVs even showed a slightly reduced size compared to controls. However, since some AZs harbored more than one docked SV in stimulated conditions, we cannot fully exclude the possibility of coordinated release of a few SVs upon depolarization.”

      2) The complete lack of docked vesicles in the absence of a stimulus followed by their appearance with a stimulus is a fascinating result. However, since there are no docked vesicles prior to a stimulus, it is really unclear what these docked vesicles represent - clearly not the RRP. Are these vesicles that are fusing or recently fused or are they ones preparing to fuse? It is fine that it is unknown, but it complicates their interpretation that the vesicles are "rapidly replenished". How does one replenish a pool of docked vesicles that didn't exist prior to the stimulus?

      In response to the reviewer’s comment, we would like to note that we indeed reported very few docked SVs in wild-type IHCs at rest without K+ channel blockers in Chakrabarti et al., EMBO Rep 2018 and in Kroll et al., 2020, JCS. In both studies, a solution without TEA and Cs was used for the experiments (resting solution, Chakrabarti: 5 mM KCl, 136.5 mM NaCl, 1 mM MgCl2, 1.3 mM CaCl2, 10 mM HEPES, pH 7.2, 290 mOsmol; control solution, Kroll: 5.36 mM KCl, 139.7 mM NaCl, 2 mM CaCl2, 1 mM MgCl2, 0.5 mM MgSO4, 10 mM HEPES, 3.4 mM L-glutamine and 6.9 mM D-glucose, pH 7.4). Similarly, our current study shows very few docked SVs in the resting condition, even in the presence of TEA and Cs. Based on the results presented in ‘Response to reviewers Figure 1’, we assume that the scarcity of docked SVs under control conditions is not due to depolarization induced by a solution containing 20 mM TEA and 1 mM Cs, but is rather representative of the physiological resting state of IHC ribbon synapses. Upon 15 min of high-potassium depolarization, the number of docked SVs increased only slightly, as shown in Chakrabarti et al., 2018 and Kroll et al., 2020, and this increase was not statistically significant. In the current study, we report a similar phenomenon, but here depolarization resulted in a more robust increase in the number of docked SVs.

      To compare the data from the previous studies with the current study, we have now added Table 3 (line 676) to the Discussion, listing the total counts (and averages per AZ) of docked SVs.

    1. Author Response:

      eLife assessment

      This paper reports a useful set of results that uses a reduced network model based on a previously published large-scale network model to explain the generation of theta-gamma rhythms in the hippocampus. Combining the detailed and reduced models and comparing their results is a powerful approach. However, the evidence for the main claim that CCK+ basket cells play a key role in theta-gamma coupling in the hippocampus is currently incomplete.

      We thank the reviewers for their thorough and thoughtful notes, and we are pleased that the combination of models is acknowledged as a powerful approach. We agree with many of the comments made and intend to address them in subsequent revisions.

      In particular, judging from the somewhat different comments of the reviewers (R#1 and #3), our ‘narrative’ was perhaps not as clear as it could have been. That is, we created a reduced population rate model (PRM) based on the theta/gamma generation hypotheses from the detailed model and then explored the PRM in more detail to predict cellular contributions. The goal was neither to validate the original detailed model per se (R#1) nor to fit the PRM parameters directly to the detailed model (R#3). Rather, it was to obtain a set of PRM parameter values consistent with the hypotheses of the detailed model, which could then be fully explored to derive cellular-based predictions that could help design experiments to understand theta/gamma rhythms.

      Responses specific to the Reviewers are given below.

      Reviewer #1 (Public Review):

      This paper investigates potential mechanisms underlying the generation of hippocampal theta and gamma rhythms using a combination of several modeling approaches. The authors perform new simulation experiments on the existing large-scale biophysical network model previously published by Bezaire et al. Guided by their analysis of this detailed model, they also develop a strongly reduced, rate-based network model, which allows them to run a much larger number of simulations and systematically explore the effects of varying several key parameters. The combined results from these two in silico approaches allow them to predict which cell types and connections in the hippocampus might be involved in the generation and coupling of theta and gamma oscillations.

      In my view, several aspects of the general methodology are exemplary. In the current work as well as several earlier papers, the authors are re-using a large-scale network model that was originally developed in a different laboratory (Bezaire et al., 2016) and that still represents the state of the art in detailed hippocampal modeling. Such model reuse is quite rare in computational neuroscience, which is rather unfortunate given the amount of time and effort required to build and share such a complex model. Very often, and also in this case, the original publication that describes a detailed model provides only limited validation and analysis of model behavior, and the re-use of the same model in later studies represents a great opportunity to further examine and validate the model.

      Combining detailed and simplified models can also be a powerful approach, especially when the correspondence between the two is carefully established. In this case, matching results from the two models allow strong arguments about key mechanisms of biological phenomena: the simplified model allows the identification and characterization of necessary and sufficient components, while the detailed model can firmly anchor the models and their predictions to experimental data.

      On the other hand, I have several major concerns about the implementation of these approaches and the interpretation of the results in the current study. First of all, the detailed model of Bezaire et al. is considered strictly equivalent, in all of its relevant details, to biological reality, and no attempt is made to verify or even discuss the validity of this assumption, even when particular details of the model are apparently critical for the results presented. I see this as a fundamental limitation of the current work - the fact that the Bezaire et al. model is the best one we have at the moment does not automatically make it correct in all its details, and features of the model that are essential for the new results certainly deserve careful scrutiny (preferably via detailed comparison with experimental data).

      An important case in point is the strength of the interactions between specific neuronal populations. This is represented by different quantities in the detailed and simplified model, but the starting point is always the synaptic weight (conductance) values given by Bezaire et al. (2016), also listed in Tables 2 and 3 of the current manuscript. Looking at these parameters, one can identify a handful of connections whose conductance values are much higher than those of the other connections, and also more than an order of magnitude higher (50-100 nS) than commonly estimated values for cortical synapses (normally less than about 5 nS, except for a few very special types of synapse such as the hippocampal mossy fibers). Not surprisingly, several of these connections (such as the pyramidal cell to pyramidal cell connections, and the CCK+BC to PV+BC connections) were found to be critical for the generation and control of theta and gamma oscillations in the model. Given their importance for the conclusions of the paper, it would be essential to double-check the validity of these parameter values. In this context, it is worth noting that, unlike the anatomical parameters (cell numbers and connectivity) that had been carefully calculated and discussed in Bezaire and Soltesz (2013), biophysical parameters (the densities of neuronal membrane conductances and synaptic conductances) in Bezaire et al. (2016) were obtained by relatively simple (partly manual) fitting procedures whose reliability and robustness are mostly unknown. Specifically for synaptic parameters in CA1, a more systematic review and calculation were recently carried out by Ecker et al. (2020); their estimates for the synaptic conductances in question are typically much lower than those of Bezaire et al. (2016) and appear to be more in line with widely accepted values for cortical (hippocampal) synapses.

      Furthermore, some key details concerning the construction of the simplified rate model are unclear in the current manuscript. The process of selecting cell types and connections for inclusion in the rate model is described, and the criteria are mostly clear, although the results are likely to be heavily affected by the problems discussed above, and I do not understand why the strength of external input was included among the selection criteria for cell types (especially if the model is meant to capture the internal dynamics of the isolated CA1 region). However, the main issue is that it remains unclear how the parameters of the rate model (the 24 parameters in Table 4) were obtained. The authors simply state that they "found a set of parameters that give rise to theta-gamma rhythms," and no further explanation is provided. Ideally, the parameters of the rate model should be derived systematically from the detailed biophysical model so that the two models are linked as strongly as possible; but even if this was not the case, the methods used to set these parameters should be described in detail.

      An important inaccuracy in the presentation of the results concerns the suggested coupling of theta and gamma oscillations in the models. Although the authors show that theta and gamma oscillations can be simultaneously present in the network under certain conditions, actual coupling of the two rhythms (e.g., in the form of phase-amplitude coupling) is not systematically characterized, and it is therefore not clear under what conditions real coupling is present in the two models (although a probable example can be seen in Figure 1C(ii)).

      The Discussion of the paper states that gamma oscillations in the model(s) are generated via a pure interneuronal (ING) mechanism. This is an interesting claim; however, I could not find any findings in the Results section that directly support this conclusion.

      Finally, although the authors write that they can "envisage designing experiments to directly test predictions" from their modeling work, no such experimental predictions are explicitly identified in the current manuscript.

      As noted above, our goal was not to validate the original detailed model but to carry out further analysis of the Bezaire model in its re-use, since, as noted by this Reviewer, the original publication was limited in validation and analysis. Further validation or extensions of Bezaire et al. can be carried out given their acknowledged limitations (some of which the Reviewer mentions). However, as noted, more detailed models of CA1 microcircuitry now exist (Ecker et al., 2020), and it would be interesting to examine whether and how these models might express theta/gamma rhythms. In essence, we completely agree that not all details of the Bezaire et al. model are automatically correct. We were using it as a biological proxy, albeit an imperfect one. However, it is able to produce theta/gamma rhythms using parameter values that are in many ways experimentally derived (Bezaire & Soltesz, 2013), and with minimal tuning, so our assumption is that it captures a potential ‘biological balance’ for generating these rhythms. Hence, we carried out additional simulations and explorations to derive generation hypotheses that are “applied” in the development of the reduced population rate model (PRM). The “ING” aspect arises because CCK+BCs and PV+BCs fire coherent gamma rhythms that are imposed onto the PYR cell population, as mentioned in the Results. Without PYR input, they still fire coherent gamma rhythms. Experiments in which theta/gamma rhythms are characterized (CFC, frequencies) with and without CCK+BCs would allow the main prediction of the modeling work to be tested, i.e., whether CCK+BCs are essential for the existence of these coupled rhythms. We know from Dudok et al. that there are alternating sources of perisomatic inhibition, but to the best of our knowledge, how they might control theta/gamma rhythms has not been explored.

      We will more fully describe our process for PRM parameters in subsequent revisions as well as formally apply CFC metrics.

      Reviewer #2 (Public Review):

      The goal of this study is to find a minimal model that produces both theta and gamma rhythms in the hippocampus CA1, based on the full-scale model (FSM) of Bezaire et al, 2016. The FSM here is treated as equivalent to biological data. This seems to be a second part of a study that the same authors published in 2021, and is extensively cited here. The study reduces the FSM to a neural rate model with 4 neurons, which is capable of producing both rhythms. This model is then simulated and its parameter dependencies are explored.

      The authors succeed in producing a rate model, based on 4 neuron types, that captures the essence of the two rhythms. This model is then analyzed at a descriptive level to claim that the synapse from one interneuron type (CCK) to another (PV+) is more effective than its reciprocal counterpart (PV+ to CCK synapse) to control theta rhythm frequency.

      The results fall short on several fronts:

      The conclusions rely exclusively on the assumption that the FSM is in fact able to faithfully reflect the biological circuits involved, not just in its output, but in response to a variety of perturbations. Although the authors mention and discuss this assumption, in the end, the reader is left with a (reduced) model of a (complex) model, but no real analysis based on this reduction. In fact, the reduced model is treated in a manner that could have been done with the full one. Thus the significance of the work is greatly reduced not by what the authors do, but by what they fail to do, which is to properly analyze their own reduced model. Consequently, the impact of this study on the field is minimal.

      Related to the first point, throughout the manuscript, multiple descriptive findings, based on the authors' observations of the model output, are presented as causal relationships. Even the main finding of the study (that one synapse has a larger effect on theta than another) is not quantified, but is simply left as a judgment call, by the authors and the reader, based on comparing slopes on graphs.

      We agree with this Reviewer that analysis of the PRM is needed; this analysis is currently underway and will hopefully help us understand what ‘balances’ are essential for theta/gamma rhythm expression. However, the overall goal of this work was not to “find” a minimal model per se, but rather to determine how theta/gamma rhythms in the hippocampus are generated (hence building on previous work). To this end, it was important to use the detailed model (a biological proxy, albeit imperfect; see response to Reviewer #1) to obtain the hypotheses on which the PRM is based. We do not envisage the minimal model as a ‘replacement’ for the detailed model in general, but rather as a demonstration that a combined approach (detailed models and/or experimental observations together with ‘derived’ reduced models) allows us to gain insight into cellular contributions to rhythm generation. We will quantify these observations in subsequent revisions.

      Reviewer #3 (Public Review):

      While full-scale and minimal models of the CA1 hippocampus are available, and both exhibit theta and gamma rhythms, it is not fully clear how inhibitory cells contribute to rhythm generation in the hippocampus. This paper aims to address this question by proposing a middle ground: a reduced model of the full-scale model. The reduced model is derived by selecting neural types whose ablation shows that they are essential for theta and gamma rhythms. A study of the reduced model proposes that a particular inhibitory cell type (CCK+BC cells) plays a key role in the inhibitory control of theta rhythms and theta-gamma coupling.

      Strengths:

      The paper identifies neural types contributing to theta-gamma rhythms, models them, and provides analysis that derives control diagrams and identifies CCK+BC cells as key inhibitory cells in rhythm generation. The paper is clearly written and approaches are well described. Simulation data is well depicted to support the methodology.

      Weaknesses:

      The derivation methodology of the reduced model is hypothesis-based, i.e., it is based on the selection of cell types and on showing, through ablation simulations, that these need to be included. The reduced model is then fitted. While this approach has merit, it could "miss" cell types or fail to capture the particular balance between all types. In particular, it is not known what error is introduced by using the reduced model. As a result, the control plots (Fig. 5 and 6) might be distorted or very different. An additional weakness is that while the study predicts control diagrams and identifies CCK+BC cell types as key controllers, experimental data to validate these predictions is not provided. This weakness is admissible, in my opinion, since these recordings are not easy to obtain and the paper focuses on computational investigation rather than computationally guided experiments.

      This Reviewer has provided a succinct description of our work, which we will leverage in subsequent revisions as we more fully describe our process; thank you. We agree with the Reviewer that we could ‘miss’ cell types and fail to capture particular balances, etc., as we based our PRM on hypotheses from the detailed model. Our PRM and its reference parameter values are ‘designed’ based on hypotheses from our set of explorations of the detailed model, and we were able to determine particular predictions that can be experimentally explored. Subsequent theoretical analyses will help us understand the required ‘balances’, but as noted above (see response to Reviewer #2), we are not aiming for a minimal model in general, but rather to use such a combined approach (detailed model and/or experimental observations with ‘derived’ reduced models) to generate cellular-based predictions about theta/gamma generation. As noted by this Reviewer, specific inhibitory cell recordings are not easy to obtain, and we hope our work will help with computationally guided experiments: even though the reduced model may ‘miss’ other aspects, it would hopefully capture some aspects that are biologically salient for experimental design and future explorations of detailed models.

    1. Author Response

      Public Evaluation Summary:

      Powers and colleagues reveal that commonly used "genetic markers" (selectable cassettes that allow for genome modification) may lead to unintended consequences and unanticipated phenotypes. These consequences arise from cryptic expression directed from within the cassettes into adjacent genomic regions. In this work, they identify a particularly strong example of marker interference with a neighboring gene's expression and develop and test next-generation tools that circumvent the problem. The work will be primarily of interest to yeast biologists using these types of tools and interpreting these types of data.

      Thank you for your time and thoughtfulness in assessing our manuscript. We agree the immediate and most direct importance of our findings is to those using cassette-based genome editing in yeast or interpreting data that comes from these experiments. However, the relevance of our findings is not limited to yeast researchers, as yeast deletion phenotypes and synthetic phenotypes are often used to guide studies in other organisms. For example, just one popular synthetic genetic interaction study from yeast (Costanzo et al, Science 2010) has been cited over 1100 times since 2010, and a large subset of these citations are not from studies focused on budding yeast.

      The central finding of our work, which we regret was not sufficiently highlighted in the original manuscript, is important to an even broader scientific community: because eukaryotic promoters are inherently bidirectional, divergent promoter activity from genome-inserted expression cassettes can drive off-target repression of neighboring genes.

      Although instances of cassette-induced off-target effects have been described previously, the mechanism behind these effects was unknown. Our study leveraged a strong case of selection cassette-driven off-target effects to identify the mechanism by which these confounding phenotypes occur. Our finding that cassettes of disparate sequence composition and expression level can all disrupt neighboring gene expression helped us determine that bidirectional promoter activity, inherent to most eukaryotic promoters, drives this effect. Thus, our data suggest that a much wider pool of overlooked mutants than previously considered may be affected by phenomena like the “neighboring gene effect” (NGE, Ben-Shitrit et al. Nature Methods 2012). We find that bidirectional promoter activity from expression cassettes occurs at all cassette-inserted loci analyzed, but the resulting divergent transcripts are often terminated before disrupting neighboring genes, apparently through the mechanisms that terminate most endogenous divergent transcripts (e.g., CUTs; Xu et al. Nature 2009; Schultz et al. Cell 2013). These data help explain why some loci are sensitive to disruption of neighboring gene expression while others are immune. Based on this mechanism, we find that simply “insulating” the promoter internal to the inserted cassette with transcription termination sequences prevents this type of off-target effect. We share these updated editing tools with the community to decrease confounding off-target effects in future studies.

      Because the mechanisms driving these off-target effects are fundamental, they are likely occurring in other eukaryotes. Considering the specific cassette-induced, LUTI-based mis-regulation reported here, this off-target mis-regulation could be seen, regardless of organism, if the following conditions are met:

      1) Insertion of a cassette housing a bidirectional promoter

      • Most, if not all, promoters have bidirectional activity (Teodorovic, Walls, and Elmendorf, NAR 2007; Xu et al., Nature 2009, Neil et al, Nature 2009, Trinklein et al. Genome Research 2004, Seila et al., Science 2008, Core and Lis Science 2008; Preker et al Science 2008), including commonly used mammalian promoters (CMV and EF1alpha; Curtin et al. Gene Therapy 2008; SV40: Gidoni et al. Science 1985). Insulator use is rare in construct design and has been primarily used in cases in which the concern is protecting expression of the expression cassette from the local chromatin environment. Although not the dominant mode of gene deletion in mammalian cells, expression cassettes are commonly inserted for knock-in experiments, for example, in the form of antibiotic resistance genes or fluorescent protein-encoding genes.

      • Interestingly, in their native context in both yeast and mammals, most promoters do not produce a stable divergent transcript. In yeast, this results from mechanisms including the NNS termination pathway coupled to Rrp6/exosome-mediated RNA degradation (Schultz et al. Cell 2013). The TEF1 promoter is a prime example, with evidence for a divergent transcript that is visible only when RRP6 is deleted (Xu et al., Nature 2009) or when nascent transcripts are analyzed (Churchman and Weissman, Nature 2011). In mammals, the NNS pathway does not serve this role; instead, the production of stable divergent transcripts is limited both by early polyA signals, which prevent transcriptional interference from occurring more pervasively, and by the instability of the resulting short transcripts (Ntini et al, NSMB 2013; Almada et al, Nature 2013). Note that persistence of a stable (detectable) transcript is not needed for neighboring gene disruption to occur; what is needed is the production of a transcript that extends into the regulatory sequences of the neighboring gene.

      2) A neighboring gene within a distance that allows transcription interference without intervening transcription termination

      • This is hard to assess systematically, but natural transcriptional interference and LUTI-based regulation occur in both human and yeast cells (Chen et al., eLife 2017; Chia et al. eLife 2017; Hollerer et al., G3 2019; Otto and Cheng et al., Cell 2018; Van Dalfsen et al. Dev Cell 2018). Data from our lab suggest that this regulation can be effective over spans of up to ~2 kb (see Vander Wende et al., bioRxiv, for an interesting example), so the artificial regulation described here could have a similar range.

      • Although yeast genes are more closely spaced than those in humans or mice, there are many gene-dense regions in these organisms, and roughly ¼ of head-to-head oriented genes in humans have been shown to lie within 2 kb of their neighbor (Gherman, Wang, and Avramopoulos, Human Genomics, 2009).

      3) A neighboring gene in the divergent orientation relative to the cassette (i.e., head-to-head orientation; this should be the case for about half of cassette insertions)

      4) Competitive uORF sequences in the extended 5’ transcript region

      • This is, again, hard to systematically assess, but our studies indicate that approximately half of AUG uORFs are effective at competing with main ORF translation. Because almost every intergenic region houses at least one AUG this may not be a major limiting factor. As in yeast, AUG uORF translation has been seen to be pervasive in naturally 5’ extended human transcripts (Floor and Doudna, eLife, 2016 as just one example).

      While these conditions must be met to match the exact LUTI-based repression that we report at the DBP1/MRP51 locus, even situations in which only conditions 1 and 2 are met could drive potent transcriptional interference impacting neighboring gene expression.

      Our findings offer a new perspective important for designing or interpreting genome engineering experiments in any organism, and identifying a mechanism for the neighboring gene effects of expression cassette insertion allows these effects to be prevented in future studies.

      We regret the narrow framing of our study in the initial manuscript, but hope that our revised manuscript better demonstrates how our findings fit into existing literature regarding neighboring gene effects from cassette insertion, and that their broad relevance is now clear.

      Reviewer #1 (Public Review):

      This manuscript presents information that will be of great interest to yeast geneticists - standard gene deletions can lead to misleading phenotypes due to effects on adjacent genes. The experiments carefully document this in one case, for the DBP1 gene, and present additional evidence that it can occur at additional genes. An improved version of the standard gene replacement cassette is described, with evidence that it functions in an improved fashion, insulated from affecting adjacent genes.

      We appreciate the reviewer’s enthusiasm for the data in our study, and their perspective that this will be of great interest to the yeast community. We hope that the revised manuscript better emphasizes our finding that this effect is driven by a conserved feature of eukaryotic gene regulation, which suggests that it is likely occurring in other organisms as well.

      Reviewer #2 (Public Review):

      The impact of the work will be for yeast researchers, in the clear and careful presentation of a case study wherein phenotypes might be ascribed to the knockout of a particular gene but instead derive from effects on a neighboring gene. In this case, when DBP1 is knocked out by a selectable marker, a transcript expressed from within or adjacent to the marker and directed toward the adjacent gene MRP51 interferes with MRP51's normal transcription start sites. Furthermore, although the MRP51 ORF is present on the longer mRNA isoform that is generated, it is not efficiently translated. The authors expand on this phenotypic observation to demonstrate that a substantial fraction of selectable marker insertions can generate transcription that originates adjacent to or within selectable markers and proceeds away from them.

      The strengths of the work are the clear and careful elucidation of how the observed phenotypes for the dbp1∆ alleles arise, and the creation of new selectable marker cassettes that overcome the potential for cryptic transcripts emanating from or near the selectable markers. This is valuable for the community as a clear demonstration that only exactly the right experiments may detect the underlying mechanisms of potentially misattributed phenotypes, and that these experiments often may not be performed.

      Thanks very much to the reviewer for their thoughtful assessment of our manuscript. We are thrilled that they find the work to be valuable for yeast researchers, and more broadly, to those interested in avoiding misinterpretations of mutant phenotypes. We propose this to be a mechanism that is likely to be important beyond yeast studies and hope that we have made this clearer in the revised manuscript.

      While understandable in terms of how the experiments likely played out, the manuscript seems in between biology and tool development, as the biology in question was related to a gene that is not the focus of this lab. The tool development is likely to be useful but potentially non-optimal.

      We agree with the reviewer’s point that this is a good opportunity to improve the standard yeast cassettes further and have now done so. We now include a further improved pair of cassettes that minimize shared sequences (Figure 3H). These and the previously described constructs (Figure 3F) will all be deposited at Addgene and we hope that they will be of value to the yeast community.

      The reviewer’s comment also made us realize that our previous presentation of the work was not ideal. We have adjusted the order of data in the revised manuscript, including swapping the data in Figures 3 and 4 and adding a Figure 5 to further emphasize the mechanism that we identify to drive this off-target effect, rooted in bidirectional promoter activity. While we hope the new cassettes are useful to others, they also serve a specific biological role in this manuscript, which is to show that bidirectional transcription driven from existing cassettes is the cause of the off-target effect that we report.

      The mechanism for interference identified in this example case (via a long undecoded transcript isoform, or LUTI) has already been described for other loci and in a number of species, including in work from the Brar lab. The concept of marker interference with neighboring genes has also been increasingly appreciated in a number of other studies.

      Indeed, because of our recent research interests, we were aware that natural LUTI-based regulation was widespread prior to this study, but even we were surprised to see it occurring in this artificial context. The idea that constitutive LUTI-based repression can be easily driven at loci that are not otherwise LUTI-regulated is an interesting point to consider in designing gene editing approaches. We agree with the reviewer that a greater discussion of previously published work regarding marker interference is necessary to understand the novelty of our findings, including the discussion of some work that should have been cited and discussed in the original manuscript (Ben-Shitrit et al. Nature Methods 2012 and Egorov et al. NAR 2021, in particular). In the reframing of our revised manuscript, we aimed to emphasize the novel aspects of our work, and how they relate to previous reports of the “neighboring gene effect” (NGE). Although the phenomenon of the NGE had been reported, it was not previously clear what caused it to occur, which made it impossible to prevent in planning new approaches or to diagnose in existing data. In revealing this unexpected mechanism driven by bidirectional promoter activity that is general to expression cassette-based editing, rather than resulting from any particular cassette sequence, we were able to design constructs to prevent this from occurring in future studies. Moreover, because bidirectional promoter activity is a highly conserved feature of eukaryotic gene expression, this finding suggests that the type of off-target effect that we describe here is likely to occur with expression cassette insertion in more complex eukaryotes, as well. To our knowledge, this has not been widely considered as a possibility.

    1. Author Response

      Reviewer #1 (Public Review):

      This study analyzes the detailed chemical mechanics of the formation of a physiologically important protein multimer. The primary strengths of the study are careful analyses with two distinct methods, CG-MALS, a direct measure of multimerization, and environment-sensitive tryptophan fluorescence, each of which indicates that Ca2+ activation of the C-lobe alone can change the physical interaction with an SK2 C-terminal peptide. An intriguing finding is that while either the N- or C-lobes alone can interact with the C-terminal peptide, only with full-length CaM can the SK C-terminal peptide be bound by two CaM molecules simultaneously. This study also clearly demonstrates that Ca2+ activation of the N-lobe triggers binding to the SK2 C-terminal peptide. Methods descriptions are thorough and excellent. Discussion of relevance to structures and function is nuanced and free of presumptions. The weaknesses of this manuscript are that the physiological implications of these findings are not clear: CaM interacts with regions of SK channels besides the C-terminal peptide studied here, and no evidence is provided here that C-lobe calcium binding alters channel opening. Overall, the evidence for conformational changes of the complex due to Ca2+ binding to the C-lobe alone is very strong, and physiological importance seems likely. The interpretation of data in this manuscript is mostly cautious and logically crystalline, with alternative interpretations discussed at many junctures.

      We thank Reviewer #1 for very helpful and thoughtful considerations and catching some oversights in our work. Our work was improved by addressing their comments.

      Reviewer #2 (Public Review):

      Activation of SK channels by calcium through calmodulin (CaM) is physiologically important in tuning membrane excitability. Understanding the molecular mechanism of SK activation has therefore been a high priority in ion channel biophysics and calcium signaling. The prevailing view is that the C-terminal lobe of CaM serves as an immobile Ca2+-independent tether while the N-lobe acts as a sensor whose binding activates the channel. In the present study, the authors undertake extensive biophysical/biochemical analysis of CaM interaction with SK channel peptide and rigorous electrophysiological experiments to show that Ca2+ does bind to the C-lobe of CaM and this potentially evokes conformational changes that may be relevant for channel gating. Beyond SK channels, the approach and findings here may bear important implications for an expanding number of ion channels and membrane proteins that are regulated by CaM.

      A strength of the study is that the electrophysiological recordings are innovative and of high quality. Given that CaM is ubiquitous in nearly all eukaryotes, dissecting the effects of mutants, particularly on individual lobes, is technically challenging, as endogenous CaM can overwhelm low-affinity mutants. The excised patch approach developed here provides a powerful methodology to dissect fundamental mechanisms underlying CaM action. I imagine this could be adaptable for studying other ion channels. Armed with this strategy, the authors show that both the N- and C-lobes of CaM are essential for maximal activation of SK channels. This revises the current model and may have physiological importance.

      The major weakness is that nearly all biochemical inferences are made from analysis of isolated peptides that do not necessarily recapitulate their arrangement in an intact channel. While the use of MALS provides new evidence of the potentially complex conformational arrangement of CaM on the C-terminal SK peptide (SKp), it is not fully clear that these complexes correspond to functionally relevant states. Lastly, perhaps as a consequence of these ambiguities, the overarching model or mechanism is not fully clear.

      We thank Reviewer #2 for their helpful review and for requesting context to alleviate some of the ambiguities in the channel mechanism arising from our data. Although the ultimate goal of our field is to understand the gating mechanism, there are too many parameters to solve in a single study. First, we agree that there is not yet a clear model, and we have only continued to assemble the building blocks to make one.

      Our report is centered on calmodulin more than it is SK, which is why we studied more CaM mutants and no channel mutants. There are simply too many unanswered questions regarding stoichiometry and state dependencies to make even a basic working model. We invite the greater ion channel field to scrutinize these questions and delve deeper into approaches across disciplines.

      We strove to put our work in context with the decades of research on CaM and SK. Our work focuses on the C-terminus of SK and whether the C-lobe of CaM is anchored independently of Ca2+. An anchored C-lobe would be fundamental to building any gating model with the proper energetics. Although we used only a piece of the full-length channel, a peptide that we call SKp has Ca2+-dependent associations with a full-length protein, WT-CaM. We do not have nearly enough data to solve the gating mechanism, nor do we claim to have solved the mechanism for SK gating, but if a piece of the channel has Ca2+-dependent interactions with another full-length protein, calmodulin, it is highly unlikely that the full-length SK channel inhibits that interaction in all its closed and open states. Structures do not show inhibitory actions related to conformational Ca2+-sensitivity. The C-lobe is simply captured in its most populated binding state, not necessarily its functional state. Indeed, we need a lot more data to get a clearer understanding. It was helpful to discuss this, and we added more context to our work.

      Reviewer #3 (Public Review):

      Halling et al. probe the mechanism whereby calmodulin (CaM) mediates SK channel activity in response to calcium. CaM regulation of SK channels is a critical modulator of membrane excitability, yet despite numerous structural and functional studies, significant gaps remain in our understanding of how each lobe participates in this regulation. In particular, while Ca2+ binding to the N-lobe of CaM has a clear functional effect on the channel, the C-lobe of CaM does not appear to participate beyond a tethering role, and structural studies have indicated that the C-lobe of CaM may not bind Ca2+ in the context of the SK channel. This study pairs functional and protein binding data to bridge this gap in mechanistic understanding, demonstrating that both lobes of CaM are likely Ca2+ sensitive in the context of SK channels and that both lobes of CaM are required for channel activation by Ca2+.

      Strengths:

      The molecular underpinnings of CaM-SK regulation are of significant interest and the paper addresses a major gap in knowledge. The pairing of functional data with protein binding provides a platform to bridge the static structural results with channel function. The data is robust, and the experiments are carefully done and appear to be of high quality. The use of multiple mutant CaMs and electrophysiological studies using a rescue effect in pulled patches to enable a more quantified evaluation of the functional impact of each lobe of CaM provides a compelling assessment of the contribution of each lobe of CaM to channel activation. The calibration of the patch data by application of WT CaM is innovative and provides precise internal control, making the conclusions drawn from these experiments clear. This data fully supports the conclusion that both lobes of CaM are required for channel activation.

      Weaknesses:

      The paper focuses heavily on the results of multi-angle light scattering experiments, which demonstrate that a peptide derived from the C-terminus of the SK channel can bind to CaM in multiple stoichiometric configurations. However, it is not clear if these complexes are functionally relevant in the full channel, making interpretation challenging.

      We thank Reviewer #3 for their helpful review and for providing their concerns with our interpretation of the MALS experiments. From our previous work (Li et al. 2009 and Halling et al. 2014), we have had suggestions that stoichiometry at different functional states is complicated. Our new data presented here adds to the complexity. We do not claim to have solved whether Ca2+-dependent stoichiometry is important for channel function. That requires further research.

      As we stated in our response to Reviewer #2, we emphasize that our findings convey how CaM interacts with one site on SK. CaM is the Ca2+ sensor, and Ca2+ alters how CaM binds. The channel will have more determinants for interacting with CaM, but just by studying one domain we see extraordinary complexity. We have firm results from our MALS and fluorescent binding assays that challenge models of the full channel even with the simplest interpretations, i.e., CaM is not a simple switch. We have shown fundamentally that CaM binding to a single SK binding site is Ca2+-dependent.

      There are several major studies that still need to be done to relate binding data to channel function: 1) calmodulin binding studies to other calmodulin domains need to be completed; 2) the dependence of calmodulin binding on Ca2+ concentration needs to be determined; and 3) Ca2+-dependent calmodulin binding studies on full-length SK channels need to be completed. We invite more discussion from the ion channel field on developing models that are consistent with all data.

    1. Author Response

      Reviewer 1 (Public Review):

      Weaknesses: The main conclusion that ablation of the cadherin code decreases synaptic connectivity between the rVRG and phrenic motor neurons is never directly shown. This can only be inferred by the data.

      1) Conclusion that the connectivity between rVRG premotor and phrenic nerve motor neurons is "weaker". This conclusion is inferred from several experiments but is never directly demonstrated. An alternative interpretation of the decreased amplitude of the in vitro phrenic nerve burst is that the rootlet contains fewer axons (as predicted by the fewer motor neurons in S3 and the diaphragm innervation in S2). Additionally, the intrinsic electrophysiological properties of the motor neurons might be different. To show this decisively, the authors could use electrophysiological recordings of phrenic motor neurons to directly measure a change in synaptic input (for example, mEPSPs or EPSPs after optogenetic stimulation of rVRG axon terminals). Without a direct measurement, the synaptic connectivity can only be inferred.

      We agree with the reviewer that without anatomical evidence, we can only infer the loss of synaptic connectivity. However, we believe that this is the most likely interpretation of our data (see response to the editor summary). Unfortunately, the experiment suggested (optogenetic stimulation of rVRG terminals) is not feasible at the moment, as a) a molecular tool to specifically express channelrhodopsin in rVRG does not currently exist; even if it did, it would require crossing two more alleles in our current mouse model, which contains 5 alleles, making the genetics/breeding cumbersome and b) viral-mediated channelrhodopsin expression in the rVRG is not feasible since the mice die at birth. We will continue to explore alternative approaches to directly demonstrate the loss of rVRG-PMC connectivity in the future.

      2) Conclusion that the small phrenic nerve burst size in mice lacking cadherin signaling in Dbx1-derived neurons is due to less synaptic input to the motor neurons. Dbx1 is expressed in multiple compartments of the medullary breathing control circuit, such as the breathing rhythm generator (preBötC). The smaller burst size could instead be due to altered activity among preBötC neurons in generating a full burst, to the transmission of this burst from the preBötC to the rVRG, etc.

      We agree with the reviewer about the alternative interpretations of the data, which we mention in the discussion. At this point, we can only conclude that cadherin signaling is required in Dbx1-derived respiratory populations for proper phrenic respiratory output. We are currently developing the tools in our lab to further dissect the exact contributions of cadherins to rVRG development, connectivity, and function. As this will require significant time and effort, we believe it is outside the scope of the current work.

      3) In vitro burst size. The authors use 4 bursts from each animal to calculate the average burst size. How were the bursts chosen? Why did the authors use so few bursts? What is the variability of burst size within each animal? What parameters are used to define a burst? This analysis and the level of detail in the figure legend/methods section is inadequate to rigorously establish the conclusion that burst size is altered in the various genotypes.

      To address the reviewer’s concern, we have updated the data by analyzing 7 bursts per animal. Some control mice have burst frequencies as low as 0.2 bursts per minute (see fig. 4b), and thus acquiring 7 bursts requires 35 minutes of recording time, a substantial amount when an entire litter is being recorded in a day. All data is from 7 bursts per animal except for 4 out of 11 NMNΔ6910-/- mice, which only had 1-3 bursts total. To analyze the data, either every single burst was analyzed, or for those traces of higher frequency, bursts were selected randomly, spaced throughout the trace. Bursts were defined as activity above baseline that persists for at least 50ms. Some bursts contain pauses in activity in the middle; activity that was spaced less than 1 second apart was defined as a single burst.

      Updating the data with more bursts slightly changed some of our findings. We now find that 6910-/- mice no longer exhibit significantly increased burst duration and burst activity. This difference was barely significant in our previous analysis and is now just barely non-significant (p=0.065 for burst duration, p=0.059 for burst activity).

      We have included this more detailed description in the methods section. We have also included an Excel sheet as source data for fig. 4 to indicate the variability of burst size within each animal and across animals.
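      The burst criteria described in this response can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the authors' analysis code: it assumes the trace has already been thresholded against baseline into a boolean sequence (the threshold itself is not specified here), and it assumes that runs shorter than 50 ms are discarded before pauses shorter than 1 s are merged; the order of these two steps is our assumption.

```python
def detect_bursts(activity, fs, min_duration_s=0.05, merge_gap_s=1.0):
    """Return (start, end) sample indices (end exclusive) of detected bursts.

    activity: sequence of booleans, True where the signal is above baseline
              (how baseline is thresholded is assumed, not specified).
    fs:       sampling rate in Hz.
    """
    min_len = int(round(min_duration_s * fs))
    max_gap = int(round(merge_gap_s * fs))

    # 1) Find contiguous runs of above-baseline activity.
    runs = []
    start = None
    for i, active in enumerate(activity):
        if active and start is None:
            start = i
        elif not active and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(activity)))

    # 2) Keep only runs that persist for at least min_duration_s (50 ms).
    runs = [(s, e) for s, e in runs if e - s >= min_len]

    # 3) Treat runs separated by a pause shorter than merge_gap_s (1 s)
    #    as a single burst.
    bursts = []
    for s, e in runs:
        if bursts and s - bursts[-1][1] < max_gap:
            bursts[-1] = (bursts[-1][0], e)
        else:
            bursts.append((s, e))
    return bursts
```

      For example, at 1 kHz a 100 ms run followed 0.5 s later by an 80 ms run is counted as one burst, while an isolated 30 ms run is ignored.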

      4) The authors state that the in vitro frequency in figure 4 is inaccurate, but then the in vitro frequency is used to claim the preBötC is not impacted in Dbx1 mutants (conclusion section "respiratory motor circuit anatomy and assembly"). To directly assess this conclusion, the bursting frequency of the in vitro preBötC rhythm should be measured.

      We have now included the quantitation of respiratory frequency data for control and βγ-catDbx1∆ mice, showing that there are no significant changes in burst frequency in βγ-catDbx1∆ mice. However, we do agree with the reviewer that the loss of excitatory drive could be due to changes either in the rVRG or the preBötC and we have toned down our conclusions to indicate that the preBötC could be impacted in βγ-catDbx1∆ mice.

      5) The burst size in picrotoxin/strychnine is used to conclude that the motor neurons' intrinsic physiology is not impacted. The bursts are described, and examples are shown, but this is never quantified across many bursts within a single recording nor in multiple animals of each genotype.

      We have now included quantification of this data, using 6-11 bursts/mouse from 3 control and 3 NMNΔ6910-/- mice. We find that both the spinal burst total duration (shown as % of recording time) and the normalized integrated spinal activity over time are not significantly different between control and NMNΔ6910-/- mice.

      Reviewer 3 (Public Review):

      Major points

      1) Page 8: 'In addition, NMNΔ and NMNΔ6910-/- mice showed a similar decrease in phrenic MN numbers, likely from the loss of trophic support due to the decrease in diaphragm innervation (Figure S3c).' This statement should be corrected: phrenic MN number in NMNΔ mice does not differ from controls, in contrast to NMNΔ6910-/- mice (Fig. S3). Similarly, diaphragm innervation is not significantly different from controls in NMNΔ (Fig. S2). Alternatively, these observations could be strengthened by increasing the number of mice analyzed to determine whether there is a significant reduction in PMN number and diaphragm innervation in NMNΔ mice.

      Following the reviewer’s suggestion, we increased the number of control mice analyzed for diaphragm innervation (n=7) and MN numbers (n=6). We now find that there is a significant reduction in both parameters in NMNΔ mice. We have modified the results section accordingly.

      2) A similar comment relates to the interpretation of the dendritic phenotype in NMNΔ and NMNΔ6910-/- mice (Fig. 3m): the authors conclude 'When directly comparing NMNΔ and NMNΔ6910-/- mice, NMNΔ6910-/- mice had a more severe loss of dorsolateral dendrites and a more significant increase in ventral dendrites (Figure 3l-m).' (page 9). The loss of dorsolateral dendrites in NMNΔ6910-/- mice indeed differs significantly from control mice, and is more severe than in NMNΔ mice, which do not differ significantly from controls. For ventral dendrites however, the increase compared to controls is significant for both NMNΔ and NMNΔ6910-/- mice, and the two genotypes do not appear to differ from each other. This suggests cooperative action of N-cadherin and cadherin 6,9,10 for dorsolateral dendrites, but suggests that N-cad is more important for ventral dendrites. This should be phrased more clearly.

      We agree with the reviewer and apologize for the lack of clarity. We have modified our description to highlight the contribution of N-cadherin to dendritic development.

      3) Related comment, page 10: 'Furthermore, the fact that phrenic MNs maintain their normal activity pattern in NMNΔ mice suggests that neither cell body position nor phrenic MN numbers significantly contribute to phrenic MN output.' This should be rephrased, phrenic MN number does not differ from control in NMNΔ mice (Fig. S2c).

      After analyzing additional control mice, we find that phrenic MN numbers are significantly reduced in NMNΔ mice.

      4) The authors conclude that spinal network activity in control and NMNΔ6910-/- mice does not differ (page 10, Fig. 4f). It is difficult to judge this from the example trace in 4f. How is this concluded from the figure and can this be quantified?

      We have now included quantification of this data, using 6-11 bursts/mouse from 3 control and 3 NMNΔ6910-/- mice. We find that both the spinal burst total duration (shown as % of recording time) and the normalized integrated spinal activity over time are not significantly different between control and NMNΔ6910-/- mice.

      5) RphiGT mice: please explain the genetic strategy better in Results section or Methods, do these mice also express the TVA receptor in a Cre-dependent manner? Crossing with the Cdh9:iCre line will then result in expression of TVA and G protein in phrenic motor neurons and presynaptic rVRG neurons in the brainstem, as well as additional Cdh9-expressing neuronal populations. How can the authors be sure that they are looking at monosynaptically connected neurons?

      We have added additional information in the methods to describe the rabies virus genetic strategy. Although the mice do express the TVA receptor, we did not include this in the description as it is not relevant to our strategy. We are using a Rabies∆G virus that is not pseudotyped with EnvA so it does not require TVA to infect cells. The specificity of primary cell (phrenic MN) infection rather comes from diaphragm injections. We only analyze mice in which we can confirm the injection was specific to the diaphragm muscle and did not leak to body wall or hypaxial muscles (about 50% of injections). We have tested different infection times to determine when monosynaptically connected neurons are labeled. We do not see any labeling at the brainstem 5 days post injection and we start to see additional labeling (possible 2nd order neurons) 10 days post injection. Thus we are confident that our analysis at 7 days post injection captures monosynaptically-connected neurons. We have also performed rabies virus tracing in ChAT::Cre mice, where the expression of G-protein is restricted to motor neurons, and we observe a similar distribution of pre-motor neurons in the brainstem, as with Cdh9::iCre, indicating that we are reproducibly labeling 1st order neurons with both genetic strategies.

      6) The authors use a Dbx1-cre strategy to inactivate cadherin signaling in multiple brainstem neuronal populations and perform analysis of burst activity in phrenic nerves. Based on the similarity in phenotype with NMNΔ6910-/- mice it is concluded that cadherin function is required in both phrenic MNs and Dbx1-derived interneurons. However, this manipulation can affect many populations including the preBötC, and the impact of this manipulation on rVRG and phrenic motor neurons (neuron number, cell body position, dendrite orientation, diaphragm innervation etc) is not described, although a model is presented in Fig. 7. These parameters should be analyzed to interpret the functional phenotype.

      We agree with the reviewer that the Dbx1-Cre mediated manipulation can affect multiple respiratory populations (see response to reviewer 1). However, Dbx1-mediated recombination does not target phrenic MNs. We have now added a figure (Figure 6-figure supplement 1), demonstrating this. Thus, we think that it is unlikely to cause any cell-autonomous changes in MN number, diaphragm innervation etc. It is plausible that there might be secondary changes in phrenic MNs as a result of changes in rVRG properties (for example, the dendritic orientation of phrenic MNs could be altered if rVRG synapses are lost), but the primary impact of this manipulation will be on Dbx1-derived neurons.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper describes the results of a MEG study where participants listened to classical MIDI music. The authors then use lagged linear regression (with 5-fold cross-validation) to predict the response of the MEG signal using (1) note onsets (2) several additional acoustic features (3) a measure of note surprise computed from one of several models. The authors find that the surprise regressors predict additional variance above and beyond that already predicted by the other note onset and acoustic features (the "baseline" model), which serves as a replication of a recent study by Di Liberto.

      They compute note surprisal using four models (1) a hand-crafted Bayesian model designed to reflect some of the dominant statistical properties of Western music (Temperley) (2) an ngram model trained on one musical piece (IDyOM stm) (3) an n-gram model trained on a much larger corpus (IDyOM ltm) (4) a transformer DNN trained on a mix of polyphonic and monophonic music (MT). For each model, they train the model using varying amounts of context.

      They find that the transformer model (MT) and long-term n-gram model (IDyOM ltm) give the best neural prediction accuracy, both of which give ~3% improvement in predicted correlation values relative to their baseline model. In addition, they find that for all models, the prediction scores are maximal for contexts of ~2-7 notes. These neural results do not appear to reflect the overall accuracy of the models tested, since the short-term n-gram model outperforms the long-term n-gram model and the music transformer's accuracy improves substantially with additional context beyond 7 notes. The authors replicate all these findings in a separate EEG experiment from the Di Liberto paper.

      Overall, this is a clean, nicely-conducted study. However, the conclusions do not follow from the results for two main reasons:

      1) Different features of natural stimuli are almost always correlated with each other to some extent, and as a consequence, a feature (e.g., surprise) can predict the neural response even if it doesn't drive that response. The standard approach to dealing with this problem, taken here, is to test if a feature improves the prediction accuracy of a model above and beyond that of a baseline model (using cross-validation to avoid over-fitting). If the feature improves prediction accuracy, then one can conclude that the feature contributes additional, unique variance. However, there are two key problems: (1) the space of possible features to control for is vast, and there will almost always be uncontrolled-for features (2) the relationship between the relevant control features and the neural response could be nonlinear. As a consequence, if some new feature (here surprise) contributes a little bit of additional variance, this could easily reflect additional un-controlled features or some nonlinear relationship that was not captured by the linear model. This problem becomes more acute the smaller the effect size since even a small inaccuracy in the control model could explain the resulting finding. This problem is not specific to this study but is a problem nonetheless.

      We understand the reviewer’s point and agree that it indeed applies not exclusively to the present study, but likely to many studies in this field and beyond. We disagree, however, that it constitutes a problem per se. We maintain that the approach of adding a feature, observing that it increases cross-validated prediction performance, and concluding that therefore the feature is relevant, is a valid one. Indeed, it is possible and even likely that not all relevant features (or non-linear transformations thereof) will be present in the control/baseline model. If a to-be-tested feature increases predictive performance and therefore explains relevant variance, then that means that part of what drives the neural response is non-trivially related to the to-be-tested feature. The true underlying relationship may not be linear, and later work may uncover more complex relationships that subsume the earlier discovery, but the original conclusion remains justified.

      Importantly, we wish to emphasize that the key conclusions of our study primarily rest upon comparisons between regression models that are by design equally complex, such as surprise-according-to-MT versus surprise-according-to-IDyOM, and comparisons across different context lengths. We maintain that the comparison with the Baseline model is also important, but even taking the reviewer’s worry here into account, the comparison between different equally complex regression models should not suffer from it to the same extent as a model-versus-baseline comparison.
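      The model-versus-baseline logic discussed in this exchange can be illustrated with a small sketch. A lagged linear regression is fit with contiguous 5-fold cross-validation, and the held-out prediction correlation of a baseline design (note onsets only) is compared with that of a design that adds a surprise regressor. The function names, the number of lags, and the synthetic data are hypothetical, numpy is assumed to be available, and this is not the authors' pipeline.

```python
import numpy as np


def lagged_design(X, n_lags):
    """Stack time-shifted copies of each regressor column (lags 0..n_lags-1)."""
    T, F = X.shape
    D = np.zeros((T, F * n_lags))
    for lag in range(n_lags):
        # Column block for this lag holds X delayed by `lag` samples.
        D[lag:, lag * F:(lag + 1) * F] = X[:T - lag]
    return D


def cv_prediction_r(X, y, n_lags=5, n_folds=5):
    """Mean held-out Pearson r of a lagged linear model over contiguous folds."""
    D = np.column_stack([lagged_design(X, n_lags), np.ones(len(y))])  # + intercept
    folds = np.array_split(np.arange(len(y)), n_folds)
    scores = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        # Ordinary least squares on the training folds only.
        w, *_ = np.linalg.lstsq(D[train], y[train], rcond=None)
        pred = D[test_idx] @ w
        scores.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(scores))


# Synthetic example: the response depends on onsets and on surprise at lag 2.
rng = np.random.default_rng(0)
T = 2000
onset = (rng.random(T) < 0.1).astype(float)
surprise = onset * rng.exponential(1.0, T)
shifted = np.concatenate([np.zeros(2), surprise[:-2]])
y = 0.5 * onset + shifted + rng.normal(0.0, 0.5, T)

r_base = cv_prediction_r(onset[:, None], y)
r_full = cv_prediction_r(np.column_stack([onset, surprise]), y)
```

      Because both designs are scored on held-out data only, the added columns should not improve the score simply by over-fitting; a gain indicates variance that the added regressor explains beyond the baseline, subject to the caveats about uncontrolled or non-linear features raised above.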

      2) The authors make a distinction between "Gestalt-like principles" and "statistical learning" but they never define what is meant by this distinction. The Temperley model encodes a variety of important statistics of Western music, including statistics such as keys that are unlikely to reflect generic Gestalt principles. The Temperley model builds in some additional structure such as the notion of a key, which the n-gram and transformer models must learn from scratch. In general, the models being compared differ in so many ways that it is hard to conclude much about what is driving the observed differences in prediction accuracy, particularly given the small effect sizes. The context manipulation is more controlled, and the fact that neural prediction accuracy dissociates from the model performance is potentially interesting. However, I am not confident that the authors have a good neural index of surprise for the reasons described above, and this limits the conclusions that can be drawn from this manipulation.

      First of all, we would like to apologize for any lack of clarity regarding the distinction between Gestalt-like and statistical models. We take Gestalt-like models to be those that explain music perception as following a restricted set of rules, such as that adjacent notes tend to be close in pitch. In contrast, as the reviewer correctly points out, statistical learning models have no such a priori principles and must learn similar or other principles from scratch. Importantly, the distinction between these two classes of models is not one we make for the first time in the context of music perception. Gestalt-like models have a long tradition in musicology and the study of music cognition dating back to (Meyer, 1957). The Implication-Realization model developed by Eugene Narmour (Narmour, 1990, 1992; Schellenberg, 1997) is another example of a rule-based theory of music listening, which has influenced the model by David Temperley, the most recent influential Gestalt-like model of melodic expectations, which we applied in the present study. Concurrently with the development of Gestalt-like models, a second strand of research framed music listening in light of information theory and statistical learning (Bharucha, 1987; Cohen, 1962; Conklin & Witten, 1995; Pearce & Wiggins, 2012). Previous work has made the same distinction and compared models of music along the same axis (Krumhansl, 2015; Morgan et al., 2019a; Temperley, 2014). We have updated the manuscript to elaborate on this distinction and highlight that it is not uncommon.

      Second, we emphasize that we compare the models directly in terms of their predictive performance both of upcoming musical notes and of neural responses. This predictive performance is not dependent on the internal details of any particular model; e.g. in principle it would be possible to include a “human expert” model where we ask professional composers to predict upcoming notes given a previous context. Because the relevant comparison metric is independent of model details, we believe comparing the models is justified. Again, this is in line with previously published work in music (Morgan et al., 2019a), language (Heilbron et al., 2022; Schmitt et al., 2021; Wilcox et al., 2020), and other domains (Planton et al., 2021). Such work compares different models in how well they align with human statistical expectations by assessing how well different models explain predictability/surprise effects in behavioral and/or brain responses.

      Third, regarding the doubts on the neural index of surprise used: we respond to this concern below, after reviewer 1’s first point to which the present comment refers (the referred-to comment was not included in the “essential revisions” here).

      Reviewer #2 (Public Review):

      This manuscript focuses on the basis of musical expectations/predictions, both in terms of the basis of the rules by which these are generated, and the neural signatures of surprise elicited by violation of these predictions.

      Expectation generation models directly compared were Gestalt-like, n-gram, and a recently developed Music Transformer model. Both shorter and longer temporal windows of sampling were also compared, with striking differences in performance between models.

      Surprise (defined, as per convention, as the negative log prior probability of the current note) responses were assessed in the form of evoked response time series, recorded separately with both MEG and EEG (the latter in a previously recorded freely available dataset). M/EEG data correlated best with surprise derived from musical models that emphasised long-term learned experiences over short-term statistical regularities for rule learning. Conversely, the best performance was obtained when models were applied to only the most recent few notes, rather than longer stimulus histories.

      Uncertainty was also computed as an independent variable, defined as entropy, and equivalent to the expected surprise of the upcoming note (sum of the probability of each value times surprise associated with that note value). Uncertainty did not improve predictive performance on M/EEG data, so was judged not to have distinct neural correlates in this study.
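      As a worked illustration of these two definitions, the sketch below computes surprise and entropy from a model's predicted distribution over the next note. The note names and probabilities are hypothetical, and the use of base-2 logarithms (bits) is an assumption; the definitions above do not fix the base.

```python
import math


def surprise(probs, note):
    """Surprise of the observed note: negative log probability, in bits."""
    return -math.log2(probs[note])


def entropy(probs):
    """Uncertainty: expected surprise over all possible next notes, in bits."""
    return sum(p * -math.log2(p) for p in probs.values() if p > 0)


# Hypothetical predictive distribution over the next note.
next_note_probs = {"C4": 0.5, "D4": 0.25, "E4": 0.25}
```

      For this example distribution, observing D4 gives a surprise of 2 bits, and the entropy (the expected surprise before the note is heard) is 1.5 bits.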

      The paradigm used was listening to naturalistic musical melodies.

      A time-resolved multiple regression analysis was used, incorporating a number of binary and continuous variables to capture note onsets, contextual factors, and outlier events, in addition to the statistical regressors of interest derived from the compared models.

      Regression data were subjected to non-parametric spatiotemporal cluster analysis, with weights from significant clusters projected into scalp space as planar gradiometers and into source space as two equivalent current dipoles per cluster.

      General comments:

      The research questions are sound, with a clear precedent of similar positive findings, but numerous unanswered questions and unexplored avenues.

      I think there are at least two good reasons to study this kind of statistical response with music: firstly that it is relevant to the music itself; secondly, because the statistical rules of music are at least partially separable from lower-level processes such as neural adaptation.

      Whilst some of the underlying theory and implementation of the musical theory are beyond my expertise, the choice, implementation, fitting, and comparison of statistical models of music seem robust and meticulous.

      The MEG and EEG data processing is also in line with accepted best practice and meticulously performed.

      The manuscript is very well-written and free from grammatical or other minor errors.

      The discussion strikes a brilliant balance of clearly laying out the interim conclusions and advances, whilst being open about caveats and limitations.

      Overall, the manuscript presents a range of highly interesting findings which will appeal to a broad audience, based on rigorous experimental work, meticulous analysis, and fair and clear reporting.

      We thank the reviewer for their detailed and positive evaluation of our manuscript.

      Reviewer #3 (Public Review):

      The authors compare the ability of several models of musical predictions in their accuracy and in their ability to explain neural data from MEG and EEG experiments. The results allow both methodological advancements by introducing models that represent advancements over the current state of the art and theoretical advancements to infer the effects of long- and short-term exposure on prediction. The results are clear and the interpretation is for the most part well reasoned.

      At the same time, there are important aspects to consider. First, the authors may overstate the advancement of the Music Transformer with the present stimuli, as its increase in performance requires a considerably longer context than the other models. Secondly, the Baseline model, to which the other models are compared, does not contain any pitch information on which these models operate. As such, it's unclear whether the advancements of these models come from being based on new information or from the operations they perform on this information, as claimed. Lastly, the source analysis yields some surprising results that don't fit with previous literature. For example, the authors show that onsets to notes are encoded in Broca's area, whereas they would more likely be expected in the primary auditory cortex. While this issue is not discussed by the authors, it may put the rest of the source analysis into question.

      While these issues are serious ones, the work still makes important advancements for the field and I commend the authors on a remarkably clear and straightforward text advancing the modeling of predictions in continuous sequences.

      We thank the reviewer for their compliments.

    1. Author Response

      Public Evaluation Summary:

      This work would be of interest to global health scientists, particularly in low- and middle-income countries where childhood stunting is an ongoing challenge, and to statisticians interested in building clinical prediction rules. The authors leveraged large, rich datasets from multi-center studies to build and validate predictive models. But by using change in growth, rather than absolute growth, as the only outcome, it may be missing children of concern who are already experiencing growth failure and require intervention but have reached a growth faltering floor.

      Thank you for this suggestion. We have added additional models for the following predictions: a) growth faltering in those NOT stunted (HAZ≥-2) at presentation, b) any stunting (HAZ<-2) at follow-up, and c) any stunting at follow-up in those not stunted at presentation. While we agree the addition of these models improves the manuscript, we also want to highlight that these models have distinct outcomes and therefore have separate clinical uses. Our original goal was to identify children whose growth was likely to slow down after diarrhea. As we show, top predictors and predictive performance are similar for growth faltering across baseline stunting status. We present any stunting at follow-up as a comparison, but argue that this is a different clinical outcome that may warrant different intervention. We have edited the manuscript for clarity as follows.

      P.22 L339-343: In sensitivity analyses, we demonstrated our ability to predict any stunting at follow-up with high accuracy (Table 1, Table S5). However, this represents a related but distinct outcome from our original aim, namely a slowing down of growth as opposed to stunting, and may warrant different clinical intervention.

      P.23 L.353-357: Current malnutrition recommendations are based on patient presentation – whether a child is underweight when they present to the clinic. Our CPR could be used to identify children not currently stunted and therefore not currently recommended for nutritional interventions, but who are likely to slow down in growth and therefore at higher risk of incident stunting.

      P.23 L352-361: Our CPR provides a tool for identifying patients likely to experience additional growth faltering after acute diarrhea. Current malnutrition recommendations are based on patient presentation – whether a child is underweight when they come to the clinic. Our CPR could be used to identify children not currently stunted and therefore not currently recommended for nutritional interventions, but who are likely to slow down in growth and therefore at higher risk of incident stunting. Identifying these children would allow clinicians to connect patients with community-based nutrition interventions (e.g. maternal support for safe introduction of weaning foods, small quantity lipid nutrient supplements (SQ-LNS), etc.(45-48)) to prevent additional effects of chronic malnutrition, namely irreversible stunting.

      P.25 L.390-393: Our findings indicate that use of prediction rules, potentially applied as clinical decision support tools, could help to identify additional children at risk of poor outcomes after an episode of diarrheal illness, i.e. not currently stunted but likely to decelerate growth.
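      The outcome definitions discussed in this response (growth faltering as a decrease in HAZ of at least 0.5, stunting as HAZ < -2, and the subgroups restricted to children not stunted at presentation) can be made explicit with a small helper. This is a hypothetical sketch for illustration only; the function and key names are ours, and details such as the timing of the follow-up measurement are not represented.

```python
def derive_outcomes(haz_baseline, haz_followup, faltering_drop=0.5, stunting_cutoff=-2.0):
    """Binary outcomes for one child from baseline and follow-up HAZ.

    None marks a child who is excluded from a subgroup outcome
    (i.e. already stunted at presentation).
    """
    stunted_at_baseline = haz_baseline < stunting_cutoff
    faltered = (haz_baseline - haz_followup) >= faltering_drop
    stunted_at_followup = haz_followup < stunting_cutoff
    return {
        # Original outcome: HAZ decrease of at least 0.5.
        "growth_faltering": faltered,
        # a) Growth faltering among children not stunted at presentation.
        "faltering_if_not_stunted": None if stunted_at_baseline else faltered,
        # b) Any stunting at follow-up.
        "stunting_at_followup": stunted_at_followup,
        # c) Stunting at follow-up among children not stunted at presentation.
        "incident_stunting": None if stunted_at_baseline else stunted_at_followup,
    }
```

      Writing the outcomes this way makes the point of the response visible: a child already below HAZ -2 contributes to outcome b) but is excluded from a) and c), and a child can falter without becoming stunted (or vice versa).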

      Reviewer #1 (Public Review):

      In this manuscript, the authors built logistic regression prediction models for linear growth faltering using demographic, socioeconomic, and clinical variables, with the objective of developing a clinical prediction rule that could be applied by healthcare workers to identify and treat high-risk children. A model with 2 variables selected by random forest variable importance performed similarly to a model with 10 variables. Age and HAZ at baseline were selected for the 2-variable model, consistent with existing literature. The authors externally validated the 2-variable model and found similar discriminative ability. Based on typical rule-of-thumb cutoffs, model performance was moderate (AUCs of ~0.65-0.75, depending on model specification); models may still be useful in practice, but this should be further discussed by the authors.

      We agree that our overall ability to predict growth faltering was moderate. As we present in depth below, we do not intend for our clinical prediction rule (CPR) to replace existing guidelines. Therefore, we are not proposing that our CPR be used to withhold nutritional treatment. Rather, we intend for our CPR to be used in conjunction with existing clinical practices to identify additional children who may or may not be currently stunted, but are at increased risk of decelerated growth and therefore would also benefit from nutritional interventions.

      Strengths:

      Linear growth faltering is a pressing issue with broad, negative impacts on the health, development, and well-being of children worldwide. In this work, the authors applied clearly explained, thoughtful approaches to variable selection, model specification, and model validation, with large, multi-country cohorts used for training and external validation. Appropriate datasets for external validation can be challenging to find, but the MAL-ED data used here is well-suited to the task, with similar predictor and outcome measurements to the GEMS training data. The well-characterized studies allowed the authors to explore a wide range of potential predictors for stunting, including socioeconomic factors, antibiotic use, and diarrheal etiology.

      Weaknesses:

      This work would benefit from additional discussion around the clinical relevance of the results. For example, what is the current standard of care for prevention of stunting, and how much would this model improve the status quo? Is specificity of 0.47 in the context of sensitivity of 0.80 an acceptable tradeoff with regards to the interventions that would be used? More discussion around these points is necessary to support the authors' conclusions that these models could potentially be used to support clinical decisions and target resources.

      Current practice focuses on the identification and treatment of malnutrition, with malnutrition classified based on mid-upper arm circumference (MUAC), weight for length or height z-score, or bipedal oedema. None of these measurements compare child size to their age. At the International Centre for Diarrhoeal Research, Bangladesh (ICDDRB), children are only evaluated for stunting if their weight for age z-score is too low. While stunting can be the result of chronic malnutrition, it can also be a contributing factor to future health problems (see first paragraph of Introduction). Therefore, while related to malnutrition, stunting is a distinct health outcome that would benefit from explicit identification strategies. Furthermore, current practice only identifies children who are already stunted when they present to care. A CPR to identify children whose growth is likely to slow down and therefore who are at risk of new or additional stunting could help prevent additional stunting and its downstream health outcomes. The Discussion now includes the following:

      P.23 L.353-361: Current malnutrition recommendations are based on patient presentation – whether a child is underweight when they present to the clinic. Our CPR could be used to identify children not currently stunted and therefore not currently recommended for nutritional interventions, but who are likely to slow down in growth and therefore at higher risk of incident stunting. Identifying these children would allow clinicians to connect patients with community-based nutrition interventions (e.g. maternal support for safe introduction of weaning foods, small quantity lipid nutrient supplements (SQ-LNS), etc.(46-49)) to prevent additional effects of chronic malnutrition, namely irreversible stunting.
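      To make the reviewer's question about a specificity of 0.47 at a sensitivity of 0.80 more concrete, the positive and negative predictive values can be computed with Bayes' rule. The prevalence of growth faltering used in the example below (20%) is a hypothetical value chosen for illustration, not a figure from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary rule via Bayes' rule."""
    tp = sensitivity * prevalence              # truly at risk, flagged
    fp = (1 - specificity) * (1 - prevalence)  # not at risk, flagged
    fn = (1 - sensitivity) * prevalence        # truly at risk, missed
    tn = specificity * (1 - prevalence)        # not at risk, not flagged
    return tp / (tp + fp), tn / (tn + fn)


# Operating point quoted by the reviewer; prevalence is hypothetical.
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.47, prevalence=0.20)
```

      Under this assumed prevalence, roughly 27% of flagged children would go on to falter (PPV ≈ 0.27), while about 90% of children who are not flagged would not (NPV ≈ 0.90). Whether that tradeoff is acceptable depends on the cost and risk of the intervention offered to flagged children.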

      In addition to the external validation, further investigation of model performance in key subpopulations would strengthen the importance and applicability of the work. For example, performance of prediction models may vary widely by setting; it would be valuable to show that the model has similar performance in each country. Another key sensitivity analysis would be to show consistent model performance by HAZ at baseline. The authors note that stunting may be challenging to reverse (p.20), and many of the children are already below the typical cutoff of HAZ<-2 at baseline; it would be valuable to show model performance among the subgroup of children for whom treatment would be most beneficial.

      We appreciate this suggestion. We have added additional analysis regarding stunting at baseline as described above. We have added country-specific CPRs in the Supplement. We have also added a sensitivity analysis whereby we fit models to all data from one continent in GEMS, and then validated that model on the other continent in GEMS data. As you can see from Supplementary Table S5, top predictors and discriminative performance were similar across countries and continents.

      P.10 L.171-173: Finally, we conducted a quasi-external validation within the GEMS data by fitting a model to one continent and validating it on the other.

      P.24 L.380-383: The quasi-external validation between continents within GEMS data, as well as the country-specific models within GEMS, all had similar top predictors and discriminative performance, further supporting the overall validity of our CPR. Finally, we explored a range of AFe cutoffs for etiology, with consistent results.
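      A minimal sketch of the quasi-external validation loop described above, assuming discrimination is summarized as the area under the ROC curve (AUC). The record layout and the `fit` callable are hypothetical placeholders for whatever model-fitting procedure is used; the sketch only illustrates fitting on one continent and scoring on the other.

```python
from itertools import product


def auc(scores, labels):
    """AUC via the Mann-Whitney formulation: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def cross_continent_auc(records, fit):
    """Fit on every continent but one, then compute AUC on the held-out one.

    records: list of dicts with hypothetical keys 'continent', 'features', 'outcome'.
    fit: hypothetical callable taking training records, returning a scoring function.
    """
    results = {}
    for held_out in sorted({r["continent"] for r in records}):
        train = [r for r in records if r["continent"] != held_out]
        test = [r for r in records if r["continent"] == held_out]
        score = fit(train)
        results[held_out] = auc([score(r["features"]) for r in test],
                                [r["outcome"] for r in test])
    return results
```

With only two continents, this reduces to the two directions of the quasi-external validation (fit on Africa, validate on Asia, and vice versa).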

      Reviewer #2 (Public Review):

      The manuscript documents a thorough and well-validated clinical prediction model for risk of severe child linear growth faltering after diarrheal disease episodes, using data from multiple studies and countries. They identified a parsimonious model of child age and current size with relatively good predictive accuracy. However, I don't believe the prediction rule should be used in its current form, because of the outcome used and the danger of failing to treat children who require nutritional supplementation.

      As described in-depth above, we do not intend for this CPR to replace existing guidelines, but rather to function as a complementary tool to identify additional children not currently stunted but who are at risk of their growth slowing down.

      The outcome used for prediction is a binary indicator for a decrease in height-for-age Z-score >= 0.5. A child who fails to gain height by future measurements is of concern, but this outcome also misses children who are already experiencing growth failure, and is vulnerable to regression-to-the-mean effects. The two most important predictors were age and current size, with current size having a positive association with risk of growth faltering. As mentioned in the discussion, there is "the possibility that children need to have high enough HAZ in order to have the potential to falter." Additionally, some children may have erroneously high height measurements at the first visit, so that the HAZ change >= 0.5 associated with high baseline HAZ reflects regression to the mean from measurement error. I recommend also predicting absolute HAZ (or stunting status) as a secondary outcome and comparing whether the important predictors change.
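      To make the reviewer's concern concrete, here is a minimal sketch contrasting the primary outcome (a decrease in HAZ of at least 0.5) with the suggested secondary outcome (stunting status, using the conventional HAZ < -2 cutoff). The two example children are hypothetical.

```python
def growth_faltering(haz_baseline, haz_followup, threshold=0.5):
    """Primary outcome: HAZ decreased by at least `threshold` between visits."""
    return (haz_baseline - haz_followup) >= threshold


def stunted(haz, cutoff=-2.0):
    """Secondary outcome suggested by the reviewer: stunting status (HAZ < -2)."""
    return haz < cutoff


# Hypothetical children: (baseline HAZ, follow-up HAZ)
tall_child = (0.2, -0.4)      # falters by 0.6 but is not stunted
stunted_child = (-2.6, -2.7)  # barely changes but is already stunted

for base, follow in (tall_child, stunted_child):
    print(growth_faltering(base, follow), stunted(follow))
```

The first child is flagged only by the change-based outcome and the second only by stunting status, which is why the two outcomes can favor different predictors.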

      See above.

      In its current form, the results and conclusions have problematic implications for the treatment of child malnutrition. The conclusion states: "In settings with high mortality and morbidity in early childhood, such tools could represent a cost-effective way to target resources towards those who need it most." If the current CPR were used in a resource-constrained setting, it would recommend that larger children be prioritized for nutritional supplementation over already stunted children who may have reached their growth-faltering floor. In addition, with a sensitivity of 80%, the tool would miss a large number of children who would experience growth faltering. The results of the clinical prediction tool need to be presented with care regarding how it could be used to prioritize treatment without missing children who would benefit from nutritional supplementation. Including absolute HAZ as an outcome will help, along with additional discussion of how the CPR fits alongside current treatment recommendations. For example, does this rule indicate treating children who aren't currently treated, or are there children who don't need treatment given current guidelines and the created CPR?
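      At a sensitivity of 80%, one in five children who go on to falter is not flagged by the rule. The arithmetic is sketched below with hypothetical predictions and outcomes.

```python
def confusion_counts(predicted, actual):
    """Return (tp, fn, fp, tn) for parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    tn = sum(1 for p, a in pairs if not p and not a)
    return tp, fn, fp, tn


def sensitivity(tp, fn):
    """Share of children who truly falter that the rule flags."""
    return tp / (tp + fn)


# Hypothetical: 10 children falter, the rule flags 8 of them (2 missed).
predicted = [True] * 8 + [False] * 2 + [True, False]
actual = [True] * 10 + [False, False]
tp, fn, fp, tn = confusion_counts(predicted, actual)
print(tp, fn, sensitivity(tp, fn))  # 8 2 0.8
```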

      We thank the Reviewers for pointing out this oversight. We have edited the Discussion for clarity as follows.

      P.23 L.352-361: Our CPR provides a tool for identifying patients likely to experience additional growth faltering after acute diarrhea. Current malnutrition recommendations are based on patient presentation – whether a child is underweight when they come to the clinic. Our CPR could be used to identify children not currently stunted and therefore not currently recommended for nutritional interventions, but who are likely to slow down in growth and therefore at higher risk of incident stunting. Identifying these children would allow clinicians to connect patients with community-based nutrition interventions (e.g. maternal support for safe introduction of weaning foods, small quantity lipid nutrient supplements (SQ-LNS), etc.(45-48)) to prevent additional effects of chronic malnutrition, namely irreversible stunting.

      P.25 L.390-393: Our findings indicate that use of prediction rules, potentially applied as clinical decision support tools, could help to identify additional children at risk of poor outcomes after an episode of diarrheal illness, i.e. not currently stunted but likely to decelerate growth.

      In sum, this is a thorough, well done, clearly explained exercise in creating a clinical prediction tool for predicting child risk of future growth faltering. The writing and motivation are clear, and the methods have applicability far beyond the specific use case.

    1. Author Response

      Public Review:

      1) Although I do not find negative arguments against any particular section of the study, I have a question regarding Triprismatoolithus stephensis:

      As mentioned in the text, Triprismatoolithus is analysed by the authors, and several pictures are provided in Fig. S12 alongside a brief description in Supplementary Text 4. However, it does not seem to be included in any of the phylogenetic analyses or figures. Why?

      If the specimen has no bearing on any of the main analyses, there is no need for it to be considered "studied material".

      We added more explanation of the purpose of including Triprismatoolithus (Lines 803–806). We presented Triprismatoolithus to show the prismatic shell units of a maniraptoran eggshell other than the well-known case of Prismatoolithus levis. Thus, Triprismatoolithus was also presented in Figure S1C along with other eggshells with prismatic shell unit microstructure. Without this ootaxon, there would be just three comparative specimens in Figure S1, so we prefer to keep it. Admittedly, this eggshell was not used in our analysis in Figures 13–16 because the specific egg-laying taxon is unknown, so its taxon-ootaxon relationship is not as solid as in the cases of Elongatoolithus, Macroelongatoolithus, and Prismatoolithus levis. Note, however, that the role of this ootaxon in Figure S1 is not trivial, because it supports the view that even prismatic shell units have rugged grain boundaries in the squamatic zone.

    1. Author Response

      Reviewer #1 (Public Review)

      Overall, the claims in the manuscript are clearly communicated and justified by the data. However, one of the NeuronBridge features mentioned in the manuscript did not work intuitively and could use more description in the manuscript. This was the feature to upload a confocal stack to search for other Gal4 lines or the appropriate neurons in the EM hemibrain. When a known Gal4 was in the database, it was easy and intuitive to go from a driver line to an EM neuron or, alternatively, if an EM neuron was known, it was easy to go from that neuron to find a driver line. It was, however, difficult to upload a stack and find the neuron names or a driver line. The example on NeuronBridge was somewhat helpful, but an accompanying brief 'How-to' for this process in the manuscript would be very welcome. If possible, I recommend adding this as a 'box' or Figure in the revised paper. Further, the authors may want to provide a troubleshooting guide on the website for uploading a confocal stack to NeuronBridge.

      We are revising the text on the website for clarity and adding additional troubleshooting information. This, along with other updates to the website, will be available in the next release of NeuronBridge towards the end of 2022.

      Reviewer #2 (Public Review):

      1) Figure 4 and its two supplements show the distribution of correct hits in the top 100 for a forward search, and illustrate the complementary nature of the two methods, with some correct hits found by one method but not the other. Figure 5 shows the results for a reverse search. It seems that this does not correlate with neuron morphology. The manuscript does not mention, however, whether any attempts were made to improve the scoring so that correct hits would be ranked higher. It would be helpful to clarify this.

      Development of the CDM and PPPM search algorithms and associated pre- and post-processing optimizations has proceeded in parallel with the MCFO data release and the NeuronBridge application described in the paper. Mais et al., 2021 describes in detail their work to optimize PPPM. CDM improvements since Otsuna et al., 2018 will be described in Otsuna et al., 2023, which is still in preparation. While we view the search approach evaluations as showing that neuron matches can be found with CDM and PPPM, the evaluation cannot be comprehensive across all neurons, datasets, and algorithm variations.
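      The evaluation discussed above can be summarized per query by the rank of the first correct hit. A minimal sketch, assuming each search returns a ranked list of match identifiers and that the correct identifiers are known in advance; the identifiers and data layout are hypothetical, and this is not NeuronBridge's scoring code. The last function illustrates the complementarity point: a query counts as solved if either method ranks a correct hit within the top k.

```python
def first_correct_rank(ranked_ids, correct_ids):
    """1-based rank of the first correct match, or None if none is returned."""
    for rank, match_id in enumerate(ranked_ids, start=1):
        if match_id in correct_ids:
            return rank
    return None


def _within_k(rank, k):
    return rank is not None and rank <= k


def hit_rate_at_k(queries, k=100):
    """Fraction of queries with a correct match in the top k.
    queries: list of (ranked_ids, correct_ids) pairs."""
    found = sum(1 for ranked, correct in queries
                if _within_k(first_correct_rank(ranked, correct), k))
    return found / len(queries)


def combined_hit_rate_at_k(queries_a, queries_b, k=100):
    """Fraction of queries where either method (e.g. CDM or PPPM) ranks a
    correct match in the top k; both lists must cover the same queries in order."""
    found = 0
    for (ra, ca), (rb, cb) in zip(queries_a, queries_b):
        if _within_k(first_correct_rank(ra, ca), k) or _within_k(first_correct_rank(rb, cb), k):
            found += 1
    return found / len(queries_a)
```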

      2) Related to the point above, the examples used for the forward search are all visual projection neurons. In order to illustrate the usefulness and comprehensiveness of the searches, it would be helpful if some examples of central brain neurons, not truncated in Hemibrain, were also used.

      We acknowledge the limited set of neurons examined in the evaluation of CDM and PPPM search, and tried to weight the claims accordingly in lines 305 and 309 of the submission. We agree more examples would be useful, but providing them hasn't proven feasible during the revision period. While the example neurons are truncated, it does not appear likely that searches with completely reconstructed neurons would generally produce worse results.

    1. Author Response

      Reviewer #1 (Public Review):

      The current study uses microbiology, biochemistry, microscopy, and viral vectors to establish a role for prefrontal cortex expression of the immediate early gene NPAS4 in sucrose preference and dendritic spine morphology in the mouse social defeat stress model. The experimental designs are appropriate and the hypotheses addressed are interesting. The paper is generally very well-written and the figures are clear. Most of the statistical analyses are appropriate, and they are reported in clear and useful tables. Thus, the general potential for the studies is quite high. The authors conclusively show that NPAS4 is induced in mPFC in response to social defeat stress and that NPAS4 is important for stress-induced changes in mPFC dendritic spine number. However, some of the key data regarding reward motivation are difficult to properly interpret and do not convincingly demonstrate a behavioral result of NPAS4 knockdown in mPFC. Moreover, the spine morphology and sequencing analyses lack depth. Most importantly, although the authors explore the effects of reducing NPAS4 expression in mPFC, they do not explore the effects of increasing NPAS4 expression or function, and thus the studies seem incomplete and cannot be fully interpreted.

      We appreciate the reviewer's overall positive feedback on our study and the constructive comments to improve the manuscript. In the revised document, we have addressed the key concerns about NPAS4’s function in motivated behavior by providing new data showing that NPAS4 limits natural reward motivation in the CSDS-susceptible group (Figure 3C-D). We encountered a major challenge: animals that sustained injuries during CSDS had to be removed from the study, resulting in relatively few susceptible mice. Other factors likely contributed to the low proportion of susceptible mice, including the biological sex of the investigator (Georgiou et al., Nature Neuroscience, 2022). For the gene expression analysis, we provided a comparative analysis of our RNA-seq data with published NPAS4 ChIP-seq data to demonstrate genome-wide NPAS4 association, suggesting potential direct NPAS4 target genes. Furthermore, to extend the structural synapse data, we now provide new electrophysiology data (Figure 4C-H). These new data demonstrate that NPAS4 is required for the CSDS-induced reduction of mEPSC frequency. Using new single-nuclei RNA-seq data from adult mPFC tissues, we observe that NPAS4 is expressed predominantly (~93%) in excitatory neuron clusters, but is also expressed in multiple interneuron populations (~7%). Since our NPAS4 knockdown strategy is not cell type-specific, we have revised the discussion to reflect the possibility that some of the NPAS4-dependent CSDS effects on structural and functional glutamatergic synapses and anhedonia-like behaviors could be due, at least in part, to NPAS4 function in one or more classes of GABAergic interneurons. We have discussed these limitations of interpretation, and the need for future cell type-specific approaches, in the revised manuscript.
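      The response does not specify how the RNA-seq and ChIP-seq gene lists were compared. One common approach (an assumption here, not necessarily the authors' method) is a one-sided hypergeometric test of whether NPAS4-dependent genes overlap ChIP-bound genes more than expected by chance. A minimal sketch:

```python
from math import comb


def overlap_p_value(overlap, universe, n_deg, n_bound):
    """One-sided hypergeometric P(X >= overlap).

    Models drawing `n_deg` differentially expressed genes at random from a
    universe of `universe` genes, of which `n_bound` are ChIP-bound, and asks
    how likely it is to see at least `overlap` bound genes among them.
    """
    upper = min(n_deg, n_bound)
    hits = sum(comb(n_bound, k) * comb(universe - n_bound, n_deg - k)
               for k in range(overlap, upper + 1))
    return hits / comb(universe, n_deg)
```

The gene universe (for example, all genes detected in the RNA-seq) strongly affects the result and would need to be stated explicitly.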

      Reviewer #2 (Public Review):

      The authors investigate whether neuronal activity-regulated transcription factor 4 (NPAS4) in the medial prefrontal cortex (mPFC) is involved in stress-induced effects on neuronal spine synapse density (as a proxy for synaptic activity) and reward behaviors. A major strength of the manuscript is that NPAS4 is shown to be necessary for stress-induced reward deficits and changes in pyramidal neuron spine density. In addition, whole-transcriptome analysis of NPAS4 target genes identifies a number of genes previously found to be regulated in the postmortem brain of humans with MDD, providing translational relevance to these studies. A weakness is that studies were only performed in male mice, so it is unclear how generalizable these effects are to females. Despite this, the work will likely impact the field of neuropsychiatry by providing novel information about the molecular and cellular mechanisms in mPFC responsible for stress-induced effects on spine synapses and reward behaviors.

      We would like to thank the reviewer for the positive comments, including on the comparison of our NPAS4-dependent PFC genes with published data from postmortem brains of humans diagnosed with MDD. We agree with the reviewer that assessing the role of NPAS4 in CSDS or similar chronic stress paradigms in females will be an important future direction for our work, and we acknowledge this limitation of our study in the revised manuscript.

      Reviewer #3 (Public Review):

      Hughes et al. report a role for the transcription factor NPAS4 in mediating chronic stress-induced reward-related behavioral changes, but not other depression-like behaviors. The authors find that NPAS4 is transiently upregulated in Camk2a+ PFC neurons following a single bout or repeated social defeat stress, and that knocking down PFC Npas4 prevents anhedonia. Presentation of linked individual data for social interaction/avoidance measures with/without interaction partners (Fig2C, E) is commended - all CSDS papers should show data this way. Npas4 also appears to mediate the known effect of stress on spines in PFC, providing novel mechanistic insight into this phenomenon. Npas4 knockdown altered baseline transcription in PFC, which overlapped with other stress and MDD-associated transcriptional changes and modules. However, stress-induced changes in transcription with knockdown remain unknown. A major drawback is that only male mice were used, although this is discussed to some extent. Results are presented with appropriate context and references to the literature. Conclusions are appropriate.

      Additional context: Given NPAS4's role as an immediate early gene, it will be important for future work to elucidate whether IEG knockdown generally dampens transcriptional response to stress/other salient experiences. Nevertheless, the authors do show several pieces of evidence that Npas4 knockdown does not simply make mice less sensitive to stress and/or produce deficits in threat/fear-related learning and memory which is an important piece of this puzzle.

      We appreciate the thoughtful and generous comments from the reviewer regarding our display method for social interaction/avoidance data. We agree that a major limitation of our study is the lack of females. Unfortunately, we’ve had limited success with reported adaptations for the use of females in CSDS, and follow-up studies will be critical to assess NPAS4’s mPFC role in chronic stress-induced anhedonia-like behavior. We address this limitation of our current study in the discussion section.

      We agree that IEG manipulation might produce profound changes in the stress-dependent transcriptome of the mPFC. Toward this goal, we investigated the expression of several candidate NPAS4 target genes 1 hour after acute social defeat stress, a time point of near-peak NPAS4 protein expression (Supplemental Figure 4). Although we observed a main effect of Npas4 knockdown, we did not observe an impact of NPAS4 on stress-induced gene expression (Supplemental Figure 4). Because NPAS4 is very rapidly and transiently induced by stress and neural activity, multiple time points will need to be examined to determine the impact of NPAS4 on stress-induced changes in transcription. Future studies performing single-cell transcriptomics at various time points following acute or chronic social defeat stress, sucrose SA, and social interaction will be important to address these questions.

    1. Author Response

      Reviewer #3 (Public Review):

      The manuscript by the Qiu and Lu labs investigates the mechanism of desensitization of the acid-activated Cl- channel, PAC. These trimeric channels reside in the plasma membrane of cells as well as in organelles and play important roles in human physiology. PAC channels, like many other ion channels, undergo a process known as desensitization, where the channel adopts a non-conductive conformation in the presence of a prolonged physiological stimulus. For PAC the molecular mechanisms regulating this process are not well understood. Here the authors use a combination of electrophysiological recordings and MD simulations to identify several acidic residues and a conserved histidine side chain as important players in PAC desensitization. The results are overall interesting and clearly indicate a role for these residues in this process. However, there are several weaknesses in the experimental design, inconsistencies between the mutagenesis data and the MD results, as well as in the interpretation of the data. For these reasons I do not think the authors have made a convincing mechanistic case.

      We thank the reviewer for the constructive comments and address the concerns point-by-point below.

      Major weaknesses:

      The underlying assumption in the interpretation of all the data is that the mutations stabilize or destabilize the desensitized conformation of the channel. However, none of the functional measurements provide direct evidence supporting this key assumption. Without direct evidence supporting the notion that the mutations specifically impact the rate of recovery from desensitization, I do not think the authors have made a convincing mechanistic case.

      We agree with the reviewer that our functional data measure the degree and rate of the PAC channel entering desensitization from the activated state upon prolonged acid treatment. This is a common experimental procedure for research on desensitization/inactivation of ion channels. Following the reviewer’s suggestion, we also sought to capture the kinetics from the desensitized state to the activated state by switching from a more acidic pH to a less acidic pH (for example, 4.0 to 5.0) or to neutral pH. However, we found that such experiments are not feasible, partly because PAC desensitization is much slower than that of other channels, such as ASIC channels (see a recent study we cited: https://elifesciences.org/articles/51111). For the mutants with strong desensitization (E94R and D91R), it is unclear whether the currents we recorded at pH 5.0 right after pH 4.0 represent the activated state or the desensitized state at pH 5.0. In other words, we do not know whether the PAC channel transitions from the desensitized state at the lower pH back to the activated state, or directly to the desensitized state at the higher pH. For the mutants with reduced desensitization, the current amplitude at pH 4.0 was often similar to that at pH 5.0, which makes the recovery/transition measurement variable. We also tried switching from acidic pH to neutral pH. We found that the PAC channels (both WT and mutants) return to the closed state from the desensitized state within seconds, as limited by our perfusion speed. These data suggest that the desensitized state of PAC is no longer maintained after switching the buffer from low pH to neutral pH. In summary, it is, in our opinion, technically infeasible to measure the rate of recovery from desensitization to activation for the PAC channel. However, our data do support the conclusion that the rates of entering desensitization from the activated state, a standard measurement of desensitization, change for the various channel mutants we studied.
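      The degree and rate of entry into desensitization discussed above are often summarized from the current decay during a sustained acid pulse. A minimal sketch, assuming a single-exponential decay toward a known steady-state level and working with current magnitudes; the response does not state which fitting model was used, so this is illustrative only.

```python
import math


def percent_desensitization(peak, steady_state):
    """Degree of desensitization: percentage of the peak current lost."""
    return 100.0 * (1.0 - steady_state / peak)


def desensitization_tau(times, currents, steady_state):
    """Time constant for I(t) = I_ss + (I_peak - I_ss) * exp(-t / tau).

    Fits log(I - I_ss) against t by ordinary least squares; the slope is
    -1/tau. Only samples above the steady-state level are usable.
    """
    pts = [(t, math.log(i - steady_state))
           for t, i in zip(times, currents) if i > steady_state]
    if len(pts) < 2:
        raise ValueError("need at least two samples above steady state")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    return -sxx / sxy  # tau = -1 / slope, where slope = sxy / sxx
```

In practice a nonlinear fit that also estimates the steady-state level is more robust to noise than this log-linear shortcut.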

      Overall, the agreement between the MD simulations, functional data, and interpretation are often weak and some issues should be acknowledged and addressed.

      For example:

      1) The experimental data suggests that H98, E107, and D109 play analogous roles in PAC desensitization. However, the MD simulations suggest that the H98-D109 interaction energy is ~4 times larger than that of H98-E107. This should lead to a much greater effect of the D109 mutation. How is this rationalized?

      The purpose of quantifying the interaction between H/R98 with E107 and D109 is to better dissect the mechanism by which H/R98 interacts with the acidic pocket residues. The result suggests that R98 has a reduced association with E107/D109 when compared to H98. It also suggests that D109 makes a more direct interaction with H/R98 when compared to E107. We acknowledge that this is not clear in our initial manuscript and we have updated the text to better describe this result. However, this doesn’t imply that the desensitization phenotype of E107R should be less pronounced than D109R. Both E107R and D109R are expected to disrupt the integrity of the acidic pocket, thus resulting in diminished channel desensitization. It is worth pointing out that E107 played a more complex role as it was identified in our previous papers as one of the major proton sensors. The E107R mutant could allow the PAC channel to become more sensitive to acid-induced activation (Figure 4d-e in Ruan et al, Nature, 2020), further complicating its effect in desensitization. Taken together, we don’t think the E107/D109 and H/R98 interaction strength could have quantitative correlation with the desensitization phenotype of E107R and D109R.

      2) The experimental data shows that E94 plays a key role in desensitization and the authors argue that this is due to the interactions of this residue with the β10-11 linker. However, the MD simulations show that these interactions happen for a small fraction, ~10%, of the time and with interaction energies comparable to those of the H98-E107-D109 cluster. It is not clear how these sparse and transient interactions can play such a critical role in desensitization. Also, if the interaction energies are of the same sign, how come one set of mutants favors desensitization and one does not?

      The 10% value is the fraction of time during which at least one hydrogen bond forms between E94/R94 and the β10–β11 loop. It is NOT the fraction of time during which they interact, as there can be other types of non-bonded interactions, such as van der Waals and Coulombic interactions. In fact, our non-bonded energy calculation clearly suggests that R94 interacts with the β10–β11 loop much more favorably than E94 (Figure 4C). The impact of E94R on the β10–β11 loop is also reflected in the root-mean-square-fluctuation analysis, where the β10–β11 loop shows reduced flexibility when R94 is present (Figure 4B).
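      The distinction drawn above, between the fraction of frames containing a hydrogen bond and a non-bonded interaction energy evaluated in every frame, can be illustrated with a minimal sketch. The Coulomb plus 12-6 Lennard-Jones pair term is a simplified textbook form (vacuum dielectric, no cutoffs or long-range electrostatics), not the exact force-field expression used in the simulations; any charges or parameters passed to it here are hypothetical.

```python
COULOMB_KCAL = 332.0636  # Coulomb constant in kcal·Å/(mol·e^2)


def hbond_occupancy(hbonds_per_frame):
    """Fraction of frames with at least one hydrogen bond between two groups."""
    return sum(1 for n in hbonds_per_frame if n >= 1) / len(hbonds_per_frame)


def pair_nonbonded_energy(q1, q2, r, epsilon, sigma):
    """Coulomb plus 12-6 Lennard-Jones energy (kcal/mol) of one atom pair at
    distance r (Å). Unlike a hydrogen-bond flag, this term is nonzero in
    every frame, including frames without hydrogen-bond geometry."""
    coulomb = COULOMB_KCAL * q1 * q2 / r
    lj = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    return coulomb + lj
```

For example, an arginine-carboxylate pair that is hydrogen-bonded in only 10% of frames can still contribute a favorable (negative) Coulomb term in frames where the groups are several ångströms apart.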

      Our central hypothesis is that PAC becomes more prone to desensitization when the desensitized conformation is stabilized. Two critical interactions are characteristic of the desensitized structure of PAC: the association of E94 with the β10–β11 loop, and of H98 with E107/D109. Therefore, we expect mutations that alter these interactions to affect PAC channel desensitization. In the MD simulations, the root-mean-square fluctuation of the β10–β11 loop is reduced for E94R compared to WT (Figure 4B), suggesting that the β10–β11 loop is stabilized when E94 is replaced by an arginine. The non-bonded interaction energy between residue 94 and the β10–β11 loop is also more negative for E94R than for WT (Figure 4C), another indicator of conformational stabilization. As a result, the E94R mutant favors desensitization. This is in sharp contrast with the H98R data, in which R98 interacts less favorably with E107/D109 (Figure 2F, G, H, I) than H98 does in WT. Although the interaction energies are of the same sign, it is the difference between WT and the mutants that ultimately determines whether a given mutation favors desensitization.
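      The comparison logic above (the sign of the mutant-minus-WT difference, not the sign of each energy) and the RMSF metric can be sketched as follows. This is a minimal illustration assuming frames have already been superposed onto a common reference; it is not the analysis code used in the study.

```python
import math


def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation.

    trajectory: list of frames; each frame is a list of (x, y, z) per atom,
    assumed already superposed onto a common reference structure.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    out = []
    for a in range(n_atoms):
        mean = [sum(frame[a][d] for frame in trajectory) / n_frames for d in range(3)]
        msd = sum(sum((frame[a][d] - mean[d]) ** 2 for d in range(3))
                  for frame in trajectory) / n_frames
        out.append(math.sqrt(msd))
    return out


def mean_energy_shift(e_mutant, e_wt):
    """Mean(mutant) - mean(WT) per-frame interaction energy.
    Negative means the interaction is more favorable in the mutant."""
    return sum(e_mutant) / len(e_mutant) - sum(e_wt) / len(e_wt)
```

Both WT and mutant energies can be negative while their difference still indicates which variant stabilizes the interaction.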

      The authors' MD analysis critically depends on assumptions on the protonation states of multiple residues, that are often located in close proximity to each other. In the methods, the authors state they use PropKa to estimate the pKa of residues and assigned the protonation states based on this. I have several questions about this procedure:

      • What pH was considered in the simulations? I imagine pH 4.0 to match that of the electrophysiological experiments.

      The exact pH environment cannot be explicitly modeled in standard MD, as the protonation state of an ionizable group is not allowed to change during the simulation. Therefore, we prepared the MD system by first predicting the pKa values of titratable residues of PAC in the desensitized state, and then assigned the protonation states of these residues based on those pKa values. We acknowledge that the description of this step was not very clear in our original manuscript. We have revised the Methods to better describe how the protonation states are assigned.
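      The assignment step described above follows from the Henderson-Hasselbalch relation: a titratable group is predominantly protonated when its pKa exceeds the pH. A minimal sketch at pH 4.0; the residue labels and pKa values are hypothetical, and a pKa exactly equal to the pH (50% protonated) is left unprotonated here only as an arbitrary convention.

```python
def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch: fraction of a titratable group that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def assign_protonation(pkas, ph=4.0):
    """Map residue label -> True if predominantly protonated at this pH."""
    return {res: fraction_protonated(pka, ph) > 0.5 for res, pka in pkas.items()}


print(assign_protonation({"res_a": 6.2, "res_b": 3.1}))
# {'res_a': True, 'res_b': False}
```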

      • Was the propKa analysis run considering how choices in the protonation state of neighboring residues affect the pKa of the other residues? This is critical because the interaction energies will greatly depend on the protonation state chosen.

      The pKa analysis was done based on the WT structure and the residue protonation status was assigned based on the predicted value. It is possible that mutations on certain residues could change the pKa of neighboring residues. To evaluate this impact, we carried out pKa prediction for all the mutant structures that we used as input for simulation. This is summarized in the table below:

      As shown in the table, although mutations will affect the pKa of neighboring residues, the impact is generally within 0.3 units. As our simulation is carried out based on a pH of 4.0, this variability will not affect how we assign the residue protonation status.
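      The reasoning above can be phrased as a simple check: a mutation-induced pKa shift of at most delta can only flip a residue's predominant protonation state if its wild-type pKa lies within delta of the simulation pH. A minimal sketch with hypothetical residue labels and values; the 0.3 default mirrors the bound quoted above.

```python
def at_risk_residues(wt_pkas, ph=4.0, max_shift=0.3):
    """Residues whose predominant state could flip under a pKa shift of at most max_shift."""
    return sorted(res for res, pka in wt_pkas.items() if abs(pka - ph) <= max_shift)


def flipped_residues(wt_pkas, mut_pkas, ph=4.0):
    """Residues whose predominant state (pKa above vs. not above pH) differs
    between the WT and mutant predictions."""
    shared = wt_pkas.keys() & mut_pkas.keys()
    return sorted(res for res in shared if (wt_pkas[res] > ph) != (mut_pkas[res] > ph))
```

Residues returned by `at_risk_residues` are the only ones for which the mutant pKa predictions need closer inspection.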

      • Was the pKa for the mutant constructs re-evaluated? For example, does having a Gln or Arg in place of a His affect the pKa of nearby acidic residues?

      We didn’t re-evaluate the pKa for each mutant in our initial manuscript. We have conducted such an analysis as indicated in the above table. The result suggests that arginine substitutions of H98/E94/D91 could have an impact on the pKa value of nearby residues. However, the difference is relatively small and does not alter the predominant protonation status of these residues at pH 4.0.

      • H98R and Q have the same functional effect. The MD partially rationalizes the effect of H98R, however, it is not clear how Q would have the same effect as R on the interaction energies.

      Our analysis on H98R and H98Q serves two different purposes. H98 is expected to be protonated at pH 4.0. The fact that H98Q mutant reduced PAC desensitization suggests that positive charge at the location is critical for PAC desensitization, which we attribute to the loss of favorable interaction between H98 and E107/D109. This is different from H98R mutant as arginine bears the same amount of charge as a protonated histidine. Our data suggest that the exact biochemical property, including its charge and side-chain flexibility, of H98 is crucial for PAC desensitization.

      • Are 600 ns sufficient to evaluate sampling of the different conformations?

      Our MD analysis was not intended to sample large conformational transitions between different functional states. Instead, our analysis focused on local dynamics, which allowed us to correlate the observations with electrophysiology data. During the revision, we extended our simulations to 1 μs for each mutant. Because PAC is a trimer and we performed all calculations across the three subunits, the effective sampling time is 3 μs in total. The new results remain the same as in our initial analysis, suggesting that the sampling time is sufficient to evaluate the metrics reported in the study. We also acknowledge this limitation of our study in the discussion.
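      One standard way to check whether a trajectory is long enough for a given metric is block averaging: split the time series into contiguous blocks and estimate the standard error from the spread of the block means. A minimal sketch, assuming one per-frame metric per subunit; the response does not state that this particular check was used.

```python
import math


def block_standard_error(series, n_blocks):
    """Standard error of the mean estimated from contiguous block means.
    Trailing frames that do not fill a complete block are ignored."""
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    size = len(series) // n_blocks
    if size == 0:
        raise ValueError("series too short for this many blocks")
    means = [sum(series[i * size:(i + 1) * size]) / size for i in range(n_blocks)]
    grand = sum(means) / n_blocks
    var = sum((m - grand) ** 2 for m in means) / (n_blocks - 1)
    return math.sqrt(var / n_blocks)


def pooled_mean(per_subunit):
    """Mean over all frames of all subunits (three per trimer)."""
    values = [v for subunit in per_subunit for v in subunit]
    return sum(values) / len(values)
```

If the block standard error stops changing as blocks get longer, the blocks are approximately independent and the sampling is adequate for that metric.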

    1. Author Response

      eLife Assessment:

      This manuscript follows the still unanswered concept of 'original antigenic sin' and shows the existence of a 24-year periodicity of the immune response against influenza H3N2. The valuable work suggests a long-term periodicity of individual antibody response to influenza A (H3N2) within a city. But, to substantiate their argument, the authors would need to provide additional supporting data.

      Thank you for your comments. We have performed additional analyses and included those results in the revision to support our findings.

      Specifically, we included a sensitivity analysis that predicted phases by fitting models with 35- and 6-year periodicity, which provided poorer predictions than the 24-year periodicity used in our main results (Figure 4 – figure supplementary 1).
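      A minimal sketch of comparing candidate periodicities, assuming each model is a single sinusoid in strain isolation year fitted by least squares and compared by residual sum of squares. The response describes out-of-sample phase prediction, which this in-sample comparison does not reproduce; any data fed to it in examples are synthetic.

```python
import numpy as np


def sinusoid_rss(years, response, period):
    """Least-squares fit of response ≈ a + b*cos(2πt/P) + c*sin(2πt/P).

    Returns the residual sum of squares; years are centered for numerical stability.
    """
    t = np.asarray(years, dtype=float)
    t = t - t.mean()
    y = np.asarray(response, dtype=float)
    w = 2.0 * np.pi / period
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def best_period(years, response, candidates=(6.0, 24.0, 35.0)):
    """Candidate period with the smallest residual sum of squares."""
    return min(candidates, key=lambda p: sinusoid_rss(years, response, p))
```

The cosine and sine terms together let each candidate period take any phase, so the comparison is between periods rather than between phase choices.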

      We also generated an antigenic map showing the locations of our tested strains, and compared the pairwise antigenic distances of A(H3N2) strains (including our tested strains). These results (Figure 1 – figure supplementary 3) suggest that the tested strains spanned the circulation of A(H3N2) since its emergence and covered the antigenic space of the virus well.

      Reviewer #1 (Public Review):

      The authors suggest that there is a long-term periodicity of individual antibody responses to influenza A (H3N2). This interesting periodicity may well be present. Although the authors assume that the periodicity is driven by pre-existing antibody responses, they could provide more supporting data and discuss other possibilities.

      Thank you for your comments. Please find our point-by-point responses below.

      1) The authors could investigate whether the periodicity reflects the epidemic/invasion record of A(H3N2) within Guangzhou or surrounding cities, e.g., by referring to the yearly numbers of influenza-infected people.

      Thank you for your comments. We aimed to investigate the periodicity in individual-level antibody responses, so we made several efforts to minimize the impact of population-level A(H3N2) activity in our analyses. In particular, we removed the average activity at the population level (i.e., strain-specific intercepts) to minimize the impact of higher circulation of a certain strain on the periodicity.

      In our simulations, we tested models that incorporated only population-level activity, without cross-reactions (Figure 3B, I), and these did not recover the observed periodicity. In the models that included both population-level activity and cross-reactions, we found that less predictable population-level activity (i.e., less regular annual epidemics) would increase the variation in individual-level long-term periodicity (Figure 3G-H). We also found that measured periodicities did not vary substantially between baseline and follow-up (~3-4 years later). These results suggest that local epidemics may have only a limited impact on the observed periodicity in individuals’ antibody responses, while cross-reactions between previously encountered and currently circulating strains may be the main drivers.

      To address this comment, we added a paragraph in discussion (lines 336-342):

      “In this study, we did not explore the interactions between individual level antibody responses and population level A(H3N2) activity (e.g., epidemic sizes). We minimized the impact of population level activity by performing the Fourier analysis with individual departures from the population average and validating the results with data from the Vietnam cohort. Simulation results further suggested that the population level virus activity alone was not able to recover the observed periodicity, though epidemics with less regularity seemed to increase the variability in individual-level periodicity in the presence of broad cross-reactions (Figure 3G-H).”
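      The de-trending idea described above (analysing individual departures from the population average) can be illustrated with a minimal sketch. It assumes log-transformed HI titers arranged as individuals × strains, subtracts each strain's population mean (the strain-specific intercept), and scans candidate periods for one individual's residual series. Note that it swaps in a least-squares sine/cosine periodogram, which handles irregularly spaced strain isolation years, for the spline-plus-Fourier procedure used in the manuscript; the strain years, the 24-year signal and all other inputs are synthetic and hypothetical.

```python
import numpy as np

def remove_strain_intercepts(log_titers):
    """Subtract each strain's population mean (columns = strains, rows = individuals)."""
    return log_titers - log_titers.mean(axis=0, keepdims=True)

def periodogram(years, residuals, periods):
    """Least-squares periodogram: R^2 of a sine + cosine + intercept fit per candidate period."""
    total = np.sum((residuals - residuals.mean()) ** 2)
    power = []
    for p in periods:
        w = 2 * np.pi / p
        X = np.column_stack([np.sin(w * years), np.cos(w * years), np.ones_like(years)])
        coef, *_ = np.linalg.lstsq(X, residuals, rcond=None)
        power.append(1 - np.sum((residuals - X @ coef) ** 2) / total)
    return np.array(power)

# Synthetic data only: 300 individuals, 21 hypothetical strain years, a shared
# 24-year cycle with individual phases, strain-specific effects and small noise.
rng = np.random.default_rng(0)
years = np.linspace(1968, 2012, 21)
n_individuals = 300
phases = rng.uniform(0, 2 * np.pi, n_individuals)
signal = np.sin(2 * np.pi * years[None, :] / 24 + phases[:, None])
strain_effect = rng.normal(0, 1, years.size)
noise = rng.normal(0, 0.1, (n_individuals, years.size))
log_titers = 3 + strain_effect[None, :] + signal + noise

resid = remove_strain_intercepts(log_titers)
periods = np.arange(5, 40.5, 0.5)
power = periodogram(years, resid[0], periods)
best_period = periods[np.argmax(power)]
print(best_period)
```

      Removing the per-strain mean cancels anything shared by all participants (e.g., a strain that circulated widely or a strain-specific assay effect), so the remaining structure in `resid` reflects individual departures only.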

      2) The authors could consider whether the participants were recently or previously vaccinated against and/or infected with influenza. Residual antibodies may reflect long-term memory but may also indicate recent activation.

      Thank you for your comments. We agree with the reviewer that the observed seroconversion to circulating strains may reflect responses to recent re-exposures. Given the low influenza vaccine coverage in our cohort (1.3%, 10 out of 777) and in China in general (<5% [3, 4]), we believe that the observed periodicity and seroconversion patterns were unlikely to be caused by recent influenza vaccination.

      We think that the observed seroconversions to circulating strains between baseline and follow-up were likely driven by pervasive exposure to A(H3N2) (or by reinfection, for those whose exposures developed into infections). Using the same data set, we previously reported that 98% and 74% of participants experienced 2- and 4-fold rises, respectively, in HI titers against any of the 21 tested A(H3N2) strains [5].

      As the reviewer and previous studies suggested, antibody responses could reflect long-term memories that are activated after recent exposures [1, 6]. We generated our hypothesis based on this feature, aiming to characterize the periodicity that may arise from the interactions between long-term memories and newly generated antibodies.

      We incorporated the reinfection mechanism into our simulations, with and without subsequent cross-reactions with previously encountered distant strains (Figure 3I). The results indicate that reinfection alone cannot recover the observed long-term periodicity (Figure 3A), whereas reinfection plus the resulting cross-reactions can (Figure 3D). Therefore, we believe that repeated exposures or reinfections alone would not explain our reported periodicity, although, together with cross-reactions, they may drive the continuous formation of life-course antibody profiles and the observed periodicity. Of particular note is the consistency of the measured periodic behaviour at baseline and follow-up (~3-4 years later).

      To address this comment, we reported the vaccination status of our participants when introducing the data (lines 127-129) and in the discussion (lines 280-282 and 313-315):

      “Only 0.6% (n = 5) of participants self-reported influenza vaccinations between the two visits, therefore, the observed changes in HI titers between the two visits were likely due to natural exposures.”

      “Due to the low influenza vaccine coverage in our participants and in China in general, the observed seroconversions likely reflected antibody responses after natural exposures during the study period.”

      “Particularly, our simulation results suggested that models including repeated exposures or population level A(H3N2) activity alone did not recover the long-term periodicity (Figure 3).”

      3) The strains inducing high HI titers may have similar mutations and may be reactive to the same antibodies. What are the mutation frequencies among 21 A(H3N2) strains?

      Thank you for your comments. We selected the 21 tested strains to cover the span of A(H3N2) circulation since 1968 and its antigenic diversity. We prioritized strains that were included in vaccine formulations and that had been tested to create the antigenic map by Fonville et al. [1].

      We reproduced the antigenic map (up to strains isolated in 2010) by Fonville et al. [1] and compared the antigenic locations of our tested A(H3N2) strains (Figure 1—figure supplement 3). The 21 strains (or the antigenic clusters they belong to, if the strains were not used for the map) largely tracked the antigenic evolution of A(H3N2) since its emergence in 1968, with a reported rate of antigenic change of 0.778 antigenic units per year [1, 2].

      We further calculated the pairwise antigenic distances between strains tested in the antigenic map, which were highly correlated with the time intervals between the isolation of the two strains. The figure also shows that our tested strains cover the time spans and antigenic distances shown in the original antigenic map. In addition, our observed periodicity was identified in individual time series of residuals, from which shared virus responses and assay effects had been removed (Figure 1). Therefore, we believe that specific mutations have limited impact on our findings.

      To address this comment, we included the reproduced antigenic map showing the locations of the tested strains and their pair-wise antigenic distance in Figure 1—figure supplement 3 and referenced in the main text (line 127).

      Reviewer #2 (Public Review):

      This is a well-thought-out, clearly presented article. It builds upon the platform of 'original antigenic sin' (OAS), a notion first developed from studying individuals infected with influenza. According to OAS, the initial infection will set the dominant immune response targets (antigens) that immune cells will recognize, such that infection with a related strain will cause a strong response focused mainly against the initially infecting strain, that then goes on to protect against the new-infecting strain. This study builds off this idea, showing that as strains become increasingly antigenically distant as inferred by the time between strain appearance, the cross-protection can drop to a point where it needs to be invigorated with a potentially new response. The potential biological mechanisms behind this aren't discussed, but a model is built that conveys the potential for 'relative risk' of an individual over the course of the life, based essentially on when one was born.

      Thank you for your comments. We expanded our introduction to describe the relevant biological mechanisms in more detail, especially those related to original antigenic sin.

      “Antibodies mounted against a specific influenza virus decay (in either absolute magnitude or antigenic relevance) after exposure until re-exposure to, or infection with, an antigenically similar virus occurs, whereupon back-boosting of antibodies acquired from previous infections (e.g., activation of memory B cells) can occur, as well as updating antigen specific antibodies to the newly encountered infection (e.g., activation of naïve B cells).” (lines 80-84)

      “Original antigenic sin (OAS) is a widely accepted concept describing the hierarchical and persistent memory of antibodies from the primary exposure to a pathogen in childhood. Recent studies suggested that non-neutralizing antibodies acquired from previous exposures can be boosted and may blunt the immune responses to new influenza infections.” (lines 92-97)

      The basic premise was to measure from serum influenza haemagglutinin-inhibition (HI) titers of 21 strains of influenza A (H3N2) - related strains causing disease at various times over a period of some 40 years- from a diverse set of ≈800 participants of various ages, at two time points, spaced 2 yr apart. The authors then calculated the HI titer for the 21 strains for each individual. From this, each participant's age, their age at the time of a strain's development, and when a strain emerged were used to assess whether there was periodicity to immune responses by performing a splined Fourier transform for each individual and then examining the composite pattern across time for HI titers. The authors propose that on average there is a 24-year periodicity to immune responses to influenza strains, such that after the initial infection, cross-reactivity reduces to the point where it may be less meaningful for protection over around 24-year, and suggests activation of a 'new' immune response might be required to control the more distant strain involved in the response at that time. The periodicity was longer than would be predicted if age were not a factor involved in the HI titer patterns across time. Further, variability in the periodicity was shown to involve broad cross-reactivity between strains and narrow cross-reactivity in more highly-related (closer in time) strains, individual HI titer, and periodic population fluctuations. In the literature, viral strains are estimated to mutate to the point of losing 50% cross-reactivity with a T1/2 of approximately 2.5 yr, which would make the inferred lifespan plausible but perhaps surprisingly long, implying there are immune feedback parameters that influence periodicity. The authors also use an independent cohort of approximately 150 individuals from a separate, published, study to validate some findings revealed in the primary data set.

      Thank you for your comments, and sorry for the confusion. We agree with the reviewer that the duration of onward cross-protection should be shorter than the 24-year periodicity identified in the retrospective antibody responses. We would like to clarify that we identified the long-term periodicity by retrospectively investigating individual antibody profiles, which result from multiple previous exposures and from the immunity and cross-reactions that arose from them. Therefore, the long-term periodicity is a retrospective characterization and should not be directly interpreted as the duration of onward protection.

      As shown in Figure 4A, the 24-year periodicity consists of phases in which individuals' titers are higher (phases I and II) and lower (phases III and IV) than the population average. As such, the duration of onward protection may be shorter than the entire period. Assuming that protection decreases with lower titers, onward protection is expected to decline during phase II, taking 1-6 years to drop from its peak to the population average. This is consistent with findings of homotypic protection against PCR-confirmed infection lasting up to about five seasons (lines 291-293), but whether such protection is driven by declining cross-reactions requires further investigation.

      To address this comment, we rephrased our discussion to make the interpretation clearer (lines 285-287):

      “Of note, the long-term periodicity is a retrospective characterization of individual antibody profiles that arose from multiple exposures and cross-protection, which should not be directly interpreted as the duration of onward protection conferred by the existing antibodies.”

      Strengths: Overall, the study is well executed and the patterns that are visually apparent in Figure 1A (the 'raw' data) are built on to inform a model of the potential breadth of cross-reactivity in a given individual at any given time after birth, integrated with the influenza strains to which they are most likely to have been first exposed. It is a complex thing to make sense of data involving many individuals who could be infected or vaccinated at any and variable points in time over the course of their life, but the authors derive a model that probabilistically accounts for possible infection events, so controls for this nicely, or at least to a degree that is practicable.

      Thank you for your supportive comments. We would like to clarify that we identified the long-term periodicity using the residuals of individual HI titers after removing the population-level activity that is visually noticeable in Figure 1A. By doing this, we aimed to minimize the impact of population-level A(H3N2) activity and laboratory measurements on individual antibody responses (Figure 1C; detailed methods in lines 396-412).

      Questions related to the main limitation: The level of math in this paper makes it hard for a basic biologist to critique the approach, but the argued points are intriguing. Foremost, in the final part of the paper the authors move from building a model to testing its potential to predict HI titers against strains from the final quarter of the study period, placing individuals into one of four phases: I) early increasing to high titer response, II) waning response phase where they are returning back to the average population-level response against a strain, III) sub-par response against a strain, and IV) reinitiation of HI titers. Pleasingly, this shows a good correlation between individuals' ages and their predicted phase. However, while the fit predicts phase well in Fig 4C and 4D, it looks to perform less adequately in Fig 4B.

      1) Why is this?

      Thank you for your comments, and sorry for the confusion. In Figure 4B, we aimed to characterize and predict the position, rather than the amplitude, in the individual time series of residuals. Therefore, we fitted the model using only harmonic terms (i.e., sine and cosine functions; Equation 12 on page 26) [7], although other factors not included in the model may also affect the observations. The predictions from the model inform the position and velocity of the harmonic oscillators rather than the amplitude or extent of the wave; therefore, the predictions do not exactly fit the observations.

      To address this comment, we expanded the corresponding methods section to clarify this point (lines 661-663):

      “Of note, we fitted the model aiming to estimate the position of the harmonic oscillators and did not account for other, non-harmonic factors; therefore, the model may not fully capture the variation in the data.”

      2) Another point for consideration is that the time between samplings (2010-2012) is comparatively short, given a 24-yr predicted periodicity. What would happen to the predictions if the periodicity were 35-yr or 6-yr? Would the model fail to call individuals accurately in these cases?

      Thank you for your comments. As suggested, we repeated our predictions in Figure 4F-G assuming a 35-year and a 6-year periodicity, respectively. The results suggest that model predictions with either a 35-year or a 6-year periodicity did not outperform those assuming a 24-year periodicity (Figure 4—figure supplement 1). For instance, the observed proportion of seroconversion to circulating strains in each cohort has correlation coefficients of 0.49 (p-value = 0.05), 0.63 (p-value = 0.02) and -0.12 (p-value = 0.69) with the predicted proportion in phase IV when assuming a 35-, 24- and 6-year periodicity, respectively.

      We would also like to clarify that we investigated the predictive potential of the long-term periodicity from two perspectives. In addition to using the periodicity to predict seroconversions between baseline and follow-up, we also predicted the phase of each individual in 2012 using only HI titers against strains isolated before 2002. Our 10-year-ahead predictions correlated well with observations (Figure 4C).

      To address this comment, we also included the results of analyses using alternative 35- and 6-year periodicity as Figure 4—figure supplement 1, and reported in the main text (lines 262-264).

      3) Similarly, if the samples were taken further apart, would the model still be effective at predicting phase?

      Thank you for your comments. We would like to clarify that we collected two cross-sectional serum samples, and we identified the long-term periodicity and predicted phase separately using sera from each visit. For instance, in our sensitivity analysis using sera collected at follow-up (Figure 1—figure supplement 1), we found a long-term periodicity similar to that identified using the baseline sera (Figure 1), despite pervasive exposures during this period (the time separating samples varied from 3 to 4 years). In addition, the Vietnam data, which comprise sera collected over six consecutive years, showed a similar long-term periodicity (Figure 2—figure supplement 5).

      For the phase prediction, we used residuals of HI titers against 14 historical strains isolated between 1968 and 2002 to predict the phase of the strain isolated in 2012. This prediction was derived purely from the periodic pattern of the time series, without information on strains isolated in the 10 years before 2012. The prediction was therefore 10 years ahead, and it correlated well with observations from the complete time series, further supporting the idea that there may be an intrinsic cycle in individual antibody responses and that this cycle is fairly stationary and predictable.
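      A minimal sketch of this kind of ahead-of-time phase prediction follows. It assumes residuals that are already centred, a fixed 24-year period, a model containing only sine and cosine terms, and a mapping of phases I to IV onto the signs of the oscillator's position and velocity (I: above average and rising; II: above average and falling; III: below average and falling; IV: below average and rising). This mapping, the helper names `fit_harmonic` and `phase_at`, and the synthetic data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def fit_harmonic(years, residuals, period):
    """Least-squares fit of residuals ~ a*sin(w*t) + b*cos(w*t), harmonic terms only."""
    w = 2 * np.pi / period
    X = np.column_stack([np.sin(w * years), np.cos(w * years)])
    (a, b), *_ = np.linalg.lstsq(X, residuals, rcond=None)
    return a, b

def phase_at(year, a, b, period):
    """Classify the fitted oscillator at `year` into phases I-IV (assumed mapping)."""
    w = 2 * np.pi / period
    position = a * np.sin(w * year) + b * np.cos(w * year)
    velocity = w * (a * np.cos(w * year) - b * np.sin(w * year))
    if position >= 0:
        return "I" if velocity > 0 else "II"
    return "III" if velocity < 0 else "IV"

period = 24.0
w = 2 * np.pi / period
# Hypothetical: 14 strain isolation years up to 2002, and a synthetic individual
# whose residual series is sin(w * (t - 1990)).
train_years = np.linspace(1968, 2002, 14)
train_resid = np.sin(w * (train_years - 1990.0))

a, b = fit_harmonic(train_years, train_resid, period)
predicted_2012 = phase_at(2012, a, b, period)
print(predicted_2012)
```

      In this synthetic case the individual is below average and rising in 2012, so the fit, using pre-2002 data alone, places them in phase IV. Because only harmonic terms are fitted, any non-harmonic component in real residuals would remain in the error term, which is why such a model targets position rather than amplitude.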

    1. Author Response

      Reviewer #1 (Public Review):

      While the circuits underlying the computation of directional motion information in the fly brain are very well described, much less is known about the neurons serving the detection of objects. In a previous publication from the same lab, it has been shown that flies perform body saccades to track a moving object during flight. In the current paper, Frighetto and Frye provide evidence that T3 cells, a population of neurons within the optic lobes, are involved in this task. First, they performed 2-photon Calcium imaging from T3 cells to show that these cells respond to moving bars, which they later use in behavioural experiments. They then silenced T3 cells using genetic tools and tested the behavior of these flies in response to a rotating bar using two different setups. In one, the flies are fixed and bilateral changes in wing stroke amplitude are used as a measure for turning, in the other, flies are magnetically tethered such that they can rotate around the vertical body axis. Silencing T3 cells leads to the abolishment of the steering response induced by object position using a bar that is defined by its motion relative to the surround, but leaves the response to object motion intact. In the magnetically tethered flies, it reduces the number of saccades and thus leads to an impairment of bar-tracking behavior. In another set of experiments they optogenetically activated the whole population of T3 neurons (which supposedly impairs their normal function), which leads to an increase in the number of saccades after the activation (when the light stimulus used to activate the cells is turned off). Silencing the neurons necessary for detection of local motion, T4 and T5 cells, in contrast reduces responses elicited by object motion rather than position, but also has an impact on object tracking saccades. The authors provide a simple model, where speed-dependent signals from multiple T3 cells are integrated and trigger a saccade, when a threshold is reached.

      The data generally support the conclusion that T3 cells play a role in detecting bar position and in controlling saccades in response to rotating bars. However, there are some inconsistencies in the data that are not sufficiently explored and discussed.

      1) In a previous paper from the lab (Keleş et al., 2020), it was shown that T3 cells respond preferentially to small objects, whereas here they robustly respond to elongated bars and even large-field gratings. This discrepancy is not discussed.

      The most likely explanation is that Keleş et al. (2020) used stimuli of half contrast (or lower) to probe contrast-polarity effects, whereas our stimuli here match the behavioral experiments, using maximum-contrast broadband stimuli. Keleş et al. (2020) also presented visual stimuli over the full display, >200 degrees in azimuth, whereas here we present stimuli only unilaterally, over <100 degrees; perhaps contralateral stimulation had some effect. Finally, the two studies used different Gal4 drivers: here we use a split-Gal4 that is highly specific for T3, whereas Keleş et al. (2020) used a conventional Gal4 driver that is less clean than the split. We shall discuss these discrepancies in revision.

      2) In a previous paper, the authors showed that integrated positional error rather than bar position is used to elicit bar-tracking saccades and that saccade amplitude is relatively stereotyped. However, here they show that T3 cells respond much more strongly to a slowly moving stimulus (18°/s) than to the fast-moving stimuli used for the behavioral experiments (>90°/s). This response property plays an important role for the model they propose. My general concern here is that the findings might not be generalizable to slower moving bars, where more precise, position-dependent responses could play a larger role, and that these fast-moving bar stimuli represent an extreme situation, where the flies cannot accurately track bar position any more.

      We agree that flies will not accurately track purely positional cues at higher bar speeds, since responses to positional signals are inherently sluggish. In free flight, flies execute orientation saccades when a stationary post subtends ~30 degrees (the bar width used here), at which point the leading edge of the post is moving at ~250°/s (van Breugel and Dickinson 2012). Thus, higher bar speeds are the norm for flies, and our behavioral stimulus speed (90°/s) was chosen to robustly trigger tracking saccades and to allow comparison with previously published behavioral data sets. A bar velocity of 18°/s is far below the range that robustly triggers orientation saccades. We imaged at 90°/s and 180°/s to show that T3 responses to behaviorally relevant bar speeds could reasonably act as inputs to an integrate-and-fire behavioral controller. These points shall be clarified in revision.
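      The integrate-and-fire controller mentioned above can be sketched as a leaky integrator that sums a pooled drive from T3-like columns, emits a saccade trigger each time it crosses a threshold, and then resets. This is a hypothetical illustration, not the authors' model: the time constant, threshold, number of columns and drive values are arbitrary, and the speed dependence of the drive, which imaging data would have to supply, is left as an input rather than modelled.

```python
import numpy as np

def integrate_and_fire(column_traces, dt, tau, threshold):
    """Return saccade trigger times (s) from pooled columnar input.

    column_traces: shape (n_time,) or (n_time, n_columns); columns are summed.
    The integrator leaks with time constant tau and resets to zero after each trigger.
    """
    traces = np.asarray(column_traces, dtype=float)
    drive = traces.sum(axis=1) if traces.ndim == 2 else traces
    v = 0.0
    triggers = []
    for i, x in enumerate(drive):
        v += dt * (-v / tau + x)  # forward-Euler step of dv/dt = -v/tau + x
        if v >= threshold:
            triggers.append((i + 1) * dt)  # time at the end of this step
            v = 0.0  # reset after a saccade
    return triggers

dt, tau, threshold = 1e-3, 0.1, 1.0
n_steps = 1000  # 1 s of simulated time

# Hypothetical constant drive from 4 columns at 5 units each (pooled drive = 20).
strong = np.full((n_steps, 4), 5.0)
# Weak drive: pooled steady state x * tau = 0.5 never reaches the threshold.
weak = np.full((n_steps, 4), 1.25)

strong_triggers = integrate_and_fire(strong, dt, tau, threshold)
weak_triggers = integrate_and_fire(weak, dt, tau, threshold)

# Continuous-time first crossing for constant drive x: T = -tau * ln(1 - threshold / (x * tau)).
x = 20.0
t_first_analytic = -tau * np.log(1 - threshold / (x * tau))
print(len(strong_triggers), len(weak_triggers))
```

      With a constant pooled drive the integrator fires periodically (here 14 triggers in 1 s, roughly every 69 ms), whereas a drive whose steady state x·tau stays below threshold never triggers a saccade. The leak is what lets such a controller ignore weak or brief inputs while responding to sustained, spatially progressing columnar activity.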

      3) The claim that T3 cells are tuned to stimulus velocity is not supported by the data in my view. For the bar stimuli, the authors only tested speeds of 18°/s and above 90°/s, but nothing in between. For the grating motion there seems to be an influence of temporal frequency for the same stimulus velocity (see e.g. Fig.1_1), but this is not quantified.

      We shall add a full spatiotemporal response profile in revision. One note: we presented T3 responses to different grating speeds in the Supplemental material because our goal was merely to indicate the speed sensitivity of T3, rather than to present a comprehensive speed tuning curve. T3 is distinct from T4 and T5 in that it is not directionally selective, is full-wave rectified for contrast, and shows similar responses to bars of differing temporal frequencies moving at the same speed. These properties are also likely accompanied by a broad spatial frequency sensitivity (which would bestow speed tuning), but in revision we shall either demonstrate this or remove the claim.

      4) The results from the optogenetic activation experiments are hard to interpret, as it is unclear how a prolonged activation of all T3 cells would affect the downstream circuitry. It is not clear that this experiment is equivalent to a "loss-of-function perturbation" of T3 cells as the authors claim in the text.

      We are making an assumption, which we shall clarify in revision, that downstream circuitry requires a spatiotemporal progression of columnar activity, as would be generated by the projection of a discrete bar-type-object moving across the eye, and that activation of all columnar inputs together, as would occur with CsChrimson stimulation, would disrupt this discrimination. Although it is a supposition, we feel that it is parsimonious. We compared the effect of CsChrimson stimulation under two different LED intensities but found no effect on bar tracking behavior.

      Reviewer #2 (Public Review):

      In their manuscript titled "Feature detecting columnar neurons mediate object tracking saccades in Drosophila", Frighetto & Frye study the effect of manipulating T3 neurons on tethered-flight saccades. The authors first characterize the responses of T3 neurons to simple visual stimuli, and then manipulate T3 cells (with both Kir2.1 and CsChrimson) and study the effects on the fly's tethered flight behavior, focusing on different types of sharp turns (saccades). Finally, the authors suggest an integrate-and-fire model to explain how an array of T3-like neurons can produce some of the recorded behavior.

      The authors study the elementary, yet challenging, computation of object discrimination. They hone in on a cell type that most likely plays an important role in the circuit. However, the authors do not sufficiently clarify the framework in which they conceptualize T3's role in object discrimination, neither when discussing it in the introduction/discussion nor when explaining experimental results. The authors present the work in comparison to T4/T5 cells. However, T4/T5 cells have been shown to be both local motion detectors and the main cell types to compute motion in the fly's eye. Downstream neurons integrate over these local units to detect different patterns of global and local motion (Authors should cite Krapp 1996 Nature). Are the authors suggesting that T3 neurons perform a similar function only as local object detectors? That is a bold claim that will need to be supported with more experimental results and reconciled with previous results. We already know of other Lobula Columnar neurons (LCs) that respond to different sizes, some even smaller than the optimal T3 stimulus (e.g. Klapoetke 2022 Neuron) and we know of LCs that respond to small objects that do not receive major inputs from T3 cells (e.g. Hindmarsh 2021 Nature).

      We are attempting to posit a simple and parsimonious framework for T3 action. Are T3 neurons “local object detectors”? T3 is clearly not “selective” for local objects, since we show that they respond to elongated bars and wide-field gratings (at least when projected over the ipsilateral visual hemisphere). T3 is, however, “sensitive” to objects: vertical bars yielded a mean response peak of ~1 ΔF/F, whereas a small square object elicited a peak of ~4 ΔF/F (Keleş et al., 2020). This amplitude differential likely indicates surround inhibition, but does not preclude a downstream integrating neuron from pooling columnar inputs to assemble a spatial receptive field for either an elongated bar or a small object. Individual T4/T5 neurons show roughly double the response amplitude to a small object than to a long vertical bar (Keleş et al., 2020), which is consistent with other reports, but one would not classify T4/T5 as “small object detectors” as they play a fundamental role in detecting wide-field motion stimuli. We intend to posit that (i) columnar T3 neurons are small-field (local) detectors of the features contained within stimuli that flies readily track, (ii) the integration of these local signals could support the integrated error computations that flies make to track bars, which (iii) explains why T3 blockade compromises bar tracking saccades. We do not mean to claim that T3 are the first, last, or only inputs to object detection circuitry in deeper neuropils. We shall endeavor to clarify these issues in revision.

      These differences between T4/T5 cells and T3s also make interpreting the experimental manipulations more challenging. When hyperpolarizing T4/T5 or 'blinding' them with CsChrimson activation, the visual motion circuit is severely disrupted. However, the same cannot be said about inactivating/blinding T3 neurons and the object detection circuit (if it is indeed a single circuit). The authors are justified in deducing a connection between blocking T3 neurons and a reduction in bar tracking, but generalizing the results to object detection requires more experiments and clarifications.

      We consider “bar tracking” to be one form of object detection, but not the only form. A bar is an “object” (albeit a tall object) in the sense that it is optically disparate from the visual surround. Thus, inactivating/blinding T3 indeed severely disrupts the detection of bar-type-objects. We shall clarify the language to remove any confusion between “object” and “bar”. We do not mean to generalize T3 function to all object vision in the same way that T4/T5 function is generalized to all motion vision, and this shall be clarified in revision.

      When framing the manuscript in the object detection framework, previous results regarding the definition of an object should also be addressed. Maimon Curr. Biol. 2008 and work from their own lab (Mongeau, 2019) have already shown that tethered flies respond differently to bars and small objects (fixating on the former while anti-fixating on the latter). Previous work has also shown that T3 neurons respond strongly to small objects and suppress responses to long bars (Tanaka Curr. Biol. 2020). Since all the behavioral experiments in the current manuscript and all the visual stimuli are full arena-length bars, it is impossible to tell whether the T3 results generalize to small objects and even how to reconcile the stronger response to small objects with the role ascribed to T3 cells in generating behavioral responses to long bars.

      This amplitude differential between small object and elongated bar responses by T3 likely indicates surround inhibition, but does not preclude a downstream integrating neuron from pooling columnar inputs to assemble a spatial receptive field for either an elongated bar or a small object. Consider that T4/T5 neurons show roughly double the response amplitude to a small object than a long vertical bar (Keleş et al., 2020 and consistent with other reports), but one would not classify T4/T5 as “object detectors” as their small-field columnar signals are integrated by downstream wide-field neurons that assemble spatial filters for specific patterns of optic flow that are generated during flight maneuvers (Krapp et al., 1996 Nature). One downstream integrator of T3 inputs, LC11, is more selective for small objects than T3. We shall clarify these points in revision.

      Finally, the authors propose a model for a hypothetical neuron downstream of T3 that would integrate over several T3s and generate saccades. However, given the current knowledge level in the fly vision field, the model should either be grounded more in actual circuit connectivity or produce testable predictions that would guide further research.

      We are currently working on the putative downstream partners of T3, and testing for the integration of T3 signals. Preliminary data show that silencing a specific LC class postsynaptic to T3 recapitulates the effects of silencing T3 on saccadic bar pursuit. In the revised version of the manuscript we will provide additional discussion.

      The authors should decide whether they would like to address these concerns with more specific experiments that would shed light on the role T3 has to play under different conditions and different definitions of a visual object, or whether they would prefer to limit the scope of their claims.

      We shall endeavor to do both!

      Reviewer #3 (Public Review):

      In free flight, flies largely change their course direction through rapid body turns termed saccades. Given how important these turns are in determining their overall behavior and navigation, it is important to understand the neural circuits that drive the timing of triggering these saccades, as well as their amplitude. In this paper the authors leverage the powerful genetic tools available in the fruit fly, Drosophila, to address this question by performing physiology experiments as well as behavioral experiments with inactivation and activation perturbations.

      The authors make three primary conclusions based on their experiments: (1) the feature-detecting visual pathway (T3) is responsible for triggering saccades in response to moving objects, but not widefield motion, (2) the pathway primarily responsible for wide field motion encoding (T4/T5) is responsible for triggering saccades in response to widefield motion, and (3) the T4/T5 pathway is responsible for controlling the amplitude of both object and widefield motion triggered saccades.

      The authors go on to show that using calcium imaging data of T3 activity it is possible to predict under what conditions flies will initiate a saccade when presented with objects moving at different speeds, resulting in a parsimonious model for how saccades are triggered.

      Together, the imaging, behavior, and modeling provide compelling evidence for claims 1 and 2, however, the evidence and modeling for point 3 - the amplitude of the saccades - is lacking. The statistical analysis does not go into sufficient detail in comparing across different cases, and in particular, there is little mention of the effect sizes, which appear to be quite small (this is primarily in reference to 3F and 4E). The data suggest that both the T3 and T4/T5 pathways contribute to saccade amplitude, instead of T4/T5 being the only or primary drivers.

      We agree that the evidence suggests that both T3 and T4/T5 pathways contribute to saccade amplitude for bar tracking behavior, and shall clarify this conclusion in revision. However, we also note that the effect of silencing T4/T5 is more prominent (e.g., peak angular velocity) and more consistent across visual conditions. We will dig deeper into the data to substantiate this point. The effect sizes might be small because the silencing approach (i.e., inward-rectifying Kir2.1 channels) maintains a hyperpolarized state but does not completely block neuron function; consider that the wide-field optomotor responses of T4/T5>Kir2.1 flies are reduced but not eradicated (Fig. 3A_1).
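
Because the reviewer's point concerns effect sizes rather than significance alone, here is a minimal sketch of one common effect-size measure, Cohen's d with a pooled standard deviation, for two independent groups of per-fly values (for example, peak angular velocity in control vs. T4/T5>Kir2.1 flies). The choice of measure is an assumption, since the response does not say which one will be reported, and the numbers below are made up for illustration, not data from the study.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical per-fly peak angular velocities (deg/s); not data from the study.
control = [1450.0, 1520.0, 1380.0, 1490.0, 1410.0]
silenced = [1300.0, 1360.0, 1280.0, 1330.0, 1350.0]
print(f"Cohen's d = {cohens_d(control, silenced):.2f}")
```

Reporting d (or a similar standardized difference) next to each comparison, for instance in panels like 3F and 4E, would let readers judge whether a statistically significant difference is also a large one.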

    1. Author Response

      Reviewer #1 (Public Review):

      Li et al. have designed a study that examines specific mechanisms for how different DNA sequence variants in the common cancer gene p53 (also known as TP53) influence the sensitivity of tumors to a variety of common cancer treatments. Specifically, they examine a handful of p53 variants with respect to glioblastoma and its response to platinum-based chemotherapy and to radiation therapy. The authors begin by mentioning that looking at DNA variants in cancer is useful but also incomplete: methylation, PTMs, and non-DNA sequence variants can also be critical. They then mention that they have created a model showing that nearly all cancers with p53 mutations have loss-of-function variants and that many cancers with "normal" wildtype p53 in fact have variants causing LOF. These p53 LOF tumors lead to worse patient outcomes, but the authors here show that these tumors appear to be more susceptible to radiation and platinum-based chemotherapy, which they say they have validated in glioblastoma xenografts. This potentially opens up a new avenue for precision medicine for many different sources of cancer that share common p53 LOF variants. The authors have taken a modern approach towards cancer diagnosis and shown how this can improve targeted treatments across a large array of cancer types. They have provided a reasonably convincing proof of concept of this approach for n = 35 PDXs in one cancer type. By and large, the approach and results are reasonable, although many of the exact results concerning the genes and pathways identified that covary with the various treatments and p53 variants are unclear. For instance, the feature selection seems to be somewhat ad hoc, e.g. the method used to determine p53 LOF from p53 WT in the TCGA data was not the same method used for determining p53 LOF from p53 WT in the PDX data.

      Thanks for the positive comments. In our study, we used the same method for feature selection (i.e., p53 targets identification), and for calculating CES in different cancer types. This is described in Materials and Methods. However, the methods used to identify the LOF of WT TP53 in TCGA and PDX data are different. For TCGA LUNG, BRCA, COAD, ESCA cohorts, we used the SVM models built from the same cancer type to predict TP53 status. For PDX samples derived from the glioblastoma patients, we used the unsupervised clustering approach. This is because:

      1) To train an SVM model, we need a large number of “normal” samples (to represent p53 normal status) and “tumor samples with TP53 truncating mutation” (to represent p53 LOF status). In this PDX cohort (n = 35), we have no “normal” samples and only one p53-truncating mutation (Fig. 4f, Table S6). Technically, it is impossible to build an SVM model from this PDX cohort.

      2) The TCGA GBM cohort also has very limited “normal” samples (n = 5) which prevents us from training an SVM model for glioblastoma prediction.

      3) The TCGA pan-cancer SVM model is not a good choice, since GBM was not included in the pan-cancer cohort because of its limited training sample size. Although the pan-cancer model achieved a high AUROC, its performance varied significantly across cancer types. This is most likely due to imbalanced sample sizes, since the pan-cancer model is biased toward the cancer types with larger sample sizes (e.g., lung and breast).

      4) Even if we were able to build a new SVM model from the TCGA pan-cancer cohort with GBM samples included, applying it to non-TCGA samples would still be very challenging because of batch effects.

      Therefore, we first used unsupervised clustering as an alternative to the SVM model to classify samples, and then manually annotated the PDX clusters as “p53-pN” or “p53-pLOF” according to the composite expression score.
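
The cluster-then-annotate workflow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the composite expression score (CES) is taken here as the mean per-gene z-score of p53 target genes, clustering is a plain two-means on the target-gene z-score profiles, and higher target expression is assumed to indicate intact p53. The actual CES definition and clustering method are those given in the paper's Materials and Methods; all function names and the expression matrix are hypothetical.

```python
import statistics


def zscore_rows(expr):
    """z-score each gene (row) across samples (columns)."""
    scored = []
    for row in expr:
        mu, sd = statistics.mean(row), statistics.pstdev(row)
        scored.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scored


def composite_scores(z):
    """Hypothetical CES: mean target-gene z-score for each sample."""
    return [statistics.mean(col) for col in zip(*z)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(profiles, ces, n_iter=100):
    """k-means with k=2, seeded with the lowest- and highest-CES samples."""
    lo = min(range(len(ces)), key=ces.__getitem__)
    hi = max(range(len(ces)), key=ces.__getitem__)
    centroids = [list(profiles[lo]), list(profiles[hi])]
    assign = None
    for _ in range(n_iter):
        new = [0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
               for p in profiles]
        if new == assign:  # converged
            break
        assign = new
        for k in (0, 1):
            members = [p for p, a in zip(profiles, assign) if a == k]
            if members:
                centroids[k] = [statistics.mean(col) for col in zip(*members)]
    return assign


def label_samples(expr):
    """Cluster samples, then call the cluster with the higher mean CES 'p53-pN'."""
    z = zscore_rows(expr)
    ces = composite_scores(z)
    profiles = [list(col) for col in zip(*z)]  # one target-gene profile per sample
    assign = two_means(profiles, ces)
    mean_ces = []
    for k in (0, 1):
        members = [c for c, a in zip(ces, assign) if a == k]
        mean_ces.append(sum(members) / len(members) if members else float("-inf"))
    normal = 0 if mean_ces[0] > mean_ces[1] else 1
    return ["p53-pN" if a == normal else "p53-pLOF" for a in assign], ces


# Hypothetical expression matrix: 3 p53 target genes (rows) x 6 PDX samples (columns).
expr = [
    [10.0, 11.0, 12.0, 2.0, 3.0, 1.0],
    [8.0, 9.0, 7.0, 1.0, 2.0, 2.0],
    [5.0, 6.0, 5.0, 1.0, 1.0, 0.0],
]
labels, ces = label_samples(expr)
print(labels)
```

The design point is that no labelled training data are needed: the clustering only groups samples, and the direction of the CES decides which group is called p53-pN.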

      We agree with the reviewer that the underlying pathways/mechanisms that can potentially explain the different treatment effects and p53 non-mutational LoF are still unclear and warrant further investigation.

      The TCGA AUROCs were incredibly good - over 99% - versus more like 75% for the actual proof of concept. While any significant p-value is fine for basic research, it would be nice to know how this could be improved and bring the results in Figure 4 from ~75% to the >99% that would be necessary for use as a medical diagnostic or for treatment selection for precision medicine.

      Thanks for your suggestion. Precision cancer medicines that target TP53 mutations are currently being evaluated in clinical trials. Developing a robust model to predict p53 functional status for medical diagnosis or treatment selection is the primary goal of our study. However, there is still a long way to go to bring the model trained from external data into medical practice. To minimize the biological, clinical and technological heterogeneities and bias, the best approach is to train an SVM model from the same cancer type in the same institute; this requires:

      1) The sample sizes of both normal and tumors harboring TP53 truncating mutation should be sufficient to train the SVM model. Take the TCGA lung cancer dataset (n_tumor = 1003) as an example, we built an excellent SVM model from 108 normal samples and 254 tumor samples with TP53 truncating mutations. A much larger sample size is needed if the TP53 truncating mutation frequency is low.

      2) Matched data including whole-exome or whole-genome sequencing (to determine TP53 mutation status), RNA-seq (for gene expression), and treatment response.

      If one plans to use public data such as TCGA to train the model, the major challenge is integrating data from different sources (i.e., remove batch effects arising from different patients’ cohorts, tumor samples storage and processing, library preparation, sequencing, and bioinformatics analyses).
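
To make point 1 concrete: the number of tumors that must be profiled to obtain a given number of TP53-truncating cases scales inversely with the mutation frequency. A back-of-the-envelope sketch, assuming the frequency is known and ignoring sampling variation around the expected count; the lung figures (254 truncating cases among 1,003 tumors) come from the example above, while the 5% frequency is hypothetical.

```python
import math
from fractions import Fraction


def tumors_needed(n_cases, mutation_freq):
    """Expected cohort size needed to observe n_cases tumors carrying the mutation.

    Exact fractions avoid floating-point rounding in the ceiling;
    sampling variation around the expected count is ignored.
    """
    freq = Fraction(mutation_freq)
    if not 0 < freq <= 1:
        raise ValueError("mutation_freq must lie in (0, 1]")
    return math.ceil(n_cases / freq)


lung_freq = Fraction(254, 1003)  # observed fraction in the TCGA lung example
print(tumors_needed(254, lung_freq))          # 1003
print(tumors_needed(254, Fraction(5, 100)))   # 5080 at a hypothetical 5% frequency
```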

      However, there are significant questions regarding the specific findings uncovered: do the gene pathways identified through bioinformatic analysis fit in with the many highly-studied mechanistic roles of p53? Do the cohort selections - which vary by an order of magnitude in sample size, and come from different locations and different tissues - make statistical sense for cross-validation?

      According to our analysis, p53 targets shared by four selected cancer types are significantly enriched in “cell cycle control” and “DNA damage response” pathways, which are the canonical functions of p53 (PMID: 9039259, PMID: 36183376).

      For the four TCGA cancer cohorts selected in our study, cross-validation was performed independently for each cancer type. For the pan-cancer cohort, we agree with the reviewer that the samples come from different locations and different tissues, and the pan-cancer SVM model could be biased toward a few cancer types with larger numbers of samples. Building a pan-cancer SVM model is a compromise strategy when a single cancer type does not have sufficient samples to train its own SVM model, and more rigorous evaluations (with independent datasets) are needed. This is why we put the pan-cancer results in the supplementary materials. We have revised the manuscript to make this point clear (Page 9).
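
The concern that a pooled pan-cancer AUROC can hide weak performance in smaller cancer types can be illustrated by computing AUROC both pooled and per type. A minimal sketch using the Mann–Whitney formulation of AUROC (the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counting one half); the cancer-type names and scores are hypothetical, not TCGA results.

```python
from collections import defaultdict


def auroc(pos_scores, neg_scores):
    """AUROC as P(score_pos > score_neg) + 0.5 * P(tie), over all positive/negative pairs."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for pos_s in pos_scores:
        for neg_s in neg_scores:
            wins += 1.0 if pos_s > neg_s else 0.5 if pos_s == neg_s else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def auroc_by_group(records):
    """records: (group, is_positive, score) tuples -> ({group: AUROC}, pooled AUROC)."""
    pos, neg = defaultdict(list), defaultdict(list)
    for group, is_pos, score in records:
        (pos if is_pos else neg)[group].append(score)
    per_group = {g: auroc(pos[g], neg[g]) for g in sorted(set(pos) | set(neg))}
    pooled = auroc([s for v in pos.values() for s in v],
                   [s for v in neg.values() for s in v])
    return per_group, pooled


# Hypothetical classifier scores: a larger, well-separated type and a smaller, poorly separated one.
records = (
    [("TYPE_A", True, s) for s in (0.9, 0.85, 0.8, 0.75)]
    + [("TYPE_A", False, s) for s in (0.1, 0.15, 0.2, 0.25)]
    + [("TYPE_B", True, s) for s in (0.3, 0.6)]
    + [("TYPE_B", False, s) for s in (0.5, 0.4)]
)
per_group, pooled = auroc_by_group(records)
print(per_group, round(pooled, 3))  # TYPE_A 1.0, TYPE_B 0.5, pooled 0.944
```

In this toy case the pooled AUROC looks strong even though the smaller type is classified at chance, which is why per-type evaluation matters when sample sizes are imbalanced.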

    1. Author Response

      Reviewer #1 (Public Review)

      [...] One potential issue is that the high myelination signal is associated with the compartment in V2 (pale stripes) which was not itself functionally defined, but rather by the absence of specific functional activations. No difference was reported between those stripes that were defined functionally. Other explanations for the differential pattern of qMRI signals cannot be excluded based on the analysis presented, e.g., that the ROIs for presumed pale stripes are not evenly distributed (more foveal), or that ROIs with low activation due to some other factor show higher myelin-related signals.

      Indeed, it would have been advantageous to directly functionally delineate pale stripes in V2. Since we were not able to achieve this by fMRI, we needed an indirect method to infer pale stripe contributions in the analysis. We also added a statement in the discussion section to emphasize this more (p. 9, lines 286–288).

      Furthermore, we did not test for different myelination between thin and thick stripes, since we did not have a concrete hypothesis on this. Despite the conflicting findings in the literature of stronger myelination in either dark or pale CO stripes, no histological study has reported myelination differences between dark CO thin and thick stripes. Therefore, our primary interest and hypothesis lay in comparing the myelination of thin/thick stripes with that of pale stripes using MRI.

      Thank you very much for this comment about potential other sources of differential qMRI parameter patterns. Indeed, based on the original analysis we could not exclude that the absence of functional activation around the foveal representation may have biased our analysis. We therefore added a supporting analysis, in which we excluded the region around the foveal representation from the analysis. The excluded cortical region was kept consistent between participants by excluding the same eccentricity range in all maps. We added more details in the results section of the revised manuscript (p. 8, lines 189–202). In Figure 5-Supplement 1 and Figure 5-Supplement 3, results from this supporting analysis are shown which reproduced the primary findings from the main analysis, particularly the relatively higher myelination of pale stripes.
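
The supporting analysis described above, excluding the same eccentricity range around the foveal representation in every participant before comparing compartments, amounts to a mask applied before averaging. A minimal sketch, assuming each V2 vertex carries a retinotopic eccentricity, a compartment label, and an R1 value; the 1-degree cutoff, the function name, and all data points are hypothetical, since the excluded range is not specified in this response.

```python
import statistics


def compartment_means(vertices, min_eccentricity_deg):
    """Mean R1 per compartment, keeping only vertices at or beyond the eccentricity cutoff."""
    kept = {}
    for ecc, label, r1 in vertices:
        if ecc >= min_eccentricity_deg:  # drop the foveal representation
            kept.setdefault(label, []).append(r1)
    return {label: statistics.mean(vals) for label, vals in sorted(kept.items())}


# Hypothetical vertices: (eccentricity in degrees, compartment, R1 in 1/s)
vertices = [
    (0.5, "pale", 0.60), (0.7, "thin", 0.58),
    (3.0, "pale", 0.52), (4.0, "pale", 0.53),
    (3.5, "thin", 0.51), (5.0, "thick", 0.50),
]
print(compartment_means(vertices, min_eccentricity_deg=0.0))  # all vertices
print(compartment_means(vertices, min_eccentricity_deg=1.0))  # foveal region excluded
```

Applying one fixed cutoff to every participant, as described above, keeps the excluded cortical region comparable across maps.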

      ROI definitions based solely on fMRI activation amplitude have additional limitations. However, we find it unlikely that a small fMRI effect size and low contrast-to-noise ratio (i.e., a stochastic cause of low statistical parameter values/“activation”) have impacted the results, since Figure 3 shows that we achieved a high degree of reproducibility for each participant.

      We would note that the fact that we found consistent differences across MPM and MP2RAGE sessions makes some potential artifacts driving the differences unlikely. We also find it unlikely that systematic cerebral blood volume differences between stripes would have driven the results. A higher local blood volume would lead to increased BOLD responses but also to a higher R1 value due to the deoxy-hemoglobin induced relaxation, which is opposite to the observation of higher activity in the thick/thin stripes but lower R1 values.

      Further studies using other functional metrics (e.g. VASO, ASL etc.) may help us to even more clearly demonstrate specificity but were out of the scope of this already rather extensive study. Although we have added extensive further analyses in the revised manuscript such as controlling for foveal effects or registration performance, we did not see a possibility to fully exclude a systematic bias that might potentially be caused by unknown factors.

      Another theoretical and practical issue is the question of "ground truth" for the non-invasive qMRI measures, as the authors - as their starting point - roundly dismiss direct histological tissue studies as conflicting, rather than take a critical look at the merit of the conflicting study results and provide a best hypothesis. If so, they need to explain better how they calibrate their non-invasive MR measurements of myelin.

      We agree and have now further elaborated on the limits of specificity of the R1 and R2* signals as cortical myelin markers (p. 2, lines 68–88; p. 6, line 163; p. 8, line 216; p. 9, lines 257–260). However, we still think that it is important for the reader to appreciate the conflicting results of histological studies using myelin staining methods, which add to the study’s background.

      We did not intend to give the impression that MRI provides the missing ground-truth to adjudicate histological controversies, but that it provides an alternative and additional view on the open questions. We changed the introduction to better reflect the aspect that the study offers a unique view by providing myelination proxies and functional measures in the same individual, which allows for direct comparison and investigation of structure-function relationships (see p. 2, lines 68–70; p. 3, lines 93–95), which is not accessible to any other approach. Nevertheless, we would like to note that R1 has been well established as a myelin marker under particular conditions (Kirilina et al., 2020; Mancini et al., 2020; Lazari and Lipp, 2021). It has also been widely used for cortical myelin mapping across a variety of populations, systems and field strengths. We added this statement to the introduction (see p. 2, lines 82-85). We note that we excluded volunteers with pathologies or neurological disorders from the study and their mean age was about 28 years. Thus, we had conditions comparable to previous (validation) studies.

      Because of the contradictory findings of histological studies, we could not further finesse the hypothesis beyond our previous a priori hypothesis that we expected differences in the myelin sensitive MRI metrics between the thin/thick versus pale stripes. To improve the contextual understanding, we added a paragraph in the discussion section covering in more depth how the MRI results relate to known histological findings (see pp. 8–9, lines 216–240).

      While this paper makes an important contribution to the question of the association of specific myelination patterns defining the columnar architecture in V2, it is not entirely clear whether the authors can fully resolve it with the data presented.

      Indeed, we agree that non-invasive aggregate measures, such as R1, offer limited specificity, which precludes a fully conclusive inference about cortical myelination. We have further emphasized this on several occasions in the text (see p. 2, lines 68–88; p. 6, line 163; p. 8, line 216; p. 9, lines 257–260). Since the correspondence between cortical myelin levels and R1 (and other metrics) is an active area of research, we expect that the understanding, sensitivity and specificity of R1 to cortical myelination will further improve. We note that the use of qMRI is a substantial advance over the weighted MRI typically used, which suffers from a lack of specificity due to instrumental idiosyncrasies and varying measurement conditions.

      Reviewer #2 (Public Review)

      [...] Unfortunately, this particular study seems to fall into an unhappy middle ground in terms of the conclusions that can be drawn: the relaxometry measures lack the specificity to be considered "ground truth", while the authors claim that the literature lacks consensus regarding the structures that are being studied. The authors propose that their results resolve whether or not stripes differ in their patterns of myelination, but R1 lacks the specificity to do this. While myelin is a primary driver of relaxation times in cortex, relaxometry cannot be considered to be specific to myelin. It is possible that the small observed changes in R1 are driven by myelin, but they could also reflect other tissue constituents, particularly given the small observed effect sizes. If the literature was clear on the pattern of myelination across stripes, this study could confirm that R1 measurements are sensitive to and consistent with this pattern. But the authors present the work as resolving the question of how myelination differs between stripes, which over-reaches what is possible with this method. As it stands, the measured differences in R1 between functionally-defined cortical regions are interesting, but require further validation (e.g., using invasive myelin staining).

      We agree that we inadvertently overstated the specificity of R1 on several occasions in the text. We have therefore toned down the statements concerning the correspondence between R1 and myelin throughout the manuscript (e.g., see p. 2, lines 68–88; p. 6, line 163; p. 8, line 216; p. 9, lines 257–260).

      We also removed the phrase that gave the impression that MRI can conclusively resolve the conflicting results found in histological studies. In the Introduction, we changed the corresponding paragraph by emphasizing the alternative view, which can be obtained from MRI by the possibility to investigate structure-function relationships in the living human brain, which would not be possible by invasive myelin staining (see p. 2, lines 68–70; p. 3, lines 93–95).

      We acknowledge that, perhaps aside from electron microscopy, all common markers have shortcomings that limit their specificity. For example, classic histology is not quantitative and has produced conflicting results. This extends to the very fundamental issue that the composition of myelin varies significantly across the brain and within brain areas (e.g., its lipid composition (González de San Román et al., 2018)). We therefore regard the different invasive and non-invasive measures as complementary. R1 adds to this arsenal of measures and can be acquired non-invasively. It has been shown to be a reliable myelin marker under certain circumstances. It follows the known myeloarchitectural patterns of the human brain, which we also checked for the data of the present study (see Figure 4 and Appendix 2). It is responsive to traumatic changes (Freund et al., 2019), development (Whitaker et al., 2016; Carey et al., 2018; Natu et al., 2019) and plasticity (Lazari et al., 2022). Since we studied healthy volunteers with no known pathologies who were sampled randomly from the population, we believe that the previous results generally apply and suggest sufficient specificity of the R1 marker. Of course, we cannot fully exclude bias due to unknown factors that have not yet been investigated or discovered by validation studies. However, in that case we expect that the systematic differences between stripe types would remain an important result, most likely pointing to another interesting biological difference between stripes.

      While more research is needed to clarify the precise role of R1 as a marker of cortical myelin, we think that the meaningful determination of quantitative MR parameters within one cortical area is still of interest to the neuroscientific community.

      Moreover, the results make clear that R1 differences are not sufficiently strong to provide an independent measure of this structure (e.g., for segmentation of stripes). As such, one would still require fMRI to localise stripes, making it unclear what role R1 measures would play in future studies.

      Indeed, the small effect sizes observed in the present study still require functional localization with fMRI. We expected small effect sizes for R1 and R2* because of the known small inter-areal and intra-cortical differences of MRI myelin markers. This study was therefore designed as a proof of concept, investigating whether intra-areal R1 differences at the spatial scale of columnar structures can be detected using non-invasive MRI. Our study shows that these differences can be detected, but currently not at the single-voxel level. We anticipate that with further improvements in sequence development and scanner hardware, high-resolution R1 estimates with sufficient SNR can be acquired, making fMRI redundant (for this kind of investigation). Please see the reply to the next comment concerning the impact of using R1 in future studies.

      The Introduction concludes with the statement that "Whereas recent studies have explored cortical myelination ... using non-quantitative, weighted MR images... we showed for the first time myelination differences using MRI on a quantitative basis". As written, this sentence implies that others have demonstrated that simpler non-quantitative imaging can achieve the same aims as qMRI. Simply showing that a given method is able to achieve an aim would not be sufficient: the authors should demonstrate that this constitutes an important advance.

      Thank you for this comment. It goes to the heart of the concerns raised about the specificity and sensitivity of MRI-based myelin metrics. We elaborate here on the main advantage of using qMRI in our current study and why it is more specific than weighted MR imaging. However, we emphasize that a thorough comparison between qMRI and weighted MRI is highly complex and beyond the scope of our paper; we refer to our recent review on qMRI for further details (Weiskopf et al., 2021). The signal in weighted MRI, even when the sequence is optimized for the tissue of interest, also depends on inhomogeneities in both the RF transmit and receive (bias) fields. Alternative approaches, such as a ratio image (T1w/T2w), can cancel the receive field bias entirely (provided the subject does not move between scans) but not the transmit field bias. This hampers the direct analysis and interpretation of signal differences between distant regions of the brain. For high-resolution imaging applications, high magnetic fields such as 7 T are beneficial or even mandatory because of signal-to-noise ratio (SNR) penalties. With increasing field strength, these inhomogeneities become relevant even within small regions such as V2. In these cases, qMRI is advantageous because it provides metrics that are free from these technical biases, which significantly improves specificity. As high-field MRI has the potential to study the structure and function of the human brain non-invasively at the spatial scale of cortical layers and columns, we believe that the results of our current study, which demonstrate that qMRI can robustly detect small differences at the level of columnar systems, are relevant for future neuroscience studies.

      We emphasized these considerations in the revised manuscript (see. p. 9, lines 273–285).
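
The bias argument above can be illustrated with a toy single-voxel simulation. This is a minimal sketch under explicit assumptions, not the processing used in the study: both images follow the standard spoiled gradient-echo (SPGR) steady-state equation, a multiplicative factor stands in for the receive field, and a scaling of the nominal flip angles stands in for the transmit (B1+) field. A ratio of two SPGR images with different flip angles serves as a stand-in for a T1w/T2w ratio, and R1 is estimated with the classic two-point variable-flip-angle (DESPOT1-style) linear fit rather than the MPM or MP2RAGE pipelines. All parameter values are illustrative.

```python
import math


def spgr_signal(m0, t1, tr, flip_rad, receive=1.0):
    """Steady-state spoiled gradient-echo signal, scaled by a receive sensitivity."""
    e1 = math.exp(-tr / t1)
    return receive * m0 * math.sin(flip_rad) * (1 - e1) / (1 - math.cos(flip_rad) * e1)


def vfa_r1(s1, s2, a1, a2, tr):
    """Two-point variable-flip-angle R1 from the linearised SPGR equation.

    S/sin(a) = E1 * S/tan(a) + M0 * (1 - E1), so the slope of y = S/sin(a)
    against x = S/tan(a) is E1 = exp(-TR * R1).
    """
    x1, y1 = s1 / math.tan(a1), s1 / math.sin(a1)
    x2, y2 = s2 / math.tan(a2), s2 / math.sin(a2)
    e1 = (y2 - y1) / (x2 - x1)
    return -math.log(e1) / tr


# Illustrative values only (seconds / radians); not the study's acquisition protocol.
TR, T1, M0 = 0.025, 1.9, 1.0
A_LOW, A_HIGH = math.radians(5), math.radians(25)  # PD-weighted-like, T1-weighted-like


def acquire(receive, b1_transmit):
    """Two weighted images of one voxel; b1_transmit scales the nominal flip angles."""
    return (spgr_signal(M0, T1, TR, b1_transmit * A_LOW, receive),
            spgr_signal(M0, T1, TR, b1_transmit * A_HIGH, receive))


for receive, b1 in [(1.0, 1.0), (0.6, 1.0), (1.0, 0.8)]:
    s_low, s_high = acquire(receive, b1)
    ratio = s_high / s_low                                          # weighted-image ratio
    r1_nominal = vfa_r1(s_low, s_high, A_LOW, A_HIGH, TR)           # ignores transmit bias
    r1_b1corr = vfa_r1(s_low, s_high, b1 * A_LOW, b1 * A_HIGH, TR)  # uses a known B1+ map
    print(f"receive={receive:.1f} B1+={b1:.1f}  ratio={ratio:.3f}  "
          f"R1 nominal={r1_nominal:.3f}  R1 B1-corrected={r1_b1corr:.3f} (true {1 / T1:.3f})")
```

In this model the weighted-image ratio is unchanged by the receive factor but shifts with the transmit field, whereas the B1-corrected R1 estimate recovers 1/T1 in every row; without the B1 correction, the R1 estimate drifts once the transmit field deviates from nominal.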

      The study includes a very small number of participants (n=4). The advantage of non-invasive in-vivo measurements, despite the fact that they are indirect measures, should be that one can study a reasonable number of subjects. So this low n seems to undermine that point. I rarely suggest additional data collection, but I do feel that a few more subjects would shore up the study's impact.

      The present study followed a deep-phenotyping approach. That is, we focused on acquiring highly reliable datasets in individuals. We did not intend to capture population variance, which is often the goal of group studies, since low-level, basic features such as stripes in V2 are expected to be present in all healthy individuals. We therefore prioritized test-retest fMRI sessions and an additional MP2RAGE acquisition over a larger number of participants. This resulted in 6–7 scanning sessions on different days for each individual, summing up to 26 long scanning sessions in total. We also note that our sample size is not smaller than that of other studies with a similar research question. For example, another fMRI study investigating V2 stripes in humans used the same sample size of n=4 (Dumoulin et al., 2017).

      The paper overstates what can be concluded in a number of places. For example, the paper suggests in several places that R1 and R2* are highly specific to myelin. On p7, the text reads: "We tested whether different stripe types are differentially myelinated by comparing R1 and R2*..." Relaxation times lack the specificity to definitively attribute these changes purely to myelin. Similarly, on p11: "Our study showed that pale stripes which exhibit lower oxidative metabolic activity according to staining with CO are stronger myelinated than surrounding gray matter in V2." This implies that the study directly links CO staining to myelination. In addition to using non-specific estimates of myelination, the study does not actually measure CO.

      We agree that we did not clearly point out the limitations of R1 myelin mapping. Therefore, we toned down the statements about the connection between cortical myelin and R1. The mentioned statements in the reviewer’s comment were changed accordingly (see p. 6, line 163; p. 11, lines 353–354). We also included a small paragraph to clarify the used terminology (color-selective thin stripes, disparity-selective thick stripes) in the manuscript (see p. 4, lines 110–114) to avoid the inadvertent conflation of CO staining and actually measured brain activity.

      I'm confused by the analysis in Figure 5. I can appreciate why the authors are keen to present a "tripartite" analysis (thick, thin, and pale stripes). But I find the gray curves confusing. As I understand it, the gray curves include both the stripe of interest (red or blue plots) and the pale stripes. Why not just generate a three-way classification? Generating these plots has in effect already required a hard classification of thin and thick stripes, so it is odd to create the gray plots, which mix two types of stripes. Alternatively, could you explicitly model the partial volume for a given cortical location (e.g., under the assumption that the partial volume of thick and thin stripes is indicated by the z-score for the corresponding functional contrast)? One could then estimate the relaxation times as a simple weighted sum of stripe-wise R1 or R2*.

      Figure on weighted average of stripe-wise R1 and R2*. (a) shows the weighted sum of R1 (de-meaned and de-curved) over all V2 voxels. z-scores from the color-selective thin stripe and disparity-selective thick stripe experiments were used as weights in the left and middle groups of bars, respectively. An intermediate threshold of zmax = 1.96 was used, i.e., final weights were defined as weights = (z - 1.96). Negative weights (i.e., z < 1.96) were set to 0. For pale stripes (right group of bars), we used the maximum z-score from the thin and thick stripe measurements. We then set all weights with z ≥ 1.96 to 0 and used the inverse as final weights, i.e., weights = -1 * (max(z) - 1.96). (b) shows the same analysis for R2*. Error bars indicate 1 standard error of the mean.

      (1) Yes, indeed. We agree that modeling the partial volume of each compartment (thin, thick and pale stripes) in each V2 voxel would be the most elegant approach. However, we note that z-scores from the thin and thick stripe experiments may not reflect the voxel-wise partial volume, since they are a purely statistical measure and not a partial volume model. Having said this, we think that this general approach can give some additional insights, and we provide results for a similar analysis here. We calculated the weighted sum of R1 and R2* values over all V2 voxels for each stripe compartment (thin, thick and pale stripes) independently (see above figure). For R1, we see the same pattern between stripe types as in the manuscript (Figure 5). Additionally, we show the differences here for each subject, which further demonstrates the reproducibility across subjects in our study. For R2*, no clear pattern across subjects emerged, confirming the results in our manuscript. Since this analysis did not add relevant new information, we refrained from adding this figure to the manuscript in order not to overload it.

      (2) In our current study, we were not primarily interested in investigating differences between thin and thick stripes. While histological analyses found differences (though not consistent ones) between CO dark stripes (more myelinated; Tootell et al., 1983) and CO pale stripes (more myelinated; Krubitzer and Kaas, 1989), no study has reported myelin differences between the two types of CO dark stripes (thin and thick). This does not fully exclude the possibility of such differences, but it suggests that if myelination differences between CO dark stripes existed, they would presumably be smaller than those between CO dark and CO pale stripes, and therefore even more difficult to demonstrate than the hypothesis tested in this manuscript.

      Therefore, we decided to test two compartments directly against each other instead of modeling all three compartments within a single model. In doing so, we loosely followed the analysis methods described in Li et al. (2019), which compared myelin differences between thin/thick and pale stripes in macaques. We note that this demonstrates further consistency, since it is not trivial that both thick and thin stripes show lower R1 values than the pale stripes; for example, one of them could have shown no difference or an opposite difference.

      (3) Just for clarification, the plots in Figure 5 show the comparison of R1 (or R2*) between two compartments in V2. The red (blue) curve includes the thin (thick) stripe of interest. The gray curve includes everything in V2 minus contributions from thick (thin) stripes of interest. If we take the thin stripe comparison as example (Figure 5a), then red contains the thin stripes of interest while gray contains everything minus the thick stripes. Therefore, assuming a tripartite stripe arrangement, the gray curve contains both thin and pale stripe contributions.
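
The weighting scheme in the caption above can be written out as follows. A minimal sketch, assuming per-voxel z-scores from the thin- and thick-stripe experiments and de-meaned, de-curved R1 (or R2*) values are already available; thin and thick weights are clipped at zero so that only voxels with z above 1.96 contribute (our reading of the caption's clipping step), and the caption's "weighted sum" is implemented as a normalised weighted average. Function names and the example values are hypothetical.

```python
Z_THRESHOLD = 1.96  # intermediate threshold from the caption


def weighted_mean(values, weights):
    """Normalised weighted average; raises if no voxel carries weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("all weights are zero")
    return sum(v * w for v, w in zip(values, weights)) / total


def stripe_weighted_means(z_thin, z_thick, r1):
    """Weighted mean of (de-meaned, de-curved) R1 for thin, thick and pale compartments."""
    w_thin = [max(z - Z_THRESHOLD, 0.0) for z in z_thin]
    w_thick = [max(z - Z_THRESHOLD, 0.0) for z in z_thick]
    # Pale stripes: voxels below threshold in both experiments, weighted by distance below it,
    # i.e. weights = -1 * (max(z) - 1.96), with weights for max(z) >= 1.96 set to 0.
    w_pale = [max(Z_THRESHOLD - max(a, b), 0.0) for a, b in zip(z_thin, z_thick)]
    return {
        "thin": weighted_mean(r1, w_thin),
        "thick": weighted_mean(r1, w_thick),
        "pale": weighted_mean(r1, w_pale),
    }


# Hypothetical voxels: z-scores from both experiments and residual R1 (1/s).
z_thin = [3.0, 0.2, 0.96, -1.04]
z_thick = [0.5, 2.96, -0.04, 0.0]
r1 = [-0.02, -0.01, 0.03, 0.0]
print(stripe_weighted_means(z_thin, z_thick, r1))
```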
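
Point (3) can be stated in terms of voxel sets. A minimal sketch, assuming the thin- and thick-stripe ROIs are given as sets of V2 voxel indices (the function name and indices below are hypothetical): for the thin-stripe comparison in Figure 5a, the red set is the thin-stripe ROI and the gray set is all of V2 minus the thick-stripe ROI, so the gray set contains both thin- and pale-stripe voxels.

```python
def figure5_sets(v2, thin, thick, stripe="thin"):
    """Return (coloured, gray) voxel sets for one two-compartment comparison."""
    if stripe == "thin":
        return thin, v2 - thick   # red vs. gray in the thin-stripe comparison
    if stripe == "thick":
        return thick, v2 - thin   # blue vs. gray in the thick-stripe comparison
    raise ValueError("stripe must be 'thin' or 'thick'")


v2 = set(range(10))
thin = {0, 1, 2}
thick = {5, 6}
red, gray = figure5_sets(v2, thin, thick, "thin")
pale = v2 - thin - thick
print(sorted(gray))         # [0, 1, 2, 3, 4, 7, 8, 9]: thin and pale voxels
print(gray == thin | pale)  # True
```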

      References

      Carey D, Caprini F, Allen M, Lutti A, Weiskopf N, Rees G, Callaghan MF, Dick F. Quantitative MRI provides markers of intra-, inter-regional, and age-related differences in young adult cortical microstructure. Neuroimage 2018; 182:429–440.

      Dumoulin SO, Harvey BM, Fracasso A, Zuiderbaan W, Luijten PR, Wandell BA, Petridou N. In vivo evidence of functional and anatomical stripe-based subdivisions in human V2 and V3. Sci Rep 2017; 7:733.

      Freund P, Seif M, Weiskopf N, Friston K, Fehlings MG, Thompson AJ, Curt A. MRI in traumatic spinal cord injury: from clinical assessment to neuroimaging biomarkers. Lancet Neurol 2019; 18:1123–1135.

      González de San Román E, Bidmon H-J, Malisic M, Susnea I, Küppers A, Hübbers R, Wree A, Nischwitz V, Amunts K, Huesgen PF. Molecular composition of the human primary visual cortex profiled by multimodal mass spectrometry imaging. Brain Struct Funct 2018; 223:2767–2783.

      Kirilina E, Helbling S, Morawski M, Pine K, Reimann K, Jankuhn S, Dinse J, Deistung A, Reichenbach JR, Trampel R, Geyer S, Müller L, Jakubowski N, Arendt T, Bazin P-L, Weiskopf N. Superficial white matter imaging: Contrast mechanisms and whole-brain in vivo mapping. Sci Adv 2020; 6:eaaz9281.

      Krubitzer LA, Kaas JH. Cortical integration of parallel pathways in the visual system of primates. Brain Res 1989; 478:161–165.

      Lazari A, Lipp I. Can MRI measure myelin? Systematic review, qualitative assessment, and meta-analysis of studies validating microstructural imaging with myelin histology. Neuroimage 2021; 230:117744.

      Lazari A, Salvan P, Cottaar M, Papp D, Rushworth MFS, Johansen-Berg H. Hebbian activity-dependent plasticity in white matter. Cell Rep 2022; 39:110951.

      Li X, Zhu Q, Janssens T, Arsenault JT, Vanduffel W. In Vivo Identification of Thick, Thin, and Pale Stripes of Macaque Area V2 Using Submillimeter Resolution (f)MRI at 3 T. Cereb Cortex 2019; 29:544–560.

      Mancini M, Karakuzu A, Cohen-Adad J, Cercignani M, Nichols TE, Stikov N. An interactive meta-analysis of MRI biomarkers of myelin. eLife 2020; 9:e61523.

      Natu VS, Gomez J, Barnett M, Jeska B, Kirilina E, Jaeger C, Zhen Z, Cox S, Weiner KS, Weiskopf N, Grill-Spector K. Apparent thinning of human visual cortex during childhood is associated with myelination. PNAS 2019; 116:20750–20759.

      Tootell RBH, Silverman MS, De Valois RL, Jacobs GH. Functional Organization of the Second Cortical Visual Area in Primates. Science 1983; 220:737–739.

      Weiskopf N, Edwards LJ, Helms G, Mohammadi S, Kirilina E. Quantitative magnetic resonance imaging of brain anatomy and in vivo histology. Nat Rev Phys 2021; 3:570–588.

      Whitaker KJ, Vértes PE, Romero-Garcia R, Váša F, Moutoussis M, Prabhu G, Weiskopf N, Callaghan MF, Wagstyl K, Rittman T, Tait R, Ooi C, Suckling J, Inkster B, Fonagy P, Dolan RJ, Jones PB, Goodyer IM, NSPN Consortium, Bullmore ET. Adolescence is associated with genomically patterned consolidation of the hubs of the human brain connectome. PNAS 2016; 113:9105–9110.

    1. Author Response

      Reviewer #1 (Public Review):

      1) In family 2, the variant was detected by routine trio-based WES diagnostics. Sanger confirmation was not performed. IGV images can be added as supplementary material. Furthermore, median coverage was 75× which might not be sufficient for the identification of all heterozygous variants.

      We thank the reviewer for raising this point. At the time this variant was reported (2016), this was our laboratory's thoroughly validated protocol, and validation showed that a median (!) coverage of 75× with the technology of the time is more than sufficient for robust variant calling. This particular variant was actually covered below 75× (at 65×), but Sanger confirmation was not necessary (based on thorough validation of the robustness of calling, GATK scores and other quality parameters for de novo calling). Under this protocol, Sanger confirmation is warranted when coverage falls below 30-35×, which was not the case here.

      2) Proband 2 (P2) was born as the second child of non-consanguineous parents of Caucasian descent after an uneventful pregnancy and delivery. The boy was macrosomic at birth. Since there was macrosomia, how would the pregnancy be uneventful? At the last assessment at 10 years of age, obesity associated with hyperphagia was of concern; the weight of the patient should be clarified. P2 was diagnosed with autism spectrum disorder but a normal cognitive profile. The identified NM_001014809.2(CRMP1_v001):c.1280C>T variant is very rare and reported in GnomAD exomes with allele frequency 0.0000041.

      Routine ultrasound examinations during pregnancy did not raise any concerns; the pregnancy was indeed uneventful. BMI at the last evaluation was 26.1. We have included these details in the revised manuscript.

      3) Proband 3 (P3) is the first of three children of a non-consanguineous family of European descent. There is a familial history of obesity on both parental sides, and the father is macrocephalic (head circumference: 60.5 cm). Macrocephaly can be isolated and benign, such as in benign familial macrocephaly. However, P3 presented with moderate intellectual disability and an autism spectrum disorder. Since P3 also has macrocephaly, the PTEN gene should be further interrogated by detailed WGS data analysis as well as by additional orthogonal method(s), since it has pseudogenes.

      We did not identify any pathogenic variant in the PTEN gene in the genetic analysis.

      Reviewer #2 (Public Review):

      Weaknesses of the article include:

      1) Spelling errors and difficult-to-understand language. The use of "variant" is now preferred over mutation. According to current nomenclature, predicted but not experimentally confirmed protein alterations should be written as p.(Phe351Ser) rather than p.Phe351Ser.

      We apologise for the spelling errors and the difficult-to-understand language in the manuscript. We took the reviewer's comments seriously, corrected the errors and rephrased sentences wherever necessary.

      2) Inconsistent use of in silico pathogenicity predictors and conservation metrics. These should be standardized for each case and should include at least phylop, CADD, and REVEL.

      We have applied consistency in the description of in silico pathogenicity predictors and conservation metrics for each patient.

      3) CRMP1 is under significant constraint against loss-of-function variation in gnomAD - pLI = 0.99, LOEUF 0.28. Genes in the top decile are highly enriched for haploinsufficiency as a disease mechanism. This should be considered in the interpretation of this data and incorporated into the manuscript.

      We thank the reviewer for the comment. As suggested, we have added a statement on this point to the 'Subjects and Methods' section of the revised manuscript.

      4) I am not convinced the data supports a dominant-negative interpretation. The variants do not oligomerize as well as wild-type CRMP1, and when co-expressed with wild-type CRMP1 there is an increase in monomeric wild-type CRMP1. While this could support a dominant-negative interpretation, an alternative explanation is these are loss-of-function alleles that cannot oligomerize, and at the stoichiometry of this artificial overexpression system, this leads to increased monomeric wild-type CRMP1. The axonal outgrowth studies are more compelling, but without a loss-of-function control allele, it is difficult to interpret.

      The experiments in Figure 2 should be replicated, quantitated, and their statistical significance confirmed.

      We thank the reviewer for raising concerns about the experiments and the interpretation of the data. We performed size exclusion chromatography experiments and included the data in the revised Figure 2. Unfortunately, we could not reproduce the experiments for Figure 2B. Our current experimental results show that the CRMP1 variants affect the homo-oligomerization process.

      Reviewer #3 (Public Review):

      1) The major weakness is Figure 2, as it is not performed up to the high standards of the rest of the paper. Panel A does not show any loading control and does not confirm. In panel B, the 720 kDa band is not convincing. Results should be repeated with size exclusion chromatography and/or another method to determine molecular weight and should be quantified from triplicate experiments. Panel C is also not convincing and should be repeated to show the results more carefully, and quantified.

      We thank the reviewer for raising this important concern about the experimental data in Figure 2, which we have addressed in the revised manuscript. We performed size exclusion chromatography, presented the results in the revised manuscript and discussed them accordingly on pages 23-24.

      Fig. 2A: This panel shows recombinant wildtype CRMP1 and the variants produced in an E. coli expression system. We repeated the expression several times and obtained similar, partially cleaved proteins. Fig. 2A is a Coomassie Brilliant Blue stain. The protein size marker and the loading control (BSA) were loaded on the same gel, as shown in the original Fig. 2A.

      Fig. 2B: Owing to the limited expression of the T313M and P475L variants, we could not repeat the gel-filtration experiments.

      Fig. 2C, 2D: It is difficult to adjust the expression level of each construct (CRMP1 wildtype, T313M, or P475L) in HEK293T cells (input). Therefore, in each condition, we measured the signal intensity of the myc-IP band and its ratio to the input on the V5 blot. Fig. 2D shows this ratio from four independent experiments.

    1. Author Response

      Reviewer #1 (Public Review):

      In this paper, Quiniou and colleagues show via orthogonal methods that human thymopoiesis releases a large population of CD8+ T cells harboring α/β paired TCRs that (i) have high generation probabilities, (ii) preferentially use some V and J genes, (iii) are shared between individuals and (iv) can each recognize and be activated by multiple unrelated viral peptides, notably from EBV, CMV and influenza.

      Major strengths of the paper

      Quiniou et al. generated single-cell sequencing datasets of the earliest stages of TCR beta chain gene recombination, and then showed that a subset of these sequences is highly clustered and also has a high generation probability.

      They show that these T cells can bind multiple antigens, both through public antigen-specific datasets and through corroborating experimental TCR expression and binding assays.

      Minor weaknesses

      To what extent are TCR clustering, high Pgen and cross-individual sharing correlated? What is the Pgen of the sequences clustered with the high Pgen cells? Can you comment on the correlation between these three phenomena?

      Indeed, there is a significant positive correlation between the Pgen and the number of connections among the clustered TCRs, as reported in Fig. 1F of the original manuscript. Furthermore, this correlation holds for both private and public TCRs, as reported in Figure 2B of the original manuscript.

      To show the link between the three phenomena, we have now added two supplementary figures showing a high positive correlation between Pgen and the number of connections, and between cross-individual sharing and the number of connections, and to a lesser extent between Pgen and cross-individual sharing (Figure 2-figure supplement 4C and D in the manuscript supplementary information).

      However, we would like to emphasize that the difference in mean Pgen between the clustered and dispersed TCRs is about 20-fold. This is a large difference for a biological process (and highly statistically significant), but a small one compared with the ten orders of magnitude (log10 units) spanned by the Pgens of the two populations. In fact, what we observed is not that clustered sequences have a high Pgen, but that they have a higher Pgen than the non-clustered sequences. Yet many CDR3s with high Pgen do not cluster, and vice versa, indicating that a high Pgen is not the only (nor the most important) driver of clustering. We have now added these analyses as Figure 1-figure supplement 3E-F of our revised manuscript.

      In other words, to what extent is it surprising to see that highly clustered TCRs have a higher Pgen and are more shared?

      That for a given CDR3 there is a correlation between having a high Pgen and being public is not surprising as both suggest a positive selection during evolution. What is more surprising is that there are CDR3s forming large clusters that occupy over 20% of the repertoire and that co-cluster between individuals with different HLA, “indicating a convergence of specificities between individuals’ clustered repertoires”. This suggests a surprising selection process that could depend less on HLA than the “classical” selection.

      These points are now better emphasized in the revised manuscript.

      Potential Impact of the paper

      This work highlights an intrinsic property of the adaptive immune response: to generate TCRs with high generation probability that can efficiently bind multiple antigens. This finding therefore has important implications for drug discovery and vaccine design.

      We thank the reviewer for his appreciation.

      Reviewer #2 (Public Review):

      This study analyses the T cell receptor (TCR) repertoire of double positive human thymocytes, and compares this to mature single positive CD8 cells. The first major finding is that the repertoire post-selection is enriched for groups of TCRs with high generation probabilities, similar sequences, and for TCRs previously annotated for viral specificity. This data is clearly presented and convincing. The extent of analysis of the human thymocyte repertoire is still very limited, and the paper adds significantly to this important question.

      We thank the reviewer for his appreciation.

      The second major finding is much more controversial. The authors first investigate the publicly available databases and show that there is a substantial proportion of TCRs which have been annotated to multiple viral specificities, a fact which is well-known to the specialists in the field, but not previously addressed.

      In fact, we are not aware of reports disclosing “a substantial proportion of TCRs which have been annotated to multiple viral specificities”. One may actually wonder why “a fact which is well-known to the specialists in the field” has not been mentioned and discussed in published articles. To us, this reveals that the point has been overlooked by immunologists, as illustrated recently by Zhang et al. (2021), who, aiming to identify highly specific T cell clones with a new modelling approach, excluded all clones binding more than one peptide. This makes it important to report it, as we do. Furthermore, we would like to emphasize that we do more than just report that some TCRs have “been annotated to multiple viral specificities”. Starting from a manual curation of public databases, we show that (i) some TCRs have been reported to bind tetramers presenting peptides from unrelated viruses; (ii) such TCRs co-cluster using Levenshtein distance or GLIPH2-based clustering; and (iii) some of these TCRs indeed recognize different, unrelated peptides without significant sequence homology upon re-expression in carrier T cells.

      The authors acknowledge that this in silico analysis is mostly based on unpaired alpha/beta sequence data, and that the chain pairing may influence specificity. They, therefore, perform a number of functional assays, demonstrating examples of T cells which respond by interferon gamma production to more than one peptide.

      We thank the reviewer for pointing to the fact that, beyond tetramer binding, we performed cumbersome functional studies to document polyreactivity.

      The paper is mostly very clearly written and presented and provides some fascinating novel perspectives on T cell cross-reactivity.

      We thank the reviewer for his appreciation.

      The findings will surely be of interest to a broad readership - indeed anyone interested in how adaptive immunity works.

      The link between the different sections of the paper is the weakest aspect. The relationship between thymic selection and polyspecificity, and also the real relationship between in silico "cross-reactivity" as evidenced by multiple annotations and the functional polyspecific T cells remains unclear.

      Our line of reasoning and analysis was as follows. While studying the thymic selection of TCR repertoires, (1) we discovered massive clustering within these repertoires. Since, in thymocytes, this cannot be accounted for by a history of immune responses, it caught our attention and led us to analyze the properties of these TCRs. This led us (2) to discover in these thymic repertoires “TCRs which have been annotated to multiple viral specificities”, of which we had not been aware. We were so intrigued by these observations that we wanted to substantiate them using datasets of paired TCRs. As (3) we could confirm these observations in such datasets, this led us (4) to investigate these TCRs in functional studies. This is the thread linking sections 1 to 4.

      To make this link clearer, we have reworked the titles of the different Results sections so as to emphasize the switch from bulk thymocyte sequencing studies to single peripheral cell sequencing studies.

      The mechanistic molecular details underlying polyspecificity also remain unclear.

      We agree. We believe that solving the structure of polyreactive TCRs interacting with different peptides will be needed for a molecular understanding of polyreactivity, but this falls beyond the scope of the present work.

      But overall, lots of interesting new data, and some very intriguing hypotheses for the community to follow up on.

      We thank the reviewer for his overall comment.

      Reviewer #3 (Public Review):

      In this manuscript, the authors propose that there is a special, previously unrecognized, high-frequency population of α/β TCRs that are shared between people, have high generation probabilities, and react to many unrelated viral epitopes. Here is the main flow of the results, with comments on the strengths of the conclusions:

      "Thymopoiesis selects a large and diverse set of clustered CDR3s with high generation probabilities" -- this seems correct and has been noted in earlier work by Mora and Walczak and others.

      To date, the Mora and Walczak selection models in humans have been based on PBMCs (ref. 27 in the revised version), not on sorted thymic DP and SP cells; even their mouse-derived models used total thymic cells (ref. 27).

      Selection leads to a focusing of the CDR3 length which likely increases the degree of clustering and increases Pgen.

      To address this question, we compared the CDR3 length distributions of DP CD3+ cells and CD8 SP cells from our thymic dataset. We did not observe major changes: the distributions and the mean CDR3 lengths of the two cell populations were essentially identical, with only a small shift towards shorter sequences post-selection. This is now reported in the new Figure 1-figure supplement 3C of the revised manuscript.

      "Clustered CDR3s are enriched for publicness " This also seems correct and again it makes sense: publicness is equivalent to having been independently rearranged (and sequenced) in another individual, which is determined by Pgen, and clustering is also determined to a large extent by Pgen (the factors that contribute to Pgen, shorter CDR3s for example, are largely shared between neighbor TCRs).

      We agree that theory could indeed have predicted this. In any case, to our knowledge, this is the first report of large clusters of CDR3s from freshly selected thymocytes that, moreover, co-cluster between individuals with different HLA.

      "Clustered public CDR3s are enriched in viral specificities" -- This claim is not justified by the data, which comes from sequence matching against literature-derived databases. Rather, what is true is that "Clustered public CDR3s are enriched in public viral specificities".

      We changed “CDR3s are enriched in viral specificities” for “clustered public CDR3s are enriched in public viral specificities".

      But this might be a simple consequence of the previous observation, that "clustered CDR3s are enriched for publicness". One would need experimental specificity data on the very same datasets to make a conclusion about viral specificities in general.

      We based our interpretation on experimental data.

      Indeed, we manually curated databases to identify CDR3s that bind specific tetramers/dextramers. For immunologists, this type of “experimental specificity data” is a paradigmatic and as yet unchallenged means of defining specificity.

      We observe that CDR3s from TCRs that bind tetramers/dextramers presenting viral peptides are more frequent among clustered than among dispersed CDR3s. This is a highly statistically significant observation, which we now report as such and leave open to discussion and challenge by our community.

      "Identification of polyspecific TCRs" -- In this section, the authors report that some of the CDR3 clusters contain CDR3 sequences from literature-derived TCRs with multiple specificities. They conclude that these must represent polyspecific TCRs. The problem with this conclusion is that even having the same CDR3beta, let alone similar CDR3beta sequences, does not imply the same specificity. One can see the problem if one imagines a very deeply sequenced dataset, and focuses on a short CDR3 length with high frequency. With sufficient sampling, one will be able to navigate from nearly any single CDR3beta to any other CDR3beta of the same or similar length by jumping between single-mismatch variants. But this doesn't imply that all the TCRs from which these CDR3s were sampled, which likely have many different Vbeta genes and completely different TCRalpha sequences, must all bind the same thing.

      We first point out that we did not analyze “a very deeply sequenced dataset”, but only the 18 000 most abundant sequences per sample; singletons were excluded. In addition, we did not mean to say that all the connected TCRs have the same specificities, regardless of their position in the cluster. Clustering approaches, such as those based on Levenshtein (LV) distance or GLIPH2, are now commonly used to infer the specificity of clusters, and it is generally accepted that the closer the TCR sequences are, the more likely they are to share specificities.

      That said, it is precisely because we acknowledge the limitation of bulk sequencing for inferring specificities that we turned to also analyze single-cell datasets.

      We have made this more apparent with new Results section titles that more clearly indicate the shift from unpaired bulk thymocyte sequencing to paired single peripheral cell sequencing.
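      The chaining effect raised by the reviewer can be made concrete with a minimal Python sketch. It is an illustration only, not the authors' pipeline: CDR3 strings are linked when their Levenshtein distance is at most 1, and clusters are taken as connected components of that graph (single linkage). The sequences are hypothetical toy strings, not data from the study; the first and third differ at two positions yet end up in the same cluster through the intermediate sequence.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]


def single_linkage_clusters(seqs, max_dist=1):
    """Connected components of the graph linking sequences at distance <= max_dist."""
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if levenshtein(seqs[i], seqs[j]) <= max_dist:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for idx, s in enumerate(seqs):
        groups.setdefault(find(idx), []).append(s)
    return list(groups.values())


# Hypothetical toy CDR3beta strings (not data from the study)
toy_cdr3s = [
    "CASSLGQAYEQYF",
    "CASSLGQGYEQYF",  # 1 substitution from the first
    "CASSPGQGYEQYF",  # 1 substitution from the second, 2 from the first
    "CASRRDTGELFF",   # unrelated
]

for cluster in single_linkage_clusters(toy_cdr3s):
    print(cluster)
```

      Other grouping methods such as GLIPH2 work differently and are not modeled here; restricting the analysis to the most abundant sequences, as described above, limits how many intermediate sequences are available to form such chains.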

      "Binding properties of polyspecific TCRs" -- Here the authors look to validate these results with paired TCR sequences. They analyze a public dataset made available by 10X genomics, featuring single-cell gene expression, TCR sequencing, and dextramer UMI counts for ~150,000 T cells. This is an amazing dataset with lots of interesting features, but, like any large high-throughput dataset, it needs to be analyzed with care.

      We can assure the reviewer that we were always very careful. Actually, we even started by carefully reviewing the 10X proposed methodology, in which we identified major biases. This led us to explore this dataset cautiously and without preconceived ideas.

      The authors claim to see evidence for large-scale cross-reactivity. This comes mainly from a set of dextramers for A03 and A11-restricted peptides. But these dextramers appear to be binding in a uniquely non-specific manner (by comparison with the other dextramers) and non-TCR-dependent manner in this experiment. One can see this, for example, by comparing the consistency of binding within expanded clonotypes: for a specific dextramer like A*02-GIL(Flu), positive binding for one cell in a clonotype greatly increases the likelihood of binding for other cells in the clonotype, suggesting that the binding is mediated by the TCR.

      This is not true for the A03 and A11 dextramers (except for a few expanded clonotypes in an A*11 donor). TCR sequence doesn't appear to be the determining factor for binding to these dextramers; rather it may be expression of KIR genes or other surface proteins that can interact with MHC.

      These are indeed striking binding patterns, remarkably similar for a single CDR3 beta associated with more than 40 different CDR3 alphas (and, moreover, from two donors). The first reflex of immunologists would indeed be to discard this observation as not fitting the current paradigms. We would rather propose an agnostic view of these results.

      These results show that a series of five A03 and A11 dextramers loaded with various peptides bind to cells that express a given CDR3 beta associated with a multitude of CDR3 alphas. If this were MHC-KIR binding, these dextramers should bind to most cells, independently of their TCRs. We have added two supplementary figures (Figure 4-figure supplement 8B-C) showing that this is not the case and that further reveal very different binding patterns.

      The same argument would likely apply to binding to “other surface proteins”.

      We identified a CDR3 from donor 3 that binds preferentially to A03 and A11 dextramers. However, it binds to only 4 of these 5 dextramers. If the binding were non-specific and TCR-independent, binding to the A0301 RIAAWMATY BCL2L1 dextramer should also have been observed. Moreover, we identified this same CDR3beta in two other cells, from donors 1 and 4, where it was associated with a different CDR3alpha. Except for a single binding event, these TCRs did not bind the A03 and A11 dextramers.

      We also identified another CDR3 from donor 1 that, when paired with many different CDR3alpha chains, is associated with strong binding to an A1101 dextramer presenting an EBV peptide. Binding to the other A03 and A11 dextramers is weaker and seems to depend more on the CDR3alpha.

      If binding of the A03 and A11 dextramers were non-specific and TCR-independent, why would there be such a difference between binding to the A1101 IVTDFSVIK and A1101 AVFDRSDAK dextramers?

      "Polyspecific T cells are activated in vitro by multiple viral peptides" Here the authors explore polyspecificity experimentally. First they report that polyclonal populations of T cells, sorted for binding to one dextramer, can also produce IFN gamma upon stimulation with a distinct peptide, albeit more weakly than for the cognate peptide.

      This is indeed true for CMV+ sorted cells that respond better to CMV peptides than to EBV ones, but not true for EBV+ sorted cells that also respond better to CMV peptides than to EBV ones.

      But it's not clear that the concentrations of the peptides are appropriate for stringently detecting cross-reactivity.

      We wonder what “stringently” means here. Could it mainly mean defining the conditions that eliminate whatever does not fit the current paradigm?

      More concretely, the peptide concentration used for these experiments, presented in Fig. 5A-B, was 1 µg/mL, i.e. ~1 µM for a 9-10 aa-long peptide. This is clearly a physiological concentration for viral peptides, routinely used in in vitro recall assays. We can thus rule out that the observed cross-reactivity is simply due to excess peptide stimulation.
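      The mass-to-molar conversion behind "1 µg/mL, i.e. ~1 µM" can be checked with a short Python sketch. It assumes the common rule of thumb of about 110 Da per amino acid residue plus 18 Da for one water molecule; the exact masses of the peptides actually used will differ somewhat.

```python
# Rule-of-thumb masses (assumptions, not the exact masses of the peptides used)
AVG_RESIDUE_MASS_DA = 110.0  # approximate mean mass of an amino acid residue
WATER_MASS_DA = 18.0         # one water for a linear peptide with free termini


def approx_peptide_mw(n_residues: int) -> float:
    """Approximate molecular weight in g/mol (numerically equal to Da)."""
    return n_residues * AVG_RESIDUE_MASS_DA + WATER_MASS_DA


def ug_per_ml_to_uM(conc_ug_per_ml: float, n_residues: int) -> float:
    """Convert a mass concentration (µg/mL) to a molar concentration (µM)."""
    grams_per_litre = conc_ug_per_ml * 1e-3  # 1 µg/mL = 1 mg/L = 1e-3 g/L
    molar = grams_per_litre / approx_peptide_mw(n_residues)  # mol/L
    return molar * 1e6  # mol/L -> µM


for n in (9, 10):
    print(f"{n}-mer at 1 µg/mL ≈ {ug_per_ml_to_uM(1.0, n):.2f} µM")
```

      Under these assumptions a 9-mer at 1 µg/mL comes out at roughly 0.99 µM and a 10-mer at roughly 0.89 µM, consistent with the ~1 µM figure.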

      Then the authors actually synthesize and characterize individual TCRs. Here what is seen is consistent with expectation and does not seem to support the idea of substantial fuzzy cross-reactivity: binding to the cognate peptide is 3-4 orders of magnitude stronger than to the alternative peptides.

      We respectfully disagree. First, as shown in Fig. 5C, TCR#35-13 (cognate peptide: HLA-A2-restricted Flu MP 58-66) does recognize the alternative HLA-A2-restricted CMV IE1 184-192 peptide, with an EC50 3-4 log units higher; yet the EC50 of this TCR is approximately 10⁻⁶ M, i.e. 1 µM, which remains a physiological concentration. Second, this is not the case for TCR#36-150 (same cognate peptide, HLA-A2-restricted Flu MP 58-66), which actually recognizes the alternative HLA-A2-restricted EBV BMLF1 280-288 peptide with a 4-fold lower EC50.

      The only exception is the GAD 114-122 TCR, where the different peptides appear to be closer in binding strength. But in this case, the authors state that they "analyzed their response to a set of peptides comprising their cognate peptide and peptides with no significant structural commonalities, selected by testing combinatorial peptide libraries". If the competitor peptides came from peptide library screening then the observation of strong binding to alternative peptides does not seem as surprising as a TCR that binds well to a Flu peptide, say, and also a CMV peptide, selected from a smallish set of possibilities.

      As explained above, this TCR does not stand out as an exception compared with the Flu-reactive TCRs. Moreover, this GAD 114-122 TCR recognizes its cognate peptide in a similar or even lower concentration range than the Flu-reactive TCR#36-150. It should also be pointed out that, contrary to the Flu-reactive TCRs, here we had no reference dextramer binding data to guide our peptide selection, which is why we resorted to combinatorial peptide libraries. Thus, although different strategies were used, peptide selection was “guided” in both instances.

      It is pretty well established that TCRs are cross-reactive, both for nearby peptides and also for sequence-dissimilar peptides.

      We agree, and had notably cited the landmark paper by Don Mason estimating that each TCR may respond to over 10⁶ different peptides from an estimated repertoire of > 10¹⁰ peptides. Based on Don Mason's estimate of cross-reactivity, the chance of finding a cross-reactive peptide at random would be around 10⁻⁴.

      Here, we tested only a few peptides from different viruses. If Don Mason's estimates are correct, then for a given TCR the chance of finding even one cross-reactive peptide among these few peptides would be at most 10⁻³, the chance of finding two would be about 10⁻⁶, and that of finding three or more would be infinitesimal.
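      This order-of-magnitude argument can be reproduced with a simple binomial model. The minimal Python sketch below assumes, for illustration only, that each tested peptide is independently recognized with probability p = 10⁶/10¹⁰ = 10⁻⁴ (the Mason estimate) and that n = 10 unrelated peptides are tested; n is a hypothetical stand-in for "a few peptides", and independence between peptides is itself an assumption.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    Summed directly over the upper tail (rather than 1 - lower tail)
    to avoid floating-point cancellation when p is tiny.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


P_CROSS = 1e6 / 1e10  # Mason estimate: ~1e6 recognized peptides out of ~1e10
N_PEPTIDES = 10       # hypothetical number of unrelated peptides tested

for k in (1, 2, 3):
    print(f"P(>= {k} cross-reactive peptides) ≈ {prob_at_least(k, N_PEPTIDES, P_CROSS):.1e}")
```

      With these assumptions the probability of at least one hit is about 1.0 × 10⁻³, of at least two about 4.5 × 10⁻⁷ and of at least three about 1.2 × 10⁻¹⁰, in line with the orders of magnitude given above.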

      Thus, if the polyreactivity that we describe is part of this general cross-reactivity, our results at least highlight a major, previously unreported bias in the selection of these cells.

      The question is whether widespread, functionally relevant (not just dextramer binding at some concentration) poly-reactivity to diverse viral peptides is a defining feature of a large fraction of the TCR repertoire. The paper does not appear to present sufficiently strong evidence to support this claim.

      We agree with the reviewer that more work is needed to “fully” appreciate the role of polyreactive cells!

    1. Author Response

      Reviewer #2 (Public Review):

      This paper reports a novel measure of biological age derived from machine-learning analysis of retinal imaging data with chronological age as the criterion measure. The resulting algorithm is impressive. Not only can the retinal image data accurately predict chronological age in the training data and record changes over short time intervals, but it also proves accurate in independent test data and appears to contain information related to mortality risk. In addition, the authors report a GWAS of the new measure.

      I would like to see a bit more validation data in the UKB - how does EyeAge relate to (a) tests of visual acuity - e.g. does it explain aging-related differences?

      We have extended the supplemental tables and figures (Supplementary table 5 and Figure 3-figure supplement 2) to show additional adjustments to the hazard ratios using visual acuity.

      (b) measures of morbidity and disability - e.g. how is EyeAge Accel associated with at least some of the counts of chronic diseases, self-reported physical limitations, tests of physical performance, measures of fluid intelligence?

      We felt that all-cause mortality is the clearest outcome to test against, as other outcomes were not available for all participants or would require domain-specific knowledge to incorporate properly, which we felt was out of scope. Given this, we have added this limitation to the discussion:

      “This study has several limitations. First, further work will be needed to assess whether eyeAgeAccel is correlated with other important health outcomes and measures.”

      But overall, this is a very strong report of an exciting new biomarker of aging. It was unclear to me whether the algorithm to compute the measure would be publicly available. The authors should clarify.

      Code for both training and evaluation of eyeAge from fundus images is available by minimally modifying open-source software we previously released under the permissive BSD 3-clause license. We have added the following “Code availability” section to the paper:

      “To develop the eyeAge model, we used the TensorFlow deep learning framework, available at https://www.tensorflow.org. Code for both training and evaluation of chronological age from fundus images is open-source and freely available as a minor modification (https://gist.github.com/cmclean/a7e01b916f07955b2693112dcd3edb60) of our previously published repository for fundus model training [57].”

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very interesting paper trying to quantify excess deaths due to the COVID-19 pandemic in the USA. The paper is roughly divided into two main sections. In the first section, the authors estimate age- and cause-specific excess mortality. In the second section, using their excess mortality estimates, the authors attempt to disentangle the impact of SARS-CoV-2 infection (direct impact) vs. the impact of non-pharmaceutical interventions (NPIs) on this excess mortality (indirect impact). I have some concerns, particularly with respect to the second section.

      The model used to estimate excess mortality is quite clear. The authors adjust the baseline model to account for low influenza circulation (and deaths) during the COVID-19 pandemic, to avoid underestimating the number of deaths caused by COVID-19. While this makes sense if the authors are trying to estimate the total number of deaths caused by COVID-19, I'm not sure it needs to be accounted for if the authors want to estimate excess/added deaths. A counterfactual scenario would've included influenza. It also raises the question of whether (conceptually) they should be adjusting for other causes of deaths that may have also decreased during the pandemic. The authors briefly acknowledge this in the discussion ("we can't account for changes in baseline respiratory mortality due to depressed circulation of endemic pathogens other than influenza") but my comment goes beyond respiratory diseases. Analyses of excess mortality from other settings have suggested, for example, decreased deaths due to fewer traffic accidents (not in the US) or due to decreased air pollution, and not accounting for these would also lead to an underestimate of the total deaths caused by COVID-19. I understand that it is not feasible to account for all potential factors, so I wonder if they should focus on reporting excess deaths as compared to a counterfactual with influenza.

      Thanks. We think it is helpful to “single out” influenza as it causes major fluctuations in mortality from multiple causes in regular years and is a useful reference to contrast the pandemic impact. But the reviewer’s point is well taken. We have clarified our assumptions about the meaning of the baseline in this analysis (methods p 5), discussed the depressed circulation of other pathogens in depth, and mentioned air pollution (p 12-13). We have also slightly reworked our comparison between COVID19 and influenza so that excess mortality estimates are comparable and now cover periods of the same duration (Nov 2017-Mar 2018 for flu and Nov 2020-Mar 2021 for COVID19, see Figure S11).

      The second section, trying to estimate direct vs. indirect effects is also very interesting. However, more details are required about the regression model used and, importantly, what the assumptions and limitations of the approach are. Specifically:

      • Please provide a bit more information on the regression used for direct vs. indirect effects. I'd like to see explicit discussion of the assumptions and limitations of the approach but also of the stringency index used. Does this model include an intercept? Was the association between stringency index and excess deaths assumed to be linear? Or were different functional forms considered? It is also not clear how well the model fits the data.

      Thanks for these comments, which helped us improve this section. We have provided more details about the stringency index in methods (it captures the “sum” of interventions), described the model in methods and supplement, and discussed limitations in the caveats section, especially regarding the effectiveness of these interventions (p13). We had tried different linear models with and without intercepts but elected to use models with intercepts so as not to overly constrain the relationship between interventions, COVID19 activity and excess mortality. These models also incorporate lags in the predictors that are determined by cross-correlation analysis (as detailed in supplement). In the revised version, we now use generalized additive models (GAMs), in which the relationships between excess mortality and predictors do not have to be linear. We can do so because we were able to add several weeks of data (the regression is now based on 96 pandemic weeks from March 1, 2020 to January 1, 2022). The models are described in detail in supplement p 4-5, and we now specify that they have intercepts. We have also provided additional plots of model fits in main text and supplement (Figures 4 and S16-19).
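      The lag-selection step mentioned above can be illustrated with a short Python sketch. This is an assumption about how such an analysis might look, not the authors' code: the helper name `best_lag`, the synthetic weekly series, and the rule of choosing the lag with the largest positive Pearson correlation are all hypothetical choices.

```python
import numpy as np


def best_lag(predictor, outcome, max_lag):
    """Hypothetical helper: return (lag, r) for the lag k in 0..max_lag that
    maximizes the Pearson correlation between predictor[t - k] and outcome[t]."""
    predictor = np.asarray(predictor, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    if len(predictor) != len(outcome) or max_lag >= len(outcome) - 1:
        raise ValueError("series must have equal length greater than max_lag + 1")
    best_k, best_r = 0, -np.inf
    for k in range(max_lag + 1):
        # Pair the predictor k weeks earlier with the outcome in week t:
        # drop the last k predictor weeks and the first k outcome weeks.
        x = predictor[: len(predictor) - k]
        y = outcome[k:]
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r


# Synthetic example: the outcome follows the predictor with a 3-week delay.
rng = np.random.default_rng(0)
cases = rng.normal(size=96)
deaths = np.roll(cases, 3)  # deaths[t] = cases[t - 3] for t >= 3
lag, r = best_lag(cases, deaths, max_lag=8)
print(lag)  # 3
```

      A real analysis would typically restrict the search to plausible lags and may prefer the absolute correlation if the sign of the association is not known in advance.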

      • Related to the above, please provide more details on how the results of the regressions were translated into the results presented. The main text reports percentages, but the methods only briefly explain how numbers of direct deaths were calculated, and the supplementary tables report coefficients. It is not clear if these estimates of direct and indirect deaths were somehow constrained to add up to the total number of excess deaths, but it doesn't seem like it since point estimates cross 100% in some cases.

      As discussed in response to one of the editor’s questions, estimates are not constrained to 100%. We have provided more details in the supplement on how we estimate the direct impact of the pandemic. Briefly, we calculate expected deaths in the GAM with all predictors set to their observed values and again with the COVID19 predictor set to zero. The direct impact is the difference between the two predictions, divided by the predictions of the full model.
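      The direct-impact calculation described above can be sketched as follows. For simplicity, this sketch substitutes an ordinary least-squares linear model with an intercept for the GAM, and the function and variable names are hypothetical; only the logic follows the description (predict with all predictors at observed values, predict again with the COVID19 predictor set to zero, and divide the difference by the full-model prediction, here summed over weeks). It also shows why the ratio is not bounded by 100%: when the remaining terms contribute a negative total, the counterfactual prediction falls below zero and the ratio exceeds 1.

```python
import numpy as np


def direct_fraction(X, y, covid_col):
    """Share of model-predicted excess deaths attributable to one predictor.

    X: (n_weeks, n_predictors) array; y: weekly excess deaths;
    covid_col: column index of the COVID19 activity predictor in X.
    Uses OLS with an intercept as a stand-in for the GAM.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # column 0 is the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    full = design @ coef                 # all predictors at observed values
    no_covid = design.copy()
    no_covid[:, covid_col + 1] = 0.0     # +1 skips the intercept column
    counterfactual = no_covid @ coef
    return (full.sum() - counterfactual.sum()) / full.sum()


covid = np.array([1.0, 2.0, 3.0, 4.0])
npi = np.array([0.0, 1.0, 0.0, 1.0])
X = np.column_stack([covid, npi])

# Small negative NPI term: the share stays below 1 (30 / 36).
print(direct_fraction(X, 2 + 3 * covid - 1 * npi, covid_col=0))
# Larger negative NPI term: the counterfactual sum is negative, so the share exceeds 1 (30 / 26).
print(direct_fraction(X, 1 + 3 * covid - 4 * npi, covid_col=0))
```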

      We note that while some of the estimates derived from the GAM exceed 100% (and are similar to the linear model estimates presented in the initial analysis, before revision), these estimates echo the findings from a more empirical analysis, in which we compare all-cause excess deaths with official COVID19 death tallies. There, in the two oldest age groups, we find more official COVID19 deaths than estimated by the excess mortality models. Hence both analyses point to an underestimation of the direct burden of COVID19 by the excess mortality approach, specific to the oldest age groups. We return to this point in depth in the discussion (p 12-13) and consider the possible effects of harvesting, depressed circulation of non-SARS pathogens, and inaccurate coding of official statistics (as pointed out by reviewer #3).

      • Please discuss the potential limitations of using the stringency index to quantify NPIs.

      Several limitations have been added to the caveats (p 13); major issues include the aggregation of multiple interventions into a single index, which considers neither the actual implementation nor the effect of interventions. The index is based solely on mandates in place in different locations and time periods. We also assume that the effectiveness of these interventions, for a given level of stringency, does not change over time.

      • When estimating direct and indirect effects, does the paper assume that the estimated parameter is time-invariant? Indirect effects might have changed over the course of the epidemic due to factors not necessarily captured by the stringency index used, particularly since the index doesn't take into account the implementation of the measures. Have the authors tested this assumption?

      This is an interesting point, which we have explored further. The non-linear relationships we find between NPIs and chronic condition excess mortality suggest that the reviewer may be right. We now discuss the role of NPIs in the results section in much more depth than previously (bottom of p8).

      “At lower levels of interventions (Oxford index between 0 and 50), representing the early stages of the lockdown in March 2020, excess mortality rose with interventions. Later in the pandemic, increased interventions were estimated to have a beneficial effect on excess mortality, driven by comparison between the period when interventions were strengthened in response to increasing COVID19 activity in late 2020 (Oxford index above 60) to the period when interventions were relaxed in 2021 (Oxford index between 50 and 60).”

      We cannot run an analysis over different time windows because NPI level and time are highly correlated (for instance, the NPI index rises from 0 to 50% in the very early part of the lockdown period and then stays above 50% for the rest of the study, so we cannot compare the effect of a 25% level in 2020 and 2021). We have added this limitation to the caveats section (p 13).

      • The authors state "In contrast, the indirect impact of the pandemic measured by the intervention term was highest in youngest age groups, decreased with age, and lost significance in individuals above 65 years" - I'm not entirely sure where this statement comes from. For example, Table S3 suggests that the indirect effect (multivariate or univariate) is higher in 25-64 yo than in <25s. The same table also suggests negative impacts (protective effects?) in >75s in the multivariate model. Please clarify.

      There are fewer deaths in individuals under 25 years, which is why the coefficients were lower overall in Table S3. Yet we find that the proportion of variance explained by interventions is higher in those under 25 years than in those aged 25-44 years.

      We have now changed our modeling strategy to use GAMs, so Table S3 is no longer relevant, but the main conclusion that interventions explain a larger relative portion of excess mortality in the under-25 age group than in the other age groups, and than other covariates, remains valid. The NPI term is now significant in all groups (although the relative contribution of NPI still declines with age, as in the prior analysis), so we have rephrased this sentence: “In contrast, the relative contribution of indirect effects, via the intervention variable, was highest in youngest age groups and decreased with age”.

      • How do the authors interpret "Percents of excess deaths" over 100%? Similarly, I don't fully understand how to interpret "The upper bound of the 95% confidence interval for heart diseases was above 100% (158%), suggesting that for every excess death from heart disease estimated by our model, up to 1.58 death from heart disease could be directly linked to SARS-CoV-2 infection."

      We have rephrased this section, although the overall conclusions remain unchanged. The GAM estimate of the direct COVID19 impact is statistically significantly above 100% in those aged 85 and over, suggesting that our excess mortality approach is too conservative and does not estimate enough COVID19 excess deaths in this age group. We draw a similar conclusion from a more empirical analysis, in which we compare all-cause excess death estimates with official COVID19 death tallies. In this analysis, we find more official COVID19 deaths than estimated by the excess mortality models in the two oldest age groups (point estimates above 100% in the 75-84 and 85+ yrs). Hence both analyses point to an underestimation of the direct burden of COVID19 in the oldest age groups by excess mortality approaches.

      Rephrased results section (bottom of p. 9): “We estimate that the direct contribution of COVID-19 to excess mortality increases with age, from negative and non-statistically significant in individuals under 25 yrs to over 100% in those over 85 years, echoing the gradient seen in official statistics (Table 4). It is also worth noting that our excess mortality estimates may be too conservative (too low) as we did not account for missed circulation of endemic pathogens. This could explain why our estimates of direct COVID-19 contribution exceed 100% in the oldest age group.”

      We return to this point in depth in the discussion and consider the possible effects of harvesting and depressed circulation of non-SARS pathogens (p 12-13).

      • Table 3: The signs of the point estimate vs CI for vehicle accidents are inconsistent.

      Thanks, this was a typo. It should have been 4300 (-700, 9300) excess deaths from accidents. This has been updated with more recent data.

      Reviewer #3 (Public Review):

      The authors examine mortality data in the US and use time-series approaches to estimate excess mortality during the COVID-19 pandemic.

      Major comments:

      I would encourage the authors to discuss the two different concepts of excess mortality:

      (#1) what deaths were caused, directly or indirectly, by the pandemic. This is what the authors have aimed to assess, and I have no major concerns with the methodology

      (#2) how many additional deaths occurred during the pandemic, compared to what would have been expected in the absence of a pandemic. For such an analysis I think expected annual influenza deaths should be added back to the baseline (or subtracted from the excess)? Some of the discussion seems to relate more to an impression of #2 rather than #1 but I would be interested in the authors' thoughts.

      We have added more details about the approach, in particular why we think that #1 is the proper analysis here (see methods p 5). Given the sheer magnitude of COVID19 excess deaths (over 1 million excess deaths at the end of our study), adding back influenza deaths (up to 52,000 deaths in a recent severe season with a mismatched vaccine, as in 2017-18) would not make a large difference (at most about 5% of the total). We have also provided a more direct comparison of the impact of influenza and COVID19.

      1. The authors estimate fewer excess COVID deaths in the elderly than there were confirmed deaths (Table 3). Could this be an indication of some confirmed deaths being "deaths with COVID" rather than "deaths from COVID"? I'm not sure how to interpret the %s in the final column when they exceed 100%. The authors suggested a harvesting effect, but I would suggest "deaths with COVID" might be a more likely explanation. This issue can be a limitation of confirmed-death data.

      This is a good point. We have added a comment along these lines in the discussion, in the middle of p 12. Still, we think harvesting and/or the depressed circulation of endemic pathogens, which would have inflated our baseline, are more likely explanations for these findings. This is because we find similar estimates (exceeding 100%) in GAMs that ignore official statistics and rely on COVID19 case data or COVID19 hospital occupancy data, which suggests that mechanisms other than the coding of official mortality statistics are at play.

      Yet, as more detailed official statistics become available, a tabulation of confirmed deaths by presence of a primary vs secondary COVID (U07) code may be revealing and get more directly at the reviewer’s question.

    1. Author Response

      Reviewer #1 (Public Review):

      ARL3 is a small GTPase that localizes to the primary cilium and plays a role in regulating the localization of some specific ciliary membrane proteins, including PDEδ and NPHP3. Mutations in this gene cause Joubert syndrome, a type of ciliopathy characterized by cerebellar malformation and retinal degeneration. While the majority of these diseases are inherited in an autosomal recessive manner, two mutations in ARL3 (D67V and Y90C) have been reported to cause autosomal dominant retinal diseases. In the current paper, Travis et al. sought to understand the pathogenesis of the diseases caused by the two autosomal dominant mutations. They found that D67V is a constitutively active mutation, whereas Y90C is a fast-cycling mutant, which can be activated in a guanine nucleotide exchange factor (GEF)-independent manner. Since the fast-cycling mutant did not bind to the effector proteins in vitro (likely because the guanine nucleotide dissociates from the mutant ARL3, which has a lower affinity for GDP/GTP), they developed a method to capture a snapshot of the interaction between ARL3 and its effectors. Using this method, they showed that the Y90C mutant indeed has increased interaction with the effectors, suggesting that Y90C is an overactive form of ARL3. They then addressed how photoreceptor cells are affected by these two mutations using a mouse model and found that the mutations disrupt the proper migration of the photoreceptor cells.

      Strengths:

      • The paper is well written, and it was easy to understand what the authors did from the figure legends and the methods section.

      • It was easy to find out what is known or unknown, as the paper has accurate references.

      • The authors developed a method to analyze a snapshot of the interaction between ARL3 and its interactors.

      • The paper has an in vivo model and connects the biochemical characteristics of ARL3 to in vivo cellular phenotypes.

      Weaknesses:

      (1) I understand that the authors focused on the nuclear migration defect because the phenotype was first described in ARL3-Q71L transgenic mice. The similar phenotype observed in RP2 knockout mice further supports the idea that the defect is caused by the hyperactivation of ARL3. Indeed, the defect has not been reported in ARL3 knockout mice; however, I feel that this does not necessarily mean that the defect is not caused by loss of function. Although it has not been assessed, ARL3 knockout mice might have the same defect. Therefore, I think analyzing both the migration defect and the trafficking defect would be more informative than focusing on the migration defect alone. The fact that the relationship between the nuclear migration defect and the retinal degeneration phenotype is not entirely clear further underscores the importance of analyzing the trafficking defect.

      Does the expression of ARL3-Y90C also cause the trafficking defect? If so, you could separate the nuclear migration phenotype from the one caused by the trafficking defect. Would the expression of lipidated cargo(s) rescue the trafficking defect as well?

      I think many questions can be addressed by analyzing the localization of the lipidated cargos, such as PDEδ and GRK1.

      The effect of Arl3-Y90C expression on trafficking of lipidated cargos is an interesting question. Previous papers showed mislocalization of lipidated outer segment proteins in Arl3-KO rods and down-regulation or subtle mislocalization in Arl3-Q71L overexpressing rods. So, this was one of the first things we investigated; however, we never observed mislocalization of ciliary or outer segment lipidated cargos (e.g., GRK1, transducin, Rab28, and PDE) in wild type mature rods overexpressing Arl3 mutants, and many were tested. It was through these experiments that we first identified the pronounced nuclear migration defect. Rod photoreceptor nuclear migration is a developmental process that is completed by P10, so Arl3-Y90C overexpression is causing a developmental defect. When rods are positioning their nuclei in the ONL, they are still “immature”, as their primary cilium has not begun to elaborate disc membranes for light capture. All our analysis was performed in mature rods, so it is not surprising that we did not observe any lipidated cargo trafficking defects at this timepoint. Since the developmental timing of the nuclear migration defect is important for our manuscript, we have added this to our introduction. Additionally, we use “immature” photoreceptors in the cartoon diagrams showing how Arl3 activity is altered by different mutations and rescue experiments, since formation of the mature outer segment occurs post-migration.

      (2) I am not quite sure if the nuclear migration was assessed properly. Based on the pictures in Fig. 1, some of the FLAG-negative cells also seem to be migrating to the INL (please see Fig. 1C and Fig. 1D). Is this biologically normal during development? Could this analysis be affected by the thickness of the OPL, the layer between the ONL and INL? Also, the picture is cut off in the middle of the INL. Could the authors include more layers of the retina, such as the IPL, in the picture, so that we can evaluate the INL and OPL better? Taking this into account, I think it is worth measuring the nuclear position of FLAG-negative cells as a negative control in all the experiments.

      Our electroporation technique results in a small population of rods that express our constructs of interest (~5-15% within a patch). All the experiments were performed in wild type retinas, which develop normal retinal layers, so the analysis of the nuclear position of FLAG-positive cells within the retina is cell autonomous. Migration defects are assessed by differences in the skew of FLAG-labeled rods relative to the boundaries of the wild type ONL, which is marked by Hoechst nuclear stain (which also labels the FLAG-negative rods). Wild type photoreceptor nuclei are not found within the INL; the nuclei in that layer belong to either horizontal cells or bipolar cells, neither of which is targeted by our electroporation approach. As a control, we show that when wild type Arl3-FLAG was expressed, FLAG-labeled rods were never observed within the INL. We have now included the % of displaced nuclei in Table 1.
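      One way to turn nuclear-position measurements like these into a percentage of displaced nuclei and a skew statistic is sketched below. This is an assumption, not the authors' pipeline: the depth coordinate, the rescaling so that 0 marks the outer edge of the Hoechst-defined ONL and 1 marks its inner edge (bordering the OPL/INL), the example depths, and the function names are all hypothetical.

```python
import numpy as np


def onl_positions(nucleus_depth, onl_outer, onl_inner):
    """Rescale nucleus depths so 0 = outer ONL edge and 1 = inner ONL edge.

    Values above 1 lie beyond the ONL, i.e. displaced toward the INL.
    """
    depth = np.asarray(nucleus_depth, dtype=float)
    return (depth - onl_outer) / (onl_inner - onl_outer)


def summarize_positions(positions):
    """Return (% of nuclei beyond the inner ONL edge, skewness of positions)."""
    p = np.asarray(positions, dtype=float)
    pct_displaced = 100.0 * np.mean(p > 1.0)
    centered = p - p.mean()
    # Fisher-Pearson skewness (population moments); > 0 means a tail toward the INL.
    skew = np.mean(centered ** 3) / np.std(p) ** 3
    return pct_displaced, skew


# Hypothetical depths (in microns) of five FLAG-positive nuclei in one section.
pos = onl_positions([10, 20, 30, 40, 60], onl_outer=10, onl_inner=50)
pct, skew = summarize_positions(pos)
print(pct)  # 20.0
```

      Normalizing to the local ONL boundaries makes sections of different thickness comparable, which is one way to address the concern about OPL thickness raised above.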

      (3) The evidence that the Y90C mutant of ARL3 is a fast-cycling mutant is not very compelling. In Figure 2C, the authors showed that ARL3 Y90C can bind to PDEδ, its effector, once it is pre-loaded with GTP. The authors also showed that the mutant can bind to its effector even without EDTA as long as an excess amount of GTP is added. The authors used endogenous ARL3 as a control to compare the effects between wild-type and mutants. I see multiple pitfalls in this experiment. First, ideally, this type of experiment needs to be done with a purified protein using fluorescent or radioactive guanine nucleotides (e.g., a nucleotide loading assay or nucleotide exchange assay) to directly assess the kinetics of nucleotide exchange. However, I do understand that this is outside the authors' expertise. In the authors' experimental setting, I am not sure loading the protein with GTP in the presence of EDTA means anything more than confirming that the protein is intact. Theoretically, wild-type and a fast-cycling mutant can load GTP with similar efficiency in the presence of EDTA. Then, during immunoprecipitation, GTP dissociates from the Y90C mutant faster than from wild-type (because a fast-cycling mutant theoretically has a lower affinity for guanine nucleotides), assuming that GTP was not added during immunoprecipitation (GTP addition was not mentioned in the methods; could the authors confirm this?). But in this case, the kinetics of GTP dissociation can be affected by many factors, including the presence of GAP in the reaction, the dissociation constant of Y90C, the volume of buffer used, and the number of washing steps. Thus, it is not easy to estimate the difference between wild-type and Y90C. Besides, using endogenous ARL3 rather than ARL3-wild type FLAG as a control can be dangerous. I have seen a tagged protein cleaved into a product of similar size to the endogenous protein (I expressed GFP-protein X in knockout cells lacking protein X and saw a band at the position where the endogenous protein is observed in wild-type cells). So, the endogenous band that the authors showed could come from cleaved FLAG-Arl3. (The authors can easily check this by including wild-type cells not expressing FLAG-tagged ARL3.)

      An alternative experiment that I would suggest is performing immunoprecipitation from cells expressing 1) ARL3 wild-type FLAG, 2) ARL3 Y90C FLAG, or 3) ARL3 D129N FLAG, in buffer containing 1) no guanine nucleotide, 2) 10 mM GDP, or 3) 10 mM GTP. The guanine nucleotide should be present throughout the process, including washing. This experiment might also be affected by many factors, but variability should be lower than in the experiment presented in Fig 2C. ARL3-wild type FLAG is also a better control here than the endogenous protein.

      Variability due to the factors you mention is a concern, but we were able to repeatedly obtain the same results using our method; admittedly, our method tests whether mutated Arl3 can exchange nucleotide under a certain condition rather than exactly how. We know that we are not providing precise kinetics or elucidating the underlying mechanism for how these mutations lead to what we are calling fast cycling. While that information is important, it is outside the scope of this paper.

      As you mention, an important conclusion from the PDEδ binding experiments is that we confirm the Arl3-Y90C protein is intact by showing it can indeed bind nucleotide as long as there is an excess of GTP (Fig 2B). The interesting finding from these experiments is that Arl3-Y90C binds GTP even in the presence of magnesium, a behavior not observed for wild type Arl3. We feel that showing that endogenous Arl3 is not activated in the presence of magnesium in each of our preparations is a lovely internal control. However, we agree that showing wild type Arl3-FLAG in these assays is an important negative control and have now included this blot as Fig 2-Sup Fig 1.

      (4) In Fig. 3, the authors attempted to take a snapshot of the interaction between ARL3 and multiple effector proteins. The three bands that were enriched in the Q71L cells were identified as RP2, UNC119, and BART by mass spec (Fig. 3B). These bands were used as a readout for the subsequent experiments. I am not quite sure why the authors used this approach rather than using cells that express both FLAG-ARL3 and a GFP-tagged protein of interest, as the authors did in Fig. 3G. I prefer the latter approach for the following reasons: 1) the FLAG bands that correspond to the three proteins (RP2, UNC119, and BART) in wild-type cells are very close to the detection limit, 2) the authors did not confirm that the lowest band actually comes from BART, and 3) the authors cannot assess some important effector proteins, such as PDEδ, because 293 cells might not express them. All of these problems can be solved by using the approach taken in Figure 3G.

      If the authors chose the former approach because of some specific reason, I would appreciate it if the authors could explain that in the main text of the paper.

      In vitro crosslinking experiments were performed to test whether overexpression of Arl3 mutants resulted in an active cellular Arl3 without artificially changing any components of the GTPase cycle. We feel these experiments are highly elegant as they allow us to take a snapshot of native Arl3 activities without compromising the analysis by artificially altering GAP/GEF/effector interactions through overexpression or during lysis (as we show that the concentration of GTP/Mg could alter interactions in Fig 2). While AD293T cells are not rod photoreceptors, we are able to use this system to better understand how the Arl3 mutants alter the level of activity within the cell. Yes, this experimental assay is novel, but we confirmed the identity of the effectors by Western and mass spec, used positive and negative controls in each experiment, and show that the method is highly reproducible. We agree with Reviewers 2 and 3 that using this method to study the cellular activity of fast cycling Arl3 mutants is a strength of our paper.

      (5) ARL3 Y90C causes a nuclear migration defect. Given that Y90C is a fast-cycling (hyperactive) mutant and has a high affinity for ARL13B, the nuclear migration defect might come from either the increased activity of ARL3 or the sequestration of ARL13B, which can act as a GEF for ARL3 but potentially has other functions. If my understanding is correct, the authors concluded that the defect caused by ARL3-Y90C is likely due to hyperactivation of the protein, as the Y90C/T31N mutant, which cannot bind to effectors but still retains the ability to capture ARL13B, did not cause a migration defect. But I am a little confused by the fact that Y90C/R149H, which is unable to bind to ARL13B (Fig. 2C) but still retains the ability to interact with the effectors (Fig. 3F), did not cause a migration defect (Fig. 7B). Wouldn't this mean that the sequestration of ARL13B could contribute to the phenotype?

      If my understanding is correct, the authors are arguing that both hyperactivation of cytosolic ARL3 and defective activation of endogenous ARL3 in the cilium are necessary to cause the migration defect. I am not very convinced by this hypothesis, and I still think that the defect could be caused by sequestration of ARL13B in the cytoplasm.

      Then why did Y90C/T31N not cause the defect even though it can sequester ARL13B? This might be explained by the localization of the ARL3 mutants. If Y90C can localize to the cilium while the double mutant, Y90C/T31N, does not, then only Y90C might be able to inhibit ARL13B function in the cilium. This could explain the lack of the defect in the cells expressing Y90C/T31N.

      It would be helpful for understanding how exactly the fast-cycling mutant causes the defect if the authors could provide more information, including the localization of ARL3 (wild-type and mutants) and of key proteins such as ARL13B and the effector proteins. Assessing ARL13B defects seems particularly important to me because ARL13B deficiency has been connected to neuronal migration defects (Higginbotham et al., 2012).

      What I am trying to say here is that the mechanism underlying the defect is likely very complex. So, presenting more information without committing to one specific hypothesis might be important for readers and authors to interpret the data accurately.

      Our data show that for the fast cycling Arl3-Y90C mutation, both features, blocking endogenous Arl3 activation in the cilium (through Arl13B binding) and increasing the activity of Arl3-Y90C in the cell body, are required to produce a nuclear migration defect. We find that we can rescue migration defects by either restoring activation in the cilium or reducing GTP activity outside the cilium. As long as there is more Arl3-GTP activity in the cilium, the rod can tolerate aberrant Arl3-GTP activity in the cell body. The Y90C/R149H result was critical in leading to our hypothesis that a gradient between the two compartments is used for proper migration. Interestingly, the absence of any activity does not produce the migration phenotype, further suggesting that an imbalance in the gradient is important.

      We performed new experiments to investigate whether Arl3-Y90C is sequestering Arl13B away from the cilium but found that localization of Arl13B (both endogenous and overexpressed) is not altered by expression of Arl3-Y90C – see Fig 3-SupFig 1-2.

      It is an interesting question how different Arl3-FLAG constructs are localized within the photoreceptor. Unfortunately, we did not analyze the data in a way that would allow us to draw any conclusion about the localization of different Arl3-FLAG constructs. In general, we observed FLAG localization throughout the photoreceptor cell and focused our imaging on the FLAG staining around the nucleus so we could further analyze ONL position. Looking back through our images, some of the mutants might have a more prominent localization within a specific subcellular compartment (e.g., Arl3-D67V is more prominent in the inner segment than the outer segment, and Arl3-Y90C appears to have dominant outer segment localization). These differences likely reflect each mutant binding a particular effector: D67V to RP2 and Y90C to Arl13B, which we show biochemically. Ideally, Arl3 mutant localization would be analyzed during development to provide a more direct link to the nuclear migration defect, a future direction for our lab. We have updated our manuscript to be more transparent about the potential differences in rod localization of Arl3 mutants.

      (6) The rescue experiments that the authors present in Fig. 5-6 are striking and could form a basis for future therapies for diseases caused by ARL3 defects. However, I believe more examinations are needed to interpret the data accurately. If I understand the methods section correctly, the authors performed this rescue experiment by co-injecting ARL3-FLAG and chaperones/cargos. But I feel we can interpret these data correctly only when ARL3-FLAG and the chaperones/cargos are co-expressed in the same cells. A better way to analyze the data might be to compare the nuclear migration phenotype between cells positive for ARL3-FLAG only and cells double-positive for ARL3-FLAG and the chaperone/cargo.

      Our lab has confirmed the estimate by the Cepko lab that co-injection of two plasmids results in more than 90% of rods expressing both proteins (see Matsuda and Cepko, PNAS 2004). Because we assess nuclear position only in FLAG-labeled rods, a small percentage of cells in this analysis do express the Arl3-FLAG mutant without the chaperone/cargo. However, including these cells can only cause us to underestimate the rescue, so the true rescue is likely even more robust than measured.

      Reviewer #2 (Public Review):

      The small GTPase Arl3 (Arf-like 3) is a well-characterized component of primary cilia, including the outer segment of photoreceptors, which contain specialized cilia. Arl3 is critical for the import of multiple lipid-modified proteins into cilia that are vital to ciliary function. Human mutations in Arl3 are reported to cause both autosomal recessive and dominant inherited retinal dystrophies, but the mechanisms through which these mutations disrupt photoreceptor development are not known. Here the authors show that two dominant Arl3 mutants, Arl3-D67V and Arl3-Y90C exhibit increased activity, but for different reasons. Arl3-D67V is constitutively active (unable to hydrolyze GTP), whereas Arl3-Y90C is a classic rapid-cycling mutant, able to bind GTP spontaneously (independent of its guanine nucleotide exchange factor Arl13) but still able to complete the GTPase cycle by hydrolyzing GTP. Expression of either mutant in developing murine retinas results in a nuclear migration defect, specifically aberrant localization of rod nuclei to the inner rather than outer nuclear layer. In this sense, they phenocopy another well-characterized constitutively active mutant, Arl3-Q71L. Normal nuclear distribution could be restored by overexpression of Arl3 effectors, suggesting that the active mutants disrupt nuclear migration, at least in part, by sequestering Arl3 effectors.

      While the data are reasonably clear and convincing, there are several instances where the conclusions drawn are either confusing or problematic. Specifically:

      1) Although retinal rod cells are ciliated in their outer segment, the authors never actually examine ciliation here. Their only morphological readout is nuclear migration. How does nuclear migration failure impact ciliogenesis in the outer segment?

      Imaging was performed in mature retinas at P21, after outer segment formation is complete. Electroporation targets only a small population of cells, and in these cells we observed normal outer segment structures in all conditions tested; we therefore conclude that ciliogenesis is unaffected. Previous literature has also shown that defects in rod nuclear migration do not affect ciliation of the outer segment.

      2) The Arl3-Y90C mutant seems to act physiologically more like a dominant-negative than an activated mutant. A second mutation in Y90C (R149H) that blocks binding to the GEF Arl13 abrogates the nuclear migration defect, suggesting that Y90C is preventing activation of endogenous Arl3 by sequestering the GEF. Yet overexpression of effectors or cargos still rescues nuclear migration in the presence of Y90C, suggesting that it also sequesters effectors. How do the authors explain this?

      We agree with this interpretation and have now included language about Arl3-Y90C’s role as a dominant negative, in that it blocks Arl13B activity. An interesting caveat to this binary classification is that blocking Arl13B would be expected to reduce endogenous Arl3 activity in rods (which we find to be true; see Fig 5A). However, the migration defect mimics overly active Arl3 (Arl3-Q71L) and not loss of Arl3 function (Arl3-T31N). Using in vivo crosslinking experiments, we show that the fast-cycling nature of Arl3-Y90C also causes GEF-independent activation of Arl3 (Fig 4D-E), which leads to the migration defect. Our rescue data show that only the combination of both effects (reduced Arl3 activity in the cilium and GEF-independent Arl3 activation outside the cilium) is sufficient to disrupt the ciliary gradient and produce the migration defect.

      3) Fig. 1 suggests that an Arl3-T31N mutant has no phenotype. This is a canonical mutation in small GTPases that typically renders them dominant negative. The lack of phenotype is surprising since most dominant-negative mutants act by sequestering their GEFs, thereby preventing activation of the endogenous GTPase. Fig. 2C suggests that this may not be the case for Arl3-T31N, which binds Arl13 only weakly. Some of this confusion may arise from the fact that Arl13 is not a typical GEF. It is very unusual for one GTPase to directly promote nucleotide exchange on another. Does Arl3-T31N affect ciliation in the rod outer segment, or in other ciliated cells? Some discussion of this point is warranted here.

      Our paper finds that Arl3 mutants must produce aberrant activity outside the cilium, whether through constitutive activity (as for D67V and Q71L) or fast cycling (as for Y90C and D129N), to cause the migration defect. Although T31N may have some dominant-negative activity toward Arl13B, it does not cause excess Arl3 activity in cells (see Fig 4) and is therefore not sufficient to cause the migration phenotype. We tested this directly in Fig 5, where we increased T31N binding to Arl13B by introducing Y90C/T31N and still did not see a migration defect. Our results are also in line with a previous study showing that, despite rapid photoreceptor degeneration, outer segments initially formed in a retina-specific conditional Arl3 knockout mouse, whereas a retina-specific conditional Arl13B knockout disrupted photoreceptor ciliogenesis and led to more rapid degeneration (Hanke-Gogokhia, JBC 2017). Since complete loss of Arl3 activity did not disrupt ciliogenesis, it is unlikely that expressing Arl3-T31N in wild-type retinas would alter outer segment formation, and indeed we observed that outer segments formed with all Arl3 mutants.

      4) Oddly, Arl3-Y90C does robustly bind Arl13 (Fig. 2C), while at the same time binding to effectors (Fig. 3D/E), although less strongly than the canonical Q71L constitutively active mutant (Fig. 2A). As noted in point #2, the Y90C/R149H double mutant, which fails to bind Arl13, abrogates the nuclear migration defect observed with Y90C alone. Although the authors refer to Y90C as "rapid cycling" its phenotype is more similar to a dominant-negative than an activated mutant.

      We agree with this interpretation. We have now included language about Arl3-Y90C’s role as a dominant negative in that it blocks Arl13B activity. However, the rapid cycling behavior is important to cause the phenotype.

      5) The authors also mention that loss of Arl3 has no phenotype in their assay, however, Arl3 knockout mice exhibit severe retinal degeneration. How do they explain this?

      Our study finds that not all human Arl3 mutations target the same cellular process, even though they all result in degeneration. Arl3 knockout mice show drastic alterations in lipidated protein trafficking to the rod outer segment in mature retinas, a phenotype that we did not observe when expressing the dominant Arl3 mutants in wild-type rods. Since our tools are not designed to study rod degeneration, the precise mechanisms of degeneration caused by loss-of-function or dominant mutations remain to be determined. We outline some ideas in the discussion, but more work is needed before drawing firm conclusions. We hope that our manuscript will inspire clinicians to look more closely at patients to determine whether there are subtle differences in disease presentation between the dominant and recessive forms of Arl3-associated inherited retinal disease. This is beyond the scope of our expertise.

      Reviewer #3 (Public Review):

      This work provides mechanistic insights into two recently described dominant variants of Arl3, a small GTPase, namely mutations D67V and Y90C. The authors identified a phenotype of these dominant variants during the development of rod photoreceptors by in vivo experiments in mice. They specifically observed a defect in rod nuclear migration to their final outer nuclear layer. This phenotype has been previously observed in another constitutively active variant of Arl3, Q71L. The authors performed a series of extensive and thorough biochemical assays to clarify the mode of action of these variants, mostly the Y90C variant, comparing the behavior of these variants to previously described mutants and combining multiple variants by mutagenesis. They also developed a new in vivo crosslinking strategy to identify transient states of protein-protein interactions. They then performed phenotypic rescue experiments by co-expressing various proteins that interact with or are involved with Arl3. Finally, they propose a model based on differential subcellular compartmentalization of Arl3 activation which, when disrupted, leads to rod nuclei misplacement. These data add to the current understanding of the contribution of different Arl3 variants to human retinal degeneration, which has strong potential translational implications.

      Strengths:

      Relevance of Arl3 dominant variants to human retinal degeneration. Identification of Y90C variant as a "fast cycling" GTPase, and not as a predicted destabilizer of the protein structure.

      New method of crosslinking to enable snapshots of endogenous protein-protein interactions.

      Weaknesses:

      • The relevance of this study is justified by the fact that newly identified dominant variants of Arl3 have been associated with retinal degeneration. However, the authors never assess a degeneration phenotype.

      The electroporation technique allows rapid expression of constructs, but its sparse expression makes it a poor means to study retinal degeneration. This will be important to examine in the future using robust genetic mouse models.

      • The authors show that new dominant variants of Arl3, namely Y90C and D67V, cause rod nuclear mislocalization. This phenotype is interesting, but it was previously observed with another constitutively active mutation of Arl3, Q71L, and therefore is not novel.

      Yes, the Q71L paper is well cited in our manuscript and set the basis for many of our experiments.

      • The main claim of this paper is that subcellular compartmentalization of Arl3 activation to the cilium (what the authors call the gradient) is required for proper rod nuclear migration to the final outer nuclear layer destination. The authors provide multiple experiments to support this model, but it is never directly demonstrated.

      We are not aware of any method that could directly show the subcellular localization of active Arl3-GTP within rod photoreceptors. We agree that we have provided many experiments that support our hypothesis that altering the Arl3-GTP gradient between cilium and cell body produces a nuclear migration defect. Some of our favorites include Fig 6, where we find that the migration phenotype is rescued only by expression of ciliary cargos and not by non-ciliary cargos. In addition, the new data requested by the reviewers, showing that Arl13B expression in the cilium can rescue the Y90C defect, further support the conclusion that the Arl3 ciliary gradient is necessary for proper nuclear migration.

    1. Author Response

      Reviewer #1 (Public Review):

      Pan et al. examined the role of oligodendroglial exocytosis, and specifically the role of L-type prostaglandin D synthase (LPGDS), in modulating oligodendrocyte differentiation and myelination. The topic of autocrine and paracrine signaling within the oligodendrocyte lineage is under-studied, and the authors use a novel approach for oligodendrocyte precursor-specific inhibition of VAMP-mediated exocytosis using inducible expression of botulinum toxin with the PDGFRa-CreER transgenic mouse line (PD:ibot). Using a combination of in vitro culture systems and immunohistological analysis in vivo, the authors find that ibot expression in OPCs leads to reduced oligodendrogenesis and myelination, leading to a behavioral deficit in rotarod performance. Additional transcriptomic analysis in PD:ibot mice revealed that Ptgds, the gene encoding LPGDS, was significantly overexpressed in both mature oligodendrocytes and OPCs. Further pharmacological experiments with cultured OPCs showed that direct LPGDS inhibition led to an inhibition of oligodendrogenesis similar to that in PD:ibot mice. Together, this study reveals that VAMP-mediated exocytosis in OPCs is required for normal oligodendrogenesis and identifies LPGDS as a new chemical regulator of oligodendrocyte myelination. These findings are strengthened by careful characterization of the PD:ibot mouse line and effective use of culture systems and pharmacology to uncover a cellular mechanism. Quantification is performed at several levels of resolution using immunohistochemistry, electron microscopy, and protein/transcriptomic analyses, and control experiments were largely carefully considered.

      We thank the reviewer for recognizing the strength of our study.

      Despite these strengths, there are some points that need to be further addressed. The interpretation of autocrine/paracrine signaling relies on a critical culture experiment in which PD:ibot OPCs were cultured in the presence of PD:ibot or control OPC well inserts. However, these results had a marginal effect size, raising questions as to the extent to which VAMP inhibition specifically had effects through the blockade of exocytosis (resulting in an autocrine/paracrine signaling deficit) or inhibited oligodendrogenesis in a cell-intrinsic mechanism (e.g. VAMP-dependent trafficking of critical myelination components, such as PLP (Feldmann et al., 2011)).

      We agree with the reviewer that both cell-autonomous and cell non-autonomous effects may contribute to the defect associated with VAMP inhibition. We performed additional experiments to investigate the contribution of cell non-autonomous mechanisms. We took advantage of the fact that not all OPCs purified from PD:ibot mice expressed botulinum-GFP (efficiency ~65%, Figure 6B, page 24). The GFP- cells in PD:ibot OPC cultures did not express botulinum toxin and were competent in exocytosis. We compared the development of these GFP- control cells in cultures generated from PD:ibot mice with that of control cells in cultures generated from control mice. Interestingly, we found that the percentage and size of lamellar cells among control cells in PD:ibot cultures were smaller than among control cells in control cultures (Figure 6C, D, text page 25). Although both groups of cells were competent in exocytosis, the former were surrounded by exocytosis-deficient neighbor cells and the latter by exocytosis-competent neighbor cells. These differences in the growth capacity of control cells in the presence of different neighbor cells reveal cell non-autonomous contributions of botulinum-expressing cells to oligodendrocyte development.

      As described above under Essential Revisions 4), we performed additional experiments on the role of the secreted protein L-PGDS in oligodendrocyte development. We found that adding HPGD, a protein that inactivates PGD2, extracellularly to oligodendrocyte cultures inhibited their development (Figure 7F, G, page 33). Adding L-PGDS protein extracellularly to PD:ibot oligodendrocyte cultures rescued their developmental defect (Figure 9A, B, page 33). Moreover, overexpressing Ptgds in PD:ibot mice partially rescued the myelination defect (Figure 9E-H, page 36). These observations further strengthen our conclusion that cell non-autonomous mechanisms contribute to the effect of botulinum toxin on oligodendrocyte and myelin development.

      Nevertheless, these results do not rule out a cell-autonomous effect of botulinum on oligodendrocyte development; therefore, we describe the potential contribution of both cell-autonomous and cell non-autonomous mechanisms in the text.

      Additionally, the authors claim the reduced number of oligodendrocytes in PD:ibot mice in vivo is not due to oligodendrocyte apoptosis and provide evidence by cleaved caspase-3 immunostaining of the cerebral cortex. While statistically not significant, the data is highly variable.

      We thank the reviewer for pointing out the variability of the caspase-3 results. We performed a more thorough analysis of activated caspase-3 at multiple developmental stages. Again, we did not find any statistically significant difference in apoptosis between PD:ibot and control oligodendrocytes, OPCs, or cells of other lineages (Figure 3-figure supplement 1, text page 13).

      If true, this would suggest oligodendrocyte differentiation is inhibited, which would coincide with a reduction of OPC proliferation. A complementary experiment comparing the rates of OPC proliferation between control and PD:ibot mice in vivo would provide further clarity on how oligodendrocyte density is being reduced.

      We analyzed OPC proliferation in vivo by staining and quantifying Ki67+PDGFRa+ cells. Intriguingly, we found a modest increase in OPC proliferation in PD:ibot mice (Figure 3-figure supplement 3, text page 14).

      The relevance of these myelination deficits is assessed with a rotarod assay, however, the mice used for these experiments are several times older (2-5 months) than those used for all other histological quantification (P8-P30). The large variance in results could be due to age-related differences in myelination, and it is unclear whether the deficits at early timepoints show a linear progression with age.

      We thank the reviewer for the insightful comment. We have separately labeled data points from 2 months old and 5 months old mice (Figure 3Q-S, text page 17). With the data we have so far (n=20-27 per genotype), there isn’t a striking progression of phenotype with age. Future analysis at multiple time points may resolve any age-dependent changes in the phenotype.

      Reviewer #3 (Public Review):

      The authors pose an important question of whether oligodendrocyte lineage cells have an autocrine/paracrine signaling loop that contributes to their differentiation and myelination. While prior studies have demonstrated oligodendrocyte lineage cells have cell-intrinsic pathways that impact differentiation and myelination, there isn't a strong precedent for oligodendrocytes to promote their own differentiation via autocrine/paracrine mechanisms. The notion that oligodendrocyte lineage cells promote their own differentiation in an autocrine/paracrine manner is an intriguing one that adds a new layer to our understanding of how oligodendrocyte maturation is controlled. I anticipate this paper will prompt a new direction of future investigations to uncover the extent of oligodendrocyte autocrine/paracrine signaling.

      To test the possible role of oligodendrocyte-secreted molecules on oligodendrocyte development, Pan et al. utilized a mouse model where the release of a subset of secretory vesicles (specifically VAMP1/2/3-dependent vesicles) is blocked. Blocking this vesicular release prevented or delayed the differentiation of oligodendrocytes in vivo and in vitro. Further, the authors identified changes to the mRNA and secreted protein levels of prostaglandin D2 synthase (L-PGDS). Prior RNA sequencing and snRNA sequencing datasets of the oligodendrocyte lineage have identified Ptgds as a highly abundant mRNA transcript in oligodendrocyte lineage cells, particularly mature oligodendrocytes. Ptgds encodes L-PGDS, which has an unknown role in oligodendrocyte function. L-PGDS has been shown to regulate Schwann cell myelin formation in the peripheral nervous system, prompting the question of whether this protein acts similarly in the central nervous system. The paper has a clear set of well-rounded experiments, with a few remaining points that would strengthen the conclusions:

      We thank the reviewer for the positive comments on our study.

      One of the foundational conclusions of the study is that VAMP1/2/3-dependent exocytosis is critical to oligodendrocyte maturation, by using a PDGFRa-CreER mouse line combined with iBot mice that express botulinum toxin in Cre-expressing cells (abbreviated as PD:iBot). Prior work has demonstrated in vitro that oligodendrocyte morphological maturation, myelin gene expression and myelin protein transport can all be impacted by the loss of VAMPs, including VAMP3. This paper establishes the importance of these SNARE proteins in the oligodendrocyte lineage in vivo: the number of mature (CC1+) oligodendrocytes and myelin basic protein staining is substantially reduced in PD:iBot mice.

      1) The data in Figure 3M suggest that PD:iBot oligodendrocytes (GFP+) are lacking MBP+ sheaths and that any myelin formed is made by the smaller percentage of oligodendrocytes that do not express botulinum (GFP- cells). Furthermore, the efficiency of iBot expression (as evaluated by GFP+ cells) shows that 80% of OPCs and just 60% of oligodendrocyte lineage cells express GFP at P8, and supplementary data show that just 30% of oligodendrocyte lineage cells express GFP at P30. This raises the question of whether PD:iBot cells are unable to differentiate and die. While the authors show no change in caspase-dependent apoptosis in PD:iBot cells in vivo and in vitro, the data still suggest that blocking VAMP-dependent exocytosis itself slows or prevents the progression to a fully myelinating oligodendrocyte in vivo, rather than that putative autocrine/paracrine signals are required for OPC differentiation. Confirming whether botulinum-expressing cells also contribute to the population of surviving, differentiated oligodendrocytes in vivo would strengthen the conclusion that autocrine/paracrine secreted molecules contribute to oligodendrocyte maturation in vivo.

      We thank the reviewers for raising a key point in characterizing the consequence of botulinum toxin expression in oligodendrocyte-lineage cells. We analyzed the overlap between GFP+ botulinum-expressing cells and the population of differentiated oligodendrocytes (Olig2+PDGFRa-CC1+ cells) and found that botulinum-expressing cells can survive and become differentiated oligodendrocytes (Figure 3-figure supplement 2, text page 14). Additionally, we performed a more thorough analysis of activated caspase-3+ apoptotic cells than was included in first submission and did not detect statistically significant differences between PD:ibot and control mice (Figure 3-figure supplement 1, text page 13).

      2) The paper has complementary in vitro data to pinpoint a mechanism that results in hindered oligodendrocyte maturation. The authors conduct a well-designed set of in vitro co-culture experiments in Fig4 K-M that led them to conclude oligodendrocyte morphology is impacted by secreted molecules from other oligodendrocytes.

      2a) The key experiment is the transwell co-culture experiment with control and iBot cells, which suggests that blocking secretion itself has the predominant impact on cell morphology: by eye, both groups 3 and 4 show the largest reduction in lamellar area, and the difference between groups 3 and 4 is slight. At day 3 of culture (Fig 4E), the authors show the clearest effect as a reduction in cells with lamellar morphology. The quantification of lamellar cell area is less obvious than the % of cells with arborized vs lamellar shape, as seen in Figures E & F. I would recommend that the authors show representative images of these observations and quantification of morphologies for the transwell experiments. The impact of secreted factors may be clearer with this measure.

      We added representative images (Figure 6G). We quantified both the % and size of lamellar cells. The size of lamellar cells is significantly larger in group 4 than in group 3. Although the % of lamellar cells is numerically higher in group 4 than in group 3, the difference is not statistically significant. To further assess whether cell non-autonomous mechanisms contribute to the oligodendrocyte development defect in PD:ibot mice, we performed additional analysis in culture. We took advantage of the fact that not all OPCs purified from PD:ibot mice expressed botulinum-GFP (efficiency ~65%, Figure 6B). The GFP- cells in PD:ibot OPC cultures did not express botulinum toxin and were competent in exocytosis. We compared the development of these GFP- control cells in cultures generated from PD:ibot mice with that of control cells in cultures generated from control mice. Interestingly, we found that the percentage and size of lamellar cells among control cells in PD:ibot cultures were smaller than among control cells in control cultures (Figure 6C, D, text page 25). Although both groups of cells were competent in exocytosis, the former were surrounded by exocytosis-deficient neighbor cells and the latter by exocytosis-competent neighbor cells. These differences in the growth capacity of control cells in the presence of different neighbor cells reveal cell non-autonomous contributions of botulinum-expressing cells to oligodendrocyte development.

      2b) On a related note, the cell morphology data is dependent on MBP staining. The authors show that MBP protein is reduced in cells from iBot mice. Since MBP+ cell area/arborized or lamellar structure is being quantified, there remains the possibility that the cells could display a more complex morphology (lamellar) that may be missed by only staining for MBP. The authors use a CellMask dye to show cellular morphology, which is a great idea. The authors state that it labels the plasma membrane; however, the methods (and images) indicate that a cytoplasmic CellMask was used (cat.no. H32720 labels nuclei and cytoplasm, not membranes). These conclusions about cell morphology vs simply MBP expression would be strengthened by an alternative membrane label (e.g., a CellMask plasma membrane dye).

      We thank the reviewers for the insightful suggestion. We used the membrane version of CellMask and repeated the transwell co-culture experiment. The new results are consistent with the results based on MBP (Figure 6-figure supplement 1, text page 23). In addition, we used the membrane version of CellMask for all the new cell culture experiments (L-PGDS rescue, HPGD etc.)

      3) The authors sought to identify what secreted factors may be affected by blocking VAMP1/2/3-dependent exocytosis. Pan et al. opted for a strategy of examining transcriptional changes, reasoning that important genes may be upregulated to compensate for blocked secretion. While this is an indirect way to identify secreted candidates, the authors found a fortuitous result: Ptgds was substantially increased in PD:iBot oligodendrocyte lineage cells. To confirm that L-PGDS secretion is reduced from iBot cells, the authors show Western blots. By eye the change in L-PGDS is variable; however, the authors conduct several experiments with an inhibitor and a product of L-PGDS that nonetheless indicate that L-PGDS activity can contribute to the morphological maturation of oligodendrocytes. A caveat is that the AT-56 inhibitor reduces MBP+ cells, and the quantification of morphology depends on MBP staining (again, see my note in 2b about the CellMask dye). A report on differentiation (% MBP+ cells) may be a more accurate reflection of the result.

      We repeated the AT-56 experiment using the membrane version of CellMask and again found that AT-56 inhibits oligodendrocyte maturation (Figure 7-figure supplement 2, text page 33).

      The key, compelling experiment demonstrating the role of prostaglandin D2 is the authors' rescue experiment in Fig 4G.

      As described above under Essential Revisions 4), we performed additional rescue experiments on the role of L-PGDS in oligodendrocyte development. We found that adding L-PGDS protein extracellularly to PD:ibot oligodendrocyte cultures rescued their development defect (Figure 9A, B, page 34). Moreover, overexpressing Ptgds in PD:ibot mice partially rescued the myelination defect (Figure 9E-H, page 36).

      4) Although it's not a direct demonstration that L-PGDS secretion from oligodendrocytes is the key factor, the global L-PGDS knockout mice phenocopy many of the observations in the PD:iBot mice. This is a nice set of observations consistent with the authors' hypothesis that L-PGDS impacts oligodendrocyte maturation. Future work should pinpoint whether oligodendrocyte-derived L-PGDS is critical.

      We agree with the reviewer that pinpointing whether oligodendrocyte-derived L-PGDS promotes oligodendrocyte development and myelination is an interesting direction to pursue in future work. We are breeding L-PGDS conditional knockout mice to address this question and may report the results in a separate paper in the future.

      Minor points:

      1) The authors demonstrate that PD:iBot expresses botulinum and loses VAMP2 protein levels in oligodendrocyte lineage cells, but there is no demonstration of whether VAMP3 is expressed or similarly affected. Prior work has demonstrated in vitro that oligodendrocytes express both VAMP2 and VAMP3 (VAMP1 not detected). This would more clearly demonstrate which VAMP-mediated vesicular transport is blocked for the effects observed.

      We agree with the reviewer and examined VAMP3 levels with Western blot. We found diminished levels of VAMP3 in oligodendrocyte-lineage cells from PD:ibot mice (Figure 1 J, M, text page 10).

      2) It is satisfying to observe a behavioral effect in the PD:iBot mice. I would advise caution in interpreting any direct link between oligodendrocyte maturation and the rotarod behavioral difference at this time. Blocking secretion from PDGFRa-Cre-expressing cells may have many indirect effects (beyond myelination), both in the CNS and in other cell types that can express PDGFRa and VAMP1/2/3. I was pleased that the authors did not conclude any direct links at this time.

      We agree with the reviewer.

      Overall, the authors have a well-rounded manuscript with clearly described and thoughtful experiments. The data support the conclusion that VAMP-mediated exocytosis is critical for oligodendrocyte maturation. The evidence that reduced L-PGDS secretion from oligodendrocytes can explain the effects in the iBot mice is not as clear-cut, but their data do demonstrate that L-PGDS is an important molecule for the differentiation of oligodendrocytes. This work will open a new direction for future studies to investigate autocrine/paracrine signaling in oligodendrocyte maturation.

      We thank the reviewer for the positive comments on our manuscript. As detailed in Essential Revisions 4), we now provide additional evidence on the potential contribution of L-PGDS in the oligodendrocyte development defect in PD:ibot mice.

    1. Author Response

      Reviewer #3 (Public Review):

      Garratt et al. investigated whether transient exposure of young mice during their first two months of life to olfactory cues from conspecific adults would have long-lasting effects on their late-life health and lifespan. They find that the olfactory cues have sex-specific effects on lifespan: only the lifespan of young females is extended, and only by odours from adult females, with no effect for any other combination, neither young females with adult males nor young males with adults of either sex. Interestingly, their data also suggest that depletion of the G protein Gαo in the olfactory system plays no role in the lifespan extension, indicating that another, as yet unknown factor(s) might mediate this sex-specific effect on longevity in mice. While the conclusions of this study are well supported by the data, there are some issues with parts of the data analysis and presentation that need to be clarified and extended.

      1) The authors suggested that the G protein Gαo played no role in the lifespan extension observed when young females were transiently exposed to olfactory cues from adult females, as shown in Figure 1. However, it is not clear whether depletion of Gαo (Gαo mutant) itself affects lifespan compared with the wild type. It would be important to show the lifespan curves for wild-type and Gαo mutant mice individually alongside the pooled lifespan curves, as well as the corresponding data in a table, followed by a proper discussion.

      Data for each genotype are now shown individually.

      2) Regarding the functional tests, the authors showed that only a small fraction of experiments revealed differences between treatments, all of which are in Figure 2. However, it is necessary to also show the data with no differences, particularly since the conclusion of the study suggests that the underlying mechanisms are not yet clear. In my opinion, body weight, plasma glucose, and body temperature each deserve an individual figure showing all data points.

      These data are now shown.

      3) As the authors mentioned in the Introduction, the age at sexual maturity correlates positively with median lifespan across mouse strains (Yuan et al. 2012, Wang et al. 2018). Also, young female mice exposed to male odours during development show accelerated sexual maturation (Drickamer 1983), as do young males exposed to odours from the opposite sex (Vandenbergh 1971). It is, therefore, surprising that in this study, exposure of young females or young males to olfactory information from the opposite sex had no effect on lifespan. One way to resolve this disparity is to measure sexual maturity in the mice used in this study. The authors should, for instance, consider checking the records of when the first litter of pups was born in each treatment group (Shindyapina et al. 2022) or examining preputial separation and vaginal opening (Hoffmann 2018).

      The animals used in the lifespan experiment were not allowed to breed so as not to interfere with the lifespan assessment. Similarly, we did not check animals within the lifespan experiment for sexual maturity, as we wanted to minimize the handling of animals after weaning, and such checks require daily handling and/or vaginal swabbing.

      We conducted a preliminary experiment prior to the main lifespan experiment (in UM-Het3 mice) to test whether sexual maturity was modulated in the expected directions with the odour exposure protocol we planned to impose. This experiment showed that the odor manipulation we applied has the expected effects on sexual maturity. We have now outlined this experiment and its results in the methods section of the paper to justify the odor treatment protocol.

  3. Nov 2022
    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, the authors characterize the lattice of kinesin-decorated microtubules reconstituted in vitro from porcine tubulin and in Xenopus egg extract using cryo-electron tomography and subtomogram averaging. Using segmented subtomogram averaging (SSTA), they examined transitions in the lattice of individual microtubules. The authors found that the lattice is not always uniform but contains transitions between different lattice types. The finding is quite interesting and will probably motivate further investigation of such lattice transitions in microtubules inside cells.

      The manuscript is easy to read and well-organized. The supporting data is very well prepared.

      Overall, the authors' conclusions seem justified. However, the manuscript appears to lack data. Only 4 tomograms were collected for the porcine microtubules. Increasing the amount of data would make the manuscript statistically convincing.

      One tomogram can contain from one to several tens of microtubules. For example, 64 microtubules were analyzed in the Xenopus-DMSO dataset obtained from 5 tomograms, versus 24 microtubules for the GTP dataset obtained from 4 tomograms (see Table 1). Hence, the number of tomograms cannot be considered a valid criterion for assessing the statistical relevance of our work. Tomograms are taken randomly on the EM-grid sample, solely based on ice quality and the coverage of microtubules in the holes as determined at low magnification before tomographic acquisition. No prior knowledge of the structure and lattice-type organization of the microtubules can be obtained before acquisition. It appears to us that a more pertinent criterion is the number of events that we characterized, specifically lattice-type transitions along individual microtubules. In the dataset mentioned by the referee (see Figure 2-figure supplement 3-4 and Table I), 24 microtubules were analyzed and further divided into 195 segments, providing an equivalent number of individual 3D reconstructions. For each 3D reconstruction, almost all lateral interactions could be characterized in terms of lattice type, i.e., 2091 of the B-type, 460 of the A-type, and 112 not determined (essentially at transition regions). Most importantly, we document in this specific dataset 119 transitions in lattice type, which we think is sufficient to characterize such molecular events and provide solid statistics for this dataset. Adding the GMPCPP and Xenopus data, we end up with 938 individual 3D reconstructions (not including the full-length microtubule volumes), 12 463 lateral interactions analyzed (A-, B-, or ND-type), and the observation of 172 lattice-type transitions. Therefore, we respectfully disagree with the referee's statement that our work lacks data.

      To highlight the quantity of data used in our work, we have modified the following sentences: L124-131: 'Analysis of 24 microtubules taken on 4 tomograms, representing 195 segments of ~160 nm length (i.e., 2664 lateral interactions), allowed us to characterize 119 lattice type transitions with an average frequency of 3.69 µm⁻¹ (Table 1), but with a high heterogeneity' L160-164: 'Analysis of 31 GMPCPP-microtubules taken on 6 tomograms, representing 338 segments of ~150 nm in length (i.e., 3236 lateral interactions), and using the same strategy as in the presence of GTP (Figure 5—figure supplement 1-2) revealed a transition frequency of 1.25 µm⁻¹ (Table 1), i.e., ~3 fold lower than microtubules assembled in the presence of GTP.' L200-203: 'A total of 64 microtubules taken on 5 tomograms were analyzed in the Xenopus-DMSO dataset (i.e., 419 segments from which we characterized 5446 lateral interactions), and 15 microtubules taken on one tomogram for the Xenopus Ran-dataset (i.e., 86 segments from which we characterized 1118 lateral interactions), (Table 1).'

      In addition, averaging the same type of transition from different subtomograms, in which the missing wedge is randomly oriented, would allow a better average of the transition without missing-wedge artifacts.

      In this work, we did not aim at averaging transitions. Transitions in lattice type are highly heterogeneous in nature, and we wonder what additional information an averaging strategy would have provided. Conversely, each transition is a unique event that we characterized to obtain useful statistics, and the data missing at high tilt angles, inherent to electron tomography, were not an obstacle to fulfilling this task.

      Another thing that I found lacking is the mapping of the transition region/alignment in the raw data.

      In Figure 4, we clearly show the correspondence between the segmented sub-tomogram averages (SSTA) and the raw filtered images at the transition region. This is also the case in Figure 5 where the SSTA (Figure 5A) are compared with the raw tomogram (Figure 5B), and where we clearly visualize the holes that result from the transitions in lattice types.

      However, it is not easy for me or the reader to understand how the segments are oriented relative to one another, apart from the simplified seam diagrams in the figures, or how the seam is oriented relative to the missing wedge in the average. With these improvements, I think the conclusion of the manuscript will be better justified.

      The segmentation process is explained in Figure 2-figure supplement 2 and in the Materials and Methods section, which shows that each segment is linearly related to the next. Small rotations can happen between individual segments, and it is important to check that the same protofilaments are followed during the initial modeling (see the online tutorial referenced in the manuscript for full-length microtubules). The segment models are derived from that of the full-length microtubule, as explained in the Materials and Methods section, using a new routine (splitIntoNsegments) implemented into the PEET program. In addition, a detailed protocol describing our SSTA strategy will be submitted following publication of our manuscript.

      Reviewer #2 (Public Review):

      Differences in protofilament and subunit helical-start numbers for in vitro polymerized and cellular microtubules have previously been well characterized. In this work, Guyomar et al. analyze the fine organization of tubulin dimers within the microtubule lattice using cryo-electron tomography and subtomogram averaging. Microtubules were assembled in vitro or within Xenopus egg cytoplasmic extracts and plunge frozen after addition of a kinesin motor domain to mark the position of tubulin dimers. By generating subtomogram averages of consecutive sections of each microtubule and manually annotating their lattice geometry, the authors quantified changes in lattice arrangement in individual microtubules. They found in vitro polymerized microtubules often contained multiple seams and lattice-type changes. In contrast, microtubules polymerized in the cytoplasmic extract more frequently contained a single seam and fewer lattice-type transitions.

      Overall, their segmented subtomogram averaging approach is appropriately used to identify regions of lattice-type transition and quantify their abundance. This study provides new data on how often small holes in the lattice occur and suggests that regulators of microtubule growth in cells also control lateral tubulin interactions. However, not all of the claims are well supported by their data and the presentation of their main conclusions could be improved.

      1 - We have corrected imprecise claims and conclusions where necessary. In particular, we now discuss separately the Xenopus-DMSO and the Xenopus-Ran egg extract samples, and have modified our conclusions accordingly. We also deposited in EMPIAR all tomograms and PEET models needed to reproduce the 938 segmented sub-tomogram averages analyzed in this study (see new Supplementary file 2).

      Reviewer #3 (Public Review):

      Protofilament number changes have been observed in in vitro assembled microtubules. This study by Guyomar and colleagues uses cryo-ET and subtomogram averaging to investigate the structural plasticity of microtubules assembled in vitro from purified porcine brain tubulin at high concentrations and from Xenopus egg extracts in which polymerization was initiated either by addition of DMSO or by adding a constitutively active Ran. They show that the microtubule lattice is plastic with frequent protofilament changes and contains multiple seams. A model is proposed for microtubule polymerization whereby these lattice discontinuities/defects are introduced due to the addition of tubulin dimers through lateral contacts between alpha and beta tubulin, thus creating gaps in the lattice and shifting the seam. The study clearly shows quantitatively the lattice changes in two separate conditions of assembling microtubules. The frequency of defects they observe under their microtubule assembly conditions is much higher than what has been observed in vivo in intact cells. Their observations are clear and supported by the data, but it is not at all clear how generalizable they are, or whether the defect frequencies they see result from the assembly conditions, the dilutions used, or the presence of the kinesin with which the lattice is decorated. The study definitely has implications for mechanistic studies of microtubules in vitro and raises the question of how these defects vary for protocols from different labs and between different tubulin preparations.

      1 - High tubulin concentration: It has been documented by many laboratories since the discovery of tubulin and the characterization of its assembly properties that a sufficient concentration of free tubulin is necessary to self-assemble microtubules. This is called the critical concentration for self-assembly (the CC, i.e., the critical concentration to overcome the nucleation barrier), and has been reported to be in the range 14-25 µM in the presence of GTP, depending on the laboratory. For example, in the seminal work of Mitchison and Kirschner the CC was estimated at 14 µM (Fig. 5 of ref. (Mitchison & Kirschner, 1984b)) and self-assembly was induced at concentrations in the range 32-59 µM (Mitchison & Kirschner, 1984a). Our own estimate of the CC for porcine brain tubulin was 21 µM (Fig 2C of (Weis et al., 2010)), and we routinely use a tubulin concentration slightly above the CC when we aim at robust microtubule self-assembly. Hence, we argue that 40 µM, which is ~twice the CC, cannot be considered a "very high" tubulin concentration to induce microtubule self-assembly.

      2 - Protofilament number and lattice-type transitions in cells: While microtubules with protofilament numbers different than 13 have been observed in different cell types and species (reviewed in (Chaaban & Brouhard, 2017)), we are aware of only one recent study where changes in protofilament numbers along individual microtubules have been reported in cells (Foster et al., 2021), but with no statistics concerning their frequencies. Hence, we cannot compare changes in protofilament number frequencies in Xenopus egg extracts with those that occur in intact cells. Concerning lattice-type transitions, we are not aware of any previous study that documented such features, whether in vitro or in cells.

      3 - Generalization of our results, source of tubulin and protocols: Multi-seams in microtubules assembled in vitro have been reported by several groups in the past (see our Introduction, L49-62), starting from (Kikkawa et al., 1994), the Milligan group (Dias & Milligan, 1999; Sosa et al., 1997), and more recently by the Sindelar group (Debs et al., 2020). In Kikkawa et al. (1994), the authors purified tubulin from porcine brain by three cycles of assembly/disassembly followed by phosphocellulose chromatography. Assembly was carried out at 24 µM in the presence of Taxol. In Sosa and Milligan (1996-1997), the authors used a commercial source (Cytoskeleton) and assembled the microtubules at 30 µM in the presence of Taxol. In Debs et al. (2020), the authors used tubulin purified from porcine brain according to (Castoldi & Popov, 2003), as we did, to assemble GMPCPP microtubules, and bovine brain tubulin (Cytoskeleton) to assemble Taxol-stabilized microtubules. Notably, they used an initial tubulin concentration of 100 µM to initiate microtubule polymerization and then added Taxol to continue the reaction.

      We add to these previous studies that the number of seams is not a fixed property of a given microtubule: both the number and location of seams can vary within individual microtubules. The reason why this was not observed before is that the analytical tools used in those previous studies were not suited to reveal this structural heterogeneity within individual microtubules. By contrast, the SSTA approach was specifically developed towards this aim. Even in the recent work by Debs et al. (2020), which provides the most comprehensive characterization of multi-seams in microtubules assembled in vitro and which obtained a seam distribution very similar to ours (compare their Figure 3C with our new Figure 10C for GDP microtubules, dark blue bars), their protofilament-based approach could not reveal changes in the number and location of seams within individual microtubules. Yet, they probably could have done so had they asked whether segments with different seam numbers had been extracted from the same microtubules.

      Here, we designed a specific approach to tackle the structural heterogeneity of individual lattices that permitted this discovery. Not only do we confirm results obtained by others, but we also propose a molecular mechanism that explains how multi-seams form in microtubules assembled in vitro and how they change in location in a cytoplasmic environment. By doing so, we propose a novel molecular event - formation of unique lateral interactions without longitudinal ones - that was not envisioned before, and which, in our opinion, must be incorporated into further modelling studies concerning microtubule nucleation and assembly, including the mechanism of dynamic instability (see the Ideas and speculation section).

      4 - Dilution: A 50X dilution was used only for Xenopus egg cytoplasmic extracts to decrease their density on the EM grid just before freezing. These conditions were established by cryo-fluorescence microscopy to ensure an adequate density of microtubules on the EM grid (Figure 7 and Figure 2—figure supplement 1D). Of note, the microtubules analyzed by SSTA were assembled in extracts that were not supplemented with fluorescent tubulin. While we could imagine that dilution may induce the removal of dimers from the microtubule lattice, we cannot foresee how this could change the register between tubulin subunits within the microtubule lattice.

      5 - Kinesin decoration: Like many other laboratories (see the Table in Figure 3 of (Manka & Moores, 2018)), we use the non-processive motor domain of kinesin 1 to decorate microtubules, with the aim to differentiate the α- and β-tubulin monomers within the microtubule lattice. In particular, it has been shown that lattice parameters such as the protofilament skew and lattice spacing are unmodified when kinesin motor domains are added to GMPCPP- or GDP-microtubules (Zhang et al., 2015, 2018). In addition, we cannot envisage how this non-processive motor added to preformed microtubules could change the registry of the αβ-tubulin heterodimers within the microtubule lattice.

    1. Author Response

      Reviewer #1 (Public Review):

      1) It is important to emphasize that the osteoporotic phenotypes were only demonstrated in male, but not in female, mice. The observed phenotypes were not hormone-dependent, as no significant differences in the examined bone parameters were observed between wild-type and Prdx5 KO female mice in an ovariectomy-induced osteoporosis model. However, women over 50 have a four times higher rate of osteoporosis compared with men, and the role of testosterone in the development of osteoporosis in Prdx5 KO mice should be investigated. It is known that the risk of osteoporosis is increased in men with low testosterone levels.

      Thanks for your comments regarding osteoporosis phenotypes in Prdx5 KO males and their relation with testosterone levels. Based on your suggestion, we re-examined testosterone levels in the serum of male mice and tested the expression levels of the androgen receptor (AR) in the differentiated osteoblasts and osteoclasts of the mice. We have updated the data in Figure 3-figure supplement 2 and included the revised information in the Results (Pages 13-14) and Discussion (Page 34) sections.

      2) It is misleading for the authors to state throughout the manuscript that osteoporotic phenotypes are observed in Prdx5 KO mice, when they are only observed in male mice.

      We apologize for this oversight. We have modified the text and indicated that all osteoporotic phenotypes were observed in Prdx5 KO male mice.

      Reviewer #2 (Public Review):

      1) While the abstract emphasizes transcriptomic analysis and mass spectrometry, extensive imaging techniques have also been used and should be highlighted to give an overview of results from the performed techniques.

      In addition, make it clear that it is proteomics-based mass spectrometry, since I was only able to confirm that after seeing Figure 5.

      Thanks for your helpful suggestions. We have modified the Abstract based on your suggestions.

      2) Lines 46-53: I would add more details on what balanced bone mass looks like on average, how much is too much, when we should be concerned about bone mass, and whether some amount of stress benefits bone mass.

      Thank you for the suggestion. We have modified the Introduction. We wanted to explain that for bone as a supporting organ, general mechanical stress is required for its remodeling, although we agree that this information is not essential to our study and may confuse readers.

    1. Author Response

      Reviewer #3 (Public Review):

      Results of this manuscript provide a new link between oxygen sensing and cholesterol synthesis. In previous studies, this group showed that the cholesterol synthetic enzyme squalene monooxygenase (SM) is subjected to partial proteasomal degradation, which leads to the production of a truncated, constitutively active enzyme. In this study, the authors provide evidence for the physiological significance of SM truncation. In a series of experiments, the authors show that subjecting cells to hypoxia (oxygen deprivation) induces truncation of SM. The synthesis of cholesterol requires 11 molecules of oxygen and SM is the first oxygen-dependent enzyme in the cholesterol-committed branch of the pathway. Evidence is presented that hypoxia causes squalene, the substrate of SM, to accumulate, which results in the enzyme's truncation. In addition, hypoxia stabilizes MARCHF6, the E3 ligase required for sterol-dependent ubiquitination and degradation of SM. Finally, the authors provide an experiment showing that truncation of SM correlates with hypoxia in endometrial cancer tissues.

      Overall, the data presented in this manuscript are compelling for the most part. Hypoxia-induced truncation of SM and MARCHF6 is very clear according to the presented results. The specificity of SM-induced truncation is strong; both direct addition and inhibitor studies are presented. The major strength of this manuscript is that it provides the physiological relevance for the authors' previous finding that squalene accumulation leads to truncation of SM. However, there are a few issues that should be addressed to improve the interpretation of the data presented.

      We thank the reviewer for their useful comments.

      The manner in which quantified immunoblots are presented is very confusing and difficult to interpret. This is evident in experiments in several figures. For example, it is difficult to determine the role of ubiquitination (Figure 2D) and MARCHF6 (Figure 2E) in the generation of truncated SM. The authors should present quantified data of all lanes of the immunoblots to reduce confusion.

      The revised manuscript includes quantification of protein levels for all immunoblot lanes, including in Figure 2D and Figure 2E (now Figure 3A). It also contains updates to the text, figure legends, and axis labels to improve clarity about data normalization. For more information, please refer to our response to Essential Revisions comment #1.

      The other important finding of this manuscript is that hypoxia stabilizes MARCHF6. This is supported by the results of Fig. 3A; however, the result of Figure 3B is not clear. A new band appears upon inhibition of VCP and MG-132 seems to reduce protein expression. These results could be removed from the manuscript without impacting the conclusions drawn.

      As suggested, the revised manuscript contains only the initial observation that hypoxia stabilizes MARCHF6. Other experiments investigating the mechanism have been removed. For more information, please refer to our response to Essential Revisions comment #2.

      Finally, the results shown in Figure 5 showing that truncation of SM correlates with hypoxia in endometrial cancer tissues are a little preliminary. Multiple bands are detected in SM immunoblots, which interferes with interpretation. This experiment could be removed and speculated upon in the discussion.

      As suggested, this experiment is removed from the revised manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors established a "behavioral transcriptomics" platform in which they cultured mouse primary tissue explants on an apparatus, imaged the cells over time, and analyzed cells with different physiological states by scRNA-seq. They showed evidence that the system recapitulated physiological features of airway cells, including the response to chemically induced damage. They further utilized the system to isolate cells with different cellular features and analyzed their gene expression through scRNA-seq. The study demonstrates the interesting establishment and application of an in vitro system that mimics in vivo conditions.

      However, several major concerns need to be resolved.

      First, whereas the overall study seems to focus on the establishment of an airway epithelial explant apparatus and its application, the take-home messages delivered by the authors seem to emphasize the transcriptome analysis. The authors introduced "spatial transcriptomics" and "behavioral transcriptomics" in the abstract, but it is hard to see how the study achieves spatial transcriptomics. This causes unnecessary confusion. Second, probably related to the first point, it is hard to identify the novelty of the study. Third, and probably most important, the final part of the manuscript analyzes the cells by Smart-seq, but the analysis was performed only on SO2-injured animals and lacked experiments on wild-type mice. If the authors aimed to prove the feasibility of the technique rather than to resolve a physiological mechanism here, then I would recommend explaining why a wild-type experiment was not performed.

      The method described in the manuscript consists of two components: a novel tissue imaging platform, and characterization of a cellular behavior. Both steps can be generalized to different tissue contexts and different cellular behaviors, respectively. We have revised the title and abstract to specify the scope of this study and have also revised the text accordingly.

      Live imaging allows us to observe cell behaviors in intact tissues but does not provide information on cell type. By profiling cells that are observed by live imaging to share a behavior at single-cell resolution rather than bulk, we can separate out sources of transcriptional variation, like cell type identity, in order to identify the transcriptional signatures that reflect cell behaviors.

      Single-cell sequencing (via Smart-seq) has been previously performed in wild-type mouse trachea (Montoro et al., 2018), and identified underlying cellular heterogeneity. However, the steady state tracheal epithelium is largely quiescent, characterized by slow turnover and a lack of visible cell motility. We performed daily imaging of trachea explants from uninjured mice over 4 days and did not observe any significant displacement of epithelial cells. Furthermore, we also imaged an uninjured explanted tracheal epithelium every 40 minutes for over 19 hours with no significant directional movement (new Movie 3). We added the following text to the manuscript: “Imaging of trachea explant controls from uninjured mice over 19 hours revealed no cellular displacement in the airway epithelium (Movie 3).”

      In contrast, regeneration activates cell motility followed by cell proliferation. Therefore, we chose tissue regeneration as the more suitable biological context for this study to examine cellular dynamics. We leveraged the gene signatures derived from the previous wild-type study (Montoro et al., 2018) to identify different cell types and make like-for-like comparisons. We used an independent regeneration dataset in the same tissue but with a different injury model (Plasschaert et al., 2018) to test whether the molecular signatures derived in our study that differentiate moving and non-moving cells are generalizable to other contexts.

      Reviewer #2 (Public Review):

      Kwok et al. devise a method that uses a transgenic mouse line to link cell behaviour in intact living tissue with subsequent dissociation into distinct groups for single-cell sequencing. Specifically, they set up a mouse airway culture system in which it is possible to maintain live cells for multiple days and then preserve the same tissue. The analysed tissue section can be fixed and known cell types identified via classical staining protocols. In this system they imaged a number of tissue phenotypes such as ciliary beating, mucociliary clearance and airway regeneration. With respect to airway regeneration, they observe cellular heterogeneity between cells with the capacity to move and so-called non-movers, which the authors were able to track quantitatively. To make the link with single-cell sequencing, they use the Kaede transgenic mouse line, which carries a green fluorescent reporter that can be converted to red fluorescence by illuminating a defined tissue region, in this case regions enriched for movers or non-movers. After dissociation of the tissue, cells were FACS-sorted using the reporter protein. Subsequent single-cell RNA-seq revealed distinct gene signatures associated with the mover versus the non-mover phenotype. These signatures could also be detected in previously published data sets.

      The conclusions of the paper are supported by the data that are presented, but the comparison to existing mouse injury data could be improved. A weakness of the paper is the implication that the technique can be used for any of the phenotypes that they have examined. However, to be assessed by this method, there needs to be a reasonably large number of cells that show similar behaviour in a region that can be photoconverted. If it is indeed possible to perform the photoconversion at the single-cell level, the authors should demonstrate that such resolution is possible, or otherwise clearly state this limitation of the technique they have developed.

      We recognize that the approach in this study does not involve photoconversion at single-cell resolution. While single-cell photoconversion and subsequent intermittent live imaging has been demonstrated in other systems such as zebrafish (Green and Smith, 2018) and mouse skin (Park et al., 2017), the throughput of doing downstream single-cell analysis would be limited, especially in a cell type-specific manner. Having observed a relatively homogeneous behavior of cells within a small region (~200 μm diameter, Movie 1 and Movie 2) of the airway epithelium, we photoconverted a small area with several hundred cells. Subsequent single cell sequencing allowed us to compare differences in gene expression between basal cells of slow/non-moving regions to basal cells of fast/moving regions.

      Reviewer #3 (Public Review):

      In this manuscript, the authors identify a pressing need to couple visualized in situ cell behaviour with deep molecular profiling of the visualized cells, aiming to move beyond inferences made from time-lapse tissue sampling approaches or the analysis of transcriptional kinetics to identify the molecular pathways that drive cellular behaviour in situ. The authors identify live cell imaging combined with deep molecular profiling of the imaged cells as one possible solution. To this end, the authors establish a novel platform for live cell imaging of tracheal epithelial cells using explants of mouse trachea that allows long-term visualization of cell behaviour, and try to couple live-cell imaging to transcriptional cell states.

      Combining single-cell RNA-seq analyses with live cell imaging offers the unique opportunity to link transcriptional and anatomic, morphological or movement phenotypes of individual cells. To be able to do this in intact tissues at baseline and in response to injury would allow a far more detailed and integral analysis of cellular behaviour in their physiological context. As such, the approach of the authors is interesting and clearly focused on achieving this goal. The only data that can support a claim of successfully achieving this ambitious goal are presented in figure 3, where an advanced mouse model (the Kaede-Green mouse) is used that allows labelling individual cells by photoconversion, followed by isolation of individual cells by flow cytometry and plate-based scRNA-seq analysis of sorted cells. By taking this approach, the authors are able to identify transcriptional differences at the group level between tracheal epithelial cell subsets that differ in their movement after injury.

      While this in itself is a remarkable accomplishment, and an interesting observation, the relationship between the 'behaviour' of the cells observed with live cell imaging (the movement after injury) versus the transcriptional phenotype remains rather elusive. One explanation could be that active movement of cells depends on a specific transcriptional program, that is lacking from the non-moving cells. Another explanation could be that the tracheal epithelial cells are inherently heterogeneous, and one subset has the capacity to move whereas others do not, and the transcriptional profile merely identifies these heterogeneous populations. The observation that non-mover cell populations contain both basal and club cells, whereas mover regions only have basal cells seems to support this notion to some extent. However, the authors then claim to use basal-cell derived signatures (excluding the club cells) from mover and non-mover regions and compare this to literature data from another injury model to show that these signatures also identify distinct subsets in a mouse model of polidocanol-induced injury. How the distinction basal vs club cells in the non-mover regions is made remains unclear, and would seem challenging from the number of cells analyzed (as presented in figure 3).

      The identification of two behavioural phenotypes of basal cells (mover vs non-mover) in this manuscript is based on group-level phenotypes: the cells belong to a region of movers or a region of non-movers. This is relevant for figures 2 (including supplemental) and 3. In figure 2 supplemental 2C, it seems evident that within one region (or focussing only on all moving regions?), the behaviour of all cells within that region/selection is quite uniform: the variation is really very limited, and all cells seem to speed up and slow down in a highly coordinated fashion within the selected regions shown. At the same time, in figure 2D, the distribution of regions across speed categories at 26-36 hours pi (the peak of the movement in suppl 2C) seems almost bimodal, with regions belonging either to non-mover (range 0.5-2.5 μm/hr) or mover (range 3.0-7.0 μm/hr) phenotypes. However, all regions display increased movement at 16 h pi compared to the pre-injury movements (Figure 2C), indicating that all cells are induced to move to some extent.

      My main concern with this analysis is that the behavioural phenotype of the epithelial cells is assumed to be homogeneous within each region, allowing a contrast to be made in figure 3 for the transcriptional phenotypes on the basis of moving phenotypes rather than on the basis of the main variation within the dataset.

      For instance, from the t-SNE plot (3B) - for what it's worth of course - and the heatmap (3C) there seems to be at least one non-mover cell that transcriptionally has a higher resemblance to the mover cells than to the other non-mover cells. Of course that can just be the variability present in the dataset, but it could also indicate that non-mover regions are not completely homogeneous, and even more so, that the moving vs non-moving associated transcriptional phenotype is a gradual transition rather than 2 clearly separate sub-phenotypes.

      All-in-all, this manuscript describes an interesting technical advance and shows some of the applications thereof. However, the approach also has its limitations: The requirement to mark cells with specific behavioural features for follow-up transcriptomic analysis (such as by photoconversion) necessitates the division of the epithelial cells into major categories on the basis of certain cellular phenotypes (such as movement) that can be visualized by live cell imaging. This limits the analysis opportunities to group-based contrasts in cellular behaviour as also used here by the authors.

      Also, the use of explanted tissue is of course less ideal than in vivo imaging, but most likely the only technically feasible approach at this moment. At the same time, the capacity to combine image-based features with single-cell transcriptomic data is an important advance, even when initially only possible in explanted tissue from mouse models carrying all kinds of fluorescent reporters. To strengthen the manuscript, it would therefore be important to discuss the limitations of the approach, as well as to provide a more comprehensive overview of the possible applications that the authors foresee.

      We thank the reviewer for the feedback. Our data demonstrate that the movement behavior is an injury-induced phenotype. At 24 hours post-injury (hpi), the “mover” transcriptional program is transiently enriched, while the “non-mover” transcriptional program is transiently decreased, consistent with a cell state that is induced by injury (see Figure 4A, 24-hpi).

      SO2 removes nearly all the luminal cells (Rock et al., 2009) so we removed the club cells to compare injury response in basal cells. Distinguishing basal vs club cells is done by hierarchical clustering and comparison to established cell type signatures (Montoro et al., 2018). We apologize that the initial presentation did not make this clear. In the revised manuscript, we have provided an additional figure supplement demonstrating the hierarchical clustering (Figure 3 - figure supplement 1A), and the disjoint expression of canonical markers Krt5 (basal) and Scgb1a1 (club), which enabled us to assign unambiguous cell-type identities to discovered clusters (Figure 3 - figure supplement 1B).

      We agree with the reviewer that all cells, including cells that we classified as “mover” and “non-mover”, are induced to move compared to pre-injury, as suggested by Figure 2c. However, “mover” and “non-mover” cells differ dramatically in the amplitude and collective directionality of movement. We investigated the movement phenotypes in detail, including high-resolution imaging at shorter time intervals (10 min). We found that the slow “non-movers” had a large circular directionality variance (akin to oscillations), whereas the rapid “movers” moved directionally across the field of view. We quantified this with particle image velocimetry in Figure 2 – figure supplement 3C-D, and we revised the text to provide additional details about this result.
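      As an illustration of what "circular directionality variance" measures, the sketch below computes the standard circular variance (1 minus the mean resultant length) from hypothetical particle image velocimetry vectors. It assumes that PIV yields per-vector velocity components (u, v); the toy "mover" and "non-mover" fields are simulated and do not reproduce the authors' data or exact pipeline.

```python
import numpy as np


def directions_from_velocity(u, v):
    """Convert PIV velocity components (u, v) into direction angles in radians."""
    return np.arctan2(np.asarray(v, dtype=float), np.asarray(u, dtype=float))


def circular_variance(angles_rad):
    """Circular variance 1 - R, where R is the mean resultant length.

    0 means every vector points the same way (collective, directional motion);
    values near 1 mean directions are spread around the circle (e.g. oscillation).
    """
    angles = np.asarray(angles_rad, dtype=float)
    # Average the unit vectors pointing along each angle
    c = np.cos(angles).mean()
    s = np.sin(angles).mean()
    return 1.0 - np.hypot(c, s)


rng = np.random.default_rng(0)
# Hypothetical "mover" field: vectors mostly pointing along +x
mover = directions_from_velocity(1.0 + 0.1 * rng.standard_normal(500),
                                 0.1 * rng.standard_normal(500))
# Hypothetical "non-mover" field: directions spread uniformly around the circle
non_mover = rng.uniform(-np.pi, np.pi, 500)

print(circular_variance(mover), circular_variance(non_mover))
```

      Because the measure depends only on direction, it separates coordinated movement from oscillation independently of speed, which complements the speed categories in Figure 2D.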

      The reviewer also raises concern about whether the movement is homogeneous enough to account for the variation in the datasets. We used our imaging data to determine the time points in which the mover and non-mover phenotypes varied the most (around 40 hrs post injury) between different regions (Figure 2 - figure supplement 2A, C) but we have also demonstrated that the movement within each region is indeed relatively homogeneous (~200 μm diameter, Movie 1 and Movie 2).

      We acknowledge that the presented data did not eliminate the possibility of another main source of variation within the dataset. We have now performed PCA on the dataset, which confirmed that while the first principal component (PC) is associated with a solitary pulmonary neuroendocrine cell, the second PC is strongly associated with the difference between moving and non-moving cells (p = 0.003, Wald test). When analyzing only the basal cells, we find that PC-1 provides a very clean separation and overlaps perfectly with the moving vs non-moving distinction (p < 2 × 10⁻¹⁶, Wald test, Figure 3 - figure supplemental 2a). Taken together, this additional analysis confirms that our focus on this behavioral phenotype reflects the main variation within the dataset.
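      A minimal sketch of this kind of analysis, under stated assumptions: PCA is computed by SVD of the centered cell-by-gene matrix, and the Wald test is taken to be a test of the slope from an ordinary least-squares regression of a PC score on a 0/1 mover indicator, using a normal approximation for the p-value. The exact model behind the reported tests is not specified here, and the expression matrix, group sizes and "mover-associated genes" are all simulated.

```python
import math

import numpy as np


def pca_scores(X, n_components=2):
    """Project samples (rows) onto the top principal components via SVD of centered data."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def wald_test_group(score, group):
    """OLS of a PC score on a 0/1 group indicator; Wald test of the slope.

    Returns (slope, z, two-sided p) using a normal approximation.
    """
    y = np.asarray(score, dtype=float)
    g = np.asarray(group, dtype=float)
    design = np.column_stack([np.ones_like(g), g])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    z = beta[1] / math.sqrt(cov[1, 1])
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta[1], z, p


rng = np.random.default_rng(1)
n_cells, n_genes = 60, 200                   # hypothetical sizes
group = np.repeat([0, 1], n_cells // 2)      # 0 = non-mover, 1 = mover (toy labels)
X = rng.standard_normal((n_cells, n_genes))
X[group == 1, :20] += 2.0                    # hypothetical mover-associated genes
pcs = pca_scores(X, n_components=2)
slope, z, p = wald_test_group(pcs[:, 0], group)
print(f"z = {z:.1f}, p = {p:.2e}")
```

      The sign of a PC is arbitrary, so only the magnitude of z is meaningful; the same test can be repeated on each PC to see which one tracks the behavioral labels.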

      We appreciate the reviewer’s nuanced question about the single outlier cell. While we do observe a clearly distinct transcriptional phenotype, there is, as the reviewer points out, a very small degree of overlap between the two cell type clusters visible on the t-SNE plot in Figure 3B. Given that the physical process of movement is a matter of degree, it is possible that this particular cell is simply not moving as much, and thus activates movement-related transcriptional programs to a lower degree. To analyze this question further, we assessed the separability of these groups by training a machine learning (k-nearest neighbor) classifier to distinguish these clusters (new Figure 3 - figure supplement 2b). We found that the groups could be distinguished with a high accuracy of 98.7% (95% CI: 92.7-99.9) using 5 or more of the signature genes that we defined in Figure 3C. Based on this additional analysis, we continue to conclude that while the groups have a very small degree of overlap, the moving and non-moving phenotypes are strongly separable.
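      The separability check can be sketched as a leave-one-out k-nearest-neighbor classifier with a binomial confidence interval on its accuracy. This is a hedged illustration only: it assumes Euclidean distance, majority voting with k = 5, and the Wilson score interval as one common choice of interval (the method behind the reported CI is not stated here). The five "signature genes" and their expression values are simulated.

```python
import math

import numpy as np


def loo_knn_accuracy(X, y, k=5):
    """Leave-one-out accuracy of a k-nearest-neighbor classifier (Euclidean, majority vote)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a cell may not vote for itself
    correct = 0
    for i in range(len(y)):
        neighbors = np.argsort(d[i])[:k]
        labels, counts = np.unique(y[neighbors], return_counts=True)
        correct += int(labels[np.argmax(counts)] == y[i])
    return correct, len(y)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


rng = np.random.default_rng(2)
y = np.repeat([0, 1], 40)                         # toy non-mover / mover labels
X = rng.standard_normal((80, 5)) + 3.0 * y[:, None]  # 5 hypothetical signature genes
correct, n = loo_knn_accuracy(X, y, k=5)
lo, hi = wilson_interval(correct, n)
print(f"accuracy {correct / n:.3f}, 95% CI {lo:.3f}-{hi:.3f}")
```

      An odd k avoids tied votes between two classes, and leave-one-out evaluation keeps every cell out of its own training set, which matters when the dataset is small.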

      We acknowledge the limitations of this approach to groups of cells (see response to Reviewer 1) and both the limitations and advantages of using a tissue rather than cells, and we added these points to the discussion section.

    1. Author Response

      Reviewer 1 (Public Review):

      1) The finding that thalamic activity exhibits a low-dimensional structure is in my opinion less of a finding, but rather an assumption that motivates the use of dimensionality reduction techniques. When the authors ask (line 101) "whether thalamic task activity exhibits similar low dimensional structure", what is the alternative hypothesis? I think it is a foregone conclusion that with a restricted number of tasks, and the intrinsic smoothness of fMRI activity data, there are always K<<N components that capture 50, 75, or 90% of the variance. If you had measured the spiking of the entire population of thalamic neurons or increased the threshold to 99%, the structure of activity would be more high-dimensional. So I believe you can either frame this as an assumption going in, or carefully build an alternative hypothesis of what a "high-dimensional" structure would look like. Generating activity data i.i.d. would be the simplest case, but given that both signal and measurement noise in fMRI are reasonably smooth, this would be a VERY trivial null hypothesis.
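      The reviewer's point that smoothness alone yields K<<N components can be illustrated with a toy simulation. The sketch below counts how many principal components are needed to reach a variance threshold for i.i.d. noise versus the same noise after spatial smoothing. The moving-average kernel, its width and the matrix sizes are hypothetical stand-ins for BOLD smoothness, not a model of the thalamic data.

```python
import numpy as np


def n_components_for(X, threshold=0.9):
    """Number of principal components needed to reach a cumulative variance fraction."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    frac = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(frac, threshold) + 1)


def smooth_rows(X, width=15):
    """Smooth each sample across 'voxels' with a moving-average kernel (toy spatial smoothness)."""
    kernel = np.ones(width) / width
    return np.array([np.convolve(row, kernel, mode="same") for row in X])


rng = np.random.default_rng(3)
n_samples, n_voxels = 1000, 200          # hypothetical sizes
iid = rng.standard_normal((n_samples, n_voxels))
smooth = smooth_rows(iid)

for thr in (0.5, 0.75, 0.9, 0.99):
    print(thr, n_components_for(iid, thr), n_components_for(smooth, thr))
```

      Under these assumptions the smoothed data reach each threshold with far fewer components than the i.i.d. data even though neither contains any task structure, which is why i.i.d. noise is a weak null for a "low-dimensional" claim.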

      We thank the reviewer for pointing out this inherent assumption in our analysis. We agree that given the smoothed nature of the BOLD signal and the restricted task design, we likely cannot effectively test an alternative high-dimensional organization hypothesis. We have revised our introduction accordingly and clarify that we use a dimensionality reduction technique with the assumption that we will observe a low-dimensional structure of thalamic task fMRI data, similar to past fMRI studies that focused on cortical ROIs (line 102). Furthermore, we have revised the discussion section to remove text highlighting the low-dimensional organization as a novel finding (line 404).

      2) The measure of "task hub" properties that is central to the paper would need to be much better explained and justified. You motivate the measure to be designed to find voxels that are "more flexibly recruited by multiple thalamic activity components", but it is not clear to me at this point that the measure defined on line 634 does this. First, sum_n w_i^2 is constrained to be the variance of the voxel across tasks, correct? Would sum_n abs(w) be higher when the weights are distributed across components? Given that each w is weighted by the variance (eigenvalue) of the component across the thalamus, would the score not be maximal if the voxel only loaded on the most important eigenvector, rather than being involved in a number of components? Also, the measure is clearly not rotationally invariant, so would this result change after some rotation of the PCA solution? Some toy examples and further demonstrations that show why this measure makes sense (and what it really captures) would be essential. The same holds for the participation index for the resting state analysis.
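      A toy example of the kind the reviewer requests. The manuscript's exact definition (line 634) is not reproduced here, so the sketch assumes a hypothetical score of the form sum_n λ_n |w_in|, an eigenvalue-weighted sum of absolute loadings. It compares a voxel loading on a single component with one spreading the same total squared loading across three components, and shows that a rotation of the components preserves each voxel's sum of squared loadings but changes the score.

```python
import numpy as np


def hub_score(weights, eigvals):
    """Hypothetical hub score: eigenvalue-weighted sum of absolute loadings.

    weights: (n_voxels, n_components) or (n_components,) loadings
    eigvals: (n_components,) component variances
    """
    return np.abs(weights) @ eigvals


# Two toy voxels with the same total squared loading (sum of w^2 = 1)
focal = np.array([1.0, 0.0, 0.0])      # loads only on the top component
spread = np.ones(3) / np.sqrt(3.0)     # loads equally on all three components
W = np.vstack([focal, spread])

flat_spectrum = np.array([3.0, 2.0, 1.0])
steep_spectrum = np.array([5.0, 1.0, 1.0])
print(hub_score(W, flat_spectrum))     # the spread voxel scores higher here
print(hub_score(W, steep_spectrum))    # the focal voxel scores higher here

# With equal eigenvalues a rotation leaves the component variances unchanged,
# yet the absolute-value score of the focal voxel still changes.
equal = np.ones(3)
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
focal_rot = focal @ R
print(hub_score(focal, equal), hub_score(focal_rot, equal))
```

      Which voxel "wins" therefore depends on the eigenvalue spectrum, and the score is not invariant to rotations of the component basis, which is the substance of the reviewer's two questions.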

      Please see our response to essential revision point #1.

      3) For the activity flow analysis, the null models (which need to be explained better) appear weak (i.e. no differences across tasks?), and it is no small wonder that the thalamus does significantly better. The Pearson correlations are not overwhelmingly impressive either. To give the reader a feel for how good/bad the prediction actually is, it would be essential that the authors would report noise ceilings - i.e. based on the reliability of the cortical activity patterns and thalamic activity patterns, what correlation would the best model achieve (see King et al., 2022, BioRxiv, as an example).
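      One classical way to obtain such a ceiling, sketched under assumptions: estimate the reliability of each pattern from two independent halves of the data, step it up to full-data reliability with the Spearman-Brown formula, and take the geometric mean of the source and target reliabilities as the largest correlation a perfect model could show (Spearman's correction for attenuation). This is not necessarily the procedure of King et al.; the patterns and noise levels below are simulated.

```python
import numpy as np


def split_half_reliability(half_a, half_b):
    """Correlation between two independent estimates of the same pattern,
    stepped up to full-data reliability with the Spearman-Brown formula."""
    r = np.corrcoef(half_a, half_b)[0, 1]
    return 2 * r / (1 + r)


def noise_ceiling(rel_source, rel_target):
    """Upper bound on the observable correlation between a predicted and a
    measured pattern when both are noisy (correction for attenuation)."""
    return np.sqrt(rel_source * rel_target)


rng = np.random.default_rng(4)
true_pattern = rng.standard_normal(1000)   # hypothetical noiseless cortical pattern
noise_sd = 1.0
half_a = true_pattern + noise_sd * rng.standard_normal(1000)
half_b = true_pattern + noise_sd * rng.standard_normal(1000)
rel = split_half_reliability(half_a, half_b)
print(f"reliability {rel:.3f}")
```

      An observed prediction accuracy can then be reported alongside, or divided by, the ceiling so that readers can judge how much explainable variance a model captures; the step-up formula becomes unstable if the split-half correlation is near or below zero.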

      Please see our response to essential revision point #4.

      4) Overall it has not been made clear what the RDM analysis adds to the prediction of the actual activity patterns. If you predicted the activity patterns themselves up to the noise ceiling, you would also hit the RDM correctly. The opposite is not the case, you could predict the correct RDM, but not the spatial location of the activity. However, the two prediction performances are never related to each other and it remains unclear what is learned from the latter (less specific) analysis.

      We agree that the utility of the RDM analysis is not clear, and we have removed it from the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper details the creation and data behind the website http://pandemics.okstate.edu/covid19/. The authors attempt to explore if there is a cause and effect between the detection of unusually increased mutation activity in the genomic surveillance databases and subsequent near-term surges in SARS-CoV-2 case numbers.

      Overall the premise is interesting as other than following case numbers reported to health authorities and observing what is happening in another country, there is no reliable way to predict when a surge is going to occur. Unfortunately, the data demonstrate that there was no reliable metric that could be used to predict surge events. Interestingly, the website has issued a "surge alert" currently for the month of September. It will be interesting to observe whether their model indeed has predictive power or whether the current analysis is merely coincidental with the surges but not necessarily predictive of them.

      In this work, we investigated a number of metrics for finding a reliable signal of surge prediction. The commonly used ratio ka/ks or the derivative of ka/ks with respect to time did not provide a reliable metric. However, for the same data, ka has provided a fairly robust surveillance signal so far. We believe ka/ks studies provide insights into genome changes, but not over short time periods such as days (at least not in the case of SARS-CoV-2). As the motivation of our work is to provide the community with a genomic surveillance approach in real time, we believe that the current data show that ka is, at present, a useful and fairly reliable metric.

      As the reviewer mentioned, while this manuscript was being reviewed, we issued a warning on September 7th 2022. Several different types of data (including the number of new infections, the number of hospitalizations, and COVID-19-related deaths) have indicated that our warning was accurate: reported case numbers surged in September and peaked in October. For instance, the plots in Figure S6 show a surge in case numbers across Europe at large and in several individual countries, including France, the United Kingdom, Germany and Italy. Similarly, our earlier warning in June was also followed by surges reported across many countries and collectively across the world (Figure S5). Therefore, we believe the presented methodology has been validated.

      Reviewer #2 (Public Review):

      In this manuscript, Najer et al. perform a comprehensive bioinformatic analysis of SARS-CoV-2 sequences available from public repositories. Through a comparison with the genome sequence of the original Wuhan 2020 strain, they identify the total accumulation of non-synonymous mutations as a predictor of the evolution of new strains. The manuscript provides data for three structural proteins, spike (S), membrane (M), and envelope (E), as well as data for the non-structural RNA-dependent RNA polymerase (RdRp) protein that serves as a negative control. However, the predictivity of this approach is most marked only for the Omicron variant, with considerable variation in the predictive power of SARS-CoV-2 proteins for other variants. Focusing on spike, the method does not detect the Alpha variant or Delta variant surges, which were mostly driven by changes in spike protein, although the level of sequencing data available for the Delta variant might have been lower. Notably, although the authors conclude that other parameters such as the ratio of non-synonymous to synonymous mutations or the rate of accumulation of non-synonymous mutations are not predictive, they appear to have similar success in predicting the Omicron surge.

      We agree with the reviewer that the case of the spike protein during the Alpha surge could have been affected by an insufficient number of sequences. In the case of the Gamma/Delta variants, we did notice changes in the spike and the membrane protein. For the case of Omicron and its various sub-variants, the use of ka provides a reliable signal due to changes in the spike, membrane and envelope proteins.

    1. Author Response

      Reviewer #2 (Public Review):

      The work systematically reassesses fungal mi/milRNA-like characteristics and annotation confidence and identifies that many of the loci fail to meet the key criteria of the methods developed for animal or plant miRNAs. Therefore, the authors establish a set of criteria suitable for the annotation of fungal miRNAs and provide a centralized annotation of identified mi/milRNA hairpin RNAs in fungi based on their established rules.

      Here are some comments and suggestions for the manuscript to be improved:

      1) The title mentions "ancestral links", however, the main context of this paper does not include the evolution of fungal mi/milRNAs or show the origins of conserved mi/milRNAs in fungi. The authors are suggested to consider a more appropriate title for this work.

      Agreed, we have modified our title to include a more fitting description of the outcome of the study:

      “Comprehensive re-analysis of hairpin small RNAs in fungi reveals loci with conserved links”

      2) The work proposes a fungal mi/milRNAs hairpin precursor recovery pipeline with three minimal criteria to annotate fungal mi/milRNA loci, which allows nearly half of the loci to pass these rules. To highlight the innovation of this annotation, it is strongly suggested that the authors compare their established pipeline and criteria for fungi with those used in animal or plant miRNAs in detail, and emphasize the advantages of the established pipeline. A figure showing the established pipeline and detailed parameters is needed.

      We have now included a clear workflow diagram for establishing miRNA annotation records and confidence tiers (Figure 1-supplemental 3). As for the comparison with rules in plants and animals, this is summarized in Table S6, which lists rules employed by other tools, papers and species. We believe these supplementary materials together give a strong overview of our approach and how it differs from other approaches.

      3) The established "standard rules" for fungal mi/milRNA annotation still require more evaluation. It would be better if there is experimental validation to improve confidence.

      Sequencing evidence is generally regarded as the gold standard of experimental support for identifying and annotating miRNAs (Axtell and Meyers, 2018), though the rules are not yet clear in fungi. We agree that developing a standard rule set is a high priority for establishing complete annotation standards. We had a statement (~ line 290) affirming this need, and have now modified this sentence to highlight the need for a sufficient standard.

      “While this minimal rule-set is useful for filtering the lowest-confidence loci, it is likely not sufficient to form the basis of an annotation and this analysis further confirms the need for a standardized pipeline and set of criteria for miRNA annotation in fungi.”

      To address the question of experimental validation, we have included descriptions of loci with strong-functional support in Table S5, including a section discussing top-tier loci in the discussion, described in the response to reviewer 3.

    1. Author Response

      Reviewer #1 (Public Review):

      By studying the effect of Treg depletion in a CD8+ T cell-dependent diabetes model, the group around Ondrej Stepanek described that in the absence of Treg cells, antigen-specific CD8+ OT-I T cells show an activated phenotype and accelerate the development of diabetes in mice. These cells, termed KILR cells, express CD8+ effector and NK cell gene signatures and are identified as CD49d- KLRK1+ CD127+ CD8+ T cells. The authors suggest that the generation of these cells is dependent on TCR stimulation and IL-2 signals, either provided due to the absence of Treg cells or by injection of IL-2 complexed to specific anti-IL-2 mAbs. In vivo, these cells show improved target cell killing properties, and the authors report improved anti-tumor responses of combination treatments with doxorubicin combined with IL-2/JES6 complexes. Finally, the authors identified a similar human subset in publicly available scRNAseq datasets, supporting the translational potential of their findings.

      The conclusions are mostly well supported, except for the following two considerations:

      We are pleased by the positive overall evaluation of our manuscript by both reviewers and we are thankful for their specific insightful comments, which helped us to improve the manuscript.

      1) From Fig. 4A and B it is not conclusively shown, that Tregs limit IL-2 necessary for the expansion of OT-I cells and subsequent induction of diabetes. An IL-2 depletion experiment (e.g. with combined injection of the S4B6 and JES6-1 antibodies) would further strengthen this claim. Along these lines, the authors claim "IL-2Rα expression on T cells can be induced by antigen stimulation or by IL-2 itself in a positive feedback loop [20]. Accordingly, downregulation of IL-2Rα in OT-I T cells in the presence of Tregs might be a consequence of the limited availability of IL-2.". The cited reference 20 did observe CD25 upregulation by IL-2 on T cells but the observed effect might only be caused by upregulation of CD25 on Treg cells, which increases the MFI for the whole T cell population. Did the authors observe significant upregulation of CD25 on effector CD4+ and CD8+ T cells in their experiments with IL-2/S4B6 or IL-2/JES6 treatment?

      We added another reference to support our claim (Sereti, I., et al., Clin Immunol, 2000. 97(3): p. 266-76.). Along this line, we also observed that addition of IL-2 in vitro leads to IL-2Rα upregulation on CD8+ T cells (shown in Fig. 4C), and that IL-2Rα levels were lower if Tregs were present. We also observed upregulation of IL-2Rα in vivo upon the stimulation of OT-I T cells with OVA and IL-2ic, which is now shown in Fig. S6C of the revised manuscript.

      To further explore whether Tregs limit the expansion of OT-I T cells and diabetes progression by limiting IL-2, we performed the proposed experiment using a combined injection of S4B6 and JES6-1 anti-IL-2 antibodies. At the beginning, we were skeptical that we could completely block IL-2 using this approach, for the following reasons. First, IL-2 is produced locally in the spleen and lymph nodes and might not be easily accessible to the antibodies for a complete block. Second, IL-2 has a relatively short turnover and is continuously produced, but the half-life of the injected antibodies is unknown, which makes the duration of such a block uncertain. Third, it is possible that some IL-2 molecules would be bound by only one of the two antibodies, which would turn them into hyper-stimulating immune complexes instead of neutralizing them.

      Nevertheless, we were curious enough to perform this experiment. We used a condition that, based on our experience, leads to diabetes manifestation in Treg-depleted, but not in Treg-replete mice (10,000 OT-I T cells, OVA + LPS immunization). One additional group of Treg-depleted mice received a single dose of S4B6 and JES6-1 anti-IL-2 (200 µg of each antibody per mouse). We observed that this IL-2 blocking delayed, but did not prevent, the development of diabetes in most animals (Fig. 1 below).

      Overall, we believe that this experiment rather supports our conclusions concerning the importance of IL-2, although the effect is only partial. However, we decided not to include this experiment in the manuscript, because we do not have evidence of how efficient the IL-2 blocking was (see above), which makes the interpretation difficult. Because the reviews and the point-by-point response are public in eLife, we believe that showing the data here is appropriate.

      Figure 1. Role of IL-2 blocking on the development of experimental diabetes. Two independent experiments were performed. Statistical significance was calculated using Log-rank (Mantel-Cox) test for survival, and Kruskal-Wallis test for blood glucose (p-value is shown in italics).
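      For readers less familiar with the survival statistic named in the caption, the following is a minimal sketch of the two-group log-rank (Mantel-Cox) test, written from its standard definition. The onset times and censoring flags are hypothetical and do not reproduce the experiment; real analyses would normally rely on an established survival-analysis package.

```python
import math

import numpy as np


def logrank_test(time_a, event_a, time_b, event_b):
    """Two-group log-rank (Mantel-Cox) test.

    time_*:  follow-up time (e.g. days to diabetes onset or to censoring)
    event_*: 1 if the event (diabetes) was observed, 0 if censored
    Returns (chi-square statistic with 1 df, p-value).
    """
    t = np.concatenate([time_a, time_b]).astype(float)
    e = np.concatenate([event_a, event_b]).astype(int)
    g = np.concatenate([np.zeros(len(time_a)), np.ones(len(time_b))])
    observed_a = expected_a = variance = 0.0
    for ti in np.unique(t[e == 1]):          # each distinct event time
        at_risk = t >= ti
        n = at_risk.sum()
        n_a = (at_risk & (g == 0)).sum()
        d = ((t == ti) & (e == 1)).sum()
        d_a = ((t == ti) & (e == 1) & (g == 0)).sum()
        observed_a += d_a
        expected_a += d * n_a / n            # expected group-A events under H0
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))       # chi-square (1 df) survival function
    return chi2, p


# Hypothetical onset days: group A develops diabetes early, group B later or not at all
chi2, p = logrank_test([5, 6, 7, 8, 9, 10], [1, 1, 1, 1, 1, 1],
                       [12, 14, 15, 20, 20, 20], [1, 1, 1, 0, 0, 0])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

      Censored animals (event = 0) still count in the risk sets up to their last observation, which is what distinguishes this test from simply comparing onset times.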

      2) The anti-tumor efficacy of KILR cells is intriguing but currently, it is unclear if it is indeed mediated by KILR cells. Have KILR cells been identified by flow cytometry in the BCL1 and B16F10 models treated with doxorubicin and IL-2/JES6? Were specific KILR cell depletion studies conducted, e.g. with an anti-KLRK1 depleting antibody? Additional experiments addressing these questions would be desirable to further support the authors' claims.

      We are thankful to both reviewers for their similar comments concerning the analysis of CD8+ T cells in the tumor model. Addressing these comments led to very useful data and significantly improved our manuscript.

      We performed the analysis of splenic CD8+ T cells in the BCL1 leukemia model (the spleen is the major site of the leukemic cells in this model). We observed that KLRK1+ T cells represented almost half of CD8+ T cells in mice treated with DOX+IL-2ic, a much higher frequency than in the control and DOX-only treated mice. Although not all KLRK1+ cells were bona fide KILR cells, the frequencies of KLRK1+ IL-7R+ and KLRK1+ CD49d- cells were also strongly elevated in the DOX+IL-2ic treated mice. Overall, the survival of DOX+IL-2ic treated mice correlated with the frequencies of KILR T cells and KLRK1+ T cells. Moreover, GZMB was almost exclusively expressed by KLRK1+ T cells. We are showing these data in Fig. 7C and Fig. S7B in the revised manuscript.

      In the B16 melanoma model, we analyzed CD8+ T cells in the spleens and also in the tumors. We observed a large population of KLRK1+ GZMB+ CD8+ T cells in the spleens of DOX+IL-2ic-treated mice, but not in the untreated or DOX-only treated mice (Fig. 7F). Both KLRK1+ CD49d+ and KLRK1+ CD49d- CD8+ T cells were substantially more frequent in the DOX+IL-2ic-treated mice than in the untreated or DOX-only treated mice (Fig. S7F). In the tumor, the KLRK1+ CD49d- CD8+ T cells were found at large numbers only in the DOX+IL-2ic-treated mice (Fig. 7G). Moreover, these KLRK1+ CD49d- CD8+ T cells expressed high levels of IL-7R and GZMB only in DOX+IL-2ic-treated, but not in untreated and DOX-only treated mice (Fig. 7H).

      We believe that these new data provide evidence that the combination of immunogenic chemotherapy with IL-2 treatment induced KILR cells in the spleens and in the tumors and that this correlates with the better survival.

      Because the majority of non-naïve CD8+ T cells (and the vast majority of GZMB+ CD8+ T cells) in the spleens and tumors of the tumor-bearing mice treated with DOX+IL-2ic were KLRK1+, and because we have shown that the protective effect of the DOX+IL-2ic therapy is largely CD8+ T cell-dependent, we did not find it essential to perform the depletion of KLRK1+ T cells. We believe that it is almost inevitable that the depletion of KLRK1+ T cells would lead to increased tumor growth, as it would probably deplete the majority of antigen-specific CD8+ T cells, mimicking the overall CD8+ T cell depletion. Moreover, we do not have this protocol established.

      Reviewer #2 (Public Review):

      In this study, the authors determine the superior cell killing abilities of KLRK1+ IL7R+ (KILR) CD8+ effector T cells in experimental diabetes and tumor mouse model. They also provide evidence that Tregs suppress the formation of this previously uncharacterized subset of CD8+ effector T cells by limiting IL-2.

      Strength and Limitation

      This study focuses on the relationship between Tregs and CD8+ T cells. They used different experimental diabetes mouse models to reveal that Tregs suppress the CD8+ effector T cells by limiting IL-2. They also found a unique subset of KLRK1+ IL7R+ (KILR) CD8+ effector T cells with superior cell killing abilities through single-cell sequencing, but killing abilities could be inhibited by Tregs. They also tested their theory in in vivo tumor model. The data, in general, support the conclusions; however, some issues need to be fully addressed, as detailed below.

      We are pleased by the positive overall evaluation of our manuscript by both reviewers and we are thankful for their specific insightful comments, which helped us to improve the manuscript.

      1) This study used the concentration of urine glucose as the standard for diabetes (≥ 1000 mg/dl for two consecutive days). However, multiple factors may lead to a high level of urine glucose. As this is a type I diabetes mouse model, the authors could use immunohistological analysis of the islets to show the proportion of T cells and islet cells, which would directly display the geographic distribution of immune cells and the severity and histological structure of the damaged pancreatic islets. If possible, different subsets of immune cells, especially CD4+ vs CD8+ cells, should be stained for their location.

      We added the histological examination of the pancreas in control, DEREG-, and DEREG+ mice using contrast H&E staining and immunofluorescence (Fig. 1D-E in the revised manuscript). We observed that the high urine and blood glucose levels are preceded by the destruction of the pancreatic islets (morphology and decreased insulin production) as well as by the infiltration of the islets with immune cells, including CD4+ and CD8+ T cells.

      2) This article shows that KILR effector CD8+ T cells have strong cytotoxic properties. However, they do not describe the potential proliferation ability vs apoptosis of this subset from islets.

      We analyzed the proliferation (KI67 expression) and apoptosis (Annexin V, cleaved Caspase 3) in T cells isolated from the pancreas of DEREG- and DEREG+ mice on day 4 after the induction of diabetes using flow cytometry (Figure 2 below). We did not observe any differences between DEREG- and DEREG+ mice or among different subsets of OT-I T cells in the DEREG+ mice. Essentially, all T cells were proliferative (KI67+) and there was a very low percentage of Annexin V or cleaved Caspase 3 positive cells.

      Figure 2. Lymphocytes were isolated from the pancreas of DEREG- RIP.OVA and DEREG+ RIP.OVA mice on day 4 after the induction of diabetes, and analyzed using flow cytometry. Two independent experiments were performed. Gated on OT-I T cells. Top: proliferation rate based on Ki-67 staining. Representative histogram and MFI (median is shown). Middle: Apoptosis rate based on Annexin V staining. Representative histogram shows Annexin V staining in three populations of OT-I T cells from DEREG+ mouse (“AE” - CD49d+ KLRK1-, “++” - CD49d+ KLRK1+, KILR - CD49d- KLRK1+), total OT-I T cells from DEREG-, and a positive control: WT CD8+ T cells treated with hydrogen peroxide. Middle right: Percentage of Annexin V+ cells and MFI (median is shown). Bottom: Apoptosis rate based on cleaved Caspase 3 staining. Representative dot plots show cleaved Caspase 3 staining of OT-I T cells from DEREG+, DEREG-, and a positive control: WT CD8+ T cells treated with hydrogen peroxide. Bottom right: percentage of cleaved Caspase 3+ cells (median is shown).

      However, we found the question concerning the proliferation and apoptosis of KILR cells interesting and worth further investigation. For this reason, we assessed the proliferation, survival, and phenotypic stability of naïve, KILR, and effector T cells by their competitive transfer into CD3ε-/- mice. The phenotype of all three subsets remained stable for 4 days (Fig. 6F), documenting that KILR cells are not just a very transient stage. Moreover, KILR cells were ~2-fold more abundant than effector cells 3 days after their 1:1 cotransfer into CD3ε-/- mice (Fig. 6G, Fig. S6E). This was probably caused by their slight advantage in both proliferation and survival (Fig. S6F-G).

      3) Figure 7 shows that the antitumor efficacy of IL-2 depends on CD8+ T cells. But in this part, there is no data to show the change of KLRK1+ IL7R+ CD8+ effector T cells in tumor tissue. Therefore, the article needs to add more data to verify that IL-2 enhances antitumor ability via KLRK1+ IL7R+ CD8+ effector T cells.

      We are thankful to both reviewers for their similar comments concerning the analysis of CD8+ T cells in the tumor model. Addressing these comments led to very useful data and significantly improved our manuscript.

      We performed the analysis of splenic CD8+ T cells in the BCL1 leukemia model (the spleen is the major site of the leukemic cells in this model). We observed that KLRK1+ T cells represented almost half of CD8+ T cells in mice treated with DOX+IL-2ic, a much higher frequency than in the control and DOX-only treated mice. Although not all KLRK1+ cells were bona fide KILR cells, the frequencies of KLRK1+ IL-7R+ and KLRK1+ CD49d- cells were also strongly elevated in the DOX+IL-2ic-treated mice. Overall, the survival of DOX+IL-2ic-treated mice correlated with the frequencies of KILR T cells and KLRK1+ T cells. Moreover, GZMB was almost exclusively expressed by KLRK1+ T cells. We show these data in Fig. 7C and Fig. S7B of the revised manuscript.

      In the B16 melanoma model, we analyzed CD8+ T cells in the spleens and also in the tumors. We observed a large population of KLRK1+ GZMB+ CD8+ T cells in the spleens of DOX+IL-2ic-treated mice, but not in untreated or DOX-only treated mice (Fig. 7F). Both KLRK1+ CD49d+ and KLRK1+ CD49d- CD8+ T cells were substantially more frequent in the DOX+IL-2ic-treated, but not in the untreated or DOX-only treated mice (Fig. S7F). In the tumor, KLRK1+ CD49d- CD8+ T cells were found in large numbers only in the DOX+IL-2ic-treated mice (Fig. 7G). Moreover, these KLRK1+ CD49d- CD8+ T cells expressed high levels of IL-7R and GZMB only in DOX+IL-2ic-treated, but not in untreated and DOX-only treated mice (Fig. 7H).

      We believe that these new data provide evidence that the combination of immunogenic chemotherapy with IL-2 treatment induced KILR cells in the spleens and in the tumors, and that this correlates with improved survival.

      4) It is unclear why the authors chose Dox to combine with IL-2/JES6. The authors should provide a more rational introduction to bridge such a combination. Authors should also explain the reason why there is no antitumor effect of IL-2/JES6 treatment alone.

      The experiments with OT-I mice showed that the formation of KILR cells required both antigenic stimulation and IL-2 signals. We believe that the tumor itself provides only very weak antigenic stimulation. For this reason, we combined the treatment with the chemotherapeutic doxorubicin, which is known to induce immunogenic cell death of tumor cells (e.g., Casares et al. 2005, PMID: 16365148). We believe that doxorubicin induces the death of (some) tumor cells and the release and presentation of their tumor-specific antigens. Without it, the tumors are simply too “cold” to induce a sufficient T-cell response. We emphasized this in the revised version of the manuscript.

      Importantly, some of us observed a similar effect of IL-2ic in combination with checkpoint blockade therapy (without chemotherapy) in a different tumor model, which documents that the chemotherapy is not essential for this effect (unpublished data).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors are trying to determine how time is valued by humans relative to energy expenditure during non-steady-state walking - this paper proposes a new cost function in an optimal control framework to predict features of walking bouts that start and stop at rest. This paper's innovation is the addition of a term proportional to the duration of the walking bout in addition to the conventional energetic term. Simulations are used to predict how this additional term affects optimal trajectories, and human subjects experiments are conducted to compare with simulation predictions.

      I think the paper's key strengths are its simulation and experimental studies, which I regard as cleverly-conceived and well-executed. I think the paper's key weakness is the connection between these two studies, which I regard as tenuous for reasons I will now discuss in detail.

      The Title asserts that "humans dynamically optimize walking speed to save energy and time". Directly substantiating this claim would require independently manipulating the (purported) energy and time cost of walking for human subjects, but these manipulations are not undertaken in the present study. What the Results actually report are two findings:

      1. (simulation) minimizing a linear combination of energy and time in an optimal control problem involving an inverted-pendulum model of walking bouts that (i) start and stop at rest and (ii) walk at constant speed yields a gently-rounded speed-vs-time profile (Fig 2A);

      2. (experiment) human subject walking bouts that started and stopped at rest had self-similar speed-vs-time profiles at several bout lengths after normalizing by the average duration and peak speed of each subject's bouts (Fig 4B).

      If the paper established a strong connection between (1.) and (2.), e.g. if speed-vs-time trajectories from the simulation predicted experimental results significantly better than other plausible models (such as the 'steady min-COT' and 'steady accel' models whose trajectories are shown in Fig 2A), this finding could be regarded as providing indirect evidence in support of the claim in the paper's Title. Personally, I would regard this reasoning as rather weak evidence - it would be more accurate to assert 'brief human walking bouts look like trajectories of an inverted-pendulum model that minimize a linear combination of energy and time' (of course this phrasing is too wordy to serve as a replacement Title -- I am just trying to convey what assertion I think can be directly substantiated by the evidence in the paper). But unfortunately, the connection between (1.) and (2.) is only discussed qualitatively, and the other plausible models introduced in the Results are not revisited in the Discussion. To my naive eye, the representative 'steady min-COT' trace in Fig 2A seems like a real contender with the 'Energy-Time' trace for explaining the experimental results in Fig 4, but this candidate is rejected at the end of the third-to-last paragraph in the 'Model Predictions' subsection of Results, based on a vague rationale that is never revisited.

      We have addressed most of this comment above, but respond here regarding Fig. 4. The argument against steady min-COT should also point out the peak speed. The Results have been revised thus: “In contrast to the min-COT hypothesis, the human peak speeds increased with distance, many well below the min-COT speed of about 1.25 m/s. The human speed trajectories did not resemble the trapezoidal profiles of the steady min-COT hypothesis for all distances, nor the triangular profiles of steady acceleration.”

      An additional limitation of the approach not discussed in the manuscript is that a fixed step length was prescribed in the simulations. The 'Optimal control formulation' subsection in the Methods summarizes the results of a sensitivity analysis conducted by varying the fixed step length, but all results reported here impose a constant-step-length constraint on the optimal control problem. Although this is a reasonable modeling simplification for steady-state walking, it is less well-motivated for the walking bouts considered here that start and stop at rest. For instance, the representative trial from a human subject in Figure 8 clearly shows initiation and termination steps that differ in length from the intermediate steps (visually discernible via the slope of the dashed line interpolating the black dots). Presumably different trajectories would be produced by the model if the constant-step-length constraint were removed. It is unclear whether this change would significantly alter predictions from either the 'Energy-Time' or 'steady min-COT' model candidates, and I imagine that this change would entail substantial work that may be out of scope for the present paper, but I think it is important to discuss this limitation.

      This is addressed elsewhere (Essential Revisions 2), but we explain more here. One of the parameter studies included step length increasing with speed according to the human preferred relationship. This is included in Fig. 3, and so we concluded that variable step lengths are not critical to the speed trajectories. A related assumption is that the energetic cost of modulating step length/frequency is small compared to the step-to-step transition cost. We believe that humans expend substantial energy for both costs, but that the overall cost of walking is still dominated by step-to-step transitions.

      With my concerns about the paper's framing and through-line noted as above, I want to emphasize that I regard the computational and empirical work reported here to be top-notch and potentially influential. In particular, the experimental study's use of inexpensive wearable sensors (as opposed to more conventional camera-based motion capture) is an excellent demonstration of efficient study design that other researchers may find instructive. To maximize potential impact, I encourage the authors to release their data, simulations, and details about their experimental apparatus (the first two I regard as essential for reproducibility - the third a selfless act of service to the scientific community).

      I think the most important point to emphasize is that the bulk of prior work on human walking has focused on steady-state movement - not because of the real-world relevance (since one study reports 50% of walking bouts in daily life are < 16 steps as summarized in Fig 1B), but rather because steady walking is a convenient behavior to study in the laboratory. Significantly, this paper advances both our theoretical and empirical understanding of the characteristics of non-steady-state walking.

      It is also significant to note the relationship between this study, where time was incorporated as an additive term in the cost of walking, with previous studies that incorporated time in a multiplicative discount in the cost of eye and arm movements. There is an emerging consensus that time plays a key role in the generation of movement across the body - future studies will discern whether and when additive or multiplicative effects dominate.

      We have acknowledged this in a brief sentence: “Indeed, we have found a similar valuation of time to explain how reaching durations and speed trajectories vary with reaching distance (Wong et al., 2021).” As an aside, in that reference we measured the metabolic cost of cyclic arm reaching, combined it with a linear time cost, and predicted reaching durations vs. distance and bell-shaped hand speed trajectories. Others (Shadmehr et al. Curr Biol. 2016) have proposed multiplicative (hyperbolic) temporal discounting to explain durations, but those cost formulas are not dynamical and cannot produce trajectories. We agree with the reviewer’s point, but we think the evidence for hyperbolic discounting is not strong. Linear time costs are simpler and work at least as well. This is of great interest to us, but we did not discuss it beyond the brief mention above, because we fear it is too far afield.

      Reviewer #2 (Public Review):

      This paper provides a novel approach to quantifying the tradeoff between energetic optimality during walking and the valuation of time to travel a given distance. Specifically, the authors investigated the relationships between walking speed trajectories, distance traveled, and the valuation of (completion) time. Time has been proposed as a potential factor influencing movement speed, but less is understood about how individuals balance energetic optimality and time constraints during walking. The authors used a simple, sagittal-plane walking model to test competing hypotheses about how individuals optimize gait speed from gait initiation to gait termination. Their approach extends literature in the space by identifying optimal gaits for shorter, partially non-steady speed walking bouts.

      The authors successfully evaluated three competing walking objectives (constant acceleration, minimum cost of transport at steady speed, and the energy-time objective), showing that the energy-time objective best matched experimental data in able-bodied adults. Although other candidate objectives may exist, the paper's findings provide a likely-generalizable explanation of how able-bodied humans select movement strategies that encompass studies of steady-speed walking.

      Overall, this paper provides a foundation for future studies testing the validity of the energy-time hypothesis for human gait speed selection in able-bodied and patient populations. Extensions of this work to patient populations may explain differences in walking speed during clinical assessments and provide insight into how individual differences in time valuation impact performance on assessments. For example, understanding whether physical capacity or time valuation (or something comparable) better explains individual differences in walking speed may suggest distinct approaches for improving walking speed.

      Strengths:

      The authors presented a compelling rationale for the tradeoffs between energetic optimality and time, and their results provide strong support for a majority of their conclusions. In particular, significant reductions in the variance of experimental speed trajectories provide good support for the scaling of speeds across individuals and the plausibility of the energy-time hypothesis. Comparison to theoretical (model-based) reductions across different time-valuation (cT) parameters would further enhance confidence in the practical significance of the variance reductions. Further, while additional work is needed to determine the range of "normal" valuations of time, the authors present experimental ranges that appear reasonable and are well explained. The computational and analytical methods are rigorous and are supported by the literature. Overall, the paper's conclusions are consistent with experimental and computational results.

      The introduction of a model-based analytical approach to quantify the effects of time valuation of walking could generalize to test other cost functions, populations, or locomotion modes. Further, models of varying complexity could be implemented to test more individualized estimates of metabolic cost, ranging from 3D dynamic walking models (Faraji et al., Scientific Reports, 2018) or physiologically-detailed models (Falisse et al., Journal of The Royal Society Interface. 2019). The relatively simple set of analyses used in this paper is consistent with prior literature and should generalize across applications and populations.

      The authors justified simplifications in the analysis and addressed major limitations of the paper, such as using a fixed step length in model predictions, using a 2D model, and basing energy estimates on the mechanical work of a simple model. It is unlikely that the paper's conclusions would change given additional model complexity. For example, a 3D walking model would need to control frontal plane stability. However, in able-bodied adults, valuation of frontal-plane stability during normal walking would not likely alter the overall shape of the predicted speed profiles.

      Weaknesses:

      The primary weakness of this work is that alternative objectives may provide similar speed profiles and thus be plausible objectives for human movement. For example, the authors tested an objective minimizing the steady-speed cost of transport. This cost function is consistent with the literature, but (as predicted) unlikely to explain acceleration and deceleration during gait. An objective more comparable to the energy-time hypothesis would be to minimize the net energy cost over the entire bout, including accelerations and decelerations. This may produce results similar to the energy-time hypothesis. However, a more complex model that incorporates non-mechanical costs (e.g., cost of body weight support) may be needed to test such objectives. Therefore, the energy-time hypothesis should be considered in the context of a simple model that may be incapable of testing certain alternative hypotheses.

      We have addressed some of this comment in Essential Revisions 4.

      We are unsure what is meant by “net energy over the entire bout, including accelerations and decelerations.” Our hypothesis uses total (gross) energy over the entire bout, and already includes accelerations and decelerations. If “net” refers to the customary definition of metabolic energy minus resting, then it differs from our gross cost (Fig. 6A) only by a constant offset, namely the resting cost. Removing the offset is equivalent to a decrease in C_T. As shown in Fig. 3, this would reduce peak speed magnitudes but not change the shape of the speed, peak speed, and duration patterns. There is also another interpretation where the cost of walking includes only net energy, and the cost of time includes the resting metabolic rate (Fig. 6C). This interpretation yields the same predictions; the only difference is whether the resting rate is treated as an energy or a time cost. We have not made further changes, because we are unsure what the reviewer meant. The difference between net and total is at most one of degree, not of qualitatively different behavior.
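      The equivalence argued above can be written out in two lines. As a sketch, assume the gross metabolic power splits into a net part and a constant resting rate, with bout duration T; apart from C_T, the symbols below are introduced here for illustration only.

```latex
J_{\text{gross}}(C_T) = \int_0^T \left[\dot E_{\text{net}}(t) + \dot E_{\text{rest}}\right] dt + C_T\,T
                      = E_{\text{net}} + \left(C_T + \dot E_{\text{rest}}\right) T
\\[4pt]
J_{\text{net}}(C_T) = E_{\text{net}} + C_T\,T = J_{\text{gross}}\!\left(C_T - \dot E_{\text{rest}}\right)
```

      Optimizing net energy with valuation C_T therefore selects the same trajectory as optimizing gross energy with C_T lowered by the resting rate. The first line also covers the second interpretation: booking the resting rate as part of the time cost leaves the gross objective unchanged.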

      We do not address the proposed “cost of body weight support” because we are unsure of the definition. There is a hypothesis by Kram & Taylor (1990) that defines a metabolic cost rate proportional to body weight divided by ground contact time. It is unclear whether this is what the reviewer is referring to, so we did not include it in the manuscript. However, if this is what the reviewer means, we do not consider the Kram & Taylor (“K&T”) cost to be a viable hypothesis for computational models. It is a correlation observed from data, which is inadequate as a model, for several reasons. First, in a model optimization, it leads to absurd predictions, because metabolic cost could then be reduced simply by increasing stance (contact) time. A model could do so simply by walking with very long double support phases, or running with a very brief aerial phase, both of which people clearly do not do. In walking, extended double support durations result in much higher metabolic cost (Gordon et al., APMR 2009). Models must operate quite literally on whatever objective they are given, and here, a literal interpretation of K&T makes absurd predictions.
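      To make the reductio concrete, the K&T relation can be written with body weight W_b, ground contact time t_c, and a proportionality constant c (symbols chosen here for illustration):

```latex
\frac{\dot E_{\text{metab}}}{W_b} = \frac{c}{t_c}
\quad\Longrightarrow\quad
\frac{\partial \dot E_{\text{metab}}}{\partial t_c} = -\frac{c\,W_b}{t_c^{2}} < 0
```

      Because the cost rate falls monotonically as t_c grows, an optimizer given only this objective would lengthen contact time as far as its other constraints allow.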

      Another issue with the K&T cost is that it is not mechanistic. A mechanistic model is concerned with the forces and work performed by an actuator such as muscle. Muscles experience forces far greater than body weight, not captured by the K&T cost. Of course, overall cost for animal locomotion is roughly proportional to body weight, but what a model needs is a cost associated with its control inputs, e.g. actuator forces.

      We have also examined the K&T hypothesis in previous publications. In Schroeder & Kuo (Plos Comp Biol 2021), we used a simple model of running that minimizes an energetic cost dominated by mechanical work. Even though the model has no cost similar to K&T, its predicted metabolic cost is correlated with the K&T cost. Correlation does not imply causation, and in this model the absence of a causal K&T term is known by construction.

      We have also examined the K&T hypothesis in experimental data. In Riddick & Kuo (Sci Rep 2022), we examined human data and found that many variables correlate quite well with metabolic cost, including the K&T correlate. We used human data to show how mechanical work could explain metabolic cost, and that, even so, the K&T cost appears as a correlate. In our interpretation, both model and data that experience an energetic cost proportional to mechanical work may have a number of variables correlated with energy cost. Those correlates need not have any causal influence.

      There are, of course, many similar correlates that could be or have been proposed to explain the metabolic cost of running. Most such correlates are not operational enough to work in a model, and it is also difficult to predict what a reader might consider plausible, even if we do not.

      We agree with this statement: “the energy-time hypothesis should be considered in the context of a simple model that may be incapable of testing certain alternative hypotheses.” In fact, in the Discussion of limitations we listed other potential factors (e.g. forced leg motion, stability, 3D motion), and stated “We did not explore more complex models here, but would expect similar predictions to result if similar, pendulum-like principles of work and energetic cost apply.” We had also cited other models that include such factors and are compatible with the step-to-step transition concept. Finally, we already stated, “the Energy-Time hypothesis should be regarded as a subset of the many factors that should govern human actions, rendered here in a simple but quantitative form.” We believe this is already aligned with the reviewer’s comment.

      An experimental design involving an intervention to perturb the valuation of time would provide stronger support for time being a critical factor influencing gait speed trajectories. The authors noted this limitation as an area of future work.

      While the results are compelling, the limited sample size and description of participants limit the obvious generalizability of the results. Older adults tend to have higher metabolic costs of walking than younger adults, which may alter the predicted time-energy relationships (Mian OS, et al., Acta physiologica. 2006). As noted in the introduction, differences in walking speeds have been observed in different living environments. General information on where participants lived (city, small town, etc...) may provide readers with insight into the generalizability of the paper's conclusions. Additionally, the experimental results figures show group-level trends, but individual-specific trends and the existence of exceptional cases are unclear.

      We wish to defend the “limited sample size.” The present sample size was (in our opinion) sufficient to test the hypothesis, and we have reported confidence intervals and other statistics where appropriate. (As always, it is up to the individual reader to decide whether they are convinced or not.) It is true that the data may be insufficient for other purposes, but we cannot anticipate or address all other purposes.

      We appreciate the relevant connection to aging. We have added to Discussion, “We do not know whether that family [of trajectories] also applies to older adults, who prefer slower steady speeds and expend more energy to walk the same speed (Malatesta, 2003). Perhaps an age-related valuation of time might explain some of the differences in speed.”

      We agree about the participants, and have added “Subjects were recruited from the community surrounding the University of Calgary; the city has a moderately affluent population of about 1.4 M, with a developed Western culture.”

      No specific reviewer recommendation was made about individual-specific trends, but there are several indicators already included in the manuscript. First, all trials from all subjects are shown in Fig. 4A. Any truly exceptional cases should be visible as substantial deviations from the group. Second, the normalization by peak speed in Fig. 4B shows how individuals tend to be fairly consistent in their preferred speeds, in that shorter and longer bouts of an individual are consistent with each other, even if some walk faster than others. Third, this observation is analyzed more quantitatively by the reduction in standard deviations with normalization (Results). Fourth, we will provide a data repository with all the data, to allow readers to inspect individuals more carefully (Data availability statement).
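      The normalization described above can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the trials are synthetic, the speed profile is a sin² bump that starts and ends at rest, each trial is normalized by its own duration and peak (the study normalized by each subject's averages), and the "raw" trajectories are compared on a shared absolute-time grid with zero speed after each bout ends.

```python
import numpy as np

GRID = np.linspace(0.0, 1.0, 101)  # common normalized-time grid (t / duration)

def synthetic_bout(duration, peak_speed, n=200):
    """Hypothetical bout: a smooth speed profile that starts and ends at rest."""
    t = np.linspace(0.0, duration, n)
    v = peak_speed * np.sin(np.pi * t / duration) ** 2
    return t, v

def raw_on_absolute_time(t, v, t_max, n_points=101):
    """Resample speed on a shared absolute-time grid; speed is zero after the bout ends."""
    grid = np.linspace(0.0, t_max, n_points)
    return np.interp(grid, t, v, right=0.0)

def normalized(t, v):
    """Scale time by the bout duration and speed by the bout's peak speed."""
    tau = (t - t[0]) / (t[-1] - t[0])
    return np.interp(GRID, tau, v) / v.max()

def mean_across_trial_sd(rows):
    """Across-trial standard deviation at each grid point, averaged over the grid."""
    return float(np.mean(np.std(np.vstack(rows), axis=0)))

# Hypothetical trials: (duration in s, peak speed in m/s) for short to long bouts.
trials = [synthetic_bout(d, p) for d, p in [(4.0, 1.0), (6.0, 1.2), (9.0, 1.4), (12.0, 1.5)]]
t_max = max(t[-1] for t, _ in trials)

sd_raw = mean_across_trial_sd([raw_on_absolute_time(t, v, t_max) for t, v in trials])
sd_norm = mean_across_trial_sd([normalized(t, v) for t, v in trials])
print(f"raw SD: {sd_raw:.3f} m/s; normalized SD: {sd_norm:.3g} (fraction of peak speed)")
```

      Because these synthetic trials share one shape exactly, the normalized spread collapses to essentially zero; real bouts would leave a residual. The two numbers are in different units, so a real comparison would need a common scale.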

      The authors' interpretation of clinical utility is vague and should be interpreted with caution. A simple pendulum-based walking model is unlikely to generalize to patient populations, whose gait energetics may involve greater positive and negative mechanical work (Farris et al., 2015; Holt et al., 2000). Additionally, the proposed analytical framework based on mechanical work as a proxy for the metabolic cost may not generalize to patient populations who have heterogeneous musculotendon properties and increased co-contraction (e.g., children with cerebral palsy; Ries et al., 2018). Consequently, the valuation of time for an individual could be incorrectly estimated if the estimates of metabolic cost were inaccurate. Therefore, as the authors noted for their able-bodied participants, more precise measures of metabolic rates will be critical for translating this work into clinical settings.

      We agree, and did not intend to say that clinical populations must walk the same way, rather that the Normal patterns could be used as a basis of comparison. To make this clearer, we have amended the Discussion of clinical implications (new text emphasized): “it may be possible to predict the duration and steady speed for another distance, referenced from a universal family of walking trajectories. We have identified one such family that applies to healthy individuals with pendulum-like gait. Of course, some clinical conditions might be manifested by a deviance from that family, perhaps in the acceleration or deceleration phases, or in how the trajectories vary with distance. If quantified, such deviance might prove clinically useful… the characterization of distance-dependent speed trajectories can potentially provide more information than available from steady speed alone.”

      We agree that the valuation of time can be inaccurate if the metabolic cost is inaccurate. That is why we did not make a precise estimate of the valuation. We have amended the text to help clarify that our rough estimates are based on previous data. In addition, our general scientific intent is to reveal behavioral sensitivities, for example of walking duration to bout distance, as opposed to absolute numerical quantities.

    1. Author Response

      Reviewer #2 (Public Review):

      One other major concern I have regards the conclusion that the participants in these studies use an additive rather than a multiplicative rule to integrate the risk information. The additive rule is problematic in general because it fails to predict the reversal in the effect of probability on payoffs when the payoffs change sign. More specifically, increasing the probability of winning increases the probability of choosing an option when the payoff is positive, but the effect reverses when the payoff is negative. One needs to impose some pretty ad hoc assumptions to make the additive model account for this fundamental interaction between probability and payoff. Of course, the experiments reported here did not include negative payoffs, and so didn't run into this problem. In fact, when the payoffs are positive, it is possible to transform the multiplicative model to an additive model by a log transform. This transformation is only possible for the simple type of gamble investigated in this manuscript - a single amount to win with some probability of winning, otherwise win or lose nothing. If the gambles involved more than one outcome, then the theorist needs to deal with a sum of products and the log transform is no longer possible. For these reasons I am very skeptical about the general application of a summation rule for probability and value in risk choice. The authors do address this issue to some extent. They point out the abundance of other research supporting a multiplicative rule, and they speculate that the additive rule may have occurred within the restrictions of this special situation. The latter discussion is a good start, but I suggest that the authors discuss this fundamental issue in more depth.

      Thank you for this very insightful comment. We have now included more in-depth discussions about the decision rules (multiplicative vs. additive) in our Discussion, in which we have absorbed and reflected many of the insights offered by Reviewer #2.
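      The reviewer's two technical points can be checked with a minimal sketch. It assumes an expected-value style product rule p·v and a hypothetical additive rule w_p·p + w_v·v with positive weights; neither is the fitted model from the manuscript. A function of two inputs splits into separate terms exactly when its mixed second difference vanishes, which gives a simple test for when the log transform yields an additive rule.

```python
import math

def product_rule(p, v):
    """Expected-value style integration: probability times payoff."""
    return p * v

def additive_rule(p, v, w_p=1.0, w_v=1.0):
    """Additive integration with hypothetical positive weights w_p and w_v."""
    return w_p * p + w_v * v

def effect_of_probability(rule, v, p_low=0.2, p_high=0.8):
    """Change in value when the win probability rises from p_low to p_high."""
    return rule(p_high, v) - rule(p_low, v)

def mixed_difference(f, a0, a1, b0, b1):
    """Second-order difference; zero for every choice of points exactly
    when f(a, b) can be written as g(a) + h(b)."""
    return f(a1, b1) - f(a1, b0) - f(a0, b1) + f(a0, b0)

def single_outcome(p, v):
    """Log value of a gamble with one positive outcome: log p + log v."""
    return math.log(product_rule(p, v))

def two_outcomes(p1, v1):
    """Log value of a gamble whose second outcome is fixed at p2 = 0.2, v2 = 20."""
    return math.log(product_rule(p1, v1) + product_rule(0.2, 20.0))

# Sign reversal: the product rule flips the effect of probability for losses,
# while the additive rule (w_p > 0) always favors higher probability.
for payoff in (10.0, -10.0):
    print(f"payoff {payoff:+.0f}: "
          f"product {effect_of_probability(product_rule, payoff):+.2f}, "
          f"additive {effect_of_probability(additive_rule, payoff):+.2f}")

# Separability after the log transform: holds for one outcome, fails for two.
print("single outcome:", mixed_difference(single_outcome, 0.2, 0.8, 10.0, 40.0))
print("two outcomes:  ", mixed_difference(two_outcomes, 0.2, 0.8, 10.0, 40.0))
```

      The single-outcome difference is zero up to rounding, while the two-outcome difference is not, matching the reviewer's point that the log transform only rescues an additive form for one-outcome gambles.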

    1. Author Response

      Reviewer #1 (Public Review):

      This paper tests the hypothesis that 1/f exponent of LFP power spectrum reflects E-I balance in a rodent model and Parkinson's patients. The authors suggest that their findings fit with this hypothesis, but there are concerns about confirmation bias (elaborated on below) and potential methodological issues, despite the strength of incorporating data from both animal model and neurological patients.

      First, the frequency band used to fit the 1/f exponent varies between experiments and analyses, inviting concerns about potentially cherry-picking the data to fit with the prior hypothesis. The frequency band used for fitting the exponent was 30-100 Hz in Experiment 1 (rodent model), 40-90 Hz in Experiment 2 (PD, levodopa), and 10-50 Hz in Experiment 3 (PD, DBS). Ad hoc reasons were given to justify these choices, such as "to avoid a spectral plateau starting > 50 Hz" in Experiment 3. However, at least in Experiment 3 (Fig. 3), if the frequency range was shifted to 1-10 Hz, the authors would have uncovered the opposite effect, where the exponent is smaller for the DBS-on condition.

      We agree that parameter choice is crucial, in particular, choice of the fitting range. In addition to the 40-90 Hz range (Figure 2C), we have performed aperiodic fitting for five other frequency ranges to test to what extent the reported results are sensitive to the selected frequency range (Figure S2A). This analysis showed that the results are robust when a broad frequency range from 30 to 95 Hz was chosen, which is consistent with what has been suggested by Gao et al., 2017 to make inferences on the E/I ratio.

      Accordingly, we have now repeated the analyses of the animal data with the same fitting range used for the ON-OFF medication comparison in humans. Together with Figure S2A, in which different frequency ranges were tested on the data used in Figure 2, this shows that the results in Figures 1 and 2 hold up: aperiodic exponents are higher when STN spiking is low, and vice versa. Therefore, a broad fitting range from 30 to 90 Hz (excluding harmonics of mains interference) generates consistent results for both human and animal data.

      We opted against a fitting range from 1-10 Hz because of two constraints highlighted in Gerster et al., 2022. First, a fitting range starting at 1 Hz could have a larger y-intercept due to the presence of low-frequency oscillations. This could lead to a larger aperiodic exponent and could be misinterpreted as stronger neural inhibition. Therefore, the lower fitting bound should be chosen to avoid known oscillations in the delta/theta range (Gerster et al., 2022). Second, the fitting range should be chosen so that no oscillations cross its limits. In Figure 3A, oscillations in the theta/alpha band both ON and OFF stimulation would complicate parameterisation and would likely result in spurious fits.

      We also tested the effect of changing the peak threshold, peak width limits and the aperiodic fitting mode on FOOOF parameterisation. Increasing and decreasing the peak threshold from its default value (2 standard deviations) did not change results (Figure S2B). Similarly, adapting the peak width limits did not affect the exponent difference between medication states (Figure S2C). Finally, choosing the ‘knee’ mode instead of ‘fixed’ resulted in fundamentally different aperiodic fits that no longer differed between medication states (Figure S2D). This is most likely a consequence of the near-linear PSD in log-log space from 40 to 90 Hz (Figure 2B). If there is no bend in the PSD, the FOOOF algorithm will be forced to assign a ‘random’ knee, and the aperiodic fit will then mostly reflect the slope of the spectrum above the knee point.
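      The difference between the two aperiodic modes can be written down directly. In FOOOF's parameterisation, the 'fixed' mode models log10 power as offset - log10(f^chi) and the 'knee' mode as offset - log10(k + f^chi), with the bend located at f_knee = k^(1/chi). The sketch below (hypothetical helper names and parameter values) shows that with k = 0 the two models coincide, and that over a 40-90 Hz range with no bend in the spectrum, knee values whose bend lies below the range change the curve only slightly, so such data constrain the knee poorly.

```python
import numpy as np


def aperiodic_fixed(freqs, offset, exponent):
    """'fixed' mode: log10 power = offset - log10(f ** exponent)."""
    return offset - np.log10(freqs**exponent)


def aperiodic_knee(freqs, offset, knee, exponent):
    """'knee' mode: log10 power = offset - log10(knee + f ** exponent)."""
    return offset - np.log10(knee + freqs**exponent)


def knee_frequency(knee, exponent):
    """Frequency (Hz) of the bend: knee ** (1 / exponent)."""
    return knee ** (1.0 / exponent)


freqs = np.arange(40.0, 91.0)   # range used here for the exponent
offset, exponent = 1.0, 2.0     # hypothetical parameter values

# With knee = 0 the two models are identical.
assert np.allclose(aperiodic_knee(freqs, offset, 0.0, exponent),
                   aperiodic_fixed(freqs, offset, exponent))

# Knee values whose bend lies well below 40 Hz change the curve only slightly,
# so a spectrum that is straight in log-log space barely constrains the knee.
for knee in (1.0, 10.0, 100.0):
    diff = np.max(np.abs(aperiodic_knee(freqs, offset, knee, exponent)
                         - aperiodic_fixed(freqs, offset, exponent)))
    print(f"knee={knee:g} (bend at {knee_frequency(knee, exponent):.1f} Hz): "
          f"max log10 difference {diff:.4f}")
```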

      Second, there are important, fine-grained features in the spectra that are ignored in the analyses, which confounds the interpretation.

      One salient example of this is Fig. 2, where, based on the plots in B, one would expect the power of beta-band oscillations to be higher in the Med-ON condition, as the oscillatory peaks rise higher above the 1/f floor and reach the same amplitude level as in the Med-OFF condition (in other words, similar total power is subtracted by a smaller 1/f power in the Med-ON condition). But this impression is opposite to the model-fitting results in C, where beta power is lower in the Med-ON condition.

      We agree that PSDs over a broad frequency range (e.g. 5-90 Hz) typically do not have a single 1/f property. Instead, there can be multiple oscillatory peaks and ‘knees/bends’ in the aperiodic component. For these cases, fitting should be performed using the knee mode. To extract periodic beta power, we parameterise the PSD between 5 and 90 Hz and select the largest oscillatory component between 8 and 35 Hz (this range was extended to include the large oscillatory peaks in hemispheres 27 and 28 at ~ 10 Hz, see Figure R1). We now use the knee mode to model the aperiodic component between 5 and 90 Hz when periodic beta power is calculated (see our previous comments). Figure R1 provides an overview of all PSDs ON and OFF medication, the aperiodic fits (5-90 Hz (knee) and 40-90 Hz (fixed)) and the detected beta peaks. Despite this modification in our pipeline, periodic beta power is still larger OFF medication (Figure 2C), in keeping with previous studies (Kim et al., 2022; Kühn et al., 2006; Neumann et al., 2017; Ray et al., 2008). We acknowledge the reviewer’s point that the average spectra in Figure 2B are misleading in that respect and, for clarity, provide here all 30 spectra in both conditions. Note that the calculation of aperiodic exponents between 40 and 90 Hz is not affected by this change in our pipeline. Figures 2B, D+E were revised accordingly.
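      The beta-peak step can be sketched as follows, assuming the aperiodic fit (in log10 power) has already been obtained, for instance with the knee mode over 5-90 Hz. The aperiodic component is subtracted to flatten the spectrum, and the largest remaining component between 8 and 35 Hz is taken as the beta peak. This is a simplification rather than the FOOOF procedure: FOOOF fits Gaussian peaks instead of taking a maximum, and the function name and synthetic spectrum below are hypothetical.

```python
import numpy as np


def largest_peak_in_band(freqs, log_power, log_aperiodic, band=(8.0, 35.0)):
    """Return (peak_freq, peak_height) of the flattened spectrum within `band`.

    The flattened spectrum is log10 power minus the aperiodic fit, so the
    height is periodic power above the aperiodic component (log10 units).
    Simplification: the maximum is taken instead of fitting Gaussian peaks.
    """
    freqs = np.asarray(freqs, dtype=float)
    flattened = np.asarray(log_power, dtype=float) - np.asarray(log_aperiodic, dtype=float)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    idx = np.argmax(flattened[mask])
    return freqs[mask][idx], flattened[mask][idx]


# Hypothetical spectrum: knee-mode aperiodic component plus peaks at 10 and 20 Hz.
freqs = np.arange(5.0, 91.0, 0.5)
log_aperiodic = 2.0 - np.log10(50.0 + freqs**2)
log_power = (log_aperiodic
             + 0.3 * np.exp(-0.5 * ((freqs - 10.0) / 1.5) ** 2)
             + 0.5 * np.exp(-0.5 * ((freqs - 20.0) / 2.0) ** 2))

peak_freq, peak_height = largest_peak_in_band(freqs, log_power, log_aperiodic)
print(f"largest peak in 8-35 Hz: {peak_freq:.1f} Hz, height {peak_height:.2f}")
```

Because the peak height is measured relative to the aperiodic fit, a change in the aperiodic component alone can alter "periodic" beta power even when total power at the peak is unchanged, which is the ambiguity the reviewer raises about Figure 2B.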

      We have repeated the analysis of our animal data using the ‘knee mode’ with a fitting range from 30 to 100 Hz. However, using the knee mode did not improve the goodness of fit or fitting error and, in fact, made them slightly worse (Figure S5). Based on this, we think the fixed mode would provide a more holistic model for the PSDs used in this analysis. We have now added this comparison in Figure S5 to justify the choice of the fixed mode.

      Figure R1. PSDs from all 30 hemispheres ON and OFF medication. Aperiodic fits are shown between 5-90 Hz (knee mode), which was used to calculate the power of beta peaks, and between 40-90 Hz (fixed mode), which was used to estimate the aperiodic exponent of the spectrum.

      Another example is Fig. 1C, where the spectra for high and low STN spiking epochs are identical between 10 and 20 Hz, and the difference in higher frequency range could be well-explained by an overall increase of broadband gamma power (e.g. as observed in Manning et al., J Neurosci 2012, Ray & Maunsell PLoS Biol 2011). This increase of broadband gamma power is trivially expected, as broadband gamma power is tightly coupled with population spiking rate, which was used to define the two conditions.

      We agree with the reviewer that in Figure 1C, high and low STN spiking states could well be separated by average gamma power (Figure 1E), too. However, the difference of aperiodic exponents is more prominent between both conditions (Figure 1D+E, based on p-values). What is more, in human LFP data recorded from clinical macroelectrodes, medication states can be reasonably well distinguished using the aperiodic exponent between 40-90 Hz (Figure 2C), but average gamma power does not separate both states (Figure S3A). This suggests that the aperiodic exponent reflects more than just power differences in the high gamma regions. In addition, power changes do not inevitably change the aperiodic exponent and vice versa as elaborated in (Donoghue et al., 2020).

      Manning et al., 2009 show that the power spectrum is shifted to higher power values at all observed frequencies (2-150 Hz) as firing rates increase. As the reviewer points out, the power spectra in our data are almost identical between 10 and 20 Hz (despite the marked spiking differences) and only drift apart above 20 Hz (Figure 1C). This is a relevant difference between our study and Manning et al., 2009 and suggests that power differences in the gamma range are not solely explained by differences in spiking. This is consistent with modelling of cortical activity at different firing rates (Miller et al., 2009), in which the entire spectrum is shifted to higher power values when spiking rates increase.

      Ray & Maunsell, 2011 reported low (30-80 Hz) and high (> 80 Hz) gamma activity in the macaque visual cortex, with a positive correlation between spiking activity and high gamma activity. However, activity in the low gamma range (30-80 Hz), which largely overlaps with the frequency range in our study, does not necessarily correlate with firing rates.

      In conclusion, the link between gamma power and spiking activity is not as strong as suggested. Even if changes in spiking activity can lead to changes in both gamma power and the aperiodic exponent, the aperiodic exponent would still constitute a measure to separate E/I levels and medication states.

      The above consideration also speaks to a major weakness of the general approach of considering the 1/f spectrum a monolithic spectrum that can be captured by a single exponent. As the authors' Fig. 1C shows, there are distinct frequency regions within the 1/f spectrum that have different slopes. Indeed, this tripartite shape of the 1/f spectrum, including a "knee" feature around 40-70 Hz which is well visible here, was described in multiple previous papers (Miller et al., PLoS Comput Biol 2009; He et al., Neuron 2010), and has been successfully modeled with a neural network model using biologically plausible mechanisms (Chaudhuri et al., Cereb Cortex, 2017). The neglect of these fine-grained features confounds the authors' model fitting, because an overall increase in broadband gamma power, which can be explained straightforwardly by the change in population firing rates, can cause the exponent, fit over a larger spectral frequency region, to decrease. However, this is not due to the exponent actually changing, but to the overall increase of power in a specific sub-frequency region of the broadband 1/f activity.

      We have now used the knee mode for aperiodic fits between 5 and 90 Hz when periodic beta power is calculated. We agree that this broad frequency range is unlikely to have a single 1/f component.

      We have also repeated the analysis of our animal data using the knee mode for aperiodic fits between 30 and 100 Hz (Figure S5). However, the goodness of fit barely changed; in fact, R² and the fitting error became slightly worse. In addition, the knee parameter complicates interpretation of the aperiodic exponent, which then has to be considered along with the knee frequency. Moreover, we do not see this bend around 40-70 Hz in all subjects. We show PSDs of representative LFP channels in Figure R2; the knee around 40-70 Hz is not a robust finding in our data set. Therefore, we chose the fixed mode for parameterisation within this frequency band.

      Please see our answer to the previous comment regarding the link between broadband gamma power and changes in population firing rates.

      Figure R2. PSDs of representative LFP channels for each animal (data used in Figure 1C). The knee around 40-70 Hz is not a robust finding across all PSDs.
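      The disagreement about broadband gamma power can be made concrete with a synthetic spectrum. In the hypothetical example below, scaling the whole 1/f^2 spectrum by a constant (a broadband shift) changes only the offset and leaves the exponent fitted over 30-90 Hz unchanged, whereas adding power only above about 60 Hz lowers the fitted exponent, as the reviewer describes. The spectra and the smooth step used for the sub-band increase are assumptions for illustration, not data from the study.

```python
import numpy as np


def fitted_exponent(freqs, power, f_lo, f_hi):
    """Exponent (negative slope) of a straight-line fit in log-log space over [f_lo, f_hi]."""
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    slope, _ = np.polyfit(np.log10(freqs[mask]), np.log10(power[mask]), 1)
    return -slope


freqs = np.arange(1.0, 101.0)
baseline = 1.0 / freqs**2  # hypothetical 1/f^2 spectrum

# (a) Broadband shift: every frequency scaled by the same factor.
shifted = 3.0 * baseline

# (b) Power added only in a high-frequency sub-band: a smooth step that rises
#     from a gain of about 1 below ~60 Hz to about 3 above it.
bump = 1.0 + 2.0 / (1.0 + np.exp(-(freqs - 60.0) / 5.0))
subband = baseline * bump

for label, spectrum in [("baseline", baseline),
                        ("broadband shift", shifted),
                        ("sub-band increase", subband)]:
    chi = fitted_exponent(freqs, spectrum, 30, 90)
    print(f"{label:18s} exponent over 30-90 Hz = {chi:.2f}")
```

A multiplicative change across all frequencies is a constant in log space and cannot move a fitted slope; only a frequency-dependent change can, which is why the two readings of Figure 1C lead to different conclusions.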

    1. Author Response

      Reviewer #3 (Public Review):

      Argenty et al. investigated the role of Lissencephaly gene 1 (LIS1), a dynein-binding protein, in thymic development and T cell proliferation. They find that LIS1 is essential for the early stages of T and B cell development, and demonstrate that loss of LIS1 has a negative impact on the transition from DN3 to DN4 thymocytes and on the maturation of pre-pro-B cells into pro-B cells in the bone marrow. Using a CD2Cre Lis1fl/fl murine model, they observe that in thymocytes LIS1 is critical for DN3 proliferation and completion of cell division. Then, using a CD4Cre Lis1fl/fl model (the Cd4 promoter is up-regulated only in later stages of thymic development and thus does not impact DN3 thymocytes), they show that LIS1-deficient CD4 T cells have proliferation defects following both TCR-dependent and TCR-independent stimulation, which results in apoptosis. They also confirm previous reports showing that LIS1-deficient CD8 T cells do not have impaired proliferation upon TCR stimulation, which suggests that these two cell types rely on different mechanisms to regulate the cell cycle. Finally, the authors make efforts to determine how LIS1 regulates proliferation in thymocytes and CD4 T cells. Interestingly, they show that LIS1 is important for chromosome alignment and centrosome integrity and provide data that support a model where LIS1 would facilitate the assembly of active dynein-dynactin complexes. These data provide interesting insights into how different cell types use distinct strategies to undergo mitosis and how this can impact on their proliferation and fate decisions. The conclusions of the manuscript are mostly supported by the provided data, although certain aspects can be further investigated and clarified.

      Strengths of the paper:

      By combining a re-assessment of previous reports with new findings, the data from this manuscript convincingly demonstrate that LIS1 is crucial for cell proliferation in certain developmental steps/cell types. Furthermore, the manuscript provides clear evidence of how LIS1 loss causes proliferation defects by disrupting centrosome integrity and chromosome alignment both in CD4+ T cells and thymocytes.

      Weaknesses of the paper:

      Although the authors successfully address the mechanistic role of LIS1 in thymocyte and CD4+ T cell division, the manuscript would be strengthened by providing further evidence to support some of their conclusions and by revisiting some speculations raised in the discussion.

      In Figure 1, the authors claim that LIS1 is not required for pre-TCR assembly, but for expansion/proliferation of DN3 thymocytes as a step prior to reaching the DN4 stage. However, authors indeed observe increased expression of CD5 (which is a downstream event of Notch and IL-7R signalling). Thus, from the data provided it is not clear whether signalling through Notch or IL-7R is definitely not affected, which could be clarified by assessing the expression of other downstream targets of these molecules.

      CD5 is a downstream target of pre-TCR signaling but, to our knowledge, it is not a downstream target of Notch or IL-7R signaling. The sentence on p7 of the initial manuscript was reformulated since we understand that it could be misleading. However, we fully agree with the reviewer’s comment on Notch and IL-7R signaling and included new data in the revised version of the manuscript to address this point. Notch signaling stimulates metabolic changes which lead to the increase of thymocyte cell size following the β-selection checkpoint (Ciofani M. et al., Nature Immunology, 2005; Maillard I. et al., The Journal of Experimental Medicine, 2006) and to the up-regulation of the transferrin receptor CD71 (Kelly, A.P. et al., The EMBO Journal, 2007). We now show in Figure 1E of the revised manuscript that the loss of LIS1 does not affect the average cell size of post-β-selection thymocytes or the expression level of CD71 in these cells, suggesting that Notch signaling is preserved in the absence of LIS1. This was confirmed in vitro following stimulation of DN3a thymocytes with OP9-dl1 cells (Figure 2D of the revised manuscript). In this Figure, we also analyzed the expression level of Bcl-2, which is regulated by IL-7R signaling (von Freeden-Jeffry, U. et al., Immunity, 1997). We show that Bcl-2 is comparable in abundance in LIS1 wild-type and LIS1-deficient thymocytes following stimulation with OP9-dl1, suggesting that IL-7R signaling is not affected by the absence of LIS1.

      In Figure 3, the authors mostly confirm previous data from Ngoi, Lopez, Chang, Journal of Immunology, 2016 (reference 34), but also provide evidence of a role of LIS1 in CD4+ T cell proliferation in more physiological setups, using OT2-CD4-Cre Lis1flox/flox (or OT2 Lis1flox/flox as controls) in adoptive transfer experiments followed by antigen-specific immunization. However, the evidence provided by the authors about proliferation defects in LIS1-deficient cells in this context is limited by the early timepoint chosen: day 3 post-immunization.

      We chose to analyze CD4+ T cells at days 2 and 3 after immunization because we sought to catch early cell-division waves through CTV dilution. We also wanted to show that LIS1-deficient CD4+ T cells could normally survive and migrate to lymph nodes before they start to proliferate. Given the dramatic effect of LIS1 on CD4+ T-cell proliferation at day 3, we anticipated that very low numbers of LIS1-deficient cells would survive at later time points after immunization. To address the reviewer’s comment, we transferred CTV-stained OT2+CD45.1+ CD4+ T cells into C57BL/6 mice and analyzed the percentages and numbers of CD45.1+ T cells as well as the dilution of CTV in those cells at day 7 after immunization. As expected, all CD45.1+ cells were negative for CTV at this time of analysis (data not shown). The percentages and numbers of CD45.1+ T cells were strongly decreased in the absence of LIS1 in comparison to wild-type controls (Figure 3 - Figure Supplement 2C), confirming results obtained at day 3 after immunization.

      In the discussion, the authors speculate about the differences observed between CD4 and CD8 T cells, as the latter do not show proliferative defects upon TCR-triggered stimulation, and come up with the hypothesis that LIS1 might be important for symmetric cell divisions, but not for asymmetric cell divisions. However, the arguments used by the authors have a few caveats, especially because CD4+ T cells can also undergo asymmetric cell division following TCR-triggered stimulation upon the first cognate antigen encounter (Chang et al., Science, 2007, Ref. 8).

      We agree that CD4+ T cells can undergo asymmetric division (this is mentioned and referenced on p3 and p18 of the manuscript). However, it is unknown whether these divisions occur systematically or with a variable, possibly context-dependent, frequency. It is also unclear whether CD4+ and CD8+ T cells have similar rates of asymmetric division. The literature lacks comparative studies in which cellular events associated with mitosis are investigated side-by-side in those two subsets. As mentioned to Reviewer 1, to our knowledge only one study performed a comparative analysis of T-bet distribution in daughter cells after a first round of cell division in CD4+ and CD8+ T cells (Chang, J. T. et al., Immunity, 2011). They found that T-bet segregates unequally in daughter cells in both CD4+ and CD8+ T cells. However, the disparity between daughter cells was higher in CD8+ T cells than in CD4+ T cells (5- versus 3-fold). This suggests either that key molecules are more equally (or less unequally) distributed in daughter cells of the CD4+ lineage, or that the rate of symmetric division is higher in CD4+ T cells than in CD8+ T cells. Those results are in accordance with our interpretation and previous findings (Yingling, J. et al., Cell, 2008; Zimdahl, B. et al., Nature Genetics, 2014), suggesting that LIS1 is predominantly involved in mitosis associated with symmetric divisions. Another possible explanation for this difference is that asymmetric division might occur at different stages in CD4+ and CD8+ T cells. Although some asymmetric divisions have been detected early after antigen encounter in CD4+ T cells, a more recent study from the same group suggests that asymmetric division might occur mainly later, after several rounds of division of CD4+ T cells, to enable self-renewal to be coupled to the production of differentiated effector CD4+ T cells (Nish, S. A. et al., Journal of Experimental Medicine, 2017).

      It is therefore possible that LIS1 could be critical early in CD4+ T cell expansion, when cells mainly divide symmetrically, and less critical later, when cells are engaged in asymmetric division. This is now discussed in greater detail on p18 of the revised version of the manuscript.

      Finally, the authors discuss that mono-allelic LIS1 defects might contribute to malignancies. Certainly not all points raised in the discussion need to be experimentally addressed, but for this particular hypothesis the authors would likely have the tools to achieve that, which would broaden the relevance of understanding LIS1 function.

      We have addressed this point experimentally in the revised version of the manuscript. We show that mono-allelic LIS1 deficiency does not have a significant impact on the percentages of thymocyte populations in Cd2-Cre Lis1flox/+ mice (Figure 1 - Figure Supplement 1B) and on the numbers of peripheral T cells in Cd4-Cre Lis1flox/+ (Figure 3 - Figure Supplement 1E), suggesting that LIS1 does not operate in a dose-dependent fashion in the context of T-cell development and T-cell homeostatic maintenance. Additionally, Cd4-Cre Lis1flox/+ CD4+ T cells proliferate effectively following TCR and CD28 stimulation (Figure 3 - Figure Supplement 2A), indicating further that mono-allelic LIS1 dosage is sufficient to support cell division of CD4+ T cells. The part of the discussion related to Lis1 haplo-deficiency has been rephrased according to this new set of data.

    1. Author Response

      Reviewer #1 (Public Review):

      1) One nagging concern is that the category structure in the CNN reflects the category structure baked into color space. Several groups (e.g. Regier, Zaslavsky, et al) have argued that color category structure emerges and evolves from the structure of the color space itself. Other groups have argued that the color category structure recovered with, say, the Munsell space may partially be attributed to variation in saturation across the space (Witzel). How can one show that these properties of the space are not the root cause of the structure recovered by the CNN, independent of the role of the CNN in object recognition?

      We agree that there is overlap with the previous studies on color structure. In our revision, we show that color categories are directly linked to the CNN being trained on the object-recognition task and not to the CNN per se. We repeated our analysis on a scene-trained network (using the same input set) and find that here the color representation in the final layer deviates considerably from the one created for object classification. Given that the input set is the same, this strongly suggests that any reflection of the structure of the input space is to the benefit of recognizing objects (see the bottom of the “Border Invariance” section; Page 7). Furthermore, the new experiments with random hue shifts to the input images show that in this case stable borders do not arise, as they would if the border invariance were a consequence of the chosen color space only.

      A further crucial distinction from previous results is that, by specifically replacing the final layer, our analysis looks at the representation that the network has built to perform the object classification task. As such, the current finding goes beyond the notion that the color category structure is already reflected in the color space.

      2) In Figure 1, it could be useful to illustrate the central observation by showing a single example, as in Figure 1 B, C, where the trained color is not in the center of the color category. In other words, if the category structure is immune to the training set, then it should be possible to set up a very unlikely set of training stimuli (ones that are as far away from the center of the color category while still being categorized most of the time as the color category). This is related to what is in E, but is distinctive for two reasons: first, it is a post hoc test of the hypothesis recovered in the data-driven way by E; and second, it would provide an illustration of the key observation, that the category boundaries do not correspond to the median distance between training colors. Figure 5 begins to show something of this sort of a test, but it is bound up with the other control related to shape.

      We have now added a post-hoc test where we shift the training bands from likely to unlikely positions using the original paradigm: Retraining output layers whilst shifting training bands from the left to the right category-edge (in 9 steps) we can see the invariance to the category bounds specifically (see Supp. Inf.: Figure S11). The most extreme cases (top and bottom row) have the training bands right at the edge of the border, which are the interesting cases the reviewer refers to. We also added 7 steps in between to show how the borders shift with the bands.

      Similarly, if the claim is that there are six (or seven?) color categories, regardless of the number of colors used to train the data, it would be helpful to show the result of one iteration of the training that uses say 4 colors for training and another iteration of the training that uses say 9 colors for training.

      We have now included the figure presented in 1E for all the color iterations used (see SI: Figure S10). We are also happy to include a single iteration, but believe this gives the most complete view of what the reviewer is asking.

      The text asserts that Figure 2 reflects training on a range of color categories (from 4 to 9) but doesn’t break them out. This is an issue because the average across these iterations could simply be heavily biased by training on one specific number of categories (e.g. the number used in Figure 1). These considerations also prompt the query: how did you pick 4 and 9 as the limits for the tests? Why not 2 and 20? (the largest range of basic color categories that could plausibly be recovered in the set of all languages)?

      The number of output nodes was inspired by the number of basic color categories that English speakers observe in the hue spectrum (in which a number of the basic categories are not represented). We understand that this is not a strong reason; however, the lack of studies on color categories in CNNs forced us to approach this in an exploratory manner. We have adapted the text to better reflect this shortcoming (bottom of page 4). Naturally, had the data indicated that these numbers were not a good fit, we would have adapted the range (if there were more categories, we would have expected more noise and would have increased the number of training bands to test this). As indicated above, we have now also included the classification plots for all the different counts, so the reader can review this as well (SI: Section 9).

      3) Regarding the transition points in Figure 2A, indicated by red dots: how strong (transition count) and reliable (consistent across iterations) are these points? The one between red and orange seems especially willfully placed.

      To answer the question on consistency, we have now included a replication of the ResNet18 results, together with ResNet34, ResNet50 and ResNet101, in the SI (section 1). We have also introduced a new section presenting the results of alternative CNNs in the SI (section S8). Despite small idiosyncrasies, the general pattern of results recurs.

      Concerning the red-orange border, it was not willfully placed, but we very much understand that in isolation it looks like it could simply be the result of noise. Nevertheless, the recurrence of this border in several analyses made us confident that it does reflect a meaningful invariance. Notably:

      • We find a more robust peak between red and orange in the luminance control (SI section 3).

      • The evolutionary algorithm with 7 borders also places a border in this position.

      • We find that the peak recurs in the ResNet-18 replication as well as in several of the deeper ResNets and several of the other CNNs (SI section 1)

      • We also find that the peak is present throughout the different layers of the ResNet-18.
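      One way to quantify how strong and how reliable a border is would be to count, at each hue step, how many retrained classifiers change their predicted label there. The sketch below assumes that each iteration yields one predicted class per hue step on a circular hue axis; the function name and the predictions are hypothetical, and it is not the analysis behind Figure 2A.

```python
import numpy as np


def transition_counts(predictions):
    """Count label changes between neighbouring hue steps across iterations.

    predictions: array of shape (n_iterations, n_hues), one predicted class
    per hue step. The hue axis is circular, so the last step is compared with
    the first. Entry i counts iterations whose label changes between hue
    step i and hue step i + 1.
    """
    predictions = np.asarray(predictions)
    neighbour = np.roll(predictions, -1, axis=1)  # label at step i + 1 (wrapping)
    return (predictions != neighbour).sum(axis=0)


# Hypothetical predictions from three retrained classifiers over 8 hue steps.
preds = [
    [0, 0, 1, 1, 1, 2, 2, 0],
    [0, 0, 1, 1, 2, 2, 2, 0],
    [0, 0, 1, 1, 1, 2, 2, 0],
]
counts = transition_counts(preds)
print(counts)               # transition count per hue position (strength)
print(counts / len(preds))  # fraction of iterations with a border there (reliability)
```

A border that appears in every iteration at the same position yields a fraction of 1 there, whereas a border that wanders between neighbouring steps spreads its count over several positions.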

      4) Figure 2E and Figure 5B are useful tests of the extent to which the categorical structure recovered by the CNNs shifts with the colors used to train the classifier, and it certainly looks like there is some invariance in category boundaries with respect to the specific colors uses to train the classifier, an important and interesting result. But these analyses do not actually address the claim implied by the analyses: that the performance of the CNN matches human performance. The color categories recovered with the CNN are not perfectly invariant, as the authors point out. The analyses presented in the paper (e.g. Figure 2E) tests whether there is as much shift in the boundaries as there is stasis, but that’s not quite the test if the goal is to link the categorical behavior of the CNN with human behavior. To evaluate the results, it would be helpful to know what would be expected based on human performance.

      We understand the lack of human data was a considerable shortcoming of the previous version of the manuscript. We have now collected human data in a match-to-sample task modeled on our CNN experiment. As with the CNN we find that the degree of border invariance does fluctuate considerably. While categorical borders are not exact matches, we do broadly find the same category prototypes and also see that categories in the red-to-yellow range are quite narrow in both humans and CNNs. Please, see the new “Human Psychophysics” (page 8) addition in the manuscript for more details.

      5) The paper takes up a test of color categorization invariant to luminance. There are arguments in the literature that hue and luminance cannot be decoupled, and that luminance is essential to how color is encoded and to color categorization. Some discussion of this might help the reader who has followed this literature.

      We have added some discussion of the interaction between luminance and color categories (e.g., Lindsay & Brown, 2009) at the bottom of page 6/ top of page 7. The current analysis mainly aimed at excluding that the borders are solely based on luminance.

      Related, the argument that “neighboring colors in HSV will be neighboring colors in the RGB space” is not persuasive. Surely this is true of any color space?

      We removed the argument about “neighboring colors”. Our procedure requires the use of a hue spectrum that wraps around the color space while including many of the highly saturated colors that are typical prototypes for human color categories. We have elected to use the hue spectrum from the HSV color space at full saturation and brightness, which is represented by the edges of the RGB color cube. As this is the space in which our network was trained, it does not introduce any deformations into the color space. Other potential choices of color space either include strong non-linear transformations that stretch and compress certain parts of the RGB cube, or exclude a large portion of the RGB gamut (yellow in particular).

      We have adapted the text to better reflect our reasoning (page 6, top of paragraph 2).
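      The hue spectrum described here can be generated with Python's standard `colorsys` module. The sketch below (the step count is arbitrary, and the helper name `hue_spectrum` is hypothetical) samples hues at full saturation and value and checks that every sample lies on an edge of the RGB cube: one channel at 1, one at 0, and the third varying between them.

```python
import colorsys


def hue_spectrum(n_steps):
    """RGB triplets for n_steps hues at full saturation and value (HSV S = V = 1).

    Hues are i / n_steps for i in range(n_steps), so the spectrum wraps around
    without repeating hue 0 at the end.
    """
    return [colorsys.hsv_to_rgb(i / n_steps, 1.0, 1.0) for i in range(n_steps)]


spectrum = hue_spectrum(12)
for rgb in spectrum:
    # On the cube edges one channel is 1, one is 0, and the third varies.
    assert max(rgb) == 1.0 and min(rgb) == 0.0
print([tuple(round(c, 2) for c in rgb) for rgb in spectrum[:4]])
```

Because these samples stay on the surface of the RGB gamut, no channel is clipped or rescaled, which matches the motivation given above for preferring this hue spectrum over transformed color spaces.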

      6) The paper would benefit from an analysis and discussion of the images used to originally train the CNN. Presumably, there are a large number of images that depict manmade artificially coloured objects. To what extent do the present results reflect statistical patterns in the way the images were created, and/or the colors of the things depicted? How do results on color categorization that derive from images (e.g. trained with neural networks, as in Rosenthal et al and presently) differ (or not) from results that derive from natural scenes (as in Yendrikhovskij?).

      We initially hoped to analyze differences between colors in objects and backgrounds as in Rosenthal et al.; unfortunately, in ImageNet we did not find clear differences between pixels inside the object bounding boxes provided with ImageNet and pixels outside these boxes (most likely because the rectangular bounding boxes still contain many background pixels). However, if we look at the results from the K-means analysis presented in Figure S6 (Suppl. Inf.) of the supplemental materials and the color categorization throughout the layers in the object-trained network (end of the first experiment on page 7), as well as the color categorization in humans (Human Psychophysics starting on page 8), we see very similar border positions arise.

      7) It could be quite instructive to analyze what's going on in the errors in the output of the classifiers, as e.g. in Figure 1E. There are some interesting effects at the crossover points, where the two green categories seem to split and swap, the cyan band (hue % 20) emerges between orange and green, and the pink/purple boundary seems to have a large number of green/blue results. What is happening here?

      One issue with training the network on the color task, is that we can never fully guarantee that the network is using color to resolve the task and we suspected that in some cases the network may rely on other factors as well, such as luminance. When we look at the same type of plots for the luminance-controlled task (see below left) presented in the supplemental materials we do not see these transgressions. Also, when we look at versions of the original training, but using more bands, luminance will be less reliable and we also don’t see these transgressions (see right plot below).

      8) The second experiment using an evolutionary algorithm to test the location of the color boundaries is potentially valuable, but it is weakened because it pre-determines the number of categories. It would be more powerful if the experiment could recover both the number and location of the categories based on the "categorization principle" (colors within a category are harder to tell apart than colors across a color category boundary). This should be possible by a sensible sampling of the parameter space, even in a very large parameter space.

      The main point of the genetic algorithm was to test whether the border locations would be corroborated by an algorithm based on the principle of categorical perception. Unfortunately, determining the exact number of borders is difficult, because some border invariances are clearly stronger than others. Running the algorithm with the number of borders as a free parameter simply leads to a minimal number of borders, because 100% accuracy is always obtained when only one category is left. More generally, because the network can merge categories into a single class at no cost (having fewer borders actually reduces noise), fewer classes are expected to yield better performance. Estimating the optimal number of categories would therefore require introducing a subjective trade-off between accuracy and class count.

      9) Finally, the paper sets itself up as taking "a different approach by evaluating whether color categorization could be a side effect of learning object recognition", as distinct from the approach of studying "communicative concepts". But these approaches are intimately related. The central observation in Gibson et al. is not the discovery of warm-vs-cool categories (these, as the most basic color categories, have been known for centuries), but rather the relationship of these categories to the color statistics of objects: those parts of the scene that we care about enough to label. This idea, that color categories reflect the uses to which we put our color-vision system, is extended in Rosenthal et al., where the structure of color space itself is understood in terms of categorizing objects versus backgrounds (u') and the most basic object categorization distinction, animate versus inanimate (v'). The introduction argues, rightly in our view, that "A link between color categories and objects would be able to bridge the discrepancy between models that rely on communicative concepts to incorporate the varying usefulness of color, on the one hand, and the experimental findings laid out in this paragraph on the other". This is precisely the link forged by the observation that the warm-cool category distinction in color naming correlates with object-color statistics (Gibson, 2017; see also Rosenthal et al., 2018). The argument in Gibson and Rosenthal is that color categorization structure emerges because of the color statistics of the world, specifically the color statistics of the parts of the world that we label as objects, which is the same approach adopted by the present work. The use of CNNs is a clever and powerful test of the success of this approach.

      We are sorry we did not properly highlight the enormous importance of these two earlier papers in the previous version of the manuscript. We have now elaborated our description of Gibson’s work to better reflect the important relation between the usefulness of colors and color categories (page 2, middle; page 19, paragraph above the Methods). We think our work nicely extends the earlier work by showing that their approach works even at a more general level with more color categories,

    1. Author Response

      Reviewer #3 (Public Review):

      In this paper, metabolomics, proteomics, and lipidomics are combined for the first time to obtain more objective, multi-dimensional evidence about early and advanced post-mortem interval (PMI), compared with traditional PMI estimation methods that rely on the subjective judgment of morphology. The "ForensOMICS" pipeline establishes a multi-omics analysis workflow on an LC-MS platform, which is likely to influence and inspire related research on PMI estimation from molecular biomarkers in the foreseeable future. However, because of the limited availability of bone samples and metadata (which might contain covariates with latent influences on PMI estimation), the current research is still a proof-of-concept study and is not yet sufficient for the "ForensOMICS" approach to be applied in court.

      Strengths:

      By combining multiple omics and bioinformatics, the "ForensOMICS" approach is, as claimed by the authors, more accurate and precise than conventional morphological methods and molecular biological methods based on a single omics layer. Moreover, the research does not stop at developing time-dependent models using several omics biomarkers but goes on to perform enrichment analysis of the relevant markers to explore the pathophysiological mechanisms behind the large changes in the internal environment after death, providing meaningful reference data for basic forensic research on death.

      The Data Integration Analysis for Biomarker discovery using Latent variable approaches for Omics studies (DIABLO) method and multiple feature-selection tools are used in the bioinformatic workflow to analyze the multi-omics data, and a PMI classification model is constructed with PLS-DA, with parameters optimized by 3-fold cross-validation repeated 100 times. The overall analysis process is relatively complete, and the data and classification model provided are of scientific reference value.
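      As a rough illustration of the validation scheme named above (3-fold cross-validation repeated 100 times), the sketch below generates the train/test index splits in plain Python. It is a generic, non-stratified version written under our own assumptions: the function name `repeated_kfold_splits` and the `seed` parameter are hypothetical, and the sketch does not reproduce the DIABLO/PLS-DA tooling the authors actually used.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold_splits(
    n_samples: int, n_folds: int = 3, n_repeats: int = 100, seed: int = 0
) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper: each repeat reshuffles the samples and partitions
    them into n_folds disjoint test folds; no class stratification is done.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    order = list(range(n_samples))
    for repeat in range(n_repeats):
        rng.shuffle(order)  # new random partition for every repeat
        for fold in range(n_folds):
            # Every n_folds-th shuffled index goes to this test fold.
            test = sorted(order[fold::n_folds])
            held_out = set(test)
            train = [i for i in range(n_samples) if i not in held_out]
            yield repeat, fold, train, test
```

      A model would be fitted on each `train` set and scored on the matching `test` set, with scores averaged over all 3 × 100 = 300 fits; repeating the random partition mainly reduces the variance that a single split would introduce when samples are few.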

      In principle, the "ForensOMICS" workflow is compatible with metabolomics, proteomics, and lipidomics data obtained in other proof-of-concept studies on forensic time estimation (e.g., post-mortem submersion interval and time since deposition), because it offers a relatively complete analysis process.

      Weaknesses:

      Although the paper has strengths in principle, the limited availability of bone samples and metadata leads to its major weaknesses. In particular, the age-biased sample, the use of a single bone type, and the lack of analysis of environmental factors limit the extent to which the data presented support the key claims of the manuscript.

      The mean age of the body donors is 74 years with a standard deviation of 11.6 years, and only one type of bone tissue (left anterior midshaft tibia) was sampled. Differences in the structure and location of the sampled bone tissue, as well as metabolic changes and bone degeneration caused by aging, may lead to significant discrepancies across the multi-omics data. Moreover, most of the dead found at crime scenes are in the prime of life, and besides the tibia, the skeletal remains commonly found at scenes include the skull, ribs, upper limb bones, and teeth. Therefore, conclusions drawn from this limited set of bone samples cannot yet meet the practical needs of estimating the PMI of skeletal remains. As the authors mention in the discussion, because of the difficulty of acquiring human remains with definite post-mortem intervals, this study is still a proof of concept. If possible, the authors could focus on a larger sample set of different bones from younger age groups in future studies.

      The reviewer is describing exactly the purpose of this manuscript. As they highlight, this paper is not intended to be an applicable method for PMI estimation at this stage, as we are aware of the differences that may exist between skeletal elements in their omics results (at least for proteomics data, on which we have published several papers). Rather, it is a proof of concept demonstrating the potential of combining multiple omics to address PMI estimation. We are committed to increasing our sample size in order to develop a forensic technique for PMI estimation, which would then need to be validated on multiple skeletal elements.

      The tibia is frequently recovered from scenes, including those involving incomplete human remains subjected to long PMIs, and our previous studies have demonstrated that the midshaft tibia may be an ideal candidate for proteomics analyses because of its small intra-individual variability compared with other bones. Therefore, the anterior midshaft tibia was selected for this pilot study. We agree with the reviewer that such samples may not be representative of the whole-bone proteome, metabolome, and lipidome (particularly with regard to cortical and trabecular parts); however, this could be addressed in future studies on the topic.

      We agree with the reviewer about the possible confounding effect of the relatively high variability in age at death, which was due to the difficulty of acquiring human bodies with a known PMI.

      Although in-life physiological and/or pathological conditions (e.g., osteoporosis) might be responsible for the variability seen in several metabolites and proteins, both among baseline samples and between baseline samples and samples at different long PMIs, we believe the biological phenomena underlying PMI are strong enough to overcome such limitations in the experimental design. This is also supported by the small inter-individual variability observed among the fresh/baseline samples.

      It is suggested that metadata on factors that may influence PMI, such as temperature, humidity, UV exposure, and deposition context (which is already recorded), be recorded and statistically analyzed, so as to further optimize the "ForensOMICS" classification model by accounting for these possible environmental covariates. In addition, according to the No Free Lunch theorem, PLS-DA is very unlikely to be the optimal solution for all of the above PMI classification tasks based on multi-omics data under different environmental conditions. It is recommended to develop and compare other classification models to improve the generalization performance of the "ForensOMICS" approach.

      We agree with the reviewer that these factors are crucial in the decomposition process. In our opinion, however, it is not appropriate at this stage to include these metadata as covariates in the statistical analysis by applying additional classification models, because of the small sample size available. Additionally, the main focus of the paper is exclusively on PMI-driven modifications. Environmental data have been added for reference in Supplementary File 2 and will be taken into account in future work evaluating a larger sample.

      Because of the limited sample size and the discrete time points sampled, the omics data obtained in the paper could only be used to build a classification model rather than a regression model. Such a model does not give a specific predicted PMI whose performance can be summarized by MSE and RMSE, and the current "ForensOMICS" approach failed to distinguish between samples with late PMIs (219-834 days); the approach is therefore still some distance from application in actual forensic practice.

      Thank you for your comments. We agree, and have stressed throughout the manuscript, that this approach is far from applicable to forensic practice. The proof-of-concept nature of the study is a necessary step toward building a regression model that can in the future be challenged against the highly rigorous standards required in the forensic setting (i.e., the Daubert criteria). We appreciate the reviewer’s understanding of our choice to model the data using classification rather than regression.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors define regulatory networks across 77 tissue contexts using software they have previously published (PECA2, Duren et al. 2020). Each regulatory network is a set of nodes (transcription factors (TF), target genes (TG), and regulatory elements (RE)) and edges (regulatory scores connecting the nodes). For each context, the authors define context-specific REs, as those that do not overlap REs from any of the other 76 contexts, and context-specific regulatory networks as the collection of TFs, TGs, and REs connected to at least one context-specific RE. This approach essentially creates annotations that are aggregated across genes, elements, and specific contexts. For each tissue, the authors use linkage disequilibrium score regression (LDSC) to calculate enrichment for complex trait heritability within the set of all REs from the corresponding context-specific regulatory network. Heritability enrichments in context-specific regulatory network REs are compared with heritability enrichments in regions defined using other approaches.

      We thank the reviewer for the pertinent and precise summary of our paper.

      Reviewer #2 (Public Review):

      In this manuscript the authors develop a method, SpecVar, to perform heritability estimation from regulatory networks derived from gene expression and chromatin accessibility data. They apply this approach to public datasets available in ENCODE and Roadmap Epigenomics consortia as well as GWAS phenotype associations in UK Biobank. It promises to be a powerful method to interpret mechanisms from genetic associations. Below are some strengths and weaknesses of the paper.

      Strengths

      • The method performs heritability enrichment on two major genomic data types: gene expression and chromatin accessibility.

      • This method leverages gene regulatory networks to perform the heritability estimation, which may better capture complex disease architecture.

      • The authors perform an extensive comparison to other LDSC-based approaches using different tissue datasets.

      Weaknesses

      (1) This approach may represent a modest advance over existing LDSC methods when looking at other complex traits.

      (2) The authors only compare with LDSC using different functional annotations as input, which may not be appropriate. A broader comparison with other heritability methods would be helpful.

      (3) The method seems to be applied to "paired" data, but this is still bulk profiles not paired single-cell RNA/ATAC data.

      The authors successfully applied a regulatory network approach to improving the heritability estimation of complex traits by using both gene expression and chromatin accessibility data. While the results could be further strengthened by comparing them to other network and non-network-based methods, it provides important insight into a few traits beyond the standard LDSC model with different functional annotations.

      Given that this method is based on the widely used LDSC approach it should be broadly applied in the field. However, the authors should consider adapting this to single-cell data as well as admixed human population genetic data.

      We thank the reviewer for the positive comments on our work, specifically for pointing out that SpecVar is a powerful method to interpret mechanisms from genetic associations. We appreciate that the reviewer’s summary of “Strengths” captures our major contributions: building an atlas of regulatory networks by integrating paired gene expression and chromatin accessibility data, leveraging regulatory networks to perform heritability enrichment, and identifying relevant tissues and estimating relevance correlations. We also thank the reviewer for pointing out weaknesses, which helped us further strengthen our results. To address the comments, we (1) performed ablation studies and added more description to clarify the novelty of our method; (2) conducted an extensive comparison with another network-based method, CoCoNet, and a non-network-based method, RolyPoly; and (3) discussed the promising directions of identifying relevant contexts at the cell-type level by leveraging single-cell multi-omics profiles and of applying the method to admixed populations.

      Reviewer #3 (Public Review):

      Identifying the critical tissues and cell types in which genetic variants exert their effects on complex traits is an important question that has attracted increasing attention. Feng et al propose a new method, SpecVar, to first construct context-specific regulatory networks by integrating tissue-specific chromatin states and gene expression data, and then run stratified LD score regression (LDSC) to test if the constructed regulatory network in tissue is significantly associated with the trait, measured by a statistic called trait relevance score in this study. They apply their method to 6 traits for which there exists prior evidence on the most relevant tissues in the literature, and then further apply to 206 traits in the UK Biobank. They find that compared to LDSC using other sources of information to define context-specific annotations, their method can "improve heritability enrichment", "accurately detect relevant tissues", helps to "interpret SNPs" identified from GWAS, and "better reveals shared heritability and regulations of phenotypes" between traits.

      We thank the reviewer for the summary and appreciation of our efforts to address the important question: identifying the critical tissues and cell types in which genetic variants exert their effects on complex traits.

      However, I think it requires more work to understand where exactly the benefits come from and the statistical properties of their proposed test statistic (e.g., how to perform hypothesis tests with their relevance score and whether the false positive rate is under control). In addition, it's not clear to me what they can conclude about the shared heritability (which means genetic correlation) by comparing their relevance score correlation across tissues to the phenotypic correlation between traits.

      We thank the reviewer for the advice to enhance the statistical rigor of SpecVar. In the revision, we have added significance tests for the heritability enrichment and for our proposed R score. We also clarified that SpecVar can use common relevant contexts and shared SNP-associated regulatory networks as a potential explanation for the correlation between traits.

      They show that SpecVar gives much higher heritability enrichment than the other methods in the trait-relevant tissues (Fig. 2). The fold enrichment from SpecVar is extremely high, e.g., more than 600x in the right lobe of the liver for LDL. First, I think a standard error should be given so that the significance of the differences can be assessed. Second, it is very rare (hence suspicious) to observe such a huge enrichment. Since SpecVar is based on LDSC, the same methodology that other methods in comparison depend on, the differences to the other methods must come from the set of SNPs annotated for each tissue. I think it is important to understand the difference between the SpecVar annotated SNPs and those from other methods. For example, is the extra heritability enrichment mainly from the SpecVar-specific annotation or from the intersection narrowed down by SpecVar?

      The reviewer has pinpointed a question about one important advantage of our method: improved heritability enrichment. We addressed this question first by providing standard errors, p values, and q values for the heritability enrichment, and second by conducting an ablation analysis to study the source of the extra heritability enrichment. This question greatly helped us to clarify the main contribution of our method.

      They propose to use the relevance score (R score) to prioritise trait-relevant tissues. In Fig. 3, they show tissue-trait pairs with the highest R scores, and from there they prioritise several tissues for each trait (Table 1). I can see that some tissues have outstanding R scores; however, it is not clear to me where they draw the line to declare a positive result. The threshold does not even seem to be consistent across traits. For example, for LDL, only the right lobe of the liver is identified although other tissues have R scores greater than 100, whereas for EA, Ammon's horn and the adrenal gland are identified although their R scores are apparently smaller than 100. It seems to me they use some subjective criteria to pick the results. This raises a serious question about how to apply the R score in a hypothesis test: how is the uncertainty of the R score measured? What significance threshold should be used? Is the false positive rate under control? (Without knowing these statistical properties, readers won't be able to use this method with confidence in their own research.)

      We thank the reviewer for raising the question of hypothesis testing for the R score. In the revision, we used a block jackknife strategy to estimate standard errors, p values, and q values. We added the new results to the main text; they greatly enhance the statistical rigor of our method.
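      For readers unfamiliar with the block jackknife, the sketch below shows the generic delete-one-block estimator of a standard error in Python. It is only an illustration under stated assumptions: the per-SNP values, the `estimator` callable, and the function name `block_jackknife_se` are hypothetical placeholders, not SpecVar's or LDSC's implementation. Contiguous blocks are used because neighbouring SNPs are correlated through linkage disequilibrium, so leaving out whole blocks rather than single SNPs keeps that dependence inside each left-out unit.

```python
import math
from typing import Callable, Sequence, Tuple


def block_jackknife_se(
    values: Sequence[float],
    n_blocks: int,
    estimator: Callable[[Sequence[float]], float],
) -> Tuple[float, float]:
    """Return (full-data estimate, delete-one-block jackknife SE).

    Hypothetical helper: `values` are assumed to be ordered (e.g. by genomic
    position) so that each block is a contiguous run of observations.
    """
    n = len(values)
    if not 2 <= n_blocks <= n:
        raise ValueError("need 2 <= n_blocks <= len(values)")
    # Contiguous, non-empty block boundaries covering all observations.
    bounds = [i * n // n_blocks for i in range(n_blocks + 1)]
    full = estimator(values)
    leave_one_out = []
    for b in range(n_blocks):
        # Re-estimate with block b removed.
        kept = list(values[: bounds[b]]) + list(values[bounds[b + 1]:])
        leave_one_out.append(estimator(kept))
    mean_loo = sum(leave_one_out) / n_blocks
    # Jackknife variance: (B - 1) / B times the sum of squared deviations.
    var = (n_blocks - 1) / n_blocks * sum((t - mean_loo) ** 2 for t in leave_one_out)
    return full, math.sqrt(var)
```

      With one observation per block and the sample mean as the estimator, this reduces to the familiar standard error of the mean: for `[1, 2, 3, 4, 5]` with five blocks it returns an estimate of 3.0 and an SE of about 0.707. A normal-approximation p value could then be taken from the z score (estimate minus its null value, divided by the SE); that last step is again an assumption about typical practice rather than a description of SpecVar.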

      A related comment: to investigate false positive associations, they should show the results for all tissues tested, to see whether SpecVar tends to give higher R scores even in tissues that are not relevant to the trait. It would also be useful to include some negative control traits, such as height for brain tissues.

      We agree that negative controls are important, and the six phenotypes in our manuscript serve as negative controls for one another. For example, LDL is relevant to liver tissue but not to brain tissue, whereas educational attainment is relevant to brain tissue but not to liver tissue.

      Fig. 3 shows that tissues prioritised by LDSC-SAP and LDSC-SEG seem to make less sense than those from SpecVar. However, some of the results are not consistent with the LDSC-SEG paper (Finucane et al 2018). For example, LDL was significantly associated with the liver in Finucane et al (Fig. 2), but not in this study. How to explain the difference? (Question 3)

      We checked the results in Figure 3 and found that, even though the liver was not ranked among the top five tissues, it has a significant P-value for LDL in our implementation. There are indeed some differences in heritability enrichment and P-values between the LDSC-SEG paper and our implementation, and these differences arise from the different sets of tissues used in the two applications (77 tissues in our paper and 53 tissues in the LDSC-SEG paper).

      The authors highlight an example where SpecVar facilitates the interpretation of GWAS signals near FOXC2. They find GWAS-significant SNPs located in a CNCC-specific RE downstream of FOXC2 and reason that these SNPs affect brain shape by regulating the expression of FOXC2. I think more work can be done to consolidate the conclusion, for example, by testing whether the GWAS signals are colocalised with the eQTL for FOXC2 in the brain. Also, note that the top GWAS signal is actually to the left of the CNCC-specific RE (Fig. 4b). A deeper investigation is warranted.

      We agree that more work should be done to consolidate the regulation of FOXC2. In our revision, we used HiChIP loops in the brain to support the SNP-associated regulation of FOXC2. We also thank the reviewer for suggesting eQTL colocalization; we conducted an eQTL colocalization analysis on the SNP-associated regulation revealed by our method to show that it can facilitate fine mapping of GWAS signals. Lastly, brain shape is a complex trait that may be relevant to multiple tissues. Hence it is reasonable to suspect that the top GWAS signal may be active in regulatory elements of other relevant tissues.

      They show that SpecVar's relevance score correlation across tissues can better approximate the phenotypic correlation between traits. However, estimating the phenotypic correlation between traits is neither very interesting nor difficult (it can be directly estimated from GWAS summary statistics). A more interesting question is to what extent the observed phenotypic correlation is due to common genetic factors acting in the tissues/cell types/pathways/regulatory networks shared between traits. Note that in their Abstract, they use the words "depict shared heritability and regulations" but I don't see results supporting that.

      We are sorry that we did not make it clear how SpecVar can “depict shared heritability and regulations”. We added more results, including an example from the UKBB application, to show that SpecVar can use common relevant contexts and shared SNP-associated regulatory networks as a potential explanation for the correlation between traits.

      Line 396-402: "For example, ... heritability could select most relevant tissues ... but failed to get correct tissues for other phenotypes ... P-value could obtain correct tissues for CP ... but failed to get correct tissues for ... SpecVar could prioritize correct relevant tissues for all the six phenotypes." Honestly, I find it hard to judge which tissues are "correct" or "incorrect" for a trait in real life. It would be more straightforward to compare methods using simulations in which we know which tissues are causal.

      We thank the reviewer for pinpointing the improper use of “correct”. It is difficult to find phenotypes with gold-standard relevant tissues, so we used six relatively well-studied phenotypes with prior knowledge of possible relevant tissues. We have revised the statements using “correct” in the revision.

    1. Author Response

      Reviewer #1 (Public Review):

      Trudel and colleagues aimed to uncover the neural mechanisms of estimating the reliability of the information from social agents and non-social objects. By combining functional MRI with a behavioural experiment and computational modelling, they demonstrated that learning from social sources is more accurate and robust compared with that from non-social sources. Furthermore, dmPFC and pTPJ were found to track the estimated reliability of the social agents (as opposed to the non-social objects). The strength of this study is to devise a task consisting of the two experimental conditions that were matched in their statistical properties and only differed in their framing (social vs. non-social). The novel experimental task allows researchers to directly compare the learning from social and non-social sources, which is a prominent contribution of the present study to social decision neuroscience.

      Thank you so much for your positive feedback about our work. We are delighted that you found that our manuscript provided a prominent contribution to social decision neuroscience. We really appreciate your time to review our work and your valuable comments that have significantly helped us to improve our manuscript further.

      One of the major weaknesses is the lack of a clear description of the conceptual novelty. Learning about the reliability/expertise of social and non-social agents has been of considerable concern in social neuroscience (e.g., Boorman et al., Neuron 2013; and Wittmann et al., Neuron 2016). The authors could do a better job of clarifying the novelty of the study beyond the previous literature.

      We understand the reviewer’s comment and have made changes to the manuscript that, first, highlight more strongly the novelty of the current study. Crucially, second, we have also supplemented the data analyses with a new model-based analysis of the differences in behaviour in the social and non-social conditions which we hope makes clearer, at a theoretical level, why participants behave differently in the two conditions.

      There has long been interest in investigating whether ‘social’ cognitive processes are special or unique compared to ‘non-social’ cognitive processes and, if they are, what makes them so. Differences between conditions could arise during the input stage (e.g. the type of visual input that is processed by social and non-social systems), at the algorithm stage (e.g. the type of computational principles that underpin social versus non-social processes) or, even if identical algorithms are used, social and non-social processes might depend on distinct anatomical brain areas or neurons within brain areas. Here, we conducted multiple analyses (in Figures 2, 3, and 4 in the revised manuscript and in Figure 2 – figure supplement 1, Figure 3 – figure supplement 1, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4) that not only demonstrated basic similarities in mechanism generalised across social and non-social contexts, but also demonstrated important quantitative differences that were linked to activity in specific brain regions associated with the social condition. The additional analyses (Figure 4 – figure supplement 3, Figure 4 – figure supplement 4) show that differences are not simply a consequence of differences in the visual stimuli that are inputs to the two systems 1, nor does the type of algorithm differ between conditions. Instead, our results suggest that the precise manner in which an algorithm is implemented differs when learning about social or non-social information and that this is linked to differences in neuroanatomical substrates.

      The previous studies mentioned by the reviewer are, indeed, relevant ones and were, of course, part of the inspiration for the current study. However, there are crucial differences between them and the current study. In the case of the previous studies by Wittmann, the aim was a very different one: to understand how one’s own beliefs, for example about one’s performance, and beliefs about others, for example about their performance levels, are combined. Here, instead, we were interested in the similarities and differences between social and non-social learning. It is true that the question resembles the one addressed by Boorman and colleagues in 2013, who looked at how people learned about the advice offered by people or computer algorithms, but the difference in the framing of that study perhaps contributed to the authors’ finding of little difference in learning. By contrast, in the present study we found evidence that people were predisposed to perceive stability in social performance and to be uncertain about non-social performance. By accumulating evidence across multiple analyses, we show that there are quantitative differences in how we learn about social versus non-social information, and that these differences can be linked to the way in which learning algorithms are implemented neurally. We therefore contend that our findings extend our previous understanding of how, in relation to other learning processes, ‘social’ learning has both shared and special features.

      We would like to emphasize the way in which we have extended several of the analyses throughout the revision. The theoretical Bayesian framework has made it possible to simulate key differences in behaviour between the social and non-social conditions. We explain in our point-by-point reply below how we have integrated a substantial number of new analyses. We have also more carefully related our findings to previous studies in the Introduction and Discussion.

      Introduction, page 4:

      [...] Therefore, by comparing information sampling from social versus non-social sources, we address a long-standing question in cognitive neuroscience, the degree to which any neural process is specialized for, or particularly linked to, social as opposed to non-social cognition 2–9. Given their similarities, it is expected that both types of learning will depend on common neural mechanisms. However, given the importance and ubiquity of social learning, it may also be that the neural mechanisms that support learning from social advice are at least partially specialized and distinct from those concerned with learning that is guided by non-social sources. It is less clear, however, at which level information is processed differently when it has a social or non-social origin. It has recently been argued that differences between social and non-social learning can be investigated on different levels of Marr’s information processing theory: differences could emerge at an input level (in terms of the stimuli that might drive social and non-social learning), at an algorithmic level or at a neural implementation level 7. It might be that, at the algorithmic level, associative learning mechanisms are similar across social and non-social learning 1. Other theories have argued that differences might emerge because goal-directed actions are attributed to social agents, which allows for very different inferences to be made about hidden traits or beliefs 10. Such inferences might fundamentally alter learning about social agents compared to non-social cues.

      Discussion, page 15:

      […] One potential explanation for the assumption of stable performance for social but not non-social predictors might be that participants attribute intentions and motivations to social agents. Even if the social and non-social evidence are the same, the belief that a social actor might have a goal may affect the inferences made from the same piece of information 10. Social advisors first learnt about the target’s distribution and accordingly gave advice on where to find the target. If the social agents are credited with goal-directed behaviour, then it might be assumed that their goals remain relatively constant; this might lead participants to assume stability in the performances of social advisors. However, such goal-directed intentions might not be attributed to non-social cues, thereby making judgments inherently more uncertain and changeable across time. Such an account, focussing on differences in attribution in social settings, aligns with a recent suggestion that similarities or differences between social and non-social processes can be sought at any of several levels of Marr’s information processing theory 7. Here we found that the same algorithm was able to explain social and non-social learning (a qualitatively similar computational model could explain both). However, the extent to which the algorithm was recruited when learning about social compared to non-social information differed. We observed a greater impact of uncertainty on judgments about non-social compared to social information. We have shown evidence for a degree of specialization when assessing social advisors as opposed to non-social cues. At the neural level we focused on two brain areas, dmPFC and pTPJ, that have not only been shown to carry signals associated with belief inferences about others but, in addition, recent combined fMRI-TMS studies have demonstrated the causal importance of these activity patterns for the inference process […]

      Another weakness is the lack of justification for the behavioural data analyses. It is difficult for me to understand why 'performance matching' is a suitable index of learning accuracy. I understand the optimal participant would adjust the interval size with respect to the estimated reliability of the advisor (i.e., angular error); however, I am wondering if the optimal strategy for participants is to exactly match the interval size with the angular error. Furthermore, the definitions of 'confidence adjustment across trials' and 'learning index' look arbitrary.

      First, having read the reviewer’s comments, we realise that our choice of the term ‘performance matching’ may not have been ideal as it indeed might not be the case that the participant intended to directly match their interval sizes with their estimates of advisor/predictor error. Like the reviewer, our assumption is simply that the interval sizes should change as the estimated reliability of the advisor changes and, therefore, that the intervals that the participants set should provide information about the estimates that they hold and the manner in which they evolve. On re-reading the manuscript we realised that we had not used the term ‘performance matching’ consistently or in many places in the manuscript. In the revised manuscript we have simply removed it altogether and referred to the participants’ ‘interval setting’.

      Most of the initial analyses in Figure 2a-c aim to better understand the raw behaviour before applying any computational model to the data. We were interested in how participants make confidence judgments (decision-making per se), but also in how they adapt their decisions with additional information (changes or learning in decision making). In the revised manuscript we have made clear that these are used as simple behavioural measures and that they are complemented later by analyses derived from formal computational models.

      In what we now refer to as the ‘interval setting’ analysis (Figure 2a), we tested whether participants select their interval settings differently in the social compared to non-social condition. We observe that participants set their intervals closer to the true angular error of the advisor/predictor in the social compared to the non-social condition. This observation could arise in two ways. First, it could be due to quantitative differences in learning despite general, qualitative similarity: mechanisms are similar but participants differ quantitatively in the way that they learn about non-social information and social information. Second, it could, however, reflect fundamentally different strategies. We tested basic performance differences by comparing the mean reward between conditions. There was no difference in reward between conditions (mean reward: paired t-test social vs. non-social, t(23)= 0.8, p=0.4, 95% CI= [-0.007 0.016]), suggesting that interval setting differences might not simply reflect better or worse performance in social or non-social contexts but instead might reflect quantitative differences in the processes guiding interval setting in the two cases.
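      The reward comparison above is a standard paired t-test across participants (t(23), i.e. 24 participants). The sketch below shows that computation using only the Python standard library, under stated assumptions: the function name `paired_t_test` is hypothetical, the two-sided critical value is passed in rather than computed (about 2.069 for 23 degrees of freedom at the 95% level), and the numbers in the usage lines are toy values, not the study's data.

```python
import math
import statistics


def paired_t_test(x, y, t_crit):
    """Paired t statistic, degrees of freedom and confidence interval for mean(x - y).

    t_crit is the two-sided critical value of Student's t for len(x) - 1
    degrees of freedom (about 2.069 for df = 23 at the 95% level). It is
    passed in so that this sketch needs only the standard library.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    t_stat = mean_diff / se
    ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)
    return t_stat, len(diffs) - 1, ci


if __name__ == "__main__":
    # Toy per-participant rewards (not the study's data); df = 3, so t_crit is about 3.182.
    social = [3.0, 4.0, 5.0, 6.0]
    non_social = [2.0, 2.0, 2.0, 2.0]
    t_stat, df, ci = paired_t_test(social, non_social, t_crit=3.182)
    print(round(t_stat, 3), df)  # 3.873 3
```

      A confidence interval that contains zero, as reported for mean reward, is what allows the "no overall performance difference" reading.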

      In the next set of analyses, in which we compared raw data, applied a computational model, and provided a theoretical account for the differences between conditions, we suggest that there are simple quantitative differences in how information is processed in social and non-social conditions but that these have the important impact of making long-term representations (representations built up over a longer series of trials) more important in the social condition. This, in turn, has implications for the neural activity patterns associated with social and non-social learning. We therefore agree with the reviewer that no one manner of interval setting is more optimal than another. However, the differences that do exist in behaviour are important because they reveal something about social and non-social learning and their neural substrates. We have adjusted the wording and interpretation in the revised manuscript.

      Next, we analysed interval setting with two additional, related analyses: interval setting adjustment across trials and derivation of a learning index. We tested the degree to which participants adjusted their interval setting across trials and according to the prediction error (learning index, Figure 2f); the latter analysis is very similar to a trial-wise learning rate calculated in previous studies11. In contrast to many other studies, the intervals set by participants provide information about the estimates that they hold in a simple and direct way and enable calculation of a trial-wise learning index. We therefore decided to call it a ‘learning index’ instead of a ‘learning rate’, as it is not estimated via a model applied to the data but instead calculated directly from the data. Arguably the directness of the approach, and its lack of dependence on a specific computational model, is a strength of the analysis.
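      The exact formula for the learning index is not given in this excerpt. The sketch below assumes the common trial-wise definition: the change in interval between consecutive encounters of the same predictor, divided by the preceding prediction error (observed angular error minus the interval that was set), with intervals and errors in the same units. The function and argument names are hypothetical.

```python
import math


def learning_index(intervals, observed_errors):
    """Trial-wise learning index for one predictor (hypothetical definition).

    intervals[t]       -- interval set on encounter t, before the outcome is shown
    observed_errors[t] -- angular error revealed on encounter t
    Returns one value per pair of consecutive encounters: the change in the
    interval divided by the preceding prediction error, or NaN when that
    prediction error is zero and the ratio is undefined.
    """
    indices = []
    for t in range(len(intervals) - 1):
        prediction_error = observed_errors[t] - intervals[t]
        update = intervals[t + 1] - intervals[t]
        indices.append(update / prediction_error if prediction_error != 0 else math.nan)
    return indices


if __name__ == "__main__":
    # Toy encounters: a value of 1 would mean the interval moved all the way to the last error.
    print(learning_index([10, 14, 13], [18, 10, 13]))  # [0.5, 0.25]
```

      Because the ratio is taken directly from the intervals, no model fit is involved, which is the property the text emphasizes.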

      Subsequently in the manuscript, a new analysis (illustrated in new Figure 3) employs Bayesian models that can simulate the differences in the social and non-social conditions and demonstrate that a number of behavioural observations can arise simply as a result of differences in noise in each trial-wise Bayesian update (Figure 3 and specifically 3d; Figure 3 – figure supplement 1b-c). In summary, the descriptive analyses in Figure 2a-c aid an intuitive understanding of the differences in behaviour in the social and non-social conditions. We have then repeated these analyses with Bayesian models incorporating different noise levels and showed that in such a way, the differences in behaviour between social and non-social conditions can be mimicked (please see next section and manuscript for details).

      We adjusted the wording in a number of places in the revised manuscript, for example in Figure 2 (panels and legend) and Figure 4 (panels and legend).

      Main text, page 5:

      The confidence interval could be changed continuously to make it wider or narrower, by pressing buttons repeatedly (one button press resulted in a change of one step in the confidence interval). In this way participants provided what we refer to as an ‘interval setting’.

      We also adjusted the following section in Main text, page 6:

      Confidence in the performance of social and non-social advisors

      We compared trial-by-trial interval setting in relation to the social and non-social advisors/predictors. When setting the interval, the participant’s aim was to minimize it while ensuring it still encompassed the final target position; points were won when it encompassed the target position, and more points were won the narrower it was. A given participant’s interval setting should, therefore, change in proportion to the participant’s expectations about the predictor’s angular error and their uncertainty about those expectations. Even though, on average, social and non-social sources did not differ in the precision with which they predicted the target (Figure 2 – figure supplement 1), participants set intervals that related differently to the true performances of the social advisors compared to the non-social predictors. The interval setting was closer to the angular error in the social compared to the non-social sessions (Figure 2a, paired t-test: social vs. non-social, t(23)= -2.57, p= 0.017, 95% confidence interval (CI)= [-0.36 -0.4]). Differences in interval setting might be due to generally lower performance in the non-social compared to social condition, or potentially due to fundamentally different learning processes utilised in either condition. We compared the mean reward amounts obtained by participants in the social and non-social conditions to determine whether there were overall performance differences. There was, however, no difference in the reward received by participants in the two conditions (mean reward: paired t-test social vs. non-social, t(23)= 0.8, p=0.4, 95% CI= [-0.007 0.016]), suggesting that interval setting differences might not simply reflect better or worse performance.

      Discussion, page 14:

      Here, participants did not match their confidence to the likely accuracy of their own performance, but instead to the performance of another social or non-social advisor. Participants used different strategies when setting intervals to express their confidence in the performances of social advisors as opposed to non-social advisors. A possible explanation might be that participants have a better insight into the abilities of social cues – typically other agents – than non-social cues – typically inanimate objects.

      As the authors assumed simple Bayesian learning for the estimation of reliability in this study, the degree/speed of the learning should be examined with reference to the distance between the posterior and prior belief in the optimal Bayesian inference.

      We thank the reviewer for this suggestion. We agree with the reviewer that further analyses that aim to disentangle the underlying mechanisms that might differ between both social and non-social conditions might provide additional theoretical contributions. We show additional model simulations and analyses that aim to disentangle the differences in more detail. These new results allowed clearer interpretations to be made.

      In the current study, we showed that judgments made about non-social predictors were changed more strongly as a function of the subjective uncertainty: participants set a larger interval, indicating lower confidence, when they were more uncertain about the non-social cue’s accuracy to predict the target. In response to the reviewer’s comments, the new analyses were aimed at understanding under which conditions such a negative uncertainty effect might emerge.

      Prior expectations of performance

      First, we compared whether participants had different prior expectations in the social condition compared to the non-social condition. One way to compare prior expectations is by comparing the first interval set for each advisor/predictor. This is a direct readout of the initial prior expectation with which participants approach our two conditions. In this way, we test whether the prior beliefs before observing any social or non-social information differ between conditions. Even though this does not test the impact of prior expectations on subsequent belief updates, it does test whether participants have generally different expectations about the performance of social advisors or non-social predictors. There was no difference in this measure between social and non-social cues (Figure below; paired t-test social vs. non-social, t(23)= 0.01, p=0.98, 95% CI= [-0.067 0.68]).

      Figure. Confidence interval for the first encounter of each predictor in social and non-social conditions. There was no initial bias in predicting the performance of social or non-social predictors.

      Learning across time

      We have now seen that participants do not have an initial bias when predicting performances in social or non-social conditions. This suggests that differences between conditions might emerge across time when encountering predictors multiple times. We tested whether inherent differences in how beliefs are updated according to new observations might result in different impacts of uncertainty on interval setting between social and non-social conditions. More specifically, we tested whether the integration of new evidence differed between social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues while past observations might be weighted more strongly for social cues. This approach was inspired by the reviewer’s comments about potential differences in the speed of learning as well as the reduction of uncertainty with increasing predictor encounters. Similar ideas were tested in previous studies, when comparing the learning rate (i.e. the speed of learning) in environments of different volatilities 12,13. In these studies, a smaller learning rate was prevalent in stable environments, in which reward rates change more slowly over time, while higher learning rates often reflect learning in volatile environments, so that recent observations have a stronger impact on behaviour. Even though most studies derived these learning rates with reinforcement learning models, similar ideas can be translated into a Bayesian model. For example, an established way of changing the speed of learning in a Bayesian model is to introduce noise during the update process14. This noise is equivalent to mixing some of the initial prior distribution back into the belief, which makes the Bayesian updates more flexible in adapting to changing environments. It widens the belief distribution and thereby makes it more uncertain.

      Recent information carries more weight in the belief update within a Bayesian model when beliefs are uncertain. This increases the speed of learning. In other words, a wide distribution (after adding noise) allows for quick integration of new information. By contrast, a narrow distribution does not integrate new observations as strongly and instead relies more heavily on previous information; this corresponds to a small learning rate. We would therefore expect a steep decline of uncertainty to be related to a smaller learning index, and a slower decline of uncertainty to be related to a larger learning index. We hypothesized that participants reduce their uncertainty more quickly when observing social information, thereby anchoring more strongly on previous beliefs instead of integrating new observations flexibly. Conversely, we hypothesized a less steep decline of uncertainty when observing non-social information, indicating that new information can be flexibly integrated during the belief update (new Figure 3a).
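      The claim that uncertain beliefs give recent observations more weight can be made concrete with a Gaussian analogue. This is an illustrative assumption, not the manuscript's model (which mixes a uniform prior over angular error into each update rather than using Gaussian beliefs). With belief mean m_t and variance s_t^2, observation y_t with noise variance r^2, and an added variance q standing in for the noise term, the standard Kalman-filter update is:

```latex
\begin{aligned}
m_{t+1} &= m_t + k_t\,(y_t - m_t), \qquad k_t = \frac{s_t^2}{s_t^2 + r^2},\\
s_{t+1}^2 &= (1 - k_t)\,s_t^2 + q .
\end{aligned}
```

      The weight k_t on the newest observation acts as a learning rate. With q = 0 the variance shrinks on every trial, so k_t tends to zero and learning slows; with q > 0 the variance settles where k_t s_t^2 = q, so k_t stays above zero and new observations keep moving the estimate.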

      We modified the original Bayesian model (Figure 2d, Figure 2 – figure supplement 2) by adding a uniform distribution (equivalent to our prior distribution) to each belief update; we refer to this as noise addition to the Bayesian model14,21. We varied the amount of noise δ between 0 and 1, where δ = 0 corresponds to the original Bayesian model and δ = 1 represents a very noisy Bayesian model. The uniform distribution was selected to match the first prior belief before any observation was made (equation 2). This range of δ resulted in a continuous increase of subjective uncertainty around the belief about the angular error (Figure 3b-c). The modified posterior distribution, denoted 𝑝′(σ|x), was derived at each trial as follows:

      We applied each noisy Bayesian model to participants’ choices within the social and non-social conditions.

      The addition of a uniform distribution changed two key features of the belief distribution. First, the width of the distribution remains larger with additional observations, thereby making it possible to integrate new observations more flexibly. To show this more clearly, we extracted the model-derived uncertainty estimate across multiple encounters of the same predictor for the original model and the fully noisy Bayesian model (Figure 3 – figure supplement 1). The model-derived ‘uncertainty estimate’ of a noisy Bayesian model decays more slowly than the ‘uncertainty estimate’ of the original Bayesian model (upper panel). Second, the model-derived ‘accuracy estimate’ reflects more recent observations in a noisy Bayesian model than the ‘accuracy estimate’ derived from the original Bayesian model, which integrates past observations more strongly (lower panel). Hence, as mentioned previously, a rapid decay of uncertainty implies a small learning index; in other words, stronger integration of past compared to recent observations.
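      The two effects described above, a slower decay of uncertainty and a belief that stays responsive to new observations, can be reproduced with a small grid-based sketch. Several details are assumptions not taken from the manuscript: the grid of candidate angular errors (1 to 90 degrees), a half-normal likelihood for an observed angular error, summarizing the belief by its mean (as an ‘accuracy estimate’) and standard deviation (as an ‘uncertainty estimate’), and mixing the uniform prior back in as (1 − δ)·posterior + δ·uniform after each update. All names are hypothetical.

```python
import numpy as np

# Hypothetical grid of candidate angular errors (degrees) for one predictor.
SIGMA_GRID = np.linspace(1.0, 90.0, 400)
# Flat prior over the grid, standing in for the uniform first prior belief.
UNIFORM = np.full_like(SIGMA_GRID, 1.0 / SIGMA_GRID.size)


def half_normal_likelihood(x, sigma):
    """Assumed likelihood of observing angular error x if the predictor's error scale is sigma."""
    return np.sqrt(2.0 / np.pi) / sigma * np.exp(-(x ** 2) / (2.0 * sigma ** 2))


def noisy_update(belief, x, delta):
    """Bayesian update, then mix the uniform prior back in with weight delta (0 = original model)."""
    posterior = belief * half_normal_likelihood(x, SIGMA_GRID)
    posterior /= posterior.sum()
    return (1.0 - delta) * posterior + delta * UNIFORM


def summarize(belief):
    """Belief mean ('accuracy estimate') and standard deviation ('uncertainty estimate')."""
    mean = float(np.sum(belief * SIGMA_GRID))
    sd = float(np.sqrt(np.sum(belief * (SIGMA_GRID - mean) ** 2)))
    return mean, sd


def run(observations, delta):
    """Apply noisy_update to a sequence of observed angular errors, starting from the flat prior."""
    belief = UNIFORM.copy()
    for x in observations:
        belief = noisy_update(belief, x, delta)
    return summarize(belief)


if __name__ == "__main__":
    # Toy sequence: ten encounters with an error of 20 degrees, then two with 60 degrees.
    observations = [20.0] * 10 + [60.0] * 2
    for delta in (0.0, 0.3):
        mean, sd = run(observations, delta)
        print(f"delta={delta}: accuracy estimate {mean:.1f}, uncertainty estimate {sd:.1f}")
```

      With δ > 0 a fixed share of probability mass is returned to the flat prior on every trial, so the belief can never become as narrow as in the original model.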

      In the following analyses, we tested whether an increasingly noisy Bayesian model mimics behaviour that is observed in the non-social compared to social condition. For example, we tested whether an increasingly noisy Bayesian model also exhibits a strongly negative ‘predictor uncertainty’ effect on interval setting (Figure 2e). In this way, we can test whether differences in noise in the updating process of a Bayesian model might reproduce important qualitative differences in learning-related behaviour seen in the social and non-social conditions.

      We used these modified Bayesian models to simulate trial-wise interval setting for each participant according to the observations they made when selecting a particular advisor or non-social cue. We simulated interval setting at each trial and examined whether an increase in noise produced model behaviours that resembled participant behaviour patterns observed in the non-social condition as opposed to the social condition. At each trial, we used the accuracy estimate (Methods, equation 6), which represents a subjective belief about a single angular error, to derive an interval setting for the selected predictor. To do so, we first derived the point estimate of the belief distribution at each trial (Methods, equation 6) and multiplied it by the size of one interval step on the circle. The step size was derived by dividing the circle size by the maximum number of possible steps. Here is an example of transforming an accuracy estimate into an interval. Assume the belief about the angular error on the current trial is 50 degrees (Methods, equation 6), and we want to transform this number into an interval for the current predictor on a given trial. To obtain the size of one interval step, the circle size (360 degrees) is divided by the maximum number of interval steps (40 steps; note, 20 steps on each side), which results in nine degrees, the size of one interval step. Next, the accuracy estimate in radians (0.87) is multiplied by the step size in radians (0.1571), resulting in an interval of 0.137 radians or 7.85 degrees. The final interval size would be 7.85 degrees.
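      The worked example above can be checked with a few lines of Python. The sketch simply mirrors the steps as described (convert the accuracy estimate and the 9-degree step to radians, multiply them, and report the result in degrees); the function and constant names are hypothetical.

```python
import math

CIRCLE_DEG = 360.0  # full circle
MAX_STEPS = 40      # maximum number of interval steps (20 on each side)


def accuracy_to_interval(accuracy_estimate_deg):
    """Follow the described conversion: estimate (rad) x step size (rad), reported in degrees."""
    step_deg = CIRCLE_DEG / MAX_STEPS  # 9 degrees per interval step
    interval_rad = math.radians(accuracy_estimate_deg) * math.radians(step_deg)
    return math.degrees(interval_rad)


if __name__ == "__main__":
    print(round(accuracy_to_interval(50.0), 2))  # 7.85
```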

      Simulating Bayesian choices in this way, we repeated the behavioural analyses (Figure 2b,e,f) to test whether intervals derived from noisier Bayesian models mimic intervals set by participants in the non-social condition: greater changes in interval setting across trials (Figure 3 – figure supplement 1b), a negative ‘predictor uncertainty’ effect on interval setting (Figure 3d), and a higher learning index (Figure 3 – figure supplement 1c).

      First, we repeated the most crucial analysis, the linear regression analysis (Figure 2e), and hypothesized that intervals simulated from noisy Bayesian models would also show a greater negative ‘predictor uncertainty’ effect on interval setting. This was indeed the case: irrespective of social or non-social conditions, the addition of noise (increased weighting of the uniform distribution in each belief update) led to an increasingly negative ‘predictor uncertainty’ effect on confidence judgment (new Figure 3d). In Figure 3d, we show the regression weights (y-axis) for ‘predictor uncertainty’ on confidence judgment with increasing noise (x-axis). This result is highly consistent with the idea that, in the non-social condition, task estimates are updated in a more uncertain and noisier manner. By contrast, social estimates appear relatively more stable, also according to this new Bayesian simulation analysis.
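      Each point on a curve like that in Figure 3d is, in essence, an ordinary least-squares coefficient computed on simulated judgments. A minimal NumPy sketch, assuming z-scored variables and only two regressors (an accuracy estimate and an uncertainty estimate); the full regressor set and sign convention of Figure 2e are not listed in this excerpt, and the function and argument names are hypothetical. Repeating the fit for each noise level δ would trace out one such curve.

```python
import numpy as np


def regression_weights(judgment, accuracy_estimate, uncertainty_estimate):
    """OLS weights of z-scored regressors on a z-scored judgment (hypothetical regressor set)."""

    def zscore(values):
        values = np.asarray(values, dtype=float)
        return (values - values.mean()) / values.std()

    design = np.column_stack([
        np.ones(len(judgment)),          # intercept
        zscore(accuracy_estimate),       # belief about the predictor's angular error
        zscore(uncertainty_estimate),    # 'predictor uncertainty'
    ])
    beta, *_ = np.linalg.lstsq(design, zscore(judgment), rcond=None)
    return {"intercept": beta[0], "accuracy": beta[1], "uncertainty": beta[2]}


if __name__ == "__main__":
    # Toy data constructed so that the uncertainty weight is negative.
    rng = np.random.default_rng(0)
    acc = rng.normal(size=200)
    unc = rng.normal(size=200)
    print(regression_weights(2.0 * acc - 1.0 * unc, acc, unc))
```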

      This new finding extends the results and suggests a formal computational account of the behavioural differences between social and non-social conditions. Increasing the noise of the belief update mimics behaviour that is observed in the non-social condition: an increasingly negative effect of ‘predictor uncertainty’ on confidence judgment. Notably, there was no difference in the impact that the noise had in the social and non-social conditions. This was expected because the Bayesian simulations are blind to the framing of the conditions. However, it also means that the observed effects do not depend on the precise sequence of choices that participants made in these conditions. It therefore suggests that an increase in Bayesian noise leads to an increasingly negative impact of ‘predictor uncertainty’ on confidence judgments irrespective of the condition. Hence, we can conclude that different degrees of uncertainty within the belief update are a reasonable explanation for the differences observed between social and non-social conditions.

      Next, we used these simulated confidence intervals and repeated the descriptive behavioural analyses to test whether interval settings derived from noisier Bayesian models mimic behavioural patterns observed in the non-social compared to the social condition. For example, more noise in the belief update should lead to more flexible integration of new information and hence potentially to a greater change of confidence judgments across predictor encounters (Figure 2b). Further, a greater reliance on recent information should cause prediction errors to be reflected more strongly in the next confidence judgment; hence, it should result in a higher learning index in the non-social condition, which we hypothesize to be perceived as more uncertain (Figure 2f). We used the simulated confidence intervals from Bayesian models on a continuum of noise integration (i.e. different weighting of the uniform distribution in the belief update) and again derived both absolute confidence change and learning indices (Figure 3 – figure supplement 1b-c).
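      The ‘absolute confidence change’ measure is not defined formally in this excerpt. A plausible reading, assumed here, is the mean absolute change in interval between consecutive encounters of the same predictor, pooled over predictors. A short sketch with a hypothetical function name and toy data:

```python
def absolute_confidence_change(intervals_by_predictor):
    """Mean absolute change in interval between consecutive encounters of the same predictor.

    intervals_by_predictor maps a (hypothetical) predictor label to the list of
    intervals set on its successive encounters.
    """
    changes = [
        abs(seq[t + 1] - seq[t])
        for seq in intervals_by_predictor.values()
        for t in range(len(seq) - 1)
    ]
    return sum(changes) / len(changes) if changes else float("nan")


if __name__ == "__main__":
    print(absolute_confidence_change({"advisor_A": [10, 12, 11], "advisor_B": [5, 9]}))
```

      Applied to intervals simulated at each noise level, a measure of this kind would be expected to rise with δ, in line with the text.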

      ‘Absolute confidence change’ and ‘learning index’ increase with increasing noise weight, thereby mimicking the difference between social and non-social conditions. Further, these analyses demonstrate the tight relationship between descriptive analyses and model-based analyses. They show that noise in the Bayesian updating process is a conceptual explanation that can account for both the differences in learning and the difference in uncertainty processing that exist between social and non-social conditions. The key insight conveyed by the Bayesian simulations is that a wider, more uncertain belief distribution changes more quickly. Correspondingly, in the non-social condition, participants express more uncertainty in their confidence estimate when they set the interval, and they also change their beliefs more quickly, as expressed in a higher learning index. Therefore, noisy Bayesian updating can account for key differences between the social and non-social conditions.

      We thank the reviewer for making this point, as we believe that these additional analyses allow theoretical inferences to be made in a more direct manner; we think that they have contributed significantly towards a deeper understanding of the mechanisms involved in the social and non-social conditions. Further, they provide a novel account of how we make judgments when presented with social and non-social information.

      We made substantial changes to the main text, figures and supplementary material to include these changes:

      Main text, pages 10-11, new section:

      The impact of noise in belief updating in social and non-social conditions

      So far, we have shown that, in comparison to non-social predictors, participants changed their interval settings about social advisors less drastically across time, relied on observations made further in the past, and were less impacted by their subjective uncertainty when they did so (Figure 2). Using Bayesian simulation analyses, we investigated whether a common mechanism might underlie these behavioural differences. We tested whether the integration of new evidence differed between social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues while past observations might be weighted more strongly for social cues. Similar ideas were tested in previous studies, when comparing the learning rate (i.e. the speed of learning) in environments of different volatilities12,13. We tested these ideas using established ways of changing the speed of learning during Bayesian updates14,21. We hypothesized that participants reduce their uncertainty more quickly when observing social information. Conversely, we hypothesized a less steep decline of uncertainty when observing non-social information, indicating that new information can be flexibly integrated during the belief update (Figure 3a).

      We manipulated the amount of uncertainty in the Bayesian model by adding a uniform distribution to each belief update (Figure 3b-c; equations 10 and 11). Consequently, the distribution’s width increases and is more strongly impacted by recent observations (see example in Figure 3 – figure supplement 1). We used these modified Bayesian models to simulate trial-wise interval setting for each participant according to the observations they made by selecting a particular advisor in the social condition or other predictor in the non-social condition. We simulated confidence intervals at each trial. We then used these to examine whether an increase in noise led to simulated behaviour that resembled the patterns observed in the non-social condition, as distinct from those observed in the social condition.

      First, we repeated the linear regression analysis and hypothesized that interval settings simulated from noisy Bayesian models would also show a greater negative ‘predictor uncertainty’ effect on interval setting, resembling the effect we had observed in the non-social condition (Figure 2e). This was indeed the case when using the noisy Bayesian model: irrespective of social or non-social condition, the addition of noise (increasing weight of the uniform distribution in each belief update) led to an increasingly negative ‘predictor uncertainty’ effect on confidence judgment (new Figure 3d). The absence of a difference between the social and non-social conditions in the simulations suggests that an increase in Bayesian noise is sufficient to induce a negative impact of ‘predictor uncertainty’ on interval setting. Hence, we can conclude that different degrees of noise in the updating process are sufficient to produce the differences observed between social and non-social conditions. Next, we used these simulated interval settings and repeated the descriptive behavioural analyses (Figure 2b,f). An increase in noise led to greater changes of confidence across time and a higher learning index (Figure 3 – figure supplement 1b-c). In summary, the Bayesian simulations offer a conceptual explanation that can account for both the differences in learning and the difference in uncertainty processing that exist between social and non-social conditions. The key insight conveyed by the Bayesian simulations is that a wider, more uncertain belief distribution changes more quickly. Correspondingly, in the non-social condition, participants express more uncertainty in their confidence estimate when they set the interval, and they also change their beliefs more quickly. Therefore, noisy Bayesian updating can account for key differences between the social and non-social conditions.

      Methods, page 23 new section:

      Extension of Bayesian model with varying amounts of noise

      We modified the original Bayesian model (Figure 2d, Figure 2 – figure supplement 2) to test whether the integration of new evidence differed between social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues while past observations might be weighted more strongly for social cues. [...] To obtain the size of one interval step, the circle size (360 degrees) is divided by the maximum number of interval steps (40 steps; note, 20 steps on each side), which results in nine degrees, the size of one interval step. Next, the accuracy estimate in radians (0.87) is multiplied by the step size in radians (0.1571), resulting in an interval of 0.137 radians or 7.85 degrees. The final interval size would be 7.85 degrees.

      We repeated behavioural analyses (Figure 2b,e,f) to test whether confidence intervals derived from noisier Bayesian models mimic behavioural patterns observed in the non-social condition: greater changes of confidence across trials (Figure 3 – figure supplement 1b), a more negative ‘predictor uncertainty’ effect on confidence judgment (Figure 3d) and a greater learning index (Figure 3 – figure supplement 1c).

      Discussion, page 14: […] It may be because we make just such assumptions that past observations are used to predict the performance levels that people are likely to exhibit next 15,16. An alternative explanation might be that participants experience a steeper decline of subjective uncertainty in their beliefs about the accuracy of social advice, resulting in a narrower prior distribution during the next encounter with the same advisor. We used a series of simulations to investigate how uncertainty about beliefs changed from trial to trial and showed that belief updates about non-social cues were consistent with a noisier update process that diminished the impact of experiences over the longer term. From a Bayesian perspective, greater certainty about the value of advice means that contradictory evidence will need to be stronger to alter one’s beliefs. In the absence of such evidence, a Bayesian agent is more likely to repeat previous judgments. Just as in a confirmation bias 17, such a perspective suggests that once we are more certain about others’ features, for example, their character traits, we are less likely to change our opinions about them.

      Reviewer #2 (Public Review):

      Humans learn about the world both directly, by interacting with it, and indirectly, by gathering information from others. There has been a longstanding debate about the extent to which social learning relies on specialized mechanisms that are distinct from those that support learning through direct interaction with the environment. In this work, the authors approach this question using an elegant within-subjects design that enables direct comparisons between how participants use information from social and non-social sources. Although the information presented in both conditions had the same underlying structure, participants tracked the performance of the social cue more accurately and changed their estimates less as a function of prediction error. Further, univariate activity in two regions, dmPFC and pTPJ, tracked participants' confidence judgments more closely in the social than in the non-social condition, and multivariate patterns of activation in these regions contained information about the identity of the social cues.

      Overall, the experimental approach and model used in this paper are very promising. However, after reading the paper, I found myself wanting additional insight into what these condition differences mean, and how to place this work in the context of prior literature on this debate. In addition, some additional analyses would be useful to support the key claims of the paper.

      We thank the reviewer for their very supportive comments. We have addressed their points below and have highlighted changes in our manuscript that we made in response to the reviewer’s comments.

      (1) The framing should be reworked to place this work in the context of prior computational work on social learning. Some potentially relevant examples:

      • Shafto, Goodman & Frank (2012) provide a computational account of the domain-specific inductive biases that support social learning. In brief, what makes social learning special is that we have an intuitive theory of how other people's unobservable mental states lead to their observable actions, and we use this intuitive theory to actively interpret social information. (There is also a wealth of behavioral evidence in children to support this account; for a review, see Gweon, 2021).

      • Heyes (2012) provides a leaner account, arguing that social and non-social learning are supported by a common associative learning mechanism, and what distinguishes social from non-social learning is the input mechanism. Social learning becomes distinctively "social" to the extent that organisms are biased or attuned to social information.

      I highlight these papers because they go a step beyond asking whether there is any difference between the mechanisms that support social and non-social learning; they also provide concrete proposals about what that difference might be, and what might be shared. I would like to see this work move in a similar direction.

      References

      (In the interest of transparency: I am not an author on these papers.)

      Gweon, H. (2021). Inferential social learning: how humans learn from others and help others learn. PsyArXiv. https://doi.org/10.31234/osf.io/8n34t

      Heyes, C. (2012). What's social about social learning?. Journal of Comparative Psychology, 126(2), 193.

      Shafto, P., Goodman, N. D., & Frank, M. C. (2012). Learning from others: The consequences of psychological reasoning for human learning. Perspectives on Psychological Science, 7(4), 341-351.

      Thank you for this suggestion to expand our framing. We have now made substantial changes to the Introduction and Discussion to include additional background literature on the differences between social and non-social learning, including the references suggested by the reviewer. We further related our findings to other discussions in the literature that argue that differences between social and non-social learning might occur at the level of algorithms (the computations involved in social and non-social learning) and/or implementation (the neural mechanisms). Here, we describe behaviour with the same algorithm (a Bayesian model), but the weighting of uncertainty in decision-making differs between social and non-social contexts. This might be explained by ideas put forward by Shafto and colleagues (2012), who suggest that differences between social and non-social learning might be due to the attribution of goal-directed intentions to social agents, but not to non-social cues. Such an attribution might lead participants to assume that advisors' performance will be relatively stable, because their goal-directed intentions should be relatively stable. We also show differences at the implementational level between social and non-social learning in TPJ and dmPFC.

      Below we list the changes we have made to the Introduction and Discussion. Further, we would also like to emphasize the substantial extension of the Bayesian modelling which we think clarifies the theoretical framework used to explain the mechanisms involved in social and non-social learning (see our answer to the next comments below).

      Introduction, page 4:

      [...]

      Therefore, by comparing information sampling from social versus non-social sources, we address a long-standing question in cognitive neuroscience, the degree to which any neural process is specialized for, or particularly linked to, social as opposed to non-social cognition 2–9. Given their similarities, it is expected that both types of learning will depend on common neural mechanisms. However, given the importance and ubiquity of social learning, it may also be that the neural mechanisms that support learning from social advice are at least partially specialized and distinct from those concerned with learning that is guided by non-social sources.

      However, it is less clear at which level information is processed differently when it has a social or non-social origin. It has recently been argued that differences between social and non-social learning can be investigated at different levels of Marr’s information processing theory: differences could emerge at an input level (in terms of the stimuli that might drive social and non-social learning), at an algorithmic level or at a neural implementation level 7. It might be that, at the algorithmic level, associative learning mechanisms are similar across social and non-social learning 1. Other theories have argued that differences might emerge because goal-directed actions are attributed to social agents, which allows very different inferences to be made about hidden traits or beliefs 10. Such inferences might fundamentally alter learning about social agents compared with non-social cues.

      Discussion, page 15:

      […] One potential explanation for the assumption of stable performance for social but not non-social predictors might be that participants attribute intentions and motivations to social agents. Even if the social and non-social evidence is the same, the belief that a social actor might have a goal may affect the inferences made from the same piece of information 10. Social advisors first learnt about the target’s distribution and accordingly gave advice on where to find the target. If the social agents are credited with goal-directed behaviour, then it might be assumed that their goals remain relatively constant; this might lead participants to assume stability in the performance of social advisors. However, such goal-directed intentions might not be attributed to non-social cues, thereby making judgments about them inherently more uncertain and changeable across time. Such an account, focussing on differences in attribution in social settings, aligns with a recent suggestion that similarities or differences between social and non-social processes can be identified at any one of several levels of Marr’s information processing theory 7. Here we found that the same algorithm was able to explain social and non-social learning (a qualitatively similar computational model could explain both). However, the extent to which the algorithm was recruited when learning about social compared with non-social information differed: we observed a greater impact of uncertainty on judgments about social compared with non-social information. We have shown evidence for a degree of specialization when assessing social advisors as opposed to non-social cues. At the neural level we focused on two brain areas, dmPFC and pTPJ, that have not only been shown to carry signals associated with belief inferences about others but, in addition, recent combined fMRI-TMS studies have demonstrated the causal importance of these activity patterns for the inference process […]

      (2) The results imply that dmPFC and pTPJ differentiate between learning from social and non-social sources. However, more work needs to be done to rule out simpler, deflationary accounts. In particular, the condition differences observed in dmPFC and pTPJ might reflect low-level differences between the two conditions. For example, the social task could simply have been more engaging to participants, or the social predictors may have been more visually distinct from one another than the fruits.

      We understand the reviewer’s concern that low-level distinctions between the social and non-social conditions could confound the differences in neural activation observed between conditions in pTPJ and dmPFC. From the reviewer’s comments, we understand that there might be two potential confounds. First, there might be low-level differences such that stimuli within one condition are more distinct from each other than stimuli within the other condition; the greater visual distinctiveness of stimuli in one condition might then by itself lead to learning differences between conditions. Second, stimuli in one condition might be more engaging and potentially lead to attentional differences between conditions. We used a combination of univariate and multivariate analyses to address both concerns.

      Analysis 1: Univariate analysis to inspect potential unaccounted variance between social and non-social condition

      First, we used the existing univariate analysis (exploratory whole-brain MRI analysis, see Methods) to test for neural activation that covaried with attentional differences (or any other unaccounted neural difference) between conditions. If there were neural differences between conditions that we are currently not accounting for with the parametric regressors included in the fMRI-GLM, then these differences should be captured by the constant of the GLM. For example, if there were attentional differences between conditions, then we would expect to see neural differences between conditions in areas such as the inferior parietal lobe (or other areas commonly engaged during attentional processing).
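      The logic that the GLM constant absorbs unmodelled condition differences can be shown with a minimal sketch. It makes simplifying assumptions: trial-wise ordinary least squares with one parametric regressor (confidence), with no HRF convolution and no temporal autocorrelation modelling, unlike a real fMRI-GLM. The simulated data and the offset of 1.0 are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def fit_glm(confidence, bold):
    """OLS fit of bold = b0 * constant + b1 * confidence; returns [b0, b1]."""
    X = np.column_stack([np.ones_like(confidence), confidence])
    betas, *_ = np.linalg.lstsq(X, bold, rcond=None)
    return betas


n_trials = 200
conf_social = rng.normal(size=n_trials)
conf_nonsocial = rng.normal(size=n_trials)
# Hypothetical data: the same parametric effect (slope 0.5) in both conditions,
# plus an unmodelled offset of 1.0 in the social condition (e.g. engagement).
bold_social = 1.0 + 0.5 * conf_social + rng.normal(scale=0.1, size=n_trials)
bold_nonsocial = 0.5 * conf_nonsocial + rng.normal(scale=0.1, size=n_trials)

b_social = fit_glm(conf_social, bold_social)
b_nonsocial = fit_glm(conf_nonsocial, bold_nonsocial)
# "Constant, contrast: social - non-social": recovers the unmodelled offset.
constant_contrast = b_social[0] - b_nonsocial[0]
print(f"constant contrast = {constant_contrast:.2f}")
```

      Because the parametric slope is estimated separately, an unexplained condition difference appears in the constant contrast and not in the confidence regressor. This is why a null constant contrast argues against a large unmodelled difference of this kind.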

      Importantly, inspection of the constant of the GLM should capture any unaccounted differences, whether they are due to attention or to alternative processes that might differ between conditions. When inspecting cluster-corrected differences in the constant of the fMRI-GLM during the setting of the confidence judgment, we found no cluster-significant activation that differed between social and non-social conditions (Figure 4 – figure supplement 4a; results were familywise-error cluster-corrected at p<0.05 using a cluster-defining threshold of z>2.3). For transparency, we show the sub-threshold activation map across the whole brain (z > 2) for the constant contrasted between social and non-social conditions (i.e. constant, contrast: social – non-social).

      For transparency, we additionally used an ROI approach to test for differences in activation patterns that correlated with the constant during the confidence phase; that is, we used the same ROI approach as in the paper to avoid biased test selection. We compared activation patterns between social and non-social conditions in the same ROIs as before: dmPFC (MNI coordinate [x/y/z: 2,44,36] 16) and bilateral pTPJ (70% probability anatomical mask; for reference see manuscript, page 23). We additionally compared activation patterns between conditions in bilateral IPLD (50% probability anatomical mask, 20). We did not find significantly different activation patterns between social and non-social conditions in any of these areas: dmPFC (confidence constant; paired t-test social vs non-social: t(23) = 0.06, p=0.96, [-36.7, 38.75]), bilateral TPJ (confidence constant; paired t-test social vs non-social: t(23) = -0.06, p=0.95, [-31, 29]), bilateral IPLD (confidence constant; paired t-test social vs non-social: t(23) = -0.58, p=0.57, [-30.3, 17.1]).
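      The statistics reported above take the form t(23) = …, p = …, [lower, upper]. For readers unfamiliar with this format, the sketch below shows how such values arise from a paired t-test across 24 participants, together with a confidence interval for the mean social minus non-social difference. It uses SciPy's `ttest_rel` and the Student t quantile. The helper name and the simulated values are hypothetical, and we assume that the bracketed intervals are 95% CIs of the paired difference.

```python
import numpy as np
from scipy import stats


def paired_ttest_ci(a, b, confidence=0.95):
    """Paired t-test on a - b plus a confidence interval for the mean difference."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = diff.size
    mean = diff.mean()
    sem = diff.std(ddof=1) / np.sqrt(n)  # standard error of the mean difference
    t_crit = stats.t.ppf(0.5 + confidence / 2, df=n - 1)
    t_stat, p_val = stats.ttest_rel(a, b)
    return t_stat, p_val, n - 1, (mean - t_crit * sem, mean + t_crit * sem)


rng = np.random.default_rng(1)
# Simulated per-participant ROI estimates for 24 participants (not study data).
social = rng.normal(loc=0.0, scale=30.0, size=24)
nonsocial = social + rng.normal(loc=0.0, scale=20.0, size=24)
t_stat, p_val, df, (ci_low, ci_high) = paired_ttest_ci(social, nonsocial)
print(f"t({df}) = {t_stat:.2f}, p = {p_val:.2f}, 95% CI [{ci_low:.1f}, {ci_high:.1f}]")
```

      For a two-sided test at the matching level, p > 0.05 goes together with a 95% interval that spans zero. This is the pattern in all three ROI comparisons above.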

      There were no meaningful activation patterns that differed between conditions, either in areas commonly linked to attention (e.g., IPL) or in the brain areas that were the focus of the study (dmPFC and pTPJ). Activation in dmPFC and pTPJ covaried with parametric effects such as the confidence set on the current and previous trial, and did not correlate with low-level differences such as attention. Hence, these results suggest that activation differences between conditions were better captured by parametric regressors such as the trial-wise interval setting (i.e. confidence), and are unlikely to be confounded by low-level processes that can be captured with univariate neural analyses.

      Analysis 2: RSA to test visual distinctiveness between social and non-social conditions

      We also addressed the reviewer’s other concern directly by testing whether differences between conditions might arise from a different degree of visual distinctiveness in one stimulus set compared with the other. We used RSA to inspect potential differences in early visual processing that should be affected by greater stimulus similarity within one condition; in other words, we tested whether the visual distinctiveness of one stimulus set differed from that of the other. To do so, we compared the Exemplar Discriminability Index (EDI) between conditions in early visual areas. The EDI compares the dissimilarity of neural activation related to the presentation of an identical stimulus across trials (diagonal of the RSA matrix) with the dissimilarity in neural activation between different stimuli across trials (off-diagonal of the RSA matrix). If stimuli within one stimulus set are very similar, then the difference between the diagonal and off-diagonal should be very small and less likely to be significant (i.e. similar diagonal and off-diagonal values). In contrast, if stimuli within one set are very distinct from each other, then the difference between the diagonal and off-diagonal should be large and likely to result in a significant EDI (i.e. different diagonal and off-diagonal values) (see Figure 4g for a schematic illustration). Hence, a difference in visual distinctiveness between the social and non-social stimulus sets should produce different EDI values for the two conditions. This can be tested by comparing EDI values between conditions in early visual cortex. We used a Harvard-cortical ROI mask based on bilateral V1. Negative EDI values indicate that the same exemplars are represented more similarly in the neural V1 pattern than different exemplars.
      This analysis showed that there was no significant difference in EDI between conditions (Figure 4 – figure supplement 4b; EDI paired-sample t-test: t(23) = -0.16, p=0.87, 95% CI [-6.7, 5.7]).
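      The EDI logic described above can be sketched as follows. We assume correlation distance (1 − Pearson r) between voxel patterns estimated in two independent data partitions. EDI is taken as the mean diagonal (same-exemplar) dissimilarity minus the mean off-diagonal (different-exemplar) dissimilarity, so that negative values mean that same exemplars are more similar, as in the text. The partitioning scheme, the distance measure, the function name and the simulated patterns are assumptions for illustration, not the exact pipeline used.

```python
import numpy as np


def exemplar_discriminability(patterns_a, patterns_b):
    """EDI = mean same-exemplar minus mean different-exemplar dissimilarity.

    patterns_a, patterns_b: (n_exemplars, n_voxels) arrays from two independent
    data partitions (hypothetical inputs, e.g. odd vs even trials).
    """
    n = patterns_a.shape[0]
    # Pearson correlation between each exemplar in partition A and in partition B.
    corr = np.corrcoef(patterns_a, patterns_b)[:n, n:]
    dissim = 1.0 - corr
    diag = np.diag(dissim).mean()
    off_diag = dissim[~np.eye(n, dtype=bool)].mean()
    return diag - off_diag  # negative: same exemplars evoke more similar patterns


rng = np.random.default_rng(2)
n_exemplars, n_voxels = 4, 100
# Set 1: each exemplar evokes its own pattern; set 2: all exemplars share one.
distinct = rng.normal(size=(n_exemplars, n_voxels))
shared = np.tile(rng.normal(size=n_voxels), (n_exemplars, 1))


def noisy(patterns):
    return patterns + rng.normal(scale=0.5, size=patterns.shape)


edi_distinct = exemplar_discriminability(noisy(distinct), noisy(distinct))
edi_similar = exemplar_discriminability(noisy(shared), noisy(shared))
print(f"EDI distinct set: {edi_distinct:.2f}, EDI similar set: {edi_similar:.2f}")
```

      A visually more distinctive stimulus set yields a clearly negative EDI, and a homogeneous set yields an EDI near zero. Computing one EDI per participant and condition and then comparing them with a paired t-test gives the comparison reported above.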

      We have further replicated results in V1 with a whole-brain searchlight analysis, averaging across both social and non-social conditions.

      In summary, by using a combination of univariate and multivariate analyses, we tested whether neural activation differed when participants were presented with facial or fruit stimuli and whether such differences might confound the observed learning differences between conditions. We did not find meaningful neural differences that were not accounted for by the regressors included in the GLM. Further, we did not find differences in visual distinctiveness between the stimulus sets. Hence, these control analyses suggest that differences between social and non-social conditions are unlikely to arise from differences in low-level processes and are instead more likely to emerge when learning about social or non-social information.

      Moreover, we also examined behaviourally whether participants differed in the way they approached the social and non-social conditions. We tested whether there were initial biases prior to learning, i.e. before participants received any information from either social or non-social sources. To this end, we tested whether participants had different prior expectations about the performance of social compared with non-social predictors by comparing the confidence judgments on the first trial of each predictor. We found that participants set confidence intervals very similarly in the social and non-social conditions (Figure below). Hence, differences between conditions did not seem to arise from low-level differences in the stimulus sets or from prior differences in expectations about the performance of social compared with non-social predictors. Instead, differences between conditions became apparent when participants updated their beliefs about social advisors or non-social cues and, as a consequence, in the way confidence judgments were set across time.

      Figure. Confidence interval for the first encounter of each predictor in social and non-social conditions. There was no initial bias in predicting the performance of social or non-social predictors.

      Main text page 13:

      […]

      Additional control analyses show that neural differences between social and non-social conditions were not due to the visually different set of stimuli used in the experiment but instead represent fundamental differences in processing social compared with non-social information (Figure 4 – figure supplement 4). These results hold in both the ROI-based RSA analysis and the whole-brain searchlight analysis. In summary, the univariate and multivariate analyses together demonstrate that dmPFC and pTPJ represent beliefs about social advisors that develop over a longer timescale and encode the identities of the social advisors.

      References

      1. Heyes, C. (2012). What’s social about social learning? Journal of Comparative Psychology 126, 193–202. 10.1037/a0025180.
      2. Chang, S.W.C., and Dal Monte, O. (2018). Shining Light on Social Learning Circuits. Trends in Cognitive Sciences 22, 673–675. 10.1016/j.tics.2018.05.002.
      3. Diaconescu, A.O., Mathys, C., Weber, L.A.E., Kasper, L., Mauer, J., and Stephan, K.E. (2017). Hierarchical prediction errors in midbrain and septum during social learning. Soc Cogn Affect Neurosci 12, 618–634. 10.1093/scan/nsw171.
      4. Frith, C., and Frith, U. (2010). Learning from Others: Introduction to the Special Review Series on Social Neuroscience. Neuron 65, 739–743. 10.1016/j.neuron.2010.03.015.
      5. Frith, C.D., and Frith, U. (2012). Mechanisms of Social Cognition. Annu. Rev. Psychol. 63, 287–313. 10.1146/annurev-psych-120710-100449.
      6. Grabenhorst, F., and Schultz, W. (2021). Functions of primate amygdala neurons in economic decisions and social decision simulation. Behavioural Brain Research 409, 113318. 10.1016/j.bbr.2021.113318.
      7. Lockwood, P.L., Apps, M.A.J., and Chang, S.W.C. (2020). Is There a ‘Social’ Brain? Implementations and Algorithms. Trends in Cognitive Sciences, S1364661320301686. 10.1016/j.tics.2020.06.011.
      8. Soutschek, A., Ruff, C.C., Strombach, T., Kalenscher, T., and Tobler, P.N. (2016). Brain stimulation reveals crucial role of overcoming self-centeredness in self-control. Sci. Adv. 2, e1600992. 10.1126/sciadv.1600992.
      9. Wittmann, M.K., Lockwood, P.L., and Rushworth, M.F.S. (2018). Neural Mechanisms of Social Cognition in Primates. Annu. Rev. Neurosci. 41, 99–118. 10.1146/annurev-neuro-080317-061450.
      10. Shafto, P., Goodman, N.D., and Frank, M.C. (2012). Learning From Others: The Consequences of Psychological Reasoning for Human Learning. Perspect Psychol Sci 7, 341–351. 10.1177/1745691612448481.
      11. McGuire, J.T., Nassar, M.R., Gold, J.I., and Kable, J.W. (2014). Functionally Dissociable Influences on Learning Rate in a Dynamic Environment. Neuron 84, 870–881. 10.1016/j.neuron.2014.10.013.
      12. Behrens, T.E.J., Woolrich, M.W., Walton, M.E., and Rushworth, M.F.S. (2007). Learning the value of information in an uncertain world. Nature Neuroscience 10, 1214–1221. 10.1038/nn1954.
      13. Meder, D., Kolling, N., Verhagen, L., Wittmann, M.K., Scholl, J., Madsen, K.H., Hulme, O.J., Behrens, T.E.J., and Rushworth, M.F.S. (2017). Simultaneous representation of a spectrum of dynamically changing value estimates during decision making. Nat Commun 8, 1942. 10.1038/s41467-017-02169-w.
      14. Allenmark, F., Müller, H.J., and Shi, Z. (2018). Inter-trial effects in visual pop-out search: Factorial comparison of Bayesian updating models. PLoS Comput Biol 14, e1006328. 10.1371/journal.pcbi.1006328.
      15. Wittmann, M., Trudel, N., Trier, H.A., Klein-Flügge, M., Sel, A., Verhagen, L., and Rushworth, M.F.S. (2021). Causal manipulation of self-other mergence in the dorsomedial prefrontal cortex. Neuron.
      16. Wittmann, M.K., Kolling, N., Faber, N.S., Scholl, J., Nelissen, N., and Rushworth, M.F.S. (2016). Self-Other Mergence in the Frontal Cortex during Cooperation and Competition. Neuron 91, 482–493. 10.1016/j.neuron.2016.06.022.
      17. Kappes, A., Harvey, A.H., Lohrenz, T., Montague, P.R., and Sharot, T. (2020). Confirmation bias in the utilization of others’ opinion strength. Nat Neurosci 23, 130–137. 10.1038/s41593-019-0549-2.
      18. Trudel, N., Scholl, J., Klein-Flügge, M.C., Fouragnan, E., Tankelevitch, L., Wittmann, M.K., and Rushworth, M.F.S. (2021). Polarity of uncertainty representation during exploration and exploitation in ventromedial prefrontal cortex. Nat Hum Behav. 10.1038/s41562-020-0929-3.
      19. Yu, Z., Guindani, M., Grieco, S.F., Chen, L., Holmes, T.C., and Xu, X. (2022). Beyond t test and ANOVA: applications of mixed-effects models for more rigorous statistical analysis in neuroscience research. Neuron 110, 21–35. 10.1016/j.neuron.2021.10.030.
      20. Mars, R.B., Jbabdi, S., Sallet, J., O’Reilly, J.X., Croxson, P.L., Olivier, E., Noonan, M.P., Bergmann, C., Mitchell, A.S., Baxter, M.G., et al. (2011). Diffusion-Weighted Imaging Tractography-Based Parcellation of the Human Parietal Cortex and Comparison with Human and Macaque Resting-State Functional Connectivity. Journal of Neuroscience 31, 4087–4100. 10.1523/JNEUROSCI.5102-10.2011.
      21. Yu, A.J., and Cohen, J.D. Sequential effects: Superstition or rational behavior? 8.
      22. Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., and Kriegeskorte, N. (2014). A Toolbox for Representational Similarity Analysis. PLoS Comput Biol 10, e1003553. 10.1371/journal.pcbi.1003553.
      23. Lockwood, P.L., Wittmann, M.K., Nili, H., Matsumoto-Ryan, M., Abdurahman, A., Cutler, J., Husain, M., and Apps, M.A.J. (2022). Distinct neural representations for prosocial and self-benefiting effort. Current Biology 32, 4172-4185.e7. 10.1016/j.cub.2022.08.010.
    1. Author Response

      Reviewer #1 (Public Review):

      The authors performed simultaneous extracellular recordings in brain regions (CA1, prefrontal cortex (PFC), olfactory bulb (OB)) that are key to odor-guided decision making to delineate the oscillatory and cell population dynamics that guide decision making based on learned associations. They used complementary analyses to assess the coordination between CA1 and medial PFC (mPFC), using coherence and phase-locking analysis as well as generalized linear models and Bayesian decoding methods.

      One of the strengths of this work is the comparison of beta and respiratory (RR) LFP coherence in several behavioral states to rule out confounds due to sniffing or preparatory motor behavior (e.g., coherence was assessed during decision making with and without an odor present, during reward consumption). These controls allowed the authors to identify a specific enhancement of beta compared to RR coherence during decision making.

      The analyses of task-responsive putative interneuron and pyramidal cells suggest that accurate decision-making is associated with a stronger modulation of beta phase-locking in interneurons. Additional cross-correlation analyses between cell types across regions showed that cells, particularly interneurons, are temporally coordinated in the beta range. Their analyses did not identify a mechanism for this coordination, but the temporal lags between PFC and CA1 cells raise the possibility of top-down interactions mediated by a third brain region.

      The authors used the cellular activity to determine that the animal's upcoming behavior could be predicted from the ensemble activity during decision-making a few hundred milliseconds before the behavioral choice, but decoding accuracy diminished soon after the decision-making period. Interestingly, decoding accuracy increased after decision-making when using the spatially active cell ensembles. As indicated by the authors, these results suggest that different cell ensembles are engaged during decision-making and during the execution of the decision. It is possible that this change in ensemble dynamics before and after decision-making relates to the familiarity of the animals with the task, which makes it likely to involve procedural components (e.g., Jog et al., 1999). As pointed out by the authors in the discussion, several results have implications for the formation of associative memories and provide clues for future experiments. Thus, future work looking at the ensemble dynamics and at the occurrence of CA1 ripples in the early stages of task learning compared to when the animals are very familiar with the task (as in the current study), will provide a better understanding of the shifts that develop during the formation and consolidation of the association.

      One of the considerations in interpreting the results is that the odor sampling and decision-making periods overlap, making it difficult to disentangle the neural dynamics that are driven by the recall of the association (cued retrieval) from those that relate to the upcoming turning behavior after odor port disengagement. However, the authors' analyses of odor and choice selectivity in correct and incorrect trials demonstrate a preferential association between spike activity and choice selection in this task.

      Overall, the results advance our understanding of odor-guided decision-making mechanisms in CA1 and PFC at the LFP and cell population level. This work will be of significance to further research on the cellular basis of memory-guided decision-making, and to future work characterizing the interactions between CA1 and PFC during learning.

      We thank the Reviewer for their detailed evaluation summarizing and highlighting the strengths of the study. In addition to beta and respiratory rhythm (RR) modulation of CA1-PFC activity and the relationship between spiking activity and choice selection, the Reviewer also highlighted the temporal coordination of CA1 interneurons and change in ensemble dynamics during the decision-making period at the odor-port vs. during the execution of the decision on the maze, which is further emphasized as a novel result in the revised manuscript.

      Reviewer #2 (Public Review):

      Symanski et al. investigated the communication between the medial prefrontal cortex (mPFC), the hippocampal CA1 region, and the olfactory bulb (OB) while rats underwent an odor-cued decision-making task. By recording local field potentials and spiking activity in the three regions, they found that all regions became synchronized at the beta band and respiratory rhythms during cue sampling/decision-making. Although the strength of inter-region synchrony was not predictive of correct choices, both CA1 and mPFC neurons showed stronger phase-locked firings to beta oscillations for correct than incorrect choices. Moreover, a subset of putative pyramidal and interneurons in both regions were selective for task variables, and as ensembles, they formed activity patterns differentiating choices. Also, their firings were temporally coordinated in a direction that the mPFC interneurons led CA1 interneurons and pyramidal neurons. Based on these findings, the authors propose that cue-evoked beta oscillations modulate the activity of interneurons to coordinate ensemble activity in CA1-mPFC networks supporting decision-making.

      Strength:

      The findings uncovered a new style of mPFC-hippocampal communication through odor-evoked beta oscillations, which contrasts with the theta oscillations and sharp-wave/ripples reported during memory-guided spatial navigation tasks. The overall quality of the work is outstanding. The data collection and analysis were meticulously conducted with appropriate controls and statistical tests.

      Weakness:

      The initial analysis of LFP activity (Figure 2d) revealed strong coherence in the beta band in all region pairs; however, the subsequent analysis focuses on mPFC-CA1 interaction. To justify this approach, it is essential to establish that the mPFC-CA1 beta synchrony reflects their direct communication rather than a by-product of common inputs from the OB.
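      One standard way to address this concern is partial coherence, which asks whether mPFC-CA1 coherence survives after the contribution of the OB signal has been removed in the frequency domain. We offer the sketch below as an illustration of that technique only. It is not an analysis reported in the manuscript, the signals are synthetic, and the function name, Welch segment length and band limits are assumptions. It uses SciPy's `signal.csd` cross-spectral estimate and the partial cross-spectrum S_xy|z = S_xy − S_xz·S_zy / S_zz.

```python
import numpy as np
from scipy import signal


def partial_coherence(x, y, z, fs, nperseg=256):
    """Ordinary and partial (z removed) magnitude-squared coherence of x and y."""
    def cross(a, b):
        return signal.csd(a, b, fs=fs, nperseg=nperseg)

    f, sxy = cross(x, y)
    _, sxz = cross(x, z)
    _, szy = cross(z, y)
    sxx = cross(x, x)[1].real
    syy = cross(y, y)[1].real
    szz = cross(z, z)[1].real
    coh = np.abs(sxy) ** 2 / (sxx * syy)
    # Remove the part of each spectrum that is linearly predictable from z.
    sxy_z = sxy - sxz * szy / szz
    sxx_z = sxx - np.abs(sxz) ** 2 / szz
    syy_z = syy - np.abs(szy) ** 2 / szz
    pcoh = np.abs(sxy_z) ** 2 / (sxx_z * syy_z)
    return f, coh, pcoh


fs, n = 1000.0, 20000
rng = np.random.default_rng(4)
ob = rng.normal(size=n)                  # hypothetical common driver ("OB")
pfc = ob + 0.5 * rng.normal(size=n)      # mPFC and CA1 share only the OB input
ca1 = ob + 0.5 * rng.normal(size=n)
f, coh, pcoh = partial_coherence(pfc, ca1, ob, fs)
beta = (f >= 20) & (f <= 30)
print(f"beta coherence: {coh[beta].mean():.2f}, partial: {pcoh[beta].mean():.2f}")
```

      In this toy case the mPFC-CA1 coherence is high but falls close to zero once OB is partialled out. Coherence that remained after partialling would be more consistent with direct mPFC-CA1 communication.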

      The authors used cross-correlograms to reveal the directionality of mPFC-CA1 interaction. To strengthen the author's view that beta oscillations help coordinate neural activity, it is worth investigating if the same temporal relationship is also detectable within each cycle of beta oscillations. Specifically, mPFC interneurons may fire at earlier phases, followed by firings of CA1 interneurons and pyramidal neurons at later phases.
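      The within-cycle analysis suggested here could take the form below. The LFP is band-pass filtered in the beta range, the instantaneous phase is taken from the Hilbert transform, and the circular mean phase of each population's spikes is computed. This is a sketch under stated assumptions: the 20–30 Hz band follows the range given in the manuscript, but the filter order, the function names and the synthetic data (one population firing at the beta peak and another a quarter cycle later) are hypothetical.

```python
import numpy as np
from scipy import signal


def spike_beta_phases(lfp, spike_idx, fs, band=(20.0, 30.0), order=3):
    """Instantaneous beta phase (radians) of the LFP at each spike sample."""
    sos = signal.butter(order, band, btype="bandpass", fs=fs, output="sos")
    beta = signal.sosfiltfilt(sos, lfp)  # zero-phase filtering preserves timing
    phase = np.angle(signal.hilbert(beta))
    return phase[spike_idx]


def circular_mean(phases):
    """Preferred phase (radians) and resultant vector length of a set of phases."""
    vec = np.mean(np.exp(1j * phases))
    return np.angle(vec), np.abs(vec)


fs = 1000.0
t = np.arange(10000) / fs
rng = np.random.default_rng(3)
lfp = np.cos(2 * np.pi * 25 * t) + 0.3 * rng.normal(size=t.size)
cycle = int(fs / 25)                      # 40 samples per 25 Hz cycle
peaks = np.arange(10, 240) * cycle        # phase 0 (cosine peaks); edges excluded
quarter_later = peaks + cycle // 4        # phase pi/2, i.e. 10 ms later
pref_a, rvl_a = circular_mean(spike_beta_phases(lfp, peaks, fs))
pref_b, rvl_b = circular_mean(spike_beta_phases(lfp, quarter_later, fs))
print(f"population A: {pref_a:.2f} rad, population B: {pref_b:.2f} rad")
```

      If mPFC interneurons lead, their preferred phase would precede that of CA1 interneurons and pyramidal cells within the same beta cycle, just as population A precedes population B here.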

      We thank the Reviewer for their positive evaluation and constructive comments. We have addressed the weaknesses noted in the revised manuscript. In particular, we have added analyses and text that emphasize the change in beta synchrony in the OB-CA1-PFC network during the task, and added analyses that examine phase locking of pyramidal cells and interneurons to beta rhythms in the mPFC, CA1 and OB.

      Reviewer #3 (Public Review):

      Symanski et al. describe a set of interesting results derived from analyzing electrophysiological recordings performed in rats well trained on an associative memory task on a spatial maze (a T maze), in which animals learned to associate a given odor delivered in an initial maze region (upon a nose poke) with a subsequent spatial choice (a left or a right turn) to receive a reward. The authors have obtained LFPs from the OB, PFC, and CA1 from 8 animals subjected to this task, along with single-unit activity from the PFC and CA1. The authors describe that, during odor sampling, there is prominent LFP activity in the beta range (20-30 Hz) as well as prominent activity of the respiration-entrained LFP rhythm (RR, 7-8 Hz). The authors convincingly show that beta activity - but not RR - is specific to odor sampling (RR also shows up during other immobility periods within the task and when animals breathed clean air). They further show that not only beta power but also inter-regional beta coherence significantly increases during the odor sampling period. In addition, the authors find a higher beta phase modulation of spiking in a subset of neurons associated with subsequent correct decisions. Since the authors also prove - based on behavioral analysis - that the odor-sampling period corresponds to the decision-making period in this task, they propose a role for beta coordination of hippocampal-prefrontal networks in sensory-cued decision making. The paper also brings along a set of complementary findings looking at the single unit and ensemble activity in both regions (CA1 and PFC), which are capable of predicting future spatial choices.

      I consider the investigated topic relevant to modern neuroscience and likely to interest a broad audience. Nevertheless, while there is much to like about this paper (e.g., carefully done experiments, advanced computational data analyses, well-written text, and well-crafted figures), I caught some issues that called my attention upon a careful reading, which I list below:

      A) The paper is written in a way clearly centered on rhythmical brain activity (c.f. title, abstract, introduction, and discussion). Yet, out of 7 main figures, only 2 of them show data related to oscillations (while 1 figure shows behavioral data and 4 figures show spiking analysis not related to brain rhythms). Therefore, the presentation of the results seems unbalanced and disconnected from the main story.

      B) Somewhat related to the point above, in a strict sense, the title is not well justified ("Rhythmic coordination of hippocampal-prefrontal ensembles (...)") since there is no analysis relating assembly activity with either beta or RR (their results show beta or RR modulating a subset of single units), nor is there a combined ensemble analysis of PFC and hippocampal units (i.e., interregional cell assemblies). Why not try to relate ensemble activity to the observed oscillations?

      C) The main result of increased interregional beta coherence specifically during odor sampling is very interesting and seems quite solid. Though I hate being the one raising questions about the level of advancement, I cannot avoid noticing that similar increases in beta coherence in odor-sampling-based tasks have been observed before (e.g., increased OB-HPC beta coherence during odor sampling has been shown in Martin et al. 2007 and between LEC and HPC in Igarashi et al. 2014), which is to say that there is overlap between this core finding and previous research. But that said, in times where the reproducibility of our scientific endeavor has been put into question, this particular reviewer favors the publication of similar findings by independent labs, especially given this neatly collected dataset. It is recommended to highlight which results constitute novel insights here and which results provide support for previously published results.
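      For readers less familiar with the measure, band-limited inter-regional coherence can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes a Welch-style estimator with non-overlapping Hann-windowed segments, and the function name `band_coherence`, the sampling rate and the synthetic signals are hypothetical. SciPy's `scipy.signal.coherence` provides a library implementation of the same kind of estimator.

```python
import numpy as np

def band_coherence(x, y, fs, seg_len, band):
    """Mean magnitude-squared coherence between x and y within [band[0], band[1]] Hz.

    Welch-style estimate: cut both signals into non-overlapping Hann-windowed
    segments, average the auto- and cross-spectra across segments, then take
    |Sxy|^2 / (Sxx * Syy) at each frequency bin.
    """
    n_seg = len(x) // seg_len
    win = np.hanning(seg_len)
    n_freq = seg_len // 2 + 1
    sxx = np.zeros(n_freq)
    syy = np.zeros(n_freq)
    sxy = np.zeros(n_freq, dtype=complex)
    for k in range(n_seg):
        seg = slice(k * seg_len, (k + 1) * seg_len)
        fx = np.fft.rfft(x[seg] * win)
        fy = np.fft.rfft(y[seg] * win)
        sxx += np.abs(fx) ** 2
        syy += np.abs(fy) ** 2
        sxy += fx * np.conj(fy)
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    coh = np.abs(sxy) ** 2 / (sxx * syy)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return coh[in_band].mean()

# Hypothetical "LFPs": a shared 20-30 Hz component plus independent noise per region.
rng = np.random.default_rng(0)
fs = 1000                        # sampling rate in Hz (assumed)
t = np.arange(60 * fs) / fs      # 60 s of data
shared = sum(np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi)) for f in range(20, 31))
lfp_a = shared + rng.normal(0, 1, t.size)
lfp_b = 0.5 * shared + rng.normal(0, 1, t.size)

beta_coh = band_coherence(lfp_a, lfp_b, fs, seg_len=fs, band=(20, 30))
high_coh = band_coherence(lfp_a, lfp_b, fs, seg_len=fs, band=(60, 70))
print(f"20-30 Hz coherence {beta_coh:.2f}, 60-70 Hz coherence {high_coh:.2f}")
```

      Noise that is independent between the two regions yields coherence near 1 / (number of segments), so band-averaged values well above that floor indicate shared activity in that band.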

      D) It called to my attention that many of the spiking results were obtained for a small percentage of neurons. For instance, how can the authors be confident that the choice-selective neurons are actually coding for the choice as opposed to being randomly detected by statistical chance? As a case in point, the authors mention that 1309 units were recorded in CA1, and from these 42 cells were choice selective. If the authors have employed a typical alpha of 5% for detecting such neurons, chance alone would predict ~65 neurons being false positives. I apologize if I am missing something, but could the authors clarify? On a related note, even though most findings hold true for a small percentage of neurons, the writing also tends to generalize the findings to the whole population (e.g., "Beta phase modulation of CA1 and PFC neuronal activity during this period was linked to accurate decisions, suggesting that this temporal modulation influences sensory-cued decision making.").
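      The false-positive argument above can be made concrete with a binomial null model. This sketch assumes independent per-neuron tests at alpha = 0.05 across all 1309 recorded CA1 units, which is the reviewer's framing rather than the revised denominator described in the response; `binom_sf` is a hypothetical helper.

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of the lower tail.

    Summing only the lower tail keeps the integer binomial coefficients small
    enough to convert to float when k is well below n / 2.
    """
    lower = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower

n_units, alpha, n_selective = 1309, 0.05, 42
expected_false_positives = n_units * alpha                 # about 65 units
p_at_least_observed = binom_sf(n_selective, n_units, alpha)
# Smallest count that would exceed chance at the 5% level under this null model
threshold = next(k for k in range(n_units + 1) if binom_sf(k, n_units, alpha) < 0.05)
print(expected_false_positives, p_at_least_observed, threshold)
```

      Under this null, about 65 false positives are expected and at least 42 "selective" units would be seen almost surely, so the count only becomes informative once it clears `threshold`. Restricting the denominator to neurons with enough spikes changes both quantities.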

      We thank the Reviewer for their detailed comments and feedback. We have addressed the issues raised by the Reviewer, which has significantly strengthened the manuscript.

      A) We have added several new analyses of rhythmic modulation of spiking activity, and elevated some of the Supplementary Figures related to oscillations to the main figures (Figures 2, 5). In addition, since several of our analyses provide novel results for spiking and ensemble dynamics before and after the decision-making period, as noted by Reviewer 1, we have emphasized these results as a novel advance in the revised manuscript, including the title and abstract.

      B) We agree that our analysis focuses on rhythmic coordination by beta and RR oscillations, phase modulation of single cell spiking activity in CA1 and PFC for accurate odor-cued decision making, and ensemble dynamics during decision making and execution of decisions. While relating ensemble activity to the observed oscillations is a long-term goal, we are limited by the size of simultaneously recorded ensembles within single sessions, since measures of ensemble dynamics per trial are required for such analyses. This is now noted in the Discussion section. We therefore focus our analyses separately on single cell modulation by rhythms and dynamics of ensemble activity during decision making.

      We have also retitled the manuscript to indicate this: “Rhythmic coordination and ensemble dynamics in the hippocampal-prefrontal network for odor-place associative memory and decision making”, to more accurately reflect our results.

      C) We appreciate the Reviewer’s favorable view on independent confirmation of previous results on beta coherence using our strong dataset. We have referenced previous results on OB-HPC, LEC-HPC and striatal beta coherence in the manuscript (e.g., Kay and Beshel 2010; Igarashi et al. 2014; Rangel et al. 2016; Leventhal et al., 2012).

      In addition, we also highlight the novelty of our results in the manuscript, as noted by Reviewers 1 and 2. Our findings in these specific circuits, namely the PFC-CA1 network, during odor-cued decision making are novel. Our results show that beta phase modulation of a sub-population of phase-coherent CA1 and PFC neurons is linked to accurate decision making, elucidate selectivity and ensemble dynamics in these regions during decision making, and show that independent ensembles are recruited during odor-sampling vs. the execution of decisions on the spatial maze. These results are emphasized in the revised manuscript.

      D) We apologize for the misunderstanding regarding the number of neurons. We had initially reported the total number of neurons recorded across run and sleep sessions, including those with very few spikes during the task. In determining task-responsive and task-unresponsive neurons (Figure 3), the task-unresponsive set also included a very large fraction of neurons that did not have sufficient spikes during the odor-sampling or decision-making period (e.g. using a criterion of a number of spikes equal to the number of trials; similar numbers are seen with other criteria, such as an absolute minimum number of spikes). These neurons are more accurately denoted as “Odor Period Inactive”. Therefore, a more accurate estimate of task-responsive neurons in CA1 and PFC, indicating their task engagement, is now shown in Figure 3, starting with neurons that had sufficient spikes for this analysis. Using this metric, large fractions of neurons are task-responsive and selective, similar to fractions previously reported in other studies (Igarashi et al., 2014). We have added this description and the numbers to the text (page 11 lines 230-241) and Methods (page 37 lines 795-797).
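      The spike-count criterion described above can be sketched as a simple filter. This assumes that "sufficient spikes" means at least as many odor-period spikes as trials (or an absolute minimum when `min_spikes` is given); the function and unit names are hypothetical.

```python
def split_by_odor_period_activity(odor_spike_counts, n_trials, min_spikes=None):
    """Split units into candidates for selectivity analysis vs. 'Odor Period Inactive'.

    odor_spike_counts maps a unit id to its total spike count during odor sampling.
    By default a unit needs at least one spike per trial; min_spikes overrides this.
    """
    threshold = n_trials if min_spikes is None else min_spikes
    active = [u for u, c in odor_spike_counts.items() if c >= threshold]
    inactive = [u for u, c in odor_spike_counts.items() if c < threshold]
    return active, inactive

counts = {"CA1_u1": 120, "CA1_u2": 8, "PFC_u1": 45, "PFC_u2": 0}
active, inactive = split_by_odor_period_activity(counts, n_trials=40)
```

      Fractions of task-responsive or selective units are then computed over `active` rather than over every recorded unit.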

      We have also toned down the interpretation by avoiding generalizing to the whole population, and note that beta phase modulation of phase-locked neurons is related to behavior accuracy. Here, in particular, our results suggest a key role of CA1 interneurons in beta-mediated interactions.
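      One common way to quantify beta phase modulation of a single unit is the mean resultant length of its spike phases together with a Rayleigh test. The response does not specify the statistic used, so this is a hedged sketch: it assumes spike phases have already been extracted (for example from the Hilbert transform of beta-filtered LFP) and uses Zar's approximation to the Rayleigh p-value. `phase_locking` is a hypothetical helper.

```python
import math
import random

def phase_locking(phases):
    """Return (mean resultant length, preferred phase, approximate Rayleigh p-value)."""
    n = len(phases)
    c = sum(math.cos(p) for p in phases) / n
    s = sum(math.sin(p) for p in phases) / n
    r = math.hypot(c, s)            # 0 = no phase preference, 1 = perfect locking
    preferred = math.atan2(s, c)    # preferred phase in radians
    big_r = n * r
    # Zar's approximation to the Rayleigh test p-value
    p_value = math.exp(math.sqrt(1 + 4 * n + 4 * (n**2 - big_r**2)) - (1 + 2 * n))
    return r, preferred, min(p_value, 1.0)

random.seed(1)
locked_phases = [random.gauss(1.0, 0.5) for _ in range(200)]    # spikes cluster near 1 rad
uniform_phases = [2 * math.pi * k / 200 for k in range(200)]    # no phase preference

r_locked, pref_locked, p_locked = phase_locking(locked_phases)
r_uniform, _, p_uniform = phase_locking(uniform_phases)
```

      Comparing such values between correct and incorrect trials, restricted to significantly phase-locked units, is one way to relate phase modulation to behavioral accuracy without generalizing to the whole population.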

    1. Author Response

      Reviewer #2 (Public Review):

      Reinforcement learning (RL) theory is important because it provides a broad, mathematically proven framework for linking behavioral states to behavioral actions, and has the potential for linking realistic biological network dynamics to behavior. The most detailed neurophysiological modeling uses biophysical compartmental models with the theoretical framework of Hodgkin-Huxley and Rall to describe the dynamics of real neurons, but those models are extremely difficult to link to behavioral output. RL provides a theoretical framework that could help bridge the still-underexplored chasm between behavioral modeling and neurophysiological detail.

      On the positive side, this paper uses a network of interacting neurons in regions CA3 and CA1 (as used in previous models by McNaughton and Morris, 1987; Hasselmo and Schnell, 1994; Treves and Rolls, 1994; Mehta, Quirk and Wilson, 2000; Hasselmo, Bodelon and Wyble, 2002) to address how a simple representation of biological network dynamics could generate the successor representation used in RL. The successor representation is an interesting theory of hippocampal function, as it contrasts with a previous idea of model-based planning. Previous neuroscience data support the idea that animals use a model-based representation (a cognitive map made up of place cells or grid cells) to read out potential future paths to plan their behavior in the environment. For example, Johnson and Redish, 2007 showed activity spreading into alternating arms of a T-maze before a decision is made (i.e. a model-based exploration of possible actions, NOT a successor representation), and Pfeiffer and Foster, 2013 showed that replay in 2 dimensions corresponds to future goal-directed activity. Models such as Erdem and Hasselmo, 2012 and Fenton and Kubie, 2012 showed how forward planning of possible trajectories could guide performance of behavioral tasks. In contrast, the successor representation assumes that model-based readout is too computationally expensive and proposes that, instead of reading out various possible model-based future paths when making a decision, a simulated agent could learn a look-up table indicating the probability of future behavioral states accessible from a given state. In previous work, the successor representation accounted for certain aspects of experimental neuroscience data, such as place cells responding to the insertion of barriers as seen by Alvernhe et al. and the backward expansion of place fields seen by Mehta et al.
The current paper is admirable for addressing the potential role of neural replay in training of successor representations and its relationship to other neural and behavioral data such as the papers by Cheng and Frank 2008 and by Wu et al. 2017.
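      To make the contrast concrete: for a fixed policy with state-transition matrix T, the successor representation stores the expected discounted future occupancy of every state from each starting state, M = Σ_t γ^t T^t = (I − γT)^(-1). The sketch below computes this closed form for a hypothetical 5-state circular track and compares it with an estimate learned online by a temporal-difference update. The random-walk policy, the TD(0) special case and all parameter values are simplifying assumptions; this is not the spiking model under review.

```python
import numpy as np

def ring_transition_matrix(n_states):
    """Unbiased random walk on a circular track: step left or right with p = 0.5."""
    T = np.zeros((n_states, n_states))
    for s in range(n_states):
        T[s, (s + 1) % n_states] = 0.5
        T[s, (s - 1) % n_states] = 0.5
    return T

def sr_closed_form(T, gamma):
    """SR as discounted future occupancy: M = sum_t gamma^t T^t = (I - gamma T)^-1."""
    return np.linalg.inv(np.eye(len(T)) - gamma * T)

def sr_td0(T, gamma, alpha, n_steps, seed=0):
    """Learn the SR online from one sampled trajectory with a TD(0) update."""
    rng = np.random.default_rng(seed)
    n = len(T)
    M = np.zeros((n, n))
    s = 0
    for _ in range(n_steps):
        s_next = rng.choice(n, p=T[s])
        # TD target: one-hot occupancy of the current state plus discounted SR of the next state
        target = gamma * M[s_next]
        target[s] += 1.0
        M[s] += alpha * (target - M[s])
        s = s_next
    return M

T = ring_transition_matrix(5)
M_true = sr_closed_form(T, gamma=0.5)
M_td = sr_td0(T, gamma=0.5, alpha=0.01, n_steps=50_000)
print(np.round(M_true, 2))
print("max abs error of TD estimate:", np.abs(M_td - M_true).max())
```

      Each row of M sums to 1 / (1 − γ), and reading out a value function only requires multiplying M by a reward vector, which is why the SR avoids simulating future paths at decision time.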

      However, a lot of this same data could still be interpreted as indicating that animals use a model-based representation as described above. There's nothing in this paper that rules out a model-based interpretation of the results discussed above. In fact, the cited paper by Momennejad et al. 2017 shows that humans extensively use model-based mechanisms along with some use of a successor representation in addition to the model-based mechanism. The description in the article under review needs to avoid treating successor representations as if they are already the ground truth.

      To do this, throughout the paper, the authors need to repeatedly address the fact that the Successor Representation is just a theory and not proven experimental fact. And they need to point out repeatedly, in all sections, that the successor representation hypothesis can be contrasted with the theory that model-based neural activity could instead guide behavior and could be the correct account for all of the data that they address (e.g. the dark-avoidance behavior). They should cite the previous examples of neural data that look like model-based planning, such as Johnson and Redish, 2007 in the T-maze and Pfeiffer and Foster, 2013 in open fields, and cite models such as Hasselmo and Eichenbaum, 2005; Erdem and Hasselmo, 2012 and Fenton and Kubie, 2012 that showed how forward replay or planning of possible trajectories could guide performance of behavioral tasks.

      We thank the reviewer for the valuable feedback. We have adapted the manuscript throughout to discuss the important point that the SR is not the ground truth (e.g. the final paragraphs in the sections “Bias-variance trade-off” and “Leveraging replays to learn novel trajectories”). We also discussed more extensively the model-based literature and the suggested citations in the manuscript.

      The title and text repeatedly refer to a "spiking" model. They show spikes in Figure 2 and extensively discuss the influence of spiking on STDP, but they ought to discuss more explicitly how their spike generation mechanism (a Poisson process) interacts with STDP, and they should compare their model to the model of George, de Cothi, Stachenfeld and Barry, which addresses many of the same questions but uses theta phase precession to obtain the correct spike timing for STDP.

      Yes, that's a great suggestion. We have extended our discussion section. In particular, we added:

      In our work, we did not include theta modulation, but phase precession and theta sequences could be yet another type of activity within the TD(λ) framework. Interestingly, other groups have recently investigated related ideas. A recent work (George et al., 2022) incorporated theta sweeps into behavioural activity, showing that this approximately learns the SR. Moreover, theta sequences allow for fast learning, playing a similar role as replays (or any other fast temporal-code sequences) in our work. By simulating the temporally compressed and precise theta sequences, their model also reconciles learning over behavioural timescales with STDP. In contrast, our framework reconciles both timescales relying purely on rate coding during behaviour. Finally, their method allows the SR to be learned in continuous space. It would be interesting to investigate whether these methods co-exist in the hippocampus and other brain areas. Furthermore, Fang et al. (2022) recently showed how the SR can be learned using recurrent neural networks with biologically plausible plasticity.

      The introduction and start of the Results section should have more citations to neuroscience data. The introduction currently cites only three experimental citations (O'Keefe and Dostrovsky, 1971; O'Keefe and Nadel, 1978 and Mehta et al. 2000) and then gives repeated citations of previous theory papers as if those papers define the experimental data that is relevant to this study. The article should review actual neuroscience literature, instead of acting as if a few theory papers in the last five years are more important sources of data than decades' worth of experimental work. The start of the Results section makes a statement about the role of hippocampus and only cites Stachenfeld et al. 2017 as if it were an experimental paper. The introduction, start of results and discussion need to be modified to address actual experimental data instead of just prior modeling papers. They need to add at least a paragraph to the introduction discussing real experimental data. There are numerous original research papers that should be cited for the role of hippocampus in behavior so that the reader doesn't get the impression all of this work started with the paper by Stachenfeld et al. 2017. For example, the introduction should supplement the citations to O'Keefe and Mehta with other experimental papers, including those that they cite later in the paper. They should also cite other seminal work of Morris et al. 1982 in the Morris water maze and Olton, 1979 in the 8-arm radial maze and work by Wood, Dudchenko, Robitsek and Eichenbaum on neural activity during spatial alternation. At the start of the Results, instead of only citing Stachenfeld (which should have reduced emphasis when speaking about experiments), they should again cite O'Keefe and Nadel, 1978 for the very comprehensive review of the literature up to that time, plus the work of Morris and Eichenbaum and Aggleton and other experimental work.

      We thank the reviewer for the suggested citations. We have added many citations in order to discuss the experimental literature more thoroughly.

      This article is admirable for addressing how to utilize a continuous representation of space and time, which Kenji Doya also addressed in his NeurIPS article in 1995 and Neural Computation 2000 (which should be cited). To emphasize the significance of this continuous representation, they could note that reinforcement learning (RL) theory models still tend to use a discretized grid-like map of the world and discrete representation of time that does not correspond to the probabilistic nature of place cell response properties (Fenton and Muller) and the continuous nature of the response of time cells (Kraus et al. 2013).

      We thank the reviewer for this important comment and this is indeed one of the main strengths of the proposed framework. We have now emphasised this point, by adding the following paragraph to the Discussion:

      “Importantly, the discount parameter also depends on the time spent in each state. This eliminates the need for time discretization, which does not reflect the continuous nature of the response of time cells (Kraus et al. 2013).”
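      A minimal sketch of why per-state discounting removes the need for a fixed time step, assuming the discount decays exponentially with elapsed time (the exponential form is the assumption named in the limitations paragraph); the time constant `tau` and the helper names are hypothetical.

```python
import math

def discount_for_dwell(dwell_time, tau):
    """Discount applied after spending dwell_time (s) in a state, assuming exp(-t / tau)."""
    return math.exp(-dwell_time / tau)

def cumulative_discount(dwell_times, tau):
    """Total discount accumulated along a trajectory with arbitrary dwell times."""
    total = 1.0
    for dt in dwell_times:
        total *= discount_for_dwell(dt, tau)
    return total

tau = 2.0  # hypothetical discounting time constant in seconds
fine_steps = cumulative_discount([0.1] * 10, tau)        # ten 100-ms states
coarse_steps = cumulative_discount([0.5, 0.3, 0.2], tau)  # three states of unequal length
```

      Both trajectories last 1 s and receive the same total discount, exp(−1/τ), so the result depends on elapsed time rather than on how the trajectory is chopped into states.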

      I think the authors of this article need to be clear about the shortcomings of RL. They should devote some space in the discussion to noting neuroscience data that has not been addressed yet. They could note that most components of their RL framework are still implemented as algorithms rather than neural models. They could note that most RL models usually don't have neurons of any kind in them and that their own model only uses neurons to represent state and successor representations, without representing actions or action selection processes. They could note that the agents in most RL models commonly learn about barriers by needing to bang into the barrier in every location, rather than learning to look at it from a distance. The ultimate goal of research such as this should be to link cellular-level neurophysiological data to experimental data on behavior. To the extent possible, they should focus on how they link neurophysiological data at the cellular level to spatial behavior and the unit responses of place cells in behaving animals, rather than basing the validity of their work on the assumption that the successor representation is correct.

      We thank the reviewer for this suggestion, we have now extended the Discussion to include a paragraph on the “Limitations of the Reinforcement Learning framework” which we reproduce here:

      We have already outlined some of the perks of using reinforcement learning for modelling behaviour, including providing clear computational and algorithmic frameworks. However, there are several intrinsic limitations to this framework. For example, it needs to be noted that RL agents that only use spatial data do not provide complete descriptions of behavior, which likely arises from integrating information across multiple sensory inputs. Whereas an animal would be able to smell and see a reward from a certain distance, an agent exploring the environment would only be able to discover it when randomly visiting the exact reward location. Furthermore, the framework rests on fairly strict mathematical assumptions: typically the state space needs to be Markovian, time and space need to be discretized (which we manage to evade in this particular framework) and the discounting needs to follow an exponential decay. These assumptions are overly simplistic and it is not clear how often they are actually met. Reinforcement learning is also a sample-intensive technique, whereas we know that some animals, including humans, are capable of much faster or even one-shot learning.

      Regarding the specific limitations of our model, we note that even though we have provided a neural implementation of the SR, and of the value function as its read-out (see Figure 5-figure supplement S2), the whole action selection process is still computed only at the algorithmic level. It may be interesting to extend the neural implementation to the policy selection mechanism in the future.

    1. Author Response

      Reviewer #2 (Public Review):

      Regulation of NAD and its intermediary metabolites is of critical importance in axon degeneration and neurodegenerative disease. Mounting evidence supports a scenario in which low NAD and high NMN trigger axon degeneration by competitive allosteric inhibition/activation of SARM1. Strategies to increase NAD levels and/or lower NMN levels provide neuroprotection in a variety of contexts. NAD metabolism is a partially conserved process; however, there are key differences in pathway routes and dynamics between model organisms used for NAD research (yeast, worm, fly, zebrafish, mouse/mammalian systems). Drosophila is a key model organism for axon degeneration research based on its ease of use and range of available genetic tools; in addition, the effector of axon degeneration - SARM1 - was first identified in the fly. As Drosophila has some key differences in the NAD synthesis pathways compared to mammalian systems, it is important to test and develop tools to enable exploration of these pathways in the fly. Llobet Rosell and colleagues have developed clear and demonstrable tools in Drosophila for exploring NAD-related axon degenerative pathways by modulating the use of NMN via the addition of NMN-consuming and NMN-generating enzymes. They utilize Drosophila genetics to adequately support the claims made in the manuscript. Importantly, the authors clearly demonstrate that consuming NMN through an alternate route to NaMN provides neuroprotection and that the neuroprotective effects of low NMN are upstream of SARM1. These should be useful tools for neuroscientists who wish to use Drosophila for neurodegenerative research in the future.

      Strengths:

      • Clear demonstration that low NMN provides neuroprotection using novel, stable, enzymatic depletion of NMN (to NaMN).

      • Development of a novel Drosophila tool (NMN-D transgenics) to explore NMN metabolism in vivo, including a stabilized version to permit chronic NMN depletion.

      • Metabolomic profiles across the pathway to show all pathway changes (not just isolated NMN or NAD assays).

      • Neurodegenerative assays that include both histological outcomes (axon degeneration) and circuitry/functional outcomes. Data from both series of experiments all support each other.

      • Assessment of other known potent axon degenerative genes via genetics in combination with the tools developed.

      • Staging of the molecular processes by strategic ablation of the inhibitory ARM domain on SARM1 (dSarm deltaARM). These experiments suggest that low NAD AND high NMN (i.e. the ratio between the two) is the critical factor that drives axon degeneration. Once NAD is low, axon degeneration cannot be rescued by further lowering of NMN. The dSarm delta-ARM and dnmnat sgRNA experiments support the hypothesis that (high) NMN triggers, but does not execute, axon degeneration.

      We appreciate the Reviewer's recognition of the quality of our research.

      Weaknesses:

      • The authors use murine NAMPT (mNAMPT) to increase NMN. The degeneration assays support the hypotheses made, yet mNAMPT doesn't actually increase NMN. Thus it is unclear in this setting whether mNAMPT promotes axon degeneration by an NMN-related mechanism or through another route. It is also unclear as to why the murine form was chosen versus a human or other orthologues, or changing the metabolism of the intrinsic pathway (NR and NRK).

      Why mNAMPT:

      We decided to use mouse NAMPT (mNAMPT) because it was readily available from Giuseppe Orsomando (Amici et al., 2017), and because we did not have access to human NAMPT (hNAMPT).

      We agree with the observation that under physiological conditions, the expression of mNAMPT does not change NMN. However, we argue that after injury, once dNmnat is degraded, the additional NMN synthesis provided by mNAMPT expression (in addition to dNrk) leads to faster NMN accumulation. This is supported by the observation that NMNAT2 is more labile than NAMPT in mammals (Gilley and Coleman, 2010; Stefano et al., 2015).

      • The authors use metabolic profiling to look at the individual metabolites during axon degenerative events and treatments; however, it is unclear whether any of these proteins or genes change as a consequence. This is likely not important for understanding the findings; however, it might be helpful in explaining the mNAMPT data.

      We agree with the idea of testing whether there is a change induced at the mRNA or protein level when the metabolic flux is altered. To do this, we first measured the relative expression levels of axon death and NAD+ synthesis genes (Figure 2 – figure supplement 1B). Then, we measured potential changes upon mNAMPT expression (Figure 4 – figure supplement 1). Importantly, while the Gal4-driven expression resulted in an increase of relative mNAMPT transcript abundance from 30 to 12,000, the changes observed in the other genes were not notable. Compared to Actin–Gal4, dnrk is 2-fold lower in UAS-mNAMPT and Actin > mNAMPT backgrounds (control vs. experiment, respectively). Thus, overall, there appears to be no change in mRNAs of either axon death or NAD+ synthesis genes.

      In the results, we changed the text accordingly:

      "We then tested the effect of mNAMPT on the NAD+ metabolic flux in vivo. Surprisingly, NAM, NMN, and NAD+ levels remained unchanged under physiological conditions (Figure 4C). However, we noticed 3-fold higher NR and a moderate but significant elevation of ADPR and cADPR levels upon mNAMPT overexpression (Figure 4C). We also asked whether mNAMPT impacts NAD+ homeostasis, thereby altering the expression of axon death or NAD+ synthesis genes. Besides the expected significant increase in the Gal4-mediated expression of mNAMPT, we did not observe any notable changes at the mRNA level (Figure 4 – figure supplement 1)."

      • The authors repeatedly introduce a novel PncC antibody. However, no details on this, its generation, or its testing are found within the manuscript as presented. The antibody detects several bands. The authors speculate that this could be a degradation product but nothing substantial is shown.

      In Materials and methods, we added a new section:

      "PncC antibody generation. Rabbit anti-PncC antibodies were generated by Lubioscience under a proprietary protocol. The immunogen used was purified from Escherichia coli, strain K12, corresponding to the full protein sequence of NMN-D. The amino acid sequence is the following: MTDSELMQLSEQVGQALKARGATVTTAESCTGGWVAKVITDIAGSSAWFERGFVTYSNEAKAQMIGVREETLAQHGAVSEPVVVEMAIGALKAARADYAVSISGIAGPDGGSEEKPVGVWFAFATARGEGITRRECFSGDRDAVRRQATAYALQTLWQQFLQNT"

      We also updated the results referencing it.

      "We found that both wild-type and enzymatically dead NMN-D enzymes are equally expressed in S2 cells, as detected by newly generated PncC antibodies (Materials & Methods, Figure 1–figure supplement 2). Notably, we observed two immunoreactivities per lane, with the lower band being a potential degradation product."

      In addition, we now provide evidence for why we believe that the upper band is NMN-D, while the lower one is a degradation product. In the figure attached below, the samples of the first five lanes were denatured at 70 °C, while the samples of the last two lanes were denatured at 95 °C (each for 10 min). The resulting Western blot shows that at 70 °C there is more unspecific background but no lower degradation product, while at 95 °C the background is drastically reduced but a lower degradation product appears. NMN-D is indicated by an asterisk. We feel that it is important to show these data here in the rebuttal; however, we feel that they would add confusion for readers of the manuscript.

      • Olfactory receptor neuron degeneration assays are shown in Fig1 but no data is presented with it to support the images.

      We agree that a quantification would support our observation. However, it is difficult to precisely quantify individual axons in the ORN injury assay, for two main reasons:

      1. Severed axons are often bundled, thus the exact number cannot be scored.

      2. Due to the removal of the cell body, axonal GFP intensity decreases over time because mCD8::GFP is no longer synthesized, which adds another level of difficulty. Nevertheless, we added numbers to each example in Figure 1D and E, where we quantified the % of brains in which severed preserved axons were observed, similar to Figure 2 in (MacDonald et al., 2006).

      In the results section, we changed the text as indicated below:

      "We extended the ORN injury assay and found preservation at 10, 30, and 50 dpa (Figure 1E). While quantifying the precise number of axons is technically not feasible, severed preserved axons were observed in all 10, 30, and 50 dpa brains, albeit fewer at later time points (MacDonald et al., 2006). Thus, high levels of NMN-D confer robust protection of severed axons for multiple neuron types for the entire lifespan of Drosophila."

      In the Figure 1 legend, we changed the text accordingly:

      "D Low NMN results in severed axons of olfactory receptor neurons that remain morphologically preserved at 7 dpa. Examples of control and 7 dpa (arrows, site of unilateral ablation). Lower right, % of brains with severed preserved axon fibers. E Low NMN results in severed axons that remain morphologically preserved for 50 days. Representative pictures of 10, 30, and 50 dpa, from a total of 10 brains imaged for each condition (arrows, site of unilateral ablation). Lower right, % of brains with severed preserved axon fibers."

    1. Author Response

      Reviewer #1 (Public Review):

      Alexander Komkov et al. developed a novel software/algorithm (iROAR) to utilise naturally occurring non-functional clonotypes as a control repertoire to correct for amplification bias associated with multiplex PCR based technologies commonly used in TCR/BCR repertoire analysis. No new data was generated in this study and utilises only publicly available datasets. The authors firstly determine the over amplification rate (OAR) as a metric which is found to be close to 1 under no or little amplification bias and this was validated by calculating the OAR for repertoires determined using 5'-RACE, a method known to have little to no amplification bias. This was a great control to have and is essential for validating the OAR measurement. In contrast, multiplex PCR based protocols such as VMPlex and VJMplex had significant deviations in the distribution of OAR.

      Strengths: The authors used publicly available datasets that utilise both biased (multiplex PCR based) and low biased (5'-RACE) methods to determine TCR/BCR repertoires. In addition, the authors generated in silico biased 5'-RACE datasets. These comparisons are critical in determining the effect of bias correction.

      Weaknesses: The analysis of TCR/BCR repertoires is largely limited to numbers of clonotypes. The use of this algorithm could be more widespread if the effect of iROAR on other repertoire analysis tools was determined or discussed. For example, does iROAR affect measures of diversity? Identification of rare but unique clonotypes? The ability to detect true clonal expansions? Additionally, documentation for the software is lacking and largely inaccessible to non-specialists.

      By default, iROAR does not affect diversity and does not remove any clones. This statement was added to the manuscript. For now, analysis of the potential effect on the detection of true clonal expansions is infeasible due to the lack of appropriate data with sufficient sequencing coverage. We have also written a more detailed description of the iROAR software.

      Reviewer #2 (Public Review):

      In this paper, Komkov et al. describe a novel approach for computational correction of PCR amplification bias in adaptive immune receptor repertoire (AIRR) sequencing data (AIRR-seq). Their correction algorithm is based on using out-of-frame rearrangements to approximate gene-specific amplification bias. Gene-specific relative frequencies among out-of-frame rearrangements are not altered by clonal expansion except to the extent that out-of-frame rearrangements are passengers in clones expanding as a consequence of the specificity of the functional rearrangement. Due to independence between the two rearrangements, it can be reasonably assumed that the effects of clonal expansion are uniform in their impact on the observed V- and J-gene frequencies among out-of-frame rearrangements. Komkov et al. further assume that gene-specific relative frequencies among unique, out-of-frame rearrangements approximate recombination frequencies and that the extent to which gene-specific relative frequencies among all out-of-frame rearrangements deviate from those among unique, out-of-frame rearrangements provides an estimate of gene-specific PCR amplification bias. The ratio of V- or J-gene relative frequencies among all out-of-frame rearrangements to the corresponding relative frequency among unique out-of-frame rearrangements provides this estimate and can be used as a correction factor during data processing. It also serves as the basis for a repertoire-level metric of the overall extent of amplification bias in a repertoire.
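      The per-gene ratio described above can be sketched as follows. This illustrates the ratio as summarized in this review, not the iROAR implementation: the input format, the helper names, dividing in-frame read counts by the ratio, and leaving genes without out-of-frame data uncorrected are all assumptions, and the published tool may iterate or treat outliers differently.

```python
from collections import Counter

def oar_per_gene(out_of_frame):
    """Per-gene over-amplification ratio from out-of-frame clonotypes.

    out_of_frame: list of (v_gene, read_count), one entry per unique
    out-of-frame rearrangement.
    """
    total_reads = sum(count for _, count in out_of_frame)
    n_unique = len(out_of_frame)
    reads, unique = Counter(), Counter()
    for gene, count in out_of_frame:
        reads[gene] += count   # read-weighted frequency (includes amplification bias)
        unique[gene] += 1      # clonotype frequency (approximates recombination frequency)
    return {g: (reads[g] / total_reads) / (unique[g] / n_unique) for g in unique}

def correct_counts(in_frame, oar):
    """Divide in-frame read counts by their gene's ratio; genes without an estimate are unchanged."""
    return [(gene, count / oar.get(gene, 1.0)) for gene, count in in_frame]

# Hypothetical data: both genes recombine equally often, but TRBV20-1 amplifies 3x better.
out_of_frame = [("TRBV5-1", 10)] * 3 + [("TRBV20-1", 30)] * 3
oar = oar_per_gene(out_of_frame)
corrected = correct_counts([("TRBV5-1", 10), ("TRBV20-1", 30), ("TRBV9", 7)], oar)
```

      Ratios close to 1 for every gene indicate little amplification bias, consistent with the behavior reported for 5'-RACE libraries, whereas a wide spread of ratios indicates a strongly biased multiplex protocol.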

      This is a very nice and, to the best of my knowledge, novel idea. The proposed correction factor and metric have potential utility in all studies conducting AIRR-seq that use a PCR amplification step. While the proposed approach may not have superior or even equal performance when compared to biological spike-ins, it still has great potential utility given the time and financial costs and required expertise of using biological spike-ins and because it can be applied to data sets that have already been generated. Incorporation of this approach into AIRR-seq data processing has the potential to increase the accuracy of downstream analyses. It also has the potential to enhance the comparability of results across studies and to reduce the effects of different sequencing protocols for data re-use when data are integrated across studies.

      Enthusiasm is dampened by the fact that the proposed method is not directly compared to the gold standard of biological spike-ins.

      During manuscript revision, we designed and performed an additional wet-lab experiment to directly compare the iROAR approach with biological spike-ins.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript describes the generation and characterization of a mouse knockout model of Cep78, which codes for a centrosomal protein previously implicated in cone-rod dystrophy (CRD) and hearing loss in humans. Previous work in cultured mammalian cells (including patient fibroblasts) also indicated roles for CEP78 in primary cilium assembly and length control, but so far no animal models for CEP78 were described. Here, the authors first use CRISPR/Cas9 to knock out Cep78 in the mouse and convincingly demonstrate loss of CEP78 protein in lysates of retina and testis of Cep78-/- animals. Next, by careful phenotypic analysis, the authors demonstrate significant defects in photoreceptor structure and function in these mutant animals, which become more severe over a 9 (or 18) month period. Specifically, TEM analysis demonstrates ultrastructural defects of the connecting cilium and photoreceptor outer segments in the Cep78 mutants, which is in line with previously reported roles for CEP78 in CRD and in regulating primary cilia assembly in humans. In addition to a CRD-like phenotype, the authors also convincingly show that male Cep78-/- animals are infertile and exhibit severe defects in spermatogenesis, sperm flagella structure and manchette formation (MMAF phenotype). Furthermore, the authors provide evidence for an MMAF phenotype from a male individual carrying a previously reported CEP78 c.1629-2A>G mutation, substantiating that CEP78 is required for sperm development and function in mammals and supporting previously published work (Ascari et al. 2020).

      Finally, to identify the underlying molecular mechanism by which CEP78 loss causes MMAF, the authors perform some biochemical analyses, which suggest that CEP78 physically interacts with IFT20 and TTC21A (an ortholog of Chlamydomonas IFT139) and might regulate their stability. The authors conclude that CEP78 directly binds IFT20 and TTC21A in a trimeric complex and that disruption of this complex underlies the MMAF phenotype observed in Cep78-/- male mice. However, this conclusion is not fully justified by the data provided, and the mechanism by which CEP78 affects spermatogenesis therefore remains to be clarified.

      Specific strengths and weaknesses of the manuscript are listed below.

      Strengths:

      Overall, the phenotypic characterisation of the Cep78-/- animals appears convincing and provides new evidence supporting that CEP78 plays an important role in the development and function of photoreceptors and sperm cells in vertebrates.

      Weaknesses:

      1) The immunoprecipitation experiments of mouse testis extracts that were used for the mass spectrometry analysis in Table S4 were performed with an antibody against endogenous CEP78 (although antibody details are missing). One caveat with this approach is that the antibody might block binding of CEP78 to some of its interactors, e.g. if the epitope recognized by the antibody is located within one or more interactor binding sites in CEP78. This could explain why the authors did not identify some of the previously identified CEP78 interactors in their IP analysis, such as CEP76 and the EDD-DYRK2-DDB1-VprBP complex (Hossain et al. 2017) as well as CEP350 (Goncalves et al. 2021).

      We thank Reviewer #1 (Public Review) for agreeing that Cep78 plays an important role in the development of photoreceptors and sperm cells. We also appreciate Reviewer #1 (Public Review) pointing out the weaknesses, which helped us improve our study.

      For the immunoprecipitation experiments of mouse testis extracts, the antigenic sequence of the Cep78 antibody used is p457-741 (NP_932136.2). Cep78 was reported to bind the EDD-DYRK2-DDB1-VprBP complex. Residues 1-520 are responsible for the interaction of Cep78 with VprBP, and deletion of p450-497 did not affect this interaction, indicating the importance of Cep78 residues 1-450 for binding VprBP (Hossain et al. 2017). Because our anti-Cep78 antibody was generated against p457-741, it is not expected to block binding of residues 1-450 to VprBP. Nevertheless, VprBP was not detected in our IP-MS experiment. The C-terminal region of Cep78 (residues 395-722), which overlaps with the antigenic region of our antibody (p457-741), was reported to interact with Cep350 (Goncalves et al. 2021). Because we still identified Cep350 in our IP-MS, our polyclonal anti-Cep78 antibody did not block interactions mediated by this region. Thus, immunoprecipitation with our Cep78 antibody identified some of the previously known interactors, and the interaction with VprBP is unlikely to be blocked by our Cep78 antibody.

      The detailed antibody information has now been added to Supplementary Table S7 in our revised supplementary materials.

      2) Figure 7A-D and page 18-25: based on IPs performed on cell or tissue lysates the authors conclude that CEP78 directly binds IFT20 and TTC21A in a "trimeric complex". However, this conclusion is not justified by the data provided, nor by the previous studies that the authors are referring to (Liu et al. 2019 and Zhang et al. 2016). The reported interactions might just as well be indirect. Indeed, IFT20 is a known component of the IFT-B2 complex (Taschner et al., 2016) whereas TTC21A (IFT139) is part of the IFT-A complex, which suggests that they may interact indirectly. In addition, the IPs shown in Figure 7A-D are lacking negative controls that do not coIP with CEP78/IFT20/TTC21A. It is important to include such controls, especially since IFT20 and CEP78 are rich in coiled coils that tend to interact non-specifically with other proteins.

      We thank Reviewer #1 (Public Review) for the comment on the protein interactions among Cep78, Ift20 and Ttc21a. As the reviewer pointed out, IFT20 is a known component of the IFT-B2 complex (Taschner et al., 2016), whereas TTC21A (IFT139) is part of the IFT-A complex. Both IFT20 and TTC21A are located at peripheral regions of IFT-B and IFT-A, respectively (PMID: 32456460), and are not core components of these complexes, so it is still possible that the two proteins interact with each other. Indeed, Liu et al. have revealed an interaction between Ift20 and Ttc21a in human sperm (PMID: 30929735). Additionally, to mediate trafficking of ciliary axonemal components, the IFT machinery is recruited to the distal appendages (PMID: 30601682), which are adjacent to the distal end of the (mother) centriole wall, where Cep78 was reported to be located (PMID: 35543806). Cep78 may therefore interact with Ift20 and Ttc21a at the centriole during ciliogenesis.

      To rule out nonspecific interactions between Cep78 and Ttc21a or Ift20, we added negative controls (Gapdh, Figure 7D; Ap80-NB-HA, Supplementary Figures S7A-C) to the co-IP experiments as the reviewer suggested, and found that the interactions of Cep78 with Ttc21a and Ift20 are specific. To examine whether Cep78, Ift20 and Ttc21a form a complex, we fractionated testicular protein complexes using size exclusion chromatography and found that Cep78, Ift20 and Ttc21a co-fractionated at sizes between 158 kDa and 670 kDa (Figure 7E), supporting the formation of a trimeric complex. Our immunofluorescence analysis by SIM also showed co-localization of Cep78 with Ift20 and Ttc21a (Figure 7F). Together, these data support an interaction among Cep78, Ttc21a and Ift20. In the revised manuscript, we rephrased “direct interaction” as “interaction” (page 18, line 393).

      3) In Figure 7D, the input blots show similar levels of TTC21A and IFT20 in control and Cep78-/- mouse testicular tissue. This is in contrast to panels E-G in the same figure where TTC21A and IFT20 levels look reduced in the mutant. Please explain this discrepancy.

      Thank you for pointing this out. Deletion of Cep78 caused down-regulation of Ttc21a and Ift20 proteins. To reveal changes in the interaction between Ttc21a and Ift20, we needed to normalize the interaction against their expression levels. To achieve this, we increased the amount of total Cep78-/- testicular protein so that Ttc21a and Ift20 levels in the input were similar between Cep78+/- and Cep78-/- testes. Using three times as much Cep78-/- as Cep78+/- testicular protein, we detected similar input levels of Ttc21a and Ift20 in Cep78-/- and Cep78+/- testes, and the interaction between Ttc21a and Ift20 was down-regulated after Cep78 deletion. Consistent with this, the GAPDH loading control in the input showed that more Cep78-/- than Cep78+/- testicular protein was analyzed. To avoid confusion, we have added the statement “The amount of Cep78-/- testicular proteins used was three times that of Cep78+/- proteins” to the legend of Figure 7D in the revised manuscript.

      4) The efficiency of the siRNA knockdown shown in 7J-M was only assessed by qPCR (Figure S4), but this does not necessarily mean the corresponding proteins were depleted. Western blot analysis needs to be performed to show depletion at the protein level. Furthermore, it would be desirable to perform rescue experiments to validate the specificity of the siRNAs used.

      We thank the reviewer for the suggestion. To validate the specificity of the siRNAs used, we performed rescue experiments using rescue plasmids in which the siRNA target sequences were synonymously mutated (Supplementary Table S6). The efficiency of siRNA knockdown and the effects of rescue were evaluated by both qPCR (Supplementary Figures S4.A-C) and Western blot (Figures 7.J-K, Supplementary Figures S4.D-E, H-I). The siRNAs significantly reduced the expression of Cep78, Ift20 and Ttc21a at both the mRNA (Supplementary Figures S4.A-C) and protein levels (Figures 7.J-K, Supplementary Figures S4.D-E, H-I). Meanwhile, under siRNA treatment, the rescue plasmids restored the expression of Cep78, Ift20 and Ttc21a at both the mRNA (Supplementary Figures S4.A-C) and protein levels (Figures 7.J-K, Supplementary Figures S4.D-E, H-I) compared with the control groups.

      In the rescue experiments, we further evaluated whether the effects of the Cep78, Ift20 and Ttc21a siRNAs on cilia and centriole lengths are specific. The suppression of cilia and centriole lengths by the Cep78, Ift20 and Ttc21a siRNAs was rescued by overexpression of the rescue plasmids Cep78syn-HA, Ift20syn-Flag and Ttc21asyn-Flag (Figures 7.N-S).

      5) Figure 7I: the resolution of the IFM is not very high and certainly not sufficient to demonstrate that CEP78, IFT20 and TTC21A co-localize to the same region on the centrosome, which one would have expected if they directly interact.

      We thank the reviewer for the constructive comments. To better demonstrate co-localization of CEP78, IFT20 and TTC21A on the centrosome, we overexpressed Cep78-Halo, Ift20-mCherry and Ttc21a-mEmerald in NIH3T3 cells by lentivirus and acquired super-resolution images with SIM (N-sim, Nikon, Tokyo, Japan). The SIM results showed that Ift20 and Ttc21a co-localized with Cep78 (Figure 7F). Cep78 was previously reported to localize at the centriole (Goncalves et al., 2021). The co-localization of Cep78, Ift20 and Ttc21a suggests that Cep78 may play important roles in regulating Ift20 and Ttc21a at the centriole. Our interaction analysis revealed that Cep78 interacted with Ift20 and Ttc21a (Figure 7A-C, Supplementary Figure S7) and formed a complex with them (Figure 7E). Loss of Cep78 down-regulated the expression of Ift20 and Ttc21a and the interaction between them (Figures 7D, G-M).

      6) It is not really clear what information the authors seek to obtain from the global proteomic analysis of elongating spermatids shown in Figure 3N, O and Tables S2 and S3. Also, in Table S2, why are the numbers for CEP78 in columns P, Q and R so high when Cep78 is knocked out in these spermatid lysates? Please clarify.

      We thank the reviewer for the comments. Our global proteomic analysis showed that the majority of differentially expressed proteins were down-regulated (Figure 3N), and that many of them are centrosome- and cilia-related proteins important for sperm flagella and acrosome structures (Figure 3O). These results provide insight into the downstream molecular events underlying the sperm flagella and acrosome defects after Cep78 deletion.

      Regarding the quantification of CEP78 in the TMT-based proteomic analysis, the Cep78-/- to Cep78+/- ratio is relatively high because of ratio compression, a well-known phenomenon in TMT-based proteomics (PMID: 25337643). The actual difference in protein expression is usually larger than the ratio calculated from TMT signals. Indeed, our Western blot analysis showed no CEP78 protein expression in Cep78-/- testis. Although TMT labelling has the disadvantage of ratio compression (PMID: 32040177, PMID: 23969891), it is a widely used quantitative proteomics method and has been shown to identify key pathways and proteins (PMID: 30683861, 33980814).
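      The ratio compression mentioned above can be illustrated with a deliberately simplified model. This model is an assumption for illustration and is not the authors' analysis. It assumes that co-isolated peptides add roughly the same interfering reporter-ion signal to every TMT channel, so that even a protein absent from one channel yields a non-zero ratio. The function name and the numbers are hypothetical.

```python
def observed_tmt_ratio(true_ko, true_ctrl, interference):
    """Simplified ratio-compression model (assumption, not a fitted model).

    true_ko, true_ctrl: reporter-ion signal from the target peptide in the
    knockout and control channels.
    interference: signal from co-isolated peptides, assumed equal in both
    channels.
    """
    return (true_ko + interference) / (true_ctrl + interference)


# A protein absent in the knockout (true ratio 0) still shows a ratio > 0,
# and a larger interfering background pushes the ratio further toward 1.
for background in (0, 20, 50):
    print(background, round(observed_tmt_ratio(0, 100, background), 3))
# prints:
# 0 0.0
# 20 0.167
# 50 0.333
```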

      7) Figure 1F and Figure 4K: the data needs to be quantified.

      We thank the reviewer for this suggestion. For Figure 4K, we stained Cep78+/- and Cep78-/- spermatids with anti-Centrin 1 to measure centriole length. The statistical data on centriole length are provided (Figure 4L) and show significantly increased centriole lengths in Cep78-/- spermatids.

      For Figure 1F, we quantified the immunofluorescence intensities of cone arrestin in light-adapted retinas of Cep78+/- and Cep78-/- mice at 3 months. The results indicate that the immunofluorescence intensity of cone arrestin was lower in Cep78-/- mice.

      8) Figure 2A: It is difficult to see a difference in connecting cilium length in control and Cep78-/- mutant retinas based on the images shown here.

      Thank you for your suggestion. We have stained retinal cryosections from Cep78+/- and Cep78-/- mice with anti-Nphp1 to visualize the connecting cilium, and the data are provided in the revised Figure 2A-B.

      Reviewer #2 (Public Review):

      In this report, the authors describe the generation and characterization of Cep78 mutant mice. Consistent with the phenotype observed in patients carrying mutations in CEP78, Cep78 knockout mice show photoreceptor degeneration as well as sperm defects. The authors further show that CEP78 interacts with IFT20 and TTC21A. Mutation of CEP78 results in reduced protein levels of IFT20 and TTC21A and mislocalization of these two proteins, offering mechanistic insight into the sperm defects. Overall, the manuscript is well written and easy to follow, and the phenotyping is thorough. However, the background section needs improvement. In addition, some of the conclusions are not sufficiently supported by the data, warranting further analysis and/or additional experiments. The Cep78 KO mouse model established by the authors will be useful for further elucidating the disease mechanism in humans and for developing potential therapies.

      My comments are the following:

      1) Introduction. The statement that "CRD usually exists with combination of immotile cilia defects in other systems" is not correct. CRD due to ciliopathy can have cilia-related syndromic defects in other systems but it is a relatively small portion of all CRDs and the most frequently mutated genes are not cilia-related genes, such as ABCA4, GUCY2D, CRX.

      We thank the reviewer for the comments. We agree with the reviewer that only a small portion of CRDs are due to cilia defects and can have cilia-related syndromic defects in other systems. We corrected this statement on Page 4, Lines 77-79 of the revised manuscript. The statement now reads: “A small portion of CRDs are due to retina cilia defects, and they may have cilia-related syndromic defects in other systems[1].”

      2) Introduction: Page 4 CNGB1 encodes channel protein and not a cilia gene. It should be removed since it does not fit.

      We thank the reviewer for the comment. Following the reviewer’s suggestion, we removed the statement “mutations in CNGB1 cause CRD and anosmia [3]” on Page 4, Line 81 of the revised manuscript.

      3) Page 5: given the previous report of CEP78 patients with retinal degeneration, hearing loss and reduced fertility, the statement "we report CEP78 as a NEW causative gene for a distinct syndrome...TWO phenotypes....." is not accurate.

      We thank the reviewer for the comments. We have removed the word “NEW” on Page 5, Line 104 of the revised manuscript. The revised sentence is “In this study, based on results of a male patient carrying CEP78 mutation and Cep78 gene knockout mice, we report CEP78 as a causative gene for CRD and male sterility.”

      4) Figure 1F: the cone OS seems shorter, which might explain the weaker arrestin staining in the mutant compared with the heterozygote. It would also be better to quantify the staining to substantiate the statement.

      Thanks for this suggestion. For Figure 1F, we have quantified the immunofluorescence intensity of cone arrestin in Cep78+/- and Cep78-/- light-adapted retinas at 3 months. The results indicate that the immunofluorescence intensity of cone arrestin was significantly lower in Cep78-/- mice.

      5) Figure 1K: a lower-magnification panel would help convey the overall structural defect of the retina. Is the defect also observed in cones?

      We thank the reviewer for the comment. As suggested, we have provided lower-magnification TEM images of the overall structure, which show disruption of most outer segments in the Cep78-/- retina. It is difficult to determine whether the disordered outer segments belong to cone or rod cells. The images are now provided as Figure 1L in the revised manuscript.

      We observed abnormal photopic b-wave amplitudes (Figure 1B, E) and decreased cone arrestin intensity in light-adapted retinas (Figure 1F, G) in Cep78-/- mice, which indicates that cone function is impaired.

      6) Figure 2A: NPHP1 or other markers that specifically label the CC would be more useful for quantifying CC length. A notation for the red arrows in Figure 2 also needs to be provided. In addition, the shape of the CC in the mutant seems to differ significantly from the control; it appears disorganized and swollen.

      We thank the reviewer for the suggestion. As suggested, we stained retinal cryosections from Cep78+/- and Cep78-/- mice with anti-Nphp1 to visualize the connecting cilium and quantified CC length. The results showed that connecting cilia were shorter in Cep78-/- mice. These data are shown in Figure 2A-B.

      In addition, TEM showed that the upper parts of the connecting cilia were swollen, with disorganized microtubules (Figure 2E-G). The red arrows in Figure 2E-G indicate the swollen upper part of the connecting cilium and the disorganized microtubules in Cep78-/- photoreceptors. We have added this description to the figure legend.

      7) The evidence provided can only indicate an interaction, not a direct interaction, among CEP78/IFT20/TTC21A.

      Thanks for the comment. To further validate the interactions between Cep78 and Ttc21a or Ift20, we performed reciprocal co-IP between Cep78 and Ttc21a or Ift20 by overexpression (Figure 7A-C). We also added negative controls (Gapdh, Figure 7D; Ap80-NB-HA, Supplementary Figures S7A-C) to exclude non-specific interactions. In addition, we provided evidence that Cep78, Ift20 and Ttc21a form a complex, as they co-fractionated in testicular protein complexes of sizes between 158 kDa and 670 kDa by size exclusion chromatography (Figure 7E). We also performed super-resolution imaging of immunofluorescent localization and observed co-localization of Cep78 with Ttc21a and Ift20 by SIM. Based on these data, we conclude that Cep78 interacts with Ttc21a and Ift20 and that they form a complex. We rephrased “direct interaction” as “interaction” in the manuscript.

      Reviewer #3 (Public Review):

      The authors aimed to provide a deeper understanding of CEP78 function in the development of cone-rod dystrophy and to demonstrate a previously unreported role of CEP78 in male infertility.

      It is important to note that the authors 're-examined' a previously published human mutation, a 10 bp deletion in the CEP78 gene (Qing Fu et al., 10.1136/jmedgenet-2016-104166). This should be seen as an advantage, since revisiting an older study allowed the authors to note phenotypes that were not originally reported, namely impairment of photoreceptor and flagellar structure and function. The authors have generated a new knockout mouse model with a deleted Cep78 gene, which enabled in-depth studies of Cep78 function and the identification of its interacting partners.

      The authors have mastered classical histology techniques for tissue analysis, immunostaining, and light and confocal microscopy. They also employed high-end technologies such as a spectral-domain optical coherence tomography system, electron microscopy and scanning electron microscopy. They performed functional studies such as electroretinography (ERG) to assess visual function in Cep78-/- mice, and quantitative mass spectrometry (MS) on elongating spermatids.

      The authors used elegant co-immunoprecipitation techniques to demonstrate trimer complex formation.

      Throughout the manuscript, images are clear and support the intended information and claims. Additionally, quantifications were provided where possible. Sample numbers were sufficient, in most cases n=6 (for mouse specimens).

      The authors could provide more details in the Materials and Methods section on how some experiments were conducted. Here are a few examples. (i) The authors performed quantitative mass spectrometry (MS) on elongating spermatid lysates but did not specify how the elongating spermatids were isolated. (ii) For the co-IPs, the authors should state the number of cells (6-well plate, 10 cm dish, etc.) that were transfected and used. Furthermore, the authors could more clearly articulate which findings are novel and which confirm earlier findings.

      The authors clearly demonstrate and present sufficient evidence to show CEP78/Cep78 importance for proper photoreceptor and flagellar function. Furthermore, they succeed in identifying trimer complex proteins which help to explain the mechanism of Cep78 function.

      The given study provides a rather detailed characterization of human and mouse phenotype in response to the CEP78/Cep78 deletion and possible mechanism causing it. CEP78 was already earlier associated with Cone-rod dystrophy and, this study provides a greater in-depth understanding of the mechanism underlying it. Importantly, scientists have generated a new knock-out mouse model that can be used for further studies or putative treatment-testing.

      The association of CEP78/Cep78 deletion with male infertility has not been previously reported and brings additional value to this study. We know from numerous studies that the testes express many genes, some unique to the testes and some co-expressed in multiple tissues. However, very few of these genes are well studied and have clinical significance. Studies like this, combining patient and animal model research, make it possible to identify poorly characterized or unstudied genes and assign them functions. This enables the data to be used in basic research, patient diagnostics and treatment choices.

      We would like to thank Reviewer #3 (Public Review) for positive comments on our work.

      In response to the reviewer’s suggestion to provide more details in the Materials and Methods, we added a description of the STA-PUT method for spermatid purification (Page 34, Lines 729-741 of the revised manuscript). We also added the amount of cells used for co-IPs, “HEK293T cells in 10 cm dishes were transfected (Vazyme, Nanjing, China) with 5 μg plasmid for each experimental group.” (Page 36, Lines 783-784 of the revised manuscript).

      We also highlighted our new discoveries and ensured that all previously published findings are accompanied by references. We added “We further explored whether the c.1629-2A>G mutation in this previously reported patient would disturb CEP78 protein expression and male fertility. Blood samples were collected from this patient and an unaffected control for protein extraction.” at Page 17, Line 335. We also added “The major findings of our study are as follows: using Cep78-/- mice, we identified CEP78 as a causal gene of CRD with male infertility and multiple morphological abnormalities of the sperm flagella. A male patient carrying the CEP78 c.1629-2A>G mutation, whom we previously reported to have CRD [8], was found in this study to have male infertility and MMAF. Cep78 formed a trimer with IFT20 and TTC21A (Figure 8), which are essential for sperm flagella formation [16, 18]. Cep78 played an important role in the interaction and stability of the trimer proteins, which regulate flagella formation and centriole length in spermiogenesis.” as the first paragraph of the Discussion (Page 21, Lines 447-456 of the revised manuscript).

    1. Author Response

      Reviewer #1 (Public Review):

      The idea that a passive living being can improve the wind dispersal of its seeds by passively changing their drag is enticing. The manuscript shows that high wind events in Scotland are inversely correlated with the ambient humidity. The dandelion pappus morphs with the ambient humidity, being more open in dry conditions, which is associated with stronger wind events. This passive morphing of the shape of the pappi thus leads to a dispersal of the seeds further away from their origin.

      The analysis and discussion in the paper are focused on "distance", i.e., how far the pappus will fly. Could the notion of time be relevant too? In wet conditions, perhaps it's better for a seed to hit the ground quickly and start germinating, whereas if it's dry, staying in the air longer to travel farther might be a better strategy.

      This is an interesting point; however, we think that flight time is likely to be less relevant to the dispersal outcomes. This is because seeds mostly remain attached to the parent plant in wet conditions so will not fly at all and therefore will not begin germination. When they do disperse, flight time will generally be only a few seconds for the majority of seeds whether they are wet or dry, and the timescale of wet weather is generally much longer (typically hours).

    1. Author Response

      Reviewer #1 (Public Review):

      This excellent manuscript challenged the premise that NF-kappaB and its upstream kinase IKKbeta play a role in muscle atrophy following tenotomy. Two animal models were used - one leading to enhanced muscle-specific NF-kappaB activation and the other a muscle-specific deletion. In both models, there was no significant relationship to observed muscle changes following tenotomy. Overall this work is significant in that it challenges the existing dogma that NF-kappaB plays a crucial role in muscle atrophy.

      Surprisingly the authors noted that there were basal differences observed in the phenotypes of their models that were sex-dependent. They note that male mice lose more muscle mass after tenotomy and specifically type 2b fiber loss.

      Overall this is an outstanding study that challenges the notion that NF-kappaB inhibitors are likely to improve muscle outcomes following injuries such as rotator cuff tears. Its main weakness is that there were no pharmacological arms of investigation; this fails to definitively exclude the hypothesis that inhibition may exert some effect in healing, perhaps in surrounding non-muscle matrix tissue that in turn may assist in healing.

      Thank you for your careful and thoughtful review. We agree that the finding that NFkb is not driving tenotomy-induced atrophy is both surprising and interesting. We look forward to further uncovering the atrophic mechanisms responsible. We also agree that an investigation using pharmacological NFkb inhibitors will improve our understanding of the full scope of the role of NFkb in the tenotomy pathology. As you and another reviewer note, this work has only blocked NFkb signaling in the mature muscle fiber and thus cannot assess the role of NFkb in satellite cell, fibroblast or immune cell activation, etc., in the healing response. However, we avoided using these inhibitors in this study because their systemic effects could obscure the role of NFkb in the muscle fiber. While we believe that a pharmacological investigation is beyond the scope of this study, it will make an excellent follow-on investigation.

      Reviewer #2 (Public Review):

      The primary strength of this paper is a rigorous approach to 'negative' data. Did the authors definitively prove that NF-kB has no role in the tenotomy-induced atrophy? Probably not entirely, since there are limitations of the mouse model and the knockdown mice. There cannot be complete elimination of load since mice heal with some scar tissue, and the knockdown is not complete elimination. However, even with these limitations, this presents important findings that tenotomy, which induces mechanical unloading of the muscle-tendon unit, provides a unique biomechanical environment for the muscle to undergo atrophy, which warrants a more in-depth look given that these injuries are unique and extremely common. It must be mentioned that the results are entirely supported by their data and that even though the model is not 'perfect' it truly supports that NF-kB has a limited role in atrophy. The sex-mediated differences based on autophagy are a secondary hypothesis and are interesting but possibly less clinically relevant based on the differences shown.

      We appreciate your thoughts on the “negative” data in this study. A manuscript in which the data refute your hypothesis and that of the field is difficult to write. There is a higher burden of validation and closer scrutiny of limitations. We agree that the model does have some limitations, but overall it strongly supports a limited role for NFkb in tenotomy-induced muscle atrophy.

      The important next step for this group and others is to evaluate the 'how and why' of tenotomy atrophy if not through NF-kB. Could the muscle have many redundant processes by which to circumvent the NF-kB pathway, which, given how ubiquitous the pathway is, would explain why the authors did not see a difference? Could it be differences in axial vs appendicular muscle? Or should there be a closer look at the mechanosensors in the muscle cells to determine whether there are other key drivers of atrophy? Regardless, this paper shows that tenotomy-induced muscle atrophy is unique and supports the conclusion that muscle has many ways to atrophy depending on the injury it undergoes.

      We agree that the major next step for this work is to investigate the mechanism(s) responsible for tenotomy-induced atrophy. Autophagy in particular needs a more thorough investigation using autophagic inhibitors in naive wildtype mice to determine its role in the sex specificity of tenotomy-induced atrophy. The question of axial vs. appendicular muscle is intriguing. There could also be an upper vs. lower body difference that is worth exploring in future work.

      Reviewer #3 (Public Review):

      The authors provided thorough analyses of muscle morphology, biochemistry, and function, which is a major strength of the study. However, there are some key confounding variables that the authors failed to address. For example, the estrous cycle in female animals was not controlled. The study could have been significantly improved by controlling for sex hormone levels or at least testing for their influence on the response to injury.

      We appreciate your careful and insightful review of our work. We designed this study to assess the role of myofiber NF-kB in tenotomy-induced atrophy, which led us to a rigorous assessment of morphology, biochemistry and function, which we agree is the strength of the study. We also agree that a major limitation of this study is that the secondary observations of sex specificity and autophagic signaling are not as well controlled or supported. This is because these observations were made at the end of the study, when the histological analyses were completed by the blinded rater. The sex specificity in the basophilic puncta that the rater observed prompted us to reconsider the sex specificity in our other data and to stain for autophagic vesicles. As you suggest, rigorously assessing sex specificity would require controlling for estrous cycle and analyzing sex hormone levels, which would mean initiating another study with these variables planned in advance. We think this is beyond the scope of the current question of the role of NF-kB in tenotomy-induced atrophy, but it should be undertaken as a follow-up study, which would also eliminate the confounding variables of genetic manipulation and tamoxifen treatment.

      However, since we still need to report the sex specificity we observed while ensuring that our findings are not misconstrued, we revised the language in the manuscript to emphasize that these are retrospective observations that require further investigation. We have also added a discussion of these variables and their potential influence on the results to the Discussion.

      Discussion: “Additionally, it is important to note that estrous cycle was not controlled in these mice and sex hormone levels weren’t measured in this study. These preliminary observations, though intriguing, will require more rigorous follow up evaluations to define the interaction between sex, tenotomy, and autophagy in naïve wildtype mice.”

      Furthermore, more data are needed to link NF-kB signaling and autophagy before any conclusions can be drawn. Overall, in the current form of the manuscript, the presented data seem underdeveloped, and additional supporting data could significantly improve the quality of the manuscript and enhance our understanding of NF-kB signaling and muscle wasting in rotator cuff injury.

      We agree that more data are needed to complete the picture of autophagy in tenotomy-induced muscle atrophy. The p62 and LC3 positive intracellular puncta in male tenotomized muscle are distinctive, but only limited conclusions can be drawn physiologically because 1) they are only present in a fraction of fibers and 2) it is impossible to tell whether they result from increased autophagic flux or altered vesicle processing. Western blot for LC3 (and now p62) indicates only small changes in total protein, but since these proteins are synthesized and degraded during active autophagy, it is possible for their levels to remain constant while flux increases. Direct measures of autophagic flux would require treating mice with an autophagosome block which would require initiation of another study. However, we agree with the reviewer that we can add some additional measures to better characterize the instantaneous state.

      We have added analysis of p62 protein expression alongside LC3, since p62 protein content in muscle can be decoupled from LC3 (PMID: 27493873). We also added expression data for genes involved in autophagy (Lc3b, Gabarapl1, Becn1, Bnip3, and Atg5). Finally, we have commented on the limitations of our data in the Discussion.

      Discussion: “Evidence for autophagy regulating tenotomy-induced atrophy has been mounting over recent years (Bialek et al., 2011; Gumucio et al., 2012; Joshi et al., 2014; Ning et al., 2015; Hirunsai & Srikuea, 2021). The evidence presented here supports this contention, but we find surprisingly small effect sizes for all markers investigated. This could be because we are not directly assessing autophagic flux and so are missing some temporal dynamics since synthesis and degradation are ongoing simultaneously.”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors have generated a set of seven nanobody tools against two of the largest Drosophila proteins, which are related to vertebrate titin and essential for muscle function. The study of such gigantic proteins is a challenge. They show that each of these nanobodies recognizes its epitope with high affinity (as expected from antibodies), fails to generate a signal in immunostainings of a mutant for the cognate protein, does not cross-react with the others, and generates a signal in the muscle that is consistent with what one would anticipate for fly titin homologs. In addition, they show that these nanobodies have better penetration and labeling efficiency than conventional antibodies in thick tissues after classical paraformaldehyde fixation. Using these nanobodies, they could deduce the organization of the epitopes in different muscle types and propose a model for Sallimus and Projectin arrangement in muscles, including in larvae, which are difficult to label with traditional antibodies due to their impermeable chitin skeleton. Finally, they fused the gene encoding one of the nanobodies to the open reading frame of NeonGreen and expressed the corresponding fusion protein in animals to use the probe in FRAP assays.

      The work is very well performed and convincing. However, given its significant redundancy in terms of biological conclusions with the companion study "Nanobodies combined with DNA-PAINT super-resolution reveal a staggered titin nano-architecture in flight muscles" by the same authors, and with other published papers, I recommend that the authors further demonstrate the use of their nanobodies in live assays. In particular, the authors should test whether they can use the nanobodies to induce protein degradation either permanently or conditionally.

      Thanks for this nice summary of our findings. We have now extended the analysis of the larval muscles expressing the Nanobody-NeonGreen fusion and provide a first proof-of-principle analysis of new fly strains that we generated, which contain Sls-Nano2 or Sls-Nano42 nanobodies fused to a degradation signal. These induce lethality of the animals, suggesting that Sls protein is partially non-functional. We verified this with quantitative stainings of various Sls epitopes in these muscles, which suggest that Sls is not fully degraded but rather partially modified in the Sls-Nano-deGrad expressing muscle fibers. These will be interesting tools to study Sls function during sarcomere homeostasis.

      Reviewer #2 (Public Review):

      The data presented in this manuscript are sound but rather descriptive. The contribution, as presented, is mostly of a technical nature. The authors correctly state that anti-GFP nanobodies, while used extensively across many model organisms, have limited utility for in vivo applications when the GFP-tagged protein in question displays abnormal behavior or is non-functional. The creation of nanobodies that are uniquely specific for the protein(s) of interest is therefore a significant improvement, especially since the Sallimus- and Projectin-specific reagents reported here react with PFA-fixed material. At least one of these nanobodies, when expressed in vivo, decorates the appropriate target. The source of antigens used for the construction of the nanobody library is Drosophila-derived. The extent of homology of Drosophila Sallimus and Projectin with related proteins in other species is not discussed. Whether the nanobodies reported here would be useful in other (closely related?) species therefore remains to be established. For those studying muscle biology in Drosophila, the nanobodies described here will be publicly available as cDNAs. Ease of production implies a readily shared and standardized resource for the field.

      We thank this reviewer for appreciating that our Sallimus and Projectin nanobodies are useful. We have now extended the collection even further, including anti-Obscurin, αActinin and Zasp52 nanobodies; the latter two will also be useful for researchers studying other tissues, in particular Drosophila epithelial tissues. As always in the Drosophila field, all fly strains and plasmids generated here will be made easily available to the community by placing them in stock centers or shipping them to laboratories directly. As indicated, the plasmids will also be deposited at Addgene.

      Further characterization of these nanobodies by biochemical methods such as immunoblotting would be challenging, given the size of the target proteins. In view of the technical nature of this manuscript, the authors should perhaps critically discuss the distinction between bulky GFP tags versus the much smaller epitope tags and the nanobodies that recognize them, although this was covered in a recent eLife paper from the Perrimon lab. Insertion of small tags, in conjunction with nanobodies that recognize them, would be less perturbing than the much bulkier GFP tag and lend itself to genome-wide applications. Creating nanobodies uniquely specific for each protein encoded in the Drosophila genome is not realistic, and the targeted approach deployed here is obviously valuable.

      We discuss the drawbacks of relying solely on GFP nanobodies, which require GFP-tagged proteins to be available and functional. In particular for sarcomeric proteins this is often not the case. We also cite the Perrimon paper, which was published just as we prepared this manuscript. We would like to point out to this reviewer that even tagging with a small epitope tag is considerable work in Drosophila, and that the Perrimon paper, on which this reviewer is an author, describes only two endogenously tagged genes with a nanotag (histone H2Av and Dilp2); the other genes described were expressed from a UAS source or in cell culture. We show here 22 nanobodies against 11 target epitopes.

      Nanobodies typically recognise folded epitopes and are therefore unlikely to work in immunoblotting.

      The authors apply two different approaches to characterize the newly generated nanobodies: more or less conventional immunohistochemistry with fluorescently labeled nanobodies, and in vivo expression of nanobodies fused to the fluorescent NeonGreen protein. The superiority of nanobodies in terms of tissue penetration has been shown by others in a direct comparison of intact fluorescently labeled immunoglobulins versus nanobodies. The authors state that in vivo labeling with nanobody fusions "thus far was done only with nanobodies against GFP, mCherry or short epitope tags." There is no fundamental difference between these recognition events and what the authors report for their Sallimus- and Projectin-specific reagents. The section that starts at line 304 is thus a little bit of a 'straw man'. There is no reason to assume that a nanobody that recognizes a muscle protein would behave differently than a nanobody that would recognize that same protein (or another) when epitope- or GFP-tagged. What might be interesting is to examine the behavior of these muscle-specific nanobodies in the course of muscle contraction/relaxation: are there conformational alterations that promote dissociation of bound nanobodies? Do different nanobodies display discrete behavior in this regard? The manuscript is silent on how muscles behave in live L3 larvae. The FRAP experiment seems to suggest that not much is happening, but the text refers to the contraction of larval sarcomeres from 8.5 µm to 4.5 µm. Does the in vivo expressed nanobody remain stably bound during this contraction/relaxation cycle? What about the other nanobodies reported in this manuscript? Since the larval motion was reduced by exposure to diethylether, have the authors considered imaging the contraction cycle in the absence of such exposure?

      We appreciate this reviewer's expert knowledge of nanobodies. However, nanobodies have not been extensively applied in Drosophila tissues. Hence, we believe it is important to characterise their penetration in stainings and compare them carefully to antibodies. In this way, Drosophila readers will be aware of their advantages.

      We have now also included more data on larval muscle morphology in the nanobody-expressing muscles. Their morphology is normal. As larvae move around extensively all the time, the binding of the nanobodies to the target must be stable; otherwise they would not be bound when we fix or anesthetize the larvae. However, we have not attempted to image them at high resolution while crawling freely. Given the crawling speed we quantified (about 1.5 mm per second, see Figure 9 S1), we hope this reviewer appreciates that high-resolution imaging of sarcomeres in freely crawling larvae is highly non-trivial.

      Given that the nanobodies bind well-folded epitopes with low picomolar dissociation constants, it is hard to imagine that conformational changes of the target would dissociate them. The nanobody would stabilise the recognised conformation by a ΔG of ≈60 kJ/mol, and we would not expect the chosen domains to undergo major conformational changes.
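      As an illustrative check of the ≈60 kJ/mol figure: for a 1:1 complex, the standard binding free energy is related to the dissociation constant by ΔG° = RT ln(Kd/c°), with c° = 1 M. The sketch below assumes T = 298.15 K and a 1 M standard state; the Kd values are example inputs, not measured constants for these nanobodies, and the function name is hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def binding_free_energy_kj(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol for a 1:1 complex.

    Assumes a 1 M standard state, so dG = RT * ln(Kd / 1 M).
    A negative value means binding is favourable; its magnitude is
    the stabilisation of the bound state.
    """
    return R * temp_k * math.log(kd_molar) / 1000.0


# Example dissociation constants spanning the "low picomolar" range (assumed values).
for kd in (1e-10, 1e-11, 1e-12):
    print(f"Kd = {kd:.0e} M -> dG = {binding_free_energy_kj(kd):.1f} kJ/mol")
```

      Under these assumptions, Kd values of 1 to 10 pM correspond to a stabilisation of roughly 63 to 69 kJ/mol, and 100 pM to about 57 kJ/mol, consistent with the order of magnitude quoted in the response.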

      Reviewer #3 (Public Review):

      Loreau et al. have presented a well-written manuscript reporting clever, original work taking advantage of fairly new biotechnology: the generation and use of single-chain antibodies called nanobodies. The authors demonstrate the production of multiple nanobodies to two titin homologs in Drosophila and use these nanobodies to localize these proteins in several fly muscle types and discover interesting aspects of the localization and span of these elongated proteins in the muscle sarcomere. They also demonstrate that one of these single-chain antibodies can be expressed in muscle fused to a fluorescent protein to image the localization of a segment of one of these giant proteins, called Sallimus, in muscle in a live fly. Their project is well justified given the limitations of the usual approaches for localizing and studying the dynamics of proteins in the muscle of model organisms, such as the possibility that GFP tagging of a protein will interfere with its localization or function, and the poor penetration of large IgG or IgM antibodies into densely packed structures like the sarcomere after fixation, as compared to smaller nanobodies.

      They achieved their goals consistent with the known/expected properties of nanobodies: (1) They demonstrate that at least one of their nanobodies binds with very high affinity. (2) They bind with high specificity. (3) The nanobodies show much better penetration of fixed stage 17 embryos than do conventional antibodies.

      They use their nanobodies mostly generated to the N- and C-terminal ends of Sallimus and Projectin to learn new information about how these elongated proteins span and are oriented in the sarcomere. For example, in examining larval muscles which have long sarcomeres (8.5 microns), using nanobodies to domains located near the N- and C-termini, they show definitively that the predicted 2.1 MDa protein Sallimus spans the entire I-band and extends a bit into the A-band with its N-terminus embedded in the Z-disk and C-terminus in the outer edge of the A-band. Using a similar approach they also show that the 800 kDa Projectin decorates the entire myosin thick filament except for the H-zone and M-line in a polar orientation. Their final experiment is most exciting! They were able to express in fly larval muscles a nanobody directed to near the N-terminus of Sallimus fused to NeonGreen and show that it localizes to Z-disks in living larvae, and by FRAP experiments demonstrate that the binding of this nanobody to Sallimus in vivo is very stable. This opens the door to using a similar approach to study the assembly, dynamics, and even conformational changes of a protein in a complex in a live animal in real time.

      We thank this reviewer for appreciating the quality and impact of our approach and of the results we obtained.

      There are only a few minor weaknesses in their conclusions: (1) They should note that their estimate of the span of Sallimus could in fact be an underestimate, since their Nano2 nanobody is directed to Ig13/14; if all 12 Ig domains N-terminal of their epitope were unwound, they would add 12 × 30 nm = 360 nm of length, and even if folded they would add about 50 nm.

      We now discuss the length contribution of the 12 Ig domains more extensively in the DNA-PAINT super-resolution paper, but not in this resource paper, as the 50 nm difference could not be resolved with the confocal microscopy applied here to the larval muscle sarcomere.

      (2) They discuss how Sallimus and Projectin are the two Drosophila homologs of mammalian titin; however, they ignore the fact that Sallimus and Projectin are more similar to muscle proteins of other invertebrates. For example, in C. elegans, TTN-1 is the counterpart of Sallimus, and twitchin is the counterpart of Projectin, both in size and domain organization. The authors present definitive data to support Figure 9, their nice model for a fly larval sarcomere, but fail to point out that this model likely pertains to C. elegans and other invertebrates. Forbes et al. (2010) showed that TTN-1, which is detected by western blot as a ~2 MDa protein, spans the entire I-band and extends into the outer edge of the A-band, as revealed with two polyclonal antibodies; this is very similar to what the authors have shown here, more elegantly, for Sallimus. In addition, several studies have shown that twitchin (Projectin) does not extend into the M-line; the M-line is exclusively occupied by UNC-89, the homolog of Obscurin.

      We thank this reviewer for pointing out the important C. elegans literature that we have now included in this revised manuscript. We apologise for initially omitting them. They are indeed highly relevant.

      Reviewer #4 (Public Review):

      The authors report the generation and characterisation of several nanobodies against the giant Drosophila sarcomeric proteins Sallimus and Projectin, the functional orthologs of titin. They describe an efficient pipeline that could potentially help in designing and producing nanobodies for other proteins. There are several advantages to using nanobodies in comparison to conventional antibodies, and the authors nicely demonstrate that the generated nanobodies allow precise mapping of subcellular localisation and even of protein orientation in the case of Projectin. They also show that small nanobody molecules have superior penetration and labelling efficiencies with respect to classical antibodies. Finally, the authors select one of the nanobodies to test whether it efficiently detects native proteins in living tissue. They confirm that Sls-Nano2-NeonGreen binds Sls in vivo in muscles of temporarily immobilized 3rd instar larvae, allowing them to reveal the sarcomeric Sls pattern and to demonstrate by FRAP experiments that Sls does not exchange during a short time period.

      This work is of significant value to a large audience. It provides a clear and precise pipeline for the generation of efficient nanobodies, which are invaluable tools of modern biology.

      We thank this reviewer for expressing strong support for our manuscript and appreciating its value for a large readership.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Chou-Zheng and Hatoum-Aslan follow up on their previous studies that have characterized the collaborations between the type III-A CRISPR-Cas10 Csm complex and various cellular housekeeping nucleases. The authors have previously demonstrated that the Csm complex associates with several nucleases that are implicated in RNA degradation via pulldown and mass spectrometry analysis. They also previously showed that some of these enzymes, including PNPase, are important for CRISPR RNA (crRNA) maturation and for robust anti-phage defense. They now show that a second housekeeping enzyme, RNase R, is required for crRNA maturation. PNPase and RNase R act in concert to produce the mature crRNA. The authors also analyze the interactions between Csm5 and both housekeeping proteins. Finally, they demonstrate that PNPase and RNase R are important for robust anti-plasmid activity when using crRNAs that are complementary to low-abundance transcripts.

      This is a well-written paper with clear figures and well-described experiments and results. The experiments in Figures 1 and 2 demonstrating the importance of RNase R for crRNA maturation are excellent. The biochemistry experiments in Figure 2 are especially convincing, in which the authors were able to reconstitute the concerted activities of RNase R and PNPase for crRNA biogenesis. The experiments in Figure 5 implicating PNPase and RNase R in robust anti-plasmid activity when targeting low-abundance transcripts are also clear and convincing, and the result is intriguing. Overall, these experiments provide a new example in a growing list of co-opted host proteins that are important for crRNA biogenesis and CRISPR-mediated defense.

      Thank you for your thoughtful review of our manuscript and comments overall!

      I do have some concerns about experiments in Figures 3 and 4 analyzing interactions between PNPase or RNase R and the Csm5 subunit of the Csm complex, and I believe that some of the authors' conclusions are not fully supported by the evidence presented in these experiments. These concerns, along with a question about their model, are detailed below.

      1) The authors used the structure of S. thermophilus Csm5 to guide their design of truncations to probe potential intrinsically disordered regions (IDR1 and IDR2) that may be sites of interaction with PNPase or RNase R. Since the authors submitted their manuscript, an AlphaFold predicted structure of the S. epidermidis Csm5 has been released on the AlphaFold Protein Structure Database. In this model, the IDR2 region is predicted by AlphaFold to be a beta strand at the center of a beta sheet, rather than a disordered region. If the prediction is accurate, deletion of this strand could cause Csm5 to misfold, making it difficult to interpret what causes loss of interaction with PNPase (i.e. deletion of a specific interaction surface versus misfolding of the overall tertiary structure). In light of this, the discussion surrounding these experiments should be altered to include more caveats about the truncations, and conclusions based on this experiment should be softened.

      While this manuscript was under review, several cryo-EM structures of the Cas10-Csm complex from S. epidermidis were solved and reported (Smith et al, 2022, Structure). In the unbound complex (PDB ID 7V02), IDR2 of Csm5 does indeed overlap with a short beta strand, but it is flanked by loops/unstructured regions. In addition, of the 46 residues that we deleted in the Csm5Δ46 mutant, 20 residues are unresolved in the experimentally determined structure, supporting the notion that this region is generally flexible. Also, it is unlikely that this and the other Csm5 deletion mutants are misfolded because they all retain the ability to associate with the complex (Fig. 4B), and we were able to readily purify the mutant with the largest deletion (Csm5Δ46) without any issues (Fig. 5). To address this concern, we added panel D in Figure 4-figure supplement 1, which highlights the IDR regions in Csm5 from the recently published S. epidermidis Cas10-Csm complex structure, and integrated the observations mentioned above into the narrative (lines 241-247 in the marked-up revised manuscript). We also softened the conclusions based on these experiments (lines 276-278 in the marked-up revised manuscript): “Taken together, these results suggest that the IDR2 region of Csm5 likely plays a role in the recruitment and stimulation of PNPase, while the binding site for RNase R may reside elsewhere in Csm5”.

      2) The native gels testing interactions between Csm5 and RNase R show a slight change in mobility of RNase R upon the addition of Csm5. Although I agree with the authors' interpretation that this shift could be due to transient interactions between Csm5 and RNase R, it is also possible that the mobility of RNase R is affected simply by the addition of a large excess of a second protein, even without a specific interaction between the two proteins. As a result, the evidence for direct interaction with Csm5 is limited. The discussion of how RNase R is recruited by the Csm complex could consider more possible explanations. For example, it is possible that the interaction between RNase R and the Csm complex is mediated by another protein (e.g. PNPase could bridge the interaction between the two) or that such an interaction could be stabilized by intermediate crRNA or target RNA binding by the Csm complex.

      Thank you for this comment. To help rule out the possibility that excess Csm5 could cause a shift of any protein nonspecifically, we included a control in the original manuscript in which the same native gel assay was performed with BSA and Csm5, and found that Csm5 fails to cause an upward shift in BSA (Figure 3-figure supplement 1). In addition, to bolster the claim of a direct interaction between Csm5 and RNase R, we performed an additional pulldown assay (Figure 3-figure supplement 2). Details are described under essential revisions point number 3 above. Regarding the other possibilities mentioned, it is unlikely that PNPase is bridging the interaction with RNase R because when we delete PNPase from cells, we still get some maturation (Fig. 1E and Chou-Zheng and Hatoum-Aslan, eLife, 2019). Also, in the reconstituted system, RNase R can still perform some level of maturation on its own (Fig. 2D). These observations argue against the need for bridging interactions with PNPase. Furthermore, maturation occurs in the absence of target RNA, ruling out the possibility that target RNA bridging is necessary for RNase R-mediated crRNA maturation. However, we agree with the reviewer that other components of the Cas10-Csm complex may help to recruit and stabilize the interaction with RNase R in vivo, and this possibility was already mentioned in the narrative in the original submission, although we did not explicitly name the intermediate crRNA as one such component (lines 213-215 and again in lines 413-416 in the marked-up revised manuscript). We have replaced “subunits” with “components” in line 415 to be more inclusive of this possibility. Since this is all still speculative, we opt not to elaborate further on this point in the current manuscript. 
Needless to say, we are actively pursuing other more quantitative assays to measure the interactions between Csm5 and PNPase/RNase R and hope to have such data available in a follow-up manuscript.

      3) On lines 367-391, the authors propose a model for how PNPase and RNase R may contribute to defense against foreign DNA through their recruitment by the Csm complex to the target transcript. However, their experiments do not test whether PNPase and RNase R must interact with the Csm complex to support anti-plasmid activity. Indeed, it may make more sense for free RNase R to be involved in defense, similar to how free activated Csm6 degrades transcripts non-specifically, rather than only cleaving transcripts in close proximity to the Csm complex. The authors could expand their discussion to mention the possibility that free RNase R or PNPase are acting in anti-plasmid defense.

      Thank you for this suggestion. The following statement has been added to the discussion (lines 393-395 in the marked-up revised manuscript): “Once recruited by the complex, PNPase and RNase R may degrade nucleic acids in the vicinity nonspecifically, similarly to Csm6.”

      Reviewer #2 (Public Review):

      This work follows up on an earlier publication that showed PNPase and RNase J2 play important roles in CRISPR RNA processing (doi: 10.7554/eLife.45393). Here, the authors show that RNase R also plays a critical role in CRISPR RNA maturation. In addition, they show that RNase R and PNPase are both recruited to the type III CRISPR complex (Cas10-Csm) via direct interactions with the Csm5 subunit and that deletion of an intrinsically disordered region (IDR2) on Csm5 selectively inhibits PNPase recruitment but not RNase R recruitment. The authors show unquantified stimulation of PNPase nuclease activity by Csm5. Phage challenge assays are performed to test the impact of PNPase and RNase R deletion mutations on CRISPR-Cas mediated phage defense. Contrary to expectation, cells that over-express the CRISPR system but contain a deletion of PNPase and/or RNase R maintain robust anti-phage immunity. The interpretation of this experiment is that RNase R and PNPase may be dispensable in an over-expression system that produces high (non-natural) concentrations of the Csm complex. They test this idea using a system that expresses the CRISPR-Cas components from a chromosomally encoded locus (strain RP62a) and challenge these cells using a plasmid conjugation assay. In this iteration, deletion of PNPase has no impact on CRISPR performance, while deletion of RNase R "exhibited a moderate" attenuation of the immune response. In contrast to either single-gene deletion, the PNPase and RNase R double mutant showed a near complete loss of immunity.

      Overall, the paper provides convincing evidence that PNPase and RNase R are involved in crRNA processing, and that they are recruited to the type III complex via Csm5. The work on RNase R is entirely new and the role of PNPase is expanded. The role of cellular RNases in CRISPR RNA biogenesis is important, though some of the results are subtle and some of the biochemistry would benefit from a more quantitative analysis.

      Thank you for your thorough assessment and comments overall.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a well-executed study using cutting-edge proteomics analysis to characterize muscle tissue from a genetically diverse mouse population. The use of only females in the study is a serious limitation that the authors acknowledge. The statistical methods, including protein quantification, QTL mapping, and trait correlation analysis are appropriate and include corrections for multiple testing. One concern is that missense variants, if they occur in peptides used to quantify proteins, could lead to false-positive signatures of low abundance (see lines 123-127). The experimental validation and deep dive into UFMylation provide some confidence in the reliability of other associations that can be mined from these data. The authors have provided a web-based tool for exploring the data.

      We thank the reviewer for these very positive comments and for reviewing the manuscript.

      We agree the quantification of peptides containing missense variants could confound quantification at the protein level. This is an important consideration when there are only a few peptides identified for a specific protein. However, in our data the average number of peptides used to quantify the 14 proteins containing missense-associated pQTLs was ~68 peptides/protein (lowest was 5 peptides for FGB and highest 703 peptides for NEB).

      In the case of EPHX1, we quantified 15 peptides (Figure R1A). We identified a peptide adjacent to R338 spanning amino acids 339-347. The R338C mutation would prevent trypsin cleavage at this site, so this peptide would not be identified in carriers of the missense variant, which could produce the false-positive signature of low abundance that the reviewer suggests. To investigate this, we re-quantified EPHX1 relative protein abundance with and without the peptide spanning 339-347 for each genotype (Figure R1B). Excluding this peptide made little difference to protein quantification, and EPHX1 abundance was still significantly lower in the R338C (AA) genotype. In fact, quantification at the peptide level revealed that 12 of the remaining 14 peptides were also significantly lower in the AA genotype (data not shown).

      Although we agree that this is a very important consideration, we are mindful of the length of the article and feel that including these data would not significantly improve the manuscript. We therefore request not to include these data, as they would detract from the main findings of the paper, which focus on phenotypic associations and the validation of UFMylation as a regulator of muscle function.

      Figure R1. (A) Identified peptides from EPHX1 mapped onto the primary amino acid sequence, highlighting the missense mutation induced by SNP rs32746574, which was associated with EPHX1 protein levels by pQTL analysis. (B) Relative quantification of EPHX1 between the two genotypes of SNP rs32746574, with and without the peptide neighboring the missense mutation (amino acids 339-347) (**p<0.001, Student's t-test).
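      The robustness check described above (re-quantifying EPHX1 with and without one peptide) can be sketched as follows. This is an illustration only: it assumes a median-of-peptides summary, a common robust choice that is not necessarily the summarization used in this study, and the peptide spans and log2 values are hypothetical.

```python
import statistics
from typing import Dict, Optional, Set


def protein_abundance(peptides: Dict[str, float], exclude: Optional[Set[str]] = None) -> float:
    """Summarize one protein in one sample as the median of its peptide log2 abundances.

    `exclude` lists peptide identifiers (here, residue spans) to leave out, e.g. a
    peptide next to a missense site whose detection may depend on genotype.
    """
    exclude = exclude or set()
    kept = [value for peptide, value in peptides.items() if peptide not in exclude]
    if not kept:
        raise ValueError("no peptides left after exclusion")
    return statistics.median(kept)


# Hypothetical log2 abundances of four EPHX1 peptides in one sample.
sample = {"12-25": 1.0, "40-52": 1.2, "101-118": 0.9, "339-347": 3.0}
print(protein_abundance(sample))
print(protein_abundance(sample, exclude={"339-347"}))
```

      Because the median ignores the magnitude of a single extreme peptide, dropping one peptide out of many shifts the protein-level estimate only slightly, which is the behavior the reanalysis relies on.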

    1. Author Response

      Reviewer #1 (Public Review):

      Current generative models of protein sequences such as Potts models, Variational autoencoders, or autoregressive models must be trained on MSA data from scratch. Therefore, they cannot learn common substitution or coevolution patterns shared between families, and require a substantial number of sequences, making them less suitable for small protein families (e.g., conserved only for eukaryotes or viruses). MSA transformers are promising alternatives as they can generalize across protein families, but there is no established method to generate samples from them. Here, Sgarbossa et al. propose a simple recursive sampling procedure based on iterative masking to generate novel sequences from an input MSA. The sampling method has three hyperparameters (masking frequency, sampling temperature, and the number of iterations) which are set by rigorous benchmarking. The authors compare their approach to bmDCA, and evaluate i) single sample quality metrics ii) sample diversity and similarity to native sequences iii) similarity between original and generated sequence distribution, and iv) phylogeny/topology in sequence space of the generated distribution.
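      The iterative masking procedure summarized above (mask a fraction of the MSA, refill it by sampling from the model at some temperature, repeat) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' code: `predict` is a hypothetical stand-in for an MSA Transformer forward pass, replaced here by a toy column-frequency model, and the default hyperparameter values are placeholders rather than the values selected by the benchmarking.

```python
import math
import random
from typing import Callable, List, Sequence

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus the alignment gap
MASK = "#"  # placeholder symbol for a masked position (not in ALPHABET)

# Hypothetical predictor: (masked MSA, row, column) -> one logit per ALPHABET symbol.
Predictor = Callable[[List[List[str]], int, int], Sequence[float]]


def sample_index(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Draw an index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def iterative_masking(
    msa: List[List[str]],
    predict: Predictor,
    mask_frac: float = 0.1,
    temperature: float = 1.0,
    n_iter: int = 200,
    seed: int = 0,
) -> List[List[str]]:
    """Repeatedly mask a random fraction of the MSA and refill it by sampling."""
    rng = random.Random(seed)
    msa = [row[:] for row in msa]  # never modify the caller's MSA
    for _ in range(n_iter):
        # 1. Choose a random subset of positions and mask them all at once.
        masked = [(i, j) for i, row in enumerate(msa) for j in range(len(row))
                  if rng.random() < mask_frac]
        for i, j in masked:
            msa[i][j] = MASK
        # 2. Predict every masked position from the same masked MSA ...
        all_logits = [predict(msa, i, j) for i, j in masked]
        # 3. ... then fill each one with a temperature-scaled sample.
        for (i, j), logits in zip(masked, all_logits):
            msa[i][j] = ALPHABET[sample_index(logits, temperature, rng)]
    return msa


def column_frequency_logits(msa: List[List[str]], i: int, j: int) -> List[float]:
    """Toy stand-in for the model: log pseudo-counts of each symbol in column j."""
    column = [row[j] for row in msa if row[j] != MASK]
    return [math.log(column.count(a) + 1.0) for a in ALPHABET]


natural = [list("ACDE-"), list("ACDF-"), list("GCDEK")]
generated = iterative_masking(natural, column_frequency_logits, mask_frac=0.2, n_iter=10)
print(["".join(row) for row in generated])
```

      All masked positions in one iteration are predicted from the same masked MSA before any of them is refilled, mirroring a single forward pass; the three arguments `mask_frac`, `temperature` and `n_iter` correspond to the three hyperparameters the reviewer lists.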

      Strengths:

      • The proposed sampling approach is simple.

      • The computational benchmarking is thorough.

      • The code is well organized and looks easy to use.

      Weaknesses:

      • There is no experimental data to back up the methodology.

      • It is not clear whether the sampling hyperparameter used is optimal for all protein sizes.

      • I am unsure that the bmDCA baseline method was trained appropriately and that the sampling method was adequate for protein design purposes (regular sampling).

      • Quality assessment of predicted structures is incomplete.

      • The proposed metrics for evaluating the diversity of generated sequences are fairly technical.

      We respond to each of these points below, in the section titled "Recommendations for the authors", since these questions were asked by the reviewer in more detail there.

      Impact assessment: The claim that MSA Transformer could be useful for protein design is supported by the computational benchmark. This work will be useful for researchers interested in applying MSA Transformer models to protein design.

      We thank the reviewer for this encouraging assessment of our work, and for their very interesting suggestions which helped us improve our manuscript.

      Reviewer #2 (Public Review):

      The manuscript by Sgarbossa et al. proposes the use of a machine learning technique developed for language models (LMs) and adapted to protein sequences (PLMs) as a means to generate synthetic sequences that retain the functional properties contained in the original multiple sequence alignment (MSA) of natural sequences. This technique (or a similar one), called MSA Transformer, is also a component of the supervised learning method AlphaFold, which has been successful in predicting the structures of proteins and protein complexes. The premise of this study is that an iterative masking approach can be used as a sampling technique to create a diverse set of sequences that still preserve important properties of the original natural sequences. For example, such samples retain homology properties, score well in terms of retaining relevant pairwise or epistatic interactions, and produce "foldable" sequences when used as input for AlphaFold and scored via its confidence metric pLDDT. To support this claim, the authors compare against Direct Coupling Analysis (DCA), a global sequence modeling technique that has been shown to be successful in many aspects of protein structure and function, particularly in generating and sampling sequences analogous to the input MSA. Most importantly, DCA and its generative version bmDCA have been shown to produce functional sequences experimentally. The authors then establish that sequences generated by MSA Transformer with iterative masking generally have better homology, statistical energy, and pLDDT scores than those from bmDCA, and have spectral, statistical and similarity properties closer to those of natural sequences than bmDCA-generated sequences, except for the reproduction of single-site and pairwise statistics. The sequences from MSA Transformer, however, better replicate the three-body statistics of the natural sequences. The authors conclude that MSA Transformer with iterative masking is a valid technique for sequence design and an important alternative to DCA, de novo physics-based methods, or supervised learning techniques.

      Given the success of language models in machine learning and their contributions to the structure prediction of proteins and complexes, I see this study as a necessary follow-up to the breadth of work on amino acid coevolution spearheaded by DCA methodologies. In general, I believe this is a useful and relevant study for the community, and it opens up several avenues for research connecting Transformers with unsupervised protein design. Although the study provides support for this technique being potentially useful for protein design, I was not completely convinced that it will yield more transformative results than those obtained using Potts models. The differences, although consistent across the study, seem to be within "the margin of error" compared to bmDCA.

      We thank the reviewer for this positive assessment of our work, and for their cogent remarks which helped us improve our manuscript.

      We agree that in the case of large protein families, the main message is that our sequence generation method based on MSA Transformer scores at least as well as bmDCA. Given that bmDCA has been experimentally validated as a generative model, we believe that this is a valuable result. Our revised manuscript makes this point stronger, by showing that our sequence generation method based on MSA Transformer yields sequences that score similarly to those generated by bmDCA at low sampling temperature, while retaining substantially more sequence diversity.

      In addition, following the reviewer's suggestion below, we now present results for smaller protein families, whose shallow MSAs make it difficult to accurately fit Potts models. These results are presented in a new section of Results, titled "Sequence generation by the iterative masking procedure is successful for small protein families", including the new Figure 3. As mentioned there, "Fig. 3 reports all four scores discussed above in the case of these 7 small families, listed in Table S1 (recall that the families considered so far were large, see Table 1). We observe that MSA-Transformer–generated sequences have similar HMMER scores and structural scores to natural sequences. MSA-Transformer–generated sequences also generally have better HMMER scores and structural scores than those generated by bmDCA with default parameters. While low-temperature bmDCA yields better statistical energy scores (as expected), and also gives HMMER scores and structural scores comparable to natural sequences, it in fact generates sequences that are almost exact copies of natural ones (see Fig. 3, bottom row). By contrast, MSA Transformer produces sequences that are quite different from natural ones, and have very good scores." This shows that our method not only performs as well as bmDCA for large families, but also has a broader scope, as it is less limited by MSA depth than bmDCA.
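      The observation that low-temperature bmDCA produces "almost exact copies of natural ones" can be quantified by the normalized Hamming distance from each generated sequence to its closest natural sequence. The sketch below shows this common measure under the assumption of equal-length aligned sequences; it is illustrative and not necessarily how the published figure was computed, and the sequences are hypothetical.

```python
from typing import List


def hamming(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_to_closest_natural(generated: List[str], natural: List[str]) -> List[float]:
    """For each generated sequence, the normalized Hamming distance to its nearest natural one."""
    return [min(hamming(g, n) for n in natural) for g in generated]


natural = ["ACDEFG", "ACDEFH"]
generated = ["ACDEFG", "TCDEFH", "TTTTTT"]
print(distance_to_closest_natural(generated, natural))
```

      A distance of 0 means an exact copy of a natural sequence; a generator whose samples all sit near 0 is reproducing the training MSA rather than producing new diversity.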

      I also have certain comments related to the use of these three metrics to analyze the performance of the sampling. On the one hand, HMMER, which has been of great utility to Pfam and the community in general, yields a score that does not necessarily reflect the global properties of the sequences. In other words, we might be using a simpler statistical model to evaluate the performance of two other models (MSA Transformer and bmDCA) that are richer and capture more sequence dependencies than the hidden Markov model.

      We agree with the reviewer that HMMER scores are associated with simpler statistical models, which cannot fully represent the data. We nevertheless believe that these scores remain useful to assess homology. In the framework of our study, they show that the sequences we generate are deemed "good homologs" by HMMER, just like natural sequences that this widely used tool would extract from a database. This said, we agree with the reviewer that one should not overinterpret HMMER scores, and we have reduced our discussion of their correlations with Hamming distances to avoid giving too much importance to this point.

      Moreover, we now present new scores that give a more complete picture of the quality of our generated sequences:

      • Regarding structure, in addition to the AlphaFold pLDDT score, we now also report the RMSD between a reference experimental structure of the relevant family (see Table 1) and the AlphaFold structure predicted for each sequence studied. The results from the RMSD analysis corroborate those obtained with pLDDT and show that predicted structures are indeed similar to the native ones. These results are now discussed in the main text. We believe that this point strengthens our conclusions and we thank the reviewer for suggesting this analysis.

      • We also performed a retrospective validation using published experimental results. For chorismate mutase, a protein family which was experimentally studied in [Russ et al 2020] using bmDCA, we now report estimated relative enrichments for our generated sequences in Figure S8, in addition to our four usual scores now shown for this family in Figure S7. In addition, for protein families PF00595 and PF13354, we now report deep mutational scanning scores for our generated sequences in Figure S9. These results strengthen our conclusion that our sequence generation method based on MSA Transformer is highly promising.

      For the statistical energy score, the authors decided to use a sampling temperature T=1, but they note that this temperature can be reduced, as was done in the experimental paper, to produce sequences with better energies; this metric can therefore be easily improved by modifying the temperature. The authors mention that they did try reducing the temperature, which also improved the HMMER score, but decided against it because the pairwise statistics were affected. However, pairwise statistics were precisely the only aspect in which bmDCA seemed superior to MSA Transformer, so sacrificing them should be an acceptable trade-off in order to optimize the other two important metrics.

      We thank both reviewers for raising this very interesting point. As mentioned above in our response to the first reviewer, we have now performed a comprehensive comparison of our MSA Transformer-generated data not only to bmDCA-generated data at sampling temperature T=1 but also at lower sampling temperatures. We considered the two temperature values chosen in [Russ et al 2020], namely T=0.33 and T=0.66. For completeness, we also considered the two values of regularization strength λ from [Russ et al 2020] for these three temperatures, in the case of family PF00072, as reported in Table S5. Given the relatively small impact of λ observed there, we kept only one value of λ for each value of T in the rest of our manuscript, namely λ=0.01 for T=1 to match the parameters in [Figliuzzi et al 2018], and λ=0.001 for T=0.33 and T=0.66, as it gave slightly better scores in Table S5. Note that for our additional study of small protein families, we employed λ=0.01 throughout because it is better suited to small families. In particular, we now include results obtained for bmDCA at λ=0.001 and T=0.33 in all figures of the revised manuscript.

      Our general findings, which are discussed in the revised manuscript, are that decreasing T indeed improves the scores of bmDCA-generated sequences. However, the main improvement regards statistical energy (as expected from lowering T), while the improvements of other scores (HMMER score, and, more importantly, structural scores) are more modest. Even using T=0.33 for bmDCA, our MSA Transformer-generated sequences have similar or better scores compared to bmDCA-generated sequences, apart from statistical energy (see Figure 1 and Tables S2 and S3). Moreover, we find that decreasing T with bmDCA substantially decreases MSA diversity, while MSA Transformer-generated sequences do not suffer from such an issue (see Figure S1). In fact, at low T, bmDCA concentrates on local minima of the statistical energy landscape (see Figures 2, 5 and S5), resulting in low diversity.
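      Why lowering T trades diversity for statistical energy can be seen on a toy example. Under a Potts model such as bmDCA, sampling at temperature T draws a sequence with probability proportional to exp(-E/T). The sketch below enumerates four hypothetical states with made-up energies; real bmDCA sampling runs Markov chain Monte Carlo over sequence space rather than enumerating states.

```python
import math
from typing import List


def boltzmann(energies: List[float], T: float) -> List[float]:
    """Probabilities p_k proportional to exp(-E_k / T) over an enumerated set of states."""
    e_min = min(energies)  # shifting all energies leaves p unchanged but avoids overflow
    weights = [math.exp(-(e - e_min) / T) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def entropy(p: List[float]) -> float:
    """Shannon entropy in nats; higher means a more diverse sample."""
    return -sum(x * math.log(x) for x in p if x > 0)


energies = [0.0, 0.5, 1.0, 2.0]  # hypothetical statistical energies of four sequences
for T in (1.0, 0.66, 0.33):
    p = boltzmann(energies, T)
    print(f"T={T}: P(lowest-energy state)={p[0]:.2f}, entropy={entropy(p):.2f}")
```

      As T decreases, probability mass concentrates on the lowest-energy state and the entropy falls, which is the same qualitative effect as bmDCA concentrating on local minima of the energy landscape at low T.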

      Overall, these new results confirm that our procedure for generating sequences using MSA Transformer is promising, featuring scores comparable with low-temperature bmDCA sequences and high diversity.

      Finally, the use of pLDDT could also introduce some bias: since AlphaFold itself uses transformers, I wonder whether sequences obtained with transformers simply perform better by construction.

      We thank the reviewer for raising this intriguing point. It is true that MSA Transformer has an architecture that is very similar to that of the EvoFormer module of AlphaFold. However, AlphaFold couples the EvoFormer module to a structural module, and is trained in a supervised way to predict protein structure, which makes it significantly different from MSA Transformer.

      Nevertheless, we agree that the AlphaFold pLDDT score does not give a complete view of structure. As mentioned above, to improve this, in addition to pLDDT, we now also report the RMSD between a reference experimental structure of the relevant family (see Table 1) and the AlphaFold structure predicted for each sequence studied. The results from the RMSD analysis corroborate those obtained with pLDDT and show that predicted structures are indeed similar to the native ones. These results are now discussed in the main text.

      The authors should try to address all these concerns. My assessment is that these concerns do not detract from the relevance and timeliness of this study, but I would like to see a fairer comparison of these metrics in which bmDCA is further optimized, e.g. with lower T, to obtain a more accurate comparison of the methods, even if that is reflected in lower performance on pairwise statistics.

      We did our best to address all these points. We believe that the additions mentioned above have substantially improved our manuscript.

      My assessment is that this manuscript's main strength is in introducing a state-of-the-art technique that has already been extremely successful in the field of computer science and artificial intelligence into the field of amino acid coevolution. By adapting this technique and creating a sampling version that is compatible with other successful methodologies, this work will lead to many other studies dealing with function and the effects of sequence variation of biomolecules.

      Again, we thank the reviewer for their encouraging assessment.

    1. Author Response

      Reviewer #1 (Public Review):

      This fMRI study investigated how memories are updated after reinterpreting past events. Participants watched a movie and subsequently recalled individual scenes from that movie. Importantly, the movie ends with a twist that changes the interpretation of earlier scenes in the movie. One group of participants watched the movie with the twist at the end, one group did not get to see the twist, and a third group was already informed about this twist before watching the movie. Analyses compared the similarity of activity patterns to (encoded or recalled) events across participants within regions of the default mode network (DMN). The design allowed for multiple relevant comparisons, confirming the prediction that activity patterns in DMN regions reflect the (re)interpretation of the movie (during movie viewing and/or during recall).

      The study is well-designed and executed. The inclusion of multiple analyses involving distinct comparisons strengthens the evidence for the role of the DMN in memory updating.

      The following points may be relevant to consider:

      1) The cross-participant pattern analysis method used here is not standard, with such analyses typically done within participants (or across participants, but after aligning representational spaces). Considering individual variability in functional organization, the method is likely only sensitive to coarse-scale patterns (e.g., anterior vs posterior parts of an ROI). This is not necessarily a weakness but is relevant when interpreting the results.

      We agree with the reviewer that functional misalignment might have played against us here. We designed this study as a natural successor of our previous work in which we captured reliable and multimodal scene-specific cross-participant pattern similarity during encoding and recall in standard space. In this revised version, we provide further evidence on how scene content is captured and influences our results. Nonetheless, we agree with your comment and add the following section to the discussion to encourage considering this point while interpreting the results.

      "Moreover, our current method relies on averaging spatially-coarse activity patterns across subjects (and time points within an event). Future extensions of this work may benefit from using functional alignment methods (Haxby et al 2020, Chen et al 2015) to capture more fine-grained event representations which are shared across participants."

      2) Unlike previous work, analyses are not testing for scene-specific information. Rather, each scene is treated separately to establish between-group differences, and results are averaged across scenes. This raises the question of whether the patterns reflect scene-specific information or generic group differences. For example, knowing the twist may increase overall engagement, both when viewing the movie (spoiled group) and when recalling it (spoiled group + twist group). The DMN may be particularly sensitive to such differences in overall engagement.

      You have brought up great points. We addressed them in two ways: (1) We ran a univariate analysis in each DMN ROI to look at the role of overall regional-average response magnitude in our results. We did not observe a significant effect of group or an interaction between group and condition. (2) We ran a scene-specificity analysis in a new Results section entitled “The role of scene content” (Figure 4). This section is focused on comparing interaction index (Figure 2C), as an indicator of memory updating, under different manipulations. Interaction index reflects the reversal of neural similarity during encoding and recall. Our results suggest that we don’t see the same effects if we shuffle the scene labels and recompute the pattern similarity analyses. Please see added text and figures below:

      "To test whether our reported results were mainly driven by the similarities and differences in multivariate spatial patterns of neural representations, as opposed to by univariate regional-average response magnitudes, we ran a univariate analysis in each ROI. This analysis revealed no significant effect of group (“spoiled”, “twist”, “no-twist”) or interaction between group and condition (movie, recall) (Table 1, see Methods for details).

      Next, to determine whether scene-specific neural event representations—as opposed to coarser differences in general mental state across all scenes with similar interpretations—drive our observed pISC differences, we shuffled the labels of critical scenes within each group before calculating and comparing pISC across groups. By repeating this procedure 1000 times and recalculating the interaction index at each iteration, we constructed a null distribution of interaction indices for shuffled critical scenes (light magenta distributions in Figure 4B). In 12 out of 24 DMN regions, interaction indices were statistically significant based on the shuffled-scene distribution (p < .025, FDR controlled at q < .05). All of these 12 regions were among the ROIs that showed meaningful effects in our original analysis (Figure 2C). Regions with significant scene-specific interaction effects are marked as blue dots with black borders in Figure 4B. Overall, the findings from this analysis confirm that our results are driven by changes to scene-specific representations."
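      The logic of the scene-shuffling control (permute scene labels, recompute the statistic 1000 times, compare the observed value with the resulting null distribution) can be sketched as follows. This is a simplified illustration, not the analysis code: the actual interaction index combines encoding and recall pISC differences across three groups, whereas here a hypothetical statistic (mean correlation between matched scenes of two groups) stands in for it, the data are simulated, and the p-value is one-sided and uncorrected.

```python
import random
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two spatial patterns (e.g. voxel vectors)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def matched_similarity(group_a: List[List[float]], group_b: List[List[float]], order: List[int]) -> float:
    """Mean correlation between scene k in group A and scene order[k] in group B."""
    return sum(pearson(group_a[k], group_b[order[k]]) for k in range(len(group_a))) / len(group_a)


def scene_shuffle_test(group_a, group_b, n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    identity = list(range(len(group_a)))
    observed = matched_similarity(group_a, group_b, identity)
    null = []
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # break the scene correspondence between groups
        null.append(matched_similarity(group_a, group_b, order))
    # One-sided p-value; the +1 terms count the observed value as one draw of the null.
    p = (1 + sum(v >= observed for v in null)) / (n_perm + 1)
    return observed, p


# Simulated data: 7 scenes x 20 voxels; group B is a noisy copy of group A.
data_rng = random.Random(1)
group_a = [[data_rng.gauss(0, 1) for _ in range(20)] for _ in range(7)]
group_b = [[v + data_rng.gauss(0, 0.1) for v in scene] for scene in group_a]
print(scene_shuffle_test(group_a, group_b))
```

      If the statistic only reflected a general group-level state shared by all scenes, shuffling scene labels would leave it roughly unchanged and the p-value would be large; a small p-value indicates scene-specific structure.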

      3) The study does not reveal what the DMN represents about the movie, such that its activity changes after knowing the twist. The Discussion briefly mentions that it may reflect the state of the observer, related to the belief about the identity of the doctor. This suggests a link to the theory of mind/mentalizing, but this is not made explicit. Alternatively, the DMN may be involved in the conflict (or switching) between the two interpretations.

      Great points. We have expanded the Discussion to address the role of the mentalizing network, and in particular the temporo-parietal cortex. Regarding your last point, we think our whole-brain findings outside the DMN (ACC and dlPFC) may relate to conflict between the two interpretations. We discuss these findings further in the paper.

      "We performed two targeted analyses to look for evidence of memory updating across encoding and recall: the interaction analysis (Figure 2C) and the encoding-recall analysis (Figure 3). We hypothesized that a shift in the direction of the pISC difference would occur when neural representations during recall in the “twist” group start to reflect the Ghost interpretation. The interaction analysis probed this shift indirectly by taking into account the effects of both encoding-encoding and recall-recall analyses. Unlike the interaction analysis, in the encoding-recall analysis, we directly compared neural event representations during encoding and recall. Interestingly, all regions exhibiting an effect across the two encoding-recall analyses, excluding left anterior temporal cortex, were present in the interaction results. Among these regions, the left angular gyrus/TPJ exhibited an effect across all three analyses. As a core hub in the mentalizing network, temporo-parietal cortex has been implicated in theory of mind through perspective-taking, rationalizing the mental state of someone else, and modeling the attentional state of others (Frith and Frith 2006, Guterstam et al. 2021, Saxe and Kanwisher 2003). The motivations behind some actions of the main character in the movie heavily depend on whether the viewer perceives them as a Doctor or a Ghost, and participants may focus on this during both encoding and recall. We speculate that neural event representations in AG/TPJ in the current experiment may be related to mentalizing about the main character’s actions. Under this interpretation, the updated event representations during recall following the twist would be more closely aligned to the “spoiled” encoding representations, as a consequence of memory updating in the “twist” group.

      In our whole-brain analysis, these regions did not have significant interaction effects, which suggests that the effects were isolated to encoding. In the whole-brain analysis, we also observed significant encoding-encoding and interaction effects in anterior cingulate cortex, as well as recall-recall and interaction effects in dlPFC. These results suggest that both the "spoiled" manipulation and the "twist" may recruit top-down control and conflict monitoring processes during naturalistic viewing and recall."

      4) The design has many naturalistic aspects, but it is also different from real life in that the critical twist involves a ghost. Furthermore, all results are based on one movie with a specific plot twist. It is thus not clear whether similar results would be obtained with other and more naturalistic plot twists.

      We added this as a limitation of the study.

      "Our findings provide further insight into the functional role of the DMN. However, these results have been obtained using only one movie. While naturalistic paradigms better capture the complexity of real life and provide greater ecological generalizability than highly-controlled experimental stimuli and tasks (Nastase et al., 2020), they are still limited by the properties of the particular naturalistic stimulus used. For example, this movie—including the twist itself—hinges on suspension of disbelief about the existence of ghosts. Future work is needed to extend our findings about updating event memories to a broader class of naturalistic stimuli: for example, movies with different kinds of (non-supernatural) plot twists, spoken stories with twist endings, or using autobiographical real-life situations where new information (e.g. discovering a longtime friend has lied about something important) triggers re-evaluation of the past (e.g. reinterpreting their friend’s previous actions)."

      5) Only 7 scenes (out of 18) were included in the analysis. It is not clear if/how the results depend on the selection of these 7 scenes.

      Thank you for bringing this up. These scenes were pre-selected for the analyses because they are the only scenes rated highly on “twist influence” by our independent raters (who were not study participants), meaning that knowing the twist could dramatically change their interpretation. We therefore had a priori reasons to hypothesize that the effect would be strong in these scenes. To address your point, we report results including all 18 scenes in a new Results section entitled “The role of scene content” and in Figure 4A. While the effect was weaker when all scenes were included, it was still apparent in this conservative analysis. As expected, however, including only the 7 critical scenes produces stronger results than including all scenes or only the non-critical scenes (all minus critical scenes). Please see “The role of scene content” in Results and Figure 4 for more detailed information.

      "The role of scene content

      In the prior analyses, we focused on “critical scenes”, selected based on ratings from four raters who quantified the influence of the twist on the interpretation of each scene (see Methods). An independent post-experiment analysis of the verbal recall behavior of the fMRI participants yielded “twist scores” that were also highest for these scenes; that is, the expected and perceived effects of twist information on recall behavior were found to match. In our next analysis, we asked whether the neural event representations reflect these differences in the twist-related content of the scenes. In other words, are the “critical scenes” with highly twist-dependent interpretations truly critical for our observed effects?

      To answer this question, we re-ran our main encoding-encoding and recall-recall pISC analysis in each DMN ROI (Figures 2-3). We calculated interaction indices (Figure 2C) first by including all scenes, and second by including only the 11 non-critical scenes. To better compare the effect of including different subsets of scenes to our original results, in Figure 4 we show the results in the 15 ROIs that exhibited meaningful effects in our main analyses (Figure 2C). Figure 4A demonstrates that “critical scenes” yielded higher interaction indices than all scenes or non-critical scenes across all ROIs. The interaction score across all DMN ROIs was significantly higher for “critical scenes” than for all scenes (t(23) = 7.19, p = 2.53 × 10⁻⁷) and for non-critical scenes (t(23) = 7.3, p = 1.95 × 10⁻⁷). These results show that critical scenes are indeed responsible for the observed pISC differences across groups."
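      The reported t(23) statistics are consistent with a paired comparison of interaction scores across the 24 DMN ROIs (df = n − 1). Assuming that is the test used, the statistic is the mean ROI-wise difference divided by its standard error, sketched below with hypothetical inputs.

```python
import math
from typing import Sequence, Tuple


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched samples x[i], y[i]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))  # sample SD
    return mean / (sd / math.sqrt(n)), n - 1


# Hypothetical interaction scores for four ROIs under two scene selections.
critical = [2.0, 4.0, 6.0, 8.0]
all_scenes = [1.0, 2.0, 3.0, 4.0]
print(paired_t(critical, all_scenes))
```

      With one score per ROI for each scene selection, 24 ROIs give df = 23. `scipy.stats.ttest_rel` computes the same statistic together with a p-value, if SciPy is available.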

      Reviewer #2 (Public Review):

      In this manuscript titled "Here's the twist: How the brain updates the representations of naturalistic events as our understanding of the past changes", the authors reported a study that examined how new information (manipulated as a twist at the end of a movie) changes the neural representations in the default mode network (DMN) during the recall of prior knowledge. Three groups of participants were compared - one group experienced the twist at the end, one group never experienced the twist, and one group received a spoiler at the beginning. At retrieval, participants received snippets of 18 scenes of the movie as cues and were asked to freely describe the events of each scene and to provide the most accurate interpretation of the scene, given the information they gathered throughout watching.

      All three groups were highly accurate in the recall of content. The groups that experienced the twist at the end as well as at the beginning as a spoiler showed a higher twist score (the extent to which twist information was incorporated into the recall), while seemingly also keeping the interpretation without the twist ("Doctor representation") intact. Neurally, several regions in the DMN showed significant interaction effects in their neural similarity patterns (based on intersubject pattern correlation), indicating a change in interpretation between encoding and recall in the twist group uniquely, presumably reflecting memory updating.

      Several points that I think should be addressed to strengthen the manuscript:

      1) The results from encoding-retrieval similarity analysis (particularly the one depicted in Figure 3B) don't match the results from encoding/retrieval interaction (particularly those shown in Figure 2C). While they were certainly based on different comparisons, I would think that both analyses were set up to test for memory updating. Can the authors comment on this divergence in results?

      Thank you for your comment. Except for one ROI, the other two regions in Figure 2C are present in the interaction analysis. The ROI at the frontal pole might be hard to see from this angle but in fact it holds a high effect size in interaction analysis. So we do not see a big divergence between these two results. But taking into account the recall-recall results, we agree that there seems to be inhomogeneity. We discussed these further in the discussion.

      "We performed two targeted analyses to look for evidence of memory updating across encoding and recall: the interaction analysis (Figure 2C) and the encoding-recall analysis (Figure 3). We hypothesized that a shift in the direction of the pISC difference would occur when neural representations during recall in the “twist” group start to reflect the Ghost interpretation. The interaction analysis probed this shift indirectly by taking into account the effects of both encoding-encoding and recall-recall analyses. Unlike the interaction analysis, the encoding-recall analysis directly compared neural event representations during encoding and recall. Interestingly, all regions exhibiting an effect across the two encoding-recall analyses, excluding left anterior temporal cortex, were present in the interaction results. Among these regions, the left angular gyrus/TPJ exhibited an effect across all three analyses. As a core hub in the mentalizing network, temporo-parietal cortex has been implicated in theory of mind through perspective-taking, rationalizing the mental state of someone else, and modeling the attentional state of others (Frith and Frith 2006, Guterstam et al. 2021, Saxe and Kanwisher 2003). The motivations behind some of the main character’s actions in the movie depend heavily on whether the viewer perceives the character as a Doctor or a Ghost, and participants may focus on this during both encoding and recall. We speculate that neural event representations in AG/TPJ in the current experiment may be related to mentalizing about the main character’s actions. Under this interpretation, the updated event representations during recall following the twist would be more closely aligned to the “spoiled” encoding representations, as a consequence of memory updating in the “twist” group.

      Our findings are consistent with the view that the DMN synthesizes incoming information with one’s prior beliefs and memories (Yeshurun et al 2021). We add to this framework by providing evidence for the involvement of DMN regions in updating prior beliefs in light of new knowledge. Across our different encoding and recall analyses, we observe memory updating effects in a varied subset of DMN regions that do not cleanly map onto a specific subsystem of the DMN (Robin and Moscovitch 2017, Ranganath and Ritchey 2012, Ritchey and Cooper 2020). Rather than being divergent, these results might reflect inherent differences between the processes of encoding and recall of naturalistic events. It has been proposed that neural representations corresponding to encoding of events are systematically transformed during recall of those events (Chen et al 2017, Favila et al 2020, Musz and Chen 2022). While we provide evidence for reinstatement of memories in the DMN, our findings also support a transformation of neural representations during recall, as encoding-recall results were weaker in some areas than recall-recall findings. This transformation could affect how different regions and subsystems of the DMN represent memories, and suggests that the concerted activity of multiple subsystems and neural mechanisms might be at play during encoding, recall, and successful updating of naturalistic event memories."

      2) The recall task was self-paced. Can reaction time information be provided on how long participants needed to recall? Did this differ across groups? Presumably in the twist group and spoiled group participants might have needed a longer time to incorporate both the original and twist interpretation.

      This is an interesting idea. Unfortunately, we could not measure this accurately because our recall cues were snippets of varying length from the beginning of each scene (selected based on content), and updating could begin at any point during those snippets without our being able to tell when. We will consider this point in future designs.

      How was the length difference across events taken into consideration in the beta estimates?

      They were used as event durations in the GLM.

      Also, is there an order effect, such that one type of interpretation tended to be recalled first?

      This is hard to measure, as this only occurs in a subset of scenes. But we assume it happens in other people's brains as well.

      This is indeed hard to measure, as you mentioned. We will provide the transcripts when we share the data, and we hope this will facilitate future text-analysis work on this dataset addressing questions like this one.

      3) The correlation analysis between neural pattern change and behavioral twist score is based on a small sample size and does not seem well suited to test the authors' proposal, namely that some participants may hold both interpretations in their memory. Interestingly, the twist score of the spoiled group was similar to that of the twist group, indicating that participants in this group might have held both interpretations as well. Could this observation be leveraged, for example by combining both groups (hence better powered with a larger sample size), in order to relate individual differences in neural similarity patterns to the behavioral tendency to hold both interpretations?

      Even though both groups showed signs of holding both interpretations in mind, the processes occurring in their brains during recall differ. In particular, we do not expect to see any updating effect in the spoiled group. It would therefore not be appropriate to combine these groups to test for the effect of incomplete updating.

      4) Several regions within the DMN were significant across the analysis steps, specifically the angular gyrus, middle temporal cortex, and medial PFC. Can the authors provide more insight into how these widely distributed regions may act together to enable memory updating? The discussion of the main findings stays largely at a rather superficial level regarding the DMN, or focuses specifically on vmPFC, but neglects the distributed regions that presumably function interactively.

      Thanks for bringing this up. We added text to the Discussion to respond to this very valid point. Please see the added text in our response to your first point. We also added the following passage to the Discussion:

      "In addition to mPFC, right precuneus and parts of temporal cortex exhibited significantly higher pattern similarity in the “twist” and “spoiled” groups who recalled the movie with the same interpretation. Precuneus is a core region in the posterior medial network, which is hypothesized to be involved in constructing and applying situation models (Ranganath and Ritchey 2012). Our findings support a role for precuneus in deploying interpretation-specific situation models when retrieving event memories. In particular, we suggest that the posterior medial network may encode a shift in the situation model of the “twist” group in order to accommodate the new Ghost interpretation."

      Reviewer #3 (Public Review):

      Zadbood and colleagues investigated how key information used to update interpretations of events alters patterns of activity in the brain. This was cleverly done through the use of "The Sixth Sense," a film featuring a famous "twist ending," which fundamentally alters the way the events in the film are understood. Participants were assigned to three groups: (1) a Spoiled group, in which the twist was revealed at the outset, (2) a Twist group, who experienced the film as normal, and (3) a No-Twist group, in which the twist was removed. Participants were scanned while watching the movie and while performing cued recall of specific scenes. Verbal recall was scored based on recall success and on evidence for a descriptive bias toward two ways of understanding the events (specifically, whether a particular character was or was not a ghost). Importantly, this allowed the authors to show that the Twist group updated their interpretation. The authors focused on regions of the Default Mode Network (DMN) based on prior studies showing responsiveness to naturalistic memory paradigms in these areas and analyzed the fMRI data using intersubject pattern similarity analysis. Regions of the DMN carried patterns indicative of story interpretation. That is, encoding similarity was greater between the Twist and No-Twist groups than in the Spoiled group, and retrieval similarity was greater between the Twist and Spoiled groups than in the No-Twist group. The Spoiled group also showed greater pattern similarity with the Twist group's recall than the No-Twist group's recall. The authors also report a weaker effect of greater pattern similarity between the Spoiled group's encoding and the Twist group's recall than between the Twist group's own encoding and recall. Together, the data all converge on the point that one's interpretation of an event is an important determinant of the way it is represented in the brain.
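      For readers less familiar with intersubject pattern correlation (pISC), the between-group comparisons summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' released code: it assumes each subject contributes one spatial activity pattern per scene within an ROI (for example, a GLM beta pattern), and that between-group similarity is the Pearson correlation of each subject's pattern with the other group's voxel-wise mean pattern, averaged over subjects and then over scenes. The function names are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length voxel patterns."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def mean_pattern(patterns):
    """Voxel-wise average of several subjects' patterns for one scene."""
    return [mean(vals) for vals in zip(*patterns)]


def between_group_pisc(group_a, group_b):
    """Hypothetical between-group pattern ISC for one ROI.

    group_a, group_b: lists of subjects; each subject is a list of scenes
    (same scene order in both groups); each scene is a list of voxel values.
    For every scene, each subject in group_a is correlated with the mean
    pattern of group_b; correlations are averaged over subjects, then scenes.
    """
    n_scenes = len(group_a[0])
    scene_scores = []
    for s in range(n_scenes):
        reference = mean_pattern([subj[s] for subj in group_b])
        scene_scores.append(mean(pearson(subj[s], reference) for subj in group_a))
    return mean(scene_scores)
```

Under these assumptions, an encoding-encoding comparison would pass two groups' movie-viewing patterns, while an encoding-recall comparison would pass one group's recall patterns as `group_a` and another group's movie-viewing patterns as `group_b`.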

      This is a really nice experiment, with straightforward predictions and analyses that support the claims being made. The results build directly on a prior study by this research group showing how interpretational differences in a narrative drive distinct neural representations (Yeshurun et al., 2017), but extend our understanding of how these interpretational differences might work retrospectively. I do not have any serious concerns or problems with the manuscript, the data, or the analyses. However, I have a few points to raise that, if addressed, would make for a stronger paper in my opinion.

      1) My most substantive comment is that I did not find the interpretive framework to be very clear with respect to the brain regions involved. The basic effects the authors report strongly support their claims, but the particular contributions to the field might be stronger if the interpretations could be made more strongly or more specifically. In other words: the DMN is involved in updating interpretations, but how should we now think about the role of the DMN and its constituent regions as a result of this study? There are a number of ideas briefly presented about what the DMN might be doing, but it just did not feel very coherent at times. I will break this down into a few more specific points:

      While many of us would agree that the DMN is likely to be involved in the phenomena at hand, I did not find that the paper communicated the logic for singularly focusing on this subset of regions very compellingly. The authors note a few studies whose main results are found in DMN regions, but I think that this could stand to be unpacked in a more theoretically interesting way in the Introduction.

      Relatedly, I found the summary/description of regional effects in the Discussion to be a bit unsatisfying. The various pattern similarity comparisons yielded results that were actually quite nonoverlapping among DMN regions, which was not really unpacked. To be clear, it is not a 'problem' that the regional effects varied from comparison to comparison, but I do think that a more theoretical exploration of what this could mean would strengthen the paper. To the authors' credit, they describe mPFC effects through the lens of schemas, but this stands in contrast to many other regions which do not receive much consideration.

      Finally, although there is evidence that regions of the DMN act in a coordinated way under some circumstances, there is also ample evidence for distinct regional contributions to cognitive processes, memory being just one of them (e.g., Cooper & Ritchey, 2020; Robin & Moscovitch, 2017; Ranganath & Ritchey, 2012). The authors themselves introduce the idea of temporal receptive windows in a cortical hierarchy, and while DMN regions do appear to show slower temporal drift than sensory areas, those studies show regional differences in pattern stability across time even within DMN regions. Simply put, it is worth considering whether it is ideal to treat the DMN as a singular unit.

      Thank you for your helpful comments. We added text to the introduction and discussion to address your point:

      "Introduction:

      The brain’s default mode network (DMN)—comprising the posterior medial cortex, medial prefrontal cortex, temporoparietal junction, and parts of anterior temporal cortex—was originally described as an intrinsic or “task-negative” network, activated when participants are not engaged with external stimuli (Raichle et al. 2001, Buckner et al 2008). This observation led to a large body of work showing that the DMN is an important hub for supporting internally driven tasks such as memory retrieval, imagination, future planning, theory of mind, and creating and updating situation models (Svoboda et al. 2006; Addis et al. 2007; Hassabis and Maguire 2007, 2009; Schacter et al. 2007; Szpunar et al. 2007; Spreng et al. 2009; Koster-Hale and Saxe 2013; Ranganath and Ritchey 2012). However, it is not fully understood how this network contributes to these varying functions, and in particular—the focus of the present study—memory processes. Activation of this network during “offline” periods has been proposed to play a role in the consolidation of memories through replay (Kaefer et al 2022). Interestingly, prior work has also shown that the DMN is reliably engaged during “online” processing (encoding) of continuous rich dynamic stimuli such as movies and audio stories (Stephens et al 2013, Hasson et al 2008). Regions in this network have been shown to have long “temporal receptive windows” (Hasson et al 2008; Lerner et al., 2011; Chang et al., 2022), meaning that they integrate and retain high-level information that accumulates over the course of extended timescales (e.g. scenes in movies, paragraphs in text) to support comprehension. This combination of processing characteristics suggests that the DMN integrates past and new knowledge, as regions in this network have access to incoming sensory input, recent active memories, and remote long-term memories or semantic knowledge (Yeshurun et al 2021, Hasson et al 2015). 
These integration processes feature in many of the “constructive” processes attributed to DMN such as imagination, future planning, mentalizing, and updating situation models (Schacter and Addis 2007, Ranganath and Ritchey 2012). Notably, constructive processes are highly relevant to real-world memory updating, which involves selecting and combining the relevant parts of old and new memories. Recent work has shown that neural patterns during encoding and recall of naturalistic stimuli (movies) are reliably similar across participants in this network (Chen et al. 2017; Oedekoven et al., 2017; Zadbood et al., 2017; see Bird 2020 for a review of recent naturalistic studies on memory), and the DMN displays distinct neural activity when listening to the same story with different perspectives (Yeshurun et al 2017). Building on this foundation of prior work on the DMN, we asked whether we could find neural evidence for the retroactive influence of new knowledge on past memories."

      "Discussion:

      In addition to mPFC, right precuneus and parts of temporal cortex exhibited significantly higher pattern similarity in the “twist” and “spoiled” groups who recalled the movie with the same interpretation. Precuneus is a core region in the posterior medial network, which is hypothesized to be involved in constructing and applying situation models (Ranganath and Ritchey 2012). Our findings support a role for precuneus in deploying interpretation-specific situation models when retrieving event memories. In particular, we suggest that the posterior medial network may encode a shift in the situation model of the “twist” group in order to accommodate the new Ghost interpretation.

      We performed two targeted analyses to look for evidence of memory updating across encoding and recall: the interaction analysis (Figure 2C) and the encoding-recall analysis (Figure 3). We hypothesized that a shift in the direction of the pISC difference would occur when neural representations during recall in the “twist” group start to reflect the Ghost interpretation. The interaction analysis probed this shift indirectly by taking into account the effects of both encoding-encoding and recall-recall analyses. Unlike the interaction analysis, the encoding-recall analysis directly compared neural event representations during encoding and recall. Interestingly, all regions exhibiting an effect across the two encoding-recall analyses, excluding left anterior temporal cortex, were present in the interaction results. Among these regions, the left angular gyrus/TPJ exhibited an effect across all three analyses. As a core hub in the mentalizing network, temporo-parietal cortex has been implicated in theory of mind through perspective-taking, rationalizing the mental state of someone else, and modeling the attentional state of others (Frith and Frith 2006, Guterstam et al. 2021, Saxe and Kanwisher 2003). The motivations behind some of the main character’s actions in the movie depend heavily on whether the viewer perceives the character as a Doctor or a Ghost, and participants may focus on this during both encoding and recall. We speculate that neural event representations in AG/TPJ in the current experiment may be related to mentalizing about the main character’s actions. Under this interpretation, the updated event representations during recall following the twist would be more closely aligned to the “spoiled” encoding representations, as a consequence of memory updating in the “twist” group.

      Our findings are consistent with the view that the DMN synthesizes incoming information with one’s prior beliefs and memories (Yeshurun et al 2021). We add to this framework by providing evidence for the involvement of DMN regions in updating prior beliefs in light of new knowledge. Across our different encoding and recall analyses, we observe memory updating effects in a varied subset of DMN regions that do not cleanly map onto a specific subsystem of the DMN (Robin and Moscovitch 2017, Ranganath and Ritchey 2012, Ritchey and Cooper 2020). Rather than being divergent, these results might reflect inherent differences between the processes of encoding and recall of naturalistic events. It has been proposed that neural representations corresponding to encoding of events are systematically transformed during recall of those events (Chen et al 2017, Favila et al 2020, Musz and Chen 2022). While we provide evidence for reinstatement of memories in the DMN, our findings also support a transformation of neural representations during recall, as encoding-recall results were weaker in some areas than recall-recall findings. This transformation could affect how different regions and subsystems of the DMN represent memories, and suggests that the concerted activity of multiple subsystems and neural mechanisms might be at play during encoding, recall, and successful updating of naturalistic event memories."

      2) I think that some direct comparison to regions outside the DMN would speak to whether the DMN is truly unique in carrying the key representations being discussed here. I was reluctant to suggest this because I think that the authors are justified in expecting that DMN regions would show the effects in question. However, there really is no "null" comparison here wherein a set of regions not expected to show these effects (e.g., a somatosensory network, or the frontoparietal network) in fact do not show them. There are not really controls or key differences being hypothesized across different conditions or regions. Rather, we have a set of regions that may or may not show pattern similarity differences to varying degrees, which feels very exploratory. The inclusion of some principled control comparisons, etc. would bolster these findings. The authors do include a whole-brain analysis in Supplementary Figure 1, which indeed produced many DMN regions. However, notably, regions outside the DMN such as the primary visual cortex and mid-cingulate cortex appear to show significant effects (which, based on the color bar, might actually be stronger than effects seen in the DMN). Given the specificity of the language in the paper in terms of the DMN, I think that some direct regional or network-level comparison is needed.

      In the original submission, we included additional analyses for visual and somatosensory networks, which we hypothesized would serve as control networks. Following your comment, in the revision, we added a separate section (included below) more thoroughly examining these analyses. We also added text to the results and discussion to explain our interpretation of these findings.

      "Changes in neural representations beyond DMN

      We focused our core analyses on regions of the default mode network. Prior work has shown that multimodal neural representations of naturalistic events (e.g. movie scenes) are similar across encoding (movie-watching or story-listening) and verbal recall of the same events in the DMN (Chen et al., 2017; Zadbood et al., 2017). Therefore, in the current work we hypothesized that retrospective changes in the neural representations of events as the narrative interpretation shifts would be observed in the DMN. We did not, for example, expect to observe such effects in lower-level sensory regions, where neural activity differs dramatically for movie-viewing and verbal recall. To be thorough, we ran the same set of analyses we performed in the DMN (Figure 2-3) in regions of the visual and somatomotor networks extracted from the same atlas parcellation (Schaefer et al., 2018). Our results revealed larger overall differences in DMN than in visual and somatosensory networks for the key comparisons discussed previously (Figure S2). In particular, the only regions showing significant differences in pISC in recall-recall and encoding-recall comparisons (p < 0.01, uncorrected) were located in the DMN. We did not observe a notable difference between DMN and the two other networks when comparing recall “twist” to movie “spoiled” and recall “twist” to movie “twist” (RG – MG > RG – MD) which is consistent with the weak effect in the original comparison (Figure 3B). In the encoding-encoding comparison, several ROIs from the visual and somatomotor networks showed relatively strong effects as well (see Discussion).

      In addition, we qualitatively reproduced our results by performing an ROI-based whole brain analysis (Figure S3, p < 0.01 uncorrected). This analysis confirmed the importance of DMN regions for updating neural event representations. However, strong differences in pISC in the hypothesized direction were also observed in a handful of other non-DMN regions, including ROIs partly overlapping with anterior cingulate cortex and dorsolateral prefrontal cortex (see Discussion)."

      "Discussion: While our main goal in this paper was to examine how neural representations of naturalistic events change in the DMN, we also examined visual and somatosensory networks. Aside from the encoding-encoding analysis, in which some visual and somatosensory regions showed stronger similarity between two groups with the same interpretation of the movie, we did not find any regions with significant effects in these two networks in the other analyses. Unlike the recall phase, where each participant produces a unique utterance with their own choice of words and concepts to describe the movie, the encoding (movie-watching) stimulus is identical across all groups. Therefore, the effects observed during encoding-encoding analysis in sensory regions could reflect similarity in perception of the movie guided by similar attentional state while watching scenes with the same interpretation (e.g. similarity in gaze location, paying attention to certain dialogues, or small body movements while watching the movie with the same Doctor or Ghost interpretations). In our whole-brain analysis, these regions did not have significant interaction effects, which suggests that the effects were isolated to encoding. In the whole-brain analysis, we also observed significant encoding-encoding and interaction effects in anterior cingulate cortex, as well as recall-recall and interaction effects in dlPFC. These results suggest that both the "spoiled" manipulation and the "twist" may recruit top-down control and conflict monitoring processes during naturalistic viewing and recall."

      3) If I understand correctly, the main analyses of the fMRI data were limited to across-group comparisons of "critical scenes" that were maximally affected by the twist at the end of the movie. In other words, the analyses focused on the scenes whose interpretation hinged on the "doctor" versus "ghost" interpretation. I would be interested in seeing a comparison of "critical" scenes directly against scenes where the interpretation did not change with the twist. This "critical" versus "non-critical" contrast would be a strong confirmatory analysis that could further bolster the authors' claims, but on the other hand, it would be interesting to know whether the overall story interpretation led to any differences in neural patterns assigned to scenes that would not be expected to depend on differences in interpretation. (As a final note, such a comparison might provide additional analytical leverage for exploring the effect described in Figure 3B, which did not survive correction for multiple comparisons.)

      This is a helpful suggestion, and we’ve added an analysis addressing your comment. We found that the interaction index capturing the difference between the three groups was stronger for the critical scenes than for the non-critical scenes for almost all DMN ROIs.

      "The role of scene content

      In the prior analyses, we focused on “critical scenes”, selected based on ratings from four raters who quantified the influence of the twist on the interpretation of each scene (see Methods). An independent post-experiment analysis of the verbal recall behavior of the fMRI participants yielded “twist scores” that were also highest for these scenes; that is, the expected and perceived effect of twist information on recall behavior were found to match. In our next analysis, we asked whether the neural event representations reflect these differences in the twist-related content of the scenes. In other words, are the “critical scenes” with highly twist-dependent interpretations truly critical for our observed effects?

      To answer this question, we re-ran our main encoding-encoding and recall-recall pISC analysis in each DMN ROI (Figure 2-3). We calculated interaction indices (Figure 2C) first by including all scenes, and second by including only the 11 non-critical scenes. To better compare the effect of including different subsets of scenes to our original results, in Figure 4 we show the results in 15 ROIs that exhibited meaningful effects in our main analyses (Figure 2C). Figure 4A demonstrates that “critical scenes” yielded higher interaction indices compared to all scenes or non-critical scenes across all ROIs. The interaction score across all DMN ROIs was significantly higher in “critical scenes” than all scenes (t(23) = 7.19, p = 2.53 × 10⁻⁷) and non-critical scenes (t(23) = 7.3, p = 1.95 × 10⁻⁷). These results show that critical scenes are indeed responsible for the observed pISC differences across groups."

      4) I appreciate the code being made available and that the neuroimaging data will be made available soon. I would also appreciate it if the authors made the movie stimulus and behavioral data available. The movie stimulus itself is of interest because it was edited down, and it would be nice for readers to be able to see which scenes were included.

      Unfortunately due to copyright, we cannot share the movie stimulus outright. However, we will share the timing of the cuts used, as well as the time-stamped transcripts of verbal recall.

      To sum up, I think that this is a great experiment with a lot of strengths. The design is fairly clean (especially for a movie stimulus), the analyses are well reasoned, and the data are clear. The only weaknesses I would suggest addressing are with regards to how the DMN is being described and evaluated, and the communication of how this work informs the field on a theoretical level.

    1. Author Response

      Reviewer #1 (Public Review):

      In a very interesting and technically advanced study, the authors measured the force production of curved protofilaments at depolymerizing mammalian microtubule ends using an optical trap assay that they developed previously for yeast microtubules. They found that the magnesium concentration affects this force production, which they argue, based on a theoretical model, is due to changes in the length of the protofilament curls, as observed previously by electron microscopy. Comparing with their previous force measurements, they conclude that mammalian microtubules produce smaller force pulses than yeast microtubules due to shorter protofilament curls. This work provides new mechanistic insight into how shrinking microtubules exert forces on cargoes such as kinetochores during cell division. The experiments are sophisticated and appear to be of high quality, the conclusions are well supported by the data, and the language is appropriately cautious when conclusions are drawn from more indirect evidence. Given that the experimental setup differs from the previous optical trap assay (antibody plus tubulin attached to the bead versus only antibody attached to the bead), a control experiment with yeast microtubules using the same protocol as the new variant of the assay could be useful, or at least a discussion of this issue. One open question is whether the authors can be sure that the measured forces are due only to single depolymerizing protofilaments, rather than to two or more protofilaments that stay laterally attached for a while. How would this affect the interpretation of the data?

      This work will be of interest to cell biologists and biophysicists interested in spindle mechanics or generally in filament mechanics.

      Thank you for your careful reading of our manuscript, your kind remarks, and your favorable review.

      Reviewers #1 and #2 both mentioned a concern about potential differences between our previous setup with yeast microtubules, versus our new setup with predominantly bovine microtubules, and whether such differences might underlie the different pulse amplitudes we measured. We think this concern comes mainly from a misunderstanding of how the beads in both setups were tethered to the sides of the microtubules, and we apologize for not making this aspect clearer in our original submission.

      It is true that our new setup requires one additional step, pre-decoration of the anti-His beads with His6-tagged yeast tubulin. However, in both cases, the anti-His antibodies were kept very sparse on the beads to ensure that most beads, if they became tethered to a microtubule, were attached by a single antibody. (~30 pM beads were mixed with 30 pM of anti-His antibody, for a molar ratio of 1:1.) And even though the anti-His beads in our previous work did not undergo a separate incubation step for pre-decoration with tubulin, they undoubtedly were decorated immediately after being mixed into the microtubule growth mix, which in that case included ~1 µM of unpolymerized His6-tagged yeast tubulin dimers. Thus, the arrangement with beads tethered laterally to the sides of microtubules via single antibodies was created in both cases by essentially the same three-step process: First, beads decorated very sparsely with anti-His antibodies were bound to unpolymerized His6-tagged yeast tubulin. Second, a bead-tethered His6-tagged yeast tubulin was incorporated into the growing tip of a microtubule (which could be assembling from either yeast or bovine tubulin, depending on the experiment). Third, the tip grew past the bead to create a large extension. Because the beads in both scenarios were tethered by a single antibody to the same C-terminal tail of yeast β-tubulin, the differences in pulse amplitude cannot be explained by differences in the tethering. In our revised manuscript, we now mention explicitly in Results that the beads were tethered by single antibodies (lines 95 to 100). In Methods we significantly expanded the section about preparation of beads and how they became tethered (lines 365 to 393). [We refer here, and below, to line numbers when the document is viewed with “All Markup” shown.]

      You also raise an interesting, open question: Do protofilaments curl outward entirely independently of their lateral neighbors? Or under some conditions might they tend to stay laterally associated during the curling process, perhaps curling outward in pairs rather than as individual protofilaments? We cannot formally rule out the possibility that such lateral associations sometimes persist during protofilament curling. However, changes in lateral association seem unlikely to explain the magnesium- and species-dependent differences we measured in pulse amplitude, for several reasons: First, there is good evidence for lengthening of protofilament curls at disassembling tips (e.g., Mandelkow 1991, Tran & Salmon 1997), but we are not aware of convincing evidence for magnesium- or species-dependent increases in the propensity of curling protofilaments to remain laterally associated. Second, an increase in lateral association should increase the effective flexural rigidity of the curls, but under all the conditions we examined, pulse enlargement was associated with a steepening of the amplitude-vs-force relation – i.e., with softening, not stiffening. Our model indicates that this softening can be fully explained by an increase in protofilament contour length, without any change in the intrinsic flexural rigidity of the protofilament curls.

      Reviewer #2 (Public Review):

      Microtubules are regarded as dynamic tracks for kinesin and dynein motors that generate force for moving cargoes through cells, but microtubules also act as motors themselves by generating force from outward-splaying protofilaments at depolymerizing ends. Force from depolymerization has been demonstrated in vitro and is thought to contribute to chromosome movement and other processes in cells. Although this model has been in the field for many years, key questions have remained unanswered, including the mechanism of force generation, how the generated force might be regulated in cells, and how this system might be tuned across cellular contexts or organisms. The barrier is that we lack an understanding of experimental conditions that can be used to control protofilament shape and energetics. This study by Murray and colleagues makes an important advance toward overcoming that barrier.

      This study builds on previous work from the authors where they developed a system to directly measure forces generated by outward curling protofilaments at depolymerizing microtubule ends. That study showed for the first time that protofilaments act like elastic springs and related the generated force to the estimated energy contained in the microtubule lattice. Furthermore, they showed that slowing polymerization rate did not diminish force generation. That study used recombinant yeast tubulin, including a 6x histidine tag on beta tubulin that created attachment points for the bead on the microtubule lattice. The current study extends that system to show that work output is related to the length of protofilament curls.

      We are grateful for your very thoughtful and thorough review, which has helped us improve our manuscript.

      Murray and colleagues show this by manipulating curls in two ways - using bovine brain tubulin instead of yeast tubulin and altering magnesium concentration. Previous EM studies indicated that protofilaments on depolymerizing bovine microtubules have similar curvature but are shorter. The authors here use a blend of bovine brain tubulin and bead-linked recombinant yeast tubulin with the 6x histidine tag in their in vitro system and find smaller deflections of the laser-trapped bead than previously observed with pure yeast tubulin. A concern with comparing this heterogeneous bovine/yeast system to the previous work with homogeneous yeast tubulin is that density of 6x histidine-tagged tubulin subunits is likely to be different between the two systems. Also, the rate of incorporation of 6x histidine yeast tubulin into bovine microtubules in the current study may be different from the rate of incorporation into yeast microtubules in the previous study. These differences could lead to changes in the strength of bead attachment to the microtubule lattice and alter the compliance of the bead to deflection by curling protofilaments. These possibilities and lattice attachment strength are not explored in this study, raising concerns about comparing the two systems.

      Reviewers #1 and #2 both mentioned a concern about potential differences between our previous setup with yeast microtubules, versus our new setup with predominantly bovine microtubules, and whether such differences might underlie the different pulse amplitudes we measured. As detailed in our response to Reviewer #1 above, we think this concern comes mainly from a misunderstanding of how the beads in both setups were tethered to the sides of the microtubules, and we apologize for not making this aspect clearer in our original submission. For both our yeast and bovine microtubule experiments, the anti-His antibodies were kept very sparse on the beads to ensure that most beads, if they became tethered to a microtubule, were attached by a single antibody. Because the beads in both scenarios were tethered by a single antibody to the same C-terminal tail of yeast β-tubulin, the differences in pulse amplitude cannot be explained by differences in the tethering. In our revised manuscript, we now mention explicitly in Results that the beads were tethered by single antibodies (lines 95 to 100). In Methods we significantly expanded the section about preparation of beads and how they became tethered (lines 365 to 393).

      The authors go on to show that magnesium increases bead deflection and work output from the system. The use of magnesium was motivated by earlier studies which showed that increasing magnesium speeds up depolymerization and increases the lengths of protofilament curls. The use of magnesium here provides the first evidence that work output can be tuned biochemically. This is an important finding. The authors then go on to show that the effect of magnesium on bead deflection can be separated from its effect on depolymerization speed. They do this by proteolytically removing the beta tubulin tail domain, which previous studies had shown to be necessary to mediate the magnesium effect on depolymerization rate. The authors arrive at a conclusion that magnesium must promote protofilament work output by increasing their lengths. How magnesium might do this remains unanswered. The mechanistic insight from the magnesium experiments ends there, but the authors discuss possible roles for magnesium in strengthening longitudinal interactions within protofilaments or perhaps complexing with the GDP nucleotide at the exchangeable site, although that seems less likely at the concentrations in these experiments.

      The major conclusion of the study is the finding that work output from curling protofilaments is a tunable system. The examples here demonstrate tuning by tubulin composition and by divalent cations. Whether these examples relate to tuning in biological systems will be an important next question and could expand our appreciation for the versatility of depolymerizing microtubules as a motor.

      We fully agree that two very important next questions are whether work output from curling protofilaments is truly harnessed in vivo, and whether protofilament properties in vivo might be actively regulated for this purpose. Based on your recommendations, and as detailed below (under Major point 2), we have expanded our discussion of these possibilities in our revised manuscript.

      Reviewer #3 (Public Review):

      The authors used a previously established optical tweezers-based assay to measure the regulation of the working stroke of curled protofilaments of bovine microtubules by magnesium. To do so, the authors improved the assay by attaching bovine microtubules to trapping beads through an incorporated tagged yeast tubulin.

      The assay is state-of-the-art and provides a direct measurement of the stroke size of protofilaments and its dependence on magnesium.

      The authors have achieved all their goals and the manuscript is well written.

      The reported findings will be of high interest for the cell biology community.

      Thank you for reading and evaluating our manuscript. We are grateful for your positive comments.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors found that the IDR in Cdc15 is phosphorylated by multiple kinases (Pom1, Shk1, Pck1, and Kin1) and that phosphorylation of the IDR inhibits phase separation of the Cdc15 protein. The phosphorylation was demonstrated in cells as well as in vitro, and the phosphorylation sites were identified by mass spectrometry. Phospho-regulation of Cdc15 LLPS was demonstrated by an in vitro assay using recombinant proteins. The significance of the phosphorylation for the contractile actomyosin ring (CAR) was demonstrated using a cdc15 mutant carrying 31 Ala substitutions at the phosphorylation sites (cdc15 31A). CAR assembly was comparable to that in cdc15+ cells, but maturation and contraction of the ring were faster in the cdc15 31A mutant, suggesting that the phosphorylation contributes to delaying cytokinesis. This could be one of the mechanisms that ensure chromosome segregation is completed before cytokinesis. In this paper, the authors showed over-accumulation of the type II myosin regulatory light chain Rlc1 on the CAR in the cdc15 31A mutant during CAR assembly and contraction. In addition, the kinases that phosphorylate the Cdc15 IDR are identified as polarity kinases, which restrict CAR assembly to the cell middle. Indeed, inhibition of the kinases increases the proportion of septa formed at the cell tip in the mid1 knockout mutant, which lacks a major positive polarity cue during mitosis. However, in this manuscript, this phenotype cannot be explained solely by Cdc15 phosphorylation, because the authors did not examine tip septum formation in cdc15 31A cells.

      Preventing Cdc15 phosphorylation does not on its own promote tip septum formation (Bhattacharjee et al., 2020). The polarity kinases have other substrates in the tip exclusion pathway that presumably also play a key role in septation. Also, cells must be in the correct part of the cell cycle to form functional CRs and septa. We described the necessary roles of other polarity kinase substrates in our Discussion.

      Overall, the data support their conclusion that Cdc15 undergoes LLPS and that this process is inhibited by phosphorylation of residues in the Cdc15 IDR by polarity kinases. It is still unclear whether LLPS is a reversible process regulated by the protein kinases. In vitro experiments showed condensate formation upon dephosphorylation of the Cdc15 IDR, but not dissolution of the condensates upon phosphorylation. I wonder whether incubating Cdc15 IDR condensates with the kinases would dissolve them.

      This is an interesting idea but technically challenging. The in vitro reactions are performed by adding phosphatase to induce droplet formation, and there is no way to remove the phosphatase. Therefore, added kinase would compete with the phosphatase, and clear results are unlikely. What we do know from work in vivo is that when Cdc15 cannot be rephosphorylated, as with the alanine mutants, the protein remains bound to the membrane in clusters, so it seems clear that the phosphostate of Cdc15 governs this property of the protein.

      How Cdc15 IDR phosphorylation and LLPS change over the course of the cell cycle is unclear. In asynchronous cells (most of which may be in G2) and in nda3 or cps1 mutants, Cdc15 was still highly phosphorylated. This indicates that Cdc15 is phosphorylated, and LLPS is inhibited, throughout the cell cycle. Determining how the phosphorylation status of individual residues changes could be the next challenge for this research.

      The cell cycle changes in Cdc15 phosphostatus and their correlation with localization have been well documented (e.g., Fankhauser et al., Cell, 1998; Clifford et al., JCB, 2008; Roberts-Galbraith et al., Mol. Cell, 2010). In bulk analyses, Cdc15 is never fully dephosphorylated during mitosis, but it is not highly phosphorylated in cells blocked in mitosis with nda3 or in cps1 cells, when some portion of it is in CRs (please see the references indicated above). As shown in the simulations, the protein need not be fully phosphorylated or dephosphorylated to undergo a conformational change that allows condensate formation. A major conclusion of our work is that no particular phosphorylation site or sites is important; rather, the overall charge on the dimer is important, and some threshold of phosphorylation keeps the protein from forming clusters on the membrane. We agree with the reviewer that identifying this threshold will be of interest in the future.

      In addition, currently, there is no approach to monitor the LLPS in wild-type cells. Therefore, it is still unclear if LLPS formation is the physiological mechanism regulating cell division in wild-type cells.

      We agree that we have not monitored LLPS in live cells. However, Cdc15’s condensate formation in live cells and its phosphorylation state are highly correlated. This is suggestive of LLPS in vivo.

    1. Author Response

      Reviewer #2 (Public Review):

      “To describe LLPS or to distinguish between polymer-polymer phase separation and LLPS, recent studies have used single particle tracking, a technique that allows one to follow the dynamics of individual proteins in living cells (https://doi.org/10.7554/eLife.60577; https://doi.org/10.7554/eLife.69181; https://doi.org/10.7554/eLife.47098). The authors should mention that such an approach can be a good alternative to avoid the artefact of fixation. Using techniques such as single particle tracking or FCS, it is possible to estimate the effective diffusion coefficient of proteins in living cells. When a liquid phase separation is formed, it is also possible to estimate the diffusion coefficient of the protein of interest (POI) inside versus outside of the LLPS.”

      We thank the reviewer for their insight and fully agree that live-cell techniques like SPT and FCS are valuable for investigating LLPS while avoiding fixation artifacts. We have added discussion emphasizing this fact and incorporated the citations recommended by the reviewer in Paragraph 1 on Page 15: “Live imaging techniques that allow estimation of protein diffusion coefficients within specific cellular compartments, e.g., SPT (Hansen et al., 2018 and Heckert et al., 2022) and fluorescence correlation spectroscopy (Lanzanò et al., 2017), can be useful alternative approaches for diagnosing LLPS in vivo without the potential artifact of fixation, as diffusion dynamics are recently shown to be affected by LLPS (Heltberg et al., 2021; McSwiggen et al., 2019a; Miné-Hattab et al., 2021; Chong et al., 2022; and Ladouceur et al., 2020).”

      “The authors say that less dynamic interactions are better captured by PFA fixation. In the simulation part, would it be possible to predict from the diffusion coefficients of the POI inside a condensate the effect of the PFA fixation? […] In the simulation part, they could try to incorporate the diffusion coefficient of the protein of interest and see if it is possible to predict the effect of fixation as a function of the diffusion coefficient.”

      We thank the reviewer for pointing out the absence of this critical piece that connects our experimental observations to our kinetic model. Our model considers association/dissociation rates rather than diffusion coefficients to describe interaction dynamics, but the reviewer’s point is still very insightful and important. As described in Response 2, we compared two proteins: Halo-TAF15(IDR), which is poorly preserved by fixation, and TAF15(IDR)-Halo-FTH1, which is well preserved by fixation. We used SPT to measure the dissociation rates of Halo-TAF15(IDR) and TAF15(IDR)-Halo-FTH1 and showed that Halo-TAF15(IDR) dissociates from its puncta much faster than TAF15(IDR)-Halo-FTH1 does, demonstrating more stable homotypic interactions of the latter than the former. The observation that TAF15(IDR)-Halo-FTH1 has less dynamic interactions and is better preserved by fixation than Halo-TAF15(IDR) agrees with our model’s prediction that less dynamic interactions are better captured by fixation. Please see Response 2 for more details. Our new data and discussion have been added to the revised manuscript in Paragraph 3 on Page 13 and in Figure 3B, Figure 3E, Figure 6, and Video 2.
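      As a rough illustration of how an off-rate can be read out from SPT residence times, the sketch below assumes that residence times in puncta follow a single exponential, so the maximum-likelihood observed rate is the reciprocal of the mean residence time, and that photobleaching is an independent first-order process with a known rate (`k_bleach`, a hypothetical input) that can simply be subtracted. The function name and all numbers are hypothetical; this is not the authors' pipeline, which may also need to handle the minimum detectable residence time and multiple kinetic populations.

```python
def koff_from_residence_times(residence_times_s, k_bleach=0.0):
    """Estimate an off-rate (1/s) from residence times, assuming a single exponential.

    For exponentially distributed residence times, the maximum-likelihood estimate
    of the observed rate is n / sum(t), i.e. 1 / mean residence time. Photobleaching
    is assumed (hypothetically) to be independent and first-order with rate k_bleach,
    so the observed rate is k_off + k_bleach.
    """
    if not residence_times_s or any(t <= 0 for t in residence_times_s):
        raise ValueError("need a non-empty list of positive residence times")
    k_obs = len(residence_times_s) / sum(residence_times_s)
    # Subtract the assumed bleaching rate; clamp at zero for very long dwells.
    return max(k_obs - k_bleach, 0.0)


# Made-up residence times (seconds) for two hypothetical constructs.
fast = [0.5, 1.0, 1.5, 2.0]
slow = [4.0, 6.0, 8.0, 10.0]
k_fast = koff_from_residence_times(fast, k_bleach=0.1)  # 0.8 - 0.1 = 0.7 per second
k_slow = koff_from_residence_times(slow, k_bleach=0.1)
```

      A larger off-rate corresponds to shorter-lived, more dynamic interactions, which is the property the kinetic model links to poorer preservation by fixation.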

      “Finally, the authors propose that in the future, it will be important to design novel fixatives with significantly faster cross-linking rates than biomolecular interactions to eliminate fixation artifacts in the cell. It would be even more interesting if the authors could propose some ideas of potential novel fixatives. Did they test several concentrations of PFA, for example? Did they test different times of PFA incubation? Did they test cryofixation and do they know what would be their effect on LLPS? Do they have novel fixatives in mind? […] To strengthen the manuscript, the authors should try more protocols of fixation.”

      We thank the reviewer for these good questions. As described in Response 1, we have done additional quantification of the change in LLPS appearance in cells upon treatment with 0% PFA (PBS buffer only), 1% PFA, 2% PFA, and 8% PFA, as well as 4% PFA supplemented with 0.2% GA. We saw statistically significant changes in the LLPS-describing parameters upon all the PFA and PFA/GA treatments except the 0% PFA control. To examine how fixation artifacts depend on the time of PFA incubation, we acquired a time-lapse movie of a cell overexpressing EGFP-FUS(IDR) immediately after 4% PFA treatment and quantified the number of puncta over time (Video 1). We showed that fixation is complete (the number of puncta becomes constant) by roughly 100 seconds (Figure 1 – figure supplement 2). Our new data also justified our choice of a 10-minute PFA incubation time for analyzing fixation-induced changes in LLPS appearance in the rest of the paper. Please see Response 1 for more details. Our new data and discussion have been added to the revised manuscript in Paragraph 3 on Page 3 and in Figure 1 - figure supplement 2 (time dependence of fixation artifacts), Figure 1 - figure supplement 3 (fixation artifact at various PFA concentrations), and Figure 1 - figure supplement 4 (fixation artifact upon treatment with 4% PFA supplemented with 0.2% GA).

      We agree that testing how other cell fixation protocols, such as cryofixation, affect LLPS appearance would be interesting. However, given the complexity of alternative fixation protocols like cryofixation and the highly specialized equipment and reagents they require, a broad comparison of how different fixation methods change LLPS appearance would be enough work to fill a separate paper. These experiments would be more appropriate for a future study.

      Reviewer #3 (Public Review):

      “Understanding whether/how fixation methods affect the detection of biomolecular condensates is of broad interest given the importance of LLPS in regulating different aspects of cell biology. However, in this manuscript, the authors use only paraformaldehyde as a fixation method and study only fluorescently-labelled IDR proteins. The work would benefit from a comparison between living cells and cells fixed with other fixation methods.”

      We thank the reviewer for this suggestion and agree that more fixation protocols should be investigated. As described in Response 1 and Response 18, in addition to examining PFA fixation, we have quantified how fixation with 4% PFA supplemented with 0.2% GA changes LLPS appearance in cells. We saw statistically significant changes in all the LLPS-describing parameters upon PFA/GA treatment. Please see Response 1 and Response 18 for details. Our new data and discussion have been added to the revised manuscript in Paragraph 3 on Page 3 and in Figure 1 - figure supplement 4.

      “In addition, it would be useful to test the impact of these fixation methods on the detection of endogenous proteins or IDR proteins without fluorescent tag.”

      We thank the reviewer for this suggestion and have now investigated an endogenous IDR-containing protein in the revised manuscript. Specifically, we quantified the effect of 4% PFA fixation on endogenously expressed EWS::FLI1 in the Ewing sarcoma cell line A673. EWS::FLI1 is an oncogenic fusion transcription factor that causes Ewing sarcoma (Grünewald et al., 2018) and is known to form local, high-concentration hubs at target genes associated with GGAA microsatellites (Chong et al., 2018). We previously Halo-tagged endogenous EWS::FLI1 in A673 cells using CRISPR/Cas9-mediated genome editing (Chong et al., 2018). Here, we quantified the effect of PFA fixation on endogenous EWS::FLI1 puncta in this knock-in cell line and found no significant difference in the distribution of EWS::FLI1 upon fixation. This result suggests that PFA fixation does not change the intracellular distribution of every protein. Our new data and discussion have been added to the revised manuscript in Paragraph 1 on Page 8 and in Figure 3C.

      Unfortunately, testing fixation artifacts of IDR-containing proteins without a fluorescent tag has been infeasible as we rely on fluorescence from a tag on the protein of interest to quantitatively compare LLPS appearance in live and fixed cells. Although we have considered using non-fluorescent methods, e.g., phase contrast microscopy, to visualize putative LLPS in cells, its lack of specificity in imaging proteins or cellular structures makes the type of quantification we do for fixation artifact characterization inaccessible.

    1. Author Response

      Reviewer #1 (Public Review):

      1 - Problems with the analysis of stimulation latency

      The data in this paper show a variable latency in signal propagation from stimulation sites to hippocampal recording electrodes. In an attempt to measure this latency, the authors examine the theta phase offset between each pair of stimulation and recording electrodes (Figure 9). They interpret their results as showing a consistent 90-degree phase offset. However, their data do not support this interpretation because in fact their measurements show a bimodal distribution of phase differences with peaks at 0 and 180 degrees. It is not valid to interpret the circular mean of a bimodal distribution because the result is not well defined. Further, individual electrodes do not show a mean difference of 90 degrees.
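      The reviewer's point about the circular mean of a bimodal distribution can be illustrated numerically. In the minimal sketch below, the phase differences are made-up values clustered near 0 and 180 degrees, not data from the study. The naive circular mean has a mean resultant length near zero, so its direction carries no information. Doubling the angles, a standard treatment for axial data, maps both clusters onto one direction and recovers the shared axis, at the cost of no longer distinguishing 0 from 180 degrees.

```python
import math


def circular_mean(angles_rad):
    """Return (mean direction in radians, mean resultant length R) of a list of angles."""
    n = len(angles_rad)
    c = sum(math.cos(a) for a in angles_rad) / n
    s = sum(math.sin(a) for a in angles_rad) / n
    return math.atan2(s, c), math.hypot(c, s)


# Hypothetical phase differences clustered near 0 and 180 degrees (not study data).
phases_deg = [-10, 5, 10, 170, 185, 190]
phases = [math.radians(d) for d in phases_deg]

# Naive mean: the two clusters cancel, so R is ~0 and naive_mu is numerically arbitrary.
naive_mu, naive_r = circular_mean(phases)

# Axial approach: double the angles so 0 and 180 map to the same direction,
# average, then halve the resulting direction to get the common axis.
doubled_mu, doubled_r = circular_mean([2 * a for a in phases])
axis_deg = math.degrees(doubled_mu / 2) % 180
```

      Reporting R alongside any mean direction makes the problem visible: a value close to 0 signals that the mean angle should not be interpreted.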

      Because the results do not reliably support the claim of a consistent 90-degree phase difference between the hippocampus and cortex, it is a substantial problem for the paper, given the importance of hippocampal-cortical timing in their interpretation. In particular, the authors should reconsider how they frame their results in relation to the Siegle and Wilson work and others.

      We no longer emphasize the phase difference between hippocampus and neocortex in the revised manuscript. This phase difference was computed to attempt to address the possibility that there was some latency in the propagation of stimulation effects from lateral temporal cortex to hippocampus, which would affect our interpretation of which theta phase angles evoked minimal versus maximal hippocampal response (i.e., “peak” stimulation trials may actually have involved stimulation propagating to hippocampus sometime after its peak). However, as noted above in response to Essential Revisions #1, we cannot fully rule out the possibility that volume conduction influenced our estimates of phase lag. We no longer emphasize this analysis and have moved it to the appendix (Appendix 1-Figure 4), along with a new analysis using bipolar rereferencing to address the volume conduction issue.

      The manuscript is now focused on the main finding of the experiment, of a 180-degree separation between theta phases associated with minimal versus maximal evoked responses. We analyzed this via circular-linear models of phase versus evoked amplitude, as suggested by the reviewers, rather than the phase-binning analyses emphasized in the original manuscript. Circular-linear analyses are indifferent to the specific phase values associated with minimal/maximal response. We have also expanded our Introduction with further discussion of homologies to the rodent literature, including to the Siegle and Wilson paper. Our revised Discussion section emphasizes that the central homology is that there is 180-degree separation between hippocampal theta phase angles associated with minimal versus maximal responsiveness to input, with less emphasis placed on the specific angles (i.e., peak versus trough), given difficulties in comparing specific phase angles across species and recording approaches.
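      A circular-linear model of the kind mentioned above can be sketched as an ordinary least-squares regression of evoked amplitude on the cosine and sine of the stimulation phase. The sketch below is a minimal, dependency-free version under that assumption; the function names and the synthetic, noise-free data are hypothetical, and the authors' actual model may include additional terms, random effects, or a different fitting procedure.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a, b):
    """Solve the 3x3 linear system a @ x = b by Cramer's rule."""
    d = _det3(a)
    sol = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        sol.append(_det3(m) / d)
    return sol


def fit_cosine_model(phases_rad, amplitudes):
    """Least-squares fit of amplitude = b0 + b1*cos(phase) + b2*sin(phase).

    Returns (offset b0, modulation depth, phase of maximal predicted amplitude).
    """
    rows = [(1.0, math.cos(p), math.sin(p)) for p in phases_rad]
    # Normal equations X^T X b = X^T y for the three regressors.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, amplitudes)) for i in range(3)]
    b0, b1, b2 = _solve3(xtx, xty)
    depth = math.hypot(b1, b2)
    peak_phase = math.atan2(b2, b1) % (2 * math.pi)
    return b0, depth, peak_phase


# Hypothetical noise-free example in which responses are largest at phase pi.
phases = [2 * math.pi * k / 12 for k in range(12)]
amps = [2.0 - 1.5 * math.cos(p) for p in phases]
offset, depth, peak = fit_cosine_model(phases, amps)
```

      In this simple form the fitted curve is a single cosine, so its predicted minimum lies half a cycle from `peak`; the fit itself does not depend on which specific phase angle turns out to be maximal.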

      2 - Problems with the figures

      Some figures in the paper were hard to interpret, and I felt it would benefit readers for many to be combined. The results from Figures 3 through 7 would be helpful to see side by side, as they show various investigations of the same data. In Figure 4, it would be helpful to see both plots from (a) on the same axis, as in (b). I did not find that the accuracy estimation analysis in Figure 2 was important to include in the main paper. It would be better suited for the supplement, in my view, unless I am missing something.

      We have substantially revised the figures for clarity. The analyses presented in original Figures 2, 6, and 9 have been moved to the appendix (as revised Appendix Figures 1, 3, and 4). Figure 3 has been combined with Figure 1 into the revised Figure 1. Figures 4 and 7 have been combined in order to show EP data from all four phase bins side-by-side (Figure 3). We did not combine a) and b) from the original figure onto the same axis, as we found it difficult to interpret the four overlaid traces (i.e., 2 EP traces and 2 phase-matched stimulation-free traces). However, these data are now shown side-by-side and on equal axes. We have updated all EP visualizations to improve readability. Figure 5 has been expanded to include component amplitude comparisons for both peak versus trough and rising versus falling phases, in keeping with the expanded Figure 3.

    1. Author Response

      Reviewer #2 (Public Review):

      This clinical trial was conducted to pursue short-course DAA therapy. For an ultra-short course to work, it has to be simple, as efficacious as established treatments, and require no additional workup (like genotyping, IL28B, HCV VL determination after initiation of therapy, etc., as shown in Liu et al.). This is because our aim is to simplify therapy to treat most people, especially those who are not engaged in care. This work struggles to achieve these goals, as the SVR rate for short-course therapy is unacceptably low. The authors' conclusion that patients can be treated with a short course first and those who fail can then be retreated does not appear to achieve these goals, as, realistically, it is difficult to re-engage marginalized populations. From an elimination perspective, the ideal is to treat them in one attempt.

      We would like to clarify that we do not propose treating for 4 weeks and then retreating, because we acknowledge an unacceptable first-line cure rate with this approach. We suggest that 8 weeks may achieve a cure rate greater than 90% in mild liver disease (18/18 participants with slow virological response were cured with 8 weeks of SOF/DCV in this study). Since retreatment with the same drug combination is effective, there is arguably less jeopardy in a regimen with a 90% cure rate than previously perceived.

      Reviewer #3 (Public Review):

      This prospective study evaluated the utility of D2 VL determination for response-guided ultra-short (4w) sofosbuvir + daclatasvir treatment of chronic HCV patients (with mild disease) with G1+6. Shortening therapy duration reduces DAA use with a cure rate of 75% overall upon first-line treatment and 100% among retreated patients. In contrast to a previous report in G1b patients that showed a 100% success rate with D2-based 3-week triple therapy, the present study fails to show a good enough yield for a 4w sofosbuvir + daclatasvir regimen among G1+6 patients. Given the small number of patients, additional studies should determine whether a different time point and/or a different viral threshold could be more appropriate indicators to allow a 4-week duration of dual therapy (without a protease inhibitor).

      Strengths:

      A) An important study that is a nice addition to previous reports evaluating the utility of response-guided therapy for shortening the duration of HCV treatment. Given the disease burden and the high costs of treatment, especially in low-income countries, this is a major goal that was also advocated by the WHO.

      B) This study investigates an ultra-short protease-inhibitor-free regimen and therefore complements a previous (positive) RGT study of a 3-week triple regimen.

      C) This study is prospective, with careful analyses of ample data, including the evaluation of RAS by gene sequencing. The follow-up was long enough and analyses of viral kinetics were performed. In addition, a detailed analysis of re-treatment outcomes and viral mutations in this population was performed.

      D) Although the main objective (shortening therapy to 4 weeks) was not adequately achieved (<90% success rate), the study's results may suggest that re-treatment in case of failure is safe and efficient, although further studies with a higher number of patients are needed for confirmation.

      Limitations:

      A) Relatively small study cohort. Overall, only 34 patients were treated with a 4-week regimen. However, given the results, it seems that this number of patients, who achieved only a 75% cure rate, is enough to exclude the use of a D2 RGUT, at least in G1+6 patients treated with sofosbuvir + daclatasvir. On the other hand, even a 100% success rate with 8-week treatment among 17 patients is not really enough to draw firm conclusions on the adequacy of this short regimen in this group of patients. A higher number of patients could better validate this positive result.

      Addressed in the Discussion. Firstly, the study was powered to determine the overall cure rate with 4- and 8-week treatment, rather than outcomes with each duration. It is possible that we would have seen patients failing 8 weeks of therapy with a larger sample, and our cure estimates may therefore be imprecise.

      B) The values chosen for the RGT are arbitrary. The relatively small number of patients did not allow for a more detailed analysis of more appropriate time points and/or viral load thresholds to determine the adequacy of 4 weeks of therapy in individual patients. The D2 500 IU/mL threshold is based on a small previous phase 2 study of G1b patients treated with a triple-drug regimen, which does not necessarily apply to dual therapy (without a protease inhibitor) or to patients infected with a different viral subtype. In this context, a control group treated with triple combination therapy (with a protease inhibitor) could be very helpful to the study.

      This was a mechanistic pilot study conducted in Vietnam, where antiviral options are limited. We therefore made a conscious decision to use licensed/available treatments (SOF/DCV) rather than the Lau combination, which is not WHO-approved.

      C) Is there a particular pattern of viral kinetics to 4w cured patients Vs. failures? Fig 1 (Appendix 1) only shows the means of viral load and the general kinetics for the whole population, but individual plots of viral kinetics are not presented although could potentially be useful. Also, according to the presented data, day 7 VL<LLOQ may be a better indicator for shortening treatment to 4w. A detailed graphical presentation of viral kinetics in these patients could be helpful.

      We have added Appendix 1-figure 2 showing HCV RNA kinetics in participants treated with 4 weeks of SOF/DCV, with cures (red lines) distinguished from treatment failures. In the Results section we note that, even though the numbers are small, this helps illustrate that early on-treatment response alone may be of limited value in determining cure with ultra-short therapy.

      D) According to Table 3, no significant differences in the host or viral factors were detected between cured or failures of the 4w regimen. However, the low number of patients makes it very difficult to interpret these data and might miss potential differences between these two groups of patients, emphasizing again the difficulty in drawing firm conclusions from this study. In this context, I wonder whether a regression analysis would better define either viral (subtype, RAS) or host factors that are implicated in a 4w duration success.

      See above.

    1. Author Response

      Reviewer #1 (Public Review):

      Auwerx et al. have taken a new approach to mine large existing datasets of intermediary molecular data between GWAS and phenotype, with the aim of uncovering novel insight into the molecular mechanisms which lead a GWAS hit to have a phenotypic effect. The authors show that you can get additional insight by integrating multiple omics layers rather than analyzing only a single molecular type, including a handful of specific examples, e.g. that the effect of SNPs in ANKH on calcium is mediated by citrate. Such additional data is necessary because, as the authors point out, while we have thousands of SNPs with significant impact on phenotypes of interest, we often don't know at all the mechanism, given that the majority of significant SNPs found through GWAS are in non-coding (and often intergenic) regions.

      This paper shows how one can mine large existing datasets to better estimate the cellular mechanism of significant, causal SNPs, and the authors have proven that by providing insight into the links between a couple of genes (e.g. FADS2, TMEM258) and metabolite QTLs and consequent phenotypes. There is definitely a need and utility for this, given how few significant SNPs (and even fewer recently-discovered ones) hit parts of the DNA where the causal mechanism is immediately obvious and easily testable through traditional molecular approaches.

      I find the paper interesting and it provides useful insight into a still relatively new approach. However, I would be interested in knowing how well this approach scales to the general genetics community: would this method work with a much smaller N (e.g. n = 500)? Being able to make new insights using cohorts of nearly 10,000 patients is great, but the vast majority of molecular studies are at least an order of magnitude smaller. While sequencing and mass spectrometry are becoming exponentially cheaper, the issue of sample size is likely to remain for the foreseeable future due to the challenges and expenses of the initial sample collection.

      We thank the reviewer for his assessment and have now addressed – in the revised version of the manuscript, as well as in the below point-by-point reply – his specific comments/questions.

      Reviewer #2 (Public Review):

      Auwerx et al. present a framework for the integration of results from expression quantitative trait loci (eQTL), metabolite QTL (mQTL) and genome-wide association (GWA) studies based on the use of summary statistics and Mendelian Randomization (MR). The aim of their study is to provide the field with a method that allows for the detection of causal relationships between transcript levels and phenotypes by integrating information about the effect of transcripts on metabolites and the downstream effect of these metabolites on phenotypes reported by GWA studies. The method requires the mapping of identical SNPs in disconnected mQTL and eQTL studies, which allows MR-based inference of a causal effect from a transcript to a metabolite. The effect of both transcripts and metabolites on phenotypes is evaluated in the same MR-based manner by overlaying eQTL and mQTL SNPs with SNPs present in phenotypic GWA studies.

      The aim of the presented approach is two-fold: (1) to allow identification of additional causal relationships between transcript levels and phenotypes as compared to an approach limited to the evaluation of transcript-to-phenotype associations (transcriptome-wide MR, TWMR) and (2) to provide information about the mechanism of effects originating from causally linked transcripts via the metabolite layer to a phenotype.

      The study is presented in a very clear and concise way. In the part based on empirical study results, the approach leads to the identification of a set of potential causal triplets between transcripts, metabolites and phenotypes. Several examples of such causal links are presented, which are in agreement with literature but also contain testable hypotheses about novel functional relationships. The simulation study is well documented and addresses an important question pertaining to the approach taken: Does the integration of mQTL data at the level of a mediator allow for higher power to detect causal transcript to phenotype associations?

      We thank the reviewer for his/her assessment and have now addressed – in the revised version of the manuscript, as well as in the below point-by-point reply – his/her specific comments/questions.

      Major Concerns

      1) Our most salient concern regarding the presented approach is the presence of multiple testing problems. In the analysis of empirical datasets (p. 4), the rationale for setting FDR thresholds is not clearly stated. While this appears to be a Bonferroni-type correction (p-value threshold divided by number of transcripts or metabolites tested), the thresholds do not reflect the actual number of tests performed (7883 transcripts times 453 metabolites for transcript-metabolite associations, 87 metabolites or 10435 transcripts times 28 complex phenotypes). The correct and more stringent thresholds certainly decrease the overlap between causal relationships and thus reduce the identifiable number of causal triplets. Furthermore, we believe that multiple testing has to be considered for correct interpretation of the power analysis. The study compares the power of a TWMR-only approach to the power of mediation-based MR by comparing "power(TP)" against "power(TM) * power(MP)" (p. 12). This comparison is useful in a hypothetical situation given data on a single transcript affecting a single phenotype, with potential mediation via a single metabolite. However, in an actual empirical situation, the number of non-causal transcript-metabolite-phenotype triplets will exceed the number of non-causal transcript-phenotype associations due to the multiplication with the number of metabolites that have to be evaluated. This creates a tremendous burden of multiple testing, which will very likely outweigh the increase in power afforded by the mediation-based approach in the hypothetical "single transcript-metabolite-phenotype" situation described here. Thus, for explorative detection of causal transcript-phenotype relationships, the TWMR-only method might even outperform the mediation-based method described by the authors, simply because the former requires a smaller number of hypotheses to be tested compared to the latter. The presented simulation would only hold in cases where a single path of causality with a known potential mediator is to be tested.

      We thank the reviewer for pointing out the multiple testing issue. Based on this comment, we have revised our approach with two major modifications.

      First, we reduce the number of assessed metabolites to 242 compounds for which we were able to identify a Human Metabolome Database (HMDB) identifier through manual curation. This was triggered by the suggestion of reviewer #1 to facilitate the database/literature-based follow-up of our discoveries. The motivation is to only test metabolites that if found to be significantly associated would yield interpretable results, thereby reducing the number of tests to be performed. This modification is described in the revised manuscript:

      Results: “Summary statistics for cis-eQTLs stem from the eQTLGen Consortium meta-analysis of 19,942 transcripts in 31,684 individuals [3], while summary statistics for mQTLs originate from a meta-analysis of 453 metabolites in 7,824 individuals from two independent European cohorts: TwinsUK (N = 6,056) and KORA (N = 1,768) [6]. After selecting SNPs included in both the eQTL and mQTL studies, our analysis was restricted to 7,884 transcripts with ≥ 3 instrumental variables (IVs) (see Methods, Supplemental Figure 1) and 242 metabolites with an identifier in The Human Metabolome Database (HMDB) [28] (see Methods, Supplemental Table 1).”

      Methods: “mQTL data originate from Shin et al. [6], which used ultra-high performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) to measure 486 whole blood metabolites in 7,824 European individuals. Association analyses were carried out on ~2.1 million SNPs and are available for 453 metabolites at the Metabolomics GWAS Server (http://metabolomics.helmholtz-muenchen.de/gwas/). Among these metabolites, 242 were manually annotated with Human Metabolome Database (HMDB) identifiers (Supplemental Table 1) and used in this study.”

      Second, to account for all remaining tests, we now select significant causal effects based on FDR < 5% in all performed univariable MR analyses. With 5% FDR on both the transcript-to-metabolite and metabolite-to-phenotype effects, the FDR for triplets is slightly inflated to 9.75% (= 1 − 0.95²), a consideration that we now explicitly describe. Note that selecting triplets based on transcript-to-metabolite and metabolite-to-phenotype FDR < 2.5% results in an FDR < 5% (1 − 0.975² ≈ 4.94%) for the triplets. This more stringent threshold identifies 135 causal triplets, 39 of which would be missed by TWMR. Overall, the Results and Supplemental Tables have been updated and now read as follows:

      “Mapping the transcriptome onto the metabolome […] By testing each gene for association with the 242 metabolites, we detected 96 genes whose transcript levels causally impacted 75 metabolites, resulting in 133 unique transcript-metabolite associations (FDR 5% considering all 1,907,690 instrumentable gene-metabolite pairs; Supplemental Table 2) […].

      Mapping the metabolome onto complex phenotypes […] Overall, 34 metabolites were associated with at least one phenotype (FDR 5% considering all 1,344 metabolite-phenotype pairs), resulting in 132 unique metabolite-phenotype associations (Supplemental Table 4).

      Mapping the transcriptome onto complex phenotypes […] In total, 5,140 transcripts associated with at least one phenotype (FDR 5% considering all 292,170 gene-phenotype pairs) resulting in 13,141 unique transcript-phenotype associations (Supplemental Table 5).

      Mapping metabolome-mediated effects of the transcriptome onto complex phenotypes […] We combined the 133 transcript-metabolite (FDR ≤ 5%) and 132 metabolite-trait (FDR ≤ 5%) associations to pinpoint 216 transcript-metabolite-phenotype causal triplets (FDR = 1 − 0.95² = 9.75%) (Supplemental Table 6).”
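      The quoted Results apply each FDR threshold across all tested pairs (for example, all 1,907,690 instrumentable gene-metabolite pairs) rather than within a single transcript. The response does not name the FDR procedure used; assuming the widely used Benjamini-Hochberg step-up procedure, the selection step could be sketched as follows (the function name and the toy p-values are hypothetical):

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level q.

    Benjamini-Hochberg step-up: sort the p-values, find the largest rank k
    with p_(k) <= k * q / m, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, not study data
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, q=0.05))
```

      The design point that matters for the reviewer's concern is that m is the total number of pairs tested (every transcript-metabolite combination, for instance), which is what makes the correction account for all performed tests.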

      In the simulations performed for the power analysis, we used a Bonferroni correction. We ran each simulation for 500 transcripts, measuring 80 metabolites at each run, and performed TWMR and MWMR. The power of TWMR was calculated by counting how many times we obtained p-values ≤ 0.05/500. The power of the mediation analysis was calculated as power_TM × power_MP, where power_TM was calculated by counting how many times we obtained p-values ≤ 0.05/(500 × 80), and power_MP was calculated by counting how many times we obtained p-values ≤ 0.05/80. In the revised manuscript, we additionally repeated each simulated scenario 10 times to increase the robustness of the results. This has been clarified in both the Methods and Results sections of the revised manuscript:

      Methods: “Ranging 𝜌 and 𝜎 from -2 to 2 and from 0.1 to 10, respectively, we ran each simulation for 500 transcripts measuring 80 metabolites at each run and performed TWMR and MWMR starting from the above-described simulated effect sizes (β). For each MR analysis we calculated the power to detect a significant association as well as the difference in power between TWMR and the mediation analysis (i.e., power_TP − power_TM × power_MP). Each specific scenario was repeated 10 times and the average difference in power across simulations was plotted as a heatmap.”

      Results: “To characterize the parameter regime where the power to detect indirect effects is larger than it is for total effects, we performed simulations using different settings for the mediated effect. In each scenario we evaluated 500 transcripts and 80 metabolites and varied two parameters characterizing the mediation: a. the proportion (𝜌) of the direct effect (α_d, i.e., the effect not mediated by the metabolite) to the total effect (α_TP), from -2 to 2, to cover the cases where direct and mediated effects have opposite directions (51 values); b. the ratio (𝜎) between the transcript-to-metabolite (α_TM) and the metabolite-to-phenotype (α_MP) effects, exploring the range from 0.1 to 10 (51 values). Transcripts were simulated with 6% heritability (i.e., median h² in the eQTLGen data) and a causal effect of 0.035 (i.e., ~65% power in TWMR at α = 0.05) on a phenotype. Each scenario was simulated 10 times and results were averaged to assess the mean difference in power (see Methods).”
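      The power comparison described above can be sketched as a small Monte Carlo. For illustration only, each MR test statistic is assumed to be a normal z-score with a hypothetical noncentrality (the ncp values below are not from the study), and two-sided p-values are compared against the Bonferroni thresholds stated in the response: 0.05/500 for TWMR, 0.05/(500 × 80) for transcript-to-metabolite, and 0.05/80 for metabolite-to-phenotype.

```python
import random
from statistics import NormalDist

ALPHA = 0.05
N_TRANSCRIPTS = 500
N_METABOLITES = 80

# Bonferroni thresholds as described in the response
THR_TP = ALPHA / N_TRANSCRIPTS                    # TWMR: transcript -> phenotype
THR_TM = ALPHA / (N_TRANSCRIPTS * N_METABOLITES)  # transcript -> metabolite
THR_MP = ALPHA / N_METABOLITES                    # metabolite -> phenotype

_STD_NORMAL = NormalDist()


def simulate_pvalues(ncp, n_sim, rng):
    """Two-sided p-values for z ~ N(ncp, 1); ncp is a hypothetical effect size."""
    return [2.0 * (1.0 - _STD_NORMAL.cdf(abs(rng.gauss(ncp, 1.0))))
            for _ in range(n_sim)]


def empirical_power(pvalues, threshold):
    """Fraction of simulated tests reaching significance at the threshold."""
    return sum(p <= threshold for p in pvalues) / len(pvalues)


rng = random.Random(2023)
n_sim = 2000
power_tp = empirical_power(simulate_pvalues(3.5, n_sim, rng), THR_TP)
power_tm = empirical_power(simulate_pvalues(6.0, n_sim, rng), THR_TM)
power_mp = empirical_power(simulate_pvalues(5.0, n_sim, rng), THR_MP)

# Positive values favour TWMR; negative values favour the mediation analysis
power_diff = power_tp - power_tm * power_mp
print(f"TWMR {power_tp:.3f}, mediation {power_tm * power_mp:.3f}, diff {power_diff:.3f}")
```

      Note how the transcript-to-metabolite leg carries the 500 × 80 correction, which is the multiple testing burden the reviewer raised; in the study itself, each (𝜌, 𝜎) scenario would replace the hypothetical ncp values and be repeated 10 times before averaging.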

      2) A second concern regards the interpretation of the results based on the empirical datasets. For the identified 206 transcript-metabolite-phenotype causal triplets, the authors show a comparison between TWMR-based total effect of transcripts on phenotypes and the calculated direct effect based on a multivariable MR (MVMR) test (Figure 2B), which corrects for the indirect effect mediated by the metabolite in the causal triplet. The comparison shows a strong correlation between direct and total effect. A thorough discussion of the potential reasons for deviation (in both negative and positive directions) from the identity line is missing.

      Deviation from the identity line, as observed in Figure 2B, indicates that while there is a strong correlation between direct and total effect, it is not perfect, and part of the total effect is due to an indirect effect mediated by metabolites. This is explained and discussed in the Results and Discussion section:

      Results: “Regressing direct effects (α_d) on total effects (α_TP) (Figure 2A), we estimated that for our 216 mediated associations, 77% [95% CI: 70%-85%] of the transcript effect on the phenotype was direct and thus not mediated by the metabolites (Figure 2B).”

      Discussion: “The observation that 77% of the transcript’s effect on the phenotype is not mediated by metabolites suggests that either true direct effects are frequent or that other unassessed metabolites or molecular layers (e.g., proteins, post-translational modifications, etc.) play a crucial role in such mediation. It is to note that in the presence of unmeasured mediators or measured mediators without genetic instruments, our mediation estimates are lower bounds of the total existing mediation. […] Thanks to the flexibility of the proposed framework, we expect that in the future and upon availability of ever larger and more diverse datasets, our method could be applied to estimate the relative contribution of currently unassessed mediators in translating genotypic cascades.”
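      The 77% figure in the quoted Results is the slope obtained when direct effects are regressed on total effects across the mediated triplets. The response does not specify the regression details; assuming an ordinary least-squares fit through the origin (so that the slope reads directly as the proportion of the total effect that is direct), the computation reduces to the following sketch (the function name and toy effects are hypothetical):

```python
def slope_through_origin(total_effects, direct_effects):
    """OLS slope of direct ~ total with no intercept: sum(x*y) / sum(x*x)."""
    sxy = sum(x * y for x, y in zip(total_effects, direct_effects))
    sxx = sum(x * x for x in total_effects)
    return sxy / sxx


# Hypothetical toy effects, not study data
alpha_tp = [0.10, -0.05, 0.20, 0.08]  # total effects (TWMR)
alpha_d = [0.08, -0.04, 0.15, 0.06]   # direct effects (MVMR)
print(f"proportion direct ~ {slope_through_origin(alpha_tp, alpha_d):.2f}")
```

      Under this reading, one minus the slope is the share of the transcript effect attributed to mediation through the measured metabolites.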

      Furthermore, no test of significance for potential cases of mediation is presented. Due to the issues of multiple testing discussed above, the significance of the inferred cases of mediation is drawn into question. The examples presented for causal triplets (involving the ANKH and SLC6A12 transcripts) feature transcripts with low total effects and a small ratio between direct and total effect, in line with the power analysis. However, in these examples, the total effects are also quite low. Their significance has to be tested with an appropriate statistical test, incorporating multiple testing correction.

      Following the reviewer’s suggestion, we have modified our criteria to call significant associations to account for multiple testing (see extensive reply to major concern #1). With 5% FDR on both the transcript-to-metabolite and metabolite-to-phenotype effects, the FDR for triplets is slightly inflated to 9.75% (= 1 − 0.95²). We mention this limitation in the revised manuscript:

      “We combined the 133 transcript-metabolite (FDR ≤ 5%) and 132 metabolite-trait (FDR ≤ 5%) associations to pinpoint 216 transcript-metabolite-phenotype causal triplets (FDR = 1 − 0.95² = 9.75%) (Supplemental Table 6).”
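      The triplet FDR quoted above follows from treating the two legs (transcript-to-metabolite and metabolite-to-phenotype) as independent, each called at FDR q, so that a triplet is fully correct with probability (1 − q)². That independence is a simplifying assumption of this back-of-the-envelope calculation, not something the response tests. A minimal sketch of the arithmetic (the function name is hypothetical):

```python
def triplet_fdr(q_leg: float) -> float:
    """Approximate FDR for a triplet built from two legs, each called at FDR q_leg.

    Assumes false discoveries on the two legs are independent, so a triplet
    is fully correct with probability (1 - q_leg) ** 2.
    """
    return 1.0 - (1.0 - q_leg) ** 2


for q in (0.05, 0.025):
    print(f"per-leg FDR {q:.3f} -> triplet FDR {triplet_fdr(q):.4%}")
```

      With q = 0.05 this gives 9.75%, and with q = 0.025 it gives about 4.94%, matching the two thresholds discussed in the response.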

      All examples presented in the original manuscript remained significant. The fact that the total effect in these examples is low makes them particularly interesting, as it highlights how our approach can detect biologically plausible associations between a transcript and a phenotype that show only mild evidence through TWMR but are strongly supported when accounting for metabolites that mediate the transcript-phenotype relation, showcasing situations in which our method can provide a true advantage over classical approaches such as TWMR. Such examples may emerge due to opposite-signed direct and indirect effects, which cancel each other out when it comes to testing total effects. What is key is that we do not claim the total and the mediated effects to be different (as we would have very limited power to do so), but simply point out that under certain settings we are better powered to detect mediated effects than total ones. In the ANKH example (more details below), the total ANKH-calcium effect is almost exactly the same as the product of the α_ANKH→citrate and α_citrate→calcium effects; it is simply that the latter are detectable, while the total effect is not.

      In the revised manuscript the case for our selected examples is made even stronger thanks to an analysis proposed by Reviewer #1 that aimed at estimating the proportion of previously reported associations through automated literature review. For instance, while our literature review found previously reported evidence of the ANKH-calcium link and of the ANKH-citrate link, we did not identify any publication mentioning all 3 terms in combination in the abstract and/or title, illustrating how our approach can establish bridges between knowledge gaps. We revised the Results section describing the ANKH example accordingly:

      “The 126 triplets that were not identified through TWMR due to power issues represent putative new causal relations. This is well illustrated by a proof-of-concept example involving ANKH [MIM: 605145] and calcium levels, for which 48 publications were identified through automated literature review (Supplemental Table 6). While the TWMR effect of ANKH expression on calcium levels was not significant (α_ANKH→calcium = −0.02; 𝑃 = 0.03), we observed that ANKH expression decreased citrate levels (α_ANKH→citrate = −0.30; 𝑃 = 2.2 × 1089:), which itself increased serum calcium levels (α_citrate→calcium = 0.07; 𝑃 = 6.5 × 108;9). Mutations in ANKH have been associated with several rare mineralization disorders [MIM: 123000, 118600] [32] due to the gene encoding a transmembrane protein that channels inorganic pyrophosphate to the extracellular matrix, where at low concentrations it inhibits mineralization [33]. Recently, a study proposed that ANKH instead exports ATP to the extracellular space (which is then rapidly converted to inorganic pyrophosphate), along with citrate [34]. Citrate has a high binding affinity for calcium and influences its bioavailability by complexing calcium-phosphate during extracellular matrix mineralization and releasing calcium during bone resorption [35]. Together, our data support the role of ANKH in calcium homeostasis through regulation of citrate levels, connecting previously established independent links into a causal triad.”
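      Using the ANKH estimates reported in the quoted paragraph, the product-of-coefficients comparison (mediated effect = transcript-to-metabolite effect × metabolite-to-phenotype effect, set against the TWMR total effect) can be worked through directly. This is a point-estimate check only and ignores the uncertainty of each leg; the function name is hypothetical:

```python
def indirect_effect(alpha_tm: float, alpha_mp: float) -> float:
    """Product-of-coefficients estimate of the metabolite-mediated effect."""
    return alpha_tm * alpha_mp


alpha_ankh_citrate = -0.30        # ANKH expression -> citrate
alpha_citrate_calcium = 0.07      # citrate -> serum calcium
alpha_ankh_calcium_total = -0.02  # TWMR total effect (not significant)

mediated = indirect_effect(alpha_ankh_citrate, alpha_citrate_calcium)
print(f"mediated {mediated:.3f} vs total {alpha_ankh_calcium_total:.3f}")
```

      The product, −0.021, is close to the total effect of −0.02, consistent with the point that the two agree while only the individual legs reach significance.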

      Furthermore, the analysis of the empirical data indicates that the ratio between direct and indirect effect of a transcript on a phenotype is in most cases close to identity, except for triplets with low total effects. This fact should be considered in the power analysis, which assigned the highest gain in power by the mediation analysis to cases of low direct to total effect ratio. The empirical data indicate that these cases might be rare or of minor relevance for the tested phenotypes.

      As our previous power analyses did not fully reflect scenarios observed in empirical data, we extended the range of covered 𝜌 (i.e., the ratio between direct and total effect) so that it more closely mimics the observed range of 𝜌. In the revised manuscript, 𝜌 varies from -2 to 2, so that we also consider configurations where direct and total effects have opposite directions. To give readers a rough idea of how frequently the different parameter combinations occur in real data, we now provide another heatmap indicating the density of detected associations in those parameter regimes as Supplemental Figure 4.

      This map can be brought in perspective of Figure 4A that illustrates the power of TWMR vs. mediation analysis over the same range of parameter settings.

      It becomes apparent from Supplemental Figure 4 that in real data, 𝜎 is always larger than 1 and often exceeds 10. Note, however, that this heatmap must be interpreted with care, since the “detected” density will be low in regions where both methods have low power.

      3) Related to the interpretation of causal links: horizontal pleiotropy needs to be considered. The authors report the identification of causal links between TMEM258, FADS1 and FADS2, arachidonic acid-derived lipids and complex phenotypes. However, they also mention the high degree of pleiotropy due to linkage disequilibrium at the underlying eQTL and mQTL region as well as the network of over 50 complex lipids known to be associated with the expression of the above transcripts. Thus, it seems possible that the levels of undetected lipid species may be more important for the phenotypic effect of variation in these transcripts and that the reported "mediators" are rather covariates. Such horizontal pleiotropy would violate a basic assumption of the MR approach. While we think that this does not invalidate the approach altogether, it does affect the interpretation of specific metabolites as mediators. This is aggravated by the fact that metabolic networks are more tightly interconnected than macromolecular interaction networks (assortative nature of metabolic networks) and that single point-measurements of metabolites may not be generally informative about the flux through a specific metabolic pathway.

      This is a valid point and we discuss this limitation in the revised Discussion:

      “It is to note that in the presence of unmeasured mediators or measured mediators without genetic instruments, our mediation estimates are lower bounds of the total existing mediation. In addition, unmeasured mediators sharing genetic instruments with the measured ones can modify result interpretation, as some of the observed mediators may simply be correlates of the true underlying mediators. While this is a limitation of all MR methods, metabolic networks may harbor a particularly large number of genetically correlated metabolite species.”

    1. Author Response

      Reviewer #2 (Public Review):

      This paper presents novel evidence for the successor representation in the hippocampus and V1 for temporally structured visual sequences. Participants learned sequences of 4 items shown in specific locations (A-B-C-D) on the screen. On a subset of trials, participants were only shown one of the four items, which enabled the authors to test whether the remaining three items were reactivated equivalently, or whether the upcoming items were activated in a temporally graded predictive fashion, consistent with the successor representation. The data suggest the latter interpretation, which was observed in both the hippocampus and V1.

      The approach is well-motivated, and the hypotheses are laid out clearly. The manuscript is very clear and streamlined. The design adopted by the authors, which allowed them to disentangle spatial vs. temporal proximity, is clever and provides an interesting approach to the SR framework. The figures are also very clear and nicely designed. I just have a few comments which I hope the authors can address.

      We thank the reviewer for the positive evaluation.

      1) My main question is related to the difference between the analytic approach to V1 vs. hippocampal representations. In Fig. 3, the authors present evidence of a compelling gradation in V1 representations. However, the corresponding hippocampal results in Fig. 5 are collapsed across all predecessor vs. successor representations.

      I initially thought that the same approach could not be taken in the hippocampus (-3/-2/-1 vs. 1/2/3) due to the coarser representation of space - is that the correct interpretation? However, on p. 9 the authors state that they successfully trained a hippocampal classifier based on spatial locations, so I was unsure why the same approach would not be possible. It would be helpful if the authors could add a sentence clearly explaining why the plots and analyses are not parallel across V1 and the hippocampus.

      We appreciate the reviewer bringing up this point. The reviewer is correct that in principle the same approach could be applied to both V1 and hippocampus. We have now added our motivation for collapsing the data for hippocampus and also appended the non-averaged hippocampus results as a Supplementary Figure. Below we copy our response to Reviewer #1 from above, who brought up a similar point.

      Given the significant but very low classification accuracy within the localizer (accuracy = 15% ± 3.6%, mean ± s.d.; p = 0.002), we had previously decided to report only averaged location results for the hippocampus, as the non-averaged predictions would be very noisy. To put the hippocampus classification accuracy into context, in V1 the cross-validated accuracy within the localizer was 92% ± 12% (mean ± s.d.).

      We now stress this difference between V1 and hippocampus decoding in the Results section and motivate our reason for presenting averaged results:

      "Within localizer decoding accuracy results confirmed that hippocampus has a coarse representation of the eight stimulus locations (Figure 5B) within the localizer (one-sample t-test; t(34) = 3.28, p = 0.002; cross-validated accuracy = 15% ± 3.6%, mean ± s.d.; see Materials and Methods). Notably, compared to V1 (cf. Figure 2A), within localizer accuracy was relatively low and as a consequence tuning curves in hippocampus appeared less sharp (Figure 5C). In order to maximize sensitivity for the hippocampus, we averaged classification evidence across successor and predecessor locations. Non-averaged results can be found in Supplementary Figure 1A."

      Further, we followed the reviewer’s suggestion and added a new supplementary Figure including the non-averaged results for hippocampus. The new Figure also includes the model comparison the reviewers had asked for. The new Supplementary Figure 1 is included here for convenience:

      2) The analysis disentangling temporal vs. spatial proximity in the localizer data (Fig. 6) is interesting, particularly the persistent gradation in hippocampal responses vs. their absence in V1. However, could the same/similar temporal vs. spatial model not be applied in the full vs. partial sequences as well, as one of the alternative models shown in Fig. 4? The CO model in Fig. 4B assumes a flat reactivation of all other items in the sequence, but it might be that the two items closer in terms of Euclidean distance are represented differently than the far item. After reading the detailed methods, I wonder if this was not possible because the second presented item was always the furthest from the start (180 degrees), but it would be helpful if the authors could clarify this.

      The reviewer is correct that the fact that the sequence order and spatial distance were not fully decorrelated (second presentation was always farthest away from starting dot, third and fourth dot always the same distance from start) prevents us from quantifying the interaction of the SR and CO model with a spatial model during the main task.

      We added the following to the Method section to clarify this:

      "Note that because within each dot sequence, temporal order and spatial distance were not perfectly decorrelated (e.g. the second sequence dot was always farthest apart from the starting dot), it is not possible to estimate the combined influence of the SR model and the spatial coactivation model on the observed BOLD activity."

      Having said that, we believe that there is little concern that the reported reactivations of the main task are driven by the Euclidean distance in a meaningful way for two reasons:

      (1) detailed analysis of the localizer data showed no spatial spreading from one dot location to the other sequence locations (Figure 6). This is likely because the relevant dot locations were sufficiently far apart (at least 5.36 degrees of visual angle), whereas population receptive field sizes in V1 are well below 2 degrees (Dumoulin & Wandell, 2008). Given the lack of spreading during the localizer, where the dot was flashed for 13.5 s, spreading during the main task, where the dot was flashed for only 100 ms, is equally unlikely.

      (2) the presence of spatial spreading would actually obscure the reported SR-like pattern and could not have caused it. Specifically, because the second sequence dot was always farthest from the start, this is where one would expect the least activity spread (greatest Euclidean distance). Sequence dots three and four should be more active, given that they are both closer to the starting point in terms of Euclidean distance. Our reported results show the opposite pattern, ruling out the possibility that they were caused by spatial spreading.

      3) As the authors state on p. 12, the present study did not require any long-term prospective planning. However, the participants' task during the full sequences was closely linked to their predictions about the temporal structure of the four stimuli. It would be useful to see whether the participants who were more closely 'locked' to the sequence and accurate at this temporal detection task also showed stronger SR representations (as these rely on temporal distance).

      This would also provide a useful test of the timescale at which successor representations are behaviorally relevant. In several prior studies, the timescales were quite long, so it would be important to test how strongly SR representations at these timescales relate to behavior.

      We thank the reviewer for this suggestion. In order to relate SR representations to behavior, we first calculated individual BOLD differences for successor vs predecessor locations to estimate how strongly each participant’s predictions were skewed toward future locations. One might argue that participants with stronger predictions toward future locations would perform better at the behavioral task. We then correlated these values with behavioral accuracy across subjects. No significant correlation was found (r = 0.05; p = 0.769). The lack of a significant correlation might not be surprising, given that our design is likely underpowered for such a between-subject correlation analysis. Additionally, there was no behavioral response in the prediction trials that could be directly related to participants’ BOLD activity. Instead, the behavioral responses were taken from the full sequence trials.

      These new results were added to the results section:

      "One might argue that participants with stronger predictions toward future locations would perform better at the behavioral detection task. However, no such correlation between individual V1 BOLD activity and task accuracy was found in an across-subject correlation analysis (see Materials and Methods; Spearman r = 0.05, p = 0.769)."

      And described in the methods:

      "Correlation with behavior. In order to relate SR representations to behavior, we first calculated individual V1 BOLD differences for all successor vs all predecessor locations to estimate how strongly each participant’s predictions were skewed toward future locations. We then correlated these values with behavioral accuracy across subjects using Spearman correlation."
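      The correlation analysis described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes each participant contributes a list of V1 BOLD estimates at successor locations and a list at predecessor locations, and all function names and example values are hypothetical. Spearman's rho is computed as the Pearson correlation of tie-averaged ranks; the p-value reported above is not reproduced here (a library routine such as scipy.stats.spearmanr, or a permutation test, would normally supply it).

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks, assigning tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(rank(x), rank(y))


def successor_bias(successor_bold, predecessor_bold):
    """Hypothetical per-subject score: mean successor BOLD minus mean predecessor BOLD."""
    return [mean(s) - mean(p) for s, p in zip(successor_bold, predecessor_bold)]


# Hypothetical example: three participants, two successor and two predecessor estimates each.
bias = successor_bias([[0.4, 0.2], [0.1, 0.3], [0.5, 0.5]],
                      [[0.1, 0.1], [0.2, 0.2], [0.3, 0.1]])
accuracy = [0.82, 0.75, 0.90]
rho = spearman(bias, accuracy)
```

      With real data, successor_bias would yield one value per participant, to be paired with that participant's detection accuracy before calling spearman. Using ranks rather than raw values makes the estimate robust to a few participants with extreme BOLD differences, which matters for a small between-subject sample.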

    1. Author Response

      Reviewer #2 (Public Review):

      In the manuscript, Mijnheer et al. mainly used a CyTOF Helios mass cytometer and TCRβ repertoire sequencing to investigate the T cell composition and distribution in peripheral blood and synovial fluid, and further explored the temporal and spatial dynamics of regulatory T cells (Tregs) and non-Tregs in the inflamed joints of Juvenile Idiopathic Arthritis (JIA) patients. Their results indicate that the activated effector T cells and hyper-expanded Treg TCRβ clones found in the inflamed joints are highly persistent in the circulation, and that the dominant Treg clones, which share a high degree of sequence similarity, could serve as novel therapeutic targets for JIA treatment. Overall, the research design is appropriate, and the methods are adequately described in the study. However, several issues need to be addressed.

      (1) The criteria for JIA patient recruitment should be clearly presented in the methods section. For example, what were the specific inclusion and exclusion criteria? Did the patients take any medication during the study?

      A total of 9 JIA patients were included in this study. Of these, n=2 were diagnosed with extended oligo JIA, n=2 with rheumatoid factor negative poly-articular JIA, and n=5 with oligo JIA, according to the revised criteria for JIA. The average age at the time of inclusion was 13.1 years (range 3.2 – 18.1 years) with a disease duration of 7.3 years (range 0.4 – 14.2 years). Due to limited sample availability, we did not have strict inclusion or exclusion criteria for JIA patient recruitment. For CyTOF analysis, patients were selected based on the criterion that the left and right knee joints should both be affected at the time of inclusion. For sequential TCR sequencing analysis, we included patients with a refractory disease course. At the time of first inclusion, patients were treated with non-steroidal anti-inflammatory drugs (NSAIDs) or methotrexate, but no biologicals. For the refractory time point samples, patients were treated with disease modifying anti-rheumatic drugs (DMARDs) (leflunomide) and/or biologicals (Humira) after first sample inclusion due to the refractory nature of their disease.

      We have now updated the methods section (lines 455-463) of the revised manuscript with more information on patient recruitment, and included information on diagnosis, sex, age, disease duration and medication for every patient in Supplementary File 1.

      (2) For the correlation analysis of the entire spectrum of node frequencies, only SFMCs and PBMCs isolated from 3 patients were analyzed. The sample size is too limited to obtain robust results and to draw a convincing conclusion from the correlation analysis. Moreover, a total of 9 JIA patients are stated to have been involved in the study. Could the authors clarify this?

      In order to strengthen our observations, we now included single-cell transcriptomics data obtained from Zhang, et al. (https://doi.org/10.1038/s41590-019-0378-1). In these data, we identified a cluster of CD4+FOXP3+ Tregs (new Figure 2-figure supplement 2A and 3B) that showed increased frequency in RA patients (new Figure 2-figure supplement 2C), consistent with the high frequency of Tregs that we observed in our JIA SFMC samples. Additionally, the expression of markers of chronic TCR activation (PDCD1 (PD1), CTLA4 and ICOS) and of cytokines (TNF, IFNG and GZMB) was significantly increased in RA compared to OA, in line with what we observed in JIA SFMC (new Figure 2-figure supplement 2D). These results demonstrate that, although the number of JIA patients included in our study is low, the results are reproducible in an independent, comparable dataset.

      We do agree with the reviewer that the low number of patients included in our study warrants further validation. Therefore, we have now added the following line in the discussion to highlight this (lines 369-371): “Further validation of our observations in larger cohorts of JIA patients should help to substantiate these results and aid the identification of pathogenic Treg populations across patients.”

      Regarding the number of patients included in our studies, we have now included Supplementary File 1, which specifies which JIA patients were used for each analysis in our study.

      (3) The results of the study indicate that the hyper-expanded T cell clones are shared between the left and right knee joints. Since JIA may affect one or more joints, did the authors check other joints, such as the hand or wrist, to see whether the same expanded T cell clones infiltrate multiple joints?

      Indeed, it would be interesting to see whether hyper-expanded clones are shared between multiple inflamed joints other than knees. However, samples from other locations are very difficult to obtain and very little synovial fluid can be extracted from joints such as hands and wrists. Therefore, the number of cells obtained from these joints would be too limited to perform mass cytometry or TCR sequencing. Thus, we chose to focus on synovial fluid from knee joints in our studies. Moreover, for oligoarticular JIA patients, only the large joints are affected (of which the knees are most typical), so for these patients it would not be possible to include other joints.

      (4) For Fig.2B, the Treg CD25+FOXP3+ population was significantly enriched in synovial fluid (SF). Is it from the left knee joints or the right knee joints?

      Figure 2B shows data from both knee joints. We have now clarified this in the figure legend by adding “For SFMCs, data from the right and left knee joints for all patients is shown” (lines 179-180).

      In addition, lines 144-148 refer to SF, whereas the axis title in Fig. 2B refers to Synovial Fluid Mononuclear Cells (SFMCs). Please keep the terminology consistent.

      We thank the reviewer for bringing this to our attention. We have critically revised the manuscript and made the use of SF versus SFMCs more consistent.

      (5) For the longitudinal sampling timelines of JIA patients shown in Supplementary Fig. 3, the interval between PB and SF sample collections is not consistent. Moreover, only 1 patient completed all 4 visits and sample collections. It is hard to draw any conclusion from 1 patient.

      In our study, we had longitudinal samples available for 5 JIA patients, for which we performed TCR sequencing of Tregs from SFMCs from different joints (right or left) at two or more time points. In the manuscript we mainly focused on patient 1, as the largest amount of data was available for this patient. However, for all other longitudinal patient samples included, we also show that dominant clones persist over time (Figure 4A and 5A). To further highlight that our observations are not restricted to one patient but are consistent across all patients included, we now included a detailed analysis for all patients in Figure 4-figure supplement 3 and Figure 5-figure supplement 1. This analysis shows that frequencies of shared TCRβs are consistent over time in all patients.

    1. Author Response

      Reviewer #1 (Public Review):

      Detecting and quantifying balancing selection is a notoriously difficult challenge. Because the distribution of times to fixation or removal of strictly neutral variants has a long tail, it can be hard to exclude the null hypothesis of neutrality when testing for balancing selection that was not established so long ago that trans-specific variants can be observed. As Aqil et al. point out, most efforts to detect balancing selection in the human genome have been focused on single nucleotide variants. The authors seek to characterize the amount of balancing selection specific for polymorphic deletions. The authors justify their focus based on the fact that deletions are more likely to have functional consequences than single nucleotide variants, making it more likely that if they have remained for many generations, this could be a signature of balancing selection. That said, multiple aspects of the analysis deserve more attention.

      I have two broad concerns about the manuscript that the authors need to address. First, the authors use neutral simulations to exclude the possibility that neutrality alone can explain the amount of allele sharing observed between African modern humans and the archaic genomes. My concern is that human demographic models, including the one from Gravel et al. (2011) used by the authors, are always simplifications of the complex demographic events that shaped human populations during evolution. In the specific model used by the authors, African populations were inferred to have a constant population size for the past ~150,000 years (parameters Taf and Naf in the original model). This is an unrealistic assumption. In brief, I am wondering how much the authors' claim that neutrality alone cannot explain patterns of allele sharing potentially rests on mis-specifications of the neutral demographic model. For example, the finer-scale fluctuations of effective population sizes in Africa inferred by L. Speidel and colleagues in 2019 in Nature (Figure 3) paint a different picture than the Gravel et al. model. The authors need to test extensively the robustness of their conclusions to changes in the neutral demographic model used. What if the average ancestral population size was closer to 20,000? What if it was closer to 50,000 and frequency fluctuations every generation were smaller? Given how uncertain past population sizes really were, and the current uncertainties about demographic reconstruction, in particular relative to linked selection, the authors need to explore a range of past population sizes beyond the idiosyncrasies of a specific model.

      These are great suggestions. Based on them, we now conducted 37 additional simulations with different sets of parameters, including adding the Speidel et al. model to the mix (the new Figure 1C). As discussed above (please refer to our response to the general reviews) and in the Results section, realistic neutral scenarios cannot explain the excess allele sharing.

      My second broad concern is that it is difficult to evaluate how novel the findings really are. It is true that the authors focus on deletions, while past scans for balancing selection in the human genome focused on SNVs. But a substantial number of the deletions identified here as under balancing selection could have previously been identified as such loci through linked SNVs by the scans cited by the authors. The authors need to quantify how many of their deletions are truly novel balancing selection loci, as opposed to balancing selection loci already identified through linked SNVs.

      The reviewer is right. We now compared our results with previous genome-wide studies, which have been notoriously inconsistent with each other. We found that virtually all of our candidates are novel, as described in our response to the general reviews and our Results section.

      The novelty of the balanced deletions would also be better established by a more quantitative and less anecdotal functional analysis. It is true that the deletions include immune loci, but are they statistically enriched for immune loci, as annotated for example by Gene Ontology, in a way that shows that their distribution across the genome is not random but is indeed driven by selection enriching them at loci with specific functions? In addition, do the pie charts in Figure 5E represent a statistically significant deviation from left to right or not?

      We appreciate the reviewers’ suggestions, which led us to conduct a series of very fruitful analyses. As discussed above, we found that ancient deletions are significantly more likely to be associated with GWAS traits and to be exonic (Figure 5B), and significantly more likely to affect immunity-, blood-, and metabolism-related traits (Figure 5C). Moreover, we found that ancient deletions are depleted in smaller size categories but significantly enriched at sizes in the 95th percentile and above (Figure 7A). We now discuss these findings in the Results section.

      Reviewer #2 (Public Review):

      The authors assess evidence for balancing selection by looking at old polymorphisms where the derived allele is shared by descent with archaic humans, meaning the polymorphism must predate this split. Using simulations and several features of these old polymorphisms, they evaluate whether, and which, signatures of balancing selection are enriched in these polymorphisms. This is a well-explained and thorough analysis, and a clever way to approach a difficult question, yet the analysis remains fairly descriptive and the claims that can be made are not strong. For instance, the title of the paper does not state a particular finding of balancing selection, and several claims are hedged with "may", such as "A significant portion of ancient polymorphisms may have evolved under medium-term balancing selection" and "These results suggest that at least 27% of common functional deletion polymorphisms may have been evolving under balancing selection".

      We thank the reviewer for their insights. We agree that balancing selection is difficult to demonstrate definitively. However, in our revisions, we have conducted several additional analyses based on the reviewers’ suggestions, as discussed under individual comments. We believe that these analyses strengthen our claims.

      First, using simulations, they show there are more such ancient nonsynonymous and (indirectly) deletion variants than expected under a simple neutral model. The enrichment is nominal when compared only with Denisovan sharing, which could be explained due to some superarchaic ancestry in Denisovans (though not clear if that holds up quantitatively). The classification of the shared polymorphisms as recurrent, recently introgressed, or ancient shared by descent could be more carefully tested. In particular, I'm concerned about the possible inclusion of recurrent mutations among the ancient set. Although the age trend is consistent, it does not indicate how much misclassification there might still be. For example, there are "ancient" deletions that have inferred ages more recent than the human-archaic split (shown in Fig. 3).

      We agree that recurrent mutations are crucial to discriminate from the ancient ones in our analysis. We have now conducted additional analysis of allele frequency and CG content to further test potential recurrent mutations in our datasets as described in our response to general reviews. We described these in our Results section and Figure S1. In addition, we actually conducted even more stringent filtering requiring perfect LD and found that this increased stringency did not affect our results substantially. Thus, we believe that our pipeline identifies ancient deletions very conservatively and likely harbors a considerable number of false negatives, where ancient deletions are categorized as recurrent.

      The reviewer’s observation that some ancient deletions have recent dates is indeed interesting. The dating of individual alleles assumes neutrality and broadly depends on haplotype length and allele frequency. We believe that given the potential soft sweeps acting on these deletions, it is possible that the dates may be biased in some cases. For example, if there is a recent sweep on an ancient deletion, this may lead to longer haplotype lengths and, thus, a more recent date for these alleles. Therefore, the ancient derived alleles (those that are shared with archaic hominins) which happen to have recent allele dates may be of particular interest for future scrutiny. We now discuss this particular issue further in the Results section as follows:

      “Counterintuitively, some “ancient” deletions have very recent dates. This may be due to instances of recent soft sweeps involving some deletions leading to an increased length of the associated haplotype and an artificial decrease in age. Secondly, some ancient deletions may have low frequencies, which too creates a downward bias in age. Lastly, this may be due to rare instances of miscategorization of non-ancient deletions as ancient.”

      For the rest of the paper, the authors then focus on the deletion variants, showing that these ancient deletions show an elevated signature of balancing selection (stdbeta2) but do not show less variance in allele frequency over time as would be expected under an overdominance model. They infer the mechanism to be spatial or temporal variation in selection or negative frequency-dependent selection by process of elimination. They identify the subset of ancient deletion polymorphisms that overlap exons and are associated with phenotypes, finding a high proportion of ancient deletions that fall in both these categories. The identification of this set of potentially causal deletions that may be under balancing selection is a set that is of interest to the wider community for follow up (though several have already been the subject of study and individual publications from this lab). Overall, this is a useful combination of simulation work and assessment of an intriguing set of old deletion polymorphisms. Put together, the analysis does support evidence of balancing selection on some of them, but the extent is still not clear.

      We thank the reviewer. To further determine the extent of balancing selection acting on these ancient deletions, we conducted several enrichment analyses described above (please refer to our response to the general reviews) and in the paper. Briefly, we now added Figures 5B, 5C and 7A to describe these new analyses.

    1. Author Response

      Reviewer #1 (Public Review):

      This study demonstrates the role of the circadian clock in the spatiotemporal regulation of floral development. The authors nicely illustrate floral development patterns in domesticated sunflower. In particular, during anthesis, discrete developmental zones, namely pseudowhorls, are established, and hundreds of florets simultaneously undergo maturation in each pseudowhorl in a circadian-dependent manner. Consistently, flower development follows key features of the circadian clock, such as temperature compensation and gating of plant responses to environmental stimuli. Exploring the evolutionary advantages of this regulation would add further merit to this study.

      We thank the reviewer for this suggestion. We have performed new experiments (Figures 7 and 7-S1) that demonstrate that delays in anthesis relative to dawn and disruption of pseudowhorl formation both negatively impact pollinator visits to flowers. These findings suggest that circadian and light regulation of floral anthesis may have significant impacts on male reproductive fitness.

      Reviewer #2 (Public Review):

      Little is known about how the circadian clock regulates the timing of anthesis. This manuscript shows that the circadian clock regulates the diurnal rhythms in floral development of the sunflower. The authors have developed a new method to characterize the timing of floral development under normal conditions as well as constant dark and light conditions. The results from the treatment of darkness during the subjective night and day suggest that the circadian clock regulates the growth of ovary, stamen, and style differently.

      All clock papers claim that the circadian clock regulates the fitness of organisms, however, it is hard to evaluate how the circadian clock affects the fitness of organisms. The timing of pollen release and stigma maturity is directly related to plant fitness. That's why the authors suggest that the circadian clock in sunflowers increases plant fitness by regulating the timing of anthesis.

      Although the authors manipulated the light and temperature to examine the role of the circadian clock in floral development, the weakness of this manuscript is that there is no molecular evidence to show how the clock regulates floral development.

      We acknowledge that this study does not demonstrate the molecular mechanisms by which the circadian clock and environmental sensing pathways regulate floral anthesis in sunflower. However, we feel that our demonstration that the circadian clock is involved in the generation of spatial patterns of development on the sunflower inflorescence disk is in itself novel and significant.

      Reviewer #3 (Public Review):

      The flowering heads of species in the Asteraceae comprise a large number of flowers, and this phenotype is thought to contribute to their reproductive success. The Harmer lab has developed sunflower as an experimental model to investigate the contribution of circadian regulation to the processes of reproduction in the Asteraceae, and this paper presents a new addition to this line of research.

      The novelty of the article is that it resolves unanswered questions around the processes that underlie coordinated flowering within the disc of the floral capitulum. The authors demonstrate a role for the circadian clock in the temporal structuring of this process. They identify a free-running rhythm of floral anthesis in constant darkness, and this rhythm has several key characteristics of circadian rhythms. The data collected also indicate that the circadian clock might gate the response of anthesis to darkness.

      I like the presentation of an external coincidence model for the interaction of light and circadian cues in the floral developmental program of the capitulum. However, I wonder whether this is the only potential explanation. The data in Fig. 4C look like classical entrainment responses. Are the authors sure that they are not just seeing an entrainment process within the capitulum, combined with a masking effect of continuous light upon the rhythmic phenotype? I encourage the authors to retain speculation about the coincidence model within the discussion- it's so important for future work- but perhaps consider alternative interpretations of the data also.

      We thank the reviewer for their positive comments and overall enthusiasm for the study. We agree that it is entirely plausible that continuous light masks circadian clock-controlled rhythms in floral organ development; in our view, this is a restatement of the external coincidence model. We argue that in developing sunflowers, a circadian clock-regulated process controls elongation of floret organs. Normal development depends upon a dark period of at least 4.5 hours occurring during the subjective night. In constant light conditions, or early in re-entrainment when the dark period occurs during the subjective day, normal development is inhibited. This model is analogous to the photoperiodic control of flowering time in short-day plants, in which light perceived during the subjective night inhibits the floral transition.

    1. Author Response

      Reviewer #1 (Public Review):

      Tafenoquine is an important 8-aminoquinoline antimalarial, mostly aimed at the management of Plasmodium vivax malaria. Through a retrospective analysis of several previously performed efficacy trials, the authors aimed to better understand the drug's mechanism of action, while exploring the possibility of improved efficacy through a dose increase.

      Strengths: robust analysis approaches unlocked three main messages with the potential of improving the clinical practice:

      i. P. vivax recurrency is positively associated with tafenoquine terminal half-life and D7 methemoglobin levels.

      ii. The methemoglobin levels support the current view that tafenoquine, acts through its metabolites, similar to what is believed for primaquine.

      iii. Most importantly, the therapeutic window of tafenoquine is wider than previously considered, allowing the suggestion of a significant increase in dosing, from 300 mg to 450 mg, leading to significantly increased efficacy.

      Weaknesses: being a retrospective analysis, the work is limited to the available data. In particular, and as noted by the authors, no drug levels are reported. Additionally, some aspects in my view need more detailed analysis and discussion, in particular the apparent lack of exploration of the importance (or lack of it) of patient CYP2D6 status for tafenoquine T1/2, methemoglobin levels, and overall efficacy. These mild weaknesses do not change the overall conclusions of the study.

      We thank the reviewer for their positive comments.

      The analysis estimates the parameters of the PK model from 4499 drug concentrations measured in 718 individuals between days 0 and 180. The active metabolites of tafenoquine are unknown and thus could not be quantified.

      Whilst the study is retrospective it includes 77% (651/847) of all patients enrolled in published P. vivax treatment trials of tafenoquine.

      We respond to the relationship between CYP2D6 polymorphisms and the other outcomes in our response to Reviewer #1, Comment 2.

      Reviewer #3 (Public Review):

      By assembling the vast majority of global tafenoquine pharmacology data from clinical treatment studies that led to the 8-aminoquinoline's registration in 2018, the authors of this manuscript have convincingly made their argument that the currently recommended treatment dosage of 300mg (in combination with chloroquine) is too low and needs to be increased by at least 50%. Access to the multiple data sets is thorough, the modelling reasonable and the conclusion reached is sound.

      How did we get here (again) under-dosing malaria patients with a class of drugs we have been working on for a century? Speaking as someone who was associated with tafenoquine development over two decades, it seems that worry about adverse events, specifically hemolysis in G6PD deficient persons, overcame the operational requirement to give enough drugs in a single dose regimen. However, tafenoquine is very safe in G6PD normal persons who by definition were the ones entered into the clinical treatment trials. Risk-benefit judgments cannot always be weighted towards "safety" especially when the real concern was that a single severe adverse event would derail the entire development program. Real-world effectiveness matters and should now result in the studies the authors state are needed to certify the higher dose regimen.

      1) The schizophrenic nature of tafenoquine development needs to be mentioned. This manuscript discusses malaria treatment and includes nearly all the relevant data, but extensive work was also done to support the chemoprophylaxis indication largely sponsored by the US Army. These prophylaxis efforts were often separate from the parallel efforts on treatment indication to the loss of both groups who were ostensibly working on the same drug. 450mg tafenoquine is not a large dose; 600mg (over 3 days) is routinely given at the beginning of malaria chemoprophylaxis. Up to twice that amount was given in phase 2 studies done in Kenya in 1998 which resulted in the only described severe hemolytic reaction when one G6PD deficient heterozygote woman was given 1200mg over 3 days due to incorrect recording of her G6PD status. It is not easy to hemolyze even G6PD-deficient erythrocytes due to the slow metabolism of tafenoquine. Nearly all clinical trials of both primaquine and tafenoquine have experienced similar hemolytic events when there were errors in the determination of G6PD status. This does not mean that all 8-aminoquinolines are dangerous drugs, only that a known genetic polymorphism needs to be accounted for when treating vivax malaria.

      It is notable that much larger doses of tafenoquine have been evaluated previously and these have been well tolerated in individuals with G6PD activity >30% (previous studies used semi-quantitative tests). We have added a review of all patients with P. vivax malaria who have been studied in treatment trials. A total of 847 were enrolled in all studies and our series contains individual patient data on 651 (77%) of these patients.

      We have added the following to the Discussion on lines 277-283:

      “Much larger doses have been studied in treatment and prophylaxis trials (up to 2100mg given over one week, Walsh et al., 1999, see Supplementary Appendix). The only report of a severe haemolytic reaction occurred in a female patient heterozygous for G6PD deficiency (A- variant) who received a total dose of 1200mg tafenoquine over 3 days (Shanks et al., 2001). In the same study, a homozygous female (A- variant) who was also given 1200mg tafenoquine over 3 days had an estimated 3g/dL drop in haemoglobin, but remained asymptomatic.”

      2) The authors point out the utility of 7-day methemoglobin concentrations in predicting drug success or failure in the prevention of subsequent relapses. This is important and stresses that metabolism to a redox-active intermediate is a property common to all 8-aminoquinolines. Tafenoquine and primaquine are similar but not identical, and the slow metabolism of tafenoquine to its redox-active intermediates explains its main advantage: it can support a single-dose cure. The main reason this was not appreciated much earlier is that we were looking in the wrong place. Metabolic end-products (5,6-orthoquinones) are present at very low concentrations in the blood after single-dose tafenoquine, but being water-soluble they are easily detected in the urine. Such urine metabolites, indicative of redox action, are very likely to be complementary to methemoglobin measurements, which mark the redox effect on the erythrocyte. Despite earlier simplifying assumptions made during tafenoquine development (that no significant metabolites exist), metabolism to redox-active intermediates must be embraced as the explanation of drug efficacy and not a cause of undesirable adverse events.

      Another dark cloud over tafenoquine mentioned by the authors was the very disappointing outcome of the INSPECTOR trial in Indonesia, whose full results are yet to be published. The failure of P. vivax relapse prevention using 300mg tafenoquine with dihydroartemisinin-piperaquine in Indonesian soldiers was largely ascribed to under-dosing. Although this may be partially true, another factor, indicated in figure 15 of the appendix, is the nature of the partner drug. Artemisinin combinations with tafenoquine do not produce the same amount of methaemoglobin (indicative of redox metabolism) as combinations with the registered partner drug chloroquine. We do not understand tafenoquine metabolism, but it is increasingly clear that the drug combined with tafenoquine makes a very substantial difference. Despite the great operational desire to use artemisinin combination therapy for all malaria treatment regimens, this may not be possible with tafenoquine. Chloroquine is likely driving tafenoquine metabolism, as it has no real effect on latent hypnozoites in the liver by itself. Increased-dose studies with tafenoquine need to be done with chloroquine, not artemisinin.

      We are aware that this is an area of intense interest and that ex vivo data suggestive of a drug-drug interaction between artemisinin and tafenoquine were presented at the recent ASTMH conference in Seattle. However, there are as yet insufficient in vivo data to conclude that artemisinin combined with tafenoquine reduces the methaemoglobin concentration (indicative of reduced redox metabolism) compared with chloroquine plus tafenoquine. In addition, these data are as yet unpublished.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper is a continuation of other research by this group and represents another step back in time for peptide preservation in eggshells. It is exciting to see Miocene age peptides and that they overlap so completely with both extant ostrich struthiocalcin and the previously described Pliocene peptides. The biggest weakness is the lack of tables showing both the de novo peptides and those detected by database searching.

      We thank the Reviewer for their positive assessment of our work. We now provide a table with peptides identified by database searching as well as the annotated tandem mass spectra for the peptides.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Germanos et al present preclinical evidence of a dynamic interplay between tumor microenvironmental elements underlying prostate cancer initiation, progression, and emerging therapeutic resistance in the transgenic mouse model. The authors identify an intermediate luminal cell population trans-differentiating from a hypo-proliferative basal cell subset, meanwhile, hyper-proliferative basal cells replenish a non-differentiating basal subpopulation. The meticulous methodologic approach identifies candidate cellular interactions in fibroblasts, MDSCs, and immune cell populations associated with PTEN loss. The generalization of these findings to human data sets is of particular interest and recommended for future studies on this topic. Mechanistic studies with multi-cellular co-culture models are needed to extend and validate the findings in this report.

      We thank the reviewer for finding our research “meticulous” in its approach. We agree that validating our findings in human contexts is a vital next step and have added new orthogonal datasets in the revised manuscript (Figure 4D-E). We also agree that complex molecular studies will be needed to fully evaluate our cell-cell interaction hypotheses. To this end, we have elaborated on appropriate follow-up studies in the discussion (Lines 625-628, 642-643, 657-659, 675-677).

      Strengths and Weaknesses:

      The study focuses on a clinically highly relevant and timely topic. The strength of this manuscript is the meticulous description of the Methods and model development and the integration of state-of-the-art orthogonal data sets. However, the number of data points across the experiments (n = 2 or 3) with considerable variability in the Ptenfl/fl group limits the interpretation of findings. Additionally, further experiments are needed to validate these observations in human prostate cancer and establish the potential translational relevance of these findings.

      We are ecstatic that the reviewer finds our study “clinically highly relevant.” We agree that the low sample size is a potential limitation but believe that our overall results are robust and enable concrete conclusions for both epithelial and immune cell populations. This is in part because we validated our findings in orthogonal human datasets (Figure 4A-C, Figure 5H) in the original manuscript. However, to add rigor to our study, we have conducted new scRNAseq analysis showing that our findings correlate well with both human patient data (Figure 4D-E) and orthogonal mouse models (Figure 4F-G). Furthermore, we conducted additional scRNAseq on castrated WT murine prostate to demonstrate how castration plays an important role in translational heterogeneity in intermediate cells (Figure 4H, Figure 3 – figure supplement 1G).

      As such, the report is fairly descriptive, and expanding the discussion on the mechanistic studies needed to identify which of these interactions drives aggressive prostate cancer would improve this report.

      We agree with the reviewer that additional discussion of follow-up studies is necessary. As such, we have updated the discussion to highlight the molecular studies needed to fully characterize the cellular phenotypes described in this manuscript (Lines 625-628, 642-643, 657-659, 675-677).

      Reviewer #2 (Public Review):

      This work provides a thorough characterization of tumor cell and microenvironment dynamics in a castrate Pten null prostate cancer model and details the strength of cellular interactions using single-cell RNA sequencing. The search for a preexisting castrate-resistant prostate progenitor has been upended in recent years with the discovery that prostate luminal cells adapt to low androgen environments by undergoing lineage plasticity rather than an expansion of proximal progenitors. This paper provides indirect evidence that basal epithelia give rise to 'intermediate' epithelia through increased translation in cells from intact and castrate Pten null mice, which is validated in a Pten null, 4ebp1 mutant mouse model.

      Strengths:

      The single-cell data are robust and expertly presented in the figures. The methods are largely appropriate and the delineation of experimental protocols is straightforward. The analysis is comprehensive and well described in relation to biological questions of interest to the community. The validation of the effect of translation on prostate epithelial viability in relation to initial findings advances our understanding of how cells survive in low androgen environments. The addition of a public portal for the data is highly useful.

      We thank the reviewer for evaluating our work as “robust and expertly presented,” “comprehensive,” and “highly useful.”

      In response to the reviewer’s in-depth comments, we have revised our nomenclature of WT epithelial cell subtypes to specifically distinguish between Krt4+/Tacstd2+ urethral, prostatic, and cancer-derived cells (Lines 163-185). We now find urethral and luminal progenitor groups in WT intact mice, which are distinct from “intermediate” cells arising from Pten loss (Figure 1 – figure supplement 1D-F). We have accordingly revised our interpretation of the potential origins of these intermediate cells in cancer (Lines 256-275).

      Weaknesses:

      The PB-Cre4 promoter seems to be promiscuously inactivating Pten in basal, intermediate, and luminal cells, which is problematic as this confounds the ability to differentiate between cells that are undergoing lineage plasticity vs. expansion of a pre-existing progenitor cell type. Much recent evidence points to lineage plasticity of prostate luminal tumor cells under androgen deprivation rather than survival and expansion of a pre-existing castrate-resistant basal or intermediate cell type. Accordingly, the observation that basal epithelia may transdifferentiate to intermediate epithelia or that a pre-existing intermediate luminal cell state is expanded under castration may be artifacts of the model without reproduction in human prostate cancer. The use of trajectory analysis of single-cell data to demonstrate basal or intermediate cell lineage transdifferentiation is a weaker type of evidence than the lineage tracing of individual cell types provided by other groups, which argue against transdifferentiation and for lineage plasticity.

      This is a very thoughtful and nuanced comment. We agree that the PB-Cre4 promoter promiscuously inactivates Pten in basal cells, luminal progenitor cells, and luminal cells, which confounds the ability to distinguish cells undergoing lineage plasticity from the expansion of pre-existing progenitor cell types. As such, we now expand our results section to include non-basal routes to the expansion of the Pten intermediate cell population (Lines 261-275). Furthermore, we comprehensively discuss the limitations of our models in the discussion section, highlighting the need to validate our findings using lineage tracing or newer techniques such as DNA Typewriter (Lines 616-628) (Choi et al., Nature 2022).

      Currently it is not possible to conduct lineage tracing within the human prostate, making it impossible to determine whether basal epithelia may transdifferentiate to intermediate epithelia or whether a pre-existing intermediate luminal cell state is expanded under castration. However, we do present new human scRNAseq data showing that the intermediate cell state, as reflected by the 5-gene castration signature, is enriched specifically in metastatic, but not localized, prostate cancer (Figure 4D-E). Furthermore, we show that this gene signature is also relevant in a completely different progression model of murine prostate cancer (Figure 4F-G). Thus, while not perfect, our model does have potential human relevance despite the limitations, which we address in the manuscript (Lines 261-275, 616-628).

    1. Author Response

      Reviewer #1 (Public Review):

      Kang et al. have performed whole exome sequencing of gall bladder carcinomas and associated metastases, including analysis of rapid autopsy specimens in selected cases. They have also attempted to delineate patterns of clonal and subclonal evolution across this cohort. In cases where BilIN was identified, the authors show that subclones within these precursor lesions can expand and diversify to populate the primary tumor and metastatic sites. They also demonstrate subclonal variation and branching evolution across metastatic sites within the same patient, with the suggestion that multiple subclonal populations may metastasize together to seed different sites. Lastly, they highlight ERBB2 amplification as a recurrent event observed in gall bladder carcinomas.

      While these data add to the literature and start to examine important questions related to clonal evolution in a relatively rare malignancy, the authors' findings are very descriptive and it is hard to draw many generalizable conclusions from their data. In addition, the presentation of their figures is somewhat confusing and difficult to interpret. For example, they do not separate their clonal analyses by disease site and by time in a readily interpretable manner, as in some instances of Figure 2 and Figure 3 the clone maps are from different sites collected at the same time point, while others show some samples at different time points. Depicting these hierarchies in a more organized and clearly understandable manner would help readers more easily interpret the authors' findings. In addition, the clinical implications of these clonal hierarchies and their heterogeneity are unclear, as the authors do not relate the observed evolution to intervening therapies and may not be powered to do so with this dataset.

      Thank you for the constructive and valuable comments about 1) figures and 2) clinical implications.

      1) We agree with your opinion that Figures 2 and 3 are confusing. Reflecting on your comment, Figures 2 and 3 have been modified. Now, the time point at which the tissue was obtained and the anatomical location of the tissue are readily visible in the redesigned figures.

      2) From a clinical point of view, we believe that our study highlights the importance of precise genomic analysis of multi-regional and longitudinal samples in individual cancer patients. In current oncology clinics, cancer panel data are used to identify druggable mutations, usually from a single tumor sample. However, we found that only some of the mutations were clonal, while a substantial proportion were subclonal, and subclonal mutations are usually not effective drug targets. For example, in patient GB-S2, sequencing of the GB tissue alone would have led to ERBB2-targeted treatment, had a suitable clinical trial been available, because ERBB2 p.V777L is pathogenic. However, our clonal evolution analysis suggests that an ERBB2-targeting strategy may not be effective against subclones without the ERBB2 p.V777L mutation, especially those from regional metastases. We have added a description of this point to the Discussion section (Page 13, Line 12-15).

      Additional areas that would require clarification include:

      1) There are very few details on how the authors performed their subclone analysis to identify major subclones, and what each of the clusters in Supplemental Figure 1 represents. In addition, they do not describe how they determined that the highlighted mutations in Table 2 were drivers for metastasis and subclonal expansion. Were these the only genes that exhibited increased allele frequencies in metastatic sites, or were other statistical criteria used?

      Thank you for the important comment about 1) clone analysis and 2) highlighted mutations in Table 2.

      1) Mutations were timed as clonal or subclonal through PyClone (Roth A et al., Nat Methods. 2014) clustering (Figure 1—figure supplement 1). Phylogenetic trees were constructed using the mutation clusters identified with PyClone as an input of CITUP (Malikic S et al., Bioinformatics. 2015) (Figures 2 and 3). We added the sentence "See Supplementary File 1 to check the matching information for the PyClone clusters and the CITUP clones." to the supplementary figure legend.

      2) A full list of mutations constituting a CITUP clone can be found in Supplementary File 1. Among the mutations, previously reported cancer-associated genes harboring them were selected manually and listed in Table 2. References for each gene are introduced in the 'Evolutionary trajectories and expansion of subclones during regional and distant metastasis' section.

      2) The authors do not discuss the relevance of variation in mutational signatures observed with disease progression/metastasis, e.g., is there any significance that signature 22 (aristolochic acid) and signature 24 (aflatoxin) are increased in metastases? In addition, when comparing their data to previously published reports in Figure 1B and Figure 4A, it would be helpful if the authors discussed possible reasons for some of the large differences in mutational or signature frequencies across datasets. For example, do the authors think the frequency of ERBB2 alterations is so much higher in their cohort than in prior reports due to methodological/data reasons or due to differences in patient population?

      Thank you for the constructive and valuable comments about 1) mutational signatures observed with disease progression/metastasis and 2) differences in mutational or signature frequencies across datasets.

      1) During the revision process, signatures 22 and 24 highlighted in the metastasis stage were validated by two additional tools, Signal (Degasperi A et al., Nat Cancer. 2020) and MuSiCa (Diaz-Gay M et al., BMC Bioinformatics. 2018) (Figure 4—figure supplement 3). Aristolochic acid is an ingredient of oriental herbal medicine (Debelle FD et al., Kidney Int. 2008, Hoang ML et al., Sci Transl Med. 2013). Given that all the patients in our cohort are Korean, and a recent study found that Korean cancer patients are frequently exposed to herbal medicines (Kwon JH et al., Cancer Res Treat 2019), one possible explanation is that some patients might have been exposed to herbal remedies containing aristolochic acid. On the other hand, aflatoxin is known to be contained in soybean paste and soy sauce, which are widely used in Korean food (Ok HE et al., J Food Prot. 2007). Considering that the signatures 22 and 24 are found not in early carcinogenesis but in late carcinogenesis and metastasis (Figure 4B and Figure 4—figure supplement 3), the two carcinogens appear to have little impact on the early stage of cancer development, but their impacts might be highlighted in overt cancer cells. Further investigation is required because it is difficult to determine the etiology of signatures 22 and 24 with this limited patient data. We updated this part in the Discussion section (Page 13, Line 4-7).

      2) In the two previous genomics studies on GBAC, the prevalence of ERBB2 alteration was 7.9% (Narayan RR et al., Cancer. 2019) and 9.4% (Li M et al., Nat Genet. 2014), respectively. Compared with these data, our cohort shows a relatively higher prevalence of ERBB2 alterations (54.5%: amplification in 27.3% and SNV in 27.3%) (Figure 1B). A higher prevalence of ERBB2 alteration was also reported in other studies on GBAC, with corresponding rates of 28.6% (amplification and overexpression, Nam AR et al., Oncotarget. 2016) and 36.4% (amplification only, Lin J et al., Nat Commun. 2021). Variations in ethnicity and culture might have contributed to these differences. This part is described in the Discussion section (Page 11, Line 19-23). In addition, the discrepancy in Figure 4A might be attributed to differences in the analyzed samples: our study included precancerous and metastatic lesions, whereas the other two studies uniformly analyzed primary tumors.
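      The combined ERBB2 figure equals the sum of the two sub-categories only if both are counted over the same denominator and no patient carries both alteration types. The short Python sketch below makes that arithmetic explicit. It assumes, since the response does not state it, that the denominator is the 11 patients in the cohort; `prevalence` is a hypothetical helper written only for illustration.

```python
# Assumption: every percentage is computed over the 11 patients in the cohort.
N_PATIENTS = 11


def prevalence(count: int, total: int = N_PATIENTS) -> float:
    """Return a prevalence as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


amplification = prevalence(3)  # 3 of 11 patients
snv = prevalence(3)            # 3 of 11 patients
# Assumes no patient has both an amplification and an SNV.
any_erbb2 = prevalence(3 + 3)  # 6 of 11 patients

print(amplification, snv, any_erbb2)  # 27.3 27.3 54.5
```

      Under this assumption each sub-category corresponds to three patients, so a difference of a single patient shifts each rate by about 9 percentage points.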

      Reference for reply 1)

      • Kwon JH, Lee SC, Lee MA, Kim YJ, Kang JH, Kim JY, et al. Behaviors and Attitudes toward the Use of Complementary and Alternative Medicine among Korean Cancer Patients. Cancer Res Treat. 2019;51(3):851-60.

      3) The authors try to describe and draw conclusions about the possibility of metastasis to metastasis spread in p.6, lines 6-10 "In our study, of 7 patients with 2 or more metastatic lesions, evidence of metastasis-to-metastasis spread was found in 2 patients (28.6%). In GB-A1 (Figure 2A), it appears that CBD, omentum 1-2, mesentery, and abdominal wall 2-4 lesions may originate from abdominal wall 1 (old) rather than from primary GBAC considering clone F." The authors conclude here that the spread arose from abdominal wall 1, but this lesion is only separated from the CBD lesion by 1 month. There is no history given about whether this timing difference is significant or if it was simply due to clinically-driven differences in when each lesion was sampled. Given the proximity of the CBD lesion to the original gall bladder cancer, it seems just as likely that all of these distant lesions were seeded from the CBD lesion. If this is the case, the author's conclusion about "metastasis to metastasis" spread does not seem strongly supported. It would be helpful if the authors could clarify this point and/or provide additional data to strengthen this conclusion.

      We appreciate your valuable comment. As addressed above, the manuscript has been modified to reflect your comments.

      Reviewer #2 (Public Review):

      Minsu Kang et al. analyzed 11 patients with gallbladder adenocarcinoma using multi-point sampling. Mutational analysis revealed evolutionary patterns during progression, with the authors finding that metastasis-to-metastasis spread and the migration of clusters of tumor cells are common in gallbladder adenocarcinomas. The signature analysis detected signatures 22 (aristolochic acid) and 24 (aflatoxin) in metastatic tumors. Overall, the analyses are well performed using established algorithms. However, the manuscript is highly descriptive. Therefore, it is very difficult to understand what the novel findings are.

      Major comments

      1) The sections "Evolutionary trajectories and expansion of subclones during regional and distant metastasis", "Polyclonal metastasis and intermetastatic heterogeneity", "Mutational signatures during clonal evolution", and "Discussion" are highly descriptive which makes it difficult to understand what the novel and/or important findings are. Those sections would profit from reorganization.

      Thank you for the important comment. We have reorganized the manuscript according to your comments.

      1) In the "Evolutionary trajectories and expansion of subclones during regional and distant metastasis" section, unnecessary sentences have been removed and Figures 2 and 3 have been changed to make it simpler to understand how subclones spread during metastasis.

      2) In the "Polyclonal metastasis and intermetastatic heterogeneity" section, after receiving feedback on statements that were conflicting (Reviewer #1's comment 4), we clarified the statements and removed any other extraneous sentences. Figures 2 and 3 have been changed to make it simpler to understand polyclonal metastasis and intermetastatic heterogeneity.

      3) In the "Mutational signatures during clonal evolution" section, after receiving comments that Figures 4B and 4C were confusing (Essential Revisions #6), we moved Figure 4B to Figure 4—figure supplement 2. Unnecessary sentences have been removed. We emphasized signatures 22 and 24 highlighted during metastasis. This result was validated by using two additional tools, Signal (Degasperi A et al., Nat Cancer. 2020) and MuSiCa (Diaz-Gay M et al., BMC Bioinformatics. 2018).

      4) In the Discussion section, duplicate descriptions and unnecessary extraneous explanations have been deleted. We emphasized that whereas aflatoxin and aristolochic acid had little impact on early cancer formation, their impacts could be more clearly seen in cancer cells that had already manifested (Page 13 Line 2-7). In addition, the limitations of the NGS test currently used in the clinical field were pointed out, and the clinical significance of this study was described (Page 13 Line 8-16).

      2) What would enhance this paper is more of a connection between the bioinformatics analysis and the biology. Although the authors analyzed multi-point sequencing data well, this paper lacks in-depth discussion. I understand that the results in the paper are "computationally" the most likely. However, the impact is lost by an incomplete connection to biology.

      As you commented, we analyzed the WES data obtained from patient samples by computational methods. In this study, we did not validate the various results using in vitro or in vivo models. However, we would like to emphasize the significance of our work because it is the first human study, covering the current theory of carcinogenesis from precancerous lesions to metastasis in GBAC. For example, polyclonal seeding has been previously confirmed in animal models (Cheung KJ et al., Science 2016). In humans, there have been reports in breast cancer (Ullah I et al., J Clin Invest. 2018) and colorectal cancer (Wei Q et al., Ann Oncol. 2017), but not in GBAC yet.

      3) In addition to the above concern, it is difficult to comprehend the cohort as the detailed information is lacking. I would suggest providing a brief table that contains the number of collected samples, frozen or FFPE, the clinical information, etc. by sample.

      Thank you for the constructive comment. Supplementary Table 1 was modified as you mentioned. It is now indicated from which organ, when, and by what method the tissue was obtained, what the tumor purity of the tissue was, and whether the tissue was fresh-frozen or FFPE. In addition, we updated the information about tissue acquisition sites in Figure 1A.

      4) The mutations with very low allele frequency (< 1%) are discussed in the manuscript. However, no validation data is provided. Please add a description of the accuracy of the mutation calling considering the following concerns.

      • FFPE samples are analyzed using the same method as frozen samples, although FFPE samples contain many more artifacts. Is it adequate to use the same methods for both frozen and FFPE samples?

      Thank you for the valuable comment. We also considered the FFPE artifacts. However, we did not remove the possible artifacts. This part has been described above. Please see Essential Revisions #5.

      • How were those mutations with low allele frequency validated? Are those variants validated by other methods? Especially in FFPE.

      Thank you for the important comment. Firstly, we discarded any low-quality, unreliable reads and variants according to the pre-specified filtering criteria used in previous literature analyzed with the Genomon2 pipeline (Yokoyama A et al., Nature. 2019, Kakiuchi N et al., Nature. 2020, Ochi Y et al., Nat Commun. 2021). In the Method section, we have added an explanation for this part (Page 16 Line 5-12).

      As you commented, validation of low VAF mutation is required if the mutation is sample-specific. However, in this study, if a mutation in Supplementary File 1 has a low VAF in one sample, one of the other samples always has a higher VAF, which has passed our pre-specified filter. Therefore, validation is not required for that mutation. In addition, possible sequencing artifacts with low VAFs in FFPE tissues have been discussed above. Please see Essential Revisions #5.
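      The retention rule described above can be sketched in Python as follows. This is a minimal illustration only: the actual Genomon2-based filtering uses the pre-specified criteria described in the Methods, whereas here a single VAF cut-off stands in for the whole filter. `MIN_VAF` (its value chosen arbitrarily) and `retained_without_validation` are hypothetical names.

```python
# Hypothetical stand-in for the pre-specified filter: a single VAF cut-off.
# The other criteria used by the real pipeline are not modelled here.
MIN_VAF = 0.05


def retained_without_validation(vafs_by_sample: dict[str, float],
                                min_vaf: float = MIN_VAF) -> bool:
    """Return True if at least one sample carries the mutation at a VAF that
    passes the filter, so low-VAF calls of the same mutation in other samples
    are supported by that sample rather than being sample-specific."""
    return any(vaf >= min_vaf for vaf in vafs_by_sample.values())


# Low VAF in the primary tumor, but clearly present in a metastasis.
shared = {"primary": 0.004, "liver_met": 0.21}
# A low-VAF call that no sample supports above the cut-off (sample-specific).
sample_specific = {"primary_ffpe": 0.004}

print(retained_without_validation(shared))           # True
print(retained_without_validation(sample_specific))  # False
```

      Mutations in the second category are the ones for which orthogonal validation would still be needed, which is the situation the reviewer raises for FFPE samples.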

      • Is the low variant allele frequency (0.2~1%) significantly higher than the background noise level?

      Thank you for the important comment. As you expected, FFPE samples had a higher number of sample-specific mutations than fresh-frozen ones in our study. However, we did not remove these mutations in the analysis of the FFPE samples. For a more detailed description, please see Essential Revisions #5.

      5) The authors compared mutational signatures divided by stages or timings. How were the signatures calculated, given that each sample has a different number of somatic mutations? Did the authors correct for this difference?

      Thank you for the helpful comment. We classified all the mutations according to the specific criteria (Page 9 Line 9-18). For example, in Figure 4B (before revision, Figure 4C), mutations were classified by the timing of development during clonal evolution. After that, we could calculate the relative contributions of mutational signatures in each group using the three tools, Mutalisk (Lee J et al., Nucleic Acids Res. 2018), Signal (Degasperi A et al., Nat Cancer. 2020) and MuSiCa (Diaz-Gay M et al., BMC Bioinformatics. 2018). Although the number of mutations is different for each group, no additional correction was required because we compared the relative contributions among the groups.
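      The argument that relative contributions need no further correction for mutation count can be illustrated with a toy example. The Python sketch below assumes a made-up three-channel signature matrix (real analyses use 96 trinucleotide channels) and uses unconstrained least squares via `numpy.linalg.lstsq` as a simple stand-in for the fitting performed by Mutalisk, Signal, and MuSiCa, which it does not reproduce. `SIGNATURES` and `relative_contributions` are hypothetical names.

```python
import numpy as np

# Toy signature matrix: rows are mutation channels, columns are signatures.
# Each column sums to 1, as a mutational signature profile does.
SIGNATURES = np.array([
    [0.6, 0.1],
    [0.3, 0.2],
    [0.1, 0.7],
])


def relative_contributions(catalog: np.ndarray) -> np.ndarray:
    """Fit signature exposures by least squares, then normalize them to sum to 1."""
    exposures, *_ = np.linalg.lstsq(SIGNATURES, catalog, rcond=None)
    return exposures / exposures.sum()


# A group of 40 mutations generated by 30 of signature A and 10 of signature B.
small_group = SIGNATURES @ np.array([30.0, 10.0])
# A group five times larger with exactly the same spectrum.
large_group = 5 * small_group

print(relative_contributions(small_group))  # approximately [0.75, 0.25]
print(relative_contributions(large_group))  # approximately [0.75, 0.25]
```

      Scaling a catalog changes the fitted exposures but not their proportions. Normalization does not, however, remove the larger sampling noise of groups with few mutations.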

      6) In distant metastatic tumors, signatures 22 and 24 are increased. These two signatures are strongly associated with specific carcinogens. Although clinical information is lacking, do the authors think that those patients were exposed to these chemicals after the diagnosis? Why do the authors think the two signatures increased in the metastatic tumors? Were those signatures validated by other methods?

      We appreciate your important and constructive comment.

      1) We think that the patients might have been exposed to aristolochic acid or aflatoxin before or after the cancer diagnosis. Aristolochic acid is an ingredient of oriental herbal medicine (Debelle FD et al., Kidney Int. 2008, Hoang ML et al., Sci Transl Med. 2013). Given that all the patients in our cohort are Korean, and a recent study found that Korean cancer patients are frequently exposed to herbal medicines (Kwon JH et al., Cancer Res Treat 2019), one possible explanation is that some patients might have been exposed to herbal remedies containing aristolochic acid. On the other hand, aflatoxin is known to be contained in soybean paste and soy sauce, which are widely used in Korean food (Ok HE et al., J Food Prot. 2007). Nevertheless, we believe that further investigation is required because it is difficult to determine the etiology of signatures 22 and 24 with this limited patient data.

      2) Summarizing the mutational signature results using the 3 different tools (Figure 4B and Figure 4—figure supplement 3), the signatures 22 and 24 are relatively rare in early carcinogenesis. However, the two signatures contributed more to late carcinogenesis and metastasis. Therefore, it is speculated that the two carcinogens appear to have little impact on the early stage of cancer development but might be highlighted in overt cancer cells. Further studies on this novel hypothesis are necessary.

      3) During the revision process, signatures 22 and 24 highlighted in the metastasis stage were validated by two additional tools, Signal (Degasperi A et al., Nat Cancer. 2020) and MuSiCa (Diaz-Gay M et al., BMC Bioinformatics. 2018) (Figure 4—figure supplement 3). We updated this part in the Result (Page 9 Line 18-21) and Discussion (Page 13 Line 2-7) sections.

      Reference for reply 1)

      • Kwon JH, Lee SC, Lee MA, Kim YJ, Kang JH, Kim JY, et al. Behaviors and Attitudes toward the Use of Complementary and Alternative Medicine among Korean Cancer Patients. Cancer Res Treat. 2019;51(3):851-60.

      7) Figure 2 is well described. However, it is difficult for readers to fully understand. The colors for each clone are sometimes similar. The results of multi-time-point and regional analyses in cases with multiple samplings are not integrated. Driver mutations are described separately in the small phylogenetic trees. Evolutionary patterns (linear or branching) are not described in the figures. Addressing these concerns would improve the manuscript.

      We appreciate your important comment.

      1) In GB-S1, clones of similar colors were modified to be different colors.

      2) Figures 2 and 3 have been modified to make them easier to understand by separating time and space more clearly.

      3) Driver mutations are now indicated in both the phylogenetic tree and TimeScape result (Figures 2 and 3).

      4) Evolutionary patterns (linear or branching) can be identified by examining the phylogenetic trees in Figures 2 and 3. In addition, we now describe each patient's evolutionary pattern more clearly in the manuscript.

      8) "Among 6 patients having concurrent BilIN tissues, two patients were excluded from the further analysis because of low tumor purity in one patient and different mutational profiles between BilIN and primary GBAC in the other patient, suggesting different origins of the two tumors (Figure 1-figure supplement 2)." This seems like cherry-picking. More explanation is necessary.

      • What is the tumor purity? Although the authors use a 0.2% variant allele frequency as the threshold for a true mutation (for example, Table 2), is the tumor purity lower than 0.2%?

      Thank you for the important comment. The calculated tumor purity of BilIN in the GB-S8 patient was 0.03 based on the WES data. We added this value to the manuscript (Page 6 Line 9) and Supplementary Table 1. Although variants were called in this case, the tumor purity was too low to estimate the allele-specific copy number, so the more sophisticated analyses performed in the other patients were not possible. In addition, the value of 0.2% in Table 2 is not the VAF but the cellular prevalence calculated by PyClone and CITUP. Although this value is low in the primary tumor, we mention it because it is high in the metastatic lesions.

      • BilIN and GBAC of GB-S7 have some shared mutations. Why do the authors conclude that BilIN and GBAC have distinct origins? Do the authors think that those shared mutations are germline mosaic mutations?

      Thank you for the important comment.

      1) We think that the BilIN and GBAC of the GB-S7 patient are tumors of different origins because BilIN and GBAC of the GB-S7 patient have different truncal mutations (Figure 1—figure supplement 2C). This is a markedly different feature compared to BilIN and GBAC samples of other patients. We have added an explanation for this part to the Results section (Page 6 Line 9-11).

      2) We do not think that mosaicism occurred at the developmental stage. In addition, although some mutations were identified from both BilIN and GBAC, we cannot determine their importance because either one of the lesions had a very low VAF ranging from 0.001 to 0.04. If the mosaicism occurred only in the GB at the developmental stage, the VAF values of the shared mutations should be much higher than the current values, and the VAF values of the two BilIN and GBAC lesions should be similar.

      • Was the copy number profile compared between BilIN and GBAC?

      Thank you for the constructive comment. In this study, we obtained allele-specific copy numbers using Control-FREEC version 11.5 (Boeva V et al., Bioinformatics. 2012). The copy number of the mutations in the GB-S8 patient's BilIN could not be estimated by Control-FREEC due to low tumor purity (0.03). In the case of GB-S7, BilIN and GBAC were determined to be of different tumor origins and were thus excluded from the analysis.

    1. Author Response

      Reviewer #1 (Public Review):

      It's here where my very mild (I truly liked this article - it is well done, well written, and creative) comments arise. The implications for stochastic strategies immediately emerge in the early results - bimodal strategies come about from the introduction of two variables. There is not enough credence given to the field of stochastic behavior in the introduction - the introduction focuses too much on previous models of predator-prey interaction, and in fact, Figure 1, which should set up the main arguments of the article, shows a model that is only slightly different (slight predator adjustment) and that is eventually addressed only in the Appendix (see below). The question of "how and when do stochastic strategies emerge?" is a big deal. Figure 1 should set up a dichotomy: optimal strategies are available (i.e., those that minimize Tdiff), which would predict a single unimodal strategy. Many studies advocate for Bayesian optimal behavior, but multimodal strategies are the reality in this study - why? Because if you consider the finite attack distance and the inability of fish to evoke maximum-velocity escapes while turning, it actually IS optimal. That, I think, is the main point of the article and why it's a broadly important piece of work. Further framing within the field of stochastic strategies (e.g., stochastic resonance) could be done in the introduction.

      We appreciate the comment provided by the reviewer. We changed the second paragraph of the introduction so as to focus more on the protean tactic (stochasticity). We added a new figure (Figure 1 in the new version) to conceptually show the escape trajectories (ETs) of a pure optimal tactic, a pure protean tactic, a combination of optimal and protean tactics, and an empirically observed multimodal pattern. We explained each tactic and described that the combination of the optimal and protean tactics still cannot explain the empirically observed multiple preferred ETs.

      The revised paragraph (L49-66) is as follows: Two different escape tactics (and their combination) have been proposed to enhance the success of predator evasion [16, 17]: the optimal tactic (deterministic), which maximizes the distance between the prey and the predator (Figure 1A) [4, 14, 15, 18], and the protean tactic (stochastic), which maximizes unpredictability to prevent predators from adjusting their strike trajectories accordingly (Figure 1B) [19-22]. Previous geometric models, which formulate optimal tactics, predict a single ET that depends on the relative speeds of the predator and the prey [4, 14, 15, 18], and additionally, predator’s turning radii and sensory-motor delay in situations where the predator can adjust its strike path [23-25]. The combination of the optimal tactic (formulated by previous geometric models), which predicts a specific single ET, and the protean tactic, which predicts variability, can explain the ET variability within a limited angular sector that includes the optimal ET (Figure 1C). However, the combination of the two tactics cannot explain the complex ET distributions reported in empirical studies on various taxa of invertebrates and lower vertebrates (reviewed in [26]). Whereas some animals exhibit unimodal ET patterns that satisfy the prediction of the combined tactics or optimal tactic with behavioral imprecision (e.g., [27]), many animal species show multimodal ETs within a limited angular sector (esp., 90–180°) (Figure 1D) (e.g., [4, 5, 28]). To explore the discrepancy between the predictions of the models and empirical data, some researchers have hypothesized mechanical/sensory constraints [17, 29]; however, the reasons why certain animal species prefer specific multiple ETs remain unclear.

      All experiments are well controlled (I especially liked the control where you varied the cutoff distance given that it is so critical to the model). Some of the figures require more labeling and the main marquee Figure 1 needs an overhaul because (1) the predator adjustment model that is only addressed in the Appendix shouldn't be central to the main introductory figure - it's the equivalent of the models/situations in Figure 6, and probably shouldn't take up too much space in the introductory text either (2) the drawing containing the model variables could be more clear and illustrative.

      (1) According to this comment and comment #11 from reviewer #2, we moved the two panels in the figure (Figure 1B and D in the original version) to Appendix-figure 1, and accordingly, we changed the first paragraph of the Model section so as to clearly describe that we focus on Domenici’s model in this study (L103-108).

      As for Figure 6 (Figure 7 in the new version) and related parts, we tempered our claims to clearly describe that our model has only the potential to explain the different patterns of escape trajectories observed in previous works. We would like to keep this figure in the main text because it is fundamental to explain the potential applicability of our model to other predator-prey systems.

      (2) To alleviate the burden for readers, we added the model variables to the figure and made them colored (Figure 2B in the new version).

      Finally, I think a major question could be posed in the article's future recommendations: Is there some threshold for predator learning that the fish's specific distribution of optimal vs. suboptimal choices prevents the predator from reaching? That is, the suboptimal choice is performed in proportion to its ability to differentiate Tdiff. This is "bimodal" in a sense, but a probabilistic description of the distribution (e.g., a Bernoulli with p proportional to beta) would be really beneficial. Because prey capture is a zero-sum game, the predator will develop new strategies that sometimes allow it to win. It would be interesting if the Bernoulli description could eventually be run via a sampler against an actual predator using a prey dummy; one could show that the predator eventually learns the pattern if the Bernoulli probability of choosing the optimal escape is set too high, and that the prey balances its choice of optimal vs. suboptimal escapes to circumvent predator learning.

      We thank the reviewer for this constructive comment. Actually, we are now developing a dummy prey system. We added the following sentence in the Discussion to mention future research.

      The added sentence (L496-499): Further research using a real predator and dummy prey (e.g., [48]) controlled to escape toward an optimal or suboptimal ET with specific probabilities would be beneficial to understand how the prey balances the optimal and suboptimal ETs to circumvent predator learning.

      Reviewer #2 (Public Review):

      First, it is unclear how the dummy predator is actuated. The description in the Methods section does not clearly address how rubber bands are used for this purpose.

      To clearly mention how the dummy predator was actuated by rubber bands, we added a figure (Figure 3-figure supplement 3B) and the following sentences.

      The added sentences (L608-611): The dummy predator was held in place by a metal pipe anchored to a four-wheel dolly, which was connected to a fixed metal frame via two plastic rubber bands (Figure 3—figure supplement 3B). The wheel dolly was drawn back to provide power for the dummy predator to strike toward the prey.

      Second, the predator's speed, which previous research has identified as a critical factor during predator-prey interactions, is not measured from the motion of the dummy predator in the experiments. Instead, it is estimated using an optimization algorithm that utilizes the mathematical model and the prey-specific parameters. It is unclear why the authors chose this method over measuring velocity from their experiments. Since the prey fish are responding to a dummy predator moving toward them at a particular speed during the interaction, it is important to measure the speed of the predator or clearly explain why estimating it using an optimization procedure is more appropriate.

      We chose this method (optimizing predator speed from the prey’s viewpoint) because there was no significant effect of predator speed on the escape trajectory in our experiment (L203-208). In other words, we considered that, at least in our case, the prey did not change the escape trajectory in response to the predator speed, and thus it would be more appropriate to use a specific predator speed estimated through an optimization algorithm from the prey’s point of view. It may be appropriate to use measured predator speed in other cases where the prey adjusts the escape trajectory in response to the change in predator speed. Therefore, we conducted a further analysis using actual predator speeds (both the predator speed at the onset of escape response, and the mean speed for the predator to cover the distance between the predator and prey). The results show that the model fit became worse when using measured predator speed per trial compared to the model using the fixed predator speed estimated through the optimization procedure (Table 3—source data 1; Figure 5—figure supplement 1). We added the above explanation in L219-226.

      One of the major claims of this article is that the model can explain escape trajectories observed in other predator-prey systems (presented in Figure 6). Figure 6 panels A-C show the escape responses of different prey in response to some threatening stimuli. Further, panels D-F suggest that the empirical data can be predicted with the model. But the modeling parameters used to produce the escape trajectories in D-F are derived from the authors' experiments with fish, instead of the experiments with the species shown in panels A-C.

      We thank the reviewer for this comment. We agree that this part in the previous version was an over-interpretation. Therefore, we tempered our statements to simply suggest that our approach has the potential to explain multiple ETs observed in other taxa. The revised sentences are as follows.

      Abstract (L27-30): By changing the parameters of the same model within a realistic range, we were able to produce various patterns of ETs empirically observed in other species (e.g., insects and frogs): a single preferred ET and multiple preferred ETs at small (20–50°) and large (150–180°) angles from the predator.

      Results (L395-407): Potential application of the model to other ET patterns. ...(snip)... To investigate whether our geometric model has the potential to explain these different ET patterns, we changed the values of model parameters (e.g., Upred, Dattack) within a realistic range, and explored whether such adjustments can produce the ET patterns observed in the original work. ...(snip)... These results indicate that our model has the potential to explain various patterns of observed animal escape trajectories.

      Discussion (L538-548): We show that our model has the potential to explain other empirically observed ET patterns (Figure 7). ...(snip)... Further research measuring the escape response in various species and applying the data to our geometric model is required to verify the applicability of our geometric model to various predator-prey systems.

    1. Author Response

      Reviewer #3 (Public Review):

      The authors use two-photon imaging to visualize various axonal organelle populations that they have virally labeled with fluorescent proteins, including DCVs and late endosomes/lysosomes. The latter topic is a bit contentious, as the authors use two labels that tag potentially overlapping and not highly specific markers, so that the nature of the tagged organelle populations remains unclear. Notably, the authors have also previously published a detailed account of how DCVs traffic in vivo, so the novelty lies mostly in comparing the behavior of different organelles and the potential influence of activity.

      Overall, the reported results mostly corroborate the expectations from previous in vitro and in vivo work on these organelles and other cargoes, performed by the authors and their collaborators, as well as in many other laboratories:

      (i) Different organelles have different transport behaviors regarding speed, the ratio of anterograde to retrograde moving organelles, etc.

      (ii) Organelles move in different ways when they pass specific anatomical landmarks in the axons, such as presynaptic terminals.

      (iii) Activity of a neuron (here measured by calcium imaging) can impact the measured transport parameters, albeit in a subtle and mechanistically not well-defined manner. The chosen experimental design precludes a more detailed analysis, for example of the precise movement behavior (such as defining the exact pausing/movement behavior of organelles, which would require higher imaging speeds) or of a correlation of different organellar behavior at synaptic sites or during activity (which would require three-channel simultaneous imaging of two organelle classes plus a synaptic or activity marker).

      In summary, this publication uses sophisticated in vivo labeling and imaging methods to corroborate and complement previous observations on how different axonal organelles move, and what influences their trafficking.

      We thank the reviewer for the time dedicated to our manuscript. We are thankful for the critical and specific comments, which allowed us to further improve our manuscript. We agree that it would have been beneficial to have higher frame rates and three instead of two imaging channels. However, this would have added further technical complexity to an already complex experimental setup resolving fluorescent puncta with sizes below the resolution limit. Nevertheless, we are convinced that all our main conclusions are justified by the imaging settings used for the current data sets.

    1. Author Response:

      Reviewer #2 (Public Review):

      The study is well designed and provides exciting new insights into the plasticity of intracortical connections, (over-)compensating for the partial loss of thalamic inputs. To optically resolve the activity of single synapses in vivo during sensory stimulation is technically very challenging. It would be helpful to know whether the recordings were made in the binocular or monocular region of V1. The results argue against a generalized multiplicative upscaling of all inputs and suggest selective boosting of synapses that are part of sensory-driven subnetworks. However, it is not clear whether homeostatic plasticity occurred at the observed spines themselves or on the level of presynaptic neurons, which could then e.g. fire more bursts, leading to larger postsynaptic Ca transients. The possibility that thalamic inputs from the intact eye in layer 4 could be potentiated should be discussed. It would probably help to explain to the reader the layer-specific connectivity of V1 in the introduction, and why thalamic input synapses themselves were not optically monitored (may require adaptive optics). Technical limitations are a main reason why the conclusions are somewhat vague at this point ("... regulation of global responses"), this could be spelled out better.

      We thank the reviewer for these suggestions. We agree with the reviewer that we cannot determine (due to technical limitations) whether the changes are occurring pre- or post-synaptically or some combination (also related to the reviewer’s point 8). We have added this point to the discussion.

      "Finally, it is important to note that while we made these measurements in layer 5 pyramidal cells, the homeostatic changes mediated by TNF-α could occur outside of layer 5, including changes to upstream inputs or changes to the presynaptic responses, either through changes in presynaptic release (Vitureira et al., 2012) or through a change in activity patterns of the presynaptic cell (e.g., bursts compared to single spikes) (Linden et al., 2009)."

      One important point that was unclear in the earlier version of the manuscript is that the experiments conducted in visual cortex were done in the monocular visual cortex. As explained in comments to reviewer 1, there are not any visually-evoked responses following enucleation in our experiments.

      Reviewer #3 (Public Review):

      Weaknesses are largely restricted to suggested changes to the writing - specifically, there are additional explanations of the data whose discussion may strengthen the long-term impact of the manuscript.

      1) Most importantly, the hypothesis at the heart of this work (subset versus global processes) is framed as orthogonal to the status quo model of homeostatic processes (global). I suspect that adherents to the global argument would quickly point out that the current work is conducted in adult animals, and the majority of the homeostatic plasticity research (which forms the basis of the global model) is conducted in juvenile animals. This is an important distinction because the visual system is enriched in plasticity mechanisms during the ocular dominance critical period. Since Hubel and Wiesel at least, there is extensive evidence to suggest that sensory systems take advantage of critical periods to set themselves up in accordance with the statistics of the world in which they are embedded. The flip side of this is that sensory systems are far less readily influenced by experience once the critical period is closed (Vital-Durand et al., 1978, LeVay et al., 1980; Daw et al., 1992, Antonini et al., 1999, Guire et al., 1999, Lehmann and Lowel, 2008). Through this lens, one might predict that a key feature of the adult cortex is that sensory spines could benefit by being selectively protected from what would otherwise be global homeostatic processes. Either way, the manuscript can be read as if it is framing a show-down between the classical model and a newer, higher-resolution model. I worry that this will be interpreted as misleading without careful presentation/contextualization of the role of development in the introduction and a thorough dissection in the discussion. Currently, the first occurrence of the word, "adult", occurs in the methods, on page 27, line 512. "Juvenile" and "critical period" are not in the manuscript. The age of the animals in this study isn't mentioned until the methods (between P88 and P148 at the time of imaging).

      2) Goel and Lee (2007) seem quite pertinent here: they show that L2/3 neurons give rise to homeostatic regulation of mEPSCs in both juvenile and adult animals, but that the process is no longer multiplicative in nature once the animal is post-critical period. Multiplicity has been the basis of the argument for global change since Turrigiano 1998. Thus, the Goel and Lee finding seems to really bolster the current findings - and also perhaps reconcile the likelihood of a mechanistic difference between CP and adult homeostatic plasticity.

      We fully agree with the reviewer that our results are not in conflict with the developmental synaptic scaling literature. We have changed the text throughout the manuscript to highlight previous studies at different ages and made clear the age of the animals in this work (including in the abstract, introduction, results and discussion). We have also referenced Goel and Lee, 2007, which we agree should be included and thank the reviewer for pointing this out.

    1. Author Response

      Reviewer #1 (Public Review):

      While eDNA methods are becoming more established, there remains skepticism by many in the scientific community about the origins of the detected DNA (e.g. does it drift in from other areas or water layers?). If these concerns aren't addressed (i.e. by citing supporting literature on the fate of eDNA), the different biodiversity profiles between trenches could possibly be explained by differing oceanography. There is also some important methodological information that is missing from this manuscript. For example, sampling volumes will affect the amount of biodiversity detected, but it is not clear if sample volumes are consistent across depths and study areas. It was also not indicated whether field controls (blanks) were taken to assess the potential contamination of samples. Lastly, the literature in the eDNA field is progressing rapidly and there are some missing papers (e.g Thomsen et al. 2016, Canals et al. 2021, McClenaghan et al. 2020, Govindarajan et al. 2021, etc.) that are relevant to the technique used in this manuscript and the habitat studied.

      We are very grateful to this reviewer for providing such an in-depth review of our manuscript, which allowed us to improve it significantly. We tried to follow almost every suggestion explicitly. In particular, we appreciate the pointers to important missing literature, which we readily included in this new version of our paper. The volume of seawater filtered for each sample is given in Supplementary file 1a. Field blanks were not collected per se. However, as part of the molecular protocol used (see Methodology), a "negative extraction control" was applied to check for possible contamination. In addition, we carefully checked the results themselves for any indication of contamination that could have biased our results and conclusions.

      Reviewer #2 (Public Review):

      My primary critique is the near-absence of statistical analyses in the current version of the manuscript; such analyses, within a more formal hypothesis-testing framework, are necessary to support the many descriptive observations made. An appropriate framework for these analyses should be developed throughout the paper, including consideration of the multiple tests that will be performed. This is important for many reasons, including providing readers with a more formal sense of the uncertainty in the conclusions, given the understandable sampling limitations. Planning and conducting these analyses will require considerable work.

      We thank the reviewer for raising this concern. We did include statistical analyses in part of our work. For example, all the phylogenetic analyses (using the IQ-tree software) implicitly include statistical analyses. The Gini index calculated in Figure 2 is also a statistical measure. However, we agree with the reviewer that some of our results lacked statistical analysis. We now report statistical significance for more statements in the text and have added panels to Figure S2—figure supplement 1 (supported by data in the new tables in Supplementary file 1h and 1i) to illustrate the statistical support for some of our claims. We have also removed some unnecessary statements.

    1. Author Response

      Reviewer #2 (Public Review):

      This is a single-cell RNA-seq analysis of traumatic brain injury (TBI) in mice that looks at recovery from milder TBI. It addresses an important question of why older individuals may have poor recovery. The investigators undertake unbiased analysis in both young and old mice and identify a number of macrophage, fibroblast, lymphocyte, and more specifically B cell inflammatory programs that are activated, some of which do not recover well in older mice. Taken together, these findings identify unique pathways that could be further investigated in functional studies to examine what immunologic mechanisms in the meninges may drive long-term problems from TBI. The models and analysis are well performed and compelling. This paper can serve as a resource for those who study brain immunology. Open questions include the following: 1) What exactly predisposes to such pro-inflammatory programs in the aged meninges? Epigenetic alterations? 2) What are the effector mechanisms that negatively impact brain function? 3) Can bioinformatic approaches reveal putative intercellular communication networks that would lend insight into the spatiotemporal sequence of events and ligand-receptor interactions?

      We are glad to hear that the Reviewer finds our work compelling, well performed and that it will be a good resource for those who wish to study brain immunology. The open questions that the Reviewer brings forth are very compelling areas of future investigation that we believe will help to shape and advance this field in the coming years.

    1. Reviewer #1 (Public Review):

      In this manuscript by Feng et al., the authors investigate the mechanism regulating the development of the levator veli palatini (LVP) in the posterior palate/pharyngeal region. While set up as a model to understand how myogenic progenitors migrate to discrete sites to form individual muscles, it is not clear how applicable the findings are to other subpopulations, though this is not a weakness. The mechanisms driving LVP development are of great interest to a broad group of developmental biologists, as LVP malformation is a common problem even in mild cases of cleft palate. The authors hypothesized that the perimysial population within palatal mesenchyme cells is a niche required for pharyngeal muscle development. Using exquisite analysis of scRNA-seq data from E13.5-E15.5 palatal cells, the authors illustrate that TGFb signaling is likely involved in perimysial cell development, using gene expression analysis in wild-type palatal sections to show that TGFb signaling precedes the arrival of myogenic cells. Inactivating Alk5 in palatal mesenchyme cells results in failure of LVP formation. The authors continue by identifying a number of transcription factors that presumably function downstream of TGFb signaling to drive LVP development. Among these is Fgf18, in which SMAD sites observed in the upstream region were validated to bind Smad2/3. The authors also identify Creb5 as a potential regulator of Fgf18. Overall, this is a remarkable use of scRNA-seq data, in which findings are supported by subsequent in vivo analysis of gene function using knockout mouse models. These findings will drive further analysis of LVP development and may shed light on the myogenesis of pharyngeal muscle in general.

      Strengths

      1) The treatment of scRNA-seq data using a variety of bioinformatic programs illustrates the utility of this type of data when using sufficient analysis software. The description of the approach is very clear and concise and the controls appear excellent. Further, the use of multiple time points further improves the analysis.

      2) The focus on perimysial cell expression patterns supports the hypothesis of the authors, though, as with this type of data, one can probably make a story out of several pathways. The use of RNAscope to carefully examine where TGFb signaling in the posterior pharynx occurs between E12.5 and E16.5 is critical to the setup of this manuscript and is well done. Further aiding the interpretation of these results are cartoons associated with the staining, which illustrate where the staining is occurring without ever overstating the observed patterns.

      3) Careful histological analysis illustrates the poor myogenic differentiation in the LVP of Osr2-Cre;Alk5fl/fl embryos.

      4) Identifying that TGFb is more important for regulating late perimysial cell development is important in identifying the targets of TGFb signaling.

      5) The use of CellChat to identify sending and receiving cells is well done and further supports the late function of TGFb signaling, in this context working through Fgf18 and Lama4.

      6) The attempt to build a signaling network again using CellChat (Figure 6) is admirable, though there are a few caveats to that approach (see below).

      7) While bead implant studies have been used for 40 years, the approach of culturing a piece of the pharynx and then performing a bead implant to prove that Fgf18 can positively influence myogenic development is admirable.

      Weaknesses

      1) In general, the authors are careful to not suggest that staining is significant unless showing quantification, though, at several points, this is not true.

      2) The authors identify five putative Smad2 sites upstream of Fgf18, using one of them in a Cut and Run assay whose results suggest enhanced Smad2/3 binding. The problem is that this likely would have worked with the other Smad sites and probably would have worked for any other putative site that one might pick. Proving that a putative site can be bound by its cognate transcription factor is not the same as proving that this occurs in vivo and is sufficient to control the process of LVP development. One would need reporter assays using that TF binding site to better support the points being made by the authors.

      3) In a similar manner, the authors try to define which factors might function with TGFb signaling to regulate myogenic development. Using SCENIC, the authors found a number of genes that might be involved in perimysial fibroblast development. Of these, they illustrate that Creb5 siRNA knockdown decreases Fgf18 expression in cultured palates. The focus on Creb5 was based on it showing "the most specific expression patterning the late perimysial cells (Figure 6H)....". In fact, Creb5 expression appears to be the broadest, extending across the entire LVP rather than only the area where myogenic precursors are found. Thus, any statement or discussion about Creb5 being a direct regulator of Fgf18 should probably be removed or reworded. The second problem is that a reduction in Fgf18 expression upon Creb5 knockdown does not prove direct regulation. Both of these are rather circuitous arguments.

      4) While the disorganization of myogenic fibers in the posterior LVP is somewhat obvious, it is not as clear as the authors suggest. This change (which I believe) needs to be better quantified (length, width, area, etc.).

      We thank the reviewer for these “Public Review” comments. For point 1, we have added more quantification for clarification and rephrased the wording where quantification was not performed. For point 4, we added measurements to quantify the changes in volume and cross-sectional area of the LVP in Osr2Cre;Fgf18fl/fl mice (Figure 7M-V).

      Reviewer #2 (Public Review):

      In this study, the authors take advantage of unbiased scRNA-seq datasets of the developing mouse soft palate that they previously reported and performed a new bioinformatic analysis to identify differential signaling pathway activities in the heterogeneous palatal mesenchyme. They found a strong association of TGF-beta signaling pathway activity with the perimysial cells and validated it through immunofluorescent detection of pSmad2, which led to their hypothesis that TGF-beta signaling in the perimysial cells might regulate palatal muscle formation. They generated and analyzed Osr2-Cre;Alk5fl/fl mice and showed that those mice have a cleft soft palate and disruption of the levator veli palatini (LVP) muscle. They then performed a comparative scRNA-seq analysis of the soft palate tissues from E14.5 Osr2-Cre;Alk5fl/fl and control embryos and showed that the Osr2-Cre;Alk5fl/fl embryos exhibited defects in the perimysial cells, in particular a reduction in Tbx15+ perimysial fibroblasts that directly associate with the LVP muscle progenitors. They identified Fgf18 as one of the most highly enriched signaling molecules in the perimysial cells and showed that the Osr2-Cre;Alk5fl/fl embryos exhibited reduced Fgf18 expression together with loss of MyoD+ myoblasts in the prospective LVP region. Further data showed that pSmad2 bound to the Fgf18 promoter region in the developing soft palate tissues. In addition, bioinformatic gene regulatory network analysis of the scRNA-seq data identified Creb5 as a potential tissue-specific transcription factor in the perimysial cells, and RNAi knockdown assays in palatal mesenchyme culture suggested that Creb5 is required for Fgf18 expression. Further studies identified a subtle deficiency in the LVP in Osr2-Cre;Fgf18fl/fl mice and showed that exogenous Fgf18 bead implantation in explants of E14 Osr2-Cre;Alk5fl/fl embryonic heads increased the MyoD+ myoblast population in the prospective LVP region. 
The authors concluded that TGF-beta signaling and Creb5 cooperatively regulate Fgf18 to control pharyngeal muscle development. While the study used multiple complementary approaches and the data presented are solid, important questions need to be addressed to resolve reasonable alternative explanations of the data to the authors' main conclusion.

      We thank the reviewer for the evaluation and suggestions. Responses to each of the suggested revisions are detailed below.

      Major points:

      1) TGF-beta signaling is known to be crucial for neural crest-derived palatal mesenchyme cell proliferation from E13.5 to E14.5. The Osr2-Cre;Alk5fl/fl mutant embryos exhibited obvious disruption of LVP myogenesis and reduced soft palatal shelf size at E14.5 (Fig3-Sup2A-D and Fig 4H-K). The cellular and molecular defects likely started prior to E14.5. Thus, it is important to examine at earlier stages (E13.5/E14.0) whether the palatal mesenchyme was already defective in cell proliferation/survival and/or perimysial cell marker expression, including Creb5 and Tbx15. This would resolve whether the primary defect in the Osr2-Cre;Alk5fl/fl palatal mesenchyme could be a reduction in perimysial progenitor cell proliferation and/or in differentiation of the myoblast-associated subset, for which Tbx15 and Fgf18hi act as marker genes rather than direct molecular targets. Furthermore, the apparent loss of Tbx15+ cells coincided with a specific reduction of Fgf18 expression in the myoblast-associated perimysial cells (Fig 4J/K versus Fig 5H-K), which raises the possibility that TGF-beta signaling regulates the differentiation of the Tbx15+ population from the mesenchymal progenitors, while the reduction in Fgf18 expression might be a secondary consequence of the cellular defect. The data in Fig 6O showing a lack of significant induction of Fgf18 expression in the palatal mesenchyme culture in both control and Creb5-RNAi cells are also consistent with this alternative explanation.

      We thank the reviewer for the valuable suggestion to identify the primary defects of the perimysial cells. We compared the expression of Creb5, Tbx15 and Fgf18 as well as Smoc2 in E13.5-E14.5 palatal mesenchyme from control and Osr2-Cre;Alk5fl/fl mice (Osr2Cre;Tgfbr1fl/fl mice). We found that expression of Creb5 is prominent from E13.5 to E14.5 and is not affected in Osr2Cre;Tgfbr1fl/fl mice, suggesting that Creb5 may not be a downstream factor but rather a “partner” for TGF-β signaling. At E13.5, Tbx15 is not expressed, while Smoc2 is expressed extensively in the palatal mesenchyme but is not affected in the Osr2Cre;Tgfbr1fl/fl mice. In contrast, Fgf18 is expressed as early as E13.5, and this expression was already reduced in the palate of Osr2Cre;Tgfbr1fl/fl mice relative to controls at this stage, suggesting that the changes in Fgf18 expression are primary and precede changes in the perimysial populations. While proliferation and apoptosis at E13.5 remain unchanged in Osr2Cre;Tgfbr1fl/fl mice, Smoc2 expression in the palate starts to be reduced at E14.0 in Osr2Cre;Tgfbr1fl/fl mice. This suggests that TGF-β signaling is required for the activation of Smoc2 during E13.5-E14.0. In parallel, Tbx15 expression is just starting to be activated in a few cells at E14.0, and this expression increased between E14.0 and E14.5 in the control but failed to increase in Osr2Cre;Tgfbr1fl/fl mice. This suggests that TGF-β signaling is also required for the activation of Tbx15 during E14.0-E14.5. Thus, loss of TGF-β signaling leads to differentiation defects of both Smoc2+ and Tbx15+ perimysial cells. For Figure 6O, we performed a time-course experiment of TGF-β induction and found a significant increase in Fgf18 expression after 4 to 18 hours of treatment (instead of the 24 hours used in previous experiments), with the most obvious changes at 4 hours, confirming the early response of Fgf18 expression to TGF-β induction. 
These results have been added to Figure 4-figure supplement 2, Figure 5I-L, 5U, Figure 6-figure supplement 2, and Figure 6C.

      2) Since the Osr2-Cre;Fgf18fl/fl mice exhibited much subtler palatal and LVP defects than the Osr2-Cre;Alk5fl/fl mice even though the latter still had a lot of Fgf18-expressing perimysial cells at E14.5, Fgf18 is likely a minor player in the TGF-beta mediated gene regulatory network regulating LVP formation. The major players acting downstream of TGF-beta signaling in the palatal mesenchyme, that control initial LVP progenitor migration to and/or proliferation in the soft palate region, remain to be identified and functionally validated. Whether and how Fgf18 directly regulates the perimysial-myoblast interaction is also not known.

      We agree with the reviewer that the phenotype of Osr2-Cre;Fgf18fl/fl mice is much milder than that of Osr2-Cre;Alk5fl/fl mice, which is consistent with our view that Fgf18 is just one of several perimysial-derived signals that may be affected. It will be of great interest to explore the function of other players in future studies. However, we are more inclined toward the possibility that there may be no single “major” player but rather a combination of many signals associated with different aspects of muscle development. For example, loss of Fgf18 seems to mainly affect Myf5+ cell proliferation in Osr2-Cre;Fgf18fl/fl mice (Osr2Cre;Fgf18fl/fl mice), as we do not observe any differentiation defect apart from the reduced muscle size. It is likely that other factors also play specific roles in specific subpopulations. To clarify whether Fgf18 can directly affect the myogenic cell fate, we treated C2C12 mouse myogenic cells with exogenous FGF18 and found that this treatment could indeed significantly increase the proliferation of these cells. We have added these results to Figure 7—figure supplement 2.

      3) While the title and the main conclusion of this manuscript imply a crucial role of Creb5 in the regulation of pharyngeal muscle development, there is no data supporting such a crucial role. Do Creb5-/- mice have specific defects in pharyngeal muscle development?

      We thank the reviewer for this insight. We agree that it is very likely that Creb5 itself may have many roles in the regulation of palatal development or pharyngeal muscle development, given the prominent expression of Creb5 throughout soft palate development and in other myogenic sites of the pharyngeal muscles. Creb5-/- mice (reported as Cre-bpa-/- mice) die immediately after birth; however, the detailed phenotype of these mice was merely described as “data not shown” in a previous publication, and defects of craniofacial development in these mice remain unclear (Maekawa et al., 2010). In this study, we focused on the role of Creb5 as a partner of TGF-β signaling, but we plan to generate a Creb5fl/fl mouse model to thoroughly evaluate Creb5’s functions in craniofacial development as an independent study following this work.

      4) Data in Fig 6 are not sufficient to conclude that TGF-beta signaling and Creb5 cooperatively regulate Fgf18. The TGFb1 treatment did not significantly induce Fgf18 expression in either the control or Creb5-RNAi palate mesenchyme cells (Fig 6O). No data regarding how they act cooperatively to regulate Fgf18 expression.

      We thank the reviewer for carefully reviewing our data. We re-evaluated the temporal response of Fgf18 expression following TGF-β induction and found a significant increase in Fgf18 expression 4 hours post-treatment (instead of 24 hours post-treatment as used in previous experiments). We repeated the Creb5-siRNA treatment experiment using the new experimental condition and replaced the previous Figure 6O with new results showing a significant increase in Fgf18 after TGF-β induction, which was attenuated by Creb5-RNAi treatment, suggesting a requirement of Creb5 for TGF-β-mediated Fgf18 expression. The new result is now included in Figure 6Q.

      Reviewer #3 (Public Review):

      In this study, the authors investigated cell-cell communication between perimysial cells and skeletal muscle progenitors during soft palate development in the mouse. The authors have previously reported on the development of this structure, and here they propose that TGF-β signaling and Creb5 act to regulate Fgf18, and that this pathway regulates pharyngeal muscle development through the indicated cell populations. The study is of high quality, very nicely illustrated, and uses multiple approaches including inferences from single-cell transcriptomics, validations on sections, and lineage-specific gene activations. In addition, the authors successfully optimized an organ culture system from thick sections to test locally the role of FGF signaling (bead implantation). The results largely concur with the conclusions and provide a valuable example of how subjacent cell populations cooperate to establish an embryonic structure.

      We thank the reviewer for the evaluation and suggestions.

    1. Author Response

      Reviewer #3 (Public Review):

      The PCNT gene is found on human chromosome 21, and the same group previously showed that its increased expression is associated with reduced trafficking to the centrosome and reduced cilia frequency, which suggests a possible connection between cilia and ciliary trafficking, SHH signaling, and Down syndrome phenotypes. Jewett et al build upon this prior work by closely examining the trafficking phenotypes in cellular models with different HSA21 ploidy, or its mouse equivalent, thereby increasing the copy number of PCNT (3 or 4 copies of HSA21). They show that most of the trafficking defects can be reversed through the knockdown of PCNT in the context of HSA21 polyploidy. They also begin to examine the in vivo consequences of these trafficking disruptions, using a mouse model (Dp10) that partially recapitulates trisomy 21, including an increased copy number of PCNT. While I think this work advances our understanding of the trafficking defects caused by increased PCNT and has significant implications for our understanding of the cellular basis of a major hereditary human disorder, some improvements can be made to strengthen the conclusions and improve readability.

      Major points:

      I'm a little confused by the authors' conclusion that the increased PCNT levels in T21 and Q21 result in delayed but not attenuated ciliogenesis. The data show lower percentages of ciliated cells at all time points analyzed (Fig 1E) by quite a large margin in both T21 and Q21. Do the frequencies of cilia in the T21 or Q21 cells ever reach the same level as D21, say after 48-72 hours? If not it seems like not simply a delay. A bit more clarity about this point is needed.

      We have now performed a ciliation time course in RPE1 D21, T21, and Q21 cells over 7 days. Our new data confirm that increasing HSA21 dosage delays but does not abolish ciliogenesis (Fig S1H). By day 3 of serum depletion, D21 and T21 cells reach similar ciliation frequencies, and after 4 days all three cell lines reach similar ciliation frequencies.

      The in vivo analysis of the cerebellum was interesting and important, but it felt a bit incomplete given that it provides the link between the cell biology and a specific DS-associated phenotype. For example, it is interesting that the EGL of the P4 Dp10 pups is thinner. Does this translate into noticeable defects in cerebellar morphology later? Is there a reduction in proliferation that follows the reduced cilia frequency? I think it would be possible to look at proliferation and cerebellar morphology at some additional stages without an overly burdensome set of experiments. At a minimum, are there defects in cerebellar morphology at P21 or in the adult mice? The authors allude to developmental delays in these animals; maybe that complicates the analysis? Either way, additional exploration and/or discussion on this point would help the paper.

      We have now analyzed P21 animals and found no significant differences in ciliation frequency or gross cerebellar morphology at this age. This is consistent with our new tissue culture data demonstrating that HSA21 ploidy delays but does not abolish ciliogenesis. We cannot rule out long term changes in neuronal processes or glial cells, but we believe this analysis is outside the scope of this paper.

      It was a bit unclear to me why specific cell lines were used to model trisomy 21 and why this changed partway through the paper. I understand the justification for making the Dp10 mice (to enable the in vivo analysis of the cerebellum), but some additional rationale for why the RPE cell line is initially used and then the switch back to mouse cells would improve readability.

      The rationale for switching to MEFs was twofold. First, Shh ciliary signaling cannot be easily studied in RPE1 cells. Therefore, assays of ciliary function, via Smoothened localization or GLI1 transcription, needed to be performed in a different cell line, and the most commonly used line is MEFs. Second, the Dp mice allowed us to tease apart contributions to cilia defects from separate regions of HSA21. We have worked to clarify this point in the text.

    1. Author Response

      Reviewer #2 (Public Review):

      Grasses develop morphologically unique stomata for efficient gas exchange. A key feature of stomata is the subsidiary cell (SC), which laterally flanks the guard cell (GC). Although it has been shown that the lateral SC contributes to rapid stomatal opening and closing, little is known about how the SC is generated from the subsidiary mother cell (SMC) and how the SMC acquires its intracellular polarity. The authors identified BdPOLAR as a polarity factor that forms a polarity domain in the SMC in a BdPAN1-dependent manner. They concluded that BdPAN1 and BdPOLAR exhibit mutually exclusive localization patterns within SMCs and that formative SC division requires both. Further mutant analysis showed that BdPAN1 and BdPOLAR act in SMC nuclear migration and the proper placement of the cortical division site marker BdTANGLED1, respectively. This study reveals a unique developmental process of grass stomata, where two opposing polarity factors form domains in the SMC and ensure asymmetric cell division and SC generation.

      The findings of this study, if further validated, are novel and interesting. However, I feel that the data presented in the current manuscript do not fully support some crucial conclusions. The lack of dual-color images is the weakest point of this study. If it is technically impossible to add them, alternative analyses are needed to validate the main conclusions.

      1) Is BdPOLAR-mVenus functional? Although the authors interpret weak BdPOLAR-mVenus expression as partially rescuing the bdpolar mutant phenotype in Fig. S4D, the localization pattern visualized by BdPOLAR-mVenus may not be completely reliable given this partial rescue activity.

      This is indeed a valid point. The partial complementation of weakly expressing translational reporters (Figure 3–figure supplement 1D) and the weak effect of BdPOLAR-mVenus overexpression lines (Figure 3–figure supplement 1J) at least suggest partial functionality, which is strongly dependent on dosage. Yet the localization pattern and the temporal dynamics might indeed not fully reflect the spatiotemporal dynamics of the endogenous BdPOLAR. This criticism is, however, true for any transgenic reporter line, even when fully complementing, as the requirement for dosage, stability, and turnover likely varies strongly between different protein classes and functions.

      Nonetheless, we have added a sentence on p. 7, which mentions this potential caveat.

      2) Regardless of the functionality of the tagged protein, the authors need to provide more information on their localization. For example, is there a difference in polarity pattern depending on expression level? Does overexpressed BdPOLAR-mVenus invade the BdPAN1 zone? In such cases, might the loss of BdPOLAR polarity in the bdpan1 mutant be a side effect of overexpression, not PAN1 exclusion? Does BdPOLAR expression (no tag) show a dose-dependent effect, similar to the mVenus-tagged protein?

      The difference in polarity patterns in bdpan1 mutants and wild-type does not depend on expression level. BdPOLAR-mVenus was crossed into bdpan1 and mutant and wild-type siblings in the F2 generation were analyzed. This means that the data presented in Fig. 3E and F show exactly the same transgene insertion line in wt and bdpan1 and were imaged with the same setting for comparability. Therefore, the difference in localization is not due to different expression levels but indeed reflects a PAN1-dependent effect.

      To address if BdPOLAR without a tag is also sensitive to dosage, we have generated an untagged complementation line that includes the untagged, genomic locus of BdPOLAR including promoter (-3.1kb) and terminator (+1.1kb). Yet, even though this construct is much better at rescuing the mutant, we still see remaining defects in T0 lines (Figure 3–figure supplement 1K) suggesting that even without a tag we cannot fully recapitulate wild-type functionality. Yet, to actually measure protein levels of untagged BdPOLAR, we would need to raise an antibody against BdPOLAR, which we think is clearly out of the scope of this study.

      3) A major conclusion of this study was that the polarity domains of BdPOLAR and BdPAN1 are mutually exclusive. However, not all the cells in the figures were consistent with this statement. For example, the BdPOLAR signals at the GMC/SMC interface appear to match BdPAN1 localization (compare 0:03 s in Video 1 and 0:20 s in Video 2 [top cell]). The 3D rendered image in Fig. 2F shows that BdPOLAR is excluded near the GMC on the front side of the SMC, where BdPAN1 is not localized. Some cells did not exhibit polarization (Fig. 3A, bottom left; Fig. 3E, bottom left). The most convincing data would be dual-color images of these two proteins. Otherwise, a sophisticated image analysis is required to support this conclusion.

      We agree that dual-color image analysis would have provided the most convincing data. As mentioned in our answers to the reviewing editor and reviewer 1, we have generated a dual marker line (BdPAN1p:BdPAN1-CFP; BdPOLARp:BdPOLAR-mCitrine), yet the BdPAN1-CFP signal (compared to mCitrine signal) was too weak to visualize the proximal BdPAN1 domain.

      This issue was also raised by reviewer 1 and deemed an essential revision. To determine how BdPOLAR and BdPAN1 relate spatially to each other, we have added data in Figure 2E where we manually traced mature SMC outlines to determine BdPOLAR-mVenus and BdPAN1-mCitrine occupancy along the SMC’s circumference. This confirmed that the polarization is indeed opposite yet not perfectly reciprocal (see details above, Essential Revisions #1).

      Finally, we realized that the 3D image renderings were more confusing than helpful and we removed them from the revised version.

      4) Another central conclusion was that BdPOLAR was excluded at the future SC division site, marked with BdTANGLED1. However, these data are also not very convincing, as such specific exclusion cannot be seen in some figure panels (e.g., Fig. 3A, bottom left; Fig. 3E, all three cells on the left). If dual-color imaging is not feasible, a quantitative image analysis is needed to support this conclusion.

      As for point 3, this was also criticized by reviewer 1 and deemed an essential revision by the reviewing editor.

      To determine whether the absence of BdPOLAR signal and the presence of BdTAN1 signal colocalize, we again manually traced mature SMC outlines to determine BdPOLAR-mVenus and BdTAN1-mCitrine occupancy along the SMC’s circumference. We plotted the relative average fluorescence intensity in Figure 4G-I nicely showing that BdTAN1 indeed resides in the BdPOLAR gaps above and below the GMC (again, details above, Essential Revisions #2).

      5) I could not find detailed imaging conditions and data processing methods. Are Figs. 2B and 2E max-projection or single-plane images? If they are single-plane images, which planes of the SMC are observed? In addition, how were Figs. 2C and 2F rendered? (e.g., number of images, distance intervals, processing procedures). This information is important for data interpretations.

      We agree that we might not have provided sufficient imaging condition details and have added more details regarding image acquisition in the method part (p. 20). We always use a consistent depth and show the midplane of SMCs. As mentioned above, we removed Figs. 2C and 2F and the supplemental movies as these data did not seem to be helpful.

      6) [Minor point] The authors should clearly describe where BdPAN1 is expressed and localized. Is it expressed in the GMC and localized at the GMC/SMC interface? Alternatively, is it expressed and localized in the SMC?

      BdPAN1 is expressed throughout the epidermis but starts to strongly accumulate at the GMC/SMC interface. According to the literature (Cartwright et al 2009 with immunostainings against ZmPAN1 and Sutimantanapi et al. 2014 with PAN1 and PAN2 reporter) and our own observations (Fig. S3), this accumulation occurs in the SMC rather than in the GMC. In Fig. S3A, third panel, second GMC from the top, for example, one can see that the early PAN1 polarity domain expands beyond the GMC/SMC interface suggesting that it is indeed forming in SMCs rather than in GMCs. We have specified this in the text more clearly now (p. 5).

    1. Author Response

      Reviewer #1 (Public Review):

      The research investigates the genetic basis for resistance to high CO2 levels in the human pathogenic fungus Cryptococcus neoformans. Screening collections of over 5,000 gene deletion strains revealed 96 with impaired growth, including a set of genes all related to the same RAM signaling pathway. Further genetic dissection compellingly placed this pathway relative to upstream inputs and, through the isolation of suppressor mutants, identified potential downstream targets of the pathway. Given the high levels of CO2 encountered by fungi in the human host, this work may provide new directions for the control of disseminated fungal disease.

      The research presents both strengths and weaknesses.

      Strengths include:

      (1) One of the largest scale analyses of genes involved in growth under high CO2 concentrations in a fungus, revealing a set of just under 100 mutants with impaired growth.

      (2) Elegant genetic epistasis analysis to show where different components fit within a pathway of transmission of CO2 exposure. For example, overexpression of one of the kinases, Cbk1, can overcome the CO2 sensitivity of mutations in the CDC24 or CNA1 genes (but not in the reciprocal overexpression direction).

      (3) Isolation of suppressor mutations in the cbk1 background, now able to grow at high CO2 levels, led to the identification of two genes. Follow-up characterization, which included examining in vitro phenotypes, gene expression analysis, and impact during mouse infection, revealed that the two suppressors restore a subset of the phenotypes impacted by mutation of CBK1. Indeed, one conclusion from this careful work is that the reduced virulence of the cbk1 mutant is not due to its sensitivity to high levels of CO2, perhaps an unexpected finding given the original goal of the study of linking CO2 sensitivity with decreased virulence.

      Weaknesses include:

      (1) What is the rationale for examining gene expression using the NanoString technology of 118 genes rather than a more genome-wide approach such as RNA-sequencing?

      (2) Without additional species examined, some of the conclusions about differences in impact between ascomycetes and basidiomycetes might instead reflect differences between species. For example, RAM mutants in other strains of C. neoformans do not exhibit so strong a temperature sensitive phenotype. Or to extend the comparison further, one might assume given the use of CO2 for Drosophila manipulations that the RAM pathway components in an insect would not be required for surviving high CO2.

      (3) Given the relative ease of generating progeny of this species, it would have been informative to explore whether the suppressors of cbk1 also suppressed the loss of genes like CDC24, CNA1, etc., equivalent to the experiment performed with overexpression of CBK1 in those backgrounds.

      We thank the reviewer for the kind summary of our work and the highlights of the major findings. We chose NanoString because we have already generated a probe set of 118 genes that are differentially expressed in response to CO2 based on RNA-seq profiles of multiple natural cryptococcal isolates in a separate study. Nanostring allowed us to focus on CO2 relevant transcripts and do multiple replicates and conditions in a way that is not practical using RNA-Seq.

      Although the RAM pathway has not been extensively characterized in different species of Cryptococcus, we do know that RAM pathway mutants lead to pseudohyphal growth in multiple strain backgrounds including two different species of Cryptococcus (Magditch, Liu, Xue, & Idnurm, 2012; Walton, Heitman, & Idnurm, 2006). We have added corresponding references and discussed this point on lines 167-169.

      We agree with the reviewer that it would be interesting to test the effects of the cbk1Δ suppressor mutations in the backgrounds of other CO2-sensitive gene knockout strains. This is part of our plan for future investigation in characterizing the signaling pathways involved in CO2 tolerance.

      Reviewer #2 (Public Review):

      In the paper by Chadwick et al., the authors identify the molecular determinants of CO2 tolerance in the human fungal pathogen Cryptococcus neoformans. The authors have screened a collection of deletion mutants to identify the genes that are required for growth at 37°C (host temperature) and elevated CO2 levels. The authors found that the genes responsible for CO2 sensitivity are involved in pathways responsible for thermotolerance, such as the Calcineurin, Ras1-Cdc24, cell wall integrity, and Regulator of Ace2 and Morphogenesis (RAM) pathways. Moreover, they found that mutants of the RAM pathway effector kinase Cbk1 were the most sensitive to elevated temperature and CO2 levels. This study uncovers the previously unknown role of the RAM pathway in CO2 tolerance. Transcriptome data indicate that the deletion of CBK1 results in an alteration in the expression of CO2-related genes. To identify potential downstream targets of Cbk1, the authors performed a suppressor screen and obtained spontaneous suppressor mutants that rescued the sensitivity of cbk1 mutants to elevated temperature and CO2. Through this screen, the authors identified two suppressor groups that showed a modest improvement in growth at 37˚C and in the presence of CO2.

      Interestingly, from the suppressor screen, the authors identified a previously known interactor of Cbk1 which is SSD1, and an uncharacterized gene containing a putative Poly(A)-specific ribonuclease (PARN) domain named PSC1 (Partial Suppressor of cbk1Δ) which acts downstream of Cbk1. Deletion of these two genes in cbk1 null mutants rescued the sensitivity to elevated CO2 levels and temperature but did not fully rescue the ability to cause disease in mice.

      This study highlights the underappreciated role of the host CO2 tolerance and its importance in the ability of a fungal pathogen to survive and cause disease in host conditions. The authors claim to gain insight into the genetic components associated with carbon dioxide tolerance. The experimental results including the data presented, and conclusions drawn do justice to this claim. Overall, it is a well-written manuscript. However, some sections need improvement in terms of clarity and experimental design.

      • One major drawback of the study is the virulence assay performed to test the ability of cbk1 mutants to cause disease in the mouse model. The cbk1 null mutants are thermosensitive. Because these mutants may not survive at the host body temperature, virulence assays in mice cannot distinguish a defect in the mutants' ability to infect from their inability to tolerate that temperature.

      • The rationale for choosing the genes to test further is not clear in two instances in the study. a) From a list of 96 genes, how do the authors infer the pathways involved? Was any pathway analysis performed that helped them shortlist the pathways that they subsequently tested? A GO term analysis of the list of genes identified through the genetic screen would be helpful to give an overview of the pathways involved in CO2 tolerance. b) The authors do not clearly explain why they chose only four of the 16 downregulated genes identified from the NanoString analysis to test for CO2 sensitivity.

      • It would be more useful to the readers if the authors could also include a thorough analysis of the presence of the putative PARN domain-containing protein across various fungal species rather than mentioning that it is only observed in C. neoformans and S. pombe. Also, the authors may want to discuss the known role(s) of SSD1, if any, in pathogenic ascomycetous yeasts so that the proposed functional divergence is supported further.

      We are glad that the reviewer appreciated the approach, the findings, and the significance of this research, and we are grateful for the helpful suggestions to improve the manuscript.

      To remove temperature sensitivity as a variable when testing virulence, we have added a new infection model in the revised manuscript to test the cbk1Δ mutant and its suppressors. This infection model uses Galleria mellonella larvae as the host. G. mellonella larvae are commonly used to test the virulence of temperature-sensitive strains because the body temperature of the larvae is similar to that of the environment. We performed cryptococcal infection in this model, and the larvae were kept at 30°C rather than at 37°C. The results of these experiments are now described in results section 5 and shown in Figure 6 of the manuscript. The data from the larva infection model support our original conclusion about the virulence of these strains observed in mouse models.

      We performed a GO term analysis of the hits from our screening, but did not find any significant or outstanding pathways. From our list of 96 genes, we chose to focus on the RAM pathway because the mutants were among the most sensitive to CO2. We have added an explanation for the genes we decided to test for host CO2 level sensitivity from the 16 downregulated genes on lines 139-141.

      Through BLAST searches, we have found that the PARN domain-containing protein has homologs in other basidiomycetes. There might be homologs in a few zygomycetes and ascomycetes, but the confidence scores were so low that we deemed them unlikely. We now report this in the manuscript on lines 210-213, “This domain was previously reported to be found in S. pombe (Marasovic, Zocco, & Halic, 2013). Interestingly, through a Blast search of the PARN domain, we did not identify this domain in the genomes of S. cerevisiae, C. albicans or other ascomycetes, but found it in Basidiomycetes and higher eukaryotes”.

      Ssd1 has been studied in the pathogenic yeast Candida albicans and is also regulated by Cbk1 in this organism. We have added a discussion about possible functions of Ssd1 in C. neoformans based on references to studies in C. albicans in the discussion section on lines 401-408. “In C. albicans, Ssd1 plays an important role in polarized growth and hyphal initiation by negatively regulating the transcription factor Nrg1 (H. J. Lee, Kim, Kang, Yang, & Kim, 2015). The observation that cbk1Δpsc1Δ and cbk1Δssd1Δ suppressor mutants partially rescue cell separation defects or depolarized growth suggests that C. neoformans may primarily utilize Ssd1/Psc1 rather than a potential Ace2 homolog to regulate cell separation or polarization. Differential regulation of target mRNA transcripts by Ssd1 and Psc1 may explain the functional divergence of the RAM pathway we observed here between basidiomycete Cryptococcus and the ascomycete yeasts.”

      Reviewer #3 (Public Review):

      In this work, the authors identify genes and pathways important for CO2 tolerance and thermotolerance in Cryptococcus neoformans. They additionally rule out the contribution of the bicarbonate- or cAMP-dependent activation of adenylyl cyclase to this pathway, which is important for CO2 sensing in other fungi, further solidifying the need to characterize CO2 sensing in basidiomycetes. The authors establish the importance of focusing on CO2 tolerance by testing the impact of CO2 on fluconazole susceptibility at varied pH, suggesting that CO2 can sensitize cryptococcal cells to fluconazole. Furthermore, the authors compared the CO2 tolerance of clinical reference strains to that of environmental isolates. The characterization of the RAM pathway kinase Cbk1 illustrated the integration of multiple stress signaling pathways. By using a series of CBK1OE insertions in strains with deletions in other pathways, the ability of Cbk1 overexpression to rescue several strains from CO2 sensitivity became apparent. Additionally, NanoString expression analysis comparing cbk1∆ to H99 validated the authors' screen of CO2-sensitive mutants, as 16 of the 57 downregulated genes were also found in their screen, further confirming the interconnected nature of these pathways. The importance of the RAM pathway in maintaining CO2 tolerance and thermotolerance was also incredibly clear.

      Perhaps most interestingly, the authors identify suppressor colonies with distinctive phenotypes that allowed the characterization of downstream effectors of the RAM pathway. These suppressor colonies were found to have mutations in SSD1 and PSC1, which partially restore growth at 37°C with CO2 exposure. Further confirming the importance of the RAM pathway, the cbk1∆ strain had markedly attenuated virulence during infection. Interestingly, the suppressor strains had varying impacts on fungal infection in vivo. While the sup1 suppressor was completely cleared from the lungs during both intranasal and IV infection, the sup2 strain, containing mutations in SSD1, maintained a high fungal load in the lungs and was able to disseminate into host tissues during IV infection but not intranasal infection.

      The authors make a strong case for exploring thermotolerance and CO2 tolerance as contributors to virulence. Through screening and characterization of the ability of the RAM pathway kinase CBK1 to rescue other mutants from CO2 sensitivity, the overlapping contributions of several signaling pathways and the importance of this kinase were revealed. This work is important and will be valuable to the field. However, the cbk1∆ strain does show reduced melanization, reduced urease secretion, and higher sensitivity to the cell wall stressor Congo Red in SI Appendix, Figure S4. While the authors make a strong argument that these well-established virulence factors are not perfect predictors of virulence in vivo, the cbk1∆ strain is not an example of such a case, as it does have defects in these important factors in addition to thermotolerance and CO2 tolerance. Not acknowledging the changes in these virulence factors in the cbk1∆ strain and their potential contribution to the observed phenotypes is a weakness of the manuscript. Interestingly, the sup1 and sup2 strains also rescue these virulence factors compared to cbk1∆. Additionally, the assertion that "the observation that only sup2 can survive, amplify, and persist in animals stresses the importance of CO2 tolerance in cryptococcal pathogens", which rests on the slightly higher CO2 tolerance of sup2 compared to sup1, could be better supported by the data. These suppressors did not restore transcript abundances of the differentially expressed genes to WT levels, suggesting post-transcriptional regulation. However, sup2 may be better able than sup1 to resist stress, especially given the known Ssd1 repression of transcript translation in S. cerevisiae. Furthermore, pH appears to affect the sensitivity of the sup1 and sup2 strains to CO2 in SI Appendix Figure 4; this could be better explained and interrogated in the manuscript. Finally, this work includes a variety of genes in several signaling pathways. The paper would be greatly clarified by a graphical abstract indicating how CBK1 may integrate these pathways, or by indicating which genes belong to which pathways in the Figure 1 legend to make this figure easier to follow.

      We thank the reviewer for the thorough summary of the study. We appreciate the reviewer’s enthusiasm about this study as well as the constructive critiques of the manuscript. Indeed, the suppressor mutations in the cbk1Δ mutant rescue more phenotypes of cbk1Δ in vitro than just thermotolerance and CO2 tolerance (Supplemental Figure 5), which could benefit the survival of these suppressor strains in vivo compared to the original cbk1Δ mutant. However, between the sup1 and sup2 mutants, the only clear difference in growth we observed was at host levels of CO2 and temperature. There was no obvious difference in their resistance to Congo red (cell wall stress), melanization, susceptibility to FK506 (calcineurin pathway inhibitor), sensitivity to H2O2 (ROS), or urease (Supplemental Figure 5). Nonetheless, we agree with the reviewer that other factors may influence the outcome in vivo, given that the host environment is more complex than we know. We have changed our wording in the manuscript to make it clear that the contribution of better CO2 tolerance to the better survival of the sup2 mutant is only our hypothesis and that there could be other unrecognized contributing factors. “The only in vitro difference observed between sup1 and sup2 was better growth of sup2 at host CO2 levels which may explain the difference in their ability to propagate and persist in the mouse lungs. However, due to the complexity of the host environment, there could be other unrecognized factors contributing to their growth difference in vivo.” (Lines 276-279).

      About growth at different pH levels: C. neoformans tends to grow better at lower pH, closer to pH 5. This fungus can grow at pH 3, the lowest pH that our lab has tested (it may be able to sustain viability even at pH 2 based on others’ conference presentations). The combination of high temperature and CO2 with neutral or high pH likely worsens the growth of both H99 and the mutants tested.

      We tried to build a model integrating all the pathways and factors identified in this work, as the reviewer suggested. However, in this process, we found it difficult to propose one. Although the current findings clearly demonstrate the importance of Cbk1 in thermotolerance and CO2 tolerance (overexpression of CBK1 can partially restore thermotolerance and/or CO2 tolerance in mutants defective in the cell wall integrity pathway, the calcineurin pathway, or the Cdc24-Ras1 pathway, whereas the reciprocal overexpression of these genes in the cbk1∆ mutant does not rescue any of the cbk1∆ mutant’s defects), we do not know the exact mechanisms underlying this phenomenon. Do these pathways directly interact with Cbk1, affect its phosphorylation status, or alter its subcellular localization? Or do these pathways act through other messengers to indirectly activate Cbk1, or perhaps Cbk1’s downstream targets? These questions warrant further investigation. To be prudent, we think it is better not to propose a model at this point given the uncertainty about the mechanism. The mutants belonging to each pathway are now clearly specified in the text of the revised manuscript to help orient readers. For example, “As the RAM pathway effector kinase mutant cbk1Δ showed the most severe defect in thermotolerance and CO2 tolerance compared to the mutants of the other pathways, we first overexpressed the gene CBK1 in the following mutants, cdc24∆ (Ras1-Cdc24), mpk1∆ (CWI), cna1∆ (Calcineurin), and the cbk1Δ mutant itself, and observed their growth at host temperature and host CO2 (Figure 2B)...”

    1. Author Response

      Public Evaluation Summary:

      The authors re-analyzed a previously published dataset and identified patterns suggesting that increased bacterial biodiversity in the gut may create new niches that lead to gene loss in a focal species and promote the generation of more diversity. Two limitations are (i) that sequencing depth may not be sufficient to analyze strain-level diversity and (ii) that the evidence is exclusively based on correlations, and the observed patterns could also be explained by other eco-evolutionary processes. The claims should be supported by a more detailed analysis, and alternative hypotheses that the results do not fully exclude should be discussed. Understanding the drivers of diversity in natural microbial communities is an important question of central interest to biomedically oriented microbiome scientists, microbial ecologists, and evolutionary biologists.

      We agree that understanding the drivers of diversity in natural communities is an important and challenging question to address. We believe that our analysis of metagenomes from gut microbiomes is complementary to controlled laboratory experiments and modeling studies. While these other studies are better able to establish causal relationships, we rely on correlations, a caveat we make clear, and we offer different mechanistic explanations for the patterns we observe.

      We also mention the caveat that we are only able to measure sub-species genetic diversity in relatively abundant species with high sequencing depth in metagenomes. These relatively abundant species include dozens of species in two metagenomic datasets, and we see no reason why our results would not generalize to other members of the microbiome. Nonetheless, further work will be required to extend our results to rarer species.

      Our revised manuscript includes two major new analyses. First, we extend the analysis of within-species nucleotide diversity to non-synonymous sites, with generally similar results. This suggests that evolutionarily older, less selectively constrained synonymous mutations and more recent non-synonymous mutations that affect protein structure both track similarly with measures of community diversity – with some subtle differences described in the manuscript.

      Second, we extend our analysis of dense time series data from one individual stool donor and one deeply covered species (B. vulgatus) to four donors and 15 species. This allowed us to reinforce the pattern of gene loss in more diverse communities with greater statistical support. Our correlational results are broadly consistent with the predictions of DBD from modeling and experimental studies, and they open up new lines of inquiry for microbiome scientists, ecologists, and evolutionary biologists.

      Reviewer #1 (Public Review):

      This paper makes an important contribution to the current debate on whether the diversity of a microbial community has a positive or negative effect on its own diversity at a later time point. In my view, the main contribution is linking the diversity-begets-diversity patterns, already observed by the same authors and others, to genomic signatures of gene loss that would be expected from the Black Queen Hypothesis, establishing an eco-evolutionary link. In addition, they test this hypothesis at a more fine-grained scale (strain-level variation and SNP) and do so in human microbiome data, which adds relevance from the biomedical standpoint. The paper is a well-written and rigorous analysis using state-of-the-art methods, and the results suggest multiple new experiments and testable hypotheses (see below), which is a very valuable contribution.

      We thank the reviewer for their generous comments.

      That being said, I do have some concerns that I believe should be addressed. First of all, I am wondering whether gene loss could also occur because of environmental selection that is independent of other organisms or the diversity of the community. An alternative hypothesis to the Black Queen is that there might have been a migration of new species from outside and then loss of genes could have occurred because of the nature of the abiotic environment in the new host, without relationship to the community diversity. Telling the difference between these two hypotheses is hard and would require extensive additional experiments, which I don't think is necessary. But I do think the authors should acknowledge and discuss this alternative possibility and adjust the wording of their claims accordingly.

      We concur with the reviewer that the drivers of the correlation between community diversity and gene loss are unclear. Therefore, we have now added the following text to the Discussion:

      “Here we report that genome reduction in the gut is higher in more diverse gut communities. This could be due to de novo gene loss, preferential establishment of migrant strains encoding fewer genes, or a combination of the two. The mechanisms underlying this correlation remain unclear and could be due to biotic interactions – including metabolic cross-feeding as posited by some models (Estrela et al., 2022; San Roman and Wagner, 2021, 2018) but not others (Good and Rosenfeld, 2022) – or due to unknown abiotic drivers of both community diversity and gene loss.”

      Additionally, we have revised Figure 1 to show that strain invasions/replacements, in addition to evolutionary change, could be an important driver of changes in intra-species diversity in the microbiome.

      Another issue is that gene loss is happening in some of the most abundant species in the gut. Under the Black Queen Hypothesis, though, we would expect these species to be the most likely "donors" in cross-feeding interactions. The authors should also discuss the implications, limitations, and possible alternative hypotheses of this result, which I think also stimulates future work and experiments.

      We thank the reviewer for raising this point. It is unclear to us whether the more abundant species would be donors in cross-feeding interactions. If we understand correctly, the reviewer is suggesting that more abundant donors will contribute more total biomass of shared metabolites to the community. This idea makes sense under the assumption that the abundant species are involved in cross-feeding interactions in the first place, which may or may not be the case. As our work heavily relies on a dataset that we previously analyzed (HMP), we wish to cite Figure S20 in Garud, Good et al. 2019 PLoS Biology in which we found there are comparable rates of gene changes across the ~30 most abundant species analyzed in the HMP. This suggests that among the most abundant species analyzed, there is no relationship between their abundance and gene change rate.

      That being said, we acknowledge that our study is limited to the relatively abundant focal species and state now in the Discussion: “Deeper or more targeted sequencing may permit us to determine whether the same patterns hold for rarer members of the microbiome.”

      Regarding Figure 5B, there are a couple of questions I believe the authors should clarify. First, how is it possible that many species have close to 0 pathways? Second, besides the overall negative correlation, the data show some very conspicuous regularities, e.g. many different "lines" of points with identical negative slope but different intercepts. My guess is that this is due to some constraints in the pathway detection methods, but I struggle to understand it. I think the authors should discuss these patterns in more detail.

      We sincerely thank the reviewer for raising this issue, as it prompted us to investigate more deeply the patterns observed at the pathway level. In short, we decided to remove this analysis from the paper because of a number of bioinformatics issues that we realized were contributing to the signal. However, in support of BQH-like mechanisms at play, we do find evidence for gene loss in more diverse communities across multiple species in both the HMP and Poyet datasets. Below we detail our investigation into Figure 5B and how we arrived at the conclusion that it should be removed:

      (1) Regarding data points in Figure 5B where many focal species have “zero pathways”, we first clarify how we compute pathway presence and richness. Pathway abundance data per species were downloaded from the HMP1-2 database, and these pathway abundances were computed using HUMAnN (HMP Unified Metabolic Analysis Network). According to HUMAnN documentation, pathway abundance is proportional to the number of complete copies of the pathway in the community; this means that if at least one component reaction in a certain pathway is missing coverage (for a sample-species pair), the pathway abundance may be zero (note that HUMAnN also employs “gap filling” to allow no more than one required reaction to have zero abundance). As such, it is likely that insufficient coverage, especially for low-abundance species, causes many pathways to report zero abundance in many species in many samples. Indeed, 556 of the 649 species considered had zero “present” pathways (i.e. having nonzero abundance) in at least 400 of the 469 samples (see figure below).

      (2) We thank the reviewer for pointing out the “conspicuous regularities” in Figure 5B, particularly the “parallel lines” of data points, which we discovered are an artifact of the flawed way in which we computed “community pathway richness [excluding the focal species].” Each diagonal line of points corresponds to different species in the same sample, and because community pathway richness is computed as the total number of pathways [across all species in the sample] minus the number of pathways in the focal species, the current Figure 5B is really plotting y against X-y for each sample (where X is a sample’s total community pathway richness, and y is the pathway richness of an individual species in that sample). This computation fails to account for the possibility that a pathway in an excluded focal species will still be present in the community due to redundancy, and indeed BQH tests for whether this redundancy is kept low in diverse communities due to mechanisms such as gene loss.

      We attempted to instead plot community pathway richness defined as the number of unique pathways covered by all species other than the focal species. This is equivalent to [number of unique pathways across all species in a sample] minus the [number of pathways that are ONLY present in the focal species and not any other species in the sample]. However, when we recomputed community pathway richness this way, we found that it is rare for a pathway to be present in only one species in a sample. Moreover, we find that with the exception of E. coli, focal species pathway richness tended to be very similar across the 469 samples, often reaching an upper limit of focal species pathway richness observed. (It is unclear to what extent lower pathway richnesses are due to low species abundance/low sample coverage versus gene loss). This new plot reveals even more regularities and is difficult to interpret with respect to BQH. (Note that points are colored by species; the cluster of black dots with outlying high focal pathway richness corresponds to the “unclassified” stratum, which can be considered a group of many different species.)
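      To make the artifact concrete, here is a minimal Python sketch contrasting the two definitions of community pathway richness, assuming each sample is represented as a mapping from species name to the set of pathways detected as present in that species. The function names, species labels, and pathway labels are hypothetical placeholders, not HMP1-2 data or HUMAnN output. Under the original definition, focal richness y plus community richness X - y always equals the sample total X, which is why points from one sample fall on a line of slope -1.

```python
def total_richness(sample: dict[str, set[str]]) -> int:
    """X: number of unique pathways present across all species in the sample."""
    return len(set().union(*sample.values()))


def flawed_community_richness(sample: dict[str, set[str]], focal: str) -> int:
    """Original definition: X minus the focal species' own pathway count (X - y).

    Ignores redundancy: a pathway shared by the focal species and another
    species is still subtracted, even though the community retains it.
    """
    return total_richness(sample) - len(sample[focal])


def corrected_community_richness(sample: dict[str, set[str]], focal: str) -> int:
    """Unique pathways covered by all species other than the focal species.

    Equivalent to X minus the pathways found ONLY in the focal species.
    """
    others: set[str] = set()
    for species, pathways in sample.items():
        if species != focal:
            others |= pathways
    return len(others)


# Hypothetical toy sample (placeholder names, not real data).
sample = {
    "species_A": {"p1", "p2", "p3"},
    "species_B": {"p1", "p2"},
    "species_C": {"p2", "p4"},
}

for focal in sample:
    y = len(sample[focal])
    print(focal, y,
          flawed_community_richness(sample, focal),
          corrected_community_richness(sample, focal))
```

      In this toy sample the flawed values always sum with y to the same total, while the corrected values barely change from one focal species to the next because most pathways are shared, mirroring the redundancy problem described above.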

      Overall, because community pathway richness (excluding a focal species) seems to primarily vary with sample rather than focal species in this dataset when using the most simple/strict definition of community pathway richness as described above, it is difficult to probe the Black Queen Hypothesis using a plot like Figure 5B. As pointed out by reviewers, lack of sequencing depth to analyze strain-level diversity and accurately quantify pathway abundance, irrespective of species abundance, seems to be a major barrier to this analysis. As such, we have decided to remove Figure 5B from the paper and rewrite some of our conclusions accordingly.

      Finally, I also have some conceptual concerns regarding the genomic analysis. Namely, genes can be used for biosynthesis of, e.g., building blocks, but also for consumption of nutrients. Under the Black Queen Hypothesis, we would expect the adaptive loss of biosynthetic genes, as those nutrients become provided by the community. However, for catabolic genes or pathways, I would expect the opposite pattern, i.e. the gain of catabolic genes that would allow taking advantage of a richer environment resulting from a more diverse community (or at least, the absence of pathway loss). These two opposing forces for catabolic and biosynthetic genes/pathways might obscure the trends if all genes are pooled together for the analysis. I believe this can be easily checked with the data the authors already have, and could allow the authors to discuss in more detail the functional implications of the trends they see and possibly even make a stronger case for their claims.

      We thank the reviewer for their suggestion. As explained above, we have removed the pathway analysis from the paper due to technical reasons. However, we did investigate catabolic and biosynthetic pathways separately as suggested by the reviewer as we describe below:

      We obtained subsets of biosynthetic pathways and catabolic pathways by searching for keywords (such as “degradation” for catabolic) in the MetaCyc pathway database. After excluding the “unclassified” species stratum, we observe a total of 279 biosynthetic and 167 catabolic pathways present in the HMP1-2 pathway abundance dataset. Using the corrected definition of community pathway richness excluding a focal species, for each pathway type—either biosynthetic or catabolic—we plotted focal species pathway richness against community pathway richness including all pathways regardless of type:

      We observe the same problem where, within a sample, community pathway richness excluding the focal species hardly varies no matter which focal species it is, due to nearly all of its detected pathways being present in at least one other species; this makes the plots difficult to interpret.
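      The keyword-based split of pathways into biosynthetic and catabolic subsets can be sketched as follows. Only “degradation” as a catabolic keyword comes from the text; “biosynthesis” as the biosynthetic keyword, the handling of names that match both or neither keyword, and the example pathway names are assumptions made for illustration.

```python
CATABOLIC_KEYWORDS = ("degradation",)      # keyword named in the text
BIOSYNTHETIC_KEYWORDS = ("biosynthesis",)  # assumed keyword, for illustration only


def classify_pathway(name: str) -> str:
    """Label a pathway name as 'catabolic', 'biosynthetic', or 'other' by keyword."""
    lowered = name.lower()
    catabolic = any(k in lowered for k in CATABOLIC_KEYWORDS)
    biosynthetic = any(k in lowered for k in BIOSYNTHETIC_KEYWORDS)
    if catabolic and not biosynthetic:
        return "catabolic"
    if biosynthetic and not catabolic:
        return "biosynthetic"
    # Names matching neither keyword, or both, are left unclassified.
    return "other"


def split_by_type(pathway_names: list[str]) -> dict[str, list[str]]:
    """Group pathway names by their keyword-based label."""
    groups: dict[str, list[str]] = {"catabolic": [], "biosynthetic": [], "other": []}
    for name in pathway_names:
        groups[classify_pathway(name)].append(name)
    return groups


# Illustrative names, not taken from the HMP1-2 dataset.
print(split_by_type(["L-histidine biosynthesis", "glycogen degradation I", "TCA cycle I"]))
```

      Simple substring matching is transparent but coarse: names that use other wording (for example, "utilization" or "fermentation") would fall into "other" unless the keyword lists are extended.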

      Reviewer #2 (Public Review):

      The authors re-analysed two previously published metagenomic datasets to test how diversity at the community level is associated with diversity at the strain level in the human gut microbiota. The overall idea was to test if the observed patterns would be in agreement with the "diversity begets diversity" (DBD) model, which states that more diversity creates more niches and thereby promotes further increase of diversity (here measured at the strain-level). The authors have previously shown evidence for DBD in microbiomes using a similar approach but focusing on 16S rRNA level diversity (which does not provide strain-level insights) and on microbiomes from diverse environments.

      One of the datasets analysed here is a subset of a cross-sectional cohort from the Human Microbiome Project. The other dataset comes from a single individual sampled longitudinally over 18 months. This second dataset allowed the authors not only to assess the links between different levels of diversity at single timepoints, but also to test whether high diversity at a given timepoint is associated with increased strain-level diversity at future timepoints.

      Understanding eco-evolutionary dynamics of diversity in natural microbial communities is an important question that remains challenging to address. The paper is well-written and the detailed description of the methodological approaches and statistical analyses is exemplary. Most of the analyses carried out in this study seem to be technically sound.

      We thank the reviewer for their kind words, comments, and suggestions.

      The major limitation of this study comes with the fact that only correlations are presented, some of which are rather weak, contrast each other, or are based on a small number of data points. In addition, finding that diversity at a given taxonomic rank is associated with diversity within a given taxon is a pattern that can be explained by many different underlying processes, e.g. species-area relationships, nutrient (diet) diversity, stressor diversity, immigration rate, and niche creation by other microbes (i.e. DBD). Without experiments, it remains vague if DBD is the underlying process that acts in these communities based on the observed patterns.

      We thank the reviewer for their comments. First, regarding the issue of this being a correlative study, we now more clearly acknowledge that mechanistic studies (perhaps in experimental settings) are required to fully elucidate DBD and BQH dynamics. However, we note that our correlational study from natural communities is complementary to experimental and modeling studies, to test the extent to which their predictions hold in more complex, realistic settings. This is now mentioned throughout the manuscript, most explicitly at the end of the Introduction:

      “Although such analyses of natural diversity cannot fully control for unmeasured confounding environmental factors, they are an important complement to controlled experimental and theoretical studies which lack real-world complexity.”

      Second, to increase the number of data points analyzed in the Poyet study, we now include 15 species and four different hosts (new Figure 5). The association between community diversity and gene loss is now much more statistically robust, and consistent across the Poyet and HMP time series.

      Third, we acknowledge more clearly in the Discussion that other processes, including diet and other environmental factors can generate the DBD pattern. We also now stress more prominently the possibility that strain migration across hosts may be responsible for the patterns observed. For example, in Figure 1, we illustrate the possibility of strain migration generating the patterns we observe.

      Below we quote a paragraph that we have now added in the Discussion:

      "Second, we cannot establish causal relationships without controlled experiments. We are therefore careful to conclude that positive diversity slopes are consistent with the predictions of DBD, and negative slopes with EC, but unmeasured environmental drivers could be at play. For example, increased dietary diversity could simultaneously select for higher community diversity and also higher intra-species diversity. In our previous study, we found that positive diversity slopes persisted even after controlling for potential abiotic drivers such as pH and temperature (Madi et al., 2020), but a similar analysis was not possible here due to a lack of metadata. Neutral processes can account for several ecological patterns such as species-area relationships (Hubbell, 2001), and must be rejected in favor of niche-centric models like DBD or EC. Using neutral models without DBD or EC, we found generally flat or negative diversity slopes due to sampling processes alone and that positive slopes were hard to explain with a neutral model (Madi et al., 2020). These models were intended mainly for 16S rRNA gene sequence data, but we expect the general conclusions to extend to metagenomic data. Nevertheless, further modeling and experimental work will be required to fully exclude a neutral explanation for the diversity slopes we report in the human gut microbiome.”

      Finally, we now put more emphasis on the importance of migration (strain invasion) as a non-exclusive alternative to de novo mutation and gene gain/loss. This is mentioned in the Abstract and is also illustrated in the revised Figure 1.

      Another limitation is that the total number of reads (5 million for the longitudinal dataset and 20 million for the cross-sectional dataset) is low for assessing strain-level diversity in complex communities such as the human gut microbiota. This is probably the reason why the authors only looked at one species with sufficient coverage in the longitudinal dataset.

      Indeed, this is a caveat which means we can only consider sub-species diversity in relatively abundant species. Nevertheless, this allows us to study dozens of species in the HMP and 15 in the more densely sampled Poyet time series. As more deeply sequenced metagenomes become available, future studies will be able to access the rarer species to test whether the same patterns hold. This is now mentioned prominently as a caveat of our study in the second Discussion paragraph:

      “First, using metagenomic data from human microbiomes allowed us to study genetic diversity, but limited us to considering only relatively abundant species with genomes that were well-covered by short sequence reads. Deeper or more targeted sequencing may permit us to determine whether the same patterns hold for rarer members of the microbiome. However, it is notable that the majority of the dozens of species across the two datasets analyzed support DBD, suggesting that the phenomenon may generalize.”

      We also note that rarefaction was only applied to calculate community richness, not to estimate sub-species diversity. We apologize for this confusion, which is now clarified in the Methods as follows:

      “SNV and gene content variation within a focal species were ascertained only from the full dataset and not the rarefied dataset.”

      Analyzing the effect of diversity at a given timepoint on strain-level diversity at a later timepoint adds an important new dimension to this study which was not assessed in the previous study about the DBD in microbiomes by some of the authors. However, only a single species was analysed in the longitudinal dataset and comparisons of diversity were only done between two consecutive timepoints. This dataset could be further exploited to provide more insights into the prevailing patterns of diversity.

      We thank the reviewer for raising this point. We have now considered all 15 species for which there was sufficient coverage from the Poyet dataset, which included four different stool donors. Additionally, in the HMP dataset, we analyze 54 species across 154 hosts, with both datasets showing the same correlation between community diversity and gene loss.

      Additionally, we followed the reviewer's suggestion to examine additional time lags, and in Figure 5 we do observe a dependency on time. This is now described in the Results as follows:

      “Using the Poyet dataset, we asked whether community diversity in the gut microbiome at one time point could predict polymorphism change at a future time point by fitting GAMs with the change in polymorphism rate as a function of the interaction between community diversity at the first time point and the number of days between the two time points. Shannon diversity at the earlier time point was correlated with increases in polymorphism (consistent with DBD) up to ~150 days (~4.5 months) into the future (Figure S4), but this relationship became weaker and then inverted (consistent with EC) at longer time lags (Fig 5A, Table S8, GAM, P=0.023, Chi-square test). The diversity slope is approximately flat for time lags between four and six months, which could explain why no significant relationship was found in HMP, where samples were collected every ~6 months. No relationship was observed between community richness and changes in polymorphism (Table S8, GAM, P>0.05).”

      Finally, the evidence that gene loss follows increase in diversity is weak, as very few genes were found to be lost between two consecutive timepoints, and the analysis is based on only a single species. Moreover, while positive correlations were found between overall community diversity and gene family diversity in single species, the opposite trend was observed when focusing on pathway diversity. A more detailed analysis (of e.g. the functions of the genes and pathways lost/gained) to explain these seemingly contrasting results and a more critical discussion of the limitations of this study would be desirable.

      We agree that our previous analysis of one species in one host provided weak support for gene loss following increases in diversity. As described in the response above, we have now expanded this analysis to 15 focal species and 4 independent hosts with extensive time series. We now analyze this larger dataset and report the more statistically robust results as follows:

      “We found that community Shannon diversity predicted future gene loss in a focal species, and this effect became stronger with longer time lags (Fig 5B, Table S9, GLMM, P=0.006, LRT for the effect of the interaction between the initial Shannon diversity and time lag on the number of genes lost). The model predicts that increasing Shannon diversity from its minimum to its maximum would result in the loss of 0.075 genes from a focal species after 250 days. In other words, about one of the 15 focal species considered would be expected to lose a gene in this time frame.

      Higher Shannon diversity was also associated with fewer gene gains, and this relationship also became stronger over time (Fig 5C, Table S9, GLMM, P=1.11e-09, LRT). We found a similar relationship between community species richness and gene gains, although the relationship was slightly positive at shorter time lags (Fig 5D, Table S9, GLMM, P=3.41e-04, LRT). No significant relationship was observed between richness and gene loss (Table S9, GLMM, P>0.05). Taken together with the HMP results (Fig 4), these longer time series reveal how the sign of the diversity slope can vary over time and how community diversity is generally predictive of reduced focal species gene content.”

      As described in detail in the response to Reviewer 1 above, we found that the HUMAnN2 pathway analyses previously described suffered from technical challenges and we deemed them inconclusive. We have therefore removed the pathway results from the manuscript.

      Reviewer #3 (Public Review):

      This work provides a series of tests of hypothesis, which are not mutually exclusive, on how genomic diversity is structured within human microbiomes and how community diversity may influence the evolution of a focal species.

      Strengths:

      The paper leverages on existing metagenomic data to look at many focal species at the same time to test for the importance of broad eco-evolutionary hypothesis, which is a novelty in the field.

      Thank you for the succinct summary and recognition of the strengths of our work.

      Weaknesses:

      It is not very clear if the existing metagenomic data has sufficient power to test these models.

      It is not clear, neither in the introduction nor in the analysis what precise mechanisms are expected to lead to DBD.

      The conclusion that the data support DBD appears to depend on which statistic is used to measure community diversity. Also, performing a test to reject a null neutral model would have been welcome either in the results or in the discussion.

      In our revised manuscript, we emphasize several caveats – including that we only have power to test these hypotheses in focal species with sufficient metagenomic coverage to measure sub-species diversity. We also describe more in the Introduction how the processes of competition and niche construction can lead to DBD. We also acknowledge that unmeasured abiotic drivers of both community diversity and sub-species diversity could also lead to the observed patterns. Throughout the manuscript, we attempt to describe the results and acknowledge multiple possible interpretations, including DBD and EC acting with different strengths on different species and time scales. Our previous manuscript assessing the evidence for DBD using 16S rRNA gene amplicon data from the Earth Microbiome Project (Madi et al., eLife 2020) assessed null models based on neutral ecological theory, and found it difficult to explain the observation of generally positive diversity slopes without invoking a non-neutral mechanism like DBD. While a new null model tailored to metagenomic data might provide additional nuance, we think developing one is beyond the scope of the manuscript – which is in the format of a short ‘Research Advance’ to expand on our previous eLife paper, and we expect that the general results of our previously reported null model provide a reasonable intuition for our new metagenomic analysis. This is now mentioned in the Discussion as follows:

      “In our previous study, we found that positive diversity slopes persisted even after controlling for potential abiotic drivers such as pH and temperature (Madi et al., 2020), but a similar analysis was not possible here due to a lack of metadata. Neutral processes can account for several ecological patterns such as species-area relationships (Hubbell, 2001), and must be rejected in favor of niche-centric models like DBD or EC. Using neutral models without DBD or EC, we found generally flat or negative diversity slopes due to sampling processes alone and that positive slopes were hard to explain with a neutral model (Madi et al., 2020). These models were intended mainly for 16S rRNA gene sequence data, but we expect the general conclusions to extend to metagenomic data. Nevertheless, further modeling and experimental work will be required to fully exclude a neutral explanation for the diversity slopes we report in the human gut microbiome.”

    1. Author Response

      Reviewer #2 (Public Review):

      1&2) Throughout the paper, the authors use a BiFC assay to monitor direct interactions between GDOWN1 and other transcription factors in the cell. While this assay works well for their experiments, we are unsure why GDOWN1 appears to interact with every protein found in the cytoplasm. This is particularly concerning when we look at GDOWN1 interacting with itself (Figure 1D), as GDOWN1 is not known to self-oligomerize. The authors should provide a negative control that GDOWN1 does not non-specifically interact with any cytoplasm-localized protein. Additionally, every GDOWN1 truncation tested was able to interact with NELF-E. We are unsure why each truncation tested (given that they tested multiple non-overlapping GDOWN1 regions) can interact with NELF-E. Do the authors believe that NELF-E directly interacts with every tested GDOWN1 construct? We believe that demonstration of BiFC specificity is critical for the conclusions drawn in the manuscript.

      Thank you for your comments and valuable suggestions! We added more negative BiFC controls in the revised manuscript to demonstrate the specificity of BiFC assays (Figure 1——figure supplement 1D). Since both reviewers brought up this question, we provided our answers to this question above in the “Common concerns by the Reviewers” section (Q#1).

      3) The authors note that the NES1 site is not as strong as the NES2 site at regulating exportin 1-dependent nuclear export. However, they suggest this is because mutating the NES2 site is more likely to disrupt the CAS site nearby. We ask the authors to expand on this concept. Do they have direct evidence that NES2 disrupts CAS activity (such as regulating its association with the nuclear pore complex)?

      From Figure 4A, we can see that both NES1 (4A-b) and NES2 (4A-d) work as functional nuclear export signals. When NES1 was mutated (4A-c), NES2 and CAS both remained functional in blocking GDOWN1’s nuclear shuttling upon LMB addition. However, when NES2 was mutated (4A-e), comparing the localization changes before and after LMB addition, we concluded that NES1 remained functional, while the cytoplasmic retention activity of CAS was partially lost. From the quantification of the images, it seems that NES1 has a stronger activity than NES2 in terms of LMB responsiveness (CRM1-dependent nuclear export activity), while NES2 apparently exhibits an additional layer of regulation of, or correlation with, CAS activity.

      To further confirm this observation, we generated a HeLa cell line stably expressing GDOWN1(NES2 mutant)-Venus and tested the subcellular localization of this mutant. As shown in Figure 4C of the revised manuscript, compared with wild-type GDOWN1, loss of NES2 activity directly caused the loss of the perinuclear staining, which was consistent with the defect of the CAS mutant. These results further support that mutagenesis of NES2 disrupts the CAS-mediated association with the nuclear pore complex.

      4) The authors show the critical role of the NES1, NES2, and CAS sites for the localization and function of GDOWN1. Have the authors checked post-translational modification databases to check if any of the identified sites could be post-translationally modified and thereby regulated? Elucidation of the mechanism by which GDOWN1 localization is regulated is of broad interest to the transcription community.

      Good suggestion! It is worth checking and testing potential modifications of the key arginines identified in CAS (R352, R354, and R357). We did check a web tool for arginine methylation site prediction (http://msp.biocuckoo.org/online.php), but none of the known motifs matched the CAS sequence of GDOWN1. In addition, in our pilot studies, treatment with inhibitors of arginine methyltransferases (- or + LMB) did not result in any nuclear accumulation of GDOWN1 (data not shown). So far, we do not have any strong evidence that these arginines are directly modified in our assays, and we cannot exclude the possibility that other nearby amino acids also play key roles in CAS function. Thus, further research is needed to uncover the regulatory mechanism of CAS.

    1. Author Response

      Reviewer #1 (Public Review):

      This study aimed to test the hypothesis that resident immune cells are strategically positioned along the epididymal duct to provide different immunological environments to prevent pathogens from ascending the urogenital tract. By using an epididymitis mouse model, the differential responses at different segments along the epididymis were examined at both histological and gene expression levels, and the data appeared to support their hypothesis. Furthermore, single-cell RNA-seq analyses identified the composition of resident immune cell types along the epididymal duct, and the parabiosis model further corroborated the major findings. Overall, the study was well conducted and the major conclusion seems well supported. The only caveat is the lack of elucidation on the direct or indirect impact of the resident immune cells on sperm maturation.

      We thank the reviewer for his/her feedback and the valuable comments.

      We are aware of the fact that the current manuscript lacks further experimental evidence on the effects of immune cells on organ function, especially sperm maturation, and agree that this would constitute a relevant object to study. Assessing the direct or indirect impact of particular immune cells on sperm maturation would, however, require further intensive research, encompassing e.g. the consequences of targeted cell depletions (using several transgenic mouse models) with comprehensive follow-up analysis (i.e. detecting anti-sperm antibodies, assessing the potential appearance of sperm-induced autoimmune reactions in vivo and conducting in vitro co-culture assays, besides conducting sperm functional tests to evaluate capacitation and fertilization competencies). A study of this magnitude is outside the scope of the present manuscript and would form a separate examination that alone would take more than a year to perform. Therefore, our intention was to submit this article as a ‘Tools and Resource’ article, as it provides a detailed overview of all immune cell types shaping the regional immunological landscape, based on crucial information about their transcriptional profiles at single-cell resolution. In our view, the provided data close a gap in the current state of knowledge (particularly regarding the transcriptional identity and distribution of the described immune cell populations) and will serve as a relevant common platform for current and future approaches.

      Reviewer #2 (Public Review):

      Pleuger et al. investigated the heterogeneity of resident immune cells in the murine epididymis. The response of immune cells in the different epididymal segments was characterized following acute bacterial infection by flow cytometry, and immunofluorescence microscopy. Single-cell RNA sequencing analysis and parabiosis experiments were performed to provide an atlas of resident immune cells and their etiology in the epididymis under steady-state conditions. The authors conclude that distinct immune cell phenotypes govern specific responses of the different epididymal segments during acute bacterial infection. Overall, the conclusions of this study are well supported by the data, but some specific aspects related to the region-specific phenotypes of resident immune cells need to be revisited.

      1) In order to conclude that there was an infiltration of neutrophils and monocytes following bacterial injection, the authors should provide flow cytometry quantification of the percentages of immune cell subsets relative to live cells, rather than relative to the CD45+ population.

      Following the reviewer’s request, we have replaced the data previously shown in Figure 2 with a completely new high-dimensional flow cytometry analysis, including FIt-SNE visualization of CD45+ cell populations in different epididymal regions (IS, caput, corpus, cauda) under different conditions (naive, sham, UPEC 10 days post infection). In addition, we have included bar diagrams displaying the percentage of all investigated immune cell subsets relative to single live cells. The results displayed in the new figure are similar to the previously shown data, but the overall figure layout and visualization method are clearer and more comprehensible. We thank the reviewer for the helpful comment.
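      To illustrate why the choice of denominator matters for the reviewer's point, the conversion between the two normalizations can be sketched as below. This is a minimal illustration with hypothetical numbers (not data from this study), assuming a nested gating hierarchy in which the subset is gated within CD45+ cells, which are in turn gated within single live cells; the function name is our own.

```python
def percent_of_live(percent_of_cd45: float, cd45_percent_of_live: float) -> float:
    """Share of single live cells for a subset gated within CD45+ cells.

    Hypothetical helper; assumes the gating is nested as
    single live cells -> CD45+ -> subset.
    """
    return percent_of_cd45 * cd45_percent_of_live / 100.0


# Hypothetical values for illustration only, not measurements from this study.
naive = percent_of_live(percent_of_cd45=2.0, cd45_percent_of_live=5.0)
infected = percent_of_live(percent_of_cd45=40.0, cd45_percent_of_live=20.0)

# A 20-fold rise within CD45+ cells becomes an 80-fold rise relative to live
# cells, because the CD45+ compartment itself expands with infiltration.
print(f"naive: {naive:.2f}% of live cells, infected: {infected:.2f}% of live cells")
```

      Conversely, a subset whose share of CD45+ cells stays constant can still increase relative to live cells if the CD45+ compartment as a whole expands, which is why the live-cell denominator is the more direct readout of infiltration.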

      2) In general, all flow cytometry and immunofluorescence data should be presented and discussed with respect to previously published studies.

      This is reflected in the discussion (lines 564-575) and, in addition, by addressing similar points raised by the reviewers.

      3) A surprisingly low number of CX3CR1-EGFP cells was detected by immunofluorescence in the cauda. This is not in agreement with previous studies showing a similar % of CX3CR1-EGFP cells in the IS and cauda regions by immunofluorescence and flow cytometry. The authors need to discuss this discrepancy. Perhaps the different fixation procedures used in the current study compared to those used in previous studies could account for the loss of EGFP in epididymis cryo-sections. As such, cells that appear to be F4/80 positive but negative for EGFP by immunofluorescence might simply be due to the loss of cytoplasmic EGFP, while F4/80 immunogenicity remained intact.

      Within our study, we have shown by combining scRNASeq, flow cytometry and immunostaining that distinct macrophage subgroups co-exist within the epididymis and that their diversity increases towards the cauda. Based on these data, we can assume that cells that appear to be F4/80 positive but negative for CX3CR1 (e.g. clusters 6-9 of the macrophage clustering show a very low level or even lack of Cx3cr1 expression) are distinct from CX3CR1+F4/80+ cells (e.g. clusters 1 and 2 of the macrophage subclustering, both showing a high expression of Cx3cr1). Therefore, our immunostaining (on Cx3cr1GFP Ccr2RFP reporter mice) and flow cytometry data (on wild type C57BL/6J mice) in Figure 6 are in line with our transcriptomic data and strongly support the co-existence of both populations. We have seen the described gradient of macrophage numbers (decreasing from IS towards cauda) in all independently performed experiments (naive control group in infection experiments, steady-state characterization in wild type and transgenic mice). A previous study, however, demonstrated a constant CX3CR1+ cell ‘number’ throughout all epididymal regions (~5-6% of live cells; Battistone et al., 2020). Here we indeed note a discrepancy with our results, which show a relatively high ‘number’ of CX3CR1+ cells in the initial segment of naive mice (20% of single live cells, new Figure 2G) that decreases towards the cauda (~5% of single live cells, data shown in the new Figure 2 of naive mice). [It should be mentioned that these numbers differ slightly from the percentage of CD45+ cells in single live cells shown in Figure 4 due to different flow cytometry settings (thresholding to exclude spermatozoa and debris).] However, another study (Voisin et al., 2018) showed a comparable ratio of total macrophages within caput and cauda with a similar gradient throughout the epididymal regions (significantly lower ratio within the cauda compared to the caput). Although this study discriminated only between caput and cauda, these data are in line with our results.

      Nevertheless, it should be noted that calculating the percentage of a population in single live cells does not represent an unbiased quantification approach, as this calculation is highly dependent on previous gating (thresholding, aimed events, single cells as well as live cells; the latter is, in turn, dependent on the experimental procedures that may have an impact on cell viability and antigen recognizability, see below). Rather, it provides important information about the population distribution among regions or conditions. For this reason, a comparison among studies as requested above is not expedient from our point of view. This and other studies are limited in that they lack an absolute quantification of immune cell populations, as that would require e.g. prior cell counting or relating absolute cell numbers to mg of tissue, as conducted in the parabiosis experiment shown in Figure 7 (which in turn is also limited for the epididymal regions due to the necessity of pooling tissue from several mice to obtain a sufficient cell number, thus masking individual differences). Another alternative would be quantitative morphometric analysis of stained sections, which has not been performed in the present study.

      Comparing the protocols for cell isolation and preparation of the single cell suspension between our study and previous reports (Battistone et al., 2020), it appears that different protocols have been applied, which indeed could have a major impact. In this regard, the study of Battistone et al. (2020) used a mixture of collagenase type I (0.5 mg/ml) and collagenase type II (0.5 mg/ml) and incubated tissue fragments for a short period (30 minutes) at 37°C. In contrast, in this study we minced the tissue with scissors until no fragments were visible anymore, followed by enzymatic digestion (shaking at 37°C for 45 minutes with 1.5 mg/ml collagenase type IV and 60 U/ml DNase). Afterwards, we aspirated the digest 5-6 times through a 30G needle (to release pre-digested sticky cells from each other by shear forces) before passing it through a 70 µm cell strainer. We have found that we can significantly increase the number of viable cells when using collagenase type IV for a longer time at the ideal concentration of 1.5 mg/ml (a similar concentration and incubation duration with collagenase I resulted in a higher proportion of dead cells in the analysis). A longer incubation time increases the obtained cell numbers, especially from the IS, where the epithelial cells are densely connected to each other. In general, collagenase type IV has a lower tryptic activity than other collagenases, and therefore the use of collagenase IV limits damage to membrane proteins and receptors (an overview of the different collagenase types with respective references can be found at: https://www.worthington-biochem.com/products/collagenase/manual).

      In summary, we agree with the reviewer that methodological differences very likely account for the discrepancy between our data and those of Battistone et al. (2020), and we raised this point in the revised discussion (see lines 559-564).

      The statement "Intriguingly, our data revealed that distinct immunological landscapes exist within proximal (IS, caput) and distal regions (corpus, cauda), that are tailored to the respective needs of the microenvironments" implies that this is the first study that describes immune cell heterogeneity in the epididymis. Please rephrase this statement as previous studies have already shown the segment-specific heterogeneity of resident immune cells in this organ.

      To address the reviewer's comment, we have rephrased the statement to “our data unraveled the transcriptional identity and tissue location of extravascular immune cells and further support the existence of distinct immunological environments along the epididymal duct that are tailored to the respective needs of the microenvironment” within the discussion section (lines 555-558). Moreover, the previous investigations on epididymal immune cells are acknowledged and cited within the introduction (lines 107-124) as well as in the discussion (lines 549-554, 564-575 and 580-584). We hope that this satisfactorily addresses the reviewer’s critique.

      The conclusion that macrophages constitute the major immune cell population of the murine epididymis is not supported by the data provided here. In fact, the authors found that macrophages account for only approximately 20% of CD45+ immune cells in the cauda. The authors should, therefore, modify their conclusion to state that macrophages constitute the major immune cell population in the IS. In fact, this conclusion would be more in line with previously published studies.

      The reviewer is correct, and we have changed the conclusion accordingly to “macrophages constitute the major immune cell population, especially within the IS” (see lines 559-560).

      The authors conclude that fewer intraepithelial CX3CR1-EGFP+ cells are present in the cauda, but they do not explain how they actually quantified these intraepithelial cells. A description of how these results were obtained is missing.

      We agree with the reviewer that we did not quantify cells based on our immunostaining. All quantifications were obtained by flow cytometry on wild type mice with respective surface staining (according to the previous selection of markers derived from scRNASeq, see Figure 6) and show only ratios, not absolute numbers. Additional counting of the immunostained sections would be required to ultimately determine whether these cells are quantitatively different in the cauda compared to the IS. The respective sentence, however, does not intend to compare the abundance of these cells among epididymal regions; rather, it states that ‘the distal regions are populated by a more heterogeneous macrophage pool consisting of less intraepithelial CX3CR1+ macrophages, but higher abundance of interstitial pro-inflammatory monocyte-derived CCR2+MHC-II+, vasculature-associated TLF+ macrophages as well as CX3CR1-TLF-CCR2- macrophages’. This statement points to the increasing macrophage heterogeneity towards the distal parts and is based on the clustering of the scRNASeq data and flow cytometry analysis, supported by the immunostaining that localized these populations in the epididymal compartments. For this reason, flow cytometry and immunostaining are combined in Figure 6 to display the ratio of identified macrophage subgroups to each other (Fig. 6B, bar diagram showing % of distinct subpopulations in total F4/80+ cells), with supporting immunostaining using the same markers for localization.

    1. Author Response

      Reviewer #3 (Public Review):

      Weaknesses

      The spontaneous activity of the network is extremely low, with [0.02 0.09] spks/s considered as a high activity range. Granted, this is based on ex vivo measurements. However, if this phenomenon is to be considered computationally relevant, as the authors claim, the paper should have examined the reliability of propagation and routing with in vivo activity levels.

      The above weakness is a special case of the issue that the limits of applicability/robustness of results to model assumptions have not been well established. In particular, it is not clear how strong the strongest weights must be whilst still enabling long sequences, and what is the dependence of the results on the parameters of the distance-dependent connectivity.

      Regarding the two first weaknesses listed in Reviewer #3 Public Review, we wish to note that:

      ● The statement that our estimate of spontaneous activity “is based on ex vivo measurements” is incorrect. Our single-cell and connectivity parameters are certainly based on ex vivo measurements, but the range of spontaneous activity that the Reviewer cites ([0.02 0.09] spks/s) is an estimate from in vivo recordings. Furthermore, in our model, we explored mean firing rates higher than this in vivo range and still observed sequences.

      ● While the Reviewer states that “it is not clear how strong the strongest weights must be”, we do provide a lower-bound estimate. We explored simulations where we truncated sections of the distribution of synaptic strengths and observed that networks that included the bottom 90% of connections did not produce sequences.

    1. Author Response

      Reviewer #1 (Public Review):

      This study sets out to decipher whether the eDNA that promotes biofilm dispersal in Caulobacter crescentus biofilms is released when a random portion of cells lyse within biofilms, or whether eDNA release is a regulated process. They start by investigating whether any of the C. crescentus TA systems contribute to biofilm-associated cell death, and find that one of the systems, ParDE4 is responsible for cell death and eDNA release. They go on to show that this system is O2-regulated and thus contributes to cell death in particular in the oxygen limited interior regions of biofilms. These findings contribute significantly to our understanding of the biological functions of toxin-antitoxin systems, mechanisms of bacterial programmed cell death, and biofilm growth. The notion that TA systems function in cell death in particular has been controversial, and often based on overexpression of the toxin component, therefore the fact that this study uses a TA system in its native genomic context is notable. The authors also show clearly the somewhat counterintuitive result that the cell death (and presumably, toxin activity) is negatively correlated with transcription of the TA system. This is consistent with what is known about TA biology (but not with many past TA papers, which often correlated TA transcription with toxin activation). The study also provides a logical rationale for how ParDE4 mediated cell death ultimately contributes to bacterial fitness. The paper is well written and figures are clear and easy to follow.

      There are two relatively minor shortcomings of the paper, both acknowledged as caveats by the authors in their discussion. First, while the authors do include one experiment that addresses whether the toxin is responsible for the cell death (Fig 3), they do not show direct evidence of the activity of the toxin other than cell death/eDNA release. Second, the authors do not address whether the reduced TA transcription they observe is what causes the release of the toxin and thus the cell death phenotype. This seems likely to be the case based on previous studies of other TA systems (e.g. TA systems involved in plasmid segregation, most clearly shown for CcdAB, or more recently the ToxIN system during phage infection). Connecting this directly would be a very valuable addition to this study.

      We thank the reviewer for these positive comments. We agree that the TA system we describe in this study needs to be characterized in more detail. Understanding how the expression levels of this TA system are linked to cell death is our next goal and will be the scope of a future publication.

      We now discuss the important missing point about TA expression possibly being linked to cell death and refer to CcdAB, ToxIN and other relevant systems as well-characterized examples of such mechanisms. In the introduction, we now present the role of TAS in plasmid addiction and phage defense mechanisms. We also provide more information about those systems in the discussion and speculate on their similarities with the TAS described here (see our reply to essential revisions above).

      Reviewer #2 (Public Review):

      In this work, the authors present compelling evidence that a toxin-antitoxin system contributes to biofilm dispersal under oxygen-limited conditions. This work makes important contributions to two areas of microbial physiology: the functional understanding of toxin-antitoxin systems, which has remained largely elusive, and the mechanistic regulation of biofilm dispersal, a critical but less understood aspect of biofilm physiology.

      A major goal of the work described in this manuscript was to better understand the regulation of biofilm dispersal. These authors provide compelling evidence that the parDE4 toxin-antitoxin (TA) system in Caulobacter crescentus mediates enhanced cell death under conditions of oxygen limitation. This group previously reported that extracellular DNA (eDNA) inhibits attachment of new-born swarmer cells. Here they build on that observation by identifying a genetic module that contributes to cell death and DNA release under oxygen limitation, a sub-optimal condition present in a dense biofilm community, and demonstrate that parDE4 affects biofilm development. Together, this work makes important contributions toward understanding functional roles for toxin-antitoxin systems and regulation of mature stages of biofilm development. In addition, although eDNA is often depicted as having a structural role in strengthening and maintaining biofilms in some species, this work further establishes that eDNA can have multiple roles in biofilms including contributing to dispersal in Caulobacter.

      Strengths of this work include 1) comprehensive evaluation of multiple paralogous TAS and specific identification of the contribution of parDE4 to cell death, eDNA release and biofilm restriction, 2) genetic dissection of the TA pair to establish that the ParD4-antitoxin prevents eDNA release and promotes biofilm formation in a ParE4-toxin dependent manner, 3) provision of evidence that the parDE system affects cell death / eDNA release, but not responsiveness to eDNA, 4) demonstration of an anti-correlation between expression of parDE and ccoN, a hypoxic responsive gene, at both the population level under different growth conditions and at the single cell level within different growth conditions.

      We thank the reviewer for these positive comments.

      One weakness of this work is that the authors do not directly measure O2 concentrations in their growth conditions. However, they do monitor activity of an established hypoxic responsive promoter, which provides strong evidence that the various conditions tested do indeed affect oxygen concentrations in the culture medium. Nevertheless, it is difficult to assess oxygen availability in the flow cell experiments, which will be dependent on both dissolved oxygen in the media pumped through the flow cell and cell density within the flow cells. In the competition experiments, the ∆parDE4 mutant has an advantage before there seems to be an appreciable cell density, perhaps reflecting low oxygen in the growth medium or a monolayer of cells that is not obvious in the images as presented. It would be interesting to evaluate expression of ccoN in biofilms grown under these flow conditions.

      We agree with the reviewer that one limitation of our study is that we could not directly measure the O2 concentration in our different growth conditions. Unfortunately, we were unable to find a way to reliably and reproducibly assay the dissolved O2 concentration in our experimental set-ups (both static biofilms and flow-cells). We think that regulation of parDE4 expression is linked to the composition of the local environment surrounding each cell, and offering a proxy via ccoN expression is the best method we could provide to assess this. Results provided in Figures 7 and S3 (now S5) clearly show that cells that respond to limiting O2 levels (by activating ccoN expression) have low parDE4 expression. We also show in this set of experiments that, at the population level, there are cells highly expressing ccoN or parDE4 regardless of the culture conditions and the overall O2 levels.

      We now provide the expression of ccoN in different areas of biofilms, in addition to the already presented parDE4 expression, in Fig. 8A. We quantified ccoN transcription levels using the PccoN-lacZ construct (already used to generate data in Figure 5) and the fluorogenic ß-galactosidase substrate we used to quantify parDE4 expression in biofilms in the first version of this manuscript. These new results show that in biofilm areas where parDE4 is more highly expressed, ccoN expression is low and vice versa, confirming other observations made throughout this work.

      The discussion regarding the observation that parDE expression drops under activating (oxygen-limiting) conditions is contradictory to what I would expect based on the early findings about TA systems as genetic stabilization systems. The authors seem to expect that conditions that activate the toxin should correspond to increased expression of the TA operon. However, TA systems have frequently been characterized as DNA stabilization systems for plasmids or other mobile elements because the toxin proteins are more stable than the antitoxin proteins. In these cases, if the gene pair is lost (or in this case if expression is decreased), the toxin protein persists longer than the antitoxin protein, effectively activating the toxin to arrest or kill cells that have lost (or in this case turned off) the gene pair. Thus I disagree with the statement that this is a "novel regulatory mechanism of PCD that remains to be understood" (lines 436-7).

      The sentence preceding this one was "We are unaware of cases where reduced TAS expression is correlated with the condition that activates the PCD in biofilm regulation.", and we suggested a "novel regulatory mechanism of PCD" in the context of biofilm formation. However, we now realize that our statements could be misleading, and we have entirely rewritten this section (lines 510-519: "It is interesting to note that the "neutralized" steady state of the ParDE4 TAS, when the toxin is inactivated, seems to be when O2 is abundant, i.e., when parDE4 transcription is at its highest. In most studied TAS, stresses have been shown to induce transcription of TAS (LeRoux et al., 2020, Jurėnas et al., 2022), but here, the stress inflicted on the cells by O2 limitation is accompanied by a lower expression of parDE4. We are unaware of cases where reduced TAS expression is correlated with the condition that activates the PCD in biofilm regulation. This suggests a novel regulatory mechanism of PCD, in the context of biofilms, that remains to be understood.").

      Differential stability of toxin and antitoxin proteins provides a reasonable regulatory mechanism to explain the programmed cell death observed. Testing this, or other, mechanistic models will be important in future studies of this system.

      We agree with the reviewer, and testing protein stability is definitely on our list of experiments to dissect this TA killing mechanism in the near future. As mentioned above, we have so far been unable to obtain antibodies against these proteins, which has delayed these types of experiments.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper proposes a 2D U-Net with attention and adaptive batchnorm modules to perform brain extraction that generalises across species. Generalisation is supported by a semi-supervised learning strategy that leverages test-time Monte Carlo uncertainty to integrate the best-predicted labels into training. Monte Carlo dropout maps also tend to align with inter-rater disagreement in manual segmentations, meaning that they can realistically be used for fast QC. The networks (trained on a range of source domains) have been made publicly available, so it should be relatively simple for users to apply them to their own cohorts, retraining on a very small number of labelled datasets. Overall the paper is exceptionally well written and validated, and the tool has broad application.
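      As a rough illustration of the Monte Carlo dropout idea summarized above, the sketch below keeps dropout active at test time, averages repeated stochastic predictions, and treats the per-voxel variance as an uncertainty score for choosing pseudo-labels. It is a toy one-layer model in NumPy, not BEN's 2D U-Net or its code; the function names, dropout rate, 0.5 threshold, and "lowest mean variance first" selection rule are all assumptions made for illustration.

```python
import numpy as np


def mc_dropout_predict(x, weights, n_samples=20, p_drop=0.5, rng=None):
    """Toy Monte Carlo dropout: run a one-layer 'segmenter' n_samples times with
    random dropout masks kept on at test time; return mean probability and variance."""
    rng = np.random.default_rng(0) if rng is None else rng
    probs = []
    for _ in range(n_samples):
        keep = rng.random(weights.shape) >= p_drop  # keep each weight with prob 1 - p_drop
        w = weights * keep / (1.0 - p_drop)  # inverted-dropout scaling
        logits = x @ w  # x has shape (n_voxels, n_features)
        probs.append(1.0 / (1.0 + np.exp(-logits)))
    probs = np.stack(probs)  # shape (n_samples, n_voxels)
    return probs.mean(axis=0), probs.var(axis=0)


def select_confident_pseudo_labels(images, weights, k):
    """Hypothetical selection rule: rank unlabeled images by mean voxel-wise MC
    variance and return thresholded masks for the k most certain ones."""
    scored = []
    for idx, x in enumerate(images):
        mean_prob, var = mc_dropout_predict(x, weights)
        scored.append((float(var.mean()), idx, mean_prob > 0.5))
    scored.sort(key=lambda item: item[0])  # lowest uncertainty first
    return [(idx, mask) for _, idx, mask in scored[:k]]
```

      In a real pipeline the selected masks would be added to the training set for another round of fine-tuning; that loop is omitted here.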

      We thank this reviewer very much for these encouraging and valuable comments.

      Reviewer #2 (Public Review):

      In this manuscript, the authors are proposing a generalizable solution to masking brains from medical images from multiple species. This is done via a deep learning architecture, where the key innovation is to incorporate domain transfer techniques that should allow the trained networks to work out of the box on new data or, more likely, need only a limited training set of a few segmented brains in order to become successful.

      The authors show applications of their algorithm to mice, rats, marmosets, and humans. In all cases, they were able to obtain high Dice scores (>0.95) with only a very small number of labelled datasets. Moreover, because the method is deep-learning-based, segmentation is very fast once a network has been trained.

      The promise of this work is twofold: to allow for the easy creation of brain masking pipelines in species or modalities where no such algorithms exist, and secondly to provide higher accuracy or robustness of brain masking compared to existing methods.

      I believe that the authors somewhat overstate the importance of generalizability, as masking brains is something that we can by and large do well across multiple species. For human brains this often relies on specialized tools, which the authors acknowledge work well, and for the usually simpler non-human (i.e. lissencephalic rodent) brains, image registration or multi-atlas segmentation techniques also work well. So generalizability adds definite convenience but is not a game-changer.

      The key to the proposed algorithm is thus that it works better than, or at least as well as, existing tools. The authors show multiple convincing examples that this is the case even after retraining with only a few samples. Yet in those examples, the authors proposed retraining the network on even subtle acquisition changes, such as moving in field strength from 7 to 9.4T. I tried it on some T2 weighted ex-vivo and T1 weighted manganese enhanced in-vivo mouse data and found that the trained brain extraction net does not generalize well. None of the pre-trained networks provided by the authors produced reasonable masks on my data. Using their domain adaptation retraining algorithm on ~20 brains each resulted in, as promised, excellent brain segmentations. Yet even subtle changes to out-of-sample inputs degraded performance significantly. For example, one set of data with a slight intensity drop-off due to a misplaced sat band created masks that incorrectly excluded those lower intensity voxels. Similarly, training on normal brains and applying the trained algorithm to brains with stroke-induced lesions caused the lesions to be incorrectly masked. BEN thus seems to be in need of regular retraining to very precisely matched inputs. In both those examples, the usual image registration/multi-atlas segmentation approach we use for brain masking worked without needing any adaptation.

      Overall, this paper is filled with excellent ideas for a generalized brain extraction deep learning algorithm that features domain adaptation to allow easy retraining to meet different inputs, be they species or sequence types. The authors are to be highly commended for their work. Yet at the moment it appears to produce overtrained networks that are challenged by even subtle shifts in inputs, something I believe needs to be addressed for BEN to truly meet its promised potential.

      We sincerely thank the reviewer for these constructive comments. We appreciate that the article is considered to be a valuable contribution to the field of neuroimaging by providing BEN as an efficient and generalisable deep learning based tool for brain extraction. The major concern of this Reviewer is that a pretrained BEN leads to unsatisfactory performance on some external data (e.g. the reviewer’s own data), although the domain adaptation retraining algorithm on ~20 brains did lead to, as promised, excellent segmentation results. Here, we would like to emphasize that the initial version of BEN on GitHub was designed to reproduce the results we presented in the manuscript and was not optimized for processing external datasets. To address this issue, we have optimized the BEN pipeline in the revised version, as summarized below:

      1) Orientation detection. In the original version of BEN, all rodent training images were axial views, so BEN performed best on axial test images; when rodent MR images were loaded in other views (such as sagittal or coronal), its performance degraded. To solve this issue, we have added an orientation detection function to the BEN pipeline that automatically aligns other orientations to the axial view, thus optimizing BEN’s performance.

      2) Performance optimization using plug-and-play functions. We have added post-processing steps to improve performance and running logs for quick inspection.

      3) Validation and tutorials. To further validate BEN’s generalization, we have evaluated BEN on two new external public ex-vivo MRI datasets (rTg4510 mouse: 25 ex-vivo scans, and C57BL/6 mouse: 15 ex-vivo scans). When only one label is used for BEN adaptation/retraining, impressive performance is achieved on both datasets, despite the fact that BEN was originally designed for in-vivo MRI data. To make the implementation transparent and give detailed guidance to users, we have prepared video tutorials on our GitHub/Documentation (https://github.com/yu02019/BEN#video-tutorials). Note that BEN’s performance may degrade on MR images of low quality. As an open-source tool, BEN is extensible, and our team will continue to maintain and update it.

      Nevertheless, there could be a couple of reasons that cause suboptimal performance when using a pretrained BEN. We discuss them below and have revised the manuscript accordingly (last paragraph in Discussion).

      On the one hand, as pointed out by the reviewer, domain generalization is a challenging task for deep learning. Although BEN can adapt to new out-of-domain images without labels (zero-shot learning) when the domain shift is relatively small (e.g. successful transfer between modalities and between scanners with different field strengths), the domain gap between the ex-vivo MRI data used by the reviewer and the in-vivo images in our training set could be so large that it compromises performance. In this case, additional labeled data and retraining are indeed necessary for BEN to perform few-shot learning, which we have emphasized and demonstrated in our manuscript and which the reviewer confirmed (although in our opinion, fewer than 5 additional brains, rather than 20, may suffice).

      On the other hand, as with any deep learning tool, it is difficult or nearly impossible to guarantee optimal performance on all unseen data. This is also a motivation for us to design BEN as an extensible tool. As stated in the manuscript, the source domain for BEN is flexible and is not restricted to Mouse-T2-11.7T, the source domain used in our manuscript. Instead, users can provide their own data and pretrained network as a new source domain, thereby facilitating domain generalization by reducing the domain gap between the new source and target domains.

    1. eLife assessment

      This paper will be of interest to those studying DNA replication in the context of chromatin and development. This important study uncovers a new interaction partner for the chromatin protein SuUR and tries to understand how this complex (SUMM4) functions to control under-replication in polytene chromosomes. While the experiments are of high quality and carefully controlled, the data currently do not fully support all the conclusions, particularly as they relate to conclusions about DNA replication timing.

      We appreciate the positive evaluation of our work. We agree that the relevance of the under-replication phenomenon to the establishment of late replication in dividing cells has only been established based on circumstantial evidence. In the revised manuscript, we expand the explanation of this relationship and discuss limitations of the endoreplication model as applied to understanding late DNA replication in the cell cycle of diploid cells. We also edited the abstract to soften our conclusions. We believe that the improvements made in the revised manuscript produce a closer alignment between our data and the conclusions.

      Reviewer #1 (Public Review):

      Andreyeva et al. developed a novel purification/mass spec approach to identify SuUR-associated proteins. From this biochemical tour de force, they identify a complex consisting of the insulator-associated protein Mod(Mdg4) and SuUR that they term SUMM4. They show that this complex (at least SuUR) has ATPase activity, which is an exciting result as no biochemical activity was previously known for SuUR. Given SuUR's function in the under-replication of Drosophila salivary glands, the authors show that SuUR and Mod(Mdg4) at least partially colocalize on polytene chromosomes and that SuUR displays at least a partial dependence on Mod(Mdg4) for localization to IH, but not PH, regions. Finally, using two independent genetic reporters, they show that SuUR itself has an insulator function, which is a new function for SuUR and exciting as it is likely a diploid cell-specific function. The authors then attempt to show that Mod(Mdg4) functions in under-replication. Unfortunately, under-replication is minimally, if at all, changed in the Mod(Mdg4) mutant. While the authors bring up several possible scenarios to explain this, it is still uncertain whether Mod(Mdg4) has a direct effect on under-replication.

      Strengths:

      The authors developed a very useful strategy to identify protein interactions through multiple purification steps using mass spectrometry. This approach can be applied to different systems and will be generally useful to the community. Through this approach, they provide very compelling data that SuUR and Mod(Mdg4) form a complex. Furthermore, the experiments have all been rigorously performed and the data are of high quality.

      Weaknesses:

      As the paper is written, its main focus is on under-replication. What the authors were not able to conclusively demonstrate is whether Mod(Mdg4) functions in under-replication.

      We thank the Reviewer for a positive evaluation of our work, specifically the biochemical and cytological results. Unfortunately, this Reviewer was less convinced by our conclusions about the role of Mod(Mdg4) in regulation of under-replication. However, we believe that our data strongly implicate Mod(Mdg4) in under-replication:

      1) Although SuUR is considered a bona fide suppressor of under-replication, its mutation does not fully restore DNA copy numbers in under-replicated regions of polytene chromosomes but rather restores them by ~78% on average (Table 1). Although the mutation of mod(mdg4) produces a weaker recovery (~26% on average, Table 1), it is still robust and statistically significant. Presently, there is only one other mutant (Rif1) known to restore DNA copy numbers at most under-replicated regions in salivary gland polytene chromosomes.

      2) DNA copy numbers in SuUR and Rif1 mutants, which are homozygous viable and fertile, are measured in L3 larvae produced from crosses of homozygous parents, i.e. in the absence of maternally contributed gene products. In contrast, mod(mdg4) is essential for viability, and the DNA copy numbers have to be measured in homozygotes that have Mod(Mdg4) protein and RNA loaded by heterozygous mothers. Since endoreplication initiates before the maternal product is exhausted, it limits the observed suppression. However, when we directly compare zygotic functions of SuUR and mod(mdg4) by analyzing the progeny of heterozygous mod(mdg4)/+ and SuUR/+ parents, they appear indistinguishable.

      3) Finally, we demonstrate that Mod(Mdg4) is essential for the proper loading of SUUR in polytene chromosomes, thus implicating it as a direct, SUUR-dependent effector of late DNA replication.

      In the revised manuscript, we provide a clearer explanation of our results. We hope that our arguments and modifications of the manuscript will alleviate the Reviewer’s concerns.

      Reviewer #2 (Public Review):

      This paper from the Fyodorov lab reports the isolation of a native protein complex of SUUR, a Drosophila SNF2-related factor, in a complex with Mdg4, an established chromatin boundary protein. The discovery of this native complex, called SUMM4, was enabled by the development of a mass spec-linked proteomic analysis of fractions from an unbiased, conventional multi-step chromatographic purification of low-abundance protein complexes. The authors validate the native interactions by co-immunoprecipitation and show further with recombinant proteins that SUUR displays ATPase activity, a property not previously shown, which is stimulated by Mdg4. From a functional perspective, the authors demonstrate that both components, SUUR and Mdg4, mediate activities of the Drosophila gypsy insulator, which blocks enhancer-promoter interactions and acts as a heterochromatin-euchromatin barrier, and moreover have a role in the under-replication of intercalary heterochromatin.

      Overall, this work is a substantial contribution to the field in two respects. First, it provides a new approach to the identification of novel native complexes that are of low abundance and difficult to isolate and identify by conventional biochemistry and mass spectrometry. Second, the interaction between Mdg4 and SUUR is novel and offers an ATP-driven pathway to be further investigated for understanding the mechanism of insulator (gypsy) function. Together, these advances are supported by the compelling quality and quantity of data. However, the paper does not read smoothly and can benefit from rewriting for readers who are not familiar with mass-spec proteomics or Drosophila biology.

      We thank the Reviewer for a positive evaluation of our work. To improve clarity, we made several modifications of our manuscript as requested by the Reviewer.

    1. Author Response

      Reviewer #1 (Public Review):

      In "The layered costs and benefits of translational redundancy", Raval et al. investigate the impact of gene copy number redundancy on E. coli fitness, using growth rate in different media as the primary fitness readout. Genes for most tRNAs and the three ribosomal RNAs are present in multiple copies on the E. coli chromosome. The authors ask how alterations in gene copy number affect the growth rate of E. coli in growth media that support different rates of growth for the wild type.

      While it was shown before that mutants with reduced numbers of ribosomal RNA operons grow at reduced rates in rich medium (LB), this study extends these findings and reaches some important conclusions:

      1) In a poor medium (supporting slow growth rates), the mutants with fewer rRNA operons actually grow faster than the wild type, showing that redundancy comes at a cost.

      2) The same is true for mutants with reduced gene copy number of certain tRNAs and correlates with slower rates of protein synthesis in these mutants.

      3) rRNA operon gene copy number is more decisive for growth rate than the copy number of any tRNA gene (>1).

      In addition, measurements of strains with deletions of genes encoding tRNA-modification enzymes that affect tRNA specificity are included. While interesting, no unifying conclusion could be reached on the impact of these mutations on growth rate.

      Thank you for this clear summary of our work.

      The well-known "growth law" relationships between growth rate and macromolecular composition (RNA/protein ratio, for example) specifically concern steady-state growth rates. It is concerning that all growth rates in this work were measured on cultures that were only back-diluted 1:100 from overnight LB precultures. That only allows 6-7 doubling times before the preculture OD is reached again. The exponential part of growth would end before that, allowing perhaps only 3-4 generations of growth in the new medium before the growth rate was measured. Thus, the cultures were not in balanced growth ("steady state") when the measurements were made, rather they were presumably in various states of adapting to altered nutrient availability.
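      The reviewer's estimate of 6-7 doublings follows directly from the dilution factor: a culture diluted 1:100 returns to its starting density after log2(100) doublings. A minimal check, assuming purely exponential growth (the function name is hypothetical):

```python
import math


def doublings_to_recover(dilution_factor: float) -> float:
    """Doublings needed for a culture diluted 1:dilution_factor to regrow to its
    pre-dilution density, assuming uninterrupted exponential growth."""
    return math.log2(dilution_factor)


n = doublings_to_recover(100)
print(f"{n:.2f} doublings")  # 6.64 doublings
```

      Because growth slows before the preculture density is reached, the number of truly exponential generations is smaller still, which is the basis of the reviewer's 3-4 generation estimate.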

      A detailed connection with exact growth rate laws indeed requires growth rate measurement in steady-state. Hence, we refrained from making such a connection in this manuscript, though it would be an interesting future avenue to explore. Our main goal here was to ask how E. coli growth rate is affected by external nutrient availability and internal translation components. For this, the key comparisons involve the WT vs. gene deletion mutations, and rich vs. poor growth media. For any given comparison, strains were tested under identical conditions and experimental protocols, and hence we can address our main questions without the need to obtain steady-state growth. As an aside, we note that the nutrient fluctuations inherent in such experiments may also be more relevant than steady-state growth for natural bacterial populations.

      As noted by the reviewer, we measured fitness only in a relatively narrow growth regime of several doublings; but we do capture exponential growth by focusing on the early data points (representing the exponential phase) for our growth rate calculations. We have now explicitly mentioned this in the methods section “Measuring growth parameters”.
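      Estimating a growth rate from the early, exponential data points usually amounts to fitting a straight line to ln(OD) versus time over that window. The plain-Python sketch below shows this least-squares slope; the function name, the synthetic data, and the window length are assumptions for illustration, not the authors' analysis pipeline.

```python
import math


def exponential_growth_rate(times, ods, n_points):
    """Specific growth rate (per unit time) as the least-squares slope of ln(OD)
    versus time over the first n_points readings (the assumed exponential window)."""
    t = times[:n_points]
    y = [math.log(od) for od in ods[:n_points]]
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    return sxy / sxx


# Synthetic culture: exponential growth at r = 0.5 per hour for 5 readings, then a plateau.
times = [0, 1, 2, 3, 4, 5, 6]
ods = [0.01 * math.exp(0.5 * t) for t in times[:5]] + [0.08, 0.08]
print(round(exponential_growth_rate(times, ods, n_points=5), 3))  # 0.5
```

      Including the plateau readings in the fit would bias the estimate downward, which is why restricting the window to early time points matters.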

      A second concern is the use of the term "tRNA expression levels" in the text and in Figure 4. I believe the YAMAT-seq method reports on the fractional contribution of a given tRNA to the total tRNA pool. Thus, since the total tRNA pool is larger in fast-growing cells than in slow-growing cells, a given tRNA may be present at a higher absolute concentration in the fast- than in the slow-growing cells but will be reported as "higher in poor" in Figure 4 if it constitutes a smaller fraction of the total tRNA pool in rich than in poor medium. For this reason, the conclusions regarding the effect of growth medium quality on tRNA levels are not justified.

      Thank you for this important point. We agree that our phrasing was incorrect, and we have modified the relevant text and figures accordingly. The fractional contribution of a given tRNA isotype to the total tRNA pool is still useful to compare, and is justified as now rephrased.
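      The distinction between fractional and absolute abundance raised by the reviewer can be illustrated with made-up numbers (arbitrary units, not data from the study): a tRNA isotype can be more abundant in absolute terms in rich medium while making up a smaller share of a larger total pool.

```python
def fraction_of_pool(isotype_amount, total_pool):
    """Fractional contribution of one tRNA isotype to the total tRNA pool."""
    return isotype_amount / total_pool


# Hypothetical amounts, chosen only to illustrate the point.
rich = {"tRNA_X": 10.0, "total": 200.0}  # fast growth, larger total pool
poor = {"tRNA_X": 8.0, "total": 100.0}   # slow growth, smaller total pool

frac_rich = fraction_of_pool(rich["tRNA_X"], rich["total"])  # 0.05
frac_poor = fraction_of_pool(poor["tRNA_X"], poor["total"])  # 0.08
print(rich["tRNA_X"] > poor["tRNA_X"], frac_rich < frac_poor)  # True True
```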

      Reviewer #2 (Public Review):

      Raval et al. by creating a series of deletion mutants of tRNAs, rRNAs, and tRNA modifying enzymes, have shown the importance of gene copy number redundancy in rich media. Moreover, they successfully showed that having too many tRNAs in poor media can be harmful (for a subset of the examined tRNAs). Below, please find my comments regarding some of the methodologies, conclusions, and controls needed to stratify this manuscript's findings.

      Figure 2 presents Rrel as a relative measurement (GRmut/GRwt). Therefore, I'm confused as to how Rrel can be negative, as shown in supplemental file 3 (statistics).

      We apologize for the confusion. Supplemental file 3 shows details of the statistical analysis (not raw data), and we included the effect size there (mean difference between the WT and the mutant relative growth rate) along with statistical significance. Thus, if the Rrel of a given mutant is 1.1, the mean difference would be (1–1.1) = –0.1, meaning that the mutant performs 10% better than the WT.

      The “raw” relative growth rates are provided in source data files (labeled figure-wise), and there are no negative values there, as expected.
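      The relationship between the always-positive Rrel values and the signed effect sizes can be sketched as follows. This assumes, for illustration only, that WT replicates are normalized to themselves (Rrel = 1) and that the effect size is the WT mean minus the mutant mean; the function names and growth rates are hypothetical.

```python
from statistics import mean


def relative_growth_rate(gr_mut, gr_wt):
    """Rrel = GR_mut / GR_wt; positive whenever both growth rates are positive."""
    return gr_mut / gr_wt


def effect_size(wt_rrel, mut_rrel):
    """Mean difference: mean(WT Rrel) - mean(mutant Rrel).
    Negative values indicate that the mutant grew faster than WT."""
    return mean(wt_rrel) - mean(mut_rrel)


gr_wt = 0.50  # hypothetical WT growth rate (per hour)
mut_rrel = [relative_growth_rate(g, gr_wt) for g in (0.54, 0.55, 0.56)]
wt_rrel = [1.0, 1.0, 1.0]  # WT normalized to itself
print(round(effect_size(wt_rrel, mut_rrel), 3))  # -0.1
```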

      We have now explicitly (and separately) referenced the source and statistics data files in the data analysis section in the methods, and in each figure legend. We hope this avoids confusion and makes it easier for readers to find the correct file.

      Does Figure 3 show the mean of 4 biological replicates or technical replicates? It should be stated clearly in the legend of figure 3.

      All replicates are biological replicates unless stated otherwise. This is now stated in the methods (lines 185-187) and in the figure legends.

      Do all strains (data points in the left panel of Figure 3) perform significantly better than the WT in a nutrient downshift? Looking at supplemental file 3, I see this is not the case. Please mark the statistically significant points. I suggest giving each set a different symbol/shape and coloring the significant ones in red.

      We had considered indicating statistical significance in the plot, but decided not to do so because it was difficult to show the many potentially useful layers of information without cluttering the plot. One other practical difficulty was that each point in the figure represents two values: one from the upshift (Y axis) and one from the downshift (X axis). For some mutants the fitness difference was significant in only one direction, so it was not straightforward to indicate significance. Further, our main goal here was to show where strains from different deletion Sets (Figure 1) fall in this plot (i.e. which quadrant they occupy), and so we wanted to ensure that points were easily distinguished by Set. In the text we do not include statistically non-significant points in the summary of observed patterns, and refer readers to information on statistical significance provided in the supplemental file.

      Another issue is that in the statistics of figure 2 (in supplemental file 3), positive values reflect cases where the mutant performs poorly compared to the WT, while in figure 3 the negative values indicate this. Such discrepancy is not very clear. And again, how can Rrel be negative?

      As noted in response to an earlier comment, Rrel values (given in source data files) are not negative, but effect sizes (given in supplemental file with statistics) may be negative or positive since they show differences in the relative growth rate of WT and mutant. We agree that the discrepancy between the calculation of mean difference for Figs 2 and 3 was confusing. We have now fixed this: in both cases, negative mean difference values now indicate that the mutant performs better.

      Both axes say glycerol. What about galactose?

      The typo has been corrected.

      Lines 414-419: The authors state that "all but one had a growth rate that was comparable to WT (16 strains) or higher than WT (10 strains) after transitioning from rich to poor media (i.e. during a nutrient downshift, note data distribution along the x-axis in Fig 3; Supplementary file 3). In contrast, after a nutrient upshift, 11 strains showed significantly slower growth in one or both pairs of media, and only 2 showed significantly faster growth than WT (note data distribution along the y-axis in Fig 3; Supplementary file 3)".

      Looking at the Rrel values when transitioning from TB to glycerol and vice versa suggests no consistent direction in the effect of reducing redundancy. During the downshift, four strains perform better and three strains perform worse than the WT. During the upshift, four strains perform better and six strains perform worse. Only the downshift and upshift between TB and Gal give a strong signal.

      The authors should state clearly in the text that the effect is specific to that transition and is not general, as the text currently implies (e.g., for transitions from any rich to any poor medium and vice versa). I am convinced that the authors see an actual effect when downshifting or upshifting between TB and galactose. In that case, the conclusion is that redundancy is beneficial or costly depending on the conditions used, and not as a general theme.

      Also, this is true just for some tRNAs, so I don't think the conclusion is general regarding the question of redundancy.

      The fitness impacts of altered redundancy are best explained by a combination of multiple factors (in addition to nutrient availability): the number of tRNA genes deleted, number of tRNA gene copies remaining as a backup, availability of wobble or ME as backup, and codon usage. Thus, any of these variables alone would provide only partial explanation for the observed fitness effects of all strains.

      In many tRNA deletion strains – especially single gene deletions – redundancy was not significantly lowered by the deletion, as we explain in the results section. These strains were therefore not expected to show major fitness impacts or follow strong nutrient dependent trends, and this is what we observe.

      The same is true for nutrient upshift-downshift experiments, where a vast majority of strains were not expected to show a specific pattern because they do not show significant fitness impacts in general, nor do they show a strong correlation in relative fitness impacts vs. growth rate (Figure 1d). In addition, in these experiments the difference between the two media also matters. For example, comparing the maximum WT growth rate, M9 Gal is poorer than M9 Glycerol. Therefore, shifts between TB-Gal are nutritionally more drastic than TB-Gly shifts, and one would expect a larger fitness impact in the former (for strains with significantly altered redundancy). Hence, despite differences across media pairs, our broader conclusions about the impact of redundancy are generalizable as long as redundancy and nutrients are both substantially altered, e.g. due to deletion of 3 tRNA genes, deletion of tRNA+ME, or deletion of multiple rRNA operons.

      Figures are referred to inconsistently throughout the text: sometimes as "figure X", sometimes as "FigX". References to the supplemental figures are also inconsistent.

      We have now corrected this.

      Line 443-444: "In fact, 10 tRNAs were significantly upregulated in the poor medium relative to the rich medium".

      This result contradicts the authors' hypothesis. If redundancy is bad in poor media because the cells have more tRNAs than they need, tRNA levels should be downregulated, not upregulated. How do the authors explain this?

      This statement referred to the WT strain, and was meant to highlight that (as noted by the reviewer) some tRNAs appear to be upregulated in poor medium, which is counterintuitive. However, as noted by reviewer 1 (see their comment on the interpretation of YAMAT-seq data), we can only infer the relative contribution of each tRNA isotype to the total tRNA pool (rather than absolute up- or down- regulation). Thus, we have removed this specific sentence, and instead we focus on the mismatch between the media-dependent changes in the composition of the tRNA pool and the fitness effects of different tRNA isotypes (lines 475-482).

      Line 445-447: "In contrast (and as expected), all tested tRNA deletion strains had lower expression of focal tRNA isotypes in the rich medium (Fig 4B, left panel), showing that the backup gene copies are not upregulated sufficiently to compensate for the loss of deleted tRNAs". It is great that the authors validated the expression in their strains. However, for accuracy, please indicate that it was done in four strains to avoid the impression that they did it in all the strains.

      We have now reworded this sentence to remind readers that we measured 4 tRNA deletion strains in this experiment.

      Finally, across the manuscript, the authors reveal that deleting some tRNAs or modifying enzymes can be deleterious in rich media or advantageous in poor media. However, I think this result and the conclusions derived from it would be more convincing if the authors showed, in a subset of their strains, that expressing the deleted tRNAs or modifying enzymes from a plasmid rescues the phenotype.

      Thank you for this suggestion. For a small subset of strains, we now include data showing that complementation from a plasmid indeed rescues the deletion phenotype (Fig 2 – Fig supplement 7).

      Reviewer #3 (Public Review):

      In this manuscript, Raval et al. investigated the cost and benefit of maintaining seemingly redundant components of the translation machinery in the E. coli genome. They used systematic deletion of different components of the translation machinery including tRNA genes, tRNA modification enzymes, and ribosomal RNA genes to create a collection of mutant strains with reduced redundancy. Then they measured the effect of the reduced redundancy on cellular fitness by measuring the growth rate of each mutant strain in different growth conditions.

      This manuscript beautifully shows how maintaining multiple copies of translation machinery genes, such as tRNA or ribosomal RNA genes, is beneficial in a nutrient-rich environment, while it is costly in nutrient-poor environments. Similarly, they show that maintaining parallel pathways, such as a non-target tRNA that directly decodes a codon versus a target tRNA plus tRNA-modifying enzymes that enable wobble interactions between the tRNA and the codon, has a similar effect in terms of cost and benefit.

      Further, the authors identify the mechanisms that contribute to the increased or reduced fitness following a reduction in gene copy number by measuring tRNA abundance and translation capacity. This enables them to show that, on the one hand, reduced tRNA gene copy number results in lower tRNA abundance in rich growth media, whereas in nutrient-limited media, higher copy number leads to an increased expression cost that does not translate into an increased translation rate.

      Overall, this work beautifully demonstrates the cost and benefits of the seemingly redundant translation machinery components in E. coli.

      Thank you for the clear summary and encouraging comments.

      However, in my opinion, this work’s conclusion should be that the seeming redundancy of the translation machinery is not redundant after all. As mentioned by the authors, it is known that tRNA gene copy number is associated with tRNA abundance (Dong et al. 1996, doi: 10.1006/jmbi.1996.0428), this effect is also nicely demonstrated by the authors in the section titled “Gene regulation cannot compensate for loss of tRNA gene copies”. Moreover, this work demonstrates how the loss of the seeming redundancy is deleterious in a nutrient-rich environment. Therefore, I believe the experiments presented in this work together with previous works should lead to the conclusion that the multiple gene copies and parallel tRNA decoding pathways are not redundant but rather essential for fast growth in rich environments.

      The point is well taken. However, as described in the introduction, here we focus on functional redundancy at the cellular level, where there are multiple ways of achieving the same translation rate. Hence we say that translation components are redundant at this level of analysis. One of the key conclusions from our work is that such redundancy is context-dependent, i.e. it is essential when rapid growth is possible, but is costly and dispensable otherwise. Therefore, we show that the definition of redundancy itself changes with environmental conditions.

      The following analogy may help convey this. There may be many ways to reach a flight at an airport: multiple entrances, multiple check-in and security counters, multiple boarding gates, etc. In a deserted airport these may seem redundant and even costly to maintain. On the other hand, they are useful when traffic is high. Hence, even though from a purely architectural perspective the multiple routes are redundant, from a utilitarian perspective their value depends on the flux of passengers.

    1. Author Response

      Reviewer 2 (Public Review):

      The paper addresses the question of how brain circuits associate stimuli onto abstract representations, and how both the neuronal activity and the synaptic connectivity change during this process. To do so, the authors make use of a feedforward network model that learns to map stimuli vectors onto two categories by means of gradient descent. They show that the model successfully learns the abstract classes in a simple and context-dependent categorisation task. The authors analyse a number of measures, like category and context selectivity to link their results to experimental findings. Moreover, they analyse the network thoroughly and unravel network and task properties that may underlie previous, seemingly contradictory experimental findings. The paper is very well written, the analyses and mathematical derivations are very thorough and the results are convincing. However, the work and its presentation would benefit from a few changes:

      1) The paper may benefit from a more thorough discussion on how the results fit into the current literature (neuroscience and machine learning) and how the findings may generalise to more complex tasks and network structures (Dale’s principle, including recurrent/feedback connections, more than two categories, more than one hidden layer, alternatives to gradient descent).

      2) While the simulations and detailed analyses in the results and methods section are very convincing, some claims should be also supported by more intuitive explanations so that a broader audience can be reached.

      3) The introduction to the context-dependent task may need to be revised because, as it stands, the difference from the simple task presented first is not immediately clear.

      4) It would be nice if their findings could be related back to the experimental literature more qualitatively. While the authors mention the contradictory findings in monkey and rat PFC vs. monkey LIP in their introduction, a thorough comparison with those findings is missing.

      We thank the reviewer for his detailed assessment and his supportive words. We hope that our revision addresses his suggestions. Concerning point 4: we agree with the reviewer that a thorough comparison with experimental findings would be important, and is currently missing. A thorough comparison would, however, require a number of additional steps that we feel lie beyond the scope of this manuscript (adapting the tasks to each experimental setup, e.g. by increasing the number of categories and changing the structure of context-dependent associations, and re-analysing experimental data).

      We have thus decided to leave this major effort for future work.

    1. Author Response

      Public Evaluation Summary

      The authors aim to tackle a fundamental question with their study: whether there is a direct age-associated increase in transcriptional noise. To investigate this question, they develop tools to analyze single-cell sequencing data from mouse and human aging datasets. Ultimately, application of their novel tool (Scallop) suggests that transcriptional noise does not change with age; instead, apparent changes in transcriptional noise can be attributed to other sources, such as subtle shifts in cell identity. This study is in principle of broad interest, but it currently lacks a definitive demonstration of the robustness of Scallop. Systematic testing of this new package would strengthen the key conclusion of the work and give other users more confidence when using the tool to estimate expression noise.

      We have now attempted to further demonstrate the robustness of Scallop by performing a more systematic analysis and a side-by-side comparison to other existing methods using a set of artificially generated datasets. These analyses have resulted in the inclusion of six supplementary figures that are presented in the subsections Scallop membership score accurately identifies transcriptionally noisy cells, Ability to detect noisy cells within cell types, Effect of cellular composition, Effect of dataset size, Effect of feature expression and Effect of cell type marker expression within the Results section of the revised manuscript.

      We have also included a supplementary figure showing an in-depth analysis of a dataset where an age-associated increase in transcriptional noise was detected using alternative methods, but whose closer dissection revealed that the difference in noise is due to a single donor and to the choice of methods. We discuss this in the subsection Distance-to-centroid methods detect transcriptionally stable cell subtypes as transcriptional noise within the Results section.

      Finally, we have revised the manuscript to clarify the main points raised by the reviewers: the definition of transcriptional noise, the reasoning behind the choice of the single-cell aging datasets and the rationale for using Leiden clustering. We have also expanded the description of the method to make the definition of the membership score clearer to readers, and discussed the implications of our main findings (a lack of evidence for age-related transcriptional noise) in the broader context of theories of aging.

      Reviewer #1 (Public Review):

      In the present study, Ibanez-Sole et al evaluate transcriptional noise across aging and tissues in several publicly available mouse and human datasets. Initially, the authors compare 4 generalized approaches to quantify transcriptional noise across cell types and then implement a new approach that uses iterative clustering to assess cellular noise. Based on this approach (Scallop), the authors survey noise across seven sc-seq datasets relevant to aging. Here, the authors conclude that enhanced transcriptional noise is not a hallmark of aging, but rather that changes in cell identity and abundance, namely of immune and endothelial cells, are. The development of new tools to quantify transcriptional noise from sc-seq data is appealing, as these datasets are increasing exponentially. Further, the conclusion that increased transcriptional noise is not a defining aspect of aging is clearly an important contribution; however, given the provocative nature of this claim, more comprehensive and systematic analyses should be performed. In particular, the robustness and appeal of Scallop are still not sufficiently demonstrated and, given the complexity (multiple tissues, species and diverse relative age ranges) of the datasets analyzed, a more thorough comparison should be performed. I list a few thoughts below:

      Initially, the authors develop Decibel, which centralizes noise quantification methods. The authors provide schematics in Fig 1 and compare noise estimates with aging in Fig 2 - Supplement 2. Since the authors emphasize the necessary use of Scallop as a “better” pipeline, more systematic side-by-side comparisons with the other methods should be made.

      We thank the reviewer for their positive assessment of the manuscript and their suggestions. We agree that side-by-side benchmarking of Scallop against the methods implemented in Decibel, as well as a more thorough analysis of the effect that different features (such as dataset size and cellular composition) might have on the output of Scallop, will reinforce the main points of the manuscript. To respond experimentally to these requests, we took advantage of a set of four artificial datasets previously generated by us with the R package splatter (v1.10.1; as described in Ascensión et al. [1]). In the present work, we first ran a side-by-side comparison between Scallop and two distance-to-centroid (DTC) methods on the four artificial datasets, which contain increasing degrees of transcriptional noise (the novel data are included as Figure 1 – Figure supplement 1 in the revised manuscript). Then, we compared Scallop to one DTC method regarding their ability to detect noisy cells in different cell types (Figure 1 – Figure supplement 2). Finally, we implemented four simulations to test the effect of the following features on the performance of Scallop: cellular composition (Figure 1 – Figure supplement 3), dataset size (Figure 1 – Figure supplement 4), number of genes (Figure 1 – Figure supplement 5) and marker gene expression (Figure 1 – Figure supplement 6). A summary of these results follows.

      Side-by-side comparison of Scallop vs DTC methods

      Each of the four artificial datasets used consists of 10K cells from 9 populations, named Group1 to Group9, with the following relative abundances: 25, 20, 15, 10, 10, 7, 5.5, 4, and 3.5%, respectively. The four datasets differ only in the de.prob parameter used in their generation. The de.prob parameter determines the probability that a gene is differentially expressed between subpopulations within the dataset. The greater the de.prob value, the more differentially expressed genes there will be between clusters, meaning that the different cell types present in the dataset will cluster more robustly. Decreasing the value of de.prob results in datasets with noisy cells, whose populations do not have such a strong transcriptional signature. To study how well Scallop captures the robustness with which cells of the same cell type cluster together, we selected four de.prob values (0.05, 0.016, 0.01 and 0.005) and measured transcriptional noise using Scallop and two DTC methods: the whole transcriptome-based Euclidean distance to the cell type mean and the invariant gene-based Euclidean distance to the tissue mean expression. These two methods were selected because GCL does not yield a transcriptional noise measure per cell, so no comparisons can be made with respect to the number of noisy cells the method is able to detect within a cluster. Similarly, comparing Scallop to the ERCC spike-in-based method was not possible for artificial datasets. Importantly, these analyses showed that Scallop, unlike DTC methods, was able to distinguish the core, transcriptionally stable cells within each cell type cluster from the noisier cells that lie between clusters (provided in Figure 1 - Supplement 1 of the revised manuscript).
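
      To make the distance-to-centroid idea concrete, the following is a minimal sketch of a whole transcriptome-based Euclidean distance to the cell type mean, written in plain Python: each cell's score is the Euclidean distance between its expression vector and the centroid of its annotated cell type. It assumes an already normalized cells-by-genes matrix given as nested lists; the function name `distance_to_celltype_mean` and the toy data are hypothetical, and this is not Decibel's implementation.

```python
import math
from collections import defaultdict


def distance_to_celltype_mean(expr, labels):
    """Return, for each cell, the Euclidean distance to its cell type centroid.

    expr: list of per-cell expression vectors (equal length, assumed normalized)
    labels: list of cell type labels, one per cell (hypothetical input format)
    """
    # Accumulate per-cell-type sums and counts to compute the centroids.
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(expr, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        for j, x in enumerate(vec):
            sums[lab][j] += x
        counts[lab] += 1
    centroids = {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}
    # Distance of each cell to the centroid of its own cell type only.
    return [math.dist(vec, centroids[lab]) for vec, lab in zip(expr, labels)]


# Toy data: two cell types, two genes.
expr = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 14.0]]
labels = ["A", "A", "B", "B"]
print(distance_to_celltype_mean(expr, labels))  # [1.0, 1.0, 2.0, 2.0]
```

      Note that the score depends only on how far a cell lies from its own cell type centroid; it carries no information about which other cluster the cell might be closer to.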

      Effect of dataset features on the performance of Scallop

      We simulated five artificial datasets with the same nine cell type populations but with different relative abundances in each dataset. We used the imbalance degree (ID) to measure class imbalance in each of them and to make sure that the selected cell compositions represented a wide range of imbalance degrees (to this end, we explored ID values between 1.2 and 5.3). The ID provides a normalized summary of the extent of class imbalance in a dataset in so-called “multiclass” settings, that is, where more than two classes are present. It was specifically developed to improve on the commonly used imbalance ratio (IR), whose calculation only considers the abundance of the most and the least abundant classes and which therefore gives the same summary for datasets with different numbers of minority classes. The presence of multiple minority classes is not uncommon in single-cell RNAseq datasets, as tissues might contain several rare cell types. We observed that the transcriptional noise measurements provided by Scallop were very robust to changes in imbalance degree (see Figure 1 - Supplement 3), both qualitatively and quantitatively. For instance, Group2 and Group8 were always detected as the most stable and the noisiest cell types, respectively, regardless of their relative abundance in the dataset, and their average percentage of noise varied little across ID values: it ranged from 0 to 0.14% for Group2 and from 16 to 18% for Group8.
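
      The limitation of the imbalance ratio mentioned above can be illustrated with a short sketch. IR is the abundance of the most abundant class divided by that of the least abundant class, so two compositions with different numbers of minority classes can share the same IR. The first composition uses the nine relative abundances listed earlier; the two cell-count compositions are invented for illustration, and the ID itself is not reimplemented here.

```python
def imbalance_ratio(abundances):
    """Imbalance ratio: abundance of the largest class over the smallest."""
    return max(abundances) / min(abundances)


# Relative abundances (%) of Group1..Group9 in the artificial datasets.
original = [25, 20, 15, 10, 10, 7, 5.5, 4, 3.5]

# Hypothetical cell counts: one rare cell type vs. three rare cell types.
one_rare_type = [3000, 3000, 3000, 500]
many_rare_types = [3000, 500, 500, 500]

print(round(imbalance_ratio(original), 2))  # 7.14
print(imbalance_ratio(one_rare_type))       # 6.0
print(imbalance_ratio(many_rare_types))     # 6.0, same IR despite more minority classes
```

      The imbalance degree additionally accounts for how many classes are minority classes, which is why it separates compositions like the last two that IR cannot.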

      The effect of dataset size (number of cells) and the number of genes was evaluated by generating versions of an artificial dataset where cells/genes had been subsampled from an original artificial dataset (the one generated with de.prob=0.001). We tested datasets sized 1,000-10,000 cells and with a number of genes between 5,000 and 14,000. Dataset size had nearly no impact on the transcriptional noise measurements provided by Scallop (Figure 1 - Supplement 4 of the revised manuscript). The average percentage of transcriptional noise per cell type remained within a narrow range as we implemented a ten-fold increase in dataset size. Perhaps more strikingly, removing the expression of most genes did not substantially impact transcriptional noise measurements per cell type (Figure 1 - Supplement 5). The variation when removing half of the genes (7,000 genes) was minimal, and we did not see important changes in transcriptional noise measurements unless over 60% of the genes from the original dataset were removed. For example, Figure 1 - Supplement 5C shows that noise measurements suffer important variations when removing 8,000 and 9,000 genes (and therefore keeping 6,000 and 5,000 genes, respectively), but only some cell types (Groups 4, 7, 8 and 9) were affected by these variations.
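
      A minimal sketch of how subsampled versions of a dataset could be generated, assuming random sampling of cells and genes without replacement with a fixed seed. The function `subsample` and the toy matrix are hypothetical; the response does not state which sampling scheme was used.

```python
import random


def subsample(expr, genes, n_cells=None, n_genes=None, seed=0):
    """Randomly subsample cells (rows) and/or genes (columns) without replacement.

    expr: list of per-cell expression vectors; genes: list of gene names.
    Passing None keeps all cells or all genes.
    """
    rng = random.Random(seed)  # fixed seed so each subsampled version is reproducible
    cell_idx = range(len(expr))
    if n_cells is not None:
        cell_idx = sorted(rng.sample(range(len(expr)), n_cells))
    gene_idx = range(len(genes))
    if n_genes is not None:
        gene_idx = sorted(rng.sample(range(len(genes)), n_genes))
    sub_expr = [[expr[i][j] for j in gene_idx] for i in cell_idx]
    sub_genes = [genes[j] for j in gene_idx]
    return sub_expr, sub_genes


# Toy matrix: 10 cells x 6 genes, value = 10 * cell index + gene index.
genes = [f"g{j}" for j in range(6)]
expr = [[float(i * 10 + j) for j in range(6)] for i in range(10)]
sub_expr, sub_genes = subsample(expr, genes, n_cells=4, n_genes=3)
print(len(sub_expr), len(sub_genes))  # 4 3
```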

      In order to measure the effect that marker gene expression has on the membership with which cells are assigned to their cell type cluster, we ran a simulation where the top 10 markers for a cell type were removed from the dataset one by one, so that the first simulation lacked the expression of the Top1 marker, the second lacked the first 2 markers (Top1 and Top2), and so on. Then, we ran Scallop on each of the resulting datasets and observed a steady increase in transcriptional noise associated with that cell type. This provided evidence that the strength of cell type marker expression in a cluster is directly related to its transcriptional stability (or lack of transcriptional noise). We included the result of this experiment in the revised version of the manuscript (Figure 1 - Supplement 6).
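
      The cumulative design of the marker-removal simulation (Top1 removed, then Top1 and Top2, and so on up to Top10) can be sketched as follows. This assumes markers are supplied already ranked and that removal means dropping the gene from the feature list; the function name and gene names are hypothetical, and the subsequent Scallop run on each reduced dataset is not shown.

```python
def cumulative_marker_removal(genes, ranked_markers, max_k=10):
    """Yield (k, kept_genes), where the top-k ranked markers have been dropped.

    ranked_markers: marker genes for one cell type, best marker first.
    """
    for k in range(1, min(max_k, len(ranked_markers)) + 1):
        removed = set(ranked_markers[:k])
        yield k, [g for g in genes if g not in removed]


genes = ["m1", "m2", "m3", "h1", "h2"]
for k, kept in cumulative_marker_removal(genes, ["m1", "m2", "m3"]):
    print(k, kept)
# 1 ['m2', 'm3', 'h1', 'h2']
# 2 ['m3', 'h1', 'h2']
# 3 ['h1', 'h2']
```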

      In conclusion, by using artificially generated datasets where the ground truth (cell type labels, degree of noise, etc) was known, the newly provided systematic analyses showed that Scallop had a remarkably robust response to said changes in dataset features, further reinforcing the manuscript conclusions.

      For example, Scallop noise estimates (Fig 2) look fairly similar to those from other Euclidean distance-based measures (Fig 2 supplement 2).

      It is true that some datasets show similar trends regardless of the transcriptional noise quantification method. For instance, the murine brain dataset by Ximerakis et al. shows no overall change in noise between the age groups across different methods. However, we do observe important differences in other examples. This is the case of the human pancreas dataset by Enge et al. and the human skin dataset by Solé-Boldo et al., where not only the magnitude but also the directionality of the trend are different depending on the method used to measure noise. In the former, three methods (Scallop, invariant gene-based Euclidean distance to average tissue expression and GCL) show an age-related increase in noise, whereas one method (whole transcriptome-based Euclidean distance to the cell type mean) shows a decrease in noise. In the latter, two methods (Scallop and GCL) yield a decrease in noise and the two DTC methods measure a mild increase in noise. These inconsistencies can now be reconciled with our proposed explanation that said ”noise” may actually be referring to substantially different biology in the diverse experimental settings.

      Are downstream observations (ex lung immune composition changes more than noise) supported from these methods as well? If so, this would strengthen the overall conclusion on noise with age, but if not, it would be relevant to understand why.

      Studying changes in cell type composition in the lung and other aged tissues would be highly pertinent. Nevertheless, we have measured changes in cell type composition using only one method that is based on Generalized Linear Models, covered in the subsection Age-related cell type enrichment of the Methods. The methods that we have compared in our study (DTC methods, ERCC-based methods, GCL, etc.) were all designed to measure transcriptional noise, but not changes in cell type composition.

      Whether the effects of cell type composition changes are bigger than changes in noise for the rest of the methods used to measure noise was probably not clear enough in the original manuscript. We found no evidence for an increase in noise associated with aging, regardless of the method used. Although not included in the manuscript, we did generate heatmaps similar to the one shown in Figure 3B for each of the noise quantification methods. However, as the heatmap on the right side (the one showing cell type enrichment) was identical in each figure, we considered them redundant and decided not to include them, since they did not provide any additional insight besides giving more examples of lack of evidence for transcriptional noise, this time at the cell type level. We consider that the lack of evidence was already well demonstrated in the previous analyses (Figure 2 and Figure 2 - Supplement 2).

      Similarly, the validation of Scallop seems mostly based on its ability to localize noisy vs stable cells in Fig 1 supplement 1 and its relative robustness within a dataset to input parameters (Fig 1 supplement 2). A more systematic analysis should be performed to robustly establish this method, for example, noise cell clustering comparisons across the 7 datasets used. In addition, Levy et al. 2020 implemented a pathway-based approach for validation: specifically, surrogate genes were derived from GCL values, with KEGG preservation used as an output. Similar additional types of analyses should be performed for Scallop.

      We believe that this legitimate concern is now addressed by the newly included data, in particular by the systematic comparison between Scallop and DTC methods on three artificially generated datasets with different degrees of transcriptional noise provided in Figure 1 - Supplement 2. The ability of Scallop to detect cells that are particularly noisy within a cell type, or cells that lie between cell types, may represent its biggest advantage with respect to other methods. DTC methods fail to discern between stable and noisy cells within cell types. Also, in our analysis, DTC methods were unable to distinguish between cell types that have a marked transcriptional program (which systematically cluster together) and those that have a less clear transcriptomic identity (which have at least some of their cells assigned to other cell types across bootstrap iterations). However, comparing the performance of Scallop on the same datasets showed that our method was able to distinguish between the two cases.
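
      The bootstrap-based idea behind distinguishing stable from noisy cells can be illustrated with a generic membership score: the fraction of bootstrap clustering runs in which a cell is assigned to its most frequent cluster, with one minus that fraction as a noise-like score. This sketch assumes cluster labels have already been matched across runs; it is a hypothetical illustration, not Scallop's code or API, and the toy labels are invented.

```python
from collections import Counter


def membership_scores(label_runs):
    """Per-cell fraction of runs assigning the cell to its most frequent cluster.

    label_runs: list of bootstrap runs; each run is a list of (already matched)
    cluster labels, one per cell, in the same cell order.
    """
    n_runs = len(label_runs)
    n_cells = len(label_runs[0])
    scores = []
    for i in range(n_cells):
        counts = Counter(run[i] for run in label_runs)
        scores.append(counts.most_common(1)[0][1] / n_runs)
    return scores


runs = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "A"],
    ["A", "A", "B", "B"],
    ["A", "B", "B", "A"],
]
scores = membership_scores(runs)
print(scores)                   # [1.0, 0.75, 1.0, 0.5]
print([1 - s for s in scores])  # noise-like score per cell
```

      Cells that are consistently assigned to the same cluster score 1.0, while cells that alternate between clusters across runs, such as the last cell in the toy example, score lower.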

      The conclusion that immune and endothelial cell transcriptional shifts associate more with age than noise are quite compelling, but seem entirely restricted to the mouse and human lung datasets. It would be interesting to know if pan-tissues these same cell types enrich age-related effects or whether this phenomenon is localized.

      We agree with the reviewer that it would be very interesting to see whether a change in cell type composition (and particularly, an increase in abundance of immune cell types) is observed in aged tissues other than the lung. Qualitative cell type composition changes in the aging lung have been described in the literature [5]. Specifically, the higher abundance of immune cell types was observed in a single-nucleus RNAseq dataset of cardiopulmonary cells in Macaca fascicularis [6]. However, we believe that trying to answer the question whether this phenomenon holds in other tissues would require a systematic analysis of several datasets for each tissue with a sufficient number of donors/individuals in each of them. This is because our approach to measure age-associated cell type enrichment using generalized linear models relies heavily on having multiple biological replicates for each age group. Unfortunately, this is not the case for most published single-cell RNAseq datasets of aging. In any case, we have toned down the last sentence in the subsection Changes in the abundance of the immune and endothelial cell repertoires characterize the human aging lung by making it more clear that our claim regarding changes in the cellular composition of aged tissues is based on lung datasets (the text in italics represents what was added in the revised version of the manuscript):

      "Even though the evidence for changes in tissue composition are based on a single tissue, we hypothesize that these facts may have influenced previous analyses of transcriptional noise associated with aging."

      As discussed in the original manuscript, there is evidence published by other groups pointing to pan-tissue changes in cellular composition with age, which undoubtedly will influence those analyses that did not pay attention to cellular composition changes in the datasets that they compared. Cellular composition is in fact a very important aspect that has been greatly overlooked. Only one [7] of the seven articles that had measured transcriptional noise in aging (the datasets used in Figure 2) had attempted to remove its effect by subsampling cells to balance compositions between age groups prior to their noise analysis. In any case, we do not believe this is the only phenomenon underlying the purported increase in transcriptional noise associated with age. Each dataset will most probably have different issues that the original authors misread as an increase in noise or loss of cellular identity of a particular organ or tissue. As an additional example of such phenomena, we have now included a re-analysis of the data by Enge et al. [3] on “noisy” β-cells in the aged human pancreas (Figure 5–Figure supplement 2 of the revised manuscript). In this case, rather than observing an age-dependent pattern, the 21-year-old donor presents much lower transcriptional noise values than the rest of the donors. However, there is no significant difference between the 22-year-old donor and the rest of the donors. We conclude that the statistically significant differences between the “young” and “old” age categories can be attributed to the abnormal noise values obtained for the 21-year-old donor, of uncertain origin. Identifying all causes of apparent transcriptional noise in other organs and tissues would be too lengthy an undertaking, and certainly out of scope for the present manuscript.
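
      Balancing cellular composition between age groups before a noise analysis, as attempted in [7], could look like the sketch below. It assumes each cell type is downsampled to its size in the smallest age group; the exact procedure used in [7] is not described here, and the function name and toy lung-like labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balance_composition(cells, seed=0):
    """Downsample so each cell type has equal cell counts in every age group.

    cells: list of (cell_id, cell_type, age_group) tuples.
    A cell type missing from any age group is dropped entirely (count 0).
    """
    rng = random.Random(seed)
    by_type_and_age = defaultdict(list)
    for cell in cells:
        _, cell_type, age = cell
        by_type_and_age[(cell_type, age)].append(cell)
    age_groups = sorted({age for _, _, age in cells})
    kept = []
    for cell_type in sorted({t for _, t, _ in cells}):
        # Size of the smallest age group for this cell type.
        n = min(len(by_type_and_age[(cell_type, age)]) for age in age_groups)
        for age in age_groups:
            kept.extend(rng.sample(by_type_and_age[(cell_type, age)], n))
    return kept


# Hypothetical toy data: immune cells are enriched in the "old" group.
cells = (
    [(f"y_at2_{i}", "AT2", "young") for i in range(4)]
    + [(f"y_imm_{i}", "immune", "young") for i in range(2)]
    + [(f"o_at2_{i}", "AT2", "old") for i in range(2)]
    + [(f"o_imm_{i}", "immune", "old") for i in range(5)]
)
balanced = balance_composition(cells)
print(sorted(Counter((t, a) for _, t, a in balanced).items()))
# [(('AT2', 'old'), 2), (('AT2', 'young'), 2), (('immune', 'old'), 2), (('immune', 'young'), 2)]
```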

      Related to these points, there does not seem to be a specific rationale for why these datasets (the seven used in total, or the lung for the deep dive) were selected. Clearly, many mouse and human sc-RNA-seq datasets exist with large variations in age, so expanding the datasets analyzed and/or providing a sufficient rationale for why these particular ones were chosen for noise analyses would be helpful. For example, querying “aging” across sc-seq datasets in the Single Cell Portal yields 79 available datasets: https://singlecell.broadinstitute.org/single_cell?type=study&page=1&terms=aging&facets=organism_age%3A0%7C103%7Cyears.

      We now realize that the reasoning behind our selection of aging datasets was not sufficiently clear in the original manuscript. We thank the reviewer for pointing out this omission. We have made a more explicit reference to Appendices 2, 3, 4 and 6 in the revised manuscript. The seven selected scRNAseq datasets are those where transcriptional noise had originally been measured by the authors, using the computational methods that we later implemented in Decibel. Our aim was to first recapitulate previous reports of transcriptional noise using our novel method (Scallop). Thus, we downloaded all publicly available scRNAseq datasets of aged tissues where transcriptional noise had explicitly been measured. Some of them had reported an increase in transcriptional noise only in some cell types (for instance, the human aged pancreas dataset by Enge et al. [3]), whereas others found an increase in most cell types [7]. Appendix 2 summarizes the main features of those seven datasets (tissue, organism and number of cells) and indicates whether an increase in transcriptional noise was observed in the original article where they were published. Additionally, the “scope” column indicates where that increase was found (in which cell types), and the “Method” column briefly describes the computational method used to measure transcriptional noise in that article. Appendix 3 provides information on the final datasets that were used in our analysis (Figure 2). Not every sample from the original datasets was included, so the inclusion criteria are specified there, as well as the number of cells, individuals and age of each of the cohorts. Appendix 4 shows the abnormal count distribution of two samples that were discarded from the Kimmel lung dataset. As for the selection of lung for the deep dive, the reason was that this was the organ with the most datasets available, both for mouse and human. Appendix 6 provides information on the number of cells and donors per age cohort in the human lung datasets included in this study.

      We have included the following sentence in the "Increased transcriptional noise is not a universal hallmark of aging" subsection of the Results:

      "We provide a summary of the main characteristics of each dataset, as well as the findings regarding transcriptional noise obtained in each of the original studies, whether changes in transcriptional noise were restricted to particular cell types, and the computational method used to measure noise (see Appendix 2)."

      The analysis showing that noise is indistinguishable from cell fate shifts is compelling, but again relies on one specific example, where alternative surfactant genes are used as markers. The same question arises: does this observation hold for other cell types in other organs? For example, the Human Cell Atlas contains dozens of tissues with large variations in age (https://www.science.org/doi/10.1126/science.abl4290).

      We sympathize with this comment, but hope that the reviewer will agree that providing an additional example of a different phenomenon originally reported as "transcriptional noise" (in this case in the aged human pancreas; see Figure 5 – Figure supplement 2), but actually reflecting something else, may be sufficient to caution interested readers. In our opinion, diverse phenomena are likely to underlie the purported increases in transcriptional noise, and re-analyses should be made case by case. We can only hope that researchers in the field re-analyze the available aging datasets in this new light.

      Reviewer #2 (Public Review):

      In this manuscript, Ibanez-Sole et al. focus on an important open question in ageing research: "how does transcriptional noise increase at the cellular level?". They developed two Python toolkits: Decibel, for the comparison of previously described methods to measure transcriptional noise, and Scallop, which implements a new measure of variability based on cluster membership. Using published datasets and comparing multiple methods, they suggest that increased transcriptional noise is not a fundamental property of ageing; instead, previous reports might have been driven by age-related changes in cell type composition.

      I would like to congratulate the authors on openly providing all code and data associated with the manuscript. The authors did not restrict their paper to one dataset or one approach but instead provided a comprehensive analysis of diverse biology across murine and human tissues.

      While the results support their main conclusions, the lack of robustness/sensitivity measures for the methods used makes it difficult to judge the biology. The authors use real data to compare methods, but using synthetic data with known artificial 'variability' across cell clusters could first establish the methods, which would make the results more convincing and easier to interpret. Despite the comprehensive analysis of biological data, a detailed prior description of how the methods behave with respect to, e.g., the number of cells in each cell type cluster, the number of cell types in the dataset, and % feature expression would make the paper more convincing. Once the details of the method are provided, the Python toolkit could be widely used beyond the ageing research community. I am also concerned that a definition of 'transcriptional noise' (e.g. genome-wide noise, transcriptional dysregulation in cell-type-specific genes, noise in certain pathways) and its interpretation with regard to the biology of ageing is missing. Differences between methods could be explained by the different biology they capture. Moreover, the absence of different types of variability may not have the same interpretation for the biology of ageing.

      Increased transcriptional noise is compatible with genomic instability, loss of proteostasis and epigenetic regulation. Showing a lack of consistent transcriptional noise can challenge the widespread assumptions about how these hallmarks affect the organism. Overall, I found the paper very interesting and central to the field of ageing biology. However, I believe it requires a more detailed description of the methods and interpretations in the context of biology and theories of ageing.

      We thank the reviewer for their positive assessment of the manuscript and their suggestions. We respond to each of the specific comments below.

      Major comments

      1) The concept of transcriptional noise is central to the manuscript; however, what the authors consider as transcriptional noise and why is not clear. Genome-wide vs. function or cell-type specific noise could have different implications for the biology of ageing. In line with this, a discussion of the findings in the context of theories of ageing is necessary to understand its implications.

      We thank the reviewer for pointing out the lack of clarity on this key point. The term "transcriptional noise" is used quite heterogeneously in the literature, and we agree that the lack of a consensus definition may be confusing to the reader. For this reason, in the introduction we adopted the definition by Raser and O’Shea [8] as "the measured level of variation in gene expression among cells supposed to be identical", i.e. the sum of both intrinsic and extrinsic noise as previously defined by Swain and colleagues [9, 10]. In our opinion, this is generally what the literature on age-associated transcriptional noise refers to.

      With Scallop, we aimed to translate this concept to the context of single-cell RNAseq datasets, where clusters obtained using a community detection algorithm are typically annotated as distinct cell types.

      Therefore, we aimed to measure transcriptional noise, here defined as "lack of membership to cell type clusters". When a clustering algorithm is run iteratively and a cell is not unambiguously assigned to the same cluster, we consider it to be noisy. Conversely, when a cell consistently clusters with the same group of cells, we consider it to be stable. The membership score we use as a measure of stability is the frequency with which any given cell was assigned to the same cluster across all iterations.
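
      Counting how often a cell lands in "the same cluster" requires matching each iteration's cluster labels to a common reference, because label numbers are arbitrary from one clustering run to the next (the cluster relabeling step). The Python sketch below illustrates one way to do this. It rests on three assumptions. First, each bootstrap iteration clusters a subsample of cells. Second, unsampled cells are marked with -1. Third, bootstrap clusters are matched one-to-one to reference clusters by maximizing the number of shared cells. The function name relabel_to_reference and the brute-force matching are hypothetical choices for illustration, not the Scallop API.

```python
from itertools import permutations

import numpy as np


def relabel_to_reference(ref_labels, boot_labels):
    """Map each bootstrap cluster label onto a reference cluster label.

    ref_labels:  reference clustering of all cells (integer labels)
    boot_labels: labels from one bootstrap iteration (same length),
                 with -1 for cells not sampled in that iteration
    Returns boot_labels rewritten in the reference label space (-1 kept).
    """
    ref_labels = np.asarray(ref_labels)
    boot_labels = np.asarray(boot_labels)
    sampled = boot_labels >= 0
    ref_ids = np.unique(ref_labels)
    boot_ids = np.unique(boot_labels[sampled])
    if len(boot_ids) > len(ref_ids):
        raise ValueError("sketch assumes no more bootstrap clusters than reference clusters")

    # overlap[i, j] = cells in bootstrap cluster i that belong to reference cluster j
    overlap = np.array([[np.sum((boot_labels == b) & (ref_labels == r)) for r in ref_ids]
                        for b in boot_ids])

    # Brute-force search for the one-to-one mapping with the largest total overlap
    best_perm, best_score = None, -1
    for perm in permutations(range(len(ref_ids)), len(boot_ids)):
        score = sum(overlap[i, j] for i, j in enumerate(perm))
        if score > best_score:
            best_perm, best_score = perm, score

    mapping = {b: ref_ids[j] for b, j in zip(boot_ids, best_perm)}
    return np.array([mapping[b] if b >= 0 else -1 for b in boot_labels])
```

For example, take a reference clustering [0, 0, 0, 1, 1, 1] and a bootstrap labeling [1, 1, -1, 0, 0, 1] whose label numbers happen to be swapped. The function returns [0, 0, -1, 1, 1, 0]. Brute force is only practical for a handful of clusters. With more clusters, the Hungarian algorithm (e.g. scipy.optimize.linear_sum_assignment) solves the same matching problem efficiently.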

      We have included in the Results section an explicit reference to the Methods subsection that explains how Scallop works in detail, so that the readers can easily find that information:

      "A detailed description of the three steps of the method (bootstrapping, cluster relabeling and computation of the membership score) is provided in the Scallop subsection in the Methods."

      Additionally, we have now realized that the formula to compute the membership score might be more easily understood if we renamed freq_score as freq_score(c), to make it clear that each cell is assigned a score. Also, we have used n and m instead of i and j in this notation, to avoid confusing the readers with the notation used in the previous section, where i and j represented the i-th and j-th bootstrap iterations. Finally, we have included a short paragraph to clarify what each component of the formula refers to. Below we show the formula and text included in the Methods section of the revised manuscript:

      "Where $|c_n|$ is the number of times cell c was assigned to the n-th cluster, and $\sum_{m \in \text{clusters}} |c_m|$ is the sum of all assignments made for cell c, which is the same as the number of times cell c was clustered across bootstrap iterations."

      Thus, to address this reviewer’s concerns, we have now included this exact definition of how we measure noise, together with a statement making clear that we refer to the sum of both intrinsic and extrinsic noise, with no distinction between them.
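
      The membership score can be computed directly from the relabeled assignments. The formula itself is not reproduced above, so this Python sketch assumes that freq_score(c) is the fraction of iterations in which cell c fell into its most frequent cluster, i.e. $\max_n |c_n| / \sum_{m \in \text{clusters}} |c_m|$. The function name freq_scores and the input layout are hypothetical. The input is an iterations × cells array already mapped to the reference clustering, with -1 for unsampled cells.

```python
import numpy as np


def freq_scores(relabeled):
    """Per-cell membership score from relabeled bootstrap assignments.

    relabeled: array of shape (n_iterations, n_cells) whose labels are already
               mapped to the reference clustering; -1 marks unsampled cells.
    Assumption: freq_score(c) = max_n |c_n| / sum_m |c_m|.
    """
    relabeled = np.asarray(relabeled)
    n_clusters = max(int(relabeled.max()) + 1, 0)
    scores = np.full(relabeled.shape[1], np.nan)
    for c in range(relabeled.shape[1]):
        assigned = relabeled[:, c][relabeled[:, c] >= 0]
        if assigned.size == 0:
            continue  # cell never sampled: score left undefined (NaN)
        counts = np.bincount(assigned, minlength=n_clusters)  # |c_n| for every cluster n
        scores[c] = counts.max() / counts.sum()               # max_n |c_n| / sum_m |c_m|
    return scores
```

With four iterations [[0, 0, 1], [0, 1, 1], [0, -1, 1], [0, 0, 1]], the scores are 1, 2/3 and 1. The second cell was sampled three times and landed in its most frequent cluster in two of them.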

      Similarly, we had discussed our findings in the framework of different theories of aging. These included their potential relationship to some of the established hallmarks of aging (genomic instability, epigenetic deregulation and loss of proteostasis), as well as more recent theories of aging such as cell type imbalance in aged organs [11] and inter-tissue convergence [12]. However, it is now clear to us that this was not enough, so we have expanded these paragraphs to better explain the implications of our work. More specifically:

      "Our results suggest that transcriptional noise is not a bona fide hallmark of aging. Instead, we posit that previous analyses of noise in aging scRNAseq datasets have been confounded by a number of factors, including both computational methods used for analysis as well as other biology-driven sources of variability."

      2) While I found the suggested method, Scallop, quite exciting and valuable, I would suggest including a number of performance/robustness measures (primarily based on simulations) on how sensitive the method is to the number of cells in each cell type (cellular composition), misannotations, % feature expression (number of 0s) etc.:

      We have analyzed the effect of cellular composition and the percentage of feature expression using artificially generated datasets (see Figure 1 - Supplements 3 and 5, respectively; and the section "Effect of dataset features on the performance of Scallop" in the response to reviewer #1). Studying the effect of misannotations on downstream analysis is important. However, we believe that Scallop's design already avoids their effects, since membership is measured for each cluster (and not for each cell type label). That is to say, a reference clustering is obtained at the beginning of the pipeline and memberships are computed using that output as a reference. As a result, the Scallop noise values attributed to each cell are not affected by the original labeling of the dataset.

      The output of these analyses reinforced our original conclusions, and it is now included in the Results section:

      "In order to characterize and validate our method for transcriptional noise quantification, we conducted three types of analyses. First, we used artificially generated datasets containing various degrees of transcriptional noise to compare the performance of Scallop and DTC methods side-by-side, regarding their ability to measure transcriptional noise and detect noisy cells within cell types. Next, we ran simulations using artificial datasets in order to study the effect of a number of dataset features on the performance of Scallop: cellular composition, dataset size, number of genes and marker expression. Finally, we graphically evaluated the output of Scallop on a dataset of human T cells, we analyzed its robustness to its input parameters, and we studied the relationship between membership and robust marker expression, using a PBMC dataset."

      2.1) Most importantly, knowing that cell-type composition changes with age, it is important to know how sensitive community detection is to the number of cells in each cell type. While the average can be robust, I wonder if the size of the cell-type cluster affects membership (voting).

      We have included an analysis on a set of artificial datasets with different cellular compositions to evaluate the performance of Scallop in the presence of different degrees of class imbalance (see Figure 1 - Supplement 3). We explain the output of this analysis, which reinforces the algorithm’s robustness, in the Results section:

      "Next, we ran a series of simulations on artificially generated datasets to evaluate the performance of Scallop in the presence of different levels of class imbalance, dataset size, number of genes, and different degrees of expression of cell type markers. Our analysis showed that Scallop was remarkably robust to changes in cellular composition (see Figure 1 - Supplement 3). Both the average percentage of noise and the distribution remained unchanged for a wide range of class imbalance degrees. Similarly, altering the dataset size (number of cells) and the number of genes of an artificial dataset did not cause any major changes in the transcriptional noise values attributed to each cell type (see Figure 1 - Supplements 4 and 5). Additionally, we conducted an analysis where we identified the 10 most differentially expressed gene markers for a cell type and measured the transcriptional noise associated with that cell type as we removed the expression of those genes from the dataset (Figure 1 - Supplement 5). Transcriptional noise steadily increased as we removed the effect of the top marker genes that defined the cell type under study (see Figure 1 - Supplement 5B). This experiment provides further evidence of how strong marker expression is related to robust cell type identity, and of how the lack of it results in transcriptional noise."

      3) Although the Leiden algorithm is widely used by many single-cell clustering methods, since the proposed methodology is heavily dependent on clustering, I suggest including a description of the Leiden algorithm.

      We agree that understanding how community detection algorithms in general, and Leiden in particular, work is crucial to understanding the core of the paper. We have therefore included a brief introduction to these methods at the beginning of the Scallop subsection of the Methods:

      Leiden is a graph-based community detection algorithm that was designed to improve on the popular Louvain method [13]. Graph-based community detection methods take a graph representation of a dataset as input. In the context of single-cell RNAseq data, shared nearest neighbor (SNN) graphs are commonly used. These are graphs whose nodes represent individual cells and whose edges connect pairs of cells that are among each other's k nearest neighbors according to some distance metric. The aim of community detection algorithms like Leiden is to find groups of nodes that are densely connected among themselves, by optimizing modularity. Consider a graph with C communities and m edges in total. For each community i (group of cells), modularity (Q) compares the actual number of edges in that community ($e_i$) with the number of edges expected in that community by chance ($K_i^2/4m$, where $K_i$ is the sum of the degrees of the nodes in community i):

      $$Q = \frac{1}{m}\sum_{i=1}^{C}\left(e_i - r\,\frac{K_i^2}{4m}\right)$$

      where r is a resolution parameter (r > 0) that controls the number of communities: a higher resolution parameter yields more communities, whereas a lower one yields fewer. Since maximizing the modularity of a graph is an NP-hard problem, different heuristics are used, and Leiden has been shown to outperform Louvain in this task in terms of both quality and speed [14]. However, users can choose to run the Louvain method instead by setting the parameter clustering="louvain" in the initialization of the Bootstrap object.
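
      To make the modularity formula concrete, here is a small Python sketch that evaluates Q for a given partition of an unweighted, undirected graph. It only scores a partition. It does not implement the Leiden (or Louvain) optimization heuristics that search for a high-Q partition. The function name and signature are hypothetical.

```python
from collections import defaultdict


def modularity(edges, communities, r=1.0):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges:       list of (u, v) pairs, each edge listed once
    communities: dict mapping node -> community id
    r:           resolution parameter (r > 0)
    Q = (1/m) * sum_i (e_i - r * K_i**2 / (4m))
    """
    m = len(edges)
    e = defaultdict(int)  # e_i: edges with both ends in community i
    K = defaultdict(int)  # K_i: sum of node degrees in community i
    for u, v in edges:
        cu, cv = communities[u], communities[v]
        K[cu] += 1
        K[cv] += 1
        if cu == cv:
            e[cu] += 1
    return sum(e[i] - r * K[i] ** 2 / (4 * m) for i in K) / m
```

Consider two triangles joined by a single edge. Splitting the graph into the two triangles gives Q = 5/14 ≈ 0.357 at r = 1, whereas putting every node in one community gives Q = 0. Raising r penalizes large communities more heavily, so finer partitions become comparatively more favorable.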

      3.1) Most importantly, the authors comment that they found stronger expression of cell-type specific markers in the cells with high membership values. Is it already a product of the Leiden algorithm that it weighs highly variable (thus cell-type specific) features more heavily, resulting in better prediction of cell types for cells with strong cell-marker expression? It is important to describe transcriptional noise at this stage, as it could be genome-wide or more specific to cell-type markers. Can the authors provide any support that their method can capture both?

      We agree with the reviewer that finding a stronger expression of cell-type markers in cells with high membership values is indeed something we expected. The graph representation of the dataset taken as input by Leiden is built after running highly variable gene detection and PCA. The neighbors of each cell are detected based on the expression of genes that are highly variable, as the reviewer pointed out, so genes that are differentially expressed between cells are more likely to contribute to the clusters found by Leiden.
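
      The graph construction described above (highly variable gene selection, PCA, then a nearest neighbor graph) can be sketched with NumPy alone. Two simplifications are assumptions made for illustration. Genes are selected by raw variance rather than by a dedicated highly variable gene method. Cells are linked only when they are mutual k nearest neighbors in PC space. The function name and default values are hypothetical, and the dense pairwise distance matrix limits this sketch to small datasets.

```python
import numpy as np


def mutual_knn_graph(X, n_top_genes=2000, n_pcs=50, k=15):
    """Boolean adjacency matrix of a mutual k-nearest-neighbour graph of cells.

    X: cells x genes expression matrix (already normalized).
    """
    X = np.asarray(X, dtype=float)
    # Keep the most variable genes (plain variance as a simple stand-in)
    hvg = np.argsort(X.var(axis=0))[::-1][:n_top_genes]
    Xc = X[:, hvg] - X[:, hvg].mean(axis=0)
    # PCA via SVD of the centred matrix: cell coordinates on the top PCs
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    pcs = U[:, :n_pcs] * S[:n_pcs]
    # Pairwise Euclidean distances in PC space
    d = np.linalg.norm(pcs[:, None, :] - pcs[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbour
    knn = np.argsort(d, axis=1)[:, :k]
    is_nn = np.zeros(d.shape, dtype=bool)
    is_nn[np.repeat(np.arange(len(X)), k), knn.ravel()] = True
    return is_nn & is_nn.T  # keep only reciprocal neighbours
```

Because neighbors are found in the space spanned by the most variable genes, genes that differ between groups of cells dominate the graph. This is the point made in the paragraph above about why marker genes shape the clusters.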

      Whether Scallop measures genome-wide or cell type-specific noise (or a mixture of both) is a very interesting question. Clusters in single-cell RNA sequencing datasets are often driven mainly by the presence/absence of a few cell type markers, rather than by changes in the expression levels of broader sets of genes. Moreover, it has been shown that single-cell RNAseq datasets generally preserve the same population structure even after data binarization [15], a consequence of the sparsity of single-cell RNAseq datasets. In our case, consider any difference in expression between one cluster and the rest of the cells in the dataset. It may be the expression of a gene that was not detected in the rest of the cells, or a higher expression of a gene whose presence is weaker in other clusters. Either way, it will have an impact on the output of every downstream analysis, from clustering to dimensionality reduction. The influence of the expression of cell type-specific markers on Scallop membership has been demonstrated in several analyses. First, the simulation in which we measured the impact of removing the 10 most defining markers for a particular cell type on transcriptional noise measurements (included in Figure 1 - Supplement 6 of the revised manuscript). Second, Figure 5 provides evidence that the differential expression of a handful of genes (in this case, genes coding for surfactant proteins) can have an impact on the clustering solutions obtained for a set of human alveolar macrophages, which in turn influences the membership scores obtained with Scallop. In essence, Scallop provides a measure of the robustness of clustering at the single-cell level. Any type of transcriptional noise might therefore affect Scallop memberships, provided it is strong enough to influence the output of the clustering algorithm used. In other words, Scallop membership captures a mixture of both types of noise (genome-wide and that associated with cell type-specific markers) because both types of noise influence clustering.
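
      The marker-removal simulation mentioned above can be sketched in Python under simplifying assumptions. Markers are ranked by the difference between mean expression in the cell type and in all other cells, rather than by a formal differential expression test. "Removing" a marker is modeled as setting its expression to zero. Both function names are hypothetical and do not describe the exact procedure used in the manuscript.

```python
import numpy as np


def top_markers(X, labels, cluster, n_top=10):
    """Indices of the n_top genes most over-expressed in `cluster`.

    Ranking uses mean expression in the cluster minus mean expression in all
    other cells (a difference-of-means stand-in for a DE test).
    """
    X = np.asarray(X, dtype=float)
    in_cluster = np.asarray(labels) == cluster
    diff = X[in_cluster].mean(axis=0) - X[~in_cluster].mean(axis=0)
    return np.argsort(diff)[::-1][:n_top]


def remove_markers(X, genes):
    """Return a copy of X with the expression of the given genes set to zero."""
    X = np.array(X, dtype=float)
    X[:, genes] = 0.0
    return X
```

Re-running the noise measurement on remove_markers(X, markers[:k]) for k = 1, ..., 10 reproduces the shape of the experiment. Each step erases one more of the genes that set the cell type apart from the rest of the dataset.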

      4) The authors conclude that Scallop outperforms other methods through the analysis of biological data, where there is no positive and negative control. I suggest creating synthetic datasets (which could be based on real data), introducing different levels of noise artificially (considering biological constraints like max/min expression levels) and then testing the performance where the truth about each dataset is known. Otherwise, the definitions of noisy and stable cells, regardless of the method, are arbitrary.

      Our initial focus was on biological datasets, where no positive and negative controls regarding transcriptional noise could be used, but we agree on the need to include an analysis using simulations on artificial datasets. We analyzed artificially generated datasets with known degrees of transcriptional noise in order to evaluate the performance of Scallop in a setting where the ground truth is known beforehand. We modeled transcriptional noise by tuning the de.prob parameter, which determines the probability that a gene will be differentially expressed between clusters. The creation of these datasets is explained in detail in the Methods section of the revised manuscript, specifically in the subsections "Performance of Scallop and two DTC methods on four artificial datasets with increasing transcriptional noise" and "Ability to detect noisy cells within cell types".
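
      The role of de.prob can be illustrated with a toy Poisson simulator in Python. This is an assumption-laden sketch, not the simulator used in the manuscript. Each gene is flagged as differentially expressed in each group with probability de_prob, and its mean is then multiplied by a fixed factor. Lower de_prob therefore means weaker cluster signatures. The function name and the de_fac, group_probs and seed parameters are hypothetical. group_probs also allows unbalanced cellular compositions, as in the class-imbalance simulations.

```python
import numpy as np


def simulate_counts(n_cells=300, n_genes=500, group_probs=(0.5, 0.5),
                    de_prob=0.1, de_fac=2.0, seed=0):
    """Toy count matrix whose clusters are defined by differentially expressed genes.

    Returns (counts, groups, de):
      counts: n_cells x n_genes Poisson counts
      groups: group index of each cell
      de:     n_groups x n_genes boolean mask of DE genes per group
    """
    rng = np.random.default_rng(seed)
    groups = rng.choice(len(group_probs), size=n_cells, p=group_probs)
    base = rng.gamma(shape=2.0, scale=1.0, size=n_genes)  # baseline gene means
    means = np.tile(base, (len(group_probs), 1))           # one row of means per group
    de = rng.random(means.shape) < de_prob                 # DE with probability de_prob
    means[de] *= de_fac                                    # up-regulate DE genes
    counts = rng.poisson(means[groups])
    return counts, groups, de
```

Generating a series of datasets with decreasing de_prob gives increasingly "noisy" clusters whose degree of separation is known in advance. Noise measures can then be checked against a ground truth.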

      We have now included the following section in the Results:

      "We compared the output of Scallop and two DTC methods (the whole transcriptome-based Euclidean distance to average cell type expression and the invariant gene-based Euclidean distance to average tissue expression) on four artificially generated datasets containing various levels of transcriptional noise. The analysis showed that Scallop, unlike DTC methods, was able to distinguish the core transcriptionally stable cells within each cell type cluster from the noisier cells that lie between clusters (see Figure 1 - Supplement 1). We then compared one of the DTC methods to Scallop regarding their ability to detect noisy cells within each of the cell types, by plotting the top 10% noisiest and top 10% most stable cells (see Figure 1 - Supplement 2A). Analyzing the distribution of noise values for each cell type separately revealed that Scallop can distinguish clusters that mainly consist of transcriptionally stable cells from noisier clusters that do not have such a distinct transcriptional signature (Figure 1 - Supplement 2B)."
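
      The sketch below covers a distance-to-centroid (DTC) measure of the kind compared above, together with the selection of the top 10% noisiest and most stable cells. It implements only a whole-transcriptome Euclidean distance to the average expression of each cell type. Normalization, gene filtering and the invariant-gene variant are omitted, and the function names are hypothetical. For Scallop, the per-cell noise value passed to extreme_cells would instead be derived from membership (for instance 1 - freq_score, which is an assumption here).

```python
import numpy as np


def distance_to_centroid(X, labels):
    """Euclidean distance of each cell to the mean expression of its cell type."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.empty(len(X))
    for lab in np.unique(labels):
        idx = labels == lab
        centroid = X[idx].mean(axis=0)  # average expression of this cell type
        dist[idx] = np.linalg.norm(X[idx] - centroid, axis=1)
    return dist


def extreme_cells(noise, frac=0.10):
    """Indices of the top `frac` noisiest and top `frac` most stable cells."""
    order = np.argsort(noise)  # ascending noise
    n = max(1, int(round(frac * len(noise))))
    return order[-n:], order[:n]  # (noisiest, most stable)
```

A DTC score depends only on how far a cell sits from its own type's average. It therefore says nothing about whether the cell lies between clusters, which is the distinction the comparison above is concerned with.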

      Reviewer #3 (Public Review):

      In this manuscript, Ibáñez-Solé et al. aim to answer a very basic and important question: how does noise in gene expression change during cellular/tissue aging? This question has gained a lot of attention in the past ∼5 years, owing to the fast-increasing pace of research in the aging field and the development/optimization of single-cell gene expression quantification techniques. As the authors clearly describe, multiple datasets are available in the literature, but the same cannot be said of analysis pipelines, especially a pipeline that quantifies membership of single cells to their assigned cell type cluster. To address these needs, Ibáñez-Solé et al. developed: 1. a toolkit (named Decibel) to implement the common methods for the quantification of age-related noise in scRNAseq data; and 2. a method (named Scallop) for obtaining membership information for single cells regarding their assigned cell type cluster. Their analyses showed that previously published aging datasets had large variability between tissues and datasets. Importantly, the authors' results show that an increase in noise with aging cannot be claimed as a universal phenotype (as previously suggested by various studies).

      We thank the reviewer for their positive assessment of the manuscript and their suggestions.

      Comments:

      1) Two relevant papers (doi.org/10.1038/s41467-017-00752-9 and doi.org/10.1016/j.isci.2018.08.011) had already shown what haploid and diploid genetic backgrounds reveal in terms of intercellular/intracellular noise. Due to the direct nature of age/noise quantification in these papers, their "unconventional" results cannot be blamed on computational pipeline-related issues. The authors should cite and sufficiently discuss the noise-related results of these papers in their Discussion section. These two papers collectively show how the specific gene, its protein half-life and ploidy can lead to similar/different noise outcomes.

      We agree that we failed to mention and sufficiently discuss the effects of measuring transcriptional noise from data generated via destructive experimentation, where no longitudinal analyses are possible. As mentioned in our responses to the other reviewers, the body of literature on transcriptional noise is quite wide and based on heterogeneous assumptions. We have focused our efforts on measuring actual noise in scRNAseq aging datasets, which by definition involve sampling different cells and thus rest on assumptions at the population level. We believe our results provide a different and interesting perspective on transcriptional noise and aging. However, we agree with this reviewer that our findings should be discussed in the context of other attempts to measure transcriptional noise more directly. We have now included a brief discussion of the work by Sarnoski et al. and Liu et al. This point is explained in more detail later in the letter.

      2) While the authors correctly put a lot of emphasis on studying the same cell type or tissue for a faithful interpretation of noise-related results, they ignore another important factor: tracking the same cell over time instead of calculating noise from single-cell populations at supposedly-different age points. Obviously, scRNAseq cannot analyze the same cell twice, but inability to assess noise-in-aging in the same cell over time is still an important concern. Noise could/does affect the generation durations and therefore neighboring cells in the same cluster may not have experienced the same amount of mitotic aging, for example. Also, perhaps a cell has already entered senescence at early age in the same tissue. This caveat should be properly discussed.

      The distinction between intrinsic and extrinsic noise and the impossibility of discerning between the two in destructive experiments is a relevant point that we have now included in the Discussion (the newly added text is shown in italics):

      "Transcriptional noise could be related to genomic instability [18], epigenetic deregulation [19, 20] or loss of proteostasis [21], all established hallmarks of aging. Some authors consider transcriptional noise to be a hallmark of aging in and of itself [22]. In any case, the origin of transcriptional noise is unclear, as it could arise from many different sources. Most importantly, it is not possible to distinguish between intrinsic and extrinsic noise from a snapshot of cellular states, i.e., one cannot tell whether the observed differences between cells in a single-cell RNA experiment reflect time-dependent variations in gene expression or differences between cells across a population [23]. Interestingly, recent work by Liu et al. measuring intrinsic noise in S. cerevisiae showed that aging is associated with a steady decrease in noise, with a sudden increase in soon-to-die cells. Another longitudinal study found an increase in extrinsic noise and no change in intrinsic noise in diploid yeast [16]."

      Regarding the caveat that cells from individuals in the Young groups may show signs of aging, we can only agree that this is correct: some sampled cells will already show signs of cellular damage in the absence of chronological aging. However, this applies to every study of aging that samples cells in a destructive manner, and the field generally assumes that this is a discrete phenomenon that does not affect the overall results in a meaningful way.

      3) Another weakness of this study is that the authors did not show the source/cause of decreasing/stable/increasing noise during aging. Understanding the source of loss of cell type identity is also important but this manuscript was about noise in aging, so it would have been nice if there could be some attempts to explain why noise is having this/that trend in differentially aged cell types in specific tissues.

      The reviewer raises here a very important point that we would like to discuss in detail. The papers that we have re-analyzed generally assume that an increase in transcriptional noise and a loss of cell type identity are equivalent. However, as this reviewer points out, cells could theoretically lose their cell type identity without a concomitant increase in transcriptional noise. This could happen, for instance, through a sharp decrease in a limited number of marker genes that collectively define that cell within a given cell type/cluster. Thus, transcriptional noise can certainly arise from different sources, and several mechanisms have been proposed to explain its presence in the context of cellular aging. We agree with the reviewer that discussing how transcriptional noise could be related to aging is of interest to the readers. However, as pointed out in our responses to similar concerns by the other reviewers, our main finding is that we do not detect meaningful and reliable increases in transcriptional noise associated with cell aging. Instead, we see a number of different technical and biological issues/phenomena that have been interpreted as transcriptional noise. We hope this reviewer will agree that the manuscript now presents a full and robust story. Finding the causes of up/down "noise" trends in the different datasets may be more appropriately tackled by follow-up studies.

      4) In the discussion section, the authors say that "Most importantly, Scallop measures transcriptional noise by membership to cell type-specific clusters which is a re-definition of the original formulation of noise by Raser and O’Shea." It is not clear what the authors refer to by "the original formulation of noise by Raser and O’Shea". Intrinsic/extrinsic noise formulations?? Please be more specific.

      We thank the reviewer for pointing this out, since we agree that the sentence needed to be reformulated for the sake of clarity. By the definition of Raser and O’Shea we meant "the measured level of variation in gene expression among cells supposed to be identical", which does not make any distinction between intrinsic and extrinsic noise. Since their definition predates the development of single-cell technologies, we meant to convey our attempt to bring this classic concept into the context of single-cell RNAseq. Nowadays, cell clusters produced by a community detection algorithm are given cell type annotations depending on their expression of known cell type markers. What Scallop aims to measure is the extent of membership each individual cell has in its cluster, as evidence of its transcriptional stability. To make this point clearer, we have now rewritten the paragraph as follows:

      Most importantly, Scallop measures transcriptional noise by membership to cell type-specific clusters which is a re-definition of the original formulation of noise by Raser and O’Shea: measurable variation among cells that should share the same transcriptome. This is in stark contrast to measurements of noise including other phenomena (as demonstrated in Figure 5) by the distance-to-centroid methods prevalent in the literature.

      References

      [1] M. Alex Ascensión, Olga Ibáñez-Solé, Iñaki Inza, Ander Izeta, and Marcos J Araúzo-Bravo. Triku: A feature selection method based on nearest neighbors for single-cell data. GigaScience, 11, 2022. doi: 10.1093/gigascience/giac017.

      [2] M. Ximerakis, S. L. Lipnick, B. T. Innes, S. K. Simmons, X. Adiconis, D. Dionne, B. A. Mayweather, L. Nguyen, Z. Niziolek, C. Ozek, V. L. Butty, R. Isserlin, S. M. Buchanan, S. S. Levine, A. Regev, G. D. Bader, J. Z. Levin, and L. L. Rubin. Single-cell transcriptomic profiling of the aging mouse brain. Nat Neurosci, 22(10), 2019. doi: 10.1038/s41593-019-0491-3.

      [3] M. Enge, H. E. Arda, M. Mignardi, J. Beausang, R. Bottino, S. K. Kim, and S. R. Quake. Single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns. Cell, 171(2), 2017. doi: 10.1016/j.cell.2017.09.004.

      [4] L. Solé-Boldo, G. Raddatz, S. Schütz, et al. Single-cell transcriptomes of the human skin reveal age-related loss of fibroblast priming. Commun Biol, 3(188), 2020. doi: 10.1038/s42003-020-0922-4.

      [5] Jaime L. Schneider, Jared H. Rowe, Carolina Garcia-de Alba, Carla F. Kim, Arlene H. Sharpe, and Marcia C. Haigis. The aging lung: Physiology, disease, and immunity. Cell, 184(8):1990–2019, 2021. doi: 10.1016/j.cell.2021.03.005.

      [6] Shuai Ma, Shuhui Sun, Jiaming Li, Yanling Fan, Jing Qu, Liang Sun, Si Wang, Yiyuan Zhang, Shanshan Yang, Zunpeng Liu, and et al. Single-cell transcriptomic atlas of primate cardiopulmonary aging. Cell Research, 31(4):415–432, 2020. doi: 10.1038/s41422-020-00412-6.

      [7] I. Angelidis, L. M. Simon, I. E. Fernandez, et al. An atlas of the aging lung mapped by single cell transcriptomics and deep tissue proteomics. Nature Communications, 2019. doi: 10.1038/s41467-019-08831-9.

      [8] Jonathan M. Raser and Erin K. O’Shea. Noise in gene expression: origins, consequences, and control. Science, 309(5743):2010–2013, 2005. doi: 10.1126/science.1105891.

      [9] Michael B. Elowitz, Arnold J. Levine, Eric D. Siggia, and Peter S. Swain. Stochastic gene expression in a single cell. Science, 297:1183–1186, 2002. doi: 10.1126/science.1070919.

      [10] Peter S. Swain, Michael B. Elowitz, and Eric D. Siggia. Intrinsic and extrinsic contributions to stochasticity in gene expression. Proc Natl Acad Sci U S A, 99:12795–12800, 2002. doi: 10.1073/pnas.162041399.

      [11] Alex Cagan, Adrian Baez-Ortega, Natalia Brzozowska, Federico Abascal, Tim H. H. Coorens, Mathijs A. Sanders, Andrew R. J. Lawson, Luke M. R. Harvey, Shriram Bhosle, David Jones, Raul E. Alcantara, Timothy M. Butler, Yvette Hooks, Kirsty Roberts, Elizabeth Anderson, Sharna Lunn, Edmund Flach, Simon Spiro, Inez Januszczak, Ethan Wrigglesworth, Hannah Jenkins, Tilly Dallas, Nic Masters, Matthew W. Perkins, Robert Deaville, Megan Druce, Ruzhica Bogeska, Michael D. Milsom, Björn Neumann, Frank Gorman, Fernando Constantino-Casas, Laura Peachey, Diana Bochynska, Ewan St. John Smith, Moritz Gerstung, Peter J. Campbell, Elizabeth P. Murchison, Michael R. Stratton, and Iñigo Martincorena. Somatic mutation rates scale with lifespan across mammals. Nature, 604:517–524, 2022. doi: 10.1038/s41586-022-04618-z.

      [12] Hamit Izgi, Dingding Han, Ulas Isildak, Shuyun Huang, Ece Kocabiyik, Philipp Khaitovich, Mehmet Somel, and Handan Melike Dönertas. Inter-tissue convergence of gene expression during ageing suggests age-related loss of tissue and cellular identity. eLife, 11, 2022. doi: 10.7554/eLife.68048.

      [13] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, 2008. doi: 10.1088/1742-5468/2008/10/p10008.

      [14] V. A. Traag, L. Waltman, and N. J. van Eck. From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9, 2019. doi: 10.1038/s41598-019-41695-z.

      [15] Peng Qiu. Embracing the dropouts in single-cell RNA-seq analysis. Nature Communications, 11(1), 2020. doi: 10.1038/s41467-020-14976-9.

      [16] Ethan A. Sarnoski, Ruijie Song, Ege Ertekin, Noelle Koonce, and Murat Acar. Fundamental characteristics of single-cell aging in diploid yeast. iScience, 7:96–109, 2018. doi: 10.1016/j.isci.2018.08.011.

      [17] Ping Liu, Ruijie Song, Gregory L. Elison, Weilin Peng, and Murat Acar. Noise reduction as an emergent property of single-cell aging. Nature Communications, 8(1), 2017. doi: 10.1038/s41467-017-00752-9.

      [18] Jan Vijg. From DNA damage to mutations: All roads lead to aging. Ageing Res Rev., 68(101316), 2021. doi: 10.1016/j.arr.2021.101316.

      [19] Yuancheng Lu, Benedikt Brommer, Xiao Tian, Anitha Krishnan, Margarita Meer, Chen Wang, Daniel L. Vera, Qiurui Zeng, Doudou Yu, Michael S. Bonkowski, Jae-Hyun Yang, Songlin Zhou, Emma M. Hoffmann, Margarete M. Karg, Michael B. Schultz, Alice E. Kane, Noah Davidsohn, Ekaterina Korobkina, Karolina Chwalek, Luis A. Rajman, George M. Church, Konrad Hochedlinger, Vadim N. Gladyshev, Steve Horvath, Morgan E. Levine, Meredith S. Gregory-Ksander, Bruce R. Ksander, Zhigang He, and David A. Sinclair. Reprogramming to recover youthful epigenetic information and restore vision. Nature, 588(7836):124–129, 2020. doi: 10.1038/s41586-020-2975-4.

      [20] Giorgio Oliviero, Sergey Kovalchuk, Adelina Rogowska-Wrzesinska, Veit Schwämmle, and Ole N. Jensen. Distinct and diverse chromatin proteomes of ageing mouse organs reveal protein signatures that correlate with physiological functions. eLife, 11(e73524), 2022. doi: 10.7554/eLife.73524.

      [21] Jingyi Li, Yuxuan Zheng, Pengze Yan, Moshi Song, Si Wang, Liang Sun, Zunpeng Liu, Shuai Ma, Juan Carlos Izpisua Belmonte, Piu Chan, Qi Zhou, Weiqi Zhang, Guang-Hui Liu, Fuchou Tang, and Jing Qu. A single-cell transcriptomic atlas of primate pancreatic islet aging. Natl Sci Rev., 8(2):nwaa127, 2020. doi: 10.1093/nsr/nwaa127.

      [22] Alexander R. Mendenhall, George M. Martin, Matt Kaeberlein, and Rozalyn M. Anderson. Cell-to-cell variation in gene expression and the aging process. Geroscience, 43(1):181–196, 2021. doi: 10.1007/s11357-021-00339-9.

      [23] Lucy Ham, Marcel Jackson, and Michael P. H. Stumpf. Pathway dynamics can delineate the sources of transcriptional noise in gene expression. eLife, 10, 2021. doi: 10.7554/elife.69324.

    1. Author Response

      Reviewer #1 (Public Review):

      This work identifies distinct contributions of direct (D1+) and indirect (Adora+, D2+) amygdalostriatal medium spiny cells to fear learning and plasticity. The authors combined freely moving calcium imaging with an auditory fear learning assay to reveal tone-, foot-shock- and behavior (movement)-evoked activity of the two MSN populations. While D1+ cells show plastic changes driven by fear learning, reaching their maximum tone responsiveness (PSTH) at fear retrieval, Adora+ cell activation remained constant. Furthermore, using optogenetic silencing they showed that the two MSN groups contribute differently to the retrieval of fear memory. Both cell types receive topographically organized insular cortical inputs, which undergo learning-induced long-term synaptic changes in opposite directions: postsynaptic LTP at D1 cells and presynaptic LTD at Adora+ cells. These synaptic changes provide some explanation for the distinct behavioral contributions of the two cell types in fear learning.

      This study focuses on a so far neglected member of the 'extended' amygdalar circuitry, the amygdalostriatal transition zone. The data are well presented, the experiments are in logical order and build on each other, and the paper is easy to read and follow.

      However, some information regarding the connectivity (and function) of the Astr presented in recent and earlier papers is missing from, or contradicts, the present work. One reason for this may be that the targeted striatal regions vary between experiments, so it is difficult to judge when the Astr and when other parts of the caudal (tail) striatum are examined. As these striatal regions are involved in different neuronal networks, their functional consequences could also be distinct. Without precisely defining and consistently targeting the intended striatal region, it is difficult to interpret the findings of the present study (though these are relevant and important).

      We thank this reviewer for his/her overall positive evaluation of our paper.

      We agree with the criticism that in the first submission we did not stringently define the anatomical region of the amygdala-striatal transition zone (AStria). After validating our previous data, and after performing new anatomical experiments studying the expression of Cre in the D1RCre and AdoraCre mouse lines used here (see Figure 1D; Figure 1 - figure supplement 1; Figure 3 - figure supplement 1), we now refer to the region targeted in our study as "ventral tail striatum" (vTS), as opposed to the more narrowly defined and more ventrally located "AStria". Therefore, we have changed the word "AStria" to ventral tail striatum ("vTS") throughout the paper.

      We have also improved our introduction to the posterior striatum (p. 4 bottom, p. 5 top), and we briefly discuss the targeting of the vTS, as opposed to the AStria (p. 19 top).

      Reviewer #2 (Public Review):

      Kintscher et al. present a nice study on the responses of Adora2a- and D1R-expressing cells in the tail of the striatum/amygdala transition zone during auditory fear conditioning. Overall the conclusions are that (1) D1R cells show plasticity in activity patterns during the task, with the emergence of tone/movement co-modulated cells; (2) Adora2a cells show fewer such changes; (3) gain of function of activity does little, whereas (4) loss of function of activity in each cell class has moderate effects on the learned behavior (i.e. freezing to the CS). There is a nice section on rabies tracing which maps inputs to both cell types, which then motivates an analysis of insular cortex inputs onto both cell types and reveals that (5) CS/US pairing alters insular inputs onto both cell types.

      Overall the paper is well done and the conclusions are believable. Furthermore, this brain area is understudied yet potentially very important.

      The analysis of the fluorescence transients is heavy-handed. This creates potential for error and could obscure what appear to be large differences that could be extracted more easily. In some instances, the data are interpreted too optimistically; in particular, the silencing experiments are taken to implicate plasticity of the neurons rather than simply the need for their activity.

      We thank the reviewer for his/her positive evaluation of our paper. For the revision, we have re-analyzed the Ca-imaging data, and we have made changes in the text to avoid a too optimistic interpretation of our data.

    1. Author Response

      Reviewer #2 (Public Review):

      Wild and colleagues develop a barcoding approach, termed WILD-seq, that combines tumor cell barcoding with single-cell transcriptional analysis to concurrently examine clonal tumor cell dynamics and cell state changes during drug treatment. They examine two triple-negative breast cancer (TNBC) cell lines in vivo in response to JQ1 and taxanes. These experiments yield several meaningful conclusions. First, they demonstrate that clonal dynamics are fundamentally distinct depending on context and microenvironment, with significant differences observable between cell culture, NSG mice and immunocompetent mice. Second, they show that bulk expression in treatment-refractory tumors represents clonal outgrowth of subpopulations in pretreatment tumors that bear gene expression patterns similar to those of the relapsed tumor. Finally, they identify mechanisms of in vivo taxane resistance, including EMT and high NRF2 expression, the latter yielding tumors that show collateral sensitivity to L-asparaginase and subsequent resistance mediated by high levels of asparagine synthetase.

      This study is a technical tour de force. The authors deeply engage with the complexity of cell barcoding, bottlenecking, Hamming analysis, single-cell expression analysis and microenvironmental cell analysis. The idea that bulk tumor expression states demarcate drug-resistant clonal populations in pre-treatment tumors, while not a new concept, finds critical validation using this approach. Moreover, the use of this approach to examine collateral sensitivity and to identify new strategies to target taxane resistance is compelling.

      I support this work but would suggest that the comparisons of primary and relapsed tumors, as well as the analysis of the nature of the taxane collateral sensitivity, be further extended.

      Major comments:

      1) The authors suggest that the bulk expression analysis in relapsed tumors mirrors clonal populations in pretreatment tumors (which, while requiring barcoding to validate, somewhat obviates the need for barcoding to identify mechanisms of drug resistance). In cases like EMT, it has been argued that mesenchymal tumor cells survive therapy, but then undergo MET in the relapsed state. Thus, in the long term, tumors may revert to pre-treatment clonal states. It would be interesting to see whether that is the case here - and whether the informative nature of bulk gene expression in the drug resistant tumor is lost over time.

      This is an interesting point. We don’t have any direct evidence of any of the tumour cell lineages in our model undergoing EMT or MET from our work, but it is entirely possible that the tumour cells dynamically transition between states over longer time frames that we haven’t captured in our experiments to date. It is also possible that there are intermediate states that we have not captured by sampling at end-point. WILD-seq presents an excellent method for such studies but these are beyond the scope of the current paper.

      For such experiments, it would be essential to use barcoded cells to track clonal lineage; otherwise it is impossible to determine whether a change in the EMT state of a tumour cell population was driven by a change in the transcriptome/cell state or by a shift in clonal abundance. We have added discussion of these points to the discussion section of the manuscript.

      With respect to the necessity of barcoding, rather than bulk approaches, for identifying treatment resistance mechanisms: lineage-based analysis serves to prioritise pathways that change in the resistance setting and that might otherwise be overlooked because they sit lower down the list of differentially expressed genes in bulk analysis. While not specifically addressed here, being able to differentiate between a pre-existing resistance phenotype and an adaptive mechanism of resistance may also inform the choice of dosing schedule for agents targeting resistant clones.

      2) Collateral sensitivity can refer either to the outgrowth of clones that show enhanced sensitivity to distinct therapies or to the therapeutic induction of cell states that respond differently to other drugs. To confirm that L-asparaginase sensitivity results from the specific outgrowth of NRF2 clones, it would be meaningful to show that these clones are lost upon L-asparaginase-only treatment and that pretreatment with L-asparaginase promotes the long-term efficacy of taxanes.

      We agree this is a critical question, and one that we had already started to address while the manuscript was under review. The Nrf2-high clones are present at low abundance in vehicle-treated tumours and are at the edge of our detection threshold; thus, accurately measuring their depletion by L-asparaginase-only treatment in tumours derived from our heterogeneous WILD-seq clonal pools is very challenging. To address this question, we have instead chosen to isolate individual resistant clones and directly test their response to L-asparaginase. We were able to isolate two of the Nrf2-high clones (751 and 1240) by growing up clones from single cells. After expansion in vitro, these were implanted as pure monoclonal populations and the resulting tumours were treated with L-asparaginase. These new data, presented in Fig 7g, demonstrate that tumours derived from these clones (in contrast to tumours derived from our WILD-seq pools) respond significantly to L-asparaginase-only treatment, suggesting that this cell state is a pre-existing intrinsic property of these clones and not one induced by docetaxel treatment.

    1. Author Response

      Reviewer #1 (Public Review):

      It has been shown that selenium protects against the development of epilepsy and behavioral comorbidities, as pointed out by the authors. This paper attempts to show that it does so if administered after chronic seizures have started. While clinically relevant, as noted by the authors, the paper does not seem to be a major advance beyond the prior study. The antiseizure effect is also not very convincing because the effect size is so small and the variance so high. The behavioral data are more convincing, but similar data were in the previous paper, so they are not very novel.

      Thank you for reviewing our paper. Previous work has shown that sodium selenate, not selenium, can delay the appearance of seizures and mitigate behavioural comorbidities if given immediately after the epileptogenic brain insult but before the appearance of spontaneous recurrent seizures (i.e., before epilepsy develops); that is, it is anti-epileptogenic. The novelty of our current work is that we treat once epilepsy has developed; that is, the treatment is disease-modifying. This is the first time a pharmacological agent has been shown to be disease-modifying in established epilepsy, resulting in an enduring reduction in seizures even after treatment withdrawal, as well as mitigating the behavioural comorbidities that commonly accompany chronic epilepsy. These are potentially ground-breaking findings for the epilepsy field, as at present the only disease-modifying therapy for established chronic epilepsy is epilepsy surgery.

    1. Author Response

      Reviewer #1 (Public Review):

      Building upon previous evidence that auditory cortex VIP interneurons are activated by non-classical stimuli such as reward and punishment, Szadai et al. extended the investigation to multiple cortical regions. The use of three-dimensional acousto-optical two-photon microscopy with the 3D chessboard scanning method allowed high-speed signal acquisition from numerous VIP interneurons in a large brain volume. Additionally, the activity of VIP interneurons in deep cortical regions was recorded using fiber photometry. With these two imaging methods, the authors were able to extract and analyze VIP cell signals from different cortical regions. The study of VIP interneuron activity during an auditory go/no-go task revealed that more than half of the recorded cortical VIP interneurons responded to both reward and punishment with high reliability. Fiber photometry data revealed similar observations; however, the reinforcement-related responses in mPFC had slower temporal dynamics than those in the auditory cortex. The authors performed a detailed analysis of individual cell activity dynamics, which revealed five categories of VIP cells based on their temporal profiles. Further, animals with higher performance on the discrimination task showed stronger VIP responses in 'go' trials, possibly suggesting a role for VIP interneurons in discrimination learning. The authors found that the reinforcement-related responses of VIP interneurons in visual cortex were not correlated with their sensory tuning, unveiling the interesting idea that VIP interneurons take part in both local and global processing. These observations draw attention to the possible involvement of VIP interneurons in reinforcement-associated global signaling that would regulate local connectivity and information processing, leading to learning.

      The state-of-the-art imaging technique allowed the authors to image VIP interneurons in several cortical regions. Advanced analyses revealed the nuances, similarities and differences in VIP activity across regions. The authors' conclusions about the reinforcement-related activity of VIP interneurons are well supported by the results; however, some claims and interpretations require more attention and clarification.

      We thank Reviewer #1 for the positive general comments.

      Reviewer #2 (Public Review):

      In recent years, the activity of cortical VIP+ interneurons in relation to learning and sensory processing has attracted great interest and has been intensely investigated. The ability of VIP+ interneurons in the auditory cortex to respond to both reward and punishment was reported a few years ago by some of the authors (Pi et al., 2013, Nature). However, this work importantly extends their previous study by demonstrating a largely similar and synchronous response of a large fraction of these interneurons across the neocortex to salient stimuli of different valence during the performance of an auditory discrimination task.

      An additional strength of this study is the analysis and identification of the general pattern of VIP+ interneuron responses associated with specific behaviors across the layers of the neocortex.

      Interestingly, using cluster analysis, the authors also identified five classes of VIP+ interneurons, based on the dynamics of their responses, that were unequally distributed across cortical areas.

      This is a well-performed study that took advantage of a cutting-edge imaging approach with high recording speed and good signal-to-noise ratio. The experiments are well performed and the data are properly analyzed and nicely illustrated. However, one shortcoming of this paper, in my opinion, is the "case report" structure of the data. Essentially, for each neocortical area the activity of VIP+ interneurons was analyzed in only one animal. This limits the assessment of the stability of the response/recruitment of these interneurons. I appreciate the high number of recorded VIP+ interneurons per area/animal, and I do understand that it would be excessively laborious to perform 3D random-access two-photon microscopy in several mice for each cortical area. On the other hand, it would be important to have some knowledge of the general variability of the responses of these neurons across animals.

      In conclusion, although the findings described in this manuscript are generally sound, additional experiments are recommended to further substantiate the conclusions.

      Thank you for pointing out this potential misunderstanding. Although we mentioned the number of animals from which the recordings were obtained (n=22 total), we now repeat this information in several places to avoid confusion. The data recorded with the two-photon microscope are from 16 animals, and fiber photometry was performed on a separate group of 6 animals. Each animal was recorded in one area (14 mice) or two areas (8 mice: 2 AOD, 6 photometry). We aimed to acquire data from at least 3 recordings per area (4 in the primary somatosensory cortex, 6 in the primary and secondary motor cortices, 4 in the lateral and medial parietal cortices, 3 in the primary visual cortex, and 6 in the auditory and medial prefrontal cortices). In the revised manuscript this information can be found at the beginning of the Results section and in the figure legends:

      “To probe the behavioral function of VIP interneurons, we trained head-fixed mice (n=22 in total, n=16 for 2-photon microscopy and n=6 for fiber photometry) on a simple auditory discrimination task (Figure 1A).”

      “Among the 811 neurons imaged in 18 imaging sessions from 16 mice,”

      “Ca2+ responses of individual VIP interneurons recorded separately from 18 different cortical regions from 16 mice using fast 3D AO imaging were averaged for Hit (thick green), FA (thick red), Miss (dark blue), and CR (light blue). Fiber photometry data were recorded simultaneously from mPFC and ACx regions and are shown in gray boxes. Functional map (Kirkcaldie, 2012) used with the permission of the author. Speaker symbols represent the average time of tone onset, and gray triangles mark the reinforcement onset for Hit and FA. Averages of Miss and CR trials were aligned according to the expected reinforcement delivery calculated on the basis of the average reaction time. mPFC: medial prefrontal cortex (n=6 mice), ACx: auditory cortex (n=6), S1Hl/S1Tr/S1Bf/S1Sh: primary somatosensory cortex, hindlimb/trunk/barrel field/shoulder region (n=4), M1/M2: primary/secondary motor cortex (n=6), Mpta/Lpta: medial/lateral parietal cortex (n=4), V1: primary visual cortex (n=3).”

      “This approach allowed us to simultaneously measure bulk calcium-dependent signals from VIP interneurons located in the right medial prefrontal (mPFC) and left auditory cortices (ACx) by implanting two 400 µm optical fibers at these locations (n=6 sessions from n=6 mice, Figure 1–figure supplement 1C).”

      “Raster plot of the trial-to-trial activation of the responsive VIP neurons in Hit and FA trials during the two-photon imaging sessions (n=18 sessions, n=16 mice, n=746 cells).”

      Subregional labels (for example, in Figure 2) are intended as additional information to orient readers, even though they were precisely defined on the basis of coordinates. All analyses considering regional differences were conducted at the level of the main functional areas of the dorsal cortex (motor, somatosensory, parietal, and visual). Despite some location-dependent heterogeneity in the late response phase (Figures 2G and H), these main dorsal cortical regions were all similar with respect to responsiveness to reinforcers and auditory cues.

      Reviewer #3 (Public Review):

      In this study Szadai et al. show reliable, relatively synchronous activation of VIP neurons across different areas of dorsal cortex in response to reward and punishment in mice performing an auditory discrimination task. The authors use both relatively fast two-photon imaging and fiber photometry for some deeper areas. They cluster neurons according to their temporal response profiles and show that these profiles differ across areas and cortical depths. Task performance, running behavior and arousal are all related to VIP response magnitude, as has been previously shown.

      Methodologically, this paper is strong: the described imaging technique allows for fairly fast sampling rates, they sample VIP cells from many different areas and the analyses are sophisticated and touch on the most relevant points. The figures are of high quality.

      However, as the manuscript is now, the presentation could be clearer, the methods more complete and it is not clear whether their conclusions are entirely supported by the data.

      The main issue is that reinforcement and arousal are hard to distinguish in this study. It is well known that VIP activity is correlated with arousal, and it is fairly clear that the reinforcers used in this study (air puffs to the eye, as well as water rewards) cause arousal. It is possible that the reinforcer responses they observe in VIP neurons throughout all areas merely reflect the increases in arousal caused by these behaviorally salient events. They do discuss this caveat (albeit not fully convincingly), and in their abstract they even state that the arousal state was not predictive of reinforcer responses. However, their data clearly show the tight relationship of the VIP reinforcer responses to both arousal (as measured by pupil diameter) and the running speed of the animal. Both of these variables are well known to be tightly coupled to VIP activity.

      Although barely mentioned, the authors appear to present uncued reward in some sessions (Figure S2F). If responses were noticeably different from the same events in the task context (as actual reinforcers), this could at least hint that the reinforcement signal is distinct from mere arousal. However, these data are only mentioned in one supplementary figure in a different context (comparison with PV cells); they are neither directly compared to cued reward nor discussed at all. Were uncued air puffs also presented? How do the responses compare to cued air puffs/punishment?

      Our original approach to distinguishing between reinforcement- and arousal-related responses aimed:

      1) to show that VIP cells with both low and high correlation coefficients with arousal produce large signals upon reinforcement presentation (Figure 3B),

      2) to show that the large difference between low and high arousal changes was reflected only to a limited extent in VIP activity (Figures 3C and D): as highlighted in Figure R1, where we also added bars to show ∆P/P in high and low pupil change conditions, the difference in ∆P/P is ~5-fold, while it is only ~1.5-fold for ∆F/F. This disproportionality suggests that a large part of the signal below the dashed blue line is independent of arousal. We have added these modifications to the new version of Figure 3 for clarity.

      Figure R1. Figure 3C–D with modifications. Comparison of pupil changes and corresponding calcium averages.

      We collected further evidence to support our claims. In Figure 3–figure supplement 2 we show Hit and FA trials in which the reinforcement did not elevate the arousal level any further. Many of these trials were associated with locomotion prior to the reinforcement, but it was also common for the animals to remain still during the whole trial. Trials with increased locomotion upon reinforcement presentation were excluded. Reinforcement-related calcium signals were still present under these conditions, indicating that these signals are not simple reflections of arousal. Moreover, we systematically estimated the distinct contributions of arousal, locomotion, and reinforcers with a generalized linear model (Figure 3–figure supplement 2D). This model also supported our view of reinforcement-related coding.

      We now say in the results:

      “Finally, to assess the motor- and reinforcement-related contributions to VIP interneuronal activity, we built a generalized linear model using the behavior and imaging data of the SS and Mtr recordings (Figure 3–figure supplement 2D, n=3 mice). This model was able to explain 18.8 ± 11.1% of the variance of the VIP population calcium signal, and highlighted that arousal was the best predictor, followed by reward, punishment, locomotion velocity, and auditory cue (weights = 0.055, 0.031, 0.028, 0.020, 0.018 respectively; all predictors, except the auditory cue in the case of one animal, contributed significantly, p<0.001). These observations indicate that running and arousal changes alone cannot fully explain the recruitment of VIP interneurons by reinforcers.”
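      To make the "variance explained" and "weights" in the quoted passage concrete, the following is a minimal Python sketch of fitting such a model. It assumes a Gaussian GLM with an identity link (which reduces to ordinary least squares) on unit-variance predictors; the function name fit_linear_glm and all data are hypothetical and synthetic. This is not the authors' pipeline, and the synthetic ground-truth weights simply reuse the reported values for illustration.

```python
import numpy as np


def fit_linear_glm(X, y):
    """Gaussian-family GLM with identity link, i.e. ordinary least squares.

    X: (n_samples, n_predictors) design matrix without an intercept column.
    y: (n_samples,) population calcium signal.
    Returns (weights, intercept, fraction of variance explained).
    """
    # Append a column of ones so the model includes an intercept term.
    X1 = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    y_hat = X1 @ coef
    # R^2: 1 - residual sum of squares / total sum of squares.
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return coef[:-1], coef[-1], r2


# Synthetic, illustrative data only (NOT the recorded data).
rng = np.random.default_rng(0)
n = 5000
names = ["arousal", "reward", "punishment", "velocity", "cue"]
X = rng.normal(size=(n, len(names)))  # unit-variance predictors
true_w = np.array([0.055, 0.031, 0.028, 0.020, 0.018])
y = X @ true_w + rng.normal(scale=0.1, size=n)  # additive Gaussian noise

w, b, r2 = fit_linear_glm(X, y)
for name, weight in zip(names, w):
    print(f"{name}: {weight:.3f}")
print(f"variance explained: {r2:.2f}")
```

      Because the predictors share the same scale, the fitted weights can be ranked directly (arousal first, then reward, and so on). With real behavioral variables, which are correlated with one another, the ranking would also depend on how shared variance is apportioned among predictors.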

      We apologize for not describing the rationale for, and the results of, the uncued reward experiments. Briefly, while recording reinforcement-related signals in auditory cortex during our task, we realized that the cue delivery and the resulting purely sensory response could confound the measurement of reward-related responses. Hence, to disentangle reward-related and sensory responses, we presented the animals with simple, uncued rewards and observed a similar and robust recruitment of VIP interneurons. Based on the same rationale, we made similar measurements for PV neurons.

      We now say in the results:

      “We did not further analyze the FA responses in auditory cortex as those responses also had a sensory component linked to the white noise-like sound created by the air puff delivery. Because the cue delivery could confound the measurement of reward-mediated responses from VIP interneurons in auditory cortex (see also methods), we delivered random rewards in separate sessions. Water droplet delivery recruited VIP interneurons in both auditory and medial prefrontal cortex in a similar fashion to water delivery during the discrimination task (Figure 2–figure supplement 1G). Consistent with our single-cell results, the PV-expressing neuronal population in ACx did not show any significant change in activity upon similar random reward delivery (Figure 2–figure supplement 1G).”

      Regarding the difference between cued and uncued responses, we agree with the reviewer that this is an important point. However, the goal of this manuscript is to study how reward and punishment are represented by VIP interneurons in cortex.

      The imaging method appears well suited for their task; however, the improvements listed in Table S1 make the method appear far superior to existing methods in many respects. Published or preprinted papers with two-photon imaging of VIP populations (e.g. from the Scanziani lab (Keller et al.), Carandini lab (Dipoppa et al.), de Vries lab (Millman et al.), and Adesnik lab (Mossing et al.)), which use the much more common resonant scanning, seem to be able to image 4–7 layers at 4–8 Hz with a good enough SNR and a potentially larger neuronal yield of approximately 100–200 VIP cells, depending on the field of view. While not every single cell in a volume would be captured by these studies, the main advantage of the technique used here appears to be its superior temporal resolution.

      We thank the reviewer for the positive comment, and we agree that the interpretation must be improved. We agree that the imaging methods in the papers listed above have good SNR and were appropriate for the scientific questions they addressed. As the reviewer points out, 3D-AOD imaging allows fast 3D measurements that cannot be achieved otherwise. We used these advantages to address the critical question of layer specificity in the response of VIP interneurons to reinforcer presentation (Figure 2–figure supplement 1F; see also the new Figure 1–figure supplement 1B). For the comparison and quantification of the concrete advantages of AOD microscopy over other imaging methods, the reviewer and readers can refer to the methods section (3D AO microscopy), Table S1 and Szalay et al., 2016. We agree with the reviewer that one of the main advantages is the superior temporal resolution. The second main advantage is the improved SNR. This originates from the fact that the entire measurement time is spent on regions of interest; unnecessary background areas do not need to be measured. More specifically, even in the case of 2D imaging, SNR is improved by a factor of:

      (area of the entire frame / area of the recorded VIP cells)^0.5

      which is about (100)^0.5 = 10, as VIP interneurons represent about 1% of the brain. We used this second advantage of AO scanning when we determined the activation ratio (e.g., see Figure 2D).
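      As a rough check of the arithmetic above, the sketch below computes the SNR gain factor. It assumes shot-noise-limited detection (so SNR grows with the square root of the dwell time spent on each ROI pixel); the function name is hypothetical and this is an illustration, not part of the acquisition software.

```python
import math


def aod_snr_gain(frame_area: float, roi_area: float) -> float:
    """Approximate SNR gain of ROI-only (AO) scanning over full-frame scanning.

    Assumes shot-noise-limited detection: if the whole frame time is spent on
    the ROIs instead of the full frame, the dwell time per ROI pixel grows by
    frame_area / roi_area, and SNR grows with its square root.
    """
    if roi_area <= 0 or roi_area > frame_area:
        raise ValueError("roi_area must be in the interval (0, frame_area]")
    return math.sqrt(frame_area / roi_area)


# VIP cell bodies covering ~1% of the frame give an area ratio of about 100.
print(aod_snr_gain(100.0, 1.0))  # 10.0
```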

      Because resolving single action potentials or small bursts is challenging in behaving mice labelled with the GCaMP6 sensor, any improvement in SNR lowers the detection threshold. The higher SNR achieved here lowered the detection threshold, which also explains the relatively high activation ratio in our work.

      In the case of asynchronous activity patterns, individual small neuropil structures contribute negligibly to somatic signals because the volume of a soma is large relative to that of any given small neuropil structure; this minimizes the error in the ∆F/F calculation of somatic responses. However, reinforcement, arousal, and running can generate highly synchronous neuronal activity, which can synchronize neuropil activity around a given soma and thereby effectively and systematically modulate the somatic ∆F/F responses. To avoid this error, we used a high-NA objective with sufficient resolution to separate neuropil from somata and combined it with motion correction. The high NA also reduced the total scanning volume to about 689 µm × 639 µm × 580 µm and therefore limited the maximum number of VIP cells that could be recorded. It is also possible to use a low-NA objective with a much larger FOV and scanning volume and record over 1000 VIP cells, but because the extent of the PSF along the z axis is inversely proportional to the square of the objective's NA, neuropil resolution would be at least partially lost. In summary, using the high-NA Olympus objective we maximized the two-photon resolution, which, in combination with offline motion artifact elimination, allowed precise recording of somatic signals without any neuropil contamination and thus yielded correct activation ratio values.
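      The NA trade-off described above can be illustrated with the 1/NA² scaling stated in the text. This is a minimal sketch under that approximation only; the NA values in the example are hypothetical and are not the specifications of the objectives used.

```python
def relative_axial_extent(na: float, na_ref: float) -> float:
    """Axial (z) PSF extent at numerical aperture `na`, relative to `na_ref`.

    Uses the approximation stated in the text: axial extent is inversely
    proportional to NA squared. Real two-photon PSFs deviate from this
    scaling at high NA, so treat the result as an order-of-magnitude guide.
    """
    if na <= 0 or na_ref <= 0:
        raise ValueError("numerical apertures must be positive")
    return (na_ref / na) ** 2


# Hypothetical example: halving the NA stretches the PSF about 4x along z.
print(relative_axial_extent(0.5, 1.0))  # 4.0
```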

      Even though this is not mentioned at all, it certainly appears possible that the acousto-optical scanning emits audible noise. In this case, it would be good to know the frequency range and level of this background noise, whether there are auditory responses to the scanning itself, and whether it interferes in any way with the performance of the animals in the auditory task. If this is not the case, this should probably simply be mentioned for non-experts.

      While the name "acousto-optical deflector" may suggest acoustic noise, these devices are driven in the range of 55-120 MHz, about three orders of magnitude above the upper frequency limit of mouse hearing: mice cannot hear them. Moreover, we developed water-cooled AODs ten years ago, so cooling fans are not required, and AOD-based scanning can therefore be performed with no acoustic noise emission. In contrast, galvo, resonant, and piezo scanners operate in the kHz frequency range, which is in the middle of the hearing range of mice. Moreover, these scanners cannot be operated in a vacuum, and the scanner is only a few tens of centimeters away from the mouse, so their acoustic noise cannot be cancelled but only partially masked with white noise. We thank the reviewer for the helpful comment and have added one sentence about the absence of acoustic noise during acousto-optical scanning:

      “The deflectors are driven in the 55-120 MHz frequency range, which mice cannot hear; therefore, the deflectors do not interfere with the auditory cues. This, in combination with the water cooling of the deflectors, makes AOD-based scanning the quietest technology for in vivo imaging.”

      The authors show a strong correlation between task performance (hit rate) and the response to the auditory cue on hit trials. Were there any other significant correlations of VIP cells' responses to other trial types? Was the reinforcer response correlated with behavioral variables at all?

      We have not found any remarkable correlations between VIP cell activity and behavioral variables except the one mentioned above.

      For example, the correlation between discrimination rate (hit rate/FA rate) and ∆F/Ftone in Hit trials was not significant (R² = 0.03, F = 0.49, p = 0.69), nor were the correlations between hit rate and ∆F/Ftone in FA trials (R² = 0.19, F = 3.8, p = 0.07) or between discrimination rate and ∆F/Ftone in FA trials (R² = 0.07, F = 1.1, p = 0.31).
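      For readers who want to see how statistics of this kind are obtained, here is a minimal sketch assuming (our assumption, not stated above) that each test is an ordinary least-squares regression with one predictor. The numbers in the example are toy values, not the recorded data; a p-value would follow from the F distribution with (1, n - 2) degrees of freedom, e.g. via `scipy.stats.f.sf(f_stat, 1, n - 2)`.

```python
import numpy as np


def discrimination_rate(hit_rate, fa_rate):
    """Discrimination rate as defined in the response above: hit rate / FA rate."""
    return np.asarray(hit_rate, dtype=float) / np.asarray(fa_rate, dtype=float)


def simple_regression_stats(x, y):
    """R^2 and F statistic (df = 1, n - 2) of an OLS fit y ~ a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 points for an F test with one predictor")
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    f_stat = r2 / ((1.0 - r2) / (n - 2))
    return float(r2), float(f_stat)


# Toy numbers only (not the recorded data): five hypothetical sessions.
r2, f_stat = simple_regression_stats([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 3), round(f_stat, 3))  # 0.6 4.5
```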

    1. Author Response

      Reviewer #1 (Public Review):

      “Even though the methodology was already introduced, it should be described in some detail. Most importantly, AlphaFold's measures of accuracy have been part of the loss function during training/testing. What about the measure of protein-protein interaction accuracy? Was it also in the loss function?”

      We thank the reviewer for this insightful comment. The metrics used for evaluating predicted structure quality, such as the predicted local distance difference test (pLDDT) score and predicted TM-score (pTM), both proposed in the AlphaFold 2 publication (Ref. 27), and the interface score (iScore) proposed in the AF2Complex publication (Ref. 23), are not explicitly employed as loss functions in training the main deep learning model for structure prediction. Instead, the main loss function of AF2 is the Frame Aligned Point Error (FAPE) loss, which measures the errors in the predicted atomic coordinates within local coordinate frames spanned by vectors connecting the backbone heavy atoms of individual residues. However, the FAPE loss is closely related to the predicted TM-scores and iScores, both of which are derived from an additional module that predicts alignment errors (PAEs) as viewed from each residue's local frame. This PAE module was trained separately, as described in the AF2 publication (Ref. 27). According to DeepMind, the loss function used to train the AlphaFold-Multimer models (Ref. 25, AF version 2.2.0 and above) changed only modestly; the changes were made mainly to reduce severe clashes, which were not uncommon when earlier versions of AF2 modeled large complexes.
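      To make the FAPE description concrete, here is a simplified NumPy sketch following the formulation in the AlphaFold 2 supplementary information: every atom is expressed in every residue's local frame, the distance between predicted and true local coordinates is clamped (10 Å), averaged, and divided by a length scale Z (10 Å). It is not DeepMind's implementation; the function names are hypothetical, and side-chain frames, unclamped training fractions, and loss weighting are omitted.

```python
import numpy as np


def to_local_frames(rotations, translations, points):
    """Express every point j in every local frame i as R_i^T (x_j - t_i).

    rotations: (N, 3, 3), translations: (N, 3), points: (M, 3) -> (N, M, 3).
    """
    diff = points[None, :, :] - translations[:, None, :]
    # Sum over k of R[n, k, i] * diff[n, m, k], i.e. R_n^T applied to each diff.
    return np.einsum("nki,nmk->nmi", rotations, diff)


def fape(pred_rot, pred_trans, pred_xyz, true_rot, true_trans, true_xyz,
         d_clamp=10.0, length_scale=10.0, eps=1e-4):
    """Simplified frame aligned point error (distances in angstroms)."""
    local_pred = to_local_frames(pred_rot, pred_trans, pred_xyz)
    local_true = to_local_frames(true_rot, true_trans, true_xyz)
    # eps keeps the square root differentiable when the error is zero.
    dist = np.sqrt(np.sum((local_pred - local_true) ** 2, axis=-1) + eps)
    return float(np.mean(np.minimum(dist, d_clamp)) / length_scale)
```

      Because both predicted and true coordinates are compared inside local frames, the loss is unchanged if the whole prediction is rotated and translated, which is why FAPE needs no global superposition.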

      We added in the Methods section, line 337,

      “The iScore metric was derived from the predicted alignment errors, which give an estimated distance of interface residue j from its position in the experimental structure, as viewed from the local frame of interface residue i [23,27]. To better estimate confidence, the contribution of each interface residue to the interface score is calculated using local frames located in a different protein chain, i.e., residues i and j belong to different chains.”
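      The "frames on another chain" restriction can be illustrated with a pTM-style score computed from a PAE matrix. This is a hypothetical simplification for illustration only, NOT the AF2Complex iScore formula: it uses PAE point estimates, the AF2 pTM convention for d0 (residue count clipped at 19), and a caller-supplied interface mask.

```python
import numpy as np


def interface_tm_like(pae, chain_ids, interface_mask):
    """Hypothetical interface score from a PAE matrix (not the published iScore).

    pae[i, j]: expected error (angstroms) of residue j when aligned on residue i.
    Only pairs (i, j) on different chains contribute; the best frame i is kept.
    """
    pae = np.asarray(pae, dtype=float)
    chain_ids = np.asarray(chain_ids)
    idx = np.flatnonzero(interface_mask)
    n = max(idx.size, 19)  # pTM-style clipping of the residue count
    d0 = 1.24 * (n - 15) ** (1.0 / 3.0) - 1.8
    best = 0.0
    for i in idx:
        partners = idx[chain_ids[idx] != chain_ids[i]]  # frames on other chains only
        if partners.size == 0:
            continue
        score = np.mean(1.0 / (1.0 + (pae[i, partners] / d0) ** 2))
        best = max(best, float(score))
    return best
```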

      “Figure 1a (upper panel, PpiD) includes quite a few promising hits, but only the first, third, and 12th were considered. How were these chosen? For example, why not consider the second? The lower panel (YfgM) also shows many promising hits, but only the first was chosen. Why not more? Likewise, only two of the top hits in Figure 4 are considered. What about the rest? For example, why take into account the second-best hit while skipping the first?”

      These are important questions about similar issues raised by all three reviewers, i.e., R2.1 by Reviewer 2 and R3.2 by Reviewer 3. We emphasize that our approach predicts physical interactions between proteins, not the biological consequences of such interactions. The most interesting predictions are those relevant to biological function, about which the computational method cannot make a judgement. Given the space limitations of the manuscript, we therefore opted to select from the top predictions those likely to provide mechanistic insights into biological function, for example, those that might inspire new hypotheses about molecular mechanisms. In practice, our selection was guided by existing literature and experimental evidence. Because such information is limited, we could focus only on the very few predictions with both strong computational and experimental support. Most top predictions, including the ones the reviewers asked about, were not pursued further because we cannot at present say anything about the functional consequences of these predicted interactions, even though the proteins may interact physically. One main contribution of this computational screening approach is to provide short lists that accelerate the search for functionally important protein-protein interactions. Thus, in this contribution, we provide some examples found among the top 20 hits ranked from ~1500 possible pairs for a given query protein.
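      The short-listing step (ranking ~1500 candidate pairs for one query protein by interface score and keeping the top 20) amounts to a sort and a cut. A minimal sketch; the function name and protein names are hypothetical:

```python
from typing import Dict, List, Tuple


def top_hits(scores: Dict[str, float], k: int = 20) -> List[Tuple[str, float]]:
    """Return the k candidate partners with the highest interface score, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical scores for three candidate partners of one query protein.
print(top_hits({"ProtA": 0.3, "ProtB": 0.9, "ProtC": 0.5}, k=2))
# [('ProtB', 0.9), ('ProtC', 0.5)]
```

      Everything below the cut is simply not inspected further, which is why interactions outside the short list (or inside it but without supporting evidence) are not discussed.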

      In this revision, we added from line 85,

      “Note that our computational predictions are about physical interactions between a pair of proteins subjected to screening, not about their biological roles, even if they are predicted to interact physically. Moreover, the predicted physical interactions may not be relevant in the cellular environment due to various factors not considered in modeling, e.g., competition from other proteins with stronger binding affinities, post-translational modifications, etc. Thus, many protein-protein interactions predicted by this pipeline may lack biological relevance. Nevertheless, since cognate protein-protein interactions required for function are more likely to be detected than interactions between randomly selected proteins, biologically interesting protein-protein interactions are enriched at the top of the screening results ranked by iScore. Thus, the screening procedure may provide valuable, even critical, clues for subsequent investigation. In this study, assisted by existing experimental evidence, we select from high-confidence computational predictions those most likely to have significant biological implications, and then predict the structures of larger complexes if more than two proteins are involved according to our predictions or based on literature information. The interactions that we ignored are of unknown biological significance, physically interacting but biologically irrelevant, or simply false positives.”

      “Authors argue that the unstructured part of OmpA, which wraps around SurA, is to be trusted, which may be the case. But a more likely explanation is that it is an artefact, in agreement with the very low confidence assigned by AlphaFold.”

      While we do not disagree that the structure prediction of the SurA/OmpA complex may contain artifacts, there are several reasons why our predictions may be insightful, as we explained in the manuscript. First, it is well known from experimental studies (references 41, 42, 45) that the SurA/OmpA complex is very dynamic and unlikely to form a stable structural complex as in a typical crystal structure. As such, the low confidence score from AF2Complex is expected, as it reflects uncertainty due to the existence of many possible conformations. Second, a loose wrapping of OmpA around SurA makes physical sense, as it reduces the energetic cost of dissociating OmpA from SurA when SurA approaches BAM for delivery. Our point is a qualitative assessment, rather than a claim about a specific complex model as in a typical structure prediction scenario. To be cautious, as the reviewer suggested, we added a sentence to the Discussion, from line 309,

      “Despite the low confidence due to weak interactions, the predicted structures delineate a picture for how SurA prevents OmpA from aggregating. Moreover, since it transports OmpA with a relatively small number of intermolecular contacts, the free energy required to dissociate OmpA from SurA is small. Notwithstanding these considerations, we caution that artifacts likely exist in these predicted structural models.”

      “Figure 5. How does this predicted structure compare with the known structure of the complex? In particular, how similar are the predicted and known structures of the individual subunits, and how similar are the predicted docking poses to the known ones?”

      The BAM complex has been studied extensively, with over one hundred experimental structures of its individual subunits or of the full complex. Therefore, a thorough structural comparison would be the subject of a review beyond the scope of this study. In our computational models, the structures of the individual subunits and of the full BAM complex closely mimic their known experimental structures, which is expected because some of these structures were likely used in training the deep learning models and/or in structure prediction. In the revised manuscript, we added a comparison to the highest-resolution crystal structure after line 225,

      “Because BAM has been extensively studied structurally [7,47], we focus on describing its interaction with SurA, though the predicted BAM complex model closely mimics a known crystal structure of the complex determined at 2.9 Å resolution (PDB 5D0O, [48]). The alignment of the two complex structures yields a very high TM-score of 0.94.”
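      For readers unfamiliar with the TM-score quoted above, the sketch below evaluates the standard TM-score formula for two structures that are already superposed and residue-matched. This is an assumption-laden simplification: the real TM-score (as computed by tools such as TM-score, TM-align, or MM-align for complexes) also searches for the superposition that maximizes the score. The d0 floor of 0.5 Å follows the usual convention for short chains.

```python
import numpy as np


def tm_score_superposed(model, reference):
    """TM-score of already-superposed, residue-matched CA coordinates (L x 3).

    TM = (1/L) * sum_i 1 / (1 + (d_i / d0)^2), d0 = 1.24 (L - 15)^(1/3) - 1.8,
    with d0 floored at 0.5 angstroms. No superposition search is performed.
    """
    model = np.asarray(model, dtype=float)
    reference = np.asarray(reference, dtype=float)
    length = reference.shape[0]
    d0 = max(0.5, 1.24 * max(length - 15, 0) ** (1.0 / 3.0) - 1.8)
    d = np.linalg.norm(model - reference, axis=1)
    return float(np.mean(1.0 / (1.0 + (d / d0) ** 2)))
```

      A score of 1.0 means identical coordinates; values such as the 0.94 reported for BAM indicate that nearly all residues lie well within d0 of their crystal-structure positions.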

      “Authors should make the results easily accessible to all. Maybe as Cytoscape and CyToStruct sessions for easy visualization.”

      Cytoscape and its add-on CyToStruct are very useful tools for visualizing large networks. In our case, however, we present only a handful of complexes, not a massive protein-protein interaction network like those resulting from all-against-all screening at proteome scale. A diagram such as Fig. 7 is sufficient for our visualization purposes. Moreover, we provide the atomic coordinates in the standard PDB format for readers who wish to examine the respective structures in detail. In the future, if we have the opportunity to expand PPI screening to a large number of targets, Cytoscape and its add-ons will be handy for displaying the resulting large network.

      “Finally, AlphaFold was trained and tested mostly with water-soluble protein. Thus, application to outer membrane proteins is a bit risky. Maybe authors can comment on this.”

      While it is true that most experimental structures used for training AlphaFold models are of water-soluble proteins, many membrane protein structures were also available for training: over 10,000 structures of membrane proteins had already been deposited in the Protein Data Bank, although there is redundancy among these structures and some domains lie outside the transmembrane regions. These structures are likely sufficient for machine-learning approaches such as AlphaFold 2 to learn the sequence and structural patterns unique to transmembrane proteins. This view is supported by our empirical experience, because transmembrane regions of membrane proteins are typically among those with high confidence scores, e.g., the complex models for the transmembrane molecular system CcmI presented in our AF2Complex work (Ref. 23). One of these computational models (of CcmA2B2CD) was recently confirmed to be of high quality by cryo-EM models (Li et al., Nature Communications 13:6422, 2022), with a TM-score of 0.89. We note that this was a non-trivial prediction, as this structure was not present in the PDB and had long been sought by experimentalists. This view also agrees with the conclusion of a recently published study on AF2 models of transmembrane proteins (Hegedűs et al., Cell. Mol. Life Sci. 79:73, 2022).

    1. Author Response

      Reviewer #1 (Public Review):

      The authors present a study of figure-ground segregation in different species. Figure-ground segregation is an important mechanism for establishing an accurate 3D model of the environment. The authors examine whether figure-ground segregation occurs in mice in a similar manner to that reported in primates and compare the results with two other species (tree shrews and mouse lemurs). They use both behavioral measures and electrophysiology/two-photon imaging to show that mice and tree shrews do not use opponent motion signals to segregate the visual scene into objects and background, whereas mouse lemurs and macaque monkeys do. This information is of great importance for understanding to what extent the rodent visual system is a good model for primate vision, and the use of multiple species is highly revealing for understanding the development of figure-ground segregation through evolution.

      The behavioral data is of high quality. I would add one caveat: it seems unfair to report that the tree shrews could not generalize the opponent motion stimulus as it seems they struggled to learn it in the first place. Their performance was below 60% on the training data and they weren't trained for many sessions in comparison to the mice. Perhaps with more training the tree-shrews might have attained higher performance on the textures and this would allow a more sensitive test of generalization. The authors should qualify their statements about the treeshrews to reflect this issue.

      The reviewer is correct in this assertion. For context, we performed the mouse experiments first and were hoping to see texture-invariant performance but instead realized that the mice were resorting to memorizing patterns. With this in mind, when expanding to treeshrews we wanted to prevent this type of learning to really test whether texture invariant recognition was possible, thus we increased the number of orientations tested to 5, resulting in 10 possible textures that would have to be memorized in contrast to the 4 that had to be memorized for the mice. We now clarify this in the text:

      “We reversed the number of train/test patterns compared to what was used for the mice (Fig. 2i1) because we reasoned that animals might be more likely to generalize if given more patterns for training. We had performed the mouse experiments initially, noticed the memorization approach, and were trying to avoid this behavior in treeshrews. This also means that the naturalistic train condition presented to treeshrews was harder than that for mice (5 orientations for treeshrews vs. 2 orientations for mice in the training set).”

      Reviewer #2 (Public Review):

      Luongo et al. investigated the behavioural ability of 4 different species (macaque, mouse lemur, tree shrew and mouse) to segment figures defined by opponent motion, as well as different visual features from the background. With carefully designed experiments they convincingly make the point that figures that are not defined by textural elements (orientation or phase offsets, thus visible in a still frame) but purely by motion contrast, could not be detected by nonprimate species. Interestingly it appears to be particularly motion contrast, since pure motion - figures moving on a static background - could be discriminated better, at least by mice. This is highly interesting and surprising -- especially for a tree shrew, a diurnal, arboreal mammal, very closely related to primates and with a highly evolved visual system. It is also an important difference to take into account considering the multitude of studies on the mouse visual system in recent years.

      The authors additionally present neuronal activity in mice, from three different visual cortical areas recorded with both electrophysiology and imaging. Their conclusions are mostly supported by the data, but some aspects of the recordings and data analysis need to be clarified and extended.

      The main issues are outlined below roughly in order of importance:

      1) The most worrying aspect is that, if I interpret their figures correctly, their recordings seem not very stable and this may account for many of the differences across the visual conditions. The authors do not report in which order the different stimuli were shown, their supplemental movie, however, makes it seem as though they were not recorded fully interleaved, but potentially in a block design with all cross1 positions recorded first, before switching to cross2 positions and then on to iso... If I interpret Figure 6a correctly, each line is the same neuron and the gray scale shows the average response rate for each condition. Many of these neurons, however, show a large change in activity between the cross1 and the cross2 block. Much larger than the variability within each block that should be due to figure location and orientation tuning. If this interpretation is correct, this would mean that either there were significant brain state changes (they do have the mice on a ball but don't report whether and how much the animals were moving) between the blocks or their recordings could be unstable in time. It would be good to know whether similar dramatic changes in overall activity level occur between the blocks also in their imaging data.

      The same might be true for differences in the maps between conditions in Figure 4. If the recordings were indeed in blocks and some cells stopped responding, this could explain the low map similarities. For example, Cell 1 for the cross stimuli seems to be a simple ON cell, almost like their idealized cell in 3d. However, even though, for a large fraction of the locations, the texture in the RF and large parts of the surround are exactly identical between Cross1 and Iso2, as well as between Cross2 and Iso1, the cell's responses for both iso conditions appear to be only noise, or at least extremely noise-dominated. Why would the cell not respond in a phase- or luminance-dependent manner here?

      This could either be due to very high surround suppression in the iso condition (which cannot be judged within condition normalization) or because the cell simply responded much weaker due to recording instability or brain state changes. Without any evidence of significant visual responses, enough spikes in each condition and a stable recording across all blocks, this data is not really interpretable. Instability or generally lower firing rates could easily also explain differences in their decoding accuracy.

      Similarly, it is very hard to judge the quality of their imaging data. They show no example field of views or calcium response traces and never directly compare this data to their electrophysiology data. It is mentioned that the imaging data is noisy and qualitatively similar, but some quantification could help convince the reader. Even if noisy, it is puzzling that the decoding accuracy should be so much worse with the imaging data: Even with ten times more included neurons, accuracy still does not even reach 30% of that of the ephys data. This could point to very poor data quality.

      We address the issue of stability of selectivity in our response to all reviewers above. Note that we wavered on whether to include the imaging data at all given the much better decoding accuracies from the electrophysiology data, and decided to include it for two main reasons:

      1) It qualitatively gives a very similar result, namely that there is a texture-dependent ability to resolve the position of a given figure, suggesting that the rodent visual system is indeed better equipped to represent figure locations for the cross and iso stimuli than for the nat stimulus.

      2) The correspondence on subsequent days between single cells and their corresponding spatial preference responses suggests that this is a stable and consistent preference represented by these neurons.

      The following text has been added to the Methods section:

      Matching cells across days. Cells were tracked across days by first re-targeting to the same plane by eye such that the mean fluorescence image on a given day was matched to that on the previous day, with online visual feedback provided by a custom software plugin for Scanbox. […] This result points to the consistency of the spatial responses in the visual cortex as a substrate for inferring figure position.

      2) There is no information on the recorded units given. Were they spike sorted? Did they try to distinguish fast spiking and regular spiking units? What layers were they recorded from? It is well known that there are large laminar differences in the strength of figure ground modulation, as well as orientation tuned surround suppression. If most of their data would be from layer 5, perhaps a lack of clear figure modulation might not be that surprising. This could perhaps also be seen when comparing their electrophysiology data to the imaging data which is reportedly from layer 2/3, where most neurons show larger figure modulation/tuned surround suppression effects. There is, however, no report or discussion of differences in modulation between recording modalities.

      We used Kilosort (Pachitariu et al., 2016) for spike sorting of the data. The output of the automatic template-matching algorithm from Kilosort was visualized on Phy and then curated manually.

      We did not compute current source density. The 64 contacts on our probe spanned 1 mm, so we recorded cells throughout all layers of cortex. We did not focus on a specific layer, as we did not find strong modulation by figure/ground or border ownership in any of our cells. We did not distinguish between fast-spiking and regular-spiking units.

      3) There is an apparent discrepancy between Figure 5d and i. How can their modulation index be around -0.1 for cross (Figure 5d) - which would correspond to on average ~20% weaker responses to a figure than to background, when their PSTH (5i) shows an almost 50% increase of figure over ground. This positive figure modulation has also been widely reported in the literature (Schnabel, Kirchberger, Keller). Are there different populations of cells going into these analyses?

      There was a mismatch between the cells used to plot the F/G modulation index and those used for the time course, because we had previously applied different selection criteria. We now use the same criteria and have replotted Figure 5d, e, g, h.
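      The reviewer's arithmetic linking an index of about -0.1 to roughly 20% weaker figure responses can be checked by inverting the index. This sketch assumes the common definition MI = (F - G) / (F + G), with F and G the mean figure and ground responses; the manuscript's exact definition may differ.

```python
def modulation_index(figure_rate: float, ground_rate: float) -> float:
    """Figure/ground modulation index, assumed to be (F - G) / (F + G)."""
    return (figure_rate - ground_rate) / (figure_rate + ground_rate)


def figure_to_ground_ratio(mi: float) -> float:
    """Invert MI = (F - G) / (F + G) to get F / G = (1 + MI) / (1 - MI)."""
    return (1.0 + mi) / (1.0 - mi)


print(round(figure_to_ground_ratio(-0.1), 3))  # 0.818, i.e. ~18% weaker
print(round(modulation_index(1.5, 1.0), 3))    # 0.2 for a 50% figure enhancement
```

      Under this definition, a 50% figure enhancement in the PSTH corresponds to an index of about +0.2, so the two panels can only agree if they are computed on the same cells.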

      4) In a similar vein, it is not immediately clear why the average map correlation would be bigger for random cell pairs (~0.2, Fig 3g) than for the different conditions of the same cell (~0, Fig 5b). Could this be due to differences in recording modality (imaging in 3g and ephys in 5b)?

      We suspect the reviewer is correct, namely, that the difference in recording modality accounts for these differences. The spatial mixing of signals inherent to calcium imaging can be problematic for the study of figure-ground and border-ownership signals. Thus, the non-zero mean observed in Fig. 3g is likely due to neuropil contamination, whereas Fig. 5 is based purely on electrophysiology data and therefore has no such confound.
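      Why shared neuropil signal would shift the mean map correlation above zero can be illustrated with a toy simulation. This is a hypothetical illustration, not an analysis of the recorded data: each "map" has 128 values (one per figure position), cells have independent maps, and a common neuropil component of adjustable weight is added to both cells of a pair.

```python
import numpy as np


def mean_pair_correlation(n_pairs=200, n_positions=128, neuropil_weight=0.0, seed=0):
    """Mean Pearson correlation between simulated spatial maps of random cell pairs.

    Each cell's own map is independent noise; `neuropil_weight` scales a
    component shared by both cells of a pair (a stand-in for neuropil).
    """
    rng = np.random.default_rng(seed)
    corrs = []
    for _ in range(n_pairs):
        shared = rng.normal(size=n_positions)
        a = rng.normal(size=n_positions) + neuropil_weight * shared
        b = rng.normal(size=n_positions) + neuropil_weight * shared
        corrs.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(corrs))
```

      With no shared component the mean correlation is near zero; with a weight of 1 it approaches w²/(1 + w²) = 0.5, so even modest contamination yields a positive mean like that in Fig. 3g.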

      5) The maps in Figure 4 should show the location of the RF, because they cannot be interpreted without knowledge of the RF center and size. For example cell 4 in the iso 1 condition could be a border cell, or could respond to the center of the figure. It is impossible to deduce without knowledge of the location of the RF.

      We have added the following clarification to the figure legend for Fig. 4a:

      “Overlaid on these example stimuli are grids representing the 128 possible figure positions and a green ellipse representing the ON receptive field. Note that this receptive field is the Gaussian fit from the sparse noise experiment.”

      We have also added the following clarification to the figure legend for Fig. 4b:

      “Please note that for all of these experiments the population receptive field was centered on the grid of positions.”

      6) It could help the reader to discuss the interpretation of the map correlations in Fig 5 a and b in more detail. My guess is that negatively correlated maps (within cross or iso condition) could come from highly orientation tuned neurons, whereas higher correlation values point to more generally figure/contextually modulated cells (within this condition). While the distribution is far from bimodal, this does not rule out a population of nicely figured modulated cells at the high end of the distribution. It might not be necessary at the level of V1 that the figure modulation be consistent across all textures. It would not be surprising, if orientation contrast-defined, phase contrast-defined and motion contrast-defined figures could be signalled to higher areas by discrete populations of V1 or even LM cells.

      We agree that the reviewer’s interpretation of the neural findings is possible. However, at least based on the behavior, it seems unlikely that motion contrast-defined figures are represented anywhere in the rodent brain.

      7) Some of the behavioural results warrant a little more explanation or discussion as well. In Figure 2h, the mice seem significantly better on the static version of the iso task than on the moving one. If statistically significant, this should be discussed. Is this because the static frame was maximally phase offset? Then the figure would indeed be more visible (bigger phase contrast in more frames) than in the moving condition.

      Yes, indeed, in Figure 2h, the static frame was chosen with maximal positional displacement, and thus the figure can likely be seen better. We have added this clarification to the figure legend for Fig. 2h.

      Figure 2 and extended Figure 1c: why is the mouse lemur performing so poorly on average? It also appears to have biggest problems with the cross stimulus early on in training.

      The behavior experiments in the mouse lemur were carried out under an international collaboration and with substantially less exploratory experiments than was done for mouse, treeshrew, and macaque. For the mouse lemur, we simply went with a training regimen that we knew had worked efficiently for treeshrews and without any optimization of the procedure. Thus we would caution against over-interpreting the exact learning rates of the mouse lemurs and instead focus on the qualitative result that they could generalize for the Nat condition. This was a marked departure from the rodents and shrews and is the main finding we would like to convey. We suspect that with future optimizations of behavior shaping, training times and performances could likely both be improved.

      Tree shrews seem not to be able to memorize the textures as well as the mice do. Is this because of less deprivation/motivation? Or because of the bigger set of textures in training? This would make memorization harder and could thus lower their overall performance. The comparative aspects are very interesting but the absolute differences in performance could be discussed in more detail or explained better.

      Reviewer 1 raised a similar concern; please see our response above.

      8) In Figure 7b, why wouldn't the explanation for the linear decodability in cross also hold for iso? There are phase offsets at the borders that simple cells should readily be able to resolve, just as in the case of orientation discontinuities. Could they make a surround phase model, similar to their surround orientation model, that could more readily capture the iso discontinuities?

      The reviewer is likely correct in their assertion that one could consider further hand tuning the model to account for the observed diversity in responses (namely, Cross > Iso > Nat for figure position decoding). We went directly to a DNN to model the data, since we thought this would be more powerful, given that the DNN features were not tuned to explain our neural data per se.

    1. Author Response

      Reviewer #1 (Public Review):

      This study used a multidimensional stimulus-response mapping task to determine how monkeys learn and update complex rules. The subjects had to use either the color or shape of a compound stimulus as the discriminative dimension that instructed them to select a target in different spatial locations on the task screen. Learning occurred across cued block shifts when an old mapping became irrelevant and a new rule had to be discovered. Because potential target locations associated with each rule were grouped into two sets that alternated, and only a subset of possible mapping between stimulus dimensions and response sets were used, the monkeys could discover information about the task structure to guide their block-by-block learning. By comparing behavioral models that assume incremental learning, quantified by Q-learning, Bayesian inference, or a combination, the authors show evidence for a hybrid strategy in which animals use inference to change among response sets (axes), and incremental learning to acquire new mappings within these sets.
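      The incremental-learning component can be summarized by a delta-rule (Q-learning) update plus a softmax choice rule. The sketch below is a generic single-stimulus version over the four response targets; the learning rate and inverse temperature are hypothetical values, and the authors' fitted models may be parameterized differently (e.g. with separate values per stimulus feature).

```python
import numpy as np


def q_update(q_values, action, reward, alpha):
    """Delta-rule (Q-learning) update of the chosen action's value; returns a new array."""
    updated = np.array(q_values, dtype=float)
    updated[action] += alpha * (reward - updated[action])
    return updated


def softmax_policy(q_values, beta):
    """Choice probabilities from action values with inverse temperature beta."""
    q = np.asarray(q_values, dtype=float)
    z = beta * (q - q.max())  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


# Hypothetical trial: four targets, target 2 rewarded, learning rate 0.5.
q = q_update(np.zeros(4), action=2, reward=1.0, alpha=0.5)
print(q.tolist())  # [0.0, 0.0, 0.5, 0.0]
print(softmax_policy(q, beta=3.0))
```

      Because values move only a fraction alpha per trial, a purely incremental learner relearns slowly after a block switch, which is the behavioral signature the model comparison exploits.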

      Overall, I think the study is thorough and compelling. The task is cleverly designed, the modeling is rigorous, and the manuscript is clear and well-written. Importantly, the behavior generated by the different models differs enough to make the authors' conclusions convincing. They make a strong case that animals can adopt mixed inference/updating strategies to solve a rule-based task. My only minor question concerns the degree to which this result generalizes beyond the particulars of this task.

      Thanks for these kind comments. Regarding generalization, we agree with the reviewer and did not intend to make any claim about how the particular result generalizes beyond this task. Indeed, the specific result could depend on the training protocol even within the same task. We now discuss this explicitly in the manuscript, lines 800-810. However, we do take the view that even if the way the monkeys' behavior played out in this setting is a lucky accident, it may still reveal something fundamental about learning processes in the brain.

      Reviewer #2 (Public Review):

      The authors trained two monkeys to perform a task that involved sequential (blocked) but unsignalled rules for discriminating the colour and shape of a visual stimulus, by responding with a saccade to one of four locations. In rules 1 and 3, the monkeys made shape (rule 1) or colour (rule 3) discriminations using the same response targets (upper left / lower right). In rule 2, the monkeys made colour judgments using a unique response axis (lower left / upper right). The authors report behaviour, with a focus on time to relearn the rules after an (unsignalled) switch for each rule, discrimination sensitivity for partially ambiguous stimuli, and the effect of congruency. They compare the ability of models based on Q-learning, Bayesian inference, and a hybrid to capture the results.

      The two major behavioural observations are (1) that monkeys re-learn faster following a switch to rule 2 (which occurs on 50% of blocks and involves a unique response axis), and (2) that monkeys are more sensitive to partially ambiguous stimuli when the response axis is unique, even for a matched feature (colour). These data are presented clearly and convincingly and, as far as I can tell, they are analysed appropriately. The former finding is not very surprising, as rule 2 occurs most frequently and follows each instance of rule 1 or 3 (which is why the ideal observer model successfully predicts that the monkeys will switch by default to rule 2 following an error on rules 1 or 3), but it is nevertheless reassuring that this behaviour is observed in the animals. It also clearly confirms that monkeys track the latent state that denotes an uncued rule.

      The latter finding is more interesting and seems to have two potential explanations: (i) sensitivity is enhanced on rule 2 because it occurs more frequently; (ii) sensitivity is enhanced on rule 2 because it has a unique response axis (and thus involves less resource sharing/conflict in the output pathway).

      The authors do not directly distinguish between these hypotheses per se, but their modelling exercise shows that both results (and some additional constraints) can be captured by a hybrid model that combines Bayesian inference and Q-learning, but not by models based on either principle alone. A Q-learning model fails to capture the latent state inference and/or the rule 2 advantage. The Bayesian inference model captures the rapid switches to rule 2 (which are more probable following errors on rule 1 and rule 3) but predicts matched discrimination performance for partially ambiguous stimuli on colour rules 2 and 3. This is because, although knowing the most likely rule increases the probability of a correct response overall, it does not increase discriminability and thus boosts the more ambiguous stimuli. I wondered whether it might be possible to explain this result with the addition of an attention-like mechanism that depends on the top-down inference about the rule. For example, greater certainty about the rule might increase the gain of discrimination (psychometric slope) in a more general way.

      We agree with the reviewer that our logic in ruling out pure inference models assumes that other factors affecting performance, like attention or motivation, are equivalent between blocks. In principle, if there were large and sustained differences in these factors between Rule 2 and Rule 1 or 3 blocks, that might offer a different explanation for the effect. We now mention this caveat in the manuscript. In terms of leveraging this into a full account of the behavior, however, we are not sure how the reviewer's particular idea would be instantiated, since (as we show in Fig. 3a,b,c, and Fig. S4a,b,c) the difference in psychometric slopes lasts at least 200 trials into the rule, even when (in the hybrid learning model) the feature weights have converged (Figure 4 – figure supplement 2). It is hard to see why elevated uncertainty about the rule would persist this long in anything resembling an informed ideal observer model.

      The authors propose a hybrid model with the implicit assumption that the response axis defines the rule. The model infers the latent state like an ideal observer but learns the stimulus-response mappings by trial and error. This means that the monkeys must constantly re-learn the response mappings along the shared response axis (for rules 1/3), whereas these mappings remain fixed for rule 2 because it has a unique response axis. This model can capture the two major effects and, without further assumptions, also captures the relative performance on congruent and incongruent trials (those trials where the required action is the same, or different, for given stimuli across rules) on different blocks.
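      The two ingredients of the hybrid account can be illustrated with a minimal Python sketch. It is a hypothetical toy, not the authors' implementation: it assumes two latent response axes (index 0 for the upper-left/lower-right axis shared by rules 1 and 3, index 1 for the lower-left/upper-right axis of rule 2), a Bayesian update of the belief over which axis is active after reward feedback, and a delta-rule (Q-learning-style) update of feature weights within an axis. The reward probabilities and the learning rate are illustrative placeholders.

```python
import numpy as np

# Hypothetical reward probabilities (placeholders): reward is likely, not certain,
# when the chosen axis is the active one, since the target within the axis can be wrong.
P_REWARD_ACTIVE = 0.9
P_REWARD_INACTIVE = 0.1


def update_axis_belief(belief, chosen_axis, rewarded):
    """Bayesian (ideal-observer-style) update of P(active axis) after one trial."""
    belief = np.asarray(belief, dtype=float)
    # Probability of reward under each hypothesis about which axis is active.
    p_reward = np.full(belief.shape, P_REWARD_INACTIVE)
    p_reward[chosen_axis] = P_REWARD_ACTIVE
    likelihood = p_reward if rewarded else 1.0 - p_reward
    posterior = belief * likelihood
    return posterior / posterior.sum()


def delta_rule_update(weights, features, reward, learning_rate=0.1):
    """Incremental (Q-learning-style) update of stimulus-feature weights within one axis."""
    weights = np.asarray(weights, dtype=float)
    features = np.asarray(features, dtype=float)
    prediction_error = reward - weights @ features
    return weights + learning_rate * prediction_error * features


# Usage: one rewarded choice on axis 1, and one rewarded trial for a feature vector.
belief = update_axis_belief([0.5, 0.5], chosen_axis=1, rewarded=True)
weights = delta_rule_update([0.0, 0.0], features=[1.0, 0.0], reward=1.0)
print(belief, weights)
```

      Under these assumptions, a single rewarded trial moves the axis belief from 0.5 to 0.9, whereas a weight moves by only a tenth of the prediction error, which mirrors the contrast between fast inference across axes and slow relearning within the shared axis.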

      I found the authors' account plausible, but it seemed that there might be other possible explanations for the findings. In particular, having read the paper, I remained unclear as to whether it was the sharing of the response axis per se that drove the cost on rule 3 relative to rule 2, or whether it was only because of the assumption that response axis = rule, which was built into the authors' hybrid model. It would have been interesting to know, for example, whether a similar advantage for ambiguous stimuli on rule 2 occurred under circumstances where the rule blocks occurred randomly and with equal frequency (i.e. where there was response axis sharing but no higher probability); or even whether, if the rule was explicitly signalled from trial to trial, the rule 2 advantage would persist in the absence of any latent state inference at all (this seems plausible; one pointer for theories of resource sharing is this recent review: https://www.cell.com/trends/cognitive-sciences/fulltext/S1364-6613(21)00148-0?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS1364661321001480%3Fshowall%3Dtrue). No doubt these questions are beyond the scope of the current project, but nevertheless it felt to me that the authors' model remained a bit tentative for the moment.

      Thanks for these interesting thoughts. It is true that the imbalanced pattern of sharing (of response axes, and also of features) across the three rules has important consequences for learning and inference under our model (and indeed under other latent state inference models such as the informed ideal observer). It is an intriguing idea that these features of the design might cause interference per se, for instance even when no inference or learning is needed because the rules are fully signaled. We agree this (and the other case the reviewer mentioned) is an interesting direction for future work. We have added this to the discussion, lines 800-812.

    1. Author Response

      Reviewer #1 (Public Review):

      In order to study odor response dynamics in the olfactory periphery, Kim et al. employ extracellular sensillum recordings from the locust antenna in response to a set of four odors at different concentrations. Using spike sorting to assign odor responses to single olfactory sensory neurons (OSNs), the authors demonstrate that OSNs exhibit four distinct response motifs comprising two types of excitation, namely fast and delayed excitatory responses, as well as inhibitory responses in the form of offset responses and inhibition. Notably, OSNs can switch between these four motifs depending on the odor applied. This finding is highly interesting and facilitates odor classification, as demonstrated by computational modeling in this study. Furthermore, the authors demonstrate that each response motif follows a different adaptation profile, which further increases the coding space. The authors conclude, and provide evidence with their model, that the experimentally observed response dynamics also facilitate determining the distance to the odor source. The obtained results are novel and demonstrate a new dimension of odor response properties at the peripheral level. However, given that the authors used a very limited set of chemically similar odors, and considering that the broad tuning and wiring of OSNs in the locust is special and follows different rules compared to the olfactory circuitry of OSNs in other insects (i.e. locust OSNs do not converge onto a single glomerulus but target multiple glomeruli), I wonder whether the observed distinct response motifs are a general phenomenon or rather a special case. I therefore recommend that the authors discuss their findings in the light of these key issues before drawing general conclusions about odor coding rules. Do these response motifs also occur for highly ecologically relevant odors, such as PAN, for which a rather specialized olfactory circuit would be assumed? The manuscript would benefit from addressing these questions as well. In addition, the computational modeling approach is written in specialized terms and is therefore difficult to grasp for readers lacking modeling expertise.

      We thank the reviewer for this very positive and helpful assessment of our work. We agree with suggestions to expand our discussion of (1) olfactory circuitry following OSNs and of (2) responses to highly ecologically relevant odors. We have also extensively revised the description of our computational modeling approach to make it understandable to non-specialists.

      In brief:

      (1) The results we present here address only peripheral activity – we do not record or model responses of follower neurons. Because our conclusions do not depend to any extent upon the architecture of the locust's olfactory system, we would prefer to limit necessarily speculative discussion or analyses of these factors. We agree these factors provide interesting context for our work, so we have now expanded our discussion to include: “In other species, how exactly ORN response patterns are utilized downstream may depend on species-specific variations in connectivity between ORNs and the antennal lobe and its glomeruli” (lines 490-492). More investigation is needed to address this important question. Nevertheless, our study shows ORN response motifs provide useful information, and conveying this information to downstream circuits augments coding space.

      (2) We share the reviewer’s concern that our odor set should include particularly ecologically relevant odors. Indeed, it was for this reason that our odor set includes components of wheat grass, part of the locust diet: 1-Octanol, 1-Hexanol, and Cyclohexanol. As above, though, we are reluctant to speculate on the responses of downstream circuits. But to acknowledge the reviewer’s important point, we have added the following text to our discussion in lines 401-405: “For these studies we used odorants known to be ecologically relevant to locusts, including several found in the head space of wheat grass. Future experiments with larger sets of odorants, including blends or locust pheromones like 4-vinylanisole (4VA) and phenylacetonitrile (PAN), may help clarify the logic of motif switching.”

      Reviewer #2 (Public Review):

      This manuscript provides additional data about how smell is encoded by insects. The study includes both new experimental measurements and simulations. At present, there are questions about whether the simulations are performed appropriately to support the experimental measurements.

      The main experimental finding reported here is that the same olfactory receptor neurons (ORNs) can respond with different temporal dynamics to different odorants. This finding is of interest. However, it is very important to discuss whether the differences in temporal dynamics can be explained by differences in how each odorant is carried by air, as described here: https://pubmed.ncbi.nlm.nih.gov/23575828/.

      We agree this phenomenon is of great interest, and we have now expanded our discussion section to address it.

      In the cited paper (see also Su et al, 2011), PID response characteristics were indeed quite different for different odors, reflecting “fast” and “slow” intrinsic odor dynamics. We are aware of these studies and shared the reviewer’s concern, and for this reason we also made PID recordings during odor presentations. These recordings show our odor set included only “fast” odorants (please see the figure below). We also note that, across our extensive dataset, all odors could elicit all four response motifs. These observations rule out the possibility that differences in how odorants are carried by air underlie the different temporal dynamics we observed in OSN responses.

      We now discuss this important point in the text, as follows: “Earlier work established that the intrinsic dynamic properties of odorants, described as “fast” or “slow,” can contribute to variations in the timing of ORN responses (Su et al., 2011; Martelli et al., 2013). However, our experiments ruled out the possibility that intrinsic odorant dynamics underlie the response motifs we describe here. First, across our extensive dataset, all odors could elicit all four response motifs; second, photoionization detector recordings of our odor presentations all revealed “fast” dynamics (not shown). It seems likely that “slow” odors would elicit concentration-dependent elaborations in the response motifs we observed. In future work it will be interesting to investigate ways intrinsic odor dynamics interact with ORN response motifs. We predict such interactions would further increase ORN response dimensionality” (lines 370-380).

      There are several questions that need to be addressed regarding the simulations part of the manuscript.

      1) There is a mismatch between the number of ORNs used in the model and in the insect system studied.

      The exact number of ORNs in the locust is not known, but estimates range from 45,000 to 113,000 per antenna (Leitch & Laurent 1996; Perez-Orive et al 2002; Galizia & Sachse 2010). We chose to model a smaller but still large set of ORNs (10,000), which we believe is a reasonable compromise between the ideal size (the true number of ORNs in the locust) and the limits imposed by practical computational efficiency. Indeed, almost all computational models are unavoidably scaled-down versions of the biological organisms.

      2) The demonstration in Figure 5 that motif switching improves odor classification includes motif switching for a given odorant, which is not observed experimentally.

      We regret that our description of the experiment presented in Figure 5 was confusing, and we have revised the text extensively to clarify it. In brief, the simulation shown in Figure 5 was not, as the reviewer understood, an attempt to model motif switching that occurs when a given odorant is presented repeatedly; rather, it shows how responses to two different but similar odors (Odor 1 and Odor 2) become increasingly different from each other as the probability of motif switching increases.

      We have now revised the text to clarify this point as follows: “With our model we could independently vary odor-elicited response motifs and response magnitudes (Figure 4E), allowing us to evaluate the extent to which motif switching benefitted odor classification in a way that cannot be tested in vivo. Thus, we simulated a realistically large number of ORNs (10,000) and compared the relative success of classifying two different odors (Odor 1 and Odor 2) with three different versions of our model in which we systematically varied the probability of motif switching. Model Version 1: the probability of switching response motif when switching from Odor 1 to Odor 2 was 0%; Version 2: 10%; Version 3: 50%. We found that the model versions that simulated higher motif switching probability made it easier to distinguish these two similar odors.” (lines 191-195, 206-209).
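      The manipulation described in this quoted passage, where each ORN has a motif for Odor 1 and switches to a different motif for Odor 2 with a fixed probability, can be illustrated with a minimal Python sketch. This is a hypothetical toy, not the authors' model: it only assigns motif labels (four categories, matching the four reported motifs) with uniform probabilities, and it does not simulate spiking, response magnitudes, or classification.

```python
import numpy as np

N_MOTIFS = 4  # fast excitation, delayed excitation, offset response, inhibition


def assign_motifs(n_orns, p_switch, seed=0):
    """Return motif labels for Odor 1 and Odor 2 (hypothetical toy model)."""
    rng = np.random.default_rng(seed)
    odor1 = rng.integers(0, N_MOTIFS, size=n_orns)
    switches = rng.random(n_orns) < p_switch
    # A switching ORN moves to one of the three *other* motifs, chosen uniformly.
    shift = rng.integers(1, N_MOTIFS, size=n_orns)
    odor2 = np.where(switches, (odor1 + shift) % N_MOTIFS, odor1)
    return odor1, odor2


# Usage: the three model versions from the quoted passage (0%, 10%, 50%).
for p in (0.0, 0.1, 0.5):
    odor1, odor2 = assign_motifs(10_000, p)
    print(p, np.mean(odor1 != odor2))
```

      Drawing a nonzero shift modulo four guarantees that a "switch" always changes the motif, so the fraction of ORNs with different labels tracks the switching probability directly.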

      We have also revised the figure caption as follows: “Computational model shows response motif switching substantially improves odor classification. A) Simulated ORN spiking illustrates different motif switching probabilities. Odors 1 and 2 are similar (see Methods). Each ORN response is sorted by motifs elicited by Odor 1. Raster plots show the responses to Odor 2 become increasingly different from responses to Odor 1 as motif switching probability increases. B) ORN odor-elicited response trajectories in reduced PCA space show motif switching increases the separation between responses to similar Odors 1 and 2; response to Odor 1 (blue) is the same in each panel; response to Odor 2 (red) changes with switching probability. C) Odor classification success as a function of odor similarity and motif switching probability for 1s (top) and 4s (bottom) stimulus pulses; even low switching probabilities improve classification performance; darker shading indicates lower classification accuracy. Odor similarity is quantified by angles (degrees) between odor vectors (see Methods)” (lines 231-239).
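      The caption quantifies odor similarity as the angle between odor vectors. The Methods are not reproduced here, so the following Python sketch assumes the standard definition, theta = arccos(a·b / (|a| |b|)), for two odor vectors a and b (for example, hypothetical vectors of ORN response magnitudes):

```python
import numpy as np


def odor_angle_deg(a, b):
    """Angle in degrees between two odor vectors (0 = same direction, 90 = orthogonal)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clipping guards against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))


print(odor_angle_deg([1, 0, 0], [1, 1, 0]))  # about 45 degrees
```

      Because the angle ignores vector length, two odors that activate the same ORNs in the same proportions count as maximally similar regardless of overall response strength.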

      3) The methodology for estimating neural temporal dynamics needs to be corrected to apply to the natural stimuli used here.

      We agree and thank the reviewer for raising this important point. To appropriately account for natural correlations present in the stimuli we used in experiments, we have now completely redone our analysis, substantially revised Figure 6, and rewritten the Methods section titled “Temporal filters using linear non-linear models.” Using methods appropriate for strongly correlated and natural odorant stimuli delivered experimentally, we obtained results consistent with those in the previous version of our manuscript.
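      The underlying issue is that, with a temporally correlated stimulus, the raw cross-correlation between stimulus and firing rate mixes the neuron's filter with the stimulus autocorrelation. One common correction, sketched below in Python, estimates the linear stage of an LN model by regularized least squares on a lagged stimulus matrix, which divides out the stimulus autocovariance. This is an illustrative assumption, not necessarily the method used in the revised Methods; the number of lags and the ridge penalty are placeholders, and the static nonlinearity would be fit afterwards.

```python
import numpy as np


def lagged_design(stimulus, n_lags):
    """Matrix whose column k holds the stimulus delayed by k samples (zero-padded)."""
    stimulus = np.asarray(stimulus, dtype=float)
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = stimulus[: n - lag]
    return X


def fit_linear_filter(stimulus, rate, n_lags, ridge=1e-3):
    """Ridge estimate (X^T X + ridge*I)^-1 X^T r of the linear stage of an LN model.

    Solving with the stimulus autocovariance X^T X removes the bias that a
    correlated stimulus introduces into the plain cross-correlation X^T r.
    """
    X = lagged_design(stimulus, n_lags)
    gram = X.T @ X + ridge * np.eye(n_lags)
    return np.linalg.solve(gram, X.T @ np.asarray(rate, dtype=float))


# Usage with a synthetic, strongly correlated (smoothed) stimulus and a known filter.
rng = np.random.default_rng(0)
stimulus = np.convolve(rng.standard_normal(5000), np.ones(5) / 5, mode="same")
true_filter = np.array([0.0, 0.5, 1.0, 0.5, 0.2])
rate = lagged_design(stimulus, 5) @ true_filter
estimate = fit_linear_filter(stimulus, rate, n_lags=5, ridge=1e-6)
print(np.round(estimate, 3))
```

      The synthetic rate here is purely linear and noise-free, so the recovered filter matches the true one; with real spike data, the ridge penalty trades a small bias for robustness to noise.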

      Reviewer #3 (Public Review):

      In this contribution, the authors align an extensive analysis of in vivo recordings of olfactory receptor neuron (ORN) responses to odors in the locust with a data-driven mathematical model of ORN population coding. Their results provide novel insights into the temporal dynamics of peripheral encoding of time-varying and naturalistic olfactory input.

      The manuscript presents three central experimental results: 1) ORN odor responses can be grouped into 4 distinct response motifs (response profiles). This has partly been known with respect to the typical excitatory phasic-tonic motif and odor offset responses. The exhaustive data set here is, however, unprecedented. 2) Individual ORNs can switch their response motif, e.g. from excitatory to inhibitory responses. To my knowledge, this is entirely new, highly interesting, and has strong implications. For one, it implies an increased coding space and odor separability, which is supported by the authors' model study. It also bears implications for our understanding of processing in the antennal lobe (AL), where projection neurons were shown to exhibit this property, but this has largely been attributed to network processing within the AL. The authors discuss ephaptic interactions as a possible underlying mechanism. 3) ORNs not only show classical within- and across-pulse adaptation, where the response amplitude decreases, but also, as a novel result, offset responses that can increase across repeated pulses with short inter-stimulus intervals. The data-driven model reproduces the experimental observations, and a population model confirms the assumed increase in coding space. In the temporal domain, the authors then perform simulations that mimic realistic stimulus statistics with stochastic arrival of odor packets of variably short duration. The model, with a trained linear filter and a non-linear transfer function, faithfully predicts the experimental firing rates.

      These results, based on an exhaustive set of experimental data, provide a novel view of peripheral odor coding in insects, and they will have a particularly strong impact on biologically realistic computational (spiking) circuit models of sensory processing and sensory-to-motor transformations during odor source navigation in naturalistic simulated odor environments, where conclusive data and analysis of ORN signaling have thus far been lacking.

      We thank the reviewer for this very thoughtful and positive assessment of our work.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors introduced a new genetic manipulation strategy in mice to reveal the functional development of retrotrapezoid nucleus (RTN) neurons. The RTN is known to be an important brainstem region for central chemoreception, and its dysfunction is related to congenital central hypoventilation syndrome (CCHS) neuropathology. They used a conditional mutation of Phox2b within Atoh1-derived cells (Atoh1Cre/Phox2bΔ8 mice) and examined a) respiratory rhythm; b) ventilatory responses to hypercapnia and hypoxia; and c) the number of chemosensitive RTN neurons. They found that 1) neonatal mice with mutant Phox2b expression showed suppressed breathing responses to hypoxia and hypercapnia; 2) adult mutant mice presented an irregular breathing pattern, partial recovery of the ventilatory response to hypoxia, and complete recovery of the response to hypercapnia; 3) anatomical data showed a reduced number of neurons activated by hypercapnia and reduced Phox2b immunoreactivity in the RTN. They concluded that conditional expression of the Phox2b mutation in Atoh1-derived cells affected the development of RTN neurons, and suggested that the Atoh1/Phox2b system in the RTN is essential for the activation of breathing under hypoxic and hypercapnic conditions. They propose that their findings provide new evidence for mechanisms related to CCHS neuropathology. The conclusions of this paper are well supported by the data, but a careful discussion is required to compare them with the results of previous studies that used different genetic strategies to target RTN development.

      We would like to thank the reviewer for the comments on our manuscript. In the present version, we made several corrections suggested by the reviewers to facilitate interpretation and strengthen the manuscript.

      Reviewer #2 (Public Review):

      Mutations in the Phox2B gene can lead to congenital central hypoventilation syndrome with variable presentations. Two distinct classes of causative mutations have been found in the human population. The first group consists of mutations that result in trinucleotide, polyalanine repeat expansions, referred to as PARM. The second group are non-polyalanine repeat expansion mutations (NPARM), which include missense, nonsense, and frameshift mutations. Each group (and even specific mutations) presents with differing clinical phenotype severity, with NPARM mutations typically being more severe. As Phox2B is expressed in a multitude of cell types across the life of an individual, much remains to be understood about the cell-specific effects of various Phox2B mutations on phenotype. To add to our understanding, the authors utilized a conditional Phox2bΔ8 allele that, upon recombination, replaces Exon 3 and UTR with a mutated exon and IRES GFP reporter. This approach allows for an inducible NPARM mutation and reporter expression in a targeted cell type. The authors focused on Atoh1-expressing cells using an Atoh1-driven Cre recombinase line (Atoh1_Cre). Atoh1 has also been shown to be co-expressed in the RTN and in the para- and inter-trigeminal regions of the pons. After inducing the Phox2B mutation in one allele, the authors examined respiratory features in both adult and neonatal mice under room air, hypercapnia (7%) and hypoxia (8%). The Atoh1_Cre; Phox2bΔ8 adult mice showed a significant body weight difference. With their plethysmography approach, neonatal mice breathing room air showed few differences, apart from a potential difference in tidal volume. Notably, adult mice show irregular breathing. Both adult and neonatal mice may show chemosensory deficits. A potential hypercapnic deficit likely resolves in the adult, but a compromised hypoxic reflex may remain. In addition, Atoh1_Cre; Phox2bΔ8 mice showed reduced c-Fos expression in the RTN after hypercapnic stimulation and reduced Phox2B immunoreactivity.

      The premise of the paper is to examine how a distinct mutation in a specific cellular context may contribute to clinical outcomes. The potential phenotypes are interesting and illuminate how differing mutations may drive different phenotypes or phenotype severity. While the RTN is likely a key mediator of the reported phenotypes, the conclusions drawn by the authors cannot be fully supported with the data presented.

      We would like to thank the reviewer for the comments. In the present version, we have made all the suggested changes and performed additional experiments to strengthen the work. We are very enthusiastic about the new version of the manuscript, and we believe it opens new questions that could be addressed in the future.

      The authors assign all phenotypes to RTN function. However, there are other documented and potentially undocumented areas of Atoh1 and Phox2b overlap that could impact breathing either directly or indirectly through metabolism and stress responses (PMID 8184995). As noted above, para-trigeminal neurons, including those in the ITR, also co-express Atoh1 and Phox2B and are captured in the Atoh1_Cre; Phox2bΔ8 mouse model. The inter-trigeminal region is associated with apneic reflexes and jaw opening (PMID: 19914183). Thus, perturbations to this center may underlie the increased irregularity seen in adult life. A potential role in chemosensory function cannot be entirely ruled out either. While Rose et al. assert that the RTN and para- and inter-trigeminal neurons are the only ones co-expressing Atoh1 and Phox2B (using antibodies), the persistent cumulative GFP-labeled fate map offered by the Atoh1_Cre; Phox2bΔ8 model would allow the authors to rule in or rule out any other uncharacterized overlapping populations. Such a fate map may also help to explain why the adult mice are significantly underweight. The weight phenotype may stem from metabolic dysregulation or changes in behavior or feeding. Changes in metabolism may drive secondary changes in breathing and chemosensory reflexes that play a role in the reported phenotypes. Ultimately, the relative roles of para-trigeminal and RTN neurons in these phenotypes should be dissected out.

      We ran a new series of experiments and found that Phox2b+ neurons in the pons, as well as the number of TH+ cells in the A1, A2, A6, and C1 regions, were not affected by the mutation. Unfortunately, we were unable to quantify the number of Phox2b-expressing neurons in the paratrigeminal region.

      Neither the adult nor the neonatal plethysmography data were collected in line with current best practices. Adult whole-body plethysmography is best carried out in a temperature-controlled chamber held at thermoneutrality. This minimizes any thermoregulatory and metabolic effects on respiratory drive. Concurrent measurement of one or more metabolic parameters, such as VO2 or VCO2, is required to determine whether baseline breathing and chemosensory reflex phenotypes are affected by changes in metabolism or persistent metabolic imbalances (acidosis or alkalosis). Whole-body measurements in neonates do not allow for accurate assessment of tidal volume; head-out or face-mask pneumotachography is more accurate (PMID: 25017785).

      We totally agree with the reviewer. Indeed, we noticed some missing information and misconceptions in the previous version. We now correctly describe how the respiratory parameters were measured in both neonatal and adult mice. Additionally, we performed head-out plethysmography in a subset of neonates (control and mutant) and added it to the Results section. We also measured VO2 and VE/VO2 in neonates and adults.

      Reviewer #3 (Public Review):

      The work by Ferreira and colleagues sets out to define the functional consequences of a PHOX2B (Phox2bdelta8) mutation, belonging to the group of non-polyalanine repeat expansions, when restricted to Atoh1-expressing cells. In doing so, the authors generated a new mouse model (Atoh1Cre,Phox2bdelta8 mice) for the study of the central respiratory chemoreceptor circuit. Ferreira et al. found that these conditional mutants present with largely unaffected breathing parameters in postnatal life. However, neonatal breathing irregularities, normally observable in control neonates, are not corrected as the conditional mutants mature. Furthermore, the authors found that conditional Atoh1Cre,Phox2bdelta8 neonates fail to display ventilatory responses to hypoxic (low O2 content in air) and hypercapnic (high CO2 content in air) challenges. The authors show that Atoh1Cre,Phox2bdelta8 adult mice appear to "recover" the capacity to respond to hypercapnic, but not hypoxic, challenges. Lastly, the authors found reduced numbers of Phox2b+ cells in an "area" where the retrotrapezoid nucleus, a key center in the respiratory chemoreceptor circuit, is normally located.

      Strengths:

      The most exciting aspect of this work is the modelling of the Phox2bdelta8 mutation in one element of the central neuronal circuit mediating respiratory reflexes, namely the retrotrapezoid nucleus. To date, mutations in the PHOX2B gene have been found in most patients diagnosed with congenital central hypoventilation syndrome (CCHS), a disease characterized by hypoventilation and absent chemoreflexes in the neonatal period, which in severe cases can lead to respiratory arrest during sleep. Two distinct types of PHOX2B mutations have been identified in CCHS patients: i) polyalanine repeat expansions, and ii) non-polyalanine repeat expansions. Non-polyalanine repeat expansions tend to be more prevalent in severe cases of CCHS. Thus, the characterization of the Phox2bdelta8 mutation could allow for a better understanding of the etiology behind CCHS.

      Weaknesses:

      Whereas the most exciting part of this work is the modelling of the Phox2bdelta8 mutation in retrotrapezoid neurons using conditional mutagenesis driven by Atoh1 (i.e. Atoh1Cre,Phox2bdelta8 mice), the weakness of this study is the lack of a clear physiological, developmental, and anatomical distinction between this approach and similar studies already reported elsewhere, for instance the use of Atoh1Cre,Phox2bflox/flox and P2b::CreBAC1;Atoh1lox/lox mice (Ruffault et al., 2015, DOI: 10.7554/eLife.07051), Egr2cre;P2b27Alacki (Ramanantsoa et al., 2011, DOI: 10.1523/JNEUROSCI.1721-11.2011), Atoh1Phox2bCKO mice (Huang et al., 2017, DOI: 10.1016/j.neuron.2012.06.027) and Egr2cre;Lbx1FS (Hernandez-Miranda et al., 2018, DOI: 10.1073/pnas.1813520115).

      Several conclusions presented in this work are not directly supported by the provided data. For instance, the claimed reductions in the number of retrotrapezoid neurons and in the number of fos+ activated retrotrapezoid neurons after CO2 exposure in Atoh1Cre,Phox2bdelta8 mice are not fully supported, as the identity of retrotrapezoid neurons was not thoroughly determined. Furthermore, the authors conclude from their plethysmograph (respiratory recording) data that Atoh1Cre,Phox2bdelta8 neonatal mice display impaired ventilatory responses to hypoxia (low O2 in air) and hypercapnia (high CO2 in air), but that these mutant animals recover the capacity to respond to hypercapnia, but not to hypoxia, in adult life. This is a bit of an overstatement, as their plethysmograph recordings show that adult Atoh1Cre,Phox2bdelta8 mice do respond to low O2 in air: these mice accelerate respiration and increase tidal volume and minute ventilation in the same fashion as control mice. What the presented data show, however, is that adult Atoh1Cre,Phox2bdelta8 mice do not sustain the ventilatory response as efficiently as control mice.

      We thank the reviewer for the comments and for highlighting the strengths and weaknesses of our study. In the present version, we have made significant changes throughout the manuscript as suggested by the editor and reviewers. In addition, we performed new sets of experiments to strengthen our work. We are very enthusiastic about the current version, and we believe it will open new questions to be addressed in future studies.

    1. Author Response

      Reviewer #1 (Public Review):

      Wosniack et al. analyze larval trajectories from behavioral experiments, build a phenomenological model, and efficiently combine the two to dissect the behavioral strategies that Drosophila larvae use during foraging. The paper touches upon several factors that influence foraging: from food quality and distribution to genetic polymorphism and finally the contribution of sensory cues. While the first two are well explored and characterized in the paper, the contribution of different sensory modalities is less investigated. They study how homogeneous food substrates or food distributed in patches influence foraging strategies. They find a modular organization of behavioral strategies that depends on food characteristics: food quality modulates crawling speed, turning and pausing, while increases in the time spent inside the patches result from biasing turning towards the patch center when the larvae are at the food-no food interface. Furthermore, using anosmic animals they determine that olfaction is differentially involved in foraging decisions depending on the type of food substrate that the larvae are exploring. Finally, they perform this analysis in rover and sitter larvae to determine the effect of the foraging gene polymorphism on these behaviors and show that its expression (where sitter larvae are slower, turn less and pause more compared to rover larvae) depends on the food distribution. They propose that larvae adapt the extent of their exploration to the quality of food. This detailed analysis of the elements that constitute behavioral strategies sets the basis for identifying genes involved in foraging and the neural substrates of the different behavioral modules, and ultimately for understanding the neural circuit mechanisms involved.

      The paper efficiently combines analysis of larval trajectories from experiments with computational modeling and identifies the behavioral elements that contribute to foraging. Using anosmic larvae, the authors show that olfaction has an important role when foraging on yeast substrates but not on sugar-rich substrates. They propose that taste could contribute more on sugar and apple juice substrates; however, they do not test this hypothesis. Did the authors try or consider testing the Gr43a mutant on these substrates? Determining to what extent taste contributes to the different strategies would complete the picture of how sensory cues contribute to foraging decisions, which the authors started to address by tackling the contribution of olfaction to foraging on the different substrates. Also, on patchy substrates, is the border completely smooth or could the larvae also sense the border as a rough edge? Could other modalities be involved?

      The idea of testing the anosmic animals was to understand to what extent volatile sensory cues influence the search outside the patch. We did not intend to make a complete analysis of the role of different sensory modalities in foraging adaptation. In particular, investigating taste is complicated since it is not well known how yeast taste is sensed. Several yeast metabolites have been shown to activate subsets of taste receptor neurons, but this work has mostly been done in adult flies. There is a clearer picture regarding sugars, where Gr43a is known to be a sucrose and fructose receptor. To understand the role of taste in foraging, we would need a series of experiments that go beyond the scope of this paper.

      But we agree it is an interesting question and have added a new section in the discussion. See line 634: “An experiment using the gustatory sweet sensor Gr43a mutant on sucrose, which is not volatile and does not produce smell, could help discern the contribution of taste at the border of the patch (Fujishiro et al., 1984; Marella et al., 2006; Miyamoto et al., 2013; Wang et al., 2004; Mishra et al., 2013). For yeast, the lack of smell completely changed the response of the larvae, which did not show differences inside and outside the patch for most foraging parameters (Figure 4B, C, E, G). In this instance, taste was not sufficient to retain larvae inside the yeast patch (compare Figure 3H with Figure 4F) even though several gustatory receptors have been shown to be activated by yeast metabolites (Wisotsky et al., 2011; Ganguly et al., 2017; Croset et al., 2016).”

      Regarding the edge sensation, the revised version includes two control experiments where we have tested the impact of the edges in the absence of nutrients. In the first control experiment, we prepared wells for food patches like in the “sucrose” and “apple juice” conditions, but we filled them with agar. In the second experiment, to control for the “yeast” condition, we made patches with gel. The results are presented in Figure 3-figure supplement 2 and they show that in both cases, in the absence of nutrients, the edge does not have a significant influence on the turning rate towards the center.

      The revised version also includes mentions of mechanosensation:

      Line 337: “We observed that inward turns occur more often than outward turns at the border of the patch for the three substrates (Figure 3B, inward turns are shown in black). To control for possible mechanosensory effects due to the border edges, we prepared new arenas with patches that contained no nutrients, either using the same agar that composed the rest of the arena, or using ultrasound gel (Methods). Larvae at the agar-agar or agar-gel border did not show any change in their preference to turn towards the patch center, confirming that the behavioral change observed in response to food is specific (Figure 3-figure supplement 2).”

      Line 646: “However, when larvae are crawling, they leave a print of their denticle attachment on the agar, that could inform them about their previous location and help returning to the food.”

      In Figure 3C the crawling speed is lower in yeast and apple juice experiments both inside and outside of patches (and in both rovers and sitters) compared to sucrose experiments. Do the authors have an explanation for this? Also, as they note, surprisingly the turn bias persisted when the larvae exited the patches. Are these two related? Do larvae turn more frequently?

      The speed outside the patches of yeast and apple juice is indeed lower than outside sucrose. We now mention this in the main text and propose an explanation:

      Line 313: “Outside yeast and apple juice patches, the crawling speed increased but did not return to levels similar to the agar-only condition, suggesting that the behavior of larvae that exit the patch is influenced by the recent food experience or that larvae might still be sensing the food (Figure 3-figure supplement 1E). In line with this, in yeast the number of turns outside the patch was higher than inside the patch.”

      The authors describe and discuss handedness in larval turning. While this in itself is an interesting characterisation, it does not appear to be thoroughly addressed in the context of its influence on foraging behavior. The authors conclude that the presence of patches induces turning bias that overrides handedness. It would be interesting to determine whether there are differences in turn size and/or reorientation frequency depending if the larvae are turning on the preferred side versus the non-preferred side.

      Thank you for pointing this out; the sentence was somewhat misleading. We corrected it and added a quantification of the percentage of larvae whose handedness changes between in-patch and out-of-patch behaviour (Figure 3-figure supplement 1F). This percentage is generally around 20%, so larvae mostly adjust their turning angles rather than their handedness.

      Line 354: “This is accomplished by turning towards the patch center while maintaining the handedness (Figure 3J and Figure 3-figure supplement 1F) and represents an important mechanism to remain inside the food.”

      During different types of taxis, larvae modulate crawling speed, run duration, and turn rate, size and direction to avoid unfavourable conditions and approach favourable ones. This is true across different types of sensory gradients. Some of these strategies are also described in this paper. The authors make a link between behaviour at the patch-no patch interface and taxis behaviour. It would be interesting to further develop the comparison between the behavioural elements described here and those of navigational strategies in sensory gradients. The commonalities and possible modular organisation of both could point to the existence of neural circuits for the different behavioural modules that are recruited differentially depending on the sensory context, the motivational state, or a combination of both (and based on different types of sensory information).

      Thank you for the comment. We have added a new section in the discussion. Line 651: “One of the strengths of our phenomenological model is that it incorporates a modular organization of foraging that could reflect how the crawl and turn modules are controlled. First, we modelled a stochastic search where no information regarding food is available outside of the current location, because food is absent or because the larvae cannot sense it. This corresponds to an autonomous search behavior implemented by circuits located in the ventral nerve cord without input from the brain (Berni et al., 2012; Sims et al., 2019). Second, we have incorporated a goal-directed navigation that allows larvae to return to the food. Our phenomenological model includes a distance-dependent probability of turning inwards that mimics the effect of chemotaxis (when present), as well as any other possible mechanism that contributes to the turning probability. As a consequence, we observed that simulated larvae, even when the resources are fragmented into eight patches, could stay inside the food patch for longer periods, in line with experimental observations (Figure 5 and Figure 6). The model could be improved by setting the turning properties outside the patch to match experimental observations as closely as possible. To this end, we could consider studies of larvae crawling in different attractive gradients, where the changes in turning probability and angle, including weathervaning, have been investigated in relation to precise spatio-temporal information about odorants (Louis et al., 2008; Gomez-Marin et al., 2011; Davies et al., 2015). It would also be helpful to have information about other attractive gradients, such as taste, to know whether a common set of mechanisms is used regardless of the sensory modality. Using this information, our model could be used to investigate how crawling speed and turning properties are controlled via descending pathways from the brain (Tastekin et al., 2018; Jovanic et al., 2019). Finally, in the presence of nutrients, our model adjusts movements to stay on the food patch. The concerted decrease in turning rate and crawling speed and the increase in the number of pauses suggest that a neuromodulatory depression of movement (Marder, 2012) could be relevant in this phase. It would be interesting to investigate more generally how neuromodulators influence the decision to remain on the current resource or explore new ones in relation to the resources available and the larval motivational state.”

      Reviewer #3 (Public Review):

      The authors of the paper study foraging strategy in crawling Drosophila larvae. They utilize single-larva tracking in isotropic and patchy food nutrition environments, detailed quantitative analysis of the animals' behavioral states and transitions, and a random-walk-style Monte Carlo simulation setting. They investigate how specific components of behavior are modulated for the animal to locate suitable food resources.

      Strengths:

      • The main results of the paper, laying out how crawling speed, turn/pause rates, and turn direction bias work together to help larvae find the food they need, are interesting, nicely presented, and important for ultimately understanding how foraging really works in detail, here at the behavioral level, and somewhere down the road at the circuit and/or molecular levels too.

      • Comparing rovers and sitters throughout the experimental parts of the paper was a really nice idea, with interesting results, and it is well motivated in the introduction.

      • The handedness of individuals is a nice finding as well, I think the first time this has been published for larval Drosophila.

      • Simulations that use empirical results as probability distributions make for a nice environment for testing ideas about larva behavior.

      • Creating the patchy food environments was a great idea, as it puts the larva behavior in a more realistic setting, but still controlled enough to be analyzed clearly.

      Weaknesses:

      • For an animal that tends to have a very high variance in its behavior, the number of larvae used in each experiment seems pretty low to me. As a result, some of the secondary claims are perhaps not as well supported when they rely on "not significant" statistical test results.

      • The introduction is generally good, but could perhaps better motivate why fly larva foraging should be of interest to a more general audience.

      We answered the question about the number of larvae used in our experiments in the required revisions above.

      We have added a section in the introduction to explain the relevance and generality of our work:

      Line 45: “These models postulate that animals will use different strategies depending on the distribution of the resources. In environments where resources are abundant, animals will search and exploit them performing short movements in random directions, in patterns well approximated by Brownian random walks. When resources are sparse, and foragers have incomplete knowledge about their location, a more diffusive strategy is needed, with an alternation between short-range and long-range movements, which can be modelled as a Lévy random walk. Analysis of animal movements in the wild has demonstrated that environmental context can induce a switch from Lévy to Brownian movement patterns (Humphries et al., 2010), but the mechanisms behind the implementation of such a behavior (e.g., cognitive capacity, memory) often remain elusive (Budaev et al., 2019). Understanding the motor mechanisms that regulate the execution of different movement strategies and the transitions between them could provide insight into how the nervous system drives the search for resources in complex and ever-changing environments. The Drosophila larva is an excellent model to study this question, because the movement of single animals can be tracked for long periods of time in a controlled environment.”

      • The execution of the simulations seems reasonable, but perhaps don't add a lot to this particular paper, especially given how much of the manuscript they take up.

      We now specifically highlight the unique contributions of the model that go beyond the performed experiments, especially in terms of making experimental predictions. See our answer to the specific point in the required revisions above.

      Overall, the primary results of the paper do achieve the stated goals and set the stage nicely for further studies into the underlying mechanisms of foraging in larvae.

      For those studying foraging, especially in flies/larvae but probably other animals as well, this should be an important paper that highlights the utility of individual animal tracking with high resolution, analyzing specific components of behavior, and creating simulation environments as playgrounds for investigating the impact of those components.

    1. Author Response

      Reviewer #2 (Public Review):

      This fascinating study describes a possible effect of cancer-generated microvesicles on fibroblasts. Microvesicles from a particularly metastatic line promote more contractile and proliferative fibroblasts, and there is a key role for at least one microvesicle factor - the crosslinking enzyme Transglutaminase-2. A wide range of studies help identify and elucidate these effects, but a few aspects remain unclear.

      1) MV- contain more of the crosslinking enzyme TGM2 but also less of the matrix-degrading protease MMP14, so the ECM would be more stable either way. The authors should describe any other factors that could have a similar effect. The authors should also address whether other genes change with TGM2 knockdown, and in particular whether MMP14 changes. If it does, does MMP14 play a more important role than TGM2?

      We included a more thorough investigation into the proteomics data to determine what other factors in the MVs may induce fibroblast activation or matrix remodeling. Lists of “fibroblast-activating proteins” and “matrix remodeling proteins” were generated based on online datasets. All fibroblast-activating proteins tested were more highly expressed in MV- compared to MV+, but TGM2 was the only protein on this list with significantly increased expression (Figure 3b-d).

      A large variety of matrix-remodeling proteins were detected in the MV proteomics, including matrix ligands, proteases, protease inhibitors, and crosslinking enzymes. Interestingly, MV+ had significantly higher levels of the matrix remodeling proteins TIMP3, FN1, and COL8A1 (Figure 3d). MV- had significantly higher levels of the crosslinking enzymes PLOD1 and PLOD3, the matrix ligand COL12A1, and TGM2 (Figure 3d). As TGM2 can be categorized as both a matrix remodeling and fibroblast-activating protein and was significantly greater in the MV- compared to MV+, we believe this addition to the paper reinforces our focus on TGM2 (Figure 3).

      2) Perhaps the cleanest and most important study of MV effects is in Fig. 6j,k, but it shows in vivo differences that are barely significant or not significant, and uses serum-free (SF) media as the control. Are serum components detected by mass spectrometry? If so, wouldn't this suggest that serum-supplemented media would be a better control? The serum is usually from another species, which is a further (xenogeneic) concern that motivates care and discussion about dose, especially given the high frequency of injection. Also, is there a survival difference between the mice?

      Thank you for bringing this concern to our attention. We realize that our wording was not clear. MVs are isolated under serum-free conditions and after isolation are resuspended in serum-free media. For this experiment, our mice were injected with either MVs suspended in serum-free media or serum-free media alone. We have revised the text to explain this more thoroughly.

      Additionally, we were unable to assess survival differences as our IACUC protocol requires sacrificing mice upon a certain percentage of weight loss.

    1. Author Response

      Reviewer #1 (Public Review):

      Using fMRI-based univariate and multivariate analyses, Root, Muret, et al. investigated the topography of face representation in the somatosensory cortex of typically developed two-handed individuals and of individuals with a congenitally or acquired missing hand. They provide clear evidence for an upright face topography in the somatosensory cortex in all three groups. Moreover, they find that one-handers, but not amputees, show shorter distances from lip representations to the hand area, suggesting a remapping of the lips. They also find a shift of the upper face away from the deprived hand area in one-handers, and significantly greater dissimilarity between face part representations in amputees and one-handers. The authors argue that this pattern of remapping differs from that predicted by cortical neighborhood theories and points toward a remapping of face parts that can compensate for hand function, e.g., using the lips/mouth to manipulate an object.

      These findings provide interesting insights into the topographic organization of face parts and the principles of cortical (re)organization. The authors use several analytical approaches, including distance measures between hand- and face-part-responsive regions and representational similarity analysis (RSA). Particularly commendable is the rigorous statistical analysis, such as the use of Bayesian comparisons, and careful interpretation of absent group differences.

      We thank the reviewer for their positive and constructive feedback.

      Reviewer #2 (Public Review):

      After amputation, the deafferented limb representation in the somatosensory cortex is activated by stimulation of other body parts. A common belief is that the lower face, including the lips, preferentially "invades" deafferented cortex due to its proximity to cortex. In the present study, this hypothesis is tested by mapping the somatosensory cortex using fMRI as amputees, congenital one-handers, and controls moved their forehead, nose, lips or tongue. First, they found that, unlike its counterpart in monkeys, the representation of the face in the somatosensory cortex is right-side up, with the forehead most medial (and abutting the hand) and the lips most lateral. Second, there was little evidence of "reorganization" of the deafferented cortex in amputees, even when tested with movements across the entire face rather than only the lips. Third, congenital one-handers showed significant reorganization of deafferented cortex, characterized principally by the invasion of the lower face, in contrast to predictions from the hypothesis that proximity was the driving factor. Fourth, there was no relationship between phantom limb pain reports and reorganization.

      As a non-expert in fMRI, I cannot evaluate the methodology. That being said, I am not convinced that the current consensus is that the representation of the face in humans is flipped compared to that of monkeys. Indeed, the overwhelming majority of somatosensory homunculi I have seen for humans has the face right side up. My sense is that the fMRI studies that found an inverted (monkey-like) face representation contradict the consensus.

      Thank you for pointing this out. As we tried to emphasise in the introduction, very few neuroimaging studies have actually investigated face somatotopy in humans, with inconsistent results. We agree the default consensus tends to be dominated by the upright depiction of Penfield’s homunculus (recently replicated by Roux et al., 2018). However, due to methodological and practical constraints, alignment across subjects is usually difficult to achieve with intracortical recordings, which makes it difficult to assess the consistency of topographical organisation. Moreover, previous imaging studies did not manage to convincingly support Penfield’s homunculus. For these two key reasons, the spatial orientation of the human facial homunculus is still debated. A further limiting factor is that the vast majority of human studies investigating face (re)mapping focused solely on the lip representation, using the cortical proximity hypothesis to interpret their results. Consequently, as we highlight above in our response to the Editor, there is a widespread and false representation in the human literature of the lips neighbouring the hand area.

      To address the reviewer’s critique and convey some of this context, we changed our title from: Reassessing face topography in primary somatosensory cortex and remapping following hand loss; to: Complex pattern of facial remapping in somatosensory cortex following congenital but not acquired hand loss. This was done to de-emphasise the novelty of face topography relative to our other findings.

      We also rewrote our introduction (lines 79-94) as follows:

      “The research focus on lip cortical remapping in amputees is based on the assumption that the lips neighbour the hand representation. However, this assumption goes against the classical upright orientation of the face in S1 [26–30], as first depicted in Penfield’s Homunculus and in later intracortical recordings and stimulation studies [26–29], with the upper face (i.e., forehead) bordering the hand area. In contrast, neuroimaging studies in humans investigating face topography have provided contradictory evidence over the past 30 years. While a few neuroimaging studies provided partial evidence in support of the traditional upright face organisation [31], other studies supported the inverted (or ‘upside-down’) somatotopic organisation of the face, similar to that of non-human primates [32,33]. Other studies suggested a segmental organisation [34], or even a lack of somatotopic organisation [35–37], whereas some studies provided inconclusive or incomplete results [38–41]. Together, the available evidence does not converge on a consistent face topography in humans. In line with the upright organisation originally suggested by Penfield, recent work reported that the shift in the lip representation towards the missing hand in amputees was minimal [42,43], and likely to reside within the face area itself. Surprisingly, there is currently no research that considers the representation of other facial parts, in particular the upper face (e.g., the forehead), in relation to plasticity or PLP.”

      We also updated the discussion accordingly (lines 457, 469-477, 490-492).

      Similarly, it is not clear to me how the observations (1) of limited reorganization in amputees, (2) of significant reorganization in congenital one-handers, and (3) of the lack of relationship between PLP and reorganization is novel given the previous work by this group. Perhaps the authors could more clearly articulate the novelty of these results compared to their previous findings.

      Thank you for giving us the opportunity to clarify this important point. The novelty of these results can be summarised as follows:

      (1) Conceptually, it is crucial for us to understand if deprivation-triggered plasticity is constrained by the local neighbourhood, because this can give us clues regarding the mechanisms driving the remapping. We provide strong topographic evidence about the face orientation in controls, amputees and one-handers.

      (2) The vast majority of previous research on brain plasticity following hand loss (both congenital and acquired) in humans has exclusively focused on the lower face, and lips in particular. We provide systematic evidence for stable organisation and remapping of the neighbouring upper face, as well as the lower face. We also study topographic representation of the tongue (and nose) for the first time.

      (3) The vast majority of previous research on brain remapping following hand loss (both congenital and acquired, neuroimaging and electrophysiological) focused on univariate activity measures, such as the spatial spread of units showing a similar feature preference, or the average activity level across individual units. We go beyond remapping by using RSA, which allows us to ask not only whether new information is available in the deprived cortex (as well as the native face area), but also whether this new information is structured consistently across individuals and groups. We show that representational content is enhanced in the deprived cortex of one-handers, whereas it is stable in amputees relative to controls (and to their intact hand region).

      (4) Based on previous studies, the assumption was that reorganisation in congenital one-handers was relatively unspecific, affecting all tested body parts. Here, we provide evidence for a more complex pattern of remapping, with the forehead representation seemingly moving out of the missing hand region (and the nose representation being tentatively similar to controls). That is, we show not just “invasion” but also a shift of the neighbour away from the hand area which has never been documented (or in fact suggested).

      (5) Using Bayesian analyses we provide definitive evidence against a relationship between PLP and forehead remapping, providing first and conclusive evidence against the remapping hypothesis, based on cortical neighbourhood.

      Our inclination is not to add a summary paragraph of these points in our discussion, as it feels too promotional. Instead, we have re-written large sections of the introduction and discussion to better emphasise each of these points separately throughout the text, where the context is most appropriate. Given the public review strategy taken by eLife, the novelty summary provided above will be available for any interested reader, as part of the public review process. However, should the reviewer feel that a novelty summary paragraph is required (or an emphasis on any of the points summarised above), we will be happy to revise the manuscript accordingly.

      Finally, Jon Kaas and colleagues (notably Niraj Jain) have provided evidence in experiments with monkeys that much of the observed reorganization in the somatosensory cortex is inherited from plasticity in the brain stem. Jain did not find an increased propensity for axons to cross the septum between face and hand representations after (simulated) amputation. From this perspective, the relevant proximity would be that of the cuneate and trigeminal nuclei and it would be critical to map out the somatotopic organization of the trigeminal and cuneate nuclei to test hypotheses about the role of proximity in this remapping.

      Thank you for highlighting this very relevant point, which we are well aware of. We fully agree with the reviewer that this is an important goal for future study, but functional imaging of the brainstem in humans is particularly challenging and would require ultra-high-field imaging (7T) and specialised equipment. We have encountered much local resistance due to hypothetical MRI safety concerns about scanning amputees at this higher field strength, meaning we are unable to carry out this research ourselves. Our former lab member Sanne Kikkert, who is now running her independent research programme in Zurich, has been working towards this goal for the past 4 years. So we can say with confidence that this aim is well beyond the scope of the current study. In response to your comment, we mention this potential mechanism in the introduction (lines 98-101), we ensured that we only refer to “cortical proximity” throughout our manuscript, and we circle back to this important point in the discussion.

      Lines 539-543: “Moreover, even if the remapping we observed here goes against the theory of cortical proximity, it can still arise from representational proximity at the subcortical level, in particular at the brainstem level [44,45]. While challenging in humans, mapping both the cuneate and trigeminal nuclei would be critical to provide a more complete picture regarding the role of proximity in remapping.”

      Reviewer #3 (Public Review):

      In their study, the authors set out to challenge the long-held claim that cortical remapping of hand-deprived territories in the somatosensory cortex follows somatotopic proximity (the hand region gets invaded by its cortical neighbors), as classically assumed. In contrast to this claim, the authors suggest that remapping may follow not cortical proximity but functional rules reflecting how the effector is used. Their data indeed suggest that the deprived hand area is not invaded by the forehead, which is its cortical neighbor, but instead by the lips, which may compensate for hand loss in manipulating objects. Interestingly, the authors suggest this is mostly the case for one-handers but not for amputees, in whom the reorganization seems more limited in general (but see my comments below on this last point).

      This is a remarkably ambitious study that has been skilfully executed on a large number of participants in each group. The complementary state-of-the-art univariate and multivariate analyses serve the research question well, and the paper is clearly written. The main contribution of this paper, relative to previous studies including those of the same group, lies in mapping multiple face parts at once in all three groups.

      We are grateful to the reviewer for appreciating the immense effort that this study involved.

      In the winner-takes-all approach, the authors include only 3 face parts, excluding the nose and the thumb from the analyses. I am not fully convinced by the rationale for excluding the nose from the univariate analyses (because it does not trigger reliable activity) while keeping it for representational similarity analyses. I think it would be better either to include the nose in all analyses or to demonstrate that this condition is indeed "noisy" and then remove it from all analyses. Indeed, if the activity triggered by nose movement is unreliable, this should also affect the multivariate analyses.

      Following this comment, we re-ran all univariate analyses to include the nose, and updated the main text, supplemental results and related figures throughout. In short, adding the nose did not change the univariate results, apart from a now significant group x hemisphere interaction for the CoG of the tongue when comparing amputees and controls, which better matches the trend for greater surface coverage in the deprived hand ROI of amputees. Full details are provided in our response to Reviewer 1 above.

      The rationale for not including the hand is perhaps more convincing, as it seems to induce activity in both controls and amputees but not in one-handers. First, it would be great to visualize this effect, at least as supplemental material, to support the decision. Second, this raises the interesting possibility that the enhanced invasion of the hand territory by the lips in one-handers might be linked to whether hand-related activity can be observed in the presupposed hand region in this population. The authors may wish to consider linking these.

      Thank you for this comment. As we explain in our response to Reviewer 1 above, we did not intend the thumb condition in one-handers to be analysed, as the task given to one-handers (imagine moving a body part you never had before) is inherently different from that given to the other groups (move, or at least attempt to move, your (phantom) hand). As such, we could not pursue the analysis suggested by the reviewer here. To reduce the discrepancy, and following Reviewer 1’s advice, we decided to remove the hand-face dissimilarity analysis, which was included in our original manuscript and might have sparked some of this interest. Upon reflection, we agreed that this specific analysis does not directly relate to the question of remapping (but rather to shared representation), in addition to making the paper unbalanced. We will now feature this analysis in another paper where it fits more appropriately, in the context of referred sensations in amputees (Amoruso et al, 2022 MedRxiv).

      The use of the geodesic distance between the center of gravity of each movement in the Winner Take All (WTA) maps and a predefined cortical anchor is clever. However, how the Center Of Gravity (COG) was computed over spatially disparate regions deserves more explanation.

      We are happy to provide more detail on this analysis, which weights the CoG by cluster size (using the workbench command -metric-weighted-stats). Consider the example shown here (Figure 1) for a single control participant, where each CoG is measured either without weighting (yellow vertices) or with cluster weighting (forehead CoG=red, lip CoG=dark blue, tongue CoG=dark red). When a movement produces a single cluster of activity (the lips in the non-dominant hemisphere, shown in blue), the CoG’s location is identical for the weighted (dark blue) and unweighted (yellow) calculations. But other movements, such as the tongue (green), produce one large cluster (at the lateral end) together with a few smaller, more disparate clusters located more medially. In this case, the large cluster of maximal activity is weighted more heavily than the smaller clusters in the CoG calculation, so the CoG (dark red) is slightly skewed towards it, relative to the smaller clusters.

      Figure 1. Centre-of-gravity calculation, weighted and unweighted by cluster size, in an example control participant. Here the winner-takes-all output for each facial movement (forehead=red, lips=blue, tongue=green) was used to calculate the centre-of-gravity (CoG) at the individual level in both the dominant (left-hand side) and non-dominant (right-hand side) hemisphere, weighted by cluster size (forehead CoG=red, lip CoG=dark blue, tongue CoG=dark red), compared to an unweighted calculation (denoted by yellow dots within each movement’s winner-takes-all output).

      This is now explained in the methods (lines 760-765) as follows:

      “To assess possible shifts in facial representations towards the hand area, the centre-of-gravity (CoG) of each face-winner map was calculated in each hemisphere. The CoG was weighted by cluster size meaning that in the event of multiple clusters contributing to the calculation of a single CoG for a face-winner map, the voxels in the larger cluster are overweighted relative to those in the smaller clusters. The geodesic cortical distance between each movement’s CoG and a predefined cortical anchor was computed.”
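      As a rough illustration of the cluster-size weighting described above, the sketch below computes a centre of gravity in which each vertex is weighted by the size of the cluster it belongs to, alongside a plain unweighted mean. It is a hypothetical sketch that assumes flat 2D vertex coordinates and precomputed cluster labels; it is not the workbench -metric-weighted-stats implementation, and it does not compute geodesic distances.

```python
import numpy as np


def unweighted_cog(coords):
    """Plain mean of vertex coordinates: every vertex counts equally."""
    return np.asarray(coords, dtype=float).mean(axis=0)


def cluster_weighted_cog(coords, cluster_labels):
    """Centre of gravity with each vertex weighted by the size of its cluster.

    coords: (n, d) coordinates of the vertices in one face-winner map.
    cluster_labels: (n,) cluster ID per vertex (hypothetical input format).
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(cluster_labels).ravel()
    # inverse maps each vertex to its cluster; counts holds each cluster's size.
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    weights = counts[inverse].astype(float)
    # Larger clusters contribute more weight per vertex, pulling the CoG towards them.
    return (coords * weights[:, None]).sum(axis=0) / weights.sum()


# Toy example: a 4-vertex cluster near x=10 and a single stray vertex at x=0.
coords = [[10, 0], [10, 1], [11, 0], [11, 1], [0, 0]]
labels = [1, 1, 1, 1, 2]
print(unweighted_cog(coords))
print(cluster_weighted_cog(coords, labels))
```

      With a single cluster, every vertex receives the same weight and the two functions agree, which mirrors the lips example in Figure 1.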

      Moreover, imagine that for some reason the forehead region extends both dorsally and ventrally in a specific population (e.g., amputees): the COG would stay unaffected, but the overlap between hand and forehead would increase. The analyses of the surface area within the hand ROI for the lips and forehead nicely complement the WTA analyses and suggest higher overlap for the lips and lower overlap for the forehead, but none of the maps or graphs presented clearly show these results. Perhaps the authors could consider adding a figure clearly highlighting that there is indeed more lip activity IN the hand region.

      We agree with you on this limitation of the CoG, which is why we interpret all cortical distance analyses in tandem with the laterality indices. The laterality indices correspond to the proportion of surface area in the hand region covered by a given face part in the winner maps.

      Nevertheless, to further convince the Reviewer, we extracted activity levels (beta values) within the hand region of congenital one-handers and controls, and ran (as for the CoGs) a mixed ANOVA with the factors Hemisphere (deprived x intact) and Group (controls x one-handers).

      As expected from the laterality indices obtained for the lips, we found a significant group x hemisphere interaction (F(1,41)=4.52, p=0.040, ηp²=0.099), arising from enhanced activity in the deprived hand region of one-handers compared to the non-dominant hand region of controls (t(41)=-2.674, p=0.011) and to the intact hand region of one-handers (t(41)=-3.028, p=0.004).
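      For readers checking the effect size, partial eta squared can be recovered from the reported F(1,41) statistic and its degrees of freedom using the standard identity:

```latex
\eta_p^2 = \frac{F \cdot df_{\mathrm{effect}}}{F \cdot df_{\mathrm{effect}} + df_{\mathrm{error}}}
         = \frac{4.52 \times 1}{4.52 \times 1 + 41} \approx 0.099
```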

      Since this kind of analysis was the focus of previous studies (an approach we are trying to move away from) and since it is redundant with the proportion of face-winner surface coverage in the hand region, we decided not to include it in the paper. However, we could add it as a Supplementary result if the Reviewer believes this strengthens our interpretation.

      In addition to the overlap analyses between the hand and other body parts, the authors may also want to consider Jaccard similarity analyses between the maps of the 3 groups to support the idea that amputees are more similar to controls than to one-handers in their topographic activity, which again does not appear clearly from the figures.

      We thank the reviewer for this clever suggestion. We now include a Jaccard similarity analysis, which quantifies the degree of similarity (0=no overlap between maps; 1=fully overlapping) between winner-takes-all maps (which included the nose, as in the revised univariate results) across groups. For each amputee and face part, the similarity to each of the 22 controls and to each of the 21 one-handers was averaged within each group. We used a linear mixed model with fixed factors of Group (One-handers x Controls), Movement (Forehead x Nose x Lips x Tongue) and Hemisphere (Intact x Deprived) on the Jaccard similarity values (similar to the model used for the RSA analysis). A random effect of participant and a covariate of age were also included in the model.
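      The Jaccard index is the size of the intersection of two maps divided by the size of their union. The sketch below shows one way to compute it for a single face part and average it over a comparison group. It assumes, hypothetically, that each winner-takes-all map is stored as an integer label per surface vertex; the label values, function names and the convention for two empty maps are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def jaccard(mask_a, mask_b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two boolean vertex masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumption: two empty maps are scored as 0 (no overlap) rather than undefined.
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def mean_similarity_to_group(subject_map, group_maps, face_label):
    """Average Jaccard between one participant's map and every map in a group,
    using only the vertices won by `face_label` (hypothetical label encoding)."""
    subject_mask = np.asarray(subject_map) == face_label
    scores = [jaccard(subject_mask, np.asarray(m) == face_label) for m in group_maps]
    return float(np.mean(scores))


# Toy maps over 4 vertices; label 1 stands in for one face part.
amputee = [1, 1, 2, 3]
controls = [[1, 1, 2, 3], [1, 2, 2, 3]]
print(mean_similarity_to_group(amputee, controls, face_label=1))
```

      In the analysis described above, such averaged values would then enter the linear mixed model as the dependent variable.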

      Results showed a significant group x hemisphere interaction (F(1,240.0)=7.70, p=0.006; controlled for age; Fig. 5), indicating that the similarity of amputees’ maps to those of controls and one-handers differed depending on the hemisphere. Post-hoc comparisons (corrected alpha=0.025; uncorrected p-values reported) revealed that, in the deprived hemisphere, amputees’ maps were significantly more similar to controls’ than to one-handers’ maps (t(240)=-3.892, p<.001). Amputees’ maps also showed higher similarity to controls’ maps in the deprived relative to the intact hemisphere (t(240)=2.991, p=0.003). In the deprived hemisphere, amputees’ facial somatotopy was therefore more similar to that of controls, again suggesting less evidence for cortical remapping in amputees.

      We added these results at the end of the univariate analyses (lines 335-351) and in the discussion (lines 464-465 and 497-500).

      This brings me to another concern, related to the claim that the change in cortical organization they observe is mostly seen in one-handers. Most of this conclusion seems to rely on the fact that some effects are observed in one-handers but not in amputees when compared to controls; however, no direct comparisons are made between amputees and one-handers, so we may be drawing an erroneous inference about an interaction that is not actually tested (Nieuwenhuis et al., 2011). For instance, the shift of the forehead away from the hand/face border is also (mildly) significant in amputees (as observed more strongly in one-handers), so the conclusion (e.g., from the subtitle of the results section) that it is specific to one-handers might not be fully supported by the data. The same applies to the invasion of the hand territory by the lips, which is significant in amputees in terms of surface area. Altogether, this calls for toning down the idea that plasticity is restricted to congenital deprivation (e.g., the last sentence of the abstract). If I am not wrong, even if remapping is numerically stronger in one-handers, there are no statistics showing that it is indeed stronger than in amputees, and amputees actually show significant effects when compared to controls along the same lines as those shown (more strongly) in one-handers.

      Thank you for this very important comment. We fully agree: the RSA across-groups comparison is highly informative but insufficient to support our claims. We did not compare the groups directly to avoid multiple comparisons (both for statistical reasons and to manage the size of the results section). However, the reviewer’s suggestion to perform a Jaccard similarity analysis complements the univariate and multivariate results very nicely and allows for a direct (and statistically lean) comparison between groups, to assess whether amputees are more similar to controls or to congenital one-handers while taking into account all aspects of their maps (both spatial location/CoG and surface coverage). We added the Jaccard analysis to the main text, at the end of the univariate results (lines 335-385). The Jaccard analysis suggests that amputees’ maps in the deprived hemisphere were more similar to those of controls than to those of congenital one-handers. This provides statistical support for the claim that remapping is indeed stronger in one-handers than in amputees (lines 346-351). We also compared both amputees and one-handers to the control group. In line with our univariate results, this revealed that the only face part for which controls were more similar to one-handers than to amputees was the tongue (lines 379-381), and that the forehead remapping observed at the univariate level in amputees (surface area) is likely to arise from differences in the intact hemisphere (lines 381-383).

      Finally, we also added the post-hoc statistics comparing amputees to congenitals in the RSA analysis (lines 425-427): “While facial information in the deprived hand area was increased in one-handers compared with amputees, this effect did not survive our correction for multiple comparisons (t(70.7)=-2.117, p=0.038).”

      Regarding the univariate results mentioned by the reviewer, we would like to emphasise that we found no significant effect for the lips in amputees, although we agree that their surface area appears to lie between those of controls and one-handers. However, this laterality index was not different from zero; this test is now added in lines 189-190. Regarding the forehead, we fully agree with the Reviewer and have adjusted the subtitle accordingly (lines 241-242). For consistency, we also added the t-test against zero for the forehead surface area (non-significant, lines 251-253).

      Also, maybe the authors could explore whether there is actually a link between the number of years without hand and the remapping effects.

      To address this question, we explored our data using a correlation analysis. The only body part that showed some suggestive remapping effects was the tongue, so we tested for a relationship (Pearson’s correlation) between years since amputation and the laterality index of the tongue in amputees (r = 0.007, p=0.980, 95% CI [-0.475, 0.475]). We also explored amputees’ global Jaccard similarity values to controls in the deprived hemisphere (r = -0.010, p=0.970, 95% CI [-0.488, 0.473]). Neither analysis revealed any relationship. Considering there was no strong remapping effect to explain, we find this result too exploratory to include in our manuscript.
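      The response does not state how the 95% confidence intervals were obtained. A common approach for Pearson's r is the Fisher z-transformation, sketched below as an assumption; intervals produced by other methods or software may differ slightly.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959963984540054):
    """Approximate 95% CI for Pearson's r via the Fisher z-transformation.

    atanh(r) is roughly normal with standard error 1/sqrt(n - 3);
    the interval is built on that scale and mapped back with tanh.
    Requires n > 3 and -1 < r < 1.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(r, fisher_ci(r, n=5))
```

      The interval is symmetric around r only when r = 0, because tanh compresses values near ±1.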

      One hypothesis generated by the data is that the lips remap into the deprived hand area because they serve compensatory functions. However, even in controls, lips and hands can be used to manipulate objects, in contrast to the forehead. One may thus wonder whether a preferential presence of the lips in the hand region is already latent in controls, given this functional link.

      We agree with the reviewer’s reasoning, and we think that the distributed representational content we recently found in two-handers (Muret et al, 2022) provides a first hint in this direction. It is worth noting that in that previous publication we did not find differences across face parts in the activity levels obtained in the hand region, except for slightly more negative values for the tongue. But we do think that such latent information is likely to provide a “scaffolding” for remapping. While the design of our face task does not allow us to assess information content for each face part (as was done for the lips in Muret et al, 2022), this should be further investigated in follow-up studies.

      We added a sentence in the discussion to highlight this interesting notion: Lines 556-559: “Together with the recent evidence that lip information content is already significant in the hand area of two-handed participants (Muret et al, 2022), compensatory behaviour since developmental stages might further uncover (and even potentiate) this underlying latent activity.”

    1. Author Response

      Reviewer #1 (Public Review):

      It has previously been shown that deletion of the GluA3 subunit leads to alterations in auditory behavior in adult mice older than a couple of months. The GluA3 subunit is expressed at several synapses along the auditory pathway (cochlea and brainstem), and changes in brainstem synapses have been observed in KO mice. These previously documented changes may account for some of the hearing deficits in adult KO mice.

      In the current study, the authors investigate an earlier stage of development (5 wks), when auditory brainstem responses (ABRs) are normal, and ask how transmission persists at inner hair cell (ihc) ribbon synapses in GluA3 KO mice. They discovered that deletion of GluA3 1) significantly changed the relative expression of GluA2 (dramatically downregulated) and GluA4 subunits at SGN afferents, and 2) caused morphological changes in ihc ribbons (modiolar side) and in synaptic vesicle size (pillar side).

      The changes documented in the 5 wk old GluA3ko mice were not necessarily predicted because in general the mechanisms involved in shuffling GluA receptors at this synapse (or other sensory synapses) are not completely understood; furthermore, much less is known about the role of differentiation of ihc-sgn synapses along a modiolar-pillar axis. With that said, the only shortcoming of the study is a lack of explanation for the observed changes in the synaptic structure; but this is not specific to this study.

      Given the quality of the data and the clarity of presentation of results, this is a very valuable study that will aid and motivate researchers to further explore how auditory circuitry develops, and becomes differentiated, at the level of ihc-sgn synapses.

      We thank the reviewer for the positive and helpful comments. Ongoing studies are seeking to explain the observed changes in synapse structure.

      Reviewer #2 (Public Review):

      The goal of the study by Rutherford and colleagues was to characterize functional, structural, and molecular changes at the highly specialized cochlear inner hair cell (IHC) - spiral ganglion neuron (SGN) ribbon synapse in GluA3 AMPA receptor subunit knockout mice (GluA3KO). Previous work by the authors demonstrated that 2-month-old GluA3KO mice experienced impaired auditory processing and changes in synaptic ultrastructure at the SGN - bushy cell synapse, the next synapse in the auditory pathway.

      In the present study, the authors investigated whether GluA3 is required for ribbon synapse formation and physiology in 5-week-old mice using a series of functional and light- and electron microscopy imaging approaches. While deletion of GluA3 AMPAR subunit did not affect hearing sensitivity at this age, the authors reported that cochlear ribbon synapses exhibited changes in the molecular composition of AMPARs and pre- and postsynaptic ultrastructural alterations. Specifically, the authors demonstrated that GluA3KO ribbon synapses exhibit i) a global reduction in postsynaptic AMPARs, which is also reflected by smaller AMPAR arrays, ii) a reduction in GluA2 and an increase in GluA4 protein expression at individual postsynaptic sites, and iii) changes in the dimensions and morphology of the presynaptic specialization ("ribbon") and in the size of synaptic vesicles. These reported structural changes are linked to the side of innervation with respect to the IHC modiolar-pillar axis.

      The results presented by the authors are conceptually very interesting as the data support the notion that potentially detrimental changes in the molecular composition of a sensory synapse can be compensated to sustain synaptic function to a certain extent during development. The conclusions of the study are mostly well supported by the data, but some experimental details or control experiments are missing or need to be clarified to allow a full assessment.

      1) The authors tested which GluA isoforms are expressed in SGNs of GluA3KO mice and reported that only GluA2 and GluA4, and not GluA1, receptor subunits are present in the cochlea. It is, however, a bit difficult to understand why immunolabelling for GluA1 was performed only on brainstem sections (Fig. 1B right) and not in the cochlea to probe for postsynaptic localization at ribbon synapses, as was done for the other isoforms (Fig. 2 and 6), given that GluA3KO IHCs exhibited a larger number of ribbons lacking GluA2 and 3 (lone or 'orphaned' ribbons; Fig. 6B). It is also not clear why immunolabelling for GluA2 and 4 was performed to probe for expression of these receptor subunits on SGN cell bodies in the cochlear spiral ganglion. Which neurons are expected to synapse onto these somata?

      There is precedent for expression of GluA subunits in the SGN cell bodies reflecting expression at the synapse, although it is not clear if any of that immunoreactivity reflects cell surface expression in the intact ganglion or if it represents solely intracellular subunits being trafficked to synapses.

      Figure 1b shows that GluA2 is expressed in the somata of WT and KO mice. The lower panels show that GluA1 is not expressed in the somata of WT or KO mice. The right panels show that while GluA1 is expressed in the cerebellum of WT and KO mice, it is not expressed in the cochlear nucleus of WT or KO mice. We think this demonstrates the lack of compensation by GluA1 in the GluA3 KO.

      We have now added GluA4 immunoreactivity in the SGNs to Fig. 1, for completeness. In our experience, GluA subunits expressed at synapses are also found in the cell bodies, and GluA subunits not expressed at synapses are not found in the cell bodies. The current data are consistent with this, although we did not label GluA1 in the organ of Corti.

      2) The authors state in the text that GluA3 expression is completely abolished in GluA3KO IHCs, however, there appears to still be a faint punctate immunofluorescence signal visible when an antibody directed against GluA3 was used (Fig. 2C). Providing additional information on the specificity of this (and the other) antibodies used in the study would be helpful.

      We agree, and thank the reviewer for pointing this out. There is indeed a small signal, presumably due to cross-reactivity of the anti-GluA3 antibody with GluA2 subunits, because the cytoplasmic epitope recognized by the antibody lies in a region of high similarity between GluA2 and GluA3 (Dong et al., 1997). In addition, the specification sheet from Santa Cruz states that the GluA3 antibody can detect GluA2. This relatively small cross-reactivity is now noted in the text on p. 9. Also, this appearance was a product of the same brightness and contrast issue noted above in the response to the editor’s summary. Upon readjustment, the signal is less apparent, because we used less brightness and less contrast enhancement to avoid the unwanted saturation in some of the panels.

      3) The authors reported changes in the volume of the presynaptic ribbon and in postsynaptic density surface area in GluA3KO animals. However, the EM data as presented are not sufficiently convincing.

      i) There appears to be a mismatch between the EM data shown in Fig. 3 and 4 and the information in the text with respect to the number of data points in the plots and the reported number of reconstructed synapses. This raises several questions with respect to the analysis. For instance, it is unclear whether certain synapses were reconstructed but excluded from the analysis. If so, what were the exclusion criteria?

      We thank the reviewer for pointing out this discrepancy within the text and the figures. The discrepancies are now fixed. We have added more information on how the synapses were reconstructed in the M&M (p.14-15).

      ii) The authors compare PSD surface areas in reconstructions from 3D serial sections, but for some of the shown reconstructions (i.e. Fig. 3A' and B' and 4B'), it appears as if PSDs were only incompletely reconstructed.

      We included all the ultrathin sections that show afferent dendrites with a visible PSD. We revised all the reconstructions and fixed some misalignments. The appearance of the reconstructed PSDs relates to how the Reconstruct software creates the 3-D rendering; we did not use any additional software to smooth the edges of the 3D reconstructions.

      4) The immunolabelling experiments shown in Fig. 2 and 6 are of very high quality and the quantitative analysis of the light microscopy data (Fig. 6-9) is clearly very detailed, but slightly difficult to interpret the way it is presented. Specifically, it is unclear how the number of synapses per IHC (Fig. 6B) and the separation into modiolar and pillar side (Fig. 8) was achieved based on the shown images without the outlines of individual cells being visible.

      We agree. Please see the revised Figs. 2, 6, and 8, and explanation in the figure legend of Fig. 8.

      5) Adding more detailed information about important parameters (mean, N/n, SD/SEM) and the statistical tests used for the individual comparisons presented in the Figures would help strengthen the confidence in the presented data.

      Please see the new spreadsheets accompanying the revised manuscript.

      6) In general, the authors report a series of molecular and structural changes in IHCs and conclude that GluA3 subunits may have a role in "trans-synaptically" determining or organizing the architecture of both the pre- and post-synapse. However, some of the arguments are very speculative and many of the claims are not supported by the experimental data presented in the paper. The authors should also consider comparing their findings to studies that investigated ultrastructural changes in AMPAR subunit knockouts at other synapse types, and discuss alternative interpretations (e.g. homeostatic changes).

      Thank you for this comment. Considering that reviewer 1 asked for more speculation, we have decided to leave the level of speculation similar to the initial submission. However, we went through the text to make sure our claims were backed by our observations.

      Due to space constraints, rather than comparing to additional other synapses, in this context we prefer to compare with auditory brainstem synapses.

      We now discuss the possibility of homeostatic changes on p. 29.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Winter and colleagues define the sensitivity of cancer cells lacking the mitochondrial AAA+ ATPase ATAD1 to proteasome inhibition. They show that ATAD1 is often co-deleted with PTEN in many different types of cancer. Using two complementary CRISPR screens in two distinct cell models, they identified the mitochondrial E3 ubiquitin ligase MARCH5 as a gene whose deletion is synthetically lethal with ATAD1 loss. Since MARCH5 was previously reported to attenuate apoptotic signaling through mechanisms including promoting the degradation of pro-apoptotic factors such as BIM, they sought to define the specific role of ATAD1 in regulating pro-apoptotic factors. They present evidence that ATAD1 extracts the pro-apoptotic protein BIMEL from mitochondria to facilitate its inactivation by mechanisms including degradation and inhibitory phosphorylation, a mechanism that appears enhanced during proteasome inhibition. This suggested that ATAD1-deficient cells could be preferentially sensitive to proteasome inhibitors. Consistent with this, expression of ATAD1 in ATAD1-deficient cells decreases sensitivity to proteasome inhibition. Similarly, depletion of ATAD1 in PC3 cells increased sensitivity to proteasome inhibition in xenografts, although, somewhat curiously, a corresponding increase in BIM was not readily observed (NOXA levels did increase). Finally, the authors show that prostate cancer patients with combined PTEN/ATAD1 deletion show improved survival compared with those whose tumors have PTEN deletion alone. Ultimately, these results support a model whereby ATAD1 promotes tumor cell survival and highlight that ATAD1 deletion may represent a vulnerability that can be exploited to treat tumors through the use of proteasome inhibitors.

      Overall, this is an interesting and generally well-performed study that defines the mechanistic and functional implications of a genetic 'hitchhiker' in the context of cancer cell survival. The synthetic lethality for ATAD1 and MARCH5 observed using two different genetic approaches (deletion/overexpression) in two different cell models underscores a strong link between these two genes. Further, the data showing an important role for ATAD1 in regulating BIM mitochondrial localization/cytosolic phosphorylation are interesting. The evidence demonstrating relationships between ATAD1 and proteasome sensitivity is also convincing. However, there are some weaknesses. For example, the direct relationship between ATAD1-dependent prosurvival activities and BIM is not clearly defined. This is evident as BIM depletion did not influence the sensitivity of ATAD1-deficient PC3 cells to bortezomib, and BIM was not significantly affected in the xenograft models. BIM deletion did partially rescue synthetic lethality in Jurkat cells deficient in both MARCH5 and ATAD1, indicating a potential role in those cells. While the authors do address this, these results create a disconnect within the studies that complicates the overall interpretation, as the specific importance of BIM regulation by ATAD1 is not consistent or always clear across models. Regardless, this study does reveal new insights into the genetic relationship between ATAD1 deficiency and proteasome inhibition that could have direct therapeutic potential to improve the treatment of patients. Further, considering that the anti-apoptotic roles for ATAD1 appear to extend beyond BIM regulation, this will open new avenues for investigation of the underlying molecular mechanisms whereby ATAD1 contributes to regulating apoptotic signaling in cancer and other models.

      With that being said, it would be helpful to temper the writing to better highlight that BIM regulation does not explain the ATAD1 protection observed across cancer cell models (it does in some, but not all). While the new insight into the potential mechanism of ATAD1-dependent apoptotic regulation has value, more focus on the specific relationship between ATAD1 deficiency and proteasome inhibitor sensitivity would better suit the current work.

      Reviewer #2 (Public Review):

      This manuscript by Winter et al represents an analysis of the function of the ATAD1 gene in cancer. At present, the manuscript makes a number of interesting observations, with strong experimental support. First, the authors show that tumors with PTEN deletions frequently have additional mutations in ATAD1, and that prostate tumors with both mutations are associated with a shorter period of survival. Second, tumors lacking ATAD1 are more sensitive to proteotoxic stress, based in part on an increased tendency to apoptosis. Third, the ATAD1 protein interacts with BIM, and interactions with BIM contribute in part to an increased tendency to apoptosis. Fourth, ATAD1 and MARCH5 have at least moderate synthetic sick/lethal interactions; together with other data, this suggests they control the release of BIM from the OMM, contributing to its degradation. Overall, the data suggest that tumors with ATAD1 deletions may be particularly vulnerable to drugs that induce proteotoxic stress, suggesting new potential therapeutic regimens, which would be a valuable contribution to the field. The level of data presented here is already substantial; however, some additional experiments to support the authors' contentions would strengthen the work. Some claims about the mechanism are overstated given the current body of data and should be qualified.

      First, we thank the reviewers and editors for considering our work and providing insightful critiques. We are also grateful that our prior reviews from another journal were considered as part of a holistic review. Overall, we have rewritten key aspects of the manuscript to emphasize strengths pointed out by the reviewers (the relationship between the proteasome and ATAD1) while de-emphasizing the claims surrounding ATAD1 and BIM. Specifically, we added a new paragraph to the discussion section to help focus the reader on how loss of ATAD1 sensitizes cells to ubiquitin proteasome system (UPS) dysfunction and describe the implications thereof. We also removed a paragraph from the discussion that may have put undue emphasis on BIM. Lastly, we reconfigured our schematic figure (Fig 4F) to describe a model in which ATAD1 and the UPS represent two parallel pathways of dealing with proteins on the OMM, where loss of one pathway increases dependency on the other. We believe that BIM is an important piece of this story, and clearly demonstrate that ATAD1-dependent extraction of BIM partly explains the synthetic lethality of ATAD1 and MARCH5. However, we agree with the reviewers that to focus too much on BIM detracts from the more general thesis of the work, as described above. We added another paragraph to the discussion that describes limitations of the study, to explicitly outline what our manuscript does and does not demonstrate.

    1. Author Response

      Reviewer #1 (Public Review):

      With real interest, I read the manuscript entitled "Sex-specific effects of an IgE polymorphism on immunity, susceptibility to infection and reproduction in a wild rodent", written by Wanelik and colleagues. Indeed, I am impressed with each and every part of this work. This study is very well designed and answers intriguing scientific questions. The study is multilayered and multidimensional and goes far beyond a genomic association, as it deeply addresses, to mention only the most important, ecological, parasitological, immunological, and gene expression aspects. In addition to studying a free-living community of voles, it uses this opportunity to gain insights into the genetics and biology of the high-affinity IgE receptor that could not be gained in studies performed in humans or standard laboratory animals. The data are presented in a very elegant way and the article is really nicely written.

      We thank the Reviewer for these positive comments, and are very glad to hear they think our work is so comprehensive.

      Reviewer #2 (Public Review):

      In this manuscript, Wanelik et al. use a wild rodent population to test if a polymorphism in a receptor for immunoglobulin E (IgE) affects immune responses, resistance to infection, and fitness. Finding such effects would imply that polymorphisms in immune genes can be maintained by antagonistic pleiotropy between sexes, which has important implications for our understanding of how genetic variation is maintained. The work presented here extends previous work by the same group where they have shown that expression of GATA3 (a transcription factor inducing Th2 immune responses) affects tolerance to ectoparasites and that polymorphism in Fcer1a affects the expression of GATA3. The present study is based on a fairly large data set and comprehensive analysis of a number of different traits. Indeed, the authors should be commended for investigating all steps in the chain polymorphism→immune response→resistance→fitness. Unfortunately, the presentation of the methodology is a bit confusing. Moreover, most of the key results are only marginally significant.

      We thank the Reviewer for their positive feedback, and are very glad to hear they think our work is so comprehensive. As detailed below, we have tried to clarify our methodology and to temper our claims in the revised manuscript.

      As regards methodology, I was confused by the differential expression (DE) analyses presented in fig 1A. First, it took a while to understand that these were based on a comparison of unstimulated cells (i.e. baseline expression), not ex vivo stimulated cells; this should be made explicit in conjunction with the presentation of the results. Second, it would be good to clarify (and motivate) in the Results that you compare individuals with at least one copy of the GC haplotype against the rest, i.e. a dominant model.

      We apologise for the confusion. We now explicitly state in the Results (lines 313-314) that the DGE analysis was based on unstimulated splenocytes: “Differential gene expression (DGE) analysis performed on unstimulated splenocytes taken from 53 males and 31 females assayed by RNASeq”. We also explicitly state “Unstimulated immune gene expression” in the legend for Figure 1.

      Please note that an additive model was used for all analyses run using the hapassoc package (macroparasites and SOD1). A dominant model was used in the DGE analysis and in the other analyses where the hapassoc package could not be used (gene expression assayed by Q-PCR, microparasites and reproductive success); these analyses could include only those individuals whose haplotype could be inferred with certainty (i.e. a smaller dataset). Our use of the dominant model in the DGE analysis is now more explicitly explained on lines 933-935: “Only those individuals for which haplotype could be inferred with certainty could be included (n = 53 males and n = 31 females; none of which were known to have two copies of the GC haplotype hence the choice of a dominant model).” Its use in the other non-hapassoc analyses is now explicitly stated on lines 991-992: “as in the DGE analysis, genotype was coded as the presence or absence of the GC haplotype (i.e. a dominant model)”.
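      To make the difference between the two genotype codings concrete, here is a minimal sketch, assuming each vole's genotype can be summarised as the number of GC haplotype copies it carries (0, 1 or 2). The function names are hypothetical, and this is not the hapassoc implementation, which (as described above) can also accommodate individuals whose haplotypes are uncertain.

```python
def additive_code(gc_copies: int) -> int:
    """Additive model: the predictor is the number of GC haplotype copies (0, 1 or 2)."""
    if gc_copies not in (0, 1, 2):
        raise ValueError("gc_copies must be 0, 1 or 2")
    return gc_copies


def dominant_code(gc_copies: int) -> int:
    """Dominant model: 1 if at least one GC copy is present, otherwise 0."""
    if gc_copies not in (0, 1, 2):
        raise ValueError("gc_copies must be 0, 1 or 2")
    return 1 if gc_copies >= 1 else 0


if __name__ == "__main__":
    # Hypothetical copy numbers for five individuals (illustration only)
    copies = [0, 1, 2, 0, 1]
    print("additive:", [additive_code(c) for c in copies])
    print("dominant:", [dominant_code(c) for c in copies])
```

      When no individual carries two GC copies, as in the DGE dataset described above, the two codings yield identical predictors, so choosing a dominant model there loses no information.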

      The first key result is that polymorphisms in Fcer1a have sex-specific effects on the expression of pro- and anti-inflammatory genes in males and females. However, the GSEA analyses (fig 1A) show that the GC haplotype has positive effects on the expression of both pro- and anti-inflammatory gene sets in both sexes - albeit with a stronger effect of proinflammatory genes in males and anti-inflammatory genes in females - but there is no formal evidence for an effect of genotype by sex. I am not sure how to test for interaction with GSEA (or if it is at all possible), so it would be good to complement the GSEA with other analyses (perhaps based on PCA?) of these data to provide more formal evidence for an effect of genotype by sex.

      It is not possible to provide formal evidence for an effect of genotype by sex in the DGE analysis/GSEA. Instead, we have tried to temper our claims about sex-specific effects (please see below for further details).

      Some more evidence of a sex-specific effect of Fcer1a genotype is actually provided by analyses of the expression of 18 immune genes in ex vivo stimulated T cells. Here, a sex-specific effect of Fcer1a genotype was found on the expression of one of 18 measured immune genes, the cytokine IL17a. However, Fcer1a is as far as I am aware not expressed by T cells, so the relevance of these results is unclear. Moreover, it is unclear why these 18 genes were analyzed one by one, rather than by some multidimensional approach (e.g. PCA).

      The Reviewer is right that Fcer1a is not generally considered to be expressed by T cells. However, the stimulation could have indirect effects. We have clarified this on lines 801-804: “Although Fcer1a is not expressed by T-cells themselves, polymorphism in this gene could be acting indirectly on T-cells through various pathways, including via cytokine signalling, following expression of Fcer1a by other cells”.

      The 18 immune genes were specially selected because they represent different immune pathways and are expected to have limited redundancy. This is why individual tests were performed (followed by a correction for multiple testing) rather than using a multidimensional approach like PCA. This is now explicitly explained in the Methods on lines 804-808: “The choice of our panel of genes was informed by…(iii) the aim of limited redundancy, with each gene representing a different immune pathway” and on lines 1031-1032: “We did not use a multidimensional approach (such as principal component analysis) because of limited redundancy in our panel of genes.” and in the Results on line 363-366: “we used an independent dataset for males and females whose spleens were stimulated with two immune agonists and assayed by Q-PCR (for a panel of 18 immune genes with limited redundancy); see Methods for how these genes were selected.”

      The second key result is that Fcer1a genotype has sex-specific effects on resistance to parasites, but this is based on a marginally significant effect as regards one of three tested pathogens.

      We agree that this is a marginally significant result and have acknowledged this in the text (line 428 of the Results section).

      The third key result is that Fcer1a genotype has sex-specific effects on reproductive fitness. However, this is based on a marginally significant effect in males only, and a formal test for sex by genotype could not be performed (and since the direction of the effect was similar in females it is doubtful whether there would be an effect of sex by genotype; see fig 1C).

      Thus, while the results presented here are clearly indicative of sex-specific effects of an immune gene polymorphism, I think it is too early to actually claim such effects.

      We understand the Reviewer’s concerns about the overall lack of formal evidence for an effect of genotype by sex. As we are not able to provide this for the DGE analysis, GSEA (see above), or for the reproductive success analysis, we have tempered our claims about sex-specific effects (as suggested by the Reviewer). We have done this by removing the term “sex-specific effect” throughout the manuscript, including in the title. We now focus more heavily on the multiple effects we have shown across different phenotypic traits, and use the term “sex-dependent effects” or describe effects as “differing between sexes” sparingly, and only where necessary. These changes have been made throughout the manuscript, but more so in the introduction where the narrative has been substantially reworked to lay out this change in focus.

      Reviewer #3 (Public Review):

      This is a well-replicated study: the authors sampled over a thousand field voles (Microtus agrestis) over three years at seven different sites, with a combination of cross-sectional and longitudinal sampling. The authors compared individuals carrying the GC haplotype (<10% of the population) of the high-affinity IgE receptor gene (Fcer1a) with all other individuals. They recorded parasite infections (Babesia, Bartonella, ticks, fleas, gastrointestinal helminths), expression levels of inflammatory and immune genes using transcriptomes and quantitative PCR, and genotype and pedigree.

      We thank the Reviewer for their positive feedback, and are very glad to hear they think our work is well replicated.

      A comparison of overall gene expression between GC-carrying and all other voles indicated two sex-dependent differences, the expression in males of Il33, which is associated with antihelminthic responses, and in females of Socs3, which is implicated in regulating immune responses. One substantial issue with the authors' interpretation of these data is to attribute Il33 to the inflammatory response - this taints the rest of their interpretation (e.g., Fig 1A, see below); instead, this is a key cytokine of the antihelminthic Th2 response and its detection suggests there might be a difference in helminth infection between the haplotypes - which is consistent with the role of IgE. Therefore, the authors would need to explore further how the GC haplotype, IgE, and parasite burdens might be driving the expression of IL-33. Specifically, the authors did not control for potential confounding effects of infection, which might be expected to differ based on the rest of their data.

      We acknowledge the difficulty in grouping genes under single GO terms, and the need for more nuance when describing these classifications. No gene set is perfect and immune networks are highly complex, so the same gene can be grouped into multiple gene sets. IL33 is an example of this – it appears in the GO term GO:0050729 (positive regulation of inflammatory response) but, as the Reviewer points out, is also commonly associated with the antihelminthic Th2 response. We have edited the text in the Results (on lines 322-324 and lines 350-352) to communicate this nuance, as well as adding references to support each of these associations: “Il33 is commonly associated with anti-helminthic response [25] and Socs3 with regulation of the immune response more broadly [26]….Both Il33 and Socs3 also share an association with the inflammatory response [26,27]. While Il33 positively regulates this response (appearing in the gene set GO:0050729), Socs3 negatively regulates it (GO:0050728).” References added:

      1. Liew FY, Pitman NI, McInnes IB. Disease-associated functions of IL-33: The new kid in the IL-1 family. Nat Rev Immunol. Nature Publishing Group; 2010;10: 103–110. doi:10.1038/nri2692
      2. Carow B, Rottenberg ME. SOCS3, a major regulator of infection and inflammation. Front Immunol. 2014;5: 1–13. doi:10.3389/fimmu.2014.00058
      3. Cayrol C, Girard JP. IL-33: An alarmin cytokine with crucial roles in innate immunity, inflammation and allergy. Curr Opin Immunol. Elsevier Ltd; 2014;31: 31–37. doi:10.1016/j.coi.2014.09.004

      We have also run an extra DGE analysis including cestode burden as a covariate (cestodes being the most prominent helminth infection in terms of biomass), to check whether Il33 still emerges as a top-responding gene in males (see Appendix 1-table 4 & 5). We found that it did (in fact the signal was even stronger), indicating that the differences in Il33 expression are not being driven by differences in cestode infection. We now mention this additional analysis in the text: “Given the link between Il33 and the antihelminthic response (and more generally, IgE-mediated responses and the antihelminthic response), we repeated the DGE analysis while controlling for cestode burden, but this had little effect on our results (same top-responding immune genes; see Appendix 1—table 4 & 5), suggesting that these effects were not driven by differences in cestode infection”. This is consistent with our finding that there is no difference in macroparasite burden (including cestode burden) between individuals with and without the GC haplotype (see Appendix 1—table 11, and lines 449-451: “However, we found no effect of the haplotype (interactive or not) on the probability of infection with the other parasites in our population”).

      We have also included the following caveat in our discussion on lines 540-542: “Some of the differences in immune phenotype that we observed may also be driven by differences in parasite infection (although we accounted for cestode burden in a follow-up analysis, we cannot rule this out).”

      Among a narrow panel of immune genes measured in ex vivo settings, the authors reported elevated expression of Il17a, which is associated with inflammatory, antibacterial responses. Of note, the panel of genes they measured did not contain antihelminth effectors beyond the transcription factor GATA3, and therefore could not confirm the expression of IL-33 observed in the transcriptomes. However, the expression of IL-17a appears consistent with the elevated activity of antioxidant SOD1.

      In response to this comment, we now point out more clearly that our panel of genes did not include Il33 or Socs3, but did include other inflammatory genes including Il17a, Ifng, Il1b, Il6 and Tnfa.

      Somewhat unexpectedly, given the authors' claim that males carrying the GC haplotype are prone to a more inflammatory immune phenotype, the haplotype had no effect on infection in that sex. However, the identity of the genes and pathways matters, and the authors do not provide sufficient detail to evaluate their interpretation (GSEA analysis and Figure 1A).

      Barcode plots, such as the one we include in Figure 1A, are commonly used representations of GSEA results. In order to aid interpretation for those who are unfamiliar with barcode plots, we have included some more information in the legend of Figure 1.

      An intriguing and potentially important finding is that males carrying the GC haplotype appeared to have fewer offspring (little to no effect detected in the females). To confirm whether the effect of the haplotype is direct or mediated by other factors, it would be useful to test how other covariates, like infection, might contribute to this.

      To explore this possibility, we have run extra GLMs for both females and males which include two parasite variables: proportion of samples taken from an individual that tested positive for Babesia and proportion of samples taken from an individual that tested positive for Bartonella. We found no difference in the main results – males with the GC haplotype still have fewer offspring, suggesting that infection is not acting as a confounder.

    1. Author Response

      Reviewer #1 (Public Review):

      1) Fig 6E shows that CAPE1 is released only upon Fol infection. This appears to contradict the notion that FolSvp1 prevents CAPE1 release. However, the Fol strain overexpressing FolSvp1 prevented the release of CAPE1. It is necessary to compare the WT strain with a mutant strain in which the FolSvp1 gene is deleted. One would expect the mutant strain to induce significantly more CAPE1 release. Similarly, the mutant strain complemented with the nls1 construct needs to be tested to determine whether nuclear localization is required for preventing CAPE1 release.

      Thank you for the good suggestions! In line with eLife's revision policy in response to COVID-19, we now state in the Discussion section that the proposal that FolSvp1-mediated translocation of SlPR1 into the nucleus impedes CAPE1 release needs to be further strengthened with additional data (lines 441-444 of the revised manuscript). We would like to perform the suggested experiments in future studies.

      2) SlPR1 is localized in the apoplast in a manner dependent on the signal peptide (Fig 5-figure supplement 1). Overexpression of SlPR1 with an added NLS but lacking the signal peptide failed to enhance disease resistance to Fol infection (Fig 7G). What about overexpression of SlPR1 lacking the signal peptide without the added NLS? Is retention of SlPR1 in the cytoplasm sufficient to abolish its function? It is not even discussed why SlPR1 has to be in the nucleus to prevent CAPE1 release.

      Thank you for these suggestions! In the revised manuscript (lines 436-444), we now discuss the possibility that binding of FolSvp1 to SlPR1 may inhibit the function of the latter in the cytoplasm, and state that additional experiments are required to test this in future studies.

      3) FolSvp1 carrying the PR1 signal peptide interacted with SlPR1 in the apoplast (Fig 6D and Fig 6-figure supplement 2). Why weren't these proteins translocated into the nucleus? These seem to contradict the in vitro uptake data. It seems that either no or only a very small proportion of SlPR1 transiently expressed in tobacco cells is located in the nucleus. Fig 7C shows that infection of the WT strain, but not the nls1 mutant strain, allowed detection of SlPR1 in the nucleus of tomato cells. However, it is not clear how much of SlPR1 remain in the apoplast or cytoplasm. Is the FolSpv1 protein secreted by Fol sufficient to translocate a significant portion of SlPR1 into the nucleus? The authors are suggested to examine apoplastic and cytoplasmic protein fractions for the relative amounts of SlPR1 after Fol infection.

      Thank you very much for this constructive point! The observation that FolSvp1 and SlPR1 interact in both the apoplast and the nucleus of N. benthamiana leaves suggests that binding of FolSvp1 to SlPR1 may inhibit its antifungal activity and/or the cleavage of SlPR1 to produce CAPE1 in the extracellular region or even in the cytoplasm. In addition, the BiFC assays performed in N. benthamiana leaves might not completely mimic physiological conditions. Therefore, whether FolSvp1-mediated translocation of SlPR1 into the nucleus, which impedes CAPE1 release, is the only route of PR1 inactivation needs to be tested with additional data in future studies. We have added this information to the revised manuscript (lines 436-444).

      4) Fig 7J and 7K: a better experiment would be to pretreat WT tomato plants with CAPE1 prior to inoculation with the WT and FolSvp1 OE strains. The pretreatment should eliminate the virulence function of FolSvp1 OE if this virulence depends solely on the prevention of CAPE1 release.

      Thank you for this suggestion! We now state in the Discussion section that the proposal that FolSvp1-mediated translocation of SlPR1 into the nucleus impedes CAPE1 release needs to be further strengthened with additional data (lines 441-444 of the revised manuscript). It will be of considerable interest to perform the suggested experiments in future studies.

      Reviewer #2 (Public Review):

      1) As far as I know, apoplastic PR1 proteins may have fungicidal activity. When the authors tested the interaction between FolSvp1 and SlPR1 in Nicotiana benthamiana by BiFC, both apoplastic and nuclear interactions were detected. Therefore, the authors should discuss whether binding of FolSvp1 to SlPR1 that remains in the apoplast can inhibit (i) its anti-Fol activity and (ii) the cleavage of SlPR1 to produce the CAPE1 peptide. In other words, although translocating SlPR1 to the nucleus via FolSvp1 is effective for suppressing CAPE1 production, this may not be the only mechanism.

      Thank you very much for this constructive point! The observation that FolSvp1 and SlPR1 interact in both the apoplast and the nucleus of N. benthamiana leaves suggests that binding of FolSvp1 to SlPR1 may inhibit its antifungal activity and/or the cleavage of SlPR1 to produce CAPE1 in the extracellular region or even in the cytoplasm. Therefore, whether FolSvp1-mediated translocation of SlPR1 into the nucleus, which impedes CAPE1 release, is the only route of PR1 inactivation needs to be tested with additional data in future studies. In line with eLife's revision policy in response to COVID-19, we have added this information to the revised manuscript (lines 436-444).

      2) The FolSvp1 produced in N. benthamiana was using the SlPR1 signal peptide and lacked the acetylation modification. It is possible that the acetylation of FolSvp1 can affect the interaction affinity or localization between FolSvp1 and SlPR1. The K167Q mutation of FolSvp1 might not be able to faithfully mimic the K167 acetylation.

      Thank you for this suggestion! It’s true that the BiFC assays performed with N. benthamiana leaves might not completely mimic the physiological conditions. We have discussed this possibility in the revised manuscript (lines 439-444).

    1. Author Response

      Reviewer #1 (Public Review):

      This paper describes longitudinal MRI measurements of "grey matter volume" (GMV) and "white matter volume" (WMV) in the brains of mice that were trained in a well-established one-pawed reaching/grasping paradigm for fine-motor skill learning. GMV/WMV ratio is presumed to reflect the extent to which axons in the region of interest are ensheathed by water-poor myelin membrane ("myelinated"). The conclusion is that WMV increases during learning in several task-related brain regions such as the primary motor cortex and somatosensory cortex, as well as a number of regions that are not so obviously task-related. Parallel decreases in GMV were observed. No change in overall cortical volume was detected so the conclusion is that some intra-cortical axons become myelinated in response to motor learning - supporting the idea of "adaptive myelination" proposed by others. Supporting histochemical evidence is provided (quantitative myelin immunolabelling). The MRI changes observed did not occur in a simple linear or cumulative fashion during learning, but rather increased in a non-linear asymptotic way, or even peaked and decreased again during training ("quadratic"). This is an interesting and useful study that takes us a little closer to understanding what is going on in the brain during learning and memory formation and continues the development of MRI as a useful non-invasive tool for studying the contribution of myelin to these processes.

      Specific points:

      1) "Grey matter" and "white matter" are normally used to describe spatially distinct brain regions that are sparsely myelinated (grey) or heavily myelinated (white), for example, the cerebral cortex (grey) and underlying subcortical axon tracts (white). However, most or all regions are described here as white matter within the classical grey matter - within the motor cortex, for example. Classical white matter regions such as corpus callosum do not get a mention. Presumably, the authors' use of the terms grey and white matter refer to specific MRI signals that are designed to pick up relatively water-rich or water-poor domains that are presumed to reflect the abundance of myelinated versus unmyelinated fibers, not necessarily the classic anatomical grey or white matter. However, this is confusing. Is it possible to change the terminology from grey and white matter to myelin-rich and myelin-poor, water-poor and water-rich, or something similar? At the very least it requires a better explanation.

      We thank this reviewer for bringing up this point and apologize for the confusion. In the revised version of the manuscript, we now present higher-magnification images of the sections that were used to quantify MBP immunoreactivity (densitometry) (see Main Figure 5-Supplementary Figure 3 in the revised version of the manuscript). In addition, new immunohistochemical experiments were performed and a second method was used to investigate myelinated axons within the cortex. Coronal sections were immunolabeled for myelin basic protein (MBP) and high-resolution confocal imaging was performed on a subset of trained mice (n=12 mice, n=108 probes, 9 probes per animal, represented in Main Figure 6-Supplementary Figure 1 in the revised version of the manuscript). We acquired Z-stacks with a minimum of 30 optical sections and performed an analysis of fibers based on a quantitative 3D immunohistochemical method (3D-QICH) to reconstruct and analyze the length density, diameter and volumetric fraction of myelinated axons. This method of fiber analysis was first implemented to measure vascularity (Fouard et al., 2006) and was later developed further and validated for the systematic analysis of axons (Hamodeh et al., 2010; Hamodeh et al., 2014; Hamodeh et al., 2017). The method employed for the 3D reconstruction and analysis of myelinated axons is explained in detail in the Material and Methods section of the revised manuscript. There was a significant increase in the length density of myelinated axons from baseline to experimental day 6, followed by a significant decrease towards baseline levels at experimental day 14 (one-way ANOVA, F(2,7) = 8.249, P < .05; Fig. 6B), following a quadratic model rather than a linear one (ΔAIC > 2).

      Fouard, C., Malandain, G., Prohaska, S., & Westerhoff, M. (2006). Blockwise processing applied to brain microvascular network study. IEEE Trans. Med. Imaging, 25(10), 1319-1328.

      Hamodeh, S., Eicke, D., Napper, R. M. A., Harvey, R. J., & Sultan, F. (2010). Population based quantification of dendrites: evidence for the lack of microtubule-associate protein 2a,b in Purkinje cell spiny dendrites. Neuroscience, 170(4), 1004-1014. doi:10.1016/j.neuroscience.2010.08.021

      Hamodeh, S., Sugihara, I., Baizer, J., & Sultan, F. (2014). Systematic analysis of neuronal wiring of the rodent deep cerebellar nuclei reveals differences reflecting adaptations at the neuronal circuit and internuclear level. J Comp Neurol, 522, 2481-2497.

      Hamodeh, S., Bozkurt, A., Mao, H., & Sultan, F. (2017). Uncovering specific changes in network wiring underlying the primate cerebrotype. Brain Struct Funct, 222(7), 3255-3266. doi:10.1007/s00429-017-1402-6
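      The quadratic-versus-linear model comparison reported above can be illustrated with a minimal least-squares sketch. It assumes Gaussian errors and ordinary polynomial regression on experimental day; the authors' actual models may differ, the function names are hypothetical, and the example values are made up for illustration, not data from this study. Under these assumptions AIC reduces, up to a constant, to n ln(RSS/n) + 2k, and an AIC difference of more than 2 in favor of the quadratic fit is a common rule of thumb.

```python
import numpy as np


def gaussian_aic(y, y_hat, n_params):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant.

    AIC = n * ln(RSS / n) + 2k, where k counts every estimated parameter
    (regression coefficients plus the residual variance). Requires RSS > 0.
    """
    y = np.asarray(y, dtype=float)
    rss = float(np.sum((y - y_hat) ** 2))
    return y.size * np.log(rss / y.size) + 2 * n_params


def compare_linear_quadratic(days, values):
    """Fit linear and quadratic trends over experimental day.

    Returns (aic_linear, aic_quadratic, delta) with delta = aic_linear - aic_quadratic;
    a positive delta favors the quadratic model.
    """
    days = np.asarray(days, dtype=float)
    values = np.asarray(values, dtype=float)
    aics = []
    for degree in (1, 2):
        coeffs = np.polyfit(days, values, degree)
        fitted = np.polyval(coeffs, days)
        # degree + 1 polynomial coefficients, plus one residual variance
        aics.append(gaussian_aic(values, fitted, degree + 2))
    aic_linear, aic_quadratic = aics
    return aic_linear, aic_quadratic, aic_linear - aic_quadratic


if __name__ == "__main__":
    # Hypothetical values that rise by day 6 and fall back by day 14 (not study data)
    days = [0, 0, 0, 6, 6, 6, 14, 14, 14]
    values = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9, 1.2, 1.1, 1.0]
    aic_lin, aic_quad, delta = compare_linear_quadratic(days, values)
    print(f"AIC linear = {aic_lin:.1f}, AIC quadratic = {aic_quad:.1f}, delta = {delta:.1f}")
```

      The quadratic model pays a penalty of one extra parameter, so it is preferred only when its reduction in residual error outweighs that cost, which is what the ΔAIC threshold encodes.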

      2) Several previous studies of motor learning in rodents, both MRI- and histology-based, have identified structural alterations and/or changes to oligodendrocytes and myelin in the corpus callosum underlying the motor cortex. In general, those white matter alterations were proportionally greater than those detected within the cortex itself. However, the present study apparently did not find significant MRI signal changes in sub-cortical white matter, which is surprising. Was this because the MRI sequences were not optimized for classical "white matter", or because the white matter was specifically excluded from the analysis (masked out)? If the latter, why was sub-cortical white matter excluded from the analysis? This needs discussion and explanation.

      We thank this reviewer for bringing up this critical point. As mentioned above in point #4 to the Editor, significant increases in WMV were observed at the whole-brain level in many areas of WM in the brain (also see Main Figure 2-Supplementary Figure 3). For whole-brain analyses, all subcortical white matter regions were included in the analysis of WMV. Table 1 in the revised version of the manuscript indicates the significant changes and their direction: decreases in GMV (Main Figure 2A) and increases in WMV (Main Figure 2B). Significant changes were found in WMV, but these were not represented in the Figures originally presented. Instead, because of the high number of significant voxels at FDR-corrected P < 0.05 for both WMV and GMV, we chose to depict significant changes at FDR-corrected P < 0.01 for increases in WMV and FDR-corrected P < 0.001 for decreases in GMV. The Figure in point #4 to the Editor (new Main Figure-Supplementary Figure 4) depicts significant increases in WMV according to the asymptotic model at FDR-corrected P < 0.05. Clear changes are observed in subcortical WMV; however, we chose to present the more stringently thresholded results (FDR-corrected P < 0.01) to show the more discrete clusters of increases in WMV together with the more discrete clusters of decreases in GMV at FDR-corrected P < 0.001.

      3) The quantitative MBP immunolabelling is a crucial piece of supporting evidence for the suggestion that MRI signal changes reflect adaptive myelination. What was the baseline against which immunoreactivity was measured? What did the fluorescence labelling look like at higher magnification - can individual myelin sheaths be distinguished, for example, and could these sheaths be counted, to complement and reinforce densitometry? Higher-mag images should be included in a revision.

      We thank this reviewer for these questions. Baseline measurements of myelin immunoreactivity were quantified in brain sections from food-restricted mice that never underwent behavioral training, represented as experimental day 0 in Main Figure 5C, Main Figure 6A-C, Main Figure 5-Supplementary Figure 2B. We also evaluated myelin immunoreactivity in non-trained control mice; mice that were food-restricted and placed into the training cage during the 15 experimental days, yet the daily ration of food pellets was provided on the floor of the cage rather than the shelf of the training cages. These data are represented in Main Figure 5-Supplementary Figure 2A and 2B.

      In the revised version of the manuscript, we have included a higher-magnification image of a representative section (see below and as Main Figure 5-Supplementary Figure 3) to depict the area in which MBP immunoreactivity was quantified. Individual myelinated axons can be appreciated in areas of cortex or striatum with few myelinated axons. Yet due to the dense plexus of myelinated axons in cortical areas where significant VBM clusters were observed, it was not possible to identify and count individual myelinated axons within 20-micron-thick histological sections using fluorescence light microscopy. To complement and reinforce our observations from MBP densitometry, we performed additional immunohistochemical labeling in subsequent coronal brain sections and used confocal laser scanning microscopy to distinguish individual myelinated axons. As mentioned in answer #1 to the Editor, we acquired Z-stacks with a minimum of 30 optical sections and performed an analysis of fibers based on a quantitative 3D immunohistochemical method (3D-QICH) to reconstruct and analyze the length density, diameter and volumetric fraction of myelinated axons. The method employed for the 3D reconstruction and analysis of myelinated axons is explained in detail in the Material and Methods section of the revised manuscript. There was a significant increase in the length density of myelinated axons from baseline to experimental day 6, followed by a significant decrease towards baseline levels at experimental day 14 (one-way ANOVA, F(2,7) = 8.249, P < .05; Fig. 6B), following a quadratic model rather than a linear one (ΔAIC > 2). These new data are now presented in Main Figure 6 in the revised version of the manuscript and confirm our observations from densitometry of adaptive myelination during learning.

      Reviewer #2 (Public Review):

      This study uses a well-established reaching task to assess the effect of learning on cortical structures as assessed by MRI in mice. The results show a decrease in grey matter (GM) and an increase in white matter (WM) volumes that appear to peak at experimental day 8, falling slightly thereafter.

      This is an interesting addition to the literature around myelination changes associated with learning/activity (adaptive myelination). However, it requires significant additional analysis. The correlation between imaging and histology is critical, but the only measure used here is MBP immunoreactivity. This is insufficient, as MBP can be expressed by newly-formed oligodendrocyte cell bodies, by their processes, and by the myelin sheath they form; but only the latter is relevant to function. So, a much more detailed analysis of oligodendrocyte morphology and myelin sheath number/size is required. This analysis needs to distinguish different layers of the cortex. This is easy for the superficial layers where myelination is sparse but much more difficult in the more heavily myelinated deeper layers. Here, counting nodes of Ranvier by Caspr immunostaining provides a good proxy. Ideally, both sheath number and sheath length would be analysed, but I accept that most studies point to number rather than changes in length as being the key changes in adaptive myelination. Then, the critical precise correlation of imaging changes with myelin sheath number can be made and the conclusion that the MRI changes represent physiologically significant changes in myelination becomes more solid.

      We thank this reviewer for bringing up their suggestions to improve our manuscript. In the revised manuscript, we have now addressed which cortical layers demonstrate significant changes in GMV and WMV (new Main Figure 4 in the revised manuscript) and we have now included an additional series of experiments to further quantitate myelinated axons in somatosensory cortex for the forelimb.

      We acknowledge that MBP can be expressed by newly formed oligodendrocyte cell bodies, by their processes, and by the myelin sheath they form. For this reason, we complemented the densitometry, now presented in Main Figure 5 of the revised manuscript, with a confocal-based analysis of myelin sheaths/myelinated axons. The latter is presented in Main Figure 6 of the revised manuscript and further supports adaptive changes in intracortical myelin during learning. Using confocal microscopy combined with quantitative fiber analysis (the fiber skeleton reconstruction function in Amira software), we observed significant changes in length density. In the revised Discussion, we state that changes in the length density of myelinated axons reflect changes in both the length and the number, or density, of myelinated axons in the forelimb region of somatosensory cortex. Our analysis also quantified the diameter of myelinated axons, which decreased at experimental day 6 and increased at experimental day 14, although these changes did not reach significance. We added a paragraph to the revised Discussion hypothesizing that an increase in length density combined with a putative decrease in the diameter of myelinated axons at experimental day 6 could indicate the appearance of new myelinated axons (novel candidate circuits). Afterwards, during the consolidation phase of learning, optimal candidate circuits may be selected and refined, which may involve putative increases in myelin sheath diameter. However, to further understand changes in myelinated axons with learning, future studies should focus on longitudinal in vivo evaluation of individual myelinated axons.

      Because of the dense plexus of myelinated axons in the cortical areas where significant VBM clusters were observed, it is challenging in these deeper layers to quantify adaptive changes in individual myelinated axons and/or in nodes of Ranvier (by Caspr immunoreactivity) in the 20-µm-thick histological sections of our dataset. These heavily myelinated deeper layers are also difficult to study with longitudinal in vivo approaches such as two-photon microscopy, since this technique is typically limited to superficial cortical depths (300–400 µm). The focus of this manuscript was to demonstrate that white matter volume in somatosensory cortex correlates significantly with myelin immunoreactivity, supporting the hypothesis that myelin is a component of the non-linear structural changes observed by longitudinal voxel-based morphometry during learning. In a future study, we plan to determine a physiological correlate of the changes presented in this manuscript using fiber photometry and multielectrode recordings during learning.

    1. Author Response:

      We thank the reviewers for their thoughtful comments. We would like to respond to one point made by reviewer 2. We agree with the recommendations of this reviewer for improving the manuscript, including additional studies in non-transformed cells. However, we would also like to clarify one point. Reviewer 2 stated that “results in Figure 4C indicate that total STAT1 is completely localized in the nucleus even prior to interferon stimulation when it should be in the cytoplasm.” Figure 4C uses proximity ligation assays to show that the interaction of STAT1 with DUX4-CTD occurs in the nucleus, at a lower level without interferon and a higher level with interferon, but does not measure the distribution of total STAT1. Supplemental Figure S3A/3B shows a combined cytoplasmic and nuclear distribution of STAT1 without interferon treatment and shows increased nuclear STAT1 with interferon treatment, as would be expected in cells with an intact signaling pathway, although we also agree that the presentation of this finding can be improved with additional images that specifically address this point. Again, we thank both reviewers for their careful reading and helpful comments on our study.

    1. Author Response

      Reviewer #1 (Public Review):

      Iyer et al. address the problem of how cells exposed to a graded but noisy morphogen concentration are able to infer their position reliably, in other words how the positional information of a realistic morphogen gradient is decoded through cell-autonomous ligand processing. The authors introduce a model of a ligand processing network involving multiple "branches" (receptor types) and "tiers" (compartments where ligand-bound receptors can be located). Receptor levels are allowed to vary with distance from the source independently of the morphogen concentration. All rates, except for the ligand binding and unbinding rates, are potentially under feedback control. The authors assume that the cells can infer their position from the output of the signalling network in an optimal way. The resulting parameter space is then explored to identify optimal "network architectures" and parameters, i.e. those that maximise the fidelity of the positional inference. The analysis shows how the presence of both specific and non-specific receptors, graded receptor expression and feedback loops can contribute to improving positional inference. These results are compared with known features of the Wnt signalling system in Drosophila wing imaginal disc.

      The authors are doing an interesting study of how feedback control of the signalling network reading a morphogen gradient can influence the precision of the read-out. The main strength of this work is the attention to the development of the mathematical framework. While the family of network architectures introduced here is not completely generic, there is enough flexibility to explore various features of realistic signalling systems. It is exciting to find that some network topologies are particularly efficient at reducing the noise in the morphogen gradient. The comparison with the Wnt system in Drosophila is also promising.

      Major comments:

      1) The authors assume that the cell estimates its position through the maximum a posteriori estimate, Eq. (5), which is a well-defined mathematical object; it seems to us however that whether the cell is actually capable of performing this measurement is uncertain (it is an optimal measurement in some sense, but there is no guarantee that the cell is optimal in that respect). Notably, this entails evaluating p(θ), which is a probability distribution over the entire tissue, so this estimate can not be done with purely local measurements. Can the authors comment on this and how the conclusions would change if a different position measurement was performed?

      This is indeed an important question. Our viewpoint is the following: if cells were to use a maximum a posteriori (MAP) estimate (Eq. 5) to decode their positions, which features of the channel architecture would lead to small errors in positional inference? Whether the cell employs the MAP estimate or some other estimate is an important but difficult question to address. Our choice was motivated by how this estimate has enabled the precise decoding of developmental fates from gap gene expression in the Drosophila embryo [1, 2, 3]. We had earlier computed the inference error with a different estimate, i.e.

      $$\sigma^2(x) = \int \mathrm{d}x^*\,(x^* - x)^2\,p(x^*|x),$$

      which computes the mean squared deviation of the inferred positions x∗ from the true position x, for each x, taking into account the entire distribution p(x∗|x). The qualitative results were the same, although these inference errors showed spurious jitter caused by outliers when sampling the noisy morphogen input distribution. This consistency suggests that our qualitative results may be insensitive to the choice of estimate.
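      The mean-squared inference error described above can be estimated by Monte Carlo sampling of p(x∗|x). The sketch below is a toy stand-in, not the authors' channel model: it assumes a hypothetical exponential morphogen profile read out with additive Gaussian noise of fixed width, a uniform prior over a discretised tissue, and MAP decoding, which under these assumptions reduces to picking the position whose mean readout is closest to θ.

```python
import math
import random

DECAY_LENGTH = 0.2                         # hypothetical gradient decay length (tissue units)
SIGMA = 0.02                               # hypothetical readout noise (SD)
POSITIONS = [i / 100 for i in range(101)]  # discretised tissue, x = 0 at the morphogen source

def mean_readout(x):
    return math.exp(-x / DECAY_LENGTH)

def decode_map(theta):
    """MAP estimate for a uniform prior and Gaussian noise: nearest mean readout."""
    return min(POSITIONS, key=lambda x: (theta - mean_readout(x)) ** 2)

def mse_inference_error(x_true, n_samples=2000, seed=0):
    """Monte Carlo estimate of sigma^2(x) = E[(x* - x)^2] over p(x*|x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(mean_readout(x_true), SIGMA)  # one noisy readout
        total += (decode_map(theta) - x_true) ** 2
    return total / n_samples

for x in (0.1, 0.5, 0.9):
    print(x, math.sqrt(mse_inference_error(x)))
```

In this toy model the root-mean-square error grows away from the source, because the profile flattens and the same readout noise maps onto a wider range of positions.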

      Further, when evaluating the MAP estimate, the term p(θ) in the denominator of the posterior, p(x|θ) = p(θ|x)p(x)/p(θ), serves only as a normalisation factor ensuring that p(x|θ) is a probability density over x. It is not strictly necessary for MAP estimation: since p(θ) does not depend on x, the MAP estimate can be written as

      $$x^*(\theta) = \arg\max_x \, p(\theta|x)\,p(x),$$

      without the need to evaluate p(θ). In the case of a uniform prior p(x), it is equivalent to the maximum likelihood estimate (MLE), i.e.

      $$x^*(\theta) = \arg\max_x \, p(\theta|x).$$
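      Both points above (dropping p(θ) leaves the arg max unchanged, and a uniform prior turns MAP into MLE) can be checked numerically. The sketch uses a hypothetical toy model, not the authors' channel: an exponential mean readout with Gaussian noise on a discretised tissue and an arbitrary non-uniform prior. Log-probabilities are used to avoid underflow, and the evidence log p(θ) is computed by log-sum-exp only to show that subtracting it changes nothing.

```python
import math

def log_likelihood(theta, x, decay_length=0.2, sigma=0.02):
    """Hypothetical readout model: theta ~ Normal(exp(-x / decay_length), sigma^2)."""
    mean = math.exp(-x / decay_length)
    return -((theta - mean) ** 2) / (2 * sigma ** 2) - math.log(sigma * math.sqrt(2 * math.pi))

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

positions = [i / 100 for i in range(101)]
theta_obs = 0.30  # one hypothetical noisy readout

# Arbitrary non-uniform prior over positions, as normalised log-probabilities
raw_prior = [1.0 + x for x in positions]
log_prior = [math.log(p / sum(raw_prior)) for p in raw_prior]

# Unnormalised log-posterior log p(theta|x) + log p(x); p(theta) is never computed
log_unnorm = [log_likelihood(theta_obs, x) + lp for x, lp in zip(positions, log_prior)]

# Normalised log-posterior: subtract log p(theta), obtained by log-sum-exp
m = max(log_unnorm)
log_evidence = m + math.log(sum(math.exp(v - m) for v in log_unnorm))
log_post = [v - log_evidence for v in log_unnorm]

x_map_unnorm = positions[argmax(log_unnorm)]
x_map_norm = positions[argmax(log_post)]

# With a uniform prior, MAP reduces to the maximum-likelihood estimate
log_lik = [log_likelihood(theta_obs, x) for x in positions]
x_mle = positions[argmax(log_lik)]
x_map_uniform = positions[argmax([ll + math.log(1 / len(positions)) for ll in log_lik])]
print(x_map_unnorm, x_map_norm, x_mle, x_map_uniform)
```

Note that the prior p(x) still enters the unnormalised MAP estimate; only the evidence p(θ) drops out.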

      2) One of the features of the signalling networks studied in the manuscript is the ability of the system to form a complex (termed a conjugated state, Q) made of two ligands L, one receptor and one nonsignalling receptor. While there are clear examples of a single ligand binding to two signalling receptors (e.g. Bmps), are there also known situations where such a complex with two ligands, one receptor, and one non-signalling receptor can form? In the Wnt example (Fig. 10a), it is not clear what this complex would be? In general, it would be great to have a more extended discussion of how the model hypothesis for the signalling networks could relate to real systems.

      This is a good suggestion. We have now added a discussion on the various possible realisations of the “conjugate state” Q in Section 3.6. We have also explored the various states in the context of different signalling contexts such as Dpp, Hh, Fgf in the Discussion section.

      The conjugated state ‘Q’ represents a combination of the readings from the two branches, i.e. the two receptor types. This could be realised through processes such as ligand exchange or complex formation, both in a shared spatial location such as a compartment. As discussed in the original manuscript (Section 3.6 of the revised manuscript), the ligand Wg in the Wg signalling pathway is internalised through two separate endocytic pathways associated with the two receptor types: the signalling receptor Frizzled, via clathrin-mediated endocytosis (CME), and the non-signalling receptors HSPGs, via the CLIC/GEEC pathway (CLIC, clathrin-independent carriers; GEEC, GPI-anchored protein-enriched early endosomal compartments). Both pathways meet in a common early endosomal compartment where the ligands may be exchanged between the two receptors [4]. In previous work (Hemalatha et al. [4]), we showed that there are more Wg-DFz2 interactions in the endosomal compartment (measured through FRET) than on the cell surface. Therefore, the non-signalling receptors directing Wg through the CLIC/GEEC pathway titrate the amount of Wg interacting with the signalling receptor, DFz2.

      As mentioned in the original manuscript (Section 3.3 and subsection 4.2 of the Discussion in the revised manuscript), apart from Wg signalling, non-signalling receptors such as the HSPGs have also been proposed to act as co-receptors for Dpp, Hh and FGF (reviewed in [5, 6]). Although some ligands bind to the core protein of HSPGs, the majority of the ligands bind to the negatively charged HS chains [7, 8]. Here, the HSPG co-receptors aid in capturing diffusible ligands and presenting them to signalling receptors (either on the cell surface or within endosomes).

      3) The authors consider feedback on reaction rates - it would seem natural to also consider feedback on the total number of receptors; notably, since there are known examples of receptors transcriptionally down-regulated by their ligands (e.g. Dpp/Tkv)? Also it is not clear in insets such as in Fig. 7b, if the concentration plotted corresponds to the concentration of receptors bound to ligands?

      As mentioned in the original manuscript (Section 2.2 of the revised manuscript), we have indeed considered control on both reaction rates and receptors, although the control on the latter is done under the constraint that receptor profiles are monotonic. Further, while the control on reaction rates is implemented explicitly via feedbacks, the control on receptors is implemented via an approach akin to the open-loop control used in control theory. In reality, cellular control on receptors will involve transcriptional up- or down-regulation of receptors and would thus warrant a feedback control approach; however, the timescales involved in such control are different from the binding-unbinding and signalling timescales.

      Therefore, in the current work, we take the morphogen profile to be given i.e. independent of receptor concentrations, and we ask for the receptor concentrations that would help reduce the inference errors.

      Our predictions of increasing signalling-receptor and decreasing non-signalling-receptor profiles in a two-branch channel architecture are consistent with the known transcriptional up-regulation of Dally/Dlp and down-regulation of Fz by Wg signalling [9].

      In a future work, we will extend the control on receptors to include feedbacks explicitly. Furthermore, the explicit feedback control on receptors may need to be considered concomitantly with the effect of receptors on morphogen dynamics (i.e. morphogen sculpting by receptors) along with the possibility of spatial correlations in receptor concentrations through neighbouring cell-cell interactions.

      As mentioned in the original manuscript (Section 2.2 of the revised manuscript), the variables ψ and φ stand for the total (bound + unbound) surface receptor concentrations of the signalling and the non-signalling receptors respectively. Therefore, the insets showing receptor profiles such as in Fig. 6b, 7b, and Appendix H Fig.8b,e correspond to the total surface receptor concentrations.

      4) The authors are clear about the fact that they consider the morphogen gradient to be fixed independently of the reaction network; however, that seems like a very strong assumption: in the Dpp morphogen gradient, for instance, overexpression of the Tkv receptor leads to gradient shortening. Can the authors comment on this?

      This point is related to the previous question (3). As discussed in the Discussion of the original manuscript (subsection 4.3 of the revised manuscript), we focus on finding the optimal receptor concentration profiles and reaction networks that enable precision and robustness in positional information from a given noisy morphogen profile. The framework and the optimisation scheme within it will prescribe different receptor profiles and reaction networks for different noisy, monotonic morphogen profiles. It is possible that cells achieve the optimal receptor concentrations via feedback control on receptor production.

      Broadly, morphogen dynamics depends on cell surface receptors, which could participate in both the inference and the sculpting of the morphogen profile, and factors independent of them such as extracellular degradation, transport and production, etc. In our present work, we have taken the receptors involved in sculpting and inference as being independent.

      In a more general case, feedback control on receptors will change the receptor concentrations as well as the morphogen profile. We are currently working on realising such a feedback control on receptors within the same broader information theoretic framework proposed in the current work.

      5) Fig. 10f is showing an exciting result on the change in endocytic gradient CV in the WT and in DN mutant of Garz. Can the authors check that the Wg morphogen gradient is not changing in these two conditions? And can they also show the original gradient, and not only its CV?

      The reviewer raises a legitimate concern: could the observed changes in CV upon perturbation of the endocytic machinery be attributed solely to a systematic change in the mean levels of endocytosed Wg? In the original manuscript (Appendix O Fig.17b,c of the revised manuscript), we show the normalised profiles of endocytosed Wg in control and myr-Garz-DN discs. Here, in Fig.1 below, we compare the mean Wg concentrations (measured as fluorescence intensity) in control wing discs and in discs in which the CLIC/GEEC endocytic pathway is perturbed by expressing UAS-myr-Garz-DN. For clarity, we show the discs with the largest and smallest fluorescence intensities from the control and myr-Garz-DN sets. From this comparison, it is hard to conclude that the mean concentrations differ significantly between the two cases.
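      The relation between a change in CV and a change in mean level can be illustrated with a short sketch. It assumes CV is computed per position as the across-disc standard deviation divided by the across-disc mean; the intensity profiles are invented placeholders, not measurements. A uniform multiplicative change in Wg levels leaves the CV unchanged, whereas an additive offset (for example, background) lowers it, which is why reporting the mean profiles alongside the CV helps rule out the trivial explanation.

```python
import statistics

def mean_and_cv(profiles):
    """Per-position mean and CV (SD / mean) across discs; each profile is one disc."""
    means, cvs = [], []
    for values in zip(*profiles):  # intensities at one position across discs
        m = statistics.mean(values)
        means.append(m)
        cvs.append(statistics.stdev(values) / m)
    return means, cvs

# Hypothetical endocytosed-Wg intensity profiles (arbitrary units), 3 discs x 5 positions
control = [
    [100.0, 70.0, 45.0, 30.0, 20.0],
    [110.0, 75.0, 50.0, 28.0, 18.0],
    [95.0, 68.0, 40.0, 33.0, 23.0],
]
scaled = [[0.5 * v for v in disc] for disc in control]   # uniform 2-fold decrease
offset = [[v + 20.0 for v in disc] for disc in control]  # additive offset

_, cv_control = mean_and_cv(control)
_, cv_scaled = mean_and_cv(scaled)
_, cv_offset = mean_and_cv(offset)
print(cv_control, cv_scaled, cv_offset, sep="\n")
```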

      Reviewer #2 (Public Review):

      The work of Iyer et al. uses a computational approach to investigate how cells using multiple tiers of processing and multiple parallel receptor types achieve a more accurate reading of position from a noisy signal. The authors find that combining signaling and non-signaling types of receptors together with additional feedback increases the accuracy of positional readout against extrinsic noise conveyed in the morphogen signal. Further, extending the number of layers of signal processing counteracts the intrinsic stochasticity of the signal reading and processing steps. The mathematical formulation of the model is general but comprehensive in the way it handles the difference between branches and tiers for the processing of channels with feedbacks. The results of the model are presented from a simple one-branch, one-tier architecture to a two-branch, two-tier architecture with feedbacks. Interestingly, the authors find that adding more tiers results in only very small improvements in the accuracy of positional readout. The model is tested against a perturbation experiment that impairs one of the signaling branches in the Drosophila wing disc, but the comparison is only qualitative, as further experiment-oriented work is planned in a separate paper.

      Strengths

      There is a clear statement of objectives, model, and how the model is evaluated. In particular, the objective is to find which number of receptor types and which receptor concentrations, for a given number of tiers and feedback types, result in the most accurate positional readout. The employed optimization procedure is capable of finding signalling architectures that achieve a positional precision of one cell diameter over most of the tissue, except for the 3-4 cells at the tissue end farthest from the morphogen source. This demonstrates that employing additional complexity in signal processing results in a very accurate positional readout, comparable with estimates of positional precision obtained in other developmental systems (Petkova et al., Cell 2019, Zagorski et al., Science 2017).

      The optimal signalling architectures indicate that both signalling (specific) and non-signalling (nonspecific) receptors affect the precision of positional readout, but the contributions of each type of these receptors are qualitatively different. Even slight perturbation of signalling receptors drives the system out of optimum, resulting in a decrease in positional precision. In contrast, the non-signalling receptors could accommodate much larger perturbations. This observation could provide a biophysical explanation for how cross-talk between different morphogen species could be realized in a way that positional precision is kept at the optimum when morphogen signaling undergoes extrinsic and intrinsic perturbations.

      Last, the model formulation allows one to specifically address perturbations of signalling and feedbacks, which could be explored to validate model predictions experimentally in the Drosophila wing disc, but also in other developmental tissues. The authors present a proof of concept by showing that the variation of output profiles in two-tier, two-branch architectures with the non-signaling branch removed is consistent with the intensity profiles of Wg in wing discs in which the CLIC/GEEC endocytic pathway was perturbed.

      Weaknesses

      The list of model parameters is long, including more than 20 entries for two-tier, two-branch architectures. This is expected, as the aim of the model is to describe a sophisticated signalling architecture mimicking the biological system. However, this also makes it very challenging or impossible to provide guiding principles or an understanding of the system behaviour over the complete space of signalling architectures that optimize positional readout. Although the employed optimization procedure finds solutions that exhibit very high positional accuracy, there is only a very limited notion of how these solutions depend on variation of the different parameters. The authors do not address whether these solutions correspond to broad global optima in the space of all solutions, or were rather fine-tuned by the optimization procedure and are quite rare.

      It is unclear how contributions from intrinsic noise affect the system behaviour compared to contributions from extrinsic noise. In principle, the two-branch, one-tier architecture already results in a very accurate positional readout across the tissue. Adding another tier seems to provide only a very weak improvement over the one-tier solution. It is possible that, for the investigated signalling architectures, intrinsic noise affects the system only mildly compared with extrinsic noise. Hence, it is difficult to assess whether the claim of reducing intrinsic noise by adding another tier is supported by the presented data, as intrinsic noise could overall affect the positional readout only very weakly.

      The optimal responses of the channel to extrinsic and intrinsic noise are very distinct. As correctly noted by the reviewer, an additional tier provides only a marginal improvement in the inference error due to extrinsic noise (compare Fig.7 and Fig.8 in the revised manuscript). However, as shown in Fig.9c of the revised manuscript (same as in the original manuscript), adding an extra tier provides a substantial improvement in the inference error due to intrinsic noise.

      References

      [1] Gašper Tkačik, Julien O Dubuis, Mariela D Petkova, and Thomas Gregor. Positional information, positional error, and readout precision in morphogenesis: a mathematical framework. Genetics, 199:39–59, 2015.

      [2] Mariela D Petkova, Gašper Tkačik, William Bialek, Eric F Wieschaus, and Thomas Gregor. Optimal decoding of cellular identities in a genetic network. Cell, 176:844–855, 2019.

      [3] Julien O Dubuis, Gašper Tkačik, Eric F Wieschaus, Thomas Gregor, and William Bialek. Positional information, in bits. Proceedings of the National Academy of Sciences, 110:16301–16308, 2013.

      [4] Anupama Hemalatha, Chaitra Prabhakara, and Satyajit Mayor. Endocytosis of Wingless via a dynamin-independent pathway is necessary for signaling in Drosophila wing discs. Proceedings of the National Academy of Sciences, 113:E6993–E7002, 2016.

      [5] Xinhua Lin. Functions of heparan sulfate proteoglycans in cell signaling during development. Development, 131:6009–6021, 2004.

      [6] Stephane Sarrazin, William C Lamanna, and Jeffrey D Esko. Heparan sulfate proteoglycans. Cold Spring Harbor Perspectives in Biology, 3(7):a004952, 2011.

      [7] Catherine A Kirkpatrick, Sarah M Knox, William D Staatz, Bethany Fox, Daniel M Lercher, and Scott B Selleck. The function of a Drosophila glypican does not depend entirely on heparan sulfate modification. Developmental Biology, 300(2):570–582, 2006.

      [8] Mariana I Capurro, Ping Xu, Wen Shi, Fuchuan Li, Angela Jia, and Jorge Filmus. Glypican-3 inhibits Hedgehog signaling during development by competing with Patched for Hedgehog binding. Developmental Cell, 14(5):700–711, 2008.

      [9] Kenneth M Cadigan, Matthew P Fish, Eric J Rulifson, and Roel Nusse. Wingless repression of Drosophila frizzled 2 expression shapes the Wingless morphogen gradient in the wing. Cell, 93(5):767–777, 1998.

    1. Author Response

      Reviewer #2 (Public Review):

      Strengths:

      This is potentially a very large and robust dataset of spinal stimulation while the animal performs a wrist torque task. However, the authors do not detail the number of trials obtained for each combination of conditions - stimulation location, current intensity, movement direction, number of repetitions, etc.

      We have provided additional tables summarizing the collected data (Tables 1 and 2 in Supplementary File 1). Each experiment consisted of 63-1004 successful trials that were evenly distributed across the 8 task targets. We describe this in the text on lines 823-824. However, because we present the evoked muscle responses and evoked torques as stimulus-triggered averages throughout the manuscript, we believe it is more important to report the number of stimuli used for averaging. We have therefore kept the number of stimuli in the typical examples of Figures 2A-C, 5A-C, 7B-D and 8A-C.

      Lines: 823-824 “Each experiment consisted of 63-1004 successful trials (Table 2 in Supplementary File 1).”
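      Because responses are reported as stimulus-triggered averages (StTAs), the number of stimuli entering each average is the relevant sample size. A minimal sketch of a rectified-EMG StTA that also returns that count is given below; the window lengths, the rule of discarding windows that run past the end of the record, and the toy signal are assumptions for illustration and are not taken from the authors' pipeline.

```python
def stimulus_triggered_average(signal, stim_indices, pre, post, rectify=True):
    """Average signal windows [t - pre, t + post) around each stimulus sample t.

    Returns (average, n_stimuli). Windows that would run off either end of the
    record are skipped, so n_stimuli may be smaller than len(stim_indices).
    """
    x = [abs(v) for v in signal] if rectify else list(signal)
    windows = [x[t - pre:t + post] for t in stim_indices
               if t - pre >= 0 and t + post <= len(x)]
    if not windows:
        raise ValueError("no complete windows around the given stimuli")
    n = len(windows)
    average = [sum(w[i] for w in windows) / n for i in range(pre + post)]
    return average, n

# Toy EMG record: a response one sample after each stimulus (signs removed by rectification)
emg = [0.0] * 20
emg[6], emg[13] = -2.0, 4.0
sta, n_stim = stimulus_triggered_average(emg, [5, 12, 19], pre=2, post=3)
print(sta, n_stim)  # [0.0, 0.0, 0.0, 3.0, 0.0] 2
```

The stimulus at sample 19 is dropped because its post-stimulus window extends beyond the record, so only two stimuli contribute.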

      Weaknesses:

      The authors' primary conclusion is that spinal stimulation at moderate current intensities facilitates the effects of descending inputs of the motor command. However, the authors need to expand on:

      i. The effect of these intensities of spinal stimulation on their own; without voluntary movement.

      ii. The robustness of the interactions observed.

      We have added the stimulus-induced muscle responses (Figures 2A-C, 5A-C and 6A-D) and stimulus-induced torques (Figure 7B-D) recorded during the hold period for the center target (i.e., during awake rest). These data allowed us to quantify the PStEs and evoked torques without the effect of intended torque production. Facilitation PStEs and evoked torques were clearly observed. However, Suppression PStEs were difficult to observe at rest, because detecting suppression requires substantial ongoing voluntary muscle activity that can be inhibited. The robustness of the interaction was demonstrated by the modulation of PStEs and evoked torques from awake rest to voluntary torque production. We have added further discussion of this point as follows:

      Results Lines 126-131 “The PStEs during the entire period of the task (insets on Figure 2A-C) showed either post-stimulus facilitative (Facilitation, insets on Figure 2A and C) or suppressive effect (Suppression, inset on Figure 2B). Spinal stimulation occasionally produced small magnitude of Facilitation during the hold period for the center target where the voluntary wrist torque production was not intended (center panels on Figure 2A). However, different magnitudes and/or types of PStEs were observed among the directions of voluntary torques (Figure 2A-C).”

      Lines 143-146 “Especially in PStEs of Facilitation, the magnitude of PStEs in the peripheral target close to the PD of background EMG (Figure 2A, 270° and 315°) was generally larger compared with that in the center target and smaller in the peripheral target opposite to the PD (Figure 2A, 90° and 135°).”

      Legend of Figure 2A-C Lines 170-171 “Muscle responses to spinal stimulation during the hold period for the 8-peripheral (peripheral panels) and the center targets (center of peripheral panels).”

      Results Lines 356-357 “Left insets and gray dots in right panels (Figure 5A-C) show the PStEs and background EMGs during hold period for the center target.”

      Legend of Figure 5A-C Lines 368-372 “The leftmost insets show PStEs during the hold period for the center target. The rightmost panels for each muscular condition show two-sided Pearson’s correlation coefficients between the magnitudes of background EMGs and PStEs. Gray dots in right panels indicate the result during the hold period for the center target that were not included for the correlation analyses.”

      Results Lines 394-402 “PStEs during the hold period for the center target increased as current intensity increased, showing a simple input-output property of stimulus-induced muscle responses (“Center target”, insets on Figure 6A-D). In general, including the hold period for the center target, the magnitudes of PStEs at low stimulus currents was linearly increased depending on the magnitudes of background EMGs (Figures 5A-C and 6A). However, the magnitudes of PStEs of Facilitation at medium currents were often larger during hold period for the center target (Figure 6B and C insets) compared to that during voluntary torque production even though the magnitude of background EMG was identical between them (Figure 6B and C, rightmost panels).”

      Legend of Figure 6A-D Lines 419-423 “The leftmost insets show PStEs during hold period for the center target intended to relax the wrist. The rightmost panels indicate two-sided Pearson’s correlation coefficients between the magnitudes of background EMGs and PStEs. Gray dots in right panels indicate the result during hold period for the center target that were not included for the correlation analyses.”

      Results Lines 452-460 “In another case, spinal stimulation at 300 μA mainly induced Facilitation effects on muscles with higher background EMG (outer peripheral panels in Figure 7C and Figure 7-figure supplement 1B), and the directions of the Evoked Torque were similar to the directions of voluntary torque independent of the direction of the Evoked Torque at the center target (center and inner peripheral panels in Figure 7C). Stimulation at 1700 μA exhibited large magnitudes of Facilitation in all muscles for all targets (outer peripheral panels in Figure 7D and Figure 7-figure supplement 1), and the Evoked Torques displayed ulnar-flexion directions regardless of the presence/absence or the direction of voluntary torque (center and inner peripheral panels in Figure 7D).”

      Legend of Figure 7B-D Lines 487-489 “StTAs of rectified EMGs (outer peripheral panels and center-bottom panel) and StTAs of wrist torque trajectories (inner peripheral panels and center-top panel).”

      Discussion Lines 669-680 “Compared with the hold period for the center target, the stimulus-induced muscle responses and torques at low to medium currents were generally more pronounced during the hold period for the peripheral targets (Figure 2A-C, Figure 7B and C, and Figure 7-figure supplement 1), indicating that the descending commands augmented activation in the spinal motoneurons and interneurons driven by spinal stimulation. Interestingly, at medium currents, the stimulus-induced facilitatory responses were sometimes smaller when the responses were recorded in the antagonistic muscles against the wrist torque direction regardless of the background EMG activity (Figure 2A and Figure 7-figure supplement 1B), suggesting that spinal reciprocal inhibitory function was evolved by the descending commands (Meunier and Pierrot-Deseilligny, 1998). Together, our findings indicate that voluntary commands amplify the functions of spinal circuits, including excitatory and inhibitory synaptic connections to motoneurons activated by spinal stimulation.”
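      The correlation analysis described in the figure legends above can be sketched as follows, including the exclusion of center-target (rest) points from the fit. Pearson's r is computed from its definition; the data points and target labels are invented for illustration. For the two-sided p-value, one option is scipy.stats.pearsonr, whose default alternative is two-sided; it is not used here so that the example has no dependencies.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient from its definition."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (target, background EMG, PStE magnitude) triplets for one muscle
points = [
    ("0", 10.0, 22.0), ("45", 14.0, 30.0), ("90", 4.0, 9.0), ("135", 3.0, 7.0),
    ("180", 6.0, 12.0), ("225", 12.0, 25.0), ("270", 20.0, 41.0), ("315", 18.0, 37.0),
    ("center", 1.0, 15.0),  # rest (gray dot): plotted but excluded from the fit
]
peripheral = [(bg, pste) for target, bg, pste in points if target != "center"]
r = pearson_r([bg for bg, _ in peripheral], [p for _, p in peripheral])
print(f"r = {r:.3f} (n = {len(peripheral)})")
```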

      Specific comments:

      1) Interpretation of the main result - The authors state that they investigated the "effect of descending inputs on the stim-evoked EMG and torque output". But their experimental design, which compares post-stim EMG to pre-stim EMG, provides a somewhat different result, i.e., the effect of spinal stimulation on voluntarily-evoked EMG and torque output. In other words, the voluntary output is held constant (controlled variable) and the spinal stimulation parameters are varied (independent variable).

      To test what the authors state, the design would have to be modified so that post-stim muscle activity recorded with the wrist in the neutral position is compared with that recorded in one of the holding states, or post-stim muscle activity when the arm is passively torqued is compared with that when it is voluntarily torqued.

      In our study, we compared pre-stim EMG and post-stim EMG in order to determine the presence/absence and the polarity (facilitation/inhibition) of PStEs. Our main aim in this study was to investigate the effect of descending commands (voluntary output) on the stimulus-evoked responses, and we concluded that the descending commands influence the spinal interneuron activity elicited by spinal stimulation. The motor task required control of the direction and magnitude of wrist torque so that we could manipulate the magnitude of descending commands, expressed as the background EMG activity of each muscle. Therefore, the finding that PStEs were modulated by variation in background EMG indicates that the descending commands influence PStEs.

      In the revised version of the manuscript, we present additional data on PStEs and evoked torque while the wrist remained in the neutral position (i.e., during awake rest) to address your comment.

      2) Most of the studies that have demonstrated the benefits of spinal stimulation, esp. in humans, have used sub-threshold stimulation. The manuscript does not provide direct information regarding the threshold of stimulation. Only Table 2 provides such information, but the data collection paradigm is so different from the actual task that it is difficult to make a relevant connection.

      • Why was the stimulation protocol under sedation different from during the wrist torque task? It would be really useful to describe the kind of involuntary movements evoked at different current intensities at the different spinal levels in awake, behaving animals. For instance, the higher amplitudes appear to just lock the arm into a full ulnar deviation. Such current intensities would be unlikely to be effective in enhancing movement in spinal cord injury. Thus, all the results for these amplitudes are somewhat irrelevant to therapeutic intervention. Similarly, does the moderate amplitude generate movement or muscle contraction?

      The stimulus-evoked muscle responses changed in size depending on many variables, such as stimulus intensity, torque direction (i.e., voluntary muscle pre-activation in combination with the activity of other muscles), and the recorded muscle. The stimulus threshold for each facilitatory and inhibitory effect varies with these variables. Therefore, we did not aim to measure stimulus thresholds independently. However, it was essential to map the spinal somatotopic representation in relation to the site of the stimulating electrode for the experiment in Figure 4. Therefore, we delivered spinal stimulation through each electrode channel under anesthesia in order to capture muscle representation without concomitant voluntary descending drive in the intact monkey.

      We agree with the reviewer that it is important to obtain stimulus-evoked torques at each stimulus intensity while the wrist torque is neutral in the awake monkeys. Accordingly, we present data on stimulus-evoked muscle responses and torques at low, moderate, and high stimulus intensities while the monkeys held the cursor on the center target in Figures 2 and 7 (see the responses to previous comments).

      3) Please explain the term Spinal PD.

      Does the PD of the background EMG remain the same irrespective of the current intensity and site of stimulation? There is a decrease in background EMG amplitude in Fig. 2A and B with increasing stim amplitude. Can the authors please discuss this observation and how it would affect the efficacy of the spinal stimulation in facilitating descending inputs?

      Spinal PD is the preferred direction (PD) of facilitative evoked muscle responses (Facilitation) or suppressive evoked muscle responses (Suppression), calculated separately from the data obtained during the hold period for the peripheral targets. We added this explanation to the text (lines 146-149) and the legend of Figure 2D (lines 183-185).

      The amplitude of the background EMG changed with increasing current intensity, as the reviewer pointed out. Hence, it is possible that the large ulnar-flexor torques due to the high stimulus currents had somewhat direction-biased effects on the required voluntary effort (i.e., for ulnar-flexor targets, fewer voluntary commands to ulnar-flexor muscles might be required with the support of the stimulus-evoked torque, whereas for radial-extensor targets, more voluntary commands to radial-extensor muscles might be required against the opposing stimulus-evoked torque). Nevertheless, we confirmed that the PD of the background EMG was consistent irrespective of the current intensity and stimulus site, as presented in Figures 3A, 3B, 4B, and 4C (green polar plots). In addition, we showed that the Spinal PD at high current was even opposite to the PD of the background EMG, indicating that the magnitude of background EMG can hardly explain the differences in the results between low to medium and high stimulus currents.

      Results Lines 146-149 “Significant PDs were observed in the 603 muscular conditions in 16 muscles for Facilitation (Spinal PD of Facilitation), 333 muscular conditions in 16 muscles for Suppression (Spinal PD of Suppression), and 1006 muscular conditions in 16 muscles for background EMG.”

      Legend of Figure 2D Lines 183-185 “Spinal PD (top panels) and Background EMG PD (middle panels) show the PDs calculated from the magnitudes of Facilitation or Suppression of PStEs and from the magnitudes of background EMG activity, respectively, during the hold period for the peripheral targets.”
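      A preferred direction of this kind is often estimated as the angle of the vector sum of response magnitudes across target directions. The exact procedure is not given in this response, so the Python sketch below is only an illustration under stated assumptions: the helper name `preferred_direction`, the plain vector-sum method, and the eight equally spaced target angles are hypothetical, and the significance test implied by "Significant PDs" is not included.

```python
import math


def preferred_direction(angles_deg, magnitudes):
    """Estimate a preferred direction (degrees, in [0, 360)) as the angle of
    the vector sum of response magnitudes across target directions.

    Hypothetical helper: the vector-sum method is an assumption, not
    necessarily the procedure used in the manuscript.
    """
    if len(angles_deg) != len(magnitudes):
        raise ValueError("angles_deg and magnitudes must have the same length")
    # Project each target's response magnitude onto x and y components.
    x = sum(m * math.cos(math.radians(a)) for a, m in zip(angles_deg, magnitudes))
    y = sum(m * math.sin(math.radians(a)) for a, m in zip(angles_deg, magnitudes))
    # atan2 gives the angle of the summed vector in (-180, 180]; wrap to [0, 360).
    return math.degrees(math.atan2(y, x)) % 360.0


# Hypothetical example: 8 peripheral targets, responses largest at 90 degrees.
targets = [0, 45, 90, 135, 180, 225, 270, 315]
facilitation = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0]
print(round(preferred_direction(targets, facilitation), 1))  # 90.0
```

      Under this assumption, a Suppression PD would be computed the same way by passing suppression magnitudes as positive values; the same function applied to background EMG magnitudes would give a Background EMG PD for comparison.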

      4) Line 546 - The authors speculate that higher current intensities resulted in direct activation of motoneurons. While this is certainly possible, do the authors see proof of this in their data, e.g., in latency measurements?

      We newly analyzed the onset latency of PStEs (Figure 8) and added the related descriptions to the Results, Discussion, and Materials and Methods of the revised manuscript. Please refer to the response to the 2nd comment from Reviewer 1. The shortening of latency at high currents supports our statement that higher current intensities result in direct activation of ventral root axons.

      5) Line 589 - "However, in the rostrally-innervated muscles, the PDs for facilitation effects from caudal sites were opposite to those for background EMGs (Figure 4G, bottom-left panel), suggesting the direct activation of motor nerves." Can the authors clarify how they infer direct activation of motoneurons from the discrepancy between spinal PD and background EMG PD?

      We revised the Discussion as follows:

      Lines 702-710 “However, an exception was observed in some cases of rostrally-innervated muscles that showed facilitation effects. The Spinal PDs for facilitation in the rostrally-innervated muscles from caudal sites were opposite to those for background EMGs (Figure 4G, bottom-left panel). The magnitude of these responses was quite small (Figure 4E, left panel), but these responses resembled the response at higher current (Figure 3F, lower panel). These results suggest that some motoneurons of rostrally-innervated muscles may not receive excitatory ascending inputs from afferents of the caudal part of the spinal site. Although there is a considerable distance between them, current delivered to the caudal site might spread to ventral roots of rostrally-innervated muscles.”

      • I wonder why the authors did not look at the effect of spinal stimulation-evoked EMG and torque during the movement of the cursor? This could be used to determine the parameters that improve the performance of the task, by either increasing the speed or decreasing the effort required to perform the task.

      As the aim of this study was to reveal the fundamental effects of descending commands on stimulus-evoked responses, we systematically and quantitatively explored evoked motor outputs, but did not directly investigate how spinal stimulation improves task performance, which would be needed to suggest a therapeutic intervention.

      In the analyses shown in Figure 7, we present evoked torques rather than cursor movement, and conclude that the magnitudes and directions of evoked torque change depending on the current intensity and the direction of voluntary torque production.

      • I wonder if the current dataset allows the generation of a map that shows the lower and upper limits of current intensity that result in facilitation of descending inputs for each muscle, at each stimulation location. Additionally, is this map stable across days/sessions?

      In the present study, we showed that descending commands amplified the functions of intraspinal neural elements regardless of stimulus site (Figures 4G and H). In addition, we revealed that currents of 150-1350 μA boosted torque production in a direction corresponding to the direction of voluntary torque production (Figure 7C and F).

      Since it took many days to obtain these data under various stimulus conditions (stimulus current and site), we could not compare motor outputs to spinal stimulation under the same stimulus condition across days/sessions. Future studies will be needed to investigate the stability of motor outputs. We have added this issue to the Discussion as follows:

      Lines 750-752 “However, the effectiveness of subdural stimulation in controlling dexterous hand movements and the long-term stability of motor output need to be determined in future studies.”

      Reviewer #3 (Public Review):

      1) To characterize the effects of stimulation, stimulation was first delivered during an anesthetized experiment to map the evoked responses from each electrode. A major result of the paper is that the level of background activity affects the response to stimulation. It would be interesting to see these baseline responses to stimulation in awake monkeys while they were sitting quietly and not attempting a task to see if these align well with the anesthetized responses.

      As we had similar comments from Reviewer 2, we present additional data on the evoked muscle responses and evoked torques during the hold period for the center target, where wrist torque production was not intended, in awake monkeys (Figure 2A-C, Figure 5A-C, Figure 6A-D and Figure 7B-D). These data support our results that descending commands amplify the function of intraspinal elements. Please refer to the response to the 2nd comment from Reviewer 2 for the revisions to the text.

      On the other hand, the currents and frequencies of subdural spinal stimulation used in the anesthetized monkeys were different from those in awake monkeys. Thus, we could not compare the evoked motor outputs between anesthetized and awake conditions in the present study.

      2) To understand the coordinated effects of stimulation across muscles, the authors present wrist torque data in Figure 7. These data are certainly important from a functional perspective and provide some information about coordination, but additional detail about coordination across muscles would be helpful throughout the paper. Currently, most of the results are presented on a per-muscle basis but don't describe whether there were (un)coordinated responses across muscles. For example, was there co-contraction of agonists or antagonists during stimulation? Increased activity of multiple antagonists could potentially lead to increased joint stiffness or fatigue without resulting in an increase in joint torque at the wrist.

      As you suggested, the inter-muscular relationship is another important aspect of the coordination of forearm muscles. Based on our data, the monkeys properly engaged each muscle as an agonist, in line with its anatomical constraints. We found antagonistic voluntary contraction to be quite rare or mostly non-dominant, even during high-intensity electrical stimulation, suggesting that the stimulus-evoked responses of each muscle were independent of the voluntary activation (i.e., background EMG) of antagonistic muscles. We added these results in Figure 7-figure supplement 1 and the related descriptions in the text as follows:

      Results Lines 460-467 “During the 8-directional torque task, the monkeys properly engaged each muscle as an agonist (Figure 7-figure supplement 1). We found that antagonistic voluntary contractions were quite rare or mostly non-dominant, even during high-intensity electrical stimulation. There was a tendency for the magnitude of PStEs to be stronger in agonists and weaker in antagonists at low and medium currents (Figure 7-figure supplement 1A and B). On the other hand, stimulation at high currents tended to induce large facilitation effects for all targets irrespective of agonists and antagonists (Figure 7-figure supplement 1C).”

      Legend of Figure 7-figure supplement 1 Lines 1179-1189 “Figure 7-figure supplement 1. Subdural spinal stimulation simultaneously evoked facilitative and suppressive effects in multiple muscles and activated synergistic muscle groups. (A-C) StTAs of rectified EMGs in five wrist muscles during the hold period for the center and the 4 peripheral targets. Each polar plot was normalized by the maximum value of each muscle. Each example in (A-C) corresponds to the cases of Figure 7B-D, respectively. At low and medium currents of stimulations, large magnitudes of PStEs were observed in the muscles with high background EMG. For instance, stimulation given at the flexion directed target in (B) strongly facilitated wrist flexor muscles (e.g., FCR, PL and FCU), while stimuli at the extension directed target strongly facilitated wrist extensor muscles (e.g., ECR and ECU). On the other hand, at high current of stimulation, the magnitudes of PStEs hardly changed regardless of the magnitudes of background EMGs and the directions of voluntary torque.”

      Discussion Lines 644-650 “The inter-muscular relationship characterized by the PDs of background EMGs in the wrist muscles (Figure 7-figure supplement 1) demonstrates that the monkeys consistently engaged each muscle as an agonist, and that antagonistic voluntary contractions were rare irrespective of stimulus current (see polar plots of background EMGs in Figure 7-figure supplement 1A-C). This result indicates that the presumed differences in activation of spinal excitatory and inhibitory interneurons at different current intensities cannot be attributed to a change in wrist torque production strategy.”

      3) The authors infer from the consistent ulnar wrist torque during high-amplitude stimulation that these responses are likely due to direct activation of the ventral motor pathway rather than activation through the dorsal sensory pathway and spinal circuitry. Is there any evidence in the EMG data (e.g. decreased latency, more consistent pulse-to-pulse amplitude of evoked EMG responses) to further support this finding?

      We added the results on the onset latency of PStEs as Figure 8, and the related descriptions in the Results, Discussion, and Materials and Methods. The decreased latency at high stimulus currents supports our argument that the stimulus-evoked muscle responses at high currents resulted from direct activation of ventral motor pathways. Please refer to the response to Reviewer 1 for the revisions to the text.

  4. Oct 2022
    1. Author Response

      Reviewer #1 (Public Review):

      Overview

      In this work, the authors set out to study the effects of topographic connectivity in a hierarchical model of neural networks. They hypothesize that the topographic connectivity often observed in cortical networks is essential for signal propagation and allows faithful transmission of signals. To study the effects of topographic connectivity on the dynamics, the authors consider a network composed of several layers. Each layer is a recurrent neural network with excitatory and inhibitory sub-populations. The excitatory neurons in each layer innervate a sub-population of the following layer. The receiving excitatory sub-population targets a specific group in the next layer and so on. This procedure leads to separate channels that carry the inputs through the network. The authors study how the degree of specificity in each targeted projection, called 'modularity,' affects signal propagation through the network.

      The authors find that the network reduces noise above a critical level of network modularity: the deep layers show a clear separation of an active channel and inactive channels, despite the noisy input signal. They study how different dynamical and structural properties affect the signal propagation through the network layers and suggest that the dynamics can implement a winner-takes-all computation.

      We thank the reviewer for the concise summary of our work.

      Strengths and novelty

      Topographic projections, in which sub-populations of neurons target specific cells in efferent populations, are common in the central nervous system. The dynamical and computational benefits of this organization are not fully understood. With their simple model, the authors were able to quantify the amount of topographic structure and selectivity in the network and study its impact on the network’s steady state. In particular, a bifurcation point suggests a qualitative difference between networks with and without sufficient topographic modularity. The theoretical analysis in the paper is rigorous, and the mean-field study shows good agreement with computer simulations of the model.

      We thank the reviewer for acknowledging the rigor of our work both in terms of theory and simulations.

      The authors describe simulation results of networks with different dynamical properties, including rate-based networks, integrate-and-fire neurons, and more realistic conductance-based spiking neurons. All simulations exhibit similar qualitative behavior, supporting the conclusion that the behavior due to structural modularity will carry over to more complex and biologically relevant neural dynamics.

      Overall, the authors convincingly show that the topographic structure of the network can lead to noise reduction, given that the input to the network is provided as distinct channels.

      Weaknesses

      The authors support their hypothesis and show a relation between topographic connection and noise reduction in their model. However, I find the study limited and struggle to see the impact it will have on the field. The paper is purely theoretical; it does not provide any physiological evidence that supports the conclusion. On the other hand, and this is the key issue, I do not find real theoretical insights in this work. In the following, I elaborate on why I hold this opinion.

      We understand the reviewer’s point and therefore significantly extended our theoretical results and their conclusions in the revised manuscript (see below). We are confident that the revised manuscript provides the theoretical insights that the reviewer was asking for.

      The hypothesis is that topographic projections in cortical areas allow faithful signal propagation. However, as the authors point out, reliable transmission can be achieved in other ways, such as by direct routing of information (lines 17-19). Furthermore, denoising can be accomplished by a simple feedforward network (e.g., ref 38) without E/I balance and with plasticity rules that do not require topographic connectivity. Thus, I find the computational model not well motivated.

      The reviewer mentions an important point that has not been sufficiently addressed in the previous version, namely the distinguishing feature of our model. Direct routing is indeed a simple way to transmit signals, but without the possibility of denoising them. The reviewer is also right that the denoising solution in the work by Kadmon and Sompolinsky (ref 38) does not require any topographic connectivity. However, their model does not constrain feedforward connections between layers in any way. In particular, neurons can excite and inhibit other neurons (i.e., ignoring Dale’s law) in downstream layers so that feedforward input covers a much wider range, thereby extending the activity range of the target neurons and generating fixed points more easily. In the biologically more plausible setting that we study (excitatory and inhibitory populations, excitatory background input and excitatory feedforward connectivity), we find that recurrent inhibition is crucial to compensate for the excitation from previous layers and the external input. Only if the recurrent inhibition is sufficiently strong does the topographic organization of feedforward connections enable denoising. This is addressed in a new section “Critical modularity for denoising” of the revised manuscript, where we also study the cases of no recurrent connectivity and excitatory recurrent connectivity (for further details, see answers below). We further extended our discussion of other forms of signal transmission and denoising (see lines 489-498).

      The task studied here is a simple classification of static inputs: the efferent readout needs to identify the active channel. Again, this could be achieved by a single layer of simple binary neurons [Babadi and Sompolinsky 2014]. The recurrent connectivity and E/I balance suggest that dynamics should play an essential part in the model. However, the task is not well suited for understanding the role of dynamics.

      We appreciate the reviewer’s comments and completely agree. The simple classification task we explored can certainly be performed by simpler network architectures, such as the one studied in Babadi and Sompolinsky. However, as discussed above, this only works if the feedforward connectivity is unconstrained. In the case of Babadi and Sompolinsky, there is an expansion of inputs into a higher-dimensional space through random connectivity drawn from a centered Gaussian distribution and appropriately chosen readout weights. This scenario is not compatible with the well-established biological constraints mentioned above that our model takes into account. In the new section “Critical modularity for denoising” of the revised manuscript we show that recurrent inhibition is necessary to enable signal transmission and denoising under these constraints. The inhibition thereby not only generates competition between input channels but also allows the modules to track their input very rapidly (as originally demonstrated by van Vreeswijk and Sompolinsky in 1996). To demonstrate this point and emphasize the relevance of dynamics, we added a new signal reconstruction task in the new section “Reconstruction and denoising of dynamic inputs”, where we show that our model can faithfully track and denoise spatially encoded time-varying inputs.

      The authors perform a mean-field study to explain how modularity affects signal propagation. At the heart of their argument is that the E/I network exhibits bistability. However, bistability can be achieved by an excitatory population with a threshold [Renart et al., 2013]. The role of the inhibitory population does not seem crucial for the task, which calls into question the motivation for this analysis.

      We thank the reviewer for raising this important point, which we address in the section “Critical modularity for denoising” of the revised manuscript. The reviewer is correct that bistability can be obtained in a purely excitatory network, and the modular topographic connectivity in our work essentially renders the stimulated pathway excitatory. The important feature of our model, however, is that the non-stimulated pathways remain inhibitory, which distinguishes stimulated from non-stimulated populations and enables denoising. This is only achieved by recurrent inhibition that causes competition between pathways. Our analyses show that, for networks without recurrent connections or with excitatory recurrent connections, the network lacks mechanisms to compensate for the excitatory feedforward and external background input. In these cases, all populations show high (and synchronous) activity and no classification or denoising can be achieved. Therefore, the revised manuscript unambiguously demonstrates the critical role of recurrent inhibition.

      Active and inactive channels are determined by the two stable states of the network: the high and the low activity regimes. However, noise fluctuations and their propagation through the network may have a prominent role in the overall dynamics. I find that a noise fluctuation analysis is conspicuously missing in this work.

      Fig. 7b of the previous version showed the stability of theoretically predicted fixed points using numerical fluctuation analysis around the fixed points. We apologize for not having made this sufficiently clear, and have therefore updated the caption of Fig. 7 to emphasize this point and extended the subsection “Fixed point analysis” of the Methods detailing our approach. Furthermore, we fully agree with the reviewer that fluctuation analyses are important to understand the dynamics of our system. Therefore, we performed a theoretical fluctuation analysis in the new Figure 8 and the extended Appendix B of the revised version. This extended theory shows that competition induced by recurrent inhibition stabilizes the low activity state of non-stimulated sub-populations such that fluctuations cannot build up and propagate across layers, in line with the previously presented numerical simulation results.

      The main finding is a critical level of modularity, m = 0.83, above which the network shows denoising properties of silencing inactive channels and increasing the mean activity of active ones. However, the critical modularity is demonstrated numerically and is not derived theoretically. For a theoretical insight into this transition between denoising and mixing properties of the network, I would have liked to see a more rigorous discussion of the critical value. What does the critical point depend on? The authors show that the single-neuron dynamics do not affect the critical value, but what about other structural elements such as the relative efficacies of the E/I and the feedforward connectivity matrices? Do the authors suggest that m = 0.83 is a universal number? I expect a more detailed analysis and discussion of this core issue in a theoretical paper.

      We fully agree with the reviewer and are grateful that this point was brought up. The initial submission did not provide a sufficient or deep enough discussion of which features determine the critical modularity, and it certainly is important to do so. We also apologize that our presentation was misleading and suggested a universal number for the critical modularity. Unfortunately, there is no closed-form expression for the critical modularity for the non-linear activation functions shown in the previous version. We therefore added a new analysis with a fully tractable piecewise linear activation function that allows us to derive a closed-form solution for the critical modularity. The new section “Critical modularity for denoising” and Appendix B show the results of this analysis and discuss the various parameters that affect the value of the critical modularity. In short, the reviewer was completely right that the critical modularity depends on a number of connectivity parameters as well as single-neuron properties. In particular, our theoretical results show that recurrent inhibition is crucial for denoising.

      To conclude my main criticism, I believe that a theoretical paper should offer a more in-depth analysis and discussion of the core ideas presented and not rely mainly on simulations. For example, to provide theoretical insight, the authors should address central questions such as the origin of the critical modularity, the role of the balanced recurrent connectivity, and how the network can facilitate computations other than winner-takes-all among channels. Alternatively, if the authors aim to describe a neural dynamics model without deep theoretical insights, I would expect to see physiological evidence supporting the suggested dynamics.

      We are very grateful for the reviewer’s criticism and believe the manuscript has substantially improved as a consequence. We are confident that our revised manuscript, by addressing these issues and extending the theoretical insights, now provides a much more thorough and comprehensive understanding.

      Conclusions

      The model studied by the authors is novel and provides a valuable way of exploring the effects of modularity and topographic connectivity on signal propagation through hierarchical recurrent neural networks. However, the study lacks theoretical insights into cortical circuit functions in its current version. I believe that for this work to impact the field, it needs to show further analysis and not rely on a numerical study of the model with limited theoretical derivations.

      Reviewer #2 (Public Review):

      This manuscript puts forward a new idea that topography in neural networks helps to remove noise from inputs. The neural network consists of multiple stages. At each stage, the network is structured to be balanced in terms of the strength of inhibitory and excitatory signals. Because of topography, the networks become “dis-balanced” and receive more recurrent excitatory signals locally in those regions that receive strong initial inputs. This leads to error correction. The main weakness in the manuscript is that the approach will only work for inputs that are constant in time. It is important to acknowledge this limitation in both the title and throughout the manuscript.

      We thank the reviewer for the concise summary of our work and for acknowledging its novelty. Given the importance of the issue raised by the reviewer regarding the nature of the input signals, in the revised manuscript we added a new section “Reconstruction and denoising of dynamic inputs” in which we investigate more complex, time-varying inputs and demonstrate that the model, due to the balance between excitation and inhibition, is able to quickly follow, process and denoise the external inputs. There are of course limits to the signal frequencies that can be successfully denoised, which we discuss in the Supplementary Materials (see Figure 10 - supplement 1) and elaborate on in the Discussion, but these are roughly within the ranges found in human psychophysics studies.

    1. Author Response

      Reviewer #1 (Public Review):

      In the article "Neuroendocrinology of the lung revealed by single cell RNA sequencing", Kuo et al. described various aspects of pulmonary neuroendocrine cells (PNECs), including the scRNA-seq profile of one human lung carcinoid sample. Overall, although this manuscript does not have a specific storyline, it is informative and would be an asset for researchers exploring various new roles of PNECs.

      Thank you for appreciating the significance of the data presented. Our storyline focuses on the newly uncovered molecular diversity of PNECs, the extraordinary repertoire of peptidergic signals they express, and the cell types these signals can directly target in (and outside) the lung, in mouse and human, and in health and disease (human carcinoid tumor).

      Major comments:

      The major concern about the work is that most results are preliminary and descriptive: conclusions and sub-conclusions are derived from scRNA-seq analysis alone, without in-depth functional analysis or validation by other methods or in other systems. Many open-ended results are predictions from the authors' scRNA-seq analysis that lack functional validation. To provide a constructive roadmap, it would be better to survey the literature and frame these predictions as potential or probable hypotheses, citing the available literature. This should be done in each section of the Results. The paper lacks a main theme or a specific biological question. In addition, the scRNA-seq description of the human lung carcinoid is somewhat disconnected from the main line of the study. Also, these results are derived from a single patient and therefore lack statistical power.

      We agree that much of the data and analysis presented in the paper is descriptive and hypothesis-generating for PNECs, however we do not consider it preliminary. We focused on validating two key conclusions from the scRNA-seq analysis: PNECs are extraordinarily diverse molecularly (as validated by multiplex in situ hybridization and immunostaining) and they express many different combinations of peptidergic signals (and appear to package them in separate vesicles). From the lung expression profiles of the cognate receptors, we also predicted the direct lung targets of the dozens of new PNEC peptidergic signals we uncovered, and validated the cell target (PSN4, a recently identified subtype of pulmonary sensory neuron) of one of the newly identified PNEC signals (the classic hormone angiotensin) by confirming expression of the cognate receptor gene in PSN4 neurons that innervate PNECs and showing that the hormone can directly activate PSN4 neurons. The characterized human carcinoid provided evidence that during tumorigenesis, the amplified PNECs retain a memory (albeit imperfect) of the molecular subtype of PNEC from which they originated. As suggested by the Reviewer, we have provided more background in Results by adding additional citations from the literature to clarify the rationale for each analysis and what was known prior to the analysis. We feel that our paper provides a broad foundation for exploring the diversity and signaling functions of PNECs, and although each molecular type of PNEC and new PNEC peptidergic signal we uncovered and potential target cell in (and outside) the lung warrants follow-up (as do the sensory and other properties of PNECs we inferred from their expression profiles), such studies will require the effort of many individuals in many labs studying both normal and disease physiology in mouse and human, and exploiting the data, hypotheses, approaches, and framework we provide.

      Reviewer #2 (Public Review):

      Pulmonary neuroendocrine cells (PNECs) are known to monitor oxygen levels in the airway and can serve as stem cells that repair the lung epithelium after injury. Due to their rarity, however, their functions are still poorly understood. To identify potential sensory functions of PNECs, the authors have used single-cell RNA-sequencing (scRNA-seq) to profile hundreds of mouse and human PNECs. They report that PNECs express over 40 distinct peptidergic genes, and over 150 distinct combinations of these genes can be detected. Receptors for these neuropeptides and peptide hormones are expressed in a wide range of lung cell types, suggesting that PNECs may have mechanical, thermal, acid, and oxygen sensory roles, among others. However, since some of these cognate receptors are not expressed in the lung, PNECs may also have systemic endocrine functions. Although these data are largely descriptive, the results represent a significant resource for understanding the potential roles of PNECs in normal biology as well as in pulmonary diseases and cancer and are likely to be relevant for understanding neuroendocrine cells in other tissue contexts.

      However, there are several aspects of the data analysis that are unclear and require clarification, most notably the definition of a neuroendocrine cell (points #1 and #2 below).

      1) Figure S1 shows the sorting strategy used for isolation of putative PNECs from Ascl1CreER/+; Rosa26ZsGreen/+ mice, and distinguishes neuroendocrine cells defined as ZsGreen+ EpCAM+ and "neural" cells defined as ZsGreen+ EpCAM-; the figure legend also refers to the ZsGreen+ EpCAM- cells as "control" cells. However, the table shown in panel D indicates that the NE population combines 112 ZsGreen+ EpCAM+ cells together with 64 ZsGreen+ EpCAM- cells to generate the 176 cells used for subsequent analyses. Why are these ZsGreen+ EpCAM- cells initially labeled as neural or control, but are then defined as neuroendocrine? If these do not express an epithelial marker, can they be rigorously considered as neuroendocrine?

      As explained above in the response to Essential Revision point 1, we define pulmonary neuroendocrine cells (PNECs) throughout the paper by their transcriptomic clustering and signatures, which include the dozens of newly identified PNEC markers as well as the few extant marker genes available before this study (listed in Table S2). The confusion here arises from the two previously known markers (Ascl1 lineage marker ZsGreen, EpCAM) we used for flow sorting to enrich for these rare cells for transcriptomic profiling (Fig. S1). Although most of the cells with PNEC transcriptomic profiles were from the ZsGreen^hi EpCAM^hi sorted population (as expected), some were from the ZsGreen^hi EpCAM^lo sorted population. The latter resulted from the high EpCAM gating threshold we used during flow sorting, which excluded some PNECs with intermediate levels of surface EpCAM. Indeed, nearly all PNECs (> 95%) expressed EpCAM by scRNA-seq, and there was no difference in EpCAM transcript levels or transcriptomic clustering between PNECs from the ZsGreen^hi EpCAM^hi vs. ZsGreen^hi EpCAM^lo sorted populations, as we now show in the new panels (C', C'') added to Fig. S1C. This point is now clarified in the legend to Fig. S1C, and it nicely demonstrates that transcriptomic profiling is a more robust method of identifying PNECs than flow sorting based on two classical markers.

      2) Similarly, in the human scRNA-seq analysis, how were PNECs defined? The methods description states that these cells were identified by their expression of CALCA and ASCL1, but does not indicate whether they also expressed epithelial markers.

      Human PNECs were identified in the single-cell transcriptomic analysis by the same strategy described above for mouse PNECs: by their transcriptomic clustering and signatures, which include the dozens of newly identified PNEC markers as well as the few extant marker genes available before this study (listed in Table S2). In addition to expressing classic and new markers, the human PNEC cluster defined by scRNA-seq indeed showed the expected expression of epithelial markers (e.g., EPCAM; see dot plot below), like other epithelial cells.

      3) The presentation of sensitivity and specificity in Figure 1 is confusing and potentially misleading. According to Figure 1B, Pcsk1 and Nov are two of the top-ranked differentially expressed genes in PNECs with respect to both sensitivity and specificity. However, the specificity of these two genes appears to be lower than that of Scg5, Chgb, and several other genes, as suggested in Figure 1C and Figure S1E. In contrast, Chgb appears to have higher specificity and sensitivity than Pcsk1 in Figures 1C and E but is not shown in the list of markers in Figure 1B.

      As explained above in the response to Essential Revision point 2, because different marker features are important for different applications, we have provided several different graphical formats (Figs. 1B,C, Fig. S1E) and a table (Table S1) to aid in selection of the optimal markers for each application. Fig. 1B shows the most sensitive and specific PNEC markers identified by the ratio of the natural logs of the average expression of the marker in PNECs vs. non-PNEC epithelial cells (Table S1), and we have added a two-dimensional plot of this sensitivity and specificity for a large set of PNEC markers (new panel E of Fig. S1). The violin plots in Fig. 1C allow visual comparison of expression of selected markers across PNECs and 40 other lung cell types including non-epithelial cells (from our extensive mouse lung atlas in Travaglini, Nabhan et al., Nature 2020). Pcsk1 and Nov score high in the analysis of Fig. 1B because they are highly sensitive and specific markers within the pulmonary epithelium, and they are also valuable markers because they are highly expressed in PNECs. However, they appear slightly less specific in the violin plots of Fig. 1C (Pcsk1) and Fig. S1F (Nov) because of expression (though at much lower levels) in individual lung cell types outside the epithelium: Pcsk1 is also expressed at low levels in some Alox5+ lymphocytes, and Nov is expressed at low levels in some smooth muscle cells. Chgb is a new PNEC marker that did not make the cutoff for the list in Fig. 1B because it is expressed in a slightly higher percentage of non-PNEC epithelial cells than the markers shown, which ranked slightly above it by this metric (see Table S1).

      4) The expression of serotonin biosynthetic genes in mouse versus human PNECs deserves some comment. The authors fail to detect the expression of Tph1 and Tph2 in any of the mouse PNECs analyzed, but TPH1 is expressed in 76% of the human PNECs (Table S8). Is it possible that Tph1 and Tph2 are not detected in the mouse scRNA-seq data due to gene drop-out? If serotonin signaling by mouse PNECs is due to protein reuptake, as implied on p. 5, is there a discrepancy between serotonin expression as detected by smFISH versus immunostaining?

      It is always possible that the failure to detect expression of Tph1 and Tph2 in the mouse scRNA-seq dataset is due to technical dropout, however when we analyzed this in our other mouse PNEC scRNA-seq dataset obtained using a microfluidic platform and also deeply-sequenced (Ouadah et al, Cell 2019), we found similar values as in the previously analyzed dataset: no Tph2 expression was detected and only 3% (3 of 92) of PNECs had detected Tph1 expression, whereas 24% (22 of 92) had detected expression of serotonin re-uptake transporter Slc6a4. Because our mouse and human scRNA-seq datasets were prepared similarly and sequenced to a similar depth (10^5 to 10^6 reads/cell), the difference observed in Tph1/TPH1 expression between mouse (0-3% PNECs) and human (76% PNECs) is more likely a true biological difference. We also analyzed serotonin levels in mouse PNECs by immunohistochemistry (not shown) and detected serotonin in nearly all (~90%) embryonic PNECs but only ~10% of adult PNECs. Systematic follow up studies will be necessary to resolve the mechanism of serotonin biogenesis and uptake in PNECs, and the potential stage and species-specific differences in these processes suggested by this initial data.

      5) The smFISH and immunostaining analyses are often presented without any indication of the number of independent replicate samples analyzed (e.g., Figure 2B, Figure 3F, G).

      The number of samples analyzed has been added (the values for Fig. 2B are given in the legend to Fig. 2C, which quantifies Fig. 2B).

      6) It would be helpful to provide a statistical analysis of the similarities and differences shown in the graphs in Figures 1E and G.

      We added a statistical analysis (Fisher's exact test, two-sided) of Fig. 1E comparing expression of each examined gene in the two scRNA-seq datasets (Table S4). We added a similar statistical analysis of Fig. 1G comparing the expression values of each examined gene by scRNA-seq vs smFISH (see Fig. 1G legend).
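      The response names the test but not how each per-gene comparison was tabulated. A minimal sketch, assuming each gene is summarized as a 2x2 table of cells in which it was or was not detected in each of the two datasets, is a plain-Python two-sided Fisher's exact test. The function names and the counts below are hypothetical and are not taken from the manuscript; in practice `scipy.stats.fisher_exact` (two-sided by default) computes the same p-value.

```python
from math import comb


def table_probability(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability of a 2x2 table whose top-left cell is `a`,
    given a fixed row total `row1`, column total `col1` and grand total `n`."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(table: list[list[int]]) -> float:
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Assumed (hypothetical) layout:
      rows    = dataset 1, dataset 2
      columns = cells with the gene detected, cells without it
    The p-value sums the probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = table_probability(a, row1, col1, n)
    # Feasible values of the top-left cell given the fixed margins.
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_value = 0.0
    for x in range(lo, hi + 1):
        p_x = table_probability(x, row1, col1, n)
        # Small relative tolerance so floating-point ties count as "as extreme".
        if p_x <= p_observed * (1 + 1e-7):
            p_value += p_x
    return min(p_value, 1.0)


if __name__ == "__main__":
    # Hypothetical counts: gene detected in 40 of 100 cells in dataset 1
    # and in 25 of 90 cells in dataset 2.
    print(fisher_exact_two_sided([[40, 60], [25, 65]]))
```

      Summing every table that is no more probable than the observed one is the usual definition of the two-sided p-value; the small tolerance only guards against tables with equal probability being dropped by rounding.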

    1. Author Response

      Reviewer #2 (Public Review):

      SIGNIFICANCE: Movement is based on the coordinated activation and deactivation of muscle groups, which depends on the timing and strength of firing of the motoneurons connected to them. Motoneuron recruitment ultimately depends on the activity of local interneurons. In contrast to other CNS regions, the interneurons in the spinal cord controlling motor output display very high diversity in genetics, anatomy, localization, and electrophysiological properties. Making sense of the interneuronal circuits that modulate motor output to the different muscles of the body has proven to be quite complex. One technique, proposed over 10 years ago, is retrograde monosynaptic transsynaptic tracing with a modified rabies virus injected into single muscles to define the premotor connections to individual motor pools controlling single muscles. Using this technique, the original authors suggested that interneurons controlling flexors and extensors occupy different locations in the spinal cord. This idea was an extension of pioneering work from the Jessell lab at Columbia University demonstrating that positional identity determines the input connectivity of motoneurons, at least from Ia afferents. This principle, if extended to premotor spinal interneurons, would simplify the mechanisms by which extensor and flexor interneuron networks could be connected and controlled. In this paper, the authors combine data from four independent groups to show that this principle might not be correct. In other words, interneurons connected to individual motor pools are highly intermingled in the spinal cord. This raises the bar for understanding both the intrinsic organizing principles of interneuron microcircuits in the spinal cord (if any) and how they develop their specific connectivity.

      STRENGTHS AND WEAKNESSES: The authors propose that the conflicting conclusions arise from technical differences. The technique is based on complementation of rabies virus glycoprotein (G) in specific targeted motoneurons infected with a glycoprotein-deficient rabies virus (RVdG). The way G and RVdG are delivered to the specific motoneurons controlling one muscle differs between approaches. Originally, this was accomplished by co-injecting RVdG and an AAV-G vector into the same muscle simultaneously. However, as previously published by a different group and now confirmed by the authors, this approach also infects muscle sensory afferents, which can transsynaptically label populations of interneurons in the spinal cord in the anterograde direction. This results in the labeling of mixed interneuron populations through their output to specific motor pools and/or their input from primary afferents of the same muscle. To avoid this problem, the authors used transgenic approaches to induce expression of G in all motoneurons (but not sensory neurons) and obtained muscle specificity by injecting RVdG into single muscles. Unfortunately, there is no single gene that selects only motoneurons for transgenic expression, and tools for intersectional approaches were not available. Therefore, G is unavoidably expressed in some interneurons in addition to motoneurons. These interneurons could be an additional source of transsynaptic jumps if they receive RVdG from the motoneurons, raising the possibility that some labeling is the result of disynaptic, not monosynaptic, connections. The authors try to control for this possibility by comparing two different cre lines to direct G expression in motoneurons, each also targeting a different set of additional interneurons. The results in both lines are similar, raising confidence in the main conclusions. Moreover, the authors indicate that some motoneurons outside the intended pools are also labeled because of motoneuron-to-motoneuron connections. In other words, the starter neurons for tracing monosynaptic connections are not as homogeneous or as specific to a single motor pool as desired. This is acknowledged as a current limitation and is addressed in the Discussion by proposing possible alternative approaches. Despite this weakness, the main conclusion of the study remains strong.

      A second technical issue raised by the authors is that of possible leakage during injection into the muscle. To reduce this possibility, the authors reduced the injected volume compared to previous studies from 5 to 1 microliter and checked the injection site post hoc for possible leakage (these are neonatal pups with muscle volumes of 2-3 microliters or less). In addition, they make a nice comparison, injecting different titers of RVdG to demonstrate that variations of one or two orders of magnitude in the number of infected motoneurons do not alter the main conclusion on the topographic positioning of the interneurons connected to different motor pools. One weakness is that the exact number of motoneurons that start the tracing is impossible to evaluate, and this prevents accurate comparisons across experiments. This is because cell death induced by the rabies virus is to be expected and only a variable subset of surviving neurons can be identified. Currently, this is an unavoidable characteristic of the technique. Nevertheless, there is a nice correlation between titer, surviving motoneuron numbers, and the number and location of labeled interneurons. The large number of replicates and their consistency further raise confidence in the authors' claim of high specificity and replicability during injection, despite variable numbers of recovered motoneurons. The authors conclude that it is very important to check the number and localization of starter motoneurons to confirm specificity after the injections. This reviewer totally agrees and is surprised this was not done in the experiment in which they try to replicate previous experiments by co-injecting AAV-G and RVdG into the muscle.

      We agree with the reviewer that ideally the starter cells should have been identified in all the experiments. However, data were collected independently, at very different times in each of the labs involved, with different initial aims, and there was no prior agreement on the details of injection and post-processing. The realization that we had similar experiments, performed with different techniques, led us to pool our observations in order to give a picture of the distribution of premotor interneurons, the leitmotif of this paper, and a great effort has been devoted to ensuring that all the cell counting procedures were uniform across labs. The lack of initial coordination is the reason why in some datasets the starter cells have not been quantified. Furthermore, in the previous version we had wrongly indicated that motor neurons analysed at Glasgow University were identified by ChAT expression. We have corrected this in the current version, since for those experiments motor neurons were only identified by location within any of the motor nuclei and by size (diameter greater than 30 µm). However, since we started comparing results, we have agreed on a uniform way of analysing and representing the data using the same normalization criteria. Therefore, while we cannot compare quantities like the ratio of secondary to primary infected cells for all the experiments (but we do so for the subset in which this is possible, see new Figure 4-Figure supplement 3 and comment number 3 below), the positional analysis has been done following exactly the same criteria.

      One final problem with interpretation is that, for as yet unknown reasons, the technique is dependent on the age of the animal and cannot be implemented in mature animals. Therefore, the connectivity revealed here is the one present during the first few days of life in the mouse. This is a period of significant synaptogenesis and of synaptic selection and de-selection. The authors are encouraged to discuss this limitation further when inferring adult interneuron connectivity from studies in neonates.

      A very important point; see the detailed answer to comment number 10 below.

      In summary, the authors have introduced new technical variations to trace premotor interneurons and challenge a major idea in the field, namely that interneurons connected to flexors and extensors occupy different positions in the spinal cord. The technique still has some weaknesses: 1) the possibility of disynaptic jumps, 2) accurate identification of starter neurons, and 3) restriction to neonates. However, the authors strengthen their conclusions by considering alternatives and introducing a large number of controls (two cre lines, different titers, a large number of animals analyzed, large numbers and consistency of replicates, independent counting in different labs, etc.). This is an important and very useful study that suggests topographic localization is not a major organizing principle for interneuron connections with motor pools. It remains to be investigated what organizational mechanisms couple interneurons to functionally distinct motor pools.

      The weaknesses summarized in the paragraph above are addressed in detail below in the answers to the specific comments.

      Reviewer #3 (Public Review):

      The manuscript by Ronzano et al. presents a rigorous neuroanatomical study that convincingly demonstrates that there is no difference in the medio-lateral organization of flexor and extensor premotor interneurons. The study uses monosynaptically restricted transsynaptic tracing from ankle flexor and extensor muscles with several (4) strategies for delivery of the G protein complement and delta G rabies virus, and additional (2) variations that consider titer and cre line. The authors went to great lengths here in an attempt to replicate prior studies for which they had initial conflicting findings. Further, the experiments were performed in laboratories in four different locations. The variations on the rabies and complement delivery, regardless of the lab performing the experiment and analysis, all converge on the same conclusion. Aside from the primary conclusion, the paper can be used as a manual for anyone considering transsynaptic tracing, as it details the benefits and caveats of each strategy with examples.

      The initial conflicting results put the onus on the authors to demonstrate where the divergence occurred. The authors took a highly comprehensive approach, which is a clear strength of the paper. All of the data is fully and transparently presented. Standardizations and differences between experiments run or analyzed in each lab are well laid out. Figure 1 and Table 2 provide a great summary of the techniques and their limitations. These are also well thought out and discussed within each section of results.

      The only thing missing is a likely explanation for the differences seen. Although the authors made several attempts to provide such explanation, the question remains - how did two groups who published independent studies using different strategies demonstrate flexor and extensor separation in the dorsal horn, when this study, using several strategies in multiple labs, show that the premotor neurons are in complete overlap? Additional small differences in methodologies could be identified which are not discussed and may provide potential explanations, but only for discrepancies in results of single techniques, not for all of the strategies used. The lack of reason for the discrepancy with prior studies despite the extensive efforts is unsatisfying, but, most importantly, the experiments were rigorously performed and the data support the conclusions presented.

      We thank the reviewer for the positive comments, and we share the opinion that the discrepancy is unsatisfying. Although we propose possible explanations, despite our extensive efforts we could not provide a definite answer; we hope that making our work public and all the data available will trigger further efforts from the rest of the community.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a very interesting paper. In this manuscript, Hendi et al. examined how two independent mechanisms, Wnt signalling and gap junctions, control two critical aspects of neuronal tiling. They have quite elegantly used two neighboring GABAergic motor neurons to show that, while one specific C. elegans Wnt homolog, EGL-20, regulates axonal tiling, innexin UNC-9-mediated gap junctions at a very specific position on these axons regulate the tiling of chemical synapses on these axons. They also performed multiple experiments to show that UNC-9 gap junctions control chemical synapse tiling independently of their channel activity.

      Overall, the paper is interesting and would be of general interest for many neuroscience researchers, specifically to those who are studying neuronal tiling and the role of gap junctions. However, there are some concerns with this study.

      Major concerns:

      1) The authors only looked at the tiling of axons and presynaptic clusters in DD5/DD6 axons. However, these neurites are transformed in L1 from dendrites to axons, and the nature of their synaptic termini subsequently changes from postsynaptic to presynaptic. To claim that egl-20/UNC-9 specifically control axonal tiling and GABAergic presynaptic tiling, the authors must check dendritic tiling and the tiling of postsynaptic termini. Specifically, a) do UNC-9 channels also affect postsynaptic patterning in L1? b) when do UNC-9 puncta form? Are they present at the L1 stage, or do they appear at the L2 stage only after the fate switch from dendrite to axon? c) does egl-20 also control dendritic tiling in L1?

      We thank the reviewer for their insightful comments. As described in our original manuscript, we could not check the dendritic tiling between DD5 and DD6 at the L4 stage due to the inconsistent labeling of the DD6 dendrite with our fluorescent marker. As an alternative method, we measured the length of the (ventral) posterior dendrite of DD5 and showed that it is significantly longer in the egl-20(n585) mutant than in wild type at the L4 stage. We also measured the length of postsynaptic domains in the DD5 posterior dendrite and showed that it was also longer in the egl-20(n585) mutant than in wild type. Furthermore, we show that the UNC-9 localization at the tip of the DD6 dendrite is unaffected in the egl-20(n585) mutant, despite the extension of postsynaptic domains. From these observations, we suggested that postsynaptic spines are distributed throughout the dendrite of DD5 in the egl-20(n585) mutant, and that this distribution is not regulated by unc-9.

      In the revised manuscript, we included images of wild type and egl-20(n585) animal in which ACR-12::GFP is co-labeled with mCherry::CAAX. In these strains, the expression of mCherry::CAAX and ACR-12::GFP is not detectable in DD6 in most animals. Using these strains, we confirmed that the DD5 postsynaptic sites are present throughout the dendrite of DD5 in both wild type and egl-20(n585) mutant backgrounds (Figure 1- figure supplement 1).

      a) Unfortunately we were not able to quantify postsynaptic patterning at L1 due to the low expression of ACR-12::GFP and mCherry::CAAX at L1 stage.

      b) UNC-9::7×GFP puncta are present at the tiling border of DD neurons on both the ventral and dorsal sides throughout development. In the original manuscript, we only showed the UNC-9 localization on the dorsal side. We believe our limited description of UNC-9 in the dendrites has caused confusion regarding the phenotypes of the DD5 posterior dendrite and postsynaptic sites. In the revised manuscript, we have updated the images of UNC-9::7×GFP to show that the puncta are present in both axons and dendrites (Figures 2F-H).

      In the revised manuscript we also show that UNC-9 puncta are present at DD tiling border in L1 animals. We have included images of UNC-9::7×GFP at L1 at the axonal and dendritic tiling borders of DD5 and DD6 in both wild type and egl-20(n585) animals in Figure 2- figure supplement 5.

      c) As described above, we could not quantify dendritic tiling at L1 due to the low expression of our fluorescent markers at the L1 stage.

      2) The authors have shown that the previously known regulators of gap junction formation, NLR-1 and ZOO-1, do not regulate UNC-9 gap junction puncta on DD5/DD6 axons. Since these are a cell adhesion molecule and a tight junction component, respectively, presynaptic tiling should be checked in these mutants as well. Also, it is not clear whether these proteins are expressed in DD5/DD6 neurons at all. Since NLR-1 has previously been shown to regulate UNC-9 puncta in the nerve ring, the expression of these genes in DD5/DD6 neurons should be checked before drawing these conclusions.

      In the revised manuscript, we have included the presynaptic tiling quantification in zoo-1(tm4133); egl-20(n585) and nlr-1(miz202) egl-20(n585) mutants which showed no significant presynaptic tiling defects (Figure 2- figure supplement 1). We also cited a paper (Taylor et al., 2021) that described the expression of zoo-1 and nlr-1 in the DD neurons.

      3) The authors assumed the relevant gap junction to be an UNC-9 homotypic homomeric channel, but DD neurons also express several other innexins (inx-1, inx-2, inx-10, inx-14 and unc-7). This raises the possibility that the UNC-9 channel could be heteromeric. The effects on synaptic tiling of the other expressed innexins, apart from unc-7, should also be tested.

      We thank the reviewer for their comment. As per their advice, we tested four additional innexins (inx-1, inx-2, inx-10, and inx-14) which have been reported to be expressed in DD neurons and examined their potential role in presynaptic tiling in egl-20(n585) mutant background. We found that none of them showed significant presynaptic tiling defect. In the revised manuscript, we have included this data in Figure 2E.

      4) The effect of the unc-9(Del18); unc-1 double mutant should be tested.

      We knocked out unc-1 using CRISPR/Cas9 genome editing in the egl-20(n585); unc-9(syb3236 [unc-9(ΔN18)]) mutant background and observed no significant presynaptic tiling defect compared with egl-20(n585); unc-9(syb3236 [unc-9(ΔN18)]), which further strengthens our model that the gap junction channel activity of UNC-9 is dispensable for its function in presynaptic tiling. We have included this data in Figure 5D.

      5) The authors have acknowledged the need to study the role of UNC-9 gap junction channels in maintaining presynaptic patterning. This reviewer appreciates that idea and suggests that the authors check whether late expression of UNC-9 is sufficient to rescue the presynaptic patterning defect observed in egl-20; unc-9 double mutant animals.

      We thank the reviewer for their comment. We conducted a late rescue experiment using a heat shock promoter to express unc-9 at the L2 stage, after presynaptic tiling is complete. We did not observe significant rescue of the presynaptic tiling defect in two independent transgenic lines of Phsp::unc-9. While we understand that this does not rule out a function of unc-9 in the maintenance of presynaptic tiling, this result is consistent with the idea that unc-9 is required for the establishment of presynaptic tiling. We have included this data in Figure 2- figure supplement 4.

      Reviewer #3 (Public Review):

      This interesting paper from Hendi et al. describes a novel mechanism governing synaptic tiling that depends on expression of a gap junction protein at the border between adjacent presynaptic domains of neighboring neurons. The authors define the role of innexin UNC-9 in establishing the spatial arrangement of synapses in adjacent C. elegans GABA motor neurons. They show that axonal tiling is controlled by Wnt signaling. However, synaptic tiling is preserved when axonal tiling is disrupted in egl-20/Wnt mutants. Synaptic and axonal tiling are both disrupted in egl-20; unc-9 double mutants, suggesting these two processes are controlled through distinct molecular mechanisms. The authors find that UNC-9 is localized to the border between axons of adjacent GABA neurons and provide evidence that the function of UNC-9 in tiling does not require its channel function. The experiments are made possible by the development of a new system for labeling adjacent GABA motor neurons that will also be of general use to the field. The studies rule out requirements for either gap junction activity or several other genes previously implicated in gap junction function/localization, but fall short of clearly defining the underlying mechanism. Instead, the study provides additional support for channel-independent structural roles of gap junctions in the nervous system.

      The study's conclusions are generally well-supported by the data but more clarification is required in some areas:

      1) Overlaps between DD5 and DD6 dendrites are not evaluated directly. The authors show the extent of labeling in the DD5 dendrite. This should be clarified.

      We thank the reviewer for their comment. As described above, we could not directly quantify the dendritic tiling defect between DD5 and DD6 neurons due to the inconsistent expression of mCherry in the dendrite of DD6. Instead, we measured the length of the DD5 posterior dendrite in wild type and the egl-20(n585) mutant, and found a significant increase in the DD5 posterior dendrite length in egl-20(n585) mutants. In the revised manuscript, we have edited the text to explain the defect in the DD5 posterior dendrite more clearly.

      2) The authors suggest UNC-9 establishes axonal tiling as early as L2 stage, immediately following DD remodeling. However, no data is shown for UNC-9 localization at this developmental stage. It would also be interesting to know whether UNC-9 performs a similar role prior to remodeling, or if UNC-9 itself undergoes redistribution during the remodeling process.

      We thank the reviewer for their comment. As described above, we acknowledge that our initial description of UNC-9 localization in the DD neurons was insufficient. UNC-9 is present at both the axonal and dendritic tiling borders between DD5 and DD6 neurons throughout larval development.

      In the revised manuscript, we included UNC-9 localization at the axonal and dendritic tiling borders between DD5 and DD6 in both wild type and egl-20(n585) animals at the L1 stage (Figure 2- figure supplement 5). However, we could not determine whether the egl-20(n585); unc-9(e101) double mutant exhibits a presynaptic patterning defect in the ventral axons prior to remodeling at the L1 stage, due to the low expression of our axonal and presynaptic markers at this stage.

      3) Based on the representative image, UNC-9 abundance appears reduced in unc-104. The authors should quantify.

      We thank the reviewer for their comment. In the revised manuscript, we quantified the signal intensity of UNC-9::7×GFP at the DD5-DD6 axonal tiling border in wild type, egl-20(n585), unc-104(e1265), zoo-1(tm4133) and nlr-1(gk366849). We found that the fluorescence intensity of UNC-9::7×GFP was indeed slightly lower in egl-20(n585) and unc-104(e1265) mutants compared with wild type animals. This result suggests that egl-20 and unc-104 have a minor role in UNC-9 localization. Nevertheless, UNC-9 puncta were present in all genotypes we examined. The quantification is included in Figure 2- figure supplement 6, and we suggest that the weak presynaptic tiling defect in the egl-20 single mutant could be explained by this reduction in UNC-9 localization (lines 284-285).

      4) The authors show the distribution of muscle NLG-1 mirrors that of RAB-3. While this suggests the altered distribution of RAB-3 reports on synaptic rearrangement, this conclusion would be strengthened by analysis of an active zone marker.

      We agree with the reviewer that examining the colocalization of RAB-3 with an active zone protein would strengthen our conclusion. We therefore expressed BFP::RAB-3 under the DD-specific promoter flp-13 in a transgenic marker strain (wyIs292) that expresses the active zone protein UNC-10::tdTomato under the GABAergic promoter unc-25, and NLG-1::YFP under the body wall muscle promoter unc-129dm (Maro et al., 2015). Using this strain, we show that RAB-3 colocalized with UNC-10 and was apposed to postsynaptic NLG-1 in both wild type and the egl-20(n585); unc-9(e101) mutant. The representative images are included in Figure 2- figure supplement 2.

    1. Author Response

      Reviewer #1 (Public Review):

      The stated goal of this research was to look for interactions between metabolism (manipulated by glucose starvation) and the circadian clock. This is a hot topic currently, as bi-directional links between metabolism and rhythmicity are found in several organisms and this connection has important implications for human health. The authors work with the model organism Neurospora crassa, a filamentous fungus that has many advantages for this type of research.

      The authors' first approach was to assay the effects of glucose starvation on the levels of the RNA and protein products of the key clock genes frq, wc-1, and wc-2. The WC-1 and WC-2 proteins form a complex, WCC, that activates frq transcription. The surprising finding was that WC-1 and WC-2 protein levels and WCC transcriptional activity were drastically reduced but frq RNA and protein levels remained the same. Under conditions where rhythmicity is expressed, the rhythms of frq RNA, FRQ protein, and expression of clock-driven "output" genes were also unaffected by starvation. The standard model for the molecular clock is a transcription/translation feedback loop dependent on the levels and activity of these clock gene products, so this disconnect between the starvation-induced changes in the stoichiometry of the loop components and the lack of effects of starvation on rhythmicity calls into question our understanding of the molecular mechanism of the clock. This is yet another example of the inadequacy of the TTFL model to explain rhythmicity. For me, the most significant sentence in the paper was this: "...an unknown mechanism must recalibrate the central clockwork to keep frq transcript levels and oscillation glucose-compensated despite the decline in WCC levels."

      The authors' second approach was to try to identify mechanisms for the response to starvation by focussing on frq and its regulators, using mutations in the frq gene and strains with alterations in the activity of kinases and phosphatases known to modify FRQ protein. The finding that all of these manipulations have some effect on the starvation-induced changes in WC protein level is taken by the authors to indicate a role for FRQ itself in the response to starvation. This conclusion is subject to the caveat that manipulations of the activity of multifunctional kinases and phosphatases will certainly have pleiotropic effects on many cellular processes beyond FRQ protein activity.

      Because of the sometimes-speculative nature of our conclusions, and following the suggestion of the editor, we restructured the Discussion and now discuss the mechanism addressed by the Reviewer in the subsection "Ideas and Speculation". We added a sentence to this section about the possible pleiotropic effects of the tested signaling pathways: "Starvation triggers characteristic changes in the activity of signaling routes that affect basic components of the circadian clock. Although the multifunctional pathways might act via pleiotropic mechanisms as well, based on their earlier characterized role in the control of the Neurospora clock, their action can be inserted into a model describing the glucose-dependent reorganization of the oscillator."

      The third section of the paper is a major transcriptomic study of the effects of starvation on global gene expression. Two strains, wc wild-type and the wc-1 knockout strain, are compared under two conditions, fed and starved. The hypothesis is that WCC has a role in the starvation response. The results of starvation on the wild-type are unsurprising and predictable: the expression of many genes involved in metabolic processes is affected. There are no new insights that come from these results and no new testable hypotheses are generated by the data.

      We agree with the reviewer that it is not surprising that glucose depletion strongly affects genes involved in metabolic processes and monosaccharide transport. The data obtained in wt served mainly as a control for our experimental conditions. As a new aspect, our analysis focused on the differences between wt and wc-1 in the transcriptomic response to altered glucose availability.

      The authors refer to the wc-1 mutant strain as "clockless" and discuss its effects on the transcriptome only in terms of WC-1's function in the clock mechanism. However, WCC is known to be a major transcriptional regulator, controlling a number of genes beyond the TTFL. As acknowledged earlier in the paper, WC-1 is also the major light receptor in Neurospora. The transcriptomics experiments were carried out in a light/dark cycle, with cultures harvested at the end of the light period, when "an adapted state for light-dependent genes can be expected" according to the authors. However, wc-1 mutants are essentially blind, and so those samples are equivalent to being harvested in the dark. The multifunctional nature of WCC complicates the interpretation of the transcriptomics data. The differences in the transcriptome between wild-type and wc-1 may not be due to loss of clock function, but rather the loss of a major multifunctional transcription factor, or the difference between light and "dark".

      The reviewer is right: when we discussed the difference between wt and wc-1 in the transcriptional response to glucose, we did not emphasize the possible contribution of the photoreceptor function of the WCC. We added the following sentence to the revised Discussion: "Further investigations could differentiate between the clock and photoreceptor functions of the WCC in the glucose-dependent control of the transcriptome." Furthermore, we now state more specifically that in wc-1 the lack of the WCC (and not the lack of a functional clock) results in the altered transcriptomic response to starvation when compared to wt (P15 L14-17).

      In the final set of experiments, the authors tested the hypothesis that the changes in the transcriptome between wild type and wc-1 might make wc-1 less competent to recover growth after starvation. They also test the recovery of frq9, a "clockless" mutant. The very surprising result is that the growth rates of these two mutants are slower than the wild type after transfer from starvation media to high glucose. This is surprising because there will be several generations of nuclear division and doublings of mass within a few hours and the transcriptome should have recovered fully fairly rapidly. A mechanism for this apparent "after-effect" is suggested with evidence concerning differences in expression of a glucose transporter, but it is not clear why this expression should not change rapidly with re-feeding on high glucose. As with previous experiments, the cultures were grown in light/dark cycles, which results in different conditions for the mutants, both of which have very low or absent WC-1 and are therefore blind to light. The potential effects of light have been disregarded.

      The reviewer is right that several generations of nuclear divisions occur within a few hours and lead to a number of doublings of the biomass. However, when the first phase of regeneration is delayed in one or more strains compared to the control, a substantial difference in biomass can be expected by the time the cultures reach stationary phase.

      Regarding the expression change of the glucose transporter: in order to emphasize the different tendencies in how glt-1 levels respond to glucose in the different strains, in the previous version of the manuscript we normalized the expression levels to the beginning of recovery (the time point of glucose addition). Thus, expression differences between the strains were not shown. To give a more comprehensive picture, the revised version of the manuscript shows non-normalized expression levels (Fig 5F). The mutants did not adapt efficiently to changes in glucose levels, i.e. expression of the transporter was relatively high in both wc-1 and frq10 during starvation and did not further increase upon glucose addition. On the other hand, 24 hours after glucose resupply, glt-1 levels were similar in all strains, which might contribute to the similar growth rates observed under steady-state conditions in the standard medium.

      Regarding the photoreceptor-independent function of the WCC during growth recovery: in the revised version of the manuscript we present additional data suggesting the importance of the photoreceptor-independent function of the WCC for efficient recovery from starvation. Fig. 5C and Fig. 5D now show that upon resupply of glucose, wt grows faster than the clock-deficient strains Δwc-1 and frq10 in both LD cycles and constant darkness, indicating that the role of the WCC in growth regeneration is at least partially independent of its photoreceptor function. Regarding the function of the WCC in frq10: frq10 cannot be considered blind. Although both Δwc-1 and frq10 lack a functional clock and WC levels are reduced in frq10, these strains show significant differences in WCC activity. While Δwc-1 is considered blind, in frq10 the lack of negative feedback results in high WCC activity in both DD and LL, and the expression levels of all light-sensitive or light-dependent genes examined were comparable in wt and in frq-less mutants (Schafmeier et al., 2005; Hunt et al., 2007; own unpublished data).

      The title of the paper refers to a "flexible circadian clock" but this concept of flexibility is not developed in the paper. I would substitute "the White Collar Complex" for this phrase: "Adaptation to starvation requires a functional White Collar Complex in Neurospora crassa" would be more accurate. Some experiments are also conducted using an frq null "clockless" strain, but because WC expression is very low in frq null mutants, any effects of frq null could also be attributed to WC depletion.

      As detailed above, a low level of the WCC in the frq-less mutant does not mean low transcriptional activity, and accordingly the two clock mutants, wc-1 and frq10, show important functional differences. We used the word "flexible" to indicate that the molecular clock is able to operate under critical nutrient conditions and with a significantly changed stoichiometry of its key components. Results of our new experiments performed in DD (mentioned above) indicate that growth regeneration is largely independent of the photoreceptor function of the WCC. Nevertheless, we accepted the criticism of the reviewer and changed the title to "Adaptation to glucose starvation is associated with molecular reorganization of the circadian clock in Neurospora crassa".

      The major conclusion I took away from this paper is the multifunctional nature of the WCC as a transcription factor complex. It has been known for a long time that WCC controls the expression of many genes beyond the frq gene at the core of the circadian transcription/translation feedback loop. WC-1 is also the major blue light photoreceptor in Neurospora, controlling the expression of light-regulated genes, and this fact is barely touched on in the paper. These new data now extend the role of WCC in the regulation of metabolic networks as well.

      Reviewer #2 (Public Review):

      The authors have performed an interesting study addressing a topical question: how circadian oscillators remain accurate in changing environmental conditions and how these circadian oscillators contribute to responses to environmental changes. The authors have performed their studies in Neurospora crassa. The authors have made a very interesting finding that starvation causes a profound decrease in WHITE COLLAR-1 (WC-1) abundance, yet the circadian system continues to run despite this decrease in the abundance of a core oscillator component. The study of chronic glucose starvation in a Δwc-1 mutant is interesting and provides the opportunity to investigate the role of the WHITE COLLAR COMPLEX (WCC) and the clock system in adaptation to starvation.

      Strengths:

      The authors have used a range of techniques to measure clock behaviour, including qPCR, phosphorylation, protein abundance, and subcellular localisation studies.

      An frq9 mutant was used to test the effects of FRQ on WC-1 abundance since WC-1 decreased during starvation. This is elegant, though the logic of this experiment is not entirely clear: FRQ did not change in abundance during starvation, so why did the authors think this experiment was needed?

      We regret that the examination of frq9 was not clearly justified in the previous version of the manuscript. It is true that FRQ levels did not change during starvation; only phosphorylation of the protein was affected, i.e. FRQ became more phosphorylated under low glucose conditions (displayed by an electrophoretic mobility shift on the Western blot; Garceau et al., 1997, Cell 89:469–476). We tested the starvation response in the FRQ-less strain because WCC levels changed significantly in wt upon glucose depletion and expression of WC proteins is known to be controlled by FRQ. In the revised version of the manuscript we introduce and explain the experiments performed with frq9 more thoroughly (P7 L22-P8 L14; P16 L21 – P17 L6).

      An interesting experiment was performed to test whether CK1a-dependent phosphorylation and inactivation of the WCC are involved in the starvation response. An FRQΔFCD1-2 mutant is used in which FRQ cannot interact with CK1a and therefore CK1a cannot phosphorylate and inactivate WC. This experiment suggested that CK1a is not involved in the response to starvation, again leading to the conclusion that FRQ is not involved in the starvation regulation of WC.

      The referee is right: the effect of FRQ-bound CK1a on the adaptation of the molecular clock to starvation seems to be minor, and this is also our conclusion in the manuscript. The major message of this experiment was that FRQ became phosphorylated in response to starvation without stably interacting with CK1a, probably via another mechanism. We agree that the behavior of WCC levels upon starvation was similar to that in the FRQ-less mutant.

      PKA is shown to be involved in the starvation-induced reduction of WC because the starvation-induced reduction in WC-1 abundance was absent in the mcb strain, in which the regulatory subunit of PKA is defective and hence PKA is constitutively active.

      The authors have found an interesting potential link between glucose levels and WCC phosphorylation, they demonstrated that starvation reduces PP2A activity and that in a regulatory mutant of PP2A, which has reduced PP2A activity, there is little effect of starvation on WCC levels, suggesting the hypothesis that glucose-dependent PP2A dephosphorylation stabilises WCC.

      Analysis of starvation-regulated transcriptome in Δwc-1 and wild type found strong evidence that the transcriptomic response to starvation is in part dependent on WCC. Much of the misregulated transcriptome appears to be associated with metabolism.

      In a series of growth studies in wild type and in frq and wc-1 mutants, the authors provide strong evidence that FRQ and WC are involved in growth and survival following starvation, and in recovery from starvation.

      Weaknesses:

      The authors describe Neurospora crassa as a model for circadian biology and apparently make the assumption that the findings are indicative of the behaviour of clock systems in other kingdoms. This is not the case. Neurospora crassa is a wonderful model for studying fungal clocks and is a great tool for studying basic circadian dynamics, but the interesting findings here are of a detailed molecular nature and are therefore applicable to fungal clocks, but not to other kingdoms.

      We agree that we still do not know whether the described mechanism is specific to fungal clocks. However, besides the basic feedback loop, overlapping mechanisms (controlled by e.g. casein kinases, glycogen synthase kinase, PKA, PP2A) are involved in the regulation of circadian timekeeping in different eukaryotic systems (reviewed in Reischl and Kramer, 2011, FEBS Lett; Brenna and Albrecht, 2020, Front Physiol). Our results suggest that some of these common factors (PKA, GSK, PP2A) are involved in the reorganization of the Neurospora clock in response to changes in glucose availability. Therefore, it is possible that analogous changes occur in the timekeeping mechanisms of other eukaryotic systems when they face serious environmental challenges.

      We included a short section in the Discussion that gives an overview of known interactions between glucose availability and circadian timekeeping at different levels of the phylogenetic hierarchy (P15 L18 – P16 L7).

      The authors assume that the reader is familiar with the intricacies of Neurospora crassa circadian studies and the significance of differences between LL and DD investigations. More background on the logic of the experiments would be helpful for readers from other fields.

      Thank you for the comment. In the revised version of the manuscript we introduce the molecular clock of Neurospora more thoroughly and have completed the description of the experimental conditions with detailed explanations.

      The data in Figure 2 are essential for the interpretation of the findings, demonstrating the presence of free-running rhythms. However, the data are entirely qualitative, making it hard to fully assess the authors' interpretations, a more quantitative assessment of the data would improve clarity.

      We quantified the Western blot signals and show the results in Fig 1E in the new version of the manuscript (according to the reviewer's suggestion Fig 2 of the old version is now part of Fig 1). Our data indicate that oscillation of FRQ levels is similar under both nutrient conditions.

      The conclusion that FRQ contributes to the regulation of WC-1 abundance in response to starvation does not seem to be supported by the data, because FRQ RNA does not change upon starvation. Furthermore, the authors' conclusion that the starvation-induced decrease in WC-1 and WC-2 protein levels is due to FRQ, based on the lack of reduction in an frq9 mutant, is open to misinterpretation: WC levels are low in this mutant, so starvation might not lower already low levels of WC. Indeed, WC-1 is lower in the frq9 mutant under any condition than in the WT under starvation, and WC-2 does decrease in abundance in the frq9 mutant upon starvation. The data strongly suggest to this reader that FRQ does not participate in the regulation of WC abundance in response to starvation.

      After rereading the criticized section, we admit that the text was not well structured, and we have made several modifications. We intended to emphasize that upon drastic changes in glucose availability, frq RNA levels remained compensated in wt, but this compensation was affected when functional FRQ was not present. We agree with the reviewer that the low expression of the WCC in frq9 makes it difficult to compare the glucose dependence of WCC expression in frq9 and wt. We modified the conclusion by adding this information and now mainly focus on the strain-dependent difference in the changes of frq RNA expression (P7 L22-P8 L14).

      The discussion accurately summarises the results and provides an interpretation, but a comparison to other circadian systems in other kingdoms is lacking. How do the data compare with the effects of glucose and other sugars on the mammalian, plant, and insect clocks?

      We included a short section in the Discussion that gives an overview of known interactions between glucose availability and circadian timekeeping in different organisms (P15 L18-P16 L7).

      How changes in WCC might result in changes in transcription is not explained. This might be very obvious to the authors, but to the reader it is not. Are the transcriptional outputs direct targets of WCC? Has WCC ChIP-seq been performed by the authors or others, and are the regulated transcripts directly bound by WCC? What are the enriched promoter sequences in the regulated genes, and is it possible to identify the network by which these changes in transcription occur?

      We now show the list of genes (Figure 4 – Figure supplement 2) that changed in a strain-specific manner in response to glucose starvation and that, based on ChIP-seq results, were earlier described as direct targets of the WCC (Smith et al., 2010; Hurley et al., 2014). Given literature data showing that the WCC affects the expression of several other transcription factors and controls basic cellular functions that might affect the expression of further genes, it was not surprising that only 90 of the 1377 genes were reported to be direct targets of the WCC.

      Whilst the authors claim it is the circadian clock that is involved in the starvation response, in my view a more precise interpretation of the data is that WCC is involved in the response. Since WCC is a photoreceptor with dual function in the clock, is it yet possible to conclude that the effects discovered are due to the clock role of WCC? Or do the data support the role of light signalling in regulating the starvation response through WCC?

      We thank the reviewer for the comment. In the revised version of the manuscript we state more specifically that in wc-1 the lack of the WCC (and not the lack of a functional clock) results in the altered transcriptomic response to starvation compared to wt. In addition, in the revised version we present a new experiment (Fig. 5D) showing that upon resupply of glucose, wt grows faster than the clock-deficient strains wc-1 and frq10 also in constant darkness. This indicates that the role of the WCC in growth regeneration is largely independent of its photoreceptor function.

      The authors do not appear to reconcile the finding that starvation hugely decreases WCC levels with the finding that the transcriptional and growth responses to starvation require WCC.

      We agree with the reviewer that the problem of how low levels of WCC could sufficiently support the transcription of frq and different output genes under starvation conditions was not discussed properly. Our results suggest a model in which the maintained level of nuclear WCC and the weakened inhibition by both FRQ (the hyperphosphorylated form is less active in the negative feedback) and PKA (its activity lowered upon glucose depletion) together might ensure that transcriptional activity of the WCC is preserved upon glucose withdrawal in both DD and LL despite the decrease of the overall level of the complex. In the revised version these aspects are discussed more thoroughly (P16-18).

      This study contributes to the increased focus of the circadian community on the regulation of outputs by circadian oscillators. The manuscript will be of interest to many in the field. There needs to be less assumption of knowledge about the N. crassa circadian system, and better discussion in the broader context of clocks in other kingdoms.

      We added a new section to the Discussion with data concerning interrelationships between glucose availability and the circadian clock in other organisms.

    1. Author Response

      Reviewer #1 (Public Review):

      Drosophila ovarian follicle cells have been utilized as a model system to study organogenesis and tumorigenesis of epithelia. Studies have found that lack of proper cell polarity causes invasive delamination of cells and formation of multilayered epithelia, reminiscent of Epithelial-Mesenchymal Transition (EMT). Using this system, the authors analyzed the single-cell transcriptome of follicle cells and show that distinct cell populations emerge shortly after induction of polarity loss. The authors identified dynamic activation of the Keap1-Nrf2 pathway. Finally, subpopulation classification and analysis of regulon activity identified that the Keap1-Nrf2 pathway is responsible for epithelial multilayering caused by polarity loss.

      Strengths:

      The authors characterized the single-cell transcriptome of follicle cell subpopulations after induction of polarity loss. Using a temperature-inducible driver, they can induce polarity loss in a short period of time, which enables detection of epithelial populations in various transition stages. The detected cell heterogeneity could be caused intrinsically or by environmental cues within the in vivo tissue. Therefore, the system likely recapitulates tumorigenesis in vivo well.

      Weaknesses:

      1) The authors should show cells corresponding to identified key cell clusters within the tissue by immunostaining, GFP-trap, or RNA FISH.

      We thank the reviewer for their comment. However, for this particular case, we would like to underscore the observation that the clusters derived from our integrated analysis do not exhibit mutually exclusive gene expression. This is unlike other studies where different clusters exhibit unique markers. The different clusters in this study represent distinguishable cell states and not distinct cell types. Even though the Lgl-KD follicle cells transcriptomically deviate from their corresponding cells of origin to form their own clusters, they continue to express several markers that show gene-expression overlap with normal follicle cells. This overlap exacerbates the problem of identifying distinct cells using differentially-enriched markers.

      However, we have used antibody staining against Drpr to identify cluster 8 follicle cells that associate with Dcp1+ dying germline cells. We have used the GstD-lacZ reporter (a cluster 7 marker, specifically of cluster 7_3) to show pathway activity within the multilayer. Besides GstD-lacZ, we also show F-Actin enrichment in cluster 7 (specifically 7_3) cells, which is significantly enriched in invasive cells. Additionally, we have now added images depicting the cell- and stage-specific expression patterns of the JNK pathway components pJNK and puc, of Thor (4E-BP), which is expressed at high levels in cluster 8 and at medium levels in cluster 7, and of Xbp1-GFP (a UPR stress sensor), which marks late stages of Lgl-KD cells.

      2) Images are low magnification and difficult to see individual cells.

      We have replaced several such images in the revised manuscript. Specifically, the revised manuscript has entirely new (or improved versions of) image panels in figure 5. In figure 1A, the focus is the entire ovariole and therefore, we have only highlighted the enrichment of Hnt and pH3 antibody staining separately for a subsetted region of interest (ROI). The ROI panels are included within the larger image itself. For figure 6, we have converted the LUTs of panels showing distinct channels for RFP and Shg/Arm antibody stainings to grayscale.

      3) The manuscript is weighted toward the technical aspects, and more of the biology behind this study should be discussed.

      We have added new paragraphs to discuss the evidence supporting the loss of polarity, specifically that of Lgl, in human cancers. Additionally, we have also discussed how our results regarding Keap1 relate to what is already known about it and the implications of our results in the context of cancer progression and metastasis.

      Reviewer #2 (Public Review):

      Chatterjee et al. perform extensive image and single-cell RNA sequencing (scRNA-seq) analysis of Drosophila ovaries with and without knockdown of a gene, Lethal giant larvae (Lgl), which is known to establish apical-basal polarity as well as controlling proliferation of epithelial tissues. The goal of the study is to characterize the effect of apicobasal-polarity loss in epithelial cells via Lgl knockdown on Drosophila ovaries at the phenotypic, cellular, single-cell gene expression and regulatory level. By focusing on single-cell gene expression clusters that are unique to Lgl-KD compared to those from flies without the knockdown, they were able to identify a highly transient cluster (cluster 7) which consists of tumorigenic cells. Differential markers within a sub-cluster (cluster 7_3) of this cluster, followed by validation using a GstD-lacZ enhancer-trap reporter assay, lead to their conclusion that cluster 7 represents the cells of the multilayering phenotype (i.e., the major Lgl-KD phenotype observed from image analysis), where activation of Keap1-Nrf2 signaling was observed. The KEAP1-NRF2 pathway is associated with protecting cells from oxidative stress. KEAP1 forms part of an E3 ubiquitin ligase, which controls NRF2, a transcription factor, by targeting it for ubiquitin-mediated proteasomal degradation. Surprisingly, inducing loss of function of Keap1 and, separately, of NRF2 (cnc in Drosophila) in Lgl-KD cells resulted in the same phenotype/rescue: loss of the multilayering phenotype. Overexpression of Keap1 in Lgl-KD increased multilayer volume compared to Lgl-KD alone, further supporting the role of Keap1 in cellular invasion and possibly in early stages of tumorigenesis when epithelial cells start losing their polarity.

      The strengths of this paper are:

      The mutually reinforcing advanced imaging, scRNA-seq and genetic manipulation (knockdown and overexpression) experiments/analyses, which largely support the major conclusions of the manuscript summarized above as well as more minor observations that the authors make.

      The systems biology flow of the study, from broad to a specific gene/pathway implicated in the phenotype. The authors start with a clear phenotypic characterization of Lgl-KD and genome-wide scRNA-seq analysis. This leads to regulatory factor enrichment and further identification of a cluster (cluster 7) and then a sub-cluster (cluster 7_3). This is followed by the identification of the KEAP1-NRF2 pathway and the demonstration that KEAP1 knockdown and overexpression in Lgl-KD rescue and aggravate the cell multilayering phenotype, respectively.

      The multilayering phenotype, genes and regulatory factors associated with loss of polarity are known to play an important role in the epithelial to mesenchymal transition (EMT). For example, this includes the enrichment of AP-1 family members, which are known to regulate EMT, in the regulon analyses as well as identification of KEAP1-NRF2.

      The weaknesses of the paper are:

      The framing/motivation of the study could be improved especially for those who study EMT/metastasis in humans. Given that loss of polarity is one of many events associated with tumorigenesis and metastatic progression, the claims made that studying Lgl-KD in Drosophila ovaries directly leads to insights into tumor cell invasiveness, early stages of tumorigenesis and EMT may leave some readers doubtful if they are not familiar with Lgl. Reviewing major findings that show that Lgl is a tumor suppressor as is its human homologue Hugl-1 as well as making a stronger case that studying Lgl-KD in Drosophila is relevant for tumorigenesis and EMT would be helpful.

      We thank the reviewer for these suggestions. Accordingly, we have added new paragraphs to the Discussion section, in which we discuss how Lgl-KD-mediated polarity loss relates to mammalian tumorigenesis, as well as the implications of our results.

      Given that Keap1 antagonizes NRF2, the apparently contradictory result that inducing loss of function of Keap1 and, separately, of NRF2 (cnc in Drosophila) in Lgl-KD cells resulted in the same phenotype/rescue (loss of the multilayering phenotype) is not fully addressed. Keap1 overexpression revealed that it aggravates multilayering, but NRF2 overexpression experiments were not performed. In addition, it was shown that overexpression and knockdown of Keap1 did not affect NRF2 gene expression (Figure 5C); however, Keap1 regulates Nrf2 at the protein level directly, via ubiquitin-mediated proteasomal degradation. Nrf2 protein levels in flies with and without Lgl-KD under the various Keap1 manipulations (control, KD and OE) were not measured.

      As the Keap1-Nrf2 pathway is widely studied in the context of oxidative-stress response signaling, Keap1 is widely accepted as a negative regulator of Nrf2-driven transcription. However, Nrf2 has been found to positively drive the expression of Keap1 (Sykiotis and Bohmann, 2008), and we found that manipulating Keap1 did not change Nrf2 expression (Fig. 5C). In response to this comment, we performed additional experiments driving the ectopic expression of Nrf2 (CncC-OE) in Lgl-KD cells, which increased the invasiveness of Lgl-KD cells, similar to Keap1-OE. Since the UAS-CncC line has been shown to upregulate Keap1 expression (Sykiotis and Bohmann, 2008), we concluded that this increase in invasiveness is indirectly due to the increase in Keap1 expression itself.

      Given that the antagonistic relationship between Keap1 and Nrf2 is relevant only to the oxidative-stress response pathway, the genetic epistasis experiments in this study render that relationship irrelevant in the context of the observed phenotype, as KD or OE of either component results in comparable phenotypes. Previous studies showing that Keap1 plays a role in cytoskeletal regulation (which agrees with our observation) also add weight to the argument that the observed phenotype is likely an indirect consequence of Keap1-Nrf2 signaling activation.

      Many of the conclusions in early Results paragraphs are purely technical and not biological. For example, "These observations highlight the limitations of marker validation to identify specific cells of the differential Lgl-KD phenotype" and "SCENIC was able to detect the common as well as distinct transcriptomic states of the cells in unique Lgl-KD clusters, while also highlighting the heterogeneity among them". Some of these technical conclusions could be part of brief discussions in the Methods section.

      For those not familiar with various detailed scRNA-seq analysis approaches (e.g., RNA velocity analysis), a brief description of how they should be interpreted biologically in Methods would be helpful. This might help resolve what appear to be contradictory/confusing results. First, the upper branch of cluster 7 (which is a focus of the study) shown in Fig. 3B is in a "late" stage based on Velocity Pseudotime analysis (left panel) and a "root" or an early stage based on Terminal end-points of differential analysis (right panel). The bottom branch of cluster 7 is "late"/"stable end point" based on these two analyses which is now consistent. Second, given these differences between the upper and lower branch of cluster 7, how is cluster 7 biologically the same cluster? Third, the bottom branch of cluster 7 bleeds into cluster 8 and while Ets21C is uniquely expressed in the bottom branch of 7, important markers of the study including Jra, kay (AP-1 family members), grnd, cnc (NRF2), Keap1, and the genes shown in Fig. 6F are all robustly expressed in clusters 7 (bottom branch) and 8. The biologically relevant distinction between the bottom branch of cluster 7 and 8 is not clear. Is cluster 8 important/relevant to the phenotypes observed as well?

      We have now added the following paragraph elaborating the logical choices made within the analytical pipeline in our Methods section:

      In this study, we have highlighted RNA velocity-derived interpretations that strictly agree with the other analytical perspectives pursued in this study. We applied scVelo to obtain information on the underlying lineage for (1) all unique Lgl-KD clusters, and (2) cluster-7 cells. The cells of the unique Lgl-KD clusters represent a mixed population of mitotic, post-mitotic and border follicle cells, as well as cells associated with dying germline cells, and they depict inconsistent transcriptional lineages. In this group of cells, the true developmental end-point of the observed Lgl-KD lineage is cluster 8 (germline-cell death occurs at the end of Lgl-KD follicular development), which likely consists of a mixed population of cells from the lateral epithelia as well as the multilayered epithelia, all responding to germline-cell death. Indeed, certain sections of cluster 7 appear more similar to cluster 8, and others seem comparable to cluster 13. These observations underscore our conclusion that the unique Lgl-KD clusters exhibit distinguishable gene expression, representing different cell states. For cluster 7, transcriptomic heterogeneity is what defines its unique state of gene expression, and we have assessed this heterogeneity by specifically sub-setting those cells.

      For a comprehensive interpretation of the results of the RNA-velocity based analysis, more information can be found in the scVelo tutorial (https://scvelo.readthedocs.io/).

    1. Author Response

      Reviewer #1 (Public Review):

      Gu et al. examine how activity in the substantia nigra pars reticulata (SNr) contributes to proactive inhibition - the suppression of upcoming actions - by recording SNr activity in rats performing a task requiring them to be prepared to cancel a planned movement. This task was developed in a previous study by the same authors in which they examined how globus pallidus pars externa (GPe) activity depends on proactive inhibition (Gu et al., 2020), which motivated the present focus on SNr. The task is rich and the complementary analyses of how the neural activity relates to the behavior, at the level of individual neurons and populations, are appropriate and illuminating. Overall, this study is well done and has the potential to be a nice contribution to our understanding of how the SNr, and therefore the basal ganglia, mediate behavioral inhibition. Addressing a few questions, however, would improve the paper.

      We appreciate both the positive comments and constructive criticism.

      • It is not obvious why the presence or absence of proactive inhibition should be determined on a session-by-session basis. It seems quite possible that proactive inhibition is not an all-or-none phenomenon, and also that it might be exhibited to a greater or lesser extent across a session (e.g., due to changes in motivational drive). It would therefore strengthen the paper to better explain the rationale for comparing neural activity across entire sessions "with" and "without" proactive inhibition. Within-session variation in proactive inhibition could be quite advantageous, allowing for within-neuron comparisons. It is even possible that the differences in neural activity that the authors report here using session-by-session analysis are an underestimate of the true effect of proactive inhibition.

      It is true that some of our analyses compare whole sessions with- and without- overall behavioral evidence for proactive inhibition. But our primary results come from within-session comparisons of Maybe-Stop to No-Stop trials. For this purpose, the session-wide assessment of proactive inhibition is primarily a screen for which sessions to use for within-session analysis.

      It would be desirable if we could use behavior to determine the degree of proactive inhibition on each individual trial, and then compare this to neural measures. Unfortunately, this is not generally feasible in our experiments. Our key evidence for proactive inhibition is the prolongation of reaction times (RTs). However, RTs are famously highly variable over trials. This variability likely reflects a variety of factors, not simply proactive inhibition. For example, in our previous paper (Gu et al. 2020) we showed that dividing trials into slower and faster RTs did not reproduce the same neural differences as comparing Maybe-Stop to No-Stop trials.

      An alternative approach to investigating proactive inhibition is to focus on the increased restraint that typically follows over-hasty responses. We found that when rats fail to Stop, on the next trial the degree of SNr variability increases (Fig. 6). We have now expanded this analysis to include additional types of errors. We find that another form of over-hasty action, premature responses before the Go cue, are also followed on the next trial by increased SNr variability (Fig. 6- supp1). By contrast, other error types (wrong choices; failure to respond quickly enough) do not provoke greater variability. These additional within-session analyses provide convergent evidence for increased variability as an adaptive response to failures evoked by excessive haste.

      • It is difficult to rule out alternative explanations for the observed differences in SNr activity. While the authors acknowledge this point in the 3rd paragraph of the discussion, they only discuss one potential alternative - reward expectation. Another difference between maybe-stop and no-stop trials is the likelihood that a particular target should be selected, which has also been shown to modulate SNr activity (Basso & Wurtz, 2002). As is often the case with complex behavioral tasks, there may be many other differences between trial types that may contribute to differences in neural activity. It would be helpful for the authors to more fully explain how their results relate to contextual modulation of SNr activity, and why the dependence of SNr activity on proactive inhibition may be a novel finding.

      We have expanded the Discussion to include additional alternative explanations.

      • A natural question arising from this study, as with most studies of neural recordings during behavior, is the causal nature of the neural activity. It would be non-trivial and beyond the scope of the current study to perform the sort of perturbations that could determine whether population variability causally relates to preparation to suppress actions. But it would be useful to discuss future experiments that might be able to test causality.

      We added in Discussion the possibility of using optogenetic manipulations of specific inputs to SNr, to help determine their distinct contributions to SNr firing patterns and proactive behavior.

      Reviewer #2 (Public Review):

      The authors have recorded the activity of neurons in the rat substantia nigra pars reticulata (SNr) while animals performed a version of a stop-signal task. The goal of this study is to investigate and describe the contribution of SNr to proactive inhibitory control. By examining single-cell responses as well as population activity, the authors show that increasing the probability of stop-signal trials induces several changes in SNr responses. First, specific populations of SNr neurons increase their activity during proactive, direction-specific inhibition. At the population level, neurons are biased away from the side of the movement that may have to be inhibited. Second, during proactive inhibition, neuron activity is more variable, both at the single-cell and population levels. Finally, the authors show that animals' outcome history influences both firing rates and variability of neuron responses in the current trial. In particular, neural variability is increased following a failure to inhibit a movement.

      Strengths

      The manuscript provides an interesting and timely insight into the role of the basal ganglia output nucleus in movement initiation control. The paper is often clearly and concisely written (although see one issue related to this below). One of the main strengths of the work is to allow an interesting comparison with recent work by the same team, aimed at investigating the responses of another basal ganglia nucleus (GPe) in the same task, using similar analyses (this comparison is not extensively exploited in the discussion section though). Another potential strength is the use of different analysis scales. The authors investigated single-unit responses as well as population "trajectories" in the neural state space. This is an interesting option that could have been better motivated, given that the two approaches assume quite different brain operations.

      Thank you for the interest and careful comments.

      Weaknesses

      The analyses and results sometimes lack clarity and details. For instance, and unless I missed the information, it is not clearly stated whether "maybe-stop" trial analyses only include Go trials or if (failed) Stop trials are also considered. Moreover, quite complicated figures are often described very briefly in the main text. Methods are also often too succinctly described, and sometimes refer to a previous publication (Gu et al., 2020) that readers did not necessarily read.

      We have made a range of changes to make the analyses and their rationale more clear. This includes specifying that Maybe-Stop trials include both Go and Stop trials (and why). We have also added more details in both main text and Methods.

      There are some points that the authors might need to discuss further. In particular, a global picture of the role of the different basal ganglia nuclei during movement control would have been appreciated. Also, the authors monitored the activity of the rat basal ganglia output. We would have appreciated more information regarding the impact of this output activity on SNr target areas, as compared to their previous work that focused on GPe, for instance. Another example concerns the observation that SNr activity is elevated during proactive inhibition regardless of the firing rate pattern before movement (increase or decrease). As noted by the authors themselves, this is inconsistent with the classical role assigned to the basal ganglia output nucleus (i.e., a decrease in activity promotes movement). Although this observation is of potential interest to readers working on the basal ganglia, it is not discussed.

      The revised Discussion includes a section on how altered basal ganglia output may affect targets to alter behavior.

    1. Author Response

      Reviewer #3 (Public Review):

      In the submitted manuscript, Eliazer et al. conclude that Dll4 and Mib present on myofibers maintain a continuum of SC fates, providing SCs capable of regenerating muscle and repopulating the SC niche. The data provide new insights into the maintenance of SCs, demonstrating that niche-derived factors are responsible for regulating SC behavior. Loss of either Dll4 or Mib from the myofiber reduces SC numbers and impairs muscle regeneration. Overall, the data provide compelling evidence that niche-derived Dll4 and Mib regulate SC fate; however, whether the interaction maintains a continuum of SC fates, as concluded by the authors, is insufficiently supported by the data provided.

      We thank the reviewer for their comments.

      One significant issue with the manuscript is the "discovery" of an SC continuum related to the relative levels of Pax7 expression. A similar continuum was established nearly a decade ago by Zammit et al., 2004 and Olguin et al., 2004 and thus is not new. The authors need to reference this work and discuss the previously published data with regard to the observations in the current manuscript. The data establishing a continuum of SCs and its relationship to Pax7 protein levels can largely be eliminated and replaced by references to the two former manuscripts. For example, these manuscripts establish that elevated Pax7 levels drive quiescence and low Pax7 levels correlate with differentiation. The data from these manuscripts establish that SCs with modest Pax7 protein levels can acquire quiescence, accompanied by increases in Pax7 protein.

      The omission of these two seminal papers was a major oversight on our part. They have now been included. In the original manuscript we acknowledged that SCs exist on a continuum (a gradual transition from one state to another) based on scRNA-seq studies and the present data (Dll4, Pax7 and Ddx6 expression). The references for the sequencing data were included. But with all due respect to the reviewer, the Zammit and Olguin papers binned Pax7 into discrete classes once satellite cells had activated. This is not a demonstration of a continuum. Moreover, we do not make any statements about Pax7 levels in activated conditions. Therefore, the reviewer is drawing comparisons between two different contexts. The statements we have made as they pertain to a continuum under homeostatic conditions are consistent with publications to date.

      The data relating the level of Pax7 expression to Dll4 and Mib are intriguing, but the authors do not establish a direct relationship demonstrating that Dll4 or Mib regulate Pax7 levels. An alternative explanation is that Dll4 and Mib inhibit differentiation and thus promote SC quiescence indirectly. This is a critical distinction, as the authors could be correct that Dll4, via Mib, regulates SC fate.

      We do not claim that Dll4/Mib1 regulates Pax7 directly. We would side with the majority of publications showing that Notch signaling directly regulates Pax7. We have now added further experiments to examine whether Dll4 regulates Notch signaling. We crossed a transgenic mouse line harboring a Notch reporter with MF-Dll4 mice to analyze Notch signaling in SCs. The first experiment we performed with this reporter was to correlate the levels of Pax7 and Notch signaling on a cell-by-cell basis. In control mice, we found a linear positive relationship between levels of Pax7 and the Notch reporter. Next, we compared Notch reporter levels in control versus Dll4-null muscle. We observed that Notch reporter levels decreased to below detectable levels in Dll4-null muscle. Therefore, Dll4 acts non-autonomously to regulate Notch signaling in SCs during homeostasis (refer to Reviewer 1 comment 1, and Essential revisions #3).

      The reviewer raises an important point: does Notch regulate quiescence directly, or a differentiation/commitment program when SCs are in a quiescent state? We never claimed that Dll4/Mib1 regulates quiescence. The only way to conclude anything about quiescence would be to examine the expression of proliferative markers in vivo. Rather, throughout the manuscript we referred to Dll4 regulating the state of the quiescent SC pool, as measured by changes in Pax7 and Ddx6 expression. In the Discussion section, we discussed that Notch signaling may regulate the differentiation/commitment of cells in a quiescent state.

      It is unclear whether the loss of Dll4 or Mib1 reduces the diversity of SCs. If these factors repress differentiation, then their loss would be expected to enhance differentiation and reduce SC numbers, which is what the data demonstrate.

      Diversity can be restated as the variability across a population. We demonstrate that the variance of Pax7 and Ddx6 expression decreases after Dll4 deletion. It is important to note that we are analyzing the SCs that are not lost through differentiation. The fact that some of the SCs are lost through differentiation is not inconsistent with a shift in the continuum. We expect SCs to be lost through differentiation as they shift along the continuum towards a Dll4/Notch/Pax7-low state.

      We observe a reduced number of Dll4/Pax7-high cells, which is consistent with a shift in the continuum. The counterpoint would be that Dll4/Notch/Pax7-high cells commit to differentiation. There is no evidence for that conclusion in this work or any other work published to date. We discuss this issue in the Results section.

      We have also performed an experiment in which mice were treated with a lower dose of TMX to reduce rather than delete Dll4. We find that the total number of SCs does not change, while the relative number of Dll4/Pax7-high cells is reduced and that of mid and low cells is increased (Figure 4). This is consistent with a shift in a continuum of states.

      Finally, the injury data provided are for 4 d post injury and thus may represent a delay in regeneration as opposed to a failure to regenerate. At 30 d post injury, regeneration is typically considered complete. How do wild-type, Dll4-null and Mib-null muscles compare at 30 d post injury?

      We analyzed muscle regeneration of MF-Dll4fl/fl tissue 40 days after injury. The mean CSA of muscle fibers was significantly smaller than that of control fibers, suggesting a defect in tissue regeneration. This is now included in Figure 5-figure supplement 2. Due to time constraints, we have not performed the same experiment with Mib1 mutants.

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      The data presented here are, on the whole, descriptive. Whilst the descriptive elements are strong and important, more analysis and quantification are required to support the conclusions made in the paper. For example, in contrast to their analysis of the rail-MIP, their assertion that the ciliary vane orientation is linked to the CPC orientation is not backed up by quantification. In addition, this paper does not discuss the proteins within the MIP densities and central pair complex in as much detail as is possible using the recent structures from Chlamydomonas.

      We thank the reviewer for pointing out these areas for improvement, which we have addressed. We are grateful for their helpful suggestions, which we have incorporated into the manuscript to the best of our ability to improve its quality.